Reading: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
Important notes:
“Thus, given the proliferation of saddle points, not local minima, in high dimensional problems, the entire theoretical justification for quasi-Newton methods, i.e. the ability to rapidly descend to the bottom of a convex local minimum, becomes less relevant in high dimensional non-convex optimization.”
The proposed algorithm (saddle-free Newton) uses second-order curvature information differently from quasi-Newton methods: it rescales the gradient by the inverse of the absolute values of the Hessian eigenvalues, so negative-curvature directions become descent directions that push the update away from saddle points rather than toward them.
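A minimal sketch of that rescaling, assuming the Hessian is small enough for a full eigendecomposition (the paper approximates this for large models); the toy saddle f(x, y) = x^2 - y^2 and the damping constant are illustrative choices, not from the paper:

```python
import numpy as np

def saddle_free_newton_step(grad, hessian, damping=1e-3):
    # Rescale the gradient by 1 / |lambda_i| in the Hessian eigenbasis.
    # Taking absolute values keeps negative-curvature directions as descent
    # directions, so the step is repelled from saddle points instead of
    # attracted to them (plain Newton divides by lambda_i itself).
    eigvals, eigvecs = np.linalg.eigh(hessian)     # Hessian is symmetric
    scale = 1.0 / (np.abs(eigvals) + damping)      # |H|^{-1}, with damping
    return -eigvecs @ (scale * (eigvecs.T @ grad))

# Toy saddle f(x, y) = x^2 - y^2, evaluated near the saddle at the origin.
x, y = 0.5, 0.5
grad = np.array([2 * x, -2 * y])
hess = np.array([[2.0, 0.0], [0.0, -2.0]])
print(saddle_free_newton_step(grad, hess))  # ~[-0.5, +0.5]: downhill in both coordinates
```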
Check:
“Typical, random Gaussian error functions over N scalar variables, or dimensions, are increasingly likely to have saddle points rather than local minima as N increases. Indeed the ratio of the number of saddle points to local minima increases exponentially with the dimensionality N.”
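One way to check the intuition numerically, as a loose illustration rather than the paper's random-matrix calculation: if the Hessian at a random critical point behaves like a random symmetric Gaussian matrix, the chance that all of its eigenvalues are positive (a local minimum rather than a saddle) collapses quickly with dimension. The matrix scaling and trial count below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def fraction_all_positive(n, trials=2000):
    # Fraction of random symmetric (GOE-style) n x n matrices whose
    # eigenvalues are all positive, i.e. that could be the Hessian
    # of a local minimum instead of a saddle point.
    hits = 0
    for _ in range(trials):
        a = rng.standard_normal((n, n))
        h = (a + a.T) / np.sqrt(2 * n)          # symmetrized Gaussian matrix
        hits += np.all(np.linalg.eigvalsh(h) > 0)
    return hits / trials

for n in (1, 2, 3, 4, 6, 8):
    print(n, fraction_all_positive(n))
# The all-positive fraction drops from ~0.5 at n=1 toward ~0 within a few
# dimensions: a random critical point is overwhelmingly likely to be a saddle.
```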
“Second order methods, like the Newton method, are designed to rapidly descend plateaus surrounding local minima by rescaling gradient steps by the inverse eigenvalues of the Hessian matrix.” -> “H
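To unpack the quoted claim: in the Hessian eigenbasis the Newton step divides each gradient component by its eigenvalue, so it takes large steps along near-flat plateau directions, but a negative eigenvalue reverses the step and pulls the iterate toward the critical point. A small sketch on the same toy saddle f(x, y) = x^2 - y^2 as above (not from the paper):

```python
import numpy as np

def newton_step(grad, hessian):
    # Plain Newton step: divide each gradient component by its eigenvalue
    # in the Hessian eigenbasis (equivalently, -H^{-1} grad).
    eigvals, eigvecs = np.linalg.eigh(hessian)
    return -eigvecs @ ((eigvecs.T @ grad) / eigvals)

x, y = 0.5, 0.5                              # point near the saddle at the origin
grad = np.array([2 * x, -2 * y])             # gradient of f(x, y) = x^2 - y^2
hess = np.array([[2.0, 0.0], [0.0, -2.0]])
print(newton_step(grad, hess))  # [-0.5, -0.5]: jumps straight onto the saddle at (0, 0);
                                # the negative eigenvalue turns descent in y into ascent
```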