Numerical Computing

Poor Conditioning

Conditioning refers to how rapidly a function changes with respect to small changes in its inputs.

Note
When this number is large, matrix inversion is particularly sensitive to error in the input.

Gradient-Based Optimization

Terms

gradient descent
first derivative being zero
- critical points / stationary points
- local minimum
- local maximum
- saddle point
- global minimum

Jacobian and Hessian Matrices

Jacobian

Jacobian wiki

Hessian wiki

Note

Eigenvalues of Hessian matrices determine the scale of learning rate.
In multiple dimensions, there is a different second derivative for each direction at a single point.
The "condition number" of the Hessian at this point measures how much the second derivatives differ from each other.
When the "condition number" is poor, gradient descent performs poorly.

Second Derivative as Curvature

Note
The second derivative can determine whether a critical point is a saddle point or local minimum.

Saddle point containing both positive and negative curvatures

First-Order / Second-Order Optimization

Optimization algorithms that use only the gradient, such as gradient descent are called first-order optimization algorithms.

Optimization algorithms that use Hessian matrix such as Newton's method are called second-order optimization algorithms.

Lipschitz Continuous & Convex Optimization

Deep learning algorithms lack guarantees because the family of functions used in deep learning is quite complicated.

Note
In many other fields, the dominant approach to optimization is to design optimization algorithms for a limited family of functions.

Lipschitz continuous

Convex optimization Convex optimization algorithms are able to provide many more guarantees by making stronger restrictions, i.e. only applicable to convex functions whose Hessian is positive semidefinite everywhere.

numerical computing