Krylov Subspace
Introduction
Problem setting
- Linear solver: $Ax = b$
- Eigenvalue problem: $Ax = \lambda x$
The Krylov subspace method is considered one of the hallmarks of modern numerical linear algebra. It is often effective when matrix-vector products are cheap to obtain (e.g., $A$ is sparse, or $Ax$ can be accessed via one function call). It also comes with a nice convergence analysis.
Krylov subspace
$K_d := \text{span} \{ b, Ab, A^2b, \dots, A^{d-1}b \}$

The high-level overview of the method consists of two components:
1. Project the problem onto the Krylov subspace and solve it there.
2. Project the solution back to the original space.
Some characteristics:
- As $d$ increases, the error of the approximate solution decreases (as with other iterative methods)
- For large enough $d$ (at most $d = n$ in exact arithmetic), we reach the exact solution, just as a direct method would!
- In practice, preconditioning is needed to keep numerical errors under control and speed up convergence.
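To make the "access $A$ only through matrix-vector products" point concrete, here is a minimal sketch using SciPy's `LinearOperator` and `gmres`; the tridiagonal test matrix, sizes, and variable names are illustrative assumptions, not anything from this post:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres

# Illustrative sparse system: a diagonally dominant tridiagonal matrix.
n = 1000
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n))
b = np.ones(n)

# Expose A only through a matvec callable -- all a Krylov method needs.
A_op = LinearOperator((n, n), matvec=lambda v: A @ v)

x, info = gmres(A_op, b)                # info == 0 signals convergence
print(info, np.linalg.norm(A @ x - b))  # converged flag and residual norm
```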
Arnoldi process
Goal: obtain a set of orthonormal vectors $\{ q_1, \dots, q_d \}$ such that $\text{span} \{ q_1, \dots, q_d \} = \text{span} \{ b, Ab, \dots, A^{d-1}b \}$ via Gram–Schmidt-type iterations. The process satisfies the recurrence $AQ_d = Q_d H_d + r_d e_d^T$ (when $\| r_d \|$ gets small enough, we terminate), where $Q_d$ has orthonormal columns and $H_d$ is upper Hessenberg. For $A = A^T$, $H_d$ becomes tridiagonal (the Lanczos process).
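A compact sketch of the Arnoldi process under these definitions; the function name `arnoldi` and the `A_matvec` argument are my own naming, with `A_matvec` standing in for whatever routine applies $A$:

```python
import numpy as np

def arnoldi(A_matvec, b, d):
    """Build Q with orthonormal columns spanning K_d and the (d+1) x d
    upper Hessenberg H such that A @ Q[:, :d] == Q @ H (up to round-off)."""
    n = b.shape[0]
    Q = np.zeros((n, d + 1))
    H = np.zeros((d + 1, d))
    Q[:, 0] = b / np.linalg.norm(b)       # q_1 = b / ||b||
    for j in range(d):
        v = A_matvec(Q[:, j])             # expand the subspace with A q_j
        for i in range(j + 1):            # modified Gram-Schmidt step
            H[i, j] = Q[:, i] @ v
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)   # this is ||r_j||
        if H[j + 1, j] < 1e-12:           # ||r|| small enough: terminate
            return Q[:, :j + 1], H[:j + 2, :j + 1]
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H
```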
Projection onto Krylov subspace
$H_d = Q_d^* A Q_d$
This matrix can be viewed as the orthogonal projection of $A$ onto $K_d$, expressed in the basis $\{ q_1, \dots, q_d \}$. (Recall: a similarity transformation is a change of basis.)
Because $H_d$ is upper Hessenberg, which is computationally much easier to work with, we might as well solve the problem of interest with respect to $H_d$ instead of $A$.
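As a quick sanity check, reusing the hypothetical `arnoldi` sketch above (with made-up sizes), one can verify $H_d = Q_d^* A Q_d$ numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

Q, H = arnoldi(lambda v: A @ v, b, d)   # H is (d+1) x d
Qd, Hd = Q[:, :d], H[:d, :d]            # basis of K_d and the square Hessenberg part
print(np.allclose(Qd.T @ A @ Qd, Hd))   # True up to round-off
```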
Example: (generalized) least squares
Instead of working directly with $A$, we project $A$ onto $K_d$ and obtain $H_d$. The problem then becomes $\hat{x} := \arg\min_y \| H_d y - \hat{b} \|$, where $\hat{b} = Q_d^* b = \| b \| e_1$ (since $q_1 = b / \| b \|$). We obtain the solution in the original space by $x = Q_d \hat{x}$.
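Putting the pieces together, here is a GMRES-style sketch of this projected least-squares solve, continuing with the `arnoldi`, `A`, `b`, and `d` from the snippets above (all of which are my illustrative names):

```python
import numpy as np

Q, H = arnoldi(lambda v: A @ v, b, d)         # H is (d+1) x d
b_hat = np.zeros(H.shape[0])
b_hat[0] = np.linalg.norm(b)                  # Q^* b = ||b|| e_1 since q_1 = b/||b||
y, *_ = np.linalg.lstsq(H, b_hat, rcond=None) # tiny (d+1) x d least-squares problem
x = Q[:, :H.shape[1]] @ y                     # back to the original space: x = Q_d x_hat
print(np.linalg.norm(A @ x - b))              # residual of the Krylov approximation
```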
Application to training deep neural networks?
In the next post, I'll go over how Krylov subspace methods can be applied to deep learning.