Conjugate Gradient Method Derivation
- CG solves a linear system $Ax = b$, where $A$ is symmetric positive-definite.
Derivation:
- Recall the step-size calculation of Steepest Descent.
Recall that $f(x) = \frac{1}{2} x^T A x - b^T x + c$ is the quadratic form corresponding to the linear system $Ax = b$ (the minus sign on $b$ just makes the relation between $Ax = b$ and its quadratic form come out cleanly). Its gradient is $f'(x) = \frac{1}{2}A^T x + \frac{1}{2}Ax - b$.
When $A$ is symmetric, this becomes $f'(x) = Ax - b$. So $-f'(x_i) = r_i$, where $r_i = b - Ax_i$ is the residual vector.
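To make the quadratic-form connection concrete, here is a minimal numeric check (assuming NumPy; the matrix, vector, and test point are arbitrary illustrative values, not from these notes): the gradient of $f$ matches $Ax - b$, and the residual is its negative.

```python
import numpy as np

# Small symmetric positive-definite example (illustrative values only).
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    # Quadratic form associated with Ax = b.
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    # For symmetric A the gradient reduces to Ax - b.
    return A @ x - b

x_i = np.array([1.0, 1.0])
r_i = b - A @ x_i                     # residual r_i = b - A x_i

# The analytic gradient matches a finite-difference estimate, and r_i = -f'(x_i).
eps = 1e-6
fd = np.array([(f(x_i + eps * e) - f(x_i - eps * e)) / (2 * eps) for e in np.eye(2)])
print(np.allclose(grad_f(x_i), fd, atol=1e-5))   # True
print(np.allclose(r_i, -grad_f(x_i)))            # True
```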
Recall the typical trajectory of Steepest Descent: it is often redundant and inefficient, especially close to the true solution. Can we improve on it? For example, if we could pick $n$ orthogonal search directions $\{d_1, d_2, \dots, d_n\}$ such that exactly one step along each direction gets us to the solution, it would be much more efficient. One obvious choice (but, as we will see, impractical) is the coordinate axes. This would lead us to:
- Update Equation: $x_{i+1} = x_i + \alpha d_i$
- How to find $\alpha$:
- Observe that $d_i$ has to be orthogonal to $e_{i+1} = x_{i+1} - x$. So $d_i^T e_{i+1} = 0 \iff d_i^T (e_i + \alpha d_i) = 0 \iff \alpha = -\frac{d_i^T e_i}{d_i^T d_i}$
But we don't know $e_i$ (because we don't know $x$), so this is useless as stated. We can, however, modify the idea by picking an $A$-orthogonal set of search directions instead of an orthogonal one: $d_i$ and $d_j$ are $A$-orthogonal (conjugate) if $d_i^T A d_j = 0$. How can we find an appropriate step size?
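Before deriving the step size, a quick numeric illustration of the definition (a minimal sketch, assuming NumPy; the matrix and vectors are illustrative choices): two vectors can be $A$-orthogonal without being orthogonal in the usual sense.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

d0 = np.array([1.0, 0.0])
d1 = np.array([-2.0 / 3.0, 1.0])   # chosen so that d0^T A d1 = 0

print(d0 @ A @ d1)   # 0.0   -> d0 and d1 are A-orthogonal (conjugate)
print(d0 @ d1)       # -0.67 -> but not orthogonal in the ordinary sense
```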
Recall the original idea behind the step size: we want to decrease the function value as much as possible along the search direction, i.e. we want to solve $\arg\min_\alpha f(x_i + \alpha d_i)$. Setting the derivative with respect to $\alpha$ equal to $0$ tells us that
$\frac{\partial f(x_{i+1})}{\partial \alpha} = 0 \iff \nabla f(x_{i+1})^T \frac{d}{d \alpha} x_{i+1} = 0 \iff -r_{i+1}^T d_i = 0$
(directional derivative: http://tutorial.math.lamar.edu/Classes/CalcIII/DirectionalDeriv.aspx)
Now recall that $r_i = b - Ax_i = b - Ax + Ax - Ax_i = -Ae_i$ (note that $b - Ax = 0$). So the last condition becomes
$$e_{i+1}^T A d_i = 0 \iff (e_i + \alpha d_i)^T A d_i = 0 \\
\iff \alpha = -\frac{e_i^T A d_i}{d_i^T A d_i} = \frac{r_i^T d_i}{d_i^T A d_i}$$
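Putting the pieces together, here is a minimal conjugate-directions sketch (assuming NumPy; the system and the choice of directions are illustrative): with an $A$-orthogonal set of directions and the step size derived above, exactly $n$ steps reach the solution.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])

# An A-orthogonal set of directions; eigenvectors of the symmetric A are one
# convenient choice (any A-orthogonal set would do).
_, V = np.linalg.eigh(A)
directions = [V[:, 0], V[:, 1]]

x = np.zeros(2)
for d in directions:
    r = b - A @ x                     # residual r_i = b - A x_i
    alpha = (r @ d) / (d @ A @ d)     # alpha = r_i^T d_i / d_i^T A d_i
    x = x + alpha * d                 # x_{i+1} = x_i + alpha d_i

print(x)                              # reached in exactly n = 2 steps
print(np.linalg.solve(A, b))          # matches the direct solution
```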
- So how do we find an $A$-orthogonal set of search directions? Is its existence even guaranteed?
Gram-Schmidt Conjugation
https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
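A minimal sketch of Gram-Schmidt conjugation (assuming NumPy; `gram_schmidt_conjugate` is a hypothetical helper name): starting from any linearly independent set $\{u_i\}$, subtract from each $u_i$ its $A$-projections onto the previously built directions, $d_i = u_i - \sum_{k<i} \frac{u_i^T A d_k}{d_k^T A d_k} d_k$. This yields an $A$-orthogonal set, which also answers the existence question.

```python
import numpy as np

def gram_schmidt_conjugate(U, A):
    """Turn the linearly independent columns of U into an A-orthogonal set by
    subtracting from each u_i its A-projections onto the previous d_k."""
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for k in range(i):
            beta = (U[:, i] @ A @ D[:, k]) / (D[:, k] @ A @ D[:, k])
            d = d - beta * D[:, k]
        D[:, i] = d
    return D

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
D = gram_schmidt_conjugate(np.eye(2), A)   # conjugate the coordinate axes
print(D[:, 0] @ A @ D[:, 1])               # ~0: the directions are A-orthogonal
```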