Conjugate Gradient Method Derivation
- CG solves a linear system $Ax = b$, where $A$ is symmetric positive-definite.
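Before deriving it, here is CG used as a black box (a minimal sketch assuming NumPy/SciPy are available; the $2 \times 2$ system is a made-up example):

```python
import numpy as np
from scipy.sparse.linalg import cg

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])    # symmetric positive-definite
b = np.array([2.0, -8.0])

x, info = cg(A, b)            # info == 0 means CG converged
print(x)                      # approximately [ 2., -2.]
```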
Derivation:
- Recall the step size calculation of Steepest Descent.
Recall that $f(x) = \frac{1}{2}x^T A x - b^T x + c$ is the quadratic form corresponding to the linear system $Ax = b$ (the minus sign on $b$ is just to make the relation between $Ax = b$ and its quadratic form come out nicely). Its gradient is $f'(x) = \frac{1}{2}A^T x + \frac{1}{2}A x - b$.
When $A$ is symmetric, this becomes $f'(x) = Ax - b$. So $-f'(x_i) = r_i$, where $r_i = b - Ax_i$ is the residual vector.
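A quick numerical check of this relation (a minimal sketch, assuming NumPy; $A$, $b$, and $x_i$ are made-up illustrative values):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])  # symmetric
b = np.array([2.0, -8.0])
x_i = np.array([1.0, 1.0])              # arbitrary current iterate

grad = A @ x_i - b                      # f'(x_i) = A x_i - b
r_i = b - A @ x_i                       # residual
print(np.allclose(-grad, r_i))          # True: -f'(x_i) equals the residual
```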
Recall the typical zigzag trajectory of Steepest Descent: its steps are often redundant and inefficient, especially once it gets close to the true solution. Can we improve on this somehow? For example, if we could pick $n$ orthogonal search directions $\{d_1, d_2, \dots, d_n\}$ such that exactly one step along each direction takes us to the solution, it would be much more efficient. One obvious (but impractical) choice is to use the coordinate axes as our search directions. This would lead us to:
- Update Equation: $x_{i+1} = x_i + \alpha d_i$
- How to find $\alpha$:
- Observe that $d_i$ has to be orthogonal to $e_{i+1} = x_{i+1} - x$ (where $x$ is the exact solution). So,

$$d_i^T e_{i+1} = 0 \iff d_i^T (e_i + \alpha d_i) = 0 \iff \alpha = -\frac{d_i^T e_i}{d_i^T d_i}$$
But we don’t know $e_i$ (because we don’t know $x$), so this is useless. We can, however, modify the above idea by picking an A-orthogonal set of search directions instead of an orthogonal one: $d_i$ and $d_j$ are A-orthogonal (conjugate) if $d_i^T A d_j = 0$. How can we find an appropriate step size?
Recall the original idea behind the step size: we want to decrease the function value as much as possible, which means we want to solve $\arg\min_\alpha f(x_i + \alpha d_i)$. Setting the derivative with respect to $\alpha$ equal to $0$ tells us that
$$\frac{\partial f(x_{i+1})}{\partial \alpha} = 0 \iff \nabla f(x_{i+1})^T \frac{d}{d\alpha} x_{i+1} = 0 \iff -r_{i+1}^T d_i = 0$$
(directional derivative: http://tutorial.math.lamar.edu/Classes/CalcIII/DirectionalDeriv.aspx)
Now recall that $r_i = b - Ax_i = b - Ax + Ax - Ax_i = -Ae_i$ (note that $b - Ax = 0$, since $x$ is the exact solution). So the last condition becomes
$$e_{i+1}^T A d_i = 0 \iff (e_i + \alpha d_i)^T A d_i = 0 \iff \alpha = -\frac{e_i^T A d_i}{d_i^T A d_i} = \frac{r_i^T d_i}{d_i^T A d_i}$$

(the last equality uses $r_i = -Ae_i$ and the symmetry of $A$)
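A quick numerical check of this step size (a sketch assuming NumPy; $A$, $b$, $x_i$, and $d_i$ are made-up illustrative values):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])  # symmetric positive-definite
b = np.array([2.0, -8.0])
x = np.linalg.solve(A, b)               # stand-in for the (unknown) true solution

x_i = np.array([0.0, 0.0])              # current iterate
d_i = np.array([1.0, 0.0])              # some search direction
r_i = b - A @ x_i

alpha = (r_i @ d_i) / (d_i @ A @ d_i)   # step size from the equation above
e_next = (x_i + alpha * d_i) - x        # e_{i+1} = x_{i+1} - x
print(np.isclose(e_next @ A @ d_i, 0.0))  # True: e_{i+1} is A-orthogonal to d_i
```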
- So how do we find an A-orthogonal set of search directions? Is their existence even guaranteed?
Gram-Schmidt Conjugation
Shewchuk, "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain": https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
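The answer, worked out in the paper above, is Gram-Schmidt conjugation. A minimal sketch of the idea, assuming NumPy (the function name and test values are my own): take any $n$ linearly independent vectors $u_0, \dots, u_{n-1}$ and subtract from each one its A-projection onto the directions already built.

```python
import numpy as np

def gram_schmidt_conjugate(U, A):
    """Turn the linearly independent rows of U into A-orthogonal directions."""
    D = []
    for u in U:
        d = u.astype(float)
        # Subtract the A-projection of u onto each earlier direction,
        # so that d^T A d_prev = 0 for every previously built direction.
        for prev in D:
            d -= (u @ A @ prev) / (prev @ A @ prev) * prev
        D.append(d)
    return np.array(D)

A = np.array([[3.0, 2.0], [2.0, 6.0]])    # symmetric positive-definite
D = gram_schmidt_conjugate(np.eye(2), A)  # start from the coordinate axes
print(D[0] @ A @ D[1])                    # ~ 0.0: the directions are A-orthogonal
```

Existence is not an issue: when $A$ is symmetric positive-definite, $d^T A d > 0$ for every nonzero $d$, so no denominator above vanishes and the construction always yields $n$ A-orthogonal directions.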