Pathwise Estimator

Assumption

  • $z = t(\epsilon, v)$ for $\epsilon \sim s(\epsilon)$ implies $z \sim q(z; v)$.

Example

$$\epsilon \sim N(0,1) \\ z = t(\epsilon, v) \text{, where } t(\epsilon,v) = \epsilon v_1 + v_0 \\ \Rightarrow z \sim q(z; v) \text{, where } q(z; v) = N(v_0, v_1^2)$$

That is, the distribution of $Z$, $q(z; v)$, is parameterized by $v = [v_0, v_1]$, with mean $\mu = v_0$ and variance $\sigma^2 = v_1^2$.
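The transformation in this example is easy to check empirically. Below is a minimal NumPy sketch (the specific values $v_0 = 2.0$, $v_1 = 0.5$ are illustrative, not from the text): sampling $\epsilon \sim N(0,1)$ and applying $t(\epsilon, v) = \epsilon v_1 + v_0$ yields samples whose mean and standard deviation match $N(v_0, v_1^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
v0, v1 = 2.0, 0.5                    # illustrative values: v = [v0, v1]

eps = rng.standard_normal(100_000)   # eps ~ s(eps) = N(0, 1)
z = eps * v1 + v0                    # z = t(eps, v) = eps * v1 + v0

# Sample mean and std should be close to v0 = 2.0 and v1 = 0.5,
# confirming z ~ N(v0, v1^2).
print(z.mean(), z.std())
```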

  • $\log p(x,z)$ and $\log q(z; v)$ are differentiable with respect to $z$.

Usage

Recall that the ELBO is the following:

$$L(v) = E_{q(z;v)}[\log p(x,z) - \log q(z;v)]$$

If we define $g(z,v) = \log p(x,z) - \log q(z;v)$, then we can express $\nabla_v L(v)$ as:

$$\nabla_v L(v) = \nabla_v \int g(z,v) q(z; v) dz \\ = \int \nabla_v q(z; v) g(z,v) + q(z; v) \nabla_v g(z,v) dz \\ = \int q(z;v) \nabla_v \log q(z;v) g(z,v) + q(z;v) \nabla_v g(z,v) dz \\ = E_{q(z;v)} \left [ \nabla_v \log q(z;v) g(z,v) + \nabla_v g(z,v) \right ]$$
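The last expression is the score-function form of the gradient, which can be estimated by Monte Carlo without reparameterization. A minimal NumPy sketch, using a hypothetical conjugate model (prior $N(z; 0, 1)$, likelihood $N(x; z, 1)$, observed $x = 1.0$; none of these specifics are from the text): the second term $E_q[\nabla_v g(z,v)] = E_q[-\nabla_v \log q(z;v)] = 0$ in expectation, so only the score term is averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model (assumed for illustration):
# p(x, z) = N(z; 0, 1) * N(x; z, 1), observed x = 1.0
x = 1.0
def log_p(z):
    return -0.5 * z**2 - 0.5 * (x - z)**2   # up to additive constants

v0, v1 = 0.0, 1.0                # variational params of q(z; v) = N(v0, v1^2)
n = 200_000

z = rng.normal(v0, v1, n)        # z ~ q(z; v)
log_q = -0.5 * ((z - v0) / v1)**2 - np.log(v1)   # up to additive constants
g = log_p(z) - log_q             # g(z, v) = log p(x, z) - log q(z; v)

# Score function nabla_v log q(z; v) for a Gaussian:
score_v0 = (z - v0) / v1**2
score_v1 = ((z - v0)**2 - v1**2) / v1**3

# E_q[nabla_v log q(z; v) * g(z, v)]; the E_q[nabla_v g] term is zero
# in expectation because nabla_v g = -nabla_v log q has zero mean.
grad_v0 = np.mean(score_v0 * g)
grad_v1 = np.mean(score_v1 * g)
```

For this conjugate model the exact ELBO gradients at $v = [0, 1]$ work out to $1$ and $-1$, so the estimates should be close to those values, though the score-function estimator is noticeably high-variance.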

We can rewrite the above gradient using $z = t(\epsilon, v)$:

$$\nabla_v L(v) = \int [\nabla_v \log q(z;v) g(z,v) + \nabla_v g(z,v)] q(z;v) dz \\ = E_{s(\epsilon)} \left [ \nabla_v \log s(\epsilon) g(t(\epsilon, v),v) + \nabla_v g(t(\epsilon, v),v) \right ]$$

Since $\log s(\epsilon)$ doesn't depend on $v$, we have $\nabla_v \log s(\epsilon) = 0$ and the first term vanishes. Then,

$$\nabla_v L(v) = E_{s(\epsilon)} [\nabla_v g(t(\epsilon, v),v)] \\ = E_{s(\epsilon)} [ \nabla_v (\log p(x, t(\epsilon, v)) - \log q(t(\epsilon, v); v)) ] \\ = E_{s(\epsilon)} [ \nabla_z (\log p(x,z) - \log q(z;v)) \big|_{z = t(\epsilon, v)} \nabla_v t(\epsilon, v) - \nabla_v \log q(z; v) \big|_{z = t(\epsilon, v)} ]$$

where the last line applies the chain rule: $z = t(\epsilon, v)$ depends on $v$, and $\log q(z; v)$ additionally depends on $v$ directly, which produces the extra $\nabla_v \log q(z; v)$ term. Since the score function has zero expectation, $E_{q(z;v)}[\nabla_v \log q(z;v)] = 0$, that term drops out:

$$\nabla_v L(v) = E_{s(\epsilon)} [ \nabla_z (\log p(x,z) - \log q(z;v)) \big|_{z = t(\epsilon, v)} \nabla_v t(\epsilon, v) ]$$

This is the pathwise estimator: sample $\epsilon \sim s(\epsilon)$, set $z = t(\epsilon, v)$, and average the bracketed term.
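The final expectation can be estimated directly by sampling $\epsilon$. A minimal NumPy sketch, reusing the same hypothetical conjugate model as before (prior $N(z; 0, 1)$, likelihood $N(x; z, 1)$, observed $x = 1.0$; assumed for illustration) with the Gaussian transformation $t(\epsilon, v) = \epsilon v_1 + v_0$ from the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model (assumed for illustration):
# p(x, z) = N(z; 0, 1) * N(x; z, 1), observed x = 1.0
x = 1.0
v0, v1 = 0.0, 1.0                  # q(z; v) = N(v0, v1^2)
n = 100_000

eps = rng.standard_normal(n)       # eps ~ s(eps) = N(0, 1)
z = eps * v1 + v0                  # z = t(eps, v)

# nabla_z [log p(x, z) - log q(z; v)] evaluated at z = t(eps, v)
dlogp_dz = -z + (x - z)            # from log p = -z^2/2 - (x - z)^2/2 + const
dlogq_dz = -(z - v0) / v1**2
dg_dz = dlogp_dz - dlogq_dz

# nabla_v t(eps, v) for t = eps * v1 + v0: dz/dv0 = 1, dz/dv1 = eps
grad_v0 = np.mean(dg_dz * 1.0)
grad_v1 = np.mean(dg_dz * eps)
```

For this model the exact gradients at $v = [0, 1]$ are $1$ and $-1$; compared to the score-function estimator, the pathwise estimates concentrate around those values with far fewer samples, which is the usual practical motivation for reparameterization.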

Reference

Variational Inference NIPS Tutorial

Blog post about Monte Carlo gradient estimators