Score-Based Model
1. Stein Score Function
The (Stein) score is the gradient of the log-probability density with respect to a data point: $$\nabla_\mathbf{x} \ln p(\mathbf{x})$$
The score of the forward-process conditional $q(\mathbf{x}_t|\mathbf{x}_0)=\mathcal{N}\left( \sqrt{\bar{\alpha}_t}\mathbf{x}_0,(1-\bar{\alpha}_t)\mathbf{I} \right)$ is given below. The Gaussian's normalizing constant does not depend on $\mathbf{x}_t$, so it vanishes under the gradient:
$$\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)=\nabla_{\mathbf{x}_t}\left( -\frac{||\mathbf{x}_t-\sqrt{\bar{\alpha}_t}\mathbf{x}_0||^2}{2(1-\bar{\alpha}_t)}\right) = -\frac{\mathbf{x}_t-\sqrt{\bar{\alpha}_t}\mathbf{x}_0}{1-\bar{\alpha}_t}$$
Using $\mathbf{\epsilon}_t=\frac{1}{\sqrt{1-\bar{\alpha}_t}}(\mathbf{x}_t-\sqrt{\bar{\alpha}_t}\mathbf{x}_0)$, the score becomes
$$\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)=-\frac{\mathbf{\epsilon}_t}{\sqrt{1-\bar{\alpha}_t}}$$
So the noise predictor $\hat{\mathbf{\epsilon}}_\theta(\mathbf{x}_t,t)$ can be interpreted as predicting the score, up to the time-dependent scale factor $-1/\sqrt{1-\bar{\alpha}_t}$.
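As a quick numerical sanity check of the two score expressions above, the sketch below works in one dimension with hypothetical values $\bar{\alpha}_t=0.7$ and $\mathbf{x}_0=1.5$. It compares a central finite difference of $\ln q(\mathbf{x}_t|\mathbf{x}_0)$ against the closed form and against $-\mathbf{\epsilon}_t/\sqrt{1-\bar{\alpha}_t}$. The helper names (`log_q_cond`, `score_cond`, `score_from_eps`) are illustrative and not taken from any library.

```python
import math
import random

def log_q_cond(x_t, x_0, alpha_bar):
    """Log-density of N(x_t; sqrt(alpha_bar) * x_0, 1 - alpha_bar), scalar case."""
    var = 1.0 - alpha_bar
    return -0.5 * math.log(2 * math.pi * var) - (x_t - math.sqrt(alpha_bar) * x_0) ** 2 / (2 * var)

def score_cond(x_t, x_0, alpha_bar):
    """Closed-form score: -(x_t - sqrt(alpha_bar) * x_0) / (1 - alpha_bar)."""
    return -(x_t - math.sqrt(alpha_bar) * x_0) / (1.0 - alpha_bar)

def score_from_eps(eps, alpha_bar):
    """Same score written via the noise; also converts a noise prediction into a score."""
    return -eps / math.sqrt(1.0 - alpha_bar)

random.seed(0)
alpha_bar = 0.7  # hypothetical value of \bar{alpha}_t
x_0 = 1.5        # hypothetical clean data point
eps = random.gauss(0.0, 1.0)
x_t = math.sqrt(alpha_bar) * x_0 + math.sqrt(1.0 - alpha_bar) * eps  # forward sample

# Central finite difference of the log-density with respect to x_t
h = 1e-5
numeric = (log_q_cond(x_t + h, x_0, alpha_bar) - log_q_cond(x_t - h, x_0, alpha_bar)) / (2 * h)
print(numeric, score_cond(x_t, x_0, alpha_bar), score_from_eps(eps, alpha_bar))
```

All three printed values should agree to within floating-point error.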
2. Tweedie's Formula
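Tweedie's formula estimates the mean of a Gaussian from a sample drawn from it. The estimate is the sample itself plus a correction term: the covariance times the score of the density.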
$$\mathbf{x}\sim p(\mathbf{x})=\mathcal{N}(\mathbf{x};\mathbf{\mu},\mathbf{\Sigma}),\\ \mathbb{E}[\mathbf{\mu}|\mathbf{x}]=\mathbf{x}+\mathbf{\Sigma}\nabla_\mathbf{x}\ln p(\mathbf{x})$$
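For $q(\mathbf{x}_t|\mathbf{x}_0)$, the mean is $\sqrt{\bar{\alpha}_t}\mathbf{x}_0$ and $\mathbf{\Sigma}=(1-\bar{\alpha}_t)\mathbf{I}$, so $$\mathbb{E}[\mathbf{\mu}|\mathbf{x}_t]=\sqrt{\bar{\alpha}_t}\mathbf{x}_0=\mathbf{x}_t+(1-\bar{\alpha}_t)\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)$$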
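The following is a minimal scalar sketch of Tweedie's formula. For illustration only, it assumes a Gaussian prior $\mu\sim\mathcal{N}(0,\tau^2)$ and $x|\mu\sim\mathcal{N}(\mu,\sigma^2)$. Under these assumptions, $p(x)=\mathcal{N}(0,\tau^2+\sigma^2)$ is the marginal density whose score enters the formula, and the Tweedie estimate should match the standard conjugate posterior mean $\frac{\tau^2}{\tau^2+\sigma^2}x$. The second part handles the diffusion case. It uses hypothetical values for $\bar{\alpha}_t$, $\mathbf{x}_0$ and $\mathbf{\epsilon}_t$ and recovers $\mathbf{x}_0$ from $\mathbf{x}_t$ through the score.

```python
import math

def log_marginal(x, tau2, sigma2):
    """Log-density of the marginal p(x) = N(x; 0, tau2 + sigma2)."""
    v = tau2 + sigma2
    return -0.5 * math.log(2 * math.pi * v) - x * x / (2 * v)

def score_fd(logp, x, h=1e-5):
    """Central finite-difference approximation of d/dx log p(x)."""
    return (logp(x + h) - logp(x - h)) / (2 * h)

tau2, sigma2 = 4.0, 1.0  # hypothetical prior variance of mu and noise variance
x = 2.5                  # a single observed sample

# Tweedie: E[mu | x] = x + Sigma * score of the marginal
tweedie = x + sigma2 * score_fd(lambda z: log_marginal(z, tau2, sigma2), x)
conjugate = tau2 / (tau2 + sigma2) * x  # textbook Gaussian posterior mean
print(tweedie, conjugate)

# Diffusion case: Sigma = (1 - alpha_bar), and sqrt(alpha_bar) * x_0 = x_t + (1 - alpha_bar) * score
alpha_bar, x_0, eps = 0.6, -0.8, 0.3  # hypothetical values
x_t = math.sqrt(alpha_bar) * x_0 + math.sqrt(1 - alpha_bar) * eps
score = -(x_t - math.sqrt(alpha_bar) * x_0) / (1 - alpha_bar)
x_0_hat = (x_t + (1 - alpha_bar) * score) / math.sqrt(alpha_bar)
print(x_0_hat)
```

With these numbers, both the Tweedie and conjugate estimates equal $2.0$: the correction $\sigma^2\cdot(-2.5/5)=-0.5$ is subtracted from the sample $2.5$. `x_0_hat` returns $-0.8$ up to rounding.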
3. Langevin Dynamics
Even without knowing $q(\mathbf{x})$ itself, we can sample from $q(\mathbf{x})$ using Langevin dynamics if we have its score function $\nabla_\mathbf{x}\ln q(\mathbf{x})$. This works because the score does not depend on the normalizing constant of $q$.
1. Sample $\mathbf{x}$ from a prior distribution.
2. Repeat the following update for $T$ steps: $$\mathbf{x}\leftarrow \mathbf{x}+\eta\nabla_\mathbf{x}\ln q(\mathbf{x})+\sqrt{2\eta}\mathbf{\epsilon},\quad \mathbf{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$$
3. The distribution of $\mathbf{x}$ converges to $q(\mathbf{x})$ as $\eta\rightarrow 0$ and $T\rightarrow \infty$.
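The procedure above is the unadjusted Langevin algorithm. Here is a minimal sketch. It assumes a hypothetical 1-D target $q(x)=\mathcal{N}(2, 0.5^2)$ whose score $-(x-2)/0.25$ is known in closed form, and the sampler only ever calls the score, never the density. The step size $\eta=0.01$, the $T=500$ steps, the standard-normal prior, and the 1000 independent chains are arbitrary illustrative choices.

```python
import math
import random

def langevin_sample(score, x_init, eta, n_steps, rng):
    """Unadjusted Langevin dynamics: x <- x + eta * score(x) + sqrt(2 * eta) * eps."""
    x = x_init
    for _ in range(n_steps):
        x = x + eta * score(x) + math.sqrt(2 * eta) * rng.gauss(0.0, 1.0)
    return x

# Hypothetical target q(x) = N(2, 0.5^2); only its score is used.
MU, SIGMA2 = 2.0, 0.25

def target_score(x):
    return -(x - MU) / SIGMA2

rng = random.Random(0)
# Each chain starts from a standard-normal prior sample.
samples = [
    langevin_sample(target_score, rng.gauss(0.0, 1.0), eta=0.01, n_steps=500, rng=rng)
    for _ in range(1000)
]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # close to 2.0 and 0.25
```

Because $\eta$ is finite, the chain has a small discretization bias. For this Gaussian target, the stationary variance of the update is $\sigma^2/(1-\eta/(2\sigma^2))\approx 0.255$ rather than $0.25$. The bias shrinks as $\eta\rightarrow 0$.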
4. Noise-Conditional Score-Based Model
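Train a score prediction network $s_\theta(\mathbf{x}_t)$ to approximate the score of the noisy marginal $q(\mathbf{x}_t)$. That marginal score is intractable. However, the score-matching objective equals, up to a constant, a denoising objective that only needs the tractable conditional score:
$$\begin{aligned}
&\mathbb{E}_{\mathbf{x}_t\sim q(\mathbf{x}_t)}\left[||s_\theta(\mathbf{x}_t)-\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t)||^2\right]\\
&=\cdots\\
&=\mathbb{E}_{\mathbf{x}_0\sim q(\mathbf{x}_0),\mathbf{x}_t\sim q(\mathbf{x}_t|\mathbf{x}_0)}\left[||s_\theta(\mathbf{x}_t)-\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)||^2\right]+C\\
&=\mathbb{E}_{\mathbf{x}_0\sim q(\mathbf{x}_0),\mathbf{x}_t\sim q(\mathbf{x}_t|\mathbf{x}_0)}\left[\left\|s_\theta(\mathbf{x}_t)+\frac{\mathbf{\epsilon}_t}{\sqrt{1-\bar{\alpha}_t}}\right\|^2\right]+C
\end{aligned}$$
Here $C$ does not depend on $\theta$. Substituting $s_\theta(\mathbf{x}_t)=-\hat{\mathbf{\epsilon}}_\theta(\mathbf{x}_t,t)/\sqrt{1-\bar{\alpha}_t}$ turns the last expectation into $\frac{1}{1-\bar{\alpha}_t}\mathbb{E}\left[||\mathbf{\epsilon}_t-\hat{\mathbf{\epsilon}}_\theta(\mathbf{x}_t,t)||^2\right]$. This is identical to the DDPM loss up to a per-timestep weight.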
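The denoising loss $\|s_\theta(\mathbf{x}_t)+\mathbf{\epsilon}_t/\sqrt{1-\bar{\alpha}_t}\|^2$ regresses onto the conditional score $-\mathbf{\epsilon}_t/\sqrt{1-\bar{\alpha}_t}$, yet its minimizer is the score of the marginal $q(\mathbf{x}_t)$. The 1-D sketch below illustrates this under hypothetical assumptions: data $\mathbf{x}_0\sim\mathcal{N}(0,4)$, $\bar{\alpha}_t=0.5$, and a linear score model $s_\theta(x)=-\theta x$. Then $q(\mathbf{x}_t)=\mathcal{N}(0,4\bar{\alpha}_t+1-\bar{\alpha}_t)=\mathcal{N}(0,2.5)$, whose true score coefficient is $\theta=0.4$. The Monte Carlo loss is quadratic in $\theta$, so the sketch computes its minimizer in closed form.

```python
import math
import random

rng = random.Random(0)
S2 = 4.0         # hypothetical data variance: x_0 ~ N(0, S2)
ALPHA_BAR = 0.5  # hypothetical \bar{alpha}_t
N = 200_000

# Draw (x_t, eps) pairs from the forward process
pairs = []
for _ in range(N):
    x_0 = rng.gauss(0.0, math.sqrt(S2))
    eps = rng.gauss(0.0, 1.0)
    x_t = math.sqrt(ALPHA_BAR) * x_0 + math.sqrt(1 - ALPHA_BAR) * eps
    pairs.append((x_t, eps))

def dsm_loss(theta):
    """Monte Carlo denoising loss for the hypothetical linear model s_theta(x) = -theta * x."""
    c = 1.0 / math.sqrt(1 - ALPHA_BAR)
    return sum((-theta * x_t + c * eps) ** 2 for x_t, eps in pairs) / N

# Closed-form minimizer of the quadratic loss: theta = E[x_t eps] / (sqrt(1 - alpha_bar) E[x_t^2])
theta_hat = sum(x_t * eps for x_t, eps in pairs) / (
    math.sqrt(1 - ALPHA_BAR) * sum(x_t * x_t for x_t, _ in pairs)
)
# True marginal q(x_t) = N(0, ALPHA_BAR * S2 + 1 - ALPHA_BAR); its score is -x / variance
theta_true = 1.0 / (ALPHA_BAR * S2 + 1 - ALPHA_BAR)
print(theta_hat, theta_true)
print(dsm_loss(0.3), dsm_loss(theta_hat), dsm_loss(0.5))
```

`theta_hat` should land near $0.4$ up to Monte Carlo error. The loss at `theta_hat` should be lower than at nearby values such as $0.3$ and $0.5$.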
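In the noise-conditional formulation, the data are perturbed without scaling, $\mathbf{x}_t=\mathbf{x}_0+\sigma_t\mathbf{\epsilon}$ with $\mathbf{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$, so the noisy marginal is
$$q(\mathbf{x}_t)=\int q(\mathbf{x}_0)\mathcal{N}(\mathbf{x}_t;\mathbf{x}_0,\sigma^2_t\mathbf{I})\,d\mathbf{x}_0$$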
5. Annealed Langevin Dynamics
6. Stochastic Differential Equations
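In continuous time, the data perturbation (forward) process is described by the SDE below. Here $\mathbf{f}(\mathbf{x},t)$ is the drift, $g(t)$ is the diffusion coefficient, and $\mathbf{w}$ is a standard Wiener process: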
$$d\mathbf{x}=\mathbf{f}(\mathbf{x},t)dt+g(t)d\mathbf{w}$$
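Here is a minimal Euler-Maruyama simulation of the forward SDE. It assumes the hypothetical choices $\mathbf{f}(\mathbf{x},t)=-\frac{1}{2}\beta\mathbf{x}$ and $g(t)=\sqrt{\beta}$ with constant $\beta=1$ (a variance-preserving-style SDE), in 1-D with data $x(0)\sim\mathcal{N}(3,0.25)$. For this linear SDE, the marginal at time $t$ is known in closed form: $\mathcal{N}(3e^{-\beta t/2},\,0.25e^{-\beta t}+1-e^{-\beta t})$. The simulated mean and variance at $t=3$ can therefore be checked against it.

```python
import math
import random

BETA = 1.0  # hypothetical constant noise rate

def f(x, t):
    """Hypothetical drift f(x, t) = -0.5 * BETA * x."""
    return -0.5 * BETA * x

def g(t):
    """Hypothetical diffusion coefficient g(t) = sqrt(BETA)."""
    return math.sqrt(BETA)

def forward_sde(x, t_end, n_steps, rng):
    """Euler-Maruyama for dx = f(x, t) dt + g(t) dw, from t = 0 to t_end."""
    dt = t_end / n_steps
    t = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + f(x, t) * dt + g(t) * dw
        t += dt
    return x

rng = random.Random(0)
M0, S2, T = 3.0, 0.25, 3.0  # hypothetical data distribution N(M0, S2) and time horizon
xs = [forward_sde(rng.gauss(M0, math.sqrt(S2)), T, 300, rng) for _ in range(2000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

# Closed-form marginal of this linear SDE at time T
mean_exact = M0 * math.exp(-0.5 * BETA * T)
var_exact = S2 * math.exp(-BETA * T) + 1 - math.exp(-BETA * T)
print(mean, mean_exact, var, var_exact)
```

As $t$ grows, the data are driven toward $\mathcal{N}(0,1)$ regardless of their starting distribution. This is what makes a simple prior usable at the end of the forward process.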
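The reverse process is also an SDE, running backward in time from $T$ to $0$: $$d\mathbf{x}=[\mathbf{f}(\mathbf{x},t)-g^2(t)\nabla_\mathbf{x}\ln p_t(\mathbf{x})]dt+g(t)d\bar{\mathbf{w}}$$ Here $dt$ is a negative infinitesimal time step, $\bar{\mathbf{w}}$ is a standard Wiener process in reverse time, and $p_t$ is the marginal density of $\mathbf{x}$ at time $t$.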
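The sketch below samples by integrating the reverse SDE with Euler-Maruyama steps. It assumes the hypothetical choices $f(x,t)=-\frac{1}{2}\beta x$, $g(t)=\sqrt{\beta}$, $\beta=1$, $T=3$, and 1-D data $\mathcal{N}(3,0.25)$. Under these assumptions $p_t$ stays Gaussian, $\mathcal{N}(m_t, v_t)$ with $m_t=3e^{-\beta t/2}$ and $v_t=0.25e^{-\beta t}+1-e^{-\beta t}$, so the exact score $-(x-m_t)/v_t$ can stand in for a learned score network. Each step applies $x\leftarrow x-[f(x,t)-g^2(t)\nabla_x\ln p_t(x)]\Delta t+g(t)\sqrt{\Delta t}\,z$ with $z\sim\mathcal{N}(0,1)$, starting from exact samples of $p_T$.

```python
import math
import random

BETA = 1.0                  # hypothetical constant noise rate
M0, S2, T = 3.0, 0.25, 3.0  # hypothetical data distribution N(M0, S2) and time horizon

def f(x, t):
    """Hypothetical drift f(x, t) = -0.5 * BETA * x."""
    return -0.5 * BETA * x

def g(t):
    """Hypothetical diffusion coefficient g(t) = sqrt(BETA)."""
    return math.sqrt(BETA)

def marginal(t):
    """Mean and variance of p_t for this linear SDE started from N(M0, S2)."""
    decay = math.exp(-BETA * t)
    return M0 * math.sqrt(decay), S2 * decay + 1 - decay

def score(x, t):
    """Exact score of the Gaussian p_t, standing in for a learned s_theta."""
    m, v = marginal(t)
    return -(x - m) / v

def reverse_sde(x, n_steps, rng):
    """Euler-Maruyama for dx = [f - g^2 * score] dt + g dw_bar, from t = T down to 0."""
    dt = T / n_steps
    t = T
    for _ in range(n_steps):
        drift = f(x, t) - g(t) ** 2 * score(x, t)
        x = x - drift * dt + g(t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t -= dt
    return x

rng = random.Random(1)
m_T, v_T = marginal(T)
xs = [reverse_sde(rng.gauss(m_T, math.sqrt(v_T)), 300, rng) for _ in range(2000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)  # close to M0 = 3.0 and S2 = 0.25
```

The recovered samples should have mean near 3 and variance near 0.25, up to Monte Carlo error and a small discretization bias. In practice, the exact `score` would be replaced by a trained network $s_\theta(\mathbf{x},t)$.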