Score-Based Model

 

1. Stein Score Function

The (Stein) score is the gradient of the log-likelihood function with respect to a data point: $$\nabla_\mathbf{x} \ln p(\mathbf{x})$$

 

Score of $q(\mathbf{x}_t|\mathbf{x}_0)=\mathcal{N}\left( \sqrt{\bar{\alpha}_t}\mathbf{x}_0,(1-\bar{\alpha}_t)\mathbf{I} \right)$

$$\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)=\nabla_{\mathbf{x}_t}\left(  -\frac{||\mathbf{x}_t-\sqrt{\bar{\alpha}_t}\mathbf{x}_0||^2}{2(1-\bar{\alpha}_t)}\right) = -\frac{\mathbf{x}_t-\sqrt{\bar{\alpha}_t}\mathbf{x}_0}{1-\bar{\alpha}_t}$$

 

Substituting $\mathbf{\epsilon}_t=\frac{1}{\sqrt{1-\bar{\alpha}_t}}(\mathbf{x}_t-\sqrt{\bar{\alpha}_t}\mathbf{x}_0)$ gives

$$\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)=-\frac{\mathbf{\epsilon}_t}{\sqrt{1-\bar{\alpha}_t}}$$

So the noise predictor $\hat{\mathbf{\epsilon}}_\theta(\mathbf{x}_t,t)$ can be interpreted as predicting the score, up to the scale factor $-1/\sqrt{1-\bar{\alpha}_t}$.
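The identity above can be checked numerically in the scalar case. The following is a minimal sketch, not part of the original notes: the function names and the values of `x_0`, `alpha_bar`, and `eps` are illustrative assumptions. It compares the closed-form score, the noise-based expression, and a finite-difference derivative of the log-density.

```python
import math

def log_q(x_t, x_0, alpha_bar):
    """Log-density of N(sqrt(alpha_bar) * x_0, 1 - alpha_bar) at a scalar x_t."""
    mean = math.sqrt(alpha_bar) * x_0
    var = 1.0 - alpha_bar
    return -0.5 * math.log(2.0 * math.pi * var) - (x_t - mean) ** 2 / (2.0 * var)

def score_closed_form(x_t, x_0, alpha_bar):
    """Closed-form score: -(x_t - sqrt(alpha_bar) * x_0) / (1 - alpha_bar)."""
    return -(x_t - math.sqrt(alpha_bar) * x_0) / (1.0 - alpha_bar)

def score_finite_difference(x_t, x_0, alpha_bar, h=1e-5):
    """Central-difference approximation of d/dx_t log q(x_t | x_0)."""
    return (log_q(x_t + h, x_0, alpha_bar) - log_q(x_t - h, x_0, alpha_bar)) / (2.0 * h)

# Illustrative values (assumptions, chosen only for this check).
x_0, alpha_bar, eps = 1.5, 0.7, 0.3
x_t = math.sqrt(alpha_bar) * x_0 + math.sqrt(1.0 - alpha_bar) * eps

closed_form = score_closed_form(x_t, x_0, alpha_bar)
from_eps = -eps / math.sqrt(1.0 - alpha_bar)   # score written in terms of the noise
numeric = score_finite_difference(x_t, x_0, alpha_bar)

print(closed_form, from_eps, numeric)
```

All three printed values should agree up to floating-point and finite-difference error.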

 


2. Tweedie's Formula

Tweedie's formula estimates the true mean of a normal distribution from a sample drawn from it, using the score evaluated at that sample.

$$\mathbf{x}\sim p(\mathbf{x})=\mathcal{N}(\mathbf{x};\mathbf{\mu},\mathbf{\Sigma}),\\ \mathbb{E}[\mathbf{\mu}|\mathbf{x}]=\mathbf{x}+\mathbf{\Sigma}\nabla_\mathbf{x}\ln p(\mathbf{x})$$


For $q(\mathbf{x}_t|\mathbf{x}_0)$, $$\mathbb{E}[\mathbf{\mu}|\mathbf{x}_t]=\sqrt{\bar{\alpha}_t}\mathbf{x}_0=\mathbf{x}_t+(1-\bar{\alpha}_t)\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)$$
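As a small illustration of how this formula is used, the sketch below recovers $\mathbf{x}_0$ from a noisy sample and its score in the scalar case. The helper names `tweedie_mean` and `estimate_x0` and all numeric values are hypothetical; the score is computed from the known noise, whereas in practice it would come from a trained network.

```python
import math
import random

def tweedie_mean(x_t, score, alpha_bar):
    """Posterior mean estimate x_t + (1 - alpha_bar) * score."""
    return x_t + (1.0 - alpha_bar) * score

def estimate_x0(x_t, score, alpha_bar):
    """Divide the estimated mean sqrt(alpha_bar) * x_0 by sqrt(alpha_bar)."""
    return tweedie_mean(x_t, score, alpha_bar) / math.sqrt(alpha_bar)

rng = random.Random(0)
x_0, alpha_bar = -0.8, 0.4           # illustrative values (assumptions)
eps = rng.gauss(0.0, 1.0)            # forward-process noise
x_t = math.sqrt(alpha_bar) * x_0 + math.sqrt(1.0 - alpha_bar) * eps

# Exact conditional score; a learned model would only approximate this.
score = -eps / math.sqrt(1.0 - alpha_bar)

x0_hat = estimate_x0(x_t, score, alpha_bar)
print(x_0, x0_hat)
```

With the exact conditional score the recovery is exact up to rounding; with a learned score the result is only an estimate.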



3. Langevin Dynamics

Even without knowing $q(\mathbf{x})$, if we have the score function $\nabla_\mathbf{x}\ln q(\mathbf{x})$, we can sample from the distribution $q(\mathbf{x})$ using Langevin dynamics.


1. Sample $\mathbf{x}$ from a prior distribution.

2. Repeat the following update for $T$ steps with step size $\eta>0$: $$\mathbf{x}\leftarrow \mathbf{x}+\eta\nabla_\mathbf{x}\ln q(\mathbf{x})+\sqrt{2\eta}\mathbf{\epsilon},\quad \mathbf{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$$

3. It converges to $q(\mathbf{x})$ when $\eta\rightarrow 0$ and $T\rightarrow \infty$.
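The procedure above can be sketched for a one-dimensional target whose score is known in closed form. Everything specific here is an assumption made for illustration: the Gaussian target $\mathcal{N}(2, 0.5^2)$, the standard normal prior, the step size, and the number of steps and chains.

```python
import math
import random
import statistics

def gaussian_score(x, mu, sigma):
    """Score of N(mu, sigma^2): d/dx log q(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma ** 2

def langevin_sample(score_fn, eta, num_steps, rng):
    """Run one Langevin chain and return its final state."""
    x = rng.gauss(0.0, 1.0)                      # step 1: sample from a prior
    for _ in range(num_steps):                   # step 2: iterate the update
        noise = rng.gauss(0.0, 1.0)
        x = x + eta * score_fn(x) + math.sqrt(2.0 * eta) * noise
    return x

rng = random.Random(0)
mu, sigma = 2.0, 0.5                             # toy target (assumption)
samples = [
    langevin_sample(lambda x: gaussian_score(x, mu, sigma),
                    eta=0.01, num_steps=500, rng=rng)
    for _ in range(1000)
]

sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
print(sample_mean, sample_std)
```

The sample mean and standard deviation should land close to the target's 2.0 and 0.5. Because $\eta$ is finite, the chain's stationary variance is slightly larger than the target variance, which is the discretization bias that vanishes as $\eta\rightarrow 0$.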



4. Noise-Conditional Score-Based Model

Train a score prediction network $s_\theta(\mathbf{x}_t)$ to match the score of the perturbed data distribution $q(\mathbf{x}_t)$:

$$\mathbb{E}_{\mathbf{x}_t\sim q(\mathbf{x}_t)}[||s_\theta(\mathbf{x}_t)-\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t)||^2]\\ =\cdots\\ =\mathbb{E}_{\mathbf{x}_0\sim q(\mathbf{x}_0),\mathbf{x}_t\sim q(\mathbf{x}_t|\mathbf{x}_0)}[||s_\theta(\mathbf{x}_t)-\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)||^2]+C\\ =\mathbb{E}_{\mathbf{x}_0\sim q(\mathbf{x}_0),\mathbf{x}_t\sim q(\mathbf{x}_t|\mathbf{x}_0)}\left[\left|\left|s_\theta(\mathbf{x}_t)+\frac{\mathbf{\epsilon}_t}{\sqrt{1-\bar{\alpha}_t}}\right|\right|^2\right]+C$$

Here $C$ is a constant that does not depend on $\theta$, so both objectives have the same minimizer.

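With $s_\theta(\mathbf{x}_t)=-\hat{\mathbf{\epsilon}}_\theta(\mathbf{x}_t,t)/\sqrt{1-\bar{\alpha}_t}$, this is identical to the noise-prediction loss of DDPM up to the time-dependent weight $1/(1-\bar{\alpha}_t)$.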

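When the data is perturbed by additive Gaussian noise of scale $\sigma_t$, the perturbed marginal is

$$q(\mathbf{x}_t)=\int q(\mathbf{x}_0)\mathcal{N}(\mathbf{x}_t;\mathbf{x}_0,\sigma^2_t\mathbf{I})d\mathbf{x}_0$$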
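The claim that regressing on the conditional score $\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t|\mathbf{x}_0)$ recovers the marginal score $\nabla_{\mathbf{x}_t}\ln q(\mathbf{x}_t)$ can be checked on a toy case. The sketch below makes several assumptions for illustration: scalar data with $q(x_0)=\mathcal{N}(0,1)$, the DDPM kernel from Section 1, and a linear model $s(x_t)=w\,x_t$ fitted by closed-form least squares instead of a neural network. For this choice $q(x_t)=\mathcal{N}(0,1)$, so the true marginal score is $-x_t$ and the fit should give $w\approx-1$.

```python
import math
import random

def fit_linear_score(alpha_bar, num_samples, rng):
    """Least-squares fit of s(x_t) = w * x_t to the conditional-score target."""
    sum_xy = 0.0
    sum_xx = 0.0
    for _ in range(num_samples):
        x_0 = rng.gauss(0.0, 1.0)                 # toy data, q(x_0) = N(0, 1)
        eps = rng.gauss(0.0, 1.0)                 # forward-process noise
        x_t = math.sqrt(alpha_bar) * x_0 + math.sqrt(1.0 - alpha_bar) * eps
        target = -eps / math.sqrt(1.0 - alpha_bar)  # conditional score
        sum_xy += x_t * target
        sum_xx += x_t * x_t
    # Minimizer of sum (w * x_t - target)^2 over w.
    return sum_xy / sum_xx

rng = random.Random(0)
w = fit_linear_score(alpha_bar=0.5, num_samples=20000, rng=rng)
print(w)
```

The regression target never uses the marginal density, yet the fitted slope approximates the marginal score. This is what makes the objective trainable from samples alone.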


5. Annealed Langevin Dynamics



6. Stochastic Differential Equations

In the continuous-time domain, the data perturbation (forward) process is described by the following SDE:

$$d\mathbf{x}=\mathbf{f}(\mathbf{x},t)dt+g(t)d\mathbf{w}$$

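Its reverse process is another stochastic differential equation, run backward in time: $$d\mathbf{x}=[\mathbf{f}(\mathbf{x},t)-g^2(t)\nabla_\mathbf{x}\ln p_t(\mathbf{x})]dt+g(t)d\bar{\mathbf{w}}$$ where $\bar{\mathbf{w}}$ is a Wiener process running backward in time and $p_t$ is the marginal density of $\mathbf{x}$ at time $t$.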
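A minimal Euler-Maruyama sketch of the reverse SDE follows, under assumptions that are not part of the original notes: scalar data distributed as $\mathcal{N}(2, 0.5^2)$, drift $f(x,t)=-\tfrac{1}{2}\beta x$ and diffusion $g(t)=\sqrt{\beta}$ with constant $\beta=1$, and an exact score available because every marginal $p_t$ is Gaussian in this toy setting. In practice the score would be replaced by a trained network.

```python
import math
import random
import statistics

BETA = 1.0            # constant noise rate (assumption)
M0, S0 = 2.0, 0.5     # toy data distribution N(M0, S0^2) (assumption)

def f(x, t):
    """Forward drift f(x, t) = -0.5 * BETA * x."""
    return -0.5 * BETA * x

def g(t):
    """Diffusion coefficient g(t) = sqrt(BETA)."""
    return math.sqrt(BETA)

def score(x, t):
    """Exact score of the Gaussian marginal p_t induced by the forward SDE."""
    decay = math.exp(-BETA * t)
    mean_t = M0 * math.sqrt(decay)
    var_t = S0 ** 2 * decay + (1.0 - decay)
    return -(x - mean_t) / var_t

def reverse_sde_sample(t_end, num_steps, rng):
    """Integrate the reverse SDE from t = t_end down to t = 0."""
    dt = t_end / num_steps
    x = rng.gauss(0.0, 1.0)          # p_{t_end} is close to N(0, 1) for large t_end
    for i in range(num_steps):
        t = t_end - i * dt
        drift = f(x, t) - g(t) ** 2 * score(x, t)
        # One backward step of size dt: subtract the drift, add fresh noise.
        x = x - drift * dt + g(t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [reverse_sde_sample(t_end=5.0, num_steps=500, rng=rng) for _ in range(1000)]
sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
print(sample_mean, sample_std)
```

Starting from near-standard-normal noise, the generated samples should have mean and standard deviation close to the data's 2.0 and 0.5, up to discretization and Monte Carlo error.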