Understanding Denoising Diffusion Probabilistic Models (DDPM)

Denoising Diffusion Probabilistic Models (DDPMs) are a type of generative model that has become popular for high-quality image synthesis. These models work by learning to reverse a diffusion process where noise is added to an image. This post will walk you through the core concepts of DDPMs and how they work to generate realistic images.

What Are Diffusion Models?

At a high level, Diffusion Models are generative models that create new data (like images) by gradually adding noise to an image until it becomes pure noise, and then learning to reverse this process. This process is split into two main stages:

Forward Diffusion Process: Noise is added to the image until it's unrecognizable.
Reverse Diffusion Process: The model learns to reverse the noise addition and recreate a clean image.

These models have proven to produce high-quality, realistic images, often rivaling other generative models like GANs (Generative Adversarial Networks).

FID Score: A Measure of Quality

To evaluate the quality of images generated by diffusion models, we use the Fréchet Inception Distance (FID) score. A perfect FID score is 0, meaning the generated image is indistinguishable from real images. A lower FID score indicates that the model is generating better-quality images.

How Diffusion Models Work

The process of generating images with Diffusion Models can be broken down into three key steps:

1. Starting with an Image (X₀)

The process begins with a base image from your dataset. This is the starting point from which the forward diffusion process will begin.

2. Adding Noise Iteratively

In this step, noise is added to the image gradually in small, controlled increments over many steps (usually between 1000 and 4000 steps). After enough steps, the image becomes completely random noise.

This step is known as the Forward Diffusion Process, where each step adds a tiny bit of noise until the image turns into static.

3. Reversing the Process

Once the image is pure noise, the goal is to reverse the diffusion process. Starting from the noise, the model learns to remove the noise step by step, reconstructing an image that looks like something from the original dataset.

This is called the Reverse Diffusion Process, and it is modeled with the help of a neural network that learns how to undo the noise and restore the image over time.

Key Concepts Behind Diffusion Models

Forward Diffusion: The process of adding noise to the image. At each step, the image becomes more and more corrupted until it becomes pure noise.
Reverse Diffusion: The model’s job is to learn how to reverse this process. It learns how to take a noisy image and progressively remove the noise, eventually reconstructing an image that resembles real data.
Markov Chains: Both the forward and reverse diffusion processes are modeled as Markov chains, where the next state only depends on the current state, not the previous states.

Training a Diffusion Model

Training a diffusion model involves teaching the neural network to approximate the reverse diffusion process. During training, the model learns to predict how to remove the noise added at each step, ultimately allowing it to generate high-quality images starting from random noise.

The forward process is similar to the encoder in a Variational Autoencoder (VAE), while the reverse process is similar to the decoder in a VAE. However, the key difference is in the way noise is added during the forward diffusion process, which is more controlled and specific than in VAEs.

Why Are DDPMs Effective?

DDPMs are effective because they use a well-defined and incremental process to generate images. By learning how to reverse the noise addition process, the model can produce highly detailed and coherent images. The ability to progressively denoise the image helps the model generate images that are both diverse and realistic.

Training Process

The neural network is trained to undo the noise added at each step of the forward diffusion process. By learning the parameters of the reverse diffusion process, the model is able to generate realistic images that match the distribution of real data.

Conclusion

Denoising Diffusion Probabilistic Models (DDPMs) offer a novel approach to generative modeling by introducing a diffusion process that adds and removes noise from images. This unique method allows the model to generate high-quality, realistic images without relying on adversarial training as in GANs.

Feel free to reach out with any questions or comments about diffusion models or generative networks!