Flow Matching (FM) has become a prevalent technique for training generative models. In this post we'll explore the intuition behind flow matching and how it works.
We'll use this notebook to build a simple flow matching model that illustrates linear flow matching on a minimal toy example. Our goal is to keep things simple, intuitive, and visual. We won't do a deep dive into the mathematics; if you're interested in the mathematical details, I recommend checking out the references at the end of this post.
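The original setup cell isn't shown here, so as a minimal sketch, the code cells below assume the following imports and seeds:

```python
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Fix the seeds so the notebook is reproducible.
np.random.seed(0)
torch.manual_seed(0)
```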
Flow matching
Flow matching is a technique for learning how to transport samples from one distribution to another. For example, we can learn to transport samples from a simple distribution that is easy to sample from (e.g. Gaussian noise) to a complex distribution (e.g. images, videos, or robot actions).
Toy Example: Mapping Gaussian noise to a bimodal distribution
In this post we'll build a simple toy example of a generative model using flow matching. For illustrative purposes we'll start with a simple 1D bimodal target distribution $\pi_1$ and learn how to transport samples from a 1D Gaussian noise distribution $\pi_0$ to this target distribution.
In practice, the target points $x_1 \sim \pi_1$ are approximated by sampling from a limited dataset of training points $X_1$, while the noise points $x_0 \sim \pi_0$ are drawn from a noise distribution $\pi_0$ that is chosen to be easy to sample from (e.g. Gaussian noise).
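The exact distributions used in the original cell aren't shown; here's a sketch assuming a mixture of two Gaussians for $\pi_1$ (the means, widths, and weights are illustrative) and a standard normal for $\pi_0$:

```python
def sample_target(n):
    """Sample x1 ~ pi_1: a bimodal mixture of two 1D Gaussians (illustrative)."""
    pick = np.random.rand(n) < 0.5
    return np.where(pick,
                    np.random.normal(-2.0, 0.5, n),   # left mode
                    np.random.normal(+2.0, 0.5, n))   # right mode

def sample_noise(n):
    """Sample x0 ~ pi_0: standard 1D Gaussian noise."""
    return np.random.normal(0.0, 1.0, n)

X1 = sample_target(10_000)   # training points approximating pi_1
X0 = sample_noise(10_000)    # noise points from pi_0
```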
The flow matching model predicts a velocity field
A flow matching model does not predict flow paths directly, but instead predicts a velocity field that can be used to sample the flow paths. The velocity field describes how to move a sample from the noise distribution to the target distribution.
We can describe the flow matching model with learnable parameters $\theta$ as a function: $${FM}_{\theta}(x_t, t) = v(x_t, t)$$ This function takes a sample $x_t$ at flow step $t$ and predicts the velocity vector $v(x_t, t) = dx_t / dt$ that describes how to move the sample $x_t$ closer to the target distribution at step $t$.
The step $t$ is a value between 0 and 1 that describes the progress of the sample $x_t$ along the flow path from the noise distribution to the target distribution. When $t=0$ the sample $x_t = x_0$ is a sample from the noise distribution $\pi_0$, and when $t=1$ the sample $x_t = x_1$ is a sample from the target distribution $\pi_1$.
At inference time we can sample a starting point $x_0$ from the noise distribution $\pi_0$ and then use the predicted velocity field ${FM}_{\theta}(x_t, t)$ to iteratively move the sample towards the target distribution $\pi_1$ in small steps $dt$.
This is illustrated in the following animation (generated further down in the notebook), which shows the integration of a sample from the noise distribution $\pi_0$ on the left towards the target distribution $\pi_1$ on the right using the predicted velocity field ${FM}_{\theta}(x_t, t)$. The velocity field is visualized as a heatmap where the vertical axis represents the position of the sample $x_t$ and the horizontal axis represents the flow step $t$ going from 0 on the left to 1 on the right. Red means a positive velocity (sample pushed up towards higher $x$) and blue means a negative velocity (sample pulled down towards lower $x$).
Training the flow matching model is learning the velocity field
Since the flow matching model ${FM}_{\theta}(x_t, t)$ should predict the velocity field $v(x_t, t) = dx_t / dt$ we can train the model on samples of velocity vectors $\mathbf{v}(x_t, t)$.
The flow matching training objective is to minimize the expected reconstruction error of the velocity field: $$ \underset{\theta}{\text{argmin}} \; \mathbb{E}_{t, x_t} \Big\| {FM}_{\theta}(x_t, t) - v(x_t, t) \Big\|^2 $$
with $t \sim \mathcal{U}[0, 1]$ and $x_t$ taken from a sampled reference path evaluated at flow step $t$.
We'll be using straight line reference paths in this post since they are simple and common.
Training: Straight line reference paths
We're going to focus on a common variant of flow matching where we learn a flow matching model based on straight line reference paths. Training flow matching with straight-line conditional paths and independent couplings is also equivalent to the rectified flow training objective.
Linear (straight-line) flow matching is trained on a set of reference paths between the noise and target distributions. More specifically, it uses straight-line trajectories between noise and target samples as reference paths, because these tend to give straighter learned flows that require fewer integration steps to reconstruct the target distribution.
To sample a reference path we independently draw a target point $x_1$ from the target distribution $\pi_1$ and a noise point $x_0$ from the noise distribution $\pi_0$. This gives us a coupling $(x_0, x_1)$ that defines a straight-line reference path between the noise and target samples. During training we'll sample a large set of such couplings $(X_0, X_1)$ and use the resulting paths to train the flow matching model.
The following code illustrates how we define the straight line reference path between a noise and target sample.
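A minimal sketch of such a cell (the function name `reference_path` is ours):

```python
def reference_path(x0, x1, t):
    """Point x_t on the straight line from x0 (at t=0) to x1 (at t=1)."""
    return (1.0 - t) * x0 + t * x1

# Example: halfway along the path between a noise and a target sample.
reference_path(x0=0.8, x1=-2.0, t=0.5)   # -> -0.6
```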
The following figure shows a few sampled straight-line reference paths, as well as the reference path distribution approximated by sampling a large number of straight-line reference paths.
Training: Sampling velocity vectors
Since we are using straight-line reference paths, the sampled velocity vectors $\mathbf{v}(x_t, t)$ have a very simple form. Given a sample $x_0$ from the noise distribution and a sample $x_1$ from the target distribution, the point on the path is $x_t = (1 - t)\,x_0 + t\,x_1$, so differentiating with respect to $t$ gives the conditional velocity vector along the straight line connecting $x_0$ and $x_1$: $\mathbf{v}(x_t, t) = x_1 - x_0$, as illustrated in the following code and figure.
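A sketch of the corresponding cell (naming is ours):

```python
def reference_velocity(x0, x1):
    """Conditional velocity along the straight-line path from x0 to x1.

    Since x_t = (1 - t) * x0 + t * x1, the derivative dx_t/dt = x1 - x0
    is constant along the whole path.
    """
    return x1 - x0
```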
Training: Flow matching objective
We can now write out our objective as a function of the samples from the noise distribution $X_0$ and the target distribution $X_1$: $$ \underset{\theta}{\text{argmin}} \; \mathbb{E}_{t, X_0, X_1} \Big\| {FM}_{\theta}(x_t, t) - (X_1 - X_0) \Big\|^2 $$ with $t \sim \mathcal{U}[0, 1]$, $X_0 \sim \pi_0$, $X_1 \sim \pi_1$, and $x_t = (1 - t) X_0 + t X_1$.
Note that the flow matching model ${FM}_{\theta}(x_t, t)$ is trained on specific straight-line couplings $(X_0, X_1)$, but since these are averaged out in the training objective, the model learns an approximation of the marginal velocity field, independent of any specific coupling.
For this simple toy example we could even approximate the velocity field directly by sampling a large number of reference paths and averaging their velocities within fixed bins over the $(t, x_t)$ plane. This approximated expectation is illustrated in the following figure, which shows the average velocity field. Red means a positive velocity (sample pushed up towards higher $x$) and blue means a negative velocity (sample pulled down towards lower $x$).
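A sketch of how this binned approximation could be computed, assuming the samplers defined earlier (bin counts and the $x$ range are arbitrary choices):

```python
def binned_velocity_field(x0, x1, n_t=50, n_x=50, x_range=(-4.0, 4.0)):
    """Approximate the marginal velocity field by averaging the conditional
    velocities (x1 - x0) of many straight-line paths in fixed (t, x) bins."""
    t = np.random.rand(len(x0))                   # one flow step per path
    xt = (1.0 - t) * x0 + t * x1                  # path position at step t
    v = x1 - x0                                   # conditional velocity
    t_idx = np.clip((t * n_t).astype(int), 0, n_t - 1)
    x_idx = np.clip(((xt - x_range[0]) / (x_range[1] - x_range[0]) * n_x).astype(int),
                    0, n_x - 1)
    v_sum = np.zeros((n_x, n_t))
    counts = np.zeros((n_x, n_t))
    np.add.at(v_sum, (x_idx, t_idx), v)           # accumulate velocities per bin
    np.add.at(counts, (x_idx, t_idx), 1)
    return v_sum / np.maximum(counts, 1)          # average velocity per bin

field = binned_velocity_field(sample_noise(100_000), sample_target(100_000))
```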
Training the Flow Matching model
Now that we have defined the optimization objective and how to sample the training data, we can define the flow matching model and train it. We'll create a simple neural network with a single hidden layer that we can train to predict the velocity field.
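A minimal sketch in PyTorch; as described above it has a single hidden layer (its width is an arbitrary choice), takes the pair $(x_t, t)$ as input, and outputs a scalar velocity:

```python
class FlowMatchingModel(nn.Module):
    """FM_theta(x_t, t): predict the scalar velocity at position x_t, step t."""

    def __init__(self, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden_dim),   # input: (x_t, t) concatenated
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),   # output: velocity v(x_t, t)
        )

    def forward(self, x_t, t):
        return self.net(torch.stack([x_t, t], dim=-1)).squeeze(-1)

model = FlowMatchingModel()
```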
We can now define the loss function as a function of the flow matching model, the noise samples $X_0$, the target samples $X_1$, and the flow steps $T$:
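A sketch of the loss following the objective above: interpolate $x_t$ along the straight-line path and regress the predicted velocity onto $X_1 - X_0$:

```python
def flow_matching_loss(model, x0, x1, t):
    """Mean squared error between predicted and reference velocities."""
    x_t = (1.0 - t) * x0 + t * x1    # point on the straight-line reference path
    v_target = x1 - x0               # conditional straight-line velocity
    v_pred = model(x_t, t)           # FM_theta(x_t, t)
    return torch.mean((v_pred - v_target) ** 2)
```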
Using this loss function we can now train the flow matching model in a straightforward gradient-based optimization loop. We'll use a standard Adam optimizer to optimize the model parameters.
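A sketch of the training loop, assuming the model, loss, and samplers defined above (batch size, learning rate, and number of steps are illustrative):

```python
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5_000):
    # Sample a fresh batch of independent couplings (x0, x1) and flow steps t.
    x0 = torch.from_numpy(sample_noise(256)).float()
    x1 = torch.from_numpy(sample_target(256)).float()
    t = torch.rand(256)

    loss = flow_matching_loss(model, x0, x1, t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```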
Visualizing the trained flow matching model
Now that we have trained this simple flow matching model we can visualize the learned velocity field by evaluating the predicted velocity ${FM}_{\theta}(x_t, t)$ on a grid of points $(t, x_t)$ and plotting this grid of velocities as a color image. Red means a positive velocity (sample pushed up towards higher $x$) and blue means a negative velocity (sample pulled down towards lower $x$).
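A sketch of evaluating the trained model on a grid of $(t, x_t)$ points and rendering it as a heatmap (grid resolution and axis ranges are arbitrary):

```python
with torch.no_grad():
    ts = torch.linspace(0.0, 1.0, 100)
    xs = torch.linspace(-4.0, 4.0, 100)
    tt, xx = torch.meshgrid(ts, xs, indexing="ij")   # grid of (t, x) points
    v_grid = model(xx.reshape(-1), tt.reshape(-1)).reshape(100, 100)

# Transpose so t runs horizontally and x vertically, as in the figures above.
plt.imshow(v_grid.T, origin="lower", extent=(0, 1, -4, 4),
           aspect="auto", cmap="coolwarm")
plt.xlabel("flow step $t$")
plt.ylabel("sample position $x_t$")
plt.show()
```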
Sampling from the trained model
At inference time we can sample a starting point $x_0$ from the noise distribution $\pi_0$ and then use the predicted velocity field ${FM}_{\theta}(x_t, t)$ to iteratively move (integrate) the sample towards a sample $\hat{x}_1$ from the target distribution $\pi_1$.
The code below starts with noise $x_0 \sim \mathcal{N}(0, 1)$ and integrates the learned ODE using the Euler method: at each step $t$ it evaluates the velocity prediction ${FM}_{\theta}(x_t, t)$ at the current position $x_t$ and moves the sample a small step $dt$ in the direction of that velocity.
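A sketch of the integration loop for a single sample; with 15 Euler steps it produces 16 $(t, x_t)$ values like the trajectory shown in the table below:

```python
n_steps = 15
dt = 1.0 / n_steps
x = torch.randn(1)                     # starting point x0 ~ N(0, 1)

with torch.no_grad():
    for i in range(n_steps):
        t = torch.full((1,), i * dt)
        v = model(x, t)                # velocity prediction FM_theta(x_t, t)
        x = x + v * dt                 # Euler step: x_{t+dt} = x_t + v * dt
```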
| $t$ | 0.000 | 0.067 | 0.133 | 0.200 | 0.267 | 0.333 | 0.400 | 0.467 | 0.533 | 0.600 | 0.667 | 0.733 | 0.800 | 0.867 | 0.933 | 1.000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_t$ | 0.850 | 0.805 | 0.767 | 0.738 | 0.719 | 0.716 | 0.731 | 0.769 | 0.830 | 0.909 | 0.999 | 1.095 | 1.192 | 1.288 | 1.384 | 1.481 |
We can illustrate this sampled path in the following animation, which shows the integration from the noise sample $x_0$ towards the target sample $\hat{x}_1$ using the predicted velocity field ${FM}_{\theta}(x_t, t)$ above. The velocity field heatmap is the same as before: the vertical axis is the sample position $x_t$, the horizontal axis is the flow step $t$ from 0 to 1, red is a positive velocity (sample pushed up towards higher $x$), and blue is a negative velocity (sample pulled down towards lower $x$).
Notice that while we trained on straight-line paths, the sampled path is not necessarily a straight line. This is because we don't learn the paths directly, but learn the marginal (unconditioned) velocity field by training on a large set of straight-line reference paths.
We can also take a large sample $\hat{X}_1$ from the model and reconstruct the target distribution $\pi_1$. We'll define a sample function that generates samples by integrating the learned vector field using Euler integration. We'll then plot the target distribution and the reconstructed samples.
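A sketch of a batched `sample` function using the same Euler scheme:

```python
def sample(model, n, n_steps=100):
    """Draw n samples by Euler-integrating the learned velocity field
    from t=0 to t=1, starting from Gaussian noise."""
    dt = 1.0 / n_steps
    x = torch.randn(n)                        # X0 ~ pi_0
    with torch.no_grad():
        for i in range(n_steps):
            t = torch.full((n,), i * dt)
            x = x + model(x, t) * dt          # Euler step
    return x

X1_hat = sample(model, n=10_000)              # reconstructed target samples
```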
As a final illustration, let's look at the path density between the starting noise samples $\hat{X}_0$ and the final reconstructed samples $\hat{X}_1$ by sampling a large number of paths from the noise distribution $\pi_0$ to the target distribution $\pi_1$.
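A sketch of collecting full trajectories (the same Euler loop as above, but storing every intermediate position):

```python
def sample_paths(model, n, n_steps=100):
    """Integrate n samples and return the full (n_steps + 1, n) trajectories."""
    dt = 1.0 / n_steps
    x = torch.randn(n)
    path = [x]
    with torch.no_grad():
        for i in range(n_steps):
            t = torch.full((n,), i * dt)
            x = x + model(x, t) * dt
            path.append(x)
    return torch.stack(path)

paths = sample_paths(model, n=5_000)
```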
Originally published on November 1, 2025.