Ordinary Differential Equations with Stan in R
Update (2022-03-22):
CmdStan 2.27 introduced a new syntax for declaring arrays which uses the array
keyword. Declaring arrays with brackets will be removed in CmdStan 2.32, so I have updated the Stan model code in this post to be compatible with future versions of CmdStan. I kept the
pre-2.27 model code at the bottom of the post for reference.
Update (2020-07-30): CmdStan 2.24 has improved performance with ODEs and I’ve updated the Stan model code to use the new API. See release candidate announcement for more details. I kept the original, pre-2.24 model code at the bottom of the post for reference.
Update (2020-07-16): I have modified this tutorial from its original publication. The previous version (available from the Internet Archive) used the {RStan} package, which, as it turns out, is a nightmare to get working on Windows. This version uses the in-active-development {CmdStanR} interface – a lightweight interface to Stan based on CmdStan and works exactly the same as its Python counterpart CmdStanPy.
This tutorial – inspired by / based on Demetri Pananos’s Introduction of pymc3.ode API – is my first since moving my whole website and blog from Squarespace to a site created with {blogdown} (which uses R Markdown and Hugo), version-controlled on GitHub, and hosted on Netlify. That process wasn’t too hard, mostly just long and boring.
Anyway, I wanted to (1) use the new setup in a cool way, (2) share something I learned – namely, how ODE models are specified in Stan, and (3) figure out how to get everything working correctly (e.g. MathJax, SVG charts).
These are the packages I’ll be using:
library(dplyr) # data wrangling
library(cmdstanr) # inference
library(zeallot) # multi-assignment
# Visualization:
library(ggplot2)
library(hrbrthemes)
theme_set(theme_ipsum_rc(
base_size = 14, subtitle_size = 16, axis_title_size = 12
))
Problem
An object of mass \(m = 2\)
is brought to some height and allowed to fall freely until it reaches the ground. A differential equation describing the object’s speed \(y\)
over time is
$$ y’ = mg - \gamma y $$
The force the object experiences in the downwards direction is \(mg\)
– where \(g = 9.8\)
is the force of gravity – while the force the object experiences in the opposite direction (due to air resistance \(\gamma = 0.4\)
) is proportional to how fast the object is presently moving.
freefall <- function(t, y, params) {
dy <- 2.0 * params["g"] - params["gamma"] * y
return(list(dy))
}
times <- seq(0, 10, 0.5)
c(gamma, g, y0, sigma, n) %<-% c(0.4, 9.8, -2, 2, length(times))
set.seed(42)
speed <- deSolve::ode(
y0, times, freefall,
parms = c("g" = g, "gamma" = gamma)
) %>%
as.data.frame %>%
setNames(c("time", "y")) %>%
mutate(y_obs = y + rnorm(n, 0, sigma)) # measurement noise
The results look like this:
The goal is to infer \(\gamma\)
, \(g\)
, and \(\sigma\)
from the data. We will use
Stan to perform Bayesian inference.
Solution
We compile the following Stan model as speed_model
(via
cmdstan_model
):
functions {
vector freefall(real t, // time
vector y, // state
vector theta, // parameters
real m) { // mass
vector[1] dydt;
// theta[2] is gravity, theta[1] is gamma
dydt[1] = m * theta[2] - theta[1] * y[1];
return dydt;
}
}
data {
int<lower = 1> T;
real t0;
array[T] real<lower = t0> ts;
real y0_obs;
array[T] real y_obs;
real m; // mass of object
}
parameters {
real<lower = 0> sigma; // noise
vector<lower = 0>[2] theta; // [gamma, g]
vector[1] y0; // initial condition to be inferred
}
model {
array[T] vector[1] mu = ode_bdf(freefall, y0, t0, ts, theta, m);
sigma ~ cauchy(0, 5);
theta[1] ~ gamma(2, 0.2); // mostly [0, 1]
theta[2] ~ gamma(3, 2); // mostly [0, 15]
y0 ~ std_normal(); // prior on initial condition
y0_obs ~ normal(y0[1], sigma);
for (t in 1:T) {
y_obs[t] ~ normal(mu[t], sigma);
}
}
Stan’s
ode_bdf
expects times \(t = 1, \ldots, T\)
, not starting at 0 – since \(t_0\)
is provided along with \(y_0\)
. (This is important!)
If we didn’t want to infer the initial condition \(y_0\)
, we would need to:
- change
y0_obs
toy0
in thedata
block - remove:
vector[1] y0;
from theparameters
block- the prior
y0 ~ std_normal();
from themodel
block - the likelihood
y0_obs ~ normal(y0[1], sigma);
from hemodel
block
functions {
vector freefall(real t, // time
vector y, // state
vector theta, // parameters
real m) { // mass
vector[1] dydt;
// theta[2] is gravity, theta[1] is gamma
dydt[1] = m * theta[2] - theta[1] * y[1];
return dydt;
}
}
data {
int<lower = 1> T;
real t0;
array[T] real<lower = t0> ts;
vector[1] y0; // initial condition known
array[T] real y_obs;
real m; // mass of object
}
parameters {
real<lower = 0> sigma; // noise
vector<lower = 0>[2] theta; // [gamma, g]
}
model {
array[T] vector[1] mu = ode_bdf(freefall, y0, t0, ts, theta, m);
sigma ~ cauchy(0, 5);
theta[1] ~ gamma(2, 0.2); // mostly [0, 1]
theta[2] ~ gamma(3, 2); // mostly [0, 15]
for (t in 1:T) {
y_obs[t] ~ normal(mu[t], sigma);
}
}
As you can see, nothing in the ode_bdf
call needs to change since it can handle a y0
that’s either data or parameter, it just has to be a vector
type. For more information see
Ordinary Differential Equations in Stan User Guide and
ODE Solvers in Stan Functions Reference.
Okay, now on to inference by calling
$sample()
on the compiled model and providing the data:
speed_data <- list(
T = n - 1,
t0 = 0,
ts = times[-1],
m = 2,
y0_obs = speed$y_obs[1],
y_obs = speed$y_obs[-1]
)
speed_fit <- speed_model$sample(
data = speed_data,
seed = 42L,
refresh = 0, # silence progress notification
output_dir = here("temp")
)
## Running MCMC with 4 parallel chains...
##
## Chain 3 finished in 16.5 seconds.
## Chain 4 finished in 16.8 seconds.
## Chain 1 finished in 16.9 seconds.
## Chain 2 finished in 17.4 seconds.
##
## All 4 chains finished successfully.
## Mean chain execution time: 16.9 seconds.
## Total execution time: 17.4 seconds.
Notes:
- The
output_dir
is optional. I specify it here to help with caching my RMarkdown chunks. - If not estimating
\(y_0\)
, changey0_obs = ...
toy0 = array(y0, dim = 1)
per the alternative model specification.
Here are the results:
speed_fit$summary()
## # A tibble: 5 × 10
## variable mean median sd mad q5 q95 rhat ess_bulk
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 lp__ -45.2 -44.8 1.46 1.29 -48.0 -43.4 1.00 1585.
## 2 sigma 2.74 2.67 0.490 0.459 2.07 3.63 1.00 1988.
## 3 theta[1] 0.400 0.401 0.0358 0.0353 0.338 0.458 1.00 1506.
## 4 theta[2] 9.71 9.72 0.679 0.663 8.55 10.8 1.00 1537.
## 5 y0[1] 0.0135 0.0188 0.902 0.878 -1.47 1.47 1.00 2281.
## # ℹ 1 more variable: ess_tail <dbl>
And here’s a pretty plot of those results:
Pre-2.27 model code
functions {
vector freefall(real t, // time
vector y, // state
vector theta, // parameters
real m) { // mass
vector[1] dydt;
// theta[2] is gravity, theta[1] is gamma
dydt[1] = m * theta[2] - theta[1] * y[1];
return dydt;
}
}
data {
int<lower = 1> T;
real t0;
real<lower = t0> ts[T];
real y0_obs;
real y_obs[T];
real m; // mass of object
}
parameters {
real<lower = 0> sigma; // noise
vector<lower = 0>[2] theta; // [gamma, g]
vector[1] y0; // initial condition to be inferred
}
model {
vector[1] mu[T] = ode_bdf(freefall, y0, t0, ts, theta, m);
sigma ~ cauchy(0, 5);
theta[1] ~ gamma(2, 0.2); // mostly [0, 1]
theta[2] ~ gamma(3, 2); // mostly [0, 15]
y0 ~ std_normal(); // prior on initial condition
y0_obs ~ normal(y0[1], sigma);
for (t in 1:T) {
y_obs[t] ~ normal(mu[t], sigma);
}
}
Pre-2.24 model code
functions {
real[] freefall(real t, // time
real[] y, // state
real[] theta, // parameters
real[] x_r, // data (real)
int[] x_i) { // data (integer))
real dydt[1];
// x_r[1] is mass, theta[2] is gravity, theta[1] is gamma
dydt[1] = x_r[1] * theta[2] - theta[1] * y[1];
return dydt;
}
}
data {
int<lower = 1> T;
real t0;
real<lower = t0> times[T];
real y0_obs;
real y_obs[T, 1];
real m; // mass of object
}
transformed data {
real x_r[1];
int x_i[0];
x_r[1] = m;
}
parameters {
real<lower = 0> sigma; // noise
real<lower = 0> theta[2]; // [gamma, g]
real y0[1]; // initial condition to be inferred
}
model {
real y[T, 1];
// Priors:
sigma ~ cauchy(0, 5);
theta[1] ~ gamma(2, 0.2); // mostly [0, 1]
theta[2] ~ gamma(3, 2); // mostly [0, 15]
y0 ~ std_normal(); // prior on initial condition
// Likelihood:
y0_obs ~ normal(y0, sigma);
y = integrate_ode_bdf(freefall, y0, t0, times, theta, x_r, x_i);
y_obs[, 1] ~ normal(y[, 1], sigma);
}
- Posted on:
- May 22, 2020
- Length:
- 7 minute read, 1453 words