Ordinary Differential Equations with Stan in R

Update (2022-03-22): CmdStan 2.27 introduced a new syntax for declaring arrays which uses the array keyword. Declaring arrays with brackets will be removed in CmdStan 2.32, so I have updated the Stan model code in this post to be compatible with future versions of CmdStan. I kept the pre-2.27 model code at the bottom of the post for reference.

Update (2020-07-30): CmdStan 2.24 has improved performance with ODEs and I’ve updated the Stan model code to use the new API. See release candidate announcement for more details. I kept the original, pre-2.24 model code at the bottom of the post for reference.

Update (2020-07-16): I have modified this tutorial from its original publication. The previous version (available from the Internet Archive) used the {RStan} package, which, as it turns out, is a nightmare to get working on Windows. This version uses the in-active-development {CmdStanR} interface – a lightweight interface to Stan based on CmdStan and works exactly the same as its Python counterpart CmdStanPy.


This tutorial – inspired by / based on Demetri Pananos’s Introduction of pymc3.ode API – is my first since moving my whole website and blog from Squarespace to a site created with {blogdown} (which uses R Markdown and Hugo), version-controlled on GitHub, and hosted on Netlify. That process wasn’t too hard, mostly just long and boring.

Anyway, I wanted to (1) use the new setup in a cool way, (2) share something I learned – namely, how ODE models are specified in Stan, and (3) figure out how to get everything working correctly (e.g. MathJax, SVG charts).

These are the packages I’ll be using:

library(dplyr) # data wrangling
library(cmdstanr) # inference
library(zeallot) # multi-assignment
# Visualization:
library(ggplot2)
library(hrbrthemes)
theme_set(theme_ipsum_rc(
  base_size = 14, subtitle_size = 16, axis_title_size = 12
))

Problem

An object of mass \(m = 2\) is brought to some height and allowed to fall freely until it reaches the ground. A differential equation describing the object’s speed \(y\) over time is

$$ y’ = mg - \gamma y $$

The force the object experiences in the downwards direction is \(mg\) – where \(g = 9.8\) is the force of gravity – while the force the object experiences in the opposite direction (due to air resistance \(\gamma = 0.4\)) is proportional to how fast the object is presently moving.

freefall <- function(t, y, params) {
  dy <- 2.0 * params["g"] - params["gamma"] * y
  return(list(dy))
}
times <- seq(0, 10, 0.5)
c(gamma, g, y0, sigma, n) %<-% c(0.4, 9.8, -2, 2, length(times))

set.seed(42)
speed <- deSolve::ode(
  y0, times, freefall,
  parms = c("g" = g, "gamma" = gamma)
) %>%
  as.data.frame %>%
  setNames(c("time", "y")) %>%
  mutate(y_obs = y + rnorm(n, 0, sigma)) # measurement noise

The results look like this:

The goal is to infer \(\gamma\), \(g\), and \(\sigma\) from the data. We will use Stan to perform Bayesian inference.

Solution

We compile the following Stan model as speed_model (via cmdstan_model):

functions {
  vector freefall(real t,        // time
                  vector y,      // state
                  vector theta,  // parameters
                  real m) {      // mass
    vector[1] dydt;
    // theta[2] is gravity, theta[1] is gamma
    dydt[1] = m * theta[2] - theta[1] * y[1];
    return dydt;
  }
}
data {
  int<lower = 1>            T;
  real                      t0;
  array[T] real<lower = t0> ts;
  real                      y0_obs;
  array[T] real             y_obs;
  real                      m; // mass of object
}
parameters {
  real<lower = 0>      sigma; // noise
  vector<lower = 0>[2] theta; // [gamma, g]
  vector[1]            y0;    // initial condition to be inferred
}
model {
  array[T] vector[1] mu = ode_bdf(freefall, y0, t0, ts, theta, m);
  sigma ~ cauchy(0, 5);
  theta[1] ~ gamma(2, 0.2); // mostly [0, 1]
  theta[2] ~ gamma(3, 2);   // mostly [0, 15]
  y0 ~ std_normal();        // prior on initial condition
  y0_obs ~ normal(y0[1], sigma);
  for (t in 1:T) {
    y_obs[t] ~ normal(mu[t], sigma);
  }
}

Stan’s ode_bdf expects times \(t = 1, \ldots, T\), not starting at 0 – since \(t_0\) is provided along with \(y_0\). (This is important!)

If we didn’t want to infer the initial condition \(y_0\), we would need to:

  • change y0_obs to y0 in the data block
  • remove:
  • vector[1] y0; from the parameters block
  • the prior y0 ~ std_normal(); from the model block
  • the likelihood y0_obs ~ normal(y0[1], sigma); from he model block
functions {
  vector freefall(real t,        // time
                  vector y,      // state
                  vector theta,  // parameters
                  real m) {      // mass
    vector[1] dydt;
    // theta[2] is gravity, theta[1] is gamma
    dydt[1] = m * theta[2] - theta[1] * y[1];
    return dydt;
  }
}
data {
  int<lower = 1>            T;
  real                      t0;
  array[T] real<lower = t0> ts;
  vector[1]                 y0; // initial condition known
  array[T] real             y_obs;
  real                      m; // mass of object
}
parameters {
  real<lower = 0>      sigma; // noise
  vector<lower = 0>[2] theta; // [gamma, g]
}
model {
  array[T] vector[1] mu = ode_bdf(freefall, y0, t0, ts, theta, m);
  sigma ~ cauchy(0, 5);
  theta[1] ~ gamma(2, 0.2); // mostly [0, 1]
  theta[2] ~ gamma(3, 2);   // mostly [0, 15]
  for (t in 1:T) {
    y_obs[t] ~ normal(mu[t], sigma);
  }
}

As you can see, nothing in the ode_bdf call needs to change since it can handle a y0 that’s either data or parameter, it just has to be a vector type. For more information see Ordinary Differential Equations in Stan User Guide and ODE Solvers in Stan Functions Reference.

Okay, now on to inference by calling $sample() on the compiled model and providing the data:

speed_data <- list(
  T = n - 1,
  t0 = 0,
  ts = times[-1],
  m = 2,
  y0_obs = speed$y_obs[1],
  y_obs = speed$y_obs[-1]
)

speed_fit <- speed_model$sample(
  data = speed_data,
  seed = 42L,
  refresh = 0, # silence progress notification
  output_dir = here("temp")
)
## Running MCMC with 4 parallel chains...
## 
## Chain 3 finished in 16.5 seconds.
## Chain 4 finished in 16.8 seconds.
## Chain 1 finished in 16.9 seconds.
## Chain 2 finished in 17.4 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 16.9 seconds.
## Total execution time: 17.4 seconds.

Notes:

  • The output_dir is optional. I specify it here to help with caching my RMarkdown chunks.
  • If not estimating \(y_0\), change y0_obs = ... to y0 = array(y0, dim = 1) per the alternative model specification.

Here are the results:

speed_fit$summary()
## # A tibble: 5 × 10
##   variable     mean   median     sd    mad      q5     q95  rhat ess_bulk
##   <chr>       <num>    <num>  <num>  <num>   <num>   <num> <num>    <num>
## 1 lp__     -45.2    -44.8    1.46   1.29   -48.0   -43.4    1.00    1585.
## 2 sigma      2.74     2.67   0.490  0.459    2.07    3.63   1.00    1988.
## 3 theta[1]   0.400    0.401  0.0358 0.0353   0.338   0.458  1.00    1506.
## 4 theta[2]   9.71     9.72   0.679  0.663    8.55   10.8    1.00    1537.
## 5 y0[1]      0.0135   0.0188 0.902  0.878   -1.47    1.47   1.00    2281.
## # ℹ 1 more variable: ess_tail <num>

And here’s a pretty plot of those results:

Pre-2.27 model code

functions {
  vector freefall(real t,        // time
                  vector y,      // state
                  vector theta,  // parameters
                  real m) {      // mass
    vector[1] dydt;
    // theta[2] is gravity, theta[1] is gamma
    dydt[1] = m * theta[2] - theta[1] * y[1];
    return dydt;
  }
}
data {
  int<lower = 1>   T;
  real             t0;
  real<lower = t0> ts[T];
  real             y0_obs;
  real             y_obs[T];
  real             m; // mass of object
}
parameters {
  real<lower = 0>      sigma; // noise
  vector<lower = 0>[2] theta; // [gamma, g]
  vector[1]            y0;    // initial condition to be inferred
}
model {
  vector[1] mu[T] = ode_bdf(freefall, y0, t0, ts, theta, m);
  sigma ~ cauchy(0, 5);
  theta[1] ~ gamma(2, 0.2); // mostly [0, 1]
  theta[2] ~ gamma(3, 2);   // mostly [0, 15]
  y0 ~ std_normal();        // prior on initial condition
  y0_obs ~ normal(y0[1], sigma);
  for (t in 1:T) {
    y_obs[t] ~ normal(mu[t], sigma);
  }
}

Pre-2.24 model code

functions {
  real[] freefall(real t,        // time
                  real[] y,      // state
                  real[] theta,  // parameters
                  real[] x_r,    // data (real)
                  int[] x_i) {   // data (integer))
    real dydt[1];
    //  x_r[1] is mass, theta[2] is gravity, theta[1] is gamma
    dydt[1] = x_r[1] * theta[2] - theta[1] * y[1];
    return dydt;
  }
}
data {
  int<lower = 1>   T;
  real             t0;
  real<lower = t0> times[T];
  real             y0_obs;
  real             y_obs[T, 1];
  real             m; // mass of object
}
transformed data {
  real x_r[1];
  int  x_i[0];
  x_r[1] = m;
}
parameters {
  real<lower = 0> sigma;    // noise
  real<lower = 0> theta[2]; // [gamma, g]
  real            y0[1];    // initial condition to be inferred
}
model {
  real y[T, 1];
  // Priors:
  sigma ~ cauchy(0, 5);
  theta[1] ~ gamma(2, 0.2); // mostly [0, 1]
  theta[2] ~ gamma(3, 2);   // mostly [0, 15]
  y0 ~ std_normal(); // prior on initial condition
  // Likelihood:
  y0_obs ~ normal(y0, sigma);
  y = integrate_ode_bdf(freefall, y0, t0, times, theta, x_r, x_i);
  y_obs[, 1] ~ normal(y[, 1], sigma);
}
Posted on:
May 22, 2020
Length:
7 minute read, 1453 words
Tags:
R stats bayesian
See Also:
Wikipedia Preview for R Markdown documents
Even faster matrix math in R on macOS with M1
Making Of: Session Tick visualization