Replacing the knitr engine for Stan

2020-08-03 UPDATE: Good news! A version of this engine is now included in versions 0.1.1 and later of {CmdStanR}. Use cmdstanr::register_knitr_engine() at the top of the R Markdown document to register it as the engine for stan chunks. See the vignette R Markdown CmdStan Engine for examples. Shoutout to the maintainers Jonah Gabry & Rok Češnovar for a super positive code review experience with the pull request for this.

I originally dabbled with custom {knitr} engine creation last month, when I made {dotnet} which enables R Markdown users to write chunks with C# and F# programs in them.

If you’re interested in creating your own, the R Markdown Cookbook has a chapter on that. You can also check out the built-in engines, {reticulate}’s Python engine, {JuliaCall}’s Julia engine, and {dotnet}’s .NET engine for examples.

If you browse the built-in engines, you’ll come across one for Stan and you’ll see that it uses {RStan}, which may be an issue on some computers. What would a CmdStan & {CmdStanR}-based engine for Stan look like if we wanted these benefits:

Let’s make one!


See Getting started with CmdStanR for general installation instructions and Max Mantei’s blogpost for installation on Windows.



We want to keep the essence of the feature

the Stan model within the code chunk is compiled into a stanmodel object, and is assigned to a variable with the name given by the output.var option.

from the existing Stan engine.

The resulting objects will be fundamentally different because {RStan} creates in an in-memory Stan model, while {CmdStanR} creates an object that holds the path to an on-disk executable, but the idea is the same – we should be able to write Stan model code in an R Markdown chunk and get back an object in R that we can use for sampling from the joint posterior distribution in the same document, like so:

```{stan, output.var="ex1"}
// example from r markdown manual

fit <- ex1$sample()

In addition to that, we want to be able to cache the chunk and keep the compiled executable so we wouldn’t have to compile every time the R Markdown document is knit/rendered.

  stan = function(options) {
    code <- knitr:::one_string(options$code)
    out <- options$output.var
    if (!is.character(out) || length(out) != 1L) stop(
      "the chunk option output.var must be a character string ",
      "providing a name for the returned `CmdStanModel` object."
    if (options$eval) {
      if (options$cache) {
        cache_path <- knitr:::valid_path(
        if (!dir.exists(cache_path)) {
        model_file <- tempfile(
          fileext = ".stan",
          tmpdir = cache_path
        cat(code, "\n", file = model_file)
      } else {
        model_file <- write_stan_file(code)
      csm <- cmdstan_model(model_file)
      assign(out, csm, envir = knit_global())
    engine_output(options, code, '')


Let’s try it out! Here’s our {stan bernoulli, output.var="bernoulli", cache=TRUE} chunk:

data { 
  int<lower=0> N; 
  array[N] int<lower=0,upper=1> y;
parameters {
  real<lower=0,upper=1> theta;
model {
  theta ~ beta(0.5, 0.5);
  y ~ bernoulli(theta);

NOTE: As of CmdStan 2.27.0 there’s a new array keyword. Declaration of arrays by placing brackets after a variable name is deprecated and will be removed in Stan 2.32.0. The code above is using that new syntax. For pre-2.27.0 versions of CmdStan the following change needs to be made:

data { 
  int<lower=0> N; 
-  array[N] int<lower=0,upper=1> y;
+  int<lower=0,upper=1> y[N];

Let’s verify:

## data { 
##   int<lower=0> N; 
##   array[N] int<lower=0,upper=1> y;
## } 
## parameters {
##   real<lower=0,upper=1> theta;
## } 
## model {
##   theta ~ beta(0.5, 0.5);
##   y ~ bernoulli(theta);
## }

And now it’s time for inference:

y <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)

fit <- bernoulli$sample(
  data = list(
    N = length(y),
    y = y
  refresh = 0
## Running MCMC with 4 sequential chains...
## Chain 1 finished in 0.0 seconds.
## Chain 2 finished in 0.0 seconds.
## Chain 3 finished in 0.0 seconds.
## Chain 4 finished in 0.0 seconds.
## All 4 chains finished successfully.
## Mean chain execution time: 0.0 seconds.
## Total execution time: 0.5 seconds.
##  variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
##     lp__  -6.42  -6.13 0.74 0.32 -7.95 -5.90 1.00     1630     1471
##     theta  0.23   0.21 0.12 0.12  0.06  0.45 1.00     1317     1370


Posted on:
July 30, 2020
4 minute read, 746 words
R bayesian
See Also:
Wikipedia Preview for R Markdown documents
Even faster matrix math in R on macOS with M1
Making Of: Session Tick visualization