I recently picked up an M1 Mac mini to replace my more-than-a-decade-old Mac mini. I wanted to do it while the old one was still operational, so that I could transfer files (namely my music library) more easily. At the same time I did a full Windows reset on my Dell laptop, which I previously used for both gaming and development. After seeing how much RAM was being taken up by various background processes, I decided to use it only for gaming, nothing more.

So I decided to make the new Mac mini my blogging computer. Turns out it’s great!

Sometimes I blog about Bayesian modeling with Stan, and when I re-knit the post about ODEs I saw sampling time drop from 30 seconds per chain (on that Dell G5 laptop with a Core i7 CPU) to 15 seconds per chain. The improvement in computing performance from the M1 is going to be really nice for any future technical blog posts I write.

The second thing I wanted to do was revisit an earlier blog post, partly out of my own curiosity but also because a reader wrote in:

I recently purchased a new MacBook pro with the M1 chip and RStudio runs are much slower than on my old 2012 MacBook. Through lots of google searching I learned that the old MacBook has BLAS enabled, but the new one does not. I tried using your advice from https://mpopov.com/blog/2019/06/04/faster-matrix-math-in-r-on-macos/ but it is not updating when I recheck the session info.

Okay, there are a couple of things going on here. First, if you transferred your system from the old MacBook to the new one, then you’re using the “Intel 64-bit” version of R, which runs via Rosetta 2, rather than the arm64 version of R, which was released to run natively on Apple Silicon. If that’s the case, I recommend downloading and installing the arm64 version (available here), together with a copy of GNU Fortran for arm64 (available here) and the latest version of RStudio, since v1.4 is when they added support for R 4.1 and native arm64 builds of R.
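If you’re not sure which build of R you ended up with, a quick check from the terminal can tell you. This is only a sketch: classify_arch is a helper name I made up for illustration, and it assumes Rscript is on your PATH (the R part is skipped if it isn’t). R.version$arch reports "aarch64" for the native build and "x86_64" for the Intel one.

```shell
# classify_arch is a hypothetical helper for this sketch, not part of R or macOS.
classify_arch() {
  case "$1" in
    arm64|aarch64) echo "native Apple Silicon (arm64)" ;;
    x86_64)        echo "Intel 64-bit (runs via Rosetta 2 on an M1 Mac)" ;;
    *)             echo "unknown architecture: $1" ;;
  esac
}

# Architecture the current shell is running as.
echo "Shell: $(classify_arch "$(uname -m)")"

# Architecture R was built for, if R is installed and on the PATH.
if command -v Rscript >/dev/null 2>&1; then
  r_arch=$(Rscript -e 'cat(R.version$arch)')
  echo "R: $(classify_arch "$r_arch")"
fi
```

If the R line says Intel 64-bit on an M1 machine, that’s the Rosetta 2 situation described above.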

In writing this blog post I installed both versions of R 4.1.1 and used RSwitch to switch between them.

There’s still the matter of BLAS. Unfortunately:

New in macOS Big Sur 11.0.1, the system ships with a built-in dynamic linker cache of all system-provided libraries. As part of this change, copies of dynamic libraries are no longer present on the filesystem. Code that attempts to check for dynamic library presence by looking for a file at a path or enumerating a directory will fail. Instead, check for library presence by attempting to dlopen() the path, which will correctly check for the library in the cache. (Source: macOS Big Sur 11.0.1 Release Notes)

So the sym-linking commands in that post only work up to macOS 10.13 “High Sierra” – and I’ve updated the post to mention that. The only other thing I could do is try replacing R’s BLAS dynamic library with OpenBLAS. So I downloaded and built OpenBLAS 0.3.18 from source and tried to use that:

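# Build OpenBLAS from the unpacked source tree and install it under /opt/openblas
cd ~/Downloads/OpenBLAS-0.3.18
make
sudo make PREFIX=/opt/openblas install

# Point R's BLAS at the OpenBLAS dynamic library
cd /Library/Frameworks/R.framework/Resources/lib
ln -sf /opt/openblas/lib/libopenblas.dylib libRblas.dylib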

Note: ln -sf libRblas.0.dylib libRblas.dylib can be used to revert.
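Since the reader’s problem was that the session info didn’t seem to update, here is a rough way to confirm what R is actually linked against after swapping the library. It’s a sketch under a few assumptions: blas_target is a helper name I made up, the R framework lives at the default path used above, and Rscript is on your PATH. extSoftVersion() and La_library() are base R functions that report the BLAS and LAPACK libraries in use. An R session that was already running keeps the library it loaded at startup, so restart R (or RStudio) before rechecking.

```shell
# blas_target is a hypothetical helper: it prints what libRblas.dylib
# in the given directory points to.
blas_target() {
  link="$1/libRblas.dylib"
  if [ -L "$link" ]; then
    readlink "$link"
  elif [ -e "$link" ]; then
    echo "regular file (not a symlink)"
  else
    echo "not found"
  fi
}

# Default location of R's libraries on macOS, as used in the commands above.
R_LIB=/Library/Frameworks/R.framework/Resources/lib
echo "libRblas.dylib -> $(blas_target "$R_LIB")"

# Ask a fresh R process which BLAS and LAPACK it loaded.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'cat("BLAS:  ", extSoftVersion()[["BLAS"]], "\n"); cat("LAPACK:", La_library(), "\n")' ||
    echo "Could not query R"
fi
```

If the BLAS line still shows the stock libRblas rather than the OpenBLAS path, the symlink didn’t take, or you’re looking at a different R installation than the one you modified.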

This is the code I benchmarked:

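library(microbenchmark)

set.seed(20211010)

# Square matrix for tcrossprod(), solve(), and svd()
d <- 1e2
a <- matrix(rnorm(d^2), d, d)

# Simulated regression data for lm(): n observations, p predictors plus an intercept
n <- 1e3
p <- 1e2
b <- rnorm(p + 1, 0, 10)
x <- matrix(runif(n * p, -10, 10), ncol = p, nrow = n)
y <- cbind(1, x) %*% b + rnorm(n, 0, 2)

# Time each expression 1,000 times and report in milliseconds
mb <- microbenchmark(
  tcrossprod(a), solve(a), svd(a), lm(y ~ x),
  times = 1000L,
  unit = "ms"
)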

And here are the results:

Chart comparing benchmark times of different linear algebra and matrix math operations under different configurations. Native arm64 R with OpenBLAS outperforms native arm64 R with R's built-in BLAS and Intel 64-bit R in every scenario.

M1 Mac mini benchmarks of various operations involving matrix math (execution time in ms)

Operation      R build                     BLAS      Minimum  Lower quartile  Median  Upper quartile  Maximum
tcrossprod(a)  Apple Silicon arm64         OpenBLAS     0.03            0.04    0.04            0.05     7.71
tcrossprod(a)  Apple Silicon arm64         R's BLAS     0.20            0.20    0.20            0.21     3.34
tcrossprod(a)  Intel 64-bit via Rosetta 2  R's BLAS     0.22            0.22    0.22            0.23     3.07
svd(a)         Apple Silicon arm64         OpenBLAS     1.40            1.50    1.71            3.26   369.81
svd(a)         Apple Silicon arm64         R's BLAS     2.63            2.67    2.67            2.69     7.16
svd(a)         Intel 64-bit via Rosetta 2  R's BLAS     2.90            2.92    2.94            2.95     7.23
solve(a)       Apple Silicon arm64         OpenBLAS     0.15            0.17    0.19            0.33   266.34
solve(a)       Apple Silicon arm64         R's BLAS     0.51            0.52    0.53            0.54     5.75
solve(a)       Intel 64-bit via Rosetta 2  R's BLAS     0.57            0.58    0.59            0.60     5.26
lm(y ~ x)      Apple Silicon arm64         OpenBLAS     4.99            5.13    6.71            8.92    61.13
lm(y ~ x)      Apple Silicon arm64         R's BLAS     8.75            8.85    8.89            8.99    47.96
lm(y ~ x)      Intel 64-bit via Rosetta 2  R's BLAS     7.84            7.89    7.93            8.08    32.83

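If performance is what you’re after, OpenBLAS looks like the way to go. Going by the medians on native arm64 R, it was roughly 5 times faster than R’s built-in BLAS for tcrossprod(a), about 2.8 times faster for solve(a), about 1.6 times faster for svd(a), and about 1.3 times faster for lm(y ~ x). The one caveat is the occasional slow run: the maximum times under OpenBLAS were 369.81 ms for svd(a) and 266.34 ms for solve(a), far above anything seen with R’s BLAS. I haven’t checked this, but there might be a further performance boost in building R from source and linking it to OpenBLAS’s version of LAPACK.

That is, until someone figures out and documents how to use the Accelerate framework’s BLAS in macOS 10.14 and newer.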