Even faster matrix math in R on macOS with M1

I recently picked up an M1 Mac mini to replace my more-than-a-decade-old Mac mini. I wanted to make the switch while the old one was still operational, so that I could transfer files (namely my music library) more easily. Around the same time I also did a full Windows reset on my Dell laptop, which I had used for both gaming and development. After seeing how much RAM various background processes were taking up, I decided to use it only for gaming.

So I decided to make the new Mac mini my blogging computer. Turns out it’s great!

Sometimes I blog about Bayesian modeling with Stan, and when I re-knit the post about ODEs I saw sampling time drop from 30 seconds per chain (on that Dell G5 laptop with a Core i7 CPU) to 15 seconds per chain. The improvement in computing performance from the M1 is going to be really nice for any future technical blog posts I write.

After that the second thing I wanted to do was revisit an earlier blog post – partly because of my own curiosity, but also partly because a reader wrote in:

I recently purchased a new MacBook pro with the M1 chip and RStudio runs are much slower than on my old 2012 MacBook. Through lots of google searching I learned that the old MacBook has BLAS enabled, but the new one does not. I tried using your advice from https://mpopov.com/blog/2019/06/04/faster-matrix-math-in-r-on-macos/ but it is not updating when I recheck the session info.

Okay, there’s a couple of things going on here. First, if you transferred your system from the old MacBook to the new one, then you’re using the “Intel 64-bit” version of R that runs via Rosetta 2, rather than the arm64 version of R that was released to run natively on Apple Silicon. If that’s the case, I recommend downloading and installing the arm64 version (available here), together with a copy of GNU Fortran for arm64 (available here) and the latest version of RStudio, since v1.4 is when they added support for R 4.1 and native arm64 builds of R.
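One quick way to tell which build you are running is to ask R itself. This is a minimal sketch, assuming that the native Apple Silicon build reports "aarch64" in R.version$arch and that the Intel build (including when it runs through Rosetta 2) reports "x86_64". The helper describe_r_build() is a hypothetical name for illustration, not part of R or RStudio.

```r
# Hypothetical helper: classify the running R build by its architecture string.
# Assumption: native Apple Silicon builds report "aarch64" and Intel builds
# (including those running through Rosetta 2) report "x86_64".
describe_r_build <- function(arch = R.version$arch) {
  switch(arch,
    aarch64 = "native arm64 build",
    x86_64  = "Intel 64-bit build",
    paste("other architecture:", arch)
  )
}

describe_r_build()
```

If this reports the Intel build on an M1 machine, R is most likely running under Rosetta 2.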

In writing this blog post I installed both builds of R 4.2.0 (Intel 64-bit and arm64; updated from 4.1.1) and used RSwitch to switch between them.

Using instructions posted to the R-SIG-Mac mailing list, I switched out R's BLAS library for Apple’s vecLib version:

cd /Library/Frameworks/R.framework/Resources/lib/

ln -s -i -v libRblas.vecLib.dylib libRblas.dylib

To revert to R's built-in BLAS, run ln -s -i -v libRblas.0.dylib libRblas.dylib from the same directory.
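After changing the symlink, restart R and check which BLAS it actually loaded. The following is a minimal sketch, assuming that extSoftVersion() (available in recent versions of R) reports the path of the BLAS library in use and that this path contains "vecLib" when Apple's library is active. The helper uses_veclib() is a hypothetical name for illustration.

```r
# Hypothetical helper: TRUE when the BLAS path points at Apple's vecLib.
# Assumption: with vecLib active, the reported path contains "vecLib"
# (for example libRblas.vecLib.dylib or a path inside vecLib.framework).
uses_veclib <- function(blas_path = extSoftVersion()[["BLAS"]]) {
  grepl("vecLib", blas_path, fixed = TRUE)
}

extSoftVersion()[["BLAS"]]  # path of the BLAS library R is using
uses_veclib()
```

The same path also shows up in the BLAS line of sessionInfo(), which is what the reader was checking.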

Benchmark

This is the code I benchmarked on all four configurations:

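library(microbenchmark)

set.seed(20211010)

# 100 x 100 matrix for tcrossprod(), solve(), and svd()
d <- 1e2
a <- matrix(rnorm(d^2), d, d)

# Simulated regression data for lm(): 1,000 observations, 100 predictors
n <- 1e3
p <- 1e2
b <- rnorm(p + 1, 0, 10) # intercept plus p slopes
x <- matrix(runif(n * p, -10, 10), ncol = p, nrow = n)
y <- cbind(1, x) %*% b + rnorm(n, 0, 2)

# Run each expression 1,000 times and report timings in milliseconds
mb <- microbenchmark(
  tcrossprod(a), solve(a), svd(a), lm(y ~ x),
  times = 1000L,
  unit = "ms"
)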
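The results are reported as minimum, quartiles, median, and maximum per operation. Here is a base R sketch of that kind of summary, assuming the microbenchmark object stores one row per run, with the expression in expr and the elapsed time in nanoseconds in time. The helper summarise_times() is hypothetical, and the hinges from fivenum() may differ slightly from the quartiles the package itself reports.

```r
# Hypothetical helper: five-number summary of timings, converted to milliseconds.
summarise_times <- function(times_ns) {
  stats <- fivenum(times_ns) / 1e6  # nanoseconds -> milliseconds
  names(stats) <- c("min", "lq", "median", "uq", "max")
  round(stats, 2)
}

# Only runs if a benchmark object named `mb` exists in the session.
if (exists("mb")) {
  print(t(sapply(split(mb$time, mb$expr), summarise_times)))
}
```

In practice, calling summary() on the microbenchmark object is the more direct route; the sketch only spells out what the columns mean.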

Results

Figure: Chart comparing benchmark times of different linear algebra and matrix math operations under different configurations. Native arm64 R with vecLib outperforms native arm64 R with R's built-in BLAS and Intel 64-bit R in nearly every scenario.

M1 Mac mini benchmarks of various operations involving matrix math (execution time in ms)
Configuration	Minimum	Lower Quartile	Median	Upper Quartile	Maximum
tcrossprod(a)
R 4.2.0 with R's BLAS on Apple Silicon arm64	0.19	0.20	0.20	0.20	3.73
R 4.2.0 with R's BLAS on Intel 64-bit via Rosetta2	0.22	0.22	0.23	0.23	3.11
R 4.2.0 with Apple vecLib on Apple Silicon arm64	0.02	0.02	0.02	0.02	6.39
R 4.2.0 with Apple vecLib on Intel 64-bit via Rosetta2	0.04	0.07	0.07	0.08	2.90
svd(a)
R 4.2.0 with R's BLAS on Apple Silicon arm64	2.57	2.61	2.62	2.64	32.70
R 4.2.0 with R's BLAS on Intel 64-bit via Rosetta2	2.91	2.93	2.94	2.97	29.78
R 4.2.0 with Apple vecLib on Apple Silicon arm64	1.02	1.07	1.09	1.28	5.59
R 4.2.0 with Apple vecLib on Intel 64-bit via Rosetta2	1.70	1.73	1.76	1.79	7.44
solve(a)
R 4.2.0 with R's BLAS on Apple Silicon arm64	0.51	0.53	0.53	0.54	30.58
R 4.2.0 with R's BLAS on Intel 64-bit via Rosetta2	0.57	0.58	0.59	0.60	3.53
R 4.2.0 with Apple vecLib on Apple Silicon arm64	0.12	0.14	0.14	0.15	3.77
R 4.2.0 with Apple vecLib on Intel 64-bit via Rosetta2	0.22	0.26	0.27	0.29	2.97
lm(y ~ x)
R 4.2.0 with R's BLAS on Apple Silicon arm64	8.28	8.37	8.41	8.52	15.27
R 4.2.0 with R's BLAS on Intel 64-bit via Rosetta2	7.85	7.92	7.97	8.13	34.30
R 4.2.0 with Apple vecLib on Apple Silicon arm64	4.38	4.55	4.61	4.71	34.63
R 4.2.0 with Apple vecLib on Intel 64-bit via Rosetta2	3.68	3.74	3.78	3.91	29.09
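
To put the medians in perspective, the ratio between the two native arm64 rows gives a rough speedup from switching to vecLib. The numbers below are copied from the Median column of the table; the variable names are only for illustration.

```r
# Median execution times (ms) on native arm64 R, copied from the table above.
ops <- c("tcrossprod(a)", "svd(a)", "solve(a)", "lm(y ~ x)")
median_r_blas <- c(0.20, 2.62, 0.53, 8.41)
median_veclib <- c(0.02, 1.09, 0.14, 4.61)

# Speedup factor: how many times faster vecLib is than R's built-in BLAS.
speedup <- setNames(round(median_r_blas / median_veclib, 2), ops)
speedup
```

By this measure tcrossprod() benefits the most (roughly 10x) and lm() the least (a bit under 2x).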

Posted on: October 10, 2021

Tags: computing, R
