R packages

API wrappers

R wrapper for the Wikimedia Analytics Query Service (AQS). It targets the /metrics endpoint of the REST API, which provides data and metrics on traffic, users, and content on Wikimedia sites.
Interface to the Wikidata Query Service API for querying Wikidata with SPARQL and getting the results back as data frames in R. Available on CRAN.
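To give a sense of what a request to the /metrics endpoint looks like, here is a minimal sketch that only builds the URL for the public per-article pageviews route. It is not the R wrapper's interface: the function name and its defaults are hypothetical and chosen for illustration, and no request is actually sent.

```python
from urllib.parse import quote

# Base of the Wikimedia REST API /metrics endpoint.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics"


def pageviews_per_article_url(project, article, start, end,
                              access="all-access", agent="all-agents",
                              granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper, illustration only).

    `start` and `end` are date strings such as "20150701" (YYYYMMDD).
    """
    # Article titles use underscores instead of spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{AQS_BASE}/pageviews/per-article/{project}/{access}/{agent}/"
            f"{title}/{granularity}/{start}/{end}")


url = pageviews_per_article_url("en.wikipedia", "Albert Einstein",
                                "20150701", "20150731")
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is JSON, which a wrapper would then reshape into a table.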

R Markdown

Enables easy integration of Wikipedia Preview popup context cards in R Markdown documents. Compatible with distill, pkgdown, and blogdown; refer to the blog post for more info.
Template based on memor, used by the Wikimedia Foundation’s Product Analytics team for PDF reports written in R Markdown.

RStudio add-ins

An RStudio add-in for playing with distribution parameters and visualizing the resulting probability density and mass functions.

Machine learning

Implements the DP-means algorithm introduced by Kulis and Jordan in their article "Revisiting k-means: New Algorithms via Bayesian Nonparametrics". Instead of specifying how many clusters to partition the data into, as one would with k-means, the user specifies a penalty parameter λ, which controls whether and when new clusters are created across iterations.
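The role of λ is easiest to see in code. What follows is a minimal pure-Python sketch of the DP-means idea, not the package's implementation: a point whose squared distance to every current center exceeds λ starts a new cluster, otherwise it joins the nearest one, and centers are then recomputed as cluster means. The function names, the stopping rule, and the choice to drop empty clusters are assumptions made for this illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def dp_means(data, lam, max_iter=100):
    """Cluster `data` with penalty `lam`; returns (labels, centers)."""
    centers = [mean(data)]          # start with one cluster at the global mean
    labels = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            if dists[best] > lam:
                # Too far from every center: open a new cluster at this point.
                centers.append(list(x))
                best = len(centers) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Recompute centers as cluster means, dropping clusters left empty
        # (an implementation choice for this sketch) and renumbering labels.
        members = {}
        for i, k in enumerate(labels):
            members.setdefault(k, []).append(data[i])
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        labels = [remap[k] for k in labels]
        if not changed:
            break
    return labels, centers


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = dp_means(data, lam=9)
print(labels, len(centers))
```

A small λ yields many clusters and a very large λ collapses everything into one, which is the trade-off that replaces choosing k up front.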
Small utility R package for transforming time series data into a more machine-learning-friendly format: the previous p observations become features.
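The transformation can be sketched in a few lines. This is an illustration of the general lag-embedding idea, not the package's code or output format: the function name and the column order (oldest observation first) are assumptions.

```python
def make_lagged(series, p):
    """Turn a sequence into (X, y): each row of X holds the p observations
    that precede the corresponding target in y."""
    if p < 1 or p >= len(series):
        raise ValueError("p must be at least 1 and smaller than the series length")
    X, y = [], []
    for t in range(p, len(series)):
        X.append(list(series[t - p:t]))  # previous p values, oldest first
        y.append(series[t])              # value to predict
    return X, y


X, y = make_lagged([1, 2, 3, 4, 5], p=2)
print(X)  # [[1, 2], [2, 3], [3, 4]]
print(y)  # [3, 4, 5]
```

The first p observations have no complete history, so a series of length n yields n - p training rows.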
MultiLabel Prediction Using Gibbs Sampling
Users can employ a classifier from an external package (e.g. ‘randomForest’, ‘C50’) or supply their own. New observations are classified using a Gibbs sampler, since each unobserved label is conditioned on the others. The package includes methods for evaluating the accuracy of the predictions and for aggregating across iterations and models to produce binary or probabilistic classifications. Available on CRAN.

Python packages

Utilities for accessing and downloading statistics on a site’s presence in Google’s search results via the Search Console API.

Games and apps

Shiny application for browsing R packages listed on CRAN’s Task Views. It includes their URLs and licensing details, which can be very helpful if you are looking for, say, a machine learning package that is MIT-licensed.

My other Shiny applications include freelancr (for figuring out freelancing hourly rates) and the Discovery Dashboards, which I maintain as a Data Scientist on the Product Analytics team at the Wikimedia Foundation.


Lead programmer/engineer on this collaboration with Molleindustria. TradeMarkVille is a free online multiplayer word-guessing game playable in your web browser.


Press: Indie Games, Kill Screen, Gamasutra, Kotaku, The Strange Games Review, Polygon