Tutorials

Approximating probabilities
Using simulation as a substitute for analytical solutions
Solving a logic grid puzzle with integer programming
Modeling a logic puzzle as a constraint satisfaction problem with OMPR
Bayesian optimization in R
Step-by-step demonstration of BayesOpt for derivative-free minimization of a noiseless, black-box function
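
To give a flavor of the first tutorial's idea (using simulation in place of an analytical solution), here is a minimal Python sketch. It is an illustration only, not an example taken from the tutorial: the dice problem and the names `estimate_probability`, `roll_two_dice`, and `sum_at_least_ten` are assumptions chosen for brevity. It estimates the probability that two fair dice sum to at least 10 and compares the estimate with the exact value of 6/36.

```python
import random


def estimate_probability(event, sampler, n_trials=100_000, seed=42):
    """Estimate P(event) as the fraction of simulated draws where event holds."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = sum(1 for _ in range(n_trials) if event(sampler(rng)))
    return hits / n_trials


def roll_two_dice(rng):
    """Hypothetical sampler: one roll of two fair six-sided dice."""
    return rng.randint(1, 6), rng.randint(1, 6)


def sum_at_least_ten(roll):
    """Hypothetical event: the two dice sum to 10 or more."""
    return sum(roll) >= 10


estimate = estimate_probability(sum_at_least_ten, roll_two_dice)
exact = 6 / 36  # six of the 36 equally likely outcomes sum to 10, 11, or 12
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

The same pattern carries over to problems where the exact answer is hard to derive: only the sampler and the event need to change.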

R Packages

API wrappers

waxer
R wrapper for the Wikimedia Analytics Query Service (AQS). This wrapper targets the /metrics endpoint of the REST API, which provides data and metrics on traffic, users, and content on Wikimedia sites. Get started…
WikidataQueryServiceR
An interface to the Wikidata Query Service API for querying Wikidata with SPARQL and getting the results back as data.frames in R. Available on CRAN.

R Markdown templates

wmfpar is an R Markdown report template based on the memor template, for use by my team.

RStudio add-ins

tinydensR
An RStudio add-in for playing with distribution parameters and visualizing the resulting probability density and mass functions.

Machine learning

dpmclust
Implements the DP-means algorithm introduced by Kulis and Jordan in their article Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Instead of specifying how many clusters to partition the data into, as one would with k-means, the user specifies a penalty parameter λ that controls whether and when new clusters are created during iterations.
maltese
A small utility R package for transforming time series data into a more machine learning-friendly format: the previous p observations become features.
MultiLabel Prediction Using Gibbs Sampling
Users can employ a classifier from an external package (e.g. ‘randomForest’, ‘C50’) or supply their own. New observations are classified using a Gibbs sampler, since each unobserved label is conditioned on the others. The package includes methods for evaluating the accuracy of the predictions and for aggregating across iterations and models to produce binary or probabilistic classifications. Available on CRAN.
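
To make the role of the penalty parameter λ in DP-means (see dpmclust above) more concrete, here is a minimal Python sketch of the algorithm as described by Kulis and Jordan. It is not the dpmclust implementation, which is an R package; the function names `dp_means`, `squared_distance`, and `mean` and the toy data are assumptions made for illustration. A point whose squared Euclidean distance to every existing center exceeds λ starts a new cluster; otherwise it joins the nearest one.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dp_means(points, lam, max_iter=100):
    """Cluster points with DP-means; lam is the penalty for opening a cluster."""
    centers = [mean(points)]  # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [squared_distance(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from every center: open a new cluster at this point.
                centers.append(tuple(p))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute each center as the mean of its members, dropping any
        # cluster left empty and renumbering the labels to stay contiguous.
        members = {}
        for label, p in zip(labels, points):
            members.setdefault(label, []).append(p)
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[old]) for old in kept]
        labels = [remap[label] for label in labels]
        if not changed:
            break
    return centers, labels


# Toy data (assumed for illustration): two well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = dp_means(points, lam=9)
print(centers)
print(labels)
```

A larger λ makes new clusters more expensive, so fewer are created; with a λ larger than any squared distance in the data, all points stay in a single cluster.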

Python packages

gsc-utils
Utilities for accessing and downloading the statistics on a site’s presence in Google’s search results via Search Console API.

Games and apps

taskviewr
Shiny application for browsing R packages listed on CRAN’s Task Views. It includes their URLs and licensing details, which can be very helpful if you are looking for, say, a machine learning package that is MIT-licensed.

My other Shiny applications include freelancr (for figuring out freelancing hourly rates) and the Discovery Dashboards, which I maintain as a Data Scientist on the Product Analytics team at the Wikimedia Foundation.

TradeMarkVille

Lead programmer/engineer on this collaboration with Molleindustria. TradeMarkVille is a free online multiplayer word guessing game playable in your web browser.

Screenshot 1 Screenshot 2

Press: Indie Games, Kill Screen, Gamasutra, Kotaku, The Strange Games Review, Polygon