Mikhail Popov
https://mpopov.com/blog/
Recent content on Mikhail Popov
Sat, 09 Apr 2022 00:00:00 +0000

Wikipedia Preview for R Markdown documents
https://mpopov.com/blog/2022/04/09/wikipediapreview-rmd-docs/
Sat, 09 Apr 2022 00:00:00 +0000
Wikipedia Preview (developed by Wikimedia’s Inuka team) is so cool:
When readers navigate in and out of a webpage through interacting with several hyperlinks, they can easily lose context of what they were reading in the first place. Content sites would like their readers to read and engage with their content and understand it without having to get contextual information elsewhere. Wikipedia Preview can solve this problem for content providers by allowing readers to have concise and visual contextual information from Wikipedia within a content provider’s mobile properties - website or webapp.

Even faster matrix math in R on macOS with M1
https://mpopov.com/blog/2021/10/10/even-faster-matrix-math-in-r-on-macos-with-m1/
Sun, 10 Oct 2021 00:00:00 +0000
Instructions for switching R to use Apple’s math library optimized for Apple Silicon, plus some benchmarks comparing the performance.

Making Of: Session Tick visualization
https://mpopov.com/blog/2021/03/13/making-of-session-tick-visualization/
Sat, 13 Mar 2021 00:00:00 +0000
In this post I will walk through my R code for a data visualization I created for the session length dataset project at the Wikimedia Foundation.

Animation of optimization in torch
https://mpopov.com/blog/2021/02/28/animation-of-optimization-in-torch/
Sun, 28 Feb 2021 00:00:00 +0000
In this post I will show you how to use the {gganimate} R package to make an animated GIF illustrating Adam optimization of a function using {torch}:

    library(torch)
    library(gganimate)
    library(tidyverse)

We will use torch::optim_adam() to find the value of x that minimizes the following function:

    f <- function(x) (6 * x - 2) ^ 2 * sin(12 * x - 4)

The function looks as follows:
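The plot itself is not reproduced in this excerpt, so here is a minimal base R sketch that evaluates f on a grid to show where its minimum sits. The interval [0, 1], the grid size, and the names grid, x_best and y_best are assumptions made for illustration; the post itself uses torch::optim_adam() rather than a grid search.

```r
# The function from the post
f <- function(x) (6 * x - 2) ^ 2 * sin(12 * x - 4)

# Assumption: inspect f on [0, 1] with a 1001-point grid
grid <- data.frame(x = seq(0, 1, length.out = 1001))
grid$y <- f(grid$x)

# Brute-force reference point to compare an optimizer's result against
x_best <- grid$x[which.min(grid$y)]
y_best <- min(grid$y)
```

Calling plot(grid$x, grid$y, type = "l") on the result draws the curve, and x_best gives a rough target that the Adam iterations can be checked against.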
The adam_iters dataset will contain an iter column (for the iteration/step identifier) and an x column (the value of x after each iteration):

Pivoting posteriors
https://mpopov.com/blog/2020/09/07/pivoting-posteriors/
Mon, 07 Sep 2020 00:00:00 +0000
In Stan, when a parameter is declared as an array, the samples/draws data frame will have columns that use the [i] notation to denote the i-th element of the array. For example, suppose we had a model with two parameters – \(\lambda_1\) and \(\lambda_2\). Instead of declaring them individually – e.g. lambda1 and lambda2, respectively – we may declare them as a single lambda array of size 2:

    parameters { real lambda[2]; }

When we sample from that model, we will end up with samples for lambda[1] and lambda[2].

Git Forensics
https://mpopov.com/blog/2020/08/25/git-forensics/
Tue, 25 Aug 2020 00:00:00 +0000
Earlier today I was helping a coworker with a question about data related to block messages on mobile, like this:
Lol. pic.twitter.com/2ebvab83f3
— Katherine Maher (@krmaher) October 20, 2018

I did not expect my investigation to become what I might best describe as “git forensics”. First, let’s introduce our dramatis personae:

MobileFrontend
When you browse the mobile (“m.” subdomain) version of Wikipedia in a browser, what you see rendered is largely due to the MobileFrontend extension for MediaWiki (the software that powers Wikipedia, Wikimedia Commons, and other Wikimedia projects).

Using R to help my wife manage Sims screenshots
https://mpopov.com/blog/2020/08/02/r-sims-screenshots/
Sun, 02 Aug 2020 00:00:00 +0000
I grew up with The Sims and remember spending what is probably hundreds of hours of my childhood with that game, so it was a special feeling to share that with my wife and introduce her to The Sims 3 a few years ago.
Coming from Animal Crossing: Happy Home Designer, the expanded toolset for interior design (AND addition of architecture tools) hooked her, but she wasn't that into the non-building part of playing The Sims.

Replacing the knitr engine for Stan
https://mpopov.com/blog/2020/07/30/replacing-the-knitr-engine-for-stan/
Thu, 30 Jul 2020 00:00:00 +0000
2020-08-03 UPDATE: Good news! A version of this engine is now included in versions 0.1.1 and later of {CmdStanR}. Use cmdstanr::register_knitr_engine() at the top of the R Markdown document to register it as the engine for stan chunks. See the vignette R Markdown CmdStan Engine for examples. Shoutout to the maintainers Jonah Gabry & Rok Češnovar for a super positive code review experience with the pull request for this.
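A setup chunk along these lines could do the registration. This is a minimal sketch: the requireNamespace() guard and the has_engine_deps name are hypothetical additions for illustration, and it assumes {CmdStanR} 0.1.1 or later and {knitr} are installed.

```r
# Assumption: this runs in the setup chunk of an R Markdown document
has_engine_deps <- requireNamespace("cmdstanr", quietly = TRUE) &&
  requireNamespace("knitr", quietly = TRUE)

if (has_engine_deps) {
  # Register the CmdStan-based engine for stan chunks
  cmdstanr::register_knitr_engine()
} else {
  message("Install {cmdstanr} and {knitr} to use the CmdStan engine for stan chunks")
}
```

The guard only keeps the document from failing outright on a machine without the packages; the stan chunks themselves still need the engine to be registered before they can run.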
I originally dabbled with custom {knitr} engine creation last month, when I made {dotnet} which enables R Markdown users to write chunks with C# and F# programs in them.

Introducing 'dotnet' knitr engine for C# & F# chunks in R Markdown
https://mpopov.com/blog/2020/06/10/introducing-dotnet-knitr-engine/
Wed, 10 Jun 2020 00:00:00 +0000
I had a thought “wouldn’t it be cool to do a blog post about Bayesian inference with Infer.NET?” and then a follow-up thought “wouldn’t it be even cooler to have the probabilistic programs as R Markdown chunks that would be actually built/compiled and then run/executed just like Python and Julia chunks would be?”
And that’s how I ended up spending an evening learning how to make custom language engines for {knitr} and making one for C# and F# languages.

Strings in R 4.x vs 3.x (and earlier)
https://mpopov.com/blog/2020/05/22/strings-in-r-4.x/
Fri, 22 May 2020 10:25:00 +0000
Among the several user-facing changes listed in R 4.0.0’s release notes was this point:
There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.
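As a quick illustration of that release note, here is a minimal sketch comparing regular and raw string constants. It assumes R 4.0.0 or later, and every value and variable name in it is made up for the example.

```r
# Requires R >= 4.0.0; all values below are made-up examples

# Windows-style path: every backslash must be doubled in a regular string
path_old <- "C:\\Users\\me\\data.csv"
path_new <- r"(C:\Users\me\data.csv)"

# Text with both quote types: the double quotes need escaping
quote_old <- "She said \"it's fine\""
quote_new <- r"(She said "it's fine")"

# Regular expression: backslashes no longer need to be escaped twice
regex_old <- "\\d+\\.\\d+"
regex_new <- r"(\d+\.\d+)"

# Each pair is the same string, just written differently
same <- c(
  path  = identical(path_old, path_new),
  quote = identical(quote_old, quote_new),
  regex = identical(regex_old, regex_new)
)
```

Both spellings produce identical character vectors, so the raw syntax changes only how the constant is typed, not what R stores.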
To get a better sense of this (wonderful) feature addition, I thought it’d be useful to see some before/after examples.

Faster matrix math in R on macOS
https://mpopov.com/blog/2019/06/04/faster-matrix-math-in-r-on-macos/
Tue, 04 Jun 2019 00:00:00 +0000
Update (October 2021): macOS 11 “Big Sur” and later do not ship with Accelerate BLAS dynamic libraries in the filesystem, so this trick only works up to macOS 10.15 “Catalina”.
If you want faster matrix operations in R on your Mac, you can use Apple’s BLAS (Basic Linear Algebra Subprograms) library from their Accelerate framework instead of the library which comes with the R binary that you get from CRAN.

My recipe for the best breakfast potatoes (and terrific bacon)
https://mpopov.com/blog/2018/08/15/best-breakfast-potatoes-recipe/
Wed, 15 Aug 2018 00:00:00 +0000
Everyone I treat with these bomb-ass potatoes always tells me how amazing they are and it’s a bit of an elaborate process to describe, so I decided to write it up here. There are actually two recipes in this post and one is (kind of) a prerequisite for the other, but if you’re vegetarian/vegan or don’t eat pork for religious (or other) reasons, feel free to skip to the second stage.

Data Analyst vs Data Scientist: Industry Perspectives
https://mpopov.com/blog/2018/05/24/data-analyst-vs-data-scientist-industry-perspectives/
Thu, 24 May 2018 00:00:00 +0000
Both “Data Analyst” (DA) and “Data Scientist” (DS) are titles that vary greatly between industries and even amongst individual organizations within industries. As the roles behind titles change over time, it is natural for some teams to ask themselves the following questions: should we have distinct roles or just stick to one? How would we differentiate the roles in a way that fulfills our organization’s needs and is generally consistent with similar organizations?

Resources for learning to visualize data with R/ggplot2
https://mpopov.com/blog/2018/03/21/learning-to-visualize-data-with-ggplot2/
Wed, 21 Mar 2018 00:00:00 +0000
@bearloga I'm currently learning visualisation with R/ggplot2 and was wondering whether you could share tips/links/videos/books/resources that helped you in your journey :-)
— Raya راية (@rayasharbain) March 12, 2018

Sure! Here ya go:

Tips
The only tip I’ll give is that you should strive to make every chart look exactly how you want it to look and say exactly what you want it to say. You will learn in the process of doing.

The journey so far…
https://mpopov.com/blog/2017/09/21/the-journey-so-far/
Thu, 21 Sep 2017 00:00:00 +0000
I recently received an email which said, “I’m interested in learning more about you and your journey to where you are today,” so I thought I’d describe how I went from studying visual arts to analyzing data at the Wikimedia Foundation (WMF).
Growing up I excelled in visual arts and mathematics at school, and they continued to be my strongest subjects. My parents and I immigrated to the US from Russia when I was 10, and I spent the first few years focused on learning English – which was especially difficult because I was the only Russian-speaking person at my school.

Advice for graduates applying for data science jobs
https://mpopov.com/blog/2017/08/16/advice-for-grads-entering-industry-datasci/
Wed, 16 Aug 2017 00:00:00 +0000
2019-08-01 update
Things were a little different when I wrote this in 2017. These days I constantly see new/junior data scientists get rejected because they don’t have the experience. Even those who have an impressive portfolio of projects to show off that they have the technical know-how get thumbs down. I firmly believe this is a failure of employers, not the new generation of recently graduated data scientists entering the field.

Installing GPU version of TensorFlow™ for use in R on Windows
https://mpopov.com/blog/2017/06/11/r-win-gpu-tensorflow/
Sun, 11 Jun 2017 00:00:00 +0000
Intro
The other night I got a TensorFlow™ (TF) and Keras-based text classifier in R to successfully run on my gaming PC that has Windows 10 and an NVIDIA GeForce GTX 980 graphics card, so I figured I’d write up a full walkthrough, since I had to make minor detours and the official instructions assume – in my opinion – a certain level of knowledge that might make the process inaccessible to some folks.

Probabilistic programming languages for statistical inference
https://mpopov.com/blog/2017/01/10/probabilistic-programming-languages-for-statistical-inference/
Tue, 10 Jan 2017 00:00:00 +0000
Introduction
This post was inspired by a question about JAGS vs BUGS vs Stan:
right, that's what got me confused! so they.. do the same thing? @RallidaeRule
— Andrew MacDonald 🌈 (@polesasunder) January 10, 2017

Explaining the differences would be too much for Twitter, so I’m just gonna give a quick explanation here.
2020-05-18 update: Coming from a background of statistical inference in the context of academia and research using R, where these have been the prevalent PPLs for quite some time, I admittedly have a bit of a blind spot for PyMC3.

Mostly-free resources for learning data science
https://mpopov.com/blog/2015/12/22/mostly-free-resources-for-learning-statistics/
Tue, 22 Dec 2015 00:00:00 +0000
In the past year or two I’ve had several friends approach me about learning statistics because their employer/organization was moving toward a more data-driven approach to decision making. (This brought me a lot of joy.) I firmly believe you don’t actually need a fancy degree and tens of thousands of dollars in tuition debt to be able to engage with data, glean insights, and make inferences from it. And thanks to many wonderful statisticians on the Internet, there is now a plethora of freely accessible resources that enable curious minds to learn the art and science of statistics.

Guide to Shiny apps with Shiny Server on Amazon EC2
https://mpopov.com/blog/2014/06/01/guide-to-shiny-apps-with-shiny-server-on-amazon-ec2/
Sun, 01 Jun 2014 00:00:00 +0000
Preface: posting this for archive purposes only. This was the first of its kind and has been succeeded by better guides.

Introduction
I am writing this guide because this guide did not exist when I decided to put my 2010 US Census Shiny App on Amazon’s servers. Surely I can’t be the only one who’s never had any experience with EC2 (or SSH or vi, for that matter).
So here’s a newbie’s guide for newbies on deploying your rad Shiny app on Amazon Elastic Compute Cloud (EC2) from scratch.

Quartile-Frame Scatterplot with ggplot2
https://mpopov.com/blog/2014/06/01/quartile-frame-scatterplot-with-ggplot2/
Sun, 01 Jun 2014 00:00:00 +0000
Inspired by The Visual Display of Quantitative Information by Edward R. Tufte
The goal is to make the axes tell a better story about the data. This is done by turning the axes into quartile plots (cleaner boxplots).
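To make the idea concrete, here is a minimal base R sketch of the numbers a quartile plot encodes for each axis: the minimum, the three quartiles, and the maximum. The helper name axis_quartiles is hypothetical and is not part of the qsplot() code; it only shows which values such an axis would mark.

```r
# Hypothetical helper: the five values a quartile-plot axis would mark
axis_quartiles <- function(v) {
  quantile(v, probs = c(0, 0.25, 0.5, 0.75, 1), names = TRUE)
}

# mtcars ships with R, so this runs as-is
x_marks <- axis_quartiles(mtcars$wt)   # vehicle weight
y_marks <- axis_quartiles(mtcars$mpg)  # miles per gallon
```

Printing x_marks and y_marks shows the breakpoints for the two variables used in the usage example.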
Usage Example:
Only x and y are required, everything else is optional.
qsplot( x = mtcars$wt, y = mtcars$mpg, main = "Vehicle Weight-Gas Mileage Relationship", xlab = "Vehicle Weight", ylab = "Miles per Gallon", font.