Putting the R in romantic

I've used R for a lot of tasks unrelated to statistics or data analysis. For example, it's usually a lot easier for me to write an intelligent batch file/folder renamer or copier as an R script than a bash shell script.

Earlier today I made a collection of photos that I wanted to put on a digital picture frame to mail to my partner. I also made a set of messages that I wanted to show up randomly. What I needed to do was to shuffle the set of 260+ images in such a way that a subset of them would not show up consecutively.

To make referencing the images easier, let's call the overall set of $n$ images $Y = \{y_1, \ldots, y_n\}$, and let $X \subset Y$ be the subset of images that should never appear next to each other after the shuffle. Let $Y' = (y_{(1)}, \ldots, y_{(n)})$ be the shuffled sequence of images.

This was really easy to accomplish in R. I started with k <- 0; set.seed(k) and shuffled all the images (using sample.int()). Then I checked whether that very specific requirement was met.

If the shuffle did contain a pair of consecutive images from $X$, I incremented $k$ by 1, reseeded, and repeated the procedure until $\{y_{(i-1)}, y_{(i)}\} \not\subseteq X ~\forall~i = 2, \ldots, n$.
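Here is a minimal sketch of that reseed-and-retry loop. The file names and the "msg_" prefix that marks the members of $X$ are made up for illustration; my real folder had 260+ images.

```r
# Hypothetical data: 9 photos plus 3 message images; the "msg_" files form the subset X
images <- c(sprintf("photo_%02d.jpg", 1:9), sprintf("msg_%d.jpg", 1:3))
in_X <- grepl("^msg_", images)

# TRUE if any two neighbours in the given ordering both belong to X
has_consecutive_X <- function(ord, in_X) {
  flags <- in_X[ord]
  any(flags[-1] & flags[-length(flags)])
}

k <- 0
repeat {
  set.seed(k)                        # reproducible: the final k identifies the shuffle
  ord <- sample.int(length(images))  # a random permutation of 1..n
  if (!has_consecutive_X(ord, in_X)) break
  k <- k + 1                         # requirement not met, try the next seed
}
shuffled <- images[ord]
```

Keeping the final value of k around means the exact same ordering can be regenerated later. Note that this kind of retry loop only terminates if a valid ordering exists, so it is not suitable when $X$ makes up more than about half of the images.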

I think what makes R really nice to use for tasks like this is vectorized functions and binary operators like which(), %in%, order(), duplicated(), sample(), sub(), and grepl(), as well as data.frames that you can expand to include additional data, such as indicators of whether row $m$ is related to row $m-1$.
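As a small illustration of the data.frame idea, here is a sketch (again with made-up file names and a hypothetical "msg_" prefix) that adds an indicator of whether row $m$ and row $m-1$ are both message images:

```r
# Hypothetical file names, already in display order
frame <- data.frame(
  file = c("photo_01.jpg", "msg_1.jpg", "msg_2.jpg", "photo_02.jpg", "msg_3.jpg"),
  stringsAsFactors = FALSE
)

# Vectorized pattern match: one TRUE/FALSE per row
frame$is_msg <- grepl("^msg_", frame$file)

# Row m is flagged when it and row m - 1 are both messages; row 1 has no predecessor
frame$follows_msg <- frame$is_msg & c(FALSE, head(frame$is_msg, -1))

# Row numbers that violate the "no consecutive messages" rule
which(frame$follows_msg)
```

No explicit loop is needed: shifting the logical column by one position and combining it with & compares every row with its predecessor in one go.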

Next time you have to do something repetitive and time-consuming on the computer, such as organizing files, I urge you to consider writing a script to do it for you. If you already know R but have never thought of using it for this, give it a try.
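For a taste of what such a script can look like, here is a hedged sketch that copies files into a new folder with a zero-padded position prefix, so that an alphabetical listing follows the shuffled order. The folder and file names are placeholders, and it works on empty stand-in files in a temporary directory rather than real photos.

```r
# Hypothetical setup: temporary folders with a few empty stand-in files
src <- file.path(tempdir(), "frame_src")
dst <- file.path(tempdir(), "frame_dst")
dir.create(src, showWarnings = FALSE)
dir.create(dst, showWarnings = FALSE)
file.create(file.path(src, c("beach.jpg", "msg_1.jpg", "park.jpg")))

files <- list.files(src, pattern = "\\.jpg$")
set.seed(0)
ord <- sample.int(length(files))  # the shuffled order

# Prefix each copy with its position, e.g. 001_..., 002_..., 003_...
new_names <- sprintf("%03d_%s", seq_along(ord), files[ord])

# file.copy() is vectorized over both arguments and returns one logical per file
copied <- file.copy(file.path(src, files[ord]),
                    file.path(dst, new_names),
                    overwrite = TRUE)
```

Copying instead of renaming leaves the originals untouched, which is a comfortable default while you are still testing the script.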

Cheers~

Mostly-free resources for learning data science

In the past year or two I've had several friends approach me about learning statistics because their employer/organization was moving toward a more data-driven approach to decision making. (This brought me a lot of joy.) I firmly believe you don't actually need a fancy degree and tens of thousands of dollars in tuition debt to be able to engage with data, glean insights, and make inferences from it. And thanks to many wonderful statisticians on the Internet, there is now a plethora of freely accessible resources that enable curious minds to learn the art and science of statistics.

First, I recommend installing R, plus RStudio for actually using it. Both are free, and they are what I use for almost all of my statistical analyses. Most of the links in this post involve learning statistics by doing it in R.

Okay, now on to learning stats…

Free, self-paced online courses from trustworthy institutions:

Not free online courses from trustworthy institutions:

Free books and other resources:

Book recommendations:

  • Introductory Statistics with R by Peter Dalgaard
  • Doing Data Science: Straight Talk from the Frontline by Cathy O'Neil
  • Statistics in a Nutshell by Sarah Boslaugh
  • Principles of Uncertainty by Jay Kadane (free PDF at http://uncertainty.stat.cmu.edu/)
  • Statistical Rethinking: A Bayesian Course with Examples in R and Stan by Richard McElreath

Phew! Okay, that should be enough. Feel free to suggest more in the comments below.

Freelancing Hourly Rate Calculator (Shiny app)

The other day I got tired of basically coming up with random hourly rate estimates for freelancing projects because I had never actually sat down to figure out what the hell my hourly rate should be. I found a great blog post, "How to Calculate Hourly Freelance Rates for Web Design, Development Work", and made a spreadsheet with the appropriate formulas.

But then I wanted to combine the explanation of the blog post with the dynamic aspect of the spreadsheet. So I opened up R and wrote a Shiny app where you can specify all the different numbers and percentages and it’ll update the plots and details of how the final rate was calculated.
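To give a rough idea of the kind of arithmetic involved, here is a sketch in plain R. The formula and every input name below are assumptions for illustration (one common way to back out a rate from a target income); they are not necessarily the formulas from the blog post or the ones used in the app.

```r
# Hypothetical formula: yearly money needed divided by yearly billable hours
hourly_rate <- function(salary, expenses, profit_margin,
                        weeks_off, hours_per_week, billable_share) {
  # Hours per year that can actually be billed to clients
  billable_hours <- (52 - weeks_off) * hours_per_week * billable_share
  # Revenue needed to cover salary and expenses, plus a profit margin on top
  required_revenue <- (salary + expenses) * (1 + profit_margin)
  required_revenue / billable_hours
}

# Made-up inputs, not a recommendation
rate <- hourly_rate(salary = 60000, expenses = 12000, profit_margin = 0.10,
                    weeks_off = 4, hours_per_week = 40, billable_share = 0.6)
round(rate, 2)
```

In a Shiny app each argument would typically be an input widget, with the result recomputed whenever one of them changes.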

If you want to figure out what you should be charging your clients, go to http://bearloga.shinyapps.io/freelancr/

Words, words, words

I needed a list of adverbs/adjectives that start with "do." First I tried Wolfram|Alpha but that couldn't filter the list to adjectives and there's no way to build a query pipeline (at least with a free account). I ended up using the wordnet package in R:

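library(magrittr) # install.packages('magrittr')
library(wordnet)  # install.packages('wordnet'); also needs a local WordNet dictionary
# Match terms starting with "do" (ignoring case), keep up to 10,000 adverbs,
# extract each term's lemma, and join the lemmas into one comma-separated string
getTermFilter('StartsWithFilter', 'do', TRUE) %>%
    getIndexTerms('ADVERB', 1e4, .) %>% sapply(getLemma) %>%
        paste(collapse = ', ')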

Output: doctrinally, doggedly, doggo, dogmatically, dolce, dolefully, doltishly, domestically, domineeringly, dorsally, dorsoventrally, dottily, double, double quick, double time, doubly, doubtfully, doubtless, doubtlessly, dourly, dowdily, down, down the stairs, downfield, downhill, downright, downriver, downstage, downstairs, downstream, downtown, downward, downwardly, downwards, downwind

P.S. If you're on OS X, you can use MacPorts to install WordNet with: sudo port install wordnet

Then select the port-installed dictionary in R with: setDict('/opt/local/share/WordNet-3.0/dict')

Guide to Shiny apps with Shiny Server on Amazon EC2

Preface: posting this for archive purposes only. This was the first of its kind and has been succeeded by better guides.

I am writing this guide because nothing like it existed when I decided to put my 2010 US Census Shiny App on Amazon's servers (demo here). Surely I can't be the only one who's never had any experience with EC2 (or SSH or vi, for that matter).

So here's a newbie's guide for newbies to deploying your rad Shiny app on Amazon Elastic Compute Cloud (EC2) from scratch. It took me sixteen 30 Rock episodes to figure this stuff out (counting the time it took to download the census data) but hopefully you'll have your app up and running in less time than...a BBC Sherlock episode.

What are Shiny and Shiny Server?

Shiny is an R package developed by the incredible folks at RStudio for making interactive web applications. Shiny Server is a server program that makes Shiny applications available over the web.

Amazon Elastic Compute Cloud (EC2)

Amazon EC2 is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. If you have an Amazon account then you can start using Amazon Web Services (AWS) for free! AWS Free Tier includes 750 hours of Linux or Windows Micro Instances each month for one year.

Setting up Shiny web apps on Amazon EC2

Launching an EC2 Instance

The process for creating and launching a new EC2 instance is pretty straightforward. I'd recommend going with Ubuntu 11.10 and then carefully thinking about how much space you'll be using. For example, my app uses the 2010 US Census data which comes in several R packages that total 4.7 GB. You'll have to create a security key and download it. Remember where it gets downloaded to as you'll have to use it to connect with the instance through SSH.

SSH

If you're on Linux or OS X, you can use Terminal and run all the commands from there. On Windows you have to download and install PuTTY. I'm writing this on a Mac, so my apologies if you run into a problem on Windows. May I recommend this?

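cd Downloads
# ssh refuses private keys that other users can read, so restrict the permissions first
chmod 400 that_key_you_downloaded_earlier.pem
ssh -i that_key_you_downloaded_earlier.pem ubuntu@your-ec2-instance-address.amazonaws.com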

Node.js

sudo apt-get update
sudo apt-get install python-software-properties python g++ make
sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs npm

Installing R and packages

Before you start installing R and Shiny, you need to add a package source so that apt installs the latest version of R (2.15.2 at the time of writing). If you skip this step then you'll end up with 2.12 and nothing will work.

Usually you'd open the sources list in gedit or another text editor that has a graphical interface. In this case we'll have to use vi to add our R source. I hadn't used vi before today and found this cheat sheet invaluable for learning it.

sudo vi /etc/apt/sources.list.d
# go in there and there should be a list file there from the Node.js step
# open that list file for editing
# type in:
o
# you will then be able to type text on a new line
# type in:
deb http://lib.stat.cmu.edu/R/CRAN/bin/linux/ubuntu/ version/
# where version=oneiric or precise or whatever
# make sure to have a space before the version! Otherwise you'll get the Malformed Line error.
# You can use other CRAN repos; you're not limited to CMU.
# [ESC] to finish editing. To exit and save changes:
:x

Once you're done with that, it's time to install R. Just run the following code (thanks to Ananda Mahto from Stack Overflow):

gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base

Then you'll install the Shiny package and Shiny Server itself:

sudo su - -c "R -e \"install.packages('shiny', repos='http://cran.rstudio.com/')\""
sudo npm install -g shiny-server
# Create a system account to run Shiny apps
sudo useradd -r shiny
# Create a root directory for your website
sudo mkdir -p /var/shiny-server/www
# Create a directory for application logs
sudo mkdir -p /var/shiny-server/log
# To be able to run shiny-server as a process later on:
wget https://raw.github.com/rstudio/shiny-server/master/config/upstart/shiny-server.conf
sudo cp shiny-server.conf /etc/init/shiny-server.conf

ui.R and server.R

We need to download ui.R and server.R (and any auxiliary files). This is done with wget:

sudo apt-get install wget
wget https://raw.github.com/.../master/myapp/server.R
wget https://raw.github.com/.../master/myapp/ui.R
mkdir myapp
mv ui.R myapp/ui.R
mv server.R myapp/server.R
sudo cp -R ~/myapp /var/shiny-server/www/

Run sudo start shiny-server to start and sudo stop shiny-server to stop. Open your web browser and go to http://[hostname]:3838/myapp/

Optional: Elastic IPs and Dyn DNS

Problem: You own a domain name and DNS hosting from a service like Dyn.com and want people to use your Shiny app by going to your domain.

Solution: In the AWS Management Console go to EC2 and then Elastic IPs. You can allocate a new address and then associate it with an instance. You can then use that IP when you make a new hostname in the Dyn control panel.

Cheers

You got yourself a Shiny app running on Shiny Server on Amazon EC2. Remember to watch out for those free tier limits!