Bummer
- U.S. Senate Roll Call Votes: on the amendment to regulate large capacity ammunition feeding devices.
- Fuck This Week by Lindy West
Hey, cheer up
- Things I Am Thinking as I Listen to My Upstairs Neighbors Have Sex Right Now by Madeleine Davies
- The Most Deranged Sorority Girl Email You Will Ever Read
- Daft Punk’s new single Get Lucky is my 2013 SUMMERJAM. So. Good. (YouTube, Spotify, iTunes)
Read this, maybe?
Just Another Princess Movie by Lili Loofbourow, an essay about Pixar’s Brave.
Watch this, maybe?
- Quad Queens vs Quad Nines - incredible round of Texas Hold-Em
- From A to B - journey of a package
If you have a kid, this is a cool experiment
This is also pretty cool
The Statistics Department at Carnegie Mellon University is one of eight nodes within the National Science Foundation (NSF) Census Research Network. Last semester I joined that project and have been asked to design their website. Making the design proposals in Illustrator has been a great experience since Adobe added the multiple artboards feature in CS5.
You’ve written a long R program that does simulations, computations, maybe uses C++ with Rcpp at some point, maybe makes calls to WinBUGS at some point, and then does all of that again over and over and over for hundreds, if not thousands, of iterations. Or something similar, if not as intensive.
If that exaggerated scenario sounds roughly familiar, then you probably know that the completion of such a process might take a while. You may have access to some very powerful hardware (some portion of the code may even be optimized for multi-core processors!) but it’s still going to take a while.
You probably don’t want to have to keep checking the computer every now and then to make sure it’s still doing that thing it’s supposed to do, and you don’t want to constantly think “Is it done? Is it done yet?” …at least I hope you don’t. Wouldn’t you rather get notifications of the progress? Wouldn’t you rather get a notification that the process finished?
Well, that’s what twitteR and sendmailR can help you do. You can get direct messages on twitter that let you know you’re on the 500th iteration out of 1000; you can receive an email in your inbox as soon as your MCMC chain converges. I’m gonna explain how to make that happen. (The notification thing, not the MCMC convergence thing.)
Twitter API
This section can be skipped if you’re only planning on sending emails to yourself.
- Log in to the Developers subsite on Twitter: https://dev.twitter.com/.
- Go into My Applications and Create a new application.
- Fill out the form.
- Once the application has been created, go into the application’s Settings tab and change the Application Type from Read only (default) to Read, Write and Access direct messages.
- Back on the Details tab, copy your Consumer key and Consumer secret.
Notifications with R
You need three packages for the full functionality, but feel free to omit twitteR if you don’t plan on using it.
install.packages(c("twitteR", "sendmailR", "ROAuth"))
Twitter Step
We’re going to be using the dmSend() function in the twitteR package but it (and every other function that uses the twitter API) requires authorization. Modify the following code with the consumer key and secret from the Twitter API page earlier. Then run the code, preferably not in RStudio.
library(ROAuth)

requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "http://api.twitter.com/oauth/access_token"
authURL <- "http://api.twitter.com/oauth/authorize"
consumerKey <- "CONSUMER_KEY"
consumerSecret <- "CONSUMER_SECRET"

oauth <- OAuthFactory$new(consumerKey = consumerKey,
                          consumerSecret = consumerSecret,
                          requestURL = requestURL,
                          accessURL = accessURL,
                          authURL = authURL)
oauth$handshake()
When you run this code, you’ll be asked to enter a code. Copy the URL from the R console and paste it into your browser. (I had trouble selecting text in RStudio while it was waiting for input, so I recommend using the basic R window for this step.) Log in if you haven’t yet and authorize the application to receive the code, then copy and paste that code into the R window. You should be OK now. Let’s clean up the workspace of unnecessary objects and then save the authorized handshake for future use; otherwise you’d have to repeat the process every time. (Big thanks to Karthik Ram (@_inundata) for this suggestion.)
rm(requestURL, accessURL, authURL, consumerKey, consumerSecret)
save.image("~/OAuth.RData")
That was a one-time (hopefully) setup step. From now on you can just run the following code in any fresh R session:
library(sendmailR) # for sendmail()
library(twitteR) # for dmSend()
load("~/OAuth.RData")
registerTwitterOAuth(oauth)
General Step
Windows users: follow these instructions to enable and configure SMTP on IIS 7. Unix users: run this in terminal:
sudo postfix start
Modify the following code with the appropriate email addresses and twitter handle (if using twitter), then run.
notify <- list(
  # Send a direct message to your own Twitter account
  DM = function(msg) {
    dmSend(msg, "YOUR_TWITTER_USERNAME")
  },
  # Send an email; an optional attachment is written to a text file first
  email = function(msg, attachment = NULL) {
    if (!is.null(attachment)) {
      if (is.data.frame(attachment)) {
        write.table(attachment, "attachment.txt")
      } else if (is.character(attachment)) {
        # writeLines() has no quote or row.names arguments
        writeLines(attachment, "attachment.txt")
      } else {
        write(attachment, "attachment.txt")
      }
      mp <- mime_part(x = "attachment.txt", name = "attachment.txt")
      status <- sendmail(from = "YOUR_EMAIL", to = "YOUR_EMAIL",
                         subject = "R Process Notification",
                         msg = list(msg, mp),
                         control = list(smtpServer = "localhost"))
      # SMTP code 221 means the server closed the session normally
      if (status$code == "221") {
        unlink("attachment.txt")
      } else {
        cat("Message with attachment not sent.\n")
      }
    } else {
      sendmail(from = "YOUR_EMAIL", to = "YOUR_EMAIL",
               subject = "R Process Notification",
               msg = msg,
               control = list(smtpServer = "localhost"))
    }
  }
)
You can now send emails and direct messages to yourself. I made a separate twitter account for this purpose (don’t forget to do the handshake/authorization thing while logged in to that separate account). I think there’s a setting in twitter’s account settings to receive DMs as text messages on your phone, so check that out if you want to be notified via SMS. Here are example uses of the notify object:
notify$DM("This is a test notification.")
notify$email("This is a test notification.")
notify$email("This is the trees dataset.", trees)
Now just save the workspace with the notify and oauth objects for future use.
Happy coding!
If you have a Facebook account you can download all your data as a zipped archive by going into account settings. The file of interest is wall.html, which contains your entire wall (your status updates + posts that your friends have left).
The script extracts wall posts (and any comments for the wall posts) into a wall data frame and comments data frame which can then be analyzed and mined in R.
I am writing this guide because this guide did not exist when I decided to put my 2010 US Census Shiny App on Amazon’s servers (demo here). Surely I can’t be the only one who’s never had any experience with EC2 (or SSH or vi, for that matter).
So here’s a newbie’s guide for newbies to deploying your rad Shiny app on Amazon Elastic Compute Cloud (EC2) from scratch. It took me sixteen 30 Rock episodes to figure this stuff out (counting the time it took to download the census data) but hopefully you’ll have your app up and running in less time than…a BBC Sherlock episode.
What are Shiny and Shiny Server?
Shiny is an R package developed by the incredible folks at RStudio for making interactive web applications. Shiny Server is a server program that makes Shiny applications available over the web.
Amazon Elastic Compute Cloud (EC2)
Amazon EC2 is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. If you have an Amazon account then you can start using Amazon Web Services (AWS) for free! AWS Free Tier includes 750 hours of Linux or Windows Micro Instances each month for one year.
Setting up Shiny web apps on Amazon EC2
Launching an EC2 Instance
The process for creating and launching a new EC2 instance is pretty straightforward. I’d recommend going with Ubuntu 11.10 and then carefully thinking about how much space you’ll be using. For example, my app uses the 2010 US Census data which comes in several R packages that total 4.7 GB. You’ll have to create a security key and download it. Remember where it gets downloaded to as you’ll have to use it to connect with the instance through SSH.
SSH
If you’re on Linux or OS X, you can use Terminal and run all the commands from there. On Windows you have to download and install PuTTY. I’m writing this on a Mac, my apologies if you run into a problem on Windows. May I recommend this?
cd Downloads
ssh -i that_key_you_downloaded_earlier.pem ubuntu@your-ec2-instance-address.amazonaws.com
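One thing that can trip up this step: ssh refuses to use a private key file that other users on your machine can read. Here is a minimal sketch of a helper that tightens the permissions first. The function name restrict_key is made up for illustration, and the file name and host are the same placeholders as above.

```shell
# restrict_key KEY_FILE
# Make a private key readable only by its owner (mode 400).
# (Hypothetical helper name, not part of ssh or AWS tooling.)
restrict_key() {
  key_file="$1"
  if [ ! -f "$key_file" ]; then
    echo "Key file not found: $key_file" >&2
    return 1
  fi
  chmod 400 "$key_file"
}

# Usage, with the placeholders from the command above:
#   restrict_key that_key_you_downloaded_earlier.pem
#   ssh -i that_key_you_downloaded_earlier.pem ubuntu@your-ec2-instance-address.amazonaws.com
```

If the ssh command already works for you, you can skip this.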
Node.js
sudo apt-get update
sudo apt-get install python-software-properties python g++ make
sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs npm
Installing R and packages
Before you start installing R and Shiny, you need to add a source so that when you install R the latest version (2.15.2) gets installed. If you skip this step then you’ll end up installing 2.12 and nothing will work.
Usually you’d open the sources list in gedit or another text editor which has an interface. In this case we’ll have to use vi to add our R source. I hadn’t used vi until today and found this cheat sheet invaluable for learning it.
sudo vi /etc/apt/sources.list.d
# go in there and there should be a list file there from the Node.js step
# open that list file for editing
# type in: o
# you will then be able to type text on a new line
# type in: deb http://lib.stat.cmu.edu/R/CRAN/bin/linux/ubuntu/ version/
# where version=oneiric or precise or whatever
# make sure to have a space there! Otherwise you'll get the Malformed Line error.
# You can use other CRAN repos; you're not limited to CMU.
# [ESC] to finish editing. To exit and save changes: :x
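If vi is giving you trouble, the same line can probably be added without opening an editor at all. This is only a sketch: the function name cran_source_line and the file name cran.list are my own choices, not anything apt requires. It builds the deb line with the space in the right place so you can pipe it into a new list file.

```shell
# cran_source_line MIRROR_URL RELEASE
# Print an apt source line for a CRAN mirror, with the required
# space between the URL and the release name.
# (Hypothetical helper name.)
cran_source_line() {
  mirror="$1"
  release="$2"
  printf 'deb %s %s/\n' "$mirror" "$release"
}

# Usage (cran.list is an arbitrary file name; lsb_release -cs prints
# the Ubuntu codename, e.g. oneiric or precise):
#   cran_source_line http://lib.stat.cmu.edu/R/CRAN/bin/linux/ubuntu/ "$(lsb_release -cs)" \
#     | sudo tee /etc/apt/sources.list.d/cran.list
```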
Once you’re done with that, it’s time to install R. Just run the following code (thanks to Ananda Mahto from Stack Overflow):
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base
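Since an old R is what breaks everything, it may be worth checking which version you actually got before moving on. A rough sketch, assuming the first line of R --version looks like "R version 2.15.2 (2012-10-26) ..." and that GNU sort (with -V) is available, as it is on Ubuntu. Both function names are made up for illustration.

```shell
# r_version: read `R --version` output on stdin, print just "X.Y.Z".
# (Hypothetical helper; assumes the banner starts with "R version X.Y.Z".)
r_version() {
  sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# version_at_least HAVE WANT: succeed when HAVE >= WANT.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
  have="$1"
  want="$2"
  lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}

# Usage:
#   if version_at_least "$(R --version | r_version)" 2.15.0; then
#     echo "R is new enough"
#   else
#     echo "R is too old, check your sources list"
#   fi
```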
Then you’ll install the Shiny package and Shiny Server itself:
sudo su - -c "R -e \"install.packages('shiny', repos='http://cran.rstudio.com/')\""
sudo npm install -g shiny-server

# Create a system account to run Shiny apps
sudo useradd -r shiny
# Create a root directory for your website
sudo mkdir -p /var/shiny-server/www
# Create a directory for application logs
sudo mkdir -p /var/shiny-server/log
# To be able to run shiny-server as a process later on:
wget https://raw.github.com/rstudio/shiny-server/master/config/upstart/shiny-server.conf
sudo cp shiny-server.conf /etc/init/shiny-server.conf
ui.R and server.R
We need to download ui.R and server.R (and any auxiliary files). This is done with wget:
sudo apt-get install wget
wget https://raw.github.com/.../master/myapp/server.R
wget https://raw.github.com/.../master/myapp/ui.R
mkdir myapp
mv ui.R myapp/ui.R
mv server.R myapp/server.R
sudo cp -R ~/myapp /var/shiny-server/www/
Run sudo start shiny-server to start and sudo stop shiny-server to stop. Open your web browser and go to http://[hostname]:3838/myapp/
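The server can take a moment to come up, so instead of refreshing the browser you could poll the URL from the terminal. Below is a small, generic retry helper as a sketch. The name retry is my own, and the curl call in the usage comment assumes curl is installed on the instance.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_TRIES attempts have been made,
# sleeping DELAY_SECONDS between attempts. (Hypothetical helper name.)
retry() {
  max_tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    if "$@" >/dev/null 2>&1; then
      echo "ok after $attempt attempt(s)"
      return 0
    fi
    if [ "$attempt" -ge "$max_tries" ]; then
      break
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "gave up after $max_tries attempt(s)" >&2
  return 1
}

# Usage (replace [hostname] with your instance address):
#   sudo start shiny-server
#   retry 10 3 curl -fsS "http://[hostname]:3838/myapp/"
```

If it gives up, the log directory created earlier (/var/shiny-server/log) is a reasonable first place to look.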
Optional: Elastic IPs and Dyn DNS
Problem: You own a domain name and DNS hosting from a service like Dyn.com and want people to use your Shiny app by going to your domain.
Solution: In the AWS Management Console go to EC2 and then Elastic IPs. You can allocate a new address and then associate it with an instance. You can then use that IP when you make a new hostname in the Dyn control panel.
Cheers
You got yourself a Shiny app running on Shiny Server on Amazon EC2. Remember to watch out for those free tier limits!
2010 US Census Shiny App is now on GitHub.
Enables exploration of the 2010 US Census data from the UScensus2010* package(s) via a web interface created in R with Shiny.
Note: the data on the counties, tracts, block groups, blocks, and CDPs totals 4.7 GB, which will take a while to download.
Cheers! Enjoy :)
UPDATE
A live demo is now up and running using Shiny Server on Amazon EC2.
Upon completing the Master’s in Statistical Practice (MSP) program I will probably write a post about my experiences and what I learned. For now I will just say that one of the courses we (the 23 of us) are taking is Statistical Consulting. We had our first client come in today and present two projects, and over the next two weeks more clients will come in and present their projects to us. After every client has presented, we will choose the top three who we want to work with and our instructors will do some matching.
We are supposed to take notes while the clients are presenting, just as we would take notes during actual consultation sessions. We are to note the background information, the [current] status of the project (if any), the aims (goals), and what the client expects us to do. Is the consultant expected to design an experiment/study? Design a survey? Clean/manipulate the data? Analyze the data? Co-author a research paper?
Or as I like to call it, The BEAST Report: Background, Expectations, Aims, Status, and Timeframe. I added that last one in there because it would be useful to note if the project is expected to be a week-long endeavor or a 6-month behemoth.