My father, Ben Anderson, plays with numbers. As his Twitter bio says: "big data, small data, open data, any data". He works with R a lot and has been persuading me to take a look at it. I've held off until now because I'm all for analysing data in real time (primarily using delightful JS libraries such as Chart.js and D3.js). As far as I understood it, R is geared towards static data analysis and, because of that, is able to make full use of the hardware it runs on to optimise computations. Dad has an SSD in his Mac which reduces the time to load data substantially, but he also makes use of the R package data.table. This library makes manipulation of data ridiculously fast because it holds everything in RAM and modifies it in place, avoiding unnecessary copies.
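To give a flavour of why data.table is so quick, here's a minimal sketch (the file name and column names are made up for illustration): fread() pulls a CSV into memory very fast, and aggregations then run in place on the in-memory table.

```r
library(data.table)

# fread() is data.table's fast CSV reader; the file and
# columns here are hypothetical examples
dt <- fread("meter_readings.csv")

# grouped aggregation runs entirely in memory
dt[, .(mean_kwh = mean(kwh)), by = hour]
```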
Because I like using real-time data, R wasn't really something I thought I could make much use of. I was fine with creating beautiful little animated doughnuts and bar charts with Chart.js and D3.js. However, these libraries are designed to display data, not to process it. R, on the other hand, is absurdly powerful in what it can do. Really, I wanted a way to use R to analyse data coming in from a Node.js application and then use Chart.js or D3.js to visualise it, because they can handle updates to the data sets nicely without having to re-run R code just to get results.
For the last few months, Dad's laptop has had a problem with its keyboard and mouse. In the end, we took it to the Apple Store and it's currently away getting some TLC. In the mean (pun intended) time, Dad has been battling with a rather old Mac Mini which refuses to install half of the R packages. The temporary solution was to steal my mother's computer! I saw what was going on and suggested he install R on his AWS server. We soon discovered that running R in a remote shell is not one of the cleanest and slickest experiences you'll ever have, but it did the job.
The problem was that Dad likes to store all his code on GitHub (woo!), and committing from AWS just so that he could see the output of the code wasn't really a sensible workflow. Then we discovered RStudio Server. We both use RStudio already, so migrating to a server version didn't sound like it would be too complicated. Once installed (it took less than half an hour), we were blown away by it. Even though it's the open source version, it's truly fantastic. Because it runs on Amazon's infrastructure, installing packages and anything else web-related is almost instantaneous, and the web interface is a complete clone of RStudio desktop, so there was nothing new to learn. It even comes with support for Git out of the box! All our problems simply melted away.
It authenticates against the accounts on the system itself, so each user logs in with their existing credentials and gets secure access to their own home directory and files. Possibly the best feature is that you can upload files to your account directly from the browser. No mucking around with FTP or SFTP; it's as easy as pie to get your data where you want it.
I set up the server itself to run Nginx so that it handles initial requests and then uses a reverse proxy to forward all relevant traffic to RStudio Server. The reason for running Nginx in front of everything is that it's then possible to set up other things on the server without impacting RStudio. An example is the redirect from dataknut.io/blog to Dad's blog. Nginx is also ultra-lightweight and uses hardly any resources, so it's the perfect choice for a site with low levels of traffic.
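For anyone wanting to do the same, the proxy part of the Nginx config is only a few lines. This is a sketch rather than our exact setup: it assumes RStudio Server is listening on its default port (8787) and that Nginx is answering for dataknut.io; the WebSocket upgrade headers are there because the RStudio web interface relies on them.

```nginx
server {
    listen 80;
    server_name dataknut.io;

    location / {
        # forward everything to RStudio Server on its default port
        proxy_pass http://localhost:8787/;

        # pass WebSocket upgrade requests through, which the
        # RStudio web interface needs to work properly
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```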
I'm still unconvinced about R's ability to process real-time data, as that's not really what it's designed for. If you have any suggestions for doing that, we'd both love to hear from you!
What's next? I think I might build Dad a home page...