Django and R on Heroku

2017 January 3

I’ve been playing with Django for a while now and I’m loving it. Don’t get me wrong, I still love Shiny, but Django is pretty tough to beat for data-heavy projects and managing user sessions. For deployment, I’ve been using Heroku and am very happy so far. Application updates are as simple as a Git push from the command line and it’s hard to get much easier than that.

Until today, the one thing I hadn’t tried to sort out is how to leverage R with the application. R isn’t one of Heroku’s officially supported buildpacks (that’d be these), but it is possible to get an instance up and running.

First, though, I’ll build a very basic Django application. This will do nothing more than act as a container for displaying the results of some R code. To start, I’ll create and populate a virtual environment for the project.

mkvirtualenv r_on_heroku
pip install django
pip install gunicorn
pip install whitenoise
pip freeze > requirements.txt

Note that although requirements.txt isn’t necessary for local development, Heroku needs it to know which Python modules to install.

That done, I create a single Python module which will serve a static web page at the site root. This is based on an example found in the book Lightweight Django, which is excellent and well worth reading. For Heroku deployment, I’ll need two additional things. First, I need a Procfile. This is simply a one-line file which tells Heroku to start gunicorn and what to run. What it’s running is the second thing that I need: an application object. (I’ll also be using WhiteNoise to serve static content. More on that in a moment.)

So, my Procfile looks like this:

web: gunicorn r_min:application

And I’ve added these lines at the bottom of my minimal Django example:

from django.core.wsgi import get_wsgi_application
from whitenoise.django import DjangoWhiteNoise

application = get_wsgi_application()
application = DjangoWhiteNoise(application)
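For anyone unfamiliar with WSGI, the application object is just a callable with a particular signature. gunicorn imports the module named in the Procfile and looks for that callable (named application by default). The following is not Django code, only a standard-library sketch of the kind of object gunicorn expects; the call_app helper is a hypothetical name used here to exercise the callable without starting a server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable: the kind of object gunicorn looks for."""
    body = b"This is my minimal R example."
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app once without a server; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        # Record the status line that the app reports.
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Django’s get_wsgi_application() returns an object that plays the same role, and DjangoWhiteNoise wraps it so that static files are answered before the request reaches Django.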

My one and only view has this very basic code:

def index(request):
    return HttpResponse('This is my minimal R example.')

We can confirm that everything is running as expected by issuing either of these two commands:

python r_min.py runserver
heroku local web

Note that for the second command to work, I need to have Heroku’s command line interface installed.

Pushing the app up to Heroku is fairly easy. If you’ve never done this, the Heroku docs will help you get an app up and running in minutes. The sum total of pushing my fresh application to the interwebs is found in these seven lines of code:

heroku login
heroku create r-on-heroku
git init
git remote add heroku https://git.heroku.com/r-on-heroku.git
git add .
git commit -m "Initial commit"
git push heroku master

A quick look at https://r-on-heroku.herokuapp.com/ shows me that everything is working fine. Not terribly interesting yet, but it’s working. To make R available, I need to add another buildpack to my Heroku app. As it happens, someone has built one for R, which you can read about here.

heroku buildpacks:add http://github.com/virtualstaticvoid/heroku-buildpack-r.git#cedar-14-chroot

The buildpack expects a file called “init.R”, which it will execute. This is a good place for installation of packages that the Django application will need. I don’t put anything in here as I’m just going to use base graphics. To make that happen, I create folders for static content and one HTML template. I tweak the settings so that Django knows where to find static content and templates and change my view function to this:

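import subprocess

from django.shortcuts import render

def index(request):
    # Regenerate the plot on every request, then render the page that displays it.
    subprocess.call("Rscript ./myPlot.R", shell=True)
    return render(request, 'index.html')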
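The settings tweak itself isn’t shown in this post. As a rough sketch only, for a single-file app it might look something like the following. The build_settings helper and the templates folder name are assumptions for illustration; the setting names (STATIC_URL, STATIC_ROOT, TEMPLATES) are standard Django settings, and the resulting dictionary would be passed to settings.configure().

```python
import os


def build_settings(base_dir):
    """Return keyword arguments for django.conf.settings.configure().

    base_dir is the project folder; 'static' and 'templates' are assumed
    to be subfolders of it.
    """
    return {
        # URL prefix and on-disk location for static files such as the plot.
        "STATIC_URL": "/static/",
        "STATIC_ROOT": os.path.join(base_dir, "static"),
        # Tell the template engine where to look for index.html.
        "TEMPLATES": [
            {
                "BACKEND": "django.template.backends.django.DjangoTemplates",
                "DIRS": [os.path.join(base_dir, "templates")],
            }
        ],
    }
```

In the module itself this would be used roughly as settings.configure(**build_settings(BASE_DIR)), alongside whatever other settings the app already passes.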

My R script will do nothing more than create a basic histogram for a normal distribution. The code to create the view will be rerun every time I make a request, so refreshing the page will produce a new, randomized image.

Testing this locally, everything works as expected. However, when I test the page as served by Heroku, things are not as nice. Refreshing the page continues to serve the same .PNG that I pushed up with git. What’s going on?
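Part of what makes this hard to spot is that the view throws away the exit status of subprocess.call, so a failing command goes unnoticed. A stricter helper, sketched here under the assumption that the command is still run through the shell as in the view, would raise as soon as the command fails. The run_command name is hypothetical.

```python
import subprocess


def run_command(command):
    """Run a shell command; raise instead of failing silently."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    if result.returncode != 0:
        # Include stderr so the cause (e.g. "command not found") is visible.
        raise RuntimeError(
            "Command {!r} exited with status {}: {}".format(
                command, result.returncode, result.stderr.strip()
            )
        )
    return result.stdout
```

Calling run_command("Rscript ./myPlot.R") in the view in place of subprocess.call would turn a silent failure into an error page and a log entry.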

The proximate issue is that the Rscript command isn’t found on the deployment site. It turns out that installation of R on Heroku isn’t straightforward, as documented here. This is quite a rabbit hole and it chewed up a healthy amount of a Friday afternoon. I did a bit of spelunking on the bash shell with heroku run bash. This tells me that R is installed in a hidden folder: /app/.root/usr/bin/. OK, so I can add that to the PATH and smooth sailing, right? Not so fast. The R command is actually a shell script and one of the first things it does is check for the location of R_HOME. And when I say “check for”, I mean “expect to find” in a specific place, namely /usr/lib/R. OK, so I’ll just change that (despite this comment).

R_HOME_DIR=/app/.root/usr/lib/R
if test "${R_HOME_DIR}" = "/app/.root/usr/lib/R"; then

This doesn’t work either. There’s a library that can’t be found, leading me to spend a bit of time playing with LD_LIBRARY_PATH. At this point, I’m an appreciable distance from a canonical R installation. The buildpack documentation for R isn’t super detailed, but there’s a brief discussion of using the chroot command to run R as installed. Now, no one would mistake me for a Linux command line expert, so I’ll confess that this was a new one for me. Basically, it has the process run in such a way that it presumes the directory where it resides is root. This will make all the path references in the R script and environment variables work. So far, so cool. But there is one new problem. R can’t find my script. Playing with things in bash for a bit, I quickly fixed this. All I need is to prefix references to anything in my app with ~/ to get back to the “real” root directory.
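To make the chroot idea concrete, here is a purely illustrative sketch of how a path seen inside the chroot lines up with the real filesystem. It performs no chroot itself; host_path is a hypothetical helper, and the folders are the ones mentioned above.

```python
import posixpath


def host_path(chroot_dir, inner_path):
    """Map a path as seen inside a chroot to its real location on disk."""
    # Drop the leading slash so join() appends rather than replaces.
    return posixpath.join(chroot_dir, inner_path.lstrip("/"))
```

With /app/.root as the chroot folder, the /usr/lib/R that the R script expects corresponds to /app/.root/usr/lib/R on the dyno, which is why the hard-coded paths resolve once R runs under chroot.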

Whew. That was a bit of a long walk, but things now work on Heroku. Just one more hassle: now things are gummed up locally. We can fix this by taking advantage of Heroku’s configuration variables. This allows me to store different environment variables locally and in production. I create two new strings to identify the R command used to run the script and where to find root. These have different values on my local device and Heroku. I test in both places and hurrah! Everything looks good!
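On the Python side, reading those two variables might look roughly like this. It is a sketch, not the project’s actual code: R_SCRIPT_FOLDER_PREFIX is the name used in the R snippet further down, while R_COMMAND, build_r_command and the default values are assumptions made for illustration.

```python
import os
import posixpath


def build_r_command(script_name, env=None):
    """Build the shell command that runs an R script in this environment."""
    env = os.environ if env is None else env
    # Hypothetical variable name: the command used to launch R scripts.
    r_command = env.get("R_COMMAND", "Rscript")
    # Folder prefix that leads back to the app's files.
    prefix = env.get("R_SCRIPT_FOLDER_PREFIX", ".")
    return "{} {}".format(r_command, posixpath.join(prefix, script_name))
```

With nothing set, build_r_command("myPlot.R") gives the same "Rscript ./myPlot.R" string used in the view. On Heroku, the two values would be set with heroku config:set, so the view code stays identical in both places.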

Two additional environment variables aren’t terribly clean, but for now I’ll take it. Actually, storing this setting in the environment, rather than in a Django configuration variable, has at least one benefit: the value is easily accessible in R. I’ll need to ensure that all file access is decorated with the folder prefix, but this doesn’t feel like such a bad thing.

myFile <- file.path(Sys.getenv('R_SCRIPT_FOLDER_PREFIX'), 'static', 'myPlot.png')
png(filename=myFile)

Feel free to have a look at the basic project on https://github.com/PirateGrunt/r_on_heroku.

References

This isn’t an exhaustive list. Suffice to say that I combed through loads of stackoverflow.
