Better GDAL in the Cloud (DotCloud, Round 2)
My previous attempts to make GDAL work in the cloud were sort of silly: compiling directly on the instance runs counter to the design of DotCloud, which lets you scale primarily by assuming that a ‘push’ to the server establishes all of the configuration you’ll need. (SSHing into the machine is certainly possible, but it seems to be designed more for debugging than for active work.)
So, with a few suggestions from the folks in #dotcloud, I started exploring how to set up my dotcloud service in a bit more of a repeatable way — such that if I theoretically needed to scale, I could do it more easily. Rather than manually installing packages on the host, I moved all of that configuration into a ‘postinstall’ file — fenceposted so it runs only once.
After a bit of experimentation (and reporting a bug in GDAL), I was able to get a repeatable installation going. I no longer have any manual steps in creating and setting up a new service that does OpenAerialMap thumbnailing; in fact, the entire process, right down to setting up redis as a distributed cache, can now be completed automatically.
The postinstall is pretty simple: download, configure, and install curl; then download, configure, and install GDAL. In both cases, the presence of the respective -config tool (curl-config, gdal-config) serves as a fencepost so we don’t install multiple times.
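To make the fencepost idea concrete, here is a minimal sketch of what such a postinstall could look like. It is not the actual script: the PREFIX location, the CURL_URL and GDAL_URL variables, and the helper names install_once and build_from_tarball are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a fenceposted postinstall. Assumptions (not from the original
# script): the PREFIX location, the CURL_URL / GDAL_URL variables, and the
# helper names install_once and build_from_tarball.

PREFIX="${PREFIX:-$HOME}"
PATH="$PREFIX/bin:$PATH"
export PATH

# Run the given command only if the probe tool is not already on PATH.
install_once() {
    probe="$1"
    shift
    if command -v "$probe" >/dev/null 2>&1; then
        echo "$probe already installed, skipping"
        return 0
    fi
    "$@"
}

# Download a source tarball, then configure, build, and install into PREFIX.
build_from_tarball() {
    url="$1"
    workdir=$(mktemp -d) || return 1
    wget -q -O "$workdir/src.tar.gz" "$url" || return 1
    mkdir "$workdir/src" || return 1
    tar -xzf "$workdir/src.tar.gz" -C "$workdir/src" --strip-components=1 || return 1
    (cd "$workdir/src" && ./configure --prefix="$PREFIX" && make && make install)
}

# Only build when explicitly asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--run" ]; then
    install_once curl-config build_from_tarball "$CURL_URL" &&
    install_once gdal-config build_from_tarball "$GDAL_URL"
fi
```

On the first push neither -config tool exists, so both builds run. On every later push the checks find the tools under PREFIX and the script returns almost immediately.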
Once this is done, setting up a new deployment with the appropriate services is trivial: that’s outlined in setup.sh — configure a redis service, grab the config information, and deploy a wsgi service using that config and the rest of the code.
In fact, while I was writing this post, I followed that very set of instructions, running ‘./setup.sh openaerialmap’ — before I finished the post, I had a running www.openaerialmap.dotcloud.com instance, which I could then commit in place of the silly-named ‘pepperoni’ service that oam.osgeo.org was using before.
The big thing for me about these one-off deployments is that, by giving me better tools for iterative deployment, DotCloud allows me to be more disciplined in how I configure services. Normally, the act of installing GDAL or curl is a one-time event: without reformatting my system (or setting up a VM and reinstalling/rolling back every time), it’s non-trivial to test what happens on a fresh system. With DotCloud, deploying another service literally takes seconds, and when I do, it’s a green field, ready for me to test my deployment scripts all over again. No more mistakes in one-off apt-get install commands that I’ll never remember if my system goes down. Now I’ve got the ability to script deployments of software.
This is of course possible with EC2 directly as well — building custom AMIs, redeploying them, etc. For me, though, DotCloud took the pain out of having to do such a thing. They did the hard part for me — and I got to do the more interesting and fun parts.