DotCloud: GDAL in Python in the Cloud
One of the key components of the OpenAerialMap design that I have been working on since around FOSS4G of last year is to use distributed services rather than a single service — the goal being to identify a way to avoid a single point of failure, and instead allow the data and infrastructure to be shared among many different people and organizations. This is why OpenAerialMap does not provide hosting for imagery directly — instead, it expects that you will upload images to your own dedicated servers and provide the information about them to the OpenAerialMap repository.
One of the things that many people want from OpenAerialMap is (naturally) a way to see the images in the repository more easily. Since the images themselves are quite large — in some cases, hundreds of megabytes — it’s not as easy as just browsing a directory of files.
Since OAM does not host the imagery itself, there is actually nothing about this type of service that would be better served by hosting it centrally; the meta-information about the images is small, and easily fetched by HTTP, and the images themselves are already remote from the catalog.
As a result, when I saw people were interested in trying to get OpenAerialMap image thumbnails, I decided that I would write it as a separately hosted service. Thankfully, Mike Migurski had already made a branch with the ‘interesting’ parts of the problem solved: Python code using the GDAL library to fetch an overviewed version of the image and return it as a reasonably sized thumbnail. With that in mind, I thought that I would take this as an opportunity to explore setting this code up in the ‘cloud’.
A requirement for such a cloud deployment is that it needs to have support for GDAL: this severely limits the options available. In general, most of the ‘cloud hosting’ that is out there is to host your applications, using pre-installed libraries; very few of the sites out there would let you compile and run code in them, at least that I can find. However, I remembered that during my vacation week, I happened to sign up for an account with ‘DotCloud’, a service which is designed to help you “Assemble your stack from pre-configured and heavily tested components.” — instead of tying you to some specific setup, you’re able to build your own.
At first, I wasn’t convinced that I could do what I needed, but after reading about another migration, I realized that I could get access to the instance I was running on directly via SSH, and investigated more deeply.
I followed the instructions to create a Python deployment; this got me a simple instance running a webserver under uwsgi behind nginx. I was able to quickly SSH in, and with a bit of poking about, was able to compile and run curl and GDAL via the SSH command line into my local home:
$ wget http://curl.haxx.se/download/curl-7.21.6.tar.gz; tar -zvxf curl-7.21.6.tar.gz
$ wget http://download.osgeo.org/gdal/gdal-1.8.0.tar.gz; tar -zvxf gdal-1.8.0.tar.gz
$ cd curl-7.21.6.tar.gz
$ ./configure --prefix=$HOME --bindir=$HOME/env/bin
$ make
$ make install
$ cd ../gdal-1.8.0;
$ ./configure --prefix=$HOME --bindir=$HOME/env/bin --with-curl --without-libtool --with-python
$ make
$ make install
$ cd ..
$ LD_LIBRARY_PATH=lib/ python
So this got me a running python from which I could import the GDAL Python libraries; a great first step. Then I had to figure out how to make the webserver know about that LD_LIBRARY_PATH — a much more difficult step.
After some mucking about and poking in /etc/, I realized that the way that the uwsgi process was actually being run was under ‘supervsisord’; basically, whenever I deployed, my Python process running under supervisord was restarted, and was running my Python code inside it. So then all I had to do was figure out how to tell the supervisor on the DotCloud instance to run with my environment variables.
I found the ‘/etc/supervisor/conf.d/uwsgi.conf’ which was actually running my uwsgi process; after some research, I found that DotCloud will let you append to the supervisor configuration by simply creating a ‘supervisord.conf’ into your application install directory; it hooks this config at the end of all your other configs. So, I copied the contents of uwsgi.conf into my own supervisord.conf, dropped it in my app directory, and added: environment=LD_LIBRARY_PATH=”/home/dotcloud/lib” to the end of it.
Pushed my app to the server, and now, instead of “libgdal.so.1: cannot open shared object file: No such file or directory”, I was able to successfully import GDAL; I then easy_installed PIL, and with a little bit more work, I got the image thumbnailing code working on my DotCloud instance.
I’ve still got some more to do to clean this up — I’m not sure I’m actually using DotCloud ‘right’ in this sense — but I was able to get GDAL 1.8.0 in a Python server running on a DotCloud instance and use it to run a service, without having to install or screw with the running GDAL libraries on any of my production servers, so I’ll consider that a win.