Suggestions for Better Geo In the (dot)Cloud
Posted in default on June 10th, 2011 at 00:47:03
One of the interesting things that I’ve noticed in exploring various cloud providers is the limited amount of support for strong geo integration. This isn’t particularly surprising: geo on the web is still a niche market, even if it is a niche I tend to care about a lot. Here are some things that I think might make people more eager to adopt cloud-y services for geo, specifically in the context of DotCloud, where I’ve been investing most of my time recently.
DotCloud has the idea of ‘services’: independent units which work together to form an overall operating picture. The idea behind these services is that they are simple to set up, and each performs one particular task well. So rather than having a single service which combines PostGIS, GeoDjango, and MapServer, you have three separate services (one for PostGIS, one for GeoDjango, and one for MapServer or other map rendering), and you connect them together. This way, if your application needs to scale at the frontend, you can simply deploy additional GeoDjango services; if you need more map rendering ‘oomph’, same deal. (Deploying additional services for databases won’t magically scale them, but you do have the ability to use the software’s internal scaling across your multiple instances.)
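As a rough illustration of what ‘connecting’ two services might look like, here is a minimal sketch of a GeoDjango settings helper that points at a PostGIS database running somewhere else. The `POSTGIS_*` key names and the default database name and user are made up for the example; they are not anything DotCloud defines. The `ENGINE` string is the PostGIS backend that ships with GeoDjango.

```python
def geodjango_databases(env):
    """Build a Django DATABASES dict that points at a separate PostGIS service.

    `env` is any mapping of connection details (for example os.environ).
    The POSTGIS_* key names are hypothetical, chosen only for this sketch.
    """
    return {
        "default": {
            # GeoDjango's PostGIS database backend.
            "ENGINE": "django.contrib.gis.db.backends.postgis",
            "NAME": env.get("POSTGIS_NAME", "geodata"),
            "USER": env.get("POSTGIS_USER", "geo"),
            "PASSWORD": env.get("POSTGIS_PASSWORD", ""),
            # The host is required: it is the address of the database service.
            "HOST": env["POSTGIS_HOST"],
            "PORT": str(env.get("POSTGIS_PORT", 5432)),
        }
    }
```

In a settings.py this would be used as `DATABASES = geodjango_databases(os.environ)`; scaling the frontend then means deploying more copies of the same application with the same connection details.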
So, again, using DotCloud, there are three types of ‘services’ that I think would be interesting to users who are working in Geo:
- PostGIS: Supporting PostGIS on most Debian and Debian-derived platforms is pretty simple; the tools are all easily installable via packages. (pgRouting is another step beyond. It might actually benefit more from being offered as a service, simply because it’s more difficult to get started with, so having it preconfigured could make a big difference in adoption.) Nothing really special is needed here other than apt-get install postgis and some wrapper tools to make setting up PostGIS databases easy. There probably isn’t any reason why this couldn’t be included in the default PostgreSQL service type in DotCloud.
- GeoDjango: This one is even easier than PostGIS. GeoDjango already comes with the appropriate batteries included. The primary blocker is getting the GEOS, PROJ, and GDAL libraries installed on the Django service type. I don’t think there is any need for a separate service type for GeoDjango.
- Map Rendering: This one is a big one for a lot of people, and I’m not entirely sure of the best way to work it within DotCloud. Map rendering (taking a set of raster or vector data, and making it available as rendered tiles based on requests via WMS or other protocols) is one of the things that is not pursued as often by the community right now, and I think a lot of that is down to the difficulty of setup. As data grows large, coping with it all on the client side becomes more difficult; some applications simply never get built because the jump from ‘small’ to ‘big’ is too expensive.
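For the PostGIS item above, the ‘wrapper tools’ could be very small. The following is a minimal sketch that only builds the commands such a wrapper might run; it does not execute anything. It assumes a PostGIS version that supports `CREATE EXTENSION` (PostGIS 2.0 or later on PostgreSQL 9.1 or later); older PostGIS releases were set up by loading SQL scripts into the database instead. The function names are invented for the example.

```python
import shlex


def postgis_setup_commands(dbname, owner="postgres"):
    """Return the commands a wrapper might run to create a spatial database.

    Assumes PostGIS is already installed from packages and that the
    extension mechanism is available (an assumption, see above).
    """
    return [
        # Create an empty database owned by `owner`.
        ["createdb", "-O", owner, dbname],
        # Enable the PostGIS types and functions inside that database.
        ["psql", "-d", dbname, "-c", "CREATE EXTENSION IF NOT EXISTS postgis;"],
    ]


def as_shell(commands):
    """Render the command lists as shell lines, quoting each argument."""
    return "\n".join(" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands)
```

Printing `as_shell(postgis_setup_commands("osm_local"))` gives two lines that could be pasted into a shell or run from a service’s setup hook.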
There are three different ‘big’ map rendering setups that I can think of that might be worth trying to support:
- MapServer: MapServer is a tried and true map rendering tool. It primarily exists as a C program with a set of command line tools around it; it is usually run under CGI/FastCGI. Configuration lives in a set of text-based configuration files (mapfiles); rendering uses GDAL/OGR for most data interactions, and GD or AGG (plus more esoteric output formats) for output. MapServer is often combined with TileCache for caching purposes; TileCache is written in Python.
- GeoServer: GeoServer is a Java program, which runs inside a servlet container like Tomcat. Like MapServer, it supports a variety of input/output formats; configuration is typically done via its internal web interface. Caching is built in (via GeoWebCache). I think GeoServer would probably run as is under the ‘java’ service type that exists on DotCloud, assuming the appropriate PostGIS support exists on the database side.
- OSM data rendering: This one is a bit less solid. OpenStreetMap data rendering has a number of different rendering frontend environments, but the primary one that I think people would tend to set up these days is a stack of mod_tile (Apache-side frontend) talking to tirex (renderer backend), which calls out to and depends on Mapnik, the actual software which does tile rendering. Data comes from a PostGIS database, though in the case of OSM even that requires some additional finagling, since a worldwide OSM dump is… pretty big. (It’s probably safe to set that aside to begin with, and concentrate instead on targeting localized OSM rendering deployments: solve the little problems first, scale up later.)
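All three setups end up answering requests for individual tiles, so it may help to see how a tile is addressed. This is a minimal sketch of the zoom/x/y numbering commonly used for OpenStreetMap-style web tiles (spherical Mercator, 2^zoom tiles per axis). The function names are mine, and the `zoom/x/y.png` path is the usual URL convention, not necessarily how any of these tools lay files out on disk.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing a longitude/latitude at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # Spherical Mercator projection of the latitude, scaled to tile rows.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge values (for example lon == 180) stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url_path(zoom, x, y, ext="png"):
    """Build the conventional zoom/x/y.ext path for a tile."""
    return f"{zoom}/{x}/{y}.{ext}"
```

The number of tiles quadruples with each zoom level, which is a large part of why a worldwide rendering setup is so much heavier than a localized one.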
One thing that all of these tools have in common is that they really like having fast access to lots of disk for quickly reading and writing small files. I’m not sure what the right way to do that within the DotCloud setup is, since I don’t see an obvious service type which is designed for this, so that might be another component in the overall picture. (Things like the redis service try to solve this problem, I think, but since the tools primarily intend to write to disk as is, adapting them to support other ways of storing tile data persistently would require modifying the upstream libraries.)
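To make the storage question a bit more concrete, here is a minimal sketch of what a pluggable tile store could look like if the caching layer were written against a small interface instead of the filesystem directly. None of this is how TileCache, GeoWebCache, or mod_tile actually work; the class and function names are hypothetical, and the in-memory store only stands in for a key-value service such as redis without using a real client.

```python
import os
import tempfile


class DiskTileStore:
    """Store each tile as one small file under a root directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        return os.path.join(self.root, *key.split("/"))

    def put(self, key, data):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Write to a temporary file, then rename, so readers never see a partial tile.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp, path)

    def get(self, key):
        try:
            with open(self._path(key), "rb") as handle:
                return handle.read()
        except FileNotFoundError:
            return None


class MemoryTileStore:
    """Same interface backed by a dict; a stand-in for a key-value service."""

    def __init__(self):
        self._tiles = {}

    def put(self, key, data):
        self._tiles[key] = bytes(data)

    def get(self, key):
        return self._tiles.get(key)


def get_or_render(store, key, render):
    """Return a cached tile, rendering and storing it on a cache miss."""
    data = store.get(key)
    if data is None:
        data = render(key)
        store.put(key, data)
    return data
```

The disk version shows where the ‘lots of small files’ pressure comes from: every tile is its own file, and every miss creates directories and a new file.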
I think that there is room to significantly simplify deployment of some components of geographic applications by centralizing installation in cloud-based services; the above sketches out some of the components that it might make sense to consider as a first pass. These components would let someone create a relatively complex client + server side geographic data application; exploring and expanding on these — especially the OSM data rendering component — could make deploying to the cloud easier than deploying locally, with the net effect of increased adoption of cloud-based services… and more geodata in the world to consume, which is something I’m always in favor of. 🙂