MBTiles — a bit of a rant

Earlier this weekend, I was pointed to a post about MBTiles, a new portable map tile distribution format. After a bit of hemming and hawing, I realized what it actually was, and realized that there wasn’t really much to hem and haw about.

This ‘new’ format is really nothing new, nor is it actually a ‘file format’ in the strictest sense of the word. What it is instead is:

  1. An OS X binary that takes a bunch of files in directories on disk and writes them into a sqlite database.

Some of you may remember the old commercials for the iMac: “Step 1: Plug in. Step 2: Get connected. There is no step 3.” In this case, there isn’t even a step two. There’s nothing else here: not a specification document, not a reader of any kind, not even a description of what this magical ‘file format’ is.

Okay, so it’s not much of anything, but it’s not really a *bad* idea, it’s just that… it’s not the way I’d go about it.

First of all, if you actually want people to use something, you kind of have to have a reader and a writer. A promise of an iPad app improvement somewhere down the line, along with some vague handwavey “saved 90% of your disk space!” statistics, might be good for a flash in the pan in the Twitter-world echo chamber, but if you want success, you gotta do a bit better than that.

So, to outline what the MBTiles ‘file format’ actually seems to be: It’s a sqlite database with two simple tables.

CREATE TABLE metadata (name text, value text);
CREATE TABLE tiles (zoom_level integer, tile_column integer, tile_row integer, tile_data blob);

The metadata is not required to contain anything (so far as I can tell; possibly some reader tools might require it).

Now, a smart reader might notice that there is nothing very complex about this: any programming language that can interact with sqlite can create or access the MBTiles data, which is a very good thing. (It’s possible that this simple fact hasn’t been published because the Development Seed/MapBox folks plan to extend the format and don’t want other people using it yet; dunno.) That aside, as it stands there’s no reason (that I can see) that the code to create a cache should be in C!
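To show how little is involved, here is a minimal Python sketch of writing and reading a tile, assuming nothing beyond the two-table schema shown above. The helper names (`open_mbtiles`, `put_tile`, `get_tile`) are made up for illustration and are not part of any MapBox tool; the sketch also makes no assumption about which corner `tile_row` counts from.

```python
import sqlite3

def open_mbtiles(path):
    """Open (or create) a database with the two tables described above."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS metadata (name text, value text)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles "
        "(zoom_level integer, tile_column integer, tile_row integer, tile_data blob)"
    )
    conn.commit()
    return conn

def put_tile(conn, zoom, column, row, data):
    """Store one tile image (raw bytes) under its zoom/column/row."""
    conn.execute(
        "INSERT INTO tiles (zoom_level, tile_column, tile_row, tile_data) "
        "VALUES (?, ?, ?, ?)",
        (zoom, column, row, data),
    )
    conn.commit()

def get_tile(conn, zoom, column, row):
    """Return the tile bytes, or None if no such tile is stored."""
    found = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (zoom, column, row),
    ).fetchone()
    return found[0] if found else None
```

Calling `open_mbtiles("example.mbtiles")` (a hypothetical file name) and then `get_tile(conn, 0, 0, 0)` is all a basic reader needs to do.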

For prototyping, or if your goal is to create and develop a ‘standards’ thing, you really want to be working in a language which is more widely understood and easier to prototype in. I realize that this is a judgement call on my part, but for things where you encourage people to check it out and use it, you should be working in a language with a wider audience. For example, using the mb_tiles_importer on my tiles produced by TileCache gives me a database that has 4 rows… even though my directory only has two tiles in it, and 2 of the 4 rows are entirely empty. If the code were in Python, I might take a look and offer some feedback, but with it being an OS X-only binary, or even a thinly documented C script, there’s almost nothing I can do to figure out what’s going on or help.
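The output side, at least, can be inspected from any language. As a rough sketch, assuming only the `tiles` table described above, a check like the following would count how many rows carry no image data; the function name is hypothetical, and it says nothing about why an importer would write such rows.

```python
import sqlite3

def count_empty_tiles(conn):
    """Return (total_rows, empty_rows) for the tiles table.

    A row counts as empty when tile_data is NULL or zero bytes long.
    """
    total = conn.execute("SELECT COUNT(*) FROM tiles").fetchone()[0]
    empty = conn.execute(
        "SELECT COUNT(*) FROM tiles "
        "WHERE tile_data IS NULL OR length(tile_data) = 0"
    ).fetchone()[0]
    return total, empty
```

Run against a database opened with `sqlite3.connect(...)`, a result where the second number is non-zero points at rows like the ones described above.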

Add to this the fact that although this format has a writer, it has no open source reader of any kind that I can find. There’s some chat about various MapBox related software reading it, but no separation of this from MapBox, and with no description of the format, there isn’t even the excuse that it’s designed for people to write their own clients…

That said, I’ve gone ahead and written support into TileCache for the ‘format’ such as it is; I’m not convinced it’s the ideal thing to do, but the core concept of delivering a single file in the form of an SQLite database for tile data is a pretty solid goal.

Overall, the idea is reasonably sound: Delivering tiles in a single file is important, and sqlite is a nice, lightweight format for that, accessible from most C-based languages. Writing a quick cache format to read these things in TileCache was easy enough because, as I said, there really isn’t much there. I didn’t add write support, because doing so seemed like it could be a waste if the MapBox folks want to ‘own’ this format (Hello, GeoPDF, how are you today) and are still developing it, and the only way that I was able to even do what I did was using a tool that I had to grab from a Github link I got over Twitter (and which doesn’t appear to work right).
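To make the single-file idea concrete, here is a minimal sketch of packing a directory of tiles into one sqlite file. It assumes a hypothetical `root/zoom/column/row.png` layout, which is not TileCache’s actual on-disk layout and not necessarily what the MapBox importer expects; `pack_tiles` is an illustrative name, not an existing tool.

```python
import sqlite3
from pathlib import Path

def pack_tiles(tile_root, db_path):
    """Copy every root/zoom/column/row.png file into a tiles table.

    Returns the number of tiles written. Files whose path parts are
    not integers are skipped rather than stored as broken rows.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles "
        "(zoom_level integer, tile_column integer, tile_row integer, tile_data blob)"
    )
    written = 0
    for png in sorted(Path(tile_root).glob("*/*/*.png")):
        try:
            zoom = int(png.parent.parent.name)
            column = int(png.parent.name)
            row = int(png.stem)
        except ValueError:
            continue  # not a zoom/column/row path, ignore it
        conn.execute(
            "INSERT INTO tiles (zoom_level, tile_column, tile_row, tile_data) "
            "VALUES (?, ?, ?, ?)",
            (zoom, column, row, png.read_bytes()),
        )
        written += 1
    conn.commit()
    conn.close()
    return written
```

Skipping unparseable paths instead of inserting them is a deliberate choice here, so that stray files never turn into empty rows.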

B+ for the idea. It’s a bit iffy on the implementation, but the core goal is sound. However, the way it’s been rolled out follows a pattern that I see a lot lately: Publish first, actually create the thing you’re publishing about later. That type of attitude is the kind of thing that drives me, as a creator who puts a lot of time and thought into community interaction first and foremost, absolutely bonkers.

Clean it up, make it a spec, and describe some of the benefit and utility in a way that’s not tied directly to MapBox, and I can see this actually becoming a pretty regular thing for distributing files around. I can definitely see the value and benefit — with some metrics at larger scale — of doing this kind of thing for distributing larger tilesets. I just don’t want to fall into the Admiral Ackbar problem: “It’s a trap!”

6 Responses to “MBTiles — a bit of a rant”

  1. Map Tiles to go :: High Earth Orbit Says:

    […] Schmidt has shared his ideas and added broadening support to TileCache in support of storing tiles in SQLite so that anyone […]

  2. Eric Gundersen Says:

    Chris, Thanks for checking out MBTiles. You are spot on about this being simple. While we needed this for an iPad app and Maps on a Stick, we opened this up for other folks that might be running into similar problems of running a lot of tiles locally and making these more portable. There’s a more technical description linked from the blog post [1] which states more clearly the fact that it’s an SQLite database – the blog post itself is rather nontechnical. Right now “opened” just means on github which is under active development, in binary for OSX, and docs explaining the sqlite structure. The C code wasn’t trumpeted with the blog post because it’s still beta-quality, but there’s also an implementation in Python [2] and a Python reader, as part of mapsonastick, is also open [3]. What a generalized reader would be (possibly a module, with a script to go to & from simple files), would be interesting and quick to build. The C reader is in C because we’re handling huge tilesets and trying to minimize dependencies and optimize speed for huge tilesets, even for Windows.

    We’d be really interested in hearing possible improvements of the spec and it’s cool that it was added to TileCache. Right now it fits our use cases extremely well, filling the space in between files-on-disk and RasterLite, but it’s meant to be simple for adoption purposes and relatively extensible for other people’s needs.

    [1]: http://mapbox.com/documentation/mbtiles-file-format
    [2]: http://github.com/tmcw/gdal2mb
    [3]: http://github.com/developmentseed/mapsonastick/blob/master/mapsonastick/server.py#L188

  3. Matt Giger Says:

    Seems like they might want to be able to have more than one data source for the tiles, in which case they would need an ID to differentiate tiles with the same row/col/depth. I’m not sure I saw it in their metadata section, but it’s pretty important to know what projection the tiles are in. Mercator is quite a different thing than plate carree projection.

  4. crschmidt Says:

    Matt: I think that if you think these are problems, you’re misunderstanding what mbtiles is 🙂 It’s simply a different way to cache tiles; it’s no different than the equivalent of zipping up the tiles in /tmp/tilecache/layername/ and putting them in one file. Metadata about layers has to be stored elsewhere — this is not a standalone replacement for a layer, it’s a replacement for the set of data in a layer.

  5. Bart van den Hoff Says:

    …”as a creator who puts a lot of time and thought into community interaction first and foremost”…

    Feature Server (scribble db)?
    Open Aerial Map?

    Keep drinking the me, me, me, kool-aid buddy. No problem with half arsed implementations or quitting half way (we’re all human), just don’t go calling others on it at the same time 🙂

  6. crschmidt Says:

    I’m interested why you think FeatureServer falls into a category here that fails.

    As part of my work on FeatureServer, I helped:

    • Create a relatively widely used spec for delivering geographic data over the web (GeoJSON)
    • Develop and implement a tool that was used by many to quickly get set up with serving features on the web, at a time when that wasn’t really an easy thing to get started with
    • Push on the idea of treating features as resources, and allowing for multiple representations of them, a concept carried forward by the MapFish project
    • Create a reasonably widely used library for simple serialization and deserialization of features in Python.

    In addition: The FeatureServer project was not made available/announced until after the source had been released. In fact, by the time we released FeatureServer publicly, we already had an external contributor to the project, and the project quickly grew because of external contributions following a process which had already been relatively well established with TileCache.

    The Scribble database option for FeatureServer was hugely successful, and many people used it for doing one-off demos and projects much more easily than many of the other options at the time. A lot of time was spent both for FeatureServer and TileCache making it easy to use the software in order to minimize barriers to user participation and contribution.

    I still make commits to the FeatureServer project, and I use the vectorformats code (the core of FeatureServer) regularly when I’m writing Python code. It has a number of people outside of me who can commit to the project, and it has a mailing list as well as a bug tracker. I’m not sure what specific part of that you think is ‘half arsed’ or quitting halfway through.

    The OpenAerialMap project in its previous incarnation was a failure. I’ll fully admit that. However, I never felt like I ‘oversold’ the project; it was exactly what it said on the tin, with a public SVN repository and a description of the goals and ideals, as well as reasonably complete documentation (through code) of what existed. (It wasn’t much.) When the project turned out to be too much for one person to handle, I put out a broad call to arms to help improve the situation. I received no offers of any help, and in fact, no response of any kind from the community other than sympathy. In my opinion, I never ‘hyped’ the project beyond what it actually was.

    I put a lot of time and effort into making it easy for people to get involved before I start a project. (The number of hours of time I’ve spent fighting internally to open things up to the community is huge, though obviously and intentionally somewhat less visible than most of my work.) I’ve specifically taken a lot of steps in my work to minimize barriers to entry wherever practical, and I think that communities have responded well to these things in general. Although there are always some projects that can’t be run as a one-man show, I think that claiming that talking about OAM was overhyping into an echo chamber, or that FeatureServer is ‘half arsed’, is pretty unfair.

    Perhaps I don’t understand what you mean, but if you’re trying to say either of those projects was just me tooting my own horn — I’m sorry, but I have to pretty strongly disagree.