Archive for November, 2006

OpenStreetMap in TileCache via mapnik

Posted in Locality and Space on November 26th, 2006 at 11:40:30

Last night: fought with mapnik until I got it built.
This morning: added a mapnik layer to TileCache (14 lines of code).

Result: OpenStreetMap in TileCache

Mapnik renders things really pretty. If you just want to draw maps, I’d say that mapnik might be the way forward, once it grows up a bit more. Geographically deterministic labels mean that ka-map-like hacks are unnecessary; AGG-based rendering means that it’s really frickin’ pretty. I think I’m falling in love.
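The TileCache side really is tiny: a layer mostly just has to turn a tile address into a bounding box and hand it to the renderer. A rough sketch of that arithmetic (assuming a global EPSG:4326 grid with 256×256 tiles and two tiles across at zoom 0; the real TileCache Layer class carries more configuration than this):

```python
# Sketch of the tile -> bounding box math at the heart of a TileCache
# layer. Assumes a global EPSG:4326 grid: 2x1 tiles at zoom 0,
# 256x256 pixels per tile. Illustrative only.

def tile_bounds(x, y, z, tile_size=256):
    """Return (minx, miny, maxx, maxy) in degrees for tile (x, y) at zoom z."""
    res = (180.0 / tile_size) / 2 ** z   # degrees per pixel at this zoom
    span = tile_size * res               # width/height of one tile in degrees
    minx = -180.0 + x * span
    miny = -90.0 + y * span
    return (minx, miny, minx + span, miny + span)

# A mapnik-backed renderTile() would then be little more than:
#   map.zoom_to_box(mapnik.Envelope(*tile_bounds(x, y, z)))
#   mapnik.render(map, image)

print(tile_bounds(0, 0, 0))  # the western half of the world: (-180.0, -90.0, 0.0, 90.0)
```

Everything mapnik-specific in the layer is glue around those two commented lines, which is how the whole thing fits in 14 lines.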

Writing Code…

Posted in Software on November 20th, 2006 at 03:17:28

I think I wrote my first open source C code last night, in a fit of frustration over the lack of error feedback in Mapserver’s in-image exception handling. It turns out that the in-image handling only displays the first error; XML service exceptions, by comparison, report everything in the list. I opened a bug for better error reporting via in-image exception handling, and attached a patch along with example output that the patch produces.

I’m typically a Python/Javascript hacker at this point: TileCache is Python, OpenLayers is Javascript. I did write some Perl code at some point in the last year, for the Open Guide to Boston.

I don’t even write PHP much anymore, though I did write some S2 earlier today to redo the look of my LiveJournal.

So, Open Source code I’ve written… patch for Mapserver: C. Patches for OpenGuides: Perl. Patches for LiveJournal: Perl. Random Semweb utilities: Python. Random web utilities: PHP? I’m pretty sure I’ve released some of that stuff anyway. OpenLayers: Javascript. MetaCarta open source stuff: Python. I can’t think of anything else I’ve done that’s open source… I wonder what I might be forgetting.

Why Large Standards Organizations Fail

Posted in Locality and Space, Social on November 18th, 2006 at 13:30:33

Lately, I’ve seen a couple of attempts by large standards organizations to migrate, in some way or another, to the more community-driven specification creation model demonstrated by the FOAF and GeoRSS communities. I think that these communities are a great model of the right way to create application-level specifications, but I don’t think that large standards orgs are going to find it possible to migrate to this framework without a very significant change in the way they look at what they do.

Community-based standard creation offers a great number of benefits. Without a large corporation behind the specification, it seems possible to create an environment where ‘survival of the fittest’ determines what gets into the spec, something far less likely when large competing interests have no impetus to keep a specification small. This environment is likely created by the specification authors being implementors. Implementors are more likely to reach for the simplest thing that could possibly work. Implementors care about how difficult a spec is to understand, because if they don’t understand it, they can’t implement it.

Large corporations don’t have the same incentives. A representative of Sun, IBM, or HP is going to try to solve many more cases than he or she will personally implement, because in those large organizations a specification will be used by dozens or hundreds of different implementors with different goals. This is great when the result is a specification well suited to all of those needs. More typically, though, the specification grows in complexity with each additional use case. You end up with a large specification which can be used to solve lots of different needs, but solves none of them perfectly, and is difficult to implement, or at least to understand how to implement, because it addresses dozens of use cases rather than a single simple one. The single representative of a large organization speaks for far more people than a single use case.

There are certain specifications which are better built this way. The fact that so many people have spent time thinking about XML has resulted in a format which is extremely flexible and can play the roles needed in thousands of applications. The specification has been around long enough that tools which understand XML are very mature, and that maturity makes XML a useful vehicle for information exchange across a wide range of applications. An implementor of a new XML toolkit would, however, argue that XML is not easy at all: there are many edge cases to handle, and it is only because the tools already exist that this complexity stays hidden.

Application-level specifications are not well suited to large-organization standardization. An application-level specification should start by addressing the simplest thing that could possibly work, and large organizations have no interest in creating the simplest thing that could possibly work. Starting small and expanding into something larger is a pathway that large standards bodies have thus far failed to demonstrate the ability to follow.

It is possible that someplace like the W3C could succeed at this, but it would require changing the way these standards are generally created.

* Open communication. Even in the Geospatial Incubator group, the primary communications channels are currently closed: private mailing lists, private telecons, and private wikis. Implementors have not been invited to participate. This is a large mistake: standards for application use need to involve application developers. There are a number of application developers who would love to take part in standards development who have not yet been afforded the opportunity to do so.
* Changing development strategy. Rather than spending months going through 17 calls for comments, last calls, and so on to reach a recommendation, get something out as quickly as possible and iterate on it.
* Require that a specification have implementations before it is complete. I’m still not aware of a good, W3C-released or -endorsed XML parser, nor tools to convert any existing data format to some kind of XML, nor a way to exercise how the specification is supposed to work and make sure you’ve got it right. Conformance testing is a good part of this, but not all of it. When working on RSS 1.1, we had an implementation for WordPress and support patched into Perl’s XML::Parser before we even released the spec. We included test cases, a Relax-NG schema, and a validator service. All of these and more were important to the rapid (though limited) uptake the technology received: no one had any serious questions about RSS 1.1 that couldn’t be answered by reading through the example implementations.

These changes, however, are antithetical to the way large standards organizations work. In general, grassroots implementors aren’t part of the large corporations that can afford standards-organization membership, and when they are, they often aren’t implementing these standards as part of their work for that company. Since large standards organizations depend on corporate memberships for revenue, letting small-potatoes implementors participate means giving up a possible revenue stream. When that happens, the larger customers start wondering what the benefit of membership is: if anyone can come in on their own time and influence the standard, why pay large sums of money to participate?

Part of the reason for this shift is that the organizational costs of creating standards have been heading downward. For $500/yr or less, I can get my own webserver with lots of bandwidth. I can run mailing lists, web pages, wikis, and IRC channels. With VoIP, I can get relatively cheap realtime communication, and grassroots standards developers tend to prefer somewhat real-time online communication anyway. When I can obtain all the organizational material I need to create a standard for less than $500/yr, what does a $50,000-a-year membership in a standards org buy me?

Typically, the answer is “access to the right people”. Getting the members of the RDF developer community in the same place for discussions is not easy, and doing it with telecons does take work. However, as grassroots development practices mature, they are demonstrating the ability to create specifications in ways quite unlike how XML was developed. Good specifications limit the need for specification ‘marketing’: if a spec does what you need and does it well, you’ll find it and you’ll use it.

So, with organizational costs heading downwards far enough that people are willing to contribute the resources out of pocket, with developers tending to group together without the need of an organizational support, and with the resulting specification having the support of application implementors, what do standards organizations offer to application specification development?

I think you can guess what I think the answer to that is.

TileCache: Map Tile Caching

Posted in TileCache on November 12th, 2006 at 18:03:45

Last week, Schuyler and I wrote TileCache, a WMS-C server implementation. OpenLayers is already a WMS-C client, so combining the two gives you a super-fast loading slippy map.

Yesterday, we wandered around town doing various thought experiments on how to build a distributed peer-to-peer tile caching network. TileCache supports an extensible Cache plugin backend: any type of storage can be used. We wrote a simple disk-based cache, and also included a memory-based cache built on memcached. The former is good when you have lots of disk and a slow remote service. The latter is good when you need an LRU cache of the most important areas but can fall back to the source data for everything else: simply install memcached, get it running, and the MemoryCache will do you fine with the defaults.
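The plugin interface is about as small as it sounds: a cache only needs to get and set data by tile key. A minimal sketch of that shape (hypothetical class and method names, not TileCache’s actual ones), with a disk backend and a plain dict standing in where a memcached client would go:

```python
import os
import tempfile

class Cache:
    """Minimal cache interface: anything that can get/set by key works."""
    def get(self, key):
        raise NotImplementedError
    def set(self, key, data):
        raise NotImplementedError

class DiskCache(Cache):
    """One file per tile; good with lots of disk and a slow remote source."""
    def __init__(self, base):
        self.base = base
    def _path(self, key):
        return os.path.join(self.base, key.replace("/", "_"))
    def get(self, key):
        path = self._path(key)
        if not os.path.exists(path):
            return None  # cache miss: caller falls back to the source data
        with open(path, "rb") as f:
            return f.read()
    def set(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

class MemoryCache(Cache):
    """Memory cache; in real use the dict would be a memcached client,
    e.g. memcache.Client(['127.0.0.1:11211']), which handles LRU eviction."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, data):
        self.store[key] = data

cache = DiskCache(tempfile.mkdtemp())
cache.set("osm/4/3/2.png", b"...tile bytes...")
print(cache.get("osm/4/3/2.png"))  # b'...tile bytes...'
```

Because the interface is just get/set by key, swapping storage strategies is a one-line configuration change rather than a code change.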

However, for things like Landsat, most people don’t have enough disk to cache even the hot areas, if you plan to run a service that’s actually usable. All of Landsat as source data runs into the terabytes, and even a just-in-time cache is probably on the hundreds-of-gigabytes scale: bigger than most people have disk for. (I’m biased here: because I work for MetaCarta Labs, I do get access to some pretty sweet hardware. However, I try to target the things we release to the more casual user, and in that sense I look at my hosted webserver, which has an 80GB drive that I have full use of.)

So, we need a cache plugin which distributes caches to peers in a network. Ideally, because peers go down, you want to distribute the cache to multiple peers. You want the number of peers to be able to scale, and you want changes to the peer list not to trigger a complete redistribution of the load. (Apparently the term for this is ‘consistent hashing’, as described in this paper.)
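The core trick is to hash both peers and tile keys onto the same ring, with each key owned by the next peer clockwise; then removing a peer only moves the keys that peer owned. A self-contained sketch of the idea (a hypothetical class, not something in TileCache):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map tile keys to peers so that removing a peer only remaps the
    keys that peer owned, instead of reshuffling everything."""
    def __init__(self, peers, replicas=100):
        self.replicas = replicas   # virtual nodes per peer, for even spread
        self.ring = []             # sorted list of (hash, peer)
        for peer in peers:
            self.add(peer)
    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)
    def add(self, peer):
        for i in range(self.replicas):
            self.ring.append((self._hash("%s:%d" % (peer, i)), peer))
        self.ring.sort()
    def remove(self, peer):
        self.ring = [(h, p) for h, p in self.ring if p != peer]
    def peer_for(self, key):
        # First ring point clockwise of the key's hash, wrapping around.
        idx = bisect.bisect(self.ring, (self._hash(key),))
        if idx == len(self.ring):
            idx = 0
        return self.ring[idx][1]

ring = ConsistentHashRing(["peer-a", "peer-b", "peer-c"])
keys = ["tile/%d/%d/%d" % (z, x, y)
        for z in range(4) for x in range(4) for y in range(4)]
before = {k: ring.peer_for(k) for k in keys}
ring.remove("peer-b")
moved = [k for k in keys if before[k] != ring.peer_for(k)]
# Only keys that lived on peer-b move; everything else stays put.
assert all(before[k] == "peer-b" for k in moved)
```

With a naive `hash(key) % num_peers` scheme, dropping one of three peers would remap roughly two thirds of all tiles; here it remaps only the roughly one third that the dead peer held.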

While discussing that, Schuyler mentioned Chris Holmes’s post on S3 for secondary storage of tiles. Although in some cases this might make sense, I’m not sure that it does in the case of caching open data under the umbrella of OSGeo, which is a large part of why I’m thinking about doing this. For small data sources — the NYC Freemap, or the Boston Freemap — I can do local caches without running out of disk. For larger data sources, most of the important ones — like TIGER, or Landsat — could presumably be hosted under some form of OSGeo umbrella. If that’s the case, then falling back to S3 doesn’t make sense, since Telascience has offered very large amounts of disk space — larger than could reasonably be paid for on a monthly basis via S3. If you take the several terabytes available there currently and do the math, you’re looking at a cost of hundreds of dollars a month just for storage, before you even start counting bandwidth.
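The back-of-the-envelope math is simple enough to write down. Assuming S3’s 2006 rate of roughly $0.15 per GB-month for storage (bandwidth billed separately), and picking 4TB as an illustrative dataset size, storage alone lands in the hundreds of dollars a month:

```python
# Rough storage-cost arithmetic. $0.15/GB-month is S3's 2006 storage
# rate; the 4 TB figure is just an illustrative dataset size.
STORAGE_RATE = 0.15          # dollars per GB per month
terabytes = 4
gigabytes = terabytes * 1024
monthly = gigabytes * STORAGE_RATE
print("$%.2f/month" % monthly)  # $614.40/month, before any bandwidth
```

Bandwidth charges for a popular tile service would come on top of that, which is what makes donated disk like Telascience’s so much more attractive for this use.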

I don’t know the exact particulars of the S3 service interface. It seems like it’s likely to be a key-based data storage system. Given the resources made available for exploring high-bandwidth, high-use applications for open geospatial work, I think that it’s much more likely that creating an S3-like service on top of the Telascience resources would be approved by the SAC than paying thousands of dollars per year for S3 storage and bandwidth.

I haven’t been able to think of a situation where S3 would be the best way to muster the resources I need to reach a particular goal. Perhaps if you don’t have machines you can put extra disks in, it makes sense to go the S3 route; but I think that by the time you’re hosting datasets large enough to need something like S3, you’ve gone beyond cost effectiveness, given the resources available to most of the people participating in OSGeo, both personally and via services made available to projects under that umbrella.

Of course, adamhill from the WorldWind project has reported that a just-in-time cache of 8 or so layers for the entire world tops out at about a terabyte for them. So perhaps this entire discussion is moot: most services are not going to have a serious problem caching all the data they need. But when you need to scale to the level Google does, you do need to investigate more serious options. At that point, you’d better be making some serious cash, or you’re going to run into trouble sooner or later no matter what.