OpenStreetMap in TileCache via mapnik

Posted in Locality and Space on November 26th, 2006 at 11:40:30

Last night: fought with mapnik until I got it built.
This morning: added a mapnik layer to TileCache (14 lines of code).

Result: OpenStreetMap in TileCache

Mapnik renders things really pretty. If you just want to draw maps, I’d say that mapnik might be the way forward, once it grows up a bit more. Geographically deterministic labels mean that ka-map-like hacks are unnecessary. AGG-based rendering means that it’s really frickin’ pretty. I think I’m falling in love.
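For the curious, the glue really is tiny. Here is a minimal sketch of what a TileCache-style mapnik layer can look like: the class and method names are illustrative rather than TileCache’s actual API, and the mapnik calls assume the Python bindings of this era (load_map, Envelope, render).

```python
# A sketch of a tile-rendering layer backed by mapnik. Hypothetical
# names; the real TileCache layer is similar in spirit but not code.

class MapnikLayer:
    def __init__(self, mapfile, size=256):
        self.mapfile = mapfile   # path to a mapnik XML style, e.g. osm.xml
        self.size = size         # tiles are square, size x size pixels
        self._map = None         # lazily created mapnik.Map

    def render_tile(self, bbox):
        """Render one tile for (minx, miny, maxx, maxy); return PNG bytes."""
        import mapnik  # deferred so the module imports without mapnik installed
        if self._map is None:
            self._map = mapnik.Map(self.size, self.size)
            mapnik.load_map(self._map, self.mapfile)
        self._map.zoom_to_box(mapnik.Envelope(*bbox))
        img = mapnik.Image(self.size, self.size)
        mapnik.render(self._map, img)
        return img.tostring('png')
```

The whole trick is that mapnik keeps the styled Map object around, so each tile is just a zoom_to_box and a render call.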

Writing Code…

Posted in Software on November 20th, 2006 at 03:17:28

I think I wrote my first open source C code last night, in a fit of frustration over the lack of error feedback in MapServer’s in-image exception handling. It turns out that the in-image handling only displays the first error; XML service exceptions, by comparison, report everything in the list. I opened a bug for better error reporting via in-image exception handling, and attached a patch along with example output that the patch produces.

I’m typically a Python/Javascript hacker at this point: TileCache is Python, OpenLayers is Javascript. I did write some Perl code at some point in the last year, for the Open Guide to Boston.

I don’t even write PHP much anymore, though I did write some S2 earlier today to redo the look of my LiveJournal.

So, Open Source code I’ve written… patch for Mapserver: C. Patches for OpenGuides: Perl. Patches for LiveJournal: Perl. Random Semweb utilities: Python. Random web utilities: PHP? I’m pretty sure I’ve released some of that stuff anyway. OpenLayers: Javascript. MetaCarta open source stuff: Python. I can’t think of anything else I’ve done that’s open source… I wonder what I might be forgetting.

Why Large Standards Organizations Fail

Posted in Locality and Space, Social on November 18th, 2006 at 13:30:33

Lately, I’ve seen a couple of attempts by large standards organizations to migrate, in some way or another, to the more community-driven specification creation model demonstrated by the FOAF and GeoRSS communities. I think that these communities are a great model of the right way to create application-level specifications, but I don’t think that large standards orgs will find it possible to migrate to this framework without a very significant change in the way they look at what they do.

Community-based standard creation offers a great number of benefits. Without a large corporation behind the specification, it seems to be possible to create an environment where ‘survival of the fittest’ governs inclusion into the spec, in a way it cannot when large competing interests have no impetus to keep a specification small. This environment is likely created by the specification creators being implementors. Implementors reach for the simplest thing that could possibly work. Implementors care about how difficult a spec is to understand, because if they don’t understand it, they can’t implement it.

Large corporations don’t have the same feeling. A representative of Sun, IBM, or HP is going to try to solve a lot more cases than he or she will personally implement: in those large organizations, a specification will be used by dozens or hundreds of different implementors, with different goals. This is great when the result is a specification well suited to all of those needs. Typically, though, what happens is that the specification grows in complexity with each additional use case. You end up with a large specification which can be used to solve lots of different needs, but solves none of them perfectly, and is difficult to implement — or at least difficult to understand how to implement — due to the complexity of addressing dozens of use cases rather than a single simple one. The single representative of a large organization speaks for far more people than a single use case.

There are certain specifications which are better built this way. The fact that so many people have spent time thinking about XML has resulted in a format which is extremely flexible, and can play the roles needed in thousands of applications. The specification has been around long enough that tools which understand XML are very mature, and this maturity makes XML a useful tool for information exchange among a wide number of tools. I think an implementor of a new toolkit to work with XML would, however, argue that XML is not easy at all: there are many edge cases which need to be handled, and the fact that these tools are already created hides these complexities.

Application-level specifications are not well suited to large-organization standardization. An application-level specification should start by addressing the simplest thing that could possibly work, and large organizations don’t have an interest in creating the simplest thing that could possibly work. Starting small and expanding into something larger is a pathway that large standards bodies have thus far failed to demonstrate.

It is possible that someplace like the W3C could succeed at this, but it would require changing the way these standards are generally created.

* Open communication. Even in the Geospatial Incubator group, the primary communication channels are currently closed: private mailing lists, private telecons, and private wikis. Implementors have not been invited to participate. This is a large mistake: standards for application use need to involve applications. There are a number of application developers who would love to take part in standards development but have not yet been afforded the opportunity.
* Changing development strategy. Rather than spending months going through 17 rounds of calls for comments, last calls, and so on to get to a recommendation, get something out as quickly as possible and iterate on it.
* Require that a specification have implementations before it is complete. I’m still not aware of a good W3C-released or W3C-endorsed XML parser: no tools to convert an existing data format to some kind of XML, no way to test how the specification is supposed to work and make sure you’ve got it right. Conformance testing is a good part of this, but not all of it. When working on RSS 1.1, we had an implementation for WordPress and support patched into XML::Parser in Perl before we even released the spec. We included test cases, Relax-NG validation, and a validator service. All of these and more were important to the rapid (though limited) uptake the technology received: no one had any serious questions about RSS 1.1 that couldn’t be answered by reading through the example implementations.

These changes, however, are antithetical to the way that large standards organizations work. In general, grassroots implementors aren’t part of large organizations which can afford membership in these standards bodies, and when they are, they often aren’t implementing these standards as part of their work for that company. Since large standards organizations depend on large corporate memberships for revenue, choosing to let small-potatoes implementors participate means giving up a possible revenue stream. When that happens, the larger customers start wondering what the benefit of membership is: if anyone can come in on their own time and influence the standard, why pay large sums of money to participate?

Part of the reason for this shift is that the organizational costs of creating standards have been heading downward. For $500/yr, or less, I can get my own webserver with lots of bandwidth. I can run mailing lists, web pages, wikis, and IRC channels. With VoIP setups, I can achieve relatively cheap realtime communication, and grassroots standards developers typically prefer somewhat real-time online communication anyway. When I can obtain all the organizational materials I need to create a standard for less than $500/yr, what does a $50,000-a-year membership in a standards org get me?

Typically, the answer is “Access to the right people”. Getting the members of the RDF developer community in the same place for discussions is not easy, and doing it with telecons does take work. However, as grassroots development practices mature, they are demonstrating the ability to create specifications in ways quite different from how XML was developed. Good specifications also limit the need for specification ‘marketing’: if a spec does what you need, you’ll find it, and you’ll use it, if it’s good.

So, with organizational costs heading downwards far enough that people are willing to contribute the resources out of pocket, with developers tending to group together without the need of an organizational support, and with the resulting specification having the support of application implementors, what do standards organizations offer to application specification development?

I think you can guess what I think the answer to that is.

TileCache: Map Tile Caching

Posted in TileCache on November 12th, 2006 at 18:03:45

Last week, Schuyler and I wrote TileCache, a WMS-C server implementation. OpenLayers is already a WMS-C client, so combining the two gives you a super-fast loading slippy map.
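What makes WMS-C cacheable is that clients request tiles as ordinary WMS GetMap calls whose BBOX falls exactly on a fixed grid, so the server can address tiles by (x, y, zoom). A sketch of the grid math for a global EPSG:4326 profile with 256-pixel tiles (the layout here is an assumed illustration, not TileCache’s exact code):

```python
# Compute the bounding box a WMS-C client would request for a given
# tile address, in a hypothetical global lat/lon grid where two 256px
# tiles cover the world at zoom 0.

def tile_bbox(x, y, z, tile_size=256):
    """Bounding box (minx, miny, maxx, maxy) of tile (x, y) at zoom z."""
    # Level-0 resolution chosen so 2 * 256 pixels span 360 degrees.
    res = 0.703125 / (2 ** z)
    span = tile_size * res
    minx = -180.0 + x * span
    miny = -90.0 + y * span
    return (minx, miny, minx + span, miny + span)
```

The server only has to verify the incoming BBOX lies on this grid, look the tile up in the cache, and render it on a miss.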

Yesterday, we wandered around town doing various thought experiments as to how to create a distributed peer to peer tile caching network. TileCache supports an extensible Cache plugin backend: any type of storage can be used. We wrote a simple Disk based cache, and also included support for a memory-based cache, based on memcached. The former is good when you have lots of disk, and a slow remote service. The latter is good when you need an LRU cache of the most important areas, but can fall back to the source data for everything else: simply install memcached, get it running, and the MemoryCache will do you fine with the defaults.
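The plugin interface involved is tiny: a cache backend only needs to fetch a tile by key and store one on a miss. Here is a dict-backed sketch standing in for the Disk and memcached backends (the names are illustrative, not TileCache’s exact API):

```python
# A minimal cache backend and the get-or-render logic around it.
# DictCache is a stand-in; a real backend would write to disk or
# talk to memcached behind the same two methods.

class DictCache:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, data):
        self.store[key] = data
        return data

def get_tile(cache, key, render):
    """Serve a tile from cache, falling back to the renderer on a miss."""
    data = cache.get(key)
    if data is None:
        data = cache.set(key, render())
    return data
```

Because any storage that can answer get and set works, swapping disk for memcached (or something stranger) is a one-line configuration change.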

However, for something like Landsat, most people don’t have enough disk to cache even the hot areas, if you plan to create a service which is actually usable. All of Landsat as source data ranges into the terabytes, and even a just-in-time cache is probably on the scale of hundreds of gigabytes: bigger than most people have disk for. (I’m biased in this sense: because I work for MetaCarta Labs, I do get access to some pretty sweet hardware. However, I try to target the things that we release to the more casual user, and in that sense, I look at my hosted webserver, which has an 80GB drive that I have full use of.)

So, we need a cache plugin which distributes caches to peers in a network. Ideally, because peers go down, you want to distribute the cache to multiple peers. You want the number of peers to be able to scale, and you want the changing of the peer list to not result in a complete load redistribution. (Apparently a term for this is ‘consistent hashing’, as described in this paper.)
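The idea fits in a few lines: hash each peer onto several points of a ring, and assign a tile key to the first peer clockwise from the key’s own hash. Adding or removing a peer then only moves the keys adjacent to that peer’s points, not the whole cache. A sketch:

```python
# Consistent hashing on a sorted ring of (hash, peer) points.
# replicas spreads each peer over the ring so load stays even.

import bisect
import hashlib

def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, peers, replicas=64):
        self.ring = sorted(
            (_hash('%s:%d' % (peer, i)), peer)
            for peer in peers
            for i in range(replicas)
        )
        self._points = [h for h, _ in self.ring]

    def peer_for(self, key):
        """Return the peer responsible for this key (clockwise successor)."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]
```

Dropping a peer from the list leaves every other peer’s points untouched, which is exactly the “changing the peer list doesn’t redistribute the whole load” property.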

While discussing that, Schuyler mentioned Chris Holmes’s post on S3 for secondary storage of tiles. Although in some cases this might make sense, I’m not sure that it does in the case of caching open data under the umbrella of OSGeo, which is a large part of why I’m thinking about doing this. For small data sources — the NYC Freemap, or the Boston Freemap — I can do local caches without running out of disk. For larger data sources, most of the important ones — like TIGER, or Landsat — could presumably be hosted under some form of OSGeo umbrella. If that’s the case, then falling back to S3 doesn’t make sense, since Telascience has offered very large amounts of disk space — larger than could reasonably be paid for on a monthly basis via S3. If you take the several terabytes available there currently and do the math, you find that you’re looking at a cost of hundreds of dollars a month just for storage, and that’s before you even start counting bandwidth costs.
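The arithmetic is simple enough to sketch, assuming S3’s 2006-era storage rate of $0.15 per GB-month (bandwidth charges excluded):

```python
# Back-of-the-envelope S3 storage cost for a tile cache measured
# in terabytes, at an assumed $0.15/GB-month rate.

def s3_storage_cost(terabytes, rate_per_gb=0.15):
    """Monthly storage cost in dollars (treating 1 TB as 1024 GB)."""
    return terabytes * 1024 * rate_per_gb
```

At that rate, 3 TB of tiles already runs around $460 a month before a single tile is ever served.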

I don’t know the exact particulars of the S3 service interface. It seems like it’s likely to be a key-based data storage system. Given the resources made available for exploring high-bandwidth, high-use applications for open geospatial work, I think that it’s much more likely that creating an S3-like service on top of the Telascience resources would be approved by the SAC than paying thousands of dollars per year for S3 storage and bandwidth.

I haven’t been able to think of a situation where S3 would be the best way to muster the resources I need to solve a particular goal. Perhaps if you don’t have machines that you can put extra disks in, going the S3 route makes the most sense; but I think that by the time you’re hosting datasets so large that you need something like S3, you’ve gone beyond cost-effectiveness, given the resources available to most of the people participating in OSGeo, both personally and via other services made available to projects under that umbrella.

Of course, adamhill from the WorldWind project has reported that a just-in-time cache of 8 or so layers for the entire world tops out at about a terabyte for them. So perhaps this entire discussion is moot: most services are not going to have a serious problem caching all the data they need. But when you need to scale to the level that Google does, you do need to investigate more serious options: at that point, though, you had better be making some serious cash, or you’re going to run into trouble at some point no matter what.

MetaCarta PageMapper

Posted in Locality and Space on October 23rd, 2006 at 17:53:14

This weekend, I did something a bit different than last time — instead of writing something that geeks with maps can use, I wrote something that anyone who mentions a place in a page can use.

The MetaCarta PageMapper lets you map out the locations mentioned in any page.

PageMapIt!

You can drag this link up to your browser toolbar, or you can just hit it here — see Boston, London, and Tokyo on a map with the click of a button.

It’s not perfect, but hey, nothing is. And I think it’s pretty neat.

Map Rectification

Posted in Locality and Space on October 2nd, 2006 at 08:05:21

So, what did I do with my weekend?

Built a web-based map rectification system.

‘A what?’, says the masses.

The service allows you to upload an image and, using a reference map, select ground control points — points on the reference map which match up with points on the uploaded image — and then warp the image. You can reference against any of a number of default base maps, or add your own WMS or KaMap layers to the map and use them to find ground control points.

Warping the image creates two kinds of output: a GeoTIFF, which can be downloaded for use in a desktop GIS, and a WMS URL, which can be copied and pasted into OpenLayers applications.
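The warping step itself can be approximated with the stock GDAL command-line tools: attach the ground control points with gdal_translate, then resample into a georeferenced GeoTIFF with gdalwarp. A sketch that builds the command lines (the file names and GCP values are made up, and this is not the service’s actual implementation):

```python
# Build gdal_translate / gdalwarp invocations for GCP-based
# rectification; run them with subprocess.check_call when the
# GDAL utilities are on the PATH.

import subprocess

def translate_cmd(image, gcps, out='with_gcps.tif'):
    """gdal_translate command attaching (pixel, line, lon, lat) GCPs."""
    cmd = ['gdal_translate']
    for pixel, line, lon, lat in gcps:
        cmd += ['-gcp', str(pixel), str(line), str(lon), str(lat)]
    return cmd + [image, out]

def warp_cmd(src, geotiff, srs='EPSG:4326'):
    """gdalwarp command resampling the GCP'd image into the target SRS."""
    return ['gdalwarp', '-t_srs', srs, src, geotiff]

def rectify(image, geotiff, gcps):
    """Run both steps; requires the GDAL utilities to be installed."""
    subprocess.check_call(translate_cmd(image, gcps))
    subprocess.check_call(warp_cmd('with_gcps.tif', geotiff))
```

Three or more well-spread GCPs are enough for an affine fit; more points let gdalwarp use higher-order transforms.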

Read more about it in the new MetaCarta Labs Weblog, and try it out for yourself!

Neogeographer? Hah!

Posted in FOSS4G 2006, Locality and Space on September 18th, 2006 at 06:09:40

This conference brought to light that I am one of the quintessential examples of a neogeographer: wordspy, for example, lists two examples for the term, one of which is:

Schmidt spends his time wandering around his hometown of Cambridge, Massachusetts, using his custom cell-phone software to unmask the ID numbers on each GSM cell tower he passes. Then he associates that tower ID with a GPS-defined location, and uploads it to his website.

When his electronic surveying is complete, Schmidt will have a system that can tell him where he is at all times — without GPS — by triangulating the signals from the newly mapped cell towers.

Calling himself a “neogeographer,” Schmidt is part of a generation of coders whose work is inspired by easily obtained map data, as well as the mashups made possible by Google Maps and Microsoft’s Virtual Earth.

—Annalee Newitz, “Where 2.0 Gives the World Meaning,” Wired News, June 16, 2006

[Image: cell stumbling map, interpolated with inverse distance weighting]
A closer look at the article brings you to my pretty face above a screenshot of some work I did while doing cell tower mapping. Some people might recognize that the cell tower mapping in the picture is actually an image rasterized out of GRASS, an Open Source GIS tool which existed before the term “open source” did. GRASS has been around for 30 years, and is probably the single tool which is the clearest example of software from the “GIS experts” that the neogeographers are so often compared with.

So perhaps I am a neogeographer, but the image points out one thing only: even neogeographers can learn something from the GIS experts. This is a point which was made by Schuyler in his lightning talk at the FOSS4G conference, and has been made by me at other times in this journal of technical ramblings, but it’s a point that bears repeating.

I always said that I used the term neogeographer as a derogatory term for myself, not one I expected to be used to praise me. After thinking about it more, I realized it’s not that it’s derogatory — or it shouldn’t need to be. The term describes the ability to take the things which other people agonize over and make them fun. GIS work and neogeography are two ends of a spectrum: one dedicated to analysis and accuracy, the other dedicated to sharing stories, which oftentimes results in a loss of accuracy. That loss of accuracy is not always a bad thing: there are many things which don’t *need* to be accurate to the meter, or centimeter, level. Putting my photos on a map can tell a story even without being accurate to the meter.

Neogeography can be good — but many times, the neogeographers need to learn from the GIS experts before they reimplement the wheel, and give it squared off corners.

6×6 Rule

Posted in FOSS4G 2006, Locality and Space on September 17th, 2006 at 19:39:55

So, several times throughout the most recent conference, I found myself wishing that the 6×6 rule were a standard teaching item in more curricula… until I realized that most of the people presenting have probably never taken a course in presenting, so there’s no place in formal education where it could fit. So, instead, I’ll share it with people here.

When crafting presentation slides, each slide should have no more than 6 lines, and each line should have no more than 6 words.

There are many reasons for this, but essentially it comes down to the fact that with more than 6×6, the text on the slide starts to become the text of your talk. This leads to your audience reading your slides, ignoring you, and ending up bored.
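The rule is mechanical enough that a few lines of Python can check a plain-text slide draft against it (one bullet per line; this is a sketch, not a tool I actually use):

```python
# Check one slide's text against the 6x6 rule: at most 6 non-empty
# lines, at most 6 words per line. Returns a list of violations.

def violates_6x6(slide_text):
    problems = []
    lines = [l for l in slide_text.splitlines() if l.strip()]
    if len(lines) > 6:
        problems.append('more than 6 lines (%d)' % len(lines))
    for l in lines:
        if len(l.split()) > 6:
            problems.append('more than 6 words: %r' % l)
    return problems
```

An empty result means the slide passes; anything else is text that will be read instead of listened to.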

The one exception to this I’ve found is in a case that I saw at this FOSS4G conference: the speaker spoke only French, but his slides were slightly more verbose… and in English. So even though he broke the 6×6 rule, it was effective: he was able to communicate his message in English with his slides, and his message more verbosely in French via the talk.

In general though, try following the 6×6 rule. You may find it helpful to prevent you from just reading your slides at conferences.

OpenLayers Presentation

Posted in FOSS4G 2006, Locality and Space on September 16th, 2006 at 08:16:42

As I’ve so often found at conferences, I didn’t have enough time to write what I wanted, and now I’ve got all the time in the world to write it, and much less to write about, since nothing is happening.

The OpenLayers presentation was extremely well received. I did a survey of morning presenters, and OpenLayers was by far the best attended: we had about as many attendees as the other six sessions in the 8:30 slot put together. We had about 75 people attending in total, and after our presentation the room cleared out, which says to me that OpenLayers specifically was the draw. It’s good to know that the topic we came here to present is a popular one.

Sadly, due to the early morning slot, no video of the presentation was made. This is especially unfortunate because there were so many people who were interested in the presentation but unable to come. However, the slides for the presentation are available online (OpenOffice Impress), and they walk you through the talk fairly well. It was essentially a tutorial on doing a number of things with OpenLayers — adding WMS and commercial layers to a map, adding markers, adding popups, etc. I think we showed a lot of people just how easy it can be to create an OpenLayers map or application.

I want to thank everyone who attended the presentation, as well as MetaCarta for sending Schuyler and me over to this great conference.

Conference Sessions

Posted in FOSS4G 2006, Locality and Space on September 14th, 2006 at 04:16:05

Got a late start, so I missed the PostGIS case studies session I wanted to see. However, I’m currently at the QGIS-as-WMS-server talk. It’s an interesting idea: essentially, they created a C++ CGI which calls out to the QGIS libraries.

It doesn’t really give me what I would like to see: a WMS implementation installable as a single binary app, where you could open the data, style it, and then serve that data — exactly as is — to any WMS client. The reason I want this is probably clear: it allows you to do OpenLayers development against a GIS application.

I think that Arc* are actually getting close to doing something like this, but that’s not a solution for me, since I’m looking primarily for something Open Source, and secondarily for free as in beer. Actually, even if it wasn’t free as in libre, I’d probably be happy with it.

One thing that’s hard to do right now is to combine editing GIS data directly with displaying it in a web mapping client. Certainly, one can see the uses that might lead to this: take an OpenLayers WMS setup, and you want to add data on top of it — perhaps you want to load a PostGIS datastore into QGIS, edit the points, and then immediately render the data into your webpage. If some GIS could provide a WMS against itself *directly* — not just via Apache or something else — you could skip a setup level of MapServer, or what have you.

MS4W is great, but it doesn’t include tools for creating geographic data. QGIS is great, but it doesn’t include tools for making it act as a WMS server. Maybe there are tools out there that do it — if so, they’re probably Java, and we all know that I don’t touch that; but if a tool did this, I’d use it.

I want to serve a WMS layer from a single GIS app running as a webserver on my local machine. What can do it? QGIS is apparently not the answer — you still need some CGI server to do the serving. So, what is it?