Technical Ramblings

Archive for May, 2011

More On Flaws in OpenLayers

Posted in default on May 30th, 2011 at 17:00:04

Tom MacWright has responded to my previous post about OpenLayers; in the spirit of continuing the conversation, I’ll respond in kind. (This conversation feels disjointed, which is unfortunate, but I’m not sure of a better way to continue it. Oh, for the days of Usenet, when I feel this conversation could have been so much more effective.)

First, Tom points out that I compared to the wrong mapping libraries. That’s unfortunate, I guess, but I was going entirely from the list that was presented; I don’t have any beef in the argument. Since I’m an OpenLayers user, I don’t end up doing a lot of research on new hotness, so I’ll admit I had a flaw there; I apologize. (In fact, I couldn’t even *find* Leaflet until it was pointed out to me; clear evidence of my flawed knowledge in this regard.) It was suggested that any comparison should include Leaflet, OpenLayers, and Modest Maps.

I reject that statement as simply incorrect. Leaflet I’ll gladly accept into the ring. I’m especially happy to see that Cloudmade has open-sourced their new efforts, which opens the possibility for collaboration and contribution across projects, a great step forward. However, Modest Maps is not a JavaScript web mapping library: it’s based on Flash and requires development with the Flash toolchain. Modest Maps may be an interesting choice for a small percentage of the OpenLayers user base, and I’m content for users whose needs align with Modest Maps to use that library, but it doesn’t offer a relevant comparison for the vast majority of OpenLayers usage. While OpenLayers can certainly learn from Modest Maps, it’s not an interesting comparison for users of the software as an API.

Some questions which are asked:

Where does control of the map come from? The baselayer? The map object? Both?

Ah, a softball! The configuration for resolutions, projections, and all other related information should come directly from the map’s current baseLayer. Options assigned to the map may occasionally offer a shortcut for that configuration, but it should never be necessary to set them on the map. (Any case where that is untrue, and I’m sure there are some, is likely a bug.) For other aspects of the overall configuration, the map may be a home, but in almost all cases, configuration options on the map are just a shortcut (and likely a confusing one, because it compounds the possible configuration issues). Projection information likewise comes from the layer; as a courtesy, we do our best to copy configuration from the map to the layers to make this easier, since most maps have the same properties on all layers.
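A toy model in plain JavaScript (not OpenLayers code) makes that precedence concrete: map-level options are copied onto layers only where a layer hasn’t set them, and the effective value is always read from the current base layer. applyMapDefaults and effectiveOption are hypothetical names used only for this sketch.

```javascript
// Toy model of "the base layer is the source of truth"; not OpenLayers code.

// Copy map-level options onto each layer as a courtesy, never overriding
// anything the layer set explicitly.
function applyMapDefaults(map) {
  for (const layer of map.layers) {
    for (const [key, value] of Object.entries(map.options)) {
      if (!(key in layer.options)) layer.options[key] = value;
    }
  }
}

// The effective setting is always read from the current base layer.
function effectiveOption(map, key) {
  return map.baseLayer.options[key];
}

const osm = { name: "OSM", options: { projection: "EPSG:900913" } };
const overlay = { name: "Overlay", options: {} };
const map = {
  options: { projection: "EPSG:4326", numZoomLevels: 16 },
  layers: [osm, overlay],
  baseLayer: osm,
};

applyMapDefaults(map);
console.log(effectiveOption(map, "projection"));    // → EPSG:900913 (layer wins)
console.log(effectiveOption(map, "numZoomLevels")); // → 16 (copied from map)
```

The overlay, which set nothing itself, ends up with the map’s EPSG:4326 as its projection, which is exactly the "shortcut" behavior described above.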

OpenLayers has its own events system … which is different than the $.proxy and other apis in jQuery or anything else (or in pure Javascript) that people are used to.

Well, in actuality, the OpenLayers event system is based almost entirely on the Prototype system. As we have moved on, we have incorporated shortcuts very similar to the ExtJS event registration system. But really, we’re talking about one or two functions here, which are consistent throughout the OpenLayers library: on anything with an ‘events’ object, you can call foo.events.register('eventname', null, function(evt) {}). So while it may not be particularly familiar (perhaps because it’s based on a less commonly used syntax), it’s hardly the most complex part of the library. I’ll also point out that we’re talking about a library which is older than jQuery, and older than ExtJS; it’s hardly a sin to not follow conventions that were invented years after you started your project 🙂
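To show how small that surface is, here is a toy re-implementation of the register/unregister/triggerEvent calling convention in plain JavaScript. It is a sketch, not the OpenLayers.Events source; it assumes the reading that a null second argument makes the listener run with the events’ owner as this, and ToyEvents is a hypothetical name.

```javascript
// Toy version of the register(type, obj, func) convention; not OpenLayers code.
class ToyEvents {
  constructor(owner) {
    this.owner = owner;  // object the events belong to (e.g. a map or layer)
    this.listeners = {}; // event type -> array of { obj, func }
  }

  register(type, obj, func) {
    if (!this.listeners[type]) this.listeners[type] = [];
    // A null obj means "call func with the owner as `this`".
    this.listeners[type].push({ obj: obj || this.owner, func });
  }

  unregister(type, obj, func) {
    const context = obj || this.owner;
    const list = this.listeners[type] || [];
    this.listeners[type] = list.filter(
      (l) => !(l.obj === context && l.func === func)
    );
  }

  triggerEvent(type, evt = {}) {
    evt.type = type;
    evt.object = this.owner;
    // Iterate over a copy so listeners may unregister themselves safely.
    for (const { obj, func } of (this.listeners[type] || []).slice()) {
      func.call(obj, evt);
    }
  }
}

// Usage mirroring foo.events.register('eventname', null, function(evt) {}):
const foo = { name: "foo" };
foo.events = new ToyEvents(foo);
const seen = [];
function onMove(evt) {
  seen.push(this.name + ":" + evt.type); // `this` is foo because obj was null
}
foo.events.register("moveend", null, onMove);
foo.events.triggerEvent("moveend");
foo.events.unregister("moveend", null, onMove);
foo.events.triggerEvent("moveend"); // no listeners left
console.log(seen); // → [ 'foo:moveend' ]
```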

OpenLayers has its own HTML/styling system, that’s just built-out enough to be terribly limiting to anyone who wants controls to look nice, and uses anywhere from 40% to 60% hardcoded styles.

A mistake (largely created by our early development) that we have been slowly paying back ever since we got some external developers. Tim Schaub has been pushing the use of CSS and stylesheets for new controls since 2.4. We added CSS-stylable zoom and pan panels in OpenLayers 2.7, almost 3 years ago. There are still a limited number of controls that use non-CSS-based styling (Control.PanZoomBar is probably the big one that will bite people, along with the LayerSwitcher), but I hardly think these flaws are the biggest problem with the library.

Now, the LayerSwitcher is a fair point: it’s *very* difficult to style, and it’s not a particularly attractive display… but the reason we haven’t invested in it is precisely because it’s not in the spirit of OpenLayers to solve this problem for users. As an API, not a UI development tool, OpenLayers isn’t seeking to solve all the problems of user interaction. Instead, we have left things like a cleaner layer-switching interface to third parties like GeoExt, with reasonable success. We’ve seen people create alternative layer switchers for mobile applications, as well as other similar alternatives, ranging all the way down to a simple dropdown. For complex usage, these may be insufficient, but that’s where libraries like GeoExt pick up the slack. In fact, Tom makes my argument for me: “See Modest Maps’s controls: there aren’t any, because designers will make prettier ones.” So it’s okay to have *no* controls, but not okay to have *unattractive* controls… that argument seems a bit flawed.
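For the “simple dropdown” end of that range, here is a sketch of a base-layer switcher. It assumes only a few OpenLayers 2 members (map.layers, layer.name, layer.isBaseLayer, map.baseLayer, and map.setBaseLayer()); the helper names are hypothetical, and the map is stubbed so the code runs without a browser or the library.

```javascript
// Minimal base-layer <select> for an OpenLayers-2-style map.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function baseLayerSelectHtml(map) {
  const options = map.layers
    .map((layer, index) => ({ layer, index }))   // keep the original index
    .filter(({ layer }) => layer.isBaseLayer)    // overlays are not offered
    .map(({ layer, index }) => {
      const selected = layer === map.baseLayer ? " selected" : "";
      return `<option value="${index}"${selected}>${escapeHtml(layer.name)}</option>`;
    });
  return `<select class="base-layer-switcher">${options.join("")}</select>`;
}

// In a browser, call this from the <select>'s "change" handler.
function onBaseLayerChange(map, selectedValue) {
  map.setBaseLayer(map.layers[Number(selectedValue)]);
}

// Stub map so the sketch runs outside a browser.
const stubMap = {
  layers: [
    { name: "OSM", isBaseLayer: true },
    { name: "Bing Roads", isBaseLayer: true },
    { name: "My markers", isBaseLayer: false },
  ],
  setBaseLayer(layer) { this.baseLayer = layer; },
};
stubMap.baseLayer = stubMap.layers[0];

console.log(baseLayerSelectHtml(stubMap));
onBaseLayerChange(stubMap, "1");
console.log(stubMap.baseLayer.name); // → Bing Roads
```

All of the look is left to CSS on the base-layer-switcher class, which is the same division of labor the panel controls use.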

In this vein, there is a followup later in the post:

The configuration of the theme requires a line of straight-imperative code that sets a global variable. What if you want a different imgpath for different maps? Shouldn’t this be a per-map setting?

For panels (that is, for controls developed in the past 3 years, rather than 4 or 5 years ago), styling is entirely via CSS. This includes the editing toolbars, the PanPanel and ZoomPanel controls, etc. Since these controls are styled via CSS, you can use different configurations per map simply by using CSS rules.

Comparing to ModestMaps — which Tom describes as having no controls at all — and Leaflet — which has only a Zoom control — OpenLayers doesn’t appear to be lacking in the ‘easily customizable controls’ department 🙂

Now, Tom describes this as part of a systemic problem: “an assumption that you already know the scope of the library and the parts that you can customize.” This is a criticism I am totally willing to grant, and one I feel falls directly under the heading of poor documentation, especially prose documentation, which I feel is a significant gap in OpenLayers for most users.

The growth of OpenLayers is addressed by code compression, not by changing the code, ever. … the bloat of OpenLayers and its ever-more-complex API is the core problem that shows no signs of changing.

This is completely untrue. The OpenLayers project has made significant efforts in three different areas to reduce code size: Compression (via better tools like the Closure compiler), Better build tools (which simplified building build profiles), and reducing interdependencies in code to limit build size. In fact, a significant portion of the most recent release was an attempt to minimize builds — simplifying default controls, pulling various portions that people use most out of the larger components so they could be built separately, and creating alternative build profiles to help users get started with building smaller OpenLayers builds for their applications.
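Why reducing interdependencies matters is easiest to see with a toy resolver. Roughly speaking, the OpenLayers 2 build scripts assemble a build by following the @requires annotations in each source file; the sketch below models that as a transitive closure over a dependency graph. The file names and edges are invented for illustration, and resolveBuild is a hypothetical helper, not the real build tool.

```javascript
// Toy build resolver: a build contains the transitive closure of the files
// you ask for over a "requires" graph. File names and edges are invented.
function resolveBuild(requires, include) {
  const ordered = [];
  const visited = new Set();
  function visit(file) {
    if (visited.has(file)) return;
    visited.add(file);
    for (const dep of requires[file] || []) visit(dep); // dependencies first
    ordered.push(file);
  }
  include.forEach((file) => visit(file));
  return ordered;
}

const requires = {
  "Map.js": ["Events.js", "BaseTypes.js"],
  "Events.js": ["BaseTypes.js"],
  "Control.js": ["BaseTypes.js"],
  "Control/PanZoom.js": ["Control.js"],
  "Control/PanZoomBar.js": ["Control/PanZoom.js"],
  "Layer/Grid.js": ["Map.js"],
  "Layer/XYZ.js": ["Layer/Grid.js"],
  "Layer/OSM.js": ["Layer/XYZ.js"],
};

// An "OSM tiles only" profile never pulls in the control files...
console.log(resolveBuild(requires, ["Layer/OSM.js"]).length); // → 6

// ...but one extra edge from a core file drags them into every build.
const tangled = {
  ...requires,
  "Map.js": [...requires["Map.js"], "Control/PanZoomBar.js"],
};
console.log(resolveBuild(tangled, ["Layer/OSM.js"]).length); // → 9
```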

In addition to making the OpenLayers build itself smaller, we have concentrated a large amount of our recent work on improving performance (simplifying operations while dragging to improve responsiveness) and limiting download size (using more sensible defaults, providing more example build profiles, and doing things like shrinking our default images to save space there). Overall, a simplified OSM map designed for a mobile interface, using our documented example profiles, will save over a megabyte of space, due to improved defaults and documentation.

In addition, the description of the OpenLayers API as ‘ever more complex’ feels confusing to me. An OpenLayers 2.0 application works just as well with OpenLayers 2.11 — so the core API can’t have changed much. There is certainly new functionality in the API, but looking at other APIs — like the leaflet quick start example — I just don’t see that much difference in APIs between OpenLayers and its competition.

While Polymaps and Modest Maps are great drawing frameworks, OpenLayers is an abysmal one. Try to get individual tiles from a drawn map? No way. Interact with anything on the map like a DOM element? Nope. “Unbind” the map from the HTML element it “owns”?

This would be what I would consider contrary to the design of OpenLayers, yes 🙂 I can’t think of *any* case where someone has come to the mailing list and asked a question which would be more easily answered if these things were easier, and I read a lot of mailing list posts.

New-style Javascript isn’t there in OpenLayers.

A perfectly reasonable criticism. OpenLayers is a child of the era in which it was born, and if this is important to you, I expect other tools will be more useful to you. That said, OpenLayers isn’t alone in this regard: Google, for example, follows much the same style of code, so far as I can tell, so at least we’re in good company. (Note that Leaflet’s quick start reads very much like an OpenLayers example, so I’m not sure I see where this comparison is coming from.)

Recommending GeoExt/MapQuery is a good idea for people building RIAs, but it’s the opposite of that for most others. What people want (people, as in, me, and people building things on mapping frameworks) is a lower-level, simpler API, rather than a higher one.

I think that this is simply untrue, based on the types of questions I regularly see. I think that OpenLayers provides a reasonable API for most tasks at the low level — many of the complaints are indeed only about the higher level tasks (like the layerswitcher styling). Given the rapid growth of usage for GeoExt, I see a lot of basis in the argument that OpenLayers was insufficiently providing for that niche — while I don’t see widely used reasonable competition to OpenLayers as a low level API.

The ability to make slim builds of OpenLayers (something we’ve also done, with openlayers_slim[5]) hasn’t been trumpeted nearly to the level that it should have been.

I don’t really know what more we can do. It’s the first response to every question we’ve ever had on speeding up your application, it’s in our deploying documentation, it’s been the first performance increasing technique mentioned every time we’ve talked about OpenLayers.

Short of not providing a pre-built OpenLayers at all, I don’t know what more we can do to help developers understand how important this is.

I’m positive that OpenLayers is not the best tool for all problems. But I think it’s actually a pretty good tool for a lot of problems. And one of the key things that it does — that nothing else does — is provide support for layers that you don’t *get* anywhere else. OpenLayers is designed to make it possible to use Bing, Google, and OSM in the same map — and as far as I can tell, nothing else does that. I still don’t see a library which offers support for parsing OSM, KML, GeoJSON, and GML in Javascript — and in my experience 50%+ of OpenLayers users end up using either the format parsing code or the commercial layer code.

For the few users who will never need to use anything other than OSM and some markers drawn in JavaScript, great. But comparing OpenLayers to these other frameworks for anything other than dropping pre-drawn tiles in a map is something of a non sequitur for real usage, no matter what the flaws of OpenLayers are.


Perceived Flaws of OpenLayers

Posted in default on May 29th, 2011 at 12:28:21

So, apparently at WhereCampEU this weekend, there was some discussion of “Why people use web mapping libraries other than #openlayers”. (I can’t find any other references to this in Googling; someone who was there may be able to provide more context which might alter the content of this post.)

There are a number of good points about OpenLayers identified in the whiteboard snapshot; the ones listed there that I can read are:

Documentation
Default Look
Viability
Examples are not Good
No explanation of the general architecture

(Throughout this post, I will be comparing OpenLayers to the other mapping tools listed on the whiteboard snapshot, with the exception of “Leaflet”, which I can’t seem to find by googling things like “Leaflet” and “Leaflet web mapping”. Help is gladly accepted.)

Some of these things I would be interested in understanding more deeply — and some I’d like to take on trying to address.

Documentation and Examples Are Not Good

I think a large difference of opinion exists between different people in the OpenLayers using and developing community about how documentation should be managed. In general, in OpenLayers, we have taken the approach of saying that the best form of documentation is via examples that clearly demonstrate how to use a given set of functionality. (Note that in this case I say ‘we’, but I won’t claim this is a considered decision on behalf of the project. Instead, it is one that I believe fell into place in large part because of how we started documenting the project, and has not drastically changed.)

In OpenLayers, the examples are the first line of documentation. Always check the examples first; not doing so is doing yourself a disservice. This is not to say the examples are perfect; they are far from it. Especially as the project has grown, the examples are not an ideal way to understand all of the functionality of OpenLayers; as the library has grown, the complexity has grown correspondingly. However, I have often been told “Well, I looked at the API docs first and they weren’t clear about how I was supposed to use a specific feature” — and I think that particular approach is one that is doomed to fail with the documentation currently available in OpenLayers. The best advice I can give to beginners starting with OpenLayers is: start with the examples. Search for the thing you’re looking for — and if you can’t find what you’re looking for with the search term you’re using, tell the OpenLayers users list.

Now, as I’ve said, the examples haven’t kept up. OpenLayers has grown tremendously, and although we are relatively diligent about writing API documentation, prose documentation has never been high on our list, and prose documentation outside of the examples probably never will be. I think that this is a problem that really needs a solid champion outside of the current developer pool if any progress is going to be made; no one on the current development team is likely to be strongly in favor of making sweeping changes, simply because the choices are to continue to add features — things like improved performance, better mobile support, improved functionality for new features like geolocation APIs — or to work on documentation, and the current developers will tend towards the former.

That said, I think it’s fair to compare OpenLayers to some of the other mapping APIs listed on this point.

Google Maps: Blows OpenLayers out of the water on API documentation. Google has done an excellent job of integrating reference documentation with prose documentation, and has literally hundreds of pages of API docs for their mapping APIs. Their examples are solid — though comparing this page to the OpenLayers equivalent seems to have a certain something lacking. There is, however, no doubt that Google has done great work on documenting their API, and that there are many simple problems for which Google is a better solution than OpenLayers for some users simply because it’s better documented.
Polymaps: Again, a relatively well done set of documentation. Their API docs are much more descriptive than the equivalent in OpenLayers, and their examples like this one provide a simple, easy to read overview, example, and code in one page, and the example overview is quite nice, with screenshots. I think that this is well in line with what you would expect from Stamen. I will say that their API docs do lack information about what properties you are actually supposed to pass into functions; the map documentation, for example, says of map.center([x]) that it “Sets or gets the location of the map center. If the argument x is specified, it sets the new map center”… but doesn’t explain what x is supposed to be. So long as functions have clear examples, this probably isn’t a problem; it’s just something that I could see being a problem with more complex functions that don’t take obvious arguments.
Web Maps Lite (I’m assuming this is the Cloudmade offering): This API offers a pretty reasonable set of documentation; their Overview is slightly more comprehensive than the OpenLayers equivalent, which uses Naturaldocs. I would describe most of the brief API method descriptions as being similar in quality to those in OpenLayers; no obvious winner or loser there. The examples seem clear and well done for the few that exist; not particularly comprehensive or entirely obvious, but probably a win over OpenLayers for most simple use cases.
Mapstraction: Documentation was a bit hard to find; I didn’t even see it linked from the homepage, but Googling turned up javadoc-like content; limited, but since Mapstraction is a pretty simple library, probably not insufficient. The examples are… brief, but workable, looking at the mapstraction sandbox. I’m not a huge fan of the sandbox form of examples personally, but for people who are, this probably is a perfectly workable set of docs — but I don’t think there’s an obvious win over OpenLayers here.
Tile5: The Tile5 API documentation seems, to me, to be less informative and more confusing than OpenLayers. It seems hard to navigate, and even once you’re in it, it’s hard to understand exactly what you’re meant to do with things like T5.Marker objects. The ‘examples’/tutorials section has two sections, approximately equivalent to the OpenLayers introduction, but nothing further; it’s not clear that these examples offer any sort of view of a more comprehensive idea of how to start using the library. I may be missing something, but I feel like this is actually a pretty major lack in this particular project — and given that OpenLayers is coming from deep in the hole on this problem, that’s saying something 🙂

Overall, reviewing the various APIs shows me a couple of things: I believe that the integrated API reference and prose documentation of Google Maps works best, and the tools for generating JavaScript documentation from code appear to have improved greatly since OpenLayers started. However, no project has an obvious win in all categories; the different approaches generally leave something to be desired. OpenLayers is probably one of the worst in this regard, with the exception that it does appear to have the broadest example support, providing at least limited documentation for a number of ready-built use cases. In prose documentation, though, most APIs do better, and for API docs, Naturaldocs may simply not be cutting it for many users; switching to something else may be a viable way to help increase developer engagement with the documentation.

General Architecture

Unfortunately, without being there, it’s hard for me to know what people want here; I’ve heard requests for a general architecture description before, but when I try to give them, people usually end up saying something like “No, that’s not what I meant, I wanted something… else.” I’d be interested to hear thoughts on what this means.

In general, the architecture of OpenLayers is: you pass an HTML element to OpenLayers, at which point it owns that element. You tell OpenLayers which x/y coordinate space you would like to render data into (via your base layer), and it lets you lay down multiple layers of content in that coordinate space. OpenLayers tries to provide comprehensive tools for interacting with the map (browser gestures, clicks, etc.), with an API to implement more advanced functionality beyond that. Where possible, OpenLayers makes it as easy as practical to interface with a wide variety of data sources to provide the data you browse, and to make interacting with the map a way to reach more of that data.
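The coordinate-space part of that description can be made concrete with a toy model in plain JavaScript. This is not how OpenLayers computes positions internally; makeView and coordToPixel are hypothetical names, and the model assumes square 256-pixel tiles with the resolution halving at each zoom level. It shows the core idea: the base layer fixes the coordinate space and resolutions, and every layer’s data lands in the same pixel grid.

```javascript
// Toy model: the base layer defines an extent and a list of resolutions
// (map units per pixel); every layer is positioned using the same numbers.
function makeView(extent, numZoomLevels, tileSize = 256) {
  const [left, , right] = extent; // extent is [left, bottom, right, top]
  const maxResolution = (right - left) / tileSize; // whole width in one tile
  const resolutions = [];
  for (let z = 0; z < numZoomLevels; z++) {
    resolutions.push(maxResolution / 2 ** z); // halve per zoom level
  }
  return { extent, resolutions, zoom: 0, center: [0, 0] };
}

// Convert a map coordinate to a pixel offset inside a viewport of the given
// size, for the view's current center and zoom level.
function coordToPixel(view, [x, y], [width, height]) {
  const res = view.resolutions[view.zoom];
  const [cx, cy] = view.center;
  return [
    width / 2 + (x - cx) / res,  // pixels grow to the right, like map x
    height / 2 - (y - cy) / res, // pixels grow downward, map y grows upward
  ];
}

const view = makeView([-180, -90, 180, 90], 3);
view.zoom = 1;
console.log(coordToPixel(view, [90, 45], [512, 256])); // → [ 384, 64 ]
```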

Beyond that, OpenLayers isn’t supposed to be much of anything; it’s an API and a tool for rendering map data, and little else. (Sometimes the problem is simply overly high expectations for OpenLayers because people have used it to do a lot of complex things!)

In searching out the various map APIs described above, I didn’t see anything obviously filling this role, which probably means that I don’t know what I’m looking for — but I’d be happy to learn, and explore how to solve this problem, because it doesn’t really sound that complex to solve.

Default Look and Feel

This one is interesting. In general, OpenLayers doesn’t *have* much of a look and feel; in fact, the default look and feel is generally limited to 7 icons which are easily replaceable by CSS. (One of the things we even have docs for!) That said, I’m willing to go head to head against most of the other APIs here for that issue:

Google: Google has smaller controls than OpenLayers; these may be more attractive to some people. I’m willing to accept that some people prefer Google’s style to OpenLayers standard control styling.
Tile5: When I look at Tile5’s simple map marker example, the only ‘default style’ I see is a checkered grey and black background (easily implemented via CSS in any webpage).
Polymaps: Seems to have a default +/- icon, but no others; the +/- look very similar to what we have in our mobile example, styled using entirely easily-modifiable CSS.
Mapstraction: This uses the native control style for everything, it looks like, and turns controls off by default; since OpenLayers can have its controls turned off with one map option, this feels like they are the same in this regard.
Cloudmade: basic example seems to have no default look at all; again, no difference.

Overall, it seems there is a preference for… no controls. Perhaps we should simply make that a better-documented OpenLayers mode of operation, and then people could complain about how their map didn’t have any default styling 🙂

Viability

Edit: After reading Volkler’s post on the topic, I realized that this point was *Usability*, not Viability. I’ll expand on that after this section, but I think the Viability section here is an important concern if you ignore its off-base, ranty nature: OpenLayers is an extremely broadly used and widely supported tool, so choosing any other open source solution risks joining a smaller community, which may have its own dangers.

This is the one I actually have the strongest feelings about. If you are going to talk about problems with OpenLayers, viability is the complaint with the least leg to stand on, at least relative to any API other than a commercially supported (non-free) one.

OpenLayers has:

2,000 members on its users list, with dozens of messages a day.
Code contributions from more than 40 users
New releases multiple times a year, closing hundreds of bug reports
A steadily growing list of features and enhancements

OpenLayers is the de facto web mapping library used by almost every open source geo related project. OpenLayers has a broad community, with broad support. There are two books introducing OpenLayers, there is an active community around OpenLayers, and earlier this year, more than a dozen organizations came together to sponsor an OpenLayers code sprint, funding 18 developers to stay in Switzerland for a week and work solely on supporting OpenLayers.

OpenLayers is a huge success story. The biggest evidence of that, in my mind, is simply the fact that no one talks about using OpenLayers anymore — they just do it. OpenLayers has become the de facto web client at FOSS4G; OpenLayers is used by tens of thousands of people every day around the world, from OpenStreetMap to the US Government, to the Portland public transit system, to consulting companies selling services in Australia.

I’m sure there are many people who would claim that other APIs win on various aspects, and they’d be right. Documentation is a pain. Since OpenLayers caters to thousands of different use cases, it’s not as trivial to get started with. The library is complex, and many features are confusing to get started with, but overall, OpenLayers is a reasonably usable library for a relatively complex task, and it handles that task in a way that I don’t think any of its open source competitors come close to.


Someone questioning the viability of OpenLayers makes me wonder what they mean — because the project continues to make regular, large strides in new developments and features, and I can’t see how anyone could look carefully at the project in comparison to its competitors and think that it fails on that merit.

I’d be glad to hear about this point, or any others, from people who were at the WhereCampEU session (perhaps I’m misunderstanding the complaint, for example), but I would say to almost everyone looking to create a web map: if you’re a competent developer, OpenLayers is a strong starting point upon which it is possible to build great applications.

Usability

It appears that “usability” here is a description of the API as it applies to developer usability. I think that this is an area where the main problem is “OpenLayers is a broadly used library supporting dozens, or hundreds, of different use cases, and is not a narrowly designed tool designed to solve a specific problem.” I’ve theorized for a long time that a lot of people (especially people who are likely to attend WhereCamp events) would be much happier with OpenLayers if I were to write a simple “OSM.Map” wrapper around it that picked sensible defaults for users of OSM maps.
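Such a wrapper would mostly be a bundle of defaults. Here is a minimal sketch of the options half of it: OSM.Map and osmMapOptions are hypothetical, the values are the spherical mercator settings commonly used with OSM tiles in OpenLayers 2 examples, and maxExtent is a plain array here, whereas OpenLayers itself expects an OpenLayers.Bounds.

```javascript
// Hypothetical defaults for an "OSM.Map"-style wrapper: the spherical
// mercator settings OSM tiles need, merged under whatever the caller passes.
const OSM_DEFAULTS = Object.freeze({
  projection: "EPSG:900913",      // spherical mercator, as used by OSM tiles
  displayProjection: "EPSG:4326", // show lon/lat to users
  units: "m",
  maxResolution: 156543.0339,     // meters per pixel at zoom 0 (256px tiles)
  maxExtent: [-20037508.34, -20037508.34, 20037508.34, 20037508.34],
  numZoomLevels: 19,
});

function osmMapOptions(userOptions = {}) {
  // Caller-supplied values win; everything else falls back to OSM defaults.
  return { ...OSM_DEFAULTS, ...userOptions };
}

const options = osmMapOptions({ numZoomLevels: 17 });
console.log(options.projection, options.numZoomLevels); // → EPSG:900913 17
```

The design choice is simply that an OSM user should never have to type a projection code to get a working map, while anyone who needs something different can still override any single value.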

The biggest thing that makes usability hard is targeting too many users. OpenLayers is trying to solve many hard problems — from integrating with OSM and parsing OSM data, to integrating with OGC catalog services, to interoperating with WFS. The most ‘sensible defaults’ for one application are not the best ones for another, and with weak documentation, usability suffers.

Some ways to fix this include better docs, writing wrappers for specific purposes to simplify defaults for some users, etc. But overall, because OpenLayers is a library which supports many different use cases, it’s somewhat harder to get started with for all use cases. This is unfortunate, but not unexpected; I think it’s a necessary evil of having an API which targets such broad use cases.

Overall, many of the complaints about OpenLayers are reasonable: our documentation is weak, our default styling apparently doesn’t appeal to some people (I still like it, but I’m apparently in the minority), and it’s harder to get started than with some other tools. However, we’ve seen continued transitions of people from other tools to OpenLayers as they encounter harder problems — because as your problems get harder, OpenLayers becomes a more attractive solution to prevent you from having to reinvent the wheel.


Working with Place APIs (aka “How I spent my Spring Vacation”)

Posted in default on May 28th, 2011 at 11:10:55

So, this week, I have learned a couple things:

Nobody in this house does the dishes but me.
My 10 year old daughter loves playing with the younger kids in the sandbox at the park… and the other parents seem to love her for it most of the time.
There are at least 7 different major POI data sources, and the type and quality of data varies from “enh” to unusable — but once you get them working, there is actually some interesting data contained in them.

I’m going to do a bit of an exploration of the various place APIs, how difficult they were to get started with, and some thoughts on when you might use each of them. The order listed here is the order in which I implemented them for a new toy that I’m playing with. In each of these things, my goal is to provide place listings in response to search queries from users. The way I am doing this is to take each place API search, and wrap up the title, lat, lon, and URL into a GeoJSON object.
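The normalization step is small. Here is a minimal sketch, assuming each provider’s response has already been reduced to an object with title, lat, lon, and url fields (those field names are my own, not any provider’s); placeToFeature and placesToFeatureCollection are hypothetical helpers, and the sample place is made up. The one detail worth remembering is that GeoJSON positions are ordered longitude first.

```javascript
// Wrap one normalized place result (title, lat, lon, url) as a GeoJSON
// Feature, and a list of them as a FeatureCollection.
function placeToFeature({ title, lat, lon, url }) {
  return {
    type: "Feature",
    // GeoJSON orders positions as [longitude, latitude].
    geometry: { type: "Point", coordinates: [lon, lat] },
    properties: { title, url },
  };
}

function placesToFeatureCollection(places) {
  return { type: "FeatureCollection", features: places.map(placeToFeature) };
}

const collection = placesToFeatureCollection([
  { title: "Example Cafe", lat: 42.3736, lon: -71.1097, url: "https://example.com/cafe" },
]);
console.log(JSON.stringify(collection.features[0].geometry));
// → {"type":"Point","coordinates":[-71.1097,42.3736]}
```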

FourSquare

The FourSquare Venue API is actually surprisingly simple to get started with. Although for user access, FourSquare depends on OAuth — a bit weird given that I’m required to type my username and password in directly on my phone for the official app — for venue search, you’re not required to go the full OAuth route, and can instead use the Venues Project, which makes searching for Venues easy. In addition, the documentation on the FourSquare venues project makes it clear that using the API is pretty open; you can do most of what you want to do (except copying Foursquare places wholesale into your own database), including copying and storing IDs, displaying information, etc.

The documentation and examples with the FourSquare API were easy to get started with. The process for getting a key was well linked from the docs, so getting there was easy. There was no need for complex OAuth loops, and the licensing is clear and obvious. Overall, the FourSquare Venues API made getting started with place search easy.

FourSquare Data

The FourSquare data, at least for the areas that I’m currently looking at, are pretty great. I have been able to find results for things that don’t exist in any other data using FourSquare. The primary problem I have with FourSquare is that it is somewhat too generous in attempting to find results — appropriate for searching on a mobile phone, but less for a web interface. (A search for “Cambridge, MA” from Europe will give you many results for ‘ma’, but nothing actually relevant to the town.) Excepting this — which applies to almost all of the APIs in this post — FourSquare provides an easy to use API with a large quantity of interesting data in the areas where I’ve looked at it so far.

Facebook

Facebook is all noise. While I appreciate the social aspects of what Facebook place results provide (they have the benefit that you can have all of the aspects of Facebook linked into them), Facebook Places coverage is limited, and what little coverage there is tends to be absolutely filled with duplicates. (A local hospital has 4 duplicates in 3 surrounding towns, even though it only has one location.) The API is relatively easy to use (just fetch a temporary token, and you can then make additional calls back to the API), and Facebook provides a familiar interface for Facebook users, but overall, the quantity and quality of data tends to be low in a way that makes it difficult to use for any serious purpose. Additionally, using the API was made slightly more difficult because getting a human-readable link required an additional call to fetch more details about a place, which almost none of the other APIs required. However, it was possible to batch all these requests into one ?ids= request, so it wasn’t particularly painful. Total number of HTTP calls to get the data I needed: 3.
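The batching step is simple enough to sketch. This assumes the Graph API base URL https://graph.facebook.com/ and an access_token query parameter alongside the ?ids= parameter mentioned above; treat both as assumptions to check against Facebook’s current documentation. batchedDetailsUrl is a hypothetical helper, and no request is actually sent. The point is that three place IDs cost one request, not three.

```javascript
// Build one batched details request for many place IDs via ?ids=,
// instead of one request per place. Base URL and access_token are assumed.
function batchedDetailsUrl(ids, accessToken) {
  const params = new URLSearchParams({
    ids: ids.join(","), // URLSearchParams percent-encodes the commas
    access_token: accessToken,
  });
  return "https://graph.facebook.com/?" + params.toString();
}

console.log(batchedDetailsUrl(["111", "222", "333"], "TOKEN"));
// → https://graph.facebook.com/?ids=111%2C222%2C333&access_token=TOKEN
```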

Google Places

Google Places is a weird entry. First of all, unlike most of the other APIs, Google does not give you a link to the place in its search results, and unlike Facebook, it has no way to fetch details for a batch of places. To get a link, you must make a separate place details call for each place you are interested in, which makes Google one of the slower APIs for getting the information I needed for my UI.
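Since N search results cost N + 1 round trips here, issuing the detail calls concurrently at least overlaps most of that latency. This is a sketch under stated assumptions: `fetch_details` stands in for whatever function performs one place-details HTTP request. It is hypothetical, not a Google client call, and the default worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_details(place_refs, fetch_details, max_workers=8):
    """Issue one details request per place, several at a time.

    `fetch_details` is a caller-supplied (hypothetical) function that
    performs a single place-details request and returns a dict.
    Results come back in the same order as `place_refs`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_details, place_refs))
```

This is still N + 1 requests in total. It only hides their latency, and a provider’s rate limits may cap how far `max_workers` can usefully go.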

Google has relatively high data quality. Although it uses a narrower definition of places than something like FourSquare, its definition is much broader than that of something like CitySearch. It also has fewer problems with duplicates and other data issues than many providers of user-submitted data tend to.

Overall, Google has high-quality data with low noise, within its definition of places. However, for maximum coverage (to support services like check-ins), other options like FourSquare have Google beat, at least in the urban areas I’m looking at.

CitySearch

The CitySearch API is interesting: its data is deeper and richer than that of many other place providers, apparently because it is centered around a strongly curated set of results. Many places have reviews and additional details that can be fetched through the API, and thanks to strong categorization efforts, it is a good choice for finding things like cuisines, which the full-text search also indexes.

From the point of view of how I’m using it, the results are reasonable, though it is sometimes not obvious exactly how things are being matched due to the depth of data being searched. Additionally, the coverage is centered around restaurants and the like; because adding places is not something that users are expected to do (the API is centered around having businesses submit, rather than users), the coverage for things like public transit stations, or other things that users would consider important, is lower than competing APIs.

Additionally, the terms of service make it clear that the API is really designed for creating your own CitySearch clone: they describe how you have to link to CitySearch, how many ads you have to display on your pages, and the like. From reading the terms, it’s not even clear whether you are allowed to display CitySearch content without also displaying at least 2 ads on the page (see section 3.5 of the Usage Requirements). Overall, though the CitySearch API offers a great depth of metadata (reviews, attributes, etc.) for places, for someone looking for a comprehensive POI source it is probably not the best place to start.

Yelp

Yelp fits pretty much the same description as CitySearch, minus the weird, confusing Usage Requirements and the lack of user-submitted content. Because the site is review-driven, the search also covers the full text of reviews, which has a somewhat odd effect on the results you get. (Searching for “Green Line” will get you Green Line stops… but also places that are close to the Green Line, because people mention it in reviews.)

Yelp uses OAuth for their API, but there are solid examples of how to make calls in Python, which made getting started easy. Signing up was also easy.
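Example code like Yelp’s hides the OAuth signing step. For reference, here is what that step involves, assuming the OAuth 1.0a HMAC-SHA1 method described in RFC 5849 (section 3.4). You build a signature base string from the method, the URL, and the sorted, percent-encoded parameters, then HMAC it with the consumer and token secrets. This stdlib-only sketch omits nonce and timestamp generation, repeated parameter names, and assembly of the Authorization header. For real use, a maintained OAuth library is the safer choice.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(s):
    """RFC 3986 percent-encoding: only A-Z a-z 0-9 - . _ ~ stay literal."""
    return quote(str(s), safe="~")


def signature_base_string(method, base_url, params):
    """METHOD & encoded URL & encoded, sorted parameter string.

    `params` holds the query and oauth_* parameters (minus oauth_signature);
    `base_url` is assumed to be already normalized, without a query string.
    """
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_str = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(base_url), pct(param_str)])


def hmac_sha1_signature(method, base_url, params, consumer_secret, token_secret=""):
    """Sign the base string with the key 'consumer_secret&token_secret'."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    base = signature_base_string(method, base_url, params)
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```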

The Yelp API has more user-generated content, which means, as usual, more coverage and more noise. It is also an API with a lot of depth (reviews, etc.), if that’s the type of thing you’re interested in.

GeoNames

GeoNames is a perfectly reasonable dataset, but its POI coverage is low. For navigation-style queries for addresses, towns, etc., GeoNames is fine; for POI coverage, you’ll probably need to look elsewhere. The API is also not the best documented, getting set up with an account took longer than I’d have liked, and the API is not particularly full-featured: there is no way to pass a location into the query, so results are always worldwide, with predictable consequences. Finally, while I was developing, the API was down for about an hour one evening. All of this is in line with what I’d expect from a non-commercial service like this, but it makes relying on it in any kind of application somewhat less tenable.
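Because the query can’t be scoped to a location, one workaround is to rank worldwide results on the client by great-circle (haversine) distance from the user, optionally dropping anything too far away. This sketch assumes each result carries `lat` and `lng` values, the key names GeoNames’ JSON responses use. Those values may arrive as strings, hence the `float()` calls. The 6371 km mean Earth radius is a common spherical approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_first(results, lat, lon, max_km=None):
    """Sort results by distance from (lat, lon), optionally capping the radius."""
    scored = []
    for r in results:
        d = haversine_km(lat, lon, float(r["lat"]), float(r["lng"]))
        if max_km is None or d <= max_km:
            scored.append((d, r))
    scored.sort(key=lambda pair: pair[0])  # sort on distance only
    return [r for _, r in scored]
```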

OSM/Nominatim

OSM’s coverage is oriented primarily toward address-like things, as with GeoNames, but it contains an interesting amount of POI data. The search API is relatively full-featured, and you have the benefit that any object you find this way comes with a full edit history and additional data/attributes available via the OSM APIs. This also makes it easy to correct the data when there are mistakes, something none of the other APIs offer. However, overall the POI coverage is spotty, and this is not an appropriate source for someone looking to build on top of a comprehensive place database.
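As an example of that feature set, Nominatim’s search endpoint can be restricted to a bounding box, which is the kind of location scoping GeoNames lacked. The sketch below only builds the request URL. The parameter names (`q`, `format`, `viewbox`, `bounded`, `limit`) come from Nominatim’s current documentation and may not all match what was available at the time of writing. The box in the test is an arbitrary one around Cambridge, MA.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def nominatim_search_url(query, west, south, east, north, limit=10):
    """Build a Nominatim search URL restricted to a lon/lat bounding box."""
    params = {
        "q": query,
        "format": "json",
        # viewbox is two corners given as lon,lat,lon,lat
        "viewbox": f"{west},{north},{east},{south}",
        "bounded": 1,  # only return results inside the viewbox
        "limit": limit,
    }
    return NOMINATIM_SEARCH + "?" + urlencode(params)
```

Nominatim’s usage policy asks clients to send an identifying User-Agent and to keep request rates low, so fetch the URL accordingly.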

SimpleGeo

I almost didn’t bother with SimpleGeo, despite its use of GeoJSON (my preferred format) as the transport format, for two reasons: 1. It didn’t offer any particularly exciting data. 2. It required OAuth, but in a weird way, without examples. In general, the SimpleGeo documentation *tried* really hard but failed in too many important ways; without a working way to browse example requests, it was just too much effort to integrate. However, I did eventually find the SimpleGeo Python client (after a lot of looking around), which solved that particular problem. (In general, I don’t like depending on a specific API client for things like ‘fetch JSON’, but this was easier than the alternative.)

SimpleGeo did offer the benefit of using GeoJSON natively — for this reason, SimpleGeo is the only API for which I have full data available. Unfortunately, due to the difficulty of getting started with their API, I almost didn’t bother to implement it.
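Converting every provider’s results to GeoJSON is one way to give the other APIs the same treatment. This is a minimal sketch. It assumes each result has already been reduced to a dict with `name`, `lat`, `lon`, and an optional `url`, which is a hypothetical common shape, not any provider’s native format. The one trap worth noting is that GeoJSON positions are ordered longitude first.

```python
def to_feature(place, source):
    """Wrap one place dict as a GeoJSON Point feature."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON positions are [longitude, latitude], not lat/lon.
            "coordinates": [place["lon"], place["lat"]],
        },
        "properties": {
            "name": place["name"],
            "url": place.get("url"),
            "source": source,  # which API the result came from
        },
    }


def to_feature_collection(places, source):
    """Bundle a provider's places into one FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [to_feature(p, source) for p in places],
    }
```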

When I did implement it, I found it lacking in about the way I’d expect. It looks like someone glommed together a bunch of free data with some user-entered data, with about the expected result. (For example, “Harvard Square Eye Care” seems to have two copies of the location in Porter Square… and none of the one in Harvard Square.)

Factual

After reading a bit about Factual, I actually got far enough to try their API — setting up a token, etc. — and found that it does make getting set up with the API relatively easy. However, because their local data is split by country, there is no practical way to treat Factual as a worldwide POI data source via their APIs, which makes it impractical without a lot more work than most of these APIs require.

But don’t take my word for it!

Realizing that everyone has different needs, and even different aspects of a project have different needs, I didn’t just explore these APIs: I built something with them. The result is Anymaps.

Anymaps will let you search through the aggregated result list of all of the place APIs described in this post. So whether you’re looking for Four Burgers in Cambridge, or Pune Central, or even the Taj Mahal, you can see the results from all the various place APIs that are out there — and decide for yourself which one has the right data for you.
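The core aggregation idea can be sketched as a loop over providers that tags each result with its source and keeps going when one API fails (as GeoNames did for an hour). This is not Anymaps’ actual code. The provider callables are hypothetical placeholders for per-API search functions.

```python
def aggregate(query, providers):
    """Query every provider, tag results with their source, and collect errors.

    `providers` maps a provider name to a (hypothetical) callable that
    takes the query string and returns a list of result dicts.
    """
    results, errors = [], {}
    for name, search in providers.items():
        try:
            for item in search(query):
                # Copy each result and record which API it came from.
                results.append(dict(item, provider=name))
        except Exception as exc:  # one API being down shouldn't sink the page
            errors[name] = str(exc)
    return results, errors
```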

And that’s how I spent my Spring Vacation.


Finding distance from a point in Spain to the Sea

Posted in default on May 15th, 2011 at 08:44:05

So, a friend asked me if there was a list of distances to the sea for cities worldwide. I wasn’t aware of one. At first I thought one would be relatively quick to build; after thinking about it a bit more, I revised that to ‘a while’; then I actually did it and found that it took closer to 10 minutes. So I thought I’d write up what I did and get a critique of it.

This approach only works for the area of interest, Spain.

So, first I grabbed the world_borders file from http://www.mappinghacks.com/data/world_borders.zip. We’re looking for a rough estimate, so this is a reasonable dataset to use. I unzipped the data and loaded it into PostgreSQL:

$ createdb world
# PostGIS needs the plpgsql language, so add it before loading postgis.sql
$ createlang plpgsql world
# $POSTGIS is the directory holding postgis.sql; it varies by install
$ psql -f $POSTGIS/postgis.sql world
$ ogr2ogr -f PostgreSQL PG:dbname=world world_borders.shp

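Then I assembled a single geometry of France, Spain, and Portugal using ST_Union. Merging Spain with its land neighbors dissolves the borders between them, so for most of Spain the nearest part of the merged outline is coastline rather than a land border: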

select ST_Union(wkb_geometry) 
   FROM world_borders 
   WHERE cntry_name IN ('Spain', 'France', 'Portugal');

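Once I had that, I took the point for Madrid and calculated the distance to the merged shape after converting it to its boundary. The conversion matters: Madrid lies inside the polygon, and the distance from a point to a polygon that contains it is zero, whereas the distance to the boundary is the distance to the nearest edge: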

-- WKT points are (longitude latitude); this one is Madrid
SELECT ST_Distance_Sphere(ST_GeomFromText('POINT(-3.683333 40.4)'),
    ST_Boundary(ST_Union(wkb_geometry)))
    FROM world_borders
    WHERE cntry_name IN ('Spain', 'France', 'Portugal');

That gave me the ballpark answer I was expecting: 303.2 km.

Dependencies: PostgreSQL, PostGIS, and OGR (part of GDAL) installed on your machine. They’re easy to install on a Mac (just grab the KyngChaos libraries) and also easy on Debian derivatives.
