So, this week, I have learned a couple things:
- Nobody in this house does the dishes but me.
- My 10 year old daughter loves playing with the younger kids in the sandbox at the park… and the other parents seem to love her for it most of the time.
- There are at least 7 different major POI data sources, and the type and quality of data varies from “enh” to unusable — but once you get them working, there is actually some interesting data contained in them.
I’m going to do a bit of an exploration of the various place APIs, how difficult they were to get started with, and some thoughts on when you might use each of them. The order listed here is the order in which I implemented them for a new toy that I’m playing with. In each of these things, my goal is to provide place listings in response to search queries from users. The way I am doing this is to take each place API search, and wrap up the title, lat, lon, and URL into a GeoJSON object.
The FourSquare Venue API is actually surprisingly simple to get started with. Although for user access, FourSquare depends on OAuth — a bit weird given that I’m required to type my username and password in directly on my phone for the official app — for venue search, you’re not required to go the full OAuth route, and can instead use the Venues Project, which makes searching for Venues easy. In addition, the documentation on the FourSquare venues project makes it clear that using the API is pretty open; you can do most of what you want to do (except copying Foursquare places wholesale into your own database), including copying and storing IDs, displaying information, etc.
The documentation and examples with the FourSquare API were easy to get started with. The process for getting a key was well linked from the docs, so getting there was easy. There was no need for complex OAuth loops, and the licensing is clear and obvious. Overall, the FourSquare Venues API made getting started with place search easy.
The FourSquare data, at least for the areas that I’m currently looking at, are pretty great. I have been able to find results for things that don’t exist in any other data using FourSquare. The primary problem I have with FourSquare is that it is somewhat too generous in attempting to find results — appropriate for searching on a mobile phone, but less for a web interface. (A search for “Cambridge, MA” from Europe will give you many results for ‘ma’, but nothing actually relevant to the town.) Excepting this — which applies to almost all of the APIs in this post — FourSquare provides an easy to use API with a large quantity of interesting data in the areas where I’ve looked at it so far.
Facebook is all noise. While I appreciate the social aspects of what Facebook place results provide — they have the benefit that you can have all of the aspects of Facebook linked into them — Facebook places coverage is limited, and what little coverage there is tends to be absolutely filled with duplicates. (A local hospital has 4 duplicates in 3 surrounding towns, even though it only has one location.) The API is relatively easy to use — just fetch a temporary token, and you can then make additional calls back to the API — and Facebook provides a familiar interface for Facebook users, but overall, the quantity and quality of data tends to be low in a way that makes it difficult to use for any serious purpose. Additionally, using the API was made slightly more difficult because getting a human-readable link required an additional call to fetch more details about a place, which almost all of the other APIs did not require. However, it was possible to bunch all these requests in one ?ids= request, so it wasn’t particularly painful. Total number of HTTP calls to get the data I needed: 3.
Google Places is a weird entry. First of all, unlike every other API, Google does not give you a link to the place, nor does it have a way of getting details about a batch of places. In order to get a link, you must make a place details call for each place you are interested in. This makes Google one of the slower APIs to get the information I needed for my UI.
Google has a relatively high level of data quality. Although it uses a narrower definition of places than something like FourSquare, it is much broader than something like CitySearch. It also has a smaller problem with duplicates and other data issues than many other providers with user-submitted data tend to.
Overall, Google has high quality data with low noise, for their definition of places. However, for maximum coverage — to support services like check-in — other options like FourSquare have Google beat, at least in the urban areas I’m looking at.
The CitySearch API is interesting; it is a deeper/richer set of data than many other place providers — because it appears to be centered around a strongly curated set of results. Many places have reviews, and additional details, that can be gathered through the APIs, and due to strong categorization efforts, it is a good choice to use to find things like cuisines, which the full text search also indexes.
From the point of view of how I’m using it, the results are reasonable, though it is sometimes not obvious exactly how things are being matched due to the depth of data being searched. Additionally, the coverage is centered around restaurants and the like; because adding places is not something that users are expected to do (the API is centered around having businesses submit, rather than users), the coverage for things like public transit stations, or other things that users would consider important, is lower than competing APIs.
Additionally, the terms of service make it clear that using the API is really designed around creating your own CitySearch clone; they describe how you have to create links to CitySearch, how many ads you have to display on your pages, and the like. From reading the terms, it’s not even clear if it’s allowed to display CitySearch content without also displaying at least 2 ads on the page — see section 3.5 of Usage Requirements. Overall, though there is a great depth of metadata — reviews, attributes, etc. — for places in the CitySearch API, from the perspective of someone looking for a comprehensive POI source, this is probably not the best place to start.
Yelp falls under pretty much the same descriptions as CitySearch, minus the weird/confusing Usage Requirements and the lack of user-submitted content. Because the site is review driven, the search is also searching full text of reviews, which has a somewhat weird impact on the results you get. (Searching for “Green Line” will get you Green Line stops… but also places which are close to the green line, because people mention it in reviews.)
Yelp uses OAuth for their API, but there are solid examples of how to make calls in Python, which made getting started easy. Signing up was also easy.
The content in the Yelp API has more user generated content — which means, as usual, more coverage and more noise. It is also an API with a lot of depth — reviews, etc. — if that’s the type of thing you’re interested in.
GeoNames is a perfectly reasonable set of data, but POI coverage is low. For navigation style queries to addresses, towns, etc. GeoNames is fine; for POI coverage, you’ll probably need to look elsewhere. Additionally, the API is not the best documented, and getting set up with an account took longer than I wish it had; additionally, the API is not particularly full featured. There is no way to pass a location into the query, so results are always worldwide, with the expected results. Additionally, while developing, I found that the API was down for about an hour one evening. All of this is in line with the expectations that I’d have of a non-commercial service like this, but it makes using it in any kind of application somewhat less tenable.
OSM’s coverage is primarily towards address-like things, as is the case for GeoNames, but there is an interesting amount of POI data in it. The search API is relatively fully featured, and you have a benefit of any object you find in this way having a full edit history, and additional data/attributes available, via the OSM APIs. This also allows for easy correction of this data in the case of mistakes — something that doesn’t exist in any of the other APIs. However, overall the coverage for POIs is spotty, and this is not an appropriate source for someone looking to build on top of a comprehensive place database.
I almost didn’t bother with SimpleGeo; despite the use of GeoJSON (my preferred format) as transport format, for two reasons: 1. It didn’t offer any particularly exciting data. 2. Required OAuth, but … in a weird way without examples. In general, the documentation around the SimpleGeo *tried* really hard, but failed in too many important ways; without working a way to do browsing of example requests, it was just too much effort to integrate. However, I did eventually find the SimpleGeo Python client (after a lot of looking around), which solved that particular problem. (In general, I don’t like the idea of depending on a specific API client to do things like ‘Fetch JSON’, but this was easier than the alternative.)
SimpleGeo did offer the benefit of using GeoJSON natively — for this reason, SimpleGeo is the only API for which I have full data available. Unfortunately, due to the difficulty of getting started with their API, I almost didn’t bother to implement it.
When I did implement it, I found it lacking in about the way I’d expect. It looks like someone glommed together a bunch of free data with some user entered data, with about the expected result. (For example, “Harvard Square Eye Care” seems to have two copies of the location in Porter Square… and none of the one in Harvard Square.)
After reading a bit about Factual, I actually got far enough to try their API — setting up a token, etc. — and found that it does make getting set up with the API relatively easy. However, because their local data is split by country, there is no practical way to treat Factual as a worldwide POI data source via their APIs, which makes it impractical without a lot more work than most of these APIs require.
But Don’t take my word for it!
Realizing that everyone has different needs, and even different aspects of a project have different needs, I didn’t just explore these APIs: I built something with them. The result is Anymaps.
Anymaps will let you search through the aggregated result list of all of the place APIs described in this post. So whether you’re looking for Four Burgers in Cambridge, or Pune Central, or even the Taj Mahal, you can see the results from all the various place APIs that are out there — and decide for yourself which one has the right data for you.
And that’s how I spent my Spring Vacation.