Are you more generative than consumptive in your field?

Posted in Locality and Space, OpenLayers, Social, Software on May 26th, 2009 at 10:57:47

Anselm just posted what appears to be a random thought on twitter:

Are you more generative than consumptive in your particular field? … Create more than you consume?

In open source, I often rephrase this question as “Are you a source, or a sink?”

There are many people in the community who contribute more than they consume: organizations, individuals, and so on. There are also many sinks in the community (since entropy is ever increasing, this seems a foregone conclusion), and one of the key things that causes an open source project to succeed or fail is the balance of sources and sinks.

I personally try very hard to be a source in all that I do, rather than a sink. One way I do this is to always follow up any question I ask (on a mailing list, on an IRC channel, or what have you) with at least two answers of my own. For example, when I hopped into #django to ask about best practices for packaging apps, I stuck around and helped out two more people: one who was asking a question about PIL installation, and one about setting up foreign keys to different models.

Now, in the end, my answers were simple — no one with even a basic knowledge of Django would have had problems answering them. But by sticking around and answering them, I was able to make up to some extent for the time/energy that I consumed from someone more familiar with the project, by saving them from needing to answer as well.

It is often the case that users trying to get help will claim that once they get help, they will ‘contribute back’ to the community by, for example, writing documentation. This never happens. Though there are exceptions to every rule, it is almost always the case that users who ask a question, prefacing it with “I will document this for other users”, never follow through on the latter half. The exceptions to this — or rather, the alternate cases — are cases where a user has already invested significant research, and likely already started writing documentation. Unless the process is started before the problem is solved, it is almost universally true — in my experience — that the user will act as a sink, taking the information from the source and disappearing with it.

I work very hard on supporting a number of open source projects that I work on. Though my involvement lately has been more hands off — by doing things like writing documentation instead of answering questions, acting as a release manager instead of fixing bugs, and so on — I work very hard to keep the karmic balance of my work on the positive side. I believe that this pays off in the long run — I have somewhat of a reputation of being helpful, which is beneficial to me since it means I’m more likely to receive help when I need it. I also work to keep karmic balance high on the part of the organization I work for, since many of the other people in the organization are less able to keep karmic balance high.

These rules don’t apply solely to open source: I have the same karmic balance issues going on in my work inside of MetaCarta, and I maintain the same attitude there. Coming in with the idea that it is okay to be a sink can set a nasty precedent. In the end, I think that everyone loses. Sinks, both in open source and in other karmic ventures, will eventually use up the karma they start with, and be hung out to dry. More than one person has extended their information seeking, without contributing back, beyond the point where I am willing to continue to support their information entropy.

I joke sometimes about giving out “crschmidt karma points”. Though I don’t have an actual system in this regard, I do quite clearly delineate between constant sinks, and regular sources, and grey areas in-between. I try to stay on the source side, and I encourage anyone else to do the same — even if it’s only by answering easy questions on the mailing list, or doing a bit more research on your own. Expecting other people to fix your problems, in open source or otherwise, is simply a false economy of help, since in the end, it simply doesn’t work.

WSGI + Basic Auth

Posted in default on April 15th, 2009 at 10:17:05

I use the logged_in_or_basicauth snippet for a lot of my work, and had had some problems with it since I started using mod_wsgi in place of mod_python. Thanks to this post, I now know why my basic auth under mod_wsgi isn’t working: lack of WSGIPassAuthorization On in my Apache config.

Thanks to the author of that post! Also, thanks to Google, since without it, I’d never have found it.
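To make the failure mode concrete, here is a minimal sketch (not the logged_in_or_basicauth snippet itself) of a WSGI application that reads Basic credentials from the environ. Under mod_wsgi, the Authorization header only reaches the application as HTTP_AUTHORIZATION when WSGIPassAuthorization is On; without it, the lookup below simply finds nothing and every request looks unauthenticated. The function names and the "example" realm are made up for illustration.

```python
import base64


def parse_basic_auth(environ):
    """Return (username, password) from a WSGI environ, or None if absent or malformed."""
    # Without "WSGIPassAuthorization On", mod_wsgi does not pass this key along.
    header = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, payload = header.partition(" ")
    if scheme.lower() != "basic" or not payload:
        return None
    try:
        decoded = base64.b64decode(payload.strip()).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password


def application(environ, start_response):
    """Tiny WSGI app: challenge when no credentials arrive, greet the user otherwise."""
    creds = parse_basic_auth(environ)
    if creds is None:
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="example"'),  # hypothetical realm
            ("Content-Type", "text/plain"),
        ])
        return [b"Authentication required\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("Hello, %s\n" % creds[0]).encode("utf-8")]
```

If the browser keeps prompting for a password even though the credentials are right, a missing HTTP_AUTHORIZATION key in the environ is the first thing worth checking.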

PowerPoint, in a sentence

Posted in default on April 6th, 2009 at 09:13:30

PowerPoint is a way to make gibberish look important.

– my 12-year-old daughter, Alicia

MrSID SDK Improvements

Posted in default on March 10th, 2009 at 12:37:48

For a long time, I avoided MrSID like the plague. After trying to do *anything* useful with it, I finally gave up; the requirement for old versions of gcc, the lack of 64-bit support, etc. really gave me a negative impression of the SDK for MrSID reading. This was especially painful when working with OpenAerialMap, since MrSID has a practical lock on the market for ortho imagery datasources. (There are exceptions to this, but they’re usually JPEG2000 data, which was even worse to work with using the tools that I use, in general.)

However, yesterday I sat down and had a bit of a discussion about it, and Frank said that building MrSID support in GDAL had gotten much easier. I didn’t really believe him, but I had the DSDK handy for other reasons, and according to the build hints, it was supposed to be easy.

Thinking I was going to prove Frank wrong, I started building. I ran ./configure --with-mrsid=~/Downloads/Geo_DSDK-7.0.0.2167, confirmed MrSID ‘yes’ in the output, then ran make.

3 minutes later, I had a gdalinfo and gdal_translate built on my Mac with MrSID support.

My historical problems with MrSID are completely irrelevant: the effort in the new SDK to support more platforms has clearly worked, and I can say that building MrSID support even on the Mac is trivial. A big thumbs up to the LizardTech folks for their effort in this regard — and to people like Frank and Michael for egging me on into learning this about the DSDK in the first place.

Code Sprint: Day 3

Posted in default on March 10th, 2009 at 09:24:38

Yesterday, I got to sit down and do some real performance testing with the MapServer folks. After rebuilding a local copy of the Boston Freemap on my laptop, I was able to share it with Paul, who ran it through Shark to find out where the performance killers are. The one thing we found was that this 5-year-old MapServer ticket was negatively affecting performance on maps with many labels: the labelling code in MapServer right now, if you’re using outlines, draws each glyph 9 times in order to get a nice outline color. Having found this, we decided to work with the GD maintainers to add the support described in #1243 to GD, using Freetype’s internal stroking code to get the same behavior. (At the time, in Freetype *2.0.09*, there was a bug in this code; but we’re now on 2.3.8, so that bug has been long fixed. :)) This change will likely give a 20% speed increase when drawing maps with many outlined labels, such as the Boston Freemap.

After this, we sat down with MrSID and GDAL/MapServer to figure out if there were performance problems there. One thing we found was that the MapServer code draws one band at a time, which means that there is a significant performance hit. In addition, some other performance enhancement techniques are being looked into at the GDAL level by Frank, thanks to the help of LizardTech developers participating in the sprint. He’s currently looking at improving the way that GDAL reads from MrSID, and was already able to achieve a 25% speed increase by simply changing the size of the internal GDAL buffer used when reading from MrSID to GeoTIFF. More documentation and experimentation is still in order, but there are some possible optimizations to investigate there for users of the library.
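As a rough illustration of why outlined labels are expensive, here is a small sketch of the "draw the glyph several times at small offsets" approach. This is not MapServer's code; it assumes the 9 draws are the 8 neighbouring offsets in the outline color plus one final draw in the text color, and the function name and tuple layout are made up for the example.

```python
def outline_draw_calls(glyph, x, y, width=1):
    """List the draw operations needed to fake an outline by offset drawing.

    Assumption: 8 offset copies in the outline color, then the glyph itself
    on top in the fill color, for 9 draws per glyph in total.
    """
    calls = []
    for dx in (-width, 0, width):
        for dy in (-width, 0, width):
            if (dx, dy) != (0, 0):
                calls.append((glyph, x + dx, y + dy, "outline"))
    # The real glyph goes last so it sits on top of the outline copies.
    calls.append((glyph, x, y, "fill"))
    return calls


def total_draws(label):
    """Number of glyph draws for a whole label under the offset approach."""
    return sum(len(outline_draw_calls(ch, 0, 0)) for ch in label)
```

With this scheme a 20-character street name costs 180 glyph draws instead of 20, which is why doing the stroke once inside the font renderer is attractive.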

We then had a great dinner at Jack Astor’s.

Thanks to our sponsors for today: Bart van den Eijnden from OSGIS.nl and Michael Gerlek from LizardTech — performance improvements in MapServer and GDAL access for label drawing and MrSID are potentially big wins for many users of MapServer.

Toronto Code Sprint: Day 2

Posted in Locality and Space, Mapserver, OSGeo, PostGIS, Toronto Code Sprint on March 8th, 2009 at 22:44:32

Day 2 of the code sprint seemed to be much more productive. With much of the planning done yesterday, today groups were able to sit down and get to work.

Today, I accomplished two significant tasks:

  • Setting up the new OSGeo Gallery, which is set to act as a repository for demos of OSGeo software users in the same way that the OpenLayers Gallery already does for OpenLayers. We’ve even added the first example.
  • TMS Minidriver support for the GDAL WMS Driver: Sitting down and hacking out a way to access OSM tiles as a GDAL datasource, Schuyler and I built something which is reasonably simple/small — an 18k patch including examples and docs — but allows for a significant change in the ability to read tiles from existing tileset datasources on the web.
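For readers who have not worked with tilesets before, this sketch shows the kind of tile addressing a reader of OSM tiles has to do. It is not code from the patch: it uses the commonly documented OSM z/x/y scheme, and the URL template and function names here are assumptions for illustration.

```python
import math

# Assumption: the widely used OSM "slippy map" URL layout, not taken from the patch.
OSM_TILE_URL = "https://tile.openstreetmap.org/{z}/{x}/{y}.png"


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the tile containing a lon/lat point at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # Spherical Mercator: y counts down from the top (north) edge.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url(lon, lat, zoom):
    """Build the URL of the tile covering a point."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return OSM_TILE_URL.format(z=zoom, x=x, y=y)
```

A raster driver built on top of a scheme like this mostly has to translate a requested pixel window into the set of tile indexes that cover it, fetch them, and stitch them together.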

Other things happening at the sprint today were more WKT Raster discussions, liblas hacking, and single-pass MapServer discussions, as well as some profiling of MapServer performance with help from Paul and Shark. Thanks to the participation of the LizardTech folks, I think there will also be some performance testing done with MrSID rendering within MapServer, and there was, as always, more of the “proj strings are expensive to look up!” discussion.

Other than that, it was a quiet day; lots of work getting done, but not much excitement in the ranks.

We then had a great dinner at Baton Rouge, and made it home.

This evening, I’ve been doing a bit more hacking, opening a GDAL Trac ticket for an issue Schuyler bumped into with the sqlite driver, and pondering the plan for OpenLayers tomorrow.

As before, a special thanks to the conference sponsors for today: Coordinate Solutions via David Lowther, and the lovely folks at SJ Geophysics Ltd. Thanks for helping make this thing happen! I can guarantee that neither of those GDAL tickets would have happened without this time.

Toronto Code Sprint: Day 1

Posted in Mapserver, OSGeo, PostGIS, Toronto Code Sprint on March 8th, 2009 at 07:55:43

I’m here at the OSGeo Code Sprint in Toronto, where more than 20 OSGeo hackers have gathered to work on all things OSGeo — or at least MapServer, GDAL/OGR, and PostGIS.

For those who might not know, a code sprint is an event designed to gather a number of people working on the same software together with the intention of working together to get a large amount of development work done quickly. In this case, the sprint is a meeting of the “C tribe”: Developers working on the C-based stack in OSGeo.

After some discussion yesterday, there ended up being approximately 3 groups at the sprint:

  • People targeting MapServer development
  • PostGIS developers
  • liblas developers

(As usual, I’m a floater, but primarily concentrating on OpenLayers; Schuyler will be joining me in this pursuit, and I’ve got another hacker coming Monday and Tuesday to sprint with us.)

The MapServer group was the most lively discussion group (and is also the largest). It sounded like there were three significant development discussions that were taking place: XML Mapfiles, integration of pluggable rendering backends, and performance enhancements, as well as work on documentation.

After a long discussion on the benefits/merits of XML mapfiles, it came down to one main target use case for the XML mapfile: encouraging the creation and use of more editing clients. With a format that can be easily round-tripped between client and server, you might see more editors able to really speak the same language. In order to test this hypothesis, a standard XSLT transform will be created and documented, with a tool to do the conversion; this will allow MapServer to test out the development before integrating XML mapfile support into the library itself.

I didn’t listen as closely to the pluggable renderers discussion, but I am aware that there’s a desire to improve support and reduce code duplication of various sorts, and the primary author of the AGG rendering support is here and participating in the sprint. Recently, there has been a proposal to the list to add OpenGL based rendering support to MapServer, so this is a step in that direction.

The PostGIS group was excited to have so many people in the same place at the same time, and I think came close to skipping lunch in order to get more time working together. In the end, they did go, but it seemed to be a highly productive meeting. Among their discussions was a small amount of discussion on the WKTRaster project, which is currently ongoing, I believe.

After our first day of coding, we headed to a Toronto Marlies hockey game. This was, for many of us, the first professional hockey we’d ever seen. (The Marlies are the equivalent of AAA baseball; one step below the major leagues.) The Canadians in the audience, especially Jeff McKenna, who played professional hockey for a time, helped keep the rest of us informed. The Marlies lost 6-1, sadly, but as a non-Canadian, I had to root a bit for the Hershey team. (Two fights did break out; pictures forthcoming.)

We finished up with a great dinner at East Side Mario’s.

A special thanks to our two sponsors for the day, Rich Greenwood of Greenwood Map and Steve Lehr from QPUBLIC! Our sprint was in a great place, very productive, and had great events, thanks to the support of these great people.

Looking forward to another great day.

Geodata Cost Recovery: Eaton County

Posted in Locality and Space on February 25th, 2009 at 08:37:47

I was pointed to Eaton County’s GIS Data Prices last night, and all I can say is how disappointed I am that people can still feel that this is an appropriate way to fleece their taxpayers. The data is already collected, and reproduction costs for the data are probably in the realm of a couple hundred bucks (less, if you just distribute them online; clearly, you already have a website). Yet you charge twelve *thousand* dollars for copies, and even after that, you’re still limited in what you can do.

This kind of thing is just a damn shame. Taxpayers should insist that this data is made available at reasonable reproduction costs; the policies of GIS departments to make money off of these things are simply silly so long as the data is collected with taxpayer dollars.

(If the GIS department does not receive state funding, then I suppose this type of cost recovery makes sense, in the same way that Sanborn or any other commercial entity would charge for it. However, I suspect that the primary client of such data is the state itself, in which case it’s still taxpayer dollars covering the costs somewhere…)

Yahoo! Maps APIs, aka ‘grr, argh!’

Posted in Locality and Space, OpenStreetMap on February 16th, 2009 at 15:14:00

I have a love/hate relationship with Yahoo!’s mapping API. It’s lovely that Yahoo! believes, unlike Google and other mapping providers, that their satellite data is a suitable base layer to use for derivation of vectors. This openness really is good to see — they win big points from me in this regard. (Google, on the other hand, is happy to have you give them data against their satellite imagery, but letting you actually have it back is against the Terms of Service.)

However, the Yahoo! Maps AJAX API has never gotten much love. I think that a preference for Flash has always existed in the Yahoo! world; iirc, their original API was Flash.

However, I realized today that this tendency to leave the AJAX API in the dust has resulted in something that seriously affects me: the Yahoo! Maps AJAX API uses a different set of tiles, which has two fewer zoom levels available in it:

[Images: AJAX Maps, most zoomed in; Flash Maps, most zoomed in]

For the new OpenStreetMap editor I’m working on, this is a *serious* difference: although the information actually available in these tiles isn’t *that* much higher, it allows the user to extract more information by zooming in a bit more, and to be more precise in placement of objects when using Yahoo! as a basemap.

Although it would be relatively easy to rip the tiles out, and create an OpenLayers Layer class that loaded them directly, this violates the Yahoo! Terms of Use. This is understandable, but unfortunate, because it means I can’t solve the problem with my own code.

What I would really love to see is more providers creating a more friendly way of accessing their tiles. I understand the need for counting of accesses, and the need for copyright notifications. If an API were published, that allowed you to:

  • Fetch a copyright notice for a given area, possibly also returning a temporary token to use
  • Following that, fetch tiles to fill that area
  • Require users to display the copyright notice in such a way as to make Yahoo! and their providers happy

This would allow for building a layer into OpenLayers which complied with this, without depending on Yahoo! to write a Javascript layer that did these things for me.
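To show the shape of the flow I have in mind, here is a purely hypothetical sketch in Python. No such Yahoo! API exists; every class, method, and string below is invented for illustration, and the "provider" is just an in-memory stand-in for a remote service.

```python
from dataclasses import dataclass


@dataclass
class AreaSession:
    """What the provider hands back for an area: a token plus the notice to show."""
    token: str
    notice: str
    bbox: tuple


class HypotheticalTileProvider:
    """In-memory stand-in for an imagined provider-side tile API."""

    def __init__(self):
        self._sessions = {}
        self.access_count = 0  # the provider can still count accesses

    def request_area(self, bbox):
        """Step 1: fetch the copyright notice for an area, with a temporary token."""
        token = "token-%d" % (len(self._sessions) + 1)
        session = AreaSession(token, "Imagery (c) Example Provider", bbox)
        self._sessions[token] = session
        return session

    def fetch_tile(self, token, z, x, y):
        """Step 2: fetch a tile, which is only allowed with a valid token."""
        if token not in self._sessions:
            raise PermissionError("request the copyright notice for this area first")
        self.access_count += 1
        return b"tile-%d-%d-%d" % (z, x, y)  # placeholder for image bytes


def load_area(provider, bbox, tiles):
    """Client side: get the notice, fetch tiles, and return both for display (step 3)."""
    session = provider.request_area(bbox)
    images = [provider.fetch_tile(session.token, z, x, y) for (z, x, y) in tiles]
    return session.notice, images
```

The point of the token is that the client cannot get tiles without first having been handed the notice, so the provider keeps its access counting and gets the notice in front of the application, while the application decides where on the page to put it.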

Now, it’s understandable that this doesn’t happen — having the client out of control of Yahoo! means that they can’t *enforce* that the copyright is displayed prominently, as they are able to (to some extent) with their API. However, I think that this type of API would allow more innovation, and possibly even a *more* prominent placement for Yahoo’s copyrights and notices. For example, in many mapping apps, the bottom inch of the map is not seen much by the users. If there was an API to get text to display, then an application could display the text in a more prominent location, rather than burying it under many markers or other pieces of text that might overlap it.

In the short term, all I really wish was that the AJAX API used the apparently-newer set of satellite tiles that the Flash API appears to have access to. I think the fact that this isn’t currently possible leads to an alternative access pattern for tiles, one which may make more sense in the long run, where tiles can be used by an application without necessarily running in the constrained Javascript API that these providers have the ability to write. And of course, if you want to provide your users with a ‘default’ API to use, you can always use OpenLayers, and extend it to include your own extensions…

Making a Big OSM Map

Posted in default on February 12th, 2009 at 11:43:50

Mapnik is a great tool. It allows for all kinds of neat toys to happen, and the recent work in SVN has really opened up the possibility that Mapnik might be a potential solution for a rendering engine in a lot of areas that it has previously left alone. (Support for reading OGR datasources, sqlite/spatialite plugins, etc. are all great developments that look likely to be released in the upcoming 0.6 release.)

In prep for the OpenStreetMap Mapping Party this Saturday and Sunday in Somerville, I was working on printing a big map to bring with me. A friend at the Media Lab was gracious enough to help me out.

Using Mapnik, it was trivial to produce a large — 29750 x 29750 pixel — PNG image. This was designed to fill up the 49.5″ by 49.5″ printer space at 600 dpi.
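As a back-of-the-envelope check on those numbers, here is a small sketch of the arithmetic. The helper names are made up, and the 4 bytes per pixel figure is an assumption (RGBA); with 3 bytes per pixel the raw size would be a quarter smaller.

```python
def pixels_for_print(inches, dpi):
    """Pixels needed along one edge to print `inches` at `dpi`."""
    return round(inches * dpi)


def raw_size_gib(width_px, height_px, bytes_per_pixel):
    """Uncompressed image size in GiB for the given dimensions."""
    return width_px * height_px * bytes_per_pixel / 2 ** 30


# 49.5 inches at 600 dpi works out to 29700 pixels per edge, which is
# within about 0.2% of the 29750 pixel image described above.
edge = pixels_for_print(49.5, 600)

# Assumption: 4 bytes per pixel (RGBA), before any compression.
size = raw_size_gib(29750, 29750, 4)
```

An image this size is a few gigabytes uncompressed, which is why the choice of TIFF compression and tiling ends up mattering for every tool in the chain.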

The printer prefers PDF, PS or TIFF. I was able to take that PNG and convert it to a TIFF — but the resulting tiff was DEFLATE compressed, and the printer help only mentioned LZW compression. I decided to fall back to trusty GDAL to try to fix this. I found that the imagemagick-converted TIFF had one giant block — and GDAL was not pleased with this at all. (Its internal un-blocking scheme doesn’t work with compressed tiffs.)

Thanks to a suggestion from Norman Vine, I was able to use the ossim image copy program (icp) to convert this giant tiff to a tiled tiff which gdal could easily read: icp tiff_tiled -w 256 image2.out.tiff image.icp.tiff. Once I had done this, I recompressed the tiff using LZW compression with GDAL: gdal_translate -co COMPRESS=LZW image.icp.tiff image.lzw.tiff, and was able to upload the 3GB image to the printer.

All in all, it took a bit more work than I was expecting, but I’ve got a 4ft by 4ft map to bring to the mapping party this weekend. In the process, I also found myself wanting magnification in Mapnik… which is amusing since just 24 hours before, I’d read a thread on the MapServer list and couldn’t imagine for the life of me why such a thing mattered.

Looking forward to showing the map off to local OSMers at the mapping party!