Aaron

Posted in default on January 13th, 2013 at 01:58:40

Aaron Swartz was an incredible guy. He was constantly successful in making me feel completely inadequate — which is generally a pretty hard thing to do — and I can claim more success in my life than I would otherwise have had thanks to Aaron’s influence.

The world is worse off without him. My best to all his friends and family.

As a result of Aaron’s passing, I am going to change my recent practice of doing many things on Facebook only. In the past, I would have ensured that my content was also made available in places that weren’t Facebook, because I felt that the freedom that other platforms offered me — as well as long-term stability — was important. Of late, I have not stuck to that ideal, but the fact that I haven’t is a regression from a belief that I have always held: that sharing things only in walled gardens hurts everyone.

I think this is the kind of thing that I would have frowned upon in myself a decade ago, and there is no less reason for it to upset me now. Sharing information only in a single closed platform is bad for everyone. It’s time to go back to sticking to those principles, and making my information as free as it can be. (There are practical limits to anything, but “I’m a lazy bum” isn’t a good enough excuse.)

Responding to Recruiters: Priority List

Posted in default on October 28th, 2012 at 22:39:50

I get a handful of recruiters who are looking to find me a role in their companies. (Sometimes they are also looking for people who aren’t me to fill roles — which I usually pass on to others by saying “Anyone looking for a job?”, getting a chorus of “Nope”, and moving on.)

While responding to one of these recently, I ran down the checklist I have in my head for what is important to me when looking at a new job. The items on this list are essentially a log-scale ordering of binary predictors for how likely I am to consider a switch to another position; for example, I don’t think it’s plausible that I’d consider any position that didn’t meet the first two conditions.

  • Work from Cambridge, MA, ideally in a local office or some other employer-sponsored working space. (Things that are close enough: Cambridge, Boston. Things that are not close enough: Lexington, Waltham, Billerica.)
  • Working in an environment which supports flexibility in work schedule, and is supportive of work/life balance.
  • Working on projects that I don’t personally consider dishonest or immoral.
  • Working with user data — the bigger the better.
  • Working on projects which are visible to the public.
  • Working on interesting new technologies, especially technologies which can be open sourced and shared.
  • Working with maps, or geospatial data.

(Compensation also plays a role, but I don’t think I’ve ever failed to respond to a recruiter on that basis alone.)

I’m not actively looking for a job — despite Nokia’s overall poor performance, I work in the ‘Location and Commerce’ group inside Nokia, which is still making a healthy profit on its overall activities. Most importantly to me, I work with the same team I’ve worked with for more than 6 years now, so switching jobs would be a painful transition that is unlikely to be enticing without a really strong offer.

That said, I often read the engineering blogs of places like Yelp, Netflix, and Foursquare and think “Man, wouldn’t it be cool to work someplace where maybe I didn’t have to put out fires all the time? Where occasionally, I could actually work on cool stuff?” (Note that my brief research into Netflix indicates that it fails *both* of the first items on my list, so it’s evident that “Companies doing cool things” is not synonymous with companies for whom I would want to work.)

I just miss the days of MetaCarta when occasionally, I got to put together something interesting without spending 75% of my time fighting against people inside my own company, and I dream that somewhere out there, there must be other cool companies to work for where that’s not the case. I’m not convinced this isn’t just a ‘grass is always greener’ thought, though. 🙂

(If you are looking for a senior software developer, and think your company can meet all of the criteria above and be cooler than where I work now, feel free to drop me a line.)

Some comments on EC2 instance heterogeneity

Posted in default on October 24th, 2012 at 20:39:50

An article (Exploiting Hardware Heterogeneity within the Same Instance Type of Amazon EC2) linking to a paper from HotCloud ’12 has some information about mixed instance types for Amazon EC2 machines. I found it interesting, so I browsed through the article. Here are some observations from reading it:

– “Furthermore, the high-memory instances use identical Intel X5550 processors” — Not true, from what I can tell. E5-2665 processors are used across at least the us-east availability zones for all m2 instance sizes — m2.xlarge, m2.2xlarge, and m2.4xlarge. In fact, across several thousand instances spun up, it seems that these processors show up as much as 70% of the time in one availability zone (though almost not at all in another).
– The CPUBench test was done across 20 instances, but the Redis test appears to have only been done against one of each type, as far as I can read. I’m not totally convinced — given the variability in performance between node types — that this is entirely explained by instance differences, though given the CPUBench scores, it’s clear that some of the variability could well be coming from that.

Anyway, I primarily wanted to comment on the high memory instances all using X5550s — since it’s clear that they don’t, at least not in US-East 🙂
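
(As an aside: checking this sort of thing from inside an instance is easy enough. As a minimal sketch, assuming a Linux guest where /proc/cpuinfo is readable, something like this reads the processor model off a booted instance:)

def cpu_model(path="/proc/cpuinfo"):
    # Return the "model name" field from /proc/cpuinfo, e.g. something like
    # "Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz" on the m2 instances mentioned above.
    with open(path) as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"

if __name__ == "__main__":
    print(cpu_model())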

Deep, Dark, OpenLayers History

Posted in default on May 1st, 2012 at 00:18:46

The OpenLayers that everyone knows today, born at what seems like the dawn of the modern Javascript age, was not created out of whole cloth. The early development of OpenLayers recorded in our SVN history represents some of the very earliest work on OpenLayers as it is today, but the project had a life for a year before that which is largely unknown.

I’ll admit that I’m not the best person to tell this story: Most of it is also before my time. I originally started working with the MetaCarta team in March of 2006, working on some server-side KML hacking. When I joined the company, there had already been three versions of OpenLayers.

“But Chris!”, the educated illuminati among you might say, “OpenLayers wasn’t released until May of 2006! What do you mean, there had been three versions of OpenLayers?”

Well, my friends, it’s a sad, but not shocking tale, all too common in software development: the premature demo.

After the Where conference in 2005, John Frank reached out to several interested parties to help build an open source alternative to the Google Maps API. (Or so I’m told.) MetaCarta had map-based interfaces, and it was clear to John then that this newfangled mapping thing was going to be the future — not just for Google, but for all map interfaces. (In fact, John has even been credited with one of the early definitions of Slippy Map, in June of 2005: “A ‘slippy map’ is a type of web-browser based map client that allows you to dynamically pan the map simply by grabbing and sliding the map image in any direction. Modern web browsers allow dynamic loading of map tiles in response to user action _without_ requiring a page reload. This dynamic effect makes map viewing more intuitive.”)

Revamping MetaCarta’s ‘enterprise’ UI to be more user friendly was the primary thing on John’s mind. Switching from a form with *11* form fields to a more understandable one-box search. Improving the experience of map interactions. But for a long time, that was essentially all it was: while there was a core idea behind each of these approaches — the idea of making an open source library out of the results, and distributing it widely — in each case, the demo came first.

Instead of concentrating on building a solid library which could be made into an open source base for many different projects, the first incarnations of OpenLayers were all libraries designed for a single application — something that never works well for creating a more general purpose tool.

This was a major misunderstanding of the market demand: one that could not be overcome by any amount of technical success. What the world needed at the time was not another client/server component; it wasn’t another application that allowed people to do pretty things with maps. When OpenLayers succeeded, it succeeded largely because it avoided solving anything other than the most basic problem; it avoided doing anything other than the one, simple thing of having a draggable map on a web page, and being able to load data from multiple sources. This was crucial to the success of the library we know as OpenLayers today.

Some of the flaws in previous iterations that I saw as a result of this:

  • Core functionality based around parsing WMS GetCapabilities documents. Although many have criticized OpenLayers for not reading WMS Capabilities documents, reading XML from a remote domain in the browser is blocked by design (the same origin policy). Though there are now common workarounds for these types of problems, at the time this was essentially a showstopper for client-side-only deployment, which was a key missing ingredient in some of the early OpenLayers work. It was only by throwing away capability parsing — by reiterating data in more than one place — that it became trivial to use OpenLayers to talk to remote servers. Note that the problem here has nothing to do with WMS: the problem has everything to do with ‘entirely client side’ vs. ‘requiring a server-side proxy’. (A rough sketch of what that kind of proxy looks like follows this list.)
  • Centralized hosting of a ‘service’ instead of an API. At one point, there was a thought that one of the things OpenLayers could provide was a ‘mapviewerservice’ — a simple, hosted way to present data online by simply modifying HTTP parameters. (I don’t think this was ever at the core of any of the OpenLayers versions that were written, but it was something we supported even after the transition to the all-public “Mark IV” of OpenLayers.) In the end, nobody at the time really wanted this.
  • Concentration on pretty. OpenLayers, to this day, is ugly as sin out of the box, and is more annoying to customize than some other solutions might be. That said, the core functionality of OpenLayers is designed to *hide*. There are very few things that OpenLayers does — and it tries to hide as many of them as possible. Several previous incarnations had a lot of user-targeted UI — making them more applications than libraries. This was a mistake. What the world really needed was a library.
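
What does ‘requiring a server-side proxy’ actually mean in practice? Here is a minimal, hypothetical sketch in modern Python (not anything OpenLayers itself shipped): the page asks its own origin for /proxy?url=<remote document>, and the server fetches the remote XML on the browser’s behalf.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from urllib.request import urlopen

class ProxyHandler(BaseHTTPRequestHandler):
    # Illustrative only: no allow-list, no error handling, no caching.
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        remote = query.get("url", [None])[0]
        if not remote:
            self.send_error(400, "missing url parameter")
            return
        body = urlopen(remote).read()
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Anything served from the same origin as the page can now fetch
    # remote capabilities documents via /proxy?url=...
    HTTPServer(("localhost", 8080), ProxyHandler).serve_forever()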

(Now, I’m sure that others who were ‘there’ as it were, might have more commentary. And certainly there were many flawed aspects of technical implementation. But these were biggies at the social level, which would have prevented uptake even if the technical flaws had been worked out.)

This post is written in large part because just this last week, I had a conversation in a bar with someone who claimed he helped start a Javascript mapping project called OpenLayers. We went back and forth for a bit, and then I realized he was right: He *did* participate in something that helped set the stage for OpenLayers. (The earlier incarnations, though usable, never really were the thing that people think of as OpenLayers today.) I just didn’t know he did — and he certainly didn’t know that OpenLayers grew up, got legs, and walked away from MetaCarta and into the hands of thousands of adoring fans.

And I didn’t even know his name before last week. But last weekend, I walked into RPI, where I met a couple of college students from RCOS — people who were still in high school when OpenLayers started — and they knew what the OpenLayers project was, and were excited to meet a guy who helped get it started.

So just remember: Though the OpenLayers you know and love today was largely put together over a three-day weekend, hacking in a darkened room, with a projector on the wall, and Venkman at our side: before we got there, mistakes were made by us, and others. And even before that, a guy with a vision of easier open source maps saw a future where all maps would be slippy.

So a brief thank you, from me, to all the people who came before me in the OpenLayers history; to OpenLayers Mark 3, 2, and 1, and especially to John Frank, who helped push the project from a vision to a reality.

python SimpleHTTPServer + OpenLayers testing

Posted in default on April 27th, 2012 at 20:17:00

OpenLayers testing always felt a bit odd for new users at things like code sprints: because the OpenLayers tests use XMLHttpRequest, popup windows, and the like, a few tests would always fail unless they were run from an HTTP server. For a product where almost all the tests pass just fine without it, I always found it sort of annoying that a few minor XMLHttpRequest restrictions forced me to shell out to a server.

This weekend, as I was helping at the OpenHatch Open Source workshop at RPI, I found myself in a position where a new developer was running the tests, and asking me why they failed. I was pointing out that in order for them to pass, they’d have to be run from a webserver, and someone else in the room helpfully pointed out that if you have Python installed, you have a webserver available to you with just one line of code.

“What?” I said, incredulously. I mean, I believed them — in the same way that python -mjson.tool has become a daily part of my life, I’m not entirely surprised by Python modules offering useful command line interactions that help make my life easier. Still, this was a new one to me.

“Sure”, came the reply. “Just use python -m SimpleHTTPServer in the directory you want to serve.”

And I `cd`’d into the root of my OpenLayers checkout, and typed python -m SimpleHTTPServer, and went to http://localhost:8000/tests/run-tests.html — and ‘lo, the tests did pass, and the developer did say it was Good.
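
For what it’s worth, the module moved in Python 3 (it became http.server), and you can also start the same kind of server from a script rather than the command line; a minimal sketch, assuming Python 3:

import http.server
import socketserver

# Serve the current directory on port 8000, just like
# `python -m SimpleHTTPServer` does on Python 2.
with socketserver.TCPServer(("", 8000), http.server.SimpleHTTPRequestHandler) as httpd:
    httpd.serve_forever()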

(I probably learned more tips and tricks in the two day workshop about git, and other helpful tools, than I do in a week of doing my own development. Kids these days, teaching me new things!)

“Get off my lawn!” — How Maps + JS Have Changed

Posted in default on April 25th, 2012 at 04:44:15

Occasionally, I think back to when we started writing OpenLayers, and some of the tools we didn’t have when I started programming JavaScript. Then I feel old, and start to yell at kids to get off my lawn.

In May of 2006, when we started working on OpenLayers:

  • Internet Explorer was 63% of W3Schools web traffic. (Today? 19%.)
  • IE7 wouldn’t be released for another 5 months.
  • SVG support was only available via the Adobe SVG plugin, and only in IE on most platforms.
  • Safari was at version 1.2/1.3.
  • Firefox was not yet at version 1.5, which would bring in SVG support, but disabled by default.
  • There was no Firebug. “Real men use Venkman!” (I believe that as part of the rewrite of OpenLayers that we eventually shipped, we did bump into Firebug 0.3/0.4. 1.0 wouldn’t be released for another 6 months.)
  • jQuery was still 6 months from being released.

In addition to the JavaScript world changing, the Maps world has changed. Although I was originally interested in OpenLayers because of OpenStreetMap, there wasn’t a lot there back in 2006. That isn’t the only way the world has changed:

  • When OpenLayers started, OpenStreetMap had approximately 2000 registered users. (Today? 500,000.) At the time, there was no regular dump, and the map that existed was… ‘interesting’ 🙂 (Mapnik wouldn’t come until later.)
  • Installing PostGIS on most platforms was… touchy at best. (Things like pgRouting, though coming into existence around that time, were far from practical to install, even more than a year later.)
  • ka-Map and Community Map Builder were still the de facto web mapping software.
  • There was no one in the open source world caching XYZ tiles yet. (The FOSS4G discussion on tile caching in September of 2006 was the first real discussion of that.) TileCache was developed later that year — after a discussion where we all agreed that WMS-style strings were a good idea, and then someone left the room and immediately started talking about TMS 🙂
  • All map rendering software was somewhat difficult to install at the time — things like GeoServer’s current wonderful web UI were… not as complete then as they are now 🙂
  • Nobody knew how to render things in ‘Spherical Mercator’ so that they matched up to Google. Spatial Reference codes like 41001, 900913, and 3785/3857 were all quite a ways down the road.
  • Software that hasn’t changed much: GDAL/OGR. GDAL was an extremely useful tool in 2006 — pretty much the same as it is today. Although GDAL has certainly grown many features and become more complete over the years, it still has the same general shape as it did back then. 🙂

(Other things OpenLayers predates: Twitter, open access to Facebook.)

As you would expect, the world has changed. People sometimes comment that OpenLayers feels a bit long in the tooth — something I can certainly sympathize with. I have always prioritized maintaining API compatibility for existing applications over anything else in my personal investment in OpenLayers: the most important thing to do is not to break existing applications. This stability has allowed many people to use OpenLayers, and I don’t think that violating those principles is a good thing. (I am happy with the solution that has grown over the past 6 months in OpenLayers — moving code to the “deprecated.js” file is a great way to let people maintain backwards compatibility with a path forward as well.)

I’m happy to have other people take the principles created by OpenLayers over the past half decade and do something exciting with them. Competition is good. Options for applications are good. The fact that OpenLayers effectively sucked all of the air out of the room from 2006-2010 was not good for the rest of the web mapping world: without competition, it’s really hard for any innovation to take place.

But the fact that a piece of JavaScript software written in a world before jQuery, before Firebug, while OpenStreetMap was still getting off the ground, is still useful today — I think that’s a testament to what OpenLayers became, and I’m happy to see what it continues to be for many people.

Simple Mistakes in Getting Started in Open Source Dev

Posted in default on April 24th, 2012 at 07:20:46

In 2002, I got an account on LiveJournal. In order to help my friends get accounts, I started doing LiveJournal support. (I got 5 support points in my first week, which was a reasonably big deal back then.) (Warning: reading other posts from that era of my life is not recommended. :))

Over time, as a somewhat more technical member of the support team, I started attempting to explore links to this ‘zilla’ thing that people kept mentioning, LiveJournal’s bug tracker. However, I always found that I never really understood what was going on: to a new developer, the content above the fold in a typical Bugzilla bug is a bunch of confusing-looking metadata. I remember at some point someone told me to look at the comments — and I remember saying something along the lines of “Wait, there’s more content on this page if I scroll down?” Yes, I had been reading bugs for weeks, and never realized that … there was information other than the metadata.

I remember somewhat later, as I got more into development, that I would actually read patch files and attempt to understand what they were doing. In one case, I found a patch file written by a friend in the support community that had a pretty clear typo in it. So, I downloaded the patch file, opened it in an editor, added the new line that was missing, and added the “+” at the beginning of the line — like he had for all of his lines, of course — and re-uploaded the patch.

Of course, anyone with a passing knowledge of tools will know that adding a single line to a patch file … isn’t going to go well. 🙂 But to someone who didn’t know that patchfiles were produced and applied with ‘patch’, ‘diff’, etc., this concept was a strange one.
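
(To spell out why, for anyone who hasn’t been bitten by it: a unified diff hunk starts with an “@@ -start,count +start,count @@” header that records how many lines the hunk covers on each side. A hypothetical hunk fixing a typo in some Python might look like this:

@@ -10,2 +10,2 @@
     print("Hello")
-    print(nmae)
+    print(name)

Hand-adding another “+” line leaves the header still claiming 2 lines on the new side when there are now 3, so `patch` will reject the hunk as malformed.)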

So, in my start in open source, I didn’t know how to read bug reports; I didn’t know how patchfiles worked. Yet I still was able to learn, improve, and eventually, do a little bit of good in the open source world, thanks to the help of a lot of people along the way.

We all start somewhere.

Scalatron: Rough Approach and Running Code

Posted in default on April 16th, 2012 at 09:00:53

This weekend, I sat down and got comfy with Scalatron, a programming tutorial based around building a competitive bot in a game environment in Scala.

At first, I was a bit stymied; with no Java-friendly IDE, and the instructions being really IDE-targeted, I had some problems getting my development environment set up. Thanks to some help from godwinsgo on Twitter, I ended up with an environment that gave me:
– Automatic compiling on file save, which was super helpful (rather than having to trigger a compile myself)
– The ability to quickly swap back over, tap a button, and have Scalatron reload my bot and test it right away.

After some playing, I wrote a bot that is pretty good: it gets ~22000 in the ‘reference’ run (bot on its own, 5000 steps, 100×100 field). I now have a better understanding of some of the simple mistakes I made: adding a println at the end of a function that is expected to return a value produces a somewhat cryptic “Expected Type, Got Unit” error message pointing at a spot in my code where it doesn’t make sense to me; the same goes for any case where I’m not paying careful attention to return types, since without an explicit ‘return’ statement, it isn’t obvious to me what gets returned.

I probably wrote terrible Scala; I didn’t bother to learn much about the language, just enough to actually hack my bot into something that worked. (My bot is on GitHub, if people want to critique my terrible Scala — or my code in general.) However, I was able to get something working, learned some things about Scala, and wrote a cool digital robot. (Hooray!)

Thanks to David Winslow for giving me the bits to get running, and to Scalatron for helping me have a fun project for this weekend!

Picture Hanging

Posted in default on April 14th, 2012 at 10:40:04

“When a junior developer has decided they need to build a whole new framework to solve a relatively simple problem, and they’ve started building it instead of fixing the problem, that’s an unpleasant surprise. When they’re a week into it, and have been making difficult-to-undo changes to support it, that’s a nasty surprise.” — Scrum, the Good Bits: Daily Standups

So true. And it points to Picture Hanging — a metaphor that a couple years ago would have been completely foreign to me, but is now so much a part of my daily life that I take it for granted…

multiprocessing is cool

Posted in default on April 8th, 2012 at 16:12:36

So, in general, I’ve avoided multithreaded processing; it’s one of those things that historically has been tricky to get right, and I don’t typically have embarrassingly parallel problems.

Today, however, I was parsing a set of 25,000 HTML files using BeautifulSoup, to pull out a small set of data (~500 bytes of JSON per HTML file). I briefly tried to simplify some of the code, but then realized that the lion’s share of the CPU time was being spent on the initial parse; there wasn’t going to be a way to clean up my code enough to make the script that much faster, no matter how good the rest of my code was.

Enter multiprocessing. With a 5 line change to my Python code, I was able to move from one core to four. Instead of:

import glob

def handle_place():
    data = []
    for filename in glob.glob("beerplaces/*"):
        # Do stuff with filename, appending the parsed result to data
        pass
    return data

I have:

import glob
from multiprocessing import Pool

def handle_place(filename):
    # Do stuff with filename and return the parsed result
    pass

if __name__ == "__main__":
    p = Pool(4)
    data = p.map(handle_place, glob.glob("places/*"))

Once I made the change, I went from using one CPU fully to using all four — and instead of taking 25 minutes to generate my output, the total time was under 7.
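
(One optional tweak, if you don’t want to hard-code the 4: multiprocessing can tell you how many cores the machine has, so the pool can size itself to whatever box the script lands on.)

from multiprocessing import Pool, cpu_count

# Size the pool to the machine instead of hard-coding 4 workers.
p = Pool(cpu_count())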

multiprocessing is cool.