Archive for the 'Locality and Space' Category

FOSS4G2007: Two Presentations

Posted in FeatureServer, FOSS4G 2007 on July 15th, 2007 at 23:53:22

Just got notice that I’ll be giving two presentations at FOSS4G 2007:

FeatureServer: A REST-based Server for Simple Features
With the number of tools for creating vector data online growing rapidly over the past year, users have spent more and more of their time creating annotations of existing maps. Unfortunately, managing this user-generated data in a web browser can be difficult: the existing tools for storing geographic data on the server are largely based around the WFS-T specification, which is not well suited to the browser environment. FeatureServer provides an alternative mechanism for fetching and storing geographic data, using a REST-based interface that is friendly to browsers and other clients alike.
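
For concreteness, this kind of REST mapping can be sketched as a small table of HTTP verbs and URLs. The base path, layer name, and exact URL layout below are my own illustration, not necessarily FeatureServer's real dispatch:

```python
# A rough sketch of how a REST interface for simple features maps CRUD
# operations onto HTTP verbs and URLs. The base URL and layer name are
# hypothetical; FeatureServer's actual URL layout may differ in detail.

def rest_request(operation, layer, feature_id=None, fmt="geojson"):
    """Return the (HTTP method, path) pair for a feature operation."""
    base = "/featureserver.cgi/%s" % layer
    if operation == "list":        # fetch all features in a layer
        return ("GET", "%s/all.%s" % (base, fmt))
    if operation == "read":        # fetch a single feature
        return ("GET", "%s/%s.%s" % (base, feature_id, fmt))
    if operation == "create":      # add a new feature
        return ("POST", "%s.%s" % (base, fmt))
    if operation == "update":      # replace an existing feature
        return ("POST", "%s/%s.%s" % (base, feature_id, fmt))
    if operation == "delete":      # remove a feature
        return ("DELETE", "%s/%s.%s" % (base, feature_id, fmt))
    raise ValueError("unknown operation: %r" % operation)
```

The contrast with WFS-T is that each operation is a plain HTTP request on a URL, rather than an XML transaction document, which is much easier to issue from a browser.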

And with Howard Butler:

The Gift Economy Ain’t Free: Getting Help with Open Source Software
Have you ever been told to RTFM? STFW? Sent an email to a project’s maillist that languished for days without a response? This talk will give you ammo that you can use to bust out of the rut of frustration and non-response.

Woot.

Honeymoon Time

Posted in meta, OpenLayers, Social on June 29th, 2007 at 06:18:57

As of 5PM today, I will be unavailable on the web until July 9th. I’m going on a honeymoon, and I’m leaving my computer offline for the entire trip. (Yes, I am a workaholic. Heck, I committed to OpenLayers less than 2 hours before my wedding.) So, although I’ll actually be at home for part of that time, you should not expect to hear from me until July 9th. If you have something that requires my urgent attention, please get it to me today 🙂

My actual honeymoon will include a stay at the Mount Washington Hotel, in the White Mountains of New Hampshire. Not to pat myself on the back too much, but I think I’ve earned it 🙂

Don’t break anything while I’m gone!

First Attempt at IronPython

Posted in default, IronPython, Locality and Space, Python, TileCache on June 20th, 2007 at 06:53:26

My first attempt to do something useful with IronPython:

>>> import urllib
>>> urllib.urlopen("http://crschmidt.net/")
Traceback (most recent call last):
  File httplib, line unknown, in getreply
  File httplib, line unknown, in getresponse
  File httplib, line unknown, in __init__
  File System, line unknown, in set_ReceiveBufferSize
  File System, line unknown, in SetSocketOption
WindowsError: Invalid arguments

Note that I’m working on OS X, and my exception is a WindowsError. Fancy.

(I was inspired by Bill Thorp’s efforts to get TileCache working on IronPython: Round 1, Round 2. However, I’m not all that inspired now.)

Still, it is kind of cool that IronPython just ran — I didn’t expect that to work. Maybe there’s something to this mono business after all.

Wedding Complete, Other Geohacks

Posted in FeatureServer, FlickrBrowse, Twittervision on June 16th, 2007 at 07:34:41

I just wanted to thank all the well-wishers in comments on my previous post about the upcoming wedding. For some reason, I hadn’t actually gotten the comment notifications — or I just deleted them as noise — and so I’ve just gone back and approved the 5 comments I got. (And to Taral: the ‘highway’ called RED is actually the Red Line — a subway line. This is a case where local knowledge is assumed. :))

The week post wedding has been relatively peaceful, though our last wedding guests didn’t leave until Thursday, so we’re finally getting a bit of alone time this weekend. Bio-dad has the kids, so I plan to spend the weekend chilling out.

If you haven’t seen FlickrBrowse recently, you might want to check it out: It’s now using the new world wrapping support in OpenLayers, which is kind of nifty. I also added Twittervision support to FeatureServer a couple days ago — not the most useful thing in the world, since the only query type is ‘current location for $username’, but it was almost too easy *not* to do it: the code is only 12 lines.
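
A datasource like that mostly just reshapes one JSON response into one feature. Here is a rough sketch of the idea, with made-up response fields (the real Twittervision API had its own schema):

```python
# Illustrative sketch of a "current location for $username" datasource.
# The response field names here are invented for the example; they are
# not Twittervision's actual API schema.

def twitter_location_feature(username, response):
    """Turn a Twittervision-style status dict into a simple feature."""
    return {
        "id": username,
        "geometry": {
            "type": "Point",
            "coordinates": [float(response["longitude"]),
                            float(response["latitude"])],
        },
        "properties": {"text": response.get("text", "")},
    }

# Example response, as it might come back from the location service.
sample = {"latitude": "42.37", "longitude": "-71.10", "text": "at home"}
feature = twitter_location_feature("crschmidt", sample)
```

Most of a real datasource's dozen lines would be exactly this kind of reshaping, plus the HTTP fetch.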

The story of a procession…

Posted in FeatureServer, Locality and Space, Social on June 8th, 2007 at 05:52:03

Maps tell stories. They tell all sorts of stories, but one of the stories they tell best is the procession: a series of photos taken over a wide area over a period of time, but with the same principal actors.

The most important procession to me right now is the one that I’ve been working up to for four years, as of tomorrow. At the church shown (using the Boston Freemap TileCache, FeatureServer for feature translation from the Flickr API, and OpenLayers, of course, as the map interface) on the map I put together, I’ll be getting married tomorrow to the lovely Jessica Allan.

From Chicago to Champaign-Urbana to Manchester, NH to Cambridge, MA, with more late nights and late flights than I care to remember, more love and devotion than I can describe, and more acceptance of my tendencies to show utter obsession with anything I’m doing than I could possibly have expected, my relationship with Jess has flourished, and I couldn’t be happier to be tying the knot tomorrow.

Okay, so the relation this has to mapping or technical ramblings is tentative at best, but I still think it’s cool. 🙂

Job Opening at MetaCarta

Posted in Locality and Space, MetaCarta on June 6th, 2007 at 09:56:21

New job posting at MetaCarta:

MetaCarta Labs seeks an enthusiastic software engineer to join the team in developing new ideas and innovative products in both open source software and in MetaCarta’s applications, which connect documents to maps using geoparsing.

Check the link for more info.

Essentially, this will entail working on the stuff that you see coming out of MetaCarta Labs. Someone local is best. It should also afford you the opportunity to spend long hours working with me, for better or for worse. 🙂

Google Developer Day: Theorizing from Data

Posted in GDD07 on May 31st, 2007 at 19:22:39
  • Growth of available text corpuses: 100 Trillion Words on the internet.
  • LDC N-gram corpus: 13 million unigrams (much more than OED).
  • Parametric, semi-parametric, non-parametric, hand-built, blank slate data-derived models.
  • What you can do with data: What do you do with a copy of the internet?
  • Google Sets, clustering algorithm
  • Extraction via regex -> extraction via relations -> extraction via example
  • Unsupervised learning/machine reading: determining relationships from text automatically; patterns; extraction of facts based on patterns.
  • Learning classes and attributes of classes: Planets and stars. Attributes: size, temperature, density. Other example: Video games, attributes are makers, developers, designers, etc.
  • What’s the right source for attributes: documents on the web vs. queries. Documents are informative, but hard to parse; queries are easy to parse and short. The attributes from each are different: ‘manufacturer, dose’ from documents vs. ‘side effects, half life, mechanism of action’ from queries (for the same class of classes).
  • Statistical Machine Translation: parallel texts. A web page in English and a web page in German correspond: align them, but you can’t do that to start, so do it probabilistically until you have some data. Search for the optimal solution, the best English translation of a foreign sentence: the probability of the English, scored against lots of data.
  • Showing Arabic -> English translation, Chinese -> English. Arabic: roughly one disfluency from English per sentence; three in Chinese.
  • Take a sentence in Chinese, look the words up in the dictionary, look up their probabilities, determine the probability of bigram combinations and pull them out. Use trigram sequences, etc. Enumerate, multiply probabilities; out of all of them, one is most probable.
  • Translation Model, Language Model, Decoding Algorithm.
  • Translation Model: counting up parallel corpus counts. Phrase pairs: bring in a linguist? Tried that; it turned out to hurt as much as it helped: theories work right most of the time but there are exceptions, and statistics are better at the exceptions while dealing with the general data well enough, so statistical is the choice.
  • Language Model: Google, 7-grams over 1 trillion words, vs. 3-grams over 1 billion words in the past.
  • The English model is better: a parallel corpus is harder to find. Why not apply the same reasoning to all factors?
  • Features that don’t help: parse trees, part-of-speech tags, WordNet categories, treebank, framebank didn’t help. Raw counts from data are the only thing that helps. It’s useful information, but we haven’t figured out how to make it help.
  • More data helps: Still adding more training data (linear).
  • Scaling: how many bits to store probabilities? 64, 32, 16 bit? Trillions of numbers: 8GB vs. 1GB. Empirical answer: the minimal number of bits that doesn’t lose performance. The answer was 4 bits: almost no difference between 4-bit and 8-bit. The number of bits only matters up to about 4 bits.
  • Word alignment needs a lot of memory because it stores all lexical co-occurrences: all combinations of words. Possible solutions: stemming (produced -> produce) or truncating (produced -> produc). Not as accurate, but maybe good enough: use the smallest representation that does not hurt alignment. Truncating at 7 characters: not much effect; 6 characters, 5 characters: better; 4: better still. 4 characters works best. Saves a lot of space; re-run the experiment if it changes. Empirical science.
  • Lots of data is important -> Happy user.
  • Better models learned from data: lots of writers who do editor jobs -> happy user.
  • 3 turning points in the history of developing information: Sebastian Brant: “There is nothing nowadays that our children fail to know,” re the Gutenberg press.
  • Ben Franklin: “Common person whole book: farmers are as intelligent as most gentlemen” re Public Library
  • Bill Clinton: “America has changed the way it works, learns and communicates,” re the Internet.
  • Google is going to continue the trend.
  • Questions?
  • Q: The data that has been analyzed the most is stock market data: have you taken any of the theories from people in that field and applied them here? A: “A Google hedge fund might be a good idea,” but it’s always difficult to try to beat the market. Even if you understand it, if everyone else does too, it doesn’t help you.
  • Q: “Last slide — author -> user. Are you working on any tools assisting authors in what to write, based on what people are searching for?” A: “People do research before they write things, and make more informed arguments because they look at that. Looking for a better way to close the loop. What are people interested in? Google Trends: live trends.” Q: “Enter a body of text, rank it against search queries?” A: “We provide analytics for ‘are you reading this’, but not ‘why’. Interesting area.”
  • Q: “You show instances of automated translation with disfluencies — is automated translation close, far away, etc.?” A: “It’s hard to get a lot better. Still improving, but there are limitations to the approach. They have more to do with the real world — all we can do is filter through the language on the page, and sometimes that’s not enough. In English we have no gender agreement: ‘it’ can’t easily be masculine or feminine. ‘Dropped a brick and it broke’: can’t tell which one broke, because that’s physics, not linguistics. Maybe a combination of automatic translation plus post-editing by humans gets true fluency. 100% automated is tough.”
  • Q: “Speculation — Google built 411 so Google could gather spoken data. Is that true? Any numbers on non-textual data?” A: “Thought 411 was great: hook up local products, and gather more training data; more data is better for your products because you have that data. Interested in non-textual data, starting to branch out a bit more: various areas, maps, APIs, think that’s very important. Image search for many years threw away the images and looked at the text for determining results; now we’re starting to do image analysis to get text from still images and video on YouTube.”
  • Q: “Followup: speech to text. Intuitions as to whether anything here works for speech recognition?” A: “Applicable, but with speech it’s harder to get feedback on where you’re right and wrong. If you had a good source of that, it would be possible: the translation model and language model would apply if you have the right place to get data. The more data you get, the better you do.”
  • Q: “Can you help us with spam? Really big problem — comment spam. Submit every comment, and get back a response.” A: “Interesting, so we should have a name (on vacation) for this. We have done nofollow, but it’s still a problem. Intriguing: submit, get back spam/not spam. I can’t commit, but it’s something worth looking into.” (Ed note: This is Akismet. Use it. It rocks.)
  • Q: (Missed) A: “We look at what people are clicking on; that can be a clue to what they’re actually looking for. Do your queries appear on the page? In links? In the title? PageRank? Other factors? Are people clicking on this page? Other clues. All that data is useful.”
  • Q: “How do you measure the difference between someone finding something and being satisfied?” A: “Keeps me up at night. Worked at a company that got that from Amazon: at Amazon, you get a credit card purchase, and that’s a success. We never have that. We show some results, you click on a result: we don’t know. Toolbar tests, but mainly learned that no one clicks ‘yes’, only ‘no’. We observe what happens, but not whether you liked it. Many millions of examples: don’t trust any one click, but add them all up and hope that they work. At Berkeley, the CTO from (couldn’t hear): ‘what we do is measure ourselves against Google.’ We’re 100% on that metric. But that’s not good enough.”
  • Q: “How can you tell that a website isn’t using machine translation to begin with?” A: “We had that problem. Arabic showed up; we didn’t have many Arabic speakers, so it’s hard to catch. Arabic speakers said ‘that’s just junk machine translation.’ There’s a business model around trying to make money off machine translation. Spam detection: good vs. bad. It’s a problem, but we think we can deal with it.”
  • Q: “Britannica vs. Joe’s Weblog: How do you take into account slang vs. formal language?” A: “News sites vs. non-news sites is useful. Slang vs. not-slang — just go by the totals.”
  • Q: “Would this work with OCR?” A: “OCR is another thing we’re doing — book scanning, ties into all these, OCR systems do a good job, trying to correct things with language models, can improve output of OCR by better model of what makes sense.”
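
The n-gram scoring described in the translation notes above can be sketched with bigrams and maximum-likelihood counts (my own toy corpus; real systems use higher orders, smoothing, and log-space arithmetic):

```python
from collections import Counter

# Toy bigram language model: score a sentence as the product of
# P(word | previous word), estimated by counting a tiny corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def sentence_probability(words):
    """Multiply the conditional probability of each word given its
    predecessor; unseen bigrams get probability zero (no smoothing)."""
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

# A word order the corpus supports scores higher than a scrambled one.
likely = sentence_probability("the cat sat on the mat".split())
unlikely = sentence_probability("the mat sat on the cat".split())
```

Out of all candidate word sequences, the decoder's job is just to find the one this kind of product ranks highest.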

Google Developer Day: Python Design Patterns

Posted in GDD07 on May 31st, 2007 at 18:06:06

Some notes from the Python Design Patterns:

  • Largely talking about Facade vs. Adapter: a Facade creates a simple interface around a very rich interface, limiting you to what the client needs. Adapters are well suited for the small scale; a Facade is better when you have a large API you want to hide.
  • Favor object composition over class inheritance.
  • If you’re using holding (I just have an object), it can be on the wrong axis, or you might want to change the internals. This concern applies to JavaScript as well: we run into it all the time in OpenLayers. Should a Geometry have a Feature? Should a Feature have a Geometry? Both?
  • Object wrapping: Law of Demeter: you should only have one dot in your path. The client only talks with the wrapper; delegate under the covers.
  • “Factory” is essentially built into Python.
  • Examples of adapters: two tools which provide the same functionality, per-class subclassing, passing instances into a wrapper, etc. “Mixins are the smartest usage of multiple inheritance”: inherit from two classes, and override the method you don’t want called to call the one you do want called. “Mix and match.” This is what OpenLayers uses, I think: Schuyler says so, anyway 😉
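
The mixin trick from that last bullet is easy to sketch (the class names are my own toy example): inherit from two classes and put the mixin first, so its method is the one Python's method resolution order finds.

```python
# Mixin sketch: FancyWidget inherits from two classes. Putting the
# mixin first in the base list means its draw() is found before
# Widget's, so the mixin's behavior wins.
class Widget:
    def draw(self):
        return "plain widget"

class FancyDrawMixin:
    def draw(self):
        return "fancy widget"

class FancyWidget(FancyDrawMixin, Widget):
    # No body needed: the MRO does the mixing and matching.
    pass

rendered = FancyWidget().draw()
```

Swap in a different mixin (or none) and the same Widget class gets different behavior, which is the "mix and match" point.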

Speaker seems very intelligent, but a bit dull as a speaker: he’s also probably speaking significantly over my head. 🙂

Google Developer Day: Gears

Posted in default, GDD07 on May 31st, 2007 at 13:55:02

At the Google Developer Day. Sitting in the Google Gears session: it’s pretty frickin cool. I just created an OpenLayers Map, wrote 15 lines of code, and that page will now load, even when I’m offline.

Other things it does:

  • WorkerPool — run code in the background in your browser. Demoing finding prime quadruplets: the user interface continues to be responsive — can run multiple workers at the same time, and the user experience is not bad. (Oh man, how I wish this were built in by default.)
  • Storage — local storage in a sqlite database, including full text search: millions of documents now, with work on fts3 that will handle tens of millions.
  • Local server — caches data for offline use. 
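
The storage piece is ordinary sqlite full-text search, which you can poke at from Python too; this sketch shows the underlying sqlite capability, not the Gears JavaScript API:

```python
import sqlite3

# Gears-style local storage: a sqlite database with full-text search.
# (Gears exposes this to JavaScript in the browser; this is the same
# sqlite feature driven from Python. fts4 is used here; availability
# of the FTS modules depends on how your sqlite was compiled.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [("offline maps", "cache tiles locally so pages load without a network"),
     ("worker pool", "run background code without blocking the UI")])

# Full-text MATCH queries search across all indexed columns.
hits = [row[0] for row in
        conn.execute("SELECT title FROM docs WHERE docs MATCH 'offline'")]
```

The browser-side difference is just that Gears hands this database to JavaScript, and the local server makes the page itself loadable offline.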

There are tons of limitations — very early release — but it’s available for Firefox and IE, and can be built for Safari.

Week O’ Fun

Posted in Locality and Space on May 27th, 2007 at 02:01:45

The upcoming week is going to be a ton of fun:

  • Where 2.0: 29th-30th. I’ll be presenting OpenLayers at 4:30 on the 29th, and during the conference I’ll also be figuring out how to sanely hand out my limited supply of MetaCarta Labs On A Stick drives.

    Where last year was great: it was my first introduction to a lot of geo people, and I’m looking forward to it this year. It’s a great way to get out into the world a bit more and meet the people who actually might have a use for all the code I write.

  • Google Developer Day: May 31st. Although I’m not a huge Google person, this should be a good way to try to catch up to the mainstream in web mapping. Google has for a long time set the standard for web mapping, and I’m not convinced yet that this isn’t still the case (No matter how much I love OpenLayers).
  • WhereCamp: Jun 2-3. Anselm Hook and Ryan Sarver have put together WhereCamp, the geounconference, for the weekend after Where. Hosted at the Yahoo! campus, this is going to be a great event for all geohackers, and I’m hugely looking forward to it. 

I’ve still got to finish my OpenLayers presentation, and finalize the content for these Labs-on-a-Stick drives, but I’m hoping that once I do I can sit back, and hopefully get in some good time doing OpenLayers hacking: Schuyler, myself, Erik and Tim are all going to be in town, so we should be able to move forward on some stuff while we’re all in the same physical location.

Looking forward to meeting anyone and everyone at the conference… feel free to drop a comment if you feel like we should meet up and haven’t already made plans.