Archive for the 'Programming Languages' Category

PersonalProfileDocument Parsing

Posted in FOAF, Python on May 23rd, 2005 at 18:14:44

Earlier today, on the OpenID mailing list, I was asked to supply Perl code to look for PPDs in FOAF docs and return some basic properties of the user who owns the FOAF file. My Perl skills have long since fallen by the wayside, but I was able to put together something in Python which seems to work pretty well.

ppd.py is a FOAF parser that uses xml.dom.minidom to look for a PPD and parse out a couple of basic forms of the Personal Profile Document, for cases in which you can’t bring a full RDF parser to bear on the situation. (I know that the question of when this arises has been argued a million times, but an RDF parser is an extra dependency that some projects simply have no interest in taking on.)

This parses two basic forms of PPD: one in which the foaf:maker is identified by rdf:nodeID="nodename", and one in which the foaf:maker is identified by rdf:resource="#nodename", coupled with a matching rdf:ID="nodename".

This hasn’t been fully tested: it was mostly done as a quick proof of concept that people could expand on. I’ve tested it on the nodeID case, and tested that if it can’t find an appropriate PPD, it falls back (against LiveJournal files). I’m not sure how python-esque my code is, but it does seem to work, which was my primary concern.

As usual, this code is designed to be used at the command line as “python ppd.py http://crschmidt.net/foaf.rdf”, or imported as a module, after which you can call ppd.get_person("http://crschmidt.net/foaf.rdf").
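ppd.py itself isn’t reproduced in this post, so here is a minimal sketch of how the two PPD forms can be matched with xml.dom.minidom. It is not the actual ppd.py: the function names find_maker and basic_props are hypothetical, and the sketch assumes that the PPD is typed as a foaf:PersonalProfileDocument element with foaf:maker as a child, and that the person is a foaf:Person element elsewhere in the same file.

```python
import xml.dom.minidom as minidom

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def find_maker(doc):
    """Return the foaf:Person element named by a PPD's foaf:maker, or None."""
    people = doc.getElementsByTagNameNS(FOAF_NS, "Person")
    for ppd in doc.getElementsByTagNameNS(FOAF_NS, "PersonalProfileDocument"):
        for maker in ppd.getElementsByTagNameNS(FOAF_NS, "maker"):
            # getAttributeNS returns "" when the attribute is missing.
            node_id = maker.getAttributeNS(RDF_NS, "nodeID")
            resource = maker.getAttributeNS(RDF_NS, "resource")
            ref = resource[1:] if resource.startswith("#") else ""
            for person in people:
                # Form 1: foaf:maker rdf:nodeID="x" -> foaf:Person rdf:nodeID="x"
                if node_id and person.getAttributeNS(RDF_NS, "nodeID") == node_id:
                    return person
                # Form 2: foaf:maker rdf:resource="#x" -> foaf:Person rdf:ID="x"
                if ref and person.getAttributeNS(RDF_NS, "ID") == ref:
                    return person
    return None


def basic_props(person, names=("name", "nick", "mbox_sha1sum", "homepage")):
    """Collect direct foaf:* children of person: rdf:resource if present, else text."""
    props = {}
    for child in person.childNodes:
        if (child.nodeType == child.ELEMENT_NODE
                and child.namespaceURI == FOAF_NS
                and child.localName in names):
            text = "".join(t.data for t in child.childNodes
                           if t.nodeType == t.TEXT_NODE).strip()
            props.setdefault(child.localName,
                             child.getAttributeNS(RDF_NS, "resource") or text)
    return props
```

Usage would be along the lines of person = find_maker(minidom.parseString(foaf_text)); when it returns None, there is no PPD in a recognized form and the caller has to fall back to something else. Only direct children of the person are read, so names inside nested foaf:knows entries are not picked up by mistake.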

Thoughts on the method? Will this work with a sufficiently constrained FOAF doc?

PHP and Redland

Posted in PHP, Redland RDF Application Framework on May 8th, 2005 at 09:49:27

Recently, I moved most of my serving to a colocated machine, so I finally have a “testing” machine and a “stable” machine, which leaves me freer to play around locally. This has led to me downloading a Rasqal nightly release and installing it, in an attempt to get the newer SPARQL query syntax working in my RDF bot, so that I can test query type detection and the like.

I had no problems installing it: very simple, just download the nightly, ./configure, make, make install. I got it working in my local “julietest”, although I’m waiting until the next release before I consider installing it on the remote server.

I got it working in PHP from the command line, no problems.

However, no matter what I do, the web version still seems to be using the old query syntax, and I don’t have any clue why. If you go to http://zeus.crschmidt.net/julie/sparql, you can test it out, and it only returns data if you use the old query format. However, if I copy the same script locally and run the exact same query, it doesn’t work: locally, it requires the new format.

I don’t understand it, and I don’t know if anyone else does either. The PHP in Apache2 and CLI both have almost exactly the same phpinfo(), they both have the same extension directory, and there isn’t a second copy of redland.so for the Apache version to load anyway! If anyone has run into this problem before and knows how to fix it, I’d appreciate it, because right now I’ve given up and am waiting for a release before I debug further.

(This post brought to you in part by the effort to bump all of Danny’s posts off of PlanetRDF while he’s on vacation. ;))

MySQLdb in Python

Posted in Python on May 5th, 2005 at 07:28:55

I was just looking around for a tutorial on working with MySQL in Python, and found a great Intro to MySQLdb in Python page. Since I know some of you reading this are fans of Python, and may have to work with MySQL at some point, I thought I’d pass it along to those of you who might go looking for it.

It’s not advanced, just an intro, but quite useful, in my opinion, for what it is.

GovTrack RDF Data

Posted in PHP, Redland RDF Application Framework, SPARQL on April 25th, 2005 at 20:10:02

One of the larger sources of RDF data that I’ve loaded into a database, the GovTrack RDF data is an interesting set with all kinds of information on congressmen and so on. I recently started paying a bit more attention to the Gargonza Experiment, and found a link to their data source via the wiki.

I’ve been playing with setting up SPARQL stuff all day, and have a couple of simple pages set up from my new GovTrack page. Loading the entire dataset (the RDF/XML, at least: I left out the N3 bits for the time being) took a long time, and I did some tweaking of MySQL in the process to let me load data faster. Some things I learned about optimizing load time with Redland:

1. MySQL’s key cache size is important when loading large data stores.
2. When loading statements, if you really want to optimize your load time, load with contexts. Redland will not check for duplicate statements in this case: This can be a major time saver. However, this may slow down later work, so it will probably not be worth it in the long term.
3. Loading into an already existing Redland database, even in a new model, will not increase speed: since Bnodes, Literals, and Resource tables are database wide, the selects to determine existing statements will still be just as slow as if you were loading into the existing models.

I also discovered that my QueryResults->result() method was returning actual Redland nodes, rather than the wrapped Redland.php::Node. I suppose at one point I probably realized that, but it had slipped my mind. This made it really difficult to deal with things like optionals: calling librdf_node_to_string in the PHP bindings causes a segfault if the node is NULL, and I found no decent way to check whether the node is NULL.

To compensate, I created a new way to create nodes (basically a copy constructor). This allowed me to check at node creation time whether it was a Resource/Bnode/Literal, which are the only types of Nodes there are. If it’s none of the above, I make it a PHP NULL, which I can check for, and it won’t crash PHP.
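The PHP wrapper code isn’t reproduced here, but the guard itself is easy to show. A minimal Python sketch of the same idea, where RawNode is a hypothetical stand-in for the raw SWIG-level node (kind None meaning NULL) and the hypothetical Node.from_raw plays the role of the copy constructor: it classifies the node once, at construction time, and hands back None for anything that isn’t a resource, blank node, or literal.

```python
from typing import Optional


class RawNode:
    """Hypothetical stand-in for a low-level (SWIG) node handle.

    kind is None when the query left the variable unbound (NULL),
    e.g. an optional pattern that did not match.
    """

    def __init__(self, kind=None, value=None):
        self.kind = kind
        self.value = value


class Node:
    # Resources (URIs), blank nodes and literals are the only node types.
    KINDS = ("resource", "blank", "literal")

    def __init__(self, kind, value):
        self.kind = kind
        self.value = value

    @classmethod
    def from_raw(cls, raw: Optional[RawNode]) -> Optional["Node"]:
        """Copy-construct a wrapper, or return None for a NULL/unknown node."""
        if raw is None or raw.kind not in cls.KINDS:
            return None
        return cls(raw.kind, raw.value)

    def __str__(self):
        return str(self.value)
```

Callers then test for None before converting anything to a string, which is the check that was missing when the raw node went straight to librdf_node_to_string.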

I have learned the many different ways to segfault PHP over the past week working on Redland. Of course, they all relate to PHP doing funky things with a SWIG wrapper, but it’s still one of the more interesting experiences I’ve had.

With the new PHP code, all of the SPARQL interfaces I’ve set up (one for Julie, one for XTech, and one for GovTrack) support optionals. This has allowed me to create things like the GovTrack Senators page (example for New Hampshire), which lists some profile information about all the Senators from your state. (Representatives are more difficult. I’m still working on that.)

Anyway, the GovTrack data is fun to play with, although I really need to develop some more interesting interfaces over the data. I plan to do that; I just haven’t gotten there yet. These tools take time to develop, but they do feel really nifty. I would go into why I feel it’s nifty, but I almost always end up feeling like a complete and utter geek when I do, and it makes people look at me strangely, so I’ll skip it this time.

RDF Query

Posted in Perl, RDF, SPARQL on April 21st, 2005 at 14:50:13

Apparently the anxious type, Greg Williams has thrown together an RDF Query implementation in Perl, with support for the new SPARQL draft as of yesterday.

The library also offers ORDER BY support, something that I’m sure Greg is happy to have for his MT-Redland. Ordering things by date is something I’ve sidestepped so far, and I’m not looking forward to the day I actually have to deal with it.
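Without ORDER BY in the query engine, the fallback is sorting on the client. A minimal Python sketch, assuming each result row is a plain dict mapping variable names to strings and that the dates are ISO 8601 values (xsd:date or xsd:dateTime lexical forms). order_by_date is a hypothetical helper, not part of Greg’s library or of Redland, and it needs Python 3.7+ for date.fromisoformat.

```python
from datetime import date


def order_by_date(rows, var, descending=False):
    """Sort result rows (dicts of variable name -> string) by an ISO 8601 date.

    Only the leading YYYY-MM-DD part is compared, so dateTime values work too.
    Rows where var is unbound or empty keep their original order and go after
    the dated ones (a choice made for this sketch, not SPARQL's own rule).
    """
    dated = [r for r in rows if r.get(var)]
    undated = [r for r in rows if not r.get(var)]
    dated.sort(key=lambda r: date.fromisoformat(r[var][:10]), reverse=descending)
    return dated + undated
```

Parsing the dates rather than comparing raw strings means a malformed value raises an error instead of silently sorting in the wrong place.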

Greg’s library uses Parse::RecDescent to build its query parser directly from the SPARQL grammar. Greg mentions that it is slow: most of the time is actually spent generating the parser from the grammar.

If only I were still a Perl hacker… sadly, I’m not, so I suppose I’ll just have to start working on my C in order to help get Redland working with the new draft. (Dave estimates that it will take him about 1.5 months to catch up to the most recent WD of SPARQL.) I’d really love to keep using the tools I’ve already written in Python, rather than switching to Perl, or even to a backend other than Redland. It has worked so well for me so far.

Still, this is the first SPARQL implementation using the new Draft that I’m aware of, even if it is mostly just a hack job, so I think that it’s pretty cool, and my props are out to Greg for his work on it!

Redland PHP Wrapper

Posted in PHP, RDF, Redland RDF Application Framework on April 17th, 2005 at 20:01:46

Today, I was working with the XTech Stuff, and decided I wanted to offer some fun Redland-based queries against it. Since the entire website is in PHP, I decided to stick with that theme, and write some PHP code.

I had the PHP bindings installed from a couple days ago, for… something I don’t exactly remember. I had some grand goal in mind… oh, right, I was going to provide my logo information in RDF, and parse it out using PHP.

Something I realized today is that there is no decent Redland wrapper class for PHP like there is for Python and Perl. SWIG provides interfaces, but that basically just gets you to the level of the C API, which is a bit low-level for me.

To resolve this, I’ve written a PHP wrapper class, which I hope to maintain and improve upon. It is stored in a Subversion repository, which you can check out using:

svn co http://crschmidt.net/svn/redland/

Please feel free to use the Trac project to help with development.

Status: Beta quality. Has only been tested using the included test.php script. Does not do proper memory checks in most, if not all, cases.
License: This wrapper is released under the same license as Redland itself.
Homepage: phpwrapper.

Parsing SVG Metadata

Posted in Python, RDF, Redland RDF Application Framework, Semantic Web, SVG on April 7th, 2005 at 15:12:48

How to Parse SVG Metadata, the Redland + Python way:

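import xml.dom.minidom as minidom
from urllib.request import urlopen

import RDF  # Redland Python bindings

m = RDF.Model()
p = RDF.Parser()
# Download the SVG file ("Location Of SVG File" is a placeholder for its URL)
svg = urlopen("Location Of SVG File").read()
doc = minidom.parseString(svg)
# Serialize the first rdf:RDF element back to XML and parse it into the model,
# using the SVG file's URL as the base URI
p.parse_string_into_model(m, doc.getElementsByTagName("rdf:RDF")[0].toxml(), "Location of SVG File")
print(m)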

In other words: bring in the RDF and minidom modules, create an RDF model and parser, download the SVG file to a string, parse the string into a minidom document, then find the RDF in the SVG file, parse it into the model, and serialize the model.

Problem: what if someone uses something other than rdf: as the prefix?
Solution: mattmcc points out that minidom supports getElementsByTagNameNS, so the parse line would become:
p.parse_string_into_model(m, doc.getElementsByTagNameNS("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF")[0].toxml(), "Location of SVG File")
resolving the namespace issue.
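To see the prefix problem concretely, here is a minimal minidom-only sketch (no Redland needed) using a made-up SVG whose RDF uses an r: prefix. It also handles one extra detail, as an addition to the approach above: toxml() only writes the attributes present on the element itself, so when the namespace declarations live on the <svg> root (which is common), the extracted fragment refers to undeclared prefixes. The sketch copies inherited xmlns declarations onto the RDF element before serializing; extract_rdf_xml and SAMPLE_SVG are hypothetical names.

```python
import xml.dom.minidom as minidom

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_rdf_xml(svg_text):
    """Return the first RDF/XML block in an SVG document, whatever its prefix, or None."""
    doc = minidom.parseString(svg_text)
    blocks = doc.getElementsByTagNameNS(RDF_NS, "RDF")
    if not blocks:
        return None
    rdf = blocks[0]
    # toxml() only writes this element's own attributes, so copy namespace
    # declarations inherited from ancestors (the nearest declaration wins).
    node = rdf.parentNode
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            if (name == "xmlns" or name.startswith("xmlns:")) and not rdf.hasAttribute(name):
                rdf.setAttribute(name, value)
        node = node.parentNode
    return rdf.toxml()


SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <r:RDF><r:Description r:about=""><dc:title>Logo</dc:title></r:Description></r:RDF>
  </metadata>
</svg>"""
```

Here getElementsByTagName("rdf:RDF") finds nothing at all, while extract_rdf_xml(SAMPLE_SVG) returns a fragment that parses on its own and could then go to parse_string_into_model.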

Of course, since this is Redland, this is taken care of for you. Rather than doing it in this way, which is specific to SVG, we can scan for RDF in any XML doc. Simply:

import RDF

m = RDF.Model()
p = RDF.Parser()
# Tell Raptor to scan the XML document for embedded RDF/XML
p.set_feature("http://feature.librdf.org/raptor-scanForRDF", "1")
p.parse_into_model(m, "URL Of SVG File")

There are a number of other features you can use with a Parser. They are available via rapper -f help, but here’s a list: assumeIsRDF, allowNonNsAttributes, allowOtherParsetypes, allowBagID, allowRDFtypeRDFlist, normalizeLanguage, nonNFCfatal, warnOtherParseTypes, checkRdfID.

Naturally, Redland already does what I want it to do. Another pat on the back for Dave (and thanks to him for pointing it out).

TrafficCam, Version 3

Posted in Python, Symbian Python on February 20th, 2005 at 05:42:21

Apparently, when the TrafficCam flash program was released, Justin was opening a can of worms that was bigger than I could have imagined.

After my example of quick development on the Python app, I got a lot of interest in my own TrafficCam application. Suddenly, there was a London version. And a Dubai version. And every time I mentioned it, someone else wanted to create their own version and load it in so that they could use the same nifty features. If there’s one thing that I do right, it’s listen to what my users are telling me. So, this afternoon, upon the arrival of my new Nokia 6600, I got to work.

First step: Build a file loader. This function should take a file of predefined format and read it in over the web, letting you specify some parameters to the program. This data should then be returned in a way that the application can use. We can’t make the file format too complex: the default Python install on the phone comes with no XML support, remember, so we’re using a very basic, tab-delimited layout for this. The format is pretty simple: it’s described in the TrafficCam Format Documentation, for those of you who may want to use it.
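The real layout is in the TrafficCam Format Documentation, which isn’t reproduced here, so the record layout below is made up purely for illustration. A minimal sketch of a tab-delimited loader in Python 3 (the phone itself ran Python 2.2, so this would not run there unchanged); parse_camera_list and load_camera_list are hypothetical names.

```python
from urllib.request import urlopen

# Hypothetical record layout (NOT the real TrafficCam format):
# one record per line, fields separated by tabs.
#   title<TAB>Application title
#   camera<TAB>Group/tab name<TAB>Camera name<TAB>Image URL
# Blank lines and lines starting with "#" are ignored.


def parse_camera_list(text):
    """Return (title, {group: [(camera name, image URL), ...]}) in file order."""
    title = None
    groups = {}  # dicts keep insertion order in Python 3.7+
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] == "title" and len(fields) == 2:
            title = fields[1]
        elif fields[0] == "camera" and len(fields) == 4:
            _, group, name, url = fields
            groups.setdefault(group, []).append((name, url))
        else:
            raise ValueError("line %d: unrecognized record %r" % (lineno, line))
    return title, groups


def load_camera_list(url):
    """Download a camera list over the web and parse it."""
    with urlopen(url) as response:
        return parse_camera_list(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download makes it easy to check a new city’s file before pointing the app at it.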

Step Two: Build the app around the data. Given a specific URL, construct the entire application setup, from the tabs to the title to the listings, from that returned data. Not too hard: it required a little bit of changing how I did things so that it could be reloaded easily, but as always, Python was cooperative.

Step Three: Build a frontend for choosing which URL to load data from. Store a title and a URI, and let people choose which to load. Not too bad: using the popup_menu that Symbian provides, you can easily associate the resulting choice with your earlier list.
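Because popup_menu hands back a position in the list rather than the item itself, mapping the choice back to a URL is just an index lookup. A minimal sketch, assuming (as with PyS60’s appuifw.popup_menu) that the menu function takes a list of labels plus a prompt and returns the chosen index, or None on cancel. choose_source is a hypothetical helper, and the menu function is passed in so the logic can be tried off the phone.

```python
def choose_source(sources, popup_menu):
    """Ask the user to pick one of sources, a list of (title, URL) pairs.

    popup_menu is expected to behave like appuifw.popup_menu: given a list
    of labels and a prompt, it returns the chosen index, or None on cancel.
    """
    index = popup_menu([title for title, _ in sources], u"Load cameras from:")
    if index is None:
        return None
    # The labels were built in the same order as sources, so the index maps straight back.
    return sources[index][1]
```

On the phone, the call would be choose_source(sources, appuifw.popup_menu).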

Step Four: Add support for reloading. Once I’m done with one set of cameras, I want to view another without having to exit and restart. This is a bit more complex: it requires me to move some of the logic around so that the application flow stays mostly the same. In the end, this let me clean up some ugly repetition in the code, which was useful.

Step Five: Make it more user friendly. Add an “Other” option for entering your own URL, add progress meters and information boxes, add exception handling for when a URI doesn’t load correctly, and in general, make the app work better.
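For the exception handling, one approach is to funnel every download through a single helper that reports failures instead of crashing. A minimal Python 3 sketch; fetch_text is a hypothetical name, note is whatever function shows a message to the user (on the phone, something like appuifw.note), and opener defaults to urllib’s urlopen but can be swapped out for testing.

```python
from urllib.request import urlopen


def fetch_text(url, note, opener=urlopen):
    """Download url as text; on failure, report it through note() and return None."""
    try:
        response = opener(url)
        try:
            return response.read().decode("utf-8")
        finally:
            response.close()
    except (OSError, ValueError) as exc:
        # OSError covers URLError/HTTPError and socket failures;
        # ValueError covers malformed URLs and undecodable text.
        note(u"Could not load %s: %s" % (url, exc))
        return None
```

A URL typed in through the “Other” option then either loads or produces a message, and the app stays running either way.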

All in all, I spent six hours yesterday working on the application, and basically rewrote it from top to bottom. It’s now easy to use, and extendable to do whatever people want. I can admit that it’s probably the single most user-friendly application I’ve ever written: almost all my work in the past has been command line based, but this is truly a cool application.

If you have a phone which supports Python, I highly recommend this application. Although I’m sure there are better apps out there, this one is my personal favorite: it lets you get a glimpse of the world through your phone. Of course, you should be aware that this is not a low-bandwidth application: the camera listings are only about 3-4k apiece, but each camera image can be anywhere from 10-15k, sometimes more depending on the cameras you’re using. Yesterday, while doing development, I used up a megabyte of GPRS bandwidth – luckily, I have unlimited GPRS through my provider.
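As a rough check on those figures (taking 1 MB as 1024 KB and ignoring the one-off 3-4k listing downloads), a megabyte covers somewhere between about 68 and 102 camera images:

```latex
\frac{1024\ \text{KB}}{15\ \text{KB/image}} \approx 68\ \text{images},
\qquad
\frac{1024\ \text{KB}}{10\ \text{KB/image}} \approx 102\ \text{images}
```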

If you live in an area where there are traffic cameras, and you’d like to see them added, simply construct a file according to the format documentation, and drop me a line.

Have I mentioned lately that I love Python?

Now, to get working on that contact database export I had in mind…

Development Time

Posted in Python, Symbian Python on February 17th, 2005 at 01:41:40

Russ posted about a pretty cool Flash Lite application that was developed: a way to look at the NYC traffic cameras using your phone. It’s an extremely cool app – if I lived in New York, I’d buy Flash Lite just to be able to use that application. One thing that Russ mentioned was the development time for the project: 20 hours, when something in Java would have taken far longer.

Well, I’m a Python man, not a Flash man, so I can’t get much out of this yet. However, I do think it’s a cool use case, so I did a little research, found out where the Flash app’s data was coming from, and did a little hacking. The result? TrafficCam version 0.1, in Python. It took me 45 minutes to develop a fully functional prototype, including taking the HTML from the NY Transportation site and building it into a Python file, creating a user interface, and downloading and displaying the image in the phone’s built-in image viewer.
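The post doesn’t show the NY Transportation site’s HTML, so this is only a sketch under the assumption that each camera appears as an <img> tag with a src (and maybe an alt) attribute. It uses the standard library’s html.parser to pull out absolute image URLs, which could then be written into a Python file as a camera list; CameraImageFinder and find_camera_images are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CameraImageFinder(HTMLParser):
    """Collect (alt text, absolute image URL) for every <img> with a src."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; self-closing tags land here too.
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src")
        if src:
            self.images.append((attrs.get("alt") or "", urljoin(self.base_url, src)))


def find_camera_images(html, base_url):
    finder = CameraImageFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.images
```

A real page would likely also contain logos and spacer images, so some filtering on the URLs would still be needed.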

Not only did I do all this in 45 minutes, I did it without even having a phone to test with. I passed it off to the owners of a 6600 and a 6630, and both say it works just fine as is.

(Note that I suspect it probably doesn’t, in ways that aren’t visible: no testing has been done yet.)

So, Flash is great for pretty apps (the Python app is *nothing* like the Flash app, which has a great user interface and is really fun to use, even in a browser), but Python can be really great for *quick* apps, especially on the phone.

Update: With another 35 minutes of work, I now have fully functioning tabbed lists, one for each borough. So, with a total of less than an hour and a half of development, I have an application which allows you to download any of the traffic cam images and view them on your phone. It’s no Flash, but I call that pretty damn impressive.

Python Special Interest Group

Posted in Python, Symbian Python on February 2nd, 2005 at 23:50:48

Recently, a number of local Python users have assembled some form of organization, to the point that there are now relatively regular meetings (held before other Linux Users Group gatherings, thus far). With the recent Nokia Python announcement, there’s been some renewed interest in my mobile Python work, so I’m hopefully going to get some of that into shape over the weekend for a demonstration to the group on Monday, assuming I can make the meeting.

For those of you who have an interest in Python: What do you think would be interesting to a bunch of Python coders as a demonstration? Is there something that’s particularly spiffy that I could show off, or convert from being a command line application to being a cell phone application? Note that I’m thinking relatively simple here: I only have a limited memory space to work in, only a small subset of modules to work with, and I’m in Python 2.2.

So, what do you see as being interesting topics/programs to demonstrate to the world the power of Python on the cell phone?

I’m really looking forward to getting together with a bunch of like-minded hackers and picking their brains about what I can do better. I’ve never really had a good development process before, but Python developers seem to have one, especially the ones that I’ve seen discussing things on the mailing list. I’m used to LiveJournal’s spaghetti code, or writing in PHP which is typically not so well tested. It’ll be interesting to enter conversations with a group of more “formal” developers than myself.

Just looking for thoughts on what I should be working on.