PersonalProfileDocument Parsing

Earlier today, on the OpenID mailing list, I was asked to supply Perl code to look for PPDs (PersonalProfileDocuments) in FOAF docs and return some basic properties of the user who owns the FOAF file. My Perl skills have long since fallen by the wayside, but I was able to put together something in Python that seems to work pretty well.

ppd.py is a FOAF parser that uses xml.dom.minidom to look for a PPD and parse out a couple of basic forms of the Personal Profile Document, for cases in which you can’t bring a full RDF parser to bear on the situation. (I know that the question of when this arises has been argued a million times, but an RDF parser is an extra dependency that some projects simply have no interest in taking on.)

This parses two basic forms of PPD: one in which the foaf:maker is identified by an rdf:nodeID="nodename", and one in which the foaf:maker is identified by an rdf:resource="#nodename" coupled with an rdf:ID="nodename".
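To make the two forms concrete, here is a minimal sketch of how the lookup could be done with xml.dom.minidom. This is not the code from ppd.py: the function names find_maker and get_name are hypothetical, and the sketch assumes that the node the foaf:maker points at is a foaf:Person carrying the matching rdf:nodeID or rdf:ID.

```python
from xml.dom import minidom

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def find_maker(xml_text):
    """Return the foaf:Person element named as maker of the PPD, or None.

    Hypothetical helper: it only understands the two forms described
    above and makes no attempt at general RDF/XML parsing.
    """
    doc = minidom.parseString(xml_text)
    people = doc.getElementsByTagNameNS(FOAF_NS, "Person")
    for ppd in doc.getElementsByTagNameNS(FOAF_NS, "PersonalProfileDocument"):
        for maker in ppd.getElementsByTagNameNS(FOAF_NS, "maker"):
            # getAttributeNS returns "" when the attribute is absent.
            node_id = maker.getAttributeNS(RDF_NS, "nodeID")
            resource = maker.getAttributeNS(RDF_NS, "resource")
            for person in people:
                # Form 1: maker and person share an rdf:nodeID.
                if node_id and person.getAttributeNS(RDF_NS, "nodeID") == node_id:
                    return person
                # Form 2: maker has rdf:resource="#x", person has rdf:ID="x".
                if (resource.startswith("#") and len(resource) > 1
                        and person.getAttributeNS(RDF_NS, "ID") == resource[1:]):
                    return person
    return None  # no PPD, or no Person matching its maker


def get_name(person):
    """Return the text of the first foaf:name under person, or None."""
    names = person.getElementsByTagNameNS(FOAF_NS, "name")
    if names and names[0].firstChild is not None:
        return names[0].firstChild.data
    return None
```

Returning None when nothing matches leaves the fallback decision to the caller, which keeps the matching logic separate from whatever default behaviour a project wants.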

This hasn’t been fully tested: it was mostly done as a quick proof of concept that people could expand on. I’ve tested the nodeID case, and I’ve tested (against LiveJournal files) that it falls back if it can’t find an appropriate PPD. I’m not sure how Pythonic my code is, but it does seem to work, which was my primary concern.

As usual, this code is designed to be used at the command line as "python ppd.py http://crschmidt.net/foaf.rdf", or imported as a module, after which you can run ppd.get_person("http://crschmidt.net/foaf.rdf").

Thoughts on the method? Will this work with a sufficiently constrained FOAF doc?

One Response to “PersonalProfileDocument Parsing”

  1. Darren Chamberlain Says:

    In a FOAF doc without a PersonalProfileDocument element, like the one I’m testing with, ppd.py has a problem:


    Traceback (most recent call last):
      File "ppd.py.orig", line 70, in ?
        print get_person(sys.argv[1])
      File "ppd.py.orig", line 49, in get_person
        if not target:
    UnboundLocalError: local variable 'target' referenced before assignment

    This can be fixed by moving the target = None line from line 30 to line 20 and reindenting it suitably (I’d paste the diff but the formatting would be wrong).
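
The failure described in this comment can be reproduced in a few lines. The following is only an illustrative sketch, not the actual ppd.py code: the function names and the "fallback" return value are hypothetical. It shows why assigning target = None inside a loop fails when the loop never runs, and how moving the assignment above the loop avoids it.

```python
def pick_buggy(items, wanted):
    for item in items:
        target = None          # only bound if the loop body runs at least once
        if item == wanted:
            target = item
            break
    if not target:             # UnboundLocalError when items is empty
        return "fallback"
    return target


def pick_fixed(items, wanted):
    target = None              # bound before the loop, so always defined
    for item in items:
        if item == wanted:
            target = item
            break
    if not target:
        return "fallback"
    return target
```

With an empty list, pick_buggy raises UnboundLocalError while pick_fixed returns the fallback value, which mirrors the case of a FOAF doc with no PersonalProfileDocument element.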