Library in RDF

A long time ago, when I first got a Mac, there was a lot of hubbub about a program called “Delicious Library”: an application that would let you scan in your books, with an awesome user interface for searching, storing, lending, and everything else you might want to do with them. At the time, I wanted it, and I wanted it bad, but I decided to wait until I got an iSight: entering hundreds, perhaps up to a thousand, UPCs by hand did not strike me as one of my favored tasks.

March 19th, I got an iSight: a birthday present, from Jess. I thought then “ooh, Delicious Library”, but never got around to it.

This weekend, I was starting to pack up books from the bookshelves. I thought “Hey, I won’t have a clue where any of the books are… unless…”

Jess was out of the house. I downloaded and tried the program: I scanned a full shelf of books (after getting some decent light) and packed them up before I hit my 25-book limit and had to pay the piper. $40 to know where all of these books are after we move (plus a new toy to play with) is well worth it.

I scanned another shelf (and ran out of boxes), then sat down to do the fun part.

First: xml2rdf, an XSLT stylesheet to convert from Delicious Library’s XML format to RDF. One of the biggest problems with this stylesheet is that it needs to know which image files are actually available from Delicious Library: this is where files.xml comes in, which is constructed using the following bash commands:

# Start the manifest of available cover images.
echo "<container>" > files.xml
for i in ~/Library/Application\ Support/Delicious\ Library/Images/Medium\ Covers/*; do
  # Strip the directory part, keeping only the file name.
  j=$(echo "$i" | sed -e 's!.*/!!')
  echo "<image size='medium' name='$j' />" >> files.xml
done
echo "</container>" >> files.xml

This is then loaded with XSLT’s document() function to find out which files are available, which prevents inaccurate <foaf:depiction>s from being spat into the output: Amazon does not store cover images for some books, so until I implemented this fix, there were broken image references.
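For anyone who would rather not do this in the shell, here is a rough Python sketch of the same manifest idea. It is not part of the actual build: the function names build_manifest and has_cover are made up for illustration, and the lookup only imitates the kind of check the stylesheet makes through document().

```python
import os
import xml.etree.ElementTree as ET


def build_manifest(cover_dir, size="medium"):
    """Return a <container> element with one <image> per file in cover_dir."""
    container = ET.Element("container")
    for name in sorted(os.listdir(cover_dir)):
        # Mirrors the shell loop: record the bare file name and its size.
        ET.SubElement(container, "image", {"size": size, "name": name})
    return container


def has_cover(container, name, size="medium"):
    """True if the manifest lists a cover with this file name and size."""
    return any(
        img.get("name") == name and img.get("size") == size
        for img in container.iter("image")
    )


if __name__ == "__main__":
    # Path taken from the shell snippet; adjust it if your covers live elsewhere.
    covers = os.path.expanduser(
        "~/Library/Application Support/Delicious Library/Images/Medium Covers"
    )
    ET.ElementTree(build_manifest(covers)).write("files.xml")
```

A depiction would then only be emitted for a book when has_cover() says the matching file really exists.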

Next: convert.py. Load the file as an RDF model, delete all the existing dc:description statements, and replace them with versions converted from RTF using rtfreader from Brandon’s Program Archive.
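To give a feel for what that step does, here is a minimal sketch, with loud assumptions. It does not load an RDF model: it edits the RDF/XML tree directly with the standard library so that it stays self-contained. The rtf_to_text function is a crude, hypothetical stand-in for rtfreader that only copes with very simple RTF (no font tables, no escaped braces).

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"


def rtf_to_text(rtf):
    """Crude stand-in for rtfreader: strip simple RTF markup from a string."""
    # Turn paragraph breaks (\par, but not \pard) into newlines.
    text = re.sub(r"\\par(?![a-z]) ?", "\n", rtf)
    # Drop remaining control words such as \rtf1, \ansi, \b, \b0.
    text = re.sub(r"\\[a-z]+-?\d* ?", "", text)
    # Drop group braces.
    text = re.sub(r"[{}]", "", text)
    return text.strip()


def convert_descriptions(root):
    """Replace the RTF in every dc:description element with plain text."""
    for el in root.iter(f"{{{DC}}}description"):
        el.text = rtf_to_text(el.text or "")
    return root
```

Real RTF is messier than this, which is exactly why a dedicated reader is the better tool for the actual conversion.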

Next: Process through cwm for RDF pretty printing.

Next: rdf2html – taking the RDF output and converting it to HTML.
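The real rdf2html is an XSLT stylesheet, but the shape of the transformation can be sketched in Python. This is only an illustration: it assumes each book is an rdf:Description carrying a dc:title and, optionally, a foaf:depiction with an rdf:resource attribute. dc:title is a guess at the data, and books_to_html is a made-up name.

```python
import html
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"


def books_to_html(rdf_xml):
    """Render each titled rdf:Description as an <li> in an HTML list."""
    root = ET.fromstring(rdf_xml)
    items = []
    for node in root.iter(f"{{{RDF}}}Description"):
        title = node.findtext(f"{{{DC}}}title")
        if title is None:
            continue  # not a book entry, as far as this sketch can tell
        item = html.escape(title)
        depiction = node.find(f"{{{FOAF}}}depiction")
        if depiction is not None:
            # Only books that actually have a depiction get an image.
            src = depiction.get(f"{{{RDF}}}resource", "")
            item = f'<img src="{html.escape(src)}" alt="" /> {item}'
        items.append(f"<li>{item}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Like the stylesheet, this treats the RDF purely as XML, so it depends on the serialization staying in this one shape.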

End result? Content negotiated version of the books I’ve scanned so far in the Books Library – RDF and HTML versions available.
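How the negotiation is wired up on the server is not covered here, but the core decision is small enough to sketch. The following is a simplified illustration only: choose_variant is a hypothetical helper that picks between the two available types from an Accept header, ignoring wildcards and most of the finer points of HTTP.

```python
def choose_variant(accept_header, default="text/html"):
    """Pick the offered media type with the highest q-value in the header."""
    offered = ("text/html", "application/rdf+xml")
    best, best_q = default, 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Ties keep the earlier entry; unknown types fall through to default.
        if media in offered and q > best_q:
            best, best_q = media, q
    return best
```

So a browser that asks for text/html gets the HTML page, while an RDF tool that asks for application/rdf+xml gets the RDF.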

This is some of my first major experience in XSLT, and I’ve found it to be pretty darn easy: far less difficult than I thought it was in the past. I think that I may go on an XSLT kick for the next couple weeks, so don’t be surprised if you see a lot more of my RDF looking a little bit prettier. For example, I already wrote an XSLT stylesheet for the FIF reviews I’ve received, so if you’re using a capable browser, that will be a lot nicer looking now than it used to be.

9 Responses to “Library in RDF”

  1. Jimmy Cerra Says:

    Darn! I thought rdf2html was a generic rdf/xml pretty printer! Now my mind is going to wrap around that problem all night!

  2. Christopher Schmidt Says:

    Jimmy: Heh, no. I’m not that advanced. My XSLT for this is all relatively named: so, “rdf2html” is really /library/rdf2html, which makes more sense… I’m just lazy 😉

  3. Mark Eichin Says:

    I picked up Delicious Library when it first came out, but the fact that the shelf model (1) wasn’t unique-membership (2) didn’t have Contained-In at all (3) wasn’t usefully supported at the scan level (what I wanted there was Create a shelf – declare it the default – all subsequent scans go there; then create a new Shelf and repeat) … all got in the way of finding it truly useful. So is convert.py just faking up the relationships to match your usage? (I looked at DL 1.5 and it still doesn’t seem to have “real” shelves…)

  4. Christopher Schmidt Says:

    convert.py is just something that fixes the fact that the Delicious Library output is in RTF, which is ugly.

    What I’ve been doing is scanning a set of books, selecting them all (order by scan time and click-hold shift-click from the top to the first book from the shelf) and then clicking “My Info”, setting the “Location in Building” field, creating a shelf, and dragging all the books in. Total time to do all that is ~30 seconds, and lets me get at the data both in the view (I have shelves on the left) or in the order by (where you can sort by Location In Building).

    In the RDF/HTML output, there’s no information about which shelf something is in, since I don’t have a need for it; the information is mostly there so I can share a listing of what books I have with other people, and eventually so that I can aggregate other things like reviews in with the data. RDF gives me a nice unique identifier for my copy of the book that I can treat as equivalent to the Amazon copy, then pull in local reviews, stuff like that.

    You’ll notice if you look at the actual RDF (/library/books.rdf if you don’t want to play with content negotiation) that the library is all at the end (in the rdf:first, rdf:rest…. hm, that should be a Bag, not a Collection, Collections are ordered, these shouldn’t be, just realized that) and that it’s just one big collection. I may be a special case, but outside the efficient search of the actual program, I don’t need the storage data at all.

  5. Jimmy Cerra Says:

    You may be interested in the various RDF extensions to XSLT. There are, to name a few:

    * Treehugger,
    * Nemo RDF, which is by me and isn’t close to beta,
    * RDF Twig.

    Also note another Norman Walsh creation, SXPipe. It would make multiple transformations much easier.

  6. Mark Eichin Says:

    Ahh, I’d completely failed to find the “Location in Building” bit, that’s quite useful. Thanks!

  7. Christopher Schmidt Says:

    Jimmy: Java, Java, Java and, oh wait, Java! I don’t touch the stuff with a 10-foot pole 🙂 I’m aware of the limitations of using XSLT against RDF data, but I’m also aware of its possibilities: given a specific input and programmatic output, there is nothing wrong with treating RDF data as XML and using XPath-like queries against it. It means if I change my build process (as I had to when I started using convert.py and cwm to create clean, non-RTF output) I’ll have to change my XSLT, but that isn’t really such a big deal: in my case, it only required changing one XPath statement, not so bad in my mind.

    Mark: I missed it at first too, but once I found it, I was a much happier panda 😉 Shelves at that point become organized by whatever you want them to be – for example, by category, rather than by location – and the location search lets you find them. Quite handy, imho.

  8. Jimmy Cerra Says:

    Which XSLT processor do you use?

  9. Christopher Schmidt Says:

    Jimmy: xsltproc when at the command line, occasionally the PHP4 XSLT processor (Sablotron-based). The Library stuff is done in xsltproc.