Technical Ramblings

Archive for the 'Software' Category

Library in RDF

Posted in Delicious Library, RDF, Semantic Web, XSLT on June 5th, 2005 at 21:19:20

A long time ago, when I first got a Mac, there was a lot of hubbub about a program called “Delicious Library”: an application that would let you scan in your books, and provided an awesome user interface for searching, storing, lending, and everything else you might want to do with them. At the time, I wanted it, and I wanted it bad, but I decided to wait until I got an iSight: entering hundreds, perhaps up to a thousand, UPCs by hand did not strike me as one of my favored tasks.

March 19th, I got an iSight: a birthday present, from Jess. I thought then “ooh, Delicious Library”, but never got around to it.

This weekend, I was starting to pack up books from the bookshelves. I thought “Hey, I won’t have a clue where any of the books are… unless…”

Jess was out of the house. I downloaded and tried the program: I scanned a full shelf of books (after getting some decent light) and packed them up before I hit the trial’s 25-item limit and had to pay the piper. $40 for knowing where all of these books are after we move (as well as a new toy to play with) is well worth it.

I scanned another shelf (and ran out of boxes), then sat down to do the fun part.

First: xml2rdf – an XSLT stylesheet to convert from Delicious Library’s XML format to RDF. One of the biggest problems with this stylesheet is that it needs to know which cover image files Delicious Library actually has on disk: this is where files.xml comes in, which is constructed using the following bash commands:

# List every medium cover image as an <image> element in files.xml
echo "<container>" > files.xml
for i in ~/Library/Application\ Support/Delicious\ Library/Images/Medium\ Covers/*; do
  # Strip the directory part, keeping only the file name
  j=$(basename "$i")
  echo "<image size='medium' name='$j' />" >> files.xml
done
echo "</container>" >> files.xml

This is then loaded with XSLT’s document() function so the stylesheet knows which files are available, which keeps inaccurate <foaf:depiction>s from being spat into the output: Amazon does not store cover images for some books, so until I implemented this fix, there were broken image references.

Next: convert.py – Load the file as an RDF model, delete all the existing dc:description statements, and replace them with versions converted from RTF using rtfreader, from Brandon’s Program Archive.

Next: Process through cwm for RDF pretty printing.

Next: rdf2html – taking the RDF output and converting it to HTML.

End result? Content negotiated version of the books I’ve scanned so far in the Books Library – RDF and HTML versions available.

This is my first major experience with XSLT, and I’ve found it to be pretty darn easy: far less difficult than I thought it was in the past. I think that I may go on an XSLT kick for the next couple of weeks, so don’t be surprised if you see a lot more of my RDF looking a little bit prettier. For example, I already wrote an XSLT stylesheet for the FIF reviews I’ve received, so if you’re using a capable browser, that will be a lot nicer looking now than it used to be.

Wedged Subversion Repository

Posted in Subversion on May 20th, 2005 at 22:11:57

Earlier this morning, one of my projects’ Subversion repositories got wedged. After figuring out that it was actually wedged (no response to GET or PROPFIND, and svn and svnadmin commands that hung until I used kill -9 on them), I started playing with svnadmin. Still didn’t work. Hopped into #svn. Asked, was pointed to the FAQ.

Copied current repository to another location before attempting anything else, since I’ve fucked up a BDB based subversion repository attempting to repair it before.

Attempted svnadmin recover /var/www/svn/rdfpython: failed with lots and lots of “PANIC” type errors.

Attempted svnadmin recover ~/newcopyofrepos: That seemed to work. An svnadmin verify ~/newcopyofrepos confirmed that it had.

Made another backup of the original repos, removed the original, and copied the recovered ~/newcopyofrepos into place.

And the world was good again. A verify and a fresh checkout both confirmed that the files were all in place, trac started to work again, and all was well, good and happy.
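The sequence above boils down to a few commands. This is a minimal sketch rather than a transcript of what I typed: the function name recover_on_copy and the SVNADMIN override are made up for illustration, and it assumes svnadmin's recover and verify subcommands.

```shell
#!/bin/sh
# Sketch: run recovery against a copy, never the live repository.
# recover_on_copy and SVNADMIN are hypothetical names for this example.
SVNADMIN="${SVNADMIN:-svnadmin}"

recover_on_copy() {
    repo="$1"   # wedged repository, left untouched
    copy="$2"   # scratch copy that recovery runs against

    # Refuse to clobber an existing path.
    if [ -e "$copy" ]; then
        echo "refusing to overwrite $copy" >&2
        return 1
    fi

    # Copy first, preserving permissions and timestamps.
    cp -Rp "$repo" "$copy" || return 1

    # Recover the copy, then check that the repository verifies cleanly.
    "$SVNADMIN" recover "$copy" || return 1
    "$SVNADMIN" verify "$copy"
}
```

Something like recover_on_copy /var/www/svn/rdfpython ~/newcopyofrepos would do the copy, recover and verify in one go; swapping the recovered copy into place stays a manual step on purpose.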

However, I think that from now on I may use the fsfs storage method rather than BDB. This is certainly not the first time this has happened to me or anyone else, and I’m starting not to trust BDB for mission-critical tasks, which is what I consider Subversion to be. My version control is one of the few things that I don’t have completely backed up most of the time: files I can copy around easily, but databases of changes to files typically stay in one place. I could recreate the structure, but I couldn’t really recreate the history, and that’s important to me.
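For anyone weighing the same switch, a dump and load is the usual route from one backend to the other. This is a rough sketch, not a tested procedure: it assumes a Subversion new enough to accept --fs-type fsfs, and migrate_to_fsfs, the dump file argument and the SVNADMIN override are hypothetical names for this example.

```shell
#!/bin/sh
# Sketch: move a repository's history from BDB to FSFS via dump/load.
# migrate_to_fsfs and SVNADMIN are hypothetical names for this example.
SVNADMIN="${SVNADMIN:-svnadmin}"

migrate_to_fsfs() {
    old="$1"    # existing BDB-backed repository
    new="$2"    # path for the new FSFS repository (must not exist yet)
    dump="$3"   # dump file, which doubles as a backup of the full history

    # Write every revision to a portable dump file.
    "$SVNADMIN" dump "$old" > "$dump" || return 1

    # Create an empty FSFS repository and replay the history into it.
    "$SVNADMIN" create --fs-type fsfs "$new" || return 1
    "$SVNADMIN" load "$new" < "$dump"
}
```

Hook scripts and the files under conf/ are not part of a dump, so those would need copying over by hand. The dump file is also exactly the kind of history backup I said I was missing.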

If anyone has any experience with fsfs SVN repositories over BDB based ones, I’d be glad to hear it.

Planet, GNU Arch

Posted in GNU Arch, Planet Planet, Technology on May 5th, 2005 at 22:08:29

Yesterday, after some discussion regarding Bluemoon (currently offline, LiveJournal syndicated copy available at livejournal temporarily), the idea of a “Planet Swhack” was brought up: an aggregated collection of the weblogs of members of #swhack, much like the many other planets run by the Planet Planet software, or like Planet RDF, which runs off the Chumpologica aggregator.

So, yesterday, I set it up. AaronSw controls Swhack DNS, and wasn’t around at the same time as me at any point, so I set it up at a temporary URI to demonstrate it. Picked up some bloggers, and set up the stylesheet to be the same as my other Planet, PlanetMobile. Tonight, as I was preparing to ask AaronSw to set up DNS for Planet Swhack, I noticed that jcowan’s most recent entry was messing things up. I looked into the issue a bit more and found out that Planet was using version 2.7 of Mark Pilgrim’s Universal Feed Parser, which barfed quite badly on the XHTML in his Atom content.

So, I looked into it a bit more, and found out that the “nightly tarball” of Planet has not been updated since October. So much for any kind of decent release schedule.

Looking at a mailing list thread on release scheduling, I realized that the issues I was having had been fixed, and set about to check out the latest code from their version control.

Except there are no instructions on how to do that, just a repository name. And it’s GNU Arch, which I sure as hell don’t know. So, I go to install it… apt-get install tla, on my home machine… apt tells me:

Media change: please insert the disc labeled
‘Ubuntu 5.04 _Hoary Hedgehog_ – Preview i386 Binary-1 (20050310)’
in the drive ‘/cdrom/’ and press enter

Update: Since I get a large number of hits from Google for this issue: The way to fix this is to edit your /etc/apt/sources.list, and remove the first line in it that references the cdrom drive. (You can simply put a # in front of it.) Then type apt-get update. (You’ll have to edit and update as root – type sudo before the commands to do that.) If you need more help, feel free to comment. (2006-01-10)
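For what it’s worth, the same edit can be scripted. A minimal sketch, assuming the offending entries start with “deb cdrom:” as they do in a stock install; disable_cdrom_sources is a hypothetical helper name, and unlike the manual fix it comments out every such line rather than just the first.

```shell
#!/bin/sh
# Sketch: comment out CD-ROM entries in an APT sources list.
# disable_cdrom_sources is a hypothetical helper name for this example.
disable_cdrom_sources() {
    list="$1"   # usually /etc/apt/sources.list

    # Keep the original as "$list.bak", then prefix "# " to each
    # uncommented line that starts with "deb cdrom:".
    sed -i.bak -e 's/^deb cdrom:/# &/' "$list"
}
```

From a root shell, disable_cdrom_sources /etc/apt/sources.list followed by apt-get update should be equivalent to the hand edit, with the untouched file left behind as sources.list.bak in case anything goes wrong.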
… right. I’m a 15 minute bike ride from home. Not going to happen. So, I switch from zeus to athena and try it, and get tla installed. Then start looking for instructions on how to check out a repository.

Apparently, the industry standard term “check out” is not part of the Arch repository system. Eventually, I wandered into Logjam’s arch repository, which provides clean instructions for how to get the code out of it:

tla register-archive http://logjam.danga.com/arch/2004
tla get logjam@danga.com--2004/logjam--dev--4.4

I was able to check out the “shiny development branch” of the Planet code, and get it in place on the site, fixing all my issues with Atom and XHTML content. All is well in the world, and Planet Swhack is a go. Never let it be said that checking out code from an arch repository is intuitive though. Anyone who thinks it is is out of their tree.

Recent Work

Posted in default, FOAF, julie, Semantic Web, Subversion on April 14th, 2005 at 21:34:36

I’ve been doing some work with FOAFNaut, SVG, and other related technologies lately. For the most part, the changes in and of themselves are too small to track in a weblog format, but I did build myself a little tool to store recent updates to crschmidt.net last night, so I could share them. crschmidt.net site updates has an HTML view, as well as an RSS 1.0 and RSS 2.0 view, and is used to display information on the front page on what has changed recently.

Today, I spent a big chunk of my afternoon playing with julie alongside DanC. He asked if I planned on implementing SPARQL in the bot any time soon (which I do, as soon as a Redland release supporting the turtle format for SPARQL queries comes out). We also talked about GRDDL support, and some other related things. He offered some interesting files which I added to the database, teaching julie more about W3C proceedings and allowing for some more interesting queries in that respect. I need to start keeping track of my to-do list for julie so that I can use what free time I have to do something about the state she’s in. I’m really starting to think another refactoring may be in order: although I received a pretty gigantic patch at one point, I still really haven’t “thrown one away” yet.

I also decided to install trac earlier today for some reason, something that was reinforced when I was asked to start a wiki for FOAFNaut internals as I was playing with it. You can check out the listing of projects I have here, which will grow as time continues, because I’m going to be moving more and more of my stuff into Subversion and more and more of what’s in Subversion to trac. It’s really nifty software, and I’m looking forward to playing with it. Who knows, it might shove a few more people into getting involved in my current projects. It’s got everything I need but have been too lazy to install in one place: wiki, bug tracking, source viewing, revision… quite nice, really.

Other than that, not much going on: Keep an eye on the site updates as I continue to do more little changes in and around crschmidt.net to my various projects.