Archive for the 'Technology' Category

Library in RDF

Posted in Delicious Library, RDF, Semantic Web, XSLT on June 5th, 2005 at 21:19:20

A long time ago, when I first got a Mac, there was a lot of hubbub about a program called “Delicious Library”: an application that would let you scan in your books, and that provided an awesome user interface for searching, storing, lending, and everything else you might want to do with them. At the time, I wanted it, and I wanted it bad, but I decided to wait until I got an iSight: the idea of entering hundreds, perhaps up to a thousand, UPCs by hand did not strike me as one of my favored tasks.

On March 19th, I got an iSight: a birthday present from Jess. I thought then, “ooh, Delicious Library”, but never got around to it.

This weekend, I was starting to pack up books from the bookshelves. I thought “Hey, I won’t have a clue where any of the books are… unless…”

Jess was out of the house. I downloaded and tried the program: I scanned a full shelf of books (after getting some decent light) and packed them up before I hit the trial’s 25-item limit and had to pay the piper. $40 for knowing where all of these books are after we move (as well as a new toy to play with) is well worth it.

I scanned another shelf (and ran out of boxes), then sat down to do the fun part.

First: xml2rdf - an XSLT stylesheet to convert from Delicious Library’s XML format to RDF. One of the biggest problems with this stylesheet is that it needs to know which cover image files Delicious Library actually has on disk. This is where files.xml comes in; it is built with the following bash commands:

# list every medium-size cover image Delicious Library has stored
echo "<container>" > files.xml
for i in ~/Library/Application\ Support/Delicious\ Library/Images/Medium\ Covers/*; do
  j=$(basename "$i")   # strip the directory, keep only the file name
  echo "<image size='medium' name='$j' />" >> files.xml
done
echo "</container>" >> files.xml

The stylesheet then reads this file with XSLT’s document() function to learn which images are available, which keeps inaccurate <foaf:depiction> elements out of the output. Amazon does not store cover images for some books, so until I implemented this fix, there were broken image references.
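
The same files.xml can also be produced from Python, with the bonus that file names containing characters like & or ' get escaped properly. This is a hypothetical helper, not part of the original toolchain. It assumes the covers directory from the bash loop above and also shows the membership test that the document() lookup makes possible.

```python
import os
import xml.etree.ElementTree as ET

def build_files_xml(cover_dir, size="medium"):
    """Return a <container> document listing every file in cover_dir."""
    container = ET.Element("container")
    for name in sorted(os.listdir(cover_dir)):
        # ElementTree escapes &, <, and quotes in attribute values for us
        ET.SubElement(container, "image", size=size, name=name)
    return ET.tostring(container, encoding="unicode")

def available_covers(files_xml, size="medium"):
    """Return the set of image names in files.xml for the given size."""
    root = ET.fromstring(files_xml)
    return {img.get("name") for img in root.iter("image") if img.get("size") == size}
```

To write the real file, pass os.path.expanduser("~/Library/Application Support/Delicious Library/Images/Medium Covers") to build_files_xml and save the result as files.xml. A book should only get a depiction when its image name is in available_covers().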

Next: convert.py - load the file as an RDF model, delete all the existing dc:description statements, and convert their contents with rtfreader from Brandon’s Program Archive.
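
The post doesn’t include convert.py, so this is only a sketch of its delete-and-convert step. It makes three assumptions: the descriptions are RTF (as the use of rtfreader suggests), the RDF is held as plain (subject, predicate, object) tuples rather than a Redland model, and crude_rtf_to_text is a hypothetical regex stripper standing in for rtfreader, which it does not reproduce.

```python
import re

DC_DESCRIPTION = "http://purl.org/dc/elements/1.1/description"

def crude_rtf_to_text(rtf):
    """Very rough RTF stripper (hypothetical stand-in for rtfreader).

    Ignores font tables, escapes, and everything else a real RTF parser handles.
    """
    text = re.sub(r"\\par\b ?", "\n", rtf)          # \par paragraph breaks become newlines
    text = re.sub(r"\\[A-Za-z]+-?\d* ?", "", text)  # drop control words like \ansi or \b0
    return text.replace("{", "").replace("}", "").strip()

def convert_descriptions(triples, converter=crude_rtf_to_text):
    """Remove each dc:description triple and add it back with converted text."""
    kept = [t for t in triples if t[1] != DC_DESCRIPTION]
    converted = [(s, p, converter(o)) for (s, p, o) in triples if p == DC_DESCRIPTION]
    return kept + converted
```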

Next: run the result through cwm for RDF pretty-printing.

Next: rdf2html, which takes the RDF output and converts it to HTML.

End result? A content-negotiated listing of the books I’ve scanned so far, in the Books Library, with RDF and HTML versions available.
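
The post doesn’t say how the negotiation is done; a web server feature such as Apache’s MultiViews could handle it. To illustrate the idea, this sketch picks between the HTML and RDF versions from a request’s Accept header. negotiate() and its defaults are hypothetical, and the sketch ignores media-type parameters other than q.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    return ranges

def specificity(media_range):
    """*/* < type/* < type/subtype, for HTTP's most-specific-match rule."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2

def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type

def negotiate(header, available=("text/html", "application/rdf+xml")):
    """Return the best available type, or None (which would mean 406 Not Acceptable)."""
    if not header:
        return available[0]  # no Accept header: the client takes anything
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in available:
        hits = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        q = max(hits)[1] if hits else 0.0  # the most specific matching range wins
        if q > best_q:  # strict >, so ties go to the earlier entry in `available`
            best, best_q = media_type, q
    return best
```

A typical browser header yields text/html, while an RDF tool sending Accept: application/rdf+xml gets the RDF version.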

This is one of my first major experiences with XSLT, and I’ve found it pretty darn easy: far less difficult than I had thought in the past. I think I may go on an XSLT kick for the next couple of weeks, so don’t be surprised if you see a lot more of my RDF looking a little bit prettier. For example, I already wrote an XSLT stylesheet for the FIF reviews I’ve received, so if you’re using a capable browser, they will look a lot nicer now than they used to.

Google Sitemap Format

Posted in RDF, Semantic Web, XSLT on June 3rd, 2005 at 10:02:23

Josh points out Google’s Sitemap Protocol, via the SWIG Chump. I pull out my XSLT-foo (what little of it there is) and hack a bit back and forth, then run into a problem which uche helps me figure out: “XPath does *not* use the default prefix in the stylesheet for purposes of matching”. I fix my XSLT up a bit and create a new RDF source under my semweb section: Google Sitemap Tools, including an XSLT stylesheet, example output, and a conversion service which uses the XSLT (for example, Google’s Example File in RDF).
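
uche’s point is an XPath 1.0 rule: an unprefixed name in a path or match pattern always means “no namespace”. So a stylesheet that declares the sitemap namespace as its default still can’t match url elements with match="url"; the namespace needs a prefix. Python’s ElementTree follows the same rule, which makes for a quick illustration. Two assumptions: the namespace is the current sitemaps.org one (the 2005 Google files used an earlier namespace), and sitemap_urls is a hypothetical helper, not part of Google Sitemap Tools.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def sitemap_urls(xml_text):
    """Return (loc, lastmod) pairs from a sitemap document."""
    root = ET.fromstring(xml_text)
    # root.findall("url") would return [] here: an unprefixed name means
    # "no namespace", even though the document declares a default namespace.
    ns = {"sm": SITEMAP_NS}
    return [(u.findtext("sm:loc", namespaces=ns), u.findtext("sm:lastmod", namespaces=ns))
            for u in root.findall("sm:url", ns)]
```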

Now, to find some sitemaps in action in the real world, and add gzip decoding of gzipped sitemaps.
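
Gzip decoding can key off the two-byte gzip magic number rather than trusting a file extension or Content-Type header. read_sitemap_bytes is a hypothetical helper, not the conversion service’s actual code.

```python
import gzip

def read_sitemap_bytes(data):
    """Return sitemap XML bytes, decompressing first if the data is gzipped."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(data)
    return data
```

The result can go straight to an XML parser, whether the bytes came from a local file or from urllib.request.urlopen(url).read().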

Javascript, RDF Searching

Posted in Javascript, PHP, SPARQL on May 31st, 2005 at 11:29:06

I’ve been playing with some goofy Javascript stuff lately to try to get my head wrapped around it, since I’m going to need to use it in a few tools at work in the near future.

So far, I’ve used it in:
1. An admin interface for Athena’s email accounts
2. An inventory listing for a work project
3. The newest one: a “suggestion” field for Wordnet searches against the RDF store I just imported this morning

Danny alerted me to the existence of a new Wordnet dataset. I grabbed the full set, dropped it into Redland, and set up a SPARQL search against it. The top box there is the nifty one, though: type in a string (say, apple) and watch the right side as a list of suggestions is populated.

I still need to get it actually doing a Google Suggest-like dropdown box, but haven’t had the time to hack WICK to do what I want as far as that goes.

I’m still learning, and as such, the code is sucky. I wouldn’t recommend reading it as an example: it’s a quick hack, but it works. There are still many bugs to work out: for example, if you type apple, it still searches for app, appl, and apple along the way. But I’ll get there. (Okay, so I just did a few bug fixes that make it much better, and switched the search mechanism to use MySQL rather than an 11 MB PHP array. Much better now.)
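
The suggestion lookup itself is just a prefix search. The original used a PHP array and then MySQL; as a swapped-in illustration, a sorted word list plus binary search finds every completion in O(log n + k) time, which is essentially what an indexed LIKE 'appl%' query gives you in MySQL. suggest() and its limit parameter are hypothetical.

```python
from bisect import bisect_left

def suggest(sorted_words, prefix, limit=10):
    """Return up to `limit` entries of sorted_words that start with prefix."""
    results = []
    i = bisect_left(sorted_words, prefix)  # first position whose word is >= prefix
    # every word sharing the prefix sits in one contiguous run from here
    while (i < len(sorted_words) and len(results) < limit
           and sorted_words[i].startswith(prefix)):
        results.append(sorted_words[i])
        i += 1
    return results
```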

Anyway, I think it’s cool. RDF people can mark it down as “another SPARQL datastore”; Javascript people can mark it down as “another idiot trying to use XMLHttpRequest and doing it wrong.”

Lemme know if you’ve got suggestions!

XSLT + Image Regions + Sparql

Posted in Flickr, Image Description, RDF, SPARQL, XSLT on May 22nd, 2005 at 20:05:23

Read Masahide’s notes on XSLT+Image Regions. Used some tools to convert my flickr photos to RDF.

Converted an XSLT Stylesheet to a different result format. Loaded ~400 RDF files into a Model, totalling 33,000 statements. Added an option to my Sparql Interface. Changed the default query. Made the extra option add the stylesheet.
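
Making the extra option “add the stylesheet” presumably comes down to putting an xml-stylesheet processing instruction at the top of the SPARQL results document, so that the browser applies the XSLT itself. In this sketch, add_stylesheet_pi and the image-regions.xsl file name are hypothetical.

```python
from xml.sax.saxutils import quoteattr

def add_stylesheet_pi(xml_text, href):
    """Insert an xml-stylesheet processing instruction after the XML declaration."""
    pi = '<?xml-stylesheet type="text/xsl" href=%s?>' % quoteattr(href)
    if xml_text.startswith("<?xml "):
        end = xml_text.index("?>") + 2  # end of the <?xml ...?> declaration
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text  # no declaration: the PI can simply go first
```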

Ran a query. Tweaked until it worked. Typed it all up here, to share with all of you.

Hooray for masahide, flickr, and all kinds of other wonderful things.

Lynx View

Posted in Technology, Web Publishing on May 20th, 2005 at 19:20:58

A new crschmidt.net webservice:

lynxview, which takes a domain name and converts the site to lynx -dump output. For when you want to show some Windows user just how crap their website is with all the graphics turned off.

As a demonstration, check out crschmidt.net or Planet Mobile.

Currently, sites are cached eternally, so that the service can’t be used to DDoS some poor site.
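
Here is a minimal sketch of the cache-forever idea. It assumes the service shells out to lynx -dump and keeps results in an in-memory dict; the real service’s storage isn’t described. lynxview(), DOMAIN_RE and the injectable fetch parameter are hypothetical. Accepting only a bare domain name and passing an argument list to subprocess keeps user input away from the shell.

```python
import re
import subprocess

# loose check: dot-separated labels ending in an alphabetic TLD
DOMAIN_RE = re.compile(r"^(?=.{1,253}$)([A-Za-z0-9-]{1,63}\.)+[A-Za-z]{2,63}$")
_cache = {}

def run_lynx(url):
    """Render a page as plain text with `lynx -dump` (argument list, no shell)."""
    result = subprocess.run(["lynx", "-dump", url], capture_output=True, text=True, timeout=30)
    return result.stdout

def lynxview(domain, fetch=run_lynx, cache=_cache):
    """Return the text rendering of a site, fetching each domain at most once."""
    domain = domain.strip().lower()
    if not DOMAIN_RE.match(domain):
        raise ValueError("not a domain name: %r" % domain)
    if domain not in cache:  # never expires, so repeat requests never hit the site
        cache[domain] = fetch("http://%s/" % domain)
    return cache[domain]
```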

Prompted in part by a request from DanC on #swig earlier today.

Blocking Port 25

Posted in SMTP on May 14th, 2005 at 09:26:52

So, for the first time this weekend, I’m on a network where outgoing mail on port 25 is blocked. How annoying.

I use a number of mail servers in a number of different ways. Typically, when on one of my Linux boxes (zeus or athena), I’ll send mail directly from that machine using a localhost Postfix installation with no smart host or relay host. I don’t really see a need for my ISP to see my mail, and this is the default setup for most Linux distros that I’m aware of.

If I’m on a machine that doesn’t have a mail server (i.e. the PowerBook, creusa, or the Mac mini, hermes), I use athena as a mail host, since I have installed SASL authentication on it. Athena is set up to accept mail in two cases:

1. Mail from the local network. This includes all the IPs in my block on Sagonet.
2. SASL-authenticated users: this uses password authentication against the local mail database to check which users can log in to the server to send mail.
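
The two rules above, plus the normal acceptance of mail addressed to the server’s own domains, are the kind of policy Postfix expresses with permit_mynetworks, permit_sasl_authenticated and reject_unauth_destination in smtpd_recipient_restrictions. The Python below only models that decision to make the logic explicit. relay_decision() is hypothetical, and the 192.0.2.0/24 documentation range stands in for the real Sagonet block.

```python
import ipaddress

def relay_decision(client_ip, sasl_user, recipient_domain, mynetworks, local_domains):
    """Return "accept" or "reject" for one recipient, modelling athena's relay rules."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in ipaddress.ip_network(net) for net in mynetworks):
        return "accept"  # rule 1: local network (cf. permit_mynetworks)
    if sasl_user:
        return "accept"  # rule 2: SASL-authenticated user (cf. permit_sasl_authenticated)
    if recipient_domain.lower() in local_domains:
        return "accept"  # mail *to* this server's own domains is always welcome
    return "reject"      # relaying anywhere else would be an open relay (cf. reject_unauth_destination)
```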

As such, the server is protected against being an open relay (so long as I don’t get a spammer on the local machine, which I don’t think is going to happen), and I like having it there as a backup for when other mail servers fail me. wedu’s mail uses POP before SMTP for authentication, which is all well and good, but can be a pain: the logs are reset at :45 past the hour, and if you try to send mail right after that, you get a nice “Relay denied” message.

In any case, I tried to send mail this morning via crschmidt.net… and got a timeout. Tried connecting to it directly from here: no go. Panicked a bit, since this is my main mail server, and if it’s down, that’s a bad thing. Tested it from zeus: no problems. Tested it locally: no problems. Tried port 25 on another server… problems. So it’s on the Ameritech end. Great.
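
The “try another port 25” test is easy to script. If a plain TCP connect to port 25 times out for every outside host from here but works from elsewhere (zeus, in this case), the local network is filtering the port. This is a minimal sketch using only the standard library; port_open() is a hypothetical helper.

```python
import socket

def port_open(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out (socket.timeout is an OSError), or unreachable
        return False
```

Running port_open("crschmidt.net") here and on zeus, then repeating it against an unrelated mail server, reproduces the diagnosis above.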

Set up an ssh tunnel: ssh -L 25:localhost:25 crschmidt@crschmidt.net. Set up a server in Mail.app as localhost port 25. Forward my mail. Sigh at Ameritech. Bitch in weblog. And the circle of life continues.

Planet, GNU Arch

Posted in GNU Arch, Planet Planet, Technology on May 5th, 2005 at 22:08:29

Yesterday, after some discussion regarding Bluemoon (currently offline; a syndicated copy is temporarily available on LiveJournal), the idea of a “Planet Swhack” was brought up: an aggregated collection of the weblogs of members of #swhack, much like the many other planets run on the Planet Planet software, or like Planet RDF, which runs on the Chumpalogica aggregator.

So, yesterday, I set it up. AaronSw controls the Swhack DNS and wasn’t around at the same time as me at any point, so I set up a temporary URI to demonstrate it. I picked up some bloggers and set up the stylesheet to be the same as my other Planet, PlanetMobile. Tonight, as I was preparing to ask AaronSw to set up DNS for Planet Swhack, I noticed that jcowan’s most recent entry was messing things up. I looked into the issue a bit more and found out that Planet was using version 2.7 of Mark Pilgrim’s Universal Feed Parser, which barfed quite badly on the XHTML in his Atom content.

So, I looked into it a bit more, and found out that the “nightly tarball” of Planet has not been updated since October. So much for any kind of decent release schedule.

Looking at a mailing list thread on release scheduling, I realized that the issues I was having had been fixed, and set about checking out the latest code from their version control.

Except there are no instructions on how to do that, just a repository name. And it’s GNU Arch, which I sure as hell don’t know. So, I go to install it with apt-get install tla on my home machine… and apt tells me:

Media change: please insert the disc labeled
‘Ubuntu 5.04 _Hoary Hedgehog_ - Preview i386 Binary-1 (20050310)’
in the drive ‘/cdrom/’ and press enter

Update: Since I get a large number of hits from Google for this issue, here is the fix: edit your /etc/apt/sources.list and disable the first line in it that references the cdrom drive, either by deleting it or by simply putting a # in front of it. Then run apt-get update. (You’ll have to edit and update as root: put sudo before the commands to do that.) If you need more help, feel free to comment. (2006-01-10)

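
For anyone who would rather script the fix, this sketch comments out every line that starts with deb cdrom: or deb-src cdrom: (the form CD-ROM entries take) instead of just the first. disable_cdrom_sources() is a hypothetical helper that only transforms text: writing /etc/apt/sources.list back still needs root, followed by apt-get update.

```python
def disable_cdrom_sources(sources_text):
    """Comment out apt source lines that point at a CD-ROM (deb cdrom:[...])."""
    out = []
    for line in sources_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(("deb cdrom:", "deb-src cdrom:")):
            line = "# " + line  # already-commented lines start with "#" and are left alone
        out.append(line)
    return "".join(out)
```
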
… right. I’m a 15-minute bike ride from home. Not going to happen. So, I switch from zeus to athena, try it there, and get tla installed. Then I start looking for instructions on how to check out a repository.

Apparently, the industry-standard term “check out” is not part of Arch’s vocabulary. Eventually, I wandered into Logjam’s Arch repository, which provides clean instructions for getting the code out of it:

tla register-archive http://logjam.danga.com/arch/2004
tla get logjam@danga.com--2004/logjam--dev--4.4

I was able to check out the “shiny development branch” of the Planet code, and get it in place on the site, fixing all my issues with Atom and XHTML content. All is well in the world, and Planet Swhack is a go. Never let it be said that checking out code from an arch repository is intuitive though. Anyone who thinks it is is out of their tree.

SVG-Metadata

Posted in RDF, SVG on April 9th, 2005 at 19:40:27

Earlier, I posted about extracting SVG metadata with Redland. However, one of the problems with this is that there isn’t a whole lot of SVG out there, nor is there a whole lot of SVG with metadata out there.

One solution to this is the OpenClipArt Library - thousands of Public Domain SVG images with embedded metadata, totalling a heck of a lot of RDF information that could provide an interesting example of how RDF information can be used in real world scenarios.

However, the metadata provided by this library was, when I looked at it, broken RDF. I sent an email to the clipart list explaining the problems with their metadata, and received friendly and helpful replies letting me know that the data was generated with the SVG-Metadata perl library.

This weekend, I downloaded that code and began working on it. I submitted a patch to the maintainer (who is also one of the founders of the Inkscape project and works on OpenClipart), and it was integrated today. The patch improves the library’s license support (it now supports all Creative Commons licenses) and its RDF output (which now validates).

A new version has been released and uploaded to CPAN, and will soon be propagating its way to the CPAN archives. New SVGs uploaded to OpenClipart will contain metadata which is valid RDF, and Bryce is looking into regenerating the data for older SVGs as well.

More RDF. Better metadata. That’s something that I think I can live with.

Parsing SVG Metadata

Posted in Python, RDF, Redland RDF Application Framework, SVG, Semantic Web on April 7th, 2005 at 15:12:48

How to Parse SVG Metadata, the Redland + Python way:

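try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2
import xml.dom.minidom as minidom
import RDF

svg_url = "Location of SVG File"
m = RDF.Model()
p = RDF.Parser()
svg = urlopen(svg_url).read()  # download the SVG into a string
doc = minidom.parseString(svg)
# hand the first rdf:RDF element to Redland, with the SVG's URL as the base URI
p.parse_string_into_model(m, doc.getElementsByTagName("rdf:RDF")[0].toxml(), svg_url)
print(m)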

In other words: bring in the RDF and minidom modules, create an RDF model and parser, download the SVG file to a string, parse the string with minidom, find the rdf:RDF element in the SVG file and parse it into the model, then serialize the model.

Problems: What if someone uses something that’s not rdf: as the prefix?
Solutions: mattmcc offers that minidom supports getElementsByTagNameNS, so the parse line would become:
p.parse_string_into_model(m, doc.getElementsByTagNameNS("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF")[0].toxml(), "Location of SVG File")
which resolves the namespace issue.
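
Without Redland, the same namespace-aware scan can be sketched with Python’s standard library. One caveat applies to the minidom approach: toxml() on the rdf:RDF element omits namespace declarations made on ancestor elements (typically the root svg), so the extracted fragment can end up using undeclared prefixes. ElementTree stores names as {namespace}local pairs and re-declares whatever namespaces a serialized element uses, which sidesteps both that problem and the prefix question. extract_rdf_blocks() is a hypothetical helper; its output can be fed to any RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def extract_rdf_blocks(xml_text):
    """Return each rdf:RDF element in an XML document as a standalone string."""
    root = ET.fromstring(xml_text)
    blocks = []
    for el in root.iter("{%s}RDF" % RDF_NS):  # matches whatever prefix the file used
        el.tail = None  # don't carry the parent's trailing text into the output
        # tostring() emits xmlns declarations for every namespace the element uses
        blocks.append(ET.tostring(el, encoding="unicode"))
    return blocks
```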

Of course, since this is Redland, this is taken care of for you. Rather than doing it in this way, which is specific to SVG, we can scan for RDF in any XML doc. Simply:

import RDF
m = RDF.Model()
p = RDF.Parser()
# look for RDF/XML embedded anywhere in the document, not just a bare RDF file
p.set_feature("http://feature.librdf.org/raptor-scanForRDF", "1")
p.parse_into_model(m, "URL Of SVG File")

There are a number of other features you can use with a Parser. They are available via rapper -f help, but here’s a list: assumeIsRDF, allowNonNsAttributes, allowOtherParsetypes, allowBagID, allowRDFtypeRDFlist, normalizeLanguage, nonNFCfatal, warnOtherParseTypes, checkRdfID.

Naturally, Redland already does what I want it to do. Another pat on the back for Dave (and thanks to him for pointing it out).

SVG

Posted in RDF, SVG on March 28th, 2005 at 00:19:23

Lately, I’ve been playing with SVG, since I finally got it to work decently well on two of the computers I regularly use. I was even able to get it working with a static FOAFNaut, which is motivating me to actually write a few more tools in Redland to get FOAFNaut working better. I had never realized that much of FOAFNaut’s speed problem came from dynamically parsing RDF in Javascript, which is slow, rather than from the SVG rendering itself, which is actually pretty quick.

With help from #svg on freenode, I’ve got SVG running with a prerelease version of an Adobe plugin on my Linux box, and I’ve had it for a while in Firefox on the Mac. I’m really looking forward to the release of Firefox 1.1 now, though: built-in SVG support will let me try out some pretty neat stuff, and maybe pull a few more people over to Firefox (if the engine isn’t crap, at least).

SVG is, on the whole, pretty cool. I’m probably going to add support for parsing RDF out of SVG files to julie once I get my DSL line problems fixed and start running her again. Yet another source of data… such nifty stuff to be done.

For those who don’t know: SVG is kind of like a standards-compliant version of Flash. It stands for Scalable Vector Graphics, and it lets you describe how to draw things in terms of curves and lines, rather than by specifying the pixels. This means that, unlike rasterized formats, it doesn’t get blurry at any size you look at it. It’s kind of like comparing Adobe Illustrator files to flattened Photoshop files, for those of you who are familiar with such things: one can be stretched at will and not look odd, whereas the other is just not going to react so well to that. There are still some issues I’m having with SVG embedded in web pages, but that may just be me not knowing how to deal with things.
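
To make the “curves and lines rather than pixels” point concrete, here is a tiny drawing generated with Python’s standard library: a circle and a cubic Bézier path described purely by coordinates. The viewBox defines a coordinate system rather than a pixel grid, so the same file renders crisply at any width and height. The shapes and make_svg() are arbitrary examples.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize with a default xmlns instead of an ns0: prefix

def make_svg(width="200", height="200"):
    """Build a small vector drawing; width/height only set the display size."""
    svg = ET.Element("{%s}svg" % SVG_NS, width=width, height=height, viewBox="0 0 100 100")
    # shapes are given in viewBox units, not pixels
    ET.SubElement(svg, "{%s}circle" % SVG_NS, cx="50", cy="50", r="40",
                  fill="none", stroke="black")
    ET.SubElement(svg, "{%s}path" % SVG_NS, d="M 10 90 C 40 10, 60 10, 90 90",
                  fill="none", stroke="blue")
    return ET.tostring(svg, encoding="unicode")
```

Calling make_svg("2000", "2000") produces the same drawing ten times larger, with no loss of sharpness.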

For Linux and Windows SVG authoring, there’s Inkscape, which seems to be a simply fantastic piece of work. Illustrator can also export to SVG, and I’m sure there are other tools which the lazyweb can share.

All in all: SVG is cool, and I hope to do some work with it in the near future. I’d be happy to hear about any success stories you’ve had so far.