Archive for the 'Semantic Web' Category

SVG-Metadata

Posted in RDF, SVG on April 9th, 2005 at 19:40:27

Earlier, I posted about extracting SVG metadata with Redland. However, one of the problems with this is that there isn’t a whole lot of SVG out there, nor is there a whole lot of SVG with metadata out there.

One solution to this is the OpenClipArt Library – thousands of Public Domain SVG images with embedded metadata, totalling a heck of a lot of RDF information that could provide an interesting example of how RDF information can be used in real world scenarios.

However, the metadata provided by this library was, when I looked at it, broken RDF. I sent an email to the clipart list explaining the problems with their metadata, and received friendly and helpful replies letting me know that the data was generated with the SVG-Metadata Perl library.

This weekend, I downloaded that code and began working on it, submitting a patch to the maintainer (who is also one of the founders of the Inkscape project, and works on OpenClipart), which was integrated today, improving their license support (now supporting all Creative Commons licenses) and their RDF output (such that it validates).

A new version has been released, uploaded to CPAN, and will soon be propagating its way to the CPAN archives. New SVGs uploaded to openclipart will contain metadata which is valid RDF, and Bryce is looking into regenerating the data on older SVGs as well.

More RDF. Better metadata. That’s something that I think I can live with.

RDF + GPG

Posted in RDF, Redland RDF Application Framework, Semantic Web on April 9th, 2005 at 14:35:34

One of my eventual goals is to have julie replace all the features of wh4 (libby’s query bot) and foafbot (edd’s community IRC bot). One thing that edd’s bot did that julie doesn’t is to verify data based on signed documents, and to use this information as a “provenance” for the data: not just “where was it said”, but actually verifying “who said it”.

Dealing with GPG is not nearly as easy as I really think it should be. Take Redland as an example: You can interact with the library at all kinds of levels, from the base swig wrapper to the hand-written RDF.py module wrapped around it, and you can do just about anything the base library does from within Python.

GPG, on the other hand, is very hard to work with from a library level. There is a Python module for working with GPG, but it equates to simply using the command line tool in the end. You can’t tell it “Check this document”: you basically just have tools to create a pipe to GPG, and pass the options in the same way you would on the command line. Add to that that it’s Yet Another Dependency which is somewhat of a pain to resolve in Python, and you can see why it’s slightly annoying for people who might want to use GPG.

Wondering what edd had done for FOAFbot back when it was running, I decided to grab his code and play with it. Turns out he just opened a pipe to gpg using the commands module in Python. This seemed simple enough to me, so I ripped out some of his code and turned it into a little script.

With that, I announce the release of rdfgpg, a tool for verifying the signature described by an RDF document. It uses the Redland Python bindings, and the usage is:

python rdfgpg.py http://example.org/urlof.rdf

Optionally, you can add a second argument to turn on debug output, which shows more about what’s going on in the background and may help if something you expect to work doesn’t. You can also import the module and call rdfgpg.verify_url(url), which returns a list of email addresses on the signing key.
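For anyone curious what "opening a pipe to gpg" looks like, here is a minimal sketch in modern Python. It is not the rdfgpg code: the function names parse_good_signers and verify_detached are made up for illustration, it uses subprocess rather than the commands module, and it assumes a detached signature file sitting next to the document. It reads gpg's machine-readable status lines, which only name the primary user ID of the signing key, so it will usually return one address rather than every address on the key.

```python
import re
import subprocess

# gpg --status-fd emits lines such as:
#   [GNUPG:] GOODSIG <long key id> <primary user id>
GOODSIG = re.compile(r"^\[GNUPG:\] GOODSIG (\S+) (.*)$")
EMAIL = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


def parse_good_signers(status_text):
    """Return the email addresses found on GOODSIG status lines."""
    emails = []
    for line in status_text.splitlines():
        match = GOODSIG.match(line.strip())
        if match:
            # group(2) is the user ID, e.g. "Example User <user@example.org>"
            emails.extend(EMAIL.findall(match.group(2)))
    return emails


def verify_detached(signature_path, document_path, gpg="gpg"):
    """Run gpg on a detached signature and return the signers' emails.

    Hypothetical helper: an empty list means no good signature was reported.
    """
    result = subprocess.run(
        [gpg, "--batch", "--status-fd", "1", "--verify",
         signature_path, document_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_good_signers(result.stdout)
```

Keeping the status parsing separate from the gpg call means the parsing can be checked without a keyring installed.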

The code is released under a GPL 2.0 license, and is stolen in large part from the FOAFbot code released by Edd Dumbill. Feature requests via comments or email.

Hopefully with this, I’ll start to actually use it in my tools, to verify provenance when possible, and to start convincing people to sign their files. I hate to think what would happen to the semantic web if people suddenly started creating lots of false documents… but hopefully it’s not quite that popular yet.

Parsing SVG Metadata

Posted in Python, RDF, Redland RDF Application Framework, Semantic Web, SVG on April 7th, 2005 at 15:12:48

How to Parse SVG Metadata, the Redland + Python way:

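import urllib
import xml.dom.minidom as minidom
import RDF

# Python 2 with the Redland bindings
m = RDF.Model()
p = RDF.Parser()
# Download the SVG file and parse it as XML
u = urllib.urlopen("Location Of SVG File")
svg = u.read()
doc = minidom.parseString(svg)
# Hand the first rdf:RDF element to Redland, using the SVG location as base URI
p.parse_string_into_model(m, doc.getElementsByTagName("rdf:RDF")[0].toxml(), "Location of SVG File")
print m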

In other words: bring in the urllib, minidom, and RDF modules, create an RDF model and parser, download the SVG file to a string, parse the string into a minidom document, then look for RDF in the SVG file, parse it into the model, and print the serialized model.

Problems: What if someone uses something that’s not rdf: as the prefix?
Solutions: mattmcc offers that minidom supports getElementsByTagNameNS, so the parse line would become:
p.parse_string_into_model(m, doc.getElementsByTagNameNS("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF")[0].toxml(), "Location of SVG File")
resolving the Namespace issue.
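To see the prefix problem without Redland installed, here is a small standard-library sketch in Python 3. The function name extract_rdf_blocks and the sample document are made up for illustration; the sample deliberately binds the RDF namespace to the prefix r: instead of rdf:.

```python
import xml.dom.minidom as minidom

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up sample: the RDF namespace is bound to "r", not "rdf".
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg"><metadata>'
    '<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<r:Description r:about="http://example.org/image.svg"/>'
    '</r:RDF></metadata></svg>'
)


def extract_rdf_blocks(svg_text):
    """Return every RDF root element in the document as an XML string."""
    doc = minidom.parseString(svg_text)
    # Match on namespace URI plus local name, so the prefix is irrelevant.
    return [node.toxml() for node in doc.getElementsByTagNameNS(RDF_NS, "RDF")]


if __name__ == "__main__":
    doc = minidom.parseString(SAMPLE_SVG)
    # The prefix-based lookup misses the block entirely.
    print(len(doc.getElementsByTagName("rdf:RDF")))
    print(extract_rdf_blocks(SAMPLE_SVG))
```

One caveat, as far as I can tell: toxml() only writes namespace declarations that sit on the element itself, so if the prefix is declared up on the svg root, the extracted string may not parse on its own.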

Of course, since this is Redland, this is taken care of for you. Rather than doing it in this way, which is specific to SVG, we can scan for RDF in any XML doc. Simply:

import RDF
m=RDF.Model(); p=RDF.Parser()
p.set_feature("http://feature.librdf.org/raptor-scanForRDF", "1")
p.parse_into_model(m, "URL Of SVG File")

There are a number of other features you can use with a Parser. They are available via rapper -f help, but here’s a list: assumeIsRDF, allowNonNsAttributes, allowOtherParsetypes, allowBagID, allowRDFtypeRDFlist, normalizeLanguage, nonNFCfatal, warnOtherParseTypes, checkRdfID.

Naturally, Redland already does what I want it to do. Another pat on the back for Dave (and thanks to him for pointing it out).

todo lists

Posted in julie, RDF, Semantic Web on April 6th, 2005 at 12:20:51

So, a while ago, I was bored and wanted to add something to my todo list. So I created a URI for a todo namespace, and used it a couple times via the ^addturtle function built into the julie IRC bot.

As usual with my todo methods, I totally forgot about it. Recently, I brought julie into a new IRC channel (#svg) and she met raxor, who immediately started going through her commandlist:

13:15:07 < julie> Current commands: allRelated, olb, like-pubs, maintainer, webpage, drankbeerwith, like-same-music-as, alldayevents, depiction, based_near, icbm, keywords, country-population, kissed, todo, authorlinks, like-musicalwork, like-books, title, rsslinktitles, country-background, languages, nick, neighborhoods, commentContains, pub-address, schemaweb, desc, homepage, workplace, available, country-lowestPoint, knows, quote, school, sha, ljinterests, xfn_met, members, country-highestPoint, rangeOf, term, made, name, places, agentknows, dob, like-musicians, domainOf, modified, picOfA, newdepiction, rsstitles, weblog, contact, javaPlatform, biodob, mbox, dranklagerwith, namefromany, rsslinks.

Wondering what todo was, he tried it, and got a todo item I had added long ago. I replied, “Oops. Never did that.” and went to work on investigating how I could make the todo feature more useful.

In the process, I added a command to julie to add a todo item given a string:

^todoItem document built ins

Will add for me a todo item of the following turtle:
[a todo:Item; todo:owner [a foaf:Person; foaf:nick "crschmidt"]; dc:date "currenttime"; todo:text "text given"].

It will then query the model for all existing todo items for me, and return that.

Of course, this has problems: one of them being I have no way to mark a todo item as “done” once it is, and other similar things, so I will have to work a bit more on the interface to the todo list, but it’s interesting, and I thought maybe other people might want to know about it.

I do need to start documenting the built ins like this: listeningTo is another example. They don’t have a ^commandinfo result, so I’ll have to improve julie’s built in help.

julie may also see some codepiction/path searching in the near future: Greg Williams (aka kasei) gave me some Perl code that he uses to find shortest paths in a Redland store, so I’ll hopefully be able to use that and build it into julie. Also need to get code back into subversion: I screwed up my working directory so that it’s not managed in subversion, so I haven’t checked in in weeks. (This was the same problem I had last time, when someone sent me a complete refactoring of julie – only he had done it against SVN, which wasn’t up to date.)

This isn’t as polished as my usual posts: I think sometimes I overthink what I’m writing a bit, so you may see a bit more “Hey, this is my cool semantic web trick of the day” posts in the future.

FOAF Names…

Posted in FOAF on April 2nd, 2005 at 22:44:39

A while ago I did a really crappy survey of how many people were using the various forms of “name” properties in the FOAF schema. I say “crappy” because it was incomplete and generated via RDQL queries, which was a really silly way to do things, now that I know how to actually use a few of the Redland API calls. So, since I’m bored and working on a wrapper for a variety of Redland stuff, I figured I’d look at it again.

The model in question is the model for Julie, an IRC based interface to a Redland store. She contains about 2.3 megatriples, in a MySQL backed storage.

Total foaf:Persons: 129,932
Total statements using foaf:name as a predicate: 5549
Total statements using foaf:givenname as a predicate: 5363
Total statements using foaf:firstName as a predicate: 874
Total statements using foaf:family_name as a predicate: 117
Total statements using foaf:surname as a predicate: 6314
Total statements using foaf:nick as a predicate: 120529
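
The counting itself is simple once the statements are in hand. A minimal sketch, assuming the triples have already been pulled out of the store as plain (subject, predicate, object) tuples; this is ordinary Python rather than the Redland API, and the sample triples are made up:

```python
from collections import Counter


def count_predicates(triples):
    """Count how many statements use each predicate."""
    return Counter(predicate for _subject, predicate, _obj in triples)


if __name__ == "__main__":
    # Made-up sample data standing in for statements from a real store.
    sample = [
        ("_:a", "foaf:nick", "crschmidt"),
        ("_:a", "foaf:name", "Example Person"),
        ("_:b", "foaf:nick", "someone"),
    ]
    for predicate, total in count_predicates(sample).most_common():
        print("Total statements using %s as a predicate: %d" % (predicate, total))
```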

Keep in mind that a large chunk of this data is spidered from LiveJournal, so the results are most likely going to be extremely biased to that case, which has no use of any name properties other than foaf:nick.

Nothing all that impressive, really, but interesting as far as statistics go nonetheless.

More Musicbrainz…

Posted in MeNow, RDF on March 29th, 2005 at 23:33:56

What I posted about yesterday was obviously too ridiculously difficult to actually be a real solution to the problem. So, I set about making something that works at least a little bit better.

It’s possible to generate “TRM”s for songs you have. These TRMs are basically acoustic identifiers for the track: they let you identify the song based on the way it sounds. This is how Musicbrainz does its identification. Yesterday, I installed a bunch of musicbrainz stuff in an effort to get this working, and did end up finding something that will generate TRM files. My current song, Chumbawamba’s Tubthumping, has a TRM of 776643d0-9b47-4eb9-8d29-608fa9ccedcd.

So, I can generate TRMs: but that doesn’t get me very far. Now, I need to figure out the actual track associated. Since I’m doing this mostly non-interactively, I’m just going to use the most popular track with that TRM. (This doesn’t always work: for me so far this evening, it’s given me a ~80% accuracy rate). So, I fetch the RDF version of the TRM file: this can be retrieved from http://musicbrainz.org/trmid/776643d0-9b47-4eb9-8d29-608fa9ccedcd for the song I mentioned earlier.

The first song in the “tracklist” RDF bag is the one that is the best match, so I’ll grab that Track. I can then add that URI, and fetch the creator ID from that file. All these files can be tossed into the general RDF model I keep lying around, along with the turtle that I mentioned in the earlier entry: [a foaf:Person; foaf:nick "crschmidt"; menow:hasStatus [a menow:Status; dc:date "timestamp"; menow:listeningTo <trackuri>]].

Then, I can issue a query against the model: since I know the time, I only return the most recent result:

select ?t, ?n, ?d where (?p foaf:nick "crschmidt") (?p menow:hasStatus ?s) (?s dc:date ?d) (?s menow:listeningTo ?o) (?o dc:title ?t) (?o dc:creator ?a) (?a dc:title ?n) AND ?d =~ /timestamp/

The end result? A couple hundred extra triples loaded into the global model, and I can see:

23:23:56 <crschmidt> ^listeningTo 776643d0-9b47-4eb9-8d29-608fa9ccedcd
23:24:02 <julie> 2005-03-30T04:24:01Z Tubthumping Chumbawamba

Some of the tracks I’ve been listening to tonight can be shown via: ^q select ?t, ?n, ?d where (?p foaf:nick "crschmidt") (?p menow:hasStatus ?s) (?s dc:date ?d) (?s menow:listeningTo ?o) (?o dc:title ?t) (?o dc:creator ?a) (?a dc:title ?n) AND ?d =~ /2005-03-30/. Feel free to stop by #julie on irc.freenode.net and try it!

MeNow and MusicBrainz

Posted in MeNow, RDF, Semantic Web on March 28th, 2005 at 16:05:09

So, I had a few minutes of free time earlier today, and I decided to play a bit with MusicBrainz. On my Mac right now, the only music I have is tagged by Musicbrainz, and I finally have redlandbot/julie back online after some extended DSL line problems.

So, I figured “hey, my music is tagged by musicbrainz, and they do some nifty RDF stuff, right?” So I started exploring.

I’m listening to Lou Bega’s Mambo #5, from Mastermix 160 (disc 1), according to iTunes. As I said before, all these titles are pulled from MusicBrainz. So, I go to Musicbrainz, and type in “Mastermix 160” to the Albums list, click on the correct response. I arrive at http://www.musicbrainz.org/showalbum.html?albumid=145830 , and from there, move on to Mambo #5. I’m given two RDF links: One for the Track, one for the artist.

I add these to my local RDF store via IRC:

15:43:43 < crschmidt> ^add http://mm.musicbrainz.org/mm-2.1/track/5c29f67b-5014-40a7-a443-0de7636b26ac
15:43:44 < julie> Adding http://mm.musicbrainz.org/mm-2.1/track/5c29f67b-5014-40a7-a443-0de7636b26ac to my database…
15:43:45 < julie> Added 14 statements from http://mm.musicbrainz.org/mm-2.1/track/5c29f67b-5014-40a7-a443-0de7636b26ac. Model size is 2125954.
15:43:51 < crschmidt> ^add http://mm.musicbrainz.org/mm-2.1/artist/64f9c914-74a0-4f6b-8589-6261851b0ab9
15:43:52 < julie> Adding http://mm.musicbrainz.org/mm-2.1/artist/64f9c914-74a0-4f6b-8589-6261851b0ab9 to my database…
15:43:53 < julie> Added 8 statements from http://mm.musicbrainz.org/mm-2.1/artist/64f9c914-74a0-4f6b-8589-6261851b0ab9. Model size is 2125962.

So, now the bot knows about the song that I’m listening to – how to tell her I’m listening to it? The MeNow schema is designed for just that. A turtle serialization stating “crschmidt, whose homepage is http://crschmidt.net, is, at time 2005-03-28T20:47Z listening to the track identified by http://musicbrainz.org/track/5c29f67b-5014-40a7-a443-0de7636b26ac” would go something like this:

[a foaf:Person; foaf:nick "crschmidt"; foaf:homepage <http://crschmidt.net>; menow:hasStatus [ a menow:Status; menow:listeningTo <http://musicbrainz.org/track/5c29f67b-5014-40a7-a443-0de7636b26ac>; dc:date "2005-03-28T20:47Z"]].
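
Typing that out by hand gets old quickly, so here is a minimal Python sketch of generating it. The helper names escape_turtle_string and menow_listening_turtle are made up for illustration, and the sketch assumes whatever consumes the output already knows the foaf:, menow:, and dc: prefixes.

```python
def escape_turtle_string(value):
    """Escape backslashes, double quotes, and newlines for a Turtle literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def menow_listening_turtle(nick, track_uri, timestamp, homepage=None):
    """Build a one-line Turtle description of a 'listening to' status.

    Hypothetical helper: the URIs are assumed to be well-formed already.
    """
    parts = [
        "a foaf:Person",
        'foaf:nick "{}"'.format(escape_turtle_string(nick)),
    ]
    if homepage:
        parts.append("foaf:homepage <{}>".format(homepage))
    status = '[a menow:Status; menow:listeningTo <{}>; dc:date "{}"]'.format(
        track_uri, escape_turtle_string(timestamp)
    )
    parts.append("menow:hasStatus " + status)
    return "[" + "; ".join(parts) + "]."


if __name__ == "__main__":
    print(menow_listening_turtle(
        "crschmidt",
        "http://musicbrainz.org/track/5c29f67b-5014-40a7-a443-0de7636b26ac",
        "2005-03-28T20:47Z",
        homepage="http://crschmidt.net",
    ))
```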

Just my luck, my IRC bot also understands Turtle, so I add some triples:

15:49:07 < crschmidt> ^addturtle [a foaf:Person; foaf:nick "crschmidt"; foaf:homepage <http://crschmidt.net>; menow:hasStatus [ a menow:Status; menow:listeningTo <http://musicbrainz.org/track/5c29f67b-5014-40a7-a443-0de7636b26ac>; dc:date "2005-03-28T20:47Z"]].
15:49:08 < julie> Model size increased by 7 to 2125969 via turtle statements.

So, now julie knows what I’m listening to, but how do I tell other people? Using RDQL queries (I haven’t added SPARQL support yet), I can show off what I’m listening to:

15:53:16 < crschmidt> ^q select ?t, ?n, ?d where (?p foaf:nick "crschmidt") (?p menow:hasStatus ?s) (?s dc:date ?d) (?s menow:listeningTo ?o) (?o dc:title ?t) (?o dc:creator ?a) (?a dc:title ?n)
15:53:17 < julie> 2005-03-28T20:47Z Mambo No. 5 Lou Bega, 2005-03-28T16:59Z Can’t Get Enough of You Baby Smash Mouth, 2005-03-28T16:39:08Z Electric Sleep (Original Version) sHeavy, 2005-03-28T16:47:08Z Dead Already Thomas Newman

As you can see, this shows off all the songs I’ve been listening to recently. If I want to limit them, I can add a regex onto ?d: ?d =~ /20:47/. This gives me the result I want: 2005-03-28T20:47Z Mambo No. 5 Lou Bega
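
The same regex trick is easy to mimic outside the store. A minimal sketch, assuming the query results have been collected as (date, title, artist) tuples; the function name filter_by_date is made up, and the rows are the four results shown above:

```python
import re

# The four result rows from the query above, as (date, title, artist).
ROWS = [
    ("2005-03-28T20:47Z", "Mambo No. 5", "Lou Bega"),
    ("2005-03-28T16:59Z", "Can't Get Enough of You Baby", "Smash Mouth"),
    ("2005-03-28T16:39:08Z", "Electric Sleep (Original Version)", "sHeavy"),
    ("2005-03-28T16:47:08Z", "Dead Already", "Thomas Newman"),
]


def filter_by_date(rows, pattern):
    """Keep rows whose date matches the regex anywhere, like ?d =~ /pattern/."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(row[0])]


if __name__ == "__main__":
    for date, title, artist in filter_by_date(ROWS, r"20:47"):
        print(date, title, artist)
```

Matching on a substring of the timestamp is crude: a pattern like 20:47 would also match a date whose seconds happened to read 20:47, so a stricter pattern is safer for anything beyond a demo.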

Okay, so it’s the most ass backwards way of sharing what you’re listening to ever. That doesn’t mean it has zero merit, however: one of the benefits of RDF is its extensibility. This means that I can do a lot more than just say what I’m listening to. I could, for example, offer a rating, using the review vocabulary. I could find out what license a work is under, using information from the Creative Commons Metadata project. I could find out what songs someone else is listening to, and then find out their contact information via FOAF, check their availability via MeNow information and Jabber Pub/Sub tech, and drop them a message if they’re around.

I’m stretching it, but this is why I want all this stuff which MeNow can work with. RDF is powerful, and an application people might actually use would be a cool way to share this data. Then again, for the most part, I’m preaching to the choir here. But I wanted to write about it anyway. “Now Playing information stored in RDF: Wave of the future! You heard it here first!”

SVG

Posted in RDF, SVG on March 28th, 2005 at 00:19:23

Lately, I’ve been playing with SVG, since I finally got it to work decently well on two of the computers I regularly use. I was able to get it working on a Static FOAFNaut even, which is motivating me to actually write a few more tools in Redland to get FOAFnaut working better. I never realized that much of the speed problem with FOAFNaut before was that it was dynamically parsing RDF in Javascript, which is not fast, rather than something related to the actual SVG rendering, which is actually pretty quick.

With help from #svg on freenode, I’ve got SVG running with a prerelease version of an Adobe plugin on my Linux box, and I’ve had it for a while on Firefox on the mac. I’m really looking forward to the release of Firefox 1.1 now though: having built in SVG support will lead me to be able to try out some pretty neat stuff, and maybe pull a few more people over to Firefox in the fray (if the engine isn’t crap, at least).

SVG is, all in all, pretty cool. I’m probably going to add support for parsing RDF out of SVG files to julie once I get my DSL line problems fixed and start running her again. Yet another source of data… such nifty stuff to be done.

For those who don’t know: SVG is kind of like a standards-compliant version of Flash. It stands for Scalable Vector Graphics, and it lets you describe how to draw things in terms of curves and lines, rather than by specifying the pixels. This means that you don’t get blurriness at any size you look at it, unlike rasterized formats. It’s kind of like comparing Adobe Illustrator files to flattened Photoshop files, for those of you who are familiar with such things: one can be stretched at will and not look odd, whereas the other is just not going to react so well to that. There are still some issues I’m having with them in the “embedded in web pages” way, but that may just be me not knowing how to deal with stuff.
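
To make the "curves and lines" point concrete, here is a tiny Python sketch that writes out an SVG document. The function name simple_svg and the shapes are made up for illustration. The circle and the curve are described once, in a 100 by 100 coordinate system, and only the width and height change when a bigger or smaller rendering is wanted.

```python
import xml.dom.minidom as minidom


def simple_svg(size):
    """Return a small SVG document rendered at size x size pixels."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'width="{0}" height="{0}" viewBox="0 0 100 100">'
        # A circle: centre and radius, no pixels involved.
        '<circle cx="50" cy="50" r="40" fill="none" stroke="black"/>'
        # A quadratic curve from (10,90) to (90,90), pulled towards (50,10).
        '<path d="M 10 90 Q 50 10 90 90" fill="none" stroke="black"/>'
        '</svg>'
    ).format(size)


if __name__ == "__main__":
    # Same drawing instructions, two very different output sizes.
    for size in (16, 1600):
        root = minidom.parseString(simple_svg(size)).documentElement
        print(root.getAttribute("width"), root.getAttribute("viewBox"))
```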

For Linux and Windows SVG authoring, there’s Inkscape, which seems to be a simply fantastic piece of work. Illustrator can also export to SVG, and I’m sure there are other tools which the lazyweb can share.

All in all: SVG is cool, and I hope to do some work with it in the near future. I’m happy to hear anything about success stories you may have had so far.

FilmTrust

Posted in Semantic Web on March 13th, 2005 at 09:57:35

I’m not sure if I’ve posted on this before, but I noticed a few new features in the site, so I’m going to mention it again.

FilmTrust is a film rating site, much like the ratings built into NetFlix or other similar services. You rate things you’ve seen, and FilmTrust offers you suggestions as to what you might want to look into seeing. However, instead of just basing it on what movies you’ve seen and what everyone else thought about those movies, it also uses social connections to make these estimates. You create a “friends” network, and give each of these friends ‘ratings’, which determine how much effect their opinions have on your recommendations.

According to the tour, this calculation is “… calculated using the trust ratings you have for your friends, what they have for their friends, and how those people rated the film.”
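
I don’t know the actual formula FilmTrust uses, so the following is only a rough illustration of the general idea: a plain trust-weighted average over direct friends, with no propagation through friends of friends. The function name trust_weighted_rating and the numbers are made up.

```python
def trust_weighted_rating(trust, ratings):
    """Average friends' ratings of one film, weighted by trust in each friend.

    trust:   {friend: trust value you assigned to them}
    ratings: {friend: that friend's rating of the film}
    Returns None when no trusted friend has rated the film.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for friend, rating in ratings.items():
        weight = trust.get(friend, 0)  # strangers carry no weight
        weighted_sum += weight * rating
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


if __name__ == "__main__":
    trust = {"alice": 10, "bob": 5}
    ratings = {"alice": 4, "bob": 1, "stranger": 5}
    print(trust_weighted_rating(trust, ratings))
```

A highly trusted friend pulls the estimate towards their own rating, which is the behaviour the tour describes.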

One of the cooler aspects of the project is that it is rich with information in RDF. So, you can take the information from the site, and pull it into a local RDF store, and manipulate it to your heart’s content. If you wanted to do your own suggested ratings system by looking at the reviews that each of your friends have offered, you can do that: you could, indeed, redisplay much of the information available on the site solely by using the RDF information and doing your own calculations. (This would, I’m pretty sure, bring up the issue of copyright, so I wouldn’t recommend it without at least discussing it with the project maintainers first.)

FilmTrust is an academic research project being run by Jennifer Golbeck. More information is available on the About FilmTrust page.

Other people have already written on the topic of FilmTrust: MortenF has some nifty toys based around it, Danny’s post a month ago talked about it, and there’s always the random non-english post when you get any project large enough to get a significant following.

I’d like to see more people joining it, especially people with an interest in good computer related movies, because I need some suggestions. So, join today, and add me as a friend!

Tech Plenary

Posted in Semantic Web on March 1st, 2005 at 20:46:28

The W3C fifth annual Tech Plenary is in Boston, Massachusetts this week, meaning there’s a large group of the people who I typically work with exclusively over IRC very nearby. Unfortunately, free time is not exactly forthcoming during the daytime, so I missed out on the Semantic Web Interest Group F2F meetings. I was able to grab a few tidbits over IRC: one of the more interesting ones is the fact that Forum Nokia is run with a lot of metadata underneath, as Patrick Stickler’s Slides demonstrate. (Powerpoint files, so a powerpoint viewer of some kind is required.)

In addition, Patrick mentioned a series of other links, which are available from the irc chump for Patrick’s slides. One of the more interesting ones to me is the Device Profile Search, which I assume works off the RDF available from URLs like, for example, the Nokia6100 Device Profile. A list of profiles of this kind from a number of manufacturers is available from the UAProf Profile Repository, a number of which have been aggregated into my Redlandbot service, and are used periodically for answering questions like “What Java Platform does the Nokia 3650 run?” (The answer, in machine readable form, is “Profile/MIDP-1.0, Configuration/CLDC-1.0, rdf:Bag”. This service subject to change at any time.)

So, seeing some demos of that from Patrick was cool. I’m still hoping to catch some of the Semantic Web people lingering in town for a F2F meetup, if for nothing else than getting my picture included in some codepiction stuff for demos. Hoping to gather some people either in the next couple evenings sometime, or on Saturday if anyone is left.