RDF?
Posted in Uncategorized on December 10th, 2005 at 12:07:49
RDF: Resource Description Framework or Reality Distortion Field?
I mentioned in an earlier post about Ning today that I felt the content model it uses is functionally no different from RDF. This needs a bit of explanation, I'm sure, and thus far there isn't a lot of talk out there about how to actually use the content store. (This is probably related to the significant lack of beta developer accounts so far, something that will hopefully change in the near future.)
A quick tutorial on creating Content Objects on Ning first (drawn from XN_Content documentation):
$object = XN_Content::create("TypeOfObject", "Title of Object", "Description of Object");
$object->save();
This creates an object with a Type, Title, and Description. These are the three most commonly used "System" attributes — they are present, in some form or another, on almost every object created, so they can be used when displaying that data.
In addition to these system attributes, there is the ability to add developer attributes:
$object->my->name = "Christopher Schmidt";
$object->my->age = 21;
$object->save();
If we look at the content for this object, via the $object->debugHTML() method, we see:
XN_Content: id [198390] createdDate [2005-10-05T06:50:23-07:00] updatedDate [2005-10-05T06:50:23-07:00]
    type [TypeOfObject] title [Title of Object] description [Description Of Object]
    tagCount [0] contributorName [crschmidt] ownerName [Example App] ownerUrl [exapp] isPrivate [false]
    my attributes [
        [-1] name : Christopher Schmidt : string
        [-1] age : 21 : number
    ]
Here we see some interesting bits that we hadn't seen before: the id, date, and owner fields are all useful for tracking where the data lives, and the contributorName for who the data came from. For the time being, however, we'll concentrate on the my attributes and the id and type system attributes.
Mapping this to an RDF representation is relatively simple:
ning:198390 a TypeOfObject;
dc:title "Title of Object";
dc:description "Description of Object";
ning:name "Christopher Schmidt";
ning:age "21" .
You'll see here the main issue with representing Ning content objects as RDF: types and predicates (developer attributes) are stored only as strings, not as the URIs that RDF typically asks for. This results in a less descriptive format than you would find in most RDF descriptions: there are no URIs for predicates, so you can't do some of the "magic" that RDF is famous for.
However, I have been working with RDF for more than 1.5 years now, and I have never had any use for that magic.
The ability to uniquely identify predicates may be useful in a general sense, and it provides the ability to accurately and adequately describe those predicates for use in other systems… but at the base level, they are designed to attach semantics to the terms. In many cases, these semantics are simply unnecessary: if I have a Book with an "isbn" property, I probably know what it's going to be.
If you do need this level of semantics, all hope is not lost! You can still achieve your goals! Typically, apps use an attribute in only one way, and each content object records the application which owns it. (In this case, exapp — which doesn't exist, for the record: the minimum length for an app name is 6 characters.) So, you can take the URL for the app (http://exapp.ning.com/) and create a URL based on that: whether it's in the app's namespace (squatting on a URL that it isn't using, for example) or in your own. I could coin http://crschmidt.net/ns/ning-exapp# as a prefix for predicate URLs in this situation.
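To make that concrete, here's a toy sketch (in Python, which is what I end up using for most of my RDF tooling) of coining predicate URIs from the app's URL and emitting Turtle for a content object. The attribute dictionary and the ID-based subject URL scheme are made up for illustration; none of this is part of the Ning API.

# A toy illustration, not part of the Ning API: given a content object's
# attributes (here hard-coded), emit Turtle using a coined namespace.
NING_EXAPP = "http://crschmidt.net/ns/ning-exapp#"    # coined predicate prefix
SUBJECT = "http://exapp.ning.com/content/%s"          # hypothetical ID-based URL scheme

def to_turtle(obj_id, obj_type, title, description, my_attrs):
    lines = [
        "@prefix dc: <http://purl.org/dc/elements/1.1/> .",
        "@prefix ning: <%s> ." % NING_EXAPP,
        "",
        "<%s>" % (SUBJECT % obj_id),
        "    a ning:%s;" % obj_type,
        '    dc:title "%s";' % title,
        '    dc:description "%s";' % description,
    ]
    # Developer ("my") attributes become predicates in the coined namespace.
    for name, value in my_attrs.items():
        lines.append('    ning:%s "%s";' % (name, value))
    lines[-1] = lines[-1][:-1] + " ."    # close the final statement with a '.'
    return "\n".join(lines)

print(to_turtle(198390, "TypeOfObject", "Title of Object", "Description of Object",
                {"name": "Christopher Schmidt", "age": 21}))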
The semantics attached to a predicate are extremely weak in the Ning content store — but that doesn’t mean that they are useless. It is my opinion that a simple RDF representation of Ning content objects can be useful in many cases, and that the model that is in use succeeds in proving that the RDF model as an application data store is not useless, but instead can lead to rich applications sharing data, as Ning is specifically designed to accomplish.
One thing I didn’t mention above is that it is possible to create links, not only to literals and numbers, but also to other content objects:
$object->my->otherContent = XN_Content::create("OtherObject");
$object->save();
This creates a link to another content object, by that object's ID. This fits perfectly into the RDF model, where URIs identify an object: the Content Object's ID is its unique identifier within the Ning universe, and the backing store lets you link any number of these objects together. You want to grab a book that someone added, and tie it to your own? That's fine! Simply create a Content Object and add both of those as attributes. You want a whole list of books from someone else's bookshelf? That's fine too: just grab the IDs, and attach 'em. This casual interlinking of data is exactly what is going to make Ning a success — and it's exactly the kind of interrelationship that proves the RDF model is not wrong.
I know that some people will strongly disagree: they will say that the Ning content model is more demonstration that people “don’t get it”. What I see it as is exactly the opposite: it’s proof that regardless of the underlying way content is stored, it can be presented to applications in a way that is easy to use, and useful. By giving objects a Unique Identifier (whether it’s a URI or an ID), a way to connect them together, and a decent API to put it all in place in code, you can create magic.
While many people these days are switching to annotation-in-XHTML, there’s still at least one file format out there which has extremely useful metadata annotation using RDF/XML inside the document: SVG.
The Scalable Vector Graphics format has a metadata element, which is expected to contain RDF/XML. This is great news for people who might wish to create a directory of SVG images: the metadata can be stored in the actual images, something that the Open Clip Art Library takes advantage of, using a number of tools to extract statistics and aggregate metadata from SVG files.
To take an example from the library, Autos_01.svg (SVG file, requires SVG viewer) contains 23 RDF statements. These triples are given a base of a cc:Work with the URL of the file itself, meaning that a simple query for the predicates and objects with http://openclipart.org/clipart/transportation/autos_01.svg as a subject returns the important aspects of this document. This includes description, creator, keywords, and license. The license is "Public Domain" — adding images to the Open Clip Art Library requires placing them into the Public Domain.
For working with this data, developers of the project created the Perl module SVG::Metadata – a module for annotating SVG files with this metadata, as well as making changes to the metadata which already exists in such files.
The maintainer just announced on the Clipart Discussion list that he has released 0.28, which includes the changes from the previous 0.26 and 0.27 releases, which were mostly maintenance releases. (The message will eventually appear in the August threads, but hasn't yet.)
The RDF generation in versions prior to 0.24 was broken, but was fixed in the 0.25 release – OCAL is now using this release in their scripts, so many of the more recent images in the library contain valid RDF, meaning that you can simply pass them to Redland with the http://feature.librdf.org/raptor-scanForRDF feature set. In the Python bindings, that is:
p.set_feature("http://feature.librdf.org/raptor-scanForRDF", "1")
In rapper:
[crschmidt@creusa ~]$ rapper -c -f scanForRDF=1 http://www.openclipart.org/incoming/cat_scrathing_post_benji_01.svg
rapper: Parsing URI http://www.openclipart.org/incoming/cat_scrathing_post_benji_01.svg
rapper: Parsing returned 30 statements
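Putting the Python version together end to end, here's a minimal sketch with the Redland bindings; the storage setup and the SVG URL are just what I'd reach for, so adjust as needed.

import RDF

# An in-memory model to hold whatever triples the SVG carries.
model = RDF.Model(RDF.MemoryStorage())

parser = RDF.Parser(name="rdfxml")
# Tell raptor to scan the document for an embedded rdf:RDF block
# instead of expecting RDF/XML at the top level.
parser.set_feature("http://feature.librdf.org/raptor-scanForRDF", "1")

svg = RDF.Uri(string="http://openclipart.org/clipart/transportation/autos_01.svg")
parser.parse_into_model(model, svg)

# Dump whatever was found as N-Triples.
serializer = RDF.Serializer(name="ntriples")
print(serializer.serialize_model_to_string(model))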
I think this is a great example of how to work with structured metadata without dealing with the crappy aspects of RDF/XML syntax corner cases: simply write a library which parses the metadata, fills your variables up, and lets you modify them with a standard API, then lets you resync the data to the file. Congrats to Bryce for his hard work on the module, and on making the metadata for these SVG files accurate and useful to external users.
SPARQL CONSTRUCT as rules: julie now includes SPARQL CONSTRUCT-based, rule-like processing for creating additional statements to be added to her store.
Basically, the syntax hasn’t changed much:
^q CONSTRUCT {?p2 ?prop ?p. } WHERE { ?prop rdf:type <http://www.w3.org/2002/07/owl#SymmetricProperty>. ?p ?prop ?p2. } returns:
Total of 542 statements. Here are the first three:
{(r0_r1114530965r33008), [http://purl.org/vocab/relationship/colleagueOf], (r0_r1114530965r32995)}, {(r0_r1114530965r32998), [http://purl.org/vocab/relationship/colleagueOf], (r0_r1114530965r32995)},
{(r0_r1114689381r708), [http://purl.org/vocab/relationship/colleagueOf], (r0_r1114530965r32995)}.
This is to let you know what you’re getting yourself into. For example, you probably don’t want to add the rdfs:subClassOf relationship for everything: you’d be dealing with a heck of a lot of statements, enough to trap the bot for hours. Here, we can see that it’s a relatively reasonable subset of the model, so we can pass this construct result into the model:
21:33:35 < julie> Created 542 statements based on CONSTRUCT query: Model size changed by 481.
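For anyone curious what that looks like outside of IRC, this is roughly the shape of it with the Redland Python bindings (a sketch, not julie's actual code; model stands in for her existing store):

import RDF

construct = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT { ?p2 ?prop ?p . }
WHERE {
  ?prop rdf:type <http://www.w3.org/2002/07/owl#SymmetricProperty> .
  ?p ?prop ?p2 .
}
"""

query = RDF.Query(construct, query_language="sparql")
results = query.execute(model)    # 'model' is the existing store

# CONSTRUCT results come back as a graph; add them to the model,
# then compare sizes to see how many statements were genuinely new.
before = model.size()
model.add_statements(results.as_stream())
print("Model size changed by %d" % (model.size() - before))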
You can see the queries and results in context in #swig logs.
I’d be interested to hear how people think this could be used, or if it’s useful in a more general sense. Would it be better to simply provide a temporary URI where people could fetch the data? Hm, that sounds almost like a useful service – POST data, get back a URI and some data after having parsed the data and stored it in a local location. Wonder if I’m the only person who could use that…
At some point on the FOAFnet mailing list, David Sifry asked, "What is DOAP? What is GRDDL? What's the use case?" in response to a post from Danny Ayers.
My reply was pretty simple and succinct, in my opinion, but I never put it anywhere public. I think it's a reasonable explanation of why GRDDL can be useful for moving between microformats, with their tiny datasets, and RDF, with the large dataset behind it.
What is DOAP? What is GRDDL? What’s the use case?
DOAP – Description of a Project. DOAP is to Open Source software what FOAF is to people. http://usefulinc.com/doap
GRDDL – Gleaning Resource Descriptions from Dialects of Languages. This is a way of performing transformations from (X)HTML documents to an RDF document. The transformation takes place via XSLT, specified either implicitly through the "profile" link in the <head> of the HTML document, or via a link rel="transformation", which parsers can then apply to the document to get something with real meaning out of it.
http://www.w3.org/TeamSubmission/grddl/
Use Case:
“Microformats” and storing data in HTML may be good for people who are writing specialized tools for each format they want to deal with, but GRDDL allows for people who want to support all these to simply write one translation, using XSLT, and take advantage of the underlying data model of RDF. This allows you to merge existing data sources with the newly emergent small-s semantic technologies, and to use existing tools for combining these sets of data and discovering correlations between them, using the already existing RDF data access framework.
If I have a datastore of RDF data (which I do), and someone wants to find if the maintainer of a certain project has contact information in that database, using hBlah pages, I can’t really do that. However, if I merge the datasets with the already existing FOAF data out there, I can find out that a person named Christopher Schmidt, who is a maintainer of “julie, aka redlandbot”, also has an email address of crschmidt@crschmidt.net.
That’s a pretty simplified use case, but the general idea is simply to take the tiny sources of data that the h* formats provide, and integrate them with the millions of pieces of data out there already in FOAF, RSS, and everything else Semantic.
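To make the GRDDL mechanics in that reply a bit more concrete, here's a rough sketch of the dance using Python and lxml. The page URL is made up, and real GRDDL handles more cases (profile documents, relative hrefs, base URIs) than this does.

from lxml import etree

url = "http://example.org/someone/contact.html"    # hypothetical microformatted page
doc = etree.parse(url, etree.HTMLParser())

# GRDDL pages point at their transformation with
# <link rel="transformation" href="..."/> in the <head>.
for href in doc.xpath('//link[@rel="transformation"]/@href'):
    transform = etree.XSLT(etree.parse(href))
    rdf = transform(doc)
    print(str(rdf))    # RDF/XML, ready to feed to Redland, cwm, etc.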
Lately, I’ve been watching Chimezie play with Emeka, his RDF bot. It’s basically a Versa/4suite counterpart to Julie, the redlandbot (based, obviously, on Redland.)
I can’t say anything for his code, but I do know that through people working with Emeka, I’ve seen some Versa queries recently, and I have to say that they confuse the heck out of me.
I just read through the Versa Article on XML.com, and was helped not at all. The language itself makes sense when I read the examples, but I simply can’t come up with the way to do what I would in SPARQL. I’m sure that it’s really easy once you’re used to it, but to me, it seems like a query language with a 663 mode. Sure, I might be able to write it, with a reference handy, and execute it (hi Emeka!), but I sure as heck can’t read it.
I think I like SPARQL because it feels familiar: the turtle patterns are just turtle statements with some variables. The triple patterns *look* like triples. Versa doesn’t have this benefit: triple patterns in Versa become something along the lines of (all() ->rdfs:label->*) rather than (?s rdfs:label ?l). I think it may just be the fact that all the extra syntax confuses me: why put all these bits in the middle of my triples? They’re triples! Spaces are enough!
Anyway, this isn't very helpful: I haven't used Versa enough to have useful comments. Just know that reading this stuff, and even trying to wrap my head around the stuff Chimezie is sending to Emeka, I have problems. This is probably true of SPARQL/julie as well for most people – but for me it "just works".
A long time ago, when I first got a Mac, there was a lot of hubbub about a program called “Delicious Library”: an application that would let you scan in your books, and provided an awesome user interface to searching, storing, lending, and everything else you might want to do with them. At the time, I wanted it, and I wanted it bad, but I decided to wait until I got an iSight: the idea of entering hundreds, perhaps up to a thousand, UPCs by hand, did not strike me as one of my favored tasks.
March 19th, I got an iSight: a birthday present, from Jess. I thought then “ooh, Delicious Library”, but never got around to it.
This weekend, I was starting to pack up books from the bookshelves. I thought “Hey, I won’t have a clue where any of the books are… unless…”
Jess was out of the house. I downloaded and tried the program: I scanned a full shelf of books (after getting some decent light) and packed them up before I hit the 25-item limit and had to pay the piper. $40 for knowing where all of these books are after we move (as well as a new toy to play with) is well worth it.
I scanned another shelf (and ran out of boxes), then sat down to do the fun part.
First: xml2rdf – an XSLT stylesheet to convert Delicious Library's XML format to RDF. One of the biggest problems with this stylesheet is that it needs to know which image files are actually available from Delicious Library: this is where files.xml comes in, which is constructed using the following bash commands:
echo "<container>" > files.xml
for i in ~/Library/Application\ Support/Delicious\ Library/Images/Medium\ Covers/*; do
    export j=`echo $i | sed -e 's!.*/!!'`
    echo "<image size='medium' name='$j' />" >> files.xml
done
echo "</container>" >> files.xml
This is then used with XSLT's document() function in order to load the list of available files, to prevent inaccurate <foaf:depiction>s from being spat into the output: Amazon does not store cover images for some books, so until I implemented this fix, there were broken image references.
Next: convert.py – load the file as an RDF model, delete all the existing dc:description statements, and re-add them after converting them using rtfreader from Brandon's Program Archive. (A rough sketch of this step follows the list below.)
Next: Process through cwm for RDF pretty printing.
Next: rdf2html – taking the RDF output and converting it to HTML.
End result? Content negotiated version of the books I’ve scanned so far in the Books Library – RDF and HTML versions available.
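For the curious, the description-scrubbing step in convert.py amounts to roughly the following. This is a sketch, not the actual script: rtf_to_text stands in for whatever rtfreader provides, and the Redland calls are from memory, so treat them as approximate.

import RDF

DC_DESCRIPTION = "http://purl.org/dc/elements/1.1/description"

def clean_descriptions(model, rtf_to_text):
    # Swap each dc:description literal for a plain-text version of itself.
    pattern = RDF.Statement(None, RDF.Node(uri_string=DC_DESCRIPTION), None)
    # Copy the matches out first, since we're about to modify the model.
    for st in list(model.find_statements(pattern)):
        plain = rtf_to_text(st.object.literal_value['string'])
        model.add_statement(RDF.Statement(st.subject, st.predicate,
                                          RDF.Node(literal=plain)))
        model.remove_statement(st)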
This is some of my first major experience in XSLT, and I’ve found it to be pretty darn easy: far less difficult than I thought it was in the past. I think that I may go on an XSLT kick for the next couple weeks, so don’t be surprised if you see a lot more of my RDF looking a little bit prettier. For example, I already wrote an XSLT stylesheet for the FIF reviews I’ve received, so if you’re using a capable browser, that will be a lot nicer looking now than it used to be.
Josh points out Google's Sitemap Protocol, via the SWIG Chump. I pull out my XSLT-foo (what little of it there is). I hack a bit back and forth, run into a problem which Uche helps me figure out: "XPath does *not* use the default prefix in the stylesheet for purposes of matching", fix my XSLT up a bit, and create a new RDF source under my semweb section: Google Sitemap Tools, including an XSLT stylesheet, example output, and a conversion service which uses the XSLT: for example, Google's Example File in RDF.
Now, to find some sitemaps in action in the real world, and add gzip decoding of gzipped sitemaps.
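For anyone who wants to play along, the conversion really is just "fetch, maybe gunzip, run the stylesheet". A sketch (the stylesheet path is a placeholder for the sitemap-to-RDF XSLT, and the sitemap URL is made up):

import gzip
import urllib
from StringIO import StringIO
from lxml import etree

def sitemap_to_rdf(sitemap_url, stylesheet="sitemap2rdf.xsl"):
    raw = urllib.urlopen(sitemap_url).read()
    # Handle gzipped sitemaps, whether or not they're named *.gz.
    if sitemap_url.endswith(".gz") or raw[:2] == "\x1f\x8b":
        raw = gzip.GzipFile(fileobj=StringIO(raw)).read()
    doc = etree.fromstring(raw)
    transform = etree.XSLT(etree.parse(stylesheet))
    return str(transform(doc))

print(sitemap_to_rdf("http://www.example.com/sitemap.xml.gz"))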
After some thinking this morning, I converted the current PHP-based crschmidt.net templating system to a Cheetah Python template. This means that some more of my tools can move to being Python powered, rather than PHP powered.
“So what?”
Currently, the interface to Redland that I have available in PHP is significantly less capable than the Python one. It's coded by yours truly, and it's basically only designed for my use cases, so every time I want to use something new, I have to go and code it, or use a closer-to-native C-style interface translated into PHP. Neither of those is particularly enjoyable.
Python is a much more comfortable language for me to use. It is more intuitive for me. It feels more natural, not to mention the fact that I keep forgetting semicolons in my PHP code. It has an awesome binding for Redland, which is one of the things that I’ve been working with most over the past while.
In the past, all my scripts had been either 1. PHP or 2. Python with no site theming. Hopefully the new Cheetah template will help make me create some more tools in Python, which is the language I feel most comfortable in.
With that in mind, I've created a new crschmidt.net web service: an RDF Validator. A number of times, I have found that the official RDF validator will puke, but won't give much of a reason why. This tool uses Redland, which has a tendency to return what I consider better error messages on worse RDF. It's designed as a one-off example of the new templating system, and should not be considered indicative of most of the expected output of such scripts. Just a first attempt at getting myself into more code.
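The guts of the validator amount to very little. A sketch of the idea (not the actual script) with the Redland Python bindings; error reporting in the real thing is a bit more involved:

import RDF

def validate(rdfxml, base_uri="http://example.org/"):
    # Returns None on success, or Redland's complaint about the RDF/XML.
    model = RDF.Model(RDF.MemoryStorage())
    parser = RDF.Parser(name="rdfxml")
    try:
        parser.parse_string_into_model(model, rdfxml, RDF.Uri(string=base_uri))
    except RDF.RedlandError, err:
        return str(err)
    return None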
My post last night was a bit cryptic, so let me walk through a bit more clearly what I’ve been doing, since I seem to have picked up the interest of some more people.
I currently am using Flickr to annotate my photos: primarily because I like their image region annotations, and partially because their API offers me a way to get lots of data out that I’ve put in, which is useful to me. So, that’s what I’m using for photo annotation at the moment, which may change at any point.
Masahide has a flickr2rdf service: flickr2rdf takes a Flickr photo page URI and exports RDF from it. For example, a picture of myself, my ex-girlfriend, and Foghorn Leghorn can be seen, fully annotated, using XSLT+RDF, via the flickr2rdf tool.
Additionally, the original photos stored at flickr (full size) have EXIF information: this information can be exported via Masahide’s equally cool exif2rdf tool: Foghorn Leghorn Example.
Once I have the photo_id of a photo, I can collect all these statements together. Additionally, since I am using tags from GeoBloggers for geolocation, I have a tool which parses out these tags (using the Flickr API) and creates Geo data for them.
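The tag parsing is nothing fancy; it's roughly the following (the function name is mine, and I'm assuming the GeoBloggers convention of a geotagged tag plus geo:lat= and geo:lon= tags):

def geo_from_tags(tags):
    # Pull WGS84 lat/long out of GeoBloggers-style Flickr tags, if present.
    coords = {}
    for tag in tags:
        if tag.startswith("geo:lat="):
            coords["lat"] = float(tag.split("=", 1)[1])
        elif tag.startswith("geo:lon=") or tag.startswith("geo:long="):
            coords["long"] = float(tag.split("=", 1)[1])
    if "lat" in coords and "long" in coords:
        return coords
    return None

# e.g. geo_from_tags(["geotagged", "geo:lat=42.3736", "geo:lon=-71.1097"])
# returns {'lat': 42.3736, 'long': -71.1097}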
I add a few tracking statements: specifically, seeAlso links to the final RDF/XSLT view of the image (again, the Foghorn Leghorn example). I serialize the model out from Redland, and get a directory full of files of RDF singletons. From here, I use cwm to process the singletons into an abbreviated RDF/XML file. These files are then synced to the http://crschmidt.net/albums/flickr/ directory. Here, I use a couple little tricks to add an XSLT declaration as the first line of each file, so that the content negotiated version offers XML delivered as application/xml, rather than just application/rdf+xml (which Firefox won't display in a browser).
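The "couple little tricks" mostly amount to prepending a processing instruction to each file; something like the following. The directory handling, the .rdf extension, and the stylesheet href are illustrative, and a real version needs to put the PI after any existing XML declaration rather than blindly at the top.

import os

PI = '<?xml-stylesheet type="text/xsl" href="/xslt/photo.xsl"?>\n'    # placeholder href

def add_stylesheet_pi(directory):
    # Prepend an xml-stylesheet processing instruction to each RDF/XML file.
    for name in os.listdir(directory):
        if not name.endswith(".rdf"):
            continue
        path = os.path.join(directory, name)
        data = open(path).read()
        if "xml-stylesheet" not in data:
            open(path, "w").write(PI + data)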
Next step is to add each of these files into an RDF model. Since I’m still occasionally changing statements, I’ve been dropping the whole model and readding every time: this doesn’t take too long, as it’s only a few hundred files, and Redland is speedy quick.
So, now we have a database full of RDF statements. Fine. But that's not too useful. So, I have my SPARQL query interface, which is all well and good for people who have lots of knowledge of RDF. It can provide some cool results.
But it doesn't really do anything *fun*. So last night, I added an optional checkbox that said "if you have something in a specific query format, process an XSLT file against it". I tweaked this XSLT from masahide's example, linked yesterday, into what it is now, which you can see if you're interested.
Well, that’s all well and good, but most people don’t understand SPARQL enough to know what they should type in. What’s the use of having to learn a language just to see some pictures? So, my next step was to add a search box specifically for Regions: my sparql page has a search box now specifically for this purpose.
I realized after a couple times, though, that using client side XSLT to process the XML was really slow, clunky, and generally ugly. Not to mention that Mozilla’s XSLT doesn’t let me disable-output-escaping on variables: so, I installed php4-xslt, and started using that implementation on the server side.
Yeah, that’s all well and good too, but now my pretty RDF with queries and all went away! So, I added them back: at the end of the Foghorn Search, in a comment, you’ll see:
Generated using the XSLT stylesheet at http://crschmidt.net/xslt/imgreg.xsl against the data generated by the query:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX imgreg: <http://www.w3.org/2004/02/image-regions#>
SELECT ?img ?title ?page ?desc ?atitle ?coord
WHERE {
  ?img
    dc:title ?title;
    foaf:page ?page;
    dc:description ?desc;
    imgreg:hasRegion ?r.
  ?r
    dc:title ?atitle;
    imgreg:coords ?coord.
  FILTER REGEX(?atitle, "Foghorn") }
Data was:
followed by the XML version of the SPARQL query results.
Another interesting example: Schmidt – myself, family members, and others.
Anyway, being a bit more informative seemed appropriate given the situation. So there’s my implementation toy of the day.