Archive for the 'Semantic Web' Category

Exhaustion with RDF

Posted in Semantic Web on June 21st, 2005 at 23:36:27

This is probably a familiar story to many of you who have been around a while, but I’ve lost a lot of my interest in working with the Semantic Web lately, and I don’t see it coming back anytime in the near future. For those of you who are waiting on action items from me, I recommend removing them from my plate and putting them somewhere else where they are more likely to be taken care of.

There are a few reasons for this. One is simply a lack of time: I’ve been working 14-hour days for the past two weeks at work, and that’s probably not going to change in the near future. Combine that with the fact that I need to do job searching as well, since we’ll be moving to Cambridge soon, and an extreme amount of my time is going out the door to projects that aren’t my own.

Another is frustration with evangelizing being part of the process of making progress in the Semantic Web world. Every time I take a step forward with some code, I find another five steps I have to take back in order to defend my position and the way I’ve done it. After doing this repeatedly for several months now, I’m growing tired of always having to spend more than half of my time fighting to defend the way I’ve created a certain project, rather than soliciting patches or getting help from the community.

Another is the lack of widespread support from the powers that could help move the RDF and Semantic Web movement forward. It would be relatively simple for something like IMDb to open up its database in an RDF format. This would allow a widespread rating system to be created based around the data that IMDb provides, offering a way to distribute information about movies that could be useful in a number of ways. Similarly for Netflix. Similarly for a half dozen other sites out there – but it never happens. Instead, they stick to their proprietary information, keeping everything internal. While this may generate more income for them, it hardly represents any interest in interacting with the community, which is what the Semantic Web needs in order to accelerate adoption.

I’ve had relatively little feedback on the projects I have put together. Things like rdfgpg, redlandbot, etc. all get left in the dust of the work of larger groups of people, with more impressive results (and rightly so). Nothing I’m doing is particularly innovative or interesting, and it shows in the response from the community.

There is much more motivation behind things like microformats – something that’s close to RDF, but far enough away (and unlikely to see transformation to it) that it seems pointless. People are trying to create all these tools that take advantage of the small-s semantic web, but not taking the one extra step needed – via GRDDL, profiles, whatever. They think they’re writing the new version of the SemWeb, when in reality they’re just creating an incomplete imitation.

I suppose at some point, people will start to come around. The world of RDF is powerful. The world of HTML is not. Trying to create semantics out of a language that has none will not work in the long run. For right now, however, people are convinced it will, and that leaves most of the work I’ve done behind as people hop onto the next bandwagon.

I’m going to try and clean up some of the code I have, document it fully, and get licenses attached to it, so that people who want to use it or maintain it can take it up. This is especially true for Julie, which is kind of my pride and joy as far as code goes.

I typically move my interests in roughly six-month cycles, so I may eventually swing back towards semantic web development. For now, however, I’m going to do my best to wrap things up, and move on to something different, where I don’t have to fight every step of the way to get the things that I do acknowledged.

San Francisco Trip

Posted in Mobile Platform, Semantic Web, Social on June 9th, 2005 at 01:52:15

For those of you who are not yet aware, I will be in San Francisco this weekend, arriving Thursday night (late) and leaving early Sunday afternoon. I will be in meetings all day on Friday, but if anyone is interested in meeting up, let me know.

People I plan to see so far include, but are not necessarily limited to: Neil, twid, leora, miker and wombatmobile (possibly) from #mobitopia. I plan to visit tourist sites, as well as stop by The Mothership in Cupertino while I’m there. I want to ride the famous trolleys, I want to eat tacos in the Mission District, I want to visit Unicorn Precinct XIII (note to self: poke zool to fix sf.openguides).

What else should I be doing? Should I go to the DNA Lounge? Muir Woods Redwoods?

Advise me, dear reader, as to what you would do if you were in San Francisco for 36 hours with nothing else on your todo list! Tell me if you want to meet me, and talk about the next hack for the Semantic Web! Tell me if you want to meet me and berate me for not working on location-based cell phone computing! Tell me your thoughts on my work, tell me what you’d like to cook up next. Point me to the coolest things in and around downtown San Francisco, and come with me to see them.

The rest is up to you.

Library in RDF

Posted in Delicious Library, RDF, Semantic Web, XSLT on June 5th, 2005 at 21:19:20

A long time ago, when I first got a Mac, there was a lot of hubbub about a program called “Delicious Library”: an application that would let you scan in your books and provided an awesome user interface for searching, storing, lending, and everything else you might want to do with them. At the time, I wanted it, and I wanted it bad, but I decided to wait until I got an iSight: the idea of entering hundreds, perhaps up to a thousand, UPCs by hand did not strike me as one of my favored tasks.

On March 19th, I got an iSight: a birthday present from Jess. I thought then “ooh, Delicious Library”, but never got around to it.

This weekend, I was starting to pack up books from the bookshelves. I thought “Hey, I won’t have a clue where any of the books are… unless…”

Jess was out of the house. I downloaded and tried the program: I scanned a full shelf of books (after getting some decent light) and packed them up before I hit the 25-item limit and had to pay the piper. $40 for knowing where all of these books are after we move (as well as a new toy to play with) is well worth it.

I scanned another shelf (and ran out of boxes), then sat down to do the fun part.

First: xml2rdf – an XSLT stylesheet to convert from Delicious Library’s XML format to RDF. One of the biggest problems with this stylesheet is that it needs to know which image files are actually available from Delicious Library: this is where files.xml comes in, which is constructed using the following bash commands:

echo "<container>" > files.xml
for i in ~/Library/Application\ Support/Delicious\ Library/Images/Medium\ Covers/*; do
  # strip the directory path, keeping only the file name
  j=$(basename "$i")
  echo "<image size='medium' name='$j' />" >> files.xml
done
echo "</container>" >> files.xml

This is then used with XSLT’s document() function to load the list of available files, preventing inaccurate <foaf:depiction>s from being spat into the output: Amazon does not store cover images for some books, so until I implemented this fix, there were broken image references.

Next: convert.py – load the file as an RDF model, delete all the existing dc:description statements, and convert the descriptions using rtfreader from Brandon’s Program Archive.
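The post doesn’t include convert.py itself, so here is a minimal sketch of the deletion step, assuming the era’s Python 2 Redland bindings and a hypothetical books.rdf filename; the rtfreader conversion and the re-adding of cleaned descriptions are elided:

import os
import RDF

storage = RDF.Storage(storage_name="hashes", name="books",
                      options_string="new='yes',hash-type='memory'")
model = RDF.Model(storage)
RDF.Parser(name="rdfxml").parse_into_model(
    model, "file://" + os.path.abspath("books.rdf"))

# match every statement with a dc:description predicate...
pattern = RDF.Statement(
    None,
    RDF.Node(uri_string="http://purl.org/dc/elements/1.1/description"),
    None)
doomed = list(model.find_statements(pattern))
# ...and remove them; the real script would then re-add plain-text versions
for statement in doomed:
    model.remove_statement(statement)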

Next: Process through cwm for RDF pretty printing.

Next: rdf2html – taking the RDF output and converting it to HTML.

End result? Content negotiated version of the books I’ve scanned so far in the Books Library – RDF and HTML versions available.

This is some of my first major experience in XSLT, and I’ve found it to be pretty darn easy: far less difficult than I thought it was in the past. I think that I may go on an XSLT kick for the next couple weeks, so don’t be surprised if you see a lot more of my RDF looking a little bit prettier. For example, I already wrote an XSLT stylesheet for the FIF reviews I’ve received, so if you’re using a capable browser, that will be a lot nicer looking now than it used to be.

Feedback in Feeds

Posted in Semantic Web, Web Publishing on June 5th, 2005 at 08:41:00

Some of you regular subscribers may have noticed that I’m currently working on the Feedback In-Feed that I added a few days back: specifically, trying to make it less obtrusive. However, one of the things that I didn’t realize in all my efforts is that I almost always already *have* the user’s homepage information: if they’ve left a comment on the site, it’s stored in a cookie in their browser (which is probably what they’re going to be submitting the form through).

“But”, I hear the audience crying, “You can’t see their cookies! They’re posting the form from their aggregator!”

Ah, my feeble-minded friends, this is true, but this is unimportant. What matters is the user agent which hits the final form – which is stored (now) in the same directory as this blog, meaning it has access to all the cookies set by this blog. Including, as it happens, the name and URL of the person, assuming they’ve left a comment here before.
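As a minimal sketch of the idea, assuming a Python CGI handler sitting in the blog’s directory (WordPress’s comment cookies carry a per-site hash suffix, hence the prefix matching):

import os, Cookie

def commenter_info():
    # WordPress stores the commenter's details in cookies named
    # comment_author_{hash} and comment_author_url_{hash}; any script
    # served from the blog's own path receives them.
    jar = Cookie.SimpleCookie(os.environ.get("HTTP_COOKIE", ""))
    name = url = None
    for key in jar.keys():
        if key.startswith("comment_author_url_"):
            url = jar[key].value
        elif key.startswith("comment_author_email_"):
            continue  # not needed for the feedback form
        elif key.startswith("comment_author_"):
            name = jar[key].value
    return name, url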

Of course, this has some limitations: persons who have not commented here before, or whose cookies have expired, will not have a homepage set. (They expire after one year in WordPress, it seems.) Still, for the sake of saving space, I have gone the route of removing the homepage from the form, as well as pulling out some of the HTML I didn’t really like in the first place, and cleaning it up in general. I’ve been working on this over the past few days, in large part with help from jc, who was helping me figure out ways to make the controls smaller. He said that my feedback form was “Somehow even more annoying than Adsense in feeds” – something that I was loath to admit, but in the end had to agree with.

Since I know there are a large number of people who read this who never comment, there is also a built-in “more info” link, via which users can set their Name/URI. So, if you have any interest, feel free to use that link to set your name/URI, which will then be stored with the data.

Another thing that I’m doing, which I hadn’t written or talked about in public yet (despite Danny’s conviction otherwise), is capturing the referer information for the feedback, and exposing it via the RDF interface, attached to the review. This is a more generally useful property: a review for something can come via Amazon, a blog, or any of a number of other things. I’m simply using it to store the referer information, so I know whether someone came in via Planet Swhack, Planet RDF, Planet Mobile, LiveJournal, Bloglines, or any of a number of other web resources. Sometimes, of course, there is no referer information, in which case it was probably an aggregator, but I haven’t gotten far enough to analyze that yet. Unfortunately, User-Agent in most of these cases probably doesn’t make a bit of difference: the form is going to be posted via the browser, not the aggregator.

Danny advocates an extremely minimalist feedback mechanism, but I think that’s less likely to get people to submit feedback, especially once they realize he asks for more information afterwards. LiveJournal’s polls always get more feedback than comments, because they’re low impact in comparison. The same idea applies here – but something which is just a tickybox or a simple form press is not enough to give the user a sense of offering helpful feedback. I hope that I’m achieving (for the most part) low impact as well, by redirecting to the referer once the form is posted – but I want feedback that is low impact, informative, and useful. I think the recent design changes will help with that.

I do think that this is the right idea, and that implementation is the problem, rather than anything more basic in the idea. So, I’ll be working with the implementation to make it better as time goes on.

Google Sitemap Format

Posted in RDF, Semantic Web, XSLT on June 3rd, 2005 at 10:02:23

Josh points out Google’s Sitemap Protocol, via the SWIG Chump. I pull out my XSLT-foo (what little of it there is). I hack a bit back and forth, run into a problem which uche helps me figure out: “XPath does *not* use the default prefix in the stylesheet for purposes of matching”. I fix my XSLT up a bit, and create a new RDF source under my semweb section: Google Sitemap Tools, including an XSLT stylesheet, example output, and a conversion service which uses the XSLT: for example, Google’s Example File in RDF.

Now, to find some sitemaps in action in the real world, and add gzip decoding of gzipped sitemaps.
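A minimal sketch of what the conversion service might look like with that gzip decoding added – assuming the libxml2/libxslt Python bindings, and a placeholder sitemap2rdf.xsl filename for the stylesheet:

import gzip, StringIO, urllib2
import libxml2, libxslt

def sitemap_to_rdf(url, xsl_file="sitemap2rdf.xsl"):
    raw = urllib2.urlopen(url).read()
    # gzipped sitemaps start with the two-byte gzip magic number
    if raw[:2] == "\x1f\x8b":
        raw = gzip.GzipFile(fileobj=StringIO.StringIO(raw)).read()
    style = libxslt.parseStylesheetDoc(libxml2.parseFile(xsl_file))
    doc = libxml2.parseDoc(raw)
    result = style.applyStylesheet(doc, None)
    rdf = style.saveResultToString(result)
    # libxml2 documents are not garbage collected; free them by hand
    doc.freeDoc(); result.freeDoc(); style.freeStylesheet()
    return rdf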

Python/Redland Powered RDF Validator

Posted in PHP, Python, RDF, Redland RDF Application Framework, Semantic Web on June 2nd, 2005 at 20:02:24

After some thinking this morning, I converted the current PHP-based crschmidt.net templating system to a Cheetah Python template. This means that some more of my tools can move to being Python powered, rather than PHP powered.

“So what?”

Currently, the interface to Redland that I have available in PHP is significantly less capable than the Python one. It’s coded by yours truly, and it’s basically only designed for my use cases, so every time I want to use something new, I have to go and code it, or use a closer-to-native C-style interface translated into PHP. Neither of those is particularly enjoyable.

Python is a much more comfortable language for me to use. It is more intuitive for me. It feels more natural, not to mention the fact that I keep forgetting semicolons in my PHP code. It has an awesome binding for Redland, which is one of the things that I’ve been working with most over the past while.

In the past, all my scripts had been either 1. PHP, or 2. Python with no site theming. Hopefully the new Cheetah template will help me create some more tools in Python, which is the language I feel most comfortable in.

With that in mind, I’ve created a new crschmidt.net web service: an RDF Validator. A number of times, I have found that the official RDF validator will puke, but won’t give much of a reason why. This tool uses Redland, which has a tendency to return what I consider better error messages on bad RDF. It’s designed as a one-off example of the new templating system, and should not be considered indicative of most of the expected output of such scripts. Just a first attempt at getting myself into more code.
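The core of such a validator is just a parse attempt; a minimal sketch, assuming the Python 2 Redland bindings:

import RDF

def validate_rdfxml(data, base_uri="http://example.org/"):
    # try to parse a string of RDF/XML; return Redland's complaint on
    # failure, or a triple count on success (base_uri is a placeholder)
    storage = RDF.Storage(storage_name="hashes", name="validator",
                          options_string="new='yes',hash-type='memory'")
    model = RDF.Model(storage)
    parser = RDF.Parser(name="rdfxml")
    try:
        parser.parse_string_into_model(model, data, base_uri)
    except RDF.RedlandError, err:
        return "Parse error: %s" % err
    return "Valid: %d triples found" % model.size()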

In-Feed Feedback

Posted in Semantic Web, Social on June 1st, 2005 at 23:38:23

I’m playing second fiddle to Danny again right now, implementing his Reader Provided Blog Enhancements as a WordPress plugin. Currently I’m posting to a local MySQL table, from which I can pull the relevant information and create different views later.

This is a great example of some code that would be nice to do with XmlHttpRequest: rather than having the post go to a redirect (which only works if the user is sending referrer information right now; otherwise it just leads to a single page that says the submission was completed), it could all be done in the client, and the user would never have to leave.

However, there are a couple of problems with this.

1. RSS aggregators are not web browsers, and depending on the level of their HTML rendering implementation, they may not support Javascript at all, or may support it incompletely. I’m hoping that HTTP POST will actually do something useful for most of them, but even that is a guess.
2. Online aggregators such as LiveJournal oftentimes strip out Javascript to prevent malicious cookie stealing (and for good reason).

So, unfortunately, Javascript is out.

A couple of changes will be happening in the meantime while I work on this: RSS feeds will be limited to one or two posts, so that you don’t get change-flooded every time I turn the plugin on or off to test something, and you may see the review boxes appear or disappear.

Anyway, nothing much to see yet, but I will be doing RDF export of the annotations provided, so the data isn’t going to be lost, and I will be working to clean up the code and make it “just work” as a WordPress plugin, hopefully. They are surprisingly easy to write; I didn’t realize how simple some of the stuff was. Keep your eyes on the prize!

Oh, and Danny? Your RDF in that post is broken. Missing rdf:RDF, and one of your close tags is missing a /. Thought I’d let you know 😉

Javascript, RDF Searching

Posted in Javascript, PHP, SPARQL on May 31st, 2005 at 11:29:06

I’ve been doing some playing with goofy Javascript stuff lately to try to get my head wrapped around it, since I’m going to be needing to implement it in a few tools at work in the near future.

I’ve so far used it in:
1. An admin interface for Athena’s email accounts,
2. An inventory listing for a work project, and
3. The newest one: a “suggestion” field for Wordnet searches against the RDF store I just imported this morning.

Danny alerted me to the existence of a new Wordnet dataset. I grabbed the full set, dropped it into Redland, and set up a SPARQL search against it. The top box there is the nifty one, though: type in a string (say, “apple”) and watch the right side as a list of suggestions is populated.

I still need to get it actually doing a Google Suggest-like dropdown box, but haven’t had the time to hack WICK to do what I want as far as that goes.

I’m still learning, and as such, the code is sucky. I wouldn’t recommend reading it as an example: it’s a quick hack, but it works. Still many bugs to work out – for example, if you type “apple”, it still searches for “app”, “appl”, and “apple” along the way. But I’ll get there. (Okay, so I just did a few bug fixes that make it much better, and switched the search mechanism to use MySQL rather than an 11 MB PHP array. Much better now.)
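For the SPARQL side of this, a label lookup might look something like this minimal sketch, assuming the Python 2 Redland bindings and rdfs:label as the label property (the actual Wordnet dataset’s vocabulary may differ, and the prefix is interpolated without any escaping):

import RDF

def suggest(model, prefix, limit=10):
    # REGEX makes Redland scan every literal in the store, which gets
    # slow on big datasets -- fine for a sketch, bad for production
    q = RDF.Query("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE { ?word rdfs:label ?label .
        FILTER REGEX(?label, "^%s") }""" % prefix,
        query_language="sparql")
    matches = []
    for row in q.execute(model):
        matches.append(row["label"].literal_value["string"])
        if len(matches) >= limit:
            break
    return matches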

Anyway, I think it’s cool. RDF people can mark it down as “another SPARQL datastore”; Javascript people can mark it down as “another idiot trying to use XmlHttpRequest and doing it wrong.”

Lemme know if you’ve got suggestions!

Flickr Image Region Selection

Posted in Flickr, Image Description, SPARQL on May 26th, 2005 at 22:58:33

One of the things I’ve noticed with my Image Region stuff, which I posted about recently, is that it’s slow. I didn’t really think about why: at first, a lot of it seemed to have to do with the client-side XSL, or the CSS cropping of gigantic images.

However, I’m now realizing that this is using a regex with a pretty heavy query: The kind of query that I wouldn’t want anyone to run against julie, because it would just take too long.

The reason for this is Redland’s current REGEX implementation: it basically loads all the literals out of the store and runs the regex against them after it has them, which is obviously not ideal.

With that in mind, I tried to think of interesting queries which could be done without requiring a regex, and came up with the idea of Flickr image searches: show me a closeup of all the regions in a Flickr image of mine.

So, now there’s an additional search box on my SPARQL interface: Flickr ID/URI. It queries against the photo’s foaf:page, which is obviously much faster.
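A minimal sketch of that lookup, assuming the Python 2 Redland bindings (the namespaces are the same ones used in the query at the end of the next post):

import RDF

def regions_for(model, flickr_page):
    # an exact match on the photo's foaf:page URI, instead of
    # REGEXing every literal in the store
    q = RDF.Query("""
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX imgreg: <http://www.w3.org/2004/02/image-regions#>
SELECT ?img,?atitle,?coord
WHERE { ?img foaf:page <%s> ;
             imgreg:hasRegion ?r .
        ?r dc:title ?atitle ;
           imgreg:coords ?coord . }""" % flickr_page,
        query_language="sparql")
    return [(row["atitle"], row["coord"]) for row in q.execute(model)]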

Maybe I’ll expand this: let people put in any Flickr photo ID, and display the information using XSLT against an RDF datasource, with a link to the output of the datasource. I’ve got all the tools to do it now running in Python locally, so I don’t think it would be too difficult: I would need to get some error handling together, though. I really wish I could tie PHP and Python code on the web together more easily though…

Anyway, an example: Flickr Page to RDF generates Regions.

Try it out at The SPARQL search. As always, data and query are shown inside the source of the page, at the bottom.

More on Image Regions

Posted in Flickr, Geolocation, Image Description, RDF, SPARQL on May 23rd, 2005 at 18:43:40

My post last night was a bit cryptic, so let me walk through a bit more clearly what I’ve been doing, since I seem to have picked up the interest of some more people.

I am currently using Flickr to annotate my photos: primarily because I like their image region annotations, and partially because their API offers me a way to get back out all the data I’ve put in, which is useful to me. So, that’s what I’m using for photo annotation at the moment, which may change at any point.

Masahide has a flickr2rdf service: flickr2rdf takes a Flickr photo page URI and exports RDF from it. For example, a picture of myself, my ex-girlfriend, and Foghorn Leghorn can be seen, fully annotated, using XSLT+RDF, via the flickr2rdf tool.

Additionally, the original photos stored at Flickr (full size) have EXIF information: this information can be exported via Masahide’s equally cool exif2rdf tool: Foghorn Leghorn Example.

Once I have the photo_id of a photo, I can collect all these statements together. Additionally, since I am using tags from GeoBloggers for geolocation, I have a tool which parses out these tags (using the Flickr API) and creates Geo data for them.
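Pulling coordinates out of those tags is simple enough; here’s a minimal sketch, assuming GeoBloggers-style machine tags in their raw form:

def geo_from_tags(raw_tags):
    # GeoBloggers-style tags look like "geo:lat=42.3731" and
    # "geo:lon=-71.1097" (some photos use "geo:long=" instead)
    lat = lon = None
    for tag in raw_tags:
        if tag.startswith("geo:lat="):
            lat = float(tag.split("=", 1)[1])
        elif tag.startswith("geo:lon=") or tag.startswith("geo:long="):
            lon = float(tag.split("=", 1)[1])
    if lat is None or lon is None:
        return None
    return lat, lon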

I add a few tracking statements: specifically, seeAlso links to the final RDF/XSLT view of the image (again, Foghorn Leghorn example). I serialize the model out from Redland, and get a directory full of files full of RDF singletons. From here, I use cwm to process the singletons into an abbreviated RDF/XML file. These files are then synced to the http://crschmidt.net/albums/flickr/ directory. Here, I use a couple of little tricks to add an XSLT declaration at the top of each file, so that the content negotiated version offers XML delivered as application/xml, rather than just application/rdf+xml (which Firefox won’t display in a browser).
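One way to do that trick, as a minimal sketch (using the imgreg.xsl stylesheet mentioned below, and placing the declaration just after the XML declaration when one is present):

def add_stylesheet_pi(path, href="http://crschmidt.net/xslt/imgreg.xsl"):
    f = open(path)
    lines = f.readlines()
    f.close()
    pi = '<?xml-stylesheet type="text/xsl" href="%s"?>\n' % href
    # the processing instruction must follow any <?xml ...?> declaration
    if lines and lines[0].startswith("<?xml "):
        lines.insert(1, pi)
    else:
        lines.insert(0, pi)
    f = open(path, "w")
    f.writelines(lines)
    f.close()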

Next step is to add each of these files into an RDF model. Since I’m still occasionally changing statements, I’ve been dropping the whole model and re-adding everything each time: this doesn’t take too long, as it’s only a few hundred files, and Redland is speedy quick.

So, now we have a database full of RDF statements. Fine. But that’s not too useful. So, I have my SPARQL query interface, which is all well and good for people who have lots of knowledge of RDF. It can provide some cool results.

But it doesn’t really do anything *fun*. So last night, I added an optional checkbox that said “If you have something in a specific query format, process an XSLT file against it”. I tweaked this XSLT from Masahide’s example, linked yesterday, into what it is now, which you can see, if you’re interested.

Well, that’s all well and good, but most people don’t understand SPARQL well enough to know what they should type in. What’s the use of having to learn a language just to see some pictures? So, my next step was to add a search box specifically for regions: my SPARQL page now has a box for this purpose.

I realized after a couple of times, though, that using client-side XSLT to process the XML was really slow, clunky, and generally ugly. Not to mention that Mozilla’s XSLT doesn’t let me use disable-output-escaping on variables: so, I installed php4-xslt, and started using that implementation on the server side.

Yeah, that’s all well and good too, but now my pretty RDF with queries and all went away! So, I added them back: at the end of the Foghorn Search, in a comment, you’ll see:

Generated using the XSLT stylesheet at http://crschmidt.net/xslt/imgreg.xsl against the data generated by the query:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX imgreg: <http://www.w3.org/2004/02/image-regions#>
SELECT ?img,?title,?page,?desc,?atitle,?coord
WHERE {
?img
dc:title ?title;
foaf:page ?page;
dc:description ?desc;
imgreg:hasRegion ?r.
?r
dc:title ?atitle;
imgreg:coords ?coord.
FILTER REGEX(?atitle, "Foghorn") }

Data was:

followed by the XML version of the SPARQL query results.

Another interesting example: Schmidt – myself, family members, and others.

Anyway, being a bit more informative seemed appropriate given the situation. So there’s my implementation toy of the day.