Archive for May, 2005

Javascript, RDF Searching

Posted in Javascript, PHP, SPARQL on May 31st, 2005 at 11:29:06

I’ve been doing some playing with goofy Javascript stuff lately to try to get my head wrapped around it, since I’m going to be needing to implement it in a few tools at work in the near future.

I’ve so far used it in
1. An admin interface for Athena’s email accounts,
2. An inventory listing for a work project
3. The newest one, a “suggestion” field for Wordnet searches against the RDF store I just imported this morning.

Danny alerted me to the existence of a new Wordnet dataset. I grabbed the full set, dropped it into Redland, and set up a sparql search against it. The top box there is the nifty one though: type in a string (say, apple) and watch the right side as a list of suggestions is populated.

I still need to get it actually doing a Google Suggest-like dropdown box, but haven’t had the time to hack WICK to do what I want as far as that goes.

I’m still learning, and as such, the code is sucky. I wouldn’t recommend reading it for an example: it’s a quick hack, but it works. Still many bugs to work out - for example, if you type apple, it still searches for app, appl, apple in the process. But I’ll get there. (Okay, so I just did a few bug fixes that make it much better, and switched the search mechanism to use MySQL rather than an 11 Meg PHP array. Much better now.)

Anyway, I think it’s cool. RDF people can mark it down in the “another SPARQL datastore”, Javascript people can mark it down as “Another idiot trying to use XmlHttpRequest and doing it wrong.”

Lemme know if you’ve got suggestions!

Hosting Offer

Posted in Web Hosting on May 28th, 2005 at 15:25:25

I’ve offered this to a number of small groups of people, but after working out most of the bugs that I’ve found, I’m ready to offer it to the larger audience that is the readers of this fine publication:

I am currently the maintainer of a dedicated server, hosted by SagoNet, and the machine is set up to offer virtual hosted domains to any and all who need them. This hosting package includes the following:

  • Web Hosting on Apache 2: All websites are hosted on Apache2 with PHP4 with most common modules installed. If you need a custom module installed, this can typically be worked out as well. Disk space is not limited.
  • FTP access: FTP access directly to web directory, giving full access to access and error logs, as well the ability to create your own directories for storage
  • MySQL access: Each account offers a MySQL database for you to use as you please.
  • Subversion Repository (by request): Revision Control system hosted alongside your website
  • Shell Access (optional, free of charge): If you want shell access, you’ve got it. (Please let me know before I set up the account).
  • SMTP (using SASL authentication), IMAP, POP3, Webmail Access: Send/Receive mail from anywhere: on the road, at home, or anything else. Mail is provided using courier-imap/pop packages with Postfix, and you can have as many addresses as you want.
  • Up to 100GB/month bandwidth: this offers basically limitless bandwidth for anything you need. (I doubt anyone would need this much bandwidth in a month: if you feel that bandwidth is a limiting factor for you, let me know and we will work something out.)
  • Free installation of any packages in Debian Sarge: If there’s something you need, I’ll install it for you.

Stay away from the CPanel hassle. Get yourself on a real machine today, and don’t suffer the limitations of other web hosts! Get full shell access to your data. Gain the freedom of having access to the machine which hosts your files. Or, ya know. Just save me some cash ;)

Cost for serving depends on setup neccesary, but will not exceed $10 per domain. File a Request and I’ll get you a quote. If someone else is offering the same thing cheaper, I’ll match it.

If you are working on a Semantic Web related project, you may be eligible for reduced price or free hosting. Indicate this in your request.

All quotes will be answered in 24 hours or less.

Flickr Image Region Selection

Posted in Flickr, Image Description, SPARQL on May 26th, 2005 at 22:58:33

One of the things I’ve noticed with my Image Region stuff, which I posted about recently, is that it’s slow. I didn’t really think about why: at first, a lot of it seemed to have to do with the client side XSL, or the CSS cropping of gigantic images.

However, I’m now realizing that this is using a regex with a pretty heavy query: The kind of query that I wouldn’t want anyone to run against julie, because it would just take too long.

The reason for this is Redland’s current REGEX implementation: It basically loads all the literals out of the store and does a regex against them after it has them, which is obviously not ideal.

With that in mind, I tried to think of interesting queries which could be done without requiring a regex, and came up with the idea of flickr images searches: show me a closeup of all the regions in a flickr image of mine.

So, now there’s an additional search box on my SPARQL interface: Flickr ID/URI. It then uses the foaf:page part of the photo to query against, which is obviously much faster.

Maybe I’ll expand this: let people put in any flickr photo ID, and display the information using XSLT against an RDF datasource, with a link to the output of the datasource. I’ve got all the tools to do it now running in Python locally, so I don’t think it would be too difficult: I would need to get some error parsing together though. I really wish I could tie PHP / Python code on the web together more easily though…

Anyway, an example: Flickr Page to RDF generates Regions.

Try it out at The SPARQL search. As always, data and query are shown inside the source of the page, at the bottom.

More on Image Regions

Posted in Flickr, Geolocation, Image Description, RDF, SPARQL on May 23rd, 2005 at 18:43:40

My post last night was a bit cryptic, so let me walk through a bit more clearly what I’ve been doing, since I seem to have picked up the interest of some more people.

I currently am using Flickr to annotate my photos: primarily because I like their image region annotations, and partially because their API offers me a way to get lots of data out that I’ve put in, which is useful to me. So, that’s what I’m using for photo annotation at the moment, which may change at any point.

Masahide has a flickr2rdf service: flickr2rdf takes a Flickr Photo page URI and exports RDF from it: For example, a picture of myself, my ex girlfriend, and Foghorn Leghorn can be seen, fully annotated, using XSLT+RDF, via the flickr2rdf tool.

Additionally, the original photos stored at flickr (full size) have EXIF information: this information can be exported via Masahide’s equally cool exif2rdf tool: Foghorn Leghorn Example.

Once I have the photo_id of a photo, I can collect all these statements together. Additionally, since I am using tags from GeoBloggers for geolocation, I have a tool which parses out these tags (using the Flickr API) and creates Geo data for them.

I add a few tracking statements: specifically, links to seeAlso the final RDF/XSLT view of the image, (again, Foghorn Leghorn example). I serialize the Model out from Redland, and get a directory full of files full of RDF singletons. From here, I use cwm to process the singletons into an abbreviated RDF/XML file. These files are then synced to the http://crschmidt.net/albums/flickr/ directory. Here, I use a couple little tricks to add an XSLT declaration as the first line of each file, so that the content negotiated version offers XML delivered as application/xml, rather than just application/rdf+xml (which Firefox won’t display in a browser).

Next step is to add each of these files into an RDF model. Since I’m still occasionally changing statements, I’ve been dropping the whole model and readding every time: this doesn’t take too long, as it’s only a few hundred files, and Redland is speedy quick.

So, now we have a database full of RDF statements. Fine. But that’s not too useful. So, I have my SPARQL query interface. Which is all well and good, for people who have lots of knoweldge of RDF. It can provide some cool results.

But it doesn’t really do anything *fun*. So last night, I added an optional checkbox, that said “If you ahve something in a specific query format, process an XSLT file against it”. I tweaked this XSLT from masahide’s example, linked yesterday, into what it is now, which you can see, if you’re interested.

Well, that’s all well and good, but most people don’t understand SPARQL enough to know what they should type in. What’s the use of having to learn a language just to see some pictures? So, my next step was to add a search box specifically for Regions: my sparql page has a search box now specifically for this purpose.

I realized after a couple times, though, that using client side XSLT to process the XML was really slow, clunky, and generally ugly. Not to mention that Mozilla’s XSLT doesn’t let me disable-output-escaping on variables: so, I installed php4-xslt, and started using that implementation on the server side.

Yeah, that’s all well and good too, but now my pretty RDF with queries and all went away! So, I added them back: at the end of the Foghorn Search, in a comment, you’ll see:

Generated using the XSLT stylesheet at http://crschmidt.net/xslt/imgreg.xsl against the data generated by the query:

PREFIX dc: <http ://purl.org/dc/elements/1.1/>
PREFIX foaf: <http ://xmlns.com/foaf/0.1/>
PREFIX imgreg: <http ://www.w3.org/2004/02/image-regions#>
SELECT ?img,?title,?page,?desc,?atitle,?coord
WHERE {
?img
dc:title ?title;
foaf:page ?page;
dc:description ?desc;
imgreg:hasRegion ?r.
?r
dc:title ?atitle;
imgreg:coords ?coord.
FILTER REGEX(?atitle ,”Foghorn”) }

Data was:

followed by the XML version of the SPARQL query results.

Another interesting example: Schmidt - myself, family members, and others.

Anyway, being a bit more informative seemed appropriate given the situation. So there’s my implementation toy of the day.

PersonalProfileDocument Parsing

Posted in FOAF, Python on May 23rd, 2005 at 18:14:44

Earlier today, on the OpenID mailing list, I was asked to supply Perl code to look for PPDs in FOAF docs and return some basic props on the user who owned the FOAF file. My Perl skills have long since fallen by the wayside, but I was able to put together something in Python which seems to me to work pretty good.

ppd.py is a FOAF parser using xml.dom.minidom to look for a PPD, and parse out a couple basic forms of the Personal Profile Document, for cases in which you can’t bring a full RDF parser to bear on the situation. (I know that the question of when this arises has been argued a million times, but an RDF parser is an extra dependency that some projects simply have no interest in bringing on.)

This parses two basic forms of PPD: one in which the foaf:maker is identified by an rdf:nodeID=”nodename”, or one in which the foaf:maker is identified as an rdf:resource=”#nodename” coupled with a rdf:ID=”nodename”.

This hasn’t been fully tested: it was mostly done as a quick proof of concept that people could expand on. I’ve tested it on the nodeID case, and tested that if it can’t find an appropriate PPD, it falls back (against LiveJournal files). I’m not sure how python-esque my code is, but it does seem to work, which was my primary concern.

As usual, this code is designed to be used at the command line as “python ppd.py http://crschmidt.net/foaf.rdf”, or imported as a module, after which you can run ppd.get_person(”http://crschmidt.net/foaf.rdf”).

Thoughts on the method? Will this work with a sufficiently constrained FOAF doc?

XSLT + Image Regions + Sparql

Posted in Flickr, Image Description, RDF, SPARQL, XSLT on May 22nd, 2005 at 20:05:23

Read Masahide’s notes on XSLT+Image Regions. Used some tools to convert my flickr photos to RDF.

Converted an XSLT Stylesheet to a different result format. Loaded ~400 RDF files into a Model, totalling 33,000 statements. Added an option to my Sparql Interface. Changed the default query. Made the extra option add the stylesheet.

Ran a query. Tweaked until it worked. Typed it all up here, to share with all of you.

Hooray for masahide, flickr, and all kinds of other wonderful things.

Wedged Subversion Repository

Posted in Subversion on May 20th, 2005 at 22:11:57

Earlier this morning, one of my projects subversion repositories got wedged. After figuring out that it was actually wedged (no GET response, no PROPFIND/timeout requiring a kill -9 on svn and svnadmin commands), I started playing with svnadmin. Still didn’t work. Hopped into #svn. Asked, was pointed to FAQ.

Copied current repository to another location before attempting anything else, since I’ve fucked up a BDB based subversion repository attempting to repair it before.

Attempted svnadmin repair /var/www/svn/rdfpython: failed with lots and lots of “PANIC” type errors.

Attempted svnadmin repair ~/newcopyofrepos: That seemed to work. An svnadmin verify ~/newcopyofrepos confirmed that it had.

Made another backup of the repos, removed it, copied the new ~/newcopyofrepos into place.

And the world was good again. A verify/checkout process both verified that the files were all in place, and trac started to work again, and all was well, good and happy.

However, I think that from now on, I may use the fsfs storage method rather than BDB, as this is certainly not the first time this has happened to me or anyone else, and I really think I’m just starting to not trust BDB for mission critical tasks, which is what I consider subversion. My version control is one of the few things that I don’t have completely backed up most of the time: files I can copy around easily, but databases of changes to files typically stay in one place. I could recreate the structure, but I couldn’t really recreate the history, and that’s important to me.

If anyone has any experience with fsfs SVN repositories over BDB based ones, I’d be glad to hear it.

Lynx View

Posted in Technology, Web Publishing on May 20th, 2005 at 19:20:58

A new crschmidt.net webservice:

lynxview, converting based on a domain name to a lynx -dump form. For when you want to show some windows user just how crap their website is with all the graphics turned off.

As a form of demonstration, check out crschmidt.net or Planet Mobile.

Currently, sites are cached eternally, so that the service can’t be used to DDoS some poor site.

Produced in part by a request from DanC on #swig earlier today.

Redland Updates

Posted in RDF, Redland RDF Application Framework, SPARQL, julie on May 19th, 2005 at 23:58:45

Dave released a new Raptor and new Rasqal today. I’ve built both, and rebuilt my Python bindings so I no longer get segfaults (Almost thought it was a bug, then Dave reminded me of previous “bugs” which were my fault).

As a result, all of my tools on both zeus and athena are now running the latest and greatest in the way of SPARQL, meaning the new query syntax (and I believe, new XML output syntax). I still need to update the examples on my PHP pages, but julie’s code is all up to date.

While I was at it, I took the oppourtunity to do some cleanups that I’ve been wanting to do for a while: You can see the revisions on the rdfpython trunk in trac’s timeline, but here’s a summary:

* Did some rewriting on mortenf’s smusher. I now get owl:sameAs triples in the store, so I have a reversible process to some extent for smushing, as well as making the smusher look for the shortest URI rather than just grabbing the first node it sees as “canonical”. Of course, I did this after a lot of URIs got tossed in my last smushing run… ah well, live and learn.
* Moved more code to use the “parse_anything” function that I wrote, which uses heuristics and logic to try and guess what kind of content we’re dealing with. It depends a lot on Content Types, but is also something I can edit and reload without restarting the bot, which is a major boon for me. This means that if something is broken, I can fix it, and make it more robust, without any kind of guilty concious about flooding channels with joins/parts/quits.
* GRDDL support (with newest raptor) in parse_anything. Since ^add is really now parse_anything, this means that if you add a page with a GRDDL description Redland supports, you’ll get the triples out of it.
* Heuristics of queries, guessing which is which. (Really ‘dumb’ right now: it just looks for ” {”, and considers it Sparql if it has it.)

What does this mean to you, dear user?

Well, quite simply, it means that you will probably support more formats (RSS, SVG, HTML+GRDDL, Atom, Turtle, ntriples) with less work (it’s all done through ^add). You can run queries in either the old format (RDQL) or new format (Sparql), or store either one.

I’d say that’s a benefit.

Thanks to Dave for getting new Redland stuff out the door.

The Origins of the Internet

Posted in Reading, default on May 14th, 2005 at 23:59:49

I’ve got a number of books on my “To read” list, most which were given to me as part of my birthday or as random gifts. One of them which was gifted to me by Tom Croucher from my Amazon wishlist (as a thank you for help in his dissertation, I believe). The book is Where Wizards Stay Up Late: The Origins of the Internet, and it’s one of my favorite non-fiction computer-related books.

The book is a relatively detailed study of how the internet came to be: the development of the theories behind it, the actual hardware, the proposals from the Defense Department for the creation. The origin of RFCs, the way TCP/IP was invented, things like AlohaNet and ethernet as well.

It’s because of this book that I still carry around Vinton Cerf’s business card, which I obtained at a 10th anniversary discussion of Mosaic. Cerf was there discussing the idea of an interplanetary internet and how it would work. I was there drooling at the fact that I was in the same room as Vinton Cerf, and actually got up on stage afterwards and shook his hand. I keep that thing well - it used to be in my wallet, now it stays seperate - because it means that much to me, and it was inspired by this book.

So, a big thank you to Tom for the book, and a suggestion that you all go out and read it.