Python/Redland Powered RDF Validator

Posted in PHP, Python, RDF, Redland RDF Application Framework, Semantic Web on June 2nd, 2005 at 20:02:24

After some thinking this morning, I converted the current PHP-based crschmidt.net templating system to a Cheetah Python template. This means that some more of my tools can move to being Python powered, rather than PHP powered.

“So what?”

Currently, the interface to Redland that I have available in PHP is significantly weaker than the Python one. It’s coded by yours truly, and it’s basically only designed for my use cases, so every time I want to use something new, I have to go and code it, or use a closer-to-native C-style interface translated into PHP. Neither of those is particularly enjoyable.

Python is a much more comfortable language for me to use. It is more intuitive for me. It feels more natural, not to mention the fact that I keep forgetting semicolons in my PHP code. It has an awesome binding for Redland, which is one of the things that I’ve been working with most over the past while.

In the past, all my scripts had been either 1. PHP or 2. Python with no site theming. Hopefully the new Cheetah template will help me create more tools in Python, which is the language I feel most comfortable in.

With that in mind, I’ve created a new crschmidt.net web service: an RDF Validator. A number of times, I have found that the official RDF validator will puke, but won’t give much of a reason why. This tool uses Redland, which has a tendency to return what I consider better error messages on worse RDF. It’s designed as a one-off example of the new templating system, and should not be considered indicative of the expected output of most such scripts. Just a first attempt at getting myself into more code.
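The validator itself relies on Redland's parser for its messages. As a rough, standard-library-only illustration of the kind of pre-check that yields a readable error (this is not the service's code, and it checks no RDF semantics), the sketch below uses Python's expat bindings to report where a document stops being well-formed XML, and flags a document whose root is not rdf:RDF. The function name `precheck_rdf_xml` is hypothetical. RDF/XML allows omitting rdf:RDF when there is a single top-level node element, so that case is only a warning.

```python
import xml.parsers.expat

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def precheck_rdf_xml(text):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    root = []

    def start(name, attrs):
        # With namespace_separator=" ", name is "namespace-URI localname".
        if not root:
            root.append(name)

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        problems.append("not well-formed XML: %s (line %d, column %d)"
                        % (xml.parsers.expat.ErrorString(err.code),
                           err.lineno, err.offset))
        return problems
    if root and root[0] != RDF_NS + " RDF":
        problems.append("warning: root element is not rdf:RDF (only allowed "
                        "when there is a single top-level node element)")
    return problems
```

A closing tag that is missing its slash, for example, shows up as a "mismatched tag" error with the line and column where expat gave up.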

In-Feed Feedback

Posted in Semantic Web, Social on June 1st, 2005 at 23:38:23

I’m playing second fiddle to Danny again right now, implementing his Reader Provided Blog Enhancements as a WordPress plugin. Currently I’m posting to a local MySQL table, from which I can pull the relevant information and create different views later.

This is a great example of some code that would be nice to do with XmlHttpRequest: rather than having the post go to a redirect (which only works if the user has referrers turned on; otherwise it just brings them to a single page saying the submission was completed), it could all be done in the client, and the user would never have to leave.
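The non-Javascript flow described here (POST the form, store the feedback, redirect back via the referrer, or show a plain confirmation page when there is none) could look roughly like the sketch below. Assumptions: Python's sqlite3 stands in for the MySQL table, the field names `post_id`, `rating` and `comment` are hypothetical, and the handler returns `(status, headers, body)` rather than targeting any particular framework. Redirects only go to referrers on the blog's own site, so the form can't be used as an open redirect.

```python
import sqlite3  # stand-in for MySQL in this sketch

def handle_feedback(db, form, referer=None, own_site="http://crschmidt.net/"):
    """Store one piece of reader feedback, then send the reader back.

    Returns (status, headers, body) so it can sit behind any web framework.
    """
    db.execute("CREATE TABLE IF NOT EXISTS feedback "
               "(post_id TEXT, rating INTEGER, comment TEXT)")
    # Parameterised INSERT: reader-supplied values never become SQL text.
    db.execute("INSERT INTO feedback (post_id, rating, comment) VALUES (?, ?, ?)",
               (form["post_id"], int(form.get("rating", 0)),
                form.get("comment", "")))
    db.commit()
    if referer and referer.startswith(own_site):
        # 303 See Other: the client follows up with a GET of the page it came from.
        return "303 See Other", [("Location", referer)], ""
    # No usable Referer (e.g. the reader turned referrers off): plain confirmation.
    return ("200 OK", [("Content-Type", "text/plain; charset=utf-8")],
            "Thanks, your feedback was recorded.")
```

For example, `handle_feedback(sqlite3.connect("feedback.db"), form, environ.get("HTTP_REFERER"))` inside a WSGI app; the connection string and WSGI wiring are illustrative only.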

However, there are a couple of problems with this.

1. RSS aggregators are not web browsers, and depending on how they render HTML content, they may support Javascript only partially or not at all. I’m hoping that an HTTP POST will actually do something useful in most of them, but even that is a guess.
2. Online aggregators such as LiveJournal often strip out Javascript to prevent malicious cookie stealing (and for good reason).

So, unfortunately, Javascript is out.

A couple of changes will be happening in the meantime while I work on this: RSS feeds will be limited to 1 or 2 posts, so that you don’t get change-flooded every time I turn the plugin on or off to test something, and you may see the review boxes appear or disappear.

Anyway, there’s nothing much to see yet, but I will be doing RDF export of the annotations provided, so the data isn’t going to be lost, and I will be working to clean up the code and make it “just work” as a WordPress plugin, hopefully. Plugins are surprisingly easy to write; I didn’t realize how simple some of the stuff was. Keep your eyes on the prize!

Oh, and Danny? Your RDF in that post is broken. Missing rdf:RDF, and one of your close tags is missing a /. Thought I’d let you know 😉

Javascript, RDF Searching

Posted in Javascript, PHP, SPARQL on May 31st, 2005 at 11:29:06

I’ve been doing some playing with goofy Javascript stuff lately to try to get my head wrapped around it, since I’m going to be needing to implement it in a few tools at work in the near future.

I’ve so far used it in:
1. An admin interface for Athena’s email accounts
2. An inventory listing for a work project
3. The newest one: a “suggestion” field for Wordnet searches against the RDF store I just imported this morning.

Danny alerted me to the existence of a new Wordnet dataset. I grabbed the full set, dropped it into Redland, and set up a SPARQL search against it. The top box there is the nifty one, though: type in a string (say, apple) and watch the right side as a list of suggestions is populated.

I still need to get it actually doing a Google Suggest-like dropdown box, but haven’t had the time to hack WICK to do what I want as far as that goes.

I’m still learning, and as such, the code is sucky. I wouldn’t recommend reading it for an example: it’s a quick hack, but it works. Still many bugs to work out – for example, if you type apple, it still searches for app, appl, apple in the process. But I’ll get there. (Okay, so I just did a few bug fixes that make it much better, and switched the search mechanism to use MySQL rather than an 11 Meg PHP array. Much better now.)
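The improved suggestion field looks words up by prefix in MySQL. A minimal sketch of that lookup follows, assuming a one-column table `words(word)` (hypothetical) and using Python's built-in sqlite3 in place of MySQL so it runs anywhere. It escapes LIKE wildcards so a typed `%` or `_` matches literally, and skips very short prefixes (`min_length` is an assumed tuning knob) to avoid broad queries on the first keystrokes. The intermediate app/appl/apple requests are better avoided on the client side, by waiting for a pause in typing before sending a request.

```python
import sqlite3  # stand-in for MySQL in this sketch

def suggest(db, prefix, limit=10, min_length=3):
    """Return up to `limit` words that start with `prefix`, alphabetically."""
    if len(prefix) < min_length:
        return []  # too short to be a useful suggestion query
    # Escape LIKE wildcards (and the escape character itself) so they match literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = db.execute(
        "SELECT word FROM words WHERE word LIKE ? ESCAPE '\\' ORDER BY word LIMIT ?",
        (escaped + "%", limit),
    )
    return [word for (word,) in rows]
```

A prefix query like this can use an index on `word`, unlike scanning a large in-memory array on every keystroke.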

Anyway, I think it’s cool. RDF people can file it under “another SPARQL datastore”; Javascript people can file it under “another idiot trying to use XmlHttpRequest and doing it wrong.”

Lemme know if you’ve got suggestions!

Hosting Offer

Posted in Web Hosting on May 28th, 2005 at 15:25:25

I’ve offered this to a number of small groups of people, but after working out most of the bugs that I’ve found, I’m ready to offer it to the larger audience that is the readers of this fine publication:

While I have seen many people preferring both Godaddy and Hostgator for their excellent prices and service, which web host a business prefers really depends on that business.

I am currently the maintainer of a dedicated server, hosted by SagoNet, and the machine is set up to offer virtual hosted domains to any and all who need them. This hosting package includes the following:

  • Web Hosting on Apache 2: All websites are hosted on Apache2 with PHP4 with most common modules installed. If you need a custom module installed, this can typically be worked out as well. Disk space is not limited.
  • FTP access: FTP access directly to web directory, giving full access to access and error logs, as well as the ability to create your own directories for storage
  • MySQL access: Each account offers a MySQL database for you to use as you please.
  • Subversion Repository (by request): Revision Control system hosted alongside your website
  • Shell Access (optional, free of charge): If you want shell access, you’ve got it. (Please let me know before I set up the account).
  • SMTP (using SASL authentication), IMAP, POP3, Webmail Access: Send/Receive mail from anywhere: on the road, at home, or anything else. Mail is provided using courier-imap/pop packages with Postfix, and you can have as many addresses as you want.
  • Up to 100GB/month bandwidth: this offers basically limitless bandwidth for anything you need. (I doubt anyone would need this much bandwidth in a month: if you feel that bandwidth is a limiting factor for you, let me know and we will work something out.)
  • Free installation of any packages in Debian Sarge: If there’s something you need, I’ll install it for you.

Stay away from the CPanel hassle. Get yourself on a real machine today, and don’t suffer the limitations of other web hosts! Get full shell access to your data. Gain the freedom of having access to the machine which hosts your files. Or, ya know. Just save me some cash 😉 View different shared web hosting plans and select the one that meets your demands.

Cost for serving depends on the setup necessary, but will not exceed $10 per domain. File a Request and I’ll get you a quote. If someone else is offering the same thing cheaper, I’ll match it.

If you are working on a Semantic Web related project, you may be eligible for reduced price or free hosting. Indicate this in your request.

All quotes will be answered in 24 hours or less.

Flickr Image Region Selection

Posted in Flickr, Image Description, SPARQL on May 26th, 2005 at 22:58:33

One of the things I’ve noticed with my Image Region stuff, which I posted about recently, is that it’s slow. I didn’t really think about why: at first, a lot of it seemed to have to do with the client side XSL, or the CSS cropping of gigantic images.

However, I’m now realizing that this is using a regex with a pretty heavy query: The kind of query that I wouldn’t want anyone to run against julie, because it would just take too long.

The reason for this is Redland’s current REGEX implementation: It basically loads all the literals out of the store and does a regex against them after it has them, which is obviously not ideal.

With that in mind, I tried to think of interesting queries which could be done without requiring a regex, and came up with the idea of Flickr image searches: show me a closeup of all the regions in a Flickr image of mine.

So, now there’s an additional search box on my SPARQL interface: Flickr ID/URI. It then uses the foaf:page part of the photo to query against, which is obviously much faster.
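Binding `foaf:page` to a known URI lets the store look up matching statements directly instead of pulling every literal out for a regex. A minimal sketch of building such a query follows; it reuses the dc, foaf and imgreg prefixes from the Region search, while the function name `regions_for_page` and the selected variables are hypothetical. Because the page URI is pasted into the query text, the sketch rejects characters that are not allowed inside a SPARQL `<IRI>`, so user input can't break out of the IRI and alter the query.

```python
import re

# Characters SPARQL forbids inside <...> IRI references, plus whitespace
# (other control characters are not checked in this sketch).
_BAD_IRI_CHARS = re.compile(r'[<>"{}|^`\\\s]')

QUERY_TEMPLATE = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX imgreg: <http://www.w3.org/2004/02/image-regions#>
SELECT ?img ?title ?atitle ?coord
WHERE {
  ?img foaf:page <%s> ;
       dc:title ?title ;
       imgreg:hasRegion ?r .
  ?r dc:title ?atitle ;
     imgreg:coords ?coord .
}"""

def regions_for_page(page_uri):
    """Build a regex-free query for all regions of the photo at page_uri."""
    if _BAD_IRI_CHARS.search(page_uri):
        raise ValueError("not usable as a SPARQL IRI: %r" % page_uri)
    return QUERY_TEMPLATE % page_uri
```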

Maybe I’ll expand this: let people put in any Flickr photo ID, and display the information using XSLT against an RDF datasource, with a link to the output of the datasource. I’ve got all the tools to do it now running in Python locally, so I don’t think it would be too difficult, though I would need to get some error parsing together. I really wish I could tie PHP and Python code on the web together more easily…

Anyway, an example: Flickr Page to RDF generates Regions.

Try it out at the SPARQL search. As always, data and query are shown inside the source of the page, at the bottom.

More on Image Regions

Posted in Flickr, Geolocation, Image Description, RDF, SPARQL on May 23rd, 2005 at 18:43:40

My post last night was a bit cryptic, so let me walk through a bit more clearly what I’ve been doing, since I seem to have picked up the interest of some more people.

I currently am using Flickr to annotate my photos: primarily because I like their image region annotations, and partially because their API offers me a way to get lots of data out that I’ve put in, which is useful to me. So, that’s what I’m using for photo annotation at the moment, which may change at any point.

Masahide has a flickr2rdf service: flickr2rdf takes a Flickr Photo page URI and exports RDF from it: For example, a picture of myself, my ex girlfriend, and Foghorn Leghorn can be seen, fully annotated, using XSLT+RDF, via the flickr2rdf tool.

Additionally, the original photos stored at flickr (full size) have EXIF information: this information can be exported via Masahide’s equally cool exif2rdf tool: Foghorn Leghorn Example.

Once I have the photo_id of a photo, I can collect all these statements together. Additionally, since I am using tags from GeoBloggers for geolocation, I have a tool which parses out these tags (using the Flickr API) and creates Geo data for them.
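A rough sketch of the tag-to-geo step follows. It assumes the GeoBloggers convention of tags written as `geo:lat=NN.NNN` and `geo:lon=NN.NNN`; check that against the tags Flickr actually returns. The output attaches `geo:lat` and `geo:long` from the W3C WGS84 vocabulary (which spells longitude `long`) directly to the photo resource, which is one common but loose modeling choice. Both function names are hypothetical.

```python
from xml.sax.saxutils import quoteattr

GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"

def parse_geotags(tags):
    """Return (lat, lon) from 'geo:lat=..' / 'geo:lon=..' tags, or None."""
    found = {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        if sep and key.lower() in ("geo:lat", "geo:lon"):
            try:
                found[key.lower()] = float(value)
            except ValueError:
                continue  # malformed number: ignore this tag
    if "geo:lat" in found and "geo:lon" in found:
        return found["geo:lat"], found["geo:lon"]
    return None

def geo_rdf(photo_uri, lat, lon):
    """RDF/XML attaching geo:lat / geo:long to the photo resource."""
    return ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
            'xmlns:geo="%s">\n'
            '  <rdf:Description rdf:about=%s>\n'
            '    <geo:lat>%s</geo:lat>\n'
            '    <geo:long>%s</geo:long>\n'
            '  </rdf:Description>\n'
            '</rdf:RDF>\n') % (GEO_NS, quoteattr(photo_uri), lat, lon)
```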

I add a few tracking statements: specifically, seeAlso links to the final RDF/XSLT view of the image (again, Foghorn Leghorn example). I serialize the Model out from Redland, and get a directory full of files full of RDF singletons. From here, I use cwm to process the singletons into an abbreviated RDF/XML file. These files are then synced to the http://crschmidt.net/albums/flickr/ directory. Here, I use a couple of little tricks to add an XSLT declaration as the first line of each file, so that the content-negotiated version offers XML delivered as application/xml, rather than just application/rdf+xml (which Firefox won’t display in a browser).
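One detail worth handling when adding the stylesheet line: if a file starts with an XML declaration (`<?xml version="1.0"?>`), that declaration must stay first, so the `xml-stylesheet` processing instruction has to go right after it. A minimal sketch, with a hypothetical function name and a placeholder `href`, which is also safe to re-run on every sync:

```python
import re
from xml.sax.saxutils import quoteattr

# An XML declaration, if present, must be the very first thing in the file.
_XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")

def add_stylesheet_pi(xml_text, href):
    """Return xml_text with an xml-stylesheet PI pointing at `href`."""
    if "<?xml-stylesheet" in xml_text:
        return xml_text  # already processed: safe to re-run on every sync
    pi = "<?xml-stylesheet type=\"text/xsl\" href=%s?>" % quoteattr(href)
    match = _XML_DECL.match(xml_text)
    if match:
        end = match.end()
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text
```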

The next step is to add each of these files into an RDF model. Since I’m still occasionally changing statements, I’ve been dropping the whole model and re-adding everything every time: this doesn’t take too long, as it’s only a few hundred files, and Redland is speedy quick.

So, now we have a database full of RDF statements. Fine. But that’s not too useful on its own. So, I have my SPARQL query interface, which is all well and good for people who have lots of knowledge of RDF. It can provide some cool results.

But it doesn’t really do anything *fun*. So last night, I added an optional checkbox that says “If you have something in a specific query format, process an XSLT file against it”. I tweaked this XSLT from Masahide’s example, linked yesterday, into what it is now, which you can see, if you’re interested.

Well, that’s all well and good, but most people don’t understand SPARQL enough to know what they should type in. What’s the use of having to learn a language just to see some pictures? So, my next step was to add a search box specifically for Regions: my SPARQL page now has one for exactly this purpose.

I realized after a couple times, though, that using client side XSLT to process the XML was really slow, clunky, and generally ugly. Not to mention that Mozilla’s XSLT doesn’t let me disable-output-escaping on variables: so, I installed php4-xslt, and started using that implementation on the server side.

Yeah, that’s all well and good too, but now my pretty RDF with queries and all went away! So, I added them back: at the end of the Foghorn Search, in a comment, you’ll see:

Generated using the XSLT stylesheet at http://crschmidt.net/xslt/imgreg.xsl against the data generated by the query:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX imgreg: <http://www.w3.org/2004/02/image-regions#>
SELECT ?img ?title ?page ?desc ?atitle ?coord
WHERE {
  ?img dc:title ?title ;
       foaf:page ?page ;
       dc:description ?desc ;
       imgreg:hasRegion ?r .
  ?r dc:title ?atitle ;
     imgreg:coords ?coord .
  FILTER REGEX(?atitle, "Foghorn")
}

Data was:

followed by the XML version of the SPARQL query results.
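For anyone who wants to consume those XML results outside the browser, a small sketch using Python's ElementTree turns them into one dictionary per result row. It assumes the final W3C SPARQL Query Results XML Format namespace (`http://www.w3.org/2005/sparql-results#`); drafts from 2005 used a different namespace, so the `SRX` constant may need adjusting for older output. The function name is hypothetical, and it keeps only each value's text, discarding datatype and language.

```python
import xml.etree.ElementTree as ET

SRX = "{http://www.w3.org/2005/sparql-results#}"

def parse_sparql_xml(text):
    """Turn SPARQL XML results into a list of {variable: value} dicts."""
    root = ET.fromstring(text)
    rows = []
    for result in root.iter(SRX + "result"):
        row = {}
        for binding in result.findall(SRX + "binding"):
            # Each binding wraps exactly one <uri>, <literal> or <bnode> element.
            value = binding[0]
            row[binding.get("name")] = value.text or ""
        rows.append(row)
    return rows
```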

Another interesting example: Schmidt – myself, family members, and others.

Anyway, being a bit more informative seemed appropriate given the situation. So there’s my implementation toy of the day.

PersonalProfileDocument Parsing

Posted in FOAF, Python on May 23rd, 2005 at 18:14:44

Earlier today, on the OpenID mailing list, I was asked to supply Perl code to look for PPDs in FOAF docs and return some basic props on the user who owned the FOAF file. My Perl skills have long since fallen by the wayside, but I was able to put together something in Python which seems to work pretty well.

ppd.py is a FOAF parser using xml.dom.minidom to look for a PPD, and parse out a couple basic forms of the Personal Profile Document, for cases in which you can’t bring a full RDF parser to bear on the situation. (I know that the question of when this arises has been argued a million times, but an RDF parser is an extra dependency that some projects simply have no interest in bringing on.)

This parses two basic forms of PPD: one in which the foaf:maker is identified by rdf:nodeID="nodename", and one in which the foaf:maker is identified by rdf:resource="#nodename" coupled with an rdf:ID="nodename".

This hasn’t been fully tested: it was mostly done as a quick proof of concept that people could expand on. I’ve tested it on the nodeID case, and tested that if it can’t find an appropriate PPD, it falls back (against LiveJournal files). I’m not sure how python-esque my code is, but it does seem to work, which was my primary concern.

As usual, this code is designed to be used at the command line as “python ppd.py http://crschmidt.net/foaf.rdf”, or imported as a module, after which you can run ppd.get_person("http://crschmidt.net/foaf.rdf").
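For readers who want the gist without opening ppd.py, here is an independent sketch (not the actual ppd.py code) of resolving the two PPD forms with xml.dom.minidom and returning the maker's foaf:name. The names `find_maker_name` and `_child_text` are hypothetical. Like the original, it only understands these two constrained layouts: a foaf:Person nested directly inside foaf:maker, or a relative `#id` that depends on xml:base, is not handled.

```python
from xml.dom import minidom

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

def _child_text(element, ns, local):
    """Text of the first direct child element ns:local, or None."""
    for child in element.childNodes:
        if (child.nodeType == child.ELEMENT_NODE
                and child.namespaceURI == ns and child.localName == local):
            return "".join(t.data for t in child.childNodes
                           if t.nodeType == t.TEXT_NODE).strip()
    return None

def find_maker_name(rdf_xml):
    """Return the foaf:name of the PPD's foaf:maker, or None if not found."""
    doc = minidom.parseString(rdf_xml)
    by_node_id, by_id = {}, {}
    for person in doc.getElementsByTagNameNS(FOAF, "Person"):
        if person.hasAttributeNS(RDF, "nodeID"):
            by_node_id[person.getAttributeNS(RDF, "nodeID")] = person
        if person.hasAttributeNS(RDF, "ID"):
            by_id[person.getAttributeNS(RDF, "ID")] = person
    for ppd in doc.getElementsByTagNameNS(FOAF, "PersonalProfileDocument"):
        for maker in ppd.getElementsByTagNameNS(FOAF, "maker"):
            person = None
            resource = maker.getAttributeNS(RDF, "resource")
            if maker.hasAttributeNS(RDF, "nodeID"):
                # Form 1: <foaf:maker rdf:nodeID="nodename"/>
                person = by_node_id.get(maker.getAttributeNS(RDF, "nodeID"))
            elif resource.startswith("#"):
                # Form 2: <foaf:maker rdf:resource="#nodename"/> + rdf:ID="nodename"
                person = by_id.get(resource[1:])
            if person is not None:
                return _child_text(person, FOAF, "name")
    return None
```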

Thoughts on the method? Will this work with a sufficiently constrained FOAF doc?

XSLT + Image Regions + Sparql

Posted in Flickr, Image Description, RDF, SPARQL, XSLT on May 22nd, 2005 at 20:05:23

Read Masahide’s notes on XSLT+Image Regions. Used some tools to convert my flickr photos to RDF.

Converted an XSLT Stylesheet to a different result format. Loaded ~400 RDF files into a Model, totalling 33,000 statements. Added an option to my Sparql Interface. Changed the default query. Made the extra option add the stylesheet.

Ran a query. Tweaked until it worked. Typed it all up here, to share with all of you.

Hooray for masahide, flickr, and all kinds of other wonderful things.

Wedged Subversion Repository

Posted in Subversion on May 20th, 2005 at 22:11:57

Earlier this morning, one of my projects’ Subversion repositories got wedged. After figuring out that it was actually wedged (no response to GET, PROPFIND requests timing out, and svn and svnadmin commands hanging until killed with kill -9), I started playing with svnadmin. Still didn’t work. Hopped into #svn. Asked, was pointed to the FAQ.

Copied current repository to another location before attempting anything else, since I’ve fucked up a BDB based subversion repository attempting to repair it before.

Attempted svnadmin recover /var/www/svn/rdfpython: failed with lots and lots of “PANIC” type errors.

Attempted svnadmin recover ~/newcopyofrepos: That seemed to work. An svnadmin verify ~/newcopyofrepos confirmed that it had.

Made another backup of the repos, removed it, copied the new ~/newcopyofrepos into place.

And the world was good again. A verify and a checkout both confirmed that the files were all in place, Trac started to work again, and all was well, good and happy.
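The copy-first recovery routine above can be scripted so the original repository is never touched. A minimal sketch follows, assuming nothing (Apache, svnserve, cron jobs) is holding the repository open while it runs. `recover_copy` is a hypothetical name, and `run`/`copy` are injectable only so the steps can be dry-run. It uses `svnadmin recover` (Berkeley DB recovery) and `svnadmin verify`; swapping the recovered copy into place is left as a manual step, as it was here.

```python
import shutil
import subprocess

def recover_copy(repo, scratch, run=subprocess.run, copy=shutil.copytree):
    """Recover a copy of a wedged BDB repository, leaving the original untouched.

    Stop anything holding the repository open (Apache, svnserve) before calling.
    """
    copy(repo, scratch)  # work on a copy: a failed recovery can then be retried
    run(["svnadmin", "recover", scratch], check=True)  # run Berkeley DB recovery
    run(["svnadmin", "verify", scratch], check=True)   # check every revision
```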

However, I think that from now on I may use the fsfs storage method rather than BDB. This is certainly not the first time this has happened to me or anyone else, and I’m starting to not trust BDB for mission-critical tasks, which is what I consider Subversion to be. My version control is one of the few things that I don’t have completely backed up most of the time: files I can copy around easily, but databases of changes to files typically stay in one place. I could recreate the structure, but I couldn’t really recreate the history, and that’s important to me.
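Moving an existing repository to fsfs without losing history is typically done with a dump and reload. A minimal sketch of those three steps follows (`migrate_to_fsfs` is a hypothetical name; the paths are placeholders). Passing `--fs-type fsfs` explicitly makes the back end independent of whatever default the installed Subversion uses. Verify the new repository before retiring the old one.

```python
import subprocess

def migrate_to_fsfs(bdb_repo, fsfs_repo, dumpfile, run=subprocess.run):
    """Copy a repository's full history from BDB into a new fsfs repository."""
    # 1. Dump every revision (history included) to a portable dump file.
    with open(dumpfile, "wb") as out:
        run(["svnadmin", "dump", bdb_repo], stdout=out, check=True)
    # 2. Create an empty repository that uses the fsfs back end.
    run(["svnadmin", "create", "--fs-type", "fsfs", fsfs_repo], check=True)
    # 3. Replay the dumped revisions into the new repository.
    with open(dumpfile, "rb") as dump:
        run(["svnadmin", "load", fsfs_repo], stdin=dump, check=True)
```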

If anyone has any experience with fsfs SVN repositories over BDB based ones, I’d be glad to hear it.

Lynx View

Posted in Technology, Web Publishing on May 20th, 2005 at 19:20:58

A new crschmidt.net webservice:

lynxview takes a domain name and converts the site to its lynx -dump form. For when you want to show some Windows user just how crap their website is with all the graphics turned off.

As a form of demonstration, check out crschmidt.net or Planet Mobile.

Currently, sites are cached eternally, so that the service can’t be used to DDoS some poor site.
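The "cache forever" rule means each domain is fetched at most once, no matter how often it is requested. A minimal sketch of that logic follows, assuming a plain dict as the cache (the real service presumably keeps its cache on disk) and hypothetical function names. The hostname check keeps the service limited to bare domains, and lynx is called with an argument list rather than through a shell.

```python
import re
import subprocess

# Conservative hostname check so only bare domain names are accepted.
_HOSTNAME = re.compile(r"^(?=.{1,253}$)([A-Za-z0-9-]{1,63}\.)*[A-Za-z0-9-]{1,63}$")

def fetch_with_lynx(domain):
    """Render http://<domain>/ as plain text via `lynx -dump`."""
    result = subprocess.run(["lynx", "-dump", "http://%s/" % domain],
                            capture_output=True, text=True, timeout=60, check=True)
    return result.stdout

def lynx_view(domain, cache, fetch=fetch_with_lynx):
    """Return the text dump for `domain`, fetching it at most once ever."""
    domain = domain.strip().lower().rstrip(".")
    if not _HOSTNAME.match(domain):
        raise ValueError("not a plain domain name: %r" % domain)
    if domain not in cache:  # cached forever: repeat requests never hit the site
        cache[domain] = fetch(domain)
    return cache[domain]
```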

Prompted in part by a request from DanC on #swig earlier today.