Technical Ramblings

Archive for the 'Redland RDF Application Framework' Category

OS X 10.4 Compile failures due to libtool

Many different projects (Redland, svn, and wxWindows included) have seen cases where users have attempted to compile, and seen the errors:

/usr/bin/libtool: for architecture: cputype (16777234) cpusubtype (0) file: -lSystem is not an object file (not allowed in a library)

or similar, posted to the mailing lists of these projects.

Only one place that I’ve been able to find so far (and not easily!) has the answer:

This is the typical error you get when you do an upgrade install of
Panther -> Tiger but you don’t install the Tiger Developer Tools
(Xcode 2.0). Don’t do that (Do upgrade your dev tools)

(From darwinports mailing list.)

Thank you ssen! Now, anyone else who has seen this will hopefully be able to read this post for now and into the future.

Ran into this attempting to compile gpsd.

Comments Off on OS X 10.4 Compile failures due to libtool

RDF as Backing Storage

Posted in Redland RDF Application Framework, Semantic Web, SPARQL on August 20th, 2005 at 09:37:34

So often lately, I’ve seen some prominent people in the Semantic Web advising that the possibilities of using RDF as a backing store for an application are great, and that maybe people should use “SPARQL as a query language for the application”.

Stop. Don’t do that.

Right now, we’re in a situation where RDF implementations are really usable – if you are aware of their limitations, and avoid them. This is also true in MySQL: You don’t make every one of your queries include half a dozen joins, so you don’t want to do a similar thing in SPARQL. The problem is that with RDF query languages, unless you’ve spent a lot of time with them, you’re working on something that’s much more difficult to understand the work behind. It’s extremely easy to write a query that is not well optimized that take the application a very long time to compute, at the cost of making RDF and SPARQL look as if they are too slow.

They’re not.

For small size web applications, there is no real reason that a properly built site could not be extremely quick, and built on RDF tools. (At the current point in time, I won’t say the same for large applications: I don’t have any knowledge of what happens to triplestores once they get past 2 million triples.) However, most queries that people write are slow, simply because they are not optimized. Depending on how exactly the application-level query translator works, you may be dealing with something that’s not completely optimized, which can have an extreme negative impact on your query time.

An example? Well, my knowledge is mostly in Redland, so I’ll just toss out a query via julie, the redlandbot.

09:05:18 < crschmidt> ^q select ?i where { <http://crschmidt.net/foaf.rdf#crschmidt> foaf:knows ?p. ?p foaf:nick ?i. }
09:05:18 < julie> solcita, bluemoonshark, telepwen, jayo, littledownpour, jessicacmalfoy, csogilvie, alacrity, wyvernbitch, pne,
luxtiger, danceinacircle, bertho, ursamajor, pie_is_good, jessical, danbri, ryanbyers, shupe, thebubba, kangarooofdoom,
neviachiel, kamara, joanna4136, raventhon, evilcat84, chrisg, nostrademons, coffeechica, fracturedfaerie, nyxie,
siren52684, pthalogreen, ChemicalLace, zach, seymour, adcott, girlxfriday, meinterrupted, biztheinsane,
sarah_mascara, busbeytheelder, tinyjo, rho, xtremesaints, sherm, mendel, acerbic, thewildrose, bobert225,
sporkmistress, isabeau, beginning, supersat, braindrain, ratkrycek, opal1159, maryam, uberzacker, lor22ms, burr86,
comeseptember, rahaeli, pezstar, girlfriday, xavier, jproulx, roy

That’s right, those results on a hot database are returned in less than one second. On the other hand, if you do the query in the opposite order:

09:05:39 < crschmidt> ^q select ?i where { ?p foaf:nick ?i. <http ://crschmidt.net/foaf.rdf#crschmidt< foaf:knows ?p. }
09:18:29 < julie> solcita, bluemoonshark, acerbic, jayo, bobert225, …

You have a multi-minute wait. (I’ll fill this in when it comes back: it’s been 3 minutes so far, and still going.) Ah, it finished. Just shy of 13 minutes. Someone less versed in Redland would look at that and say “Why?”

Redland’s query engine, Rasqal, works based on the triples it finds as it goes through the query. In this case, for the first query, it finds approximately 100 triples: “Here are the foaf:knows triples pointing from the crschmidt node. Now let’s go find their nicks.” It then has approximately 100 distinct subjects to match in the second part of the query. Now, look at the second query: You’ll notice that the first triple pattern is going to match a lot more triples than the first: in this case, I think it’s approximately 20,000 triples. Then, each of those triples will be matched against the second part of the query. It has to ask approximately 200 times as many questions to the triplestore behind the data. That’s 200 times as long to wait to get all the data out. Since most of the data will probably be “cold” (not cached in the MySQL table cache), you’ll not only be waiting to get it out, but you’ll also probably be emptying out the MySQL cache of anything but this one useless query. All because you got your triple patterns mixed up.

Perhaps this is just a Redland problem: I don’t know. I’ve not used any of the Java-based tools, and I don’t know of any other non-Redland tools for working with SPARQL against a large data store. However, it is an image problem when you advise or offer to “just use SPARQL”: People do not, by intuition, recognize that the above two queries will be any different. Since it’s extremely hard to notice the differences on something that is small scale, it’s hard to catch these mistakes when they start showing up without a fair amount of experience in the application level RDF tool. Ah, it finished, so back up to show you timeframes…

Before you start using SPARQL for an application query language, consider the application you’re using, and how you can optimize it. SPARQL queries can be made to run much faster, generally, if you have a good knowledge of what you’re doing. However, working with the raw triples using the methods the library provides will oftentimes provide a level of insight as to what’s actually going on under the hood that can be extremely useful for knowing how things can work better.

I think that people need to stop advocating SPARQL as the end-all, be all solution for everything in RDF. I’m sure that it can be great, and that there’s tons of great ways to use it. However, one of the most popular RDF engines does not work well with most SPARQL queries that I’ve seen people throw at it. The first step in learning how to properly use SPARQL (for any kind of time-neccesary things: anything over HTTP basically counts here) is learning how to properly manipulate the triples in the way that works best in the application without the query language.

11 Comments »

Python/Redland Powered RDF Validator

Posted in PHP, Python, RDF, Redland RDF Application Framework, Semantic Web on June 2nd, 2005 at 20:02:24

After some thinking this morning, I converted the current PHP-based crschmidt.net templating system to a Cheetah Python template. This means that some more of my tools can move to being Python powered, rather than PHP powered.

“So what?”

Currently, the interface to Redland that I have available in PHP is significantly less good than Python. It’s coded by yours truly, and it’s basically only designed for my use cases, so every time I want to use something new, I have to go and code it, or use a closer-to-native C-style interface translated into PHP. Neither of those are particularly enjoyable.

Python is a much more comfortable language for me to use. It is more intuitive for me. It feels more natural, not to mention the fact that I keep forgetting semicolons in my PHP code. It has an awesome binding for Redland, which is one of the things that I’ve been working with most over the past while.

In the past, all my scripts had been either 1. PHP or 2. Python with no site theming. Hopefully the new Cheetah template will help make me create some more tools in Python, which is the language I feel most comfortable in.

With that in mind, I’ve created a new crschmidt.net web service: an RDF Validator. A number of times, I have found that the official RDF validator will puke, but won’t give much of a reason why. This tool uses Redland, which has a tendancy to return what I consider better error messages on worse RDF. It’s designed as a one-off example of the new templating system, and should not be considered indicitive of most of the expected output of such scripts. Just a first attempt at getting myself into more code.

2 Comments »

Redland Updates

Posted in julie, RDF, Redland RDF Application Framework, SPARQL on May 19th, 2005 at 23:58:45

Dave released a new Raptor and new Rasqal today. I’ve built both, and rebuilt my Python bindings so I no longer get segfaults (Almost thought it was a bug, then Dave reminded me of previous “bugs” which were my fault).

As a result, all of my tools on both zeus and athena are now running the latest and greatest in the way of SPARQL, meaning the new query syntax (and I believe, new XML output syntax). I still need to update the examples on my PHP pages, but julie’s code is all up to date.

While I was at it, I took the oppourtunity to do some cleanups that I’ve been wanting to do for a while: You can see the revisions on the rdfpython trunk in trac’s timeline, but here’s a summary:

* Did some rewriting on mortenf’s smusher. I now get owl:sameAs triples in the store, so I have a reversible process to some extent for smushing, as well as making the smusher look for the shortest URI rather than just grabbing the first node it sees as “canonical”. Of course, I did this after a lot of URIs got tossed in my last smushing run… ah well, live and learn.
* Moved more code to use the “parse_anything” function that I wrote, which uses heuristics and logic to try and guess what kind of content we’re dealing with. It depends a lot on Content Types, but is also something I can edit and reload without restarting the bot, which is a major boon for me. This means that if something is broken, I can fix it, and make it more robust, without any kind of guilty concious about flooding channels with joins/parts/quits.
* GRDDL support (with newest raptor) in parse_anything. Since ^add is really now parse_anything, this means that if you add a page with a GRDDL description Redland supports, you’ll get the triples out of it.
* Heuristics of queries, guessing which is which. (Really ‘dumb’ right now: it just looks for ” {“, and considers it Sparql if it has it.)

What does this mean to you, dear user?

Well, quite simply, it means that you will probably support more formats (RSS, SVG, HTML+GRDDL, Atom, Turtle, ntriples) with less work (it’s all done through ^add). You can run queries in either the old format (RDQL) or new format (Sparql), or store either one.

I’d say that’s a benefit.

Thanks to Dave for getting new Redland stuff out the door.

Comments Off on Redland Updates

PHP and Redland

Posted in PHP, Redland RDF Application Framework on May 8th, 2005 at 09:49:27

Recently, I moved most of my serving to a colocated machine, so I finally have a “Testing” machine and a “stable” machine, leaving me more free to play around locally. This has led to me installing a Rasqal nightly release and installing it, in an attempt to get the newer SPARQL query syntax working in my RDF bot, so that I can test query type detection and the like.

I had no problems installing it: very simple, just download the nightly, ./configure, make, make install. I got it working in my local “julietest”, although I’m waiting until the next release before I consider installing it on the remote server.

I got it working in PHP from the command line, no problems.

However, no matter what I do, the web version still seems to be using the old query syntax, and I don’t have any clue why. If you go to http://zeus.crschmidt.net/julie/sparql, you can test it out, and it only returns data if you use the old query format. However, if I copy the same script locally, and run the exact same query, it doesn’t work, requiring the new format.

I don’t understand it, and I don’t know if anyone else does either. The PHP in Apache2 and CLI both have almost exactly the same phpinfo(), they both have the same extension directory, and there isn’t a second copy of redland.so for the Apache version to load anyway! If anyone has run into this problem before and knows how to fix it, I’d appreciate it, because right now I’ve given up and am waiting for a release before I debug further.

(This post brought to you in part by the effort to bump all of Danny’s off of PlanetRDF while he’s on vaccation. ;))

2 Comments »

GovTrack RDF Data

Posted in PHP, Redland RDF Application Framework, SPARQL on April 25th, 2005 at 20:10:02

One of the larger sources of RDF data that I’ve loaded into a database, the GovTrack RDF data is an interesting set with all kinds of information on congressmen adn so on. I recently started paying a bit more attention to the Gargonza Experiment, and found a link to their data source via the wiki.

I’ve been playing with setting up SPARQL stuff all day, and have a couple simple pages set up from my new GovTrack page. Loading the entire dataset (the RDF/XML, at least: the n3 bits I left out for the time being) took a long time, and I did some tweaking of MySQL in the process to allow me to load data faster. Some things I learned, for optimizing loading time with Redland:

1. MySQL’s key cache size is important when loading large data stores.
2. When loading statements, if you really want to optimize your load time, load with contexts. Redland will not check for duplicate statements in this case: This can be a major time saver. However, this may slow down later work, so it will probably not be worth it in the long term.
3. Loading into an already existing Redland database, even in a new model, will not increase speed: since Bnodes, Literals, and Resource tables are database wide, the selects to determine existing statements will still be just as slow as if you were loading into the existing models.

I also discovered that my QueryResults->result() method was returning actual Redland nodes, rather than the wrapped Redland.php::Node. I suppose at one point I probably realized that, but it had slipped my mind. This made it really difficult to do things like deal with optionals: calling the librdf_node_to_string in the PHP bindings causes them to segfault if the node is NULL, and there’s no decent way to check if the node is null that I found.

To compensate, I created a new way to create nodes (basically a copy constructor). This allowed me to check at node creation time whether it was a Resource/Bnode/Literal, which are the only types of Nodes there are. If it’s none of the above, I make it a PHP NULL, which I can check for, and it won’t crash PHP.

I have learned the many different ways to segfault PHP over the past week working on Redland. Of course, they all relate to PHP doing funky things with a SWIG wrapper, but it’s still one of the more interesting experiences I’ve had.

With the new PHP, all of the SPARQL interfaces I’ve got set up: one for Julie, one for XTech, one for GovTrack support Optionals. This has allowed me to create things like the GovTrack Senators page, (example for New Hampshire), listing some profile information about all the Senators from your state. (Representatives are more difficult. I’m still working on that.)

Anyway, the GovTrack data is fun to play with, although I really need to develop some more interesting interfaces over the data. I plan to do that: just haven’t gotten there yet. These tools take time to develop, but they do feel really nifty. I would go into the why’s of why I feel it’s nifty, but I almost always end up feeling like a complete and utter geek when I do it, and it makes people look at me strange, so I’ll skip it this time.

3 Comments »

Redland PHP Wrapper

Posted in PHP, RDF, Redland RDF Application Framework on April 17th, 2005 at 20:01:46

Today, I was working with the XTech Stuff, and decided I wanted to offer some fun Redland-based queries against it. Since the entire website is in PHP, I decided to stick with that theme, and write some PHP code.

I had the PHP bindings installed from a couple days ago, for… something I don’t exactly remember. I had some grand goal in mind… oh, right, I was going to provide my logo information in RDF, and parse it out using PHP.

Something I realized today is that there is no decent Redland Wrapper class like there is for Python and Perl. SWIG provides interfaces, but that basically just gets you to the level of the C API, which is something that is a bit low level for me.

To resolve this, I’ve written a PHP Wrapper class, which I hope to maintain and improve upon. It is stored in a subversion repository: you can check it out using:

svn co http://crschmidt.net/svn/redland/

Please feel free to use the trac project to help with the project.

Status: Beta Quality. Has only been tested using included test.php script. Does not do proper memory checks in any/most cases.
License: This wrapper is released under the same license as Redland itself.
Homepage: phpwrapper.

Comments Off on Redland PHP Wrapper

RDF + GPG

Posted in RDF, Redland RDF Application Framework, Semantic Web on April 9th, 2005 at 14:35:34

One of my eventual goals is to have julie replace all the features of wh4 (libby’s query bot) and foafbot (edd’s community IRC bot). One thing that edd’s bot did that julie doesn’t is to verify data based on signed documents, and to use this information as a “provenance” for the data: not just “where was it said”, but actually verifying “who said it”.

Dealing with GPG is not nearly as easy as I really think it should be. Take Redland as an example: You can interact with the library at all kinds of levels, from the base swig wrapper to the hand-written RDF.py module wrapped around it, and you can do just about anything the base library does from within Python.

GPG, on the other hand, is very hard to work with from a library level. There is a Python module for working with GPG, but it equates to simply using the command line tool in the end. You can’t tell it “Check this document”: you basically just have tools to create a pipe to GPG, and pass the options in the same way you would on the command line. Add to that that it’s Yet Another Dependancy which is somewhat of a pain to resolve in Python, and you can see why it’s slightly annoying for people who might want to use GPG.

Wondering what edd had done for FOAFbot back when it was running, I decided to grab his code and play with it. Turns out he just opened a pipe to gpg using the commands module in Python. This seemed simple enough to me, so I ripped out some of his code and turned it into a little script.

With that, I announce the release of rdfgpg, a tool for verifying the signature described by an RDF document. It uses the Redland Python bindings, and the usage is:

python rdfgpg.py http://example.org/urlof.rdf

Optionally, you can add a second argument, to set the debug argument, which will show more information about what’s going on in the background, which may help if something that you expect to work isn’t. Additionally, you can easily import the module and use the function rdfgpg.verify_url(url), which returns a list of email addresses on the signing key.

The code is released under a GPL 2.0 license, and is stolen in large part from the FOAFbot code released by Ed Dumbill. Feature requests via comments or email.

Hopefully with this, I’ll start to actually use it in my tools, to verify provenance when possible, and to start convincing people to sign their files. I hate to think what would happen to the semantic web if people suddenly started creating lots of false documents… but hopefully it’s not quite that popular yet.

2 Comments »

Parsing SVG Metadata

Posted in Python, RDF, Redland RDF Application Framework, Semantic Web, SVG on April 7th, 2005 at 15:12:48

How to Parse SVG Metadata, the Redland + Python way:

import urllib
import xml.dom.minidom as minidom
import RDF

m = RDF.Model()
p = RDF.Parser()
u=urllib.urlopen(“Location Of SVG File”)
svg = u.read()
doc = minidom.parseString(svg)
p.parse_string_into_model(m, doc.getElementsByTagName(“rdf:RDF”)[0].toxml(), “Location of SVG File”)
print m

In other words: Bring in the RDF and minidom modules, Create an RDF model and parser, download the SVG file to a string, parse the string into a minidom compatible variable, then look for RDF in the SVG file, parsing it into the model, and serializing the model.

Problems: What if someone uses something that’s not rdf: as the prefix?
Solutions: mattmcc offers that minidom supports getElementsByTagNameNS, so the parse line would become:
p.parse_string_into_model(m, doc.getElementsByTagNameNS( “http://www.w3.org/1999/02/22-rdf-syntax-ns#”, “RDF”)[0].toxml(), “Location of SVG File” )
resolving the Namespace issue.

Of course, since this is Redland, this is taken care of for you. Rather than doing it in this way, which is specific to SVG, we can scan for RDF in any XML doc. Simply:

import RDF
m=RDF.Model(); p=RDF.Parser()
p.set_feature(“http://feature.librdf.org/raptor-scanForRDF”, “1”)
p.parse_into_model(m, “URL Of SVG File”)

There are a number of other features you can use with a Parser. They are available via rapper -f help, but here’s a list: assumeIsRDF, allowNonNsAttributes, allowOtherParsetypes, allowBagID, allowRDFtypeRDFlist, normalizeLanguage, nonNFCfatal, warnOtherParseTypes, checkRdfID.

Naturally, Redland already does what I want it to do. Another pat on the back for Dave (and thanks to him for pointing it out).

Comments Off on Parsing SVG Metadata

Redland Bug Squish

Posted in Redland RDF Application Framework on December 19th, 2004 at 19:30:27

Chris: 2, Redland: 0

For quite a while, I’ve been having some problems with installing the latest version of the Redland RDF Application Framework. I’m sure that these were problems of my own: I know that my system is a little goofy, in part because I’ve got a combination of ebuild-built and non-ebuild versions of things. So, for a while, I’ve been unable to compile the lateset Redland, because whenever I’d try and compile, the system-wide redland-config would get in the way, and then Redland would try to link against the already installed libraries, etc. etc.

However, yesterday I started having issues with julie crashing, returning a glibc exception. I wrote a small test script, and found that the issue was occuring any time I was using two “AND” clauses in an RDQL query. I recompiled glibc with debug info this morning, and was amazed to find that my backtraces suddenly became much more informative: Instead of gdb returning ?? in ?? in my backtrace, it actually gave me function names and line numbers. I didn’t think that just recompiling glibc would have helped so much.

After I was able to do this, I sent mail to the Redland-dev list: Here’s my problem, here’s my backtrace, where do I go from here? That Email got several responses back and forth from Dave Beckett (maintainer of Redland) and Morten Fredrickson (maintainer of the MySQL storage backend). I ended up stopping all services which used Redland, uninstalling redland completely, and making the latest release successfully. In combination with discussion on IRC, I moved onto testing with valgrind, an open-source memory debugger for x86-GNU/Linux and ppc-GNU/Linux. Sending the data dump from that helped Dave to find the problem, which was related to the way that the mysql backend frees results.

One patch later, the problem was cleared up. There’s still the question of why it broke, which seems to be related to some broken statements in my database. However, it’s good to know that I helped to find a bug in Redland, rather than just finding something broken locally.

In the end, I did learn a few things:

1. Having glibc debug symbols on is a Good Thing.
2. If you can’t install the latest version, try uninstalling the previous version first.
3. valgrind is an app which may help you locate the source of bad free() calls
4. glibc has some extra sanity check memory detection: you can adjust its behavior via MALLOC_CHECK_. Levels of sanity are 0->3, with 0 being “display error message and kill application” and 3 being “do nothing”. (I found this via fedora core release notes.)

I also was able to compile new Redland bindings: this means that I fixed my Perl Query problem (which has been fixed for a while, but I couldn’t install new bindings), as well as adding Sparql support to my local Redland tools. Now, I just need to write something that uses it.

So, in the end, I helped squish a bug, and upgraded all my Redland related stuff on my system. That’s what I call a good day.

Comments Off on Redland Bug Squish

Archive for the 'Redland RDF Application Framework' Category

OS X 10.4 Compile failures due to libtool

RDF as Backing Storage

Python/Redland Powered RDF Validator

Redland Updates

PHP and Redland

GovTrack RDF Data

Redland PHP Wrapper

RDF + GPG

Parsing SVG Metadata

Redland Bug Squish

Archives

Categories