Technical Ramblings

Archive for the 'SPARQL' Category

GovTrack RDF Data

Posted in PHP, Redland RDF Application Framework, SPARQL on April 25th, 2005 at 20:10:02

One of the larger sources of RDF data that I’ve loaded into a database, the GovTrack RDF data is an interesting set with all kinds of information on congressmen adn so on. I recently started paying a bit more attention to the Gargonza Experiment, and found a link to their data source via the wiki.

I’ve been playing with setting up SPARQL stuff all day, and have a couple simple pages set up from my new GovTrack page. Loading the entire dataset (the RDF/XML, at least: the n3 bits I left out for the time being) took a long time, and I did some tweaking of MySQL in the process to allow me to load data faster. Some things I learned, for optimizing loading time with Redland:

1. MySQL’s key cache size is important when loading large data stores.
2. When loading statements, if you really want to optimize your load time, load with contexts. Redland will not check for duplicate statements in this case: This can be a major time saver. However, this may slow down later work, so it will probably not be worth it in the long term.
3. Loading into an already existing Redland database, even in a new model, will not increase speed: since Bnodes, Literals, and Resource tables are database wide, the selects to determine existing statements will still be just as slow as if you were loading into the existing models.

I also discovered that my QueryResults->result() method was returning actual Redland nodes, rather than the wrapped Redland.php::Node. I suppose at one point I probably realized that, but it had slipped my mind. This made it really difficult to do things like deal with optionals: calling the librdf_node_to_string in the PHP bindings causes them to segfault if the node is NULL, and there’s no decent way to check if the node is null that I found.

To compensate, I created a new way to create nodes (basically a copy constructor). This allowed me to check at node creation time whether it was a Resource/Bnode/Literal, which are the only types of Nodes there are. If it’s none of the above, I make it a PHP NULL, which I can check for, and it won’t crash PHP.

I have learned the many different ways to segfault PHP over the past week working on Redland. Of course, they all relate to PHP doing funky things with a SWIG wrapper, but it’s still one of the more interesting experiences I’ve had.

With the new PHP, all of the SPARQL interfaces I’ve got set up: one for Julie, one for XTech, one for GovTrack support Optionals. This has allowed me to create things like the GovTrack Senators page, (example for New Hampshire), listing some profile information about all the Senators from your state. (Representatives are more difficult. I’m still working on that.)

Anyway, the GovTrack data is fun to play with, although I really need to develop some more interesting interfaces over the data. I plan to do that: just haven’t gotten there yet. These tools take time to develop, but they do feel really nifty. I would go into the why’s of why I feel it’s nifty, but I almost always end up feeling like a complete and utter geek when I do it, and it makes people look at me strange, so I’ll skip it this time.

3 Comments »

RDF Query

Posted in Perl, RDF, SPARQL on April 21st, 2005 at 14:50:13

Apparently the anxious type, Greg Williams has thrown together an RDF Query implementation in Perl, with support for the new SPARQL draft as of yesterday.

The library also offers ORDER BY support, something that I’m sure Greg is happy to have for his MT-Redland. Ordering things by date for me is something that I’ve sidestepped, but I’m not looking forward to when I actually have to deal with it.

The code uses Parse::RecDescent to generate a query based only on the SPARQL grammar. Greg mentions that it is slow: most of the time is actually in generating the Query from the Grammar.

If only I was still a Perl hacker… sadly, I’m not, so I suppose I’ll just have to start working on my C in order to help get Redland working with the new draft. (Dave estimates that it will take him about 1.5 months to catch up to the most recent WD of SPARQL.) I’d really love to just be able to use the tools I’ve already written in Python, rather than switching to Perl, or even another backend than Redland. It has worked so well for me so far.

Still, this is the first SPARQL implementation using the new Draft that I’m aware of, even if it is mostly just a hack job, so I think that it’s pretty cool, and my props are out to Greg for his work on it!

1 Comment »

Technical Ramblings

Archive for the 'SPARQL' Category

GovTrack RDF Data

RDF Query

Archives

Categories