
Introduction to Working with Redland and Python

Many of the complaints about the current state of RDF concern its development tools. A number of tools exist, but because RDF is less widely deployed than XML, working examples of RDF are harder to find.

RDF provides a data model which can be used to rapidly deploy application services without deploying a full RDBMS and its related data tables. Because RDF can represent data without a pre-defined schema, changes can propagate more quickly. Additionally, using RDF as the primary data model makes the application extensible: external data sources can populate internal data with relative ease.

Used in this manner, for rapidly deployed application services, either as a prototype or as a permanent solution, with an extensible and flexible data model, RDF can be an ideal fit for a number of applications.

This article demonstrates how to create an application based on RDF, using Redland and its Python bindings to build the full application from start to finish.

Before You Start

Before beginning this tutorial, you should have:

Section One: Creating your first RDF Model

In Redland, a Model is your main dataset. You can have multiple models: stored separately in the database, loaded from a file, or created locally for a specific purpose. Typically, you will have only one main model for your application, although this may vary with the use case. In this first step, we will show how to create a Model, add data to it, and manipulate that data.

Our example dataset in this section is a database of the books in a house. I have a book named Speaker for the Dead, by Orson Scott Card. This book has a number of properties, such as the author, number of pages, a description, title, etc. This can all be stored in an RDF Model.

First, we create an identifier for our book. For any item which has an ISBN, this task is already done for us: RFC 3187 describes how to use ISBNs as URIs (formally, as URNs of the form urn:isbn:...). This tutorial uses the shorter form isbn:0-81255075-7 as the identifier for Speaker for the Dead. We will use this identifier as a URI in the subject of all our statements about the book.

Once we have an identifier for the book, we can create a triple, or a single piece of information, about that identifier. The first statement we will make declares the title of the book. All statements in RDF consist of three parts: an identifier (subject), what kind of statement it is (predicate), and the value of that statement, either another identifier or a literal value (object). Predicates are always URIs: that is, each is a URI, typically dereferenceable, which describes the type of relationship the object has to the subject. A large number of these are already available for many common properties: in the case of title, we will be using the DCMI Metadata Terms. For the title, we have the title element - "A name given to the resource. ... Typically, a Title will be a name by which the resource is formally known." The URI for this term is http://purl.org/dc/elements/1.1/title. Lastly, since we know the title of the book, we use it as the object of the statement/fact/triple. In the end, we have three parts to our fact:

Subject: isbn:0-81255075-7
Predicate: http://purl.org/dc/elements/1.1/title
Object: "Speaker for the Dead"

Now that we know the general idea behind what we're doing, we can write some code. The first step to working with Redland in Python is to import the Redland Python module, RDF. Attached to this module are a large number of features - you can see these via the pydoc of RDF, or via the Official RDF Module Documentation online. Reading this, we see the Model class, with which we can build our data set. First, we create a model:

import RDF
model = RDF.Model()

Then, we create a statement, and add it to the model.

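# The ISBN identifier is the subject of every statement about the book.
subject = RDF.Uri("isbn:0-81255075-7")
# The Dublin Core "title" term is the predicate.
predicate = RDF.Uri("http://purl.org/dc/elements/1.1/title")
# The object is a literal; it is named obj to avoid shadowing Python's built-in object.
obj = RDF.Node("Speaker for the Dead")
statement = RDF.Statement(subject, predicate, obj)
model.append(statement)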

You can add as many statements to the model as you wish: there is no "bounds checking" at the model level in RDF (other than Redland preventing the same fact from being stated twice). To add a new statement, simply follow the above example. Note that there is no need to create the subject, predicate, and object separately: they can be created directly in the RDF.Statement call with no ill effects.
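The "same fact only once" behaviour can be pictured with plain Python, without Redland installed. The sketch below is only an illustration of the data model, not Redland's implementation: the Triple name and the use of a set are assumptions made for this example. The creator term is the Dublin Core property for an author.

```python
from collections import namedtuple

# Hypothetical stand-in for a statement; Redland's own class is RDF.Statement.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

BOOK = "isbn:0-81255075-7"
DC = "http://purl.org/dc/elements/1.1/"

# A set silently discards duplicates, mirroring the rule that a model
# holds each fact at most once.
graph = set()
graph.add(Triple(BOOK, DC + "title", "Speaker for the Dead"))
graph.add(Triple(BOOK, DC + "creator", "Orson Scott Card"))
graph.add(Triple(BOOK, DC + "title", "Speaker for the Dead"))  # repeated fact

print(len(graph))  # number of distinct facts held in the set
```

Note that each triple is built inline in the call which adds it, just as the subject, predicate, and object can be built inline in a statement.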

Once we have this data, however, what do we do with it? In order to use this data in an application, we need to learn how to retrieve it from a graph-based data store. The best way to do this in Redland's RDF module is via the find_statements method of the Model class. This allows us to specify a statement with any part of the triple pattern missing, and returns all statements which match. We have a unique identifier, the isbn: URI, which we can use to search the model.

# None acts as a wildcard: match any predicate and any object for this subject.
statement = RDF.Statement(subject, None, None)
bookdata = model.find_statements(statement)
for fact in bookdata:
    print(fact)

This code snippet will print out all the RDF statements we have added to our model which match subject, None, None. This will be all the facts associated with our identifier - all the facts about the book.
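To make the wildcard matching concrete, here is a minimal pure-Python sketch of the same idea. It is not how Redland implements find_statements: the find_triples function, the tuple representation, and the second book (with its urn:example: identifier) are all hypothetical, invented for this illustration.

```python
BOOK = "isbn:0-81255075-7"
DC = "http://purl.org/dc/elements/1.1/"

# Each fact is a (subject, predicate, object) tuple.
graph = [
    (BOOK, DC + "title", "Speaker for the Dead"),
    (BOOK, DC + "creator", "Orson Scott Card"),
    # Made-up second book, present only so that the filter has something to exclude.
    ("urn:example:other-book", DC + "title", "Some Other Book"),
]


def find_triples(graph, subject=None, predicate=None, obj=None):
    """Yield every triple that equals the pattern in each non-None position."""
    pattern = (subject, predicate, obj)
    for triple in graph:
        if all(want is None or want == have
               for want, have in zip(pattern, triple)):
            yield triple


# Equivalent in spirit to the (subject, None, None) pattern above.
about_book = list(find_triples(graph, subject=BOOK))
for fact in about_book:
    print(fact)
```

The same function answers other questions by leaving different parts of the pattern empty: passing only predicate=DC + "title" would return the title of every book in the list.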

You now know how to create a model, add a statement, and search the model for statements based on an identifier. However, there are many times when this will not be enough: you will need to search for a specific piece of information, or store more complex data. These tasks will be covered in the next section of this tutorial.


Copyright 2003-2007, Christopher Schmidt