Parsing SVG Metadata

How to Parse SVG Metadata, the Redland + Python way:

import urllib
import xml.dom.minidom as minidom
import RDF

m = RDF.Model()
p = RDF.Parser()
u = urllib.urlopen("Location Of SVG File")
svg = u.read()
doc = minidom.parseString(svg)
p.parse_string_into_model(m, doc.getElementsByTagName("rdf:RDF")[0].toxml(), "Location of SVG File")
print m

In other words: bring in the urllib, minidom and RDF modules, create an RDF model and parser, download the SVG file into a string, parse that string into a minidom document, look for the rdf:RDF element in it, parse that element into the model, and print the serialized model.
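One thing the steps above gloss over is an SVG file with no metadata at all, where the [0] index would simply blow up. Here is a minimal sketch of just the minidom half, using only the standard library. The extract_rdf_xml helper and both sample documents are made up for illustration, and the sample declares its namespaces on the rdf:RDF element itself, which a real file may not do.

```python
import xml.dom.minidom as minidom


def extract_rdf_xml(svg_text):
    """Return the first rdf:RDF element as an XML string, or None if absent."""
    doc = minidom.parseString(svg_text)
    matches = doc.getElementsByTagName("rdf:RDF")
    if not matches:
        # No metadata block: report that instead of raising IndexError.
        return None
    return matches[0].toxml()


# Hypothetical sample documents, not taken from any real file.
WITH_METADATA = """<svg xmlns="http://www.w3.org/2000/svg">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="">
        <dc:title>Example drawing</dc:title>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
</svg>"""
WITHOUT_METADATA = '<svg xmlns="http://www.w3.org/2000/svg"/>'

rdf_xml = extract_rdf_xml(WITH_METADATA)
print(rdf_xml is not None)
print(extract_rdf_xml(WITHOUT_METADATA))
```

The string that comes back is what would be handed to p.parse_string_into_model in the snippet above.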

Problems: What if someone uses something that’s not rdf: as the prefix?
Solution: mattmcc points out that minidom supports getElementsByTagNameNS, so the parse line would become:
p.parse_string_into_model(m, doc.getElementsByTagNameNS("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "RDF")[0].toxml(), "Location of SVG File")
which resolves the namespace issue.
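To see the difference between the two lookups without needing Redland, here is a small standard-library sketch. The sample SVG is hypothetical and deliberately binds the RDF namespace to the prefix r instead of rdf.

```python
import xml.dom.minidom as minidom

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample: the RDF namespace is bound to "r", not "rdf".
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <metadata>
    <r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
      <r:Description r:about="">
        <dc:title>Example drawing</dc:title>
      </r:Description>
    </r:RDF>
  </metadata>
</svg>"""

doc = minidom.parseString(SAMPLE_SVG)

# Lookup by qualified name only matches the literal string "rdf:RDF".
by_prefix = doc.getElementsByTagName("rdf:RDF")

# Lookup by namespace URI and local name ignores whatever prefix was chosen.
by_namespace = doc.getElementsByTagNameNS(RDF_NS, "RDF")

print(len(by_prefix))
print(len(by_namespace))
print(by_namespace[0].tagName)
```

The prefix-based lookup finds nothing here, while the namespace-aware one finds the r:RDF element.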

Of course, since this is Redland, this is already taken care of for you. Rather than doing it this way, which is specific to SVG, we can have the parser scan for RDF in any XML document. Simply:

import RDF
m = RDF.Model()
p = RDF.Parser()
p.set_feature("http://feature.librdf.org/raptor-scanForRDF", "1")
p.parse_into_model(m, "URL Of SVG File")

There are a number of other features you can use with a Parser. They are available via rapper -f help, but here’s a list: assumeIsRDF, allowNonNsAttributes, allowOtherParsetypes, allowBagID, allowRDFtypeRDFlist, normalizeLanguage, nonNFCfatal, warnOtherParseTypes, checkRdfID.
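If you end up setting several of these, a tiny helper for building the feature URI can save some typing. This is only a sketch: feature_uri is a made-up name, and it assumes every feature follows the same http://feature.librdf.org/raptor- prefix as scanForRDF above, which is worth confirming against the output of rapper -f help.

```python
# Assumption: every feature URI is this prefix plus the feature name,
# as with the scanForRDF URI used above.
FEATURE_BASE = "http://feature.librdf.org/raptor-"

# scanForRDF plus the feature names listed above.
FEATURE_NAMES = [
    "scanForRDF", "assumeIsRDF", "allowNonNsAttributes",
    "allowOtherParsetypes", "allowBagID", "allowRDFtypeRDFlist",
    "normalizeLanguage", "nonNFCfatal", "warnOtherParseTypes", "checkRdfID",
]


def feature_uri(name):
    """Return the presumed feature URI for a feature name from the list."""
    if name not in FEATURE_NAMES:
        raise ValueError("unknown parser feature: %s" % name)
    return FEATURE_BASE + name


print(feature_uri("scanForRDF"))

# With Redland installed, the result would be passed to the parser, e.g.:
#   p.set_feature(feature_uri("scanForRDF"), "1")
```

Rejecting names that are not in the list catches typos early, since a misspelled feature name would otherwise just produce a URI the parser does not recognize.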

Naturally, Redland already does what I want it to do. Another pat on the back for Dave (and thanks to him for pointing it out).
