Build Your Own API Support

Posted in Ning, Social, Web Publishing on October 5th, 2005 at 18:12:39

I noticed someone mention that Marc Canter wrote about Ning. I think he may have missed an important part of what Ning is about, though.

Why?

Well, let’s start from the top:

I’m pleading with Gina Bianchini to have Ning PLEASE support the PeopleAggregator APIs once it’s out – and I don’t see any reason why she won’t.

The "someone else needs to do it" mentality no longer applies: applications on Ning are open source. Code can be mixed, cloned, and run any way you want — including loading files and modules from other applications! So, if you have an API you want to support, support it: develop the code, document it for Ning users, and let people know they can pull it in with XN_Application::includeFile(). You can build support for whatever API you want and have other users build on it.

So, once the API for the website you're talking about is complete, write some code, put it on Ning, and get people to use it. That's what Ning is about: sharing, putting things together, and bringing "View Source" back to the people. This is your chance to make good on something web browsers learned that Macromedia never has: being able to look at how something works inside is a huge boon to development, as I think we'll see as time goes on.

Of course, there's also the question of whether Canter still believes that Andreessen "sure as hell hasn't done shit since – what, 1995?" 😉

Ning Content Store

Posted in Ning, RDF, Web Publishing on October 5th, 2005 at 11:24:15

I mentioned in an earlier post about Ning today that I felt the content model it uses is functionally no different from RDF. This needs a bit of explanation, I'm sure, and thus far there isn't a lot of talk out there about how to actually use the content store. (This is probably related to the significant lack of beta developer accounts so far, something that will hopefully change in the near future.)

A quick tutorial on creating Content Objects on Ning first (drawn from XN_Content documentation):

$object = XN_Content::create("TypeOfObject", "Title of Object", "Description of Object");
$object->save();

This creates an object with a Type, Title, and Description. These are the three most commonly used "System" attributes — they are present in some form on almost every object created, so they can be used when displaying that data.

In addition to these system attributes, there is the ability to add developer attributes:

$object->my->name = "Christopher Schmidt";
$object->my->age = 21;
$object->save();

If we look at the content for this object, via the $object->debugHTML() method, we see:

XN_Content:
  id [198390]
  createdDate [2005-10-05T06:50:23-07:00]
  updatedDate [2005-10-05T06:50:23-07:00]
  type [TypeOfObject]
  title [Title of Object]
  description [Description Of Object]
  tagCount [0]
  contributorName [crschmidt]
  ownerName [Example App]
  ownerUrl [exapp]
  isPrivate [false]
  my attributes [
    [-1] name : Christopher Schmidt : string
    [-1] age : 21 : number
  ]

Here we see some interesting bits that we hadn't seen before: the id, date, and owner fields are all useful for tracking where the data is coming from, and contributorName for who it is coming from. For the time being, however, we'll concentrate on the my attributes and the id and type system attributes.

Mapping this to an RDF representation is relatively simple:

ning:198390 a TypeOfObject;
dc:title "Title of Object";
dc:description "Description of Object";
ning:name "Christopher Schmidt";
ning:age "21" .

You'll see here the main issue with representing Ning content objects as RDF: types and predicates (developer attributes) are stored only as strings, not the URIs that RDF typically asks for. This results in a less descriptive format than you would find in most RDF descriptions: there are no URIs for predicates, so you can't do some of the "magic" that RDF is famous for.

However, I have been working with RDF for more than 1.5 years now, and I have never had any use for that magic.

The ability to uniquely identify predicates may be useful in a general sense, and it makes it possible to accurately and adequately describe those predicates for use in other systems… but at the base level, predicate URIs exist to attach semantics to the terms. In many cases, those semantics are simply unnecessary: if I have a Book with an "isbn" property, I probably know what it's going to be.

If you do need this level of semantics, all hope is not lost: you can still achieve your goals. Typically, apps use an attribute in only one way, and stored with each content object is the application which owns it. (In this case, exapp — which doesn't exist, for the record: the minimum length for an app name is 6 characters.) So, you can take the URL for the app (http://exapp.ning.com/) and create a URI based on it: either in the app's namespace (squatting on a URL it isn't using, for example) or in your own. I could coin http://crschmidt.net/ns/ning-exapp# as a prefix for predicate URIs in this situation.
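
To make that concrete, here's a rough Python sketch of the mapping, with one content object represented as a plain dictionary mirroring the debugHTML() output above. The base URI for object IDs and the use of my coined namespace for the type and developer attributes are both assumptions for illustration, not anything Ning actually publishes:

NING_ID = "http://ning.com/content/"             # made-up base URI for object IDs
APP_NS = "http://crschmidt.net/ns/ning-exapp#"   # coined per-app predicate namespace

# A plain-dictionary stand-in for the content object dumped above.
obj = {
    "id": "198390",
    "type": "TypeOfObject",
    "title": "Title of Object",
    "description": "Description of Object",
    "my": {"name": "Christopher Schmidt", "age": 21},
}

def to_ntriples(obj):
    s = "<%s%s>" % (NING_ID, obj["id"])
    lines = [
        '%s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <%s%s> .'
        % (s, APP_NS, obj["type"]),
        '%s <http://purl.org/dc/elements/1.1/title> "%s" .' % (s, obj["title"]),
        '%s <http://purl.org/dc/elements/1.1/description> "%s" .'
        % (s, obj["description"]),
    ]
    # Developer attributes become predicates in the coined app namespace.
    for attr, value in obj["my"].items():
        lines.append('%s <%s%s> "%s" .' % (s, APP_NS, attr, value))
    return "\n".join(lines)

print(to_ntriples(obj))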

The semantics attached to a predicate in the Ning content store are extremely weak — but that doesn't mean they are useless. In my opinion, a simple RDF representation of Ning content objects can be useful in many cases, and the model in use goes a long way toward proving that the RDF model as an application data store is not useless: it can lead to rich applications sharing data, which is exactly what Ning is designed to accomplish.

One thing I didn’t mention above is that it is possible to create links, not only to literals and numbers, but also to other content objects:

$object->my->otherContent = XN_Content::create("OtherObject");
$object->save();

This creates a link to another content object, by that object’s ID. This fits perfectly into the RDF model, where URLs identify an object: the Content Object’s ID is its unique identifier within the Ning universe, and the backing store lets you link any number of these objects together. You want to grab a book that someone added, and tie it to your own? That’s fine! Simply create a Content Object and add both of those as attributes. You want a whole list of books from someone else’s bookshelf? That’s fine too: just grab the IDs, and attach em. This casual interlinking of data is exactly what is going to make Ning a success — and it’s exactly the kind of interrelationship that proves the RDF model is not wrong.
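
In the same made-up RDF view as the sketch above, that link is simply one more triple — this time with a resource rather than a literal as its object (the second ID is again invented for illustration):

NING_ID = "http://ning.com/content/"             # same made-up base as above
APP_NS = "http://crschmidt.net/ns/ning-exapp#"   # same coined app namespace

# 198391 stands in for whatever ID Ning assigns the new OtherObject.
print('<%s198390> <%sotherContent> <%s198391> .' % (NING_ID, APP_NS, NING_ID))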

I know that some people will strongly disagree: they will say that the Ning content model is yet more demonstration that people "don't get it". I see it as exactly the opposite: it's proof that regardless of how content is stored underneath, it can be presented to applications in a way that is easy to use, and useful. By giving objects a unique identifier (whether it's a URI or an ID), a way to connect them together, and a decent API to put it all in place in code, you can create magic.

Ning!

Posted in Ning, PHP, Technology, Web Publishing on October 4th, 2005 at 05:03:05

For the past 4 weeks or so, I've been working on a project known previously as 24 Hour Laundry.

Now, it’s no longer 24HL: Welcome Ning.

A development playground with all kinds of neat and nifty toys, Ning is attempting to do for application and code sharing what other apps have done for photos, bookmarks, and other arenas: let people clone, mix, and create new apps.

There are a lot of cool things here, and I've got a pretty bad headache, so I'm not going to be able to cover everything I would like to, but here are some of the cooler things about the site:

* System-wide content store. Public content can be accessed by any application. This content store is well abstracted, and has a content creation and query system. You don't have to worry about scaling up: you can leave that to the professionals on the backend. At the same time, you can collect data from all the other apps in the playground. You want to create a book reviews site? First, grab everything that's known as a Book from the site, and then use the built-in classes for ratings and comments to build a discussion board. The possibilities for content mix and match are really spectacular. However, if you don't want others touching your data, you can mark it as "private" and use it only in your app – but why would you want to?
* Built-in classes for lots of things. Build a calendar. Interact with Flickr. Make a GMap. Talk to Amazon. The code's all done for you; you just use it. Bookshelf makes extensive use of the Amazon classes, Restaurant Reviews With Maps uses Google Maps to show where you're going, and Bay Area Hiking Trails shows you how to get there.
* RSS feeds of content. The Ning Pivot is a really cool way of looking at the content flowing by — and not only can you watch it on the site, you can subscribe to it.

There are about a half dozen other really nifty things here that I can't even think of at the moment, because it's 5am and I've been walking around like a zombie for two weeks to get this stuff complete.

But the coolest things are:
* All data added is placed under a CC By-SA license. (If you don't like this, Ning isn't for you.)
* All app code is completely open, and you can make it your own in 2 seconds.

Screw Ruby on Rails: who needs a 2-minute app when you can write a 2-second app? It all depends on how fast you can click.

If you run into problems with Ning, feel free to drop them here and I'll pass them on as best I can. You can never fix all the bugs before release, but I think the team working on Ning has done an absolutely incredible job with everything they've put together here.

There's a lot of other stuff I want to write — one thing others here might find interesting is how similar Ning's content store is to RDF, and why I think there's no functional difference. Of course, Marc and I got into a nice "discussion" about that on IRC the other night, so maybe I'll wait until I'm a bit less exhausted and can adequately express my points on the topic. 🙂

New Colors, New Features

Posted in Mobile Platform, Semantic Web, Technology on September 26th, 2005 at 12:10:58

crschmidt.net now features a new color scheme. I'm still not sure how much I like it, but the old black/grey/white scheme was really starting to grate on me. (Note that the weblog uses a different stylesheet, which I haven't updated yet.)

Additionally, all pages now have a commenting feature, so you can leave a comment on any page! This is taken from Eikeon's websites, which have the same feature (although there it requires logging in first). I've done some very basic escaping of script tags, and I do my best to add newlines where they seem appropriate, but if you want to make your content look right, you're best off formatting it with HTML yourself.
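
For the curious, the cleanup is roughly along these lines — a naive Python sketch of the idea, not the actual code running on the site:

import re

def clean_comment(text):
    # Guess whether the commenter formatted the comment themselves.
    looks_like_html = "<" in text
    # Neutralize script tags by escaping their opening bracket.
    text = re.sub(r"(?i)<(/?\s*script)", r"&lt;\1", text)
    # If there was no markup at all, turn bare newlines into <br /> tags.
    if not looks_like_html:
        text = text.replace("\n", "<br />\n")
    return text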

The upshot is that it's now really easy to offer feedback on any page of the site. If you're interested in my semantic web tools, you can leave comments on the various pages there. You can comment on the code for any of my Python tools, on my Symbian stuff, on pretty much anything. Soon — very soon — I'll be writing an RSS feed generator for this. Right now I'm just happy it works, and I would love to see people commenting on any page of the site they'd like more info about or want to offer feedback on.

Transcribing Radio Feeds

Posted in Social on September 3rd, 2005 at 12:42:29

In and around the Katrina relief effort, there are more than half a dozen police and other radio scanners being broadcast over the web. These scanners offer the most up-to-the-minute information about what's happening in New Orleans, Houston, and San Antonio: incidents, activity, and so on.

These streams are being transcribed live, in text, into IRC channels. The channels are staffed by volunteers looking to help organize the information flowing in about the relief effort, for those who are unable to listen or who want to be kept up to date on the status of events.

On irc.freenode.net, there are (at this point) 6 channels devoted to this traffic: #interdictor-scanner and #interdictor-scanner2 through 6. There is information on where the sources for these feeds are, and how to transcribe, on the nola-intel wiki, at Transcribing.

Please, if you have some free time and are able to listen to the streams and type, stop by #nola-intel-help: there you can ask which feeds need assistance, get directed, and get voiced in one of the channels to start helping. You can learn more in an hour from these scanners than you might otherwise in a day of listening to the standard news channels. These people are working hard to get the most up-to-date information out to the world, and many of you have the ability to help. If you can, please do so: this is the best way to know what's going on in the New Orleans area, and the best way to pass the information along to others.

If you need help connecting to IRC, message me on AIM at cr5chmidt, and I will help you out. Please, feel free to pass this message on: I place this message into the public domain for unlimited posting or modification by anyone.

irssi word completion

Posted in Technology on August 27th, 2005 at 04:22:04

Every now and then, I'll try to type a difficult-to-type word on IRC and curse the lack of auto-complete built into my IRC client. I've always thought, "I should really look into fixing that." Well, tonight I was sleepy and browsing through the entire list of irssi scripts (obtained via `rsync -avz main.irssi.org::irssiweb/scripts/scripts/\*.pl ~/.irssi/scripts/official`), and I discovered that there is a "wordcompletion" script which pulls data from a MySQL database.

“Nifty!” I thought, and poked at it a bit more, finding that it simply stored words you used in messages into a MySQL database. So, I got to thinking. Wouldn’t it be nice to take the words from /usr/share/dict and dump them into there?

So I did.

for i in `cat /usr/share/dict/american-english`; do export v=`echo $i | perl -pe "s/'/\\\\\\\\'/"`; echo $v; echo "INSERT INTO words (word, prio) VALUES ('$v', 1)" |mysql -u irssi -pPASSHERE irssi ; done

And since I did it, I've saved you the work: you can fetch the entire database dump (in compact, minimal-impact one-insert form) from odds and ends, a new section on crschmidt.net. Additionally, you can grab my new version of the script from there, which changes it to read all messages rather than just the ones typed by you. In the process, I became interested enough to work out how to store these fields in settings — the new version of the script features a number of improvements, such as saving the database password, user, and DSN in settings, as well as offering help, so people who don't know enough Perl to change even simple variables can use it.

I’ve contacted the author to let him know about these changes so he can roll them into the official version if he wishes. If I don’t hear back within a week, I’ll submit my version as an update to the original script at irssi.org.

Programs which are easy to script are a great way to keep yourself occupied late at night, and they occasionally let you release something that seems impressive which otherwise wouldn't exist. Thanks to the original author of the script (Jesper Lindh), as well as the authors of all irssi scripts, for their help in getting this one out the door.

SVG::Metadata 0.28 Released

Posted in RDF, Semantic Web, SVG on August 22nd, 2005 at 22:03:52

While many people these days are switching to annotation-in-XHTML, there’s still at least one file format out there which has extremely useful metadata annotation using RDF/XML inside the document: SVG.

The Scalable Vector Graphics format has a metadata element which is expected to contain RDF/XML. This is great news for people who might wish to create a directory of SVG images: the metadata can be stored in the images themselves, something the Open Clip Art Library takes advantage of, using a number of tools to extract statistics and aggregate metadata from SVG files.

To take an example from the library, Autos_01.svg (SVG file, requires an SVG viewer) contains 23 RDF statements. These triples hang off a cc:Work node whose URI is the URL of the file itself, meaning that a simple query for the predicates and objects with http://openclipart.org/clipart/transportation/autos_01.svg as the subject returns the important aspects of this document. This includes description, creator, keywords, and license. The license is "Public Domain" — adding images to the Open Clip Art Library requires placing them into the Public Domain.

For working with this data, developers on the project created the Perl module SVG::Metadata — a module for annotating SVG files with this metadata, as well as making changes to the metadata which already exists in such files.

The maintainer just announced on the Clipart Discussion list that he has released 0.28, which includes the changes from the previous 0.26 and 0.27 releases, both of which were mostly maintenance releases. (The message will eventually appear in the August threads, but hasn't yet.)

The RDF generation in versions prior to 0.24 was broken, but it was fixed in the 0.25 release. OCAL is now using that release in their scripts, so many of the more recent images in the library contain valid RDF, meaning you can simply pass them to Redland with the http://feature.librdf.org/raptor-scanForRDF feature set. In the Python bindings, that is:

p.set_feature("http://feature.librdf.org/raptor-scanForRDF", "1")

In rapper:

[crschmidt@creusa ~]$ rapper -c -f scanForRDF=1 http://www.openclipart.org/incoming/cat_scrathing_post_benji_01.svg
rapper: Parsing URI http://www.openclipart.org/incoming/cat_scrathing_post_benji_01.svg
rapper: Parsing returned 30 statements
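
Put together, a minimal Python sketch of the same approach, using the Autos_01.svg example from earlier, might look like the following — assuming the Redland Python bindings are installed, with the in-memory model and the final loop there purely for illustration:

import RDF  # Redland Python bindings

uri = "http://openclipart.org/clipart/transportation/autos_01.svg"

parser = RDF.Parser(name="rdfxml")
# Tell Raptor to scan the whole SVG document for embedded rdf:RDF blocks
# instead of expecting RDF/XML at the document root.
parser.set_feature("http://feature.librdf.org/raptor-scanForRDF", "1")

model = RDF.Model(RDF.MemoryStorage())
parser.parse_into_model(model, RDF.Uri(uri))

# Print every predicate/object pair attached to the document's own URI.
for statement in model.find_statements(
        RDF.Statement(RDF.Node(uri_string=uri), None, None)):
    print("%s %s" % (statement.predicate, statement.object))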

I think this is a great example of how to work with structured metadata without dealing with the crappier corner cases of RDF/XML syntax: simply write a library which parses the metadata, fills your variables up, lets you modify them with a standard API, and then resyncs the data to the file. Congrats to Bryce for his hard work on the module, and for making the metadata for these SVG files accurate and useful to external users.

We Don’t Need No Stinkin Rules

Posted in julie, RDF, Semantic Web, SPARQL on August 21st, 2005 at 21:42:34

SPARQL CONSTRUCT as rules announces the addition to julie of SPARQL CONSTRUCT-based, rule-like processing for creating additional statements to add to the store.

Basically, the syntax hasn’t changed much:

^q CONSTRUCT {?p2 ?prop ?p. } WHERE { ?prop rdf:type <http://www.w3.org/2002/07/owl#SymmetricProperty>. ?p ?prop ?p2. } returns:
Total of 542 statements: Here’s the first three.
{(r0_r1114530965r33008), [http://purl.org/vocab/relationship/colleagueOf], (r0_r1114530965r32995)}, {(r0_r1114530965r32998), [http://purl.org/vocab/relationship/colleagueOf], (r0_r1114530965r32995)},
{(r0_r1114689381r708), [http://purl.org/vocab/relationship/colleagueOf], (r0_r1114530965r32995)}.

This is to let you know what you’re getting yourself into. For example, you probably don’t want to add the rdfs:subClassOf relationship for everything: you’d be dealing with a heck of a lot of statements, enough to trap the bot for hours. Here, we can see that it’s a relatively reasonable subset of the model, so we can pass this construct result into the model:

21:33:35 < julie> Created 542 statements based on CONSTRUCT query: Model size changed by 481.
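
Under the hood, the feedback step is nothing more exotic than running the CONSTRUCT and pouring the resulting statements back into the store. Here's a rough sketch with the Redland Python bindings — the in-memory model is just a stand-in for julie's real MySQL-backed one, and this isn't julie's actual code:

import RDF  # Redland Python bindings

# Stand-in for julie's real, MySQL-backed model.
model = RDF.Model(RDF.MemoryStorage())

query = RDF.SPARQLQuery(
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
    "PREFIX owl: <http://www.w3.org/2002/07/owl#> "
    "CONSTRUCT { ?p2 ?prop ?p } "
    "WHERE { ?prop rdf:type owl:SymmetricProperty . ?p ?prop ?p2 . }")

# Add every constructed statement back into the model it was derived from.
before = model.size()
model.add_statements(query.execute(model).as_stream())
print("Model size changed by %d." % (model.size() - before))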

You can see the queries and results in context in #swig logs.

I’d be interested to hear how people think this could be used, or if it’s useful in a more general sense. Would it be better to simply provide a temporary URI where people could fetch the data? Hm, that sounds almost like a useful service – POST data, get back a URI and some data after having parsed the data and stored it in a local location. Wonder if I’m the only person who could use that…

SPARQL CONSTRUCT as Rules

Posted in julie, Semantic Web, SPARQL on August 20th, 2005 at 11:42:31

So, I implemented SPARQL CONSTRUCT queries in julie yesterday (http://crschmidt.net/noets/115), along with ASK queries (http://crschmidt.net/noets/114). Really, CONSTRUCT queries are pretty silly to have on their own — who wants to see serialized triples on an IRC interface? — but then I got to thinking: what if, instead of spitting the triples into IRC, it sent them into the backend?

I'm sure the people in charge of SPARQL thought of this long ago, but it just occurred to me that this is a simple way to achieve rule-like processing:

10:49:36 <crschmidt> ^q CONSTRUCT { ?a rel:childOf <http://crschmidt.net/foaf.rdf#crschmidt>. } where { <http://crschmidt.net/foaf.rdf#crschmidt> rel:parentOf ?a. }
10:49:37 <+julie> {[http://crschmidt.net/~julie/me.rdf#julie], [http://purl.org/vocab/relationship/childOf], [http://crschmidt.net/foaf.rdf#crschmidt]},
{[http://crschmidt.net/~alicia/me.rdf#alicia], [http://purl.org/vocab/relationship/childOf], [http://crschmidt.net/foaf.rdf#crschmidt]}

Feeding these kinds of results back into julie would let people create the equivalent of rules, with the constructed statements added back to the triplestore.

Man, I thought I gave up on this RDF stuff. Looks like maybe I wasn’t as done as I thought I was.

RDF as Backing Storage

Posted in Redland RDF Application Framework, Semantic Web, SPARQL on August 20th, 2005 at 09:37:34

So often lately, I've seen prominent people in the Semantic Web world advising that the possibilities of using RDF as a backing store for an application are great, and that maybe people should use "SPARQL as a query language for the application".

Stop. Don’t do that.

Right now, we're in a situation where RDF implementations are really usable – if you are aware of their limitations and avoid them. The same is true of MySQL: you don't make every one of your queries include half a dozen joins, and you don't want to do the equivalent in SPARQL. The problem is that with RDF query languages, unless you've spent a lot of time with them, it's much harder to understand the work going on behind a query. It's extremely easy to write a poorly optimized query that takes the application a very long time to compute, at the cost of making RDF and SPARQL look as if they are too slow.

They’re not.

For small web applications, there is no real reason a properly built site on top of RDF tools could not be extremely quick. (At the current point in time, I won't say the same for large applications: I don't have any knowledge of what happens to triplestores once they get past 2 million triples.) However, most queries that people write are slow simply because they are not optimized. Depending on how exactly the application-level query translator works, you may be dealing with something that's not completely optimized, which can have an extremely negative impact on your query time.

An example? Well, my knowledge is mostly in Redland, so I’ll just toss out a query via julie, the redlandbot.

09:05:18 < crschmidt> ^q select ?i where { <http://crschmidt.net/foaf.rdf#crschmidt> foaf:knows ?p. ?p foaf:nick ?i. }
09:05:18 < julie> solcita, bluemoonshark, telepwen, jayo, littledownpour, jessicacmalfoy, csogilvie, alacrity, wyvernbitch, pne,
luxtiger, danceinacircle, bertho, ursamajor, pie_is_good, jessical, danbri, ryanbyers, shupe, thebubba, kangarooofdoom,
neviachiel, kamara, joanna4136, raventhon, evilcat84, chrisg, nostrademons, coffeechica, fracturedfaerie, nyxie,
siren52684, pthalogreen, ChemicalLace, zach, seymour, adcott, girlxfriday, meinterrupted, biztheinsane,
sarah_mascara, busbeytheelder, tinyjo, rho, xtremesaints, sherm, mendel, acerbic, thewildrose, bobert225,
sporkmistress, isabeau, beginning, supersat, braindrain, ratkrycek, opal1159, maryam, uberzacker, lor22ms, burr86,
comeseptember, rahaeli, pezstar, girlfriday, xavier, jproulx, roy

That’s right, those results on a hot database are returned in less than one second. On the other hand, if you do the query in the opposite order:

09:05:39 < crschmidt> ^q select ?i where { ?p foaf:nick ?i. <http://crschmidt.net/foaf.rdf#crschmidt> foaf:knows ?p. }
09:18:29 < julie> solcita, bluemoonshark, acerbic, jayo, bobert225, …

You have a multi-minute wait. (I’ll fill this in when it comes back: it’s been 3 minutes so far, and still going.) Ah, it finished. Just shy of 13 minutes. Someone less versed in Redland would look at that and say “Why?”

Redland's query engine, Rasqal, works based on the triples it finds as it goes through the query. For the first query, it finds approximately 100 triples: "Here are the foaf:knows triples pointing from the crschmidt node. Now let's go find their nicks." It then has approximately 100 distinct subjects to match in the second part of the query. Now look at the second query: its first triple pattern matches far more triples than the first query's did — in this case, I think approximately 20,000. Each of those triples is then matched against the second part of the query. It has to ask approximately 200 times as many questions of the triplestore behind the data, which means roughly 200 times as long to wait to get all the data out. Since most of the data will probably be "cold" (not cached in the MySQL table cache), you'll not only be waiting to get it out, you'll also probably be emptying the MySQL cache of everything but this one useless query. All because you got your triple patterns mixed up.

Perhaps this is just a Redland problem: I don't know. I haven't used any of the Java-based tools, and I don't know of any other non-Redland tools for working with SPARQL against a large data store. However, it is an image problem when you advise or offer to "just use SPARQL": people do not, by intuition, recognize that the above two queries will perform any differently. Since it's extremely hard to notice the difference at small scale, it's hard to catch these mistakes when they start showing up unless you have a fair amount of experience with the application-level RDF tool. (Ah, it finished, so I've gone back up to fill in the timeframes.)

Before you start using SPARQL as an application query language, consider the application you're building and how you can optimize it. SPARQL queries can generally be made to run much faster if you have a good idea of what you're doing. And working with the raw triples, using the methods the library provides, will often give you a level of insight into what's actually going on under the hood that is extremely useful for understanding how things can work better.
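
To make that concrete, here is what the "raw triples" version of the fast query above might look like with the Redland Python bindings — a sketch only, with an in-memory model standing in for the MySQL-backed store julie actually uses:

import RDF  # Redland Python bindings

FOAF = "http://xmlns.com/foaf/0.1/"
me = RDF.Node(uri_string="http://crschmidt.net/foaf.rdf#crschmidt")

# Stand-in for the MySQL-backed model julie queries.
model = RDF.Model(RDF.MemoryStorage())

# First find the ~100 people the crschmidt node foaf:knows...
for person in model.get_targets(me, RDF.Node(uri_string=FOAF + "knows")):
    # ...and only then look up each person's foaf:nick -- the same order of
    # work the fast form of the query ends up doing.
    for nick in model.get_targets(person, RDF.Node(uri_string=FOAF + "nick")):
        print(nick)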

I think that people need to stop advocating SPARQL as the be-all, end-all solution for everything in RDF. I'm sure it can be great, and that there are tons of great ways to use it. However, one of the most popular RDF engines does not work well with most of the SPARQL queries I've seen people throw at it. The first step in learning how to use SPARQL properly (for anything time-sensitive: anything over HTTP basically counts here) is learning how to manipulate the triples in the way that works best in the application without the query language.