Creating a Feed From A Subversion Repository

Many projects store code, documentation, or other resources in a version control repository. Notifications of changes are often sent by email to a mailing list or delivered through some other mechanism. An RSS feed offers a simple alternative that tells you when the source changed and how, and it may fit better into your existing workflow.

One of the more popular revision control systems of late is Subversion, billed as a "compelling replacement for CVS". This hack shows how a small amount of scripting can turn the log of a Subversion repository into RSS. Install it as a CGI script for an always-up-to-date feed, or run it from cron to create a static file. Either way, changes from your version control system arrive alongside the rest of your feeds.

Learning Subversion Output

Subversion provides a simple, repository-level log of changes through the svn(1) command-line client, using the svn log command. Every svn subcommand is documented by svn help <subcommand>; in this case, svn help log. From there, we learn that the log has a more verbose mode, enabled with the -v (--verbose) option. This has the effect of including the paths changed in each revision, which will be useful in our feed.

To see what this log output looks like, we'll look at the changes made in the past week to the WordPress Subversion repository. Subversion clients before 1.2 offer no way to cap the number of log entries returned (newer clients accept --limit), so here we select revisions by date instead. To compute the date one week ago, we'll use a bit of shell scripting:

date -r $(($(date "+%s")-604800)) "+%Y-%m-%d"

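The inner date prints the current time in seconds since the Unix epoch, from which we subtract 604,800 seconds (60 * 60 * 24 * 7), or one week. We then pass the result to the outer date with -r, which tells BSD date to use those seconds instead of the current time, and format it as Year-Month-Day. (GNU date uses -r for a file's modification time; there, date -d '1 week ago' +%Y-%m-%d does the same job.) Finally, we pass this date to Subversion, retrieving the log entries from the most recent revision, known as HEAD, back to one week ago: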


   svn log -v -r HEAD:"{`date -r $(($(date "+%s")-604800)) "+%Y-%m-%d"`}" http://svn.automattic.com/wordpress/trunk/
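Because the rest of this hack is written in Perl, the same date arithmetic can also be done there, without depending on which date(1) is installed. The following is a minimal sketch using only the core POSIX module. The sub name week_ago is our own invention. The script prints the svn command it would run rather than running it.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Return the local date one week before $now (seconds since the epoch),
# formatted as YYYY-MM-DD for svn's {DATE} revision syntax.
sub week_ago {
    my ($now) = @_;
    my $one_week = 60 * 60 * 24 * 7;    # 604,800 seconds
    return strftime('%Y-%m-%d', localtime($now - $one_week));
}

my $date = week_ago(time);
print qq{svn log -v -r HEAD:"{$date}" http://svn.automattic.com/wordpress/trunk/\n};
```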

You'll see a number of results, all similar in formatting to:

 ------------------------------------------------------------------------
 r2822 | ryan | 2005-08-30 00:07:12 -0400 (Tue, 30 Aug 2005) | 1 line
 Changed paths:
    M /trunk/wp-includes/functions.php
 
 url_to_postid() typo fix.  Props markjaquith.  fixes #1612
 ------------------------------------------------------------------------

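There are several distinct, useful parts to this log entry:

- the revision number (r2822)
- the committer (ryan)
- the commit date
- the number of lines in the commit message
- the list of changed paths (here, one modified file, marked M)
- the commit message itself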

We will use each of these in a different way in our feed. Since every entry follows the same format, a relatively simple regular expression can extract the pieces we're interested in. We can then build an RSS item from each entry in the log.
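To make those parts concrete, the following sketch pulls apart just the header line of the sample entry. It uses named captures, which require Perl 5.10 or later. The sub name parse_header and its field names are illustrative choices, not anything Subversion defines.

```perl
use strict;
use warnings;

# Split an svn log header line such as
#   r2822 | ryan | 2005-08-30 00:07:12 -0400 (Tue, 30 Aug 2005) | 1 line
# into named fields. Returns a hash reference, or undef if the line doesn't match.
sub parse_header {
    my ($line) = @_;
    return undef unless $line =~ m{
        ^ r (?<rev>\d+)
        \s \| \s (?<author>\S+)
        \s \| \s (?<date>.+?) \s \( (?<human_date>[^)]+) \)
        \s \| \s (?<lines>\d+) \s lines? \s* \z
    }x;
    return +{ %+ };    # copy the named captures into an ordinary hash
}

my $h = parse_header('r2822 | ryan | 2005-08-30 00:07:12 -0400 (Tue, 30 Aug 2005) | 1 line');
print "$h->{rev} by $h->{author} on $h->{date}\n";
```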

Writing the Regex

The simplest way to separate out the data in each Subversion log entry is with a regular expression. Perl, with its origins in text manipulation, is well suited to the job. For readability, we'll lay out the regex the same way the text is delivered to us:

  -*
  r(\d+) \| (\S+) \| (.+) \((.+)\) \| (\d+) line(s?)$
  Changed paths:
  (.*?)
  
  (.*)
  -*$

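Most of the work happens in the first line, which captures the revision number ($1), the author ($2), the commit date in two forms ($3 and $4), and the number of lines in the message ($5, with $6 catching the plural "s"). The remaining groups capture the changed paths ($7) and the commit message ($8). With these pieces, we can build up an RSS entry: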

  <item>
    <title>Revision $1, Committed by $2</title>
    <description>$8
      Changed paths:
      $7
    </description>
    <dc:creator>$2</dc:creator>
  </item>

From our earlier example log message, this will create:

  <item>
    <title>Revision 2822, Committed by ryan</title>
    <description>url_to_postid() typo fix.  Props markjaquith.  fixes #1612
      Changed paths:
      M /trunk/wp-includes/functions.php
    </description>
    <dc:creator>ryan</dc:creator>
  </item>
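The template pastes the commit message and paths into the XML verbatim. As a result, a message such as "fix a < b && c" would make the feed ill-formed. The following is a minimal escaping helper that could be applied to each captured value before it is interpolated. The name xml_escape is our own.

```perl
use strict;
use warnings;

# Replace the characters that are special in XML text with entity references.
# & must be handled first so the entities we insert are not escaped again.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print xml_escape('fix a < b && c'), "\n";    # fix a &lt; b &amp;&amp; c
```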

Building the Feed

So far we have a way to build individual entries, but not the RSS feed that contains them. As we build the feed, we'll walk through the guts of the Perl script that produces it.

First, we find the time frame to cover: in this case, a feed containing one week of log entries. Perl's built-in time function returns the current number of seconds since the epoch. Subtracting one week's worth of seconds from that gives the point to start from:


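    use Date::Format;                     # from the TimeDate distribution; provides time2str()
    $date = time;                         # seconds since the epoch
    $date = $date - 60 * 60 * 24 * 7;     # one week
    $date = time2str("%Y-%m-%d", $date);  # e.g. 2005-08-23, for svn's {DATE} syntax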

Now that we have a start date, we call the Subversion client using Perl's qx operator, the equivalent of backticks, which captures the command's output in a string:

    $logdata = qx!svn log -v -r HEAD:"{$date}" http://svn.automattic.com/wordpress/trunk/!;
    die "svn log failed (status $?)\n" if $?;   # $? is non-zero if svn failed
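qx!! hands the whole command to a shell, so the quotes and braces around the date must survive shell quoting. As an alternative, Perl's list form of a piped open runs the program directly and passes each argument as-is. The following sketch also reports a failing exit status. The helper name capture_output is our own.

```perl
use strict;
use warnings;

# Run a command without involving a shell and return everything it printed
# to standard output. Dies if the command cannot start or exits non-zero.
sub capture_output {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!\n";
    my $out = do { local $/; <$fh> };    # slurp all output at once
    close($fh) or die "$cmd[0] exited with status ", $? >> 8, "\n";
    return defined $out ? $out : '';
}

# Usage with the svn command from the text (requires svn on the PATH):
#   my $logdata = capture_output('svn', 'log', '-v', '-r', "HEAD:{$date}",
#                                'http://svn.automattic.com/wordpress/trunk/');
print capture_output($^X, '-e', 'print "hello\n"');
```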

Now that we have the log data, we can start crafting the RSS feed. First, we print an XML declaration, then the elements that apply to the feed as a whole. Because we simply print this data, the output can be redirected to a file. It would also be possible to accumulate the feed in a string and print that string at the end, either to standard output or to a file.

 print "<?xml version='1.0' encoding='utf-8' ?>
 <rss version='2.0' xmlns:dc='http://purl.org/dc/elements/1.1/'>
 <channel>
   <title>Subversion Log Feed</title>
   <link>http://svn.automattic.com/wordpress/trunk/</link>
   <description>Automatically generated RSS feed from WordPress Subversion history.</description>
 ";

Here, we've given the feed a short title, link, and description. These are the three required channel elements in RSS 2.0, and all of them can be edited to suit your needs.

Now we can loop over the log entries, splitting the output on the 72-dash separator lines and turning each entry into an RSS item.

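  foreach my $loge (split /^-{72}\n/m, $logdata) {
    # Each chunk is one entry: header line, "Changed paths:" block,
    # a blank line, then the commit message. Skip anything else.
    next unless $loge =~ m{
      ^r(\d+)\ \|\ (\S+)\ \|\ (.+?)\ \((.+?)\)\ \|\ (\d+)\ line(s?)\n
      Changed\ paths:\n
      (.*?)\n\n        # changed paths, up to the blank line
      (.*?)\s*\z       # commit message, minus trailing whitespace
    }sx;
    my ($rev, $author, $paths, $msg) = ($1, $2, $7, $8);
    # Escape XML's special characters so commit text can't break the feed
    for ($author, $paths, $msg) { s/&/&amp;/g; s/</&lt;/g; s/>/&gt;/g; }
    print "  <item>
    <title>Revision $rev, Committed by $author</title>
    <description>$msg
      Changed paths:
$paths
    </description>
    <dc:creator>$author</dc:creator>
  </item>
";
  }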

Finally, we close our feed:

    print "</channel>
         </rss>";

And we have an RSS feed.
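The items carry no <pubDate>, even though each header line supplies the commit time (for example, 2005-08-30 00:07:12 -0400). RSS 2.0 expects dates in RFC 822 form, such as Tue, 30 Aug 2005 00:07:12 -0400. The following is a minimal conversion sketch that uses the core Time::Local module only to work out the day of the week. The sub name svn_to_rfc822 is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Convert an svn log timestamp ("YYYY-MM-DD HH:MM:SS +ZZZZ") to RFC 822 form.
# The wall-clock time and offset are kept as-is; timegm() is used only to find
# the weekday of that calendar date. Returns undef if the input doesn't parse.
sub svn_to_rfc822 {
    my ($stamp) = @_;
    my ($y, $mo, $d, $h, $mi, $s, $zone) =
        $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d) ([+-]\d{4})$/
        or return undef;
    my $wday = (gmtime timegm($s, $mi, $h, $d, $mo - 1, $y))[6];
    return sprintf '%s, %02d %s %04d %s:%s:%s %s',
        $DAYS[$wday], $d, $MONTHS[$mo - 1], $y, $h, $mi, $s, $zone;
}

print svn_to_rfc822('2005-08-30 00:07:12 -0400'), "\n";
# Tue, 30 Aug 2005 00:07:12 -0400
```

Inside the loop, the result of calling this on the captured commit date could be wrapped in <pubDate> and printed as part of each item.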

Extending the Hack

One possible extension to this hack is to obtain a diff for each revision and include it in the feed along with the log message. This would let readers see exactly what changed, which the current hack doesn't show. Another simple extension is to convert the script to run as a CGI script on a web server, which we demonstrate here.
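For the diff extension, svn diff -r N-1:N URL prints the changes made in revision N. Diff output is full of <, > and & characters, so one common approach is to wrap it in a CDATA section. The only sequence that cannot appear inside CDATA is "]]>", which has to be split across two sections. The following is a minimal sketch of both pieces. The sub names diff_args and cdata are our own.

```perl
use strict;
use warnings;

# Argument list for the diff introduced by revision $rev of $url.
sub diff_args {
    my ($rev, $url) = @_;
    return ('svn', 'diff', '-r', ($rev - 1) . ":$rev", $url);
}

# Wrap arbitrary text in a CDATA section. "]]>" would end the section early,
# so each occurrence is split across two adjacent CDATA sections.
sub cdata {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$text]]>";
}

print join(' ', diff_args(2822, 'http://svn.automattic.com/wordpress/trunk/')), "\n";
print cdata("-if (a[b[0]]>1)\n+if (a[b[0]]>2)\n"), "\n";
```

Keep in mind that fetching one diff per item adds one more request to the Subversion server for every revision in the feed.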

Turning it into a CGI

To turn the script into a CGI script, we need a #! line so the web server can execute it, and the script must print an HTTP header before any content. In the complete program listing below, the first line is use Date::Format;. Before it, add the following two lines:


  #!/usr/bin/perl   
  print "Content-Type: text/xml\r\n\r\n";

With these lines in place, the script can be dropped in as a .cgi file (remember to make it executable). However, every request runs svn log against the repository, so heavy traffic on the feed becomes load on your Subversion server. In that case, it may be better to run the script from cron to regenerate a static file periodically than to generate the content on demand.
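A middle ground is to let the CGI regenerate the feed only when a cached copy is older than some limit. The following is a minimal sketch. It assumes the feed-building code has been wrapped in a sub that returns the XML as a string. The names cached_feed and build_feed, the cache path, and the 15-minute limit are illustrative choices, not part of the original script.

```perl
use strict;
use warnings;

# Return the cached feed in $file if it is younger than $max_age seconds;
# otherwise call $generate->() for fresh XML, store it, and return it.
sub cached_feed {
    my ($file, $max_age, $generate) = @_;
    my @st = stat $file;                   # empty list if the file doesn't exist
    if (@st && time() - $st[9] < $max_age) {
        open(my $in, '<', $file) or die "cannot read $file: $!\n";
        local $/;                          # slurp mode
        return scalar <$in>;
    }
    my $xml = $generate->();
    my $tmp = "$file.$$";                  # write to a temp name, then rename into place
    open(my $out, '>', $tmp) or die "cannot write $tmp: $!\n";
    print {$out} $xml;
    close($out) or die "cannot close $tmp: $!\n";
    rename($tmp, $file) or die "cannot rename $tmp: $!\n";
    return $xml;
}

# Example (hypothetical): build_feed() would return the RSS document as a string.
#   print "Content-Type: text/xml\r\n\r\n";
#   print cached_feed('/tmp/svn-feed.xml', 15 * 60, \&build_feed);
```

Like the cron approach, this bounds how often svn log runs, but it still regenerates on demand when the cache goes stale.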

The Complete Listing

Example svnrss shows the complete code listing.