Receive RSS Updates Via IRC

Almost everywhere a hacker goes, she'll have trusty access to a shell. For that reason, IRC is a popular communication method among these users, in large part because IRC clients exist for nearly every platform, which is not true of most other communication protocols. As a result, IRC has become a haven for small tools that integrate into daily life: reminder bots, message-passing bots, weather lookups, and more. For many users, IRC becomes an extension of the shell, a tool for storing and passing along information.

With this in mind, it is easy to see why it might be beneficial to have RSS updates dropped into an IRC channel. Because many users treat IRC as their always-on communication method, critical RSS updates passed into IRC fit right into their workflow. In this hack, we'll show how to use Mark Pilgrim's Universal Feed Parser package, along with a simple IRC framework, ircbot.py, from Sean B. Palmer, to create an IRC bot that delivers the latest news right to any channel the bot sits in.

Getting the Code

Mark Pilgrim's Universal Feed Parser is available from http://feedparser.org/. This library is a well-maintained, well-tested Python module for parsing all the types of feeds in the wild today. The module is entirely contained in a single file, and the latest version can be downloaded from the website. Once you've downloaded the Feed Parser code, create a directory to contain your bot, and place the feedparser.py file in this directory.

Next, we'll download Sean B. Palmer's ircbot.py framework and place it in the same directory. Again, this is a simple, one-file Python program that acts as an IRC bot, available from http://inamidst.com/phenny/ircbot.py. This Python-based bot framework is easily extended in a number of different ways, making it an ideal starting point for many simple projects. Once you've done this, we can get to work building the code that will display our updates for us.

Learning ircbot.py

Once we have both these files in place, we will learn a little bit about how the IRC framework we're using works. Most IRC bots respond to specific ``trigger'' words or phrases from users. ircbot.py is built around the idea of triggers as starting points for functions: at the end of the ircbot.py code we see function definitions, and rule bindings which match these function definitions to certain regular expressions. We will use a binding like this to start our bot fetching RSS, after which it will continue automatically until the bot dies.
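
To make the idea of rule bindings concrete, here is a minimal sketch of a trigger dispatcher. It is not ircbot.py's own code: the TinyDispatcher class, its rule and dispatch methods, and the greet handler are all hypothetical names invented for illustration, and the real framework's rule call takes different arguments.

```python
import re


class TinyDispatcher(object):
    """Hypothetical stand-in for a trigger-based bot; not ircbot.py itself."""

    def __init__(self):
        self.rules = []  # list of (compiled pattern, handler) pairs

    def rule(self, pattern, handler):
        # Bind a regular expression to a handler function.
        self.rules.append((re.compile(pattern), handler))

    def dispatch(self, text):
        # Call every handler whose pattern matches the start of the text
        # and collect whatever the handlers return.
        results = []
        for pattern, handler in self.rules:
            match = pattern.match(text)
            if match:
                results.append(handler(match))
        return results


def greet(match):
    # The match object gives the handler access to captured groups.
    return "hello, %s" % match.group(1)


dispatcher = TinyDispatcher()
dispatcher.rule(r'\.hello (\w+)', greet)
replies = dispatcher.dispatch('.hello world')
```

A line of chat that matches no pattern simply produces no replies, which is why a bot built this way stays quiet until someone uses a trigger.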

As the examples show, messages are sent with the bot.msg function call, which takes the sender of the original message as its destination. This automatically determines whether the reply goes to a channel or to a private message, and it includes rate limiting to prevent the bot from being automatically kicked off a server.

Additionally, the bot has a bot.todo method, which can be used to pass direct IRC commands: the bye function in the example bot demonstrates how to use this, by directly sending information for a server command. Note that you will need to consult IRC documentation for information about what these commands require: ircbot.py is a simplistic framework designed around message passing and does not typically offer support for all the intricacies of the IRC protocol.
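
As a rough guide to what such server commands look like on the wire, the sketch below builds raw IRC lines in the general shape the protocol uses: a command, optional parameters, and an optional trailing argument introduced by a colon. The irc_line helper is a hypothetical name, and this sketch says nothing about which arguments bot.todo itself expects; check the ircbot.py source for that.

```python
def irc_line(command, params=(), trailing=None):
    """Build one raw IRC protocol line (hypothetical helper for illustration)."""
    parts = [command] + list(params)
    if trailing is not None:
        # The final argument may contain spaces, so it is prefixed with ':'.
        parts.append(':' + trailing)
    # Every IRC message ends with a carriage return and a line feed.
    return ' '.join(parts) + '\r\n'


quit_line = irc_line('QUIT', trailing='Goodbye')
join_line = irc_line('JOIN', ['#example'])
msg_line = irc_line('PRIVMSG', ['#example'], 'New entry posted')
```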

Writing the RSS Code

Now that we have a vague idea about how the IRC framework we're using works, we can write the code which will actually scan the RSS and pass the messages along. The first step is to initialize a set of variables we're going to be using repeatedly, which we don't want to go out of scope when the loop ends:

    def run_rss(bot, origin):
      import feedparser  # our RSS module
      import time        # for time.sleep
      etag = ""          # ETag value for conditional GET
      lm = ""            # Last-Modified value for conditional GET
      seen = []          # List for URLs already seen
      url = "http://crschmidt.net/blog/feed/rdf"  # URI for our feed

Now that we have our variables defined, we can build the infinite loop that will be controlling our output. We first initialize a count variable, so that we don't send several dozen entries to the IRC server all at once when we first get them. (This would cause problems with flooding.) We then parse the feed, using feedparser, and store our conditional get information into the variables we previously defined. Note that we are sending in the variables on our first fetch of the feed as well: however, since they are simply empty strings, they will not prevent us from fetching the data.

      while True:
        count = 0
        feed = feedparser.parse(url, etag=etag, modified=lm)
        # Remember the validators so the next request can be conditional
        if 'etag' in feed:
          etag = feed.etag
        if 'modified' in feed:
          lm = feed.modified
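
The conditional get bookkeeping can be sketched in isolation. In the example below, plain dictionaries stand in for the object that feedparser returns (an assumption made so the sketch runs without network access), and update_validators is a hypothetical helper name. The point is that a value is only replaced when the response actually carries a new one.

```python
def update_validators(feed, etag, modified):
    """Return the (etag, modified) pair to send with the next request."""
    # Keep the previous value when the response did not include a new one.
    return feed.get('etag', etag), feed.get('modified', modified)


# First fetch: we send empty strings and the server supplies both values.
first = {'etag': '"abc123"',
         'modified': 'Sat, 01 Jan 2005 00:00:00 GMT',
         'entries': []}
etag, lm = update_validators(first, "", "")

# Later fetch: the response carries no validators, so the old ones survive.
later = {'entries': []}
etag, lm = update_validators(later, etag, lm)
```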

Our feed data is now available in feed, as we demonstrated by setting the modified and etag variables from this data. The data contains a list of entries in its entries attribute, which we will iterate over:

        for entry in feed.entries:
          if entry.link not in seen:  # Have I seen you before?
            if count < 5:
              bot.msg(origin.sender,
                      u"%s %s" % (entry.link, entry.title))
              count += 1
            elif count == 5:
              # Announce the cap once, and only if more entries remain
              bot.msg(origin.sender,
                      "Maximum count reached. More entries available.")
              count += 1
            seen.append(entry.link)
        time.sleep(1800)

Nothing complex: each entry's link and title are sent to the original sender, which is either the channel where the command was issued or, if the command arrived in a private message, the user who sent it. At the end of the loop, a sleep call pauses the loop for 1800 seconds, or 30 minutes. Depending on the server you are talking to, you may wish to lengthen this interval, or shorten it if the feed is updated more frequently. Note that many servers take precautions against clients that refresh feeds too quickly.
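
The selection logic can also be tried out away from IRC. The following is a sketch under stated assumptions rather than the code the bot runs: pick_new_entries is a hypothetical helper, entries are represented as (link, title) pairs instead of feedparser objects, and seen is a set rather than a list. In this sketch the notice is only added when more unseen entries remain than the cap allows.

```python
def pick_new_entries(entries, seen, limit=5):
    """Return the messages to send for unseen entries, capped at `limit`.

    `entries` is a list of (link, title) pairs and `seen` is a set of links;
    both shapes are assumptions made for this sketch.
    """
    unseen = [(link, title) for link, title in entries if link not in seen]
    messages = [u"%s %s" % (link, title) for link, title in unseen[:limit]]
    if len(unseen) > limit:
        messages.append(u"Maximum count reached. More entries available.")
    # Mark everything as seen, including entries that were not announced,
    # which matches the behavior of the loop above.
    seen.update(link for link, _ in unseen)
    return messages


entries = [('http://example.org/%d' % i, 'Post %d' % i) for i in range(7)]
seen = set()
first_pass = pick_new_entries(entries, seen)
second_pass = pick_new_entries(entries, seen)
```

Here the first pass yields five entry messages plus the notice, and the second pass yields nothing because every link has already been recorded.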

Integrating With the Bot

Now that we have a function which can perform our RSS updates, we'll demonstrate how to include this code in the bot. For this example, we'll assume that the function is included as a top-level function in the ircbot.py code. At the end of ircbot.py, there is a bot.run call, which starts the main IRC loop. Directly above that, we will add the definition for our rss function, which will start our RSS loop in a separate thread, so the bot can continue to interact with the IRC server.

    def rss(m, origin, args, text, bot=bot):
        import threading
        # Run the feed loop in the background so the bot stays responsive
        t = threading.Thread(target=run_rss, args=[bot, origin])
        t.setDaemon(True)  # don't keep the process alive for this thread
        t.start()
    bot.rule(rss, 'rss', r'\.start')
    bot.rule(rss, 'rss', r'%s: start' % bot.nick)
    bot.run(host, port)

As you can see here, we've included two rules which control the bot: one fires when the text ``.start'' is issued, and the other when the bot is addressed by name with ``start''. In both cases, the bot will start a background thread, passing the bot instance (for message passing) and the origin, so that the original sender can be determined in the run_rss function.
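
The background-thread pattern used here can be tried on its own. This is a minimal sketch, assuming nothing about ircbot.py: start_background and fake_poll are hypothetical names, and fake_poll stands in for a long-running function such as run_rss.

```python
import threading


def start_background(target, *args):
    """Start target(*args) in a daemon thread and return the thread."""
    worker = threading.Thread(target=target, args=args)
    # A daemon thread does not keep the process alive once the main
    # thread exits, so an endless polling loop will not block shutdown.
    worker.daemon = True
    worker.start()
    return worker


results = []


def fake_poll(label):
    # Stand-in for a real polling loop; it runs once and returns.
    results.append("polled %s" % label)


t = start_background(fake_poll, "feed")
t.join()  # only for this demonstration; the bot never waits on its thread
```

Marking the thread as a daemon is the design choice that matters: the RSS loop never returns, so a non-daemon thread would keep the process running after the bot itself has quit.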

Once we've done this, we can add the run_rss function definition: this can fit immediately below the Bot class. To change the channels that the bot will join by default, simply edit the test function call, a la:

    test('irc.freenode.net', 6667, ['#d8uv.com', '#synhacks'])

That's it. Your bot will now run, jump, and play for you, passing on messages from an RSS feed as it sees fit.

Extending the Hack

One extension to this hack would be to let users add their own RSS feeds, rather than having the feed defined by the code itself. This is not that difficult: because rules in ircbot.py are regular expressions, we can let users include a URL in the message that starts the bot:

    def rss(m, origin, args, text, bot=bot):
        import threading
        url = m.group(1)  # the URL captured by the rule's regular expression
        t = threading.Thread(target=run_rss, args=[bot, origin, url])
        t.setDaemon(True)
        t.start()
    bot.rule(rss, 'rss', r'\.start (.*)$')

This would require an equally modest edit to the run_rss code:

    def run_rss(bot, origin, url):

Once this is complete, simply remove the line which sets the URL in run_rss, and the function will work against the URL the user supplied rather than one hard-coded in the script. Note, however, that malicious users could use this as an attack against a website: because the bot does not check whether an already-running thread is visiting the same RSS URL, repeating the same command would initiate a significant number of hits.
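
One way to blunt that attack is to keep a record of the URLs that already have a polling thread and to refuse duplicates. The sketch below is one possible approach, not part of ircbot.py: the FeedRegistry class, its claim and release methods, and the max_feeds limit are all hypothetical names introduced for illustration.

```python
import threading


class FeedRegistry(object):
    """Track which feed URLs are already being polled (hypothetical helper)."""

    def __init__(self, max_feeds=10):
        self._lock = threading.Lock()  # handlers may run in several threads
        self._active = set()
        self._max_feeds = max_feeds

    def claim(self, url):
        """Return True if the caller may start polling url."""
        with self._lock:
            if url in self._active or len(self._active) >= self._max_feeds:
                return False
            self._active.add(url)
            return True

    def release(self, url):
        """Forget url, for example when its polling thread stops."""
        with self._lock:
            self._active.discard(url)


registry = FeedRegistry(max_feeds=2)
first = registry.claim("http://example.org/feed")
second = registry.claim("http://example.org/feed")
```

In the rss function, a call such as registry.claim(url) could then guard the thread creation, so that repeating the same command does nothing the second time.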