Setting Up Your Own Planet Aggregator

When you're on the road, you often don't have full access to your aggregator, but you still want to keep up with recent entries. A number of services let you manage your subscriptions online, but many people have no interest in using web-based tools for aggregation all the time. A Planet-based aggregator offers a middle ground: you can read your feeds in a web-based aggregator without having to scroll through the hundreds of unread items that build up when you maintain your feeds in a service like Bloglines.

PlanetPlanet is a software package which uses Mark Pilgrim's Universal Feed Parser to retrieve and parse RSS feeds, then creates HTML and RSS output from these feeds. It offers a way to collect a number of similar blogs and put them together -- there are Planet sites which aggregate feeds about PHP, Python, Apache, Gnome, and more. It also offers a way to aggregate your personal feeds so that you can read the Planet page at any point and see the most recent posts among them. It does not offer a number of the advanced features found in more complex web-based aggregators, but for many purposes it fits the bill well.

Prerequisites

In order to set up your own Planet aggregator, you will need a machine which can act as a web server and which has Python 2.3 or greater installed. This is the only prerequisite for setting up your own Planet: all libraries needed to run the Planet are included in the distribution.

Getting the Planet Code

The PlanetPlanet software is maintained in several GNU Arch repositories. However, due to the difficulty of retrieving the source from these repositories, the software has been packaged by a third party for easy installation. Simply download the planet software from http://crschmidt.net/packages/ to the web server. Once you have done this, expand the tarball: this will create a directory called planet-release with the code in it.
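
If you would rather script the expand step than run tar by hand, it can be sketched with Python's standard tarfile module. This is only an illustration: the function name expand_tarball is made up for this example, the archive name is whatever file you actually downloaded, and the snippet is a standalone helper written for a current Python 3 rather than part of the Planet distribution.

```python
import tarfile
from pathlib import Path


def expand_tarball(archive, dest="."):
    """Extract a tar archive into dest and return the top-level names it created.

    expand_tarball is a hypothetical helper for this article, not a Planet API.
    """
    with tarfile.open(archive) as tar:  # compression type is detected automatically
        names = tar.getnames()
        tar.extractall(dest)
    # Keep only the first path component of each member, e.g. "planet-release".
    return sorted({Path(name).parts[0] for name in names if Path(name).parts})


# Example call (the file name here is a placeholder, not the real package name):
# expand_tarball("downloaded-planet-package.tar.gz", "/home/you")
```

The returned list is a quick way to confirm that the archive unpacked into the planet-release directory described above. Only extract archives you trust, since a tar file can contain unexpected paths.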

Setting up Planet

Once you have the code downloaded and extracted, you can start adding the sites you wish to have feed your aggregator. Inside the extracted directory is a ``config'' folder, with a ``config.ini'' file, as well as a number of template files. The config.ini is the important one, and is well documented with comments. You can see a set of example fields in the file for the [Planet] section: these can be replaced with your own contact information. You should set the output_dir variable to be the output directory for HTML and other files generated by the planet software. At the bottom of the file, you provide the URLs of the feeds you wish to aggregate, with a name attached to each. Follow the examples already in the file for this section.
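
To make the shape of the file concrete, here is a small sketch that parses a made-up configuration with Python's standard configparser module and lists what it finds. The [Planet] section and the output_dir variable come from the description above; the other option names (name, owner_name, owner_email), the paths, and the feed URLs are illustrative assumptions, so treat the comments in your own config.ini as the authority. The snippet is a standalone Python 3 helper, not part of Planet.

```python
import configparser

# A made-up configuration in the general shape described above.
# Option names other than output_dir are assumptions for illustration.
SAMPLE = """\
[Planet]
name = My Personal Planet
owner_name = Your Name
owner_email = you@example.org
output_dir = /var/www/planet

[http://example.org/blog/index.rss]
name = Example Blog

[http://example.net/news/atom.xml]
name = Example News
"""


def summarize(text):
    """Return (output_dir, {feed_url: feed_name}) for a Planet-style config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    output_dir = parser.get("Planet", "output_dir")
    # Assumption: every section whose name looks like a URL is a feed.
    feeds = {
        section: parser.get(section, "name")
        for section in parser.sections()
        if section.startswith(("http://", "https://"))
    }
    return output_dir, feeds


output_dir, feeds = summarize(SAMPLE)
print("output goes to:", output_dir)
for url, name in feeds.items():
    print(name, "<-", url)
```

Running something like this against your edited file before the first real run is a cheap way to catch a missing output_dir or a feed section without a name.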

Once you have set all the variables in the configuration file, you can perform a test run of the Planet setup: change to the top level directory of the archive (planet-release) and run the command ``python planet.py config/config.ini''. This starts the Planet software and does a first run over all the feeds in the configuration file. It will display a large amount of debugging information, after which you should be able to visit the location on your web server to which you directed the Planet output.

Running the Planet

The Planet code on its own will not run automatically - there is no daemon which runs and waits for updates. Instead, it is designed to be run via a cron script. Typically, aggregators tend to adopt relatively low refresh rates to prevent extreme load on servers. The typical wait between refreshes is about an hour: we'll use this when setting up cron for our aggregator.

To edit your crontab, type ``crontab -e''. This will open the crontab in your preferred editor. The line you need to enter will start with a number from 0 to 59 of your choosing -- the minute of the hour on which the aggregator will refresh feeds. Because so many aggregators refresh on the hour, it may be beneficial to stagger this time somewhat from the typical hit times at the top or bottom of the hour. I have mine set to run at 23 past the hour. For me, the resulting crontab line is:


   23 * * * * /home/crschmidt/planet-release/planet.py /home/crschmidt/planet-release/config/config.ini

This will run the aggregator every hour, updating the HTML and other templates with the new information as needed. Note that you will need to adjust the path to both your Python script and the config file.
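
If you would like the minute chosen for you, the following sketch builds an hourly crontab line with a random minute that avoids the top (0) and bottom (30) of the hour. The function name cron_line is hypothetical and exists only for this example; the paths passed in are the ones from the line above and need to be replaced with your own. It is a standalone Python 3 helper and does not install anything into your crontab.

```python
import random


def cron_line(planet_script, config_file, minute=None):
    """Build an hourly crontab entry that runs Planet with the given config.

    cron_line is a hypothetical helper for this article, not part of Planet.
    """
    if minute is None:
        # Pick any minute except the busy top (0) and bottom (30) of the hour.
        minute = random.choice([m for m in range(60) if m not in (0, 30)])
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # The four asterisks mean: every hour, every day, every month, every weekday.
    return "%d * * * * %s %s" % (minute, planet_script, config_file)


print(cron_line(
    "/home/crschmidt/planet-release/planet.py",
    "/home/crschmidt/planet-release/config/config.ini",
    minute=23,
))
```

Paste the printed line into the editor opened by ``crontab -e''. Leaving out the minute argument gives a different staggered minute each time the function is called, so generate the line once and keep it.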

If you're maintaining this aggregator for anyone other than yourself, or even if it's only for you, you may wish to adjust the layout of the HTML output. These changes can be achieved by editing the template files alongside the config.ini in the config directory. The index.html.tmpl is the main Planet file: this is the file that controls the HTML behind the Planet. Here, you can edit the layout, add CSS, or perform any other necessary changes. Again, this file is well commented with tips and tricks for editing it, and if you're interested you can also edit the actual layout of the entries or other aspects of the aggregator output. Note that this is not necessary to make things work - but it will help them to look better.