Using Planet Reader to Update Feeds in a Rails Application Part 1

In this series of posts, I'm going to show how I'm customising the excellent "Planet Reader" application to produce a stand-alone program that updates data about a collection of blogs stored in a Ruby on Rails application. The feeds table looks like this:
create_table "feeds" do |b|
b.column "link", :string
b.column "title", :string
b.column "description", :string
b.column "pubDate", :datetime
b.column "language", :string
b.column "error_tag", :integer
b.column "site_url", :string
end

Here's how Planet Reader works now:

  1. get "planet" information and a list of feeds from a config.ini file

  2. create a "planet" object based on the config.ini data

  3. visit each of the feeds using the Universal Feed Parser

  4. if the feed has changed since the last visit, update the information stored in a cache file

  5. "subscribe" the "planet" object to this feed

  6. churn out static html files based on all the data.
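
For reference, a Planet-style config.ini normally has a [Planet] section of defaults followed by one section per feed. The exact keys depend on the version you've downloaded, but it looks something like this (the values are just placeholders):

[Planet]
name = My Planet
link = http://example.com/planet/
cache_directory = cache
output_dir = output

[http://example.com/blog/feed.xml]
name = Example Blog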


For my purposes, the "planet" object is irrelevant, but I'll leave it alone, as it helps the whole thing to hang together! My strategy for customising Planet Reader is as follows:

  1. get database information from database.yml

  2. connect to database and grab a list of all the feeds

  3. export this list to a config.ini file

  4. get "planet" information and a list of feeds from that config.ini file

  5. create a "planet" object based on the config.ini data defaults

  6. visit each of the feeds using the Universal Feed Parser

  7. if the feed has changed since the last visit, update the information stored in a cache file and also update the database

  8. "subscribe" the "planet" object to this feed

  9. churn out static html files based on all the data.


Simple eh? Unfortunately I've never done any serious Python programming, so I'm learning as I go! To begin with, I'll be editing the "planet.py" file, which is the file that gets executed to run Planet Reader. You can download the files here if you want to play along, and you will, of course, need to have Python installed! I've been using the Eric IDE, which is a simple "apt-get install" on Debian.

Getting database information from database.yml

A bit of googling turned up PyYAML, which does the job. Once I've installed that, I just need to add:
import yaml
environment = "development"
dbconfig = open("../config/database.yml", 'r')  # we are in the lib directory
config_data = yaml.load(dbconfig.read())
dbconfig.close()
host = config_data[environment]['host']
username = config_data[environment]['username']
passwd = config_data[environment]['password'] or ""
database = config_data[environment]['database']
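
For context, the development section of a typical Rails database.yml (the file being parsed above) looks something like this; the adapter, host and database name here are just placeholders:

development:
  adapter: mysql
  database: blogs_development
  username: root
  password:
  host: localhost

One small gotcha: a blank password comes back from PyYAML as None, which is exactly what the or "" on the passwd line is guarding against.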

Getting a list of feeds from the database

Now I just need to connect to the database and pull out a list of feeds. First, install the MySQL-python library, which provides the MySQLdb module. Planet Reader identifies feeds by their URLs, but of course my database has an id field for each feed, so we'll pull that out too:
import MySQLdb

Con = MySQLdb.connect(host, username, passwd, database)
Cur = Con.cursor()
Cur.execute("select id, link from feeds order by id")
Results = Cur.fetchall()
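
Each row that comes back is an (id, link) pair, which is why the code in the next section indexes f[0] and f[1]. A quick sanity check, just to see what we've got (Python 2 print syntax):

for feed_id, link in Results:
    print feed_id, link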

Putting the feed data into a config file

This turns out to be pretty easy. The ConfigParser library has already been imported, and it does writing as well as reading. The variable config_file has already been given its default value of "config.ini", so:
config = ConfigParser()
for f in Results:
    config.add_section(f[1])           # one section per feed, named by its link
    config.set(f[1], "id", str(f[0]))  # stash the database id for later
output = open(config_file, 'w')
config.write(output)
output.close()

does the job!
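
The resulting config.ini just contains one section per feed, keyed on the feed's link, with the database id tucked inside. With a couple of made-up feeds it would look something like this:

[http://example.com/blog/feed.xml]
id = 1

[http://another.example.org/rss]
id = 2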

I wrap all this stuff into an "update_config_from_database" method, comment out the bit of code that insists on some "planet" information being present in "config.ini", and I end up with Planet.py
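
For reference, here's a rough sketch of how that method might hang together once the pieces above are combined (everything except the name update_config_from_database is my own arrangement, and error handling is left out):

import yaml
import MySQLdb
from ConfigParser import ConfigParser

def update_config_from_database(config_file, environment="development"):
    # read the Rails database settings from database.yml
    dbconfig = open("../config/database.yml", 'r')
    config_data = yaml.load(dbconfig.read())
    dbconfig.close()
    settings = config_data[environment]

    # pull the feed list out of the feeds table
    con = MySQLdb.connect(settings['host'], settings['username'],
                          settings['password'] or "", settings['database'])
    cur = con.cursor()
    cur.execute("select id, link from feeds order by id")
    results = cur.fetchall()
    con.close()

    # write one config.ini section per feed, keyed on the feed's link
    config = ConfigParser()
    for feed_id, link in results:
        config.add_section(link)
        config.set(link, "id", str(feed_id))
    output = open(config_file, 'w')
    config.write(output)
    output.close()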

Running this will produce a set of cache files and static html pages based on all the feeds in my database.

This is pretty cool, but I haven't written anything back to the database yet. That's what "Part 2" will be about!
