Rss2File

rss2File does exactly what it says on the tin, and a bit more besides. It’s a program I wrote to grab blogs and save the articles as text files locally. It can also download podcasts, although not in the same run as blogs, because you have to specify one mode or the other in the config file.

Download the stable version, or grab the latest version from my Subversion repo at http://svn.trollgod.dyndns.org/Rss2File/trunk/rss2file.py

Basically, I wanted a way to merge the dozens of blogs I read into a few category-based feeds. My solution was to pull everything onto my own server and then use blosxom to generate new feeds from the files. I originally used a program called blagg to pull the data down, but I found it somewhat lacking, and because it’s written in Perl (horrid language) I was unable to fix it. So I reimplemented it in Python, adding a few features along the way.
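For context, blosxom treats each .txt file as one post: the first line is the title, the rest is the body, and the file's modification time is the post date. A minimal sketch of writing a parsed feed entry in that shape follows. It assumes a dict-like entry such as those Universal Feed Parser returns (title, summary, link, updated_parsed); write_blosxom_entry and safe_filename are hypothetical names, and the file-naming scheme and appending the link to the body are illustrative choices, not necessarily what rss2file.py does.

```python
import calendar
import os
import re


def safe_filename(title, max_len=60):
    """Reduce a title to letters, digits and underscores (illustrative scheme)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return (slug or "untitled")[:max_len]


def write_blosxom_entry(directory, entry):
    """Save one feed entry as a blosxom-style .txt file and return its path.

    The first line holds the title and the following lines the body; the
    file's modification time is set to the entry's timestamp when available.
    """
    os.makedirs(directory, exist_ok=True)
    # Collapse whitespace so the title stays on the first line only.
    title = " ".join(entry.get("title", "Untitled").split())
    body = entry.get("summary", "")
    if entry.get("link"):
        # Assumption: keep a pointer back to the original post.
        body += "\n\n" + entry["link"]
    path = os.path.join(directory, safe_filename(title) + ".txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(title + "\n" + body + "\n")
    # The *_parsed fields are UTC struct_time values; timegm turns them into
    # epoch seconds, which blosxom then reads back as the post date.
    stamp = entry.get("updated_parsed") or entry.get("published_parsed")
    if stamp:
        mtime = calendar.timegm(stamp)
        os.utime(path, (mtime, mtime))
    return path
```

With this naming scheme two entries with the same title would overwrite each other, so something more unique (a hash of the entry link, say) may be preferable in practice.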

I have now added podcast reception functionality to the program too, so it needs to be told which mode to operate in: blog or podcast. I plan to add torrent support as well, but that is a fairly difficult job and so will take some time. Interrupted downloads are now resumed: if a download is cut short for some reason (the net connection dies or whatever), the next time the program runs it should finish off the file rather than skipping it as it did previously.
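Resuming a download usually relies on HTTP Range requests: ask the server only for the bytes after those already on disk, and append them if it answers 206 Partial Content. The sketch below illustrates that idea with the standard library; it is not taken from rss2file.py, build_resume_request and resume_download are hypothetical names, and the opener parameter exists only so the function can be exercised without a network.

```python
import os
import shutil
import urllib.error
import urllib.request


def build_resume_request(url, dest):
    """Return (request, offset): a GET that asks only for bytes not yet on disk."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", "bytes=%d-" % offset)
    return request, offset


def resume_download(url, dest, opener=urllib.request.urlopen):
    """Fetch url into dest, continuing a partial file when the server allows it."""
    request, offset = build_resume_request(url, dest)
    try:
        response = opener(request)
    except urllib.error.HTTPError as err:
        if offset and err.code == 416:
            # 416 Range Not Satisfiable usually means the file is already complete.
            return offset
        raise
    with response:
        # 206 Partial Content means the Range header was honoured; any other
        # success status (normally 200) carries the whole file, so start over.
        status = getattr(response, "status", 200)
        mode = "ab" if offset and status == 206 else "wb"
        with open(dest, mode) as out:
            shutil.copyfileobj(response, out)
    return os.path.getsize(dest)
```

Falling back to a full rewrite on a plain 200 matters because not every server supports ranges; appending a complete file to a partial one would corrupt it.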

I used Universal Feed Parser by Mark Pilgrim as the basis rather than rolling my own XML code, so you need to have it installed to run this.

The program needs to be given the path to a config file as an argument, and optionally a datadir, statusfile and logfile (these will be inferred from the location of the config file if not set).
e.g. "rss2file.py -c Music/Casts/cast2file.cfg -l log/cast2file.log"
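A sketch of how such a command line could be parsed with argparse. Only -c and -l appear in the example above; the --datadir and --statusfile spellings are hypothetical, and so is the defaulting rule used here (data directory next to the config file, status and log files sharing the config file's base name with .status and .log suffixes).

```python
import argparse
import os


def parse_args(argv=None):
    """Parse rss2file-style options, filling unset paths from the config path."""
    parser = argparse.ArgumentParser(prog="rss2file.py")
    parser.add_argument("-c", dest="config", required=True, help="config file")
    parser.add_argument("-l", dest="logfile", help="log file")
    # Hypothetical flag spellings: the docs name these options but not their flags.
    parser.add_argument("--datadir", help="where entries or podcasts are saved")
    parser.add_argument("--statusfile", help="status file")
    args = parser.parse_args(argv)

    # Assumed defaults, derived from where the config file lives.
    base = os.path.splitext(args.config)[0]
    args.datadir = args.datadir or os.path.dirname(args.config) or "."
    args.statusfile = args.statusfile or base + ".status"
    args.logfile = args.logfile or base + ".log"
    return args
```

With the example command line, the explicit log path is kept and the data directory falls back to Music/Casts.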

The config file’s first line must be “Blog” or “Podcast” (without the quotes), followed by one feed per line in the format “Nickname URL Category”, where Category is an optional subdirectory to use. In blog mode, the feed entries are saved in a directory called Nickname (created if needed). In podcast mode, the mp3s (or whatever) are downloaded instead of the entries and placed in the Nickname directory.
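Parsing that format takes only a few lines. This sketch assumes whitespace-separated fields, that blank lines can be ignored, and that the category directory sits above the nickname directory (datadir/Category/Nickname); read_config, feed_dir and Feed are hypothetical names rather than the program's own.

```python
import os
from collections import namedtuple

Feed = namedtuple("Feed", "nickname url category")


def read_config(lines):
    """Parse a config: first line is the mode, then 'Nickname URL [Category]'."""
    lines = [line.strip() for line in lines]
    lines = [line for line in lines if line]  # assumption: blank lines are skipped
    if not lines:
        raise ValueError("empty config file")
    mode = lines[0]
    if mode not in ("Blog", "Podcast"):
        raise ValueError("first line must be 'Blog' or 'Podcast', got %r" % mode)
    feeds = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError("bad feed line: %r" % line)
        category = fields[2] if len(fields) == 3 else None
        feeds.append(Feed(fields[0], fields[1], category))
    return mode, feeds


def feed_dir(datadir, feed):
    """Directory for a feed's files: datadir[/Category]/Nickname (assumed layout)."""
    parts = [datadir] + ([feed.category] if feed.category else []) + [feed.nickname]
    return os.path.join(*parts)
```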

As an example, here is the config file I use for blogs:

Blog
Aquarius http://www.kryogenix.org/days/index.rss Personal
Senji http://www.livejournal.com/users/senji/data/rss Personal
Kamion http://www.livejournal.com/users/cjwatson/data/rss Personal
Ant http://www.livejournal.com/users/sheffers/data/rss Personal
Aquarion http://www.aquarionics.com/meta/journal.rss2 Personal
Khendon http://www.amphigory.org.uk/?flav=rss Personal
AdamSweet http://blog.drinky.org.uk/wp-rss2.php Personal
Fuzzix http://blogs.linux.ie/fuzzbucket/feed/ Personal
Davee http://www.sungate.co.uk/b2rss2.php Personal
JediMoose http://www.jedimoose.org/index.php/feed/ Personal
JonoBacon http://www.jonobacon.org/rss/jonobacondotorg-blog.xml Personal
Joel http://www.joelonsoftware.com/rss.xml Gurus
Sutter http://pluralsight.com/blogs/hsutter/rss.aspx Gurus
Cvsgui http://sourceforge.net/export/rss2_projfiles.php?group_id=10072 Software
CodeProject http://www.codeproject.com/webservices/articlerss.aspx?cat=2 Articles
TomsHardware http://www.tomshardware.com/articles.xml Articles
Slashdot http://slashdot.org/rss/index.rss News
Zotnix http://zotnix.com/wp-rss2.php Personal
SamRuby http://www.intertwingly.net/blog/index.rss2 Gurus
Schwuk http://schwuk.com/?atom=1 Personal
CodingSlave http://codingslave.blogspot.com/atom.xml Gurus
Xalior http://rimron.co.uk/weblog/wp-rss2.php Personal
BBCNewsTech http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/technology/rss.xml News
LazyWeb http://www.lazyweb.org/index.rdf Articles
Foxylicious http://dietrich.ganx4.com/index.php?rss=true Software
PerfDave http://www.livejournal.com/users/diffrentcolours/data/rss Personal
Omahn http://omahns-home.bishopb-college.ac.uk/wordpress/wp-rss2.php Personal
LUGRadioLive http://www.lugradio.org/live/blog/wp-rss2.php Personal
Ade http://www.adrianbradshaw.co.uk/wp-rss2.php Personal
Sparkes http://sp.arkes.co.uk/feed/ Personal

and for podcasts:

Podcast
DailySourceCode http://radio.weblogs.com/0001014/categories/dailySourceCode/rss.xml
LinuxQuestions http://radio.linuxquestions.org/syndicate/lq.php
EvilGenius http://www.evilgeniuschronicles.org/audio/directmp3.xml
Formosa http://ruk.ca/rss/formosa.xml
EscapeRadio http://escpodcast.com/subscribe
ITConversations http://www.itconversations.com/rss/recentWithEnclosures.php
SpaceMusic http://spacemusic.libsyn.com/rss
ThisWeekInTech http://feeds.feedburner.com/twit
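In podcast mode the interesting part of each entry is its enclosure rather than its text. Universal Feed Parser exposes enclosures as a list of dicts on each entry, with the media URL under "href". A minimal sketch, assuming that structure, that maps each enclosure to a file in the feed’s Nickname directory (enclosure_targets is a hypothetical helper, not part of rss2file.py):

```python
import os
from urllib.parse import unquote, urlparse


def enclosure_targets(entries, directory):
    """Return (url, local_path) pairs for every enclosure in the given entries.

    entries are dict-like feed entries whose "enclosures" lists carry the
    media URL under "href"; entries without enclosures are skipped.
    """
    targets = []
    for entry in entries:
        for enclosure in entry.get("enclosures", []):
            url = enclosure.get("href")
            if not url:
                continue
            # Use the last path component as the file name; basename after
            # unquoting also stops encoded slashes escaping the directory.
            name = os.path.basename(unquote(urlparse(url).path)) or "episode"
            targets.append((url, os.path.join(directory, name)))
    return targets
```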

You can also import and export the list of feeds as OPML using the "--import-opml" or "--export-opml" options. This OPML support was added with the help of some OPML parsing code from Juri Pakaste.
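For readers curious what the OPML side involves, here is an independent sketch using xml.etree.ElementTree; it is not the Juri Pakaste-derived code the program actually uses. It assumes feeds are (nickname, url, category) tuples and that categories become enclosing outline elements, a common but not universal OPML convention. The text, type and xmlUrl attributes are the usual ones in feed subscription lists; export_opml and import_opml are hypothetical names.

```python
import xml.etree.ElementTree as ET


def export_opml(feeds, title="rss2file feeds"):
    """Serialise (nickname, url, category) tuples as an OPML document string."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    folders = {}
    for nickname, url, category in feeds:
        parent = body
        if category:
            # One outline per category, created the first time it is seen.
            if category not in folders:
                folders[category] = ET.SubElement(body, "outline", text=category)
            parent = folders[category]
        ET.SubElement(parent, "outline", text=nickname, type="rss", xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


def import_opml(text):
    """Read feeds back out of OPML, using the enclosing outline as the category."""
    feeds = []

    def walk(node, category):
        for outline in node.findall("outline"):
            url = outline.get("xmlUrl")
            if url:
                name = outline.get("text") or outline.get("title") or url
                # Config lines are whitespace-separated, so strip spaces from names.
                feeds.append(("".join(name.split()), url, category))
            else:
                walk(outline, outline.get("text"))

    body = ET.fromstring(text).find("body")
    if body is not None:
        walk(body, None)
    return feeds
```

Each imported tuple maps directly onto a “Nickname URL Category” config line, with the category omitted when it is None.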