pipermail2rss

Wednesday, July 26th, 2006 08:36
kpreid
I wanted to be able to read a mailing list as a feed, so I wrote this tool yesterday. It converts Pipermail or Hypermail mailing list archives (at least the one example of each I've tried) to RSS 2.0.

(I'd have used Atom, except that Atom requires an <updated> date on every entry, and that information is not easily available in bulk from a Mailman Pipermail archive.)

It uses curl, xsltproc, and TagSoup (run under Java to turn the archives' HTML into well-formed XHTML that xsltproc can read).

darcs repository: http://switchb.org/kpreid/2006/pipermail2rss/

p2r

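#!/bin/sh
# Usage: p2r ARCHIVE-URL   (the URL must end with a slash)

set -e

here=$(dirname "$0")
# TagSoup (kept next to this script) turns the archive's HTML into
# well-formed XHTML that xsltproc can read.
soup="java -jar $here/tagsoup.jar --nodefaults"

# Ask month.xsl for the relative URL of a month's by-date index page.
monthrel=$(curl -s "$1" | $soup | xsltproc "$here/month.xsl" -)

# Fetch that page and turn it into RSS. Message links in it are relative
# to its directory, so pass that directory's URL as the prefix.
curl -s "$1$monthrel" | $soup | xsltproc --stringparam prefix "$1$(dirname "$monthrel")/" "$here/messages.xsl" -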
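p2r builds two URLs from its argument by plain string concatenation: the by-date index page it fetches, and the prefix that messages.xsl puts in front of each message link. A pure-shell sketch of that arithmetic, assuming the argument ends in a slash; the function names and the list address are hypothetical, made up for illustration:

```shell
#!/bin/sh
# Sketch of p2r's URL arithmetic (hypothetical function names).
# $1 = archive URL ending in "/", $2 = relative link printed by month.xsl

month_index_url() {   # the by-date page that gets fetched
    printf '%s%s\n' "$1" "$2"
}

message_prefix() {    # what message hrefs are resolved against
    printf '%s%s/\n' "$1" "$(dirname "$2")"
}

archive="http://lists.example.org/pipermail/foo/"
month_index_url "$archive" "2006-July/date.html"
# → http://lists.example.org/pipermail/foo/2006-July/date.html
message_prefix "$archive" "2006-July/date.html"
# → http://lists.example.org/pipermail/foo/2006-July/
message_prefix "$archive" "date.html"
# → http://lists.example.org/pipermail/foo/./
```

If month.xsl returns a bare file name, dirname yields "." and the prefix ends in "/./", which still points at the same directory.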

month.xsl

<?xml version="1.0" standalone="yes"?>
<t:stylesheet
  xmlns:t="http://www.w3.org/1999/XSL/Transform"
  xmlns:h="http://www.w3.org/1999/xhtml"
  xmlns=""
  version="1.0"
>
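  <t:output method="text"/>
  <!-- Print the relative URL of a month's by-date index: a Pipermail
       "[ Date ]" link, or a "By Date" link in the last row of a table
       (Hypermail). value-of prints only the first match in document order. -->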
  <t:template match="/">
    <month>
      <t:value-of select="//h:td/h:a[text()='[ Date ]']/@href
                          | //h:tr[position()=last()]/h:td/h:a[text()='By Date']/@href"/>
    </month>
  </t:template>
</t:stylesheet>

messages.xsl

<?xml version="1.0" standalone="yes"?>
<t:stylesheet
  xmlns:t="http://www.w3.org/1999/XSL/Transform"
  xmlns:h="http://www.w3.org/1999/xhtml"
  xmlns=""
  version="1.0"
>
  <!-- Base URL that message hrefs are relative to; p2r sets it. -->
  <t:param name="prefix"/>

  <t:template match="/">
    <rss version="2.0"><channel>

    <title><t:value-of select="/h:html/h:head/h:title"/></title>

    <!-- One item per list entry containing a named anchor, in reverse
         document order so the last-listed message comes first. -->
    <t:for-each select="//h:li/h:a/@name/../..">
      <t:sort order="descending" data-type="number" select="position()"/>
      <item>
        <!-- Date text: drop one wrapping character at each end. -->
        <t:for-each select="h:em">
          <pubDate><t:value-of select="substring(., 2, string-length(.) - 2)"/></pubDate>
        </t:for-each>

        <title><t:value-of select="h:a"/> - <t:value-of select="h:i|h:a/h:em"/></title>
        <link><t:value-of select="$prefix"/><t:value-of select="h:a/@href"/></link>
      </item>
    </t:for-each>

    </channel></rss>
  </t:template>
</t:stylesheet>
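The pubDate expression substring(., 2, string-length(.) - 2) keeps characters 2 through n-1 of the <em> text, i.e. it drops one character from each end, presumably the parentheses around the date. The same trimming in POSIX shell, as an illustration only; the function name and the sample date string are hypothetical:

```shell
#!/bin/sh
# Illustration: mimic XPath substring(s, 2, string-length(s) - 2),
# which drops the first and last character of s.
strip_ends() {
    s=${1#?}                 # drop the first character
    printf '%s\n' "${s%?}"   # drop the last character
}

strip_ends "(Wed Jul 26 2006 - 08:36:00 EDT)"
# → Wed Jul 26 2006 - 08:36:00 EDT
```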

CGI script to serve the RSS:

#!/bin/sh

PATH=/bin:/usr/bin:/usr/local/bin

echo Status: 200 OK
echo Content-Type: application/rss+xml
echo

/path/p2r <address of mailing list archive, with trailing slash>
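Because p2r joins URLs by concatenation, an archive address without its trailing slash yields broken links. A wrapper could normalize the address before calling p2r; a minimal sketch with a hypothetical helper name and list address:

```shell
#!/bin/sh
# Hypothetical helper: append "/" to a URL unless it already ends in one.
with_slash() {
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

with_slash "http://lists.example.org/pipermail/foo"
# → http://lists.example.org/pipermail/foo/
with_slash "http://lists.example.org/pipermail/foo/"
# → http://lists.example.org/pipermail/foo/
```

In the CGI script this would look like /path/p2r "$(with_slash "$ARCHIVE")", where ARCHIVE is a hypothetical variable holding the list address.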