pipermail2rss

Wednesday, July 26th, 2006 08:36
[personal profile] kpreid
I wanted to be able to read a mailing list as a feed, so I wrote this tool yesterday. It converts Pipermail or Hypermail mailing list archives (at least the one example of each I've tried) to RSS 2.0.

(I'd have used Atom except that <updated> is required, and that information is not easily available in bulk in a Mailman Pipermail archive.)

It uses xsltproc and TagSoup.

darcs repository: http://switchb.org/kpreid/2006/pipermail2rss/

p2r

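#!/bin/sh
# Usage: p2r <address of mailing list archive>

set -e

here=$(dirname "$0")
soup="java -jar $here/tagsoup.jar --nodefaults"

# Make sure the archive address ends in exactly one slash.
base="${1%/}/"

# Pull the relative link to a by-date index out of the archive's top page.
monthrel=$(curl -s "$base" | $soup | xsltproc "$here/month.xsl" -)

# Convert that index to RSS; prefix turns the relative message links into absolute ones.
curl -s "$base$monthrel" | $soup | xsltproc --stringparam prefix "$base$(dirname "$monthrel")/" "$here/messages.xsl" -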
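
The prefix argument is the least obvious part of the script, so here is a small sketch of how it gets put together. The archive address, the by-date link and the message file name are all made up for illustration; only the dirname trick comes from p2r itself.

```shell
#!/bin/sh
# Hypothetical values; none of these come from a real archive.
base="http://lists.example.org/pipermail/foo/"   # archive address, with trailing slash
monthrel="2006-July/date.html"                   # the kind of link month.xsl might print

# Directory part of the relative link, plus a trailing slash.
prefix="$base$(dirname "$monthrel")/"

# Message links in the index are relative to that directory.
href="000123.html"                               # hypothetical message page
link="$prefix$href"

echo "$prefix"
echo "$link"
```

If the by-date link has no directory part, dirname gives "." and the prefix ends in "./", which should still resolve to the same directory.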

month.xsl

<?xml version="1.0" standalone="yes"?>
<t:stylesheet
  xmlns:t="http://www.w3.org/1999/XSL/Transform"
  xmlns:h="http://www.w3.org/1999/xhtml"
  xmlns=""
  version="1.0"
>
  <t:output method="text"/>
  <t:template match="/">
    <month>
      <t:value-of select="//h:td/h:a[text()='[ Date ]']/@href
                          | //h:tr[position()=last()]/h:td/h:a[text()='By Date']/@href"/>
    </month>
  </t:template>
</t:stylesheet>

messages.xsl

<?xml version="1.0" standalone="yes"?>
<t:stylesheet
  xmlns:t="http://www.w3.org/1999/XSL/Transform"
  xmlns:h="http://www.w3.org/1999/xhtml"
  xmlns=""
  version="1.0"
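>
  <!-- Prefix that makes each message link absolute; p2r passes it in as a string parameter. -->
  <t:param name="prefix"/>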
  <t:template match="/">
    <rss version="2.0"><channel>
    
    <title><t:value-of select="/h:html/h:head/h:title"/></title>
    
    <t:for-each select="//h:li/h:a/@name/../..">
    <t:sort order="descending" data-type="number" select="position()"/>
    <item>
    <t:for-each select="h:em">
      <pubDate><t:value-of select="substring(., 2, string-length(.) - 2)"/></pubDate>
      </t:for-each>
    
      <title><t:value-of select="h:a"/> - <t:value-of select="h:i|h:a/h:em"/></title>
      <link><t:value-of select="$prefix"/><t:value-of select="h:a/@href"/></link>
    </item>
    </t:for-each>
    
    </channel></rss>
  </t:template>
</t:stylesheet>
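
The pubDate expression, substring(., 2, string-length(.) - 2), keeps everything except the first and last character of the em text. Here is the same trimming sketched in shell, assuming the index wraps each date in one character on either side. The sample string is invented, not copied from a real archive.

```shell
#!/bin/sh
# Hypothetical em text; real archives may format the date differently.
em='(Wed Jul 26 2006 - 08:36:00 EDT)'

# Drop the first character, then the last one, as the XPath substring() call does.
trimmed=${em#?}
trimmed=${trimmed%?}

printf '%s\n' "$trimmed"
```

An index entry with no em element simply produces an item without a pubDate, since the for-each has nothing to iterate over.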

CGI script to serve the RSS:

#!/bin/sh

PATH=/bin:/usr/bin:/usr/local/bin

echo Status: 200 OK
echo Content-Type: application/rss+xml
echo

/path/p2r <address of mailing list archive, with trailing slash>
