Spambayes
This is a supplement to the article "An introduction to the Spambayes project", published in the March 2003 edition of the Linux Journal (but it's still useful even if you haven't read the article).
As promised, several things have changed since the article went to press:
Web site
The Linux Journal article points to our SourceForge project page at http://sf.net/projects/spambayes - that's the place to go for downloads, to report bugs or request features, and so on. We also now have a homepage at http://spambayes.sf.net that gives some background and documentation for the project, and has pointers to our mailing list and to related resources on the web.
Installer
There's now an installer for the software. It's still a source-code distribution, so you still need to have Python 2.2.2 or later installed. You can download the installer from the SourceForge project page at http://sf.net/projects/spambayes. To install it, unpack the archive into a temporary directory, run cd spambayes-1.0a2 and then python setup.py install. On Windows, this will install the various applications that make up Spambayes into the scripts area of your Python installation. On Unix, they will go into the default location for Python scripts, usually /usr/local/bin.
Don't forget that on Windows, you'll also need the bsddb3 module. There's a binary installer available from http://pybsddb.sf.net.
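If you're not sure whether a machine meets these requirements, the check is easy to script. The Python sketch below is not part of Spambayes; version_ok and missing_modules are hypothetical helper names. It sticks to simple syntax that should also run under old Python 2 releases, and it only looks for bsddb3 on Windows, following the note above.

```python
import sys

MINIMUM_PYTHON = (2, 2, 2)  # minimum version stated above


def version_ok(version_info, minimum=MINIMUM_PYTHON):
    """True if the (major, minor, micro) part of version_info meets minimum."""
    return tuple(version_info[:3]) >= minimum


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not version_ok(sys.version_info):
        sys.stdout.write("Python %d.%d.%d or later is required\n" % MINIMUM_PYTHON)
    # bsddb3 is only called out as a Windows requirement above.
    if sys.platform.startswith("win"):
        for name in missing_modules(["bsddb3"]):
            sys.stdout.write("Missing module: %s\n" % name)
```

Save it as, say, check_prereqs.py (a hypothetical name) and run it with the same interpreter you'll use for setup.py.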
Web-based configuration
For people who want to use Spambayes via the POP3 proxy, there's no longer any need to create and edit a bayescustomize.ini file; you can configure everything through the web:
- Create a directory for Spambayes to store its data, cd to that directory and run python pop3proxy.py -b. Your web browser should appear, showing the Spambayes application home page. If it doesn't, start your browser yourself and point it at the URL printed by pop3proxy.py.
- Click the Configuration page link, and enter the name of your POP3 server and the port number for the proxy to listen on. On Windows it's most convenient to use port 110, since that's the default for POP3. On Unix (or Unix-derived systems like MacOS X) you should use a high port like 1110, because ports below 1024 have traditionally been reserved for root. Click Save to save your configuration.
- Reconfigure your email client to talk to the proxy. If your email client currently talks to pop3.example.com on port 110, and you've configured the proxy to listen on port 1110, you should reconfigure your email client to talk to localhost (or the name of the machine on which you're running the proxy) on port 1110.
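The port advice in the steps above boils down to a small rule. The Python sketch below restates it; suggested_proxy_port and needs_root are hypothetical names, and the "below 1024 needs root" rule is the traditional Unix behaviour, which some newer systems relax. can_bind is an optional check that nothing else is already listening on the port you picked.

```python
import socket
import sys

POP3_DEFAULT_PORT = 110   # standard POP3 port
UNIX_PROXY_PORT = 1110    # the "high port" suggested above


def suggested_proxy_port(platform=sys.platform):
    """Port to enter on the Configuration page, following the advice above."""
    if platform.startswith("win"):
        return POP3_DEFAULT_PORT
    return UNIX_PROXY_PORT


def needs_root(port, platform=sys.platform):
    """Traditionally, on Unix only root may listen on ports below 1024."""
    return not platform.startswith("win") and port < 1024


def can_bind(host, port):
    """Return True if a listening socket could be bound to (host, port) now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        # Covers both "permission denied" and "address already in use".
        return False
    finally:
        sock.close()
```

For example, can_bind("127.0.0.1", suggested_proxy_port()) tells you whether the proxy is likely to start on that port before you enter it in the web form.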
You should now be able to collect your mail through the proxy, and see the X-Spambayes-Classification headers added to the messages. You can now set up filters in your email client to deal with suspected spam however you choose. All your mail will be classified as unsure until you train the software, which you also do through the web, using the Review messages page. If you don't want to wait for messages to arrive for training, you can use the "upload a message or mbox file" form to train via the web interface, either on individual messages or on Unix mbox files.
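Most email clients can filter on a header value directly. If you'd rather sort mail in a script, the logic is just a header lookup. The Python sketch below uses the standard email module. The folder names are hypothetical, and it assumes the header value is one of spam, ham or unsure (only unsure is mentioned above). As a precaution, it keeps only the text before any ";".

```python
from email import message_from_string

HEADER = "X-Spambayes-Classification"

# Hypothetical folder names; map them to whatever your client uses.
FOLDERS = {"spam": "Spam", "unsure": "Unsure"}


def folder_for(raw_message, default="Inbox"):
    """Pick a destination folder from the Spambayes classification header."""
    msg = message_from_string(raw_message)
    value = msg.get(HEADER, "")  # header lookup is case-insensitive
    # Keep only the label itself, in case extra details follow a ';'.
    label = value.split(";")[0].strip().lower()
    # Anything that isn't spam or unsure (including ham or no header) stays put.
    return FOLDERS.get(label, default)
```

Until you've done some training, every message will end up in the Unsure folder under this scheme.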
Privacy for the web interface
If you're worried about other people accessing your Spambayes web interface, you can configure it to accept connections only from the machine it's running on. To do this, add the following to your bayescustomize.ini:
[html_ui]
html_ui_allow_remote_connections: False
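Conceptually, the option is a check on the address of each connecting client. The Python sketch below is not Spambayes's own code. It reads the setting with Python 3's configparser and applies a loopback test with the ipaddress module. remote_allowed and allow_client are hypothetical names. Treating a missing setting as "remote allowed" is an assumption, based on the text presenting the restriction as something you opt into.

```python
import configparser
import ipaddress

INI_TEXT = """
[html_ui]
html_ui_allow_remote_connections: False
"""


def remote_allowed(ini_text):
    """Read the html_ui option; assume remote access is allowed if it's unset."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getboolean("html_ui", "html_ui_allow_remote_connections",
                             fallback=True)


def allow_client(client_address, allow_remote):
    """Accept anyone if remote access is on, otherwise only loopback clients."""
    if allow_remote:
        return True
    return ipaddress.ip_address(client_address).is_loopback
```

With the setting above, a browser on the same machine (127.0.0.1 or ::1) gets through, while one connecting from, say, 192.168.1.20 is refused.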
Integration with Mutt and Gnus
In the contrib directory of the source distribution are muttrc and spambayes.el, which let you train Spambayes from within Mutt and Gnus; see those files for details.
Running multiple proxies on the same port
Some email clients (notably Eudora) don't let you set different ports for different POP3 servers. This is a problem for Spambayes, because the POP3 proxy can only talk to one server per port. The workaround is to assign multiple addresses to your machine and run one proxy per address. Here's an example for MacOS X; it should work similarly on any Unix-based platform. It runs two POP3 proxies, both on port 110 but on different local addresses:
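#!/bin/sh
# Give the loopback interface a second address, 127.0.0.2
sudo ifconfig lo0 inet 127.0.0.2 add
# Port 110 is below 1024, so the proxy is run as root
sudo python pop3proxy.py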
And in bayescustomize.ini:
pop3proxy_servers = pop3.example1.com, pop3.example2.com
pop3proxy_ports = 127.0.0.1:110, 127.0.0.2:110
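The two settings appear to be matched up by position: the first server is proxied on the first address:port, the second server on the second. The Python sketch below illustrates that pairing. parse_proxy_config is a hypothetical helper, not the Spambayes option parser, and the positional pairing is an assumption drawn from the example.

```python
def parse_proxy_config(servers, ports):
    """Pair each POP3 server with the local (address, port) it's proxied on.

    Assumes both settings are comma-separated and matched by position,
    as in the bayescustomize.ini example above.
    """
    server_list = [s.strip() for s in servers.split(",") if s.strip()]
    port_list = [p.strip() for p in ports.split(",") if p.strip()]
    if len(server_list) != len(port_list):
        raise ValueError("need exactly one port entry per server")
    mapping = {}
    for server, entry in zip(server_list, port_list):
        if ":" in entry:
            address, port = entry.rsplit(":", 1)
        else:
            # A bare port gets address "", which socket.bind treats as
            # "all interfaces".
            address, port = "", entry
        mapping[(address, int(port))] = server
    return mapping


proxies = parse_proxy_config("pop3.example1.com, pop3.example2.com",
                             "127.0.0.1:110, 127.0.0.2:110")
# {('127.0.0.1', 110): 'pop3.example1.com',
#  ('127.0.0.2', 110): 'pop3.example2.com'}
```

Each key is a distinct local endpoint, which is why two proxies can share port 110 once the second loopback address exists.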
Using the web training interface with procmail
If you're using hammiefilter to classify mail via procmail, you can still use the web training interface to train Spambayes. Where you have a procmail rule like this:
:0fw
| hammiefilter.py
you can add another rule like this:
:0fw
| proxytee.py
That will upload each received message to the web interface for later training. You need to be running pop3proxy.py for this, but you don't need to have any POP3 servers configured.
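Because the rule uses procmail's f flag, procmail replaces the message with whatever the program writes to standard output. A tee-style filter therefore has to pass the message through unchanged while sending a copy elsewhere. The Python sketch below shows that shape. It is not proxytee.py itself: upload is a hypothetical callable standing in for however the real script talks to pop3proxy.py's web interface, which isn't described here.

```python
import sys


def tee_filter(instream, outstream, upload):
    """Copy a message from instream to outstream unchanged, then upload it.

    `upload` is a hypothetical callable standing in for whatever sends the
    message to the web interface.
    """
    message = instream.read()
    # With procmail's 'f' flag our stdout becomes the message, so write it
    # back untouched before doing anything that might fail.
    outstream.write(message)
    outstream.flush()
    try:
        upload(message)
    except Exception as exc:
        # Never lose mail because the upload failed; just report it.
        sys.stderr.write("upload failed: %s\n" % exc)
    return message


if __name__ == "__main__":
    def report_size(message):
        sys.stderr.write("would upload %d bytes\n" % len(message))

    # Use the binary buffers so the message passes through byte-for-byte.
    tee_filter(sys.stdin.buffer, sys.stdout.buffer, report_size)
```

Writing the message out before attempting the upload means a stopped pop3proxy.py costs you a training sample, not an email.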
Note that hammiefilter.py and pop3proxy.py currently use different defaults for the database location. This is something we'll address in a future release, but for now you need to work around it by configuring pop3proxy.py to use the same database as hammiefilter.py. By default this is ~/.hammiedb: go to the web configuration page, enter ~/.hammiedb as your database filename, and click Save.
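If the two tools still seem to disagree, check that both settings name the same file once ~ is expanded. A relative filename is resolved against each program's working directory, and procmail and the proxy may well run from different directories. The Python sketch below makes that comparison. resolve_db_path and same_database are hypothetical helpers, not part of Spambayes.

```python
import os


def resolve_db_path(path, cwd=None):
    """Expand ~ and make path absolute relative to cwd (default: os.getcwd())."""
    path = os.path.expanduser(path)
    if not os.path.isabs(path):
        path = os.path.join(cwd or os.getcwd(), path)
    return os.path.normpath(path)


def same_database(path_a, cwd_a, path_b, cwd_b):
    """True if two configured database paths refer to the same file."""
    return resolve_db_path(path_a, cwd_a) == resolve_db_path(path_b, cwd_b)
```

For instance, same_database(".hammiedb", "/tmp", ".hammiedb", "/var") is False even though the two settings look identical. Giving both programs the full ~/.hammiedb path avoids that trap.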