1. New Applications
SpamBayes 1.1a4 contains a new application, core_server.py. If you are used to the POP3 proxy, you will find the web interface to this new server very familiar. It differs in two ways, though. First, it implements a simple plugin architecture that should make it easier for developers to add support for new communication protocols. Second, the first protocol it supports is XML-RPC rather than POP3. This should allow web applications such as Trac, MoinMoin and blogging servers to use SpamBayes' spam filtering capability. The first candidate client will be Roundup, which the Python developers are using as their new issue tracker. Please try it out and report any problems to the spambayes-dev@python.org mailing list.
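As a rough illustration of how a web application might talk to an XML-RPC scoring service, here is a minimal sketch using only the Python standard library. It does not use core_server.py: the stub server, the method name "score", its single-argument signature and the 0.9 cutoff are all assumptions made up for this example, so check the XML-RPC plugin source for the methods the real server exposes.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def fake_score(message_text):
    """Hypothetical stand-in for a scoring method; returns a spam probability."""
    return 0.99 if "viagra" in message_text.lower() else 0.01


def start_stub_server():
    """Start a throwaway local XML-RPC server on a free port (not core_server.py)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    # "score" is an assumed method name chosen for this sketch.
    server.register_function(fake_score, "score")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def classify(url, message_text, spam_cutoff=0.9):
    """Ask the XML-RPC service for a score and turn it into a label."""
    with ServerProxy(url) as proxy:
        probability = proxy.score(message_text)
    return "spam" if probability >= spam_cutoff else "ham"


def demo():
    """Run one spam and one ham message through the stub service."""
    server = start_stub_server()
    try:
        host, port = server.server_address
        url = "http://%s:%d/" % (host, port)
        return classify(url, "Cheap VIAGRA now"), classify(url, "Meeting at noon")
    finally:
        server.shutdown()
        server.server_close()
```

A client such as an issue tracker would only need the classify() half; the stub server exists so the sketch can be run without a SpamBayes installation.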
2. Experimental Options
SpamBayes 1.1a3 contains four new options which need testing (please let spambayes-dev@python.org know how successful these are for you):
-
x-short_runs - If true, generate tokens based on max number of short word runs. Short words are anything of length < the skip_max_word_size option. Normally they are skipped, but one common spam technique spells words like 'V m I n A o G p RA' to try and avoid exposing them to content filters.
-
x-lookup_ip - If true, generate IP address tokens from hostnames. This requires PyDNS (http://pydns.sourceforge.net/). This is included in the Windows installer. (x-pick_apart_urls must also be set true for x-lookup_ip to take effect.)
-
x-image_size - If true, generate tokens based on the size of the largest attached image.
-
x-crack_images - A lot of recent spam contains the entire message embedded in one or more attached images. This option, if true, generates tokens based on the text content (hopefully) recovered from any images in each message. The current support is minimal and relies on the installation of ocrad (http://www.gnu.org/software/ocrad/ocrad.html - the latest version as of this writing is 0.16) and the Python Imaging Library (a.k.a. PIL, available at http://www.pythonware.com/products/pil/).
The first two of these can be tested by users of the Outlook plug-in, but at this point the latter two will have no effect for plug-in users unless they check out the code from CVS. Recent changes since the 1.1a3 release fix a number of problems on Windows. If Outlook users wish to try out the x-short_runs and/or x-lookup_ip options, they should open the "bayescustomize.ini" file in their application data directory (the "Advanced" tab of the SpamBayes Manager will find that for you) and add the lines:
    [Tokenizer]
    x-short_runs:True
    x-lookup_ip:True
    x-pick_apart_urls:True
Users of sb_server and/or sb_imapfilter can simply use the "Experimental Options" page via the web interface to enable any of these four options, as well as ones present in earlier SpamBayes versions.
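To make the idea behind x-short_runs more concrete, the following sketch counts the longest run of consecutive short words in a piece of text. It is not the SpamBayes tokenizer; the function name and the short_len parameter (standing in for the skip_max_word_size option) are assumptions made for illustration only.

```python
def longest_short_run(text, short_len=3):
    """Return the length of the longest run of consecutive short words.

    A word counts as "short" when its length is below short_len, which is
    a hypothetical stand-in for the skip_max_word_size option.
    """
    longest = current = 0
    for word in text.split():
        if len(word) < short_len:
            current += 1
            longest = max(longest, current)
        else:
            # A longer word ends the current run.
            current = 0
    return longest


# The obfuscated example from the option description yields a long run,
# while ordinary prose rarely strings many short words together.
spammy = longest_short_run("buy V m I n A o G p RA today")
normal = longest_short_run("an ordinary sentence of text")
```

A tokenizer could then emit a token derived from that maximum, giving the classifier a clue that would otherwise be lost when short words are skipped.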
3. Storage Types
-
New database backends (other than the pickle and bsddb options available in 1.0.x) are now available, including one using ZODB. In addition, Outlook users are now able to select database backends other than bsddb. We would be very interested in hearing from anyone (Outlook or sb_server or sb_imapfilter, or any others) willing to try the ZODB storage type out.
To do this with sb_server or sb_imapfilter, you will need to change the name of your database files on the main Configuration page, then change the type of storage on the Advanced Configuration page. You may get an error at this point - to fix that, stop sb_server/sb_imapfilter, manually remove the database files with the new names, and restart sb_server/sb_imapfilter. (This bug will be fixed in a future release).
To do this with Outlook, you need to open (or create) your default_bayes_customize.ini file, and put in it the lines:
    [Storage]
    persistent_use_database:zodb
-
If you want to convert your existing database, you will need the source release, Python, and ZODB, and can use the sb_dbexpimp.py script as follows (replace "default_bayes_database.db" with "Proxy/hammie.db" for non-Outlook users):
    sb_dbexpimp.py -e -d "C:\Documents and Settings\{username}\Application Data\SpamBayes\default_bayes_database.db" -f bayes.csv
    sb_dbexpimp.py -i -f bayes.csv -o Storage:persistent_use_database:zodb -o "Storage:persistent_storage_file:C:\Documents and Settings\{username}\Application Data\SpamBayes\default_bayes_database.fs"
-
(Each command should be entered as one long line, even if it appears wrapped on your screen.) You must use the full path to the files as above; doing the conversion elsewhere and then moving the files will not work. The export step also leaves behind a CSV version of the database (bayes.csv), which you can safely remove once the conversion has succeeded.
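If you would rather build the two conversion commands programmatically, the sketch below assembles them as argument lists with full paths. The function name and its parameters are hypothetical; only the sb_dbexpimp.py flags and option names shown above are taken from these notes.

```python
import os


def conversion_commands(data_dir, db_name="default_bayes_database.db",
                        csv_name="bayes.csv"):
    """Build the export and import argument lists for sb_dbexpimp.py.

    data_dir must be an absolute path, because the conversion has to be
    done with full paths to the database files.
    """
    if not os.path.isabs(data_dir):
        raise ValueError("data_dir must be a full (absolute) path")
    old_db = os.path.join(data_dir, db_name)
    # The ZODB file keeps the same base name with a .fs extension.
    new_db = os.path.splitext(old_db)[0] + ".fs"
    export_cmd = ["sb_dbexpimp.py", "-e", "-d", old_db, "-f", csv_name]
    import_cmd = [
        "sb_dbexpimp.py", "-i", "-f", csv_name,
        "-o", "Storage:persistent_use_database:zodb",
        "-o", "Storage:persistent_storage_file:" + new_db,
    ]
    return export_cmd, import_cmd
```

Each list could be passed to subprocess.run(), prefixed with the Python interpreter and the script's location as appropriate for your installation; passing lists rather than one string avoids quoting problems with the spaces in "Documents and Settings".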
4. Translations
-
If you speak Spanish or French, please try out the translated interface, and let us know if there are any problems. This should automatically switch to the right language if you are using Windows and have the OS language set to French/Spanish; otherwise you can manually change the language in your configuration file - sb_server and sb_imapfilter users can find the option on the Advanced Configuration page, while Outlook users will have to manually edit (or create) their default_bayes_customize.ini file and add in the lines, for example:
    [globals]
    language:fr,es_AR
5. Statistics
-
The statistics have been significantly improved for both Outlook and web interface users. If you could keep an eye on the statistics and let us know if there are any problems, or if you have any ideas about how the statistics could be further improved, that would be great.
6. Outlook
-
We would appreciate reports about the new notifications tab - both whether it works as you expect, and any improvement ideas you might have.
We would appreciate reports about the new ability to move mail classified as ham. In particular, we hope that this will provide a solution for those people that need to have rules run *after* SpamBayes (copying messages to a remote device, for example). You should now be able to have ham moved to a "Ham" folder, and have Outlook process rules on messages arriving in that folder. We would appreciate knowing whether this does help as expected.
7. sb_server
-
You should now be able to use POP over SSL (for half of the connection), removing any requirement to use tools like stunnel. If your mail server supports POP over SSL, it would be great if you could test this out (and you get more secure mail as an added bonus). To do this:
-
In the web Configuration page, set SpamBayes to connect to port 995 of your mail server (e.g. instead of 'mail.example.com' in the "Remote Servers" option, you have 'mail.example.com:995').
-
Manually add these lines to your configuration file:
    [pop3proxy]
    use_ssl:automatic
-
Restart sb_server.
-
If you use the Windows service, it would be great if you could do a fresh install of the service, and let us know if the new procedures (outlined in the "What's New" document) work better for you than the old.
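Before pointing sb_server at port 995 as described above, it may help to confirm that your mail server accepts POP over SSL at all. The sketch below uses the standard poplib module; the function names are made up for this example, and the "host:port" parsing simply mirrors the 'mail.example.com:995' form used in the "Remote Servers" option (it does not handle IPv6 addresses).

```python
import poplib


def split_host_port(server, default_port=110):
    """Split 'host:port' into (host, port); fall back to default_port."""
    host, sep, port = server.rpartition(":")
    if not sep:
        return server, default_port
    return host, int(port)


def supports_pop3_ssl(server, timeout=10):
    """Return True if an SSL-wrapped POP3 connection to server succeeds."""
    host, port = split_host_port(server, default_port=995)
    try:
        conn = poplib.POP3_SSL(host, port, timeout=timeout)
    except (OSError, poplib.error_proto):
        # Covers refused connections, timeouts, and SSL handshake failures.
        return False
    conn.quit()
    return True
```

For example, supports_pop3_ssl("mail.example.com:995") attempts a real network connection, so run it against your own server rather than the placeholder name.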
8. sb_imapfilter
-
If you are a Windows user, you can now use the binary to install sb_imapfilter. If you experience any trouble with the binary version, please let us know.
sb_imapfilter should do a much better job of keeping track of messages, and should not need to create nearly as many copies of messages. It should work seamlessly with your old database, and not retrain any already-trained messages, or reclassify any already-classified messages. Please let us know if it does do this, or if you have any other troubles with it.
If your IMAP server supports logging in with AUTH CRAM-MD5, then sb_imapfilter will now attempt to do this. Reports of success or failure of this would be appreciated.
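For anyone debugging a failed login, the sketch below shows how a CRAM-MD5 response is computed in general (per RFC 2195) and how to check whether a server advertises the mechanism. It is not sb_imapfilter's own code, and the function names are made up for this example; the standard imaplib module already provides IMAP4.login_cram_md5() for real logins.

```python
import hashlib
import hmac


def cram_md5_response(username, password, challenge):
    """Compute the CRAM-MD5 reply for a decoded server challenge.

    The reply is the user name, a space, and the hex HMAC-MD5 digest of
    the challenge keyed with the password. On the wire both the challenge
    and the reply are base64-encoded; that step is omitted here.
    """
    digest = hmac.new(password.encode("utf-8"),
                      challenge.encode("utf-8"),
                      hashlib.md5).hexdigest()
    return "%s %s" % (username, digest)


def server_offers_cram_md5(capabilities):
    """Check a capability list (e.g. imaplib's conn.capabilities)."""
    return "AUTH=CRAM-MD5" in [cap.upper() for cap in capabilities]


# Example values taken from RFC 2195.
reply = cram_md5_response(
    "tim", "tanstaaftanstaaf",
    "<1896.697170952@postoffice.reston.mci.net>")
```

The password itself never crosses the network with this mechanism, which is why it is preferable to a plain LOGIN on an unencrypted connection.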