1. The SpamBayes Wiki
The SpamBayes Wiki exists to let the users and developers of SpamBayes cooperate to develop documentation, share tips and recipes, and generally help each other out. The SpamBayes Wiki is maintained by RichieHindle.
1.1. How to edit the Wiki
Anyone can add or edit content here - see HowToContribute for an introduction. Because the Wiki has been repeatedly defaced by spammers, you must create an account and log in before you can edit any pages. You can do this via UserPreferences.
1.2. Where to start: SpamBayes Applications
SpamBayes comes in the form of a core engine and several applications - you should choose the application to use according to the way you receive your email:
-
Outlook users (not including Outlook Express) should start with the OutlookPlugin.
-
Most other users should use either the Pop3Proxy (under its new name sb_server) or the ImapFilter, according to whether you receive email via POP3 or IMAP.
-
Unix heads should use SbFilter, which integrates into procmail, or Pop3Proxy / ImapFilter. Another interesting option is the Python-based procmail replacement PycMail.
1.3. More useful stuff
The SpamBayes Wiki is here to help people to share information about SpamBayes:
-
UserRecipes - recipes and scripts that SpamBayes users have written.
-
WebSiteDevelopment - Help us to develop a better SpamBayes website.
-
TrainingIdeas - ideas for different training strategies and tactics.
-
HowToContribute - would you like to help develop, test or document SpamBayes?
-
MailerTricks - recipes and scripts related to specific mailers.
-
NonEnglish - using SpamBayes with languages other than English.
-
InstallerTips - tips about using the Windows installer.
-
Howtos - tutorials about installing, configuring and using SpamBayes.
1.4. Stuff to try
Just about everyone has their own pet idea about what would make a good improvement to SpamBayes. Some people even have ideas about how the tokenizing (converting a message into bits) or classification could be better. There's a file in CVS called NEWTRICKS.TXT that outlines some of these; if you don't have write access to it, please feel free to suggest ideas here! (For example, we could generate a token for any token not in a dictionary.)
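The dictionary idea in the parenthesis above could be sketched roughly as follows. This is only an illustration, not code from SpamBayes: the function name `tokens_with_unknown_marker`, the `KNOWN_WORDS` set and the `not-in-dictionary` token are all made up for the example, and a real experiment would plug a proper word list into the actual tokenizer.

```python
# Hypothetical stand-in for a real word list; not part of SpamBayes.
KNOWN_WORDS = {"buy", "cheap", "meeting", "now", "the", "tomorrow"}


def tokens_with_unknown_marker(text, known_words=KNOWN_WORDS):
    """Yield each lowercased word, plus a marker token after unknown words."""
    for word in text.lower().split():
        yield word
        if word not in known_words:
            # Extra synthetic token that the classifier could learn from.
            yield "not-in-dictionary"


if __name__ == "__main__":
    print(list(tokens_with_unknown_marker("Buy ch3ap v1agra now")))
```

Whether such a token actually helps would have to be measured against a real corpus; obfuscated spam words and rare but legitimate words both end up generating the same marker.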
If you want to try things out, grab 1.1a4 (the latest test alpha release), and TryOutThePreRelease changes.
-
Rule to try: Add a variable to indicate whether the sender is currently in your address book, plus a variable comparing the ratio of misspelled words to correctly spelled words. Thanks. Matt Parker.
-
DeAnagraming doesn't seem to help... -- TonyMeyer
-
Received-SPF: while SPF blocks messages that FAIL, other results are statistically correlated with spam. I am currently running a Python-wrapped DSPAM system, and the table after this list shows the stats for SPF results.
-
Suggestion: very easy change to the UI that I feel would really help. Use onmouseover and onmouseout events in <tr> tags to highlight the current row during "Review" (see browsing a DB table in the phpMyAdmin project for an example). When there are lots of rows on the screen this makes it much easier to see what you are tagging as ham or spam. -- Matt Southall
I've added this to CVS now. If you're running from CVS, you should see it take effect after you update. Otherwise, this will be in 1.1a2 and onwards. Please let us know (spambayes@python.org) if you have any problems with it (tested in Firefox 1.0.1 and IE6) or have suggestions for improvements. Thanks for the idea! -- TonyMeyer
-
My feature wish is a user-friendly interface for the maintenance of the database. [I moved the discussion to TrainingIdeas (Chapter "Manual Pruning of Databases")] -- GuenterMilde
-
See MultiWayClassification -- AndyGlew
SPF result | fraction that is spam |
NEUTRAL | 0.898679 |
NEUTRAL(guess) | 0.926437 |
PASS | 0.101463 |
PASS(guess) | 0.257572 |
SOFTFAIL | 0.910824 |
NONE(guessed) | 0.658428 |
UNKNOWN | 0.580007 |
I wrote a sample patch to do the SpfTokenizing. -- TonyMeyer
The SPF format is now RFC 4408 with a new website. The Received-SPF header is recommended by the spec. -- StuartGathman
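As a rough illustration of what tokenizing the Received-SPF header might look like, here is a minimal sketch using only the Python standard library. It is not the sample patch mentioned above; the function name `spf_tokens` and the `received-spf:` token prefix are assumptions made for this example. It relies on the RFC 4408 layout, in which the result keyword is the first word of the header value.

```python
import email


def spf_tokens(raw_message):
    """Yield one token per Received-SPF header, e.g. 'received-spf:pass'."""
    msg = email.message_from_string(raw_message)
    for value in msg.get_all("Received-SPF", []):
        parts = value.split()
        if parts:
            # RFC 4408 puts the result keyword first in the header value.
            yield "received-spf:" + parts[0].lower()


if __name__ == "__main__":
    raw = (
        "Received-SPF: softfail (mx.example.org: transitioning domain)\n"
        "Subject: hello\n"
        "\n"
        "body text\n"
    )
    print(list(spf_tokens(raw)))
```

Messages without the header simply produce no token, so the classifier only sees evidence where the receiving mail server actually recorded an SPF result.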