1. Train on Almost Everything
There are ways in between: RobsSetup describes a training method that can be summarized as: train on almost anything that didn't score 0.00 or 1.00. Since, after a small initial training period, most messages score either 0.00 or 1.00, this drastically reduces the database size compared with the train-on-everything strategy (and solves some of its other problems as well). A possible advantage over the train-on-mistakes strategy is that its scoring does not rely heavily on words that appear only once in the database (so-called hapaxes).
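For concreteness, here is a minimal Python sketch of the regime described above. The classifier interface (score and learn) is hypothetical, standing in for whatever methods your filter actually exposes; it is not the real harness API.

    # Sketch of the train-on-almost-everything regime.
    # Assumes a hypothetical classifier object with two methods:
    #   classifier.score(msg)           -> float in [0.0, 1.0]
    #   classifier.learn(msg, is_spam)  -> updates the token database
    # Neither name is taken from the actual code; adapt to your setup.

    def train_almost_everything(classifier, msg, is_spam):
        """Train on msg unless it already scores a certain
        0.00 (ham) or 1.00 (spam)."""
        score = classifier.score(msg)
        # Messages scored exactly 0.00 or 1.00 are skipped: the
        # classifier is already certain about them, so training
        # would mostly just grow the database.
        if 0.0 < score < 1.0:
            classifier.learn(msg, is_spam)
            return True   # trained
        return False      # skipped

Because most messages hit 0.00 or 1.00 once the filter is warmed up, the guard above is what keeps the database small relative to training on everything.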
This training scheme (represented by the nonedge regime in the incremental harness) does better than TrainOnEverything. In particular, it trains on far fewer messages for slightly greater accuracy, and it doesn't seem to decay as badly over extended periods. Have some pictures:
Details of the test set are in AlexDemoTestSet.
