AlexDemoTestSet


1. Alex's Demo Test Set

The test set used to generate a number of the graphs under TrainingIdeas is based on AlexPopiel's personal mail, collected over a span of 418 days. It contains approximately 77000 messages, of which about 21500 are ham and nearly 56000 are spam. Virus and worm emails are counted as spam, and the "latest Windows update" worm makes its presence felt around day 360.

The dataset is divided into 10 subsets and run through the incremental.py harness 10 times, with a different subset excluded on each run, in the usual cross-validation style. Each measurement is therefore replicated 10 times on slightly different input data.
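The exclusion scheme described above can be sketched roughly as follows. This is only an illustration of the "leave one subset out" rotation, not code from the incremental.py harness; the function name `leave_one_out_runs` and the numbering of subsets from 1 to 10 are assumptions made for the example.

```python
def leave_one_out_runs(n_sets=10):
    """Yield (excluded, included) pairs, one per run.

    Hypothetical helper: subsets are numbered 1..n_sets, and each run
    drops exactly one of them while keeping the rest.
    """
    all_sets = list(range(1, n_sets + 1))
    for excluded in all_sets:
        included = [s for s in all_sets if s != excluded]
        yield excluded, included


if __name__ == "__main__":
    # Ten runs, each seeing nine of the ten subsets.
    for excluded, included in leave_one_out_runs():
        print(f"run excluding set {excluded}: uses sets {included}")
```

Because every subset is left out exactly once, any two runs share eight of their nine subsets, which is why the replicated measurements differ only slightly from one another.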

Graphs using this dataset are found in: