Muli Ben-Yehuda's journal

June 22, 2003

spam filtering with bogofilter

Filed under: Uncategorized — Muli Ben-Yehuda @ 2:53 PM

Yesterday evening, I saw Orna tweaking her spam rules manually.
I opined that tweaking the rules by hand is rather inefficient,
and she opined right back that my setup is even less efficient.
My setup, in case you’re wondering, can be summed up as
“manually delete spam whenever it reaches my inbox”.

The reason I haven’t set up spam filtering so far is that I only
get a dozen or so spams every 24 hours. Since I read my email
compulsively (some would say obsessively), I rarely have more
than a few messages to deal with at a time, and so what if one
of them is a spam? I just delete it, just like I delete most of
the rest.

But yesterday I decided that enough is enough, and that I’d install
a Bayesian spam filter. I googled for a bit, and settled on
bogofilter. apt-get install bogofilter took care of downloading and
installing, and then I set out to read the man page and configure
it. At first I was put off by the fact that ESR wrote it (cf. the
CML2 debacle), but then I decided to give it a try anyway.

Bottom line: it works, once you train it. I use the following two
pieces of configuration that might prove useful, both of which are
adapted from the bogofilter man page:

In my .procmailrc:

# filter mail through bogofilter, tagging it with an X-Bogosity
# header (the word lists are not updated at this step)

# Since I built the good list from all of my saved mail, and since
# bogofilter has a bug (which I will soon report to the appropriate
# place) when called from procmail to update my very large good list,
# unlike the example in the bogofilter man page, I first run the
# incoming message through bogofilter, without registering its
# words. Then, if it's spam, the bad list gets updated.
:0fw
| /usr/bin/bogofilter -e -p

# file the mail to spam-bogofilter if it's spam, and update the bad
# list
:0
* ^X-Bogosity: Yes, tests=bogofilter
{
  :0 c
  | /usr/bin/bogofilter -s

  :0
  spam-bogofilter
}
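
If you want to check the header match outside of procmail, here is a minimal sketch in plain sh. The function name is_tagged_spam is my own invention (it is not part of bogofilter or procmail), and the sample message is made up. It only looks at the header section, the same way the recipe's condition does:

```shell
#!/bin/sh
# Hypothetical helper (not part of bogofilter or procmail): succeeds if the
# header section of the message on stdin carries the tag that the procmail
# recipe above matches on.
is_tagged_spam() {
    # sed prints everything up to the first blank line (the headers) and
    # quits; grep -q then looks for the X-Bogosity tag at the start of a line.
    sed '/^$/q' | grep -q '^X-Bogosity: Yes, tests=bogofilter'
}

# Example: a made-up message that has already been tagged.
printf 'From: someone@example.com\nX-Bogosity: Yes, tests=bogofilter\n\nbuy now\n' |
    is_tagged_spam && echo "would be filed to spam-bogofilter"
```

A message that merely quotes the tag in its body does not match, since sed stops at the end of the headers.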

And in my .muttrc, to ‘mark as spam and delete’ spam that makes it into my inbox:

# bogofilter integration, taken from the bogofilter man page
macro index \eD "<enter-command>unset wait_key\n\
<pipe-message>bogofilter -s -l\n\
<enter-command>set wait_key\n\
<delete-message>" "delete message as spam"

A couple of other useful command lines:

# register every mail in the file mbox-name as spam
$ bogofilter -M -I mbox-name -s -v

# register every mail in the file mbox-name as good (ham)
$ bogofilter -M -I mbox-name -n -v

I used these two to build my good list and bad list quickly.
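
If your saved mail is spread over several mbox files, the two commands can be wrapped in a small loop. This is only a sketch: the train function, the DRY_RUN variable, and the mbox names are all hypothetical, and the bogofilter flags are exactly the ones shown above. With DRY_RUN=1 it prints the commands instead of running them, so you can look before touching the word lists:

```shell
#!/bin/sh
# Hypothetical helper, not part of bogofilter: register several mbox files.
# Usage: train spam|ham mbox-file...
# With DRY_RUN=1 the commands are only printed and bogofilter is never run.
train() {
    kind=$1
    shift
    case "$kind" in
        spam) flag=-s ;;
        ham)  flag=-n ;;
        *)    echo "usage: train spam|ham mbox-file..." >&2; return 2 ;;
    esac
    for mbox in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "bogofilter -M -I $mbox $flag -v"
        else
            # Stop at the first failure so a bad mbox does not go unnoticed.
            bogofilter -M -I "$mbox" "$flag" -v || return 1
        fi
    done
}

# Example: show what would be run for two made-up mbox names.
DRY_RUN=1
train ham saved-2002 saved-2003
```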

1 Comment »

  1. I use spambayes, as an Outlook plug-in.
    It works amazingly well, even though I had very little spam to train it on.
    It filters out 100% of the spam, with very few false positives (which look a lot like spam to me. Mostly company announcements and such).

    Comment by shapirac — June 22, 2003 @ 9:43 AM

