FreeBSD Manual Pages

BMF(1)									BMF(1)

NAME
       bmf - efficient Bayesian	mail filter

SYNOPSIS
       bmf [-t]	[-n] [-s] [-N] [-S] [-f	fmt] [-d db] [-i file] [-k n] [-m type]	[-p]
	   [-v]	[-V] [-h]

DESCRIPTION
       bmf  is	a  Bayesian  mail  filter. In its normal mode of operation, it
       takes an	email message or other text on standard	input, does a  statis-
       tical check against lists of "good" and "spam" words, registers the new
       data,  and  returns a status code indicating whether or not the message
       is spam.	BMF is written with fast, zero-copy algorithms,	coded directly
       in C, and tuned for speed. It aims to be	faster,	smaller, and more ver-
       satile than similar applications.

       bmf supports both mbox and maildir mail storage formats.	It will	 auto-
       matically process multiple messages within an mbox file separately.

OPTIONS
       Without command-line options, bmf processes the input, registers it as
       either "good" or "spam", and returns the corresponding exit status. The
       wordlist directory and any missing wordlist files are created.

       -t  Test to see if the input is spam. The word lists are not updated. A
       report is written to stdout showing the final score and the tokens with
       the highest deviation from a mean of 0.5.

       -n Register the input as	non-spam.

       -s Register the input as	spam.

       -N Register the input as	non-spam and  undo  a  prior  registration  as
       spam.

       -S  Register  the  input	 as spam and undo a prior registration as non-
       spam.

       -f fmt Specify database format. Valid formats are text, db, and	mysql.
       Text  is	 always	 valid.	 The others may	not be available if the	corre-
       sponding	option was not enabled at compile time.	The default is	db  if
       available, else text.

       -d  db Specify database or directory for	loading	and saving word	lists.
       The default is ~/.bmf in	text mode.

       -i file Use file	for input instead of stdin.

       -k n Specify the	number of extrema (keepers) to use in the Bayes	calcu-
       lation. The default is 15.

       -m type Specify mail storage format. Valid types are mbox and maildir.
       The default is to automatically detect the mail storage format. This
       option is deprecated.

       -p  Copy	 the input to the output (passthrough) and insert spam headers
       in the style of SpamAssassin. An	X-Spam-Status  header  is  always  in-
       serted  with processing details.	The contents of	this header always be-
       gin with	either "Yes" or	"No". If the input is judged to	be  spam,  the
       header "X-Spam-Flag: YES" is also inserted.

       -v Be more verbose. This	option is not well supported yet.

       -V Display version information.

       -h Display usage	information.
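
       As an illustration of consuming the output of the -p option described
       above, the following Python sketch reads the X-Spam-Status header from
       a filtered message. Only the leading "Yes" or "No" is documented here,
       so the rest of the header value in the sample is a placeholder, and
       the function name spam_verdict is hypothetical.

```python
from email import message_from_string


def spam_verdict(raw_message):
    """Return True for spam, False for non-spam, None if the header is absent.

    Relies only on the documented fact that X-Spam-Status begins with
    "Yes" or "No"; the processing details after that are not parsed.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    return status.strip().lower().startswith("yes")


# Sample message; the text after "Yes," is a placeholder, not real bmf output.
sample = (
    "X-Spam-Status: Yes, (processing details)\n"
    "X-Spam-Flag: YES\n"
    "Subject: example\n"
    "\n"
    "body text\n"
)

if __name__ == "__main__":
    print(spam_verdict(sample))
```

       Checking X-Spam-Status rather than X-Spam-Flag lets one test cover
       both outcomes, since X-Spam-Flag is inserted only for spam.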

THEORY OF OPERATION
       bmf treats its input as a bag of tokens. Each token is checked against
       the "good" and "bad" wordlists, which record how many times it has
       occurred in non-spam and spam mail. These counts are used to compute
       the probability that a mail in which the token occurs is spam. After
       probabilities for all input tokens have been computed, a fixed number
       of the probabilities that deviate furthest from the neutral value of
       0.5 (see the -k option) are combined using Bayes's theorem on
       conditional probabilities.
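
       The calculation described above can be sketched in Python as follows.
       This is a simplified illustration in the style of Graham's "A Plan
       for Spam", not bmf's actual code: the clamping bounds (0.01, 0.99),
       the 0.4 value for unseen tokens, and all function and parameter names
       are assumptions. Only the default of 15 keepers comes from this page.

```python
from math import prod


def token_spam_probability(good_count, spam_count, good_msgs, spam_msgs):
    """Estimate the probability that a mail containing a token is spam.

    good_count / spam_count: occurrences of the token in each wordlist.
    good_msgs / spam_msgs: number of registered mails of each kind.
    """
    g = min(1.0, good_count / max(good_msgs, 1))
    b = min(1.0, spam_count / max(spam_msgs, 1))
    if g + b == 0:
        return 0.4  # assumed value for a token never seen before
    # Clamp so that no single token is treated as absolute proof.
    return min(0.99, max(0.01, b / (g + b)))


def combine(probabilities, keepers=15):
    """Combine the `keepers` probabilities furthest from 0.5."""
    extrema = sorted(probabilities, key=lambda p: abs(p - 0.5), reverse=True)
    extrema = extrema[:keepers]
    spam = prod(extrema)
    ham = prod(1.0 - p for p in extrema)
    return spam / (spam + ham)


if __name__ == "__main__":
    probs = [
        token_spam_probability(0, 10, 100, 100),  # seen only in spam
        token_spam_probability(10, 0, 100, 100),  # seen only in good mail
        token_spam_probability(5, 5, 100, 100),   # uninformative token
    ]
    print(combine(probs))
```

       Keeping only the extrema means that a handful of strongly indicative
       tokens decide the score, while tokens near 0.5 are ignored.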

       While this method sounds crude compared to the more usual
       pattern-matching approach, it turns out to be extremely effective.
       Paul Graham's paper "A Plan for Spam"
       (http://www.paulgraham.com/spam.html) is recommended reading.

       bmf improves on Graham's proposal by doing smarter lexical analysis.
       In particular, hostnames and IP addresses are retained, while certain
       types of MTA information (such as message IDs and dates) are
       discarded.

       MIME  and  other	 attachments are not decoded. Experience from watching
       the token streams suggests that spam with enclosures  invariably	 gives
       itself  away  through  cues  in	the  headers  and non-enclosure	parts.
       Nonetheless, I would like to add	the ability to decode quoted-printable
       and perhaps base64 encodings for	textual	attachments.

INTEGRATION WITH OTHER TOOLS
       Please see the README for samples and suggestions.

RETURN VALUES
       In passthrough mode: zero for success, nonzero for failure.

       In non-passthrough mode:	0 for spam; 1 for non-spam; 2 for I/O or other
       errors.
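
       The status codes above can be interpreted from a calling program as
       in the following Python sketch. The helper names are hypothetical,
       and the classify function assumes that bmf is installed on PATH and
       that -t reports its verdict through the non-passthrough codes listed
       above.

```python
import subprocess


def interpret_status(code, passthrough=False):
    """Map a bmf exit status to a label, per the RETURN VALUES section."""
    if passthrough:
        return "ok" if code == 0 else "error"
    # 0 is spam and 1 is non-spam; 2 and anything unexpected count as errors.
    return {0: "spam", 1: "non-spam"}.get(code, "error")


def classify(path):
    """Run `bmf -t -i path` without updating the word lists.

    Assumes bmf is on PATH; raises FileNotFoundError otherwise.
    """
    result = subprocess.run(
        ["bmf", "-t", "-i", path],
        capture_output=True,
        text=True,
    )
    return interpret_status(result.returncode)


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, interpret_status(code))
```

       Note that 0 means spam, the reverse of the usual "zero is success"
       reading, so callers should compare against the codes explicitly
       rather than testing for truthiness.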

FILES
       ~/.bmf/goodlist.txt
	      List of good tokens for text mode.

       ~/.bmf/spamlist.txt
	      List of bad tokens for text mode.

       ~/.bmf/goodlist.db
	      List of good tokens for libdb mode.

       ~/.bmf/spamlist.db
	      List of bad tokens for libdb mode.

BUGS
       The lexer should	recognize multiline headers.

       The lexer should	recognize MIME attachments.

       Content-Transfer-Encoding is not	decoded.

AUTHOR
       Tom Marshall <tommy@tig-grr.com>.

       The Bayes algorithm is from bogofilter by Eric S. Raymond
       <esr@thyrsus.com>. bogofilter can be found at the bogofilter project
       page: http://bogofilter.sourceforge.net/.

									BMF(1)