IFILE(1)			 User Commands			      IFILE(1)

NAME
       ifile - core executable for the ifile mail filtering system

SYNOPSIS
       ifile  [-b  file] [-q|-Q] [-g] [-k] [-o]	[-v num] [lexing options] file
       ...
       ifile -c	-q|-Q [-T threshold] [-b file] [-g] [-k] [-o] [lexing options]
       file ...
       ifile [-b file] [-d folder] [-i folder|-u folder] [-g]  [-k]  [-o]  [-v
       num] [lexing options] file ...
       ifile -r	[-b file]

DESCRIPTION
       ifile is a mail filter client that uses machine learning to classify
       e-mail into folders/mailboxes.  The algorithm it uses is called Naive
       Bayes.  Naive Bayes treats each document as an unordered collection of
       words and assigns it to the folder/mailbox whose word distribution most
       closely matches that of the document.
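
       The idea can be illustrated with a minimal Python sketch.  This is not
       ifile's implementation: the function names, the add-one smoothing and
       the absence of folder priors are all assumptions made for the sake of
       a short example.

```python
import math
from collections import Counter

def train(labeled_docs):
    """Build per-folder word counts from (folder, list_of_words) pairs."""
    counts = {}
    for folder, words in labeled_docs:
        counts.setdefault(folder, Counter()).update(words)
    return counts

def score(counts, words):
    """Return {folder: log-likelihood of words} using add-one smoothing.

    Smoothing and uniform priors are illustrative assumptions.
    """
    vocab = set()
    for folder_counts in counts.values():
        vocab.update(folder_counts)
    scores = {}
    for folder, folder_counts in counts.items():
        total = sum(folder_counts.values())
        scores[folder] = sum(
            math.log((folder_counts[w] + 1) / (total + len(vocab)))
            for w in words
        )
    return scores

def classify(counts, words):
    """Pick the folder whose word distribution best explains the words."""
    scores = score(counts, words)
    return max(scores, key=scores.get)

counts = train([
    ("spam", "buy cheap pills buy now".split()),
    ("friends", "lunch tomorrow see you at lunch".split()),
])
print(classify(counts, "cheap pills now".split()))  # spam
```

       Word order never enters the computation; only the per-folder word
       counts do, which is what "unordered collection of words" means here.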

OPTIONS
       -b, --db-file=file
	      Location to read/store ifile database.  Default is ~/.idata

       -c, --concise
	      equivalent of "ifile -v 0	| head -1 | cut	-f1 -d".  Must be used
	      with -q or -Q.

       -d, --delete=folder
	      Delete the statistics for each of the files from the category
	      folder

       -f, --folder-calcs=folder
	      Show the word-probability	calculations for folder

       -g, --log-file
	      Create and store debugging information in	~/.ifile.log

       -i, --insert=folder
	      Add the statistics for each of the files to the category folder

       -k, --keep-infrequent
	      Leave  in	 the  database words that occur	infrequently (normally
	      they are tossed)

       -l, --query-loocv=folder
	      For each of the files, temporarily  removes  file	 from  folder,
	      performs	query  and then	reinserts file in folder.  Database is
	      not modified.

       -o, --occur
	      Uses document bit-vector representation.	Count each  word  once
	      per document.

       -q, --query
	      Output rating scores for each of the files

       -Q, --query-insert
	      For  each	 of the	files, output rating scores and	add statistics
	      for the folder with the highest score

       -T, --threshold=threshold
	      When used	with both -c and -q, output the	 two  highest  ranking
	      categories  if  their score differs by at	most threshold / 1000,
	      which can	be used	to detect border cases.	  When	used  with  -q
	      only and any threshold > 0, output the score difference percent-
	      age.  For	example,
		     ifile -T1 -q foo.txt
	      might result in
		     spam -15570.48640776
		     non-spam -18728.00272369
		     diff[spam,non-spam](%) 9.21
	      If so, then
		     ifile -T93	-q -c foo.txt
	      will result in
		     foo.txt spam,non-spam
	      whereas
		     ifile -T92	-q -c foo.txt
	      will result in
		     foo.txt spam

       -r, --reset-data
	      Erases all currently stored information

       -u, --update=folder
	      Same as 'insert' except only adds	stats if folder	already	exists

       -v, --verbosity=num
	      Amount  of  output while running:	0=silent, 1=quiet, 2=progress,
	      3=verbose, 4=debug
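
       The numbers in the -T example above can be reproduced with the Python
       sketch below.  The formula is an assumption inferred from that example
       (score gap divided by the sum of the absolute scores); it is not taken
       from the ifile source, and the function names are hypothetical.

```python
def diff_percent(best, runner_up):
    """Score gap as a percentage of the combined score magnitude.

    Assumed formula, chosen because it matches the manual's example.
    """
    return 100.0 * abs(best - runner_up) / (abs(best) + abs(runner_up))

def concise_labels(scores, threshold):
    """Return the top label, or the top two if the gap is within threshold/1000.

    Expects at least two categories in scores.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, s1), (second, s2) = ranked[0], ranked[1]
    if diff_percent(s1, s2) / 100.0 <= threshold / 1000.0:
        return [top, second]
    return [top]

scores = {"spam": -15570.48640776, "non-spam": -18728.00272369}
print(f"{diff_percent(scores['spam'], scores['non-spam']):.2f}")  # 9.21
print(",".join(concise_labels(scores, 93)))  # spam,non-spam
print(",".join(concise_labels(scores, 92)))  # spam
```

       Under this reading, a threshold of 93 means "treat the top two
       categories as a border case when they are within 9.3% of each other".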

       Lexing options:

       -a, --alpha-lexer
	      Lex words	as sequences of	alphabetic characters (default)

       -A, --alpha-only-lexer
	      Only lex space-separated character sequences which are  composed
	      entirely of alphabetic characters

       -h, --strip-header
	      Skip all of the header lines except Subject:, From: and To:

       -m, --max-length=char
	      Ignore  portion of message after first char characters.  Use en-
	      tire message if char set to 0.  Default is 50,000.

       -p, --print-tokens
	      Just tokenize and	print, don't do	any other  processing.	 Docu-
	      ments are	returned as a list of word, frequency pairs.

       -s, --no-stoplist
	      Do not throw out overly frequent (stoplist) words	when lexing

       -S, --stemming
	      Use 'Porter' stemming algorithm when lexing documents

       -w, --white-lexer
	      Lex words	as sequences of	space separated	characters
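
       The difference between the three lexers (-a, -A and -w) can be seen in
       the following Python sketch.  It only mirrors the descriptions above;
       restricting "alphabetic" to ASCII letters and leaving case untouched
       are assumptions, and ifile's real lexers may differ in such details.

```python
import re

def alpha_lexer(text):
    """Like -a: every maximal run of letters is a token."""
    return re.findall(r"[A-Za-z]+", text)

def alpha_only_lexer(text):
    """Like -A: keep whitespace-separated chunks made only of letters."""
    return [t for t in text.split() if re.fullmatch(r"[A-Za-z]+", t)]

def white_lexer(text):
    """Like -w: every whitespace-separated chunk is a token."""
    return text.split()

sample = "Re: don't miss lunch at 12pm"
print(alpha_lexer(sample))       # ['Re', 'don', 't', 'miss', 'lunch', 'at', 'pm']
print(alpha_only_lexer(sample))  # ['miss', 'lunch', 'at']
print(white_lexer(sample))       # ['Re:', "don't", 'miss', 'lunch', 'at', '12pm']
```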

       If  no files are	specified on the command line, ifile will use standard
       input as	its message to process.

       -?, --help
	      Give this	help list

       --usage
	      Give a short usage message

       -V, --version
	      Print program version

       Mandatory or optional arguments to long options are also	 mandatory  or
       optional	for any	corresponding short options.

FILES
       ~/.idata
	      ifile  database  (default	 location).  See FAQ included in ifile
	      package for description of database format.

AUTHOR
       Jason  Rennie  <jrennie@csail.mit.edu>  and  many  others.    See   the
       ChangeLog for the full list.

EXAMPLES
       Before  using  ifile,  you  need	 to train it.  Let's say that you have
       three folders, "spam", "ifile" and "friends", and the following	direc-
       tory structure:

	      /--+--spam----+--1
		 |	    +--2
		 |	    +--3
		 |
		 +--ifile---+--1
		 |	    +--2
		 |	    +--3
		 |
		 +--friends-+--1
			    +--2
			    +--3

       The following commands build the ifile database in ~/.idata (use the -b
       option to specify a different location for the database):

	      ifile -h -i spam /spam/*
	      ifile -h -i ifile	/ifile/*
	      ifile -h -i friends /friends/*

       The -h option strips off	headers	besides	"Subject:", "From:" and	"To:".
       I find that -h improves ifile's performance, but	you may	find otherwise
       for your	personal collection.

       Note that we have made the argument to -i the same as the corresponding
       folder name.  This is not necessary.  The argument to -i can be any word
       you want to use to identify a category of e-mails.  The argument to -i
       must not include whitespace characters (space, tab, line feed, etc.).

       At this point, your ~/.idata file should	look something like this:

	      spam ifile friends
	      662 1020 6451
	      3	3 3
	      jrennie 9	0:3 1:18 2:16
	      mindspring 6 1:7 2:5
	      make 9 0:5 1:3
	      yahoo 9 0:1 1:22 2:2

       The  first  line	is the space-separated list of folders.	Their ordering
       specifies a numbering (spam=0, ifile=1, friends=2). The second line  is
       a  token	 count	for each folder	(e.g. 662 tokens observed in the three
       spam messages). The third line is an e-mail count for each folder (e.g.
       3 e-mails for each of spam, ifile and  friends).	 Each  following  line
       specifies statistics for	a word.	The format of a	line is

	      word age folder:count [folder:count ...]

       where  folder  is the folder number determined by the first line	order-
       ing. Folders with a count of zero are not listed. So, the  line	begin-
       ning with "jrennie" indicates that "jrennie" appeared 3 times in	"spam"
       e-mails,	18 times in "ifile" e-mails and	16 times in "friends" e-mails.
       The  age	 is  the  number of e-mails that have been processed since the
       word was	added to the database. Very infrequent words are  pruned  from
       the database to keep the	database size down.
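
       As an illustration, the layout described above can be read with a few
       lines of Python.  This is a sketch based only on the description in
       this section; the function name and the returned dictionary layout are
       hypothetical, and the FAQ in the ifile package remains the reference
       for the actual format.

```python
def parse_idata(text):
    """Parse the database layout described above into a dictionary."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    folders = lines[0].split()                            # line 1: folder names
    token_counts = [int(n) for n in lines[1].split()]     # line 2: tokens per folder
    email_counts = [int(n) for n in lines[2].split()]     # line 3: e-mails per folder
    words = {}
    for ln in lines[3:]:
        # Remaining lines: word age folder:count [folder:count ...]
        word, age, *pairs = ln.split()
        counts = {}
        for pair in pairs:
            idx, count = pair.split(":")
            counts[folders[int(idx)]] = int(count)
        words[word] = {"age": int(age), "counts": counts}
    return {
        "folders": folders,
        "tokens": dict(zip(folders, token_counts)),
        "emails": dict(zip(folders, email_counts)),
        "words": words,
    }

sample = """\
spam ifile friends
662 1020 6451
3 3 3
jrennie 9 0:3 1:18 2:16
mindspring 6 1:7 2:5
make 9 0:5 1:3
yahoo 9 0:1 1:22 2:2
"""
db = parse_idata(sample)
print(db["words"]["jrennie"]["counts"])   # {'spam': 3, 'ifile': 18, 'friends': 16}
print(db["words"]["mindspring"]["counts"].get("spam", 0))  # 0 (zero counts are omitted)
```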

       Now  that  you  have a database,	you might want to filter some e-mails.
       Say you have the	following incoming e-mails:

	      /--inbox--+--1
			+--2
			+--3

       To find out what	folders	ifile thinks these e-mails belong in, run

	      ifile -c -q /inbox/1
	      ifile -c -q /inbox/2
	      ifile -c -q /inbox/3

       Let's say that 1	is about ifile,	2 is spam and 3	is from	a friend.  As-
       suming ifile does its job correctly, you'll see output like this:

	      /inbox/1 ifile
	      /inbox/2 spam
	      /inbox/3 friends

       With so little training data, ifile is unlikely to get the labels
       correct, but you should get the idea :-)
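
       If you post-process this output in a script, each line can be split
       into a file name and one or more category names.  The Python sketch
       below assumes that whitespace separates the two fields and that
       category names contain no commas; parse_concise is a hypothetical
       helper, not part of ifile.

```python
def parse_concise(line):
    """Split 'path label[,label]' into (path, [labels]).

    Splits on the last run of whitespace, so the path itself may contain
    spaces; category names cannot, as noted for the -i argument.
    """
    path, labels = line.rstrip("\n").rsplit(None, 1)
    return path, labels.split(",")

print(parse_concise("/inbox/1 ifile"))         # ('/inbox/1', ['ifile'])
print(parse_concise("foo.txt spam,non-spam"))  # ('foo.txt', ['spam', 'non-spam'])
```

       The second call shows the two-category form that -T can produce for
       border cases.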

       Now, if you move	the e-mails to the folders suggested by	ifile,	you'll
       want  to	 update	 the database accordingly. You can do this with	the -i
       option, like before. Or,	you can	simply use -Q in place	of  -q	above.
       This automatically adds the e-mail to the folder	ifile suggests.

       Now, assume for a moment that e-mail 1 was actually spam, but its
       statistics were added to the "ifile" category and the message was put
       in the ifile folder.  We need to move it to the spam folder and update
       the ifile database accordingly.  We can update the database with the
       following command:

	      ifile -d ifile -i	spam /inbox/1

       This deletes the e-mail's statistics from "ifile" and adds them to
       "spam".
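
       A mail client or script could wrap this correction step.  The Python
       sketch below only assembles the command line shown above, optionally
       with -b; the helper names are hypothetical, and treating a non-zero
       exit status as failure is an assumption about ifile's behaviour.

```python
import subprocess

def relabel_argv(message, old_folder, new_folder, db_file=None):
    """Build the ifile command that moves a message's statistics."""
    argv = ["ifile"]
    if db_file is not None:
        argv += ["-b", db_file]          # non-default database location
    argv += ["-d", old_folder, "-i", new_folder, message]
    return argv

def relabel(message, old_folder, new_folder, db_file=None):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(relabel_argv(message, old_folder, new_folder, db_file),
                   check=True)

print(relabel_argv("/inbox/1", "ifile", "spam"))
# ['ifile', '-d', 'ifile', '-i', 'spam', '/inbox/1']
```

       Passing the arguments as a list avoids shell quoting problems with
       unusual file names.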

SEE ALSO
       Examples	of how to use ifile together with procmail(1) and  metamail(1)
       can be found in the directory /usr/share/doc/ifile/examples.

ifile 1.3.4			 November 2004			      IFILE(1)
