mls(1)				 User Manuals				mls(1)

NAME
       mls - Display useful statistics on email	messages

SYNOPSIS
       mls  [-hvq] [-l lang] [-i file] [-o file] [-r|w|u file] [-t|T text] [-m
       mode] [-n XX] [-g xxxx]

DESCRIPTION
       MailListStat is a program that prints some "useful" statistical info on
       email messages. Its main use is in email conferences (mailing lists).
       It currently displays both tables and graphs. You can select either
       TEXT or HTML output.

OPTIONS
       -h     print help text and exit

       -q     be quiet (print only errors to stderr)

       -v     turn on verbose mode; mls then prints more information to
              stderr: an indication of progress (every 10th, 20th, ..., 90th,
              100th, 200th, ..., 900th, 1000th, 2000th ... message being
              processed) and warnings about malformed headers found

       -l lang
              select output language; note that this applies only to the
              generated statistics. Program messages printed to stderr are
              always in English. These languages are currently supported: EN
              (English), SK (Slovak), IT (Italian), FR (French), DE (German),
              ES (Spanish), SR (Serbian), BR (Brazilian Portuguese).

       -i file
	      name of input file (if not specified,  use  stdin).   This  file
	      should be	in MBOX	format.	It should exist	and be readable.

       -o file
              name of output file (if not specified, use stdout). If the file
              exists, it will be overwritten.

       -r file
              read input from a cache file instead of a mailbox. You can read
              input either from a mailbox or from a cache file, not both!

       -w file
              write a cache file (no stats are produced). You can either
              produce text output or write a cache file, not both! When
              writing a cache file, output-related options are ignored.

       -u file
              update a cache file: read the cache, read the input, write the
              cache back. Intended for use from .procmailrc or .forward.

       -t text
              name of the mailing list these statistics are computed for. If
              specified, it is appended to the title of the statistics, which
              then reads "Statistics from 16.8.2001 to 7.9.2001 for text",
              where text is whatever you pass as this parameter (it could be
              the name of the mailing list or just its address, e.g.
              mobil@mobil.sk).

       -T text
              title text (only this will be printed as the title); use it to
              suppress the normal title text (dates of the oldest and newest
              messages) and replace it completely with your own text.

       -m mode
	      select mode of output (text, html, html2).

       -n XX  show TOP XX tables (default TOP 10). By default, mls displays
              tables of the TOP 10 people, subjects, quoting and so on. This
              parameter sets how many rows these tables have.

       -g xxxx
              graphs to show (Day, Week, Month, Year, X = none); specify the
              first letter of each (e.g. -g dmy).
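
       The progress sequence described for -v (10, 20, ..., 90, 100, 200,
       ...) can be expressed as "report message n when n is a multiple of the
       largest power of ten not exceeding n". The following Python sketch
       illustrates that rule only; the function name is hypothetical and this
       is not necessarily how mls implements it.

```python
def is_progress_point(n: int) -> bool:
    """Return True if message number n would be reported in verbose mode."""
    if n < 10:
        return False
    # Largest power of ten not exceeding n: 10 for 10..99, 100 for 100..999, ...
    step = 10 ** (len(str(n)) - 1)
    return n % step == 0


if __name__ == "__main__":
    print([n for n in range(1, 2001) if is_progress_point(n)])
```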

EXIT STATUS
       0      Everything went OK and no	error occurred.

       1      Error in sscanf() while reading and parsing the cache file. It
              means that the format of the cache file is invalid. Try to
              create the cache file again.

       2      Invalid command-line option or language. You have specified an
              invalid command-line parameter.

       3      Cannot open input or output file. Check that you have typed the
              correct filename and that you have read permission for the input
              file and write permission for the destination directory (because
              the output file must be created). If the output file exists, it
              is overwritten.

       4      Not enough memory is available for dynamically allocated
              variables. This could be caused by user limits, because mls
              requires only a few MB of memory (the amount depends on the
              number of messages processed and the number of different
              subjects and authors).

       5      Error compiling regex. This error should not occur in publicly
              released versions.
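
       A script that calls mls may want to turn these exit codes into
       readable messages. A minimal Python sketch follows; the codes come
       from the list above, while the table and function names are
       hypothetical.

```python
# Exit codes as listed in the EXIT STATUS section of this page.
MLS_EXIT_STATUS = {
    0: "OK, no error occurred",
    1: "invalid cache file format (re-create the cache file)",
    2: "invalid command-line option or language",
    3: "cannot open input or output file",
    4: "not enough memory",
    5: "error compiling regex",
}


def describe_exit_status(code: int) -> str:
    """Translate an mls exit code into a short message (hypothetical helper)."""
    return MLS_EXIT_STATUS.get(code, f"unknown exit status {code}")
```

       The code passed in would typically be the returncode of a finished
       subprocess.run() call.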

USAGE
   Input
       The input should be a mailbox file in standard MBOX format. If the
       file is in a different format, the results are unpredictable. It
       should contain at least one email message, otherwise no stats can be
       computed.

       Warning: make sure the input files contain no special messages (such
       as the one with the "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA"
       subject), because they will be analysed too. Many programs (POP3/IMAP
       daemons, email readers) put such special messages into the mailbox.
       Such a message is ignored only when reporting the oldest message found.

   Output
       Statistics are written to the output file (or to stdout if none is
       specified) in the selected language. All diagnostic messages are
       written to stderr and are in English. The output consists of several
       kinds of statistical data: tables, graphs and summaries.

       The title has two formats depending on the -t parameter. If -t is not
       specified, the title looks like "Statistics from 16.8.2001 to
       7.9.2001", where the first date is the date of the oldest message found
       in the input and the second is the date of the newest one. With, for
       example, -t mobil@mobil.sk, it looks like "Statistics from 16.8.2001 to
       7.9.2001 for mobil@mobil.sk". The dates of the oldest and newest
       messages are often wrong (because of bad date/time settings on the
       message author's PC), so you can specify the entire title with the -T
       command-line option. When -T is used, only your text is printed as the
       title, for example "Statistics for mobil@mobil.sk".
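
       The title rules above (default title, -t suffix, -T override) can be
       summarized in a few lines of Python. This is only a sketch of the
       documented behaviour; the function and parameter names are
       hypothetical, and the English wording is the one quoted above.

```python
from datetime import date
from typing import Optional


def format_date(d: date) -> str:
    """Format a date as day.month.year without zero padding, e.g. 16.8.2001."""
    return f"{d.day}.{d.month}.{d.year}"


def build_title(oldest: date, newest: date,
                list_name: Optional[str] = None,
                custom_title: Optional[str] = None) -> str:
    # -T replaces the whole title, dates included.
    if custom_title is not None:
        return custom_title
    title = f"Statistics from {format_date(oldest)} to {format_date(newest)}"
    # -t only appends " for <text>" to the default title.
    if list_name is not None:
        title += f" for {list_name}"
    return title
```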

       The -g option specifies which graphs to show: hours of Day, days of
       Week, days of Month, months of Year. Pass the first letters as the
       argument to -g (so -g dw prints just hours of Day and days of Week).
       Use -g x to disable all graphs. For example, you probably don't want
       the months of Year graph when presenting stats for a single month, but
       you probably do want it for full-year stats.
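
       The letter-based selection used by -g can be modelled as a lookup from
       letters to graphs. The Python sketch below makes three assumptions
       that this page does not confirm: letters are treated
       case-insensitively, "x" wins over any other letter, and unknown
       letters are rejected. All names are hypothetical.

```python
from typing import List

# First letters as documented for -g, mapped to the graph they select.
GRAPHS = {
    "d": "hours of day",
    "w": "days of week",
    "m": "days of month",
    "y": "months of year",
}


def select_graphs(letters: str) -> List[str]:
    """Return the graphs selected by a -g style argument such as "dw"."""
    letters = letters.lower()          # assumption: case does not matter
    if "x" in letters:                 # assumption: x disables everything
        return []
    unknown = set(letters) - set(GRAPHS)
    if unknown:                        # assumption: unknown letters are errors
        raise ValueError(f"unknown graph letters: {sorted(unknown)}")
    return [name for key, name in GRAPHS.items() if key in letters]
```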

   HTML	output
       You can choose between two kinds of output: TEXT and HTML. In HTML
       mode, mls produces the output as a complete HTML page. In HTML2 mode,
       only the body of the HTML document is produced (no header or footer),
       so you can supply your own HTML header and footer when calling mls as
       a CGI script or from the PHP wrapper. The output consists of HTML
       tables and bar graphs. Almost every aspect of its appearance can be
       configured by modifying the CSS style sheet. Note that the files
       style_mls.css and bar.gif must be present in the same directory as the
       produced HTML file. You can, however, modify both to suit your needs.
       Everything should be clear after reading the comments in the CSS file
       and looking at the produced HTML source.

       I was unsure what type of graphs to produce. I also tried horizontal
       bar graphs; if you want to try them, just uncomment the relevant part
       of the code in PrintGraphHtml() in mls_text.c.

   Cache file support
       Instead of producing statistics in text format, you can save all the
       generated values and results into a "cache" file. Retrieving
       information from this file is very fast, so it is useful for
       integration with web pages. You can update the cache file right after
       new mail is received, and users can view current stats by running mls
       as a CGI script. The advantage over static stats is that the user can
       choose the language and other options, and the page is generated in a
       moment!

       To update a cache file, use the -u option. It works like this: first,
       the stats are loaded from the cache file (which doesn't have to exist);
       then the new message(s) are read from stdin (or from the -i file) and
       added to the stats; finally, the updated stats are written back to the
       cache file. The process is really quick, because usually only one
       message is added at a time. This is useful mainly for updating cache
       files upon receiving a new message. In the "examples/" subdirectory
       you can find examples of integration with your .forward and
       .procmailrc files. By running mls more than once, you can generate
       cache files for individual months and also for whole years (see the
       examples). You can then use a PHP script to present a list of these
       cache files to the user.
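
       The read, merge, write cycle performed by -u can be illustrated with a
       toy cache that stores only per-author message counts. The JSON file
       format and every name below are assumptions made for this sketch; the
       real mls cache format is different and is not described on this page.

```python
import json
import os
from collections import Counter
from typing import Iterable


def update_cache(cache_path: str, new_authors: Iterable[str]) -> Counter:
    """Load counts from the cache (if any), add new messages, write back."""
    counts: Counter = Counter()
    if os.path.exists(cache_path):          # the cache does not have to exist
        with open(cache_path, encoding="utf-8") as fh:
            counts.update(json.load(fh))    # adds the stored counts
    counts.update(new_authors)              # one count per new message
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(counts, fh)
    return counts
```

       Because only the new messages are processed on each call, the cost of
       an update does not grow with the size of the mailbox.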

       The format of cache files changed in version 1.3 because new stats
       were added. The file now contains version info, so mls can tell you
       when a cache file has to be re-created with the new version.
       Unfortunately, you also have to re-create cache files when you want
       newly supported email clients to be recognized in old (already
       processed) messages. Note that email client detection was buggy in
       1.2.2 (a lot of clients were not recognized).

   PHP wrapper
       I have also written a PHP wrapper for mls to make it more
       "interactive". It has two major advantages over plain HTML output from
       mls: the user can choose the output language and the number of TOP
       items to show. It works by running mls with the appropriate
       command-line options. It is safe, because the only two items taken
       from the user are the language and topXX, and both are checked with a
       regexp, so running arbitrary code is not possible. The wrapper can also
       alter the mls output, for example by changing @ in email addresses to
       (at) to hinder address harvesting by spammers.
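
       The two safety measures described above, validating user input before
       it reaches the command line and masking @ in the output, can be
       sketched in Python. This is not the PHP wrapper's code: the patterns
       (two letters for the language, one to three digits for topXX) and all
       names are assumptions, and a plain string replacement stands in for
       the wrapper's regexp.

```python
import re
from typing import List

LANG_RE = re.compile(r"[A-Za-z]{2}")   # assumed shape of a language code
TOP_RE = re.compile(r"[0-9]{1,3}")     # assumed shape of a topXX value


def safe_args(lang: str, top: str) -> List[str]:
    """Build an mls argument list from user input, rejecting anything odd."""
    if not LANG_RE.fullmatch(lang) or not TOP_RE.fullmatch(top):
        raise ValueError("rejected user input")
    # An argument list (rather than a shell string) avoids shell parsing.
    return ["mls", "-l", lang.upper(), "-n", top]


def obfuscate(text: str) -> str:
    """Replace @ with (at) in the generated output."""
    return text.replace("@", "(at)")
```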

       You can use a normal MBOX file as input, but I recommend using a cache
       file. With a cache file, the stats are produced in a moment. To see how
       long it took to generate the page, look at the last line of the HTML
       source. There is, however, a minor speed problem: it takes longer when
       you ask for many topXX items (like 999). The cause is the regexp that
       searches for @. It has to search the whole mls output at once, and
       when the output is large this takes a while (1.1 seconds on my 2.1GHz
       Pentium 4). I have added an option that uses the Perl-compatible regex
       function (preg_replace) instead of the POSIX one (ereg_replace), if
       available. This results in MUCH faster execution (50ms instead of
       1.1sec).

NOTES
   How is it all computed?
       Let's start from the beginning: the format of an MBOX file. It is a
       plain text file containing email messages separated by one empty line.
       Each message starts with a line like this:
           From abc@a.sk  Thu Aug 16 15:48:58 2001
       After this line come a few headers, one empty line and the message
       text. Storing emails in this format is quite common: your incoming
       mail is usually saved in MBOX format, and so are your folders in mail
       readers like elm(1), pine(1), mutt(1)...
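
       The MBOX layout just described is simple enough to split by hand. The
       Python sketch below treats a line starting with "From " that follows
       an empty line (or the start of the file) as the beginning of a new
       message, then separates headers from body at the first empty line. It
       illustrates the format only and is not the parser used by mls; the
       function name is hypothetical. Python's standard mailbox module offers
       a complete implementation.

```python
from typing import List, Tuple


def split_mbox(text: str) -> List[Tuple[str, str]]:
    """Split MBOX text into (headers, body) pairs, one per message."""
    messages: List[List[str]] = []
    previous_blank = True               # start of file counts as a boundary
    for line in text.splitlines():
        if line.startswith("From ") and previous_blank:
            messages.append([])         # separator line: a new message begins
        elif messages:
            messages[-1].append(line)
        previous_blank = (line == "")
    result = []
    for lines in messages:
        # Headers end at the first empty line; the rest is the body.
        headers, _, body = "\n".join(lines).partition("\n\n")
        result.append((headers, body))
    return result
```

       With this split, len(body) roughly corresponds to the message size
       counted in the stats, since headers are excluded.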

       Who is the author of an email message? The author is taken from the
       From: header field, and everything except the actual email address
       (such as your full name) is stripped off using a quite simple regular
       expression (regexp).
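
       For comparison, the same reduction of a From: header to a bare address
       can be done in Python with the standard library parser. Note that this
       swaps in email.utils.parseaddr for the simple regexp that mls uses;
       the wrapper function name is hypothetical.

```python
from email.utils import parseaddr


def author_address(from_header: str) -> str:
    """Return only the email address from a From: header value."""
    _display_name, address = parseaddr(from_header)
    return address
```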

       The subject is taken from the Subject: header field. If it contains
       any Re: prefixes, they are stripped off (up to 5 of them). The counted
       format (Re[3]:), used for example by The Bat! email client, is also
       supported. MIME-decoding is applied to subject lines (see below).
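
       Stripping reply prefixes, including the counted Re[3]: form and the
       limit of 5, might look like this in Python. The exact pattern mls uses
       is not given here, so the regular expression (and its case
       insensitivity) is an assumption, and the names are hypothetical.

```python
import re

# "Re:" or counted "Re[3]:", with optional surrounding whitespace.
RE_PREFIX = re.compile(r"\s*re(\[\d+\])?:\s*", re.IGNORECASE)


def strip_reply_prefixes(subject: str, limit: int = 5) -> str:
    """Remove up to `limit` leading reply prefixes from a subject line."""
    for _ in range(limit):
        match = RE_PREFIX.match(subject)
        if not match:
            break
        subject = subject[match.end():]
    return subject.strip()
```

       Normalizing subjects this way lets replies be counted together with
       the message that started the thread.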

       The date is simply everything in the Date: header. This header is
       generated by the email client, so it is the date of message creation,
       and it doesn't have to be present in every message. If it is missing,
       the output warns you with a message like "Warning: 1 message(s) not
       counted." Some clients don't put a full date there (usually the day of
       the week is missing), and you are warned about that too. Time zones are
       not considered; the date is taken as-is.

       The message size is everything between the end of the message header
       and the beginning of the next email (or the end of the file). So only
       the size of the message text (body) is counted, not the headers.

       Email clients are taken from the X-Mailer:, User-Agent: or
       X-Newsreader: headers, and some grouping is done to keep different
       versions of the same mailer from filling the whole TOP 10. There is
       also a work-around for the Pine mailer (mls also searches the
       Message-ID: header).

   What is quoting? Why is mine 95%?
       What is quoting? When you reply to a message, you can insert part of
       the original message into your reply; you quote the author of the
       original message. Every line of the original text is usually prefixed
       with > or MP>, where MP are the initials of the original sender's name
       (The Bat!, for example, uses this second format).

       And what is the "quote ratio"? It is the size of the quoted text
       divided by the total size of the message, expressed in percent. It is
       included in the stats because many people reply to a message by adding
       one line of text and leaving, for example, 10 pages of the original
       text in place, which pushes the quote ratio above 90%! In the days of
       FIDONET, there were conferences where a quote ratio higher than 50% was
       forbidden. Think about it when replying to a message on a mailing list
       where more than 300 people will download and read it.
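
       As a worked illustration of the quote ratio, the Python sketch below
       counts a line as quoted when it starts with > or with a few letters
       followed by > (such as MP>). The pattern, the decision to count line
       endings in the sizes, and the names are assumptions; mls may detect
       quoted lines differently.

```python
import re

# ">" optionally preceded by up to three initials, e.g. "> text" or "MP> text".
QUOTED_LINE = re.compile(r"\s*[A-Za-z]{0,3}>")


def quote_ratio(body: str) -> float:
    """Return the size of quoted text as a percentage of the body size."""
    total = len(body)
    if total == 0:
        return 0.0
    quoted = sum(len(line)
                 for line in body.splitlines(keepends=True)
                 if QUOTED_LINE.match(line))
    return 100.0 * quoted / total
```

       For the body "> abc\nxyz\n" the quoted line is 6 characters out of 10,
       which gives a quote ratio of 60%.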

   And now all the stats
       First come the TOP 10 tables (or TOP XX when using the -n XX
       parameter). The first table shows the people who have written the most
       messages, how many they wrote and what percentage of the total message
       count that is. Its last row shows "other": the number of messages
       written by everyone not listed above, and the corresponding
       percentage. The second and third tables are similar; they also rank
       authors, but not by the number of messages written. Authors are sorted
       by the total (or average) size of all their messages, not counting
       quoted text (size of the message minus how much was quoted in it). The
       next table shows the most successful subjects and how many messages
       were posted with each subject. Another table shows the most used email
       clients. The last table shows the people with the highest quote ratio,
       computed as the sum of quoted text in all of a person's messages
       divided by the total size of those messages. Its last row shows the
       average: the sum of quoted text in all messages divided by the total
       size of all messages.
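
       The first table (top authors by message count, plus an "other" row)
       can be reproduced with a counter. This Python sketch shows the
       arithmetic only; the function name and the tuple layout are
       hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_table(authors: Iterable[str], n: int = 10) -> List[Tuple[str, int, float]]:
    """Return (author, messages, percent) rows for the top n, then "other"."""
    counts = Counter(authors)           # one entry per message author
    total = sum(counts.values())
    rows = [(author, count, 100.0 * count / total)
            for author, count in counts.most_common(n)]
    # Everything not listed above is summed into a final "other" row.
    other = total - sum(count for _, count, _ in rows)
    rows.append(("other", other, 100.0 * other / total if total else 0.0))
    return rows
```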

       The next part of the stats consists of graphs. They show how many
       messages were written during different hours of the day, days of the
       month and days of the week. From these you can see, for example, when
       (and how much) people sleep :) or whether they work during working
       hours or just write tons of messages...
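
       The data behind such graphs is a set of per-hour and per-weekday
       counters filled from the Date: headers. The Python sketch below uses
       the standard email.utils.parsedate, which, like the behaviour
       described on this page, ignores the time zone; dates that cannot be
       parsed are tallied as "not counted". It is an illustration, not the
       mls implementation, and the names are hypothetical.

```python
from collections import Counter
from datetime import date
from email.utils import parsedate
from typing import Iterable, Tuple


def activity_histograms(date_headers: Iterable[str]) -> Tuple[Counter, Counter, int]:
    """Count messages per hour of day and per weekday (0 = Monday)."""
    by_hour: Counter = Counter()
    by_weekday: Counter = Counter()
    not_counted = 0
    for header in date_headers:
        parsed = parsedate(header)      # time zone offset is ignored
        if parsed is None:              # missing or unparsable date
            not_counted += 1
            continue
        year, month, day, hour = parsed[:4]
        by_hour[hour] += 1
        by_weekday[date(year, month, day).weekday()] += 1
    return by_hour, by_weekday, not_counted
```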

       The next part contains info about messages which are BEST in something:
       the message with the highest quote ratio, the longest message, and
       some details about the most successful subject.

       At the end there is a final summary: the total number of messages,
       their total and average size, and the number of different authors and
       subjects.

   MIME	(Multipurpose Internet Mail Extensions)
       What is it? The original implementation of email permitted only 7-bit
       ASCII messages. Over time, the need arose to send international text
       and even binary files. MIME defines how these can be encoded into a
       7-bit form suitable for emailing and how to decode them back into
       human-readable form.

       In an email message, the text (body of the message) can be
       MIME-encoded, and so can some headers, for example the Subject and
       From fields. mls tries to find out whether subject lines are
       MIME-encoded and, if so, tries to decode them in order to present them
       in human-readable form. You can read more about MIME in RFC 1521 and
       1522.
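
       To see what a MIME-encoded subject looks like before and after
       decoding, here is a short Python example using the standard
       email.header module. It is not the decoder used by mls; in particular,
       Python also converts the character set, which this page lists under
       BUGS/TODO for mls. The function name is hypothetical.

```python
from email.header import decode_header, make_header


def decode_subject(raw: str) -> str:
    """Decode MIME encoded-words in a header value into readable text."""
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    # An ISO-8859-1 subject in quoted-printable ("Q") encoding.
    print(decode_subject("=?iso-8859-1?Q?Caf=E9?="))
```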

   Inspiration
       I was inspired by a similar DOS program used a few years ago in FIDONET
       and Slovak ULTRANET. It was created by Ivan Friedlander.

BUGS/TODO
             doesn't support header fields split across several lines (you can
              use formail(1) to put them on one line before using mls)

             charset conversion in MIME-decoding

             more stats
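
       Regarding the first item: a header field that continues on the next
       line does so with leading whitespace, so joining such lines is a small
       preprocessing step. The Python sketch below shows the idea; it is an
       illustration with a hypothetical function name, not a replacement for
       formail(1).

```python
import re


def unfold_headers(header_block: str) -> str:
    """Join continuation lines (those starting with space or tab)."""
    return re.sub(r"\r?\n[ \t]+", " ", header_block)
```

       Apply it to the header block only; body lines that start with
       whitespace must be left alone.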

VERSION
       This man	page is	written	for mls	version	1.3.

AUTHOR
       mls (MailListStat) is written by	Marek -Marki- Podmaka.

SEE ALSO
       Visit <https://github.com/marki555/MailListStat> for more information
       and the latest version of mls.

COPYING
       MailListStat - print useful statistics on email messages	Copyright  (C)
       2001-2003  Marek	Podmaka

       This program is free software; you can redistribute it and/or modify it
       under  the  terms of the	GNU General Public License as published	by the
       Free Software Foundation; either	version	2 of the License, or (at  your
       option) any later version.

       This  program  is  distributed  in the hope that	it will	be useful, but
       WITHOUT ANY  WARRANTY;  without	even  the  implied  warranty  of  MER-
       CHANTABILITY  or	FITNESS	FOR A PARTICULAR PURPOSE.  See the GNU General
       Public License for more details.

       You should have received	a copy of the GNU General Public License along
       with this program; if not, write	to the Free Software Foundation, Inc.,
       59 Temple Place,	Suite 330, Boston, MA  02111-1307  USA

Utils				   June	2003				mls(1)
