clfmerge(1)			   logtools			   clfmerge(1)

NAME
       clfmerge	- merge	Common-Log Format web logs based on time-stamps

SYNOPSIS
       clfmerge	[--help	| -h] [-b size]	[-d] [-v] [file	names]

DESCRIPTION
       The clfmerge program is designed to merge multiple web log files
       without using sort.  Web logs for big sites consist of multiple files
       in the >100M size range from a number of machines.  For such files it
       is not practical to use a program such as GNU sort to merge them: the
       data is not always entirely in order (so the merge option of GNU sort
       does not work well), yet it is not in random order either (so a
       complete sort would be wasteful).  In addition, the date field being
       sorted on is not easy to specify as a key for GNU sort (I have seen it
       done, but it was messy).

       This program is designed to sort multiple large log files simply and
       quickly, with no need for temporary storage space or overly large
       in-memory buffers (the memory footprint is generally only a few
       megabytes).

OVERVIEW
       clfmerge takes any number (zero or more) of file names on the command
       line, opens them for reading, and reads CLF web log data from all of
       them.  Lines which do not appear to be in CLF format are rejected and
       written to standard error.  Note that lines are not parsed fully; only
       the minimal parsing needed to determine the date is performed.
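
       As an illustration of the minimal parsing described above, the
       following Python sketch pulls only the bracketed date out of each line
       and sends lines without a usable date to standard error.  It is not
       taken from clfmerge itself: the names clf_timestamp and split_valid
       are hypothetical, and the date layout (dd/Mon/yyyy:hh:mm:ss zone) is
       the conventional CLF one, assumed here.

```python
import sys
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple


def clf_timestamp(line: str) -> Optional[datetime]:
    """Return the date found between [ and ], or None if there is none."""
    start = line.find("[")
    end = line.find("]", start + 1)
    if start == -1 or end == -1:
        return None
    try:
        # Assumed layout, e.g. 10/Oct/2000:13:55:36 -0700
        return datetime.strptime(line[start + 1:end], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None


def split_valid(lines: Iterable[str]) -> Iterator[Tuple[datetime, str]]:
    """Yield (timestamp, line) for dated lines; report the rest on stderr."""
    for line in lines:
        stamp = clf_timestamp(line)
        if stamp is None:
            sys.stderr.write(line if line.endswith("\n") else line + "\n")
        else:
            yield stamp, line
```

       Nothing but the date is inspected, so a line with a valid date and a
       malformed request field would still be accepted by this sketch.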

       If zero files are specified, this is not an error: the program silently
       outputs nothing.  This is for scripts which use the find command to
       locate log files and cannot be counted on to find any; it saves an
       extra check in your shell scripts.

       If one file is specified, the data is read into a 1000-line buffer and
       removed from the buffer (and written to standard output) in date order.
       This handles web servers which date entries by connection time but
       write them to the log at completion time, and thus generate log files
       that are not in order (Netscape web server does this; I haven't checked
       what other web servers do).
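
       The buffering described above can be sketched in Python as a bounded
       window from which the earliest entry is always released first.  This
       is only a sketch of the idea, not the clfmerge implementation: the use
       of a heap, the name window_sort and the (timestamp, line) input pairs
       are all assumptions made for the example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def window_sort(entries: Iterable[Tuple[float, str]],
                size: int = 1000) -> Iterator[str]:
    """Reorder nearly sorted entries using a buffer of at most `size` lines."""
    heap: List[Tuple[float, int, str]] = []
    seq = 0  # tie-breaker: equal time stamps keep their input order
    for stamp, line in entries:
        heapq.heappush(heap, (stamp, seq, line))
        seq += 1
        if len(heap) > size:
            # Buffer is full: release the earliest buffered line.
            yield heapq.heappop(heap)[2]
    # Input exhausted: drain what is left, earliest first.
    while heap:
        yield heapq.heappop(heap)[2]
```

       A window like this only repairs entries that are out of place by fewer
       lines than the buffer holds; an entry displaced further than that
       comes out late.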

       If more than one file is specified, a line is read from each file.  The
       file that had the earliest time stamp is then read from until it
       returns a time stamp later than that of one of the other files, at
       which point reading switches to the file which now has the earliest
       time stamp.  With multiple files the buffer size is 1000 lines or 100 *
       the number of files, whichever is larger.  When the buffer becomes full,
       the first line is removed and written to standard output.
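
       The multi-file case can be sketched in Python as follows.  The sketch
       assumes each input has already been reduced to (timestamp, line)
       pairs; merge_streams and buffer_size are hypothetical names, and the
       heap of per-file head lines is one way to get the reading order
       described above, not necessarily how clfmerge does it.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Entry = Tuple[float, str]


def buffer_size(file_count: int) -> int:
    """1000 lines or 100 * the number of files, whichever is larger."""
    return max(1000, 100 * file_count)


def merge_streams(streams: List[Iterable[Entry]]) -> Iterator[Entry]:
    """Always take the next entry from the stream whose head is earliest."""
    iters = [iter(s) for s in streams]
    heads: List[Tuple[float, int, str]] = []
    # Read one line from each file to start with.
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heads, (first[0], idx, first[1]))
    while heads:
        stamp, idx, line = heapq.heappop(heads)
        yield stamp, line
        # Keep reading the same file; it stays the source for as long as
        # its next time stamp is still the earliest head.
        following = next(iters[idx], None)
        if following is not None:
            heapq.heappush(heads, (following[0], idx, following[1]))
```

       The merged stream would then pass through a sorting buffer of
       buffer_size(n) lines to absorb entries that are slightly out of order
       within each file.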

OPTIONS
       -b buffer-size
              Specify the buffer size to use.  A value of 0 disables the
              sliding-window sorting of the data, which improves speed.

       -d     Turn on domain-name mangling.  If a line starts with the name
              of the site that was requested (for example www.company.com),
              that name is removed from the start of the line and the GET /
              is changed to GET http://www.company.com/, which allows programs
              like Webalizer to produce good graphs for large hosting sites.
              The domain name is also converted to lower case.
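
              The rewrite performed by -d can be illustrated with a short
              Python sketch.  It assumes every input line carries the
              requested site name as its first space-separated field; the
              name mangle_domain and the regular expression are
              hypothetical and only approximate what clfmerge does.

```python
import re

# Matches the start of the quoted request, e.g. "GET /
_REQUEST = re.compile(r'"([A-Z]+) /')


def mangle_domain(line: str) -> str:
    """Move a leading site name into the request URL, in lower case."""
    domain, sep, rest = line.partition(" ")
    if not sep:
        return line  # single field only: nothing to rewrite
    host = domain.lower()
    rewritten, count = _REQUEST.subn(
        lambda m: f'"{m.group(1)} http://{host}/', rest, count=1)
    # Leave the line untouched if no request with a leading / was found.
    return rewritten if count else line
```

              With this sketch a line beginning "WWW.Company.COM 10.0.0.1"
              and containing "GET /index.html" would come out beginning
              "10.0.0.1" and containing
              "GET http://www.company.com/index.html".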

       -v     Be more verbose.

EXIT STATUS
       0 No errors

       1 Bad parameters

       2 Can't open one	of the specified files

       3 Can't write to	output

AUTHOR
       This program, its manual	page, and the Debian package were  written  by
       Russell Coker <russell@coker.com.au>.

SEE ALSO
       clfsplit(1), clfdomainsplit(1)

Russell	Coker <russell@coker.com.au> 0.06			   clfmerge(1)
