rwcombine(1)			SiLK Tool Suite			  rwcombine(1)

NAME
       rwcombine - Combine flows denoting a long-lived session into a single
       flow

SYNOPSIS
	 rwcombine [--actions=ACTIONS] [--ignore-fields=FIELDS]
	       [--max-idle-time=NUM]
	       [{--print-statistics | --print-statistics=FILENAME}]
	       [--temp-directory=DIR_PATH] [--buffer-size=SIZE]
	       [--note-add=TEXT] [--note-file-add=FILE]
	       [--compression-method=COMP_METHOD] [--print-filenames]
	       [--output-path=PATH] [--site-config-file=FILENAME]
	       {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

	 rwcombine --help

	 rwcombine --help-fields

	 rwcombine --version

DESCRIPTION
       rwcombine reads SiLK Flow records from one or more input	sources,
       searches	for flow records where the attributes field denotes records
       that were prematurely created or	were continuations of prematurely
       created flows, and attempts to combine those records into a single
       record.	All the	unmodified SiLK	records	and the	combined records are
       written to the file specified by	the --output-path switch or to the
       standard	output when the	--output-path switch is	not provided and the
       standard	output is not connected	to a terminal.

       Some flow exporters, such as yaf(1), provide fields that	describe
       characteristics about the flow record, and these	characteristics	are
       stored in the attributes	field of SiLK Flow records.  The two flags
       that rwcombine considers	are:

       "T" The flow generator prematurely created a record for a long-lived
	   session due to the connection's lifetime reaching the active
	   timeout of the flow generator.  (Also, when yaf is run with the
	   --silk switch, it prematurely creates a flow	and marks it with "T"
	   if the byte count of	the flow cannot	be stored in a 32-bit value.)

       "C" The flow generator created this flow	as a continuation of
	   a long-running connection, where the previous flow for this
	   connection met a timeout.  (yaf only	sets this flag when it is
	   invoked with	the --silk switch.)

       A very long-running session may be represented by multiple flow
       records,	where the first	record is marked with the "T" flag, the	final
       record is marked	with the "C" flag, and intermediate records are	marked
       with both "C" (this record continues an earlier flow) and "T" (this
       record also met the active time-out).  rwcombine	attempts to combine
       these multiple flow records into	a single record.

       The input to rwcombine does not need to be sorted.  As part of its
       processing, rwcombine may re-order the records before writing them.

       rwcombine reads SiLK Flow records from the files	named on the command
       line or from the	standard input when no file names are specified	and
       --xargs is not present.	To read	the standard input in addition to the
       named files, use	"-" or "stdin" as a file name.	If an input file name
       ends in ".gz", the file is uncompressed as it is	read.  When the
       --xargs switch is provided, rwcombine reads the names of	the files to
       process from the	named text file	or from	the standard input if no file
       name argument is	provided to the	switch.	 The input to --xargs must
       contain one file	name per line.

   Algorithm
       The algorithm rwcombine uses to combine records is

       1.  rwcombine reads SiLK	flow records, examines the attributes field on
	   each record, and immediately writes to the destination stream all
	   records where neither the time-out flag ("T") nor the continuation
	   flag ("C") is set.  Records where one or both of those flags
	   are set are stored until all input records have been read.

       2.  rwcombine groups the	stored records into bins where the following
	   fields for each record in each bin are identical: sIP, dIP, sPort,
	   dPort, protocol, sensor, in,	out, nhIP, application,	class, and
	   type.

       3.  For each bin, the records are sorted by time (sTime and elapsed).

       4.  Within a bin, rwcombine combines two	records	into a single record
	   when	the attributes field of	the first record has the "T"
	   (time-out) flag set and the second record has the "C"
	   (continuation) flag set.  When combining records, the bytes and
	   packets fields are summed, the initialFlags field of the first
	   record is kept, the sessionFlags field becomes the bit-wise OR of
	   both	sessionFlags fields and	the second record's initialFlags
	   field, and the eTime	is set to that of the second flow.

       5.  If the second record's "T" flag was set, rwcombine checks to	see if
	   the third record's "C" flag is set.	If it is, the third record
	   becomes part	of the new record.

       6.  The previous	step repeats for the records in	the bin	until the bin
	   contains a single record, the most recently added record did	not
	   have	the "T"	flag set, or the next record in	the bin	does not have
	   the "C" flag	set.

       7.  After examining a bin, rwcombine writes the record(s) the bin
	   contains to the destination stream.

       8.  Steps 3 through 7 are repeated for each bin.
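
       The steps above can be sketched in Python.  This is a minimal
       illustration under stated assumptions, not rwcombine's actual
       implementation: the Flow class, its attribute names, and the merge()
       and combine() functions are hypothetical, all records are held in
       memory, and the optional max_idle argument models the --max-idle-time
       switch.  How the "C" flag of a combined record is set, and whether a
       new chain is started after a chain breaks, are also assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified stand-in for a SiLK Flow record."""
    key: tuple          # grouping fields from Step 2 (sIP, dIP, sPort, ...)
    stime: float        # start time in seconds
    etime: float        # end time in seconds
    nbytes: int
    packets: int
    init_flags: int     # initialFlags bit mask
    sess_flags: int     # sessionFlags bit mask
    timeout: bool       # "T" attribute
    continuation: bool  # "C" attribute


def merge(a, b):
    """Step 4: fold continuation record b into record a."""
    return replace(
        a,
        etime=b.etime,
        nbytes=a.nbytes + b.nbytes,
        packets=a.packets + b.packets,
        sess_flags=a.sess_flags | b.sess_flags | b.init_flags,
        timeout=b.timeout,  # assumption: "T" follows the newest record
    )


def combine(flows, max_idle=None):
    out, bins = [], defaultdict(list)
    for f in flows:                          # Step 1
        if f.timeout or f.continuation:
            bins[f.key].append(f)            # Step 2
        else:
            out.append(f)
    for recs in bins.values():               # Step 8
        recs.sort(key=lambda f: (f.stime, f.etime - f.stime))  # Step 3
        cur = recs[0]
        for nxt in recs[1:]:                 # Steps 4 through 6
            idle = nxt.stime - cur.etime
            within_limit = max_idle is None or idle <= max_idle
            if cur.timeout and nxt.continuation and within_limit:
                cur = merge(cur, nxt)
            else:
                out.append(cur)              # assumption: start a new chain
                cur = nxt
        out.append(cur)                      # Step 7
    return out
```

       Applied to five records flagged T, TC, TC, TC, and C that share one
       key, this sketch produces a single record with neither flag set and
       with the byte and packet counts summed.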

       The --ignore-fields switch allows the user to remove fields from	the
       set that	rwcombine uses when grouping records in	Step 2.

       When combining two records into one (Step 4), rwcombine by default
       ignores the difference between the first record's end-time and the
       second record's start-time (the idle time).  To tell rwcombine not to
       combine those records when the idle time is greater than a limit,
       specify that limit as the argument to the --max-idle-time switch.

       To see information on the number	of flows combined and the minimum and
       maximum idle times, specify the --print-statistics switch.

       During its processing, rwcombine	will try to allocate a large (near
       2GB) in-memory array to hold the	records.  (You may use the
       --buffer-size switch to change this maximum buffer size.)  If more
       records are read	than will fit into memory, the in-core records are
       temporarily stored on disk as described by the --temp-directory switch.
       When all	records	have been read,	the on-disk files are merged to
       produce the output.

       By default, the temporary files are stored in the /tmp directory.
       Because the sizes of the	temporary files	may be large, it is strongly
       recommended that	/tmp not be used as the	temporary directory, and
       rwcombine will print a warning when /tmp	is used.  To modify the
       temporary directory used	by rwcombine, provide the --temp-directory
       switch, set the SILK_TMPDIR environment variable, or set	the TMPDIR
       environment variable.
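
       The precedence among these settings can be sketched in Python.  The
       function name resolve_temp_dir and its parameters are hypothetical;
       the sketch assumes only the order documented for the
       --temp-directory switch and treats an empty value as unset, which is
       itself an assumption.

```python
import os


def resolve_temp_dir(cli_value=None, environ=None):
    """Return the temporary directory by documented precedence.

    cli_value stands for the argument to --temp-directory, if any.
    """
    env = os.environ if environ is None else environ
    for candidate in (cli_value, env.get("SILK_TMPDIR"), env.get("TMPDIR")):
        if candidate:  # assumption: empty strings count as "not set"
            return candidate
    return "/tmp"      # documented default
```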

OPTIONS
       Option names may	be abbreviated if the abbreviation is unique or	is an
       exact match for an option.  A parameter to an option may	be specified
       as --arg=param or --arg param, though the first form is required	for
       options that take optional parameters.

       --actions=ACTIONS
	   Select the type of action(s)	that rwcombine should take to combine
	   the input records.  The default action is "all", and	the following
	   actions are supported:

	   all Perform all the actions described below.

	   timeout
	       Combine into a single flow record those records where the
	       timeout flags in	the attributes field indicate that the flow
	       exporter	has divided a long-lived session into multiple flow
	       records.

	   This	switch is provided for future expansion	of rwcombine, since at
	   present rwcombine supports a	single action.	When writing a script
	   that uses rwcombine, specify --actions=timeout for compatibility
	   with	future versions	of rwcombine.

       --ignore-fields=FIELDS
	   Ignore the fields listed in FIELDS when determining if two flow
	   records should be grouped into the same bin;	that is, treat FIELDS
	   as being identical across all flows.	 By default, rwcombine puts
	   records into	a bin when the records have identical values for the
	   following fields: sIP, dIP, sPort, dPort, protocol, sensor, in,
	   out,	nhIP, application, class, and type.

	   FIELDS is a comma separated list of field-names, field-integers,
	   and ranges of field-integers; a range is specified by separating
	   the start and end of	the range with a hyphen	(-).  Field-names are
	   case-insensitive.  Example:

	    --ignore-fields=sensor,12-15

	   The supported fields are:

	   sIP,1
	       source IP address

	   dIP,2
	       destination IP address

	   sPort,3
	       source port for TCP and UDP, or equivalent

	   dPort,4
	       destination port	for TCP	and UDP, or equivalent

	   protocol,5
	       IP protocol

	   sensor,12
	       name or ID of sensor at the collection point

	   in,13
	       router SNMP input interface or vlanId if	packing	tools were
	       configured to capture it	(see sensor.conf(5))

	   out,14
	       router SNMP output interface or postVlanId

	   nhIP,15
	       router next hop IP

	   class,20,type,21
	       class and type of sensor	at the collection point	(represented
	       internally by a single value)

	   application,29
	       guess as	to the content of the flow.  Some software that
	       generates flow records from packet data,	such as	yaf(1),	will
	       inspect the contents of the packets that	make up	a flow and use
	       traffic signatures to label the content of the flow.  SiLK
	       calls this label	the application; yaf refers to it as the
	       appLabel.  The application is the port number that is
	       traditionally used for that type	of traffic (see	the
	       /etc/services file on most UNIX systems).  For example, traffic
	       that the	flow generator recognizes as FTP will have a value of
	       21, even	if that	traffic	is being routed	through	the standard
	       HTTP/web	port (80).
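
	   As an illustration of the FIELDS syntax, the following Python
	   sketch expands a list such as "sensor,12-15" into a set of
	   field-integers.  The names FIELD_IDS and parse_fields are
	   hypothetical, and rejecting integers that are not in the table
	   above is an assumption; rwcombine's own parser may behave
	   differently.

```python
# Field names (lower-cased) and field-integers from the table above.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "sensor": 12, "in": 13, "out": 14, "nhip": 15,
    "class": 20, "type": 21, "application": 29,
}


def parse_fields(spec):
    """Expand a FIELDS string into a set of field-integers."""
    ids = set()
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:                  # range of field-integers
            lo, hi = (int(part) for part in token.split("-", 1))
            ids.update(range(lo, hi + 1))
        elif token.isdigit():             # single field-integer
            ids.add(int(token))
        else:                             # case-insensitive field-name
            ids.add(FIELD_IDS[token.lower()])
    unknown = ids - set(FIELD_IDS.values())
    if unknown:                           # assumption: unknown IDs are errors
        raise ValueError(f"unsupported field(s): {sorted(unknown)}")
    return ids
```

	   With this sketch, "sensor,12-15" expands to the field-integers
	   12, 13, 14, and 15, that is, sensor, in, out, and nhIP.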

       --max-idle-time=NUM
	   Do not combine flow records when the start time of the second flow
	   record is more than NUM seconds after the end time of the first
	   flow record.  NUM may be fractional.  If not specified, the maximum
	   idle time is treated as infinite.

       --print-statistics
       --print-statistics=FILENAME
	   Print to the	standard error or to the specified FILENAME the	number
	   of flow records read and written, the number of flows that did not
	   require combining, the number of flows combined, the	number that
	   could not be	combined, and minimum and maximum idle time between
	   combined flow records.

       --temp-directory=DIR_PATH
	   Specify the name of the directory in	which to store data files
	   temporarily when more records have been read than will fit into
	   RAM.	 This switch overrides the directory specified in the
	   SILK_TMPDIR environment variable, which overrides the directory
	   specified in	the TMPDIR variable, which overrides the default,
	   /tmp.

       --buffer-size=SIZE
	   Set the maximum size	of the buffer to use for holding the records,
	   in bytes.  A	larger buffer means fewer temporary files need to be
	   created, reducing the I/O wait times.  The default maximum for this
	   buffer is near 2GB.	The SIZE may be	given as an ordinary integer,
	   or as a real	number followed	by a suffix "K", "M" or	"G", which
	   represents the numerical value multiplied by	1,024 (kilo),
	   1,048,576 (mega), and 1,073,741,824 (giga), respectively.  For
	   example, 1.5K represents 1,536 bytes, or one	and one-half
	   kilobytes.  (This value does	not represent the absolute maximum
	   amount of RAM that rwcombine	will allocate, since additional
	   buffers will	be allocated for reading the input and writing the
	   output.)
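
	   The suffix arithmetic can be checked with a short Python sketch.
	   The function name parse_size is hypothetical, and the sketch
	   accepts only the upper-case suffixes named above; whether
	   rwcombine accepts other spellings is not stated here.

```python
def parse_size(text):
    """Convert a SIZE string such as "1.5K" or "4096" to bytes."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1:]
    if suffix in multipliers:
        # Real number followed by a suffix.
        return int(float(text[:-1]) * multipliers[suffix])
    # Ordinary integer without a suffix.
    return int(text)
```

	   For example, parse_size("1.5K") gives 1536, matching the example
	   above.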

       --output-path=PATH
	   Write the binary SiLK Flow records to PATH, where PATH is a
	   filename, a named pipe, the keyword "stderr"	to write the output to
	   the standard	error, or the keyword "stdout" or "-" to write the
	   output to the standard output.  If PATH names an existing file,
	   rwcombine exits with	an error unless	the SILK_CLOBBER environment
	   variable is set, in which case PATH is overwritten.	If this	switch
	   is not given, the output is written to the standard output.
	   Attempting to write the binary output to a terminal causes
	   rwcombine to	exit with an error.

       --note-add=TEXT
	   Add the specified TEXT to the header	of the output file as an
	   annotation.	This switch may	be repeated to add multiple
	   annotations to a file.  To view the annotations, use	the
	   rwfileinfo(1) tool.

       --note-file-add=FILENAME
	   Open	FILENAME and add the contents of that file to the header of
	   the output file as an annotation.	This switch may	be repeated to
	   add multiple	annotations.  Currently	the application	makes no
	   effort to ensure that FILENAME contains text; be careful that you
	   do not attempt to add a SiLK	data file as an	annotation.

       --compression-method=COMP_METHOD
	   Specify the compression library to use when writing output files.
	   If this switch is not given,	the value in the
	   SILK_COMPRESSION_METHOD environment variable	is used	if the value
	   names an available compression method.  When	no compression method
	   is specified, output	to the standard	output or to named pipes is
	   not compressed, and output to files is compressed using the default
	   chosen when SiLK was	compiled.  The valid values for	COMP_METHOD
	   are determined by which external libraries were found when SiLK was
	   compiled.  To see the available compression methods and the default
	   method, use the --help or --version switch.	SiLK can support the
	   following COMP_METHOD values	when the required libraries are
	   available.

	   none
	       Do not compress the output using	an external library.

	   zlib
	       Use the zlib(3) library for compressing the output, and always
	       compress	the output regardless of the destination.  Using zlib
	       produces	the smallest output files at the cost of speed.

	   lzo1x
	       Use the lzo1x algorithm from the	LZO real time compression
	       library for compression,	and always compress the	output
	       regardless of the destination.  This compression	provides good
	       compression with	less memory and	CPU overhead.

	   snappy
	       Use the snappy library for compression, and always compress the
	       output regardless of the	destination.  This compression
	       provides	good compression with less memory and CPU overhead.
	       Since SiLK 3.13.0.

	   best
	       Use lzo1x if available, otherwise use snappy if available,
	       otherwise use zlib if available.	 Only compress the output when
	       writing to a file.
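
	   The selection rules above can be sketched in Python.  The
	   function name effective_method and its parameters are
	   hypothetical; falling back to "none" when no library is available
	   is an assumption, and the sketch does not model the
	   SILK_COMPRESSION_METHOD variable or the compiled-in default.

```python
def effective_method(requested, available, dest_is_file):
    """Return the compression method a COMP_METHOD value resolves to.

    requested    -- one of "none", "zlib", "lzo1x", "snappy", "best"
    available    -- set of methods whose libraries were found
    dest_is_file -- False for the standard output or a named pipe
    """
    if requested != "best":
        # Explicit methods apply regardless of the destination.
        return requested
    if not dest_is_file:
        # "best" compresses only when writing to a file.
        return "none"
    for name in ("lzo1x", "snappy", "zlib"):
        if name in available:
            return name
    return "none"  # assumption: nothing available means no compression
```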

       --print-filenames
	   Print to the	standard error the names of input files	as they	are
	   opened.

       --site-config-file=FILENAME
	   Read	the SiLK site configuration from the named file	FILENAME.
	   When	this switch is not provided, rwcombine searches	for the	site
	   configuration file in the locations specified in the	"FILES"
	   section.

       --xargs
       --xargs=FILENAME
	   Read	the names of the input files from FILENAME or from the
	   standard input if FILENAME is not provided.	The input is expected
	   to have one filename	per line.  rwcombine opens each	named file in
	   turn	and reads records from it as if	the filenames had been listed
	   on the command line.

       --help
	   Print the available options and exit.

       --help-fields
	   Print the description and alias(es) of each field and exit.

       --version
	   Print the version number and	information about how SiLK was
	   configured, then exit the application.

EXAMPLES
       In the following	examples, the dollar sign ("$")	represents the shell
       prompt.	The text after the dollar sign represents the command line.
       Lines have been wrapped for improved readability, and the backslash
       ("\") is	used to	indicate a wrapped line.

       Use rwfilter(1) to find ssh flow	records	that involve the host
       192.168.126.252.	 The output from rwcut(1) shows	the flow exporter
       split this long-lived ssh session into multiple flow records:

	$ rwfilter --saddr=192.168.126.252 --dport=22 --pass=- data.rw \
	  | rwcut --fields=flags,attributes,stime,etime
	   flags|attribut|		    sTime|		    eTime|
	 S PA	|T	 |2009/02/13T00:29:59.563|2009/02/13T00:59:39.668|
	   PA	|TC	 |2009/02/13T00:59:39.668|2009/02/13T01:29:19.478|
	   PA	|TC	 |2009/02/13T01:29:19.478|2009/02/13T01:58:48.890|
	   PA	|TC	 |2009/02/13T01:58:48.891|2009/02/13T02:28:43.599|
	F  PA	| C	 |2009/02/13T02:28:43.600|2009/02/13T02:32:58.272|

       Here is the other half of that conversation:

	$ rwfilter --daddr=192.168.126.252 --sport=22 --pass=- data.rw \
	  | rwcut --fields=flags,attributes,stime,etime
	   flags|attribut|		    sTime|		    eTime|
	 S PA	|T	 |2009/02/13T00:30:00.060|2009/02/13T00:59:39.667|
	   PA	|TC	 |2009/02/13T00:59:39.670|2009/02/13T01:29:19.478|
	   PA	|TC	 |2009/02/13T01:29:19.481|2009/02/13T01:58:48.890|
	   PA	|TC	 |2009/02/13T01:58:48.893|2009/02/13T02:28:43.599|
	F  PA	| C	 |2009/02/13T02:28:43.600|2009/02/13T02:32:58.271|

       Use rwuniq(1) to	compute	the byte and packet counts for that ssh
       session:

	$ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
	  | rwuniq --fields=sip,dip,sport,dport	--values=records,byte,packets
		    sIP|	    dIP|sPort|dPort|Records|  Bytes|Packets|
	  10.11.156.107|192.168.126.252|   22|28975|	  5|4677240|   3881|
	192.168.126.252|  10.11.156.107|28975|	 22|	  5| 281939|   3891|

       Invoke rwcombine	on these records and store the result in the file
       combined.rw:

	$ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
	  | rwcombine --print-statistics --output-path=combined.rw
	FLOW RECORD COUNTS:
	Read:					 10
	Initially Complete:	      -		  0 *
	Sorted & Examined:	      =		 10
	Missing	end:		      -		  0 *
	Missing	start &	end:	      -		  0 *
	Missing	start:		      -		  0 *
	Prior to combining:	      =		 10
	Eliminated:		      -		  8
	Made complete:		      =		  2 *
	Written:				  2 (sum of *)

	IDLE TIMES:
	Minimum:	0:00:00:00.000
	Penultimate:	0:00:00:00.000
	Maximum:	0:00:00:00.003

       View the	resulting records:

	$ rwcut	--fields=sip,dip,sport,dport,bytes,packets,flags combined.rw
		    sIP|	    dIP|sPort|dPort|  bytes|packets|   flags|
	  10.11.156.107|192.168.126.252|   22|28975|4677240|   3881|FS PA   |
	192.168.126.252|  10.11.156.107|28975|	 22| 281939|   3891|FS PA   |

	$ rwcut	--fields=sip,attributes,stime,etime combined.rw
		    sIP|attribut|		   sTime|		   eTime|
	  10.11.156.107|	|2009/02/13T00:30:00.060|2009/02/13T02:32:58.271|
	192.168.126.252|	|2009/02/13T00:29:59.563|2009/02/13T02:32:58.272|

ENVIRONMENT
       SILK_TMPDIR
	   When	set and	--temp-directory is not	specified, rwcombine writes
	   the temporary files it creates to this directory.  SILK_TMPDIR
	   overrides the value of TMPDIR.

       TMPDIR
	   When	set and	SILK_TMPDIR is not set,	rwcombine writes the temporary
	   files it creates to this directory.

       SILK_CLOBBER
	   The SiLK tools normally refuse to overwrite existing	files.
	   Setting SILK_CLOBBER	to a non-empty value removes this restriction.

       SILK_COMPRESSION_METHOD
	   This	environment variable is	used as	the value for
	   --compression-method	when that switch is not	provided.  Since SiLK
	   3.13.0.

       SILK_CONFIG_FILE
	   This environment variable is used as the value for the
	   --site-config-file switch when it is not provided.

       SILK_DATA_ROOTDIR
	   This environment variable specifies the root directory of the data
	   repository.  As described in the "FILES" section, rwcombine may use
	   this	environment variable when searching for	the SiLK site
	   configuration file.

       SILK_PATH
	   This	environment variable gives the root of the install tree.  When
	   searching for configuration files, rwcombine	may use	this
	   environment variable.  See the "FILES" section for details.

       SILK_TEMPFILE_DEBUG
	   When	set to 1, rwcombine prints debugging messages to the standard
	   error as it creates,	re-opens, and removes temporary	files.

FILES
       ${SILK_CONFIG_FILE}
       ${SILK_DATA_ROOTDIR}/silk.conf
       /data/silk.conf
       ${SILK_PATH}/share/silk/silk.conf
       ${SILK_PATH}/share/silk.conf
       /usr/local/share/silk/silk.conf
       /usr/local/share/silk.conf
	   Possible locations for the SiLK site	configuration file which are
	   checked when	the --site-config-file switch is not provided.
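
	   A Python sketch of how the candidate list could be built from the
	   environment follows.  The function name candidate_site_configs is
	   hypothetical, and the sketch assumes the locations are tried in
	   the order listed above and that entries depending on an unset
	   variable are skipped; neither point is stated in this page.

```python
import posixpath


def candidate_site_configs(environ):
    """List possible silk.conf locations, in the order shown above."""
    paths = []
    if environ.get("SILK_CONFIG_FILE"):
        paths.append(environ["SILK_CONFIG_FILE"])
    if environ.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(environ["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if environ.get("SILK_PATH"):
        root = environ["SILK_PATH"]
        paths.append(posixpath.join(root, "share/silk/silk.conf"))
        paths.append(posixpath.join(root, "share/silk.conf"))
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths
```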

       ${SILK_TMPDIR}/
       ${TMPDIR}/
       /tmp/
	   Directory in	which to create	temporary files.

SEE ALSO
       rwfilter(1), rwcut(1), rwuniq(1), rwfileinfo(1),	sensor.conf(5),
       silk(7),	yaf(1),	zlib(3)

NOTES
       The first release of rwcombine occurred in SiLK 3.9.0.

SiLK 3.22.2			  2025-11-01			  rwcombine(1)
