FreeBSD Manual Pages

RABINS(1) General Commands Manual RABINS(1)

NAME
rabins - process argus(8) data within specified bins.

SYNOPSIS
rabins [-B secs] [-M splitmode [options]] [raoptions]
[-- filter-expression]

DESCRIPTION
Rabins reads argus data from an argus data source and adjusts the data
so that it is aligned to a set of bins, or slots, that are based on
time, input size, or count. The resulting output is split, modified,
and optionally aggregated so that the data fits the constraints of the
specified bins. rabins is designed to be a combination of rasplit and
racluster, acting on multiple contexts of argus data.

The principal function of rabins is to align input data to a series of
bins, and then process the data within the context of each bin. This
is the basis for real-time stream block processing. Time series stream
block processing is critical for graphing, comparing, analyzing, and
correlating flow data. Fixed load stream block processing, based on
the number of argus data records ('count') or a fixed volume of data
('size'), allows for control of the resources used in processing.
While load based options are very useful, they are rather esoteric.
See the online examples and rasplit(1) for examples of using these
modes of operation.

Time Series Bins
Time series binning is specified using the -M time option. Time bins
are specified by the size and granularity of the time bin. The
granularity, 's'econds, 'm'inutes, 'h'ours, 'd'ays, 'w'eeks, 'M'onths,
or 'y'ears, dictates where the bin boundaries lie. To ensure that 0.5d
and 12h start at the same point in time, second, minute, hour, and day
based bins start at midnight, Jan 1st of the year of processing. Week,
month, and year bins all start on the natural time boundaries for the
period.
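
The boundary rule for second, minute, hour, and day based bins can be
illustrated with a short Python sketch. This is not code from rabins;
the function name bin_start and the use of UTC are assumptions made for
the illustration. It counts bins from midnight, Jan 1st of the record's
year, so a 0.5d bin and a 12h bin land on the same boundary.

```python
from datetime import datetime, timedelta, timezone

def bin_start(ts: datetime, bin_seconds: float) -> datetime:
    """Return the start of the bin that contains ts.

    Bins are counted from midnight, Jan 1st of ts's year. UTC is an
    assumption of this sketch, not something the manual specifies.
    """
    origin = datetime(ts.year, 1, 1, tzinfo=timezone.utc)
    elapsed = (ts - origin).total_seconds()
    index = int(elapsed // bin_seconds)      # whole bins since the origin
    return origin + timedelta(seconds=index * bin_seconds)

ts = datetime(2012, 2, 15, 13, 37, 4, tzinfo=timezone.utc)
print(bin_start(ts, 0.5 * 86400))   # bin size given as 0.5d
print(bin_start(ts, 12 * 3600))     # bin size given as 12h
print(bin_start(ts, 10))            # bin size given as 10s
```

Week, month, and year bins are not covered by this sketch, since they
follow calendar boundaries rather than a fixed number of seconds.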

rabins provides a separate processing context for each bin, so that
aggregation and sorting occur only within the context of each time
period. Records are placed into bins based on load or time. For load
based bins, input records are processed in the order received and are
not modified. When using time based bins, records are placed into bins
based on the starting time of the record. By default, records that
span a time boundary are split into as many records as needed to fit
within the appropriate bins, using the algorithms used by rasplit(1).
Metrics are distributed uniformly across all the appropriate bins. The
result is a series of records and/or fragments that are time aligned,
and appropriate for time series analysis and visualization.
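
As a rough illustration of uniform metric distribution, the following
Python sketch splits one record across the bins it spans and gives each
fragment a share of every metric in proportion to the time it covers.
The function split_record and the record layout are hypothetical, and
the actual algorithms are those of rasplit(1), which may differ in
detail (for example, in how integer counters are rounded).

```python
def split_record(start, end, bin_size, metrics):
    """Split a record covering [start, end) into per-bin fragments.

    Times are in seconds and bins are aligned to multiples of bin_size.
    Each metric is shared out in proportion to the fragment's duration.
    """
    duration = end - start
    if duration <= 0:
        # A zero-duration record falls entirely into one bin.
        return [(start, end, dict(metrics))]
    fragments = []
    cursor = start
    while cursor < end:
        boundary = (cursor // bin_size + 1) * bin_size   # next bin boundary
        frag_end = min(boundary, end)
        share = (frag_end - cursor) / duration
        fragments.append((cursor, frag_end,
                          {k: v * share for k, v in metrics.items()}))
        cursor = frag_end
    return fragments

# A 20 second record crossing two 10 second boundaries.
for frag in split_record(55, 75, 10, {"pkts": 40, "bytes": 2000}):
    print(frag)
```

Note that the first and last fragments keep the record's own start and
end times, so their timestamps do not coincide with bin boundaries.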

When a record is split to conform to a time series bin, the resulting
starting and ending timestamps may or may not coincide with the
timestamps of the bins themselves. For some applications, this
treatment is critical to the analytics that work on the resulting
data, such as transaction duration and flow traffic burst behavior.
However, for other analytics, such as average load and rate analysis
and reporting, the timestamps need to be modified so that they reflect
the time range of the actual time bin boundaries. Rabins supports the
optional hard mode (-M hard) to specify that timestamps should conform
to bin boundaries. One result is that all durations in the reported
records will be the bin duration. This is extremely important when
processing certain time series metrics, such as load.
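
The effect of hard boundaries on a load calculation can be seen in a
small Python sketch. The helper names harden and load_bps are
hypothetical and the numbers are made up; the point is only that the
same byte count yields a different rate depending on whether the
fragment's own duration or the full bin duration is used.

```python
def harden(frag_start, frag_end, bin_size):
    """Snap a fragment's timestamps to the boundaries of its bin."""
    bin_start = (frag_start // bin_size) * bin_size
    return bin_start, bin_start + bin_size

def load_bps(byte_count, start, end):
    """Average load in bits per second over [start, end)."""
    return byte_count * 8 / (end - start)

# A fragment that occupies only the last 5 seconds of the 50-60 bin.
start, end, nbytes = 55, 60, 500
print(load_bps(nbytes, start, end))                # fragment's own duration
print(load_bps(nbytes, *harden(start, end, 10)))   # full bin duration
```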

Load Based Bins
Load based binning is specified using the -M size or -M count options.
Load bins are used to constrain the resources used in bin processing.
A fixed amount of load is input, aggregation is performed on that
input, and when the threshold is reached, the entire aggregation cache
is dumped, reinitialized, and reused. These modes can be used
effectively to provide realtime data reduction within a fixed amount
of memory.
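
The dump-and-reuse cycle can be sketched in Python as follows. This is
an illustration under stated assumptions, not the rabins
implementation: records are plain dictionaries with a 'bytes' metric,
and aggregate_by_count is a hypothetical name.

```python
def aggregate_by_count(records, limit, key):
    """Yield one aggregated cache for every `limit` input records.

    records: iterable of dicts holding a 'bytes' metric (assumed layout)
    key:     function mapping a record to its aggregation key
    """
    cache, seen = {}, 0
    for rec in records:
        k = key(rec)
        cache[k] = cache.get(k, 0) + rec["bytes"]
        seen += 1
        if seen == limit:
            yield cache            # dump the whole cache
            cache, seen = {}, 0    # reinitialize and reuse
    if cache:
        yield cache                # flush what remains at end of input

records = [
    {"proto": "tcp", "bytes": 100},
    {"proto": "udp", "bytes": 50},
    {"proto": "tcp", "bytes": 25},
    {"proto": "udp", "bytes": 10},
    {"proto": "tcp", "bytes": 5},
]
for chunk in aggregate_by_count(records, 3, key=lambda r: r["proto"]):
    print(chunk)
```

Because the cache never holds more than `limit` records' worth of
keys, memory use stays bounded regardless of input length.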

Output Processing
rabins has two basic modes of output. The default holds all output in
main memory until EOF is encountered on input, at which point each
sorted bin is written out. In the second output mode, rabins writes
out the contents of individual sorted bins periodically, based on a
holding time specified using the -B secs option. The secs value
should be chosen such that rabins will have seen all the appropriate
incoming data for that time period. This is determined by the
ARGUS_FLOW_STATUS_INTERVAL used by the collection of argus data sources
in the input data stream, as well as any time drift that may exist
among argus data processing elements. When there is good time sync,
and with an ARGUS_FLOW_STATUS_INTERVAL of 5 seconds, appropriate secs
values are between 5 and 15 seconds.
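
One way to picture the holding time is as a delay added to the end of
each bin before it may be written. The Python sketch below assumes
that reading of -B; flushable_bins is a hypothetical helper and times
are plain seconds.

```python
def flushable_bins(open_bins, now, bin_size, hold_secs):
    """Return the start times of bins that may be written out.

    A bin covering [start, start + bin_size) is released once
    hold_secs have elapsed past its end.
    """
    return sorted(start for start in open_bins
                  if now >= start + bin_size + hold_secs)

open_bins = {0, 10, 20}
print(flushable_bins(open_bins, now=24, bin_size=10, hold_secs=5))
print(flushable_bins(open_bins, now=25, bin_size=10, hold_secs=5))
```

With 10 second bins and a 5 second hold, the bin starting at 10 is
released at time 25, by which point late records for that bin should
have arrived.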

The output of rabins when using the -B secs option is appropriate to
drive a number of processing elements, such as near real-time
visualizations, alarms, and reporting.

Output Stream
Like all ra(1) client programs, the output of rabins is an argus data
stream that can be written as binary data to a file or standard
output, or can be printed. rabins supports all the output functions
provided by rasplit(1).

The output file name consists of a prefix, which is specified using
the ra -w option, and, for all modes except time mode, a suffix, which
is created for each resulting file. If no prefix is provided, then
rabins will use 'x' as the default prefix. The suffix that is used is
determined by the mode of operation. When rabins is using the default
count mode or the size mode, the suffix is a group of letters 'aa',
'ab', and so on, such that concatenating the output files in sorted
order by file name produces the original input file. If rabins needs
to create more output files than are allowed by the default suffix
strategy, more letters will be added in order to accommodate the needed
files.
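
A letter suffix scheme of this kind can be sketched in Python. How
rabins actually adds letters is not described here, so the widening
rule below (pick the smallest width that fits the requested count) is
an assumption, and suffixes is a hypothetical helper.

```python
from itertools import islice, product
from string import ascii_lowercase

def suffixes(count, min_width=2):
    """Return `count` suffixes ('aa', 'ab', ...) in sorted order."""
    width = min_width
    while len(ascii_lowercase) ** width < count:
        width += 1               # add letters until every file gets a name
    names = ("".join(p) for p in product(ascii_lowercase, repeat=width))
    return list(islice(names, count))

print(suffixes(3))
print(suffixes(677)[-1])
```

Because all suffixes in one run share the same width, sorting the
resulting file names reproduces the order in which they were created.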

When rabins is splitting based on time, rabins uses a default extension
of %Y.%m.%d.%H.%M.%S. This default can be overridden by adding a '%'
extension to the name provided using the -w option.
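
The expansion of '%' conversions in a file name can be previewed with
Python's strftime, which follows the same conversion specifiers as
strftime(3). The template is the one used in the INVOCATION section;
output_name and the sample bin start time are hypothetical.

```python
from datetime import datetime

def output_name(template, bin_start):
    """Expand strftime(3) conversions in an output file name."""
    return bin_start.strftime(template)

start = datetime(2012, 2, 15, 13, 35, 0)
print(output_name("/matrix/%Y/%m/%d/argus.%H.%M.%S", start))
```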

When standard out is specified, using -w -, rabins will output a single
argus-stream with START and STOP argus management records inserted ap-
propriately to indicate where the output is split. See argus(8) for
more information on output stream formats.

When rabins is splitting on output record count (the default), the
number of records is specified as an ordinal counter; the default is
1000 records. When rabins is splitting based on the maximum output
file size, the size is specified in bytes. The scale of the bytes can
be specified by appending 'b', 'k', or 'm' to the number provided.
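
The scale letters can be read as multipliers, as in the Python sketch
below. Whether rabins treats 'k' as 1000 or 1024 is not stated in this
page, so the multiplier is left as a parameter; parse_scaled is a
hypothetical helper and its default base of 1000 is an assumption.

```python
def parse_scaled(value, base=1000):
    """Convert strings such as '512', '64b', '1k' or '1m' to an integer.

    base is the multiplier for 'k' (and base squared for 'm').
    """
    scale = {"b": 1, "k": base, "m": base * base}
    suffix = value[-1]
    if suffix in scale:
        return int(value[:-1]) * scale[suffix]
    return int(value)            # no scale letter: a plain number

print(parse_scaled("1k"))
print(parse_scaled("1m", base=1024))
```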

When rabins is splitting based on time, the time period is specified
with the -M time option, and can be any period based in seconds (s),
minutes (m), hours (h), days (d), weeks (w), months (M) or years (y).
Rabins will create and modify records as required to split on the
prescribed time boundaries. If any record spans a time boundary, the
record is split and the metrics are adjusted using a uniform
distribution model to distribute the statistics between the resulting
records.

See rasplit.1 for specifics.

RABINS SPECIFIC OPTIONS
rabins, like all ra based clients, supports a number of ra options,
including remote data access, reading from multiple files, and
filtering of input argus records through a terminating filter
expression. Rabins also provides all the functions of racluster(1)
and rasplit(1) for processing and outputting data. rabins specific
options are:

-B secs
Holding time in seconds before closing a bin and outputting its
contents.

-M splitmode
Supported splitting modes are:

time <n[smhdwMy]>
bin records into time slots of size n. This is used for
time series analytics, especially graphing. By default,
records are split so that they do not span a time bin
boundary. Metrics are uniformly distributed among the
resulting records.

count <n[kmb]>
bin records into chunks based on the number of records.
This is used for archive management and parallel processing
analytics, to limit the size of data processing to fixed
numbers of records.

size <n[kmb]>
bin records into chunks based on the number of total bytes.
This is used for archive management and parallel processing
analytics, to limit the size of data processing to fixed
byte limitations.

-M modes
Supported processing modes are:

hard
split on hard time boundaries. Each flow record's start and
stop times will be the time boundary times. The default is
to use the original start and stop timestamps from the
records that make up the resulting aggregation.

nomodify
Do not split the record when including it into a time bin.
This allows a time bin to represent times outside of its
definition. This option should not be used with the 'hard'
option, as you will modify metrics and semantics.

-m aggregation object
Supported aggregation objects are:
none          use a null flow key.
srcid         argus source identifier.
smac          source mac(ether) addr.
dmac          destination mac(ether) addr.
soui          oui portion of the source mac(ether) addr.
doui          oui portion of the destination mac(ether) addr.
smpls         source mpls label.
dmpls         destination mpls label.
svlan         source vlan label.
dvlan         destination vlan label.
saddr/[l|m]   source IP addr/[cidr len | m.a.s.k].
daddr/[l|m]   destination IP addr/[cidr len | m.a.s.k].
matrix/l      sorted src and dst IP addr/cidr len.
proto         transaction protocol.
sport         source port number. Implies use of 'proto'.
dport         destination port number. Implies use of 'proto'.
stos          source TOS byte value.
dtos          destination TOS byte value.
sttl          src -> dst TTL value.
dttl          dst -> src TTL value.
stcpb         src -> dst TCP base sequence number.
dtcpb         dst -> src TCP base sequence number.
inode[/l|m]   intermediate node IP addr/[cidr len | m.a.s.k],
              source of ICMP mapped events.
sco           source ARIN country code, if present.
dco           destination ARIN country code, if present.
sas           source node origin AS number, if available.
das           destination node origin AS number, if available.
ias           intermediate node origin AS number, if available.
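
To make the matrix/l object concrete, the Python sketch below builds a
direction-independent key from two addresses masked to a CIDR length.
It is an interpretation of "sorted src and dst IP addr/cidr len", not
code from rabins; matrix_key and the sample addresses are hypothetical.

```python
from ipaddress import ip_network

def matrix_key(src, dst, prefix_len):
    """Return a direction-independent key of two masked addresses."""
    nets = [ip_network(f"{addr}/{prefix_len}", strict=False)
            for addr in (src, dst)]
    return tuple(str(n) for n in sorted(nets))

print(matrix_key("192.168.0.68", "208.59.201.75", 24))
print(matrix_key("208.59.201.75", "192.168.0.68", 24))
```

Both calls produce the same key, so flows in either direction between
the two /24 networks would be aggregated together.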

-P sort field
Rabins can sort its output based on a sort field specification.
Because the -m option is used for aggregation fields, -P is used
to specify the print priority order. See rasort(1) for the list
of sortable fields.

-w filename
Rabins supports an extended -w option that allows for output
record contents to be inserted into the output filename. Speci-
fied using '$' (dollar) notation, any printable field can be used.
Care should be taken to honor any shell escape requirements when
specifying on the command line. See ra(1) for the list of print-
able fields.

As another extended feature, when using time mode, rabins will
process the supplied filename using strftime(3), so that time
fields can be inserted into the resulting output filename.

INVOCATION
This invocation aggregates inputfile based on 10 minute time bound-
aries. Input is split to fit within a 10 minute time boundary, and
within those boundaries, argus records are aggregated. The resulting
output is streamed to a single file.

rabins -r * -M time 10m -w outputfile

This next invocation aggregates inputfiles based on 5 minute time
boundaries, and the output is written to 5 minute files. Input is
split such that all records conform to hard 5 minute time boundaries,
and within those boundaries, argus records are aggregated, in this
case, based on IP address matrix. The resulting output is streamed to
files that are named relative to the records' output content, with a
prefix of /matrix/%Y/%m/%d/argus. and a suffix of %H.%M.%S.

rabins -r * -M hard time 5m -m matrix -w "/matrix/%Y/%m/%d/argus.%H.%M.%S"

This next invocation aggregates input.stream based on matrix/24 into 10
second time bins, holds the data for an additional 5 seconds after the
time boundary has passed, and then prints the complete sorted contents
of each bin to standard output. The output is printed at 10 second
intervals, and the output is the content of the previous 10 second time
bin. This example is meant to provide, every 10 seconds, a summary of
all Class C subnet activity seen. It is intended to run indefinitely,
printing out aggregated summary records. By modifying the aggregation
model, using the "-f racluster.conf" option, you can achieve a great
deal of data reduction with a lot of semantic reporting.

% rabins -S localhost -m matrix/24 -B 5s -M hard time 10s -p0 -s +1trans - ipv4
StartTime            Trans  Proto  SrcAddr          Dir  DstAddr          SrcPkts  DstPkts  SrcBytes  DstBytes  State
2012/02/15.13:37:00      5  ip     192.168.0.0/24   <->  192.168.0.0/24        41       40      2860     12122  CON
2012/02/15.13:37:00      2  ip     192.168.0.0/24    ->  224.0.0.0/24           2        0       319         0  INT
[ 10 seconds pass]
2012/02/15.13:37:10     13  ip     192.168.0.0/24   <->  208.59.201.0/24      269      351     97886    398700  CON
2012/02/15.13:37:10     14  ip     192.168.0.0/24   <->  192.168.0.0/24        86       92      7814     46800  CON
2012/02/15.13:37:10      1  ip     17.172.224.0/24  <->  192.168.0.0/24        52       37     68125      4372  CON
2012/02/15.13:37:10      1  ip     192.168.0.0/24   <->  199.7.55.0/24          7        7       784      2566  CON
2012/02/15.13:37:10      1  ip     184.85.13.0/24   <->  192.168.0.0/24         6        5      3952      2204  CON
2012/02/15.13:37:10      2  ip     66.235.132.0/24  <->  192.168.0.0/24         5        6       915      3732  CON
2012/02/15.13:37:10      1  ip     74.125.226.0/24  <->  192.168.0.0/24         3        4       709       888  CON
2012/02/15.13:37:10      3  ip     66.39.3.0/24     <->  192.168.0.0/24         3        3       369       198  CON
2012/02/15.13:37:10      1  ip     192.168.0.0/24   <->  205.188.1.0/24         1        1        54       356  CON
[ 10 seconds pass]
2012/02/15.13:37:20      6  ip     192.168.0.0/24   <->  208.59.201.0/24      392      461     60531    623894  CON
2012/02/15.13:37:20      8  ip     192.168.0.0/24   <->  192.168.0.0/24        95      111      6948     93536  CON
2012/02/15.13:37:20      3  ip     72.14.204.0/24   <->  192.168.0.0/24        38       32     38568      4414  CON
2012/02/15.13:37:20      1  ip     17.112.156.0/24  <->  192.168.0.0/24        26       13     21798      7116  CON
2012/02/15.13:37:20      2  ip     66.235.132.0/24  <->  192.168.0.0/24         6        3      1232      4450  CON
2012/02/15.13:37:20      1  ip     66.235.133.0/24  <->  192.168.0.0/24         1        2        82       132  CON
[ 10 seconds pass]
2012/02/15.13:37:30    117  ip     192.168.0.0/24   <->  208.59.201.0/24      697      663    369769    134382  CON
2012/02/15.13:37:30     11  ip     192.168.0.0/24   <->  192.168.0.0/24       147      187     11210    193253  CON
2012/02/15.13:37:30      1  ip     184.85.13.0/24   <->  192.168.0.0/24        13        9     13408      9031  CON
2012/02/15.13:37:30      2  ip     66.235.132.0/24  <->  192.168.0.0/24         8        7      1920     11563  CON
2012/02/15.13:37:30      1  ip     192.168.0.0/24   <->  207.46.193.0/24        5        3       802       562  CON
2012/02/15.13:37:30      1  ip     17.112.156.0/24  <->  192.168.0.0/24         5        2       646      3684  CON
2012/02/15.13:37:30      2  ip     192.168.0.0/24    ->  224.0.0.0/24           2        0       382         0  REQ
[ 10 seconds pass]

This next invocation reads IP argus(8) data from inputfile and
processes the argus(8) data stream in bins based on an input byte size
of no greater than 1 Megabyte. The resulting output stream is written
to a single argus.out data file.

rabins -r argusfile -M size 1m -s +1dur -m proto -w argus.out - ip

This invocation reads IP argus(8) data from inputfile and aggregates
the argus(8) data stream in bins of no more than 1K flows. The
resulting output stream is printed to the screen as standard argus
records.

rabins -r argusfile -M count 1k -m proto -s stime dur proto spkts dpkts - ip

SEE ALSO
ra(1), racluster(1), rasplit(1), rarc(5), argus(8)

AUTHORS
Carter Bullard (carter@qosient.com).

rabins 3.0.8 12 August 2003 RABINS(1)
