Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
AWK(1)			    General Commands Manual			AWK(1)

NAME
       awk -- pattern-directed scanning	and processing language

SYNOPSIS
       awk   [-safe]  [-version]  [-d[n]]  [-F	fs  |  --csv]  [-v  var=value]
	   [prog | -f progfile]	file ...

DESCRIPTION
       awk scans each input file for lines that	match any of a set of patterns
       specified literally in prog or in one or	more  files  specified	as  -f
       progfile.   With	 each  pattern	there can be an	associated action that
       will be performed when a	line of	a file matches the pattern.  Each line
       is matched against the pattern portion of every	pattern-action	state-
       ment; the associated action is performed	for each matched pattern.  The
       file name `-' means the standard	input.	Any file of the	form var=value
       is  treated  as	an  assignment,	not a filename,	and is executed	at the
       time it would have been opened if it were a filename.

       The options are as follows:

       -d[n]   Debug mode.  Set	debug level to n, or 1 if n is not  specified.
	       A value greater than 1 causes awk to dump core on fatal errors.

       -F fs   Define  the  input field	separator to be	the regular expression
	       fs.

       --csv   causes awk to process records using  (more  or  less)  standard
	       comma-separated values (CSV) format.

       -f progfile
	       Read  program  code from	the specified file progfile instead of
	       from the	command	line.

       -safe   Disable file output (print >, print >>),	process	creation  (cmd
	       |  getline,  print  |,  system)	and  access to the environment
	       (ENVIRON; see the section on variables below).  This is a first
	       (and not	very reliable) approximation to	a  "safe"  version  of
	       awk.

       -version
	       Print the version number	of awk to standard output and exit.

       -v var=value
	       Assign  value to	variable var before prog is executed; any num-
	       ber of -v options may be	present.

       The input is normally made up of	input  lines  (records)	 separated  by
       newlines,  or  by  the  value of	RS.  If	RS is null, then any number of
       blank lines are used as the record separator, and newlines are used  as
       field  separators (in addition to the value of FS).  This is convenient
       when working with multi-line records.

       An input	line is	normally made up of fields separated by	whitespace, or
       by the extended regular expression FS as	described below.   The	fields
       are  denoted $1,	$2, ..., while $0 refers to the	entire line.  If FS is
       null, the input line is split into one field per	character.  While both
       gawk and	mawk have the same behavior, it	is unspecified in the IEEE Std
       1003.1-2008 ("POSIX.1") standard.  If FS	is a single space, then	 lead-
       ing  and	trailing blank and newline characters are skipped.  Fields are
       delimited by one	or more	blank or newline characters.  A	blank  charac-
       ter  is	a  space  or  a	 tab.  If FS is	a single character, other than
       space, fields are delimited by each single occurrence of	 that  charac-
       ter.  The FS variable defaults to a single space.

       Normally,  any  number  of blanks separate fields.  In order to set the
       field separator to a single blank, use the -F option with  a  value  of
       `[ ]'.	If  a field separator of `t' is	specified, awk treats it as if
       `\t' had	been specified and uses	<TAB> as the field separator.  In  or-
       der to use a literal `t'	as the field separator,	use the	-F option with
       a value of `[t]'.

       A pattern-action	statement has the form:

	     pattern { action }

       A  missing  {  action  }	means print the	line; a	missing	pattern	always
       matches.	 Pattern-action	statements are separated by newlines or	 semi-
       colons.

       Newlines	 are  permitted	 after	a terminating statement	or following a
       comma (`,'), an open brace (`{'), a logical AND (`&&'),	a  logical  OR
       (`||'),	after the `do' or `else' keywords, or after the	closing	paren-
       thesis of an `if', `for', or `while' statement.	Additionally, a	 back-
       slash (`\') can be used to escape a newline between tokens.

       An  action  is a	sequence of statements.	 A statement can be one	of the
       following:

	     if	(expression) statement [else statement]
	     while (expression)	statement
	     for (expression; expression; expression) statement
	     for (var in array)	statement
	     do	statement while	(expression)
	     break
	     continue
	     { [statement ...] }
	     expression	# commonly var = expression
	     print [expression-list] [>expression]
	     printf format [..., expression-list] [>expression]
	     return [expression]
	     next # skip remaining patterns on this input line
	     nextfile #	skip rest of this file,	open next, start at top
	     delete array[expression] #	delete an array	element
	     delete array # delete all elements	of array
	     exit [expression] # exit immediately; status is expression

       Statements are terminated by semicolons,	newlines or right braces.   An
       empty  expression-list  stands for $0.  String constants	are quoted "",
       with the	usual C	escapes	recognized within (see printf(1)  for  a  com-
       plete  list of these).  Expressions take	on string or numeric values as
       appropriate,  and  are  built  using  the  operators  +	-  *  /	 %   ^
       (exponentiation), and concatenation (indicated by whitespace).  The op-
       erators ! ++ -- += -= *=	/= %= ^= > >= <	<= == != ?: are	also available
       in  expressions.	  Variables  may  be  scalars, array elements (denoted
       x[i]) or	fields.	 Variables are initialized to the null string.	 Array
       subscripts  may be any string, not necessarily numeric; this allows for
       a form of associative memory.  Multiple subscripts such as [i,j,k]  are
       permitted; the constituents are concatenated, separated by the value of
       SUBSEP (see the section on variables below).

       The  print statement prints its arguments on the	standard output	(or on
       a file if >file or >>file is present or on a pipe if | cmd is present),
       separated by the	current	output field separator,	and terminated by  the
       output  record  separator.  file	and cmd	may be literal names or	paren-
       thesized	expressions; identical string values in	 different  statements
       denote the same open file.  The printf statement	formats	its expression
       list according to the format (see printf(1)).

       Patterns	 are  arbitrary	Boolean	combinations (with ! ||	&&) of regular
       expressions and relational expressions.	awk supports extended  regular
       expressions  (EREs).   See re_format(7) for more	information on regular
       expressions.  Isolated regular expressions in a pattern	apply  to  the
       entire  line.  Regular expressions may also occur in relational expres-
       sions, using the	operators ~ and	!~.  /re/ is a	constant  regular  ex-
       pression;  any  string  (constant or variable) may be used as a regular
       expression, except in the position of an	isolated regular expression in
       a pattern.

       A pattern may consist of	two patterns separated by  a  comma;  in  this
       case,  the  action is performed for all lines from an occurrence	of the
       first pattern through an	occurrence of the second, inclusive.

       A relational expression is one of the following:

	     expression	matchop	regular-expression
	     expression	relop expression
	     expression	in array-name
	     (expr, expr, ...) in array-name

       where a relop is	any of the  six	 relational  operators	in  C,	and  a
       matchop is either ~ (matches) or	!~ (does not match).  A	conditional is
       an  arithmetic expression, a relational expression, or a	Boolean	combi-
       nation of these.

       The special patterns BEGIN and END may be used to capture  control  be-
       fore the	first input line is read and after the last.  BEGIN and	END do
       not  combine  with other	patterns.  They	may appear multiple times in a
       program and execute in the order	they are read by awk

       Variable	names with special meanings:

       ARGC	  Argument count, assignable.
       ARGV	  Argument array, assignable; non-null members	are  taken  as
		  filenames.
       CONVFMT	  Conversion format when converting numbers (default "%.6g").
       ENVIRON	  Array	of environment variables; subscripts are names.
       FILENAME	  The name of the current input	file.
       FNR	  Ordinal number of the	current	record in the current file.
       FS	  Regular expression used to separate fields; also settable by
		  option -F fs.
       NF	  Number  of fields in the current record.  $NF	can be used to
		  obtain the value of the last field in	the current record.
       NR	  Ordinal number of the	current	record.
       OFMT	  Output format	for numbers (default "%.6g").
       OFS	  Output field separator (default blank).
       ORS	  Output record	separator (default newline).
       RLENGTH	  The length of	the string matched by the match() function.
       RS	  Input	record separator (default newline).  If	 empty,	 blank
		  lines	separate records.  If more than	one character long, RS
		  is  treated  as  a regular expression, and records are sepa-
		  rated	by text	matching the expression.
       RSTART	  The starting position	of the string matched by  the  match()
		  function.
       SUBSEP	  Separates multiple subscripts	(default 034).

FUNCTIONS
       The  awk	 language  has	a  variety  of built-in	functions: arithmetic,
       string, input/output, general, and bit-operation.

       Functions may be	defined	(at the	position of  a	pattern-action	state-
       ment) thusly:

	     function foo(a, b,	c) { ...; return x }

       Parameters  are	passed	by  value if scalar, and by reference if array
       name; functions may be called recursively.  Parameters are local	to the
       function; all other variables are global.  Thus local variables may  be
       created by providing excess parameters in the function definition.

   Arithmetic Functions
       atan2(y,	x)  Return the arctangent of y/x in radians.

       cos(x)	    Return the cosine of x, where x is in radians.

       exp(x)	    Return the exponential of x.

       int(x)	    Return x truncated to an integer value.

       log(x)	    Return the natural logarithm of x.

       rand()	    Return a random number, n, such that 0<=n<1.

       sin(x)	    Return the sine of x, where	x is in	radians.

       sqrt(x)	    Return the square root of x.

       srand(expr)  Sets  seed	for  rand()  to	 expr and returns the previous
		    seed.  If expr is omitted, the time	of  day	 is  used  in-
		    stead.

   String Functions
       gsub(r, t, s)	The  same  as sub() except that	all occurrences	of the
			regular	expression are replaced.  gsub()  returns  the
			number of replacements.

       index(s,	t)	The  position  in s where the string t occurs, or 0 if
			it does	not.

       length(s)	The length of s	taken as a string, number of  elements
			in  an array for an array argument, or length of $0 if
			no argument is given.

       match(s,	r)	The position in	s where	the regular expression	r  oc-
			curs, or 0 if it does not.  The	variable RSTART	is set
			to  the	starting position of the matched string	(which
			is the same as the returned value) or zero if no match
			is found.  The variable	RLENGTH	is set to  the	length
			of the matched string, or -1 if	no match is found.

       split(s,	a, fs)	Splits	the  string  s into array elements a[1], a[2],
			..., a[n] and returns n.  The separation is done  with
			the  regular expression	fs or with the field separator
			FS if fs is not	given.	An empty string	as field sepa-
			rator splits the string	into  one  array  element  per
			character.

       sprintf(fmt, expr, ...)
			The string resulting from formatting expr, ... accord-
			ing to the printf(1) format fmt.

       sub(r, t, s)	Substitutes  t for the first occurrence	of the regular
			expression r in	the string s.  If s is not  given,  $0
			is  used.   An	ampersand  (`&')  in  t	is replaced in
			string s with regular expression r.  A literal	amper-
			sand  can  be specified	by preceding it	with two back-
			slashes	(`\\').	 A literal backslash can be  specified
			by  preceding it with another backslash	(`\\').	 sub()
			returns	the number of replacements.

       substr(s, m, n)	Return at most the n-character substring of s that be-
			gins at	position m counted from	1.  If n  is  omitted,
			or if n	specifies more characters than are left	in the
			string,	 the length of the substring is	limited	by the
			length of s.

       tolower(str)	Returns	a copy of str with all	upper-case  characters
			translated  to	their corresponding lower-case equiva-
			lents.

       toupper(str)	Returns	a copy of str with all	lower-case  characters
			translated  to	their corresponding upper-case equiva-
			lents.

   Input/Output	and General Functions
       close(expr)	     Closes the	file or	pipe expr.  expr should	 match
			     the  string  that	was  used  to open the file or
			     pipe.

       cmd | getline [var]   Read a record of input from a stream  piped  from
			     the  output of cmd.  If var is omitted, the vari-
			     ables $0 and NF are set.  Otherwise var  is  set.
			     If	the stream is not open,	it is opened.  As long
			     as	the stream remains open, subsequent calls will
			     read  subsequent  records	from  the stream.  The
			     stream remains open until explicitly closed  with
			     a	call to	close().  getline returns 1 for	a suc-
			     cessful input, 0 for end of file, and -1  for  an
			     error.

       fflush([expr])	     Flushes  any buffered output for the file or pipe
			     expr, or all open files or	pipes if expr is omit-
			     ted.  expr	should match the string	that was  used
			     to	open the file or pipe.

       getline		     Sets $0 to	the next input record from the current
			     input  file.  This	form of	getline	sets the vari-
			     ables NF, NR, and FNR.  getline returns 1	for  a
			     successful	 input,	 0 for end of file, and	-1 for
			     an	error.

       getline var	     Sets $0 to	variable var.  This  form  of  getline
			     sets the variables	NR and FNR.  getline returns 1
			     for a successful input, 0 for end of file,	and -1
			     for an error.

       getline [var]  <file  Sets  $0 to the next record from file.  If	var is
			     omitted, the variables $0 and NF are set.	Other-
			     wise var is set.  If file	is  not	 open,	it  is
			     opened.  As long as the stream remains open, sub-
			     sequent  calls  will read subsequent records from
			     file.  file remains open until explicitly	closed
			     with a call to close().

       systime()	     returns  the  current date	and time as a standard
			     "seconds since the	epoch" value.

       strftime(fmt, timestamp)
			     formats timestamp (a value	in seconds  since  the
			     epoch)  according	to  Fa fmt , which is a	format
			     string  as	 supported   by	  strftime(3).	  Both
			     timestamp	 and   fmt   may  be  omitted;	if  no
			     timestamp,	the current time of day	is  used,  and
			     if	no fmt,	a default format of "%a	%b %e %H:%M:%S
			     %Z	%Y" is used.

       system(cmd)	     Executes  cmd  and	returns	its exit status.  This
			     will be -1	upon error, cmd	's exit	status upon  a
			     normal  exit,  256	 +  sig	 upon death-by-signal,
			     where sig is the number of	the murdering  signal,
			     or	512 + sig if there was a core dump.

   Bit-Operation Functions
       compl(x)	     Returns the bitwise complement of integer argument	x.

       and(v1, v2, ...)
		     Performs  a bitwise AND on	all arguments provided,	as in-
		     tegers.  There must be at least two values.

       or(v1, v2, ...)
		     Performs a	bitwise	OR on all arguments provided, as inte-
		     gers.  There must be at least two values.

       xor(v1, v2, ...)
		     Performs a	bitwise	Exclusive-OR  on  all  arguments  pro-
		     vided, as integers.  There	must be	at least two values.

       lshift(x, n)  Returns integer argument x	shifted	by n bits to the left.

       rshift(x, n)  Returns  integer  argument	 x  shifted  by	 n bits	to the
		     right.

EXIT STATUS
       The awk utility exits 0 on success, and >0 if an	error occurs.

       But note	that the exit expression can modify the	exit status.

ENVIRONMENT VARIABLES
       If POSIXLY_CORRECT is set in the	 environment,  then  awk  follows  the
       POSIX  rules  for  sub()	 and  gsub() with respect to consecutive back-
       slashes and ampersands.

EXAMPLES
       Print lines longer than 72 characters:

	     length($0)	> 72

       Print first two fields in opposite order:

	     { print $2, $1 }

       Same, with input	fields separated by comma and/or spaces	and tabs:

	     BEGIN { FS	= ",[ \t]*|[ \t]+" }
		   { print $2, $1 }

       Add up first column, print sum and average:

	     { s += $1 }
	     END { print "sum is", s, "	average	is", s/NR }

       Print all lines between start/stop pairs:

	     /start/, /stop/

       Simulate	echo(1):

	     BEGIN { # Simulate	echo(1)
		     for (i = 1; i < ARGC; i++)	printf "%s ", ARGV[i]
		     printf "\n"
		     exit }

       Print an	error message to standard error:

	     { print "error!" >	"/dev/stderr" }

SEE ALSO
       cut(1), lex(1), printf(1), sed(1), re_format(7)

       A. V. Aho, B. W.	Kernighan, and P. J. Weinberger, The  AWK  Programming
       Language, Addison-Wesley, 1988, ISBN 0-201-07981-X.

STANDARDS
       The  awk	utility	is compliant with the IEEE Std 1003.1-2008 ("POSIX.1")
       specification, except awk does not support {n,m}	pattern	matching.

       The flags -d, -safe, and	-version  as  well  as	the  commands  fflush,
       compl,  and, or,	xor, lshift, rshift, are extensions to that specifica-
       tion.

HISTORY
       An awk utility appeared in Version 7 AT&T UNIX.

BUGS
       There are no explicit conversions  between  numbers  and	 strings.   To
       force  an expression to be treated as a number add 0 to it; to force it
       to be treated as	a string concatenate ""	to it.

       The scope rules for variables in	functions are a	botch; the  syntax  is
       worse.

       Input  is expected to be	UTF-8 encoded.	Other multibyte	character sets
       are not handled.	 However, in eight-bit locales,	awk treats each	 input
       byte as a separate character.

UNUSUAL	FLOATING-POINT VALUES
       awk  was	designed before	IEEE 754 arithmetic defined Not-A-Number (NaN)
       and Infinity values, which are supported	by all	modern	floating-point
       hardware.

       Because awk uses	strtod(3) and atof(3) to convert string	values to dou-
       ble-precision  floating-point  values,  modern C	libraries also convert
       strings starting	with inf and nan into infinity and NaN values  respec-
       tively.	This led to strange results, with something like this:

	     echo nancy	| awk '{ print $1 + 0 }'

       printing	"nan" instead of zero.

       awk now follows GNU AWK,	and prefilters string values before attempting
       to convert them to numbers, as follows:

       Hexadecimal values  Hexadecimal	values	(allowed since C99) convert to
			   zero, as they did prior to C99.

       NaN values	   The two strings "+nan" and  "-nan"  (case  indepen-
			   dent)  convert  to  NaN.   No others	do.  (NaNs can
			   have	signs.)

       Infinity	values	   The two strings "+inf" and  "-inf"  (case  indepen-
			   dent)  convert  to  positive	and negative infinity,
			   respectively.  No others do.

DEPRECATED BEHAVIOR
       One True	Awk has	accepted -F t to mean the same as -F <TAB> to make  it
       easier  to  specify tabs	as the separator character.  Upstream One True
       Awk has deprecated this wart in the name	of better  compatibility  with
       other awk implementations like gawk and mawk.

       Historically,  awk did not accept "0x" as a hex string.	However, since
       One True	Awk used strtod	to convert strings to floats, and since	"0x12"
       is a valid hexadecimal representation of	a floating  point  number,  On
       FreeBSD,	 awk has accepted this notation	as an extension	since One True
       Awk was imported	in FreeBSD 5.0.	 Upstream One True  Awk	 has  restored
       the  historical behavior	for better compatibility between the different
       awk implementations.  Both gawk	and  mawk  already  behave  similarly.
       Starting	with FreeBSD 14.0 awk will no longer accept this extension.

       The FreeBSD awk sets the	locale for many	years to match the environment
       it was running in.  This	lead to	pattern	ranges,	like "[A-Z]" sometimes
       matching	 lower	case characters	in some	locales.  This misbehavior was
       never in	upstream One True Awk  and  has	 been  removed	as  a  bug  in
       FreeBSD 12.3, FreeBSD 13.1, and FreeBSD 14.0.

FreeBSD	ports 15.0	       September 3, 2025			AWK(1)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=awk&manpath=FreeBSD+15.0-RELEASE+and+Ports>

home | help