RE2V(1)								       RE2V(1)

NAME
       re2v - generate fast lexical analyzers for V

SYNOPSIS
       re2v [ OPTIONS ]	[ WARNINGS ] INPUT

       Input can be either a file or - for stdin.

INTRODUCTION
       re2v works as a preprocessor. It	reads the input	file (which is usually
       a  program  in V, but can be anything) and looks	for blocks of code en-
       closed in special-form start/end	markers. The  text  outside  of	 these
       blocks  is  copied  verbatim  into the output file. The contents	of the
       blocks are processed by re2v. It	translates them	to code	in V and  out-
       puts the	generated code in place	of the block.

       Here  is	 an  example  of a small program that checks if	a given	string
       contains	a decimal number:

	  // re2v $INPUT -o $OUTPUT -i

	  fn lex(yyinput string) {
	      mut yycursor := 0
	      /*!re2c
		  re2c:yyfill:enable = 0;

		  [1-9][0-9]* {	return }
		  *	      {	panic("error!")	}
	      */
	  }

	  fn main() {
	      lex("1234\x00")
	  }

       In the output re2v replaced the block in	the middle with	the  generated
       code:

	  // Code generated by re2v, DO	NOT EDIT.
	  // re2v $INPUT -o $OUTPUT -i

	  fn lex(yyinput string) {
	      mut yycursor := 0

	      mut yych := 0
	      yych = yyinput[yycursor]
	      match yych {
		  0x31...0x39 {	unsafe { goto yy2 } }
		  else { unsafe	{ goto yy1 } }
	      }
	  yy1:
	      yycursor += 1
	      panic("error!")
	  yy2:
	      yycursor += 1
	      yych = yyinput[yycursor]
	      match yych {
		  0x30...0x39 {	unsafe { goto yy2 } }
		  else { unsafe	{ goto yy3 } }
	      }
	  yy3:
	      return

	  }

	  fn main() {
	      lex("1234\x00")
	  }

BASICS
       A re2v program consists of a sequence of	blocks intermixed with code in
       the  target  language. A	block may contain definitions, configurations,
       rules, actions and directives in	any order:

       name = regular-expression ;
	      A	definition binds name to regular-expression. Names may contain
	      alphanumeric characters and underscore. The regular  expressions
	      section  gives  an  overview  of re2v syntax for regular expres-
	      sions. Once defined, the name can	be used	in other  regular  ex-
	      pressions	 and  in rules.	 Recursion in named definitions	is not
	      allowed, and each	name should be defined before it  is  used.  A
	      block inherits named definitions from the	global scope. Redefin-
	      ing a name that exists in	the current scope is an	error.

       configuration = value ;
	      A	configuration allows one to change re2v	behavior and customize
	      the  generated code. For a full list of configurations supported
	      by re2v see the configurations section. Depending	on a  particu-
	      lar configuration, the value can be a keyword, a nonnegative in-
	      teger  number  or	 a one-line string which should	be enclosed in
	      double or	single quotes unless it	consists of alphanumeric char-
	      acters. A	block inherits configurations from  the	 global	 scope
	      and  may	redefine  them or add new ones.	Configurations defined
	      inside of	a block	affect the whole block,	even if	they appear at
	      the end of it.

       regular-expression code
	      A	rule binds regular-expression to its semantic action (a	 block
	      of  code in curly	braces,	or a block of code that	starts with :=
	      and ends on a newline followed by	any non-whitespace character).
	      If the regular-expression	matches, the associated	code  is  exe-
	      cuted.   If multiple rules match,	the longest match takes	prece-
	      dence. If	multiple rules match the same string, the earliest one
	      takes precedence.	There are two special rules: the default  rule
              * and the end of input rule $. The default rule should always be
              defined; it has the lowest priority regardless of its place in
	      the block, and it	matches	any code unit (not necessarily a valid
	      character,  see  the encoding support section). The end of input
	      rule should be defined if	the corresponding method for  handling
              the end of input is used. With start conditions, rules have a
              more complex syntax.

       !action code
	      An  action  binds	 a  user-defined block of code to a particular
	      place in the generated finite state machine (in the same way  as
	      semantic actions bind code to the	final states). See the actions
	      section for a full list of predefined actions.

       !directive ;
	      A	 directive  is	one of the special predefined statements. Each
	      directive	has a unique purpose. See the directives  section  for
	      details.
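
       As a rough sketch of how these pieces fit together, the block below
       combines a configuration, two definitions and two rules. The names
       digit and number are hypothetical and chosen only for illustration;
       the surrounding function follows the example from the introduction:

          fn lex(yyinput string) {
              mut yycursor := 0
              /*!re2c
                  re2c:yyfill:enable = 0;   // configuration

                  digit  = [0-9];           // definition
                  number = [1-9] digit*;    // definition using an earlier name

                  number { return }            // rule with a semantic action
                  *      { panic("error!") }   // default rule, lowest priority
              */
          }

       Because configurations affect the whole block, the first line could
       equally be placed after the rules.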

   Blocks
       Block  start  and  end  markers are either /*!re2c and */, or %{	and %}
       (both styles are	supported). Starting from version 2.2 blocks may  have
       optional	names that allow them to be referenced in other	blocks.	 There
       are different kinds of blocks:

       /*!re2c[:<name>]	... */ or %{[:<name>] ... %}
	      A	 global	 block contains	definitions, configurations, rules and
	      directives.  re2v	compiles regular expressions  associated  with
	      each  rule  into a deterministic finite automaton, encodes it in
	      the form of conditional jumps in the  target  language  and  re-
	      places  the  block with the generated code. Names	and configura-
	      tions defined in a global	block are added	to  the	 global	 scope
	      and  become  visible  to	subsequent blocks. At the start	of the
	      program  the  global  scope  is  initialized  with  command-line
	      options.

       /*!local:re2c[:<name>] ... */ or	%{local[:<name>] ... %}
	      A	local block is like a global block, but	the names and configu-
	      rations  in  it  have  local  scope  (they  do  not affect other
	      blocks).

       /*!rules:re2c[:<name>] ... */ or	%{rules[:<name>] ... %}
	      A	rules block is like a local block, but it  does	 not  generate
	      any  code	 by  itself,  nor  does	 it add	any definitions	to the
	      global scope -- it is meant to be	reused in other	 blocks.  This
	      is  a  way  of sharing code (more	details	in the reusable	blocks
              section). Prior to re2v version 2.2, rules blocks required the
              -r --reusable option.

       /*!use:re2c[:<name>] ...	*/ or %{use[:<name>] ... %}
              A use block references a previously defined rules block. If
              the name is specified, re2v looks for a rules block with this
	      name. Otherwise the most recent rules block is  used  (either  a
	      named  or	an unnamed one). A use block can add definitions, con-
	      figurations and rules of its own,	which are added	 to  those  of
	      the referenced rules block. Prior	to re2v	version	2.2 use	blocks
	      required -r --reusable option.

       /*!max:re2c[:<name1>[:<name2>...]] ... */ or
       %{max[:<name1>[:<name2>...]] ...	%}
	      A	block that generates YYMAXFILL definition. An optional list of
	      block  names specifies which blocks should be included when com-
	      puting YYMAXFILL value (if the list is empty, all	blocks are in-
	      cluded).	By default the generated code  is  a  macro-definition
	      for  C (#define YYMAXFILL	<n>), or a global variable for Go (var
	      YYMAXFILL	int = <n>). It can be customized with an optional con-
	      figuration format	that specifies a template string where @@{max}
	      (or @@ for short)	is replaced with the numeric value  of	YYMAX-
	      FILL.

       /*!maxnmatch:re2c[:<name1>[:<name2>...]] ... */ or
       %{maxnmatch[:<name1>[:<name2>...]] ... %}
	      A	 block	that  generates	YYMAXNMATCH definition (it requires -P
	      --posix-captures option).	An optional list of block names	speci-
	      fies which blocks	should be included when	computing  YYMAXNMATCH
	      value  (if  the list is empty, all blocks	are included).	By de-
	      fault the	generated code is a macro-definition  for  C  (#define
	      YYMAXNMATCH  <n>),  or a global variable for Go (var YYMAXNMATCH
	      int = <n>). It can be customized with an optional	 configuration
	      format that specifies a template string where @@{max} (or	@@ for
	      short) is	replaced with the numeric value	of YYMAXNMATCH.

       /*!stags:re2c[:<name1>[:<name2>...]] ...	*/,
       /*!mtags:re2c[:<name1>[:<name2>...]] ...	*/ or
       %{stags[:<name1>[:<name2>...]] ... %}, %{mtags[:<name1>[:<name2>...]]
       ... %}
	      Blocks  that  specify  a template	piece of code that is expanded
	      for each s-tag/m-tag variable generated  by  re2v.  An  optional
	      list  of	block  names specifies which blocks should be included
	      when computing the set of	tag variables (if the list  is	empty,
	      all  blocks  are	included).   There are two optional configura-
	      tions: format and	separator.  Configuration format  specifies  a
	      template string where @@{tag} (or	@@ for short) is replaced with
	      the  name	 of each tag variable.	Configuration separator	speci-
	      fies a piece of code used	to join	the  generated	format	pieces
	      for different tag	variables.

       /*!svars:re2c[:<name1>[:<name2>...]] ...	*/,
       /*!mvars:re2c[:<name1>[:<name2>...]] ...	*/ or
       %{svars[:<name1>[:<name2>...]] ... %}, %{mvars[:<name1>[:<name2>...]]
       ... %}
	      Blocks  that  specify  a template	piece of code that is expanded
	      for each s-tag/m-tag that	is either explicitly mentioned by  the
	      rules (with --tags option) or implicitly generated by re2v (with
	      --captvars  or  --posix-captvars	options).  An optional list of
	      block names specifies which blocks should	be included when  com-
	      puting the set of	tags (if the list is empty, all	blocks are in-
	      cluded).	There are two optional configurations: format and sep-
	      arator.	Configuration format specifies a template string where
	      @@{tag} (or @@ for short)	is replaced with the name of each tag.
	      Configuration separator specifies	a piece	of code	used  to  join
	      the generated format pieces for different	tags.

       /*!getstate:re2c[:<name1>[:<name2>...]] ... */ or
       %{getstate[:<name1>[:<name2>...]] ... %}
	      A	 block	that generates conditional dispatch on the lexer state
	      (it requires --storable-state option). An	optional list of block
	      names specifies which blocks should be  included	in  the	 state
	      dispatch.	 The default transition	goes to	the start label	of the
	      first block on the list. If the list is empty,  all  blocks  are
	      included,	 and the default transition goes to the	first block in
	      the file that has	a start	label.	This block type	is  incompati-
	      ble  with	 the  --loop-switch option, as it requires cross-block
	      transitions that are unsupported without goto or function	calls.

       /*!conditions:re2c[:<name1>[:<name2>...]] ... */, /*!types:re2c... */
       or %{conditions[:<name1>[:<name2>...]] ... %}, %{types... %}
	      A	block that generates condition enumeration (it requires	--con-
	      ditions option). An optional list	of block names specifies which
	      blocks should be included	when computing the set	of  conditions
	      (if the list is empty, all blocks	are included).	By default the
	      generated	 code  is  an  enumeration  YYCONDTYPE.	It can be cus-
	      tomized with optional configurations format and separator.  Con-
	      figuration format	specifies a template string where @@{cond} (or
	      @@ for short) is replaced	with the name of each  condition,  and
	      @@{num}  is  replaced  with  a  numeric index of that condition.
	      Configuration separator specifies	a piece	of code	used  to  join
	      the generated format pieces for different	conditions.

       /*!include:re2c <file> */ or %{include <file> %}
	      This  block  allows  one to include <file>, which	must be	a dou-
	      ble-quoted file path. The	contents of  the  file	are  literally
	      substituted  in  place of	the block, in the same way as #include
	      works in C/C++. This block can be	used together with the	--dep-
	      file  option  to	generate  build	system dependencies on the in-
	      cluded files.

       /*!header:re2c:on*/ or %{header:on %}
	      This block marks the start of header file. Everything  after  it
	      and  up  to  the following header:off block is processed by re2v
	      and written to the header	file specified with  -t	 --type-header
	      option.

       /*!header:re2c:off*/ or %{header:off %}
              This block marks the end of the header file started with the
              header:on block.

       /*!ignore:re2c ... */ or	%{ignore ... %}
              A block whose contents are ignored and removed from the output
              file.
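
       As a sketch of how a rules block and a use block might be combined,
       the example below defines a named rules block and reuses it inside a
       lexer function. The block name number is a hypothetical choice, and
       named blocks assume re2v version 2.2 or later:

          // Generates no code by itself; only stores the rule for reuse.
          /*!rules:re2c:number
              [1-9][0-9]* { return }
          */

          fn lex(yyinput string) {
              mut yycursor := 0
              // Reuses the rule above and adds a configuration and a
              // default rule of its own.
              /*!use:re2c:number
                  re2c:yyfill:enable = 0;

                  * { panic("error!") }
              */
          }

       Only the use block is replaced with generated code; the rules block
       contributes its rule to every use block that references it.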

   Configurations
       Here is a full list of configurations supported by re2v:

       re2c:api, re2c:input
	      Same as the --api	option.

       re2c:api:sigil
	      Specify the marker ("sigil") that	is used	 for  argument	place-
	      holders  in the API primitives. The default is @@. A placeholder
	      starts with sigil	followed by the	argument name in curly braces.
	      For example, if sigil is set to $, then placeholders  will  have
	      the  form	 ${name}. Single-argument APIs may use shorthand nota-
	      tion without the name in braces. This option can	be  overridden
	      by  options for individual API primitives, e.g.  re2c:YYFILL@len
	      for YYFILL.

       re2c:api:style
	      Specify API style. Possible values are  functions	 (the  default
	      for  C)  and  free-form (the default for Go and Rust).  In func-
	      tions style API primitives are generated with an	argument  list
	      in  parentheses  following  the name of the primitive. The argu-
	      ments are	provided only for autogenerated	 parameters  (such  as
	      the number of characters passed to YYFILL), but not for the gen-
	      eral lexer context, so the primitives behave more	like macros in
	      C/C++ or closures	in Go and Rust.	 In free-form style API	primi-
	      tives  do	 not  have  a  fixed  form:  they should be defined as
	      strings containing free-form pieces of  code  with  interpolated
	      variables	 of  the  form @@{var} or @@ (they correspond to argu-
	      ments in function-like style).  This configuration may be	 over-
	      ridden  for  individual API primitives, see for example re2c:YY-
	      FILL:naked configuration for YYFILL.

       re2c:bit-vectors, re2c:flags:bit-vectors, re2c:flags:b
	      Same as the --bit-vectors	 option,  but  can  be	configured  on
	      per-block	basis.

       re2c:captures, re2c:leftmost-captures
	      Same as the --leftmost-captures option, but can be configured on
	      per-block	basis.

       re2c:captvars, re2c:leftmost-captvars
	      Same as the --leftmost-captvars option, but can be configured on
	      per-block	basis.

       re2c:case-insensitive, re2c:flags:case-insensitive
	      Same  as the --case-insensitive option, but can be configured on
	      per-block	basis.

       re2c:case-inverted, re2c:flags:case-inverted
	      Same as the --case-inverted option, but  can  be	configured  on
	      per-block	basis.

       re2c:case-ranges, re2c:flags:case-ranges
	      Same  as	the  --case-ranges  option,  but  can be configured on
	      per-block	basis.

       re2c:computed-gotos, re2c:flags:computed-gotos, re2c:flags:g
	      Same as the --computed-gotos option, but can  be	configured  on
	      per-block	basis.

       re2c:computed-gotos:threshold, re2c:cgoto:threshold
	      If  computed goto	is used, this configuration specifies the com-
	      plexity threshold	that triggers the generation  of  jump	tables
	      instead  of  nested if statements	and bitmaps. The default value
	      is 9.

       re2c:cond:abort
	      If set to	a positive integer value, the default case in the gen-
	      erated condition dispatch	aborts program execution.

       re2c:cond:goto
	      Specifies	a piece	of code	used for  the  autogenerated  shortcut
	      rules :=>	in conditions. The default is goto @@;.	 The @@	place-
	      holder  is  substituted  with condition name (see	configurations
	      re2c:api:sigil and re2c:cond:goto@cond).

       re2c:cond:goto@cond
	      Specifies	 the  sigil  used   for	  argument   substitution   in
	      re2c:cond:goto  definition.  The default value is	@@.  Overrides
	      the more generic re2c:api:sigil configuration.

       re2c:cond:divider
	      Defines the divider for condition	blocks.	 The default value  is
	      /*  ***********************************  */.   Placeholders  are
              substituted with condition name (see re2c:api:sigil and
              re2c:cond:divider@cond).

       re2c:cond:divider@cond
	      Specifies	  the	sigil	used   for  argument  substitution  in
	      re2c:cond:divider	definition. The	default	is @@.	Overrides  the
	      more generic re2c:api:sigil configuration.

       re2c:cond:prefix, re2c:condprefix
	      Specifies	 the prefix used for condition labels.	The default is
	      yyc_.

       re2c:cond:enumprefix, re2c:condenumprefix
	      Specifies	the prefix used	for condition  identifiers.   The  de-
	      fault is yyc.

       re2c:debug-output, re2c:flags:debug-output, re2c:flags:d
	      Same  as	the  --debug-output  option,  but can be configured on
	      per-block	basis.

       re2c:empty-class, re2c:flags:empty-class
	      Same as the --empty-class	 option,  but  can  be	configured  on
	      per-block	basis.

       re2c:encoding:ebcdic, re2c:flags:ecb, re2c:flags:e
	      Same  as the --ebcdic option, but	can be configured on per-block
	      basis.

       re2c:encoding:ucs2, re2c:flags:wide-chars, re2c:flags:w
	      Same as the --ucs2 option, but can be  configured	 on  per-block
	      basis.

       re2c:encoding:utf8, re2c:flags:utf-8, re2c:flags:8
	      Same  as	the  --utf8 option, but	can be configured on per-block
	      basis.

       re2c:encoding:utf16, re2c:flags:utf-16, re2c:flags:x
	      Same as the --utf16 option, but can be configured	 on  per-block
	      basis.

       re2c:encoding:utf32, re2c:flags:unicode,	re2c:flags:u
	      Same  as	the --utf32 option, but	can be configured on per-block
	      basis.

       re2c:encoding-policy, re2c:flags:encoding-policy
	      Same as the --encoding-policy option, but	can be	configured  on
	      per-block	basis.

       re2c:eof
	      Specifies	the sentinel symbol used with the end-of-input rule $.
	      The  default  value  is  -1 ($ rule is not used).	Other possible
	      values include all valid code units. Only	 decimal  numbers  are
	      recognized.

       re2c:fn:sep
	      Specifies	 separator  used  in  YYFN elements (defaults to semi-
	      colon).

       re2c:header, re2c:flags:type-header, re2c:flags:t
	      Specifies	the name of the	generated header file relative to  the
	      directory	of the output file. Same as the	--header option	except
	      that the file path is relative.

       re2c:indent:string
	      Specifies	the string used	for indentation. The default is	a sin-
	      gle  tab character "\t". Indent string should contain whitespace
	      characters only.	To disable indentation entirely, set this con-
	      figuration to an empty string.

       re2c:indent:top
	      Specifies	the minimum amount of indentation to use. The  default
	      value  is	 zero. The value should	be a non-negative integer num-
	      ber.

       re2c:invert-captures
	      Same as the --invert-captures option, but	can be	configured  on
	      per-block	basis.

       re2c:label:prefix, re2c:labelprefix
	      Specifies	 the  prefix used for DFA state	labels.	The default is
	      yy.

       re2c:label:start, re2c:startlabel
	      Controls the generation of a  block  start  label.  The  default
	      value  is	 zero,	which  means that the start label is generated
	      only if it is used. An integer value greater  than  zero	forces
	      the generation of	start label even if it is unused by the	lexer.
	      A	 string	 value also forces start label generation and sets the
	      label name to the	specified string. This	configuration  applies
	      only  to	the current block (it is reset to default for the next
	      block).

       re2c:label:yyFillLabel
	      Specifies	the prefix of YYFILL labels used with re2c:eof and  in
	      storable state mode.

       re2c:label:yyloop
	      Specifies	 the  name of the label	marking	the start of the lexer
	      loop with	--loop-switch option. The default is yyloop.

       re2c:label:yyNext
	      Specifies	the name of the	optional label that follows YYGETSTATE
	      switch in	storable state mode (enabled  with  re2c:state:nextla-
	      bel). The	default	is yyNext.

       re2c:lookahead, re2c:flags:lookahead
	      Deprecated (see the deprecated --no-lookahead option).

       re2c:monadic
	      If  set  to non-zero, the	generated lexer	will use monadic nota-
	      tion (this configuration is specific to Haskell).

       re2c:nested-ifs,	re2c:flags:nested-ifs, re2c:flags:s
	      Same as the  --nested-ifs	 option,  but  can  be	configured  on
	      per-block	basis.

       re2c:posix-captures, re2c:flags:posix-captures, re2c:flags:P
	      Same  as	the  --posix-captures option, but can be configured on
	      per-block	basis.

       re2c:posix-captvars
	      Same as the --posix-captvars option, but can  be	configured  on
	      per-block	basis.

       re2c:tags, re2c:flags:tags, re2c:flags:T
	      Same  as	the  --tags option, but	can be configured on per-block
	      basis.

       re2c:tags:expression
	      Specifies	the expression used for	 tag  variables.   By  default
	      re2v generates expressions of the	form yyt<N>. This might	be in-
	      convenient,  for	example	if tag variables are defined as	fields
	      in a struct. All occurrences of @@{tag} or @@ are	replaced  with
	      the actual tag name. For example,	re2c:tags:expression = "s.@@";
	      results  in  expressions	of  the	form s.yyt<N> in the generated
	      code.  See also re2c:api:sigil configuration.

       re2c:tags:negative
	      Specifies	the constant expression	that is	used for negative  tag
	      value (typically this would be -1	if tags	are integer offsets in
	      the input	string,	or null	pointer	if they	are pointers).

       re2c:tags:prefix
	      Specifies	the prefix for tag variable names. The default is yyt.

       re2c:sentinel
	      Specifies	 the  sentinel symbol used for the end-of-input	checks
	      (when bounds checks are disabled with  re2c:yyfill:enable	 =  0;
	      and  re2c:eof  is	 not  set). This configuration does not	affect
	      code generation: its purpose is to verify	that the  sentinel  is
	      not  allowed  in the middle of a rule, and ensure	that the lexer
              won't read past the end of buffer. The default value is -1 (in
	      that  case  re2v assumes that the	sentinel is zero, which	is the
	      most common case). Only decimal numbers are recognized.

       re2c:state:abort
	      If set to	a positive integer value, the default case in the gen-
	      erated state dispatch aborts program execution, and an  explicit
	      -1 case contains transition to the start of the block.

       re2c:state:nextlabel
              Controls if the YYGETSTATE switch is followed by a yyNext label
	      (the default value is zero, which	corresponds to no label).  Al-
	      ternatively  one can use re2c:label:start	to generate a specific
	      start label, or an  explicit  getstate  block  to	 generate  the
	      YYGETSTATE switch	separately from	the lexer block.

       re2c:unsafe, re2c:flags:unsafe
	      Same  as	the  --no-unsafe  option,  but	can  be	 configured on
	      per-block	basis.	If set to zero,	it suppresses  the  generation
	      of unsafe	wrappers around	YYPEEK.	The default is non-zero	(wrap-
	      pers are generated).  This configuration is specific to Rust.

       re2c:YYBACKUP, re2c:define:YYBACKUP
	      Defines generic API primitive YYBACKUP.

       re2c:YYBACKUPCTX, re2c:define:YYBACKUPCTX
	      Defines generic API primitive YYBACKUPCTX.

       re2c:YYCONDTYPE,	re2c:define:YYCONDTYPE
	      Defines API primitive YYCONDTYPE.

       re2c:YYCTYPE, re2c:define:YYCTYPE
	      Defines API primitive YYCTYPE.

       re2c:YYCTXMARKER, re2c:define:YYCTXMARKER
	      Defines API primitive YYCTXMARKER.

       re2c:YYCURSOR, re2c:define:YYCURSOR
	      Defines API primitive YYCURSOR.

       re2c:YYDEBUG, re2c:define:YYDEBUG
	      Defines API primitive YYDEBUG.

       re2c:YYFILL, re2c:define:YYFILL
	      Defines API primitive YYFILL.

       re2c:YYFILL@len,	re2c:define:YYFILL@len
	      Specifies	the sigil used for argument substitution in YYFILL de-
	      finition.	  Defaults   to	  @@.	 Overrides  the	 more  generic
	      re2c:api:sigil configuration.

       re2c:YYFILL:naked, re2c:define:YYFILL:naked
	      Overrides	the more generic re2c:api:style	configuration for  YY-
	      FILL.  Zero value	corresponds to free-form API style.

       re2c:YYFN
	      Defines API primitive YYFN.

       re2c:YYINPUT
	      Defines API primitive YYINPUT.

       re2c:YYGETCOND, re2c:define:YYGETCONDITION
	      Defines API primitive YYGETCOND.

       re2c:YYGETCOND:naked, re2c:define:YYGETCONDITION:naked
	      Overrides	 the  more  generic  re2c:api:style  configuration for
	      YYGETCOND. Zero value corresponds	to free-form API style.

       re2c:YYGETSTATE,	re2c:define:YYGETSTATE
	      Defines API primitive YYGETSTATE.

       re2c:YYGETSTATE:naked, re2c:define:YYGETSTATE:naked
	      Overrides	the  more  generic  re2c:api:style  configuration  for
	      YYGETSTATE. Zero value corresponds to free-form API style.

       re2c:YYGETACCEPT, re2c:define:YYGETACCEPT
	      Defines API primitive YYGETACCEPT.

       re2c:YYLESSTHAN,	re2c:define:YYLESSTHAN
	      Defines generic API primitive YYLESSTHAN.

       re2c:YYLIMIT, re2c:define:YYLIMIT
	      Defines API primitive YYLIMIT.

       re2c:YYMARKER, re2c:define:YYMARKER
	      Defines API primitive YYMARKER.

       re2c:YYMTAGN, re2c:define:YYMTAGN
	      Defines generic API primitive YYMTAGN.

       re2c:YYMTAGP, re2c:define:YYMTAGP
	      Defines generic API primitive YYMTAGP.

       re2c:YYPEEK, re2c:define:YYPEEK
	      Defines generic API primitive YYPEEK.

       re2c:YYRESTORE, re2c:define:YYRESTORE
	      Defines generic API primitive YYRESTORE.

       re2c:YYRESTORECTX, re2c:define:YYRESTORECTX
	      Defines generic API primitive YYRESTORECTX.

       re2c:YYRESTORETAG, re2c:define:YYRESTORETAG
	      Defines generic API primitive YYRESTORETAG.

       re2c:YYSETCOND, re2c:define:YYSETCONDITION
	      Defines API primitive YYSETCOND.

       re2c:YYSETCOND@cond, re2c:define:YYSETCONDITION@cond
	      Specifies	 the sigil used	for argument substitution in YYSETCOND
	      definition. The default value is @@.  Overrides the more generic
	      re2c:api:sigil configuration.

       re2c:YYSETCOND:naked, re2c:define:YYSETCONDITION:naked
	      Overrides	the more generic re2c:api:style	configuration for  YY-
	      SETCOND. Zero value corresponds to free-form API style.

       re2c:YYSETSTATE,	re2c:define:YYSETSTATE
	      Defines API primitive YYSETSTATE.

       re2c:YYSETSTATE@state, re2c:define:YYSETSTATE@state
	      Specifies	the sigil used for argument substitution in YYSETSTATE
	      definition. The default value is @@.  Overrides the more generic
	      re2c:api:sigil configuration.

       re2c:YYSETSTATE:naked, re2c:define:YYSETSTATE:naked
	      Overrides	 the more generic re2c:api:style configuration for YY-
	      SETSTATE.	Zero value corresponds to free-form API	style.

       re2c:YYSETACCEPT, re2c:define:YYSETACCEPT
	      Defines API primitive YYSETACCEPT.

       re2c:YYSKIP, re2c:define:YYSKIP
	      Defines generic API primitive YYSKIP.

       re2c:YYSHIFT, re2c:define:YYSHIFT
	      Defines generic API primitive YYSHIFT.

       re2c:YYCOPYMTAG,	re2c:define:YYCOPYMTAG
	      Defines generic API primitive YYCOPYMTAG.

       re2c:YYCOPYSTAG,	re2c:define:YYCOPYSTAG
	      Defines generic API primitive YYCOPYSTAG.

       re2c:YYSHIFTMTAG, re2c:define:YYSHIFTMTAG
	      Defines generic API primitive YYSHIFTMTAG.

       re2c:YYSHIFTSTAG, re2c:define:YYSHIFTSTAG
	      Defines generic API primitive YYSHIFTSTAG.

       re2c:YYSTAGN, re2c:define:YYSTAGN
	      Defines generic API primitive YYSTAGN.

       re2c:YYSTAGP, re2c:define:YYSTAGP
	      Defines generic API primitive YYSTAGP.

       re2c:yyaccept, re2c:variable:yyaccept
	      Defines API primitive yyaccept.

       re2c:yybm, re2c:variable:yybm
	      Defines API primitive yybm.

       re2c:yybm:hex, re2c:variable:yybm:hex
	      If set to	nonzero, bitmaps for the --bit-vectors option are gen-
	      erated in	hexadecimal format. The	default	is zero	 (bitmaps  are
	      in decimal format).

       re2c:yych, re2c:variable:yych
	      Defines API primitive yych.

       re2c:yych:emit, re2c:variable:yych:emit
	      If  set  to zero,	yych definition	is not generated.  The default
	      is non-zero.

       re2c:yych:conversion, re2c:variable:yych:conversion
	      If set to	non-zero, re2v automatically generates a conversion to
              YYCTYPE every time yych is read. The default is zero (no
              conversion).

       re2c:yych:literals, re2c:variable:yych:literals
	      Specifies	the form of literals that  yych	 is  matched  against.
	      Possible	values are: char (character literals in	single quotes,
	      non-printable ones use escape sequences that  start  with	 back-
	      slash), hex (hexadecimal integers) and char_or_hex (a mixture of
	      both,  character literals	for printable characters and hexadeci-
	      mal integers for others).

       re2c:yyctable, re2c:variable:yyctable
	      Defines API primitive yyctable.

       re2c:yynmatch, re2c:variable:yynmatch
	      Defines API primitive yynmatch.

       re2c:yypmatch, re2c:variable:yypmatch
	      Defines API primitive yypmatch.

       re2c:yytarget, re2c:variable:yytarget
	      Defines API primitive yytarget.

       re2c:yystable, re2c:variable:yystable
	      Deprecated.

       re2c:yystate, re2c:variable:yystate
	      Defines API primitive yystate.

       re2c:yyfill, re2c:variable:yyfill
	      Defines API primitive yyfill.

       re2c:yyfill:check
	      If set to	zero, suppresses the generation	 of  pre-YYFILL	 check
	      for the number of	input characters (the YYLESSTHAN definition in
	      generic  API and the YYLIMIT-based comparison in C pointer API).
	      The default is non-zero (generate	the check).

       re2c:yyfill:enable
	      If set to	zero, suppresses the generation	 of  YYFILL  (together
	      with  the	 check). This should be	used when the whole input fits
	      into one piece of	memory (there is no need  for  buffering)  and
	      the  end-of-input	 checks	do not rely on the YYFILL checks (e.g.
	      if a sentinel character is used).	 Use warnings (-W option)  and
	      re2c:sentinel  configuration  to verify that the generated lexer
	      cannot read past the end of input.  The default is non-zero (YY-
	      FILL is enabled).

       re2c:yyfill:parameter
	      If set to	zero, suppresses the generation	of parameter passed to
	      YYFILL.  The parameter is	the minimum number of characters  that
	      must be supplied.	 Defaults to non-zero (the parameter is	gener-
	      ated).   This  configuration  can	 be  overridden	 with re2c:YY-
	      FILL:naked or re2c:api:style.
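
       As an illustration, a block might adjust a few of these configurations
       as sketched below. The values are arbitrary choices for the example,
       and the comments describe the effect expected from the descriptions
       above rather than verified output:

          fn lex(yyinput string) {
              mut yycursor := 0
              /*!re2c
                  re2c:yyfill:enable = 0;        // no YYFILL, no bounds checks
                  re2c:sentinel      = 0;        // NUL terminates the input
                  re2c:indent:string = "    ";   // indent with four spaces
                  re2c:label:prefix  = "st";     // state labels start with st

                  [1-9][0-9]* { return }
                  *           { panic("error!") }
              */
          }

       Since these settings appear inside a global block, they are also added
       to the global scope and inherited by subsequent blocks; a local block
       would keep them private.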

   Regular expressions
       re2v uses the following syntax for regular expressions:

       "foo"  Case-sensitive string literal.

       'foo'  Case-insensitive string literal.

       [a-xyz],	[^a-xyz]
	      Character	class (possibly	negated).

       .      Any character except newline.

       R \ S  Difference of character classes R	and S.

       R*     Zero or more occurrences of R.

       R+     One or more occurrences of R.

       R?     Optional R.

       R{n}   Repetition of R exactly n	times.

       R{n,}  Repetition of R at least n times.

       R{n,m} Repetition of R from n to	m times.

       (R)    Just R; parentheses are used to override precedence. If submatch
	      extraction is enabled, (R) is a  capturing  or  a	 non-capturing
	      group depending on --invert-captures option.

       (!R)   If  submatch extraction is enabled, (!R) is a non-capturing or a
	      capturing	group depending	on --invert-captures option.

       R S    Concatenation: R followed	by S.

       R | S  Alternative: R or	S.

       R / S  Lookahead: R followed by S, but S	is not consumed.

       name   Regular expression defined as name (or literal string "name"  in
	      Flex compatibility mode).

       {name} Regular expression defined as name in Flex compatibility mode.

       @stag  An  s-tag:  saves	the last input position	at which @stag matches
	      in a variable named stag.

       #mtag  An m-tag:	saves all input	positions at which #mtag matches in  a
	      variable named mtag.

       Character  classes and string literals may contain the following	escape
       sequences: \a, \b, \f, \n, \r, \t, \v, \\, octal	escapes	\ooo and hexa-
       decimal escapes \xhh, \uhhhh and	\Uhhhhhhhh.

   Actions
       Here is a list of predefined actions supported by re2v:

       !entry code
	      Entry action binds a user-defined	block of  code	to  the	 start
	      state  of	 the current finite state machine. If start conditions
	      are used,	the entry action can be	set individually for each con-
	      dition. This action may be used to perform initialization,  e.g.
	      to save start location of	a lexeme.

       !pre_rule code
	      Pre-rule	action prepends	a user-defined block of	code to	seman-
	      tic actions of all rules in the current block (or	condition,  if
	      start  conditions	 are  used). This action may be	used to	factor
	      out the common part of all semantic actions (e.g.	saving the end
	      location of a lexeme).

       !post_rule code
	      Post-rule	action appends a user-defined block of code to	seman-
	      tic  actions of all rules	in the current block (or condition, if
	      start conditions are used). This action may be used to emit trap
	      statements that guard against unintended control flow.

   Directives
       Here is a full list of directives supported by re2v:

       !use:name ;
	      An in-block use directive	that merges a previously defined rules
	      block with the specified name into the current block. Named def-
	      initions,	configurations and rules of the	referenced  block  are
	      added  to	 the current ones. Conflicts between overlapping rules
	      and configurations are resolved in the usual way:	the first rule
	      takes priority, and the latest configuration overrides the  pre-
	      ceding ones. One exception is the	special	rules *, $ and <!> for
	      which  a block-local definition always takes priority. A use di-
	      rective can be placed anywhere inside of a block,	 and  multiple
	      use directives are allowed.

       !include	file ;
	      This  directive  is  the	same as	include	block: it inserts file
	      contents verbatim	in place of the	directive.

   Program interface
       The generated code interfaces with the outer program with the  help  of
       primitives,  collectively  referred  to	as  the	API.  Which primitives
       should be defined for a particular program depends on multiple factors,
       including the complexity	of regular expressions,	input  representation,
       buffering and the use of	various	features. All the necessary primitives
       should  be  defined by the user in the form of macros, functions, vari-
       ables or	any other suitable form	that makes the generated code  syntac-
       tically	and semantically correct. re2v does not	(and cannot) check the
       definitions, so if anything is missing or defined incorrectly, the gen-
       erated program may have compile-time or run-time	errors.	  This	manual
       provides	examples of API	definitions in the most	common cases.

       re2v  has three API flavors that	define the core	set of primitives used
       by a program:

       Simple API
	      This is the default API for the V	backend. It  consists  of  the
	      following	 primitives: YYINPUT (which should be defined as a se-
	      quence of	code units, e.g. a  string)  and  YYCURSOR,  YYMARKER,
	      YYCTXMARKER,  YYLIMIT (which should be defined as	indices	in YY-
	      INPUT).

       Record API
	      Record API is useful in cases when lexer state must be stored in
	      a	struct.	 It is enabled with --api record option	or re2c:api  =
	      record  configuration.  This API consists	of a variable yyrecord
	      (the name	can be overridden with re2c:yyrecord) that  should  be
	      defined  as  a  struct  with fields yyinput, yycursor, yymarker,
	      yyctxmarker, yylimit (only the fields used by the	generated code
	      need to be defined, and their names can be configured).

       Generic API
	      This is the most flexible	API. It	is enabled with	--api  generic
	      option  or re2c:api = generic configuration.  It contains	primi-
	      tives for	generic	operations: YYPEEK, YYSKIP, YYBACKUP,  YYBACK-
	      UPCTX,  YYSTAGP,	YYSTAGN,  YYMTAGP,  YYMTAGN,  YYRESTORE, YYRE-
	      STORECTX,	YYRESTORETAG, YYSHIFT, YYSHIFTSTAG,  YYSHIFTMTAG,  YY-
	      LESSTHAN.

       Here is a full list of API primitives that may be used by the generated
       code in order to	interface with the outer program.

       YYCTYPE
	      The  type	 of  the  input	 characters  (code units).  For	ASCII,
	      EBCDIC and UTF-8 encodings it should be 1-byte unsigned integer.
	      For UTF-16 or UCS-2 it should be 2-byte  unsigned	 integer.  For
	      UTF-32 it	should be 4-byte unsigned integer.

       YYCURSOR
	      An  l-value that stores the current input	position (a pointer or
	      an integer offset	in YYINPUT). Initially YYCURSOR	 should	 point
	      to  the  first  input character, and later it is advanced	by the
	      generated	code. When a rule matches, YYCURSOR  position  is  the
	      one after	the last matched character.

       YYLIMIT
	      An  r-value  that	stores the end of input	position (a pointer or
	      an integer offset	in YYINPUT). Initially YYLIMIT should point to
	      the position after the last available input character. It	is not
	      changed by the generated code. The lexer	compares  YYCURSOR  to
	      YYLIMIT  in order	to determine if	there are enough input charac-
	      ters left.

       YYMARKER
	      An l-value that stores the position of the latest	 matched  rule
	      (a pointer or an integer offset in YYINPUT). It is used to
	      restore the YYCURSOR position if a longer match fails and the
	      lexer needs to roll back. Initialization is not needed.

       YYCTXMARKER
	      An  l-value  that	stores the position of the trailing context (a
	      pointer or an integer offset in YYINPUT).	No  initialization  is
	      needed.  YYCTXMARKER  is needed only if the lookahead operator /
	      is used.

       YYFILL A	generic	API primitive with one variable	 len.	YYFILL	should
	      provide at least len more	input characters or fail.  If re2c:eof
	      is  used,	 then len is always 1 and  YYFILL should always	return
	      to the calling function; zero return  value  indicates  success.
	      If re2c:eof is not used, then YYFILL return value	is ignored and
	      it should	not return on failure. The maximum value of len	is YY-
	      MAXFILL.

       YYFN   A	primitive that defines function	prototype in --recursive-func-
	      tions  code  model.  Its value should be an array	of one or more
	      strings, where each string contains two or three components sep-
	      arated by	the  string  specified	in  re2c:fn:sep	 configuration
	      (typically  a  semicolon). The first array element defines func-
	      tion name	and return type	(empty for a void  function).	Subse-
	      quent  elements define function arguments: first,	the expression
	      for the argument used in function	body (usually  just  a	name);
	      second,  argument	 type; third, an optional formal parameter (it
	      defaults to the first component -	usually	both the argument  and
	      the parameter are	the same identifier).
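
              The layout of a YYFN value described above can be modelled
              with a short Python sketch. The function parse_yyfn and the
              sample strings are hypothetical; the sketch only restates the
              splitting rules from this entry and does not reproduce re2v's
              implementation.

```python
def parse_yyfn(yyfn, sep=";"):
    """Split a YYFN-style array into (name, return type, arguments).

    Each argument is a tuple (expression, type, formal parameter); the
    formal parameter defaults to the expression when it is omitted.
    """
    head = yyfn[0].split(sep)
    name = head[0]
    ret = head[1] if len(head) > 1 else ""  # empty means a void function
    args = []
    for item in yyfn[1:]:
        parts = item.split(sep)
        expr, typ = parts[0], parts[1]
        formal = parts[2] if len(parts) > 2 else expr
        args.append((expr, typ, formal))
    return name, ret, args


# Hypothetical value: a function "lex" returning int with two arguments.
print(parse_yyfn(["lex;int", "yyrecord;State", "st.count;int;count"]))
```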

       YYINPUT
	      An  r-value  that	 stores	 the  current input character sequence
	      (string, buffer, etc.).

       YYMAXFILL
	      An integral constant equal to the	maximum	value of the  argument
	      to YYFILL.  It can be generated with a max block.

       YYLESSTHAN
	      A	generic	API primitive with one variable	len.  It should	be de-
	      fined as an r-value of boolean type that equals true if and only
	      if there are less	than len input characters left.
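
              A minimal Python model of how YYLESSTHAN and YYFILL relate is
              sketched below. The class Input and its fields are
              assumptions made for illustration. A real YYFILL must also
              adjust YYMARKER, YYCTXMARKER and any tag variables when it
              discards consumed input, which this sketch omits.

```python
class Input:
    """Hypothetical buffered input; none of these names come from re2v."""

    def __init__(self, chunks):
        self.chunks = list(chunks)  # pieces of input not yet read
        self.buf = b""              # plays the role of YYINPUT
        self.cursor = 0             # plays the role of YYCURSOR
        self.limit = 0              # plays the role of YYLIMIT

    def yylessthan(self, n):
        # True iff fewer than n code units remain in the buffer.
        return self.limit - self.cursor < n

    def yyfill(self, n):
        # Discard consumed input, then read chunks until n units are available.
        self.buf = self.buf[self.cursor:]
        self.cursor = 0
        while len(self.buf) < n and self.chunks:
            self.buf += self.chunks.pop(0)
        self.limit = len(self.buf)
        return 1 if self.yylessthan(n) else 0  # zero indicates success


inp = Input([b"ab", b"cd"])
if inp.yylessthan(3):
    status = inp.yyfill(3)
    print(status, inp.buf)
```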

       YYPEEK A	generic	API primitive with no variables.  It should be defined
	      as  an r-value of	type YYCTYPE that is equal to the character at
	      the current input	position.

       YYSKIP A	generic	API primitive that should advance  the	current	 input
	      position by one code unit.

       YYBACKUP
	      A	generic	API primitive that should save the current input posi-
	      tion (to be restored with	YYRESTORE later).

       YYRESTORE
	      A	 generic  API  primitive that should restore the current input
	      position to the value saved by YYBACKUP.

       YYBACKUPCTX
	      A	generic	API primitive that should save the current input posi-
	      tion as the position of the trailing  context  (to  be  restored
	      with YYRESTORECTX	later).

       YYRESTORECTX
	      A	generic	API primitive that should restore the trailing context
	      position saved with YYBACKUPCTX.

       YYRESTORETAG
	      A	 generic  API  primitive with one variable tag that should re-
	      store the	trailing context position to the value of tag.

       YYSTAGP
	      A	generic	API primitive with one variable	tag, where tag can  be
	      a	 pointer or an offset in YYINPUT (see submatch extraction sec-
	      tion for details). YYSTAGP should	set tag	to the	current	 input
	      position.

       YYSTAGN
	      A generic API primitive with one variable tag, where tag can be
	      a pointer or an offset in YYINPUT (see the submatch extraction
	      section for details). YYSTAGN should set tag to a value that
	      represents a non-existent input position.

       YYMTAGP
	      A	generic	API primitive with one variable	tag.   YYMTAGP	should
	      append  the current position to the submatch history of tag (see
	      the submatch extraction section for details).

       YYMTAGN
	      A	generic	API primitive with one variable	tag.   YYMTAGN	should
	      append a value that represents a non-existent input position to
	      the submatch history of tag (see the submatch extraction section
	      for details).

       YYSHIFT
	      A	 generic  API  primitive  with	one variable shift that	should
	      shift the	current	input position by shift	characters (the	 shift
	      value may	be negative).

       YYCOPYSTAG
	      A	 generic  API  primitive  with two variables, lhs and rhs that
	      should  copy  right-hand-side  s-tag   variable	rhs   to   the
	      left-hand-side s-tag variable lhs. For most languages this
	      primitive has a default definition that assigns rhs to lhs.

       YYCOPYMTAG
	      A	 generic  API  primitive  with two variables, lhs and rhs that
	      should  copy  right-hand-side  m-tag   variable	rhs   to   the
	      left-hand-side m-tag variable lhs. For most languages this
	      primitive has a default definition that assigns rhs to lhs.

       YYSHIFTSTAG
	      A	 generic  API primitive	with two variables, tag	and shift that
	      should shift tag by shift	code units (the	 shift	value  may  be
	      negative).

       YYSHIFTMTAG
	      A	 generic  API primitive	with two variables, tag	and shift that
	      should shift the latest value in the history  of	tag  by	 shift
	      code units (the shift value may be negative).
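
              The difference between s-tags (one position) and m-tags (a
              history of positions) can be modelled in Python as follows.
              This is a sketch under stated assumptions: the value -1 for
              a non-existent position, the dictionary storage and the
              choice to leave non-existent positions unshifted are all
              illustrative, not re2v's definitions.

```python
NONE = -1  # assumed encoding of a non-existent input position


def yystagp(stags, tag, cursor):
    stags[tag] = cursor                        # keep only the last position


def yystagn(stags, tag):
    stags[tag] = NONE


def yymtagp(mtags, tag, cursor):
    mtags.setdefault(tag, []).append(cursor)   # keep every position


def yymtagn(mtags, tag):
    mtags.setdefault(tag, []).append(NONE)


def yyshiftstag(stags, tag, shift):
    if stags[tag] != NONE:                     # do not shift "no match"
        stags[tag] += shift


def yyshiftmtag(mtags, tag, shift):
    history = mtags[tag]
    if history and history[-1] != NONE:        # only the latest value moves
        history[-1] += shift


stags, mtags = {}, {}
yystagp(stags, "t1", 3)
yystagp(stags, "t1", 7)
yymtagp(mtags, "m1", 3)
yymtagn(mtags, "m1")
yymtagp(mtags, "m1", 7)
yyshiftstag(stags, "t1", -1)
yyshiftmtag(mtags, "m1", -2)
print(stags, mtags)
```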

       YYMAXNMATCH
	      An  integral  constant equal to the maximal number of POSIX cap-
	      turing groups in a rule. It is generated with a maxnmatch	block.

       YYCONDTYPE
	      The type of the condition	enum.  It can be generated either with
	      conditions block or --header option.

       YYGETACCEPT
	      A	primitive with one variable var	that stores  numeric  selector
	      of  the  accepted	 rule. For most	languages this primitive has a
	      default definition that reads from var.

       YYSETACCEPT
	      A primitive with two variables: var (an l-value that stores the
	      numeric selector of the accepted rule) and val (the value of the
	      selector). For most languages this primitive has a default
	      definition that assigns val to var.

       YYGETCOND
	      An r-value of type YYCONDTYPE that is equal to the current  con-
	      dition identifier.

       YYSETCOND
	      A	 primitive  with one variable cond that	should set the current
	      condition	identifier to cond.

       YYGETSTATE
	      An r-value of integer type that is equal to  the	current	 lexer
	      state. It	should be initialized to -1.

       YYSETSTATE
	      A	 primitive with	one variable state that	should set the current
	      lexer state to state.

       YYDEBUG
	      This primitive is	generated only with -d,	--debug-output option.
	      Its purpose is to add logging to the generated code (a typical
	      YYDEBUG definition is a print statement). YYDEBUG statements are
	      generated in every state and have two variables: state (either a
	      DFA state index or -1) and symbol (the current input symbol).

       yyaccept
	      An  l-value  of unsigned integral	type that stores the number of
	      the latest matched rule. User definition is necessary only  with
	      --storable-state option.

       yybm   A	 table	containing  compressed bitmaps for up to 8 transitions
	      (used with the --bitmaps option).	The table  contains  256  ele-
	      ments  and  is  indexed by 1-byte	code units. Each 8-bit element
	      combines boolean values for up to 8 transitions: the k-th bit of
	      the n-th element is set iff the n-th code unit is in the range of
	      the k-th transition. The idea of this bitmap is to replace many if
	      branches	or  switch cases with one check	of a single bit	in the
	      table.
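
              The bitmap idea can be sketched in Python. The helper names
              and the bit order (bit k is 1 << k) are assumptions made for
              this illustration; the layout of the table that re2v
              generates may differ.

```python
def build_yybm(ranges):
    """Build a 256-element table; bit k of element n is set iff code
    unit n lies in ranges[k] (inclusive bounds). At most 8 ranges fit."""
    assert len(ranges) <= 8
    table = [0] * 256
    for k, (lo, hi) in enumerate(ranges):
        for n in range(lo, hi + 1):
            table[n] |= 1 << k
    return table


# Transition 0: decimal digits; transition 1: lowercase ASCII letters.
yybm = build_yybm([(0x30, 0x39), (0x61, 0x7A)])


def in_transition(k, code_unit):
    # A single table lookup and bit test replaces a chain of comparisons.
    return (yybm[code_unit] & (1 << k)) != 0


print(in_transition(0, ord("7")), in_transition(1, ord("7")))
```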

       yych   An l-value of type YYCTYPE that stores the current input charac-
	      ter.  User definition is necessary only with -f --storable-state
	      option.

       yyctable
	      Jump table generated for the initial condition dispatch (enabled
	      with the combination of --conditions  and	 --computed-gotos  op-
	      tions).

       yyfill An  l-value  that	 stores	the result of YYFILL call (this	may be
	      necessary	for pure  functional  languages,  where	 YYFILL	 is  a
	      monadic function with complex return value).

       yynmatch
	      An  l-value  of unsigned integral	type that stores the number of
	      POSIX capturing groups in	the matched rule.  Used	only  with  -P
	      --posix-captures option.

       yypmatch
	      An array of l-values that	are used to hold the tag values	corre-
	      sponding	to the capturing parentheses in	the matching rule. Ar-
	      ray length must be at least yynmatch * 2 (usually	YYMAXNMATCH  *
	      2	is a good choice).  Used only with -P --posix-captures option.

       yystable
	      Deprecated.

       yystate
	      An  l-value used with the	--loop-switch option to	store the cur-
	      rent DFA state.

       yytarget
	      Jump table that contains jump targets (label addresses) for  all
	      transitions  from	 a  state.  This table is local	to each	state.
	      Generation of yytarget tables is enabled	with  --computed-gotos
	      option.

   Options
       Some of the options have corresponding configurations, others are
       global and cannot be changed after re2v starts reading the input file.
       Debug options generally require building re2v in debug configuration.
       Internal options are useful for experimenting with the algorithms used
       in re2v.

       -? --help -h
	      Show help	message.

       --api <simple | record |	generic>
	      Specify  the  API	 used  by the generated	code to	interface with
	      user-defined code. Option simple should be used in simple cases
	      when  there's  no	 need  for  buffer refilling and storing lexer
	      state. Option record should be used when lexer state needs to be
	      stored in	a record (struct, class, etc.).	 Option	generic	should
	      be used in complex cases when the	other two APIs are not	flexi-
	      ble enough.

       --bit-vectors -b
	      Optimize conditional jumps using bit masks.  This	option implies
	      --nested-ifs.

       --captures, --leftmost-captures
	      Enable   submatch	 extraction  with  leftmost  greedy  capturing
	      groups. The result is collected into an array yypmatch of
	      capacity 2 * YYMAXNMATCH, and yynmatch is set to the number of
	      groups for the matching rule.

       --captvars, --leftmost-captvars
	      Enable  submatch	extraction  with  leftmost  greedy   capturing
	      groups.  The result is collected into variables yytl<k>, yytr<k>
	      for k-th capturing group.

       --case-insensitive
	      Treat single-quoted and double-quoted strings  as	 case-insensi-
	      tive.

       --case-inverted
	      Invert  the  meaning of single-quoted and	double-quoted strings:
	      treat single-quoted strings as case-sensitive and	 double-quoted
	      strings as case-insensitive.

       --case-ranges
	      Collapse consecutive cases in a switch statement into a range
	      of the form low ... high.	This syntax is a C/C++ language	exten-
	      sion that	is supported by	compilers like GCC, Clang and Tcc. The
	      main advantage over using	single cases is	smaller	generated code
	      and faster generation time, although for some compilers like Tcc
	      it also results in smaller binary	size.	This  option  is  sup-
	      ported only for C.

       --computed-gotos	-g
	      Optimize	conditional  jumps  using non-standard "computed goto"
	      extension	(which must be supported by the	compiler). re2v	gener-
	      ates jump	tables only in complex cases with a lot	of conditional
	      branches.	 Complexity   threshold	  can	be   configured	  with
	      cgoto:threshold  configuration.  This  option implies --bit-vec-
	      tors. It is supported only for C.

       --conditions --start-conditions -c
	      Enable support of	Flex-like "conditions":	multiple  interrelated
	      lexers  within  one  block.  This	 is an alternative to manually
	      specifying different re2v	blocks connected with goto or function
	      calls.

       --depfile FILE
	      Write dependency information to FILE in the form of  a  Makefile
	      rule  <output-file>  : <input-file> [include-file	...]. This al-
	      lows one to track	build dependencies in the presence of  include
	      blocks/directives,  so  that updating include files triggers re-
	      generation of the	output	file.	This  option  depends  on  the
	      --output option.

       --ebcdic	--ecb -e
	      Generate	a  lexer that reads input in EBCDIC encoding. re2v as-
	      sumes that the character range is	0 -- 0xFF and  character  size
	      is 1 byte.

       --empty-class <match-empty | match-none | error>
	      Define  the  way	re2v  treats  empty  character	classes.  With
	      match-empty (the default)	empty class matches empty input	(which
	      is illogical, but	backwards-compatible). With  match-none	 empty
	      class  always  fails  to match.  With error empty	class raises a
	      compilation error.

       --encoding-policy <fail | substitute | ignore>
	      Define the way re2v treats Unicode surrogates.  With  fail  re2v
	      aborts with an error when	a surrogate is encountered.  With sub-
	      stitute  re2v  silently  replaces	surrogates with	the error code
	      point 0xFFFD. With ignore	(the default) re2v  treats  surrogates
	      as normal	code points. The Unicode standard says that standalone
	      surrogates  are  invalid,	 but real-world	libraries and programs
	      behave in	different ways.
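
              The three policies can be summarized with a small Python
              model. The function name and the use of an exception for
              fail are assumptions of this sketch; it only restates the
              behavior described above.

```python
def apply_encoding_policy(code_point, policy="ignore"):
    """Return the code point to use for code_point under the given policy."""
    if policy not in ("fail", "substitute", "ignore"):
        raise ValueError("unknown policy: " + policy)
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    if not is_surrogate or policy == "ignore":
        return code_point            # treated as a normal code point
    if policy == "substitute":
        return 0xFFFD                # the error code point
    raise ValueError("surrogate U+%04X in input" % code_point)


print(hex(apply_encoding_policy(0xD800, "substitute")))
print(hex(apply_encoding_policy(0xD800, "ignore")))
```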

       --flex-syntax -F
	      Partial support for Flex syntax: in this mode named  definitions
	      don't  need  the	equal  sign and	the terminating	semicolon, and
	      when used	they must be surrounded	with curly braces. Names with-
	      out curly	braces are treated as double-quoted strings.

       --goto-label
	      Use "goto/label" code model: encode DFA in form of labeled  code
	      blocks  connected	 with  goto transitions	across blocks. This is
	      only supported for languages that	have a goto statement.

       --header	--type-header -t HEADER
	      Generate a HEADER	file. The contents of the file can  be	speci-
	      fied  using  special  blocks header:on and header:off. If	condi-
	      tions are	used, the generated header will	have a condition  enum
	      automatically appended to	it (unless there is an explicit	condi-
	      tions block).

       -I PATH
	      Add  PATH	to the list of locations which are used	when searching
	      for include files. This option is	useful in combination with in-
	      clude block or directive.	re2v looks for FILE in	the  directory
	      of  the  parent file and in the include locations	specified with
	      -I option.

       --input <default	| custom>
	      Deprecated alias for --api. Option default corresponds to	simple
	      (it is indeed the	default	for most backends, but not  for	 all).
	      Option custom corresponds	to generic.

       --input-encoding	<ascii | utf8>
	      Specify  the  way	 re2v  parses regular expressions.  With ascii
	      (the default) re2v handles input as ASCII-encoded: any  sequence
	      of  code	units  is  a sequence of standalone 1-byte characters.
	      With utf8	re2v handles  input  as	 UTF8-encoded  and  recognizes
	      multibyte	characters.

       --invert-captures
	      Invert the meaning of capturing and non-capturing	groups.	By de-
	      fault (...) is capturing and (! ...) is non-capturing. With this
	      option (!	...) is	capturing and (...) is non-capturing.

       --lang <none | c	| d | go | haskell | java | js | ocaml | python	| rust
       | v | zig>
	      Specify  the  target language. Supported languages are C,	D, Go,
	      Haskell, Java, JS, OCaml,	Python,	Rust, V, Zig  (more  languages
	      can be added via user-defined syntax files, see the --syntax
	      option). Option none disables default syntax configs, so that the
	      target language is undefined.

       --location-format <gnu |	msvc>
	      Specify location format in messages.   With  gnu	locations  are
	      printed as 'filename:line:column:	...'.  With msvc locations are
	      printed as 'filename(line,column)	...'.  The default is gnu.

       --loop-switch
	      Use  "loop/switch" code model: encode DFA	in form	of a loop over
	      a	switch statement, where	individual states  are	switch	cases.
	      State  is	 stored	 in  a	variable  yystate. Transitions between
	      states update yystate to the case	label of the destination state
	      and continue execution to	the head of the	loop.
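
              The shape of the loop/switch code model can be shown with a
              hand-written Python model of a lexer for [1-9][0-9]* on
              sentinel-terminated input. This is not re2v output; the
              state numbers and the boolean result are assumptions of the
              sketch, and an if/elif chain stands in for the switch.

```python
def lex(yyinput: bytes) -> bool:
    """Loop/switch model: yystate selects the branch on each iteration."""
    yycursor = 0
    yystate = 0
    while True:
        if yystate == 0:                  # start state
            yych = yyinput[yycursor]
            yycursor += 1
            yystate = 2 if 0x31 <= yych <= 0x39 else 1
        elif yystate == 1:                # default rule: no match
            return False
        elif yystate == 2:                # inside the number
            yych = yyinput[yycursor]
            if 0x30 <= yych <= 0x39:
                yycursor += 1
                yystate = 2               # stay in state 2
            else:
                yystate = 3
        else:                             # state 3: the rule matched
            return True


print(lex(b"1234\x00"))
```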

       --nested-ifs -s
	      Use nested if statements instead of switch statements in	condi-
	      tional  jumps.  This usually results in more efficient code with
	      non-optimizing compilers.

       --no-debug-info -i
	      Do not output line directives. This may be useful	when the  gen-
	      erated code is stored in a version control system	(to avoid huge
	      autogenerated diffs on small changes).

       --no-generation-date
	      Suppress date output in the generated file.

       --no-version
	      Suppress version output in the generated file.

       --no-unsafe
	      Do  not generate unsafe wrapper over YYPEEK (this	option is spe-
	      cific to Rust). For  performance	reasons	 YYPEEK	 should	 avoid
	      bounds-checking,	as  the	 lexer	already	 performs end-of-input
	      checks in	a more efficient way.  The user	may choose to  provide
	      a	safe YYPEEK definition,	or a definition	that is	unsafe only in
	      release  builds,	in  which case the --no-unsafe option helps to
	      avoid warnings about redundant unsafe blocks.

       --output	-o OUTPUT
	      Specify the OUTPUT file.

       --posix-captures, -P
	      Enable submatch extraction with  POSIX-style  capturing  groups.
	      The result is collected into an array yypmatch of capacity 2 *
	      YYMAXNMATCH, and yynmatch is set to the number of groups for the
	      matching rule.

       --posix-captvars
	      Enable submatch extraction with  POSIX-style  capturing  groups.
	      The result is collected into variables yytl<k>, yytr<k> for k-th
	      capturing	group.

       --recursive-functions
	      Use  code	 model based on	co-recursive functions,	where each DFA
	      state is a separate function that	may call other state-functions
	      or itself.
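
              For comparison, the same kind of automaton can be modelled
              with one function per state. The Python sketch below is
              hand-written for [1-9][0-9]* on sentinel-terminated input;
              the function names and the convention of returning the final
              cursor (or -1 on error) are assumptions. Python does not
              eliminate tail calls, so this model suits short inputs only.

```python
def yy0(s, i):
    # Start state: dispatch on the first code unit.
    return yy2(s, i + 1) if 0x31 <= s[i] <= 0x39 else yy1(s, i + 1)


def yy1(s, i):
    return -1                      # default rule: report an error


def yy2(s, i):
    # Inside the number: call itself while digits continue.
    return yy2(s, i + 1) if 0x30 <= s[i] <= 0x39 else yy3(s, i)


def yy3(s, i):
    return i                       # rule matched; i is one past the match


print(yy0(b"1234\x00", 0))
```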

       --reusable -r
	      Deprecated since version 2.2 (reusable blocks are	allowed	by de-
	      fault now).

       --skeleton -S
	      Ignore user-defined interface code and generate a	self-contained
	      "skeleton" program.  Additionally,  generate  input  files  with
	      strings  derived	from  the regular grammar and compressed match
	      results that are used to verify "skeleton" behavior on  all  in-
	      puts.  This  option  is useful for finding bugs in optimizations
	      and code generation. This	option is supported only for C.

       --storable-state	-f
	      Generate a lexer which can store its inner state.	 This is  use-
	      ful  in  push-model lexers which are stopped by an outer program
	      when there is not	enough input, and then resumed when more input
	      becomes available. In this mode users should additionally	define
	      YYGETSTATE and YYSETSTATE	primitives, and	variables yych,	 yyac-
	      cept and state should be part of the stored lexer	state.

       --syntax	FILE
	      Load  configurations  from  the specified	FILE and apply them on
	      top of the default syntax	file. Note that	FILE can define	only a
	      few configurations (if it's used to  amend  the  default	syntax
	      file),  or  it  can  define a whole new language backend (in the
	      latter case it is	recommended to use --lang none option).

       --tags -T
	      Enable submatch extraction with tags.

       --ucs2 --wide-chars -w
	      Generate a lexer that reads  UCS2-encoded	 input.	 re2v  assumes
	      that  the	character range	is 0 --	0xFFFF and character size is 2
	      bytes.  This option implies --nested-ifs.

       --utf8 --utf-8 -8
	      Generate a lexer that reads input	in UTF-8  encoding.  re2v  as-
	      sumes  that  the	character range	is 0 --	0x10FFFF and character
	      size is 1	byte.

       --utf16 --utf-16	-x
	      Generate a lexer that reads UTF16-encoded	 input.	 re2v  assumes
	      that  the	character range	is 0 --	0x10FFFF and character size is
	      2	bytes.	This option implies --nested-ifs.

       --utf32 --unicode -u
	      Generate a lexer that reads UTF32-encoded	 input.	 re2v  assumes
	      that  the	character range	is 0 --	0x10FFFF and character size is
	      4	bytes.	This option implies --nested-ifs.

       --verbose
	      Output a short message in	case of	success.

       --vernum	-V
	      Show version information in MMmmpp format	(major,	minor, patch).

       --version -v
	      Show version information.

       --single-pass -1
	      Deprecated. Does nothing (single pass is the default now).

       --debug-output -d
	      Emit YYDEBUG invocations in the generated	code. This  is	useful
	      to trace lexer execution.

       --dump-adfa
	      Debug option: output DFA after tunneling (in .dot	format).

       --dump-cfg
	      Debug  option:  output  control  flow graph of tag variables (in
	      .dot format).

       --dump-closure-stats
	      Debug option: output statistics on the number of states in  clo-
	      sure.

       --dump-dfa-det
	      Debug  option:  output DFA immediately after determinization (in
	      .dot format).

       --dump-dfa-min
	      Debug option: output DFA after minimization (in .dot format).

       --dump-dfa-tagopt
	      Debug option: output DFA after tag optimizations (in  .dot  for-
	      mat).

       --dump-dfa-tree
	      Debug  option:  output DFA under construction with states	repre-
	      sented as	tag history trees (in .dot format).

       --dump-dfa-raw
	      Debug  option:  output  DFA  under  construction	with  expanded
	      state-sets (in .dot format).

       --dump-interf
	      Debug  option:  output  interference  table produced by liveness
	      analysis of tag variables.

       --dump-nfa
	      Debug option: output NFA (in .dot	format).

       --emit-dot -D
	      Instead of normal	output generate	lexer graph  in	 .dot  format.
	      The  output  can	be  converted  to  an  image  with the help of
	      Graphviz (e.g. something like dot	-Tpng -odfa.png	dfa.dot).

       --dfa-minimization <moore | table>
	      Internal option: DFA minimization	algorithm used	by  re2v.  The
	      moore option is the Moore	algorithm (it is the default). The ta-
	      ble  option  is  the  "table filling" algorithm. Both algorithms
	      should produce the same DFA up to	states relabeling; table fill-
	      ing is simpler and much slower and serves	as a reference	imple-
	      mentation.

       --eager-skip
	      Internal option: make the generated lexer advance the input
	      position eagerly, immediately after reading the input symbol.
	      This changes the default behavior, in which the input position
	      is advanced lazily, after the transition to the next state.

       --no-lookahead
	      Internal	option,	 deprecated.   It used to enable TDFA(0) algo-
	      rithm. Unlike TDFA(1), TDFA(0) algorithm does not	use one-symbol
	      lookahead. It applies register operations	to the incoming	 tran-
	      sitions  rather  than  the outgoing ones.	Benchmarks showed that
	      TDFA(0) algorithm	is less	efficient than TDFA(1).

       --no-optimize-tags
	      Internal option: suppress	optimization of	tag variables  (useful
	      for debugging).

       --posix-closure <gor1 | gtop>
	      Internal	option:	 specify  shortest-path	algorithm used for the
	      construction of epsilon-closure with POSIX disambiguation	seman-
	      tics: gor1 (the default) stands for  Goldberg-Radzik  algorithm,
	      and gtop stands for "global topological order" algorithm.

       --posix-prectable <complex | naive>
	      Internal	option:	 specify  the  algorithm used to compute POSIX
	      precedence table.	The complex algorithm computes precedence  ta-
	      ble  in one traversal of tag history tree	and has	quadratic com-
	      plexity in the number of TNFA states; it	is  the	 default.  The
	      naive algorithm has worst-case cubic complexity in the number of
	      TNFA  states,  but  it  is  much simpler than complex and	may be
	      slightly faster in non-pathological cases.

       --stadfa
	      Internal option, deprecated.  It used  to	 enable	 staDFA	 algo-
	      rithm,  which  differs from TDFA in that register	operations are
	      placed in	states rather than on transitions.  Benchmarks	showed
	      that staDFA algorithm is less efficient than TDFA.

       --fixed-tags <none | toplevel | all>
	      Internal	option:	 specify  whether  the	fixed-tag optimization
	      should be	applied	to all tags (all), none	 of  them  (none),  or
	      only  those in toplevel concatenation (toplevel).	The default is
	      all.  "Fixed" tags are those that	are  located  within  a	 fixed
	      distance	to  some other tag (called "base"). In such cases only
	      the base tag needs to be tracked,	and the	value of the fixed tag
	      can be computed as the value of the base tag plus	a static  off-
	      set.  For	 tags  that  are under alternative or repetition it is
	      also necessary to	check if the base tag has a no-match value (in
	      that case	fixed tag should also be set to	no-match, disregarding
	      the offset). For tags in top-level concatenation	the  check  is
	      not needed, because they always match.
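
	      As a rough illustration, consider the rule
	      @y [0-9]{4} "-" @m [0-9]{2}: tag m always lies 5 code units
	      after tag y, so one of them can serve as the base and the
	      other can be computed from it. The V helper below is a
	      hand-written sketch of that computation, not re2v output; its
	      name and the -1 no-match value are assumptions made for the
	      example:

		  // Hypothetical helper (not generated by re2v).
		  const no_match = -1

		  fn fixed_tag(base int, offset int, nested bool) int {
		      // Tags under alternative or repetition may be unset.
		      if nested && base == no_match {
			  return no_match
		      }
		      return base + offset
		  }

	      For the rule above, which is a top-level concatenation, the
	      check is skipped and m is simply fixed_tag(y, 5, false).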

   Warnings
       Warnings can be individually enabled, disabled and turned into errors.

       -W     Turn on all warnings.

       -Werror
	      Turn warnings into errors. Note that this	option	alone  doesn't
	      turn  on	any warnings; it only affects those warnings that have
	      been turned on so	far or will be turned on later.

       -W<warning>
	      Turn on warning.

       -Wno-<warning>
	      Turn off warning.

       -Werror-<warning>
	      Turn on warning and treat it as an error (this implies
	      -W<warning>).

       -Wno-error-<warning>
	      Don't  treat  this  particular warning as	an error. This doesn't
	      turn off the warning itself.

       -Wcondition-order
	      Warn if the generated program makes implicit assumptions about
	      condition numbering. Use either the --header option or a
	      conditions block to generate a mapping of condition names to
	      numbers, and then use the autogenerated condition names.

       -Wempty-character-class
	      Warn  if a regular expression contains an	empty character	class.
	      Trying to	match an empty character  class	 makes	no  sense:  it
	      should  always  fail.  However, for backwards compatibility rea-
	      sons re2v	permits	empty character	classes	 and  treats  them  as
	      empty  strings.  Use  the	--empty-class option to	change the de-
	      fault behavior.

       -Wmatch-empty-string
	      Warn if a	rule is	nullable (matches an empty  string).   If  the
	      lexer  runs  in a	loop and the empty match is unintentional, the
	      lexer may	unexpectedly hang in an	infinite loop.

       -Wswapped-range
	      Warn if the lower	bound of a range is  greater  than  its	 upper
	      bound.  The  default  behavior  is  to  silently	swap the range
	      bounds.

       -Wundefined-control-flow
	      Warn if some input strings cause undefined control flow  in  the
	      lexer  (the  faulty  patterns are	reported). This	is a dangerous
	      and common mistake. It can be easily fixed by adding the default
	      rule * which has the lowest priority, matches any	code unit, and
	      always consumes a	single code unit.

       -Wunreachable-rules
	      Warn about rules that are	shadowed by other rules	and will never
	      match.

       -Wuseless-escape
	      Warn if a symbol is escaped when it shouldn't be. By default,
	      re2v silently ignores such escapes, but they may well indicate
	      a typo or an error in the escape sequence.

       -Wnondeterministic-tags
	      Warn if a	tag has	n-th degree  of	 nondeterminism,  where	 n  is
	      greater than 1.

       -Wsentinel-in-midrule
	      Warn  if	the sentinel symbol occurs in the middle of a rule ---
	      this may cause reads past	the end	of buffer, crashes  or	memory
	      corruption in the	generated lexer. This warning is only applica-
	      ble  if  the sentinel method of checking for the end of input is
	      used.  It	is set to an error if re2c:sentinel  configuration  is
	      used.

       -Wundefined-syntax-config
	      Warn  if the syntax file specified with --syntax option is miss-
	      ing definitions of some configurations. This helps  to  maintain
	      user-defined syntax files: if a new release adds configurations,
	      old syntax file will raise a warning, and	the user will be noti-
	      fied. If some configurations are unused and do not need a	defin-
	      ition, they should be explicitly set to <undefined>.
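
       As an illustration of how these options combine, the following command
       (file names are placeholders) turns on all warnings, makes
       -Wundefined-control-flow an error, and silences -Wuseless-escape:

	  re2v -W -Werror-undefined-control-flow -Wno-useless-escape \
	      lexer.re -o lexer.v

       The block below is a small sketch, written for this example, of rules
       that should trigger three of the warnings: [z-a] has swapped bounds
       (-Wswapped-range), [a-z]* matches the empty string
       (-Wmatch-empty-string), and "abc" is shadowed by the rule before it
       (-Wunreachable-rules):

	  /*!re2c
	      re2c:yyfill:enable = 0;

	      [z-a]  { return 1 }
	      [a-z]* { return 2 }
	      "abc"  { return 3 }
	  */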

   Syntax files
       Support	for different languages	in re2c	is based on the	idea of	syntax
       files.  A syntax	file is	a configuration	file that  defines  syntax  of
       the  target  language --	not the	whole language,	but a small part of it
       that is used by the generated code. Syntax files	make re2c very	flexi-
       ble,  but they should not be used as a replacement for re2c: configura-
       tions: their purpose is to define syntax	of the target language,	not to
       customize one particular	lexer. All supported  languages	 have  default
       syntax files that are part of the distribution (see the include/syntax
       subdirectory); they are also embedded in the re2v binary. Users may
       provide a custom syntax file that overrides a few configurations for
       one of the supported languages, or they may redefine all
       configurations (in that case the --lang none option should be used).
       Syntax files
       contain configurations of four different	kinds: feature lists, language
       configurations, inplace configurations and code templates.
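
       For instance, assuming a user-written syntax file named custom.stx
       (the file names here are placeholders), it could be passed on the
       command line as follows; adding -Wundefined-syntax-config reports
       configurations that the file leaves undefined:

	  re2v --syntax custom.stx -Wundefined-syntax-config \
	      lexer.re -o lexer.v

       A file that redefines all configurations from scratch would be
       combined with --lang none instead:

	  re2v --lang none --syntax custom.stx lexer.re -o lexer.out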

       Feature lists
	  A  few  list	configurations	define various features	supported by a
	  given	backend, so that re2v may give a clear error if	the user tries
	  to enable an unsupported feature:

	  supported_apis
		 A list	of  supported  APIs  with  possible  elements  simple,
		 record, generic.

	  supported_api_styles
		 A  list  of supported API styles with possible	elements func-
		 tions,	free-form.

	  supported_code_models
		 A list	 of  supported	code  models  with  possible  elements
		 goto-label, loop-switch, recursive-functions.

	  supported_targets
		 A  list  of  supported	codegen	targets	with possible elements
		 code, dot, skeleton.

	  supported_features
		 A  list  of  supported	 features   with   possible   elements
		 nested-ifs,  bitmaps,	computed-gotos,	 case-ranges, monadic,
		 unsafe, tags, captures, captvars.
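
	  A sketch of how such lists might look in a syntax file (the
	  bracketed list notation and the chosen elements are assumptions
	  made for illustration; the bundled files under include/syntax are
	  the authoritative reference):

	      supported_apis = ["simple", "record", "generic"];
	      supported_api_styles = ["free-form"];
	      supported_code_models = ["goto-label", "loop-switch"];
	      supported_targets = ["code", "dot"];
	      supported_features = ["nested-ifs", "unsafe", "tags"];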

       Language	configurations
	  A few	boolean	configurations describe	features of  the  target  lan-
	  guage	that affect re2v parser	and code generator:

	  semicolons
		 Non-zero if the language uses semicolons after	statements.

	  backtick_quoted_strings
		 Non-zero if the language has backtick-quoted strings.

	  single_quoted_strings
		 Non-zero if the language has single-quoted strings.

	  indentation_sensitive
		 Non-zero if the language is indentation sensitive.

	  wrap_blocks_in_braces
		 Non-zero  if  compound	 statements  must  be wrapped in curly
		 braces.
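
	  For a V-like language these settings might read as follows (a
	  sketch; the values are illustrative assumptions, not a copy of the
	  bundled V syntax file):

	      semicolons = 0;
	      backtick_quoted_strings = 1;
	      single_quoted_strings = 1;
	      indentation_sensitive = 0;
	      wrap_blocks_in_braces = 1;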

       Inplace configurations
	  Syntax files define initial values of	all re2c:  configurations,  as
	  they	may differ for different languages. See	configurations section
	  for a	full list of all inplace configurations	and their meaning.

       Code templates
	  Code templates define	syntax of the target language. They are	 writ-
	  ten  in  a simple domain-specific language with the following	formal
	  grammar:

	      code-template ::
		    name '=' code-exprs	';'
		  | CODE_TEMPLATE ';'
		  | '<undefined>' ';'

	      code-exprs ::
		    <EMPTY>
		  | code-exprs code-expr

	      code-expr	::
		    STRING
		  | VARIABLE
		  | optional
		  | list

	      optional ::
		    '('	CONDITIONAL '?'	code-exprs ')'
		  | '('	CONDITIONAL '?'	code-exprs ':' code-exprs ')'

	      list ::
		    '['	VARIABLE ':' code-exprs	']'
		  | '['	VARIABLE '{' NUMBER '}'	':' code-exprs ']'
		  | '['	VARIABLE '{' NUMBER ','	NUMBER '}' ':' code-exprs ']'

	  A code template is a sequence	of  string  literals,  variables,  op-
	  tional  elements and lists, or a reference to	another	code template,
	  or a special value <undefined>. Variables are	placeholders that  are
	  substituted during the code generation phase. List variables are
	  special: when expanding a list template, re2v repeats the
	  expressions on the right-hand side of the colon several times, each
	  time replacing occurrences of the list variable with a value
	  specific to that repetition. Lists have optional bounds (negative
	  values are counted from
	  the  end,  e.g.  -1 means the	last element). Conditional names start
	  with a dot.  Both conditionals and variables	may  be	 either	 local
	  (specific to the given code template)	or global (allowed in all code
	  templates).  When  re2v  reads syntax	file, it checks	that each code
	  template uses	only the variables and conditionals that  are  allowed
	  in it.

	  For  example,	 the following code template defines if-then-else con-
	  struct for a C-like language:

	      code:if_then_else	=
		  [branch{0}: topindent	"if " cond " {"	nl
		      indent [stmt: stmt] dedent]
		  [branch{1:-1}: topindent "} else" (.cond ? " if " cond) " {" nl
		      indent [stmt: stmt] dedent]
		  topindent "}"	nl;

	  Here branch is a list	 variable:  branch{0}  expands	to  the	 first
	  branch  (which  is  special, as there	is no else part), branch{1:-1}
	  expands to all remaining branches (if	any).  stmt  is	 also  a  list
	  variable:  [stmt:  stmt]  is a nested	list that expands to a list of
	  statements in	the body of the	current	branch.	topindent, indent, de-
	  dent and nl are global variables, and	.cond is a  local  conditional
	  (their meaning is described below). This code	template could produce
	  the following	code:

	      if x {
		  // do	something
	      }	else if	y {
		  // do	something else
	      }	else {
		  // don't do anything
	      }
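
	  Optional elements can be sketched in the same way. The following
	  definition, written for this example for a C-like language, uses
	  the type, name and init variables of code:var_local and emits the
	  initializer only when the .init conditional is true:

	      code:var_local =
		  topindent type " " name (.init ? " = " init) ";" nl;

	  With type int and name x it could produce either of:

	      int x = 0;
	      int x;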

	  Here's a list	of all code templates supported	by re2v	with their lo-
	  cal  variables  and  conditionals. Note that a particular definition
	  may, but does	not have to use	local variables	and conditionals.  Any
	  unused code templates	should be set to <undefined>.

	  code:var_local
		 Declaration or	definition  of	a  local  variable.  Supported
		 variables:  type  (the	type of	the variable), name (its name)
		 and init (initial value, if any). Conditionals:  .init	 (true
		 if there is an	initializer).

	  code:var_global
		 Same as code:var_local, except	that it's used in top-level.

	  code:const_local
		 Definition  of	 a  local  constant. Supported variables: type
		 (the type of the constant), name (its name) and init (initial
		 value).

	  code:const_global
		 Same as code:const_local, except that it's used in top-level.

	  code:array_local
		 Definition of a local	array  (table).	 Supported  variables:
		 type  (the  type  of array elements), name (array name), size
		 (its size), row (a list variable that does not	itself produce
		 any code, but expands list expression as many times as	 there
		 are rows in the table)	and elem (a list variable that expands
		 to  all table elements	in the current row -- it's meant to be
		 nested	in the row list).

	  code:array_global
		 Same as code:array_local, except that it's used in top-level.

	  code:array_elem
		 Reference to an element of an array (table). Supported	 vari-
		 ables:	 array (the name of the	array) and index (index	of the
		 element).

	  code:enum
		 Definition of an enumeration (it may be defined using a  spe-
		 cial  language	construct for enumerations, or simply as a few
		 standalone  constants).    Supported	variables   are	  type
		 (user-defined	enumeration  type  or  type of the constants),
		 elem (list variable that expands to the name of each  member)
		 and  init  (initializer for each member). Conditionals: .init
		 (true if there	is an initializer).

	  code:enum_elem
		 Enumeration element (a	member of a  user-defined  enumeration
		 type  or  a name of a constant, depending on how code:enum is
		 defined).  Supported variables	are name (the name of the ele-
		 ment) and type	(its type).

	  code:assign
		 Assignment statement. Supported variables are lhs (left  hand
		 side) and rhs (right hand side).

	  code:type_int
		 Signed	integer	type.

	  code:type_uint
		 Unsigned integer type.

	  code:type_yybm
		 Type of elements in the yybm table.

	  code:type_yytarget
		 Type of elements in the yytarget table.

	  code:cmp_eq
		 Operator "equals".

	  code:cmp_ne
		 Operator "not equals".

	  code:cmp_lt
		 Operator "less	than".

	  code:cmp_gt
		 Operator "greater than".

	  code:cmp_le
		 Operator "less or equal".

	  code:cmp_ge
		 Operator "greater or equal".

	  code:if_then_else
		 If-then-else  statement  with one or more branches. Supported
		 variables: branch (a list variable that does not itself  pro-
		 duce  any  code, but expands list expression as many times as
		 there are branches), cond (condition of the  current  branch)
		 and  stmt  (a list variable that expands to all statements in
		 the current branch). Conditionals: .cond (true	if the current
		 branch	has a condition), .many	(true if there's more than one
		 branch).

	  code:if_then_else_oneline
		 A specialization of code:if_then_else for the case  when  all
		 branches  have	 one-line  statements. If this is <undefined>,
		 code:if_then_else is used instead.

	  code:switch
		 A switch statement with one or	more  cases.  Supported	 vari-
		 ables:	 expr  (the  switched-on  expression) and case (a list
		 variable that expands to all  cases-groups  with  their  code
		 blocks).

	  code:switch_cases
		 A  group  of  switch  cases that maps to a single code	block.
		 Supported variables are case (a list variable that expands to
		 all cases in this group) and stmt (a list variable that
		 expands to all statements in the code block).

	  code:switch_cases_oneline
		 A  specialization  of code:switch_cases for the case when the
		 code block consists of	a single one-line statement.  If  this
		 is <undefined>, code:switch_cases is used instead.

	  code:switch_case_range
		 A  single switch case that covers a range of values (possibly
		 consisting of a single	value).	 Supported  variable:  val  (a
		 list  variable	that expands to	all values in the range). Sup-
		 ported	conditionals: .many (true if  there's  more  than  one
		 value	in  the	 range)	 and .char_literals (true if this is a
		 switch	on character literals -- some languages	 provide  spe-
		 cial syntax for this case).

	  code:switch_case_default
		 Default switch	case.

	  code:loop
		 A  loop  that	runs forever (unless interrupted from the loop
		 body).	 Supported variables: label (loop label), stmt (a list
		 variable that expands to all statements in the	loop body).

	  code:continue
		 Continue statement. Supported variables:  label  (label  from
		 which to continue execution).

	  code:goto
		 Goto statement. Supported variables: label (label of the jump
		 target).

	  code:fndecl
		 Function  declaration.	 Supported  variables:	name (function
		 name),	type (return type), arg	(a list	variable that does not
		 itself	produce	code, but  expands  list  expression  as  many
		 times	as there are function arguments), argname (name	of the
		 current argument), argtype (type of  the  current  argument).
		 Conditional: .type (true if this is a non-void	function).

	  code:fndef
		 Like  code:fndecl,  but  used for function definitions, so it
		 has one additional list variable stmt	that  expands  to  all
		 statements in the function body.

	  code:fncall
		 Function  call	statement. Supported variables:	name (function
		 name),	retval (l-value	where the return value is  stored,  if
		 any)  and  arg	 (a list variable that expands to all function
		 arguments).  Conditionals: .args (true	if  the	 function  has
		 arguments)  and  .retval  (true  if  return value needs to be
		 saved).

	  code:tailcall
		 Tail call  statement.	Supported  variables:  name  (function
		 name),	 and arg (a list variable that expands to all function
		 arguments).  Conditionals: .args (true	if  the	 function  has
		 arguments) and	.retval	(true if this is a non-void function).

	  code:recursive_functions
		 Program body with --recursive-functions code model. Supported
		 variables:  fn	 (a list variable that does not	itself produce
		 any code, but expands list expression as many times as	 there
		 are  functions), fndecl (declaration of the current function)
		 and fndef (definition of the current function).

	  code:fingerprint
		 The fingerprint at the	top of the generated output file. Sup-
		 ported	variables: ver (re2v version that was used to generate
		 this) and date	(generation date).

	  code:line_info
		 The format of line directives (if this	is set to <undefined>,
		 no directives are generated). Supported variables: line (line
		 number) and file (filename).

	  code:abort
		 A statement that aborts program execution.

	  code:yydebug
		 YYDEBUG statement, possibly specialized for  different	 APIs.
		 Supported variables: YYDEBUG, yyrecord, yych (map to the cor-
		 responding re2c: configurations), state (DFA state number).

	  code:yypeek
		 YYPEEK	 statement,  possibly  specialized for different APIs.
		 Supported  variables:	YYPEEK,	 YYCTYPE,  YYINPUT,  YYCURSOR,
		 yyrecord,  yych  (map	to  the	corresponding re2c: configura-
		 tions). Conditionals: .cast (true if re2c:yych:conversion  is
		 set to	non-zero).

	  code:yyskip
		 YYSKIP	 statement,  possibly  specialized for different APIs.
		 Supported variables: YYSKIP, YYCURSOR,	yyrecord (map  to  the
		 corresponding re2c: configurations).

	  code:yybackup
		 YYBACKUP  statement, possibly specialized for different APIs.
		 Supported variables: YYBACKUP,	YYCURSOR,  YYMARKER,  yyrecord
		 (map to the corresponding re2c: configurations).

	  code:yybackupctx
		 YYBACKUPCTX  statement,  possibly  specialized	 for different
		 APIs. Supported variables: YYBACKUPCTX, YYCURSOR,
		 YYCTXMARKER, yyrecord (map to the corresponding re2c:
		 configurations).

	  code:yyskip_yypeek
		 Combined code:yyskip and code:yypeek statement	 (defaults  to
		 code:yyskip followed by code:yypeek).

	  code:yypeek_yyskip
		 Combined  code:yypeek	and code:yyskip	statement (defaults to
		 code:yypeek followed by code:yyskip).

	  code:yyskip_yybackup
		 Combined code:yyskip and code:yybackup	statement (defaults to
		 code:yyskip followed by code:yybackup).

	  code:yybackup_yyskip
		 Combined code:yybackup	and code:yyskip	statement (defaults to
		 code:yybackup followed	by code:yyskip).

	  code:yybackup_yypeek
		 Combined code:yybackup	and code:yypeek	statement (defaults to
		 code:yybackup followed	by code:yypeek).

	  code:yyskip_yybackup_yypeek
		 Combined code:yyskip, code:yybackup and code:yypeek statement
		 (defaults to code:yyskip followed by code:yybackup followed
		 by code:yypeek).

	  code:yybackup_yypeek_yyskip
		 Combined code:yybackup, code:yypeek and code:yyskip statement
		 (defaults to code:yybackup followed by code:yypeek followed
		 by code:yyskip).

	  code:yyrestore
		 YYRESTORE statement, possibly specialized for different APIs.
		 Supported variables: YYRESTORE, YYCURSOR, YYMARKER,  yyrecord
		 (map to the corresponding re2c: configurations).

	  code:yyrestorectx
		 YYRESTORECTX  statement,  possibly  specialized for different
		 APIs. Supported variables: YYRESTORECTX, YYCURSOR,
		 YYCTXMARKER, yyrecord (map to the corresponding re2c:
		 configurations).

	  code:yyrestoretag
		 YYRESTORETAG statement, possibly  specialized	for  different
		 APIs.	 Supported variables: YYRESTORETAG, YYCURSOR, yyrecord
		 (map to the corresponding  re2c:  configurations),  tag  (the
		 name of tag variable used to restore position).

	  code:yyshift
		 YYSHIFT  statement,  possibly specialized for different APIs.
		 Supported variables: YYSHIFT, YYCURSOR, yyrecord (map to  the
		 corresponding	re2c:  configurations),	 offset	(the number of
		 code units to shift the current position).

	  code:yyshiftstag
		 YYSHIFTSTAG statement,	 possibly  specialized	for  different
		 APIs.	 Supported  variables: YYSHIFTSTAG, yyrecord, negative
		 (map to the corresponding  re2c:  configurations),  tag  (tag
		 variable  which  needs	 to be shifted), offset	(the number of
		 code units to shift). Conditionals: .nested (true if this  is
		 a  nested  tag	 --  in	 this  case  its  value	 may  equal to
		 re2c:tags:negative, which should not be shifted).

	  code:yyshiftmtag
		 YYSHIFTMTAG statement,	 possibly  specialized	for  different
		 APIs.	 Supported  variables: YYSHIFTMTAG (maps to the	corre-
		 sponding re2c:	configuration),	tag (tag variable which	 needs
		 to be shifted), offset	(the number of code units to shift).

	  code:yystagp
		 YYSTAGP  statement,  possibly specialized for different APIs.
		 Supported variables: YYSTAGP, YYCURSOR, yyrecord (map to  the
		 corresponding	re2c:  configurations),	tag (tag variable that
		 should	be updated).

	  code:yymtagp
		 YYMTAGP statement, possibly specialized for  different	 APIs.
		 Supported variables: YYMTAGP (maps to the corresponding re2c:
		 configuration), tag (tag variable that	should be updated).

	  code:yystagn
		 YYSTAGN  statement,  possibly specialized for different APIs.
		 Supported variables: YYSTAGN, negative, yyrecord (map to  the
		 corresponding	re2c:  configurations),	tag (tag variable that
		 should	be updated).

	  code:yymtagn
		 YYMTAGN statement, possibly specialized for  different	 APIs.
		 Supported variables: YYMTAGN (maps to the corresponding re2c:
		 configuration), tag (tag variable that	should be updated).

	  code:yycopystag
		 YYCOPYSTAG  statement,	 possibly  specialized	for  different
		 APIs.	Supported variables: YYCOPYSTAG, yyrecord (map to  the
		 corresponding re2c: configurations), lhs, rhs (left and right
		 hand side tag variables of the	copy operation).

	  code:yycopymtag
		 YYCOPYMTAG  statement,	 possibly  specialized	for  different
		 APIs.	Supported variables: YYCOPYMTAG, yyrecord (map to  the
		 corresponding re2c: configurations), lhs, rhs (left and right
		 hand side tag variables of the	copy operation).

	  code:yygetaccept
		 YYGETACCEPT  statement,  possibly  specialized	 for different
		 APIs.	Supported variables: YYGETACCEPT, yyrecord (map	to the
		 corresponding re2c: configurations), var (maps to the
		 re2c:yyaccept configuration).

	  code:yysetaccept
		 YYSETACCEPT  statement,  possibly  specialized	 for different
		 APIs.	Supported variables: YYSETACCEPT, yyrecord (map	to the
		 corresponding re2c: configurations), var (maps to the
		 re2c:yyaccept configuration) and val (numeric value of the
		 accepted rule).

	  code:yygetcond
		 YYGETCOND statement, possibly specialized for different APIs.
		 Supported variables: YYGETCOND, yyrecord (map to  the	corre-
		 sponding re2c:	configurations), var (maps to re2c:yycond con-
		 figuration).

	  code:yysetcond
		 YYSETCOND statement, possibly specialized for different APIs.
		 Supported  variables:	YYSETCOND, yyrecord (map to the	corre-
		 sponding re2c:	configurations), var (maps to re2c:yycond con-
		 figuration) and val (numeric condition	identifier).

	  code:yygetstate
		 YYGETSTATE  statement,	 possibly  specialized	for  different
		 APIs.	 Supported variables: YYGETSTATE, yyrecord (map	to the
		 corresponding re2c: configurations), var (maps to the
		 re2c:yystate configuration).

	  code:yysetstate
		 YYSETSTATE  statement,	 possibly  specialized	for  different
		 APIs.	Supported variables: YYSETSTATE, yyrecord (map to  the
		 corresponding re2c: configurations), var (maps to the
		 re2c:yystate configuration) and val (state number).

	  code:yylessthan
		 YYLESSTHAN  statement,	 possibly  specialized	for  different
		 APIs.	 Supported  variables:	YYLESSTHAN, YYCURSOR, YYLIMIT,
		 yyrecord (map to  the	corresponding  re2c:  configurations),
		 need  (the  number  of	 code  units to	check against).	Condi-
		 tional: .many (true if	the need is more than one).

	  code:yybm_filter
		 Condition that	is used	to filter out yych values that are not
		 covered by the	yybm table (used with --bitmaps	option).  Sup-
		 ported	variable: yych (maps to	re2c:yych configuration).

	  code:yybm_match
		 The format of yybm table check	(generated with	--bitmaps  op-
		 tion).	 Supported  variables:	yybm,  yych (map to the	corre-
		 sponding re2c:	configurations), offset	(offset	 in  the  yybm
		 table that needs to be	added to yych) and mask	(bit mask that
		 should	 be applied to the table entry to retrieve the boolean
		 value that needs to be checked).

	  Here's a list	of all global variables	that  are  allowed  in	syntax
	  files:

	  nl	 A newline.

	  indent A variable that does not produce any code, but	has a side-ef-
		 fect of increasing indentation	level.

	  dedent A variable that does not produce any code, but	has a side-ef-
		 fect of decreasing indentation	level.

	  topindent
		 Indentation  string  for  the	current	statement. Indentation
		 level is tracked and automatically updated by the code	gener-
		 ator.
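
	  As an illustration of how these variables cooperate, here is a
	  sketch of code:loop for a language with V-style for loops (written
	  for this example, not taken from the bundled V syntax file):

	      code:loop =
		  topindent "for {" nl
		      indent [stmt: stmt] dedent
		  topindent "}" nl;

	  topindent prints the indentation of the loop statement itself,
	  while indent and dedent make the statements of the loop body print
	  one level deeper.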

	  Here's a list	of all global conditionals that	are allowed in	syntax
	  files:

	  .api.simple
		 True  if  simple API is used (--api simple or re2c:api	= sim-
		 ple).

	  .api.generic
		 True if generic API is	used  (--api  generic  or  re2c:api  =
		 generic).

	  .api.record
		 True  if  record  API	is  used  (--api  record or re2c:api =
		 record).

	  .api_style.functions
		 True if function-like API style  is  used  (re2c:api-style  =
		 functions).

	  .api_style.freeform
		 True  if  free-form  API  style  is  used  (re2c:api-style  =
		 free-form).

	  .case_ranges
		 True if case ranges  feature  is  enabled  (--case-ranges  or
		 re2c:case-ranges = 1).

	  .code_model.goto_label
		 True if code model based on goto/label is used
		 (--goto-label).

	  .code_model.loop_switch
		 True  if  code	  model	  based	  on   loop/switch   is	  used
		 (--loop-switch).

	  .code_model.recursive_functions
		 True if code model based on recursive functions is used
		 (--recursive-functions).

	  .date	 True if the generated fingerprint should  contain  generation
		 date.

	  .loop_label
		 True if re2v generated loops must have a label
		 (re2c:label:yyloop is set to a nonempty string).

	  .monadic
		 True if the generated code should be monadic (re2c:monadic  =
		 1).  This is only relevant for	pure functional	languages.

	  .start_conditions
		 True if start conditions are enabled (--start-conditions).

	  .storable_state
		 True if storable state	is enabled (--storable-state).

	  .unsafe
		 True  if re2v should use "unsafe" blocks in order to generate
		 faster	code (--unsafe,	re2c:unsafe = 1). This is  only	 rele-
		 vant for languages that have "unsafe" feature.

	  .version
		 True  if  the	generated fingerprint should contain re2v ver-
		 sion.
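
	  Global conditionals let one definition cover several APIs. The
	  sketch below (written for this example under the assumption of a
	  V-like target, not the bundled definition) specializes code:yyskip
	  by API and API style:

	      code:yyskip = topindent
		  (.api.simple ? YYCURSOR " += 1"
		  : (.api.record ? yyrecord "." YYCURSOR " += 1"
		  : YYSKIP (.api_style.functions ? "()")))
		  nl;

	  With the simple API and YYCURSOR configured as yycursor, this
	  would expand to a statement of the form yycursor += 1.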

HANDLING THE END OF INPUT
       One of the main problems	for the	lexer is to know when to stop.	 There
       are a few terminating conditions:

       o the lexer may match some rule (including default rule *) and come to
	 a final state

       o the lexer may fail to match any rule and come to a default state

       o the lexer may reach the end of input

       The first two conditions	terminate the lexer in	a  "natural"  way:  it
       comes  to  a state with no outgoing transitions,	and the	matching auto-
       matically stops.	The third condition, end of input,  is	different:  it
       may  happen  in	any  state, and	the lexer should be able to handle it.
       Checking	for the	end of input interrupts	the normal lexer workflow  and
       adds  conditional  branches  to	the generated program, therefore it is
       necessary to minimize the number	of such	checks.	re2v  supports	a  few
       different  methods  for handling	the end	of input. Which	one to use de-
       pends on	the complexity of regular expressions, the need	for buffering,
       performance considerations and other factors. Here is a list  of	 meth-
       ods:

       o Sentinel. This method eliminates the need for end-of-input checks
	 altogether. It is simple and efficient, but limited to the case
	 when there is a natural "sentinel" character that can never occur
	 in valid input. This character may still occur in invalid input,
	 but it should not be allowed by the regular expressions, except
	 perhaps as the last character of a rule. The sentinel is appended
	 at the end of input and serves as a stop signal: when the lexer
	 reads this character, it is either a syntax error or the end of
	 input. In both cases the lexer should stop. This method is used if
	 YYFILL is disabled with re2c:yyfill:enable = 0; and re2c:eof has
	 the default value -1.

       o Sentinel with bounds checks. This method is generic: it allows one
	 to handle any input without restrictions on the regular
	 expressions. The idea is to reduce the number of end-of-input
	 checks by performing them only on certain characters. Similar to
	 the "sentinel" method, one of the characters is chosen as a
	 "sentinel" and appended at the end of input. However, there is no
	 restriction on where the sentinel may occur (in fact, any character
	 can be chosen as the sentinel). When the lexer reads this
	 character, it additionally performs a bounds check. If the current
	 position is within bounds, the lexer resumes matching and handles
	 the sentinel as a regular character. Otherwise it invokes YYFILL
	 (unless it is disabled). If more input is supplied, the lexer
	 rematches the last character and continues as if the sentinel
	 wasn't there. Otherwise it must be the real end of input, and the
	 lexer stops. This method is used when re2c:eof has a non-negative
	 value (it should be set to the numeric value of the sentinel).
	 YYFILL is optional.

       o Bounds checks with padding. This method is generic, and it may be
	 faster than the "sentinel with bounds checks" method, but it is
	 also more complex. The idea is to partition DFA states into
	 strongly connected components (SCCs) and generate a single check
	 per SCC for enough characters to cover the longest non-looping path
	 in this SCC. This reduces the number of checks, but there is a
	 problem with short lexemes at the end of input, as the check
	 requires enough characters to cover the longest lexeme. This can be
	 fixed by padding the input with a few fake characters that do not
	 form a valid lexeme suffix (so that the lexer cannot match them).
	 The length of padding should be YYMAXFILL, generated with a max
	 block. If there is not enough input, the lexer invokes YYFILL,
	 which should supply at least the required number of characters or
	 not return. This method is used if YYFILL is enabled and re2c:eof
	 is -1 (this is the default configuration).

       o Custom checks. Generic API allows one to override basic operations
	 like reading a character, which makes it possible to include the
	 end-of-input checks as part of them. This approach is error-prone
	 and should be used with caution. To use a custom method, enable
	 generic API with --api generic or re2c:api = generic; and disable
	 default bounds checks with re2c:yyfill:enable = 0; or
	 re2c:yyfill:check = 0;.
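
       In terms of configurations, the four methods can be summarized by the
       sketch below. Each group is an alternative setup rather than one
       block, the // lines are annotations for the reader, and the null
       sentinel in the second group is just an example value:

	  // Sentinel
	  re2c:yyfill:enable = 0;

	  // Sentinel with bounds checks (null sentinel, YYFILL disabled)
	  re2c:eof = 0;
	  re2c:yyfill:enable = 0;

	  // Bounds checks with padding: the defaults (YYFILL enabled,
	  // re2c:eof = -1) select this method.

	  // Custom checks with generic API
	  re2c:api = generic;
	  re2c:yyfill:enable = 0;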

       The following subsections contain an example of each method.

   Sentinel
       This  example uses a sentinel character to handle the end of input. The
       program counts space-separated words in a null-terminated  string.  The
       sentinel	is null: it is the last	character of each input	string,	and it
       is  not	allowed	in the middle of a lexeme by any of the	rules (in par-
       ticular,	it is not included in character	ranges where  it  is  easy  to
       overlook).  If  a null occurs in	the middle of a	string,	it is a	syntax
       error and the lexer will	match default rule *, but it won't  read  past
       the  end	 of  input  or	crash  (use  -Wsentinel-in-midrule warning and
       re2c:sentinel configuration to verify this). Configuration
       re2c:yyfill:enable = 0; suppresses the generation of bounds checks and
       YYFILL invocations.

	  // re2v $INPUT -o $OUTPUT

	  // Expect a null-terminated string.
	  fn lex(yyinput string) int {
	      mut yycursor := 0
	      mut count	:= 0

	  loop:	/*!re2c
	      re2c:yyfill:enable = 0;

	      *	     { return -1 }
	      [\x00] { return count }
	      [a-z]+ { count +=	1; unsafe { goto loop }	}
	      [	]+   {	unsafe { goto loop } }
	      */
	  }

	  fn main() {
	      assert lex("\0") == 0
	      assert lex("one two three\0") == 3
	      assert lex("f0ur\0") == -1
	  }
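
       The control flow of the sentinel method can be approximated by hand.
       The following is a minimal Python sketch, not re2v output; the function
       name count_words is hypothetical. It assumes the buffer ends with a
       null byte and shows why no bounds check is needed: every loop stops on
       the sentinel before the cursor can move past the end of input.

```python
def count_words(data: bytes) -> int:
    """Count space-separated lowercase words in a null-terminated buffer.

    Assumes data ends with a null byte (the sentinel). No length checks
    are performed, mirroring re2c:yyfill:enable = 0.
    """
    cursor = 0
    count = 0
    while True:
        ch = data[cursor]
        cursor += 1
        if ch == 0:                          # rule [\x00]: sentinel, stop
            return count
        elif ord("a") <= ch <= ord("z"):     # rule [a-z]+
            while ord("a") <= data[cursor] <= ord("z"):
                cursor += 1
            count += 1
        elif ch == ord(" "):                 # rule [ ]+
            while data[cursor] == ord(" "):
                cursor += 1
        else:                                # default rule *
            return -1
```

       The inner loops never test the buffer length: the null byte is outside
       both [a-z] and [ ], so it always ends the current lexeme.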

   Sentinel with bounds	checks
       This example uses a sentinel with bounds checks to handle the end of
       input (this method was added in version 1.2). The program counts
       space-separated single-quoted strings. The sentinel character is null,
       which is specified with the re2c:eof = 0; configuration. As in the
       sentinel method, null is the last character of each input string, but
       it is allowed in the middle of a rule (for example, 'aaa\0aa'\0 is
       valid input, but 'aaa\0 is a syntax error). Bounds checks are generated
       in each state that matches an input character, but they are scoped to
       the branch that handles null. Bounds checks are of the form YYLIMIT <=
       YYCURSOR, or YYLESSTHAN(1) with generic API. If the check condition is
       true, the lexer has reached the end of input and should stop (YYFILL is
       disabled with re2c:yyfill:enable = 0; as the input fits into one
       buffer; see the YYFILL with sentinel section for an example that uses
       YYFILL). Reaching the end of input opens three possibilities: if the
       lexer is in the initial state, it matches the end-of-input rule $;
       otherwise it may fall back to a previously matched rule (including the
       default rule *) or go to a default state, causing
       -Wundefined-control-flow.

	  // re2v $INPUT -o $OUTPUT

	  // Expects a null-terminated string.
	  fn lex(yyinput string) int {
	      mut yycursor, mut	yymarker := 0, 0
	      yylimit := yyinput.len - 1 // yylimit points at the terminating null
	      mut count	:= 0

	  loop:	/*!re2c
	      re2c:eof = 0;
	      re2c:yyfill:enable = 0;

	      str = [']	([^'\\]	| [\\][^])* ['];

	      *	   { return -1 }
	      $	   { return count }
	      str  { count += 1; unsafe	{ goto loop } }
	      [	]+ { unsafe { goto loop	} }

	      */
	  }

	  fn main() {
	      assert lex("\0") == 0
	      assert lex("'qu\0tes' 'are' 'fine: \\'' \0") == 3
	      assert lex("'unterminated\\'\0") == -1
	  }
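
       The scoped bounds check can be illustrated with a hand-written Python
       sketch (not re2v output; the names at_end and count_quoted are
       hypothetical). The cursor is compared with the limit only on the branch
       that sees the sentinel, so ordinary characters cost no extra check.

```python
SENTINEL = 0

def at_end(buf: bytes, cursor: int, limit: int) -> bool:
    """Scoped bounds check: compare positions only on the sentinel branch."""
    if buf[cursor] != SENTINEL:
        return False           # ordinary character: no bounds check needed
    return limit <= cursor     # sentinel: real end only if cursor hit limit

def count_quoted(buf: bytes) -> int:
    """Count single-quoted strings in a null-terminated buffer."""
    limit = len(buf) - 1       # limit points at the terminating null
    cursor, count = 0, 0
    while True:
        if at_end(buf, cursor, limit):       # end-of-input rule $
            return count
        ch = buf[cursor]
        if ch == ord(" "):                   # rule [ ]+
            cursor += 1
        elif ch == ord("'"):                 # rule str
            cursor += 1
            while True:
                if at_end(buf, cursor, limit):
                    return -1                # unterminated string
                ch = buf[cursor]
                cursor += 1
                if ch == ord("'"):
                    break
                if ch == ord("\\"):
                    if at_end(buf, cursor, limit):
                        return -1
                    cursor += 1              # skip the escaped character
            count += 1
        else:                                # default rule *
            return -1
```

       A null inside a quoted string is accepted because the cursor has not
       reached the limit there; only the null at the limit ends the input.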

   Bounds checks with padding
       This example uses bounds checks with padding to handle the end of input
       (this method is enabled by default). The program counts space-separated
       single-quoted strings. The input is padded with YYMAXFILL null
       characters at the end, where the YYMAXFILL value is autogenerated with
       a max block. It is not necessary to use null for padding: any
       characters can be used as long as they do not form a valid lexeme
       suffix (in this example padding should not contain single quotes, as
       they may be mistaken for a suffix of a single-quoted string). There is
       a "stop" rule that matches the first padding character (null) and
       terminates the lexer (note that it checks if null is at the beginning
       of padding, otherwise it is a syntax error). Bounds checks are
       generated only in some states that are determined by the strongly
       connected components of the underlying automaton. Checks have the form
       (YYLIMIT - YYCURSOR) < n, or YYLESSTHAN(n) with generic API, where n is
       the minimum number of characters that are needed for the lexer to
       proceed (it also means that the next bounds check will occur in at most
       n characters). If the check condition is true, the lexer has reached
       the end of input and invokes YYFILL(n), which should either supply at
       least n input characters or not return. In this example YYFILL always
       fails and terminates the lexer with an error (which is fine because the
       input fits into one buffer). See the YYFILL with padding section for an
       example that refills the input buffer with YYFILL.

	  // re2v $INPUT -o $OUTPUT

	  /*!max:re2c*/

	  // Expects yymaxfill-padded string.
	  fn lex(str string) int {
	      // Pad string with yymaxfill zeroes at the end.
	      mut yyinput := []u8{len: str.len + yymaxfill}
	      copy(mut &yyinput, str.bytes())

	      mut yycursor := 0
	      yylimit := yyinput.len
	      mut count	:= 0

	  loop:	/*!re2c
	      re2c:YYFILL = "return -1";

	      str = [']	([^'\\]	| [\\][^])* ['];

	      [\x00] {
		  // Check that	it is the sentinel, not	some unexpected	null.
		  if yycursor -	1 == str.len { return count } else { return -1 }
	      }
	      str  { count += 1; unsafe	{ goto loop } }
	      [	]+ { unsafe { goto loop	} }
	      *	   { return -1 }

	      */
	  }

	  fn main() {
	      assert lex("") ==	0
	      assert lex("'qu\0tes' 'are' 'fine: \\'' ") == 3
	      assert lex("'unterminated\\'") ==	-1
	      assert lex("'unexpected \00 null\\'") == -1
	  }
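
       The three ingredients of this method (padding, the bounds check and the
       "stop" rule test) can be written out separately. The Python sketch
       below is an illustration only; the helper names are hypothetical and
       the MAXFILL value is an assumption, since the real YYMAXFILL is
       computed by re2v from the rules.

```python
MAXFILL = 2  # assumed value for illustration; re2v computes the real one

def pad(data: bytes, maxfill: int = MAXFILL) -> bytes:
    """Append maxfill null bytes that cannot form a valid lexeme suffix."""
    return data + b"\0" * maxfill

def less_than(cursor: int, limit: int, n: int) -> bool:
    """Bounds check of the form (YYLIMIT - YYCURSOR) < n."""
    return (limit - cursor) < n

def is_stop(cursor_after_null: int, input_len: int) -> bool:
    """The "stop" rule: the matched null must be the first padding byte."""
    return cursor_after_null - 1 == input_len
```

       For an unpadded input of length 3, a null matched at offset 3 (cursor 4
       after the match) is the start of padding; a null matched earlier is an
       unexpected character.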

   Custom checks
       This example uses a custom end-of-input handling method based on
       generic API. The program counts space-separated words. It is the same
       as the sentinel example, except that the input is not null-terminated.
       To compensate for the absence of a sentinel character at the end of
       input, YYPEEK is redefined to perform a bounds check before it reads
       the next input character. This is inefficient because the check is
       performed on every read. If the cursor is still within the input,
       YYPEEK returns the real character; otherwise it returns a fake sentinel
       character.

	  // re2v $INPUT -o $OUTPUT

	  // Returns "fake" terminating	null if	cursor has reached limit.
	  fn peek(str string, cur int) u8 {
	      return if cur >= str.len { u8(0) /* fake null */ } else { str[cur] }
	  }

	  // Expects a string without terminating null.
	  fn lex(str string) int {
	      mut cur := 0
	      mut count	:= 0

	  loop:	/*!re2c
	      re2c:api = generic;
	      re2c:yyfill:enable = 0;
	      re2c:YYPEEK = "peek(str, cur)";
	      re2c:YYSKIP = "cur += 1";

	      *	     { return -1 }
	      [\x00] { return count }
	      [a-z]+ { count +=	1; unsafe { goto loop }	}
	      [	]+   { unsafe {	goto loop } }

	      */
	  }

	  fn main() {
	      assert lex("") ==	0
	      assert lex("one two three") == 3
	      assert lex("f0ur") == -1
	  }
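
       The same idea can be sketched without re2v. In the Python sketch below
       (hypothetical names, not re2v output) every read goes through peek,
       which substitutes a fake null once the cursor reaches the end of the
       unterminated input.

```python
def peek(data: bytes, cur: int) -> int:
    """Return the byte at cur, or a fake null once cur reaches the end."""
    return data[cur] if cur < len(data) else 0

def count_words_unterminated(data: bytes) -> int:
    """Count space-separated lowercase words; data has no terminating null."""
    cur, count = 0, 0
    while True:
        ch = peek(data, cur)
        cur += 1
        if ch == 0:                              # fake (or real) null: stop
            return count
        if ord("a") <= ch <= ord("z"):           # rule [a-z]+
            while ord("a") <= peek(data, cur) <= ord("z"):
                cur += 1
            count += 1
        elif ch == ord(" "):                     # rule [ ]+
            while peek(data, cur) == ord(" "):
                cur += 1
        else:                                    # default rule *
            return -1
```

       The cost is one length comparison per character read, which is why the
       other methods are usually preferable.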

BUFFER REFILLING
       The need	for buffering arises when the input cannot be mapped in	memory
       all at once: either it is too large, or it comes	in a streaming fashion
       (like reading from a socket). The usual technique in such cases	is  to
       allocate	 a  fixed-sized	memory buffer and process input	in chunks that
       fit into	the buffer. When the current chunk is processed, it  is	 moved
       out  and	new data is moved in. In practice it is	somewhat more complex,
       because lexer state consists not	of a single input position, but	a  set
       of interrelated positions:

        cursor:  the  next  input character to	be read	(YYCURSOR in C pointer
	 API or	YYSKIP/YYPEEK in generic API)

        limit:	the position after the last available input character (YYLIMIT
	 in C pointer API, implicitly handled by YYLESSTHAN in generic API)

        marker: the position of the most recent match,	if  any	 (YYMARKER  in
	 default API or	YYBACKUP/YYRESTORE in generic API)

        token:	 the  start of the current lexeme (implicit in re2v API, as it
	 is not	needed for the normal lexer operation and can be  defined  and
	 updated by the	user)

        context  marker: the position of the trailing context (YYCTXMARKER in
	 C pointer API or YYBACKUPCTX/YYRESTORECTX in generic API)

        tag variables:	submatch  positions  (defined  with  stags  and	 mtags
	 blocks	and generic API	primitives YYSTAGP/YYSTAGN/YYMTAGP/YYMTAGN)

       Not all these are used in every case, but if used, they must be updated
       by  YYFILL.  All	 active	positions are contained	in the segment between
       token and cursor, therefore everything between buffer start  and	 token
       can  be	discarded,  the	 segment  from token and up to limit should be
       moved to	the beginning of buffer, and the free  space  at  the  end  of
       buffer  should be filled	with new data.	In order to avoid frequent YY-
       FILL calls it is	best to	fill in	as many	input characters  as  possible
       (even  though  fewer characters might suffice to	resume the lexer). The
       details of YYFILL implementation	are slightly  different	 depending  on
       which  EOF  handling  method  is	used: the case of EOF rule is somewhat
       simpler than the	case of	bounds-checking	with padding. Also  note  that
       if  -f  --storable-state	 option	is used, YYFILL	has slightly different
       semantics (described in the section about storable state).
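
       The shift step described above can be sketched independently of any
       particular end-of-input method. The Python function below is a minimal
       illustration with hypothetical names: positions are kept in a
       dictionary, and all of them move by the same amount.

```python
def shift(buf: bytearray, pos: dict) -> int:
    """Discard everything before token and move token..limit to the front.

    pos maps position names ("token", "marker", "cursor", "limit") to
    offsets in buf. Returns the number of bytes freed at the end of buf.
    """
    token = pos["token"]
    used = pos["limit"] - token
    buf[0:used] = buf[token:pos["limit"]]   # slicing copies, overlap is safe
    for name in pos:                        # every position moves by token
        pos[name] -= token
    return len(buf) - used
```

       After the call the token is at offset zero and the returned number of
       bytes at the end of the buffer is available for new data.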

   YYFILL with sentinel
       If EOF rule is used, YYFILL is a	function-like primitive	 that  accepts
       no  arguments and returns a value which is checked against zero.	YYFILL
       invocation is triggered by condition YYLIMIT <= YYCURSOR	in  C  pointer
       API and YYLESSTHAN() in generic API. A non-zero return value means that
       YYFILL  has  failed.  A successful YYFILL call must supply at least one
       character and adjust input positions accordingly. Limit must always  be
       set  to	one after the last input position in buffer, and the character
       at the limit position must be the sentinel symbol specified by re2c:eof
       configuration. The pictures below show the relative locations of	 input
       positions  in  buffer  before and after YYFILL call (sentinel symbol is
       marked with #, and the second picture shows the case when there is  not
       enough input to fill the	whole buffer).

			 <-- shift -->
		       >-A------------B---------C-------------D#-----------E->
		       buffer	    token    marker	    limit,
							    cursor
	  >-A------------B---------C-------------D------------E#->
		       buffer,	marker	      cursor	    limit
		       token

			 <-- shift -->
		       >-A------------B---------C-------------D#--E (EOF)
		       buffer	    token    marker	    limit,
							    cursor
	  >-A------------B---------C-------------D---E#........
		       buffer,	marker	     cursor limit
		       token

       Here is an example of a program that reads a file named input in
       chunks of 4096 bytes and uses the EOF rule.

	  // re2v $INPUT -o $OUTPUT

	  import os
	  import strings

	  const	bufsize	= 4096

	  struct State {
	      file     os.File
	  mut:
	      yyinput  []u8
	      yycursor int
	      yymarker int
	      yylimit  int
	      token    int
	      eof      bool
	  }

	  fn fill(mut st &State) int {
	      if st.eof	{ return -1 } // unexpected EOF

	      // Error:	lexeme too long. In real life can reallocate a larger buffer.
	      if st.token < 1 {	return -2 }

	      // Shift buffer contents (discard	everything up to the current token).
	      copy(mut &st.yyinput, st.yyinput[st.token..st.yylimit])
	      st.yycursor -= st.token
	      st.yymarker -= st.token
	      st.yylimit -= st.token
	      st.token = 0

	      // Fill free space at the	end of buffer with new data from file.
	      pos := st.file.tell() or { 0 }
	      if n := st.file.read_bytes_into(u64(pos),	mut st.yyinput[st.yylimit..bufsize]) {
		  st.yylimit +=	n
	      }
	      st.yyinput[st.yylimit] = 0 // append sentinel symbol

	      // If read less than expected, this is the end of	input.
	      st.eof = st.yylimit < bufsize

	      return 0
	  }

	  fn lex(mut yyrecord &State) int {
	      mut count	:= 0
	  loop:
	      yyrecord.token = yyrecord.yycursor
	      /*!re2c
		  re2c:api = record;
		  re2c:eof = 0;
		  re2c:YYFILL =	"fill(mut yyrecord) == 0";

		  str =	['] ([^'\\] | [\\][^])*	['];

		  *    { return	-1 }
		  $    { return	count }
		  str  { count += 1; unsafe { goto loop	} }
		  [ ]+ { unsafe	{ goto loop } }
	      */
	  }

	  fn main() {
	      fname := "input"
	      content := "'qu\0tes' 'are' 'fine: \\'' "

	      // Prepare input file: a few times the size of the buffer, containing
	      // strings with zeroes and escaped quotes.
	      mut fw :=	os.create(fname)!
	      fw.write_string(strings.repeat_string(content, bufsize))!
	      fw.close()
	      count := 3 * bufsize // number of	quoted strings written to file

	      // Prepare lexer state: all offsets are at the end of buffer.
	      mut fr :=	os.open(fname)!
	      mut st :=	&State{
		  file:	    fr,
		  // Sentinel at `yylimit` offset is set to zero, which	triggers YYFILL.
		  yyinput:  []u8{len: bufsize +	1},
		  yycursor: bufsize,
		  yymarker: bufsize,
		  yylimit:  bufsize,
		  token:    bufsize,
		  eof:	    false,
	      }

	      // Run the lexer.
	      n	:= lex(mut st)
	      if n != count { panic("expected $count, got $n") }

	      // Cleanup: remove input file.
	      fr.close()
	      os.rm(fname)!
	  }
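
       The refill logic can be exercised without a file or a lexer. The Python
       sketch below is an approximation under stated assumptions: it reads
       from an in-memory stream, uses a deliberately tiny buffer, and its
       class and function names are hypothetical. As in the example above, a
       short read is treated as the end of input.

```python
import io

BUFSIZE = 8  # deliberately small for illustration

class State:
    def __init__(self, stream):
        self.stream = stream
        self.buf = bytearray(BUFSIZE + 1)    # one extra byte for the sentinel
        # All offsets start at the end of buffer, so the first check refills.
        self.cursor = self.marker = self.limit = self.token = BUFSIZE
        self.eof = False

def fill(st: State) -> int:
    if st.eof:
        return -1                            # unexpected end of input
    if st.token < 1:
        return -2                            # lexeme too long for the buffer
    # Shift buffer contents (discard everything up to the current token).
    used = st.limit - st.token
    st.buf[0:used] = st.buf[st.token:st.limit]
    st.cursor -= st.token
    st.marker -= st.token
    st.limit -= st.token
    st.token = 0
    # Fill free space at the end of buffer with new data.
    data = st.stream.read(BUFSIZE - st.limit)
    st.buf[st.limit:st.limit + len(data)] = data
    st.limit += len(data)
    st.buf[st.limit] = 0                     # sentinel right after the data
    st.eof = st.limit < BUFSIZE              # short read: end of input
    return 0
```

       The sentinel is rewritten after every refill because the limit moves;
       the extra byte in the buffer guarantees there is always room for it.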

   YYFILL with padding
       In the default case (when EOF rule is  not  used)  YYFILL  is  a	 func-
       tion-like  primitive that accepts a single argument and does not	return
       any value.  YYFILL invocation is	triggered by condition (YYLIMIT	-  YY-
       CURSOR)	< n in C pointer API and YYLESSTHAN(n) in generic API. The ar-
       gument passed to	YYFILL is the minimal number of	characters  that  must
       be  supplied. If	it fails to do so, YYFILL must not return to the lexer
       (for that reason	it is best implemented as a macro  that	 returns  from
       the calling function on failure).  In case of a successful YYFILL invo-
       cation  the limit position must be set either to	one after the last in-
       put position in buffer, or to the end of	YYMAXFILL padding (in case YY-
       FILL has	successfully read at least n characters,  but  not  enough  to
       fill the	entire buffer).	The pictures below show	the relative locations
       of input	positions in buffer before and after YYFILL invocation (YYMAX-
       FILL padding on the second picture is marked with # symbols).

			 <-- shift -->		       <-- need	-->
		       >-A------------B---------C-----D-------E---F--------G->
		       buffer	    token    marker cursor  limit

	  >-A------------B---------C-----D-------E---F--------G->
		       buffer,	marker cursor		    limit
		       token

			 <-- shift -->		       <-- need	-->
		       >-A------------B---------C-----D-------E-F	 (EOF)
		       buffer	    token    marker cursor  limit

	  >-A------------B---------C-----D-------E-F###############
		       buffer,	marker cursor			limit
		       token			    <- YYMAXFILL ->

       Here is an example of a program that reads a file named input in
       chunks of 4096 bytes and uses bounds-checking with padding.

	  // re2v $INPUT -o $OUTPUT

	  import os
	  import strings

	  /*!max:re2c*/
	  const	bufsize	= 4096

	  struct State {
	      file     os.File
	  mut:
	      yyinput  []u8
	      yycursor int
	      yylimit  int
	      token    int
	      eof      bool
	  }

	  fn fill(mut st &State, need int) int {
	      if st.eof	{ return -1 } // unexpected EOF

	      // Error:	lexeme too long. In real life can reallocate a larger buffer.
	      if st.token < need { return -2 }

	      // Shift buffer contents (discard	everything up to the current token).
	      copy(mut &st.yyinput, st.yyinput[st.token..st.yylimit])
	      st.yycursor -= st.token
	      st.yylimit -= st.token
	      st.token = 0

	      // Fill free space at the	end of buffer with new data from file.
	      pos := st.file.tell() or { 0 }
	      if n := st.file.read_bytes_into(u64(pos),	mut st.yyinput[st.yylimit..bufsize]) {
		  st.yylimit +=	n
	      }

	      // If read less than expected, this is the end of	input.
	      if st.yylimit < bufsize {
		  st.eof = true
		  for i	:= 0; i	< yymaxfill; i += 1 { st.yyinput[st.yylimit + i] = 0 }
		  st.yylimit +=	yymaxfill
	      }

	      return 0
	  }

	  fn lex(mut yyrecord &State) int {
	      mut count	:= 0
	  loop:
	      yyrecord.token = yyrecord.yycursor
	      /*!re2c
		  re2c:api = record;
		  re2c:YYFILL =	"r := fill(mut yyrecord, @@); if r != 0	{ return r }";

		  str =	['] ([^'\\] | [\\][^])*	['];

		  [\x00] {
		      // Check that it is the sentinel,	not some unexpected null.
		      return if	yyrecord.token == (yyrecord.yylimit - yymaxfill) { count } else	{ -1 }
		  }
		  str  { count += 1; unsafe { goto loop	} }
		  [ ]+ { unsafe	{ goto loop } }
		  *    { return	-1 }
	      */
	  }

	  fn main() {
	      fname := "input"
	      content := "'qu\0tes' 'are' 'fine: \\'' "

	      // Prepare input file: a few times the size of the buffer, containing
	      // strings with zeroes and escaped quotes.
	      mut fw :=	os.create(fname)!
	      fw.write_string(strings.repeat_string(content, bufsize))!
	      fw.close()
	      count := 3 * bufsize // number of	quoted strings written to file

	      // Prepare lexer state: all offsets are at the end of buffer.
	      // This immediately triggers YYFILL, as the YYLESSTHAN condition is true.
	      mut fr :=	os.open(fname)!
	      mut st :=	&State{
		  file:	    fr,
		  yyinput:  []u8{len: bufsize +	yymaxfill},
		  yycursor: bufsize,
		  yylimit:  bufsize,
		  token:    bufsize,
		  eof:	    false,
	      }

	      // Run the lexer.
	      n	:= lex(mut st)
	      if n != count { panic("expected $count, got $n") }

	      // Cleanup: remove input file.
	      fr.close()
	      os.rm(fname)!
	  }
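
       The padding variant of the refill step differs in two places: the
       "lexeme too long" test uses the requested number of characters, and
       padding is appended once the input runs out. The Python sketch below
       illustrates this with an in-memory stream; the names are hypothetical
       and MAXFILL is an assumed value (re2v computes the real YYMAXFILL).

```python
import io

BUFSIZE = 8
MAXFILL = 2  # assumed value for illustration

class State:
    def __init__(self, stream):
        self.stream = stream
        self.buf = bytearray(BUFSIZE + MAXFILL)  # room for the padding
        self.cursor = self.limit = self.token = BUFSIZE
        self.eof = False

def fill(st: State, need: int) -> int:
    if st.eof:
        return -1                            # unexpected end of input
    if st.token < need:
        return -2                            # not enough space to free
    # Shift buffer contents (discard everything up to the current token).
    used = st.limit - st.token
    st.buf[0:used] = st.buf[st.token:st.limit]
    st.cursor -= st.token
    st.limit -= st.token
    st.token = 0
    # Fill free space at the end of buffer with new data.
    data = st.stream.read(BUFSIZE - st.limit)
    st.buf[st.limit:st.limit + len(data)] = data
    st.limit += len(data)
    # On a short read append MAXFILL nulls and move the limit past them.
    if st.limit < BUFSIZE:
        st.eof = True
        st.buf[st.limit:st.limit + MAXFILL] = bytes(MAXFILL)
        st.limit += MAXFILL
    return 0
```

       Once the padding is in place the limit covers it, so the lexer can read
       up to MAXFILL characters past the real data without another refill.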

FEATURES
   Multiple blocks
       Sometimes it is necessary to have multiple interrelated lexers (for ex-
       ample, if there is a high-level state machine that transitions  between
       lexer  modes).  This  can  be implemented using multiple	connected re2v
       blocks. Another option is to use	start conditions.

       The implementation of connections between blocks depends on the target
       language. In languages that have a goto statement (such as C/C++ and
       Go) one can have all blocks in one function, each of them prefixed with
       a label. A transition from one block to another is a simple goto. In
       languages that do not have goto (such as Rust) it is necessary to use a
       loop with a switch on a state variable, similar to the yystate
       loop/switch generated by re2v, or else wrap each block in a function
       and use function calls.

       The example below uses multiple blocks to parse binary, octal,  decimal
       and hexadecimal numbers.	Each base has its own block. The initial block
       determines  base	 and dispatches	to other blocks. Common	configurations
       are defined in a	separate block at the beginning	of the	program;  they
       are inherited by	the other blocks.

	  // re2v $INPUT -o $OUTPUT -i

	  const	u32_lim	= u64(1) << 32

	  fn parse_u32(yyinput string) ?u32 {
	      mut yycursor, mut	yymarker := 0, 0
	      mut n := u64(0)
	      mut yych := 0

	      adddgt :=	fn (num	u64, base u64, digit u8) u64 {
		  n := num * base + u64(digit)
		  return if n >= u32_lim { u32_lim } else { n }
	      }
	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:yych:emit = 0;

		  end =	"\x00";

		  '0b' / [01]	     { unsafe{ goto bin	} }
		  "0"		     { unsafe{ goto oct	} }
		  ""   / [1-9]	     { unsafe{ goto dec	} }
		  '0x' / [0-9a-fA-F] { unsafe{ goto hex	} }
		  *		     { return none }
	      */
	  bin:
	      /*!re2c
		  end	{ unsafe{ goto end } }
		  [01]	{ n = adddgt(n,	2, yyinput[yycursor-1] - 48); unsafe{ goto bin } }
		  *	{ return none }
	      */
	  oct:
	      /*!re2c
		  end	{ unsafe{ goto end } }
		  [0-7]	{ n = adddgt(n,	8, yyinput[yycursor-1] - 48); unsafe{ goto oct } }
		  *	{ return none }
	      */
	  dec:
	      /*!re2c
		  end	{ unsafe{ goto end } }
		  [0-9]	{ n = adddgt(n,	10, yyinput[yycursor-1]	- 48); unsafe{ goto dec	} }
		  *	{ return none }
	      */
	  hex:
	      /*!re2c
		  end	{ unsafe{ goto end } }
		  [0-9]	{ n = adddgt(n,	16, yyinput[yycursor-1]	- 48); unsafe{ goto hex	} }
		  [a-f]	{ n = adddgt(n,	16, yyinput[yycursor-1]	- 87); unsafe{ goto hex	} }
		  [A-F]	{ n = adddgt(n,	16, yyinput[yycursor-1]	- 55); unsafe{ goto hex	} }
		  *	{ return none }
	      */
	  end:
	      if n < u32_lim {
		  return u32(n)
	      }
	      return none
	  }

	  fn main() {
	      test := fn (num ?u32, str	string)	{
		  if n := parse_u32(str) {
		      if m := num { if n != m {	panic("wrong number") }	}
		  } else {
		      if _ := num { panic("expected none") }
		  }
	      }
	      test(1234567890, "1234567890\0")
	      test(13, "0b1101\0")
	      test(0x7fe, "0x007Fe\0")
	      test(0o644, "0644\0")
	      test(none, "9999999999\0")
	      test(none, "123??\0")
	  }
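
       The alternative of wrapping each block in a function can be sketched by
       hand. The Python code below is an illustration, not re2v output, and
       its function names are hypothetical: parse_u32 plays the role of the
       initial block and parse_digits the role of the per-base blocks. It
       assumes a null-terminated input string.

```python
U32_LIM = 1 << 32
DIGITS = "0123456789abcdef"

def add_digit(num: int, base: int, digit: int) -> int:
    """Accumulate a digit, saturating at U32_LIM to signal overflow."""
    return min(num * base + digit, U32_LIM)

def parse_digits(s: str, pos: int, base: int):
    """One "block": consume digits of the given base up to the null."""
    n = 0
    while s[pos] != "\0":
        d = DIGITS.find(s[pos].lower())
        if d < 0 or d >= base:
            return None                      # default rule *
        n = add_digit(n, base, d)
        pos += 1
    return n if n < U32_LIM else None

def parse_u32(s: str):
    """Initial "block": determine the base and dispatch to another block."""
    head = s[:2].lower()
    if head == "0b" and s[2] in "01":
        return parse_digits(s, 2, 2)
    if head == "0x" and s[2].lower() in DIGITS:
        return parse_digits(s, 2, 16)
    if s[0] == "0":
        return parse_digits(s, 1, 8)
    if s[0] in "123456789":
        return parse_digits(s, 0, 10)
    return None
```

       Saturating at U32_LIM keeps the accumulator bounded on long inputs
       while still letting the final comparison detect overflow.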

   Start conditions
       Start  conditions are enabled with --start-conditions option. They pro-
       vide a way to encode multiple interrelated  automata  within  the  same
       re2v block.

       Each  condition corresponds to a	single automaton and has a unique name
       specified by the	user and a unique internal number defined by re2v. The
       numbers are used	to switch between conditions: the generated code  uses
       YYGETCOND  and YYSETCOND	primitives to get the current condition	or set
       it to the given	number.	 Use  conditions  block,  --header  option  or
       re2c:header  configuration  to  generate	numeric	condition identifiers.
       Configuration re2c:cond:enumprefix specifies the	 generated  identifier
       prefix.

       In condition mode every rule must be prefixed with a list of comma-sep-
       arated  condition  names	in angle brackets, or a	wildcard <*> to	denote
       all conditions. The rule	syntax is extended as follows:

	  < condition-list > regular-expression	code
		 A rule	that is	 merged	 to  every  condition  on  the	condi-
		 tion-list.   It  matches  regular-expression and executes the
		 associated code.

	  < condition-list > regular-expression	=> condition code
		 A rule	that is	 merged	 to  every  condition  on  the	condi-
		 tion-list.   It  matches regular-expression, sets the current
		 condition to condition	and executes the associated code.

	  < condition-list > regular-expression	:=> condition
		 A rule	that is	 merged	 to  every  condition  on  the	condi-
		 tion-list.   It  matches  regular-expression  and immediately
		 transitions to	condition (there is no semantic	action).

	  < condition-list > !action code
		 A rule	that binds code	to the	place  defined	by  action  in
		 every	condition  on the condition-list (see the actions sec-
		 tion for various types	of actions).

	  <! condition-list > code
		 A rule	that prepends code to semantic actions	of  all	 rules
		 for  every  condition	on  the	condition-list.	This syntax is
		 deprecated and	the !pre_rule action should  be	 used  instead
		 (it does exactly the same).

	  < > code
		 A  rule  that	creates	 a special entry condition with	number
		 zero and name "0" that	executes code before jumping to	 other
		 conditions.  This syntax is deprecated, and the !entry	action
		 should	 be used instead (it provides a	more fine-grained con-
		 trol, as the code can be specified on a per-condition	basis,
		 and  one  can	jump directly to condition start without going
		 through condition dispatch).

	  < > => condition code
		 Same as the previous rule, except that	it sets	the next  con-
		 dition.

	  < > :=> condition
		 Same  as  the previous	rule, except that it has no associated
		 code and immediately jumps to condition.

       The code	re2v generates for conditions depends  on  whether  re2v  uses
       goto/label approach or loop/switch approach to encode the automata.

       In languages that have a goto statement (such as C/C++ and Go)
       conditions are naturally implemented as blocks of code prefixed with
       labels of the form yyc_<cond>, where cond is a condition name (the
       label prefix can be changed with re2c:cond:prefix). Transitions between
       conditions are implemented using goto and condition labels. Before all
       conditions re2v generates an initial switch on YYGETCOND that jumps to
       the start state of the current condition. The shortcut rules :=> bypass
       the initial switch and jump directly to the specified condition
       (re2c:cond:goto can be used to change the default behavior). The rules
       with semantic actions do not automatically jump to the next condition;
       this should be done by the user-defined action code.

       In  languages that do not have goto (such as Rust) re2v reuses the yys-
       tate variable to	store condition	numbers. Each condition	gets a numeric
       identifier equal	to the number of its start state, and a	switch between
       conditions is no	different than a switch	between	DFA states of a	single
       condition. There	is no need for a separate  initial  condition  switch.
       (Since  the  same approach is used to implement storable	states,	YYGET-
       COND/YYSETCOND are redundant if both storable states and	conditions are
       used).

       The program below uses start conditions to parse binary, octal, decimal
       and hexadecimal numbers. There is a single block where each base has
       its own condition, and the initial condition is connected to all of
       them. The user-defined variable yycond stores the current condition
       number; it is initialized to the number of the initial condition
       generated with the conditions block.

	  // re2v $INPUT -o $OUTPUT -ci

	  /*!conditions:re2c*/

	  const	u32_lim	= u64(1) << 32

	  fn parse_u32(yyinput string) ?u32 {
	      mut yycursor, mut	yymarker := 0, 0
	      mut n := u64(0)
	      mut yycond := YYCONDTYPE.yycinit

	      adddgt :=	fn (num	u64, base u64, digit u8) u64 {
		  n := num * base + u64(digit)
		  return if n >= u32_lim { u32_lim } else { n }
	      }

	      /*!re2c
		  re2c:yyfill:enable = 0;

		  <*> *	{ return none }

		  <init> '0b' /	[01]	    :=>	bin
		  <init> "0"		    :=>	oct
		  <init> ""   /	[1-9]	    :=>	dec
		  <init> '0x' /	[0-9a-fA-F] :=>	hex

		  <bin,	oct, dec, hex> "\x00" {	return if n < u32_lim {	u32(n) } else {	none } }

		  <bin>	[01]  {	n = adddgt(n, 2,  yyinput[yycursor-1] -	48); unsafe{ goto yyc_bin } }
		  <oct>	[0-7] {	n = adddgt(n, 8,  yyinput[yycursor-1] -	48); unsafe{ goto yyc_oct } }
		  <dec>	[0-9] {	n = adddgt(n, 10, yyinput[yycursor-1] -	48); unsafe{ goto yyc_dec } }
		  <hex>	[0-9] {	n = adddgt(n, 16, yyinput[yycursor-1] -	48); unsafe{ goto yyc_hex } }
		  <hex>	[a-f] {	n = adddgt(n, 16, yyinput[yycursor-1] -	87); unsafe{ goto yyc_hex } }
		  <hex>	[A-F] {	n = adddgt(n, 16, yyinput[yycursor-1] -	55); unsafe{ goto yyc_hex } }
	      */
	  }

	  fn main() {
	      test := fn (num ?u32, str	string)	{
		  if n := parse_u32(str) {
		      if m := num { if n != m {	panic("wrong number") }	}
		  } else {
		      if _ := num { panic("expected none") }
		  }
	      }
	      test(1234567890, "1234567890\0")
	      test(13, "0b1101\0")
	      test(0x7fe, "0x007Fe\0")
	      test(0o644, "0644\0")
	      test(none, "9999999999\0")
	      test(none, "123??\0")
	  }
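
       The dispatch on a condition variable can also be written as a loop with
       a switch, which is the shape used in languages without goto. The
       Python sketch below is a hand-written approximation (the Cond
       enumeration and the other names are hypothetical, and the input is
       assumed to be null-terminated); it is not the code that re2v
       generates.

```python
from enum import Enum

class Cond(Enum):
    INIT = 0
    BIN = 1
    OCT = 2
    DEC = 3
    HEX = 4

BASE = {Cond.BIN: 2, Cond.OCT: 8, Cond.DEC: 10, Cond.HEX: 16}
DIGITS = "0123456789abcdef"
U32_LIM = 1 << 32

def parse_u32(s: str):
    cond, pos, n = Cond.INIT, 0, 0
    while True:                              # dispatch on current condition
        if cond is Cond.INIT:
            head = s[pos:pos + 2].lower()
            if head == "0b" and s[pos + 2] in "01":
                cond, pos = Cond.BIN, pos + 2
            elif head == "0x" and s[pos + 2].lower() in DIGITS:
                cond, pos = Cond.HEX, pos + 2
            elif s[pos] == "0":
                cond, pos = Cond.OCT, pos + 1
            elif s[pos] in "123456789":
                cond = Cond.DEC              # empty match, keep the digit
            else:
                return None                  # rule <*> *
        else:
            ch = s[pos]
            if ch == "\0":                   # end of number in any base
                return n if n < U32_LIM else None
            d = DIGITS.find(ch.lower())
            if d < 0 or d >= BASE[cond]:
                return None                  # rule <*> *
            n = min(n * BASE[cond] + d, U32_LIM)
            pos += 1
```

       Setting cond and continuing the loop corresponds to a :=> transition;
       staying in the same condition corresponds to jumping back to its
       yyc_<cond> label.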

   Storable state
       With the --storable-state option re2v generates a lexer that can store
       its current state, return to the caller, and later resume operations
       exactly where it left off. The default mode of operation in re2v is a
       "pull" model, in which the lexer "pulls" more input whenever it needs
       it. This may be unacceptable in cases when the input becomes available
       piece by piece (for example, if the lexer is invoked by the parser, or
       if the lexer program communicates via a socket protocol with some other
       program that must wait for a reply from the lexer before it transmits
       the next message). The storable state feature is intended exactly for
       such cases: it allows one to generate lexers that work in a "push"
       model. When the lexer needs more input, it stores its state and returns
       to the caller. Later, when more input becomes available, the caller
       resumes the lexer exactly where it stopped. There are a few changes
       necessary compared to the "pull" model:

        Define YYGETSTATE() and YYSETSTATE(state) primitives.

        Define	yych, yyaccept (if used) and state variables as	a part of per-
	 sistent lexer state. The state	variable should	be initialized to -1.

        YYFILL	should return to the outer program instead of trying to	supply
	 more input. Return code should	indicate that lexer needs more input.

        The outer program should recognize situations when lexer  needs  more
	 input and respond appropriately.

        Optionally  use getstate block	to generate YYGETSTATE switch detached
	 from the main lexer. This only	works for  languages  that  have  goto
	 (not in --loop-switch mode).

        Use re2c:eof and the sentinel with bounds checks method to handle the
         end of input. The padding-based method may not work because it is
         unclear when to append padding: the current end of input may not be
         the ultimate end of input, and appending padding too early may cut
         off a partially read greedy lexeme. Furthermore, high-level program
         logic may make getting more input depend on processing the lexeme at
         the end of buffer (which is already blocked by the end-of-input
         condition).

       Here is an example of a "push" model lexer that simulates reading pack-
       ets from	a socket. The lexer loops until	it encounters the end of input
       and returns to the calling function. The	calling	function provides more
       input by	"sending" the next packet and  resumes	lexing.	 This  process
       stops when all the packets have been sent, or when there	is an error.

	  // re2v -f $INPUT -o $OUTPUT

	  import log
	  import os

	  // Use a small buffer	to cover the case when a lexeme	doesn't	fit.
	  // In	real world use a larger	buffer.
	  const	bufsize	= 10

	  struct State {
	  mut:
	      file     os.File
	      yyinput  []u8
	      yycursor int
	      yymarker int
	      yylimit  int
	      token    int
	      yystate  int
	  }

	  enum Status {
	      lex_end
	      lex_ready
	      lex_waiting
	      lex_bad_packet
	      lex_big_packet
	  }

	  fn fill(mut st &State) Status	{
	      shift := st.token
	      used := st.yylimit - st.token
	      free := bufsize -	used

	      // Error:	no space. In real life can reallocate a	larger buffer.
	      if free <	1 { return .lex_big_packet }

	      // Shift buffer contents (discard	already	processed data).
	      copy(mut &st.yyinput, st.yyinput[shift..shift+used])
	      st.yycursor -= shift
	      st.yymarker -= shift
	      st.yylimit -= shift
	      st.token -= shift

	      // Fill free space at the	end of buffer with new data.
	      pos := st.file.tell() or { 0 }
	      if n := st.file.read_bytes_into(u64(pos),	mut st.yyinput[st.yylimit..bufsize]) {
		  st.yylimit +=	n
	      }
	      st.yyinput[st.yylimit] = 0 // append sentinel symbol

	      return .lex_ready
	  }

	  fn lex(mut yyrecord &State, mut recv &int) Status {
	      mut yych := u8(0)
	      /*!getstate:re2c*/
	  loop:
	      yyrecord.token = yyrecord.yycursor
	      /*!re2c
		  re2c:api = record;
		  re2c:eof = 0;
		  re2c:YYFILL =	"return	.lex_waiting";

		  packet = [a-z]+[;];

		  *	 { return .lex_bad_packet }
		  $	 { return .lex_end }
		  packet { recv	+= 1; unsafe{ goto loop	} }
	      */
	  }

	  fn test(expect Status, packets []string) {
	      // Create	a pipe (open the same file for reading and writing).
	      fname := "pipe"
	      mut fw :=	os.create(fname) or { panic("cannot create file") }
	      mut fr :=	os.open(fname) or { panic("cannot open file") }

	      // Initialize lexer state: `yystate` value is -1, all offsets are at the end
	      // of buffer.
	      mut st :=	&State{
		  file:	    fr,
		  // Sentinel at `yylimit` offset is set to zero, which	triggers YYFILL.
		  yyinput:  []u8{len: bufsize +	1},
		  yycursor: bufsize,
		  yymarker: bufsize,
		  yylimit:  bufsize,
		  token:    bufsize,
		  yystate:  -1,
	      }

	      // Main loop. The	buffer contains	incomplete data	which appears packet by
	      // packet. When the lexer	needs more input it saves its internal state and
	      // returns to the	caller which should provide more input and resume lexing.
	      mut status := Status.lex_ready
	      mut send := 0
	      mut recv := 0
	      for {
		  status = lex(mut st, mut &recv)
		  if status == .lex_end	{
		      break
		  } else if status == .lex_waiting {
		      if send <	packets.len {
			  log.debug("sending packet $send")
			  fw.write_string(packets[send]) or { panic("cannot write to file") }
			  fw.flush()
			  send += 1
		      }
		      status = fill(mut	st)
		      log.debug("filled	buffer $st.yyinput, status $status")
		      if status	!= .lex_ready {
			  break
		      }
		  } else if status == .lex_bad_packet {
		      break
		  }
	      }

	      // Check results.
	      if status	!= expect || (status ==	.lex_end && recv != send) {
		  panic("expected $expect with $send packet(s),	got $status with $recv packet(s)")
	      }

	      // Cleanup: remove input file.
	      fr.close()
	      fw.close()
	      os.rm(fname) or {	panic("cannot remove file") }
	  }

	  fn main() {
	      //log.set_level(.debug)

	      test(.lex_end, [])
	      test(.lex_end, ["zero;", "one;", "two;", "three;", "four;"])
	      test(.lex_bad_packet, ["??;"])
	      test(.lex_big_packet, ["looooooooooooong;"])
	  }

   Reusable blocks
       Reusable	  blocks   of	the  form  /*!rules:re2c[:<name>]  ...	*/  or
       %{rules[:<name>]	... %} can be reused any number	of times and  combined
       with  other  re2v  blocks. The <name> is	optional. A rules block	can be
       used in a use block or directive. The code for a	rules block is	gener-
       ated at every point of use.

       Use   blocks   are   defined   with   /*!use:re2c[:<name>]  ...	*/  or
       %{use[:<name>] ... %}. The <name> is optional: if it's  not  specified,
       the associated rules block is the most recent one (whether named	or un-
       named).	 A  use	 block	can  add named definitions, configurations and
       rules of	its own.  An important use case	for use	blocks is a lexer that
       supports	multiple input encodings: the same rules block is reused  mul-
       tiple  times with encoding-specific configurations (see the example be-
       low).

       The in-block use directive !use:<name>; can be used inside a re2v
       block. It merges the referenced block <name> into the current one. If
       some of the merged rules and configurations overlap with previously
       defined ones, conflicts are resolved in the usual way: the earliest
       rule takes priority, and the latest configuration overrides preceding
       ones. The exception is the special rules *, $ and (in condition mode)
       <!>, for which a block-local definition overrides any inherited ones.
       The use directive allows one to combine different re2v blocks in one
       block (see the example below).
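
       The priority rules above can be modelled in a few lines of plain V. This
       is only a simplified sketch that does not involve re2v: the Rule struct
       and the resolve function are hypothetical names, literal strings stand
       in for regular expressions, and the local_default argument stands in for
       the block-local * rule.

```v
// Hypothetical model of rule merging; not part of re2v.
struct Rule {
	literal string // string matched by the rule (stands in for a regular expression)
	action  string // result produced by the semantic action
}

// resolve scans the merged rules in order, so the earliest rule wins.
// `local_default` models the block-local `*` rule, which overrides any
// inherited default rules.
fn resolve(word string, merged []Rule, local_default string) string {
	for r in merged {
		if r.literal == word {
			return r.action
		}
	}
	return local_default
}

fn main() {
	fish := [
		Rule{literal: 'haddock', action: 'fish'},
		Rule{literal: 'salmon', action: 'fish'},
		Rule{literal: 'eel', action: 'fish'},
	]
	colors := [
		Rule{literal: 'red', action: 'color'},
		Rule{literal: 'salmon', action: 'color'},
		Rule{literal: 'magenta', action: 'color'},
	]

	// `!use:fish;` comes before `!use:colors;`, so fish rules are merged first.
	mut merged := []Rule{}
	merged << fish
	merged << colors

	assert resolve('salmon', merged, 'dunno') == 'fish'
	assert resolve('red', merged, 'dunno') == 'color'
	assert resolve('what?', merged, 'dunno') == 'dunno'
}
```

       In this model 'salmon' resolves to the fish action only because the fish
       rules were merged first; swapping the two appends would flip the result.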

       Named blocks and the in-block use directive were added in re2v version
       2.2. Since that version reusable blocks are allowed by default (no
       special option is needed). Before version 2.2 reuse mode was enabled
       with the -r --reusable option. Before version 1.2 reusable blocks could
       not be mixed with normal blocks.

   Example of a	!use directive
	  // re2v $INPUT -o $OUTPUT

	  // This example shows	how to combine reusable	re2c blocks: two blocks
	  // ('colors' and 'fish') are merged into one.	The 'salmon' rule occurs
	  // in	both blocks; the 'fish'	block takes priority because it	is used
	  // earlier. Default rule * occurs in all three blocks; the local (not
	  // inherited)	definition takes priority.

	  enum What {
	      color
	      fish
	      dunno
	  }

	  /*!rules:re2c:colors
	      *				   { panic("eh!") }
	      "red" | "salmon" | "magenta" { return .color }
	  */

	  /*!rules:re2c:fish
	      *				   { panic("oh!") }
	      "haddock"	| "salmon" | "eel" { return .fish }
	  */

	  fn lex(yyinput string) What {
	      mut yycursor, mut	yymarker := 0, 0
	      /*!re2c
		  re2c:yyfill:enable = 0;

		  !use:fish;
		  !use:colors;
		  * { return .dunno }  // overrides inherited '*' rules
	      */
	  }

	  fn main() {
	      assert lex("salmon") == .fish
	      assert lex("what?") == .dunno
	  }

   Example of a	/*!use:re2c ...	*/ block
	  // re2v $INPUT -o $OUTPUT --input-encoding utf8

	  // This example supports multiple input encodings: UTF-8 and UTF-32.
	  // Both lexers are generated from the	same rules block, and the use
	  // blocks add	only encoding-specific configurations.
	  /*!rules:re2c
	      re2c:yyfill:enable = 0;

	      "∀x ∃y" { return 0 }
	      *	      {	return 1 }
	  */

	  fn lex_utf8(yyinput []u8) int	{
	      mut yycursor, mut	yymarker := 0, 0
	      /*!use:re2c
		  re2c:encoding:utf8 = 1;
		  re2c:YYCTYPE = u8; //	the default
	      */
	  }

	  fn lex_utf32(yyinput []u32) int {
	      mut yycursor, mut	yymarker := 0, 0
	      /*!use:re2c
		  re2c:encoding:utf32 =	1;
		  re2c:YYCTYPE = u32;
	      */
	  }

	  fn main() {
	      s8 := [u8(0xe2), u8(0x88), u8(0x80), u8(0x78), u8(0x20), u8(0xe2), u8(0x88), u8(0x83), u8(0x79)]
	      s32 := [u32(0x2200), u32(0x78), u32(0x20), u32(0x2203), u32(0x79)]
	      assert lex_utf8(s8) == 0
	      assert lex_utf32(s32) == 0
	  }

   Submatch extraction
       re2v has	two options for	submatch extraction.

       Tags   The first option is to use standalone tags of the form @stag or
	      #mtag, where stag and mtag are arbitrary user-defined names.
	      Tags are enabled with the -T --tags option or the re2c:tags = 1
	      configuration. Semantically tags are position markers: they can
	      be inserted anywhere in a regular expression, and they bind to
	      the corresponding position (or multiple positions) in the input
	      string. S-tags bind to the last matching position, and m-tags
	      bind to a list of positions (they may be used in repetition
	      subexpressions, where a single position in a regular expression
	      corresponds to multiple positions in the input string). All tags
	      should be defined by the user, either manually or with the help
	      of svars and mvars blocks. If there is more than one way tags
	      can be matched against the input, the ambiguity is resolved
	      using the leftmost greedy disambiguation strategy.

       Captures
	      The second option is to use capturing groups. They are enabled
	      with the --captures option or the re2c:captures = 1
	      configuration. There are two flavours for different
	      disambiguation policies: --leftmost-captures (the default) is
	      for the leftmost greedy policy, and --posix-captures is for the
	      POSIX longest-match policy. In this mode all parenthesized
	      subexpressions are considered capturing groups, and a bang can
	      be used to mark non-capturing groups: (! ... ). With the
	      --invert-captures option or the re2c:invert-captures = 1
	      configuration the meaning of the bang is inverted. The number of
	      groups for the matching rule is stored in a variable yynmatch
	      (the whole regular expression is group number zero), and
	      submatch results are stored in the yypmatch array. Both yynmatch
	      and yypmatch should be defined by the user, and yypmatch size
	      must be at least [yynmatch * 2]. Use a maxnmatch block to define
	      YYMAXNMATCH, a constant equal to the maximum value of yynmatch
	      among all rules.

       Captvars
	      Another way to use capturing groups is the --captvars option or
	      the re2c:captvars = 1 configuration. The only difference from
	      --captures is in the way the generated code stores submatch
	      results: instead of yynmatch and yypmatch re2v generates
	      variables yytl<k> and yytr<k> for the k-th capturing group (the
	      user should declare these using an svars block). Captures with
	      variables support two disambiguation policies:
	      --leftmost-captvars or re2c:leftmost-captvars = 1 for the
	      leftmost greedy policy (the default one) and --posix-captvars or
	      re2c:posix-captvars for the POSIX longest-match policy.
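
       To illustrate why yypmatch needs at least yynmatch * 2 elements, here is
       a minimal sketch in plain V. It assumes that group k occupies the pair
       yypmatch[2 * k], yypmatch[2 * k + 1] (start and end offsets) and that an
       unmatched group is marked with -1. The helper name group and the offsets
       in main are made up for illustration; they are not produced by re2v.

```v
// group returns the text of capturing group `k`, or none if the group is out
// of range or did not take part in the match.
// Assumption: unmatched positions hold -1.
fn group(input string, yypmatch []int, yynmatch int, k int) ?string {
	if k < 0 || k >= yynmatch {
		return none
	}
	start_pos := yypmatch[2 * k]
	end_pos := yypmatch[2 * k + 1]
	if start_pos == -1 || end_pos == -1 {
		return none
	}
	return input[start_pos..end_pos]
}

fn main() {
	// Offsets as a lexer might leave them after matching
	// (num) "." (num) ("." num)? against "23.34":
	// group 0 is the whole match, group 3 did not match.
	input := '23.34'
	yypmatch := [0, 5, 0, 2, 3, 5, -1, -1]
	yynmatch := 4

	whole := group(input, yypmatch, yynmatch, 0) or { 'none' }
	major := group(input, yypmatch, yynmatch, 1) or { 'none' }
	minor := group(input, yypmatch, yynmatch, 2) or { 'none' }
	patch := group(input, yypmatch, yynmatch, 3) or { 'none' }

	assert whole == '23.34'
	assert major == '23'
	assert minor == '34'
	assert patch == 'none'
}
```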

       Under the hood all these options translate into tags and Tagged
       Deterministic Finite Automata with Lookahead (TDFA). The core idea of
       TDFA is to minimize the overhead of submatch extraction. In the
       extreme, if there are no tags or captures in a regular expression, a
       TDFA is just an ordinary DFA. If the number of tags is moderate, the
       overhead is barely noticeable. The generated TDFA uses a number of tag
       variables which do not map directly to tags: a single variable may be
       used for different tags, and a tag may require multiple variables to
       hold all its possible values. Eventually ambiguity is resolved, and
       only one final variable per tag survives. Tag variables should be
       defined using stags or mtags blocks. If lexer state is stored, tag
       variables should be part of it. They also need to be updated by YYFILL.

       S-tags support the following operations:

       • save input position to an s-tag: t = YYCURSOR with C pointer API or a
	 user-defined operation YYSTAGP(t) with generic API

       • save default value to an s-tag: t = NULL with C pointer API or a
	 user-defined operation YYSTAGN(t) with generic API

       • copy one s-tag to another: t1 = t2

       M-tags support the following operations:

       • append input position to an m-tag: a user-defined operation
	 YYMTAGP(t) with both default and generic API

       • append default value to an m-tag: a user-defined operation YYMTAGN(t)
	 with both default and generic API

       • copy one m-tag to another: t1 = t2

       S-tags can be implemented as scalar values (pointers or offsets).
       M-tags need a more complex representation, as they need to store a
       sequence of tag values. The most naive and inefficient representation
       of an m-tag is a list (array, vector) of tag values; a more efficient
       representation is to store all m-tags in a prefix tree represented as
       an array of nodes (v, p), where v is a tag value and p is a pointer to
       the parent node.
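
       The prefix-tree representation can be sketched in plain V as follows.
       The names Node, push_value and history are hypothetical, and array
       indices play the role of the parent pointers p. The sketch shows that
       two histories can share a common prefix and that copying an m-tag only
       copies one integer.

```v
const mtag_root = -1 // index that marks an empty history

struct Node {
	value int // tag value v (an input position)
	pred  int // index p of the parent node, or mtag_root
}

// push_value appends `value` to the history `tag` and returns the new
// history. The old history stays valid, so histories can fork freely.
fn push_value(mut trie []Node, tag int, value int) int {
	trie << Node{value: value, pred: tag}
	return trie.len - 1
}

// history walks from `tag` to the root and returns the values in the order
// in which they were appended.
fn history(trie []Node, tag int) []int {
	mut out := []int{}
	mut i := tag
	for i != mtag_root {
		out << trie[i].value
		i = trie[i].pred
	}
	return out.reverse()
}

fn main() {
	mut trie := []Node{}
	a := push_value(mut trie, mtag_root, 2) // history [2]
	b := push_value(mut trie, a, 4) // history [2, 4]
	c := push_value(mut trie, a, 7) // fork: history [2, 7]

	assert history(trie, b) == [2, 4]
	assert history(trie, c) == [2, 7]
	assert trie.len == 3 // the common prefix [2] is stored only once
}
```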

       Here  is	 a  simple  example of using s-tags to parse semantic versions
       consisting of three numeric components: major, minor, patch (the	latter
       is optional).  See below	for a more complex example that	uses YYFILL.

	  // re2v $INPUT -o $OUTPUT

	  struct SemVer	{
	      major int
	      minor int
	      patch int
	  }

	  fn s2n(s string) int { // convert pre-parsed string to number
	      mut n := 0
	      for c in s { n = n * 10 +	int(c -	48) }
	      return n
	  }

	  fn parse(yyinput string) ?SemVer {
	      mut yycursor, mut	yymarker := 0, 0

	      // Final tag variables available in semantic action.
	      /*!svars:re2c format = 'mut @@ :=	0\n'; */

	      // Intermediate tag variables used by the	lexer (must be autogenerated).
	      /*!stags:re2c format = 'mut @@ :=	-1\n'; */

	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:tags = 1;

		  num =	[0-9]+;

		  @t1 num @t2 "." @t3 num @t4 ("." @t5 num)? [\x00] {
		      return SemVer{
			  major: s2n(yyinput[t1..t2]),
			  minor: s2n(yyinput[t3..t4]),
			  patch: if t5 == -1 { 0 } else	{ s2n(yyinput[t5..yycursor - 1]) }
		      }
		  }
		  * { return none }
	      */
	  }

	  fn main() {
	      test := fn (result ?SemVer, expect ?SemVer) {
		  if r := result {
		      if e := expect {
			  if r != e { panic("expected $e, got $r") }
		      } else {
			  panic("expected none, got $r")
		      }
		  } else {
		      if e := expect { panic("expected $e, got none") }
		  }
	      }
	      test(parse("23.34\0"), SemVer{23,	34, 0})
	      test(parse("1.2.9999\0"),	SemVer{1, 2, 9999})
	      test(parse("1.a\0"), none)
	  }

       Here is a more complex example of using s-tags with YYFILL to parse a
       file with newline-separated semantic versions. Tag variables are part
       of the lexer state, and they are adjusted in YYFILL like other input
       positions. This adjustment is necessary for s-tags because their
       values are invalidated after shifting buffer contents. It may not be
       necessary in a custom implementation where tag variables store offsets
       relative to the start of the input string rather than the buffer,
       which may be the case with m-tags.

	  // re2v $INPUT -o $OUTPUT

	  import arrays
	  import os
	  import strings

	  const	bufsize	= 4096
	  const	tag_none = -1

	  struct State {
	      file     os.File
	  mut:
	      yyinput  []u8
	      yycursor int
	      yymarker int
	      yylimit  int
	      token    int
	      // Intermediate tag variables must be part of the	lexer state passed to YYFILL.
	      // They don't correspond to tags and should be autogenerated by re2c.
	      /*!stags:re2c format = "\t@@ int\n"; */
	      eof      bool
	  }

	  struct SemVer	{
	      major int
	      minor int
	      patch int
	  }

	  fn s2n(s []u8) int { // convert pre-parsed string to number
	      mut n := 0
	      for c in s { n = n * 10 +	int(c -	48) }
	      return n
	  }

	  fn fill(mut st &State) int {
	      if st.eof	{ return -1 } // unexpected EOF

	      // Error:	lexeme too long. In real life can reallocate a larger buffer.
	      if st.token < 1 {	return -2 }

	      // Shift buffer contents (discard	everything up to the current token).
	      copy(mut &st.yyinput, st.yyinput[st.token..st.yylimit])
	      st.yycursor -= st.token
	      st.yymarker -= st.token
	      st.yylimit -= st.token
	      // Tag variables need to be shifted like other input positions. The check
	      // for -1	is only	needed if some tags are	nested inside of alternative or
	      // repetition, so	that they can have -1 value.
	      /*!stags:re2c format = "\tif st.@@ != -1 { st.@@ -= st.token }\n"; */
	      st.token = 0

	      // Fill free space at the	end of buffer with new data from file.
	      pos := st.file.tell() or { 0 }
	      if n := st.file.read_bytes_into(u64(pos),	mut st.yyinput[st.yylimit..bufsize]) {
		  st.yylimit +=	n
	      }
	      st.yyinput[st.yylimit] = 0 // append sentinel symbol

	      // If read less than expected, this is the end of	input.
	      st.eof = st.yylimit < bufsize

	      return 0
	  }

	  fn parse(mut st &State) ?[]SemVer {
	      // Final tag variables available in semantic action.
	      /*!svars:re2c format = "mut @@ :=	tag_none\n"; */

	      mut vers := []SemVer{}
	  loop:
	      st.token = st.yycursor
	      /*!re2c
		  re2c:api = record;
		  re2c:yyrecord	= st;
		  re2c:YYFILL =	"fill(mut st) == 0";
		  re2c:tags = 1;
		  re2c:eof = 0;

		  num =	[0-9]+;

		  num @t1 "." @t2 num @t3 ("." @t4 num)? [\n] {
		      ver := SemVer {
			  major: s2n(st.yyinput[st.token..t1]),
			  minor: s2n(st.yyinput[t2..t3]),
			  patch: if t4 == -1 { 0 } else	{ s2n(st.yyinput[t4..st.yycursor - 1]) }
		      }
		      vers = arrays.concat(vers, ver)
		      unsafe { goto loop }
		  }
		  $ { return vers }
		  * { return none }
	      */
	  }

	  fn main() {
	      fname := "input"
	      content := "1.22.333\n"

	      // Prepare input file: a few times the size of the buffer, containing
	      // newline-separated semantic versions.
	      mut fw :=	os.create(fname)!
	      fw.write_string(strings.repeat_string(content, bufsize))!
	      fw.close()

	      // Prepare lexer state: all offsets are at the end of buffer.
	      mut fr :=	os.open(fname)!
	      mut st :=	&State{
		  file:	     fr,
		  // Sentinel at `yylimit` offset is set to zero, which	triggers YYFILL.
		  yyinput:  []u8{len: bufsize +	1},
		  yycursor: bufsize,
		  yymarker: bufsize,
		  yylimit:  bufsize,
		  token:    bufsize,
		  eof:	    false,
	      }

	      // Run the lexer.
	      expect :=	[]SemVer{len: bufsize, init: SemVer{1, 22, 333}}
	      result :=	parse(mut st) or { panic("parse	failed") }
	      if result	!= expect { panic("error") }

	      // Cleanup: remove input file.
	      fr.close()
	      os.rm(fname)!
	  }

       Here is an example of using capturing groups  to	 parse	semantic  ver-
       sions.

	  // re2v $INPUT -o $OUTPUT

	  struct SemVer	{
	      major int
	      minor int
	      patch int
	  }

	  fn s2n(s string) int { // convert pre-parsed string to number
	      mut n := 0
	      for c in s { n = n * 10 +	int(c -	48) }
	      return n
	  }

	  fn parse(yyinput string) ?SemVer {
	      mut yycursor, mut	yymarker := 0, 0

	      // Final tag variables available in semantic action.
	      /*!svars:re2c format = 'mut @@ :=	0\n'; */

	      // Intermediate tag variables used by the	lexer (must be autogenerated).
	      /*!stags:re2c format = 'mut @@ :=	0\n'; */

	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:captvars	= 1;

		  num =	[0-9]+;

		  (num)	"." (num) ("." num)? [\x00] {
		      _	:= yytl0; _ := yytr0 //	some variables are unused
		      return SemVer {
			  major: s2n(yyinput[yytl1..yytr1]),
			  minor: s2n(yyinput[yytl2..yytr2]),
			  patch: if yytl3 == -1	{0} else {s2n(yyinput[yytl3 + 1..yytr3])}
		      }
		  }
		  * { return none }
	      */
	  }

	  fn main() {
	      test := fn (result ?SemVer, expect ?SemVer) {
		  if r := result {
		      if e := expect {
			  if r != e { panic("expected $e, got $r") }
		      } else {
			  panic("expected none, got $r")
		      }
		  } else {
		      if e := expect { panic("expected $e, got none") }
		  }
	      }
	      test(parse("23.34\0"), SemVer{23,	34, 0})
	      test(parse("1.2.9999\0"),	SemVer{1, 2, 9999})
	      test(parse("1.a\0"), none)
	  }

       Here  is	 an example of using m-tags to parse a version with a variable
       number of components. Tag variables are stored in a trie.

	  // re2v $INPUT -o $OUTPUT

	  import arrays

	  const	mtag_root = -1
	  const	tag_none = -1

	  // An	m-tag tree is a	way to store histories with an O(1) copy operation.
	  // Histories naturally form a	tree, as they have common start	and fork at some
	  // point. The	tree is	stored as an array of pairs (tag value,	link to	parent).
	  // An	m-tag is represented with a single link	in the tree (array index).
	  struct MtagElem {
	      elem int
	      pred int
	  }
	  type MtagTrie	= []MtagElem

	  // Append a single value to an m-tag history.
	  fn add_mtag(mut trie &MtagTrie, mtag int, value int) int {
	      trie = arrays.concat(trie, MtagElem{value, mtag})
	      return trie.len -	1
	  }

	  // Recursively unwind	tag histories and collect version components.
	  fn unwind(trie MtagTrie, x int, y int, str string) []int {
	      // Reached the root of the m-tag tree, stop recursion.
	      if x == mtag_root	&& y ==	mtag_root {
		  return []
	      }

	      // Unwind	history	further.
	      mut result := unwind(trie, trie[x].pred, trie[y].pred, str)

	      // Get tag values. Tag histories must have equal length.
	      if x == mtag_root	|| y ==	mtag_root {
		  panic("tag histories have different length")
	      }
	      ex := trie[x].elem
	      ey := trie[y].elem

	      if ex != tag_none	&& ey != tag_none {
		  // Both tags are valid string	indices, extract component.
		  result = arrays.concat(result, s2n(str[ex..ey]))
	      }	else if	!(ex ==	tag_none && ey == tag_none) {
		  panic("both tags should be tag_none")
	      }
	      return result
	  }

	  fn s2n(s string) int { // convert pre-parsed string to number
	      mut n := 0
	      for c in s { n = n * 10 +	int(c -	48) }
	      return n
	  }

	  fn parse(yyinput string) ?[]int {
	      mut yycursor, mut	yymarker := 0, 0
	      mut trie := []MtagElem{}

	      // Final tag variables available in semantic action.
	      /*!svars:re2c format = 'mut @@ :=	tag_none\n'; */
	      /*!mvars:re2c format = "mut @@ :=	mtag_root\n"; */

	      // Intermediate tag variables used by the	lexer (must be autogenerated).
	      /*!stags:re2c format = 'mut @@ :=	tag_none\n'; */
	      /*!mtags:re2c format = "mut @@ :=	mtag_root\n"; */

	      /*!re2c
		  re2c:tags = 1;
		  re2c:yyfill:enable = 0;
		  re2c:YYMTAGP = "@@ = add_mtag(mut &trie, @@, yycursor)";
		  re2c:YYMTAGN = "@@ = add_mtag(mut &trie, @@, tag_none)";

		  num =	[0-9]+;

		  @t1 num @t2 ("." #t3 num #t4)* [\x00]	{
		      mut ver := []int{}
		      ver = arrays.concat(ver, s2n(yyinput[t1..t2]))
		      ver = arrays.append(ver, unwind(trie, t3,	t4, yyinput))
		      return ver
		  }
		  * { return none }
	      */
	  }

	  fn main() {
	      test := fn (result ?[]int, expect ?[]int) {
		  if r := result {
		      if e := expect {
			  if r != e { panic("expected $e, got $r") }
		      } else {
			  panic("expected none, got $r")
		      }
		  } else {
		      if e := expect { panic("expected $e, got none") }
		  }
	      }
	      test(parse("1\0"), [1])
	      test(parse("1.2.3.4.5.6.7\0"), [1, 2, 3, 4, 5, 6,	7])
	      test(parse("1.\0"), none)
	  }

   Encoding support
       It is necessary to understand the difference between  code  points  and
       code  units.  A	code point is a	numeric	identifier of a	symbol.	A code
       unit is the smallest unit of storage in the encoded text. A single code
       point may be represented	with one or more code units. In	a fixed-length
       encoding	all code points	are represented	with the same number  of  code
       units.  In  a  variable-length  encoding	code points may	be represented
       with a different	number of code units.  Note that the  "any"  rule  [^]
       matches any code	point, but not necessarily any code unit (the only way
       to  match  any code unit	regardless of the encoding is the default rule
       *).  The	generated lexer	works with a stream of code units: yych	stores
       a code unit, and	YYCTYPE	is the code unit type. Regular expressions, on
       the other hand, are specified in	terms of code points. When  re2v  com-
       piles regular expressions to automata it	translates code	points to code
       units.  This  is	generally not a	simple mapping:	in variable-length en-
       codings a single	code point range may get translated to a complex  code
       unit graph.  The	following encodings are	supported:

       • ASCII (enabled by default). It is a fixed-length encoding with code
	 space [0-255] and 1-byte code points and code units.

       • EBCDIC (enabled with --ebcdic or re2c:encoding:ebcdic). It is a
	 fixed-length encoding with code space [0-255] and 1-byte code points
	 and code units.

       • UCS2 (enabled with --ucs2 or re2c:encoding:ucs2). It is a
	 fixed-length encoding with code space [0-0xFFFF] and 2-byte code
	 points and code units.

       • UTF8 (enabled with --utf8 or re2c:encoding:utf8). It is a
	 variable-length Unicode encoding. Code unit size is 1 byte. Code
	 points are represented with 1 to 4 code units.

       • UTF16 (enabled with --utf16 or re2c:encoding:utf16). It is a
	 variable-length Unicode encoding. Code unit size is 2 bytes. Code
	 points are represented with 1 to 2 code units.

       • UTF32 (enabled with --utf32 or re2c:encoding:utf32). It is a
	 fixed-length Unicode encoding with code space [0-0x10FFFF] and 4-byte
	 code points and code units.
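
       The difference between code points and code units can be made concrete
       with a small UTF8 encoder in plain V. This is an illustrative sketch,
       not the translation that re2v performs on regular expressions; the
       function names utf8_encode and cont are hypothetical, and the encoder
       does not reject surrogates or values above 0x10FFFF.

```v
// cont builds a UTF-8 continuation byte from the low 6 bits of `x`.
fn cont(x u32) u8 {
	return u8(0x80 | (x & 0x3F))
}

// utf8_encode converts one code point to its UTF-8 code units.
// Sketch only: no validation of the input value is performed.
fn utf8_encode(cp u32) []u8 {
	if cp < 0x80 {
		return [u8(cp)]
	} else if cp < 0x800 {
		return [u8(0xC0 | (cp >> 6)), cont(cp)]
	} else if cp < 0x10000 {
		return [u8(0xE0 | (cp >> 12)), cont(cp >> 6), cont(cp)]
	}
	return [u8(0xF0 | (cp >> 18)), cont(cp >> 12), cont(cp >> 6), cont(cp)]
}

fn main() {
	// One code point, one code unit.
	assert utf8_encode(0x78) == [u8(0x78)]
	// One code point, three code units (U+2200).
	assert utf8_encode(0x2200) == [u8(0xe2), u8(0x88), u8(0x80)]
	// One code point, four code units (U+1F600).
	assert utf8_encode(0x1F600) == [u8(0xf0), u8(0x9f), u8(0x98), u8(0x80)]
}
```

       A lexer generated for UTF8 sees the three bytes of U+2200 as three
       separate values of yych, which is why a single code point in a regular
       expression may expand to several transitions on code units.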

       The include file include/unicode_categories.re provides re2v
       definitions for the standard Unicode categories.

       The --input-encoding option specifies the source file encoding, which
       can be used to enable Unicode literals in regular expressions. For
       example, --input-encoding utf8 tells re2v that the source file is in
       UTF8 (it differs from --utf8, which sets the input text encoding). The
       --encoding-policy option specifies the way re2v handles Unicode
       surrogates (code points in range [0xD800-0xDFFF]).
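
       The surrogate range is special because UTF16 uses these values as code
       units to encode code points above 0xFFFF. The following plain V sketch
       shows this; the function names is_surrogate and utf16_encode are
       hypothetical, and the sketch does not model what any particular
       --encoding-policy setting does.

```v
// is_surrogate reports whether `cp` lies in the surrogate range.
fn is_surrogate(cp u32) bool {
	return cp >= 0xD800 && cp <= 0xDFFF
}

// utf16_encode converts one code point to its UTF-16 code units.
// Code points above 0xFFFF become a pair of surrogate code units.
// Sketch only: the input is assumed to be a valid, non-surrogate code point.
fn utf16_encode(cp u32) []u16 {
	if cp < 0x10000 {
		return [u16(cp)]
	}
	v := cp - 0x10000
	return [u16(0xD800 | (v >> 10)), u16(0xDC00 | (v & 0x3FF))]
}

fn main() {
	assert !is_surrogate(0x2200)
	assert is_surrogate(0xD800)
	assert is_surrogate(0xDFFF)

	// One code unit for U+2200, two code units for U+1F600.
	assert utf16_encode(0x2200) == [u16(0x2200)]
	assert utf16_encode(0x1F600) == [u16(0xD83D), u16(0xDE00)]
}
```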

       Below is	an example of a	lexer for UTF8 encoded Unicode identifiers.

	  // re2v $INPUT -o $OUTPUT --utf8 -si

	  /*!include:re2c "unicode_categories.re" */

	  fn lex(yyinput string) int {
	      mut yycursor, mut	yymarker := 0, 0
	      /*!re2c
		  re2c:yyfill:enable = 0;

		  // Simplified	"Unicode Identifier and	Pattern	Syntax"
		  // (see https://unicode.org/reports/tr31)
		  id_start    =	L | Nl | [$_];
		  id_continue =	id_start | Mn |	Mc | Nd	| Pc | [\u200D\u05F3];
		  identifier  =	id_start id_continue*;

		  identifier { return 0	}
		  *	     { return 1	}
	      */
	  }

	  fn main() {
	      if lex("_\0") != 0 {
		  panic("error")
	      }
	  }

   Include files
       re2v allows one to include other files using a block of the form
       /*!include:re2c FILE */ or %{include FILE %}, or an in-block directive
       !include FILE ;, where FILE is a path to the file to be included. re2v
       looks for include files in the directory of the including file and in
       include locations, which can be specified with the -I option. Include
       blocks/directives in re2v work in the same way as C/C++ #include: FILE
       contents are copy-pasted verbatim in place of the block/directive.
       Include files may have further includes of their own. Use the --depfile
       option to track build dependencies of the output file on include files.
       re2v provides some predefined include files that can be found in the
       include/ subdirectory of the project. These files contain definitions
       that may be useful to other projects (such as Unicode categories) and
       form something like a standard library for re2v. Below is an example of
       using include files.

   Include file	1 (definitions.v)
	  enum Result {
	      ok
	      fail
	  }

	  /*!re2c
	      number = [1-9][0-9]*;
	  */

   Include file	2 (extra_rules.re.inc)
	  // floating-point numbers
	  frac	= [0-9]* "." [0-9]+ | [0-9]+ ".";
	  exp	= 'e' [+-]? [0-9]+;
	  float	= frac exp? | [0-9]+ exp;

	  float	{ return .ok }

   Input file
	  // re2v $INPUT -o $OUTPUT -i

	  /*!include:re2c "definitions.v" */

	  fn lex(yyinput string) Result	{
	      mut yycursor, mut	yymarker := 0, 0
	      /*!re2c
		  re2c:yyfill:enable = 0;

		  *	 { return .fail	}
		  number { return .ok }
		  !include "extra_rules.re.inc";
	      */
	  }

	  fn main() {
	      assert lex("123\0") == .ok
	      assert lex("123.4567\0") == .ok
	  }

   Header files
       re2v allows one to generate a header file from the input .re file using
       the --header option or the re2c:header configuration and block pairs of
       the form /*!header:re2c:on*/ and /*!header:re2c:off*/, or %{header:on%}
       and %{header:off%}. The first block marks the beginning of the header
       file, and the second block marks the end of it. Everything between
       these blocks is processed by re2v, and the generated code is written to
       the file specified with the --header option or the re2c:header
       configuration (or stdout if neither option nor configuration is used).
       An autogenerated header file may be needed in cases when re2v is used
       to generate definitions that must be visible from other translation
       units.

       Here is an example of generating a header file that contains the
       definition of the lexer state with tag variables (the number of
       variables depends on the regular grammar and is unknown to the
       programmer).

   Input file
	  // re2v $INPUT -o $OUTPUT -i --header	lexer/state.v
	  module main

	  import lexer // the package is generated by re2c

	  /*!header:re2c:on*/
	  module lexer

	  pub struct State {
	  pub mut:
	      yyinput string
	      yycursor int
	      /*!stags:re2c format="@@ int\n"; */
	  }
	  /*!header:re2c:off*/

	  fn lex(mut yyrecord &lexer.State) int	{
	      mut t := 0
	      /*!re2c
		  re2c:header =	"lexer/state.v";
		  re2c:api = record;
		  re2c:yyfill:enable = 0;
		  re2c:tags = 1;

		  [a]* @t [b]* { return	t }
	      */
	  }

	  fn main() {
	      mut st :=	&lexer.State{yyinput:"ab\0",}
	      if lex(mut st) !=	1 {
		  panic("error")
	      }
	  }

   Header file
	  // Code generated by re2c, DO	NOT EDIT.

	  module lexer

	  pub struct State {
	  pub mut:
	      yyinput string
	      yycursor int

	  yyt1 int
	  }

   Skeleton programs
       With the	-S, --skeleton option, re2v ignores all	non-re2v code and gen-
       erates a	self-contained C program that can be further compiled and exe-
       cuted.	The  program  consists	of lexer code and input	data. For each
       constructed DFA (block or condition) re2v generates a standalone	 lexer
       and  two	 files:	an .input file with strings derived from the DFA and a
       .keys file with expected	match results. The program runs	each lexer  on
       the  corresponding  .input  file	and compares results with the expecta-
       tions.  Skeleton	programs are very useful for a number of reasons:

       • They can check correctness of various re2v optimizations (the data
	 is generated early in the process, before any DFA transformations
	 have taken place).

       • Generating a set of input data with good coverage may be useful for
	 both testing and benchmarking.

       • Generating self-contained executable programs allows one to get
	 minimized test cases (the original code may be large or have a lot of
	 dependencies).

       The difficulty with generating input data is that for all but the most
       trivial cases the number of possible input strings is too large (even
       if the string length is limited). re2v solves this difficulty by
       generating sufficiently many strings to cover almost all DFA
       transitions. It uses the following algorithm. First, it constructs a
       skeleton of the DFA. For encodings with 1-byte code unit size (such as
       ASCII, UTF-8 and EBCDIC) the skeleton is just an exact copy of the
       original DFA. For encodings with multibyte code units the skeleton is a
       copy of the DFA with certain transitions omitted: namely, re2v takes at
       most 256 code units for each disjoint continuous range that corresponds
       to a DFA transition. The chosen values are evenly distributed and
       include range bounds. Instead of trying to cover all possible paths in
       the skeleton (which is infeasible) re2v generates sufficiently many
       paths to cover all skeleton transitions, and thus trigger the
       corresponding conditional jumps in the lexer. The algorithm
       implementation is limited by ~1Gb of transitions and consumes a
       constant amount of memory (re2v writes data to file as soon as it is
       generated).
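
       The selection of at most 256 evenly distributed code units per range
       can be sketched in plain V as shown below. This is one possible reading
       of "evenly distributed and include range bounds", not the actual re2v
       implementation; the function name sample_range and its parameters are
       hypothetical.

```v
// sample_range picks at most `max_n` values from the inclusive range
// [lo, hi], spread evenly and always including both bounds.
// Assumption: lo <= hi and max_n >= 2.
fn sample_range(lo int, hi int, max_n int) []int {
	mut out := []int{}
	size := hi - lo + 1
	if size <= max_n {
		// Small range: keep every value.
		for v in lo .. hi + 1 {
			out << v
		}
		return out
	}
	// Large range: i = 0 gives lo, i = max_n - 1 gives hi.
	for i in 0 .. max_n {
		offset := int(i64(hi - lo) * i64(i) / i64(max_n - 1))
		out << (lo + offset)
	}
	return out
}

fn main() {
	// A narrow range such as [a-z] is kept whole.
	small := sample_range(0x61, 0x7a, 256)
	assert small.len == 26

	// A wide range of 2-byte code units is reduced to 256 samples.
	wide := sample_range(0, 0xFFFF, 256)
	assert wide.len == 256
	assert wide[0] == 0
	assert wide[wide.len - 1] == 0xFFFF
}
```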

   Visualization and debug
       With the	-D, --emit-dot option, re2v does not generate  code.  Instead,
       it dumps	the generated DFA in DOT format.  One can convert this dump to
       an  image of the	DFA using Graphviz or another library.	Note that this
       option shows the	final DFA after	it has gone through a number of	 opti-
       mizations  and transformations. Earlier stages can be dumped with vari-
       ous debug options, such as --dump-nfa,  --dump-dfa-raw  etc.  (see  the
       full list of options).

SEE ALSO
       You can find more information about re2c at the official website:
       http://re2c.org. Similar programs are flex(1), lex(1) and quex
       (http://quex.sourceforge.net).

AUTHORS
       re2v was originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)
       in 1993. Marcus Boerger and Dan Nuffer spent several years turning the
       original idea into a production-ready code generator. Since then it has
       been maintained and developed by multiple volunteers, most notably
       Brian Young (bayoung@acm.org), Marcus Boerger, Dan Nuffer
       (nuffer@users.sourceforge.net), Ulya Trofimovich (skvadrik@gmail.com),
       Serghei Iakovlev, Sergei Trofimovich, Petr Skocik, ligfx, raekye and
       PolarGoose.

								       RE2V(1)
