RE2C(1)								       RE2C(1)

NAME
       re2c - generate fast lexical analyzers for C/C++

SYNOPSIS
       Note: some examples are in C++ (but can be adapted to C).

       re2c [ OPTIONS ]	[ WARNINGS ] INPUT

       Input can be either a file or - for stdin.

INTRODUCTION
       re2c works as a preprocessor. It	reads the input	file (which is usually
       a  program  in C/C++, but can be	anything) and looks for	blocks of code
       enclosed	in special-form	start/end markers. The text outside  of	 these
       blocks  is  copied  verbatim  into the output file. The contents	of the
       blocks are processed by re2c. It	translates them	to code	in  C/C++  and
       outputs the generated code in place of the block.

       Here  is	 an  example  of a small program that checks if	a given	string
       contains	a decimal number:

	  // re2c $INPUT -o $OUTPUT -i --case-ranges
	  #include <assert.h>

	  int lex(const	char *s) {
	      const char *YYCURSOR = s;
	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:define:YYCTYPE =	char;

		  [1-9][0-9]* {	return 0; }
		  *	      {	return 1; }
	      */
	  }

	  int main() {
	      assert(lex("1234") == 0);
	      return 0;
	  }

       In the output re2c replaced the block in	the middle with	the  generated
       code:

	  /* Generated by re2c */
	  // re2c $INPUT -o $OUTPUT -i --case-ranges
	  #include <assert.h>

	  int lex(const	char *s) {
	      const char *YYCURSOR = s;

	  {
	      char yych;
	      yych = *YYCURSOR;
	      switch (yych) {
		  case '1' ... '9': goto yy2;
		  default: goto	yy1;
	      }
	  yy1:
	      ++YYCURSOR;
	      {	return 1; }
	  yy2:
	      yych = *++YYCURSOR;
	      switch (yych) {
		  case '0' ... '9': goto yy2;
		  default: goto	yy3;
	      }
	  yy3:
	      {	return 0; }
	  }

	  }

	  int main() {
	      assert(lex("1234") == 0);
	      return 0;
	  }
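
       The generated code above uses case ranges, a compiler extension that is
       supported by GCC and Clang and requested here with --case-ranges. As an
       illustration only, the same automaton can be sketched by hand in
       portable C. This is not re2c output, and the name lex_portable is
       hypothetical. Unlike the generated code, the sketch does not advance
       the cursor before returning from the default rule, which makes no
       observable difference here.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written sketch of the automaton for [1-9][0-9]* (not re2c output).
   Returns 0 if s starts with a decimal number and 1 otherwise. */
int lex_portable(const char *s) {
    const char *cursor = s;            /* plays the role of YYCURSOR */
    char yych;

    assert(s != NULL);
    yych = *cursor;

    /* initial state: only '1'..'9' can start a number */
    if (yych < '1' || yych > '9') {
        return 1;                      /* default rule * */
    }

    /* state yy2: keep consuming digits */
    do {
        yych = *++cursor;
    } while (yych >= '0' && yych <= '9');

    return 0;                          /* state yy3: the rule matched */
}
```

       The rule matches a prefix of the input, so lex_portable("12ab") returns
       0 just like lex_portable("1234"), while lex_portable("0") and
       lex_portable("") return 1.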

BASICS
       A re2c program consists of a sequence of	blocks intermixed with code in
       the  target  language. A	block may contain definitions, configurations,
       rules, actions and directives in	any order:

       name = regular-expression ;
	      A	definition binds name to regular-expression. Names may contain
	      alphanumeric characters and underscore. The regular  expressions
	      section  gives  an  overview  of re2c syntax for regular expres-
	      sions. Once defined, the name can	be used	in other  regular  ex-
	      pressions	 and  in rules.	 Recursion in named definitions	is not
	      allowed, and each	name should be defined before it  is  used.  A
	      block inherits named definitions from the	global scope. Redefin-
	      ing a name that exists in	the current scope is an	error.

       configuration = value ;
	      A	configuration allows one to change re2c	behavior and customize
	      the  generated code. For a full list of configurations supported
	      by re2c see the configurations section. Depending	on a  particu-
	      lar configuration, the value can be a keyword, a nonnegative in-
	      teger  number  or	 a one-line string which should	be enclosed in
	      double or	single quotes unless it	consists of alphanumeric char-
	      acters. A	block inherits configurations from  the	 global	 scope
	      and  may	redefine  them or add new ones.	Configurations defined
	      inside of	a block	affect the whole block,	even if	they appear at
	      the end of it.

       regular-expression code
	      A	rule binds regular-expression to its semantic action (a	 block
	      of  code in curly	braces,	or a block of code that	starts with :=
	      and ends on a newline followed by	any non-whitespace character).
	      If the regular-expression	matches, the associated	code  is  exe-
	      cuted.   If multiple rules match,	the longest match takes	prece-
	      dence. If	multiple rules match the same string, the earliest one
	      takes precedence. There are two special rules: the default rule
	      * and the end of input rule $. The default rule should always be
	      defined; it has the lowest priority regardless of its place in
	      the block, and it matches any code unit (not necessarily a valid
	      character, see the encoding support section). The end of input
	      rule should be defined if the corresponding method for handling
	      the end of input is used. When start conditions are used, rules
	      have a more complex syntax.

       !action code
	      An  action  binds	 a  user-defined block of code to a particular
	      place in the generated finite state machine (in the same way  as
	      semantic actions bind code to the	final states). See the actions
	      section for a full list of predefined actions.

       !directive ;
	      A	 directive  is	one of the special predefined statements. Each
	      directive	has a unique purpose. See the directives  section  for
	      details.
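
       A minimal sketch of a block that combines definitions, rules and
       configurations. The function name lex_token and the token names are
       hypothetical. The block is re2c input, so the file has to be processed
       by re2c before it is compiled.

```c
/* Hypothetical token kinds used only in this sketch. */
enum token { TOK_ERROR, TOK_IF, TOK_IDENT, TOK_NUMBER };

/* Classifies the token at the start of a NUL-terminated string. */
enum token lex_token(const char *s) {
    const char *YYCURSOR = s;
    /*!re2c
        // definitions: every name is defined before it is used
        digit  = [0-9];
        letter = [a-zA-Z_];
        number = digit+;
        ident  = letter (letter | digit)*;

        // rules: the longest match wins, ties go to the earliest rule
        "if"   { return TOK_IF; }
        ident  { return TOK_IDENT; }
        number { return TOK_NUMBER; }
        *      { return TOK_ERROR; }

        // configurations affect the whole block even when they come last
        re2c:yyfill:enable = 0;
        re2c:define:YYCTYPE = char;
    */
}
```

       With these rules the input "if" is matched by both the "if" rule and
       the ident rule with the same length, so the earlier "if" rule is
       chosen. For "iffy" the ident rule gives a longer match and takes
       precedence.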

   Blocks
       Block  start  and  end  markers are either /*!re2c and */, or %{	and %}
       (both styles are	supported). Starting from version 2.2 blocks may  have
       optional	names that allow them to be referenced in other	blocks.	 There
       are different kinds of blocks:

       /*!re2c[:<name>]	... */ or %{[:<name>] ... %}
	      A	 global	 block contains	definitions, configurations, rules and
	      directives.  re2c	compiles regular expressions  associated  with
	      each  rule  into a deterministic finite automaton, encodes it in
	      the form of conditional jumps in the  target  language  and  re-
	      places  the  block with the generated code. Names	and configura-
	      tions defined in a global	block are added	to  the	 global	 scope
	      and  become  visible  to	subsequent blocks. At the start	of the
	      program  the  global  scope  is  initialized  with  command-line
	      options.

       /*!local:re2c[:<name>] ... */ or	%{local[:<name>] ... %}
	      A	local block is like a global block, but	the names and configu-
	      rations  in  it  have  local  scope  (they  do  not affect other
	      blocks).

       /*!rules:re2c[:<name>] ... */ or	%{rules[:<name>] ... %}
	      A	rules block is like a local block, but it  does	 not  generate
	      any  code	 by  itself,  nor  does	 it add	any definitions	to the
	      global scope -- it is meant to be	reused in other	 blocks.  This
	      is  a  way  of sharing code (more	details	in the reusable	blocks
	      section).	Prior to re2c version 2.2  rules  blocks  required  -r
	      --reusable option.

       /*!use:re2c[:<name>] ...	*/ or %{use[:<name>] ... %}
	      A use block references a previously defined rules block. If
	      the name is specified, re2c looks for a rules block with this
	      name. Otherwise the most recent rules block is used (either a
	      named  or	an unnamed one). A use block can add definitions, con-
	      figurations and rules of its own,	which are added	 to  those  of
	      the referenced rules block. Prior	to re2c	version	2.2 use	blocks
	      required -r --reusable option.

       /*!max:re2c[:<name1>[:<name2>...]] ... */ or
       %{max[:<name1>[:<name2>...]] ...	%}
	      A	block that generates YYMAXFILL definition. An optional list of
	      block  names specifies which blocks should be included when com-
	      puting YYMAXFILL value (if the list is empty, all	blocks are in-
	      cluded).	By default the generated code  is  a  macro-definition
	      for  C (#define YYMAXFILL	<n>), or a global variable for Go (var
	      YYMAXFILL	int = <n>). It can be customized with an optional con-
	      figuration format	that specifies a template string where @@{max}
	      (or @@ for short)	is replaced with the numeric value  of	YYMAX-
	      FILL.

       /*!maxnmatch:re2c[:<name1>[:<name2>...]] ... */ or
       %{maxnmatch[:<name1>[:<name2>...]] ... %}
	      A	 block	that  generates	YYMAXNMATCH definition (it requires -P
	      --posix-captures option).	An optional list of block names	speci-
	      fies which blocks	should be included when	computing  YYMAXNMATCH
	      value  (if  the list is empty, all blocks	are included).	By de-
	      fault the	generated code is a macro-definition  for  C  (#define
	      YYMAXNMATCH  <n>),  or a global variable for Go (var YYMAXNMATCH
	      int = <n>). It can be customized with an optional	 configuration
	      format that specifies a template string where @@{max} (or	@@ for
	      short) is	replaced with the numeric value	of YYMAXNMATCH.

       /*!stags:re2c[:<name1>[:<name2>...]] ...	*/,
       /*!mtags:re2c[:<name1>[:<name2>...]] ...	*/ or
       %{stags[:<name1>[:<name2>...]] ... %}, %{mtags[:<name1>[:<name2>...]]
       ... %}
	      Blocks  that  specify  a template	piece of code that is expanded
	      for each s-tag/m-tag variable generated  by  re2c.  An  optional
	      list  of	block  names specifies which blocks should be included
	      when computing the set of	tag variables (if the list  is	empty,
	      all  blocks  are	included).   There are two optional configura-
	      tions: format and	separator.  Configuration format  specifies  a
	      template string where @@{tag} (or	@@ for short) is replaced with
	      the  name	 of each tag variable.	Configuration separator	speci-
	      fies a piece of code used	to join	the  generated	format	pieces
	      for different tag	variables.

       /*!svars:re2c[:<name1>[:<name2>...]] ...	*/,
       /*!mvars:re2c[:<name1>[:<name2>...]] ...	*/ or
       %{svars[:<name1>[:<name2>...]] ... %}, %{mvars[:<name1>[:<name2>...]]
       ... %}
	      Blocks  that  specify  a template	piece of code that is expanded
	      for each s-tag/m-tag that	is either explicitly mentioned by  the
	      rules (with --tags option) or implicitly generated by re2c (with
	      --captvars  or  --posix-captvars	options).  An optional list of
	      block names specifies which blocks should	be included when  com-
	      puting the set of	tags (if the list is empty, all	blocks are in-
	      cluded).	There are two optional configurations: format and sep-
	      arator.	Configuration format specifies a template string where
	      @@{tag} (or @@ for short)	is replaced with the name of each tag.
	      Configuration separator specifies	a piece	of code	used  to  join
	      the generated format pieces for different	tags.

       /*!getstate:re2c[:<name1>[:<name2>...]] ... */ or
       %{getstate[:<name1>[:<name2>...]] ... %}
	      A	 block	that generates conditional dispatch on the lexer state
	      (it requires --storable-state option). An	optional list of block
	      names specifies which blocks should be  included	in  the	 state
	      dispatch.	 The default transition	goes to	the start label	of the
	      first block on the list. If the list is empty,  all  blocks  are
	      included,	 and the default transition goes to the	first block in
	      the file that has	a start	label.	This block type	is  incompati-
	      ble  with	 the  --loop-switch option, as it requires cross-block
	      transitions that are unsupported without goto or function	calls.

       /*!conditions:re2c[:<name1>[:<name2>...]] ... */, /*!types:re2c... */
       or %{conditions[:<name1>[:<name2>...]] ... %}, %{types... %}
	      A	block that generates condition enumeration (it requires	--con-
	      ditions option). An optional list	of block names specifies which
	      blocks should be included	when computing the set	of  conditions
	      (if the list is empty, all blocks	are included).	By default the
	      generated	 code  is  an  enumeration  YYCONDTYPE.	It can be cus-
	      tomized with optional configurations format and separator.  Con-
	      figuration format	specifies a template string where @@{cond} (or
	      @@ for short) is replaced	with the name of each  condition,  and
	      @@{num}  is  replaced  with  a  numeric index of that condition.
	      Configuration separator specifies	a piece	of code	used  to  join
	      the generated format pieces for different	conditions.

       /*!include:re2c <file> */ or %{include <file> %}
	      This  block  allows  one to include <file>, which	must be	a dou-
	      ble-quoted file path. The	contents of  the  file	are  literally
	      substituted  in  place of	the block, in the same way as #include
	      works in C/C++. This block can be	used together with the	--dep-
	      file  option  to	generate  build	system dependencies on the in-
	      cluded files.

       /*!header:re2c:on*/ or %{header:on %}
	      This block marks the start of header file. Everything  after  it
	      and  up  to  the following header:off block is processed by re2c
	      and written to the header	file specified with  -t	 --type-header
	      option.

       /*!header:re2c:off*/ or %{header:off %}
	      This block marks the end of the header file started with the
	      header:on block.

       /*!ignore:re2c ... */ or	%{ignore ... %}
	      A block whose contents are ignored and removed from the output
	      file.
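
       A sketch of how a named rules block can be shared by two use blocks,
       based on the block kinds listed above. The block name numbers, the
       function names and the return codes are hypothetical, and block names
       need re2c 2.2 or later. This is re2c input, not plain C.

```c
/* Shared rules: generates no code by itself. */
/*!rules:re2c:numbers
    re2c:yyfill:enable = 0;
    re2c:define:YYCTYPE = char;

    dec = [1-9][0-9]*;
    dec { return 0; }
*/

/* Accepts decimal numbers only: returns 0 on a match, 1 otherwise. */
int lex_dec_only(const char *s) {
    const char *YYCURSOR = s;
    /*!use:re2c:numbers
        * { return 1; }
    */
}

/* Adds a hexadecimal rule on top of the shared ones: returns 0 for a
   decimal number, 2 for a hexadecimal number and 1 otherwise. */
int lex_dec_or_hex(const char *s) {
    const char *YYCURSOR = s;
    const char *YYMARKER;   /* needed when "0x" is not followed by a digit */
    /*!use:re2c:numbers
        "0x" [0-9a-fA-F]+ { return 2; }
        *                 { return 1; }
    */
}
```

       Each use block adds its own rules to those of the rules block, so the
       configurations and the dec rule are written only once.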

   Configurations
       Here is a full list of configurations supported by re2c:

       re2c:api, re2c:input
	      Same as the --api	option.

       re2c:api:sigil
	      Specify the marker ("sigil") that	is used	 for  argument	place-
	      holders  in the API primitives. The default is @@. A placeholder
	      starts with sigil	followed by the	argument name in curly braces.
	      For example, if sigil is set to $, then placeholders  will  have
	      the  form	 ${name}. Single-argument APIs may use shorthand nota-
	      tion without the name in braces. This option can	be  overridden
	      by  options for individual API primitives, e.g.  re2c:YYFILL@len
	      for YYFILL.

       re2c:api:style
	      Specify API style. Possible values are  functions	 (the  default
	      for  C)  and  free-form (the default for Go and Rust).  In func-
	      tions style API primitives are generated with an	argument  list
	      in  parentheses  following  the name of the primitive. The argu-
	      ments are	provided only for autogenerated	 parameters  (such  as
	      the number of characters passed to YYFILL), but not for the gen-
	      eral lexer context, so the primitives behave more	like macros in
	      C/C++ or closures	in Go and Rust.	 In free-form style API	primi-
	      tives  do	 not  have  a  fixed  form:  they should be defined as
	      strings containing free-form pieces of  code  with  interpolated
	      variables	 of  the  form @@{var} or @@ (they correspond to argu-
	      ments in function-like style).  This configuration may be	 over-
	      ridden  for  individual API primitives, see for example re2c:YY-
	      FILL:naked configuration for YYFILL.

       re2c:bit-vectors, re2c:flags:bit-vectors, re2c:flags:b
	      Same as the --bit-vectors	 option,  but  can  be	configured  on
	      per-block	basis.

       re2c:captures, re2c:leftmost-captures
	      Same as the --leftmost-captures option, but can be configured on
	      per-block	basis.

       re2c:captvars, re2c:leftmost-captvars
	      Same as the --leftmost-captvars option, but can be configured on
	      per-block	basis.

       re2c:case-insensitive, re2c:flags:case-insensitive
	      Same  as the --case-insensitive option, but can be configured on
	      per-block	basis.

       re2c:case-inverted, re2c:flags:case-inverted
	      Same as the --case-inverted option, but  can  be	configured  on
	      per-block	basis.

       re2c:case-ranges, re2c:flags:case-ranges
	      Same  as	the  --case-ranges  option,  but  can be configured on
	      per-block	basis.

       re2c:computed-gotos, re2c:flags:computed-gotos, re2c:flags:g
	      Same as the --computed-gotos option, but can  be	configured  on
	      per-block	basis.

       re2c:computed-gotos:threshold, re2c:cgoto:threshold
	      If  computed goto	is used, this configuration specifies the com-
	      plexity threshold	that triggers the generation  of  jump	tables
	      instead  of  nested if statements	and bitmaps. The default value
	      is 9.

       re2c:cond:abort
	      If set to	a positive integer value, the default case in the gen-
	      erated condition dispatch	aborts program execution.

       re2c:cond:goto
	      Specifies	a piece	of code	used for  the  autogenerated  shortcut
	      rules :=>	in conditions. The default is goto @@;.	 The @@	place-
	      holder  is  substituted  with condition name (see	configurations
	      re2c:api:sigil and re2c:cond:goto@cond).

       re2c:cond:goto@cond
	      Specifies	 the  sigil  used   for	  argument   substitution   in
	      re2c:cond:goto  definition.  The default value is	@@.  Overrides
	      the more generic re2c:api:sigil configuration.

       re2c:cond:divider
	      Defines the divider for condition	blocks.	 The default value  is
	      /*  ***********************************  */.   Placeholders  are
	      substituted with condition name (see re2c:api:sigil and
	      re2c:cond:divider@cond).

       re2c:cond:divider@cond
	      Specifies	  the	sigil	used   for  argument  substitution  in
	      re2c:cond:divider	definition. The	default	is @@.	Overrides  the
	      more generic re2c:api:sigil configuration.

       re2c:cond:prefix, re2c:condprefix
	      Specifies	 the prefix used for condition labels.	The default is
	      yyc_.

       re2c:cond:enumprefix, re2c:condenumprefix
	      Specifies	the prefix used	for condition  identifiers.   The  de-
	      fault is yyc.

       re2c:debug-output, re2c:flags:debug-output, re2c:flags:d
	      Same  as	the  --debug-output  option,  but can be configured on
	      per-block	basis.

       re2c:empty-class, re2c:flags:empty-class
	      Same as the --empty-class	 option,  but  can  be	configured  on
	      per-block	basis.

       re2c:encoding:ebcdic, re2c:flags:ecb, re2c:flags:e
	      Same  as the --ebcdic option, but	can be configured on per-block
	      basis.

       re2c:encoding:ucs2, re2c:flags:wide-chars, re2c:flags:w
	      Same as the --ucs2 option, but can be  configured	 on  per-block
	      basis.

       re2c:encoding:utf8, re2c:flags:utf-8, re2c:flags:8
	      Same  as	the  --utf8 option, but	can be configured on per-block
	      basis.

       re2c:encoding:utf16, re2c:flags:utf-16, re2c:flags:x
	      Same as the --utf16 option, but can be configured	 on  per-block
	      basis.

       re2c:encoding:utf32, re2c:flags:unicode,	re2c:flags:u
	      Same  as	the --utf32 option, but	can be configured on per-block
	      basis.

       re2c:encoding-policy, re2c:flags:encoding-policy
	      Same as the --encoding-policy option, but	can be	configured  on
	      per-block	basis.

       re2c:eof
	      Specifies	the sentinel symbol used with the end-of-input rule $.
	      The  default  value  is  -1 ($ rule is not used).	Other possible
	      values include all valid code units. Only	 decimal  numbers  are
	      recognized.

       re2c:fn:sep
	      Specifies	 separator  used  in  YYFN elements (defaults to semi-
	      colon).

       re2c:header, re2c:flags:type-header, re2c:flags:t
	      Specifies	the name of the	generated header file relative to  the
	      directory	of the output file. Same as the	--header option	except
	      that the file path is relative.

       re2c:indent:string
	      Specifies	the string used	for indentation. The default is	a sin-
	      gle  tab character "\t". Indent string should contain whitespace
	      characters only.	To disable indentation entirely, set this con-
	      figuration to an empty string.

       re2c:indent:top
	      Specifies	the minimum amount of indentation to use. The  default
	      value  is	 zero. The value should	be a non-negative integer num-
	      ber.

       re2c:invert-captures
	      Same as the --invert-captures option, but	can be	configured  on
	      per-block	basis.

       re2c:label:prefix, re2c:labelprefix
	      Specifies	 the  prefix used for DFA state	labels.	The default is
	      yy.

       re2c:label:start, re2c:startlabel
	      Controls the generation of a  block  start  label.  The  default
	      value  is	 zero,	which  means that the start label is generated
	      only if it is used. An integer value greater  than  zero	forces
	      the generation of	start label even if it is unused by the	lexer.
	      A	 string	 value also forces start label generation and sets the
	      label name to the	specified string. This	configuration  applies
	      only  to	the current block (it is reset to default for the next
	      block).

       re2c:label:yyFillLabel
	      Specifies	the prefix of YYFILL labels used with re2c:eof and  in
	      storable state mode.

       re2c:label:yyloop
	      Specifies	 the  name of the label	marking	the start of the lexer
	      loop with	--loop-switch option. The default is yyloop.

       re2c:label:yyNext
	      Specifies	the name of the	optional label that follows YYGETSTATE
	      switch in	storable state mode (enabled  with  re2c:state:nextla-
	      bel). The	default	is yyNext.

       re2c:lookahead, re2c:flags:lookahead
	      Deprecated (see the deprecated --no-lookahead option).

       re2c:monadic
	      If  set  to non-zero, the	generated lexer	will use monadic nota-
	      tion (this configuration is specific to Haskell).

       re2c:nested-ifs,	re2c:flags:nested-ifs, re2c:flags:s
	      Same as the  --nested-ifs	 option,  but  can  be	configured  on
	      per-block	basis.

       re2c:posix-captures, re2c:flags:posix-captures, re2c:flags:P
	      Same  as	the  --posix-captures option, but can be configured on
	      per-block	basis.

       re2c:posix-captvars
	      Same as the --posix-captvars option, but can  be	configured  on
	      per-block	basis.

       re2c:tags, re2c:flags:tags, re2c:flags:T
	      Same  as	the  --tags option, but	can be configured on per-block
	      basis.

       re2c:tags:expression
	      Specifies	the expression used for	 tag  variables.   By  default
	      re2c generates expressions of the	form yyt<N>. This might	be in-
	      convenient,  for	example	if tag variables are defined as	fields
	      in a struct. All occurrences of @@{tag} or @@ are	replaced  with
	      the actual tag name. For example,	re2c:tags:expression = "s.@@";
	      results  in  expressions	of  the	form s.yyt<N> in the generated
	      code.  See also re2c:api:sigil configuration.

       re2c:tags:negative
	      Specifies	the constant expression	that is	used for negative  tag
	      value (typically this would be -1	if tags	are integer offsets in
	      the input	string,	or null	pointer	if they	are pointers).

       re2c:tags:prefix
	      Specifies	the prefix for tag variable names. The default is yyt.

       re2c:sentinel
	      Specifies	 the  sentinel symbol used for the end-of-input	checks
	      (when bounds checks are disabled with  re2c:yyfill:enable	 =  0;
	      and  re2c:eof  is	 not  set). This configuration does not	affect
	      code generation: its purpose is to verify	that the  sentinel  is
	      not  allowed  in the middle of a rule, and ensure	that the lexer
	      won't read past the end of buffer. The default value is -1 (in
	      that  case  re2c assumes that the	sentinel is zero, which	is the
	      most common case). Only decimal numbers are recognized.

       re2c:state:abort
	      If set to	a positive integer value, the default case in the gen-
	      erated state dispatch aborts program execution, and an  explicit
	      -1 case contains transition to the start of the block.

       re2c:state:nextlabel
	      Controls if the YYGETSTATE switch is followed by a yyNext label
	      (the default value is zero, which	corresponds to no label).  Al-
	      ternatively  one can use re2c:label:start	to generate a specific
	      start label, or an  explicit  getstate  block  to	 generate  the
	      YYGETSTATE switch	separately from	the lexer block.

       re2c:unsafe, re2c:flags:unsafe
	      Same  as	the  --no-unsafe  option,  but	can  be	 configured on
	      per-block	basis.	If set to zero,	it suppresses  the  generation
	      of unsafe	wrappers around	YYPEEK.	The default is non-zero	(wrap-
	      pers are generated).  This configuration is specific to Rust.

       re2c:YYBACKUP, re2c:define:YYBACKUP
	      Defines generic API primitive YYBACKUP.

       re2c:YYBACKUPCTX, re2c:define:YYBACKUPCTX
	      Defines generic API primitive YYBACKUPCTX.

       re2c:YYCONDTYPE,	re2c:define:YYCONDTYPE
	      Defines API primitive YYCONDTYPE.

       re2c:YYCTYPE, re2c:define:YYCTYPE
	      Defines API primitive YYCTYPE.

       re2c:YYCTXMARKER, re2c:define:YYCTXMARKER
	      Defines API primitive YYCTXMARKER.

       re2c:YYCURSOR, re2c:define:YYCURSOR
	      Defines API primitive YYCURSOR.

       re2c:YYDEBUG, re2c:define:YYDEBUG
	      Defines API primitive YYDEBUG.

       re2c:YYFILL, re2c:define:YYFILL
	      Defines API primitive YYFILL.

       re2c:YYFILL@len,	re2c:define:YYFILL@len
	      Specifies	the sigil used for argument substitution in YYFILL de-
	      finition.	  Defaults   to	  @@.	 Overrides  the	 more  generic
	      re2c:api:sigil configuration.

       re2c:YYFILL:naked, re2c:define:YYFILL:naked
	      Overrides	the more generic re2c:api:style	configuration for  YY-
	      FILL.  Zero value	corresponds to free-form API style.

       re2c:YYFN
	      Defines API primitive YYFN.

       re2c:YYINPUT
	      Defines API primitive YYINPUT.

       re2c:YYGETCOND, re2c:define:YYGETCONDITION
	      Defines API primitive YYGETCOND.

       re2c:YYGETCOND:naked, re2c:define:YYGETCONDITION:naked
	      Overrides	 the  more  generic  re2c:api:style  configuration for
	      YYGETCOND. Zero value corresponds	to free-form API style.

       re2c:YYGETSTATE,	re2c:define:YYGETSTATE
	      Defines API primitive YYGETSTATE.

       re2c:YYGETSTATE:naked, re2c:define:YYGETSTATE:naked
	      Overrides	the  more  generic  re2c:api:style  configuration  for
	      YYGETSTATE. Zero value corresponds to free-form API style.

       re2c:YYGETACCEPT, re2c:define:YYGETACCEPT
	      Defines API primitive YYGETACCEPT.

       re2c:YYLESSTHAN,	re2c:define:YYLESSTHAN
	      Defines generic API primitive YYLESSTHAN.

       re2c:YYLIMIT, re2c:define:YYLIMIT
	      Defines API primitive YYLIMIT.

       re2c:YYMARKER, re2c:define:YYMARKER
	      Defines API primitive YYMARKER.

       re2c:YYMTAGN, re2c:define:YYMTAGN
	      Defines generic API primitive YYMTAGN.

       re2c:YYMTAGP, re2c:define:YYMTAGP
	      Defines generic API primitive YYMTAGP.

       re2c:YYPEEK, re2c:define:YYPEEK
	      Defines generic API primitive YYPEEK.

       re2c:YYRESTORE, re2c:define:YYRESTORE
	      Defines generic API primitive YYRESTORE.

       re2c:YYRESTORECTX, re2c:define:YYRESTORECTX
	      Defines generic API primitive YYRESTORECTX.

       re2c:YYRESTORETAG, re2c:define:YYRESTORETAG
	      Defines generic API primitive YYRESTORETAG.

       re2c:YYSETCOND, re2c:define:YYSETCONDITION
	      Defines API primitive YYSETCOND.

       re2c:YYSETCOND@cond, re2c:define:YYSETCONDITION@cond
	      Specifies	 the sigil used	for argument substitution in YYSETCOND
	      definition. The default value is @@.  Overrides the more generic
	      re2c:api:sigil configuration.

       re2c:YYSETCOND:naked, re2c:define:YYSETCONDITION:naked
	      Overrides	the more generic re2c:api:style	configuration for  YY-
	      SETCOND. Zero value corresponds to free-form API style.

       re2c:YYSETSTATE,	re2c:define:YYSETSTATE
	      Defines API primitive YYSETSTATE.

       re2c:YYSETSTATE@state, re2c:define:YYSETSTATE@state
	      Specifies	the sigil used for argument substitution in YYSETSTATE
	      definition. The default value is @@.  Overrides the more generic
	      re2c:api:sigil configuration.

       re2c:YYSETSTATE:naked, re2c:define:YYSETSTATE:naked
	      Overrides	 the more generic re2c:api:style configuration for YY-
	      SETSTATE.	Zero value corresponds to free-form API	style.

       re2c:YYSETACCEPT, re2c:define:YYSETACCEPT
	      Defines API primitive YYSETACCEPT.

       re2c:YYSKIP, re2c:define:YYSKIP
	      Defines generic API primitive YYSKIP.

       re2c:YYSHIFT, re2c:define:YYSHIFT
	      Defines generic API primitive YYSHIFT.

       re2c:YYCOPYMTAG,	re2c:define:YYCOPYMTAG
	      Defines generic API primitive YYCOPYMTAG.

       re2c:YYCOPYSTAG,	re2c:define:YYCOPYSTAG
	      Defines generic API primitive YYCOPYSTAG.

       re2c:YYSHIFTMTAG, re2c:define:YYSHIFTMTAG
	      Defines generic API primitive YYSHIFTMTAG.

       re2c:YYSHIFTSTAG, re2c:define:YYSHIFTSTAG
	      Defines generic API primitive YYSHIFTSTAG.

       re2c:YYSTAGN, re2c:define:YYSTAGN
	      Defines generic API primitive YYSTAGN.

       re2c:YYSTAGP, re2c:define:YYSTAGP
	      Defines generic API primitive YYSTAGP.

       re2c:yyaccept, re2c:variable:yyaccept
	      Defines API primitive yyaccept.

       re2c:yybm, re2c:variable:yybm
	      Defines API primitive yybm.

       re2c:yybm:hex, re2c:variable:yybm:hex
	      If set to	nonzero, bitmaps for the --bit-vectors option are gen-
	      erated in	hexadecimal format. The	default	is zero	 (bitmaps  are
	      in decimal format).

       re2c:yych, re2c:variable:yych
	      Defines API primitive yych.

       re2c:yych:emit, re2c:variable:yych:emit
	      If  set  to zero,	yych definition	is not generated.  The default
	      is non-zero.

       re2c:yych:conversion, re2c:variable:yych:conversion
	      If set to non-zero, re2c automatically generates a conversion to
	      YYCTYPE every time yych is read. The default is zero (no
	      conversion).

       re2c:yych:literals, re2c:variable:yych:literals
	      Specifies	the form of literals that  yych	 is  matched  against.
	      Possible	values are: char (character literals in	single quotes,
	      non-printable ones use escape sequences that  start  with	 back-
	      slash), hex (hexadecimal integers) and char_or_hex (a mixture of
	      both,  character literals	for printable characters and hexadeci-
	      mal integers for others).

       re2c:yyctable, re2c:variable:yyctable
	      Defines API primitive yyctable.

       re2c:yynmatch, re2c:variable:yynmatch
	      Defines API primitive yynmatch.

       re2c:yypmatch, re2c:variable:yypmatch
	      Defines API primitive yypmatch.

       re2c:yytarget, re2c:variable:yytarget
	      Defines API primitive yytarget.

       re2c:yystable, re2c:variable:yystable
	      Deprecated.

       re2c:yystate, re2c:variable:yystate
	      Defines API primitive yystate.

       re2c:yyfill, re2c:variable:yyfill
	      Defines API primitive yyfill.

       re2c:yyfill:check
	      If set to	zero, suppresses the generation	 of  pre-YYFILL	 check
	      for the number of	input characters (the YYLESSTHAN definition in
	      generic  API and the YYLIMIT-based comparison in C pointer API).
	      The default is non-zero (generate	the check).

       re2c:yyfill:enable
	      If set to	zero, suppresses the generation	 of  YYFILL  (together
	      with  the	 check). This should be	used when the whole input fits
	      into one piece of	memory (there is no need  for  buffering)  and
	      the  end-of-input	 checks	do not rely on the YYFILL checks (e.g.
	      if a sentinel character is used).	 Use warnings (-W option)  and
	      re2c:sentinel  configuration  to verify that the generated lexer
	      cannot read past the end of input.  The default is non-zero (YY-
	      FILL is enabled).

       re2c:yyfill:parameter
	      If set to	zero, suppresses the generation	of parameter passed to
	      YYFILL.  The parameter is	the minimum number of characters  that
	      must be supplied.	 Defaults to non-zero (the parameter is	gener-
	      ated).   This  configuration  can	 be  overridden	 with re2c:YY-
	      FILL:naked or re2c:api:style.
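
       A sketch of how a few of the configurations above work together:
       re2c:eof selects the sentinel for the end of input rule $, and
       re2c:yyfill:enable = 0 is used because the whole input is already in
       memory. The function name count_numbers and its return convention are
       hypothetical. This is re2c input, not plain C.

```c
/* Counts space-separated decimal numbers in a NUL-terminated string of
   length len. Returns -1 if the string contains anything else. */
int count_numbers(const char *str, unsigned int len) {
    const char *YYCURSOR = str;
    const char *YYLIMIT = str + len;   /* points at the terminating NUL */
    int count = 0;

    for (;;) {
    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:yyfill:enable = 0;   // no buffer refilling
        re2c:eof = 0;             // NUL is the sentinel for the $ rule

        [0-9]+ { ++count; continue; }
        [ ]+   { continue; }
        $      { return count; }
        *      { return -1; }
    */
    }
}
```

       The lexer treats NUL as the end of input only when YYCURSOR has
       reached YYLIMIT. A NUL in the middle of the buffer falls through to
       the default rule.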

   Regular expressions
       re2c uses the following syntax for regular expressions:

       "foo"  Case-sensitive string literal.

       'foo'  Case-insensitive string literal.

       [a-xyz],	[^a-xyz]
	      Character	class (possibly	negated).

       .      Any character except newline.

       R \ S  Difference of character classes R	and S.

       R*     Zero or more occurrences of R.

       R+     One or more occurrences of R.

       R?     Optional R.

       R{n}   Repetition of R exactly n	times.

       R{n,}  Repetition of R at least n times.

       R{n,m} Repetition of R from n to	m times.

       (R)    Just R; parentheses are used to override precedence. If submatch
	      extraction is enabled, (R) is a  capturing  or  a	 non-capturing
	      group depending on --invert-captures option.

       (!R)   If  submatch extraction is enabled, (!R) is a non-capturing or a
	      capturing	group depending	on --invert-captures option.

       R S    Concatenation: R followed	by S.

       R | S  Alternative: R or	S.

       R / S  Lookahead: R followed by S, but S	is not consumed.

       name   Regular expression defined as name (or literal string "name"  in
	      Flex compatibility mode).

       {name} Regular expression defined as name in Flex compatibility mode.

       @stag  An  s-tag:  saves	the last input position	at which @stag matches
	      in a variable named stag.

       #mtag  An m-tag:	saves all input	positions at which #mtag matches in  a
	      variable named mtag.

       Character  classes and string literals may contain the following	escape
       sequences: \a, \b, \f, \n, \r, \t, \v, \\, octal	escapes	\ooo and hexa-
       decimal escapes \xhh, \uhhhh and	\Uhhhhhhhh.

   Actions
       Here is a list of predefined actions supported by re2c:

       !entry code
	      Entry action binds a user-defined	block of  code	to  the	 start
	      state  of	 the current finite state machine. If start conditions
	      are used,	the entry action can be	set individually for each con-
	      dition. This action may be used to perform initialization,  e.g.
	      to save start location of	a lexeme.

       !pre_rule code
	      Pre-rule	action prepends	a user-defined block of	code to	seman-
	      tic actions of all rules in the current block (or	condition,  if
	      start  conditions	 are  used). This action may be	used to	factor
	      out the common part of all semantic actions (e.g.	saving the end
	      location of a lexeme).

       !post_rule code
	      Post-rule	action appends a user-defined block of code to	seman-
	      tic  actions of all rules	in the current block (or condition, if
	      start conditions are used). This action may be used to emit trap
	      statements that guard against unintended control flow.

   Directives
       Here is a full list of directives supported by re2c:

       !use:name ;
	      An in-block use directive	that merges a previously defined rules
	      block with the specified name into the current block. Named def-
	      initions,	configurations and rules of the	referenced  block  are
	      added  to	 the current ones. Conflicts between overlapping rules
	      and configurations are resolved in the usual way:	the first rule
	      takes priority, and the latest configuration overrides the  pre-
	      ceding ones. One exception is the	special	rules *, $ and <!> for
	      which  a block-local definition always takes priority. A use di-
	      rective can be placed anywhere inside of a block,	 and  multiple
	      use directives are allowed.

       !include	file ;
	      This  directive  is  the	same as	include	block: it inserts file
	      contents verbatim	in place of the	directive.

   Program interface
       The generated code interfaces with the outer program with the  help  of
       primitives,  collectively  referred  to	as  the	API.  Which primitives
       should be defined for a particular program depends on multiple factors,
       including the complexity	of regular expressions,	input  representation,
       buffering and the use of	various	features. All the necessary primitives
       should  be  defined by the user in the form of macros, functions, vari-
       ables or	any other suitable form	that makes the generated code  syntac-
       tically	and semantically correct. re2c does not	(and cannot) check the
       definitions, so if anything is missing or defined incorrectly, the gen-
       erated program may have compile-time or run-time	errors.	  This	manual
       provides	examples of API	definitions in the most	common cases.

       re2c  has three API flavors that	define the core	set of primitives used
       by a program:

       Simple API
	      This is the default API for C/C++	backend. It consists of	primi-
	      tives YYCURSOR, YYMARKER,	YYCTXMARKER and	YYLIMIT, which	should
	      be defined as pointers of	type YYCTYPE*.

       Record API
	      (added  in version 4.0) Record API is useful in cases when lexer
	      state must be stored in a	struct.	  It  is  enabled  with	 --api
	      record  option or	re2c:api = record configuration. This API con-
	      sists of a variable yyrecord (the	name can  be  overridden  with
	      re2c:yyrecord)  that  should  be defined as a struct with	fields
	      yycursor,	yymarker, yyctxmarker, yylimit (only the  fields  used
	      by the generated code need to be defined,	and their names	can be
	      configured).

       Generic API
	      (added in	version	0.14) This is the most flexible	API. It	is en-
	      abled with --api generic option or re2c:api = generic configura-
	      tion.   This  API	 contains  primitives  for generic operations:
	      YYPEEK, YYSKIP, YYBACKUP,	YYBACKUPCTX,  YYSTAGP,	YYSTAGN,  YYM-
	      TAGP,  YYMTAGN,  YYRESTORE, YYRESTORECTX,	YYRESTORETAG, YYSHIFT,
	      YYSHIFTSTAG, YYSHIFTMTAG,	YYLESSTHAN.  Generic API supports  two
	      styles that determine the	form in	which the primitives should be
	      defined:

	      Free-form
		     Free-form	  style	   is	enabled	  with	 configuration
		     re2c:api:style =  free-form.   In	this  style  interface
		     primitives	 should	be defined as free-form	pieces of code
		     with interpolated variables of the	form  @@{var}  or  op-
		     tionally  just  @@	if there is a single variable. The set
		     of	variables is specific to each  primitive.  Here's  how
		     free-form generic API can be defined in terms of pointers
		     cursor, limit, marker and ctxmarker:

			/*!re2c
			  re2c:YYPEEK	    = "*cursor";
			  re2c:YYSKIP	    = "++cursor;";
			  re2c:YYBACKUP	    = "marker =	cursor;";
			  re2c:YYRESTORE    = "cursor =	marker;";
			  re2c:YYBACKUPCTX  = "ctxmarker = cursor;";
			  re2c:YYRESTORECTX = "cursor =	ctxmarker;";
			  re2c:YYRESTORETAG = "cursor =	@@{tag};";
			  re2c:YYLESSTHAN   = "limit - cursor <	@@{len}";
			  re2c:YYSTAGP	    = "@@{tag} = cursor;";
			  re2c:YYSTAGN	    = "@@{tag} = NULL;";
			  re2c:YYSHIFT	    = "cursor += @@{shift};";
			  re2c:YYSHIFTSTAG  = "@@{tag} += @@{shift};";
			*/

	      Function-like
		     Historically  function-like  style	is the default one. It
		     also can be enabled with configuration  re2c:api:style  =
		     functions.	 In this style primitives should be defined as
		     functions or macros with parentheses, accepting the  nec-
		     essary  arguments.	 Here's	 how function-like generic API
		     can be defined in terms of	pointers cursor, limit,	marker
		     and ctxmarker using preprocessor macros:

			#define	 YYPEEK()		  *cursor
			#define	 YYSKIP()		  ++cursor
			#define	 YYBACKUP()		  marker = cursor
			#define	 YYRESTORE()		  cursor = marker
			#define	 YYBACKUPCTX()		  ctxmarker = cursor
			#define	 YYRESTORECTX()		  cursor = ctxmarker
			#define	 YYRESTORETAG(tag)	  cursor = tag
			#define	 YYLESSTHAN(len)	  limit	- cursor < len
			#define	 YYSTAGP(tag)		  tag =	cursor
			#define	 YYSTAGN(tag)		  tag =	NULL
			#define	 YYSHIFT(shift)		  cursor += shift
			#define	 YYSHIFTSTAG(tag, shift)  tag += shift
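
       To see how such function-like primitives fit together, the sketch
       below defines three of them and drives them from a hand-written loop
       that stands in for generated code. The function name skip_digits is
       hypothetical, and the extra parentheses in the macro bodies are a
       defensive choice of this sketch rather than something re2c requires:

```c
#include <stddef.h>

/* Function-like primitives over local pointers cursor and limit. */
#define YYPEEK()        (*cursor)
#define YYSKIP()        (++cursor)
#define YYLESSTHAN(len) ((limit - cursor) < (ptrdiff_t)(len))

/* Hand-written loop standing in for generated code: returns the number
 * of decimal digits at the start of buf[0..n). */
size_t skip_digits(const char *buf, size_t n)
{
    const char *cursor = buf;
    const char *limit = buf + n;

    while (!YYLESSTHAN(1)) {          /* at least one character left */
        char yych = YYPEEK();
        if (yych < '0' || yych > '9') {
            break;
        }
        YYSKIP();
    }
    return (size_t)(cursor - buf);
}
```

       Because the loop checks YYLESSTHAN before every YYPEEK, the buffer
       does not need to be NUL-terminated.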

       Here is a full list of API primitives that may be used by the generated
       code in order to	interface with the outer program.

       YYCTYPE
	      The type of the  input  characters  (code	 units).   For	ASCII,
	      EBCDIC and UTF-8 encodings it should be 1-byte unsigned integer.
	      For  UTF-16  or  UCS-2 it	should be 2-byte unsigned integer. For
	      UTF-32 it	should be 4-byte unsigned integer.

       YYCURSOR
	      An l-value that stores the current input position	(a pointer  or
	      an  integer  offset in YYINPUT). Initially YYCURSOR should point
	      to the first input character, and	later it is  advanced  by  the
	      generated	 code.	When  a	rule matches, YYCURSOR position	is the
	      one after	the last matched character.

       YYLIMIT
	      An r-value that stores the end of	input position (a  pointer  or
	      an integer offset	in YYINPUT). Initially YYLIMIT should point to
	      the position after the last available input character. It	is not
	      changed  by  the	generated code.	The lexer compares YYCURSOR to
	      YYLIMIT in order to determine if there are enough	input  charac-
	      ters left.

       YYMARKER
	      An l-value that stores the position of the latest matched rule
	      (a pointer or an integer offset in YYINPUT). It is used to re-
	      store the YYCURSOR position if the longer match fails and the
	      lexer needs to roll back.  Initialization is not needed.
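
	      The role of the marker can be sketched with a hand-written
	      matcher for two hypothetical rules, "a" (rule 1) and "abc"
	      (rule 2). The function name and return codes are assumptions
	      of this sketch, and the input is assumed to be NUL-terminated.
	      After "a" matches, the position is saved; if the attempt to
	      extend the match to "abc" fails, the cursor is rolled back:

```c
#include <stddef.h>

/* Returns 2 for "abc", 1 for "a", 0 for no match; *len receives the
 * number of characters consumed by the winning rule. */
int lex_a_or_abc(const char *s, size_t *len)
{
    const char *cursor = s;
    const char *marker;

    if (*cursor != 'a') {
        *len = 0;
        return 0;
    }
    ++cursor;
    marker = cursor;              /* rule "a" matched: back up position */

    if (*cursor != 'b') goto accept_a;
    ++cursor;
    if (*cursor != 'c') goto rollback;
    ++cursor;
    *len = (size_t)(cursor - s);  /* the longer rule "abc" matched */
    return 2;

rollback:
    cursor = marker;              /* the longer match failed: restore */
accept_a:
    *len = (size_t)(cursor - s);
    return 1;
}
```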

       YYCTXMARKER
	      An l-value that stores the position of the trailing  context  (a
	      pointer  or  an integer offset in	YYINPUT). No initialization is
	      needed. YYCTXMARKER is needed only if the	lookahead  operator  /
	      is used.

       YYFILL A	 generic  API  primitive with one variable len.	 YYFILL	should
	      provide at least len more	input characters or fail.  If re2c:eof
	      is used, then len	is always 1 and	 YYFILL	should	always	return
	      to  the  calling	function; zero return value indicates success.
	      If re2c:eof is not used, then YYFILL return value	is ignored and
	      it should	not return on failure. The maximum value of len	is YY-
	      MAXFILL.
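
	      A minimal sketch of a refill routine for the re2c:eof case
	      (len is 1, zero means success) might look as follows. The
	      Input struct, the names input_init and input_fill, the tiny
	      BUFSIZE and the in-memory source are all assumptions made for
	      illustration; a real program would typically read from a file
	      and would also shift any other saved positions (marker, tags)
	      by the same amount:

```c
#include <stddef.h>
#include <string.h>

enum { BUFSIZE = 16 };      /* hypothetical buffer size, tiny on purpose */

typedef struct {
    const char *src;        /* in-memory stand-in for a file or socket */
    size_t src_left;        /* bytes of src not yet copied into buf */
    char buf[BUFSIZE + 1];  /* one extra byte for the sentinel */
    char *token;            /* start of the current lexeme */
    char *cursor;           /* current input position */
    char *limit;            /* end of buffered input */
    int eof;                /* set once the source is exhausted */
} Input;

static void input_init(Input *in, const char *src)
{
    in->src = src;
    in->src_left = strlen(src);
    /* Start with an empty buffer so that the first fill loads it. */
    in->token = in->cursor = in->limit = in->buf + BUFSIZE;
    *in->limit = '\0';
    in->eof = 0;
}

/* Returns 0 on success and non-zero on failure. */
static int input_fill(Input *in, size_t need)
{
    if (in->eof) return 1;                        /* nothing more to read */

    size_t shift = (size_t)(in->token - in->buf); /* free space at the front */
    size_t used = (size_t)(in->limit - in->token);
    if (shift < need) return 2;                   /* lexeme too long for buf */

    /* Discard everything before the current lexeme. */
    memmove(in->buf, in->token, used);
    in->token -= shift;
    in->cursor -= shift;
    in->limit -= shift;

    /* Append fresh bytes after the preserved ones. */
    size_t n = BUFSIZE - used;
    if (n > in->src_left) n = in->src_left;
    memcpy(in->limit, in->src, n);
    in->src += n;
    in->src_left -= n;
    in->limit += n;
    *in->limit = '\0';                            /* sentinel */
    in->eof = in->limit < in->buf + BUFSIZE;      /* short read: drained */
    return 0;
}
```

	      The routine keeps the bytes of the current lexeme, moves them
	      to the front of the buffer and appends new input after them,
	      so the lexer can continue from the shifted cursor.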

       YYFN   A	primitive that defines function	prototype in --recursive-func-
	      tions code model.	Its value should be an array of	 one  or  more
	      strings, where each string contains two or three components sep-
	      arated  by  the  string  specified  in re2c:fn:sep configuration
	      (typically a semicolon). The first array element	defines	 func-
	      tion  name  and return type (empty for a void function).	Subse-
	      quent elements define function arguments:	first, the  expression
	      for  the	argument  used in function body	(usually just a	name);
	      second, argument type; third, an optional	formal	parameter  (it
	      defaults	to the first component - usually both the argument and
	      the parameter are	the same identifier).

       YYINPUT
	      An r-value that stores  the  current  input  character  sequence
	      (string, buffer, etc.).

       YYMAXFILL
	      An  integral constant equal to the maximum value of the argument
	      to YYFILL.  It can be generated with a max block.

       YYLESSTHAN
	      A	generic	API primitive with one variable	len.  It should	be de-
	      fined as an r-value of boolean type that equals true if and only
	      if there are less	than len input characters left.

       YYPEEK A	generic	API primitive with no variables.  It should be defined
	      as an r-value of type YYCTYPE that is equal to the character  at
	      the current input	position.

       YYSKIP A	 generic  API  primitive that should advance the current input
	      position by one code unit.

       YYBACKUP
	      A	generic	API primitive that should save the current input posi-
	      tion (to be restored with	YYRESTORE later).

       YYRESTORE
	      A	generic	API primitive that should restore  the	current	 input
	      position to the value saved by YYBACKUP.

       YYBACKUPCTX
	      A	generic	API primitive that should save the current input posi-
	      tion  as	the  position  of the trailing context (to be restored
	      with YYRESTORECTX	later).

       YYRESTORECTX
	      A	generic	API primitive that should restore the trailing context
	      position saved with YYBACKUPCTX.

       YYRESTORETAG
	      A	generic	API primitive with one variable	tag  that  should  re-
	      store the	trailing context position to the value of tag.

       YYSTAGP
	      A	 generic API primitive with one	variable tag, where tag	can be
	      a	pointer	or an offset in	YYINPUT	(see submatch extraction  sec-
	      tion  for	 details). YYSTAGP should set tag to the current input
	      position.

       YYSTAGN
	      A generic API primitive with one variable tag, where tag can be
	      a pointer or an offset in YYINPUT (see submatch extraction sec-
	      tion for details). YYSTAGN should set tag to a value that
	      represents a non-existent input position.

       YYMTAGP
	      A	 generic  API primitive	with one variable tag.	YYMTAGP	should
	      append the current position to the submatch history of tag (see
	      the submatch extraction section for details).

       YYMTAGN
	      A	 generic  API primitive	with one variable tag.	YYMTAGN	should
	      append a value that represents a non-existent input position
	      to the submatch history of tag (see the submatch extraction
	      section for details).

       YYSHIFT
	      A	generic	API primitive with  one	 variable  shift  that	should
	      shift  the current input position	by shift characters (the shift
	      value may	be negative).

       YYCOPYSTAG
	      A	generic	API primitive with two variables,  lhs	and  rhs  that
	      should   copy   right-hand-side	s-tag	variable  rhs  to  the
	      left-hand-side s-tag variable lhs. For most languages this prim-
	      itive has a default definition that assigns rhs to lhs.

       YYCOPYMTAG
	      A	generic	API primitive with two variables,  lhs	and  rhs  that
	      should   copy   right-hand-side	m-tag	variable  rhs  to  the
	      left-hand-side m-tag variable lhs. For most languages this prim-
	      itive has a default definition that assigns rhs to lhs.

       YYSHIFTSTAG
	      A	generic	 API primitive with two	variables, tag and shift  that
	      should  shift  tag  by  shift code units (the shift value	may be
	      negative).

       YYSHIFTMTAG
	      A	generic	API primitive with two variables, tag and  shift  that
	      should  shift  the  latest  value	in the history of tag by shift
	      code units (the shift value may be negative).

       YYMAXNMATCH
	      An integral constant equal to the	maximal	number of  POSIX  cap-
	      turing groups in a rule. It is generated with a maxnmatch	block.

       YYCONDTYPE
	      The type of the condition	enum.  It can be generated either with
	      conditions block or --header option.

       YYGETACCEPT
	      A	 primitive  with one variable var that stores numeric selector
	      of the accepted rule. For	most languages this  primitive	has  a
	      default definition that reads from var.

       YYSETACCEPT
	      A primitive with two variables: var (an l-value that stores nu-
	      meric selector of the accepted rule), and val (the value of se-
	      lector). For most languages this primitive has a default defini-
	      tion that assigns val to var.

       YYGETCOND
	      An  r-value of type YYCONDTYPE that is equal to the current con-
	      dition identifier.

       YYSETCOND
	      A	primitive with one variable cond that should set  the  current
	      condition	identifier to cond.

       YYGETSTATE
	      An  r-value  of  integer type that is equal to the current lexer
	      state. It	should be initialized to -1.

       YYSETSTATE
	      A	primitive with one variable state that should set the  current
	      lexer state to state.

       YYDEBUG
	      This primitive is	generated only with -d,	--debug-output option.
	      Its purpose is to	add logging to the generated code (typical YY-
	      DEBUG  definition	 is a print statement).	YYDEBUG	statements are
	      generated	in every state and have	two variables: state (either a
	      DFA state	index or -1) and symbol	(the current input symbol).
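
	      A possible function-like YYDEBUG definition is sketched below.
	      The helper trace_state and the in-memory trace buffer are
	      hypothetical; they are used here instead of a plain print to
	      stderr only so that the recorded output can be inspected:

```c
#include <stdio.h>
#include <stddef.h>

static char trace[256];     /* accumulated log text */
static size_t trace_len;    /* number of bytes used in trace */

/* Append one "state/symbol" record to the trace buffer. */
static void trace_state(int state, unsigned char symbol)
{
    size_t room = sizeof trace - trace_len;
    int n = snprintf(trace + trace_len, room, "state=%d sym=0x%02x\n",
                     state, (unsigned)symbol);
    if (n > 0 && (size_t)n < room) {
        trace_len += (size_t)n;   /* keep the record only if it fit */
    }
}

#define YYDEBUG(state, symbol) trace_state((state), (unsigned char)(symbol))
```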

       yyaccept
	      An l-value of unsigned integral type that	stores the  number  of
	      the  latest matched rule.	User definition	is necessary only with
	      --storable-state option.

       yybm   A	table containing compressed bitmaps for	up  to	8  transitions
	      (used  with  the	--bitmaps option). The table contains 256 ele-
	      ments and	is indexed by 1-byte code units.  Each	8-bit  element
	      combines boolean values for up to 8 transitions. The k-th bit of
	      the n-th element is true iff the n-th code unit is in the range
	      of the k-th transition. The idea of this bitmap is to replace many if
	      branches or switch cases with one	check of a single bit  in  the
	      table.
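
	      The idea can be sketched with a hand-built table for two
	      hypothetical transitions: digits (bit 0) and lowercase ASCII
	      letters (bit 1). The names bm, bm_add_range, bm_test and
	      bm_init are assumptions of this sketch, and the bit order in
	      tables actually generated by re2c may differ:

```c
#include <stddef.h>

static unsigned char bm[256];   /* one 8-bit element per code unit */

/* Mark code units lo..hi as belonging to transition k (0 <= k < 8). */
static void bm_add_range(unsigned k, unsigned char lo, unsigned char hi)
{
    for (unsigned n = lo; n <= hi; ++n) {
        bm[n] |= (unsigned char)(1u << k);
    }
}

/* One table lookup and one bit test replace a chain of comparisons. */
static int bm_test(unsigned k, unsigned char unit)
{
    return (bm[unit] & (1u << k)) != 0;
}

static void bm_init(void)
{
    bm_add_range(0, '0', '9');   /* transition 0: decimal digits */
    bm_add_range(1, 'a', 'z');   /* transition 1: lowercase letters */
}
```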

       yych   An l-value of type YYCTYPE that stores the current input charac-
	      ter.  User definition is necessary only with -f --storable-state
	      option.

       yyctable
	      Jump table generated for the initial condition dispatch (enabled
	      with  the	 combination  of --conditions and --computed-gotos op-
	      tions).

       yyfill An l-value that stores the result	of YYFILL call	(this  may  be
	      necessary	 for  pure  functional	languages,  where  YYFILL is a
	      monadic function with complex return value).

       yynmatch
	      An l-value of unsigned integral type that	stores the  number  of
	      POSIX  capturing	groups in the matched rule.  Used only with -P
	      --posix-captures option.

       yypmatch
	      An array of l-values that	are used to hold the tag values	corre-
	      sponding to the capturing	parentheses in the matching rule.  Ar-
	      ray  length must be at least yynmatch * 2	(usually YYMAXNMATCH *
	      2	is a good choice).  Used only with -P --posix-captures option.

       yystable
	      Deprecated.

       yystate
	      An l-value used with the --loop-switch option to store the  cur-
	      rent DFA state.

       yytarget
	      Jump  table that contains	jump targets (label addresses) for all
	      transitions from a state.	This table is  local  to  each	state.
	      Generation  of  yytarget tables is enabled with --computed-gotos
	      option.

   Options
       Some of the  options  have  corresponding  configurations,  others  are
       global  and cannot be changed after re2c	starts reading the input file.
       Debug options generally require building	re2c in	 debug	configuration.
       Internal	 options are useful for	experimenting with the algorithms used
       in re2c.

       -? --help -h
	      Show help	message.

       --api <simple | record |	generic>
	      Specify the API used by the generated code to interface with
	      user-defined code. Option simple should be used in simple cases
	      when there's no need for buffer refilling and storing lexer
	      state. Option record should be used when lexer state needs to be
	      stored in a record (struct, class, etc.).  Option generic should
	      be used in complex cases when the other two APIs are not flexi-
	      ble enough.

       --bit-vectors -b
	      Optimize conditional jumps using bit masks.  This	option implies
	      --nested-ifs.

       --captures, --leftmost-captures
	      Enable submatch extraction with leftmost greedy capturing
	      groups. The result is collected into an array yypmatch of capac-
	      ity 2 * YYMAXNMATCH, and yynmatch is set to the number of groups
	      for the matching rule.

       --captvars, --leftmost-captvars
	      Enable   submatch	 extraction  with  leftmost  greedy  capturing
	      groups. The result is collected into variables yytl<k>,  yytr<k>
	      for k-th capturing group.

       --case-insensitive
	      Treat  single-quoted  and	double-quoted strings as case-insensi-
	      tive.

       --case-inverted
	      Invert the meaning of single-quoted and  double-quoted  strings:
	      treat  single-quoted strings as case-sensitive and double-quoted
	      strings as case-insensitive.

       --case-ranges
	      Collapse consecutive cases in a switch statement into a range
	      of the form low ... high.	This syntax is a C/C++ language	exten-
	      sion that	is supported by	compilers like GCC, Clang and Tcc. The
	      main advantage over using	single cases is	smaller	generated code
	      and faster generation time, although for some compilers like Tcc
	      it  also	results	 in  smaller binary size.  This	option is sup-
	      ported only for C.

       --computed-gotos	-g
	      Optimize conditional jumps using	non-standard  "computed	 goto"
	      extension	(which must be supported by the	compiler). re2c	gener-
	      ates jump	tables only in complex cases with a lot	of conditional
	      branches.	  Complexity   threshold   can	 be   configured  with
	      cgoto:threshold configuration. This  option  implies  --bit-vec-
	      tors. It is supported only for C.

       --conditions --start-conditions -c
	      Enable  support of Flex-like "conditions": multiple interrelated
	      lexers within one	block. This  is	 an  alternative  to  manually
	      specifying different re2c	blocks connected with goto or function
	      calls.

       --depfile FILE
	      Write  dependency	 information to	FILE in	the form of a Makefile
	      rule <output-file> : <input-file>	[include-file ...].  This  al-
	      lows  one	to track build dependencies in the presence of include
	      blocks/directives, so that updating include files	 triggers  re-
	      generation  of  the  output  file.   This	 option	depends	on the
	      --output option.

       --ebcdic	--ecb -e
	      Generate a lexer that reads input	in EBCDIC encoding.  re2c  as-
	      sumes  that  the character range is 0 -- 0xFF and	character size
	      is 1 byte.

       --empty-class <match-empty | match-none | error>
	      Define  the  way	re2c  treats  empty  character	classes.  With
	      match-empty (the default)	empty class matches empty input	(which
	      is  illogical,  but backwards-compatible). With match-none empty
	      class always fails to match.  With error empty  class  raises  a
	      compilation error.

       --encoding-policy <fail | substitute | ignore>
	      Define  the  way re2c treats Unicode surrogates.	With fail re2c
	      aborts with an error when	a surrogate is encountered.  With sub-
	      stitute re2c silently replaces surrogates	with  the  error  code
	      point  0xFFFD.  With ignore (the default)	re2c treats surrogates
	      as normal	code points. The Unicode standard says that standalone
	      surrogates are invalid, but real-world  libraries	 and  programs
	      behave in	different ways.

       --flex-syntax -F
	      Partial  support for Flex	syntax:	in this	mode named definitions
	      don't need the equal sign	and  the  terminating  semicolon,  and
	      when used	they must be surrounded	with curly braces. Names with-
	      out curly	braces are treated as double-quoted strings.

       --goto-label
	      Use  "goto/label"	code model: encode DFA in form of labeled code
	      blocks connected with goto transitions across  blocks.  This  is
	      only supported for languages that	have a goto statement.

       --header	--type-header -t HEADER
	      Generate	a  HEADER file.	The contents of	the file can be	speci-
	      fied using special blocks	header:on and  header:off.  If	condi-
	      tions  are used, the generated header will have a	condition enum
	      automatically appended to	it (unless there is an explicit	condi-
	      tions block).

       -I PATH
	      Add PATH to the list of locations	which are used when  searching
	      for include files. This option is	useful in combination with in-
	      clude  block  or directive. re2c looks for FILE in the directory
	      of the parent file and in	the include locations  specified  with
	      -I option.

       --input <default	| custom>
	      Deprecated alias for --api. Option default corresponds to	simple
	      (it  is  indeed the default for most backends, but not for all).
	      Option custom corresponds	to generic.

       --input-encoding	<ascii | utf8>
	      Specify the way re2c parses  regular  expressions.   With	 ascii
	      (the  default) re2c handles input	as ASCII-encoded: any sequence
	      of code units is a sequence  of  standalone  1-byte  characters.
	      With  utf8  re2c	handles	 input	as UTF8-encoded	and recognizes
	      multibyte	characters.

       --invert-captures
	      Invert the meaning of capturing and non-capturing	groups.	By de-
	      fault (...) is capturing and (! ...) is non-capturing. With this
	      option (!	...) is	capturing and (...) is non-capturing.

       --lang <none | c	| d | go | haskell | java | js | ocaml | python	| rust
       | v | zig>
	      Specify the target language. Supported languages are C, D, Go,
	      Haskell, Java, JS, OCaml, Python, Rust, V, Zig (more languages
	      can be added via user-defined syntax files, see the --syntax op-
	      tion). Option none disables default syntax configs, so that the
	      target language is undefined.

       --location-format <gnu |	msvc>
	      Specify  location	 format	 in  messages.	With gnu locations are
	      printed as 'filename:line:column:	...'.  With msvc locations are
	      printed as 'filename(line,column)	...'.  The default is gnu.

       --loop-switch
	      Use "loop/switch"	code model: encode DFA in form of a loop  over
	      a	 switch	 statement,  where individual states are switch	cases.
	      State is stored  in  a  variable	yystate.  Transitions  between
	      states update yystate to the case	label of the destination state
	      and continue execution to	the head of the	loop.
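
	      The shape of this code model can be sketched by hand for the
	      rule [1-9][0-9]* from the introductory example. The code below
	      is an illustration written for this page, not actual re2c
	      output; the function name lex_loop_switch is hypothetical and
	      the input is assumed to be NUL-terminated:

```c
/* Returns 0 if s starts with a decimal number, 1 otherwise. */
int lex_loop_switch(const char *s)
{
    const char *cursor = s;
    unsigned yystate = 0;

    for (;;) {
        char yych;
        switch (yystate) {
        case 0:                     /* initial state */
            yych = *cursor;
            if (yych >= '1' && yych <= '9') {
                ++cursor;
                yystate = 1;        /* transition: update state, loop */
                continue;
            }
            return 1;               /* default rule */
        case 1:                     /* inside the number */
            yych = *cursor;
            if (yych >= '0' && yych <= '9') {
                ++cursor;
                continue;           /* stay in state 1 */
            }
            return 0;               /* number rule matched */
        default:
            return -1;              /* unreachable in this sketch */
        }
    }
}
```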

       --nested-ifs -s
	      Use  nested if statements	instead	of switch statements in	condi-
	      tional jumps. This usually results in more efficient  code  with
	      non-optimizing compilers.

       --no-debug-info -i
	      Do  not output line directives. This may be useful when the gen-
	      erated code is stored in a version control system	(to avoid huge
	      autogenerated diffs on small changes).

       --no-generation-date
	      Suppress date output in the generated file.

       --no-version
	      Suppress version output in the generated file.

       --no-unsafe
	      Do not generate unsafe wrapper over YYPEEK (this option is  spe-
	      cific  to	 Rust).	 For  performance  reasons YYPEEK should avoid
	      bounds-checking, as  the	lexer  already	performs  end-of-input
	      checks  in a more	efficient way.	The user may choose to provide
	      a	safe YYPEEK definition,	or a definition	that is	unsafe only in
	      release builds, in which case the	--no-unsafe  option  helps  to
	      avoid warnings about redundant unsafe blocks.

       --output	-o OUTPUT
	      Specify the OUTPUT file.

       --posix-captures, -P
	      Enable submatch extraction with POSIX-style capturing groups.
	      The result is collected into an array yypmatch of capacity 2 *
	      YYMAXNMATCH, and yynmatch is set to the number of groups for the
	      matching rule.

       --posix-captvars
	      Enable  submatch	extraction  with POSIX-style capturing groups.
	      The result is collected into variables yytl<k>, yytr<k> for k-th
	      capturing	group.

       --recursive-functions
	      Use code model based on co-recursive functions, where  each  DFA
	      state is a separate function that	may call other state-functions
	      or itself.
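
	      For comparison, the same rule [1-9][0-9]* can be written by
	      hand in this style, with one function per state. This is only
	      a sketch of the idea: the function names are hypothetical, the
	      input is assumed to be NUL-terminated, and the prototypes that
	      re2c generates are controlled by YYFN rather than fixed as
	      shown here:

```c
static int state1(const char *cursor);

/* Initial state: expects a digit from 1 to 9. */
static int state0(const char *cursor)
{
    char yych = *cursor;
    if (yych >= '1' && yych <= '9') {
        return state1(cursor + 1);  /* transition is a function call */
    }
    return 1;                       /* default rule */
}

/* Inside the number: consumes any further digits. */
static int state1(const char *cursor)
{
    char yych = *cursor;
    if (yych >= '0' && yych <= '9') {
        return state1(cursor + 1);  /* a state may call itself */
    }
    return 0;                       /* number rule matched */
}

/* Returns 0 if s starts with a decimal number, 1 otherwise. */
int lex_recursive(const char *s)
{
    return state0(s);
}
```

	      In C the depth of such call chains grows with the input unless
	      the compiler turns the tail calls into jumps, so this model
	      tends to suit languages that guarantee tail-call elimination.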

       --reusable -r
	      Deprecated since version 2.2 (reusable blocks are	allowed	by de-
	      fault now).

       --skeleton -S
	      Ignore user-defined interface code and generate a	self-contained
	      "skeleton"  program.  Additionally,  generate  input  files with
	      strings derived from the regular grammar	and  compressed	 match
	      results  that  are used to verify	"skeleton" behavior on all in-
	      puts. This option	is useful for finding  bugs  in	 optimizations
	      and code generation. This	option is supported only for C.

       --storable-state	-f
	      Generate	a lexer	which can store	its inner state.  This is use-
	      ful in push-model	lexers which are stopped by an	outer  program
	      when there is not	enough input, and then resumed when more input
	      becomes available. In this mode users should additionally	define
	      YYGETSTATE  and YYSETSTATE primitives, and variables yych, yyac-
	      cept and state should be part of the stored lexer	state.

       --syntax	FILE
	      Load configurations from the specified FILE and  apply  them  on
	      top of the default syntax	file. Note that	FILE can define	only a
	      few  configurations  (if	it's  used to amend the	default	syntax
	      file), or	it can define a	whole new  language  backend  (in  the
	      latter case it is	recommended to use --lang none option).

       --tags -T
	      Enable submatch extraction with tags.

       --ucs2 --wide-chars -w
	      Generate	a  lexer  that	reads UCS2-encoded input. re2c assumes
	      that the character range is 0 -- 0xFFFF and character size is  2
	      bytes.  This option implies --nested-ifs.

       --utf8 --utf-8 -8
	      Generate	a  lexer  that reads input in UTF-8 encoding. re2c as-
	      sumes that the character range is	0 --  0x10FFFF	and  character
	      size is 1	byte.

       --utf16 --utf-16	-x
	      Generate	a  lexer  that reads UTF16-encoded input. re2c assumes
	      that the character range is 0 -- 0x10FFFF	and character size  is
	      2	bytes.	This option implies --nested-ifs.

       --utf32 --unicode -u
	      Generate	a  lexer  that reads UTF32-encoded input. re2c assumes
	      that the character range is 0 -- 0x10FFFF	and character size  is
	      4	bytes.	This option implies --nested-ifs.

       --verbose
	      Output a short message in	case of	success.

       --vernum	-V
	      Show version information in MMmmpp format	(major,	minor, patch).

       --version -v
	      Show version information.

       --single-pass -1
	      Deprecated. Does nothing (single pass is the default now).

       --debug-output -d
	      Emit  YYDEBUG  invocations in the	generated code.	This is	useful
	      to trace lexer execution.

       --dump-adfa
	      Debug option: output DFA after tunneling (in .dot	format).

       --dump-cfg
	      Debug option: output control flow	graph  of  tag	variables  (in
	      .dot format).

       --dump-closure-stats
	      Debug  option: output statistics on the number of	states in clo-
	      sure.

       --dump-dfa-det
	      Debug option: output DFA immediately after  determinization  (in
	      .dot format).

       --dump-dfa-min
	      Debug option: output DFA after minimization (in .dot format).

       --dump-dfa-tagopt
	      Debug  option:  output DFA after tag optimizations (in .dot for-
	      mat).

       --dump-dfa-tree
	      Debug option: output DFA under construction with	states	repre-
	      sented as	tag history trees (in .dot format).

       --dump-dfa-raw
	      Debug  option:  output  DFA  under  construction	with  expanded
	      state-sets (in .dot format).

       --dump-interf
	      Debug option: output interference	 table	produced  by  liveness
	      analysis of tag variables.

       --dump-nfa
	      Debug option: output NFA (in .dot	format).

       --emit-dot -D
	      Instead  of  normal  output generate lexer graph in .dot format.
	      The output can be	 converted  to	an  image  with	 the  help  of
	      Graphviz (e.g. something like dot	-Tpng -odfa.png	dfa.dot).

       --dfa-minimization <moore | table>
	      Internal	option:	 DFA  minimization algorithm used by re2c. The
	      moore option is the Moore	algorithm (it is the default). The ta-
	      ble option is the	"table	filling"  algorithm.  Both  algorithms
	      should produce the same DFA up to	states relabeling; table fill-
	      ing  is simpler and much slower and serves as a reference	imple-
	      mentation.

       --eager-skip
	      Internal option: make the	generated lexer	advance	the input  po-
	      sition  eagerly  --  immediately after reading the input symbol.
	      This changes the default behavior	when the input position	is ad-
	      vanced lazily -- after transition	to the next state.

       --no-lookahead
	      Internal option, deprecated.  It used to	enable	TDFA(0)	 algo-
	      rithm. Unlike TDFA(1), TDFA(0) algorithm does not	use one-symbol
	      lookahead.  It applies register operations to the	incoming tran-
	      sitions rather than the outgoing ones.  Benchmarks  showed  that
	      TDFA(0) algorithm	is less	efficient than TDFA(1).

       --no-optimize-tags
	      Internal	option:	suppress optimization of tag variables (useful
	      for debugging).

       --posix-closure <gor1 | gtop>
	      Internal option: specify shortest-path algorithm	used  for  the
	      construction of epsilon-closure with POSIX disambiguation	seman-
	      tics:  gor1  (the	default) stands	for Goldberg-Radzik algorithm,
	      and gtop stands for "global topological order" algorithm.

       --posix-prectable <complex | naive>
	      Internal option: specify the algorithm  used  to	compute	 POSIX
	      precedence  table. The complex algorithm computes	precedence ta-
	      ble in one traversal of tag history tree and has quadratic  com-
	      plexity  in  the	number	of TNFA	states;	it is the default. The
	      naive algorithm has worst-case cubic complexity in the number of
	      TNFA states, but it is much simpler  than	 complex  and  may  be
	      slightly faster in non-pathological cases.

       --stadfa
	      Internal	option,	 deprecated.   It  used	to enable staDFA algo-
	      rithm, which differs from	TDFA in	that register  operations  are
	      placed  in  states rather	than on	transitions. Benchmarks	showed
	      that staDFA algorithm is less efficient than TDFA.

       --fixed-tags <none | toplevel | all>
	      Internal option:	specify	 whether  the  fixed-tag  optimization
	      should  be  applied  to  all tags	(all), none of them (none), or
	      only those in toplevel concatenation (toplevel). The default  is
	      all.   "Fixed"  tags  are	 those that are	located	within a fixed
	      distance to some other tag (called "base"). In such  cases  only
	      the base tag needs to be tracked,	and the	value of the fixed tag
	      can  be computed as the value of the base	tag plus a static off-
	      set. For tags that are under alternative	or  repetition	it  is
	      also necessary to	check if the base tag has a no-match value (in
	      that case	fixed tag should also be set to	no-match, disregarding
	      the  offset).  For  tags in top-level concatenation the check is
	      not needed, because they always match.
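
              The difference between the two cases can be sketched in plain C.
              This is only an illustration of the arithmetic described above,
              not code that re2c generates: the names NO_MATCH, fixed_toplevel
              and fixed_nested are hypothetical, and the no-match value is
              assumed to be a null pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "no-match" marker for this sketch (assumed to be NULL). */
static const char *const NO_MATCH = NULL;

/* Fixed tag in top-level concatenation: the base tag always matches,
 * so the value is simply the base tag plus a static offset. */
static const char *fixed_toplevel(const char *base, ptrdiff_t offset) {
    assert(base != NO_MATCH);
    return base + offset;
}

/* Fixed tag under alternative or repetition: the base tag may be unset,
 * in which case the fixed tag is unset too and the offset is ignored. */
static const char *fixed_nested(const char *base, ptrdiff_t offset) {
    return base == NO_MATCH ? NO_MATCH : base + offset;
}
```

              With a base tag pointing at the end of "2024-01" and an offset
              of -3, both helpers yield the position just before "-01"; only
              fixed_nested accepts an unset base tag.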

   Warnings
       Warnings can be individually enabled, disabled and turned into errors.

       -W     Turn on all warnings.

       -Werror
	      Turn  warnings  into errors. Note	that this option alone doesn't
	      turn on any warnings; it only affects those warnings  that  have
	      been turned on so	far or will be turned on later.

       -W<warning>
	      Turn on warning.

       -Wno-<warning>
	      Turn off warning.

       -Werror-<warning>
	      Turn  on warning and treat it as an error	(this implies -W<warn-
	      ing>).

       -Wno-error-<warning>
	      Don't treat this particular warning as an	 error.	 This  doesn't
	      turn off the warning itself.

       -Wcondition-order
              Warn if the generated program makes implicit assumptions about
              condition numbering. One should use either the --header option
              or a conditions block to generate a mapping of condition names
              to numbers and then use the autogenerated condition names.

       -Wempty-character-class
	      Warn if a	regular	expression contains an empty character	class.
	      Trying  to  match	 an  empty  character class makes no sense: it
	      should always fail.  However, for	backwards  compatibility  rea-
	      sons  re2c  permits  empty  character classes and	treats them as
	      empty strings. Use the --empty-class option to  change  the  de-
	      fault behavior.

       -Wmatch-empty-string
	      Warn  if	a  rule	is nullable (matches an	empty string).	If the
	      lexer runs in a loop and the empty match is  unintentional,  the
	      lexer may	unexpectedly hang in an	infinite loop.

       -Wswapped-range
	      Warn  if	the  lower  bound of a range is	greater	than its upper
	      bound. The default  behavior  is	to  silently  swap  the	 range
	      bounds.

       -Wundefined-control-flow
	      Warn  if	some input strings cause undefined control flow	in the
	      lexer (the faulty	patterns are reported).	This  is  a  dangerous
	      and common mistake. It can be easily fixed by adding the default
	      rule * which has the lowest priority, matches any	code unit, and
	      always consumes a	single code unit.

       -Wunreachable-rules
	      Warn about rules that are	shadowed by other rules	and will never
	      match.

       -Wuseless-escape
	      Warn  if	a symbol is escaped when it shouldn't be.  By default,
	      re2c silently ignores such escapes, but this may as  well	 indi-
	      cate a typo or an	error in the escape sequence.

       -Wnondeterministic-tags
	      Warn  if	a  tag	has  n-th degree of nondeterminism, where n is
	      greater than 1.

       -Wsentinel-in-midrule
              Warn if the sentinel symbol occurs in the middle of a rule; this
              may cause reads past the end of the buffer, crashes or memory
	      corruption in the	generated lexer. This warning is only applica-
	      ble if the sentinel method of checking for the end of  input  is
	      used.   It  is set to an error if	re2c:sentinel configuration is
	      used.

       -Wundefined-syntax-config
	      Warn if the syntax file specified	with --syntax option is	 miss-
	      ing  definitions	of some	configurations.	This helps to maintain
	      user-defined syntax files: if a new release adds configurations,
	      old syntax file will raise a warning, and	the user will be noti-
	      fied. If some configurations are unused and do not need a	defin-
	      ition, they should be explicitly set to <undefined>.
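
       The hang that -Wmatch-empty-string warns about can be illustrated
       without re2c. In the sketch below lex_once is a hypothetical stand-in
       for a generated lexer with a nullable rule (something like [0-9]*), and
       count_numbers is a hypothetical driver loop. The point is the guard for
       a zero-length match: without it the loop never advances.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lexer with a nullable rule: returns the
 * length of the digit run at s, which is zero if s starts with a
 * non-digit character. */
static size_t lex_once(const char *s) {
    size_t n = 0;
    assert(s != NULL);
    while (s[n] >= '0' && s[n] <= '9') ++n;
    return n;
}

/* Driver loop: counts non-empty matches. Without the n == 0 branch the
 * loop would spin forever on the first non-digit character. */
static int count_numbers(const char *s) {
    int count = 0;
    while (*s != '\0') {
        size_t n = lex_once(s);
        if (n == 0) {
            ++s;        /* empty match: force progress by one character */
        } else {
            ++count;
            s += n;
        }
    }
    return count;
}
```

       In a real lexer the usual fix is to make the rule non-nullable (for
       example [0-9]+) rather than to patch the driver loop.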

   Syntax files
       Support for different languages in re2c is based	on the idea of	syntax
       files.	A  syntax  file	is a configuration file	that defines syntax of
       the target language -- not the whole language, but a small part	of  it
       that  is	used by	the generated code. Syntax files make re2c very	flexi-
       ble, but	they should not	be used	as a replacement for re2c:  configura-
       tions: their purpose is to define syntax	of the target language,	not to
       customize  one  particular  lexer. All supported	languages have default
       syntax files that are part of the distribution (see include/syntax
       subdirectory); they are also embedded in the re2c binary.  Users may
       provide a custom syntax file that overrides a few configurations for one
       of the supported languages, or they may choose to redefine all
       configurations (in that case the --lang none option should be used).
       Syntax files
       contain configurations of four different	kinds: feature lists, language
       configurations, inplace configurations and code templates.

       Feature lists
	  A few	list configurations define various  features  supported	 by  a
	  given	backend, so that re2c may give a clear error if	the user tries
	  to enable an unsupported feature:

	  supported_apis
		 A  list  of  supported	 APIs  with  possible elements simple,
		 record, generic.

	  supported_api_styles
		 A list	of supported API styles	with possible  elements	 func-
		 tions,	free-form.

	  supported_code_models
		 A  list  of  supported	 code  models  with  possible elements
		 goto-label, loop-switch, recursive-functions.

	  supported_targets
		 A list	of supported codegen targets  with  possible  elements
		 code, dot, skeleton.

	  supported_features
		 A   list   of	 supported  features  with  possible  elements
		 nested-ifs, bitmaps,  computed-gotos,	case-ranges,  monadic,
		 unsafe, tags, captures, captvars.

       Language	configurations
	  A  few  boolean  configurations describe features of the target lan-
	  guage	that affect re2c parser	and code generator:

	  semicolons
		 Non-zero if the language uses semicolons after	statements.

	  backtick_quoted_strings
		 Non-zero if the language has backtick-quoted strings.

	  single_quoted_strings
		 Non-zero if the language has single-quoted strings.

	  indentation_sensitive
		 Non-zero if the language is indentation sensitive.

	  wrap_blocks_in_braces
		 Non-zero if compound statements  must	be  wrapped  in	 curly
		 braces.

       Inplace configurations
	  Syntax  files	 define	initial	values of all re2c: configurations, as
	  they may differ for different	languages. See configurations  section
	  for a	full list of all inplace configurations	and their meaning.

       Code templates
	  Code	templates define syntax	of the target language.	They are writ-
	  ten in a simple domain-specific language with	the  following	formal
	  grammar:

	      code-template ::
		    name '=' code-exprs	';'
		  | CODE_TEMPLATE ';'
		  | '<undefined>' ';'

	      code-exprs ::
		    <EMPTY>
		  | code-exprs code-expr

	      code-expr	::
		    STRING
		  | VARIABLE
		  | optional
		  | list

	      optional ::
		    '('	CONDITIONAL '?'	code-exprs ')'
		  | '('	CONDITIONAL '?'	code-exprs ':' code-exprs ')'

	      list ::
		    '['	VARIABLE ':' code-exprs	']'
		  | '['	VARIABLE '{' NUMBER '}'	':' code-exprs ']'
		  | '['	VARIABLE '{' NUMBER ','	NUMBER '}' ':' code-exprs ']'

	  A  code  template  is	 a sequence of string literals,	variables, op-
	  tional elements and lists, or	a reference to another code  template,
	  or  a	special	value <undefined>. Variables are placeholders that are
          substituted during the code generation phase. List variables are
          special: when expanding list templates, re2c repeats the expressions
          on the right-hand side of the colon a few times, each time replacing
          occurrences of the list variable with a value specific to this
          repetition. Lists have optional bounds (negative values are counted from
	  the end, e.g.	-1 means the last element).  Conditional  names	 start
	  with	a  dot.	  Both	conditionals and variables may be either local
	  (specific to the given code template)	or global (allowed in all code
	  templates). When re2c	reads syntax file, it checks  that  each  code
	  template  uses  only the variables and conditionals that are allowed
	  in it.

	  For example, the following code template defines  if-then-else  con-
	  struct for a C-like language:

	      code:if_then_else	=
		  [branch{0}: topindent	"if " cond " {"	nl
		      indent [stmt: stmt] dedent]
		  [branch{1:-1}: topindent "} else" (.cond ? " if " cond) " {" nl
		      indent [stmt: stmt] dedent]
		  topindent "}"	nl;

	  Here	branch	is  a  list  variable:	branch{0} expands to the first
	  branch (which	is special, as there is	no  else  part),  branch{1:-1}
	  expands  to  all  remaining  branches	 (if any). stmt	is also	a list
	  variable: [stmt: stmt] is a nested list that expands to  a  list  of
	  statements in	the body of the	current	branch.	topindent, indent, de-
	  dent	and  nl	are global variables, and .cond	is a local conditional
	  (their meaning is described below). This code	template could produce
	  the following	code:

	      if x {
		  // do	something
	      }	else if	y {
		  // do	something else
	      }	else {
		  // don't do anything
	      }
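
          To make the expansion rules concrete, here is a rough C simulation
          of how the if-then-else template above could be expanded. It is a
          sketch, not re2c's implementation: the struct and function names are
          hypothetical, statements are plain strings, and one indentation
          level is assumed to be four spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct branch {
    const char *cond;      /* NULL for a final "else" (.cond is false) */
    const char *stmts[4];  /* NULL-terminated list of statements */
};

/* Append s to out (capacity cap), truncating silently on overflow. */
static void emit(char *out, size_t cap, const char *s) {
    size_t len = strlen(out);
    if (len + 1 < cap) snprintf(out + len, cap - len, "%s", s);
}

/* topindent: indentation string for the current level. */
static void topindent(char *out, size_t cap, int level) {
    while (level-- > 0) emit(out, cap, "    ");
}

/* [stmt: stmt]: repeat the list body once per statement. */
static void expand_stmts(char *out, size_t cap, const struct branch *b,
                         int level) {
    for (int i = 0; b->stmts[i] != NULL; ++i) {
        topindent(out, cap, level);
        emit(out, cap, b->stmts[i]);
        emit(out, cap, "\n");
    }
}

/* Expand the template for n >= 1 branches into out. */
static void expand_if_then_else(char *out, size_t cap,
                                const struct branch *br, int n) {
    int level = 0;
    assert(cap > 0 && n >= 1);
    out[0] = '\0';
    for (int i = 0; i < n; ++i) {
        topindent(out, cap, level);
        if (i == 0) {                    /* branch{0}: the first branch */
            emit(out, cap, "if ");
            emit(out, cap, br[i].cond);
        } else {                         /* branch{1:-1}: all the others */
            emit(out, cap, "} else");
            if (br[i].cond != NULL) {    /* (.cond ? " if " cond) */
                emit(out, cap, " if ");
                emit(out, cap, br[i].cond);
            }
        }
        emit(out, cap, " {\n");
        ++level;                         /* indent */
        expand_stmts(out, cap, &br[i], level);
        --level;                         /* dedent */
    }
    topindent(out, cap, level);
    emit(out, cap, "}\n");
}
```

          Called with three branches (conditions x, y and none), this sketch
          reproduces the if / else if / else chain shown above.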

	  Here's a list	of all code templates supported	by re2c	with their lo-
	  cal variables	and conditionals. Note that  a	particular  definition
	  may, but does	not have to use	local variables	and conditionals.  Any
	  unused code templates	should be set to <undefined>.

	  code:var_local
		 Declaration  or  definition  of  a  local variable. Supported
		 variables: type (the type of the variable), name  (its	 name)
		 and  init  (initial value, if any). Conditionals: .init (true
		 if there is an	initializer).

	  code:var_global
		 Same as code:var_local, except	that it's used in top-level.

	  code:const_local
		 Definition of a local	constant.  Supported  variables:  type
		 (the type of the constant), name (its name) and init (initial
		 value).

	  code:const_global
		 Same as code:const_local, except that it's used in top-level.

	  code:array_local
		 Definition  of	 a  local  array (table). Supported variables:
		 type (the type	of array elements), name  (array  name),  size
		 (its size), row (a list variable that does not	itself produce
		 any  code, but	expands	list expression	as many	times as there
		 are rows in the table)	and elem (a list variable that expands
		 to all	table elements in the current row -- it's meant	to  be
		 nested	in the row list).

	  code:array_global
		 Same as code:array_local, except that it's used in top-level.

	  code:array_elem
		 Reference  to an element of an	array (table). Supported vari-
		 ables:	array (the name	of the array) and index	(index of  the
		 element).

	  code:enum
		 Definition  of	an enumeration (it may be defined using	a spe-
		 cial language construct for enumerations, or simply as	a  few
		 standalone   constants).    Supported	 variables   are  type
		 (user-defined enumeration type	or  type  of  the  constants),
		 elem  (list variable that expands to the name of each member)
		 and init (initializer for each	member).  Conditionals:	 .init
		 (true if there	is an initializer).

	  code:enum_elem
		 Enumeration  element  (a member of a user-defined enumeration
		 type or a name	of a constant, depending on how	 code:enum  is
		 defined).  Supported variables	are name (the name of the ele-
		 ment) and type	(its type).

	  code:assign
		 Assignment  statement.	Supported variables are	lhs (left hand
		 side) and rhs (right hand side).

	  code:type_int
		 Signed	integer	type.

	  code:type_uint
		 Unsigned integer type.

	  code:type_yybm
		 Type of elements in the yybm table.

	  code:type_yytarget
		 Type of elements in the yytarget table.

	  code:cmp_eq
		 Operator "equals".

	  code:cmp_ne
		 Operator "not equals".

	  code:cmp_lt
		 Operator "less	than".

          code:cmp_gt
                 Operator "greater than".

          code:cmp_le
                 Operator "less or equal".

          code:cmp_ge
                 Operator "greater or equal".

	  code:if_then_else
		 If-then-else statement	with one or more  branches.  Supported
		 variables:  branch (a list variable that does not itself pro-
		 duce any code,	but expands list expression as many  times  as
		 there	are  branches),	cond (condition	of the current branch)
		 and stmt (a list variable that	expands	to all	statements  in
		 the current branch). Conditionals: .cond (true	if the current
		 branch	has a condition), .many	(true if there's more than one
		 branch).

	  code:if_then_else_oneline
		 A  specialization  of code:if_then_else for the case when all
		 branches have one-line	statements. If	this  is  <undefined>,
		 code:if_then_else is used instead.

	  code:switch
		 A  switch  statement  with one	or more	cases. Supported vari-
		 ables:	expr (the switched-on expression)  and	case  (a  list
		 variable  that	 expands  to  all cases-groups with their code
		 blocks).

	  code:switch_cases
		 A group of switch cases that maps to  a  single  code	block.
                 Supported variables are case (a list variable that expands to
                 all cases in this group) and stmt (a list variable that
                 expands to all statements in the code block).

	  code:switch_cases_oneline
		 A specialization of code:switch_cases for the case  when  the
		 code  block  consists of a single one-line statement. If this
		 is <undefined>, code:switch_cases is used instead.

	  code:switch_case_range
		 A single switch case that covers a range of values  (possibly
		 consisting  of	 a  single  value). Supported variable:	val (a
		 list variable that expands to all values in the range).  Sup-
		 ported	 conditionals:	.many  (true  if there's more than one
		 value in the range) and .char_literals	(true  if  this	 is  a
		 switch	 on  character literals	-- some	languages provide spe-
		 cial syntax for this case).

	  code:switch_case_default
		 Default switch	case.

	  code:loop
		 A loop	that runs forever (unless interrupted  from  the  loop
		 body).	 Supported variables: label (loop label), stmt (a list
		 variable that expands to all statements in the	loop body).

	  code:continue
		 Continue  statement.  Supported  variables: label (label from
		 which to continue execution).

	  code:goto
		 Goto statement. Supported variables: label (label of the jump
		 target).

	  code:fndecl
		 Function declaration.	Supported  variables:  name  (function
		 name),	type (return type), arg	(a list	variable that does not
		 itself	 produce  code,	 but  expands  list expression as many
		 times as there	are function arguments), argname (name of  the
		 current  argument),  argtype  (type of	the current argument).
		 Conditional: .type (true if this is a non-void	function).

	  code:fndef
		 Like code:fndecl, but used for	function  definitions,	so  it
		 has  one  additional  list  variable stmt that	expands	to all
		 statements in the function body.

	  code:fncall
		 Function call statement. Supported variables: name  (function
		 name),	 retval	 (l-value where	the return value is stored, if
		 any) and arg (a list variable that expands  to	 all  function
		 arguments).   Conditionals:  .args  (true if the function has
		 arguments) and	.retval	(true if  return  value	 needs	to  be
		 saved).

	  code:tailcall
		 Tail  call  statement.	 Supported  variables:	name (function
		 name),	and arg	(a list	variable that expands to all  function
		 arguments).   Conditionals:  .args  (true if the function has
		 arguments) and	.retval	(true if this is a non-void function).

	  code:recursive_functions
		 Program body with --recursive-functions code model. Supported
		 variables: fn (a list variable	that does not  itself  produce
		 any  code, but	expands	list expression	as many	times as there
		 are functions), fndecl	(declaration of	the current  function)
		 and fndef (definition of the current function).

	  code:fingerprint
		 The fingerprint at the	top of the generated output file. Sup-
		 ported	variables: ver (re2c version that was used to generate
		 this) and date	(generation date).

	  code:line_info
		 The format of line directives (if this	is set to <undefined>,
		 no directives are generated). Supported variables: line (line
		 number) and file (filename).

	  code:abort
		 A statement that aborts program execution.

	  code:yydebug
		 YYDEBUG  statement,  possibly specialized for different APIs.
		 Supported variables: YYDEBUG, yyrecord, yych (map to the cor-
		 responding re2c: configurations), state (DFA state number).

	  code:yypeek
		 YYPEEK	statement, possibly specialized	 for  different	 APIs.
		 Supported  variables:	YYPEEK,	 YYCTYPE,  YYINPUT,  YYCURSOR,
		 yyrecord, yych	(map to	 the  corresponding  re2c:  configura-
		 tions).  Conditionals:	.cast (true if re2c:yych:conversion is
		 set to	non-zero).

	  code:yyskip
		 YYSKIP	statement, possibly specialized	 for  different	 APIs.
		 Supported  variables:	YYSKIP,	YYCURSOR, yyrecord (map	to the
		 corresponding re2c: configurations).

	  code:yybackup
		 YYBACKUP statement, possibly specialized for different	 APIs.
		 Supported  variables:	YYBACKUP, YYCURSOR, YYMARKER, yyrecord
		 (map to the corresponding re2c: configurations).

	  code:yybackupctx
		 YYBACKUPCTX statement,	 possibly  specialized	for  different
		 APIs.	 Supported  variables:	YYBACKUPCTX,  YYCURSOR,	YYCTX-
		 MARKER, yyrecord (map to the corresponding  re2c:  configura-
		 tions).

	  code:yyskip_yypeek
		 Combined  code:yyskip	and code:yypeek	statement (defaults to
		 code:yyskip followed by code:yypeek).

	  code:yypeek_yyskip
		 Combined code:yypeek and code:yyskip statement	 (defaults  to
		 code:yypeek followed by code:yyskip).

	  code:yyskip_yybackup
		 Combined code:yyskip and code:yybackup	statement (defaults to
		 code:yyskip followed by code:yybackup).

	  code:yybackup_yyskip
		 Combined code:yybackup	and code:yyskip	statement (defaults to
		 code:yybackup followed	by code:yyskip).

	  code:yybackup_yypeek
		 Combined code:yybackup	and code:yypeek	statement (defaults to
		 code:yybackup followed	by code:yypeek).

	  code:yyskip_yybackup_yypeek
		 Combined code:yyskip, code:yybackup and code:yypeek statement
                 (defaults to code:yyskip followed by code:yybackup followed
                 by code:yypeek).

	  code:yybackup_yypeek_yyskip
		 Combined code:yybackup, code:yypeek and code:yyskip statement
                 (defaults to code:yybackup followed by code:yypeek followed
                 by code:yyskip).

	  code:yyrestore
		 YYRESTORE statement, possibly specialized for different APIs.
		 Supported  variables: YYRESTORE, YYCURSOR, YYMARKER, yyrecord
		 (map to the corresponding re2c: configurations).

	  code:yyrestorectx
		 YYRESTORECTX statement, possibly  specialized	for  different
		 APIs.	 Supported  variables:	YYRESTORECTX, YYCURSOR,	YYCTX-
		 MARKER, yyrecord (map to the corresponding  re2c:  configura-
		 tions).

	  code:yyrestoretag
		 YYRESTORETAG  statement,  possibly  specialized for different
		 APIs.	Supported variables: YYRESTORETAG, YYCURSOR,  yyrecord
		 (map  to  the	corresponding  re2c: configurations), tag (the
		 name of tag variable used to restore position).

	  code:yyshift
		 YYSHIFT statement, possibly specialized for  different	 APIs.
		 Supported  variables: YYSHIFT,	YYCURSOR, yyrecord (map	to the
		 corresponding re2c: configurations), offset  (the  number  of
		 code units to shift the current position).

	  code:yyshiftstag
		 YYSHIFTSTAG  statement,  possibly  specialized	 for different
		 APIs.	Supported variables: YYSHIFTSTAG,  yyrecord,  negative
		 (map  to  the	corresponding  re2c: configurations), tag (tag
		 variable which	needs to be shifted), offset  (the  number  of
                 code units to shift). Conditionals: .nested (true if this is
                 a nested tag -- in this case its value may be equal to
                 re2c:tags:negative, which should not be shifted).

	  code:yyshiftmtag
		 YYSHIFTMTAG  statement,  possibly  specialized	 for different
		 APIs.	Supported variables: YYSHIFTMTAG (maps to  the	corre-
		 sponding  re2c: configuration), tag (tag variable which needs
		 to be shifted), offset	(the number of code units to shift).

	  code:yystagp
		 YYSTAGP statement, possibly specialized for  different	 APIs.
		 Supported  variables: YYSTAGP,	YYCURSOR, yyrecord (map	to the
		 corresponding re2c: configurations), tag (tag	variable  that
		 should	be updated).

	  code:yymtagp
		 YYMTAGP  statement,  possibly specialized for different APIs.
		 Supported variables: YYMTAGP (maps to the corresponding re2c:
		 configuration), tag (tag variable that	should be updated).

	  code:yystagn
		 YYSTAGN statement, possibly specialized for  different	 APIs.
		 Supported  variables: YYSTAGN,	negative, yyrecord (map	to the
		 corresponding re2c: configurations), tag (tag	variable  that
		 should	be updated).

	  code:yymtagn
		 YYMTAGN  statement,  possibly specialized for different APIs.
		 Supported variables: YYMTAGN (maps to the corresponding re2c:
		 configuration), tag (tag variable that	should be updated).

	  code:yycopystag
		 YYCOPYSTAG  statement,	 possibly  specialized	for  different
		 APIs.	 Supported variables: YYCOPYSTAG, yyrecord (map	to the
		 corresponding re2c: configurations), lhs, rhs (left and right
		 hand side tag variables of the	copy operation).

	  code:yycopymtag
		 YYCOPYMTAG  statement,	 possibly  specialized	for  different
		 APIs.	 Supported variables: YYCOPYMTAG, yyrecord (map	to the
		 corresponding re2c: configurations), lhs, rhs (left and right
		 hand side tag variables of the	copy operation).

	  code:yygetaccept
		 YYGETACCEPT statement,	 possibly  specialized	for  different
		 APIs.	Supported variables: YYGETACCEPT, yyrecord (map	to the
		 corresponding	re2c: configurations), var (maps to re2c:yyac-
		 cept configuration).

	  code:yysetaccept
		 YYSETACCEPT statement,	 possibly  specialized	for  different
		 APIs.	Supported variables: YYSETACCEPT, yyrecord (map	to the
		 corresponding	re2c: configurations), var (maps to re2c:yyac-
		 cept configuration) and val (numeric value  of	 the  accepted
		 rule).

	  code:yygetcond
		 YYGETCOND statement, possibly specialized for different APIs.
		 Supported  variables:	YYGETCOND, yyrecord (map to the	corre-
		 sponding re2c:	configurations), var (maps to re2c:yycond con-
		 figuration).

	  code:yysetcond
		 YYSETCOND statement, possibly specialized for different APIs.
		 Supported variables: YYSETCOND, yyrecord (map to  the	corre-
		 sponding re2c:	configurations), var (maps to re2c:yycond con-
		 figuration) and val (numeric condition	identifier).

	  code:yygetstate
		 YYGETSTATE  statement,	 possibly  specialized	for  different
		 APIs.	Supported variables: YYGETSTATE, yyrecord (map to  the
		 corresponding	re2c:  configurations),	var (maps to re2c:yys-
		 tate configuration).

	  code:yysetstate
		 YYSETSTATE  statement,	 possibly  specialized	for  different
		 APIs.	 Supported variables: YYSETSTATE, yyrecord (map	to the
		 corresponding re2c: configurations), var (maps	 to  re2c:yys-
		 tate configuration) and val (state number).

	  code:yylessthan
		 YYLESSTHAN  statement,	 possibly  specialized	for  different
		 APIs.	Supported variables:  YYLESSTHAN,  YYCURSOR,  YYLIMIT,
		 yyrecord  (map	 to  the  corresponding	re2c: configurations),
		 need (the number of code  units  to  check  against).	Condi-
		 tional: .many (true if	the need is more than one).

	  code:yybm_filter
		 Condition that	is used	to filter out yych values that are not
		 covered by the	yybm table (used with --bitmaps	option).  Sup-
		 ported	variable: yych (maps to	re2c:yych configuration).

	  code:yybm_match
		 The  format of	yybm table check (generated with --bitmaps op-
		 tion).	Supported variables: yybm, yych	 (map  to  the	corre-
		 sponding  re2c:  configurations),  offset (offset in the yybm
		 table that needs to be	added to yych) and mask	(bit mask that
		 should	be applied to the table	entry to retrieve the  boolean
                 value that needs to be checked).
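
          As an illustration of the last entry, a yybm check boils down to a
          table lookup followed by a bit test. The sketch below assumes a
          256-entry table of unsigned char; the helper names yybm_match and
          fill_digit_table are hypothetical and only mirror the roles of the
          yybm, yych, offset and mask variables described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the shape of a yybm check: look up the
 * table entry at offset + yych and test it against a bit mask. */
static int yybm_match(const unsigned char *yybm, size_t offset,
                      unsigned char yych, unsigned char mask) {
    assert(yybm != NULL);
    return (yybm[offset + yych] & mask) != 0;
}

/* Build a toy 256-entry table in which bit 0x80 marks decimal digits. */
static void fill_digit_table(unsigned char table[256]) {
    for (int c = 0; c < 256; ++c) {
        table[c] = (c >= '0' && c <= '9') ? 0x80 : 0x00;
    }
}
```

          Because each table entry holds several bits, one table can encode
          several character classes, each selected by its own mask.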

	  Here's  a  list  of  all global variables that are allowed in	syntax
	  files:

	  nl	 A newline.

	  indent A variable that does not produce any code, but	has a side-ef-
		 fect of increasing indentation	level.

	  dedent A variable that does not produce any code, but	has a side-ef-
		 fect of decreasing indentation	level.

	  topindent
		 Indentation string for	 the  current  statement.  Indentation
		 level is tracked and automatically updated by the code	gener-
		 ator.

	  Here's  a list of all	global conditionals that are allowed in	syntax
	  files:

	  .api.simple
		 True if simple	API is used (--api simple or re2c:api  =  sim-
		 ple).

	  .api.generic
		 True  if  generic  API	 is  used (--api generic or re2c:api =
		 generic).

	  .api.record
		 True if record	API  is	 used  (--api  record  or  re2c:api  =
		 record).

	  .api_style.functions
		 True  if  function-like  API  style is	used (re2c:api-style =
		 functions).

	  .api_style.freeform
		 True  if  free-form  API  style  is  used  (re2c:api-style  =
		 free-form).

	  .case_ranges
		 True  if  case	 ranges	 feature  is enabled (--case-ranges or
		 re2c:case-ranges = 1).

	  .code_model.goto_label
		 True if  code model based on goto/label is  used  (--goto-la-
		 bel).

	  .code_model.loop_switch
		 True	if   code   model   based   on	 loop/switch  is  used
		 (--loop-switch).

	  .code_model.recursive_functions
                 True if code model based on recursive functions is used
                 (--recursive-functions).

	  .date	 True  if  the generated fingerprint should contain generation
		 date.

	  .loop_label
		 True if re2c generated	loops  must  have  a  label  (re2c:la-
		 bel:yyloop is set to a	nonempty string).

	  .monadic
		 True  if the generated	code should be monadic (re2c:monadic =
		 1).  This is only relevant for	pure functional	languages.

	  .start_conditions
		 True if start conditions are enabled (--start-conditions).

	  .storable_state
		 True if storable state	is enabled (--storable-state).

	  .unsafe
		 True if re2c should use "unsafe" blocks in order to  generate
		 faster	 code  (--unsafe, re2c:unsafe =	1). This is only rele-
		 vant for languages that have "unsafe" feature.

	  .version
		 True if the generated fingerprint should  contain  re2c  ver-
		 sion.

HANDLING THE END OF INPUT
       One  of the main	problems for the lexer is to know when to stop.	 There
       are a few terminating conditions:

       o the lexer may match some rule (including default rule *) and come to
         a final state

       o the lexer may fail to match any rule and come to a default state

       o the lexer may reach the end of input

       The  first  two	conditions  terminate the lexer	in a "natural" way: it
       comes to	a state	with no	outgoing transitions, and the  matching	 auto-
       matically  stops.  The  third condition,	end of input, is different: it
       may happen in any state,	and the	lexer should be	 able  to  handle  it.
       Checking	 for the end of	input interrupts the normal lexer workflow and
       adds conditional	branches to the	generated  program,  therefore	it  is
       necessary  to  minimize	the number of such checks. re2c	supports a few
       different methods for handling the end of input.	Which one to  use  de-
       pends on	the complexity of regular expressions, the need	for buffering,
       performance  considerations  and	other factors. Here is a list of meth-
       ods:

       o Sentinel.  This method eliminates the need for the end of input
	 checks	 altogether.  It  is  simple and efficient, but	limited	to the
	 case when there is a natural "sentinel" character that	can never  oc-
	 cur  in valid input. This character may still occur in	invalid	input,
	 but it	should not be allowed by the regular expressions, except  per-
	 haps as the last character of a rule. The sentinel is appended	at the
	 end  of  input	and serves as a	stop signal: when the lexer reads this
	 character, it is either a syntax error	or the end of input.  In  both
	 cases	the  lexer  should stop. This method is	used if	YYFILL is dis-
	 abled with re2c:yyfill:enable = 0; and	re2c:eof has the default value
	 -1.

       o Sentinel with bounds checks.  This method is generic: it allows one
	 to  handle any	input without restrictions on the regular expressions.
	 The idea is to	reduce the number of end of input checks by performing
	 them only on certain characters. Similar to  the  "sentinel"  method,
	 one  of  the characters is chosen as a	"sentinel" and appended	at the
	 end of	input. However,	there is no restriction	on where the  sentinel
	 may  occur  (in  fact,	 any  character	can be chosen for a sentinel).
	 When the lexer	reads  this  character,	 it  additionally  performs  a
	 bounds	 check.	  If  the current position is within bounds, the lexer
	 resumes matching and handles the sentinel  as	a  regular  character.
	 Otherwise it invokes YYFILL (unless it	is disabled). If more input is
	 supplied,  the	 lexer will rematch the	last character and continue as
	 if the	sentinel wasn't	there. Otherwise it must be the	 real  end  of
	 input,	 and  the  lexer  stops. This method is	used when re2c:eof has
	 non-negative value (it	should be set to the numeric value of the sen-
	 tinel). YYFILL	is optional.

       o Bounds checks with padding.  This method is generic, and it may be
	 faster	 than the "sentinel with bounds	checks"	method,	but it is also
	 more complex. The idea	is to partition	DFA states into	strongly  con-
	 nected	 components  (SCCs)  and  generate  a single check per SCC for
	 enough	characters to cover the	longest	non-looping path in this  SCC.
	 This  reduces the number of checks, but there is a problem with short
	 lexemes at the	end of input, as the check requires enough  characters
	 to  cover  the	longest	lexeme.	This can be fixed by padding the input
	 with a	few fake characters that do not	form a valid lexeme suffix (so
	 that the lexer	cannot match them). The	length of  padding  should  be
	 YYMAXFILL,  generated with a max block. If there is not enough	input,
	 the lexer invokes YYFILL which	should supply at  least	 the  required
	 number	of characters or not return.  This method is used if YYFILL is
	 enabled and re2c:eof is -1 (this is the default configuration).

        Custom	 checks.   Generic API allows one to override basic operations
	 like reading a	character, which makes	it  possible  to  include  the
	 end-of-input  checks  as  part	of them.  This approach	is error-prone
	 and should be used with caution.  To  use  a  custom  method,	enable
	 generic  API  with --api custom or re2c:api = custom; and disable de-
	 fault bounds checks with re2c:yyfill:enable = 0; or re2c:yyfill:check
	 = 0;.

       The following subsections contain an example of each method.

   Sentinel
       This example uses a sentinel character to handle	the end	of input.  The
       program	counts	space-separated	words in a null-terminated string. The
       sentinel	is null: it is the last	character of each input	string,	and it
       is not allowed in the middle of a lexeme	by any of the rules  (in  par-
       ticular,	 it  is	 not  included in character ranges where it is easy to
       overlook). If a null occurs in the middle of a string, it is  a	syntax
       error  and  the lexer will match	default	rule *,	but it won't read past
       the end of  input  or  crash  (use  -Wsentinel-in-midrule  warning  and
       re2c:sentinel  configuration  to	 verify	 this).	Configuration re2c:yy-
       fill:enable = 0;	suppresses the generation of bounds checks and	YYFILL
       invocations.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>

	  // Expect a null-terminated string.
	  static int lex(const char *YYCURSOR) {
	      int count	= 0;

	      for (;;) {
	      /*!re2c
		  re2c:define:YYCTYPE =	char;
		  re2c:yyfill:enable = 0;

		  *	 { return -1; }
		  [\x00] { return count; }
		  [a-z]+ { ++count; continue; }
		  [ ]+	 { continue; }
	      */
	      }
	  }

	  int main() {
	      assert(lex("") ==	0);
	      assert(lex("one two three") == 3);
	      assert(lex("f0ur") == -1);
	      return 0;
	  }
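
       For comparison, here is a hand-written sketch of the same lexer in
       plain C. It is not re2c output, and the name count_words is
       hypothetical. It shows why the sentinel method needs no bounds
       checks: every loop stops on a character outside its class, and the
       terminating null belongs to no class.

```c
#include <assert.h>

/* Hand-written sketch of the sentinel method (not generated by re2c).
 * Counts space-separated lowercase words in a null-terminated string.
 * Returns -1 on any character that no rule allows. Assumes an ASCII-like
 * character set where 'a'..'z' are contiguous. */
static int count_words(const char *s) {
    int count = 0;
    for (;;) {
        if (*s == '\0') {
            return count;                         /* sentinel: end of input */
        } else if (*s >= 'a' && *s <= 'z') {
            while (*s >= 'a' && *s <= 'z') ++s;   /* the null stops this loop */
            ++count;
        } else if (*s == ' ') {
            while (*s == ' ') ++s;                /* the null stops this loop */
        } else {
            return -1;                            /* default rule */
        }
    }
}
```

       As in the re2c version, the input "f0ur" is rejected: the word "f"
       is counted first, and then the digit hits the default branch.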

   Sentinel with bounds	checks
       This  example uses sentinel with	bounds checks to handle	the end	of in-
       put (this  method  was  added  in  version  1.2).  The  program	counts
       space-separated	single-quoted strings. The sentinel character is null,
       which is	specified with re2c:eof	= 0; configuration. As in the sentinel
       method, null is the last	character of each input	string,	but it is  al-
       lowed in	the middle of a	rule (for example, 'aaa\0aa'\0 is valid	input,
       but  'aaa\0  is	a  syntax error).  Bounds checks are generated in each
       state that matches an input character,  but  they  are  scoped  to  the
       branch  that handles null. Bounds checks	are of the form	YYLIMIT	<= YY-
       CURSOR or YYLESSTHAN(1) with generic API. If  the  check	 condition  is
       true, the lexer has reached the end of input and should stop (YYFILL is
       disabled with re2c:yyfill:enable = 0; as the input fits into one
       buffer; see the YYFILL with sentinel section for an example that uses
       YYFILL). Reaching the end of input opens three possibilities: if the
       lexer is in the initial state, it matches the end-of-input rule $;
       otherwise it may fall back to a previously matched rule (including the
       default rule *) or go to a default state, which is reported by the
       -Wundefined-control-flow warning.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>

	  // Expect a null-terminated string.
	  static int lex(const char *str, unsigned int len) {
	      const char *YYCURSOR = str, *YYLIMIT = str + len,	*YYMARKER;
	      int count	= 0;

	      for (;;) {
	      /*!re2c
		  re2c:define:YYCTYPE =	char;
		  re2c:yyfill:enable = 0;
		  re2c:eof = 0;

		  str =	['] ([^'\\] | [\\][^])*	['];

		  *    { return	-1; }
		  $    { return	count; }
		  str  { ++count; continue; }
		  [ ]+ { continue; }
	      */
	      }
	  }

	  #define TEST(s, r) assert(lex(s, sizeof(s) - 1) == r)
	  int main() {
	      TEST("", 0);
	      TEST("'qu\0tes' 'are' 'fine: \\''	", 3);
	      TEST("'unterminated\\'", -1);
	      return 0;
	  }
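
       The idea of checking bounds only on the sentinel character can be
       sketched by hand as follows. This is an illustration, not re2c
       output; count_quoted and its local names are hypothetical. The
       buffer must hold a null at position len, as in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written sketch of "sentinel with bounds checks" (not re2c output).
 * Counts space-separated single-quoted strings in str[0..len), where
 * str[len] must be a null. A bounds check is done only when the current
 * character is null. Returns -1 on a syntax error. */
static int count_quoted(const char *str, size_t len) {
    const char *cur = str, *lim = str + len;
    int count = 0;
    for (;;) {
        char c = *cur;
        if (c == 0 && cur >= lim) return count;   /* real end of input */
        if (c == ' ') { ++cur; continue; }
        if (c != '\'') return -1;
        ++cur;                                    /* opening quote */
        for (;;) {
            c = *cur;
            if (c == 0 && cur >= lim) return -1;  /* unterminated string */
            ++cur;              /* a null within bounds is a regular char */
            if (c == '\'') break;                 /* closing quote */
            if (c == '\\') {                      /* escape: skip next char */
                if (*cur == 0 && cur >= lim) return -1;
                ++cur;
            }
        }
        ++count;
    }
}
```

       The comparison cur >= lim is evaluated only after c == 0 has
       succeeded, so ordinary characters cost a single comparison.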

   Bounds checks with padding
       This example uses bounds	checks with padding to handle the end of input
       (this method is enabled by default). The	program	counts space-separated
       single-quoted strings. There is a padding of YYMAXFILL null  characters
       appended	 at  the  end of input,	where YYMAXFILL	value is autogenerated
       with a max block. It is not necessary to	use null for padding  ---  any
       characters  can be used as long as they do not form a valid lexeme suf-
       fix (in this example padding should not contain single quotes, as  they
       may  be	mistaken  for  a suffix	of a single-quoted string). There is a
       "stop" rule that	matches	the first padding character (null) and	termi-
       nates  the  lexer  (note	 that it checks	if null	is at the beginning of
       padding,	otherwise it is	a syntax error). Bounds	checks	are  generated
       only  in	some states that are determined	by the strongly	connected com-
       ponents of the underlying automaton. Checks have	the  form  (YYLIMIT  -
       YYCURSOR) < n or	YYLESSTHAN(n) with generic API,	where n	is the minimum
       number  of characters that are needed for the lexer to proceed (it also
       means that the next bounds check	will occur in at most  n  characters).
       If  the check condition is true,	the lexer has reached the end of input
       and will	invoke YYFILL(n) that should either supply at  least  n	 input
       characters  or not return. In this example YYFILL always	fails and ter-
       minates the lexer with an error (which is fine because the  input  fits
       into  one  buffer).  See	the YYFILL with	padding	section	for an example
       that refills the	input buffer with YYFILL.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>
	  #include <stdlib.h>
	  #include <string.h>

	  /*!max:re2c*/

	  static int lex(const char *str, unsigned int len) {
	      // Make a	copy of	the string with	YYMAXFILL zeroes at the	end.
	      char *buf	= (char*) malloc(len + YYMAXFILL);
	      memcpy(buf, str, len);
	      memset(buf + len,	0, YYMAXFILL);

	      const char *YYCURSOR = buf, *YYLIMIT = buf + len + YYMAXFILL;
	      int count	= 0;

	  loop:
	      /*!re2c
		  re2c:api:style = free-form;
		  re2c:define:YYCTYPE =	char;
		  re2c:define:YYFILL = "goto fail;";

		  str =	['] ([^'\\] | [\\][^])*	['];

		  [\x00] {
		      // Check that it is the sentinel,	not some unexpected null.
		      if (YYCURSOR - 1 == buf +	len) goto exit;	else goto fail;
		  }
		  str  { ++count; goto loop; }
		  [ ]+ { goto loop; }
		  *    { goto fail; }
	      */

	  fail:
	      count = -1;

	  exit:
	      free(buf);
	      return count;
	  }

	  #define TEST(s, r) assert(lex(s, sizeof(s) - 1) == r)
	  int main() {
	      TEST("", 0);
	      TEST("'qu\0tes' 'are' 'fine: \\''	", 3);
	      TEST("'unterminated\\'", -1);
	      TEST("'unexpected	\0 null\\'", -1);
	      return 0;
	  }
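
       The padding step and the shape of the bounds check can be isolated
       in two small helpers. This is a sketch under assumptions: MAXFILL
       is a hypothetical stand-in for the generated YYMAXFILL, and
       pad_input and less_than are illustrative names, not part of re2c.
       Unlike the example above, the allocation result is checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { MAXFILL = 4 };  /* hypothetical stand-in for the generated YYMAXFILL */

/* Returns a malloc'ed copy of str[0..len) followed by MAXFILL zero bytes,
 * or NULL if the allocation fails. The caller frees the result. */
static char *pad_input(const char *str, size_t len) {
    char *buf = malloc(len + MAXFILL);
    if (buf == NULL) return NULL;
    memcpy(buf, str, len);
    memset(buf + len, 0, MAXFILL);
    return buf;
}

/* Shape of the check (YYLIMIT - YYCURSOR) < n: true if fewer than n
 * characters remain between cur and lim. Requires cur <= lim. */
static int less_than(const char *cur, const char *lim, size_t n) {
    return (size_t)(lim - cur) < n;
}
```

       With the limit set to the end of the padding, a check for up to
       MAXFILL characters still succeeds at the first padding character,
       which is what lets short lexemes at the end of input match.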

   Custom checks
       This example uses  a  custom  end-of-input  handling  method  based  on
       generic API.  The program counts	space-separated	single-quoted strings.
       It  is  the  same as the	sentinel example, except that the input	is not
       null-terminated.	To cover up for	the absence of a sentinel character at
       the end of input, YYPEEK	is redefined to	perform	a bounds check	before
       it  reads the next input	character.  This is inefficient	because	checks
       are done	very often. If the check condition fails, YYPEEK  returns  the
       real character, otherwise it returns a fake sentinel character.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>
	  #include <stdlib.h>
	  #include <string.h>

	  static int lex(const char *str, unsigned int len) {
	      // For the sake of example create	a string without terminating null.
	      char *buf	= (char*) malloc(len);
	      memcpy(buf, str, len);

	      const char *cur =	buf, *lim = buf	+ len;
	      int count	= 0;

	      for (;;) {
	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:api = custom;
		  re2c:api:style = free-form;
		  re2c:define:YYCTYPE =	char;
		  re2c:define:YYPEEK = "cur < lim ? *cur : 0";	// fake	null
		  re2c:define:YYSKIP = "++cur;";

		  *	 { count = -1; break; }
		  [\x00] { break; }
		  [a-z]+ { ++count; continue; }
		  [ ]+	 { continue; }
	      */
	      }

	      free(buf);
	      return count;
	  }

	  #define TEST(s, r) assert(lex(s, sizeof(s) - 1) == r)
	  int main() {
	      TEST("", 0);
	      TEST("one	two three ", 3);
	      TEST("f0ur", -1);
	      return 0;
	  }
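
       The effect of the redefined YYPEEK can be seen in a hand-written
       sketch, where the bounds check lives in a helper function. The
       names peek and count_words_n are hypothetical, and the code is an
       illustration rather than re2c output.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the YYPEEK definition in the example: return the character at
 * cur, or a fake null once cur has reached lim. */
static char peek(const char *cur, const char *lim) {
    return cur < lim ? *cur : 0;
}

/* Counts space-separated lowercase words in buf[0..len), which need not
 * be null-terminated. As in the example, a real null inside the input
 * also ends the scan. Returns -1 on any other character. */
static int count_words_n(const char *buf, size_t len) {
    const char *cur = buf, *lim = buf + len;
    int count = 0;
    for (;;) {
        char c = peek(cur, lim);
        if (c == 0) {
            return count;
        } else if (c >= 'a' && c <= 'z') {
            do { ++cur; c = peek(cur, lim); } while (c >= 'a' && c <= 'z');
            ++count;
        } else if (c == ' ') {
            do { ++cur; } while (peek(cur, lim) == ' ');
        } else {
            return -1;
        }
    }
}
```

       Every character read goes through peek, so the bounds check runs
       once per character. This is the cost the paragraph above refers to.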

BUFFER REFILLING
       The need	for buffering arises when the input cannot be mapped in	memory
       all at once: either it is too large, or it comes	in a streaming fashion
       (like  reading  from a socket). The usual technique in such cases is to
       allocate	a fixed-sized memory buffer and	process	input in  chunks  that
       fit  into  the buffer. When the current chunk is	processed, it is moved
       out and new data	is moved in. In	practice it is somewhat	more  complex,
       because	lexer state consists not of a single input position, but a set
       of interrelated positions:

        cursor: the next input	character to be	read (YYCURSOR	in  C  pointer
	 API or	YYSKIP/YYPEEK in generic API)

        limit:	the position after the last available input character (YYLIMIT
	 in C pointer API, implicitly handled by YYLESSTHAN in generic API)

        marker:  the  position	 of the	most recent match, if any (YYMARKER in
	 default API or	YYBACKUP/YYRESTORE in generic API)

        token:	the start of the current lexeme	(implicit in re2c API,	as  it
	 is  not  needed for the normal	lexer operation	and can	be defined and
	 updated by the	user)

        context marker: the position of the trailing context (YYCTXMARKER  in
	 C pointer API or YYBACKUPCTX/YYRESTORECTX in generic API)

        tag  variables:  submatch  positions  (defined	 with  stags and mtags
	 blocks	and generic API	primitives YYSTAGP/YYSTAGN/YYMTAGP/YYMTAGN)

       Not all these are used in every case, but if used, they must be updated
       by YYFILL. All active positions are contained in	 the  segment  between
       token  and  cursor, therefore everything	between	buffer start and token
       can be discarded, the segment from token	and  up	 to  limit  should  be
       moved  to  the  beginning  of  buffer, and the free space at the	end of
       buffer should be	filled with new	data.  In order	to avoid frequent  YY-
       FILL  calls  it is best to fill in as many input	characters as possible
       (even though fewer characters might suffice to resume the  lexer).  The
       details	of  YYFILL  implementation are slightly	different depending on
       which EOF handling method is used: the case of  EOF  rule  is  somewhat
       simpler	than  the case of bounds-checking with padding.	Also note that
       if -f --storable-state option is	used, YYFILL  has  slightly  different
       semantics (described in the section about storable state).
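
       The shift-and-refill step described above can be sketched in
       isolation. This is a simplified illustration: the Buf structure,
       the CAP constant and the in-memory data source are assumptions
       made for the sketch, and only the token, cursor and limit
       positions are tracked.

```c
#include <assert.h>
#include <string.h>

enum { CAP = 8 };                /* hypothetical buffer capacity */

typedef struct {
    char buf[CAP + 1];           /* +1 for a sentinel at the limit */
    char *tok, *cur, *lim;       /* token, cursor and limit positions */
    const char *src;             /* hypothetical in-memory data source */
    size_t src_len;              /* characters left in the source */
} Buf;

/* Discards [buf, tok), moves [tok, lim) to the buffer start, and fills
 * the free space at the end with as much new data as fits.
 * Returns the number of characters added (0 if none). */
static size_t refill(Buf *b) {
    const size_t shift = (size_t)(b->tok - b->buf);
    const size_t used = (size_t)(b->lim - b->tok);

    memmove(b->buf, b->tok, used);
    b->tok -= shift;
    b->cur -= shift;
    b->lim -= shift;

    size_t n = CAP - used;
    if (n > b->src_len) n = b->src_len;
    memcpy(b->lim, b->src, n);
    b->src += n;
    b->src_len -= n;
    b->lim += n;
    *b->lim = 0;                 /* keep a sentinel at the limit */
    return n;
}
```

       All positions move by the same shift, so their relative order is
       preserved. A marker, context marker or tag variables would be
       adjusted in the same way.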

   YYFILL with sentinel
       If  EOF	rule is	used, YYFILL is	a function-like	primitive that accepts
       no arguments and	returns	a value	which is checked against zero.	YYFILL
       invocation  is  triggered by condition YYLIMIT <= YYCURSOR in C pointer
       API and YYLESSTHAN() in generic API. A non-zero return value means that
       YYFILL has failed. A successful YYFILL call must	supply	at  least  one
       character  and adjust input positions accordingly. Limit	must always be
       set to one after	the last input position	in buffer, and	the  character
       at the limit position must be the sentinel symbol specified by re2c:eof
       configuration.  The pictures below show the relative locations of input
       positions in buffer before and after YYFILL call	 (sentinel  symbol  is
       marked  with #, and the second picture shows the	case when there	is not
       enough input to fill the	whole buffer).

			 <-- shift -->
		       >-A------------B---------C-------------D#-----------E->
		       buffer	    token    marker	    limit,
							    cursor
	  >-A------------B---------C-------------D------------E#->
		       buffer,	marker	      cursor	    limit
		       token

			 <-- shift -->
		       >-A------------B---------C-------------D#--E (EOF)
		       buffer	    token    marker	    limit,
							    cursor
	  >-A------------B---------C-------------D---E#........
		       buffer,	marker	     cursor limit
		       token

       Here is an example of a program that reads a file named input through
       a 4096-byte buffer (4095 bytes of data plus one byte for the sentinel)
       and uses the EOF rule.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>
	  #include <stdio.h>
	  #include <string.h>

	  #define BUFSIZE 4095

	  typedef struct {
	      FILE *file;
	      char buf[BUFSIZE + 1], *lim, *cur, *mar, *tok; //	+1 for sentinel
	      int eof;
	  } Input;

	  static int fill(Input	*in) {
	      if (in->eof) return 1;

	      const size_t shift = in->tok - in->buf;
	      const size_t used	= in->lim - in->tok;

	      // Error:	lexeme too long. In real life could reallocate a larger	buffer.
	      if (shift	< 1) return 2;

	      // Shift buffer contents (discard	everything up to the current token).
	      memmove(in->buf, in->tok,	used);
	      in->lim -= shift;
	      in->cur -= shift;
	      in->mar -= shift;
	      in->tok -= shift;

	      // Fill free space at the	end of buffer with new data from file.
	      in->lim += fread(in->lim,	1, BUFSIZE - used, in->file);
	      in->lim[0] = 0;
	      in->eof =	in->lim	< in->buf + BUFSIZE;
	      return 0;
	  }

	  static int lex(Input *in) {
	      int count	= 0;
	  loop:
		  in->tok = in->cur;
	      /*!re2c
		  re2c:api:style = free-form;
		  re2c:define:YYCTYPE =	char;
		  re2c:define:YYCURSOR = in->cur;
		  re2c:define:YYMARKER = in->mar;
		  re2c:define:YYLIMIT =	in->lim;
		  re2c:define:YYFILL = "fill(in) == 0";
		  re2c:eof = 0;

		  str =	['] ([^'\\] | [\\][^])*	['];

		  *    { return	-1; }
		  $    { return	count; }
		  str  { ++count; goto loop; }
		  [ ]+ { goto loop; }
	      */
	  }

	  int main() {
	      const char *fname	= "input";
	      const char content[] = "'qu\0tes'	'are' 'fine: \\'' ";

	      // Prepare input file: a few times the size of the buffer, containing
	      // strings with zeroes and escaped quotes.
	      FILE *f =	fopen(fname, "w");
	      for (int i = 0; i	< BUFSIZE; ++i)	{
		  fwrite(content, 1, sizeof(content) - 1, f);
	      }
	      fclose(f);
	      int count	= 3 * BUFSIZE; // number of quoted strings written to file

	      // Initialize lexer state: all pointers are at the end of	buffer.
	      Input in;
	      in.file =	fopen(fname, "r");
	      in.cur = in.mar =	in.tok = in.lim	= in.buf + BUFSIZE;
	      in.eof = 0;
	      // Sentinel (at YYLIMIT pointer) is set to zero, which triggers YYFILL.
	      in.lim[0]	= 0;

	      // Run the lexer.
	      assert(lex(&in) == count);

	      // Cleanup: remove input file.
	      fclose(in.file);
	      remove(fname);
	      return 0;
	  }

   YYFILL with padding
       In  the	default	 case  (when  EOF  rule	is not used) YYFILL is a func-
       tion-like primitive that	accepts	a single argument and does not	return
       any  value.  YYFILL invocation is triggered by condition	(YYLIMIT - YY-
       CURSOR) < n in C	pointer	API and	YYLESSTHAN(n) in generic API. The  ar-
       gument  passed  to YYFILL is the	minimal	number of characters that must
       be supplied. If it fails	to do so, YYFILL must not return to the	 lexer
       (for  that  reason  it is best implemented as a macro that returns from
       the calling function on failure).  In case of a successful YYFILL invo-
       cation the limit	position must be set either to one after the last  in-
       put position in buffer, or to the end of	YYMAXFILL padding (in case YY-
       FILL  has  successfully	read  at least n characters, but not enough to
       fill the	entire buffer).	The pictures below show	the relative locations
       of input	positions in buffer before and after YYFILL invocation (YYMAX-
       FILL padding on the second picture is marked with # symbols).

			 <-- shift -->		       <-- need	-->
		       >-A------------B---------C-----D-------E---F--------G->
		       buffer	    token    marker cursor  limit

	  >-A------------B---------C-----D-------E---F--------G->
		       buffer,	marker cursor		    limit
		       token

			 <-- shift -->		       <-- need	-->
		       >-A------------B---------C-----D-------E-F	 (EOF)
		       buffer	    token    marker cursor  limit

	  >-A------------B---------C-----D-------E-F###############
		       buffer,	marker cursor			limit
		       token			    <- YYMAXFILL ->

       Here is an example of a program that reads a file named input through
       a 4096-byte buffer (including YYMAXFILL bytes reserved for padding)
       and uses bounds checking with padding.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>
	  #include <stdio.h>
	  #include <string.h>

	  /*!max:re2c*/
	  #define BUFSIZE (4096	- YYMAXFILL)

	  typedef struct {
	      FILE *file;
	      char buf[BUFSIZE + YYMAXFILL], *lim, *cur, *tok;
	      int eof;
	  } Input;

	  static int fill(Input	*in, size_t need) {
	      if (in->eof) return 1;

	      const size_t shift = in->tok - in->buf;
	      const size_t used	= in->lim - in->tok;

	      // Error:	lexeme too long. In real life could reallocate a larger	buffer.
	      if (shift	< need)	return 2;

	      // Shift buffer contents (discard	everything up to the current token).
	      memmove(in->buf, in->tok,	used);
	      in->lim -= shift;
	      in->cur -= shift;
	      in->tok -= shift;

	      // Fill free space at the	end of buffer with new data from file.
	      in->lim += fread(in->lim,	1, BUFSIZE - used, in->file);

	      // If read less than expected, this is end of input => add zero padding
	      // so that the lexer can access characters at the	end of buffer.
	      if (in->lim < in->buf + BUFSIZE) {
		  in->eof = 1;
		  memset(in->lim, 0, YYMAXFILL);
		  in->lim += YYMAXFILL;
	      }

	      return 0;
	  }

	  static int lex(Input *in) {
	      int count	= 0;
	  loop:
		  in->tok = in->cur;
	      /*!re2c
		  re2c:api:style = free-form;
		  re2c:define:YYCTYPE =	char;
		  re2c:define:YYCURSOR = in->cur;
		  re2c:define:YYLIMIT =	in->lim;
		  re2c:define:YYFILL = "if (fill(in, @@) != 0) return -1;";

		  str =	['] ([^'\\] | [\\][^])*	['];

		  [\x00] {
		      // Check that it is the sentinel,	not some unexpected null.
		      return in->tok ==	in->lim	- YYMAXFILL ? count : -1;
		  }
		  str  { ++count; goto loop; }
		  [ ]+ { goto loop; }
		  *    { return	-1; }
	      */
	  }

	  int main() {
	      const char *fname	= "input";
	      const char content[] = "'qu\0tes'	'are' 'fine: \\'' ";

	      // Prepare input file: a few times the size of the buffer, containing
	      // strings with zeroes and escaped quotes.
	      FILE *f =	fopen(fname, "w");
	      for (int i = 0; i	< BUFSIZE; ++i)	{
		  fwrite(content, 1, sizeof(content) - 1, f);
	      }
	      fclose(f);
	      int count	= 3 * BUFSIZE; // number of quoted strings written to file

	      // Initialize lexer state: all pointers are at the end of	buffer.
	      // This immediately triggers YYFILL, as the check	`in->cur < in->lim` fails.
	      Input in;
	      in.file =	fopen(fname, "r");
	      in.cur = in.tok =	in.lim = in.buf	+ BUFSIZE;
	      in.eof = 0;

	      // Run the lexer.
	      assert(lex(&in) == count);

	      // Cleanup: remove input file.
	      fclose(in.file);
	      remove(fname);
	      return 0;
	  }

FEATURES
   Multiple blocks
       Sometimes it is necessary to have multiple interrelated lexers (for ex-
       ample,  if there	is a high-level	state machine that transitions between
       lexer modes). This can be implemented  using  multiple  connected  re2c
       blocks. Another option is to use	start conditions.

       The  implementation of connections between blocks depends on the	target
       language.  In languages that have goto statement	(such as C/C++ and Go)
       one can have all	blocks in one function,	each of	them prefixed  with  a
       label.  Transition from one block to another is a simple	goto.  In lan-
       guages that do not have goto (such as Rust) it is necessary  to	use  a
       loop  with  a  switch  on  a  state  variable,  similar	to the yystate
       loop/switch generated by	re2c, or else wrap each	block  in  a  function
       and use function	calls.

       The  example below uses multiple	blocks to parse	binary,	octal, decimal
       and hexadecimal numbers.	Each base has its own block. The initial block
       determines base and dispatches to other blocks.	Common	configurations
       are  defined  in	a separate block at the	beginning of the program; they
       are inherited by	the other blocks.

	  // re2c $INPUT -o $OUTPUT -i
	  #include <stdint.h>
	  #include <limits.h>
	  #include <assert.h>

	  static const uint64_t	ERROR =	UINT64_MAX;

	  #define CHECK(n) if (n > UINT32_MAX) return ERROR;

	  static uint64_t parse_u32(const char *s) {
	      const char *YYCURSOR = s,	*YYMARKER;
	      uint64_t u = 0;

	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:define:YYCTYPE =	char;

		  end =	"\x00";

		  '0b' / [01]	     { goto bin; }
		  "0"		     { goto oct; }
		  "" / [1-9]	     { goto dec; }
		  '0x' / [0-9a-fA-F] { goto hex; }
		  *		     { return ERROR; }
	      */
	  bin:
	      /*!re2c
		  end	{ return u; }
		  [01]	{ u = u	* 2 + (YYCURSOR[-1] - '0'); CHECK(u); goto bin;	}
		  *	{ return ERROR;	}
	      */
	  oct:
	      /*!re2c
		  end	{ return u; }
		  [0-7]	{ u = u	* 8 + (YYCURSOR[-1] - '0'); CHECK(u); goto oct;	}
		  *	{ return ERROR;	}
	      */
	  dec:
	      /*!re2c
		  end	{ return u; }
		  [0-9]	{ u = u	* 10 + (YYCURSOR[-1] - '0'); CHECK(u); goto dec; }
		  *	{ return ERROR;	}
	      */
	  hex:
	      /*!re2c
		  end	{ return u; }
		  [0-9]	{ u = u	* 16 + (YYCURSOR[-1] - '0');	  CHECK(u); goto hex; }
		  [a-f]	{ u = u	* 16 + (YYCURSOR[-1] - 'a' + 10); CHECK(u); goto hex; }
		  [A-F]	{ u = u	* 16 + (YYCURSOR[-1] - 'A' + 10); CHECK(u); goto hex; }
		  *	{ return ERROR;	}
	      */
	  }

	  int main() {
	      assert(parse_u32("") == ERROR);
	      assert(parse_u32("1234567890") ==	1234567890);
	      assert(parse_u32("0b1101") == 13);
	      assert(parse_u32("0x7Fe")	== 2046);
	      assert(parse_u32("0644") == 420);
	      assert(parse_u32("9999999999") ==	ERROR);
	      return 0;
	  }
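
       For languages without goto, the text above mentions a loop with a
       switch on a state variable. The following hand-written C sketch
       shows that structure for the same number parser. It is not re2c
       output; the Mode enumeration, PARSE_ERROR and the helper names are
       hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PARSE_ERROR UINT64_MAX

enum Mode { M_INIT, M_BIN, M_OCT, M_DEC, M_HEX };

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Consumes one digit in the given base.
 * Returns 1 to continue, 0 at the terminating null, -1 on error. */
static int step(uint64_t *u, unsigned base, const char **s) {
    if (**s == '\0') return 0;
    const int d = digit(**s);
    if (d < 0 || (unsigned)d >= base) return -1;
    *u = *u * base + (unsigned)d;
    if (*u > UINT32_MAX) return -1;
    ++*s;
    return 1;
}

static uint64_t parse_u32_switch(const char *s) {
    enum Mode mode = M_INIT;
    uint64_t u = 0;

    for (;;) {
        int r = 1;
        switch (mode) {
        case M_INIT:    /* plays the role of the initial block */
            if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B')
                    && (s[2] == '0' || s[2] == '1')) {
                s += 2; mode = M_BIN;
            } else if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')
                    && digit(s[2]) >= 0) {
                s += 2; mode = M_HEX;
            } else if (s[0] == '0') {
                s += 1; mode = M_OCT;
            } else if (s[0] >= '1' && s[0] <= '9') {
                mode = M_DEC;
            } else {
                return PARSE_ERROR;
            }
            break;
        case M_BIN: r = step(&u, 2, &s);  break;
        case M_OCT: r = step(&u, 8, &s);  break;
        case M_DEC: r = step(&u, 10, &s); break;
        case M_HEX: r = step(&u, 16, &s); break;
        }
        if (r == 0) return u;
        if (r < 0) return PARSE_ERROR;
    }
}
```

       Assigning to mode replaces the goto between blocks; the price is
       one extra dispatch per iteration.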

   Start conditions
       Start conditions	are enabled with --start-conditions option. They  pro-
       vide  a	way  to	 encode	multiple interrelated automata within the same
       re2c block.

       Each condition corresponds to a single automaton	and has	a unique  name
       specified by the	user and a unique internal number defined by re2c. The
       numbers	are used to switch between conditions: the generated code uses
       YYGETCOND and YYSETCOND primitives to get the current condition or  set
       it  to  the  given  number.  Use	 conditions  block, --header option or
       re2c:header configuration to generate  numeric  condition  identifiers.
       Configuration  re2c:cond:enumprefix  specifies the generated identifier
       prefix.

       In condition mode every rule must be prefixed with a list of comma-sep-
       arated condition	names in angle brackets, or a wildcard <*>  to	denote
       all conditions. The rule	syntax is extended as follows:

	  < condition-list > regular-expression	code
		 A  rule  that	is  merged  to	every  condition on the	condi-
		 tion-list.  It	matches	regular-expression  and	 executes  the
		 associated code.

	  < condition-list > regular-expression	=> condition code
		 A  rule  that	is  merged  to	every  condition on the	condi-
		 tion-list.  It	matches	regular-expression, sets  the  current
		 condition to condition	and executes the associated code.

	  < condition-list > regular-expression	:=> condition
		 A  rule  that	is  merged  to	every  condition on the	condi-
		 tion-list.  It	 matches  regular-expression  and  immediately
		 transitions to	condition (there is no semantic	action).

	  < condition-list > !action code
		 A  rule  that	binds  code  to	the place defined by action in
		 every condition on the	condition-list (see the	 actions  sec-
		 tion for various types	of actions).

	  <! condition-list > code
		 A  rule  that	prepends code to semantic actions of all rules
		 for every condition on	the  condition-list.  This  syntax  is
		 deprecated  and  the  !pre_rule action	should be used instead
		 (it does exactly the same).

	  < > code
		 A rule	that creates a special	entry  condition  with	number
		 zero  and name	"0" that executes code before jumping to other
		 conditions.  This syntax is deprecated, and the !entry	action
		 should	be used	instead	(it provides a more fine-grained  con-
		 trol,	as the code can	be specified on	a per-condition	basis,
		 and one can jump directly to condition	 start	without	 going
		 through condition dispatch).

	  < > => condition code
		 Same  as the previous rule, except that it sets the next con-
		 dition.

	  < > :=> condition
		 Same as the previous rule, except that	it has	no  associated
		 code and immediately jumps to condition.

       The  code  re2c	generates  for conditions depends on whether re2c uses
       goto/label approach or loop/switch approach to encode the automata.

       In languages that have goto statement (such as C/C++ and	Go) conditions
       are naturally implemented as blocks of code prefixed with labels	of the
       form yyc_<cond>,	where cond is a	condition name (label  prefix  can  be
       changed	with re2c:cond:prefix).	Transitions between conditions are im-
       plemented using goto and	condition labels. Before all  conditions  re2c
       generates an initial switch on YYGETCOND that jumps to the start state
       of  the	current	 condition.  The shortcut rules	:=> bypass the initial
       switch and jump directly	to the specified condition (re2c:cond:goto can
       be used to change the default behavior).	The rules  with	 semantic  ac-
       tions  do  not automatically jump to the	next condition;	this should be
       done by the user-defined	action code.

       In languages that do not	have goto (such	as Rust) re2c reuses the  yys-
       tate variable to	store condition	numbers. Each condition	gets a numeric
       identifier equal	to the number of its start state, and a	switch between
       conditions is no	different than a switch	between	DFA states of a	single
       condition.  There  is  no need for a separate initial condition switch.
       (Since the same approach	is used	to implement storable  states,	YYGET-
       COND/YYSETCOND are redundant if both storable states and	conditions are
       used).
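
       The goto/label scheme can be pictured with a small hand-written
       sketch: an initial switch dispatches on the stored condition, and
       transitions between conditions are direct jumps. The condition
       names, labels and the function itself are hypothetical and are not
       what re2c generates.

```c
#include <assert.h>

enum Cond { COND_CODE, COND_COMMENT };  /* hypothetical condition numbers */

/* Counts the characters of a null-terminated string that lie outside
 * '#'-to-newline comments. Lexing starts in condition *cond, and the
 * final condition is stored back into *cond. */
static int count_code_chars(const char *s, enum Cond *cond) {
    int n = 0;

    /* Initial dispatch on the current condition. */
    switch (*cond) {
    case COND_CODE:    goto cond_code;
    case COND_COMMENT: goto cond_comment;
    }
    return -1;  /* unknown condition number */

cond_code:
    *cond = COND_CODE;
    for (;; ++s) {
        if (*s == '\0') return n;
        if (*s == '#') { ++s; goto cond_comment; }  /* direct jump */
        ++n;
    }

cond_comment:
    *cond = COND_COMMENT;
    for (;; ++s) {
        if (*s == '\0') return n;
        if (*s == '\n') { ++s; goto cond_code; }    /* direct jump */
    }
}
```

       The direct jumps correspond to the :=> shortcut rules, which bypass
       the initial switch.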

       The program below uses start conditions to parse	binary,	octal, decimal
       and  hexadecimal	 numbers.  There is a single block where each base has
       its own condition, and the initial condition is	connected  to  all  of
       them. User-defined variable c stores the current condition number;
       it is initialized to the	number of the initial condition	generated with
       conditions block.

	  // re2c $INPUT -o $OUTPUT -ci
	  #include <stdint.h>
	  #include <limits.h>
	  #include <assert.h>

	  static const uint64_t	ERROR =	UINT64_MAX;
	  /*!conditions:re2c*/

	  static uint64_t parse_u32(const char *s) {
	      const char *YYCURSOR = s,	*YYMARKER;
	      int c = yycinit;
	      uint64_t u = 0;

	      /*!re2c
		  re2c:api:style = free-form;
		  re2c:define:YYCTYPE =	char;
		  re2c:define:YYGETCONDITION = "c";
		  re2c:define:YYSETCONDITION = "c = @@;";
		  re2c:yyfill:enable = 0;

		  <*> *	{ return ERROR;	}

		  <init> '0b' /	[01]	    :=>	bin
		  <init> "0"		    :=>	oct
		  <init> "" / [1-9]	    :=>	dec
		  <init> '0x' /	[0-9a-fA-F] :=>	hex

		  <bin,	oct, dec, hex> "\x00" {	return u; }

		  <bin>	[01]  {	u = u *	2  + (YYCURSOR[-1] - '0');	goto yyc_bin; }
		  <oct>	[0-7] {	u = u *	8  + (YYCURSOR[-1] - '0');	goto yyc_oct; }
		  <dec>	[0-9] {	u = u *	10 + (YYCURSOR[-1] - '0');	goto yyc_dec; }
		  <hex>	[0-9] {	u = u *	16 + (YYCURSOR[-1] - '0');	goto yyc_hex; }
		  <hex>	[a-f] {	u = u *	16 + (YYCURSOR[-1] - 'a' + 10);	goto yyc_hex; }
		  <hex>	[A-F] {	u = u *	16 + (YYCURSOR[-1] - 'A' + 10);	goto yyc_hex; }
		  <!*> { if (u > UINT32_MAX) return ERROR; }
	      */
	  }

	  int main() {
	      assert(parse_u32("") == ERROR);
	      assert(parse_u32("1234567890") ==	1234567890);
	      assert(parse_u32("0b1101") == 13);
	      assert(parse_u32("0x7Fe")	== 2046);
	      assert(parse_u32("0644") == 420);
	      assert(parse_u32("9999999999") ==	ERROR);
	      return 0;
	  }

   Storable state
       With --storable-state option re2c generates a lexer that	can store  its
       current	state,	return	to the caller, and later resume	operations ex-
       actly where it left off.	The default mode of operation  in  re2c	 is  a
       "pull"  model,  in which	the lexer "pulls" more input whenever it needs
       it. This	may be unacceptable in cases when the input becomes  available
       piece  by piece (for example, if	the lexer is invoked by	the parser, or
       if the lexer program communicates via a socket protocol with some other
       program that must wait for a reply from the lexer before	 it  transmits
       the  next message). Storable state feature is intended exactly for such
       cases: it allows	one to generate	lexers that work in  a	"push"	model.
       When the	lexer needs more input,	it stores its state and	returns	to the
       caller.	Later,	when  more input becomes available, the	caller resumes
       the lexer exactly where it stopped. There are a few  changes  necessary
       compared	to the "pull" model:

        Define YYGETSTATE() and YYSETSTATE(state) primitives.

        Define	yych, yyaccept (if used) and state variables as	a part of per-
	 sistent lexer state. The state	variable should	be initialized to -1.

        YYFILL	should return to the outer program instead of trying to	supply
	 more input. Return code should	indicate that lexer needs more input.

        The  outer  program should recognize situations when lexer needs more
	 input and respond appropriately.

        Optionally use	getstate block to generate YYGETSTATE switch  detached
	 from  the  main  lexer.  This only works for languages	that have goto
	 (not in --loop-switch mode).

        Use re2c:eof and the sentinel with bounds checks method to handle the
	 end of	input. Padding-based method may	not work because it is unclear
	 when to append	padding: the current end of input may not be the ulti-
	 mate end of input, and	appending padding too early may	cut off	a par-
	 tially	read greedy lexeme.  Furthermore, due  to  high-level  program
	 logic	getting	 more input may	depend on processing the lexeme	at the
	 end of	buffer (which already is blocked due to	the end-of-input  con-
	 dition).
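
       The "push" model can be illustrated with a deliberately simplified,
       hand-written sketch. It keeps a state variable between calls and
       returns to the caller when the current chunk is exhausted. Unlike a
       re2c lexer it works character by character and does not buffer
       input; the Lexer structure, the state numbers and the feed function
       are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NEED_MORE, BAD } Status;

typedef struct {
    int state;         /* -1: initial, 0: between packets, 1: in a packet */
    unsigned packets;  /* complete packets of the form [a-z]+; seen so far */
} Lexer;

/* Consumes one chunk of input. Returns NEED_MORE when the chunk is
 * exhausted (the state is kept for the next call) or BAD on an
 * unexpected character. */
static Status feed(Lexer *lx, const char *data, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        const char c = data[i];
        const int letter = (c >= 'a' && c <= 'z');
        switch (lx->state) {
        case -1:  /* initial state: nothing consumed yet */
        case 0:   /* between packets */
            if (!letter) return BAD;
            lx->state = 1;
            break;
        case 1:   /* inside a packet */
            if (c == ';') { ++lx->packets; lx->state = 0; }
            else if (!letter) return BAD;
            break;
        default:
            return BAD;
        }
    }
    return NEED_MORE;
}
```

       A caller would initialize the state to -1, call feed for each chunk
       as it arrives, and treat a state of 1 at the final end of input as
       an incomplete packet.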

       Here is an example of a "push" model lexer that simulates reading pack-
       ets from	a socket. The lexer loops until	it encounters the end of input
       and returns to the calling function. The	calling	function provides more
       input  by  "sending"  the  next packet and resumes lexing. This process
       stops when all the packets have been sent, or when there	is an error.

	  // re2c $INPUT -o $OUTPUT -f
	  #include <assert.h>
	  #include <stdio.h>
	  #include <string.h>

	  #define DEBUG	0
	  #define LOG(...) do { if (DEBUG) fprintf(stderr, __VA_ARGS__); } while (0)

	  // Use a small buffer	to cover the case when a lexeme	doesn't	fit.
	  // In	real world use a larger	buffer.
	  #define BUFSIZE 10

	  typedef struct {
	      FILE *file;
	      char buf[BUFSIZE + 1], *lim, *cur, *mar, *tok;
	      int state;
	  } State;

	  typedef enum {END, READY, WAITING, BAD_PACKET, BIG_PACKET} Status;

	  static Status	fill(State *st)	{
	      const size_t shift = st->tok - st->buf;
	      const size_t used	= st->lim - st->tok;
	      const size_t free	= BUFSIZE - used;

	      // Error:	no space. In real life can reallocate a	larger buffer.
	      if (free < 1) return BIG_PACKET;

	      // Shift buffer contents (discard	already	processed data).
	      memmove(st->buf, st->tok,	used);
	      st->lim -= shift;
	      st->cur -= shift;
	      st->mar -= shift;
	      st->tok -= shift;

	      // Fill free space at the	end of buffer with new data.
	      const size_t read	= fread(st->lim, 1, free, st->file);
	      st->lim += read;
	      st->lim[0] = 0; // append	sentinel symbol

	      return READY;
	  }

	  static Status	lex(State *st, unsigned	int *recv) {
	      char yych;
	      /*!getstate:re2c*/

	      for (;;) {
		  st->tok = st->cur;
	      /*!re2c
		  re2c:api:style = free-form;
		  re2c:define:YYCTYPE =	"char";
		  re2c:define:YYCURSOR = "st->cur";
		  re2c:define:YYMARKER = "st->mar";
		  re2c:define:YYLIMIT =	"st->lim";
		  re2c:define:YYGETSTATE = "st->state";
		  re2c:define:YYSETSTATE = "st->state =	@@;";
		  re2c:define:YYFILL = "return WAITING;";
		  re2c:eof = 0;

		  packet = [a-z]+[;];

		  *	 { return BAD_PACKET; }
		  $	 { return END; }
		  packet { *recv = *recv + 1; continue;	}
	      */
	      }
	  }

	  void test(const char **packets, Status expect) {
	      // Create	a "socket" (open the same file for reading and writing).
	      const char *fname	= "pipe";
	      FILE *fw = fopen(fname, "w");
	      FILE *fr = fopen(fname, "r");
	      setvbuf(fw, NULL,	_IONBF,	0);
	      setvbuf(fr, NULL,	_IONBF,	0);

	      // Initialize lexer state: `state` value is -1, all pointers are at the end
	      // of buffer.
	      State st;
	      st.file =	fr;
	      st.cur = st.mar =	st.tok = st.lim	= st.buf + BUFSIZE;
	      // Sentinel (at YYLIMIT pointer) is set to zero, which triggers YYFILL.
	      st.lim[0]	= 0;
	      st.state = -1;

	      // Main loop. The	buffer contains	incomplete data	which appears packet by
	      // packet. When the lexer	needs more input it saves its internal state and
	      // returns to the	caller which should provide more input and resume lexing.
	      Status status;
	      unsigned int send	= 0, recv = 0;
	      for (;;) {
		  status = lex(&st, &recv);
		  if (status ==	END) {
		      LOG("done: got %u	packets\n", recv);
		      break;
		  } else if (status == WAITING)	{
		      LOG("waiting...\n");
		      if (*packets) {
			  LOG("sent packet %u\n", send);
			  fprintf(fw, "%s", *packets++);
			  ++send;
		      }
		      status = fill(&st);
		      LOG("queue: '%s'\n", st.buf);
		      if (status == BIG_PACKET)	{
			  LOG("error: packet too big\n");
			  break;
		      }
		      assert(status == READY);
		  } else {
		      assert(status == BAD_PACKET);
		      LOG("error: ill-formed packet\n");
		      break;
		  }
	      }

	      // Check results.
	      assert(status == expect);
	      if (status == END) assert(recv ==	send);

	      // Cleanup: remove input file.
	      fclose(fw);
	      fclose(fr);
	      remove(fname);
	  }

	  int main() {
	      const char *packets1[] = {0};
	      const char *packets2[] = {"zero;", "one;", "two;", "three;", "four;", 0};
	      const char *packets3[] = {"zer0;", 0};
	      const char *packets4[] = {"looooooooooong;", 0};

	      test(packets1, END);
	      test(packets2, END);
	      test(packets3, BAD_PACKET);
	      test(packets4, BIG_PACKET);

	      return 0;
	  }

   Reusable blocks
       Reusable	 blocks	 of  the  form	/*!rules:re2c[:<name>]	 ...   */   or
       %{rules[:<name>]	 ... %}	can be reused any number of times and combined
       with other re2c blocks. The <name> is optional. A rules	block  can  be
       used  in	a use block or directive. The code for a rules block is	gener-
       ated at every point of use.

       Use  blocks  are	 defined   with	  /*!use:re2c[:<name>]	 ...   */   or
       %{use[:<name>]  ...  %}.	The <name> is optional:	if it's	not specified,
       the associated rules block is the most recent one (whether named	or un-
       named).	A use block can	 add  named  definitions,  configurations  and
       rules of	its own.  An important use case	for use	blocks is a lexer that
       supports	 multiple input	encodings: the same rules block	is reused mul-
       tiple times with	encoding-specific configurations (see the example  be-
       low).

       In-block	 use  directive	!use:<name>; can be used from inside of	a re2c
       block. It merges	the referenced block <name> into the current  one.  If
       some of the merged rules	and configurations overlap with	the previously
       defined	ones,  conflicts  are  resolved	in the usual way: the earliest
       rule takes priority, and	latest configuration overrides preceding ones.
       The exceptions are the special rules *, $ and (in condition mode) <!>,
       for  which  a  block-local definition overrides any inherited ones. Use
       directive allows	one to combine different re2c blocks together  in  one
       block (see the example below).

       Named blocks and	in-block use directive were added in re2c version 2.2.
       Since  that  version reusable blocks are	allowed	by default (no special
       option is needed). Before version 2.2 reuse mode	was  enabled  with  -r
       --reusable  option.  Before  version  1.2  reusable blocks could	not be
       mixed with normal blocks.

   Example of a	!use directive
	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>

	  // This example shows	how to combine reusable	re2c blocks: two blocks
	  // ('colors' and 'fish') are merged into one.	The 'salmon' rule occurs
	  // in	both blocks; the 'fish'	block takes priority because it	is used
	  // earlier. Default rule * occurs in all three blocks; the local (not
	  // inherited)	definition takes priority.

	  typedef enum { COLOR,	FISH, DUNNO } What;

	  /*!rules:re2c:colors
	      *				   { assert(false); }
	      "red" | "salmon" | "magenta" { return COLOR; }
	  */

	  /*!rules:re2c:fish
	      *				   { assert(false); }
	      "haddock"	| "salmon" | "eel" { return FISH; }
	  */

	  static What lex(const	char *s) {
	      const char *YYCURSOR = s,	*YYMARKER;
	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:define:YYCTYPE =	char;

		  !use:fish;
		  !use:colors;
		  * { return DUNNO; }  // overrides inherited '*' rules
	      */
	  }

	  int main() {
	      assert(lex("salmon") == FISH);
	      assert(lex("what?") == DUNNO);
	      return 0;
	  }

   Example of a	/*!use:re2c ...	*/ block
	  // re2c $INPUT -o $OUTPUT --input-encoding utf8
	  #include <assert.h>
	  #include <stdint.h>

	  // This example supports multiple input encodings: UTF-8 and UTF-32.
	  // Both lexers are generated from the	same rules block, and the use
	  // blocks add	only encoding-specific configurations.
	  /*!rules:re2c
	      re2c:yyfill:enable = 0;

	      "∀x ∃y" { return 0; }
	      *	      {	return 1; }
	  */

	  static int lex_utf8(const uint8_t *s)	{
	      const uint8_t *YYCURSOR =	s, *YYMARKER;
	      /*!use:re2c
		  re2c:define:YYCTYPE =	uint8_t;
		  re2c:encoding:utf8 = 1;
	      */
	  }

	  static int lex_utf32(const uint32_t *s) {
	      const uint32_t *YYCURSOR = s, *YYMARKER;
	      /*!use:re2c
		  re2c:define:YYCTYPE =	uint32_t;
		  re2c:encoding:utf32 =	1;
	      */
	  }

	  int main() {
	      static const uint8_t s8[]	= // UTF-8
		  { 0xe2, 0x88,	0x80, 0x78, 0x20, 0xe2,	0x88, 0x83, 0x79 };

	      static const uint32_t s32[] = // UTF32
		  { 0x00002200,	0x00000078, 0x00000020,	0x00002203, 0x00000079 };

	      assert(lex_utf8(s8) == 0);
	      assert(lex_utf32(s32) == 0);
	      return 0;
	  }

   Submatch extraction
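       re2c has two options for submatch extraction: tags and capturing groups.
       Capturing groups come in two variants, captures and captvars, which
       differ only in how the generated code stores submatch results.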

       Tags   The first	option is to use standalone tags of the	form @stag  or
	      #mtag, where stag and mtag are arbitrary user-defined names.
	      Tags are enabled with -T --tags option or	re2c:tags = 1 configu-
	      ration. Semantically tags	are position markers: they can be  in-
	      serted  anywhere	in  a regular expression, and they bind	to the
	      corresponding position (or  multiple  positions)	in  the	 input
	      string.	S-tags	bind to	the last matching position, and	m-tags
	      bind to a	list of	positions (they	 may  be  used	in  repetition
	      subexpressions,  where a single position in a regular expression
	      corresponds to multiple positions	in the input string). All tags
	      should be	defined	by the user, either manually or	with the  help
	      of  svars	 and  mvars blocks. If there is	more than one way tags
	      can be matched against the input,	ambiguity  is  resolved	 using
	      leftmost greedy disambiguation strategy.

       Captures
	      The  second  option is to	use capturing groups. They are enabled
	      with --captures option or	re2c:captures =	1 configuration. There
	      are two flavours for different disambiguation policies:
	      --leftmost-captures (the default) uses the leftmost greedy
	      policy, and --posix-captures uses the POSIX longest-match
	      policy. In this mode
	      all  parenthesized  subexpressions  are	considered   capturing
	      groups,  and a bang can be used to mark non-capturing groups: (!
	      ... ). With --invert-captures option or re2c:invert-captures = 1
	      configuration the	meaning	of bang	is inverted.   The  number  of
	      groups  for  the	matching rule is stored	in a variable yynmatch
	      (the whole regular expression is group number  zero),  and  sub-
	      match  results  are  stored in yypmatch array. Both yynmatch and
	      yypmatch should be defined by the	user, and yypmatch  size  must
	      be at least [yynmatch * 2]. Use maxnmatch block to define
	      YYMAXNMATCH, a constant equal to the maximum value of yynmatch
	      among all rules.

       Captvars
	      Another  way to use capturing groups is the --captvars option or
	      re2c:captvars = 1 configuration. The only difference from
	      --captures is in the way the generated code stores submatch
	      results:
	      instead  of  yynmatch  and  yypmatch  re2c  generates  variables
	      yytl<k> and yytr<k> for k-th capturing group  (the  user	should
	      declare  these  using  an	 svars block). Captures	with variables
	      support  two  disambiguation  policies:  --leftmost-captvars  or
	      re2c:leftmost-captvars  =	 1 for leftmost	greedy policy (the de-
	      fault one) and --posix-captvars or re2c:posix-captvars for POSIX
	      longest-match policy.
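
       The yypmatch layout described under Captures can be pictured in plain
       C. The sketch below is a hand-written stand-in for a generated lexer,
       not re2c output, and the names MAXNMATCH, skip_digits and match_version
       are hypothetical. It assumes that the slots 2 * k and 2 * k + 1 of the
       array hold the start and the end of group k, and that an unmatched
       optional group is represented by NULL in both slots.

```c
#include <assert.h>
#include <stddef.h>

enum { MAXNMATCH = 4 }; /* stand-in for YYMAXNMATCH: groups 0..3 */

static const char *skip_digits(const char *p) {
    while (*p >= '0' && *p <= '9') ++p;
    return p;
}

/* Hand-written stand-in for a generated matcher of
   (num) "." (num) ("." num)? applied to a NUL-terminated string.
   On success fills pmatch and returns the number of groups,
   on failure returns 0. */
static size_t match_version(const char *s, const char *pmatch[2 * MAXNMATCH]) {
    const char *p = s, *q = skip_digits(p);
    if (q == p || *q != '.') return 0;
    pmatch[2] = p; pmatch[3] = q;       /* group 1: major */

    p = q + 1;
    q = skip_digits(p);
    if (q == p) return 0;
    pmatch[4] = p; pmatch[5] = q;       /* group 2: minor */

    pmatch[6] = pmatch[7] = NULL;       /* group 3 is optional */
    if (*q == '.') {
        const char *r = skip_digits(q + 1);
        if (r == q + 1) return 0;
        pmatch[6] = q; pmatch[7] = r;   /* group 3: "." and patch digits */
        q = r;
    }
    if (*q != '\0') return 0;

    pmatch[0] = s; pmatch[1] = q;       /* group 0: the whole expression */
    return MAXNMATCH;
}
```

       The array passed to match_version has 2 * MAXNMATCH elements, which
       mirrors the requirement that yypmatch holds at least yynmatch * 2
       entries.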

       Under the hood all these	options	translate into tags and	Tagged	Deter-
       ministic	 Finite	 Automata with Lookahead.  The core idea of TDFA is to
       minimize	the overhead on	 submatch  extraction.	 In  the  extreme,  if
       there are no tags or captures in a regular expression, TDFA is just an
       ordinary	DFA. If	the number of tags is moderate,	the overhead is	barely
       noticeable. The generated TDFA uses a number of tag variables which  do
       not  map	 directly to tags: a single variable may be used for different
       tags, and a tag may require multiple variables to hold all its possible
       values. Eventually ambiguity is resolved, and only one  final  variable
       per  tag	survives. Tag variables	should be defined using	stags or mtags
       blocks. If lexer	state is stored, tag variables should be part  of  it.
       They also need to be updated  by	YYFILL.

       S-tags support the following operations:

        save input position to	an s-tag: t = YYCURSOR with C pointer API or a
	 user-defined operation	YYSTAGP(t) with	generic	API

        save  default	value  to  an  s-tag: t	= NULL with C pointer API or a
	 user-defined operation	YYSTAGN(t) with	generic	API

        copy one s-tag	to another: t1 = t2

       M-tags support the following operations:

        append input position to an m-tag: a user-defined operation
	 YYMTAGP(t) with both default and generic API

        append	default	value to an m-tag: a user-defined operation YYMTAGN(t)
	 with both default and generic API

        copy one m-tag	to another: t1 = t2

       S-tags  can  be	implemented  as	 scalar	 values	(pointers or offsets).
       M-tags need a more complex representation, as they need to store	a  se-
       quence  of tag values. The most naive and inefficient representation of
       an m-tag	is a list (array, vector) of tag values; a more	efficient rep-
       resentation is to store all m-tags in a prefix-tree represented as  ar-
       ray  of nodes (v, p), where v is	tag value and p	is a pointer to	parent
       node.
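
       The prefix-tree representation can be sketched in a few lines of plain
       C. This is an illustration only: the names TagTrie, trie_append and
       trie_unwind and the fixed capacity TRIE_CAP are hypothetical, and tag
       values are stored as integer offsets rather than pointers. An m-tag is
       a single index into the node array, so copying one m-tag to another is
       an integer assignment, and two histories that share a prefix share the
       corresponding nodes.

```c
#include <assert.h>
#include <stddef.h>

enum { TRIE_ROOT = -1, TRIE_CAP = 64 };

typedef struct {
    int value;  /* v: tag value (an offset into the input) */
    int parent; /* p: index of the parent node, or TRIE_ROOT */
} TrieNode;

typedef struct {
    TrieNode nodes[TRIE_CAP];
    int size;
} TagTrie;

/* Append a value to a history and return the handle of the new history.
   The old handle stays valid and still denotes the shorter history. */
static int trie_append(TagTrie *t, int history, int value) {
    assert(t->size < TRIE_CAP);
    t->nodes[t->size].value = value;
    t->nodes[t->size].parent = history;
    return t->size++;
}

/* Write the values of a history into out[], oldest first.
   Returns the number of values. */
static int trie_unwind(const TagTrie *t, int history, int *out, int cap) {
    int n = 0;
    for (int i = history; i != TRIE_ROOT; i = t->nodes[i].parent) ++n;
    assert(n <= cap);
    int k = n;
    for (int i = history; i != TRIE_ROOT; i = t->nodes[i].parent) {
        out[--k] = t->nodes[i].value;
    }
    return n;
}
```

       Appending to a copy of an m-tag does not disturb the original history,
       because nodes are never modified after they are created.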

       Here is a simple	example	of using s-tags	 to  parse  semantic  versions
       consisting of three numeric components: major, minor, patch (the	latter
       is optional).  See below	for a more complex example that	uses YYFILL.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>
	  #include <stddef.h>

	  typedef struct { int major, minor, patch; } SemVer;

	  static int s2n(const char *s,	const char *e) { // pre-parsed string to number
	      int n = 0;
	      for (; s < e; ++s) n = n * 10 + (*s - '0');
	      return n;
	  }

	  static int lex(const char *str, SemVer *ver) {
	      const char *YYCURSOR = str, *YYMARKER;

	      // User-defined tag variables that are available in semantic action.
	      const char *t1, *t2, *t3,	*t4, *t5;

	      // Autogenerated tag variables used by the lexer to track	tag values.
	      /*!stags:re2c format = 'const char *@@;\n'; */

	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:define:YYCTYPE =	char;
		  re2c:tags = 1;

		  num =	[0-9]+;

		  @t1 num @t2 "." @t3 num @t4 ("." @t5 num)? [\x00] {
		      ver->major = s2n(t1, t2);
		      ver->minor = s2n(t3, t4);
		      ver->patch = t5 != NULL ?	s2n(t5,	YYCURSOR - 1) :	0;
		      return 0;
		  }
		  * { return 1;	}
	      */
	  }

	  int main() {
	      SemVer v;
	      assert(lex("23.34", &v) == 0 && v.major == 23 && v.minor == 34 &&	v.patch	== 0);
	      assert(lex("1.2.999", &v)	== 0 &&	v.major	== 1 &&	v.minor	== 2 &&	v.patch	== 999);
	      assert(lex("1.a",	&v) == 1);
	      return 0;
	  }

       Here  is	 a more	complex	example	of using s-tags	with YYFILL to parse a
       file with newline-separated semantic versions. Tag variables  are  part
       of  the	lexer  state, and they are adjusted in YYFILL like other input
       positions.  Note	that it	is necessary for s-tags	because	 their	values
       are invalidated after shifting buffer contents. It may not be necessary
       in  a  custom implementation where tag variables	store offsets relative
       to the start of the input string	rather than the	buffer,	which  may  be
       the case	with m-tags.

	  // re2c $INPUT -o $OUTPUT --tags
	  #include <assert.h>
	  #include <stddef.h>
	  #include <stdio.h>
	  #include <string.h>
	  #include <vector>

	  #define BUFSIZE 4095

	  struct Input {
	      FILE *file;
	      char buf[BUFSIZE + 1], *lim, *cur, *mar, *tok;
	      // Tag variables must be part of the lexer state passed to YYFILL.
	      // They don't correspond to tags and should be autogenerated by re2c.
	      /*!stags:re2c format = 'const char *@@;';	*/
	      bool eof;
	  };

	  struct SemVer	{ int major, minor, patch; };

	  static bool operator==(const SemVer &x, const	SemVer &y) {
	      return x.major ==	y.major	&& x.minor == y.minor && x.patch == y.patch;
	  }

	  static int s2n(const char *s,	const char *e) { // pre-parsed string to number
	      int n = 0;
	      for (; s < e; ++s) n = n * 10 + (*s - '0');
	      return n;
	  }

	  static int fill(Input	&in) {
	      if (in.eof) return 1;

	      const size_t shift = in.tok - in.buf;
	      const size_t used	= in.lim - in.tok;

	      // Error:	lexeme too long. In real life could reallocate a larger	buffer.
	      if (shift	< 1) return 2;

	      // Shift buffer contents (discard	everything up to the current token).
	      memmove(in.buf, in.tok, used);
	      in.lim -=	shift;
	      in.cur -=	shift;
	      in.mar -=	shift;
	      in.tok -=	shift;
	      // Tag variables need to be shifted like other input positions. The check
	      // for non-NULL is only needed if	some tags are nested inside of alternative
	      // or repetition,	so that	they can have NULL value.
	      /*!stags:re2c format = "if (in.@@) in.@@ -= shift;\n"; */

	      // Fill free space at the	end of buffer with new data from file.
	      in.lim +=	fread(in.lim, 1, BUFSIZE - used, in.file);
	      in.lim[0]	= 0;
	      in.eof = in.lim <	in.buf + BUFSIZE;
	      return 0;
	  }

	  static bool lex(Input	&in, std::vector<SemVer> &vers)	{
	      // User-defined local variables that store final tag values.
	      // They are different from tag variables autogenerated with `stags:re2c`,
	      // as they are set at the	end of match and used only in semantic actions.
	      const char *t1, *t2, *t3,	*t4;
	      for (;;) {
		  in.tok = in.cur;
	      /*!re2c
		  re2c:eof = 0;
		  re2c:api:style = free-form;
		  re2c:define:YYCTYPE =	char;
		  re2c:define:YYCURSOR = in.cur;
		  re2c:define:YYMARKER = in.mar;
		  re2c:define:YYLIMIT =	in.lim;
		  re2c:define:YYFILL = "fill(in) == 0";
		  re2c:tags:expression = "in.@@";

		  num =	[0-9]+;

		  num @t1 "." @t2 num @t3 ("." @t4 num)? [\n] {
		      int major	= s2n(in.tok, t1);
		      int minor	= s2n(t2, t3);
		      int patch	= t4 !=	NULL ? s2n(t4, in.cur -	1) : 0;
		      SemVer ver = {major, minor, patch};
		      vers.push_back(ver);
		      continue;
		  }
		  $ { return true; }
		  * { return false; }
	      */}
	  }

	  int main() {
	      const char *fname	= "input";
	      const SemVer semver = {1,	22, 333};
	      std::vector<SemVer> expect(BUFSIZE, semver), actual;

	      // Prepare input file (make sure it exceeds buffer size).
	      FILE *f =	fopen(fname, "w");
	      for (int i = 0; i	< BUFSIZE; ++i)	fprintf(f, "1.22.333\n");
	      fclose(f);

	      // Reopen	input file for reading.
	      f	= fopen(fname, "r");

	      // Initialize lexer state: all pointers are at the end of	buffer.
	      Input in;
	      in.file =	f;
	      in.cur = in.mar =	in.tok = in.lim	= in.buf + BUFSIZE;
	      /*!stags:re2c format = "in.@@ = in.lim;\n"; */
	      in.eof = false;
	      // Sentinel (at YYLIMIT pointer) is set to zero, which triggers YYFILL.
	      *in.lim =	0;

	      // Run the lexer and check results.
	      assert(lex(in, actual) &&	expect == actual);

	      // Cleanup: remove input file.
	      fclose(f);
	      remove(fname);
	      return 0;
	  }

       Here  is	 an  example  of using capturing groups	to parse semantic ver-
       sions.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>
	  #include <stddef.h>

	  typedef struct { int major, minor, patch; } SemVer;

	  static int s2n(const char *s,	const char *e) { // pre-parsed string to number
	      int n = 0;
	      for (; s < e; ++s) n = n * 10 + (*s - '0');
	      return n;
	  }

	  static int lex(const char *str, SemVer *ver) {
	      const char *YYCURSOR = str, *YYMARKER;

	      // Final tag variables available in semantic action.
	      /*!svars:re2c format = 'const char *@@;\n'; */

	      // Intermediate tag variables used by the	lexer (must be autogenerated).
	      /*!stags:re2c format = 'const char *@@;\n'; */

	      /*!re2c
		  re2c:yyfill:enable = 0;
		  re2c:define:YYCTYPE =	char;
		  re2c:captvars	= 1;

		  num =	[0-9]+;

		  (num)	"." (num) ("." num)? [\x00] {
		      (void) yytl0; (void) yytr0; // some variables are	unused
		      ver->major = s2n(yytl1, yytr1);
		      ver->minor = s2n(yytl2, yytr2);
		      ver->patch = yytl3 ? s2n(yytl3 + 1, yytr3) : 0;
		      return 0;
		  }
		  * { return 1;	}
	      */
	  }

	  int main() {
	      SemVer v;
	      assert(lex("23.34", &v) == 0 && v.major == 23 && v.minor == 34 &&	v.patch	== 0);
	      assert(lex("1.2.999", &v)	== 0 &&	v.major	== 1 &&	v.minor	== 2 &&	v.patch	== 999);
	      assert(lex("1.a",	&v) == 1);
	      return 0;
	  }

       Here is an example of using m-tags to parse a version with  a  variable
       number of components. Tag variables are stored in a trie.

	  // re2c $INPUT -o $OUTPUT
	  #include <assert.h>
	  #include <stddef.h>
	  #include <vector>

	  static const int MTAG_ROOT = -1;

	  // An	m-tag tree is a	way to store histories with an O(1) copy operation.
	  // Histories naturally form a	tree, as they have common start	and fork at some
	  // point. The	tree is	stored as an array of pairs (tag value,	link to	parent).
	  // An	m-tag is represented with a single link	in the tree (array index).
	  struct Mtag {
	      const char *elem;	// tag value
	      int pred;	// index of the	predecessor node or root
	  };
	  typedef std::vector<Mtag> MtagTrie;

	  typedef std::vector<int> Ver;	// unbounded number of version components

	  static int s2n(const char *s,	const char *e) { // pre-parsed string to number
	      int n = 0;
	      for (; s < e; ++s) n = n * 10 + (*s - '0');
	      return n;
	  }

	  // Append a single value to an m-tag history.
	  static void add_mtag(MtagTrie	&trie, int &mtag, const	char *value) {
	      Mtag m = {value, mtag};
	      mtag = (int)trie.size();
	      trie.push_back(m);
	  }

	  // Recursively unwind	tag histories and collect version components.
	  static void unfold(const MtagTrie &trie, int x, int y, Ver &ver) {
	      // Reached the root of the m-tag tree, stop recursion.
	      if (x == MTAG_ROOT && y == MTAG_ROOT) return;

	      // Tag histories must have equal length: neither index may be the
	      // root here, otherwise the lookups below would be out of bounds.
	      assert(x != MTAG_ROOT && y != MTAG_ROOT);

	      // Unwind history further.
	      unfold(trie, trie[x].pred, trie[y].pred, ver);

	      // Get tag values.
	      const char *ex = trie[x].elem, *ey = trie[y].elem;

	      if (ex !=	NULL &&	ey != NULL) {
		  // Both tags are valid pointers, extract component.
		  ver.push_back(s2n(ex,	ey));
	      }	else {
		  // Both tags are NULL	(this corresponds to zero repetitions).
		  assert(ex == NULL && ey == NULL);
	      }
	  }

	  static bool parse(const char *str, Ver &ver) {
	      const char *YYCURSOR = str, *YYMARKER;
	      MtagTrie mt;

	      // User-defined tag variables that are available in semantic action.
	      const char *t1, *t2;
	      int t3, t4;

	      // Autogenerated tag variables used by the lexer to track	tag values.
	      /*!stags:re2c format = 'const char *@@ = NULL;'; */
	      /*!mtags:re2c format = 'int @@ = MTAG_ROOT;'; */

	      /*!re2c
		  re2c:api:style = free-form;
		  re2c:define:YYCTYPE =	char;
		  re2c:define:YYSTAGP =	"@@ = YYCURSOR;";
		  re2c:define:YYSTAGN =	"@@ = NULL;";
		  re2c:define:YYMTAGP =	"add_mtag(mt, @@, YYCURSOR);";
		  re2c:define:YYMTAGN =	"add_mtag(mt, @@, NULL);";
		  re2c:yyfill:enable = 0;
		  re2c:tags = 1;

		  num =	[0-9]+;

		  @t1 num @t2 ("." #t3 num #t4)* [\x00]	{
		      ver.clear();
		      ver.push_back(s2n(t1, t2));
		      unfold(mt, t3, t4, ver);
		      return true;
		  }
		  * { return false; }
	      */
	  }

	  int main() {
	      Ver v;
	      assert(parse("1",	v) && v	== Ver({1}));
	      assert(parse("1.2.3.4.5.6.7", v) && v == Ver({1, 2, 3, 4,	5, 6, 7}));
	      assert(!parse("1.2.", v));
	      return 0;
	  }

   Encoding support
       It  is  necessary  to understand	the difference between code points and
       code units. A code point	is a numeric identifier	of a  symbol.  A  code
       unit is the smallest unit of storage in the encoded text. A single code
       point may be represented	with one or more code units. In	a fixed-length
       encoding	 all  code points are represented with the same	number of code
       units. In a variable-length encoding code  points  may  be  represented
       with  a	different  number of code units.  Note that the	"any" rule [^]
       matches any code	point, but not necessarily any code unit (the only way
       to match	any code unit regardless of the	encoding is the	 default  rule
       *).  The	generated lexer	works with a stream of code units: yych	stores
       a code unit, and	YYCTYPE	is the code unit type. Regular expressions, on
       the  other  hand, are specified in terms	of code	points.	When re2c com-
       piles regular expressions to automata it	translates code	points to code
       units. This is generally	not a simple mapping: in  variable-length  en-
       codings	a single code point range may get translated to	a complex code
       unit graph.  The	following encodings are	supported:

        ASCII (enabled	by default). It	is a fixed-length encoding  with  code
	 space [0-255] and 1-byte code points and code units.

        EBCDIC	 (enabled  with	 --ebcdic  or  re2c:encoding:ebcdic).  It is a
	 fixed-length encoding with code space [0-255] and 1-byte code	points
	 and code units.

        UCS2	(enabled   with	  --ucs2   or  re2c:encoding:ucs2).  It	 is  a
	 fixed-length encoding with code  space	 [0-0xFFFF]  and  2-byte  code
	 points	and code units.

        UTF8  (enabled	 with  --utf8  or  re2c:encoding:utf8).	 It is a vari-
	 able-length Unicode encoding. Code unit size is 1 byte.  Code	points
	 are represented with 1	-- 4 code units.

        UTF16	(enabled  with	--utf16	or re2c:encoding:utf16). It is a vari-
	 able-length Unicode encoding. Code unit size is 2 bytes. Code	points
	 are represented with 1	-- 2 code units.

        UTF32	 (enabled  with	 --utf32  or  re2c:encoding:utf32).  It	 is  a
	 fixed-length Unicode encoding with code space [0-0x10FFFF] and	4-byte
	 code points and code units.
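
       The difference between code points and code units can be made concrete
       with a small encoder. The sketch below is plain C written for
       illustration (it is not part of re2c, and the function names are
       hypothetical). It converts one code point to UTF-8 and to UTF-16 code
       units and returns how many units were needed. Rejecting surrogates is a
       choice made for this sketch and is not meant to describe any particular
       --encoding-policy setting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_surrogate(uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Encode one code point as UTF-8. Returns the number of 1-byte code units
   written (1 to 4), or 0 if cp is a surrogate or lies outside [0-0x10FFFF]. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (is_surrogate(cp) || cp > 0x10FFFF) return 0;
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode one code point as UTF-16. Returns the number of 2-byte code units
   written (1 or 2), or 0 if cp is a surrogate or lies outside [0-0x10FFFF]. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (is_surrogate(cp) || cp > 0x10FFFF) return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

       For instance, code point 0x2200 becomes the three UTF-8 code units
       0xe2 0x88 0x80 (the first three bytes of the UTF-8 test string in the
       use block example above), but a single UTF-16 or UTF-32 code unit.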

       Include file include/unicode_categories.re  provides  re2c  definitions
       for the standard	Unicode	categories.

       Option  --input-encoding	 specifies  source file	encoding, which	can be
       used to enable Unicode literals in  regular  expressions.  For  example
       --input-encoding	 utf8  tells  re2c that	the source file	is in UTF8 (it
       differs from --utf8 which sets input text encoding). Option
       --encoding-policy specifies the way re2c handles Unicode surrogates
       (code points in range [0xD800-0xDFFF]).

       Below is	an example of a	lexer for UTF8 encoded Unicode identifiers.

	  // re2c $INPUT -o $OUTPUT -8 --case-ranges -i
	  #include <assert.h>
	  #include <stdint.h>

	  /*!include:re2c "unicode_categories.re" */

	  static int lex(const char *s)	{
	      const char *YYCURSOR = s,	*YYMARKER;
	      /*!re2c
		  re2c:define:YYCTYPE =	'unsigned char';
		  re2c:yyfill:enable = 0;

		  // Simplified	"Unicode Identifier and	Pattern	Syntax"
		  // (see https://unicode.org/reports/tr31)
		  id_start    =	L | Nl | [$_];
		  id_continue =	id_start | Mn |	Mc | Nd	| Pc | [\u200D\u05F3];
		  identifier  =	id_start id_continue*;

		  identifier { return 0; }
		  *	     { return 1; }
	      */
	  }

	  int main() {
	      assert(lex("_") == 0);
	      return 0;
	  }

   Include files
       re2c allows one to include other files using a block of the form
       /*!include:re2c FILE */ or %{include FILE %}, or an in-block directive
       !include FILE ;, where FILE is a path to the file to be included. re2c
       looks for include files in the directory	of the including file  and  in
       include	locations,  which can be specified with	the -I option. Include
       blocks/directives in re2c work in the same way as C/C++ #include:  FILE
       contents	 are copy-pasted verbatim in place of the block/directive. In-
       clude files may have further includes of	their own. Use	--depfile  op-
       tion  to	 track build dependencies of the output	file on	include	files.
       re2c provides some predefined include files that	can be	found  in  the
       include/	 subdirectory  of the project. These files contain definitions
       that may	be useful to other projects (such as Unicode  categories)  and
       form something like a standard library for re2c.	Below is an example of
       using include files.

   Include file	1 (definitions.h)
	  typedef enum { OK, FAIL } Result;

	  /*!re2c
	      number = [1-9][0-9]*;
	  */

   Include file	2 (extra_rules.re.inc)
	  // floating-point numbers
	  frac	= [0-9]* "." [0-9]+ | [0-9]+ ".";
	  exp	= 'e' [+-]? [0-9]+;
	  float	= frac exp? | [0-9]+ exp;

	  float	{ return OK; }

   Input file
	  // re2c $INPUT -o $OUTPUT -i
	  #include <assert.h>
	  /*!include:re2c "definitions.h" */

	  Result lex(const char	*s) {
	      const char *YYCURSOR = s,	*YYMARKER;
	      /*!re2c
		  re2c:define:YYCTYPE =	char;
		  re2c:yyfill:enable = 0;

		  *	 { return FAIL;	}
		  number { return OK; }
		  !include "extra_rules.re.inc";
	      */
	  }

	  int main() {
	      assert(lex("123")	== OK);
	      assert(lex("123.4567") ==	OK);
	      return 0;
	  }

   Header files
       re2c  allows  one to generate header file from the input	.re file using
       --header	option or re2c:header configuration and	 block	pairs  of  the
       form /*!header:re2c:on*/	and /*!header:re2c:off*/, or %{header:on%} and
       %{header:off%}. The first block marks the beginning of header file, and
       the  second  block marks	the end	of it. Everything between these	blocks
       is processed by re2c, and the generated code is	written	 to  the  file
       specified  with --header	option or re2c:header configuration (or	stdout
       if neither option nor configuration is used). Autogenerated header file
       may be needed in	cases when re2c	is used	to generate definitions	  that
       must be visible from other translation units.

       Here is an example of generating a header file that contains definition
       of the lexer state with tag variables (the number of variables depends
       on the regular grammar and is unknown to the programmer).

   Input file
	  // re2c $INPUT -o $OUTPUT -i --header	lexer/state.h
	  #include <assert.h>
	  #include <stddef.h>
	  #include "lexer/state.h" // the header is generated by re2c

	  /*!header:re2c:on*/
	  typedef struct {
	      const char *str, *cur;
	      /*!stags:re2c format = "const char *@@;";	*/
	  } LexerState;
	  /*!header:re2c:off*/

	  long lex(LexerState* st) {
	      const char *t;
	      /*!re2c
		  re2c:header =	"lexer/state.h";
		  re2c:yyfill:enable = 0;
		  re2c:define:YYCTYPE =	char;
		  re2c:define:YYCURSOR = "st->cur";
		  re2c:tags = 1;
		  re2c:tags:expression = "st->@@";

		  [a]* @t [b]* { return	t - st->str; }
	      */
	  }

	  int main() {
	      const char *s = "ab";
	      LexerState st = {	s, s /*!stags:re2c format = ", NULL"; */ };
	      assert(lex(&st) == 1);
	      return 0;
	  }

   Header file
	  /* Generated by re2c */

	  typedef struct {
	      const char *str, *cur;
	      const char *yyt1;
	  } LexerState;

   Skeleton programs
       With the	-S, --skeleton option, re2c ignores all	non-re2c code and gen-
       erates a	self-contained C program that can be further compiled and exe-
       cuted.  The program consists of lexer code and  input  data.  For  each
       constructed  DFA	(block or condition) re2c generates a standalone lexer
       and two files: an .input	file with strings derived from the DFA	and  a
       .keys  file with	expected match results.	The program runs each lexer on
       the corresponding .input	file and compares results  with	 the  expecta-
       tions.  Skeleton	programs are very useful for a number of reasons:

       o They can check the correctness of various re2c optimizations (the
         data is generated early in the process, before any DFA
         transformations have taken place).

       o Generating a set of input data with good coverage may be useful for
         both testing and benchmarking.

       o Generating self-contained executable programs allows one to get
         minimized test cases (the original code may be large or have a lot
         of dependencies).
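
       The check that a skeleton program performs (run a lexer on each input
       string and compare the result with the expected one) can be pictured
       with the hand-written sketch below. It is not code generated by re2c:
       the lexer is a manual stand-in for the rule [1-9][0-9]*, and the
       in-memory table of cases is an assumption that stands in for the
       .input and .keys files, whose format is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for a generated lexer: returns 0 if the string
 * starts with a decimal number matching [1-9][0-9]*, and 1 otherwise. */
static int lex(const char *s) {
    if (*s < '1' || *s > '9') return 1;
    do { ++s; } while (*s >= '0' && *s <= '9');
    return 0;
}

/* One input string together with the expected match result. */
struct test_case { const char *input; int expected; };

/* Illustrative cases; a real skeleton derives its data from the DFA. */
static const struct test_case cases[] = {
    {"1234", 0}, {"7", 0}, {"0", 1}, {"x1", 1}, {"", 1}
};

/* Runs the lexer on every case and counts results that differ from
 * the expectation. */
size_t count_mismatches(const struct test_case *tc, size_t n) {
    size_t bad = 0;
    assert(tc != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (lex(tc[i].input) != tc[i].expected) ++bad;
    }
    return bad;
}
```

       A count of zero means the lexer agrees with every expectation; any
       other value points at a discrepancy between the lexer and the data.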

       The difficulty with generating input data is that, for all but the most
       trivial cases, the number of possible input strings is too large (even
       if the string length is limited). re2c solves this by generating enough
       strings to cover almost all DFA transitions, using the following
       algorithm. First, it constructs a skeleton of the DFA. For encodings
       with a 1-byte code unit size (such as ASCII, UTF-8 and EBCDIC) the
       skeleton is an exact copy of the original DFA. For encodings with
       multibyte code units the skeleton is a copy of the DFA with certain
       transitions omitted: re2c takes at most 256 code units for each
       disjoint continuous range that corresponds to a DFA transition. The
       chosen values are evenly distributed and include the range bounds.
       Instead of trying to cover all possible paths in the skeleton (which is
       infeasible), re2c generates enough paths to cover all skeleton
       transitions, and thus trigger the corresponding conditional jumps in
       the lexer. The algorithm implementation is limited by ~1Gb of
       transitions and consumes a constant amount of memory (re2c writes data
       to the file as soon as it is generated).
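
       The range sampling step can be sketched as follows. This is an
       illustration under stated assumptions, not re2c's actual code: the
       function name sample_range, the constant MAX_SAMPLES and the rounding
       used to spread the values are all made up for the example. Only the
       properties named above are taken from the text: at most 256 values
       per range, evenly distributed, bounds included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_SAMPLES = 256 };

/* Picks at most MAX_SAMPLES code units from the closed range [lo, hi],
 * always including both bounds. Stores them in out in increasing order
 * and returns how many were stored. */
size_t sample_range(uint32_t lo, uint32_t hi, uint32_t out[MAX_SAMPLES]) {
    assert(lo <= hi);
    uint64_t width = (uint64_t)hi - lo; /* distance between the bounds */

    if (width < MAX_SAMPLES) {
        /* Small range: every code unit fits, so take them all. */
        size_t n = (size_t)width + 1;
        for (size_t i = 0; i < n; ++i) out[i] = lo + (uint32_t)i;
        return n;
    }

    /* Large range: MAX_SAMPLES evenly spaced points, where the first
     * is lo and the last is hi (integer division rounds down). */
    for (size_t i = 0; i < MAX_SAMPLES; ++i) {
        out[i] = lo + (uint32_t)(width * i / (MAX_SAMPLES - 1));
    }
    return MAX_SAMPLES;
}
```

       For example, the range 0x61 to 0x7a is small enough to be taken whole
       (26 values), while the range 0 to 0xFFFF is reduced to 256 values
       that start at 0 and end at 0xFFFF.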

   Visualization and debug
       With  the  -D, --emit-dot option, re2c does not generate	code. Instead,
       it dumps	the generated DFA in DOT format.  One can convert this dump to
       an image	of the DFA using Graphviz or another library.  Note that  this
       option  shows the final DFA after it has	gone through a number of opti-
       mizations and transformations. Earlier stages can be dumped with	 vari-
       ous  debug  options,  such  as --dump-nfa, --dump-dfa-raw etc. (see the
       full list of options).

SEE ALSO
       You can find more information about re2c at the official website:
       http://re2c.org. Similar programs are flex(1), lex(1) and quex
       (http://quex.sourceforge.net).

AUTHORS
       re2c was originally written by Peter Bumbulis (peter@csg.uwaterloo.ca)
       in 1993. Marcus Boerger and Dan Nuffer spent several years turning the
       original idea into a production-ready code generator. Since then it has
       been maintained and developed by multiple volunteers, most notably
       Brian Young (bayoung@acm.org), Marcus Boerger, Dan Nuffer
       (nuffer@users.sourceforge.net), Ulya Trofimovich (skvadrik@gmail.com),
       Serghei Iakovlev, Sergei Trofimovich, Petr Skocik, ligfx, raekye and
       PolarGoose.

