Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
PCRESTACK(3)		   Library Functions Manual		  PCRESTACK(3)

NAME
       PCRE - Perl-compatible regular expressions

PCRE DISCUSSION	OF STACK USAGE
       When  you call pcre[16|32]_exec(), it makes use of an internal function
       called match(). This calls itself recursively at	branch points  in  the
       pattern,	 in  order  to	remember the state of the match	so that	it can
       back up and try a different alternative if  the	first  one  fails.  As
       matching	proceeds deeper	and deeper into	the tree of possibilities, the
       recursion depth increases. The match() function is also called in other
       circumstances, for example, whenever a parenthesized sub-pattern	is en-
       tered, and in certain cases of repetition.

       Not all calls of	match()	increase the recursion depth; for an item such
       as  a* it may be	called several times at	the same level,	after matching
       different numbers of a's. Furthermore, in a number of cases  where  the
       result  of  the	recursive call would immediately be passed back	as the
       result of the current call (a "tail recursion"),	the function  is  just
       restarted instead.

       The  above  comments apply when pcre[16|32]_exec() is run in its	normal
       interpretive  manner.   If   the	  pattern   was	  studied   with   the
       PCRE_STUDY_JIT_COMPILE  option, and just-in-time	compiling was success-
       ful, and	the options passed to pcre[16|32]_exec() were  not  incompati-
       ble,  the  matching  process  uses the JIT-compiled code	instead	of the
       match() function. In this case, the memory requirements are handled en-
       tirely differently. See the pcrejit documentation for details.

       The pcre[16|32]_dfa_exec() function operates in an  entirely  different
       way,  and uses recursion	only when there	is a regular expression	recur-
       sion or subroutine call in the pattern. This includes the processing of
       assertion and "once-only" subpatterns, which are	handled	 like  subrou-
       tine  calls.  Normally, these are never very deep, and the limit	on the
       complexity of pcre[16|32]_dfa_exec() is controlled  by  the  amount  of
       workspace  it is	given.	However, it is possible	to write patterns with
       runaway	  infinite    recursions;    such    patterns	 will	 cause
       pcre[16|32]_dfa_exec()  to  run	out  of	stack. At present, there is no
       protection against this.

       The comments that follow	do NOT apply to	 pcre[16|32]_dfa_exec();  they
       are relevant only for pcre[16|32]_exec()	without	the JIT	optimization.

   Reducing pcre[16|32]_exec()'s stack usage
       Each  time  that	match()	is actually called recursively,	it uses	memory
       from the	process	stack. For certain kinds of  pattern  and  data,  very
       large  amounts of stack may be needed, despite the recognition of "tail
       recursion".  You	can often reduce the amount of recursion,  and	there-
       fore  the  amount of stack used,	by modifying the pattern that is being
       matched.	Consider, for example, this pattern:

	 ([^<]|<(?!inet))+

       It matches from wherever	it starts until	it encounters "<inet"  or  the
       end  of	the  data,  and	is the kind of pattern that might be used when
       processing an XML file. Each iteration of the outer parentheses matches
       either one character that is not	"<" or a "<" that is not  followed  by
       "inet".	However, each time a parenthesis is processed, a recursion oc-
       curs, so	this formulation uses a	stack frame for	each  matched  charac-
       ter.  For  a long string, a lot of stack	is required. Consider now this
       rewritten pattern, which	matches	exactly	the same strings:

	 ([^<]++|<(?!inet))+

       This uses very much less	stack, because runs of characters that do  not
       contain	"<" are	"swallowed" in one item	inside the parentheses.	Recur-
       sion happens only when a	"<" character that is not followed  by	"inet"
       is  encountered	(and  we assume	this is	relatively rare). A possessive
       quantifier is used to stop any backtracking into	the  runs  of  non-"<"
       characters, but that is not related to stack usage.

       This  example shows that	one way	of avoiding stack problems when	match-
       ing long	subject	strings	is to write repeated parenthesized subpatterns
       to match	more than one character	whenever possible.

   Compiling PCRE to use heap instead of stack for pcre[16|32]_exec()
       In environments where stack memory is constrained, you  might  want  to
       compile	PCRE to	use heap memory	instead	of stack for remembering back-
       up points when pcre[16|32]_exec() is running. This makes	it run	a  lot
       more slowly, however.  Details of how to	do this	are given in the pcre-
       build  documentation.  When  built  in  this  way, instead of using the
       stack, PCRE obtains and frees memory by calling the functions that  are
       pointed	to  by the pcre[16|32]_stack_malloc and	pcre[16|32]_stack_free
       variables. By default, these point to malloc() and free(), but you  can
       replace the pointers to cause PCRE to use your own functions. Since the
       block sizes are always the same,	and are	always freed in	reverse	order,
       it  may	be  possible  to implement customized memory handlers that are
       more efficient than the standard	functions.

   Limiting pcre[16|32]_exec()'s stack usage
       You can set limits on the number	of times that match() is called,  both
       in  total  and  recursively. If a limit is exceeded, pcre[16|32]_exec()
       returns an error	code. Setting suitable limits should prevent  it  from
       running	out of stack. The default values of the	limits are very	large,
       and unlikely ever to operate. They can be changed when PCRE  is	built,
       and they	can also be set	when pcre[16|32]_exec()	is called. For details
       of these	interfaces, see	the pcrebuild documentation and	the section on
       extra data for pcre[16|32]_exec() in the	pcreapi	documentation.

       As a very rough rule of thumb, you should reckon	on about 500 bytes per
       recursion.  Thus,  if  you  want	 to limit your stack usage to 8Mb, you
       should set the limit at 16000 recursions. A 64Mb	stack,	on  the	 other
       hand, can support around	128000 recursions.

       In Unix-like environments, the pcretest test program has	a command line
       option (-S) that	can be used to increase	the size of its	stack. As long
       as  the	stack is large enough, another option (-M) can be used to find
       the smallest limits that	allow a	particular pattern to  match  a	 given
       subject	string.	 This is done by calling pcre[16|32]_exec() repeatedly
       with different limits.

   Obtaining an	estimate of stack usage
       The actual amount of stack used per recursion can vary quite a lot, de-
       pending on the compiler that was	used to	build PCRE and	the  optimiza-
       tion or debugging options that were set for it. The rule	of thumb value
       of  500 bytes mentioned above may be larger or smaller than what	is ac-
       tually needed. A	better approximation can be obtained by	 running  this
       command:

	 pcretest -m -C

       The  -C	option causes pcretest to output information about the options
       with which PCRE was compiled. When -m is	also given (before -C),	infor-
       mation about stack use is given in a line like this:

	 Match recursion uses stack: approximate frame size = 640 bytes

       The value is approximate	because	some recursions	need a bit more	(up to
       perhaps 16 more bytes).

       If the above command is given when PCRE is compiled to use the heap in-
       stead of	the stack for recursion, the value that	is output is the  size
       of each block that is obtained from the heap.

   Changing stack size in Unix-like systems
       In  Unix-like environments, there is not	often a	problem	with the stack
       unless very long	strings	are involved,  though  the  default  limit  on
       stack  size  varies  from system	to system. Values from 8Mb to 64Mb are
       common. You can find your default limit by running the command:

	 ulimit	-s

       Unfortunately, the effect of running out	of  stack  is  often  SIGSEGV,
       though  sometimes  a more explicit error	message	is given. You can nor-
       mally increase the limit	on stack size by code such as this:

	 struct	rlimit rlim;
	 getrlimit(RLIMIT_STACK, &rlim);
	 rlim.rlim_cur = 100*1024*1024;
	 setrlimit(RLIMIT_STACK, &rlim);

       This reads the current limits (soft and hard) using  getrlimit(),  then
       attempts	 to  increase  the  soft limit to 100Mb	using setrlimit(). You
       must do this before calling pcre[16|32]_exec().

   Changing stack size in Mac OS X
       Using setrlimit(), as described above, should also work on Mac OS X. It
       is also possible	to set a stack size when linking a program. There is a
       discussion about	stack sizes in Mac OS X	at this	web  site:  http://de-
       veloper.apple.com/qa/qa2005/qa1419.html.

AUTHOR
       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION
       Last updated: 24	June 2012
       Copyright (c) 1997-2012 University of Cambridge.

PCRE 8.30			 24 June 2012			  PCRESTACK(3)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=pcrestack&sektion=3&manpath=FreeBSD+Ports+14.3.quarterly>

home | help