Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
PCRE2SERIALIZE(3)	   Library Functions Manual	     PCRE2SERIALIZE(3)

NAME
       PCRE2 - Perl-compatible regular expressions (revised API)

SAVING AND RE-USING PRECOMPILED	PCRE2 PATTERNS

       int32_t pcre2_serialize_decode(pcre2_code **codes,
	 int32_t number_of_codes, const	uint8_t	*bytes,
	 pcre2_general_context *gcontext);

       int32_t pcre2_serialize_encode(const pcre2_code **codes,
	 int32_t number_of_codes, uint8_t **serialized_bytes,
	 PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);

       void pcre2_serialize_free(uint8_t *bytes);

       int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);

       If  you	are running an application that	uses a large number of regular
       expression patterns, it may be useful to	store them  in	a  precompiled
       form  instead  of  having to compile them every time the	application is
       run. However, if	you are	using the just-in-time	optimization  feature,
       it is not possible to save and reload the JIT data, because it is posi-
       tion-dependent.	The  host  on  which the patterns are reloaded must be
       running the same	version	of PCRE2, with the same	code unit  width,  and
       must  also have the same	endianness, pointer width and PCRE2_SIZE type.
       For example, patterns compiled on a 32-bit system using PCRE2's	16-bit
       library cannot be reloaded on a 64-bit system, nor can they be reloaded
       using the 8-bit library.

       Note  that  "serialization" in PCRE2 does not convert compiled patterns
       to an abstract format like Java or .NET serialization.  The  serialized
       output  is really just a	bytecode dump, which is	why it can only	be re-
       loaded in the same environment as the one that created  it.  Hence  the
       restrictions  mentioned	above.	 Applications  that are	not statically
       linked with a fixed version of PCRE2 must be prepared to	recompile pat-
       terns from their	sources, in order to be	immune to PCRE2	upgrades.

SECURITY CONCERNS

       The facility for	saving and restoring compiled patterns is intended for
       use within individual applications.  As	such,  the  data  supplied  to
       pcre2_serialize_decode()	 is expected to	be trusted data, not data from
       arbitrary external sources.  There  is  only  some  simple  consistency
       checking, not complete validation of what is being re-loaded. Corrupted
       data may	cause undefined	results. For example, if the length field of a
       pattern in the serialized data is corrupted, the	deserializing code may
       read beyond the end of the byte stream that is passed to	it.

SAVING COMPILED	PATTERNS

       Before compiled patterns	can be saved they must be serialized, which in
       PCRE2  means converting the pattern to a	stream of bytes. A single byte
       stream may contain any number of	compiled patterns, but they  must  all
       use  the	same character tables. A single	copy of	the tables is included
       in the byte stream (its size is 1088 bytes). For	more details of	 char-
       acter  tables,  see the section on locale support in the	pcre2api docu-
       mentation.

       The function pcre2_serialize_encode() creates a serialized byte	stream
       from  a	list of	compiled patterns. Its first two arguments specify the
       list, being a pointer to	a vector of pointers to	compiled patterns, and
       the length of the vector. The third and fourth arguments	point to vari-
       ables which are set to point to the created byte	stream and its length,
       respectively. The final argument	is a pointer  to  a  general  context,
       which  can  be  used  to	specify	custom memory management functions. If
       this argument is	NULL, malloc() is used to obtain memory	for  the  byte
       stream. The yield of the	function is the	number of serialized patterns,
       or one of the following negative	error codes:

	 PCRE2_ERROR_BADDATA	  the number of	patterns is zero or less
	 PCRE2_ERROR_BADMAGIC	  mismatch of id bytes in one of the patterns
	 PCRE2_ERROR_NOMEMORY	  memory allocation failed
	 PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
	 PCRE2_ERROR_NULL	  the 1st, 3rd,	or 4th argument	is NULL

       PCRE2_ERROR_BADMAGIC  means  either that	a pattern's code has been cor-
       rupted, or that a slot in the vector does not point to a	compiled  pat-
       tern.

       Once a set of patterns has been serialized you can save the data	in any
       appropriate  manner. Here is sample code	that compiles two patterns and
       writes them to a	file. It assumes that the variable fd refers to	a file
       that is open for	output.	The error checking that	should be present in a
       real application	has been omitted for simplicity.

	 int errorcode;
	 uint8_t *bytes;
	 PCRE2_SIZE erroroffset;
	 PCRE2_SIZE bytescount;
	 pcre2_code *list_of_codes[2];
	 list_of_codes[0] = pcre2_compile("first pattern",
	   PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
	 list_of_codes[1] = pcre2_compile("second pattern",
	   PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
	 errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
	   &bytescount,	NULL);
	 errorcode = fwrite(bytes, 1, bytescount, fd);

       Note that the serialized	data is	binary data that may  contain  any  of
       the  256	 possible  byte	values.	On systems that	make a distinction be-
       tween binary and	non-binary data, be sure that the file is  opened  for
       binary output.

       Serializing  a  set  of patterns	leaves the original data untouched, so
       they can	still be used for matching. Their memory  must	eventually  be
       freed in	the usual way by calling pcre2_code_free(). When you have fin-
       ished with the byte stream, it too must be freed	by calling pcre2_seri-
       alize_free().  If  this function	is called with a NULL argument,	it re-
       turns immediately without doing anything.

RE-USING PRECOMPILED PATTERNS

       In order	to re-use a set	of saved patterns you must first make the  se-
       rialized	 byte stream available in main memory (for example, by reading
       from a file). The management of this memory block is up to the applica-
       tion. You can use the pcre2_serialize_get_number_of_codes() function to
       find out	how many compiled patterns are in the serialized data  without
       actually	decoding the patterns:

	 uint8_t *bytes	= <serialized data>;
	 int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);

       The pcre2_serialize_decode() function reads a byte stream and recreates
       the compiled patterns in	new memory blocks, setting pointers to them in
       a  vector.  The	first two arguments are	a pointer to a suitable	vector
       and its length, and the third argument points to	a byte stream. The fi-
       nal argument is a pointer to a general context, which can  be  used  to
       specify custom memory management	functions for the decoded patterns. If
       this argument is	NULL, malloc() and free() are used. After deserializa-
       tion, the byte stream is	no longer needed and can be discarded.

	 pcre2_code *list_of_codes[2];
	 uint8_t *bytes	= <serialized data>;
	 int32_t number_of_codes =
	   pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);

       If  the	vector	is  not	 large enough for all the patterns in the byte
       stream, it is filled with those that fit, and  the  remainder  are  ig-
       nored.  The yield of the	function is the	number of decoded patterns, or
       one of the following negative error codes:

	 PCRE2_ERROR_BADDATA	second argument	is zero	or less
	 PCRE2_ERROR_BADMAGIC	mismatch of id bytes in	the data
	 PCRE2_ERROR_BADMODE	mismatch of code unit size or PCRE2 version
	 PCRE2_ERROR_BADSERIALIZEDDATA	other sanity check failure
	 PCRE2_ERROR_MEMORY	memory allocation failed
	 PCRE2_ERROR_NULL	first or third argument	is NULL

       PCRE2_ERROR_BADMAGIC may	mean that the data is corrupt, or that it  was
       compiled	on a system with different endianness.

       Decoded patterns	can be used for	matching in the	usual way, and must be
       freed  by  calling pcre2_code_free(). However, be aware that there is a
       potential race issue if you are using multiple patterns that  were  de-
       coded  from a single byte stream	in a multithreaded application.	A sin-
       gle copy	of the character tables	is used	by all	the  decoded  patterns
       and a reference count is	used to	arrange	for its	memory to be automati-
       cally  freed when the last pattern is freed, but	there is no locking on
       this reference count. Therefore,	if you want to call  pcre2_code_free()
       for  these  patterns  in	 different  threads, you must arrange your own
       locking,	and ensure that	pcre2_code_free()  cannot  be  called  by  two
       threads at the same time.

       If  a pattern was processed by pcre2_jit_compile() before being serial-
       ized, the JIT data is discarded and so is no longer available  after  a
       save/restore  cycle.  You can, however, process a restored pattern with
       pcre2_jit_compile() if you wish.

AUTHOR

       Philip Hazel
       Retired from University Computing Service
       Cambridge, England.

REVISION

       Last updated: 19	January	2024
       Copyright (c) 1997-2018 University of Cambridge.

PCRE2 10.45			19 January 2024		     PCRE2SERIALIZE(3)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=pcre2serialize&sektion=3&manpath=FreeBSD+Ports+14.3.quarterly>

home | help