str(3)				String Library				str(3)

NAME
       OSSP str	- String Handling

VERSION
       OSSP str	0.9.12 (12-Oct-2005)

SYNOPSIS
       str_len,	  str_copy,   str_dup,	str_concat,  str_splice,  str_compare,
       str_span,  str_locate,  str_token,  str_parse,  str_format,   str_hash,
       str_base64.

DESCRIPTION
       OSSP str is a generic string library written in ISO C which provides
       functions for handling, matching, parsing, searching and formatting
       ISO C strings. It can be considered a superset of POSIX string(3),
       but its main intention is to provide a more convenient and compact
       API with more generalized functionality.

FUNCTIONS
       The following functions are provided by the OSSP	str API:

       str_size_t str_len(const	char *s);
           This function determines the length of string s, i.e., the number
           of characters starting at s that precede the terminating "NUL"
           character. It returns 0 if s is "NULL".

       char *str_copy(char *s, const char *t, size_t n);
           This copies the characters in string t into the string s, but never
           more than n characters (if n is greater than 0). The two involved
           strings can overlap and the characters in s are always
           "NUL"-terminated. The string s has to be large enough to hold all
           characters to be copied. The function returns "NULL" if s or t is
           "NULL". Otherwise it returns a pointer to the terminating "NUL"
           character written to s.
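
           The copy semantics described above can be sketched in plain ISO C.
           This is only an illustration, not the OSSP str implementation: the
           name bounded_copy is hypothetical, and using memmove(3) to tolerate
           overlapping buffers is an assumption about one reasonable approach.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of OSSP str: copy at most n characters of
 * src into dst (all of src if n is 0), NUL-terminate dst, and return a
 * pointer to the written NUL. memmove() tolerates overlapping buffers. */
static char *bounded_copy(char *dst, const char *src, size_t n)
{
    size_t len;

    if (dst == NULL || src == NULL)
        return NULL;
    len = strlen(src);
    if (n > 0 && len > n)
        len = n;
    memmove(dst, src, len);
    dst[len] = '\0';
    return dst + len;
}
```

           Returning the position of the terminating "NUL" lets a caller
           append further text without rescanning the destination.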

       char *str_dup(const char	*s, str_size_t n);
	   This	 returns  a copy of the	characters in string s,	but never more
	   than	n characters if	n is greater than 0. It	returns	"NULL" if s is
	   "NULL". The returned	 string	 has  to  be  deallocated  later  with
	   free(3).

       char *str_concat(char *s, ...);
           This function concatenates the characters of all string arguments
           into a newly allocated string and returns this new string. If s is
           "NULL" the function returns "NULL". The returned string has to be
           deallocated later with free(3).

       char *str_splice(char *s, str_size_t off, str_size_t n, char *t,
       str_size_t m);
           This splices the string t into string s, i.e., the n characters at
           offset off in s are removed and the string t is inserted at their
           location (or just the first m characters of t if m is greater than
           0). It returns "NULL" if s or t is "NULL". Otherwise the string s
           is returned. The function also supports the situation where t is a
           sub-string of s as long as the areas s+off...s+off+n and t...t+m do
           not overlap. The caller always has to make sure that enough room
           exists in s.
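
           A minimal sketch of the splice operation in plain ISO C follows.
           The helper name splice is hypothetical, and unlike the function
           described above this sketch assumes that t does not point into s.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replace the n characters at offset off in s with
 * the first m characters of t (all of t if m is 0). The caller must make
 * sure s has room for the result; t must not point into s here. */
static char *splice(char *s, size_t off, size_t n, const char *t, size_t m)
{
    size_t slen, tlen;

    if (s == NULL || t == NULL)
        return NULL;
    slen = strlen(s);
    tlen = strlen(t);
    if (m > 0 && m < tlen)
        tlen = m;
    if (off > slen)
        off = slen;
    if (n > slen - off)
        n = slen - off;
    /* Shift the tail (including its NUL) to its final position first. */
    memmove(s + off + tlen, s + off + n, slen - off - n + 1);
    memcpy(s + off, t, tlen);
    return s;
}
```

           Moving the tail before copying t in keeps the operation correct
           whether the string grows or shrinks.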

       int str_compare(const char *s, const char *t, str_size_t	n, int mode);
           This performs a lexicographical comparison of the two strings s and
           t (but never compares more than n characters of them) and returns
           one of three values: a value lower than 0 if s is
           lexicographically lower than t, exactly 0 if s and t are equal, and
           a value greater than 0 if s is lexicographically higher than t.
           By default (mode is 0) the comparison is case-sensitive, but if
           "STR_NOCASE" is used for mode the comparison is done in a
           case-insensitive way.
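
           The three-way result and the optional case folding can be
           illustrated with a small ISO C sketch. The names bounded_compare
           and CMP_NOCASE are hypothetical stand-ins, and treating n equal to
           0 as "no limit" is an assumption borrowed from the other functions
           of this API, not something stated above.

```c
#include <ctype.h>
#include <stddef.h>

#define CMP_NOCASE 1  /* hypothetical flag standing in for STR_NOCASE */

/* Hypothetical helper: three-way comparison of at most n characters
 * (all characters if n is 0), optionally folding case. */
static int bounded_compare(const char *s, const char *t, size_t n, int mode)
{
    size_t i;

    for (i = 0; n == 0 || i < n; i++) {
        unsigned char a = (unsigned char)s[i];
        unsigned char b = (unsigned char)t[i];

        if (mode & CMP_NOCASE) {
            a = (unsigned char)tolower(a);
            b = (unsigned char)tolower(b);
        }
        if (a != b)
            return a < b ? -1 : 1;
        if (a == '\0')
            return 0;       /* both strings ended together */
    }
    return 0;               /* the first n characters are equal */
}
```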

       char *str_span(const char *s, size_t n, const char *charset, int	mode);
           This function spans a string s according to the characters
           specified in charset. If mode is 0, s is spanned from left to right
           starting at s (ending either at the terminating "NUL" character or
           after n spanned characters) as long as the characters of s are
           contained in charset.

	   Alternatively one can use a mode of	"STR_COMPLEMENT"  to  indicate
	   that	 s is spanned as long as the characters	of s are not contained
	   in charset, i.e., charset then  specifies  the  complement  of  the
	   spanning characters.

           In both cases one can additionally "or" (with the C operator "|")
           "STR_RIGHT" into mode to indicate that the spanning is done right
           to left starting at the terminating "NUL" character of s (ending
           either when reaching s or after n spanned characters).
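
           The left-to-right modes correspond closely to the ISO C functions
           strspn(3) and strcspn(3). The sketch below uses those standard
           functions instead of OSSP str. The helper name span_left is
           hypothetical, the n limit and "STR_RIGHT" are left out, and
           returning a pointer to the first character that ends the span is
           an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first character of s that
 * stops a left-to-right span. If complement is zero, the span covers
 * characters that are in charset; otherwise characters that are not. */
static const char *span_left(const char *s, const char *charset,
                             int complement)
{
    size_t len = complement ? strcspn(s, charset) : strspn(s, charset);

    return s + len;
}
```

           A typical use is skipping leading whitespace (complement 0) or
           finding the end of a key in a key=value pair (complement 1).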

       char *str_locate(const char *s, str_size_t n, const char	*t);
           This function searches for the (smaller) string t inside the
           (larger) string s. If n is not 0, the search is performed only
           inside the first n characters of s.
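
           A bounded substring search of this kind can be sketched in ISO C
           as follows. The helper name bounded_locate is hypothetical, and
           returning "NULL" when t is not found is an assumption, since the
           return value is not specified above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find t within the first n characters of s
 * (the whole of s if n is 0). Returns NULL when t does not occur. */
static const char *bounded_locate(const char *s, size_t n, const char *t)
{
    size_t slen = strlen(s);
    size_t tlen = strlen(t);
    size_t i;

    if (n > 0 && n < slen)
        slen = n;                 /* restrict the searched area */
    if (tlen > slen)
        return NULL;
    for (i = 0; i + tlen <= slen; i++)
        if (memcmp(s + i, t, tlen) == 0)
            return s + i;
    return NULL;
}
```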

       char *str_token(char **s, const char *delim, const char *quote, const
       char *comment, int mode);
	   This	 function  considers  the string s to consist of a sequence of
	   zero	or more	text tokens separated by spans of one or more  charac-
	   ters	from the separator string delim. However, text between matched
	   pairs of quotemarks (characters in quote) is	treated	as plain text,
	   never as delimiter (separator) text.	Each call of this function re-
	   turns a pointer to the first	character of the first token of	s. The
	   token is "NUL"-terminated, i.e., the	string s is processed in a de-
	   structive  way.  If	there are quotation marks or escape sequences,
	   the input string is rewritten with quoted sections and  escape  se-
	   quences properly interpreted.

           This function keeps track of its parsing position in the string
           between separate calls by simply adjusting the caller's s pointer,
           so that subsequent calls with the same pointer variable s will
           start processing from the position immediately after the last
           returned token. In this way subsequent calls work through the
           string s until no tokens remain. When no token remains in s,
           "NULL" is returned. The string of token separators (delim) and the
           string of quote characters (quote) may be changed from call to
           call.

           If a character in the string s is not quoted or escaped, and is in
           the comment set, then it is overwritten with a "NUL" character and
           the rest of the string is ignored. The characters to be used as
           quote characters are specified in the quote set, and must be used
           in balanced pairs. If there is more than one flavor of quote
           character, one kind of quote character may be used to quote another
           kind. If an unbalanced quote is found, the function silently acts
           as if one had been placed at the end of the input string. The delim
           and quote strings must be disjoint, i.e., they must share no
           characters.

           The mode argument can be used to modify the processing of the
           string (the default for mode is 0): "STR_STRIPQUOTES" forces quote
           characters to be stripped from quoted tokens; "STR_BACKSLASHESC"
           enables the interpretation (and expansion) of backslash escape
           sequences (`\x') according to ANSI C rules; "STR_SKIPDELIMS" causes
           further delimiters to be skipped after the terminating "NUL" is
           written and the token returned (this makes sure that the delimiters
           for one word don't become part of the next word if one changes
           delimiters between calls); and "STR_TRIGRAPHS" enables the
           recognition and expansion of ANSI C trigraph sequences (as a side
           effect this enables "STR_BACKSLASHESC", too).
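
           The position tracking through the caller's pointer can be
           illustrated with a much reduced tokenizer in ISO C. The helper
           name next_token is hypothetical. Quotes, comments, escape
           sequences and the mode flags described above are deliberately not
           handled in this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the next delimiter-separated token of *sp,
 * NUL-terminate it in place, and advance *sp past it. Returns NULL when
 * no token remains. Quotes, comments and escapes are not handled. */
static char *next_token(char **sp, const char *delim)
{
    char *tok, *end;

    if (sp == NULL || *sp == NULL)
        return NULL;
    tok = *sp + strspn(*sp, delim);      /* skip leading delimiters */
    if (*tok == '\0') {
        *sp = tok;
        return NULL;
    }
    end = tok + strcspn(tok, delim);     /* find the end of the token */
    if (*end != '\0')
        *end++ = '\0';                   /* terminate destructively */
    *sp = end;
    return tok;
}
```

           Because the parsing state lives in the caller's pointer rather
           than in a hidden static variable, several strings can be tokenized
           in an interleaved fashion.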

       int str_parse(const char	*s, const char *pop, ...);
	   This	 parses	the string s according to the parsing operation	speci-
	   fied	by pop.	If the parsing operation succeeds, 1 is	 returned.  If
	   the parsing operation failed	because	the pattern pop	did not	match,
	   0 is	returned. If the parsing operation failed because the underly-
	   ing regular expression library failed, "-1" is returned.

           The pop string usually has one of the following two syntax
           variants: `m delim regex delim flags*' (for matching operations)
           and `s delim regex delim subst delim flags*' (for substitution
           operations). For more details about the syntax variants and
           semantics of the pop argument see section GORY DETAILS, Parsing
           Specification below. The syntax of the regex part in pop is mostly
           equivalent to Perl 5's regular expression syntax. For the complete
           and gory details see perlre(1). A brief summary can be found under
           section GORY DETAILS, Perl Regular Expressions below.

       int str_format(char *s, str_size_t n, const char	*fmt, ...);
	   This	formats	a new string according to fmt and optionally following
	   arguments and writes	it into	the string s, but never	 more  than  n
	   characters at all. It returns the number of written characters.  If
	   s is	"NULL" it just calculates the number of	characters which would
	   be written.

	   The	function  generates the	output string under the	control	of the
	   fmt format string that specifies how	subsequent arguments (or argu-
	   ments accessed  via	the  variable-length  argument	facilities  of
	   stdarg(3)) are converted for	output.

           The format string fmt is composed of zero or more directives:
           ordinary characters (not %), which are copied unchanged to the
           output string; and conversion specifications, each of which results
           in fetching zero or more subsequent arguments. Each conversion
           specification is introduced by the character %. The arguments must
           correspond properly (after type promotion) with the conversion
           specifier. The supported conversion specifications are described in
           detail under CONVERSION SPECIFICATION below.
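
           The "measure first, then format" idiom enabled by passing "NULL"
           for s has a direct counterpart in the ISO C99 function
           vsnprintf(3). The sketch below uses that standard function rather
           than str_format(3). The helper name format_alloc is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly allocated string. The first
 * vsnprintf() call receives a NULL buffer and only measures the output. */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    int n;
    char *buf;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);     /* measuring pass */
    va_end(ap);
    if (n < 0)
        return NULL;
    buf = malloc((size_t)n + 1);         /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;
    va_start(ap, fmt);
    vsnprintf(buf, (size_t)n + 1, fmt, ap);
    va_end(ap);
    return buf;
}
```

           The caller releases the result with free(3). Note that the
           argument list has to be restarted with va_start before the second
           pass, because the first pass consumes it.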

       unsigned	long str_hash(const char *s, str_size_t	n, int mode);
           This function calculates a hash value of string s (or of its first
           n characters if n is greater than 0). The following hashing
           functions are supported and can be selected with mode:
           STR_HASH_DJBX33 (Daniel J. Bernstein, Times 33 Hash with Addition),
           STR_HASH_BJDDJ (Bob Jenkins, Dr. Dobb's Journal), and
           STR_HASH_MACRC32 (Mark Adler, Cyclic Redundancy Check with 32 bits).
           This function is intended for fast use in hashing algorithms and
           not for use as a cryptographically strong message digest.
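
           To give an idea of what a "times 33 with addition" hash looks
           like, the following ISO C sketch shows the algorithm in its
           commonly published form. The helper name hash_times33 and the
           seed value 5381 are assumptions taken from that common form. The
           values produced by str_hash may differ.

```c
#include <stddef.h>

/* Hypothetical helper: "times 33 with addition" hash in its commonly
 * published form (seed 5381). Hashes at most n characters of s, or all
 * characters up to the terminating NUL if n is 0. */
static unsigned long hash_times33(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long h = 5381;
    size_t i;

    for (i = 0; p[i] != '\0' && (n == 0 || i < n); i++)
        h = h * 33 + p[i];      /* unsigned arithmetic wraps on overflow */
    return h;
}
```

           Such a hash is typically reduced modulo the number of buckets of
           a hash table.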

       int str_base64(char *s, str_size_t n, unsigned char *ucp, str_size_t
       ucn, int	mode);
           This function Base64-encodes ucn bytes starting at ucp and writes
           the resulting string into s (but never more than n characters are
           written). The mode for this operation has to be
           "STR_BASE64_ENCODE". Additionally one can OR in the value
           "STR_BASE64_STRICT" to enable strict encoding, where a newline
           character is inserted after every 72nd output character. The
           function returns the number of output characters written. If s is
           "NULL" the function just calculates the number of required output
           characters.

           Alternatively, if mode is "STR_BASE64_DECODE" the string s (or only
           the first n characters if n is not 0) is decoded and the output
           bytes are written at ucp. Again, if ucp is "NULL" only the number
           of required output bytes is calculated.
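
           The encoding direction, including the length-only calculation for
           a "NULL" destination, can be sketched in ISO C as shown below.
           The helper name base64_encode is hypothetical. The sketch has no
           output limit and no strict mode, and it assumes the destination
           is large enough.

```c
#include <stddef.h>

/* Hypothetical helper: Base64-encode n bytes from src into dst and
 * return the number of characters produced (excluding the NUL). If dst
 * is NULL, only the required length is computed. No line breaks are
 * inserted, and dst is assumed to be large enough. */
static size_t base64_encode(char *dst, const unsigned char *src, size_t n)
{
    static const char tab[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t out = 4 * ((n + 2) / 3);      /* 4 characters per 3 input bytes */
    size_t i;
    char *p = dst;

    if (dst == NULL)
        return out;
    for (i = 0; i + 2 < n; i += 3) {     /* complete 3-byte groups */
        *p++ = tab[src[i] >> 2];
        *p++ = tab[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
        *p++ = tab[((src[i + 1] & 0x0f) << 2) | (src[i + 2] >> 6)];
        *p++ = tab[src[i + 2] & 0x3f];
    }
    if (i < n) {                         /* one or two trailing bytes */
        *p++ = tab[src[i] >> 2];
        if (i + 1 < n) {
            *p++ = tab[((src[i] & 0x03) << 4) | (src[i + 1] >> 4)];
            *p++ = tab[(src[i + 1] & 0x0f) << 2];
        } else {
            *p++ = tab[(src[i] & 0x03) << 4];
            *p++ = '=';
        }
        *p++ = '=';
    }
    *p = '\0';
    return out;
}
```

           A caller would first invoke the helper with a "NULL" destination,
           allocate the returned length plus one byte, and then encode.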

GORY DETAILS
       In this part of the documentation more complex topics are documented in
       detail.

       Perl Regular Expressions

       The regular expressions used in OSSP str are more or less Perl
       compatible (they are provided by a stripped down and built-in version
       of the PCRE library). So the syntax description in perlre(1) applies
       and does not have to be repeated here. For a deeper understanding and
       details you should have a look at the book `Mastering Regular
       Expressions' by Jeffrey Friedl (see also the perlbook(1) manpage). For
       convenience, only a brief summary of Perl compatible regular
       expressions is given here:

       The following metacharacters have their standard	egrep(1) meanings:

	 \	Quote the next metacharacter
	 ^	Match the beginning of the line
	 .	Match any character (except newline)
	 $	Match the end of the line (or before newline at	the end)
	 |	Alternation
	 ()	Grouping
	 []	Character class

       The following standard quantifiers are recognized:

	 *	Match 0	or more	times (greedy)
	 *?	Match 0	or more	times (non greedy)
	 +	Match 1	or more	times (greedy)
	 +?	Match 1	or more	times (non greedy)
	 ?	Match 1	or 0 times (greedy)
	 ??	Match 1	or 0 times (non	greedy)
	 {n}	Match exactly n	times (greedy)
	 {n}?	Match exactly n	times (non greedy)
	 {n,}	Match at least n times (greedy)
	 {n,}?	Match at least n times (non greedy)
	 {n,m}	Match at least n but not more than m times (greedy)
	 {n,m}?	Match at least n but not more than m times (non	greedy)

       The following backslash sequences are recognized:

	 \t	Tab		      (HT, TAB)
	 \n	Newline		      (LF, NL)
	 \r	Return		      (CR)
	 \f	Form feed	      (FF)
	 \a	Alarm (bell)	      (BEL)
	 \e	Escape (think troff)  (ESC)
	 \033	Octal char
	 \x1B	Hex char
	 \c[	Control	char
	 \l	Lowercase next char
	 \u	Uppercase next char
	 \L	Lowercase till \E
	 \U	Uppercase till \E
	 \E	End case modification
	 \Q	Quote (disable)	pattern	metacharacters till \E

       The following non zero-width assertions are recognized:

	 \w	Match a	"word" character (alphanumeric plus "_")
	 \W	Match a	non-word character
	 \s	Match a	whitespace character
	 \S	Match a	non-whitespace character
	 \d	Match a	digit character
	 \D	Match a	non-digit character

       The following zero-width	assertions are recognized:

	 \b	Match a	word boundary
	 \B	Match a	non-(word boundary)
	 \A	Match only at beginning	of string
	 \Z	Match only at end of string, or	before newline at the end
	 \z	Match only at end of string
	 \G	Match only where previous m//g left off	(works only with /g)

       The following regular expression	extensions are recognized:

	 (?#text)	       An embedded comment
	 (?:pattern)	       This is for clustering, not capturing (simple)
	 (?imsx-imsx:pattern)  This is for clustering, not capturing (full)
	 (?=pattern)	       A zero-width positive lookahead assertion
	 (?!pattern)	       A zero-width negative lookahead assertion
	 (?<=pattern)	       A zero-width positive lookbehind	assertion
	 (?<!pattern)	       A zero-width negative lookbehind	assertion
	 (?>pattern)	       An "independent"	subexpression
	 (?(cond)yes-re)       Conditional expression (simple)
	 (?(cond)yes-re|no-re) Conditional expression (full)
	 (?imsx-imsx)	       One or more embedded pattern-match modifiers

       Parsing Specification

       The str_parse(const char	*s, const char *pop, ...) function is  a  very
       flexible	 but  complex  one.  The argument s is the string on which the
       parsing operation specified by argument pop is  applied.	  The  parsing
       semantics  are  highly influenced by Perl's `=~'	matching operator, be-
       cause one of the	main goals of str_parse(3) is to allow one to  rewrite
       typical Perl matching constructs	into C.

       Now  to	the gory details. In general, the pop argument of str_parse(3)
       has one of the following	two syntax variants:

       Matching: `m delim regex	delim flags*':
	   This	matches	s against the Perl-style regular expression regex  un-
	   der the control of zero or more flags which control the parsing se-
	   mantics.  The  stripped  down  pop  syntax `regex' is equivalent to
	   `m/regex/'.

           For each grouping pair of parentheses in regex, the text in s which
           was grouped by the parentheses is extracted into new strings.
           By default these are allocated as separate strings and returned to
           the caller through the following `char **' arguments. The caller is
           required to free(3) them later.

       Substitution: `s	delim regex delim subst	delim flags*':
           This matches s against the Perl-style regular expression regex
           under the control of zero or more flags which control the parsing
           semantics. As a result of the operation, a new string is formed
           which consists of s but with the part which matched regex replaced
           by subst. The result string is returned to the caller through a
           `char **' argument. The caller is required to free(3) it later.

           For each grouping pair of parentheses in regex, the text in s which
           was grouped by the parentheses is extracted into new strings and
           can be referenced for expansion via `$n' (n=1,..) in subst.
           Additionally any str_format(3) style `%' constructs in subst are
           expanded through additional caller supplied arguments.

       The following flags are supported:

       b   If the bundle flag `b' is specified, the extracted strings are
           bundled together into a single chunk of memory and its address is
           returned to the caller with an additional `char **' argument which
           has to precede the regular string arguments. The caller then has to
           free(3) only this chunk of memory in order to free all extracted
           strings at once.

       i   If the case-insensitive flag `i' is specified, regex is matched in
           a case-insensitive way.

       o   If the once flag `o'	is specified, this indicates to	the  OSSP  str
	   library that	the whole pop string is	constant and that its internal
	   pre-processing  (it is compiled into	a deterministic	finite automa-
	   ton (DFA) internally) has to	be done	only once (the	OSSP  str  li-
	   brary then caches the DFA which corresponds to the pop argument).

       x   If the extended flag `x' is specified, the regex's legibility is
           extended by permitting embedded whitespace and comments to allow
           one to write down complex regular expressions more clearly and even
           in a documented way.

       m   If the multiple lines flag  `m'  is	specified,  the	 string	 s  is
	   treated  as	multiple  lines. That is, this changes the regular ex-
	   pression meta characters `^'	and `$'	from matching at only the very
	   start or end	of the string s	to the start or	end of any  line  any-
	   where within	the string s.

       s   If  the  single line	flag `s' is specified, the string s is treated
	   as single line. That	is, this changes the regular  expression  meta
	   character  `.'  to  match any character whatsoever, even a newline,
	   which it normally would not match.
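
       The extraction of parenthesized groups into separately allocated
       strings, as described for the matching variant, can be approximated
       with the POSIX functions regcomp(3) and regexec(3). This sketch does
       not use OSSP str or PCRE. The helper name match_two_groups is
       hypothetical, and POSIX extended regular expressions lack Perl
       features such as non-greedy quantifiers, so `[^:]+' takes the place of
       `.+?' in the usage note below.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match s against the POSIX extended regex re and
 * copy the first two parenthesized groups into newly allocated strings.
 * Returns 1 on match, 0 on no match, -1 on regex or allocation failure. */
static int match_two_groups(const char *s, const char *re,
                            char **g1, char **g2)
{
    regex_t rx;
    regmatch_t m[3];            /* m[0] is the whole match */
    char **out[2];
    int i, rc;

    out[0] = g1;
    out[1] = g2;
    *g1 = *g2 = NULL;
    if (regcomp(&rx, re, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&rx, s, 3, m, 0);
    regfree(&rx);
    if (rc != 0)
        return rc == REG_NOMATCH ? 0 : -1;
    for (i = 0; i < 2; i++) {
        /* a group that did not take part in the match has rm_so == -1 */
        size_t len = m[i + 1].rm_so < 0
                   ? 0 : (size_t)(m[i + 1].rm_eo - m[i + 1].rm_so);
        char *copy = malloc(len + 1);

        if (copy == NULL) {
            free(*g1);
            *g1 = NULL;
            return -1;
        }
        if (len > 0)
            memcpy(copy, s + m[i + 1].rm_so, len);
        copy[len] = '\0';
        *out[i] = copy;
    }
    return 1;
}
```

       Called with the string "foo:bar" and the expression
       "^([^:]+):(.+)$", the helper yields the two strings "foo" and "bar",
       each of which the caller has to free(3).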

CONVERSION SPECIFICATION
       In the format string of str_format(3) each conversion specification  is
       introduced by the character %. After the	%, the following appear	in se-
       quence:

       o   An optional field, consisting of a decimal digit string followed by
	   a  $, specifying the	next argument to access.  If this field	is not
	   provided, the argument following the	last argument accessed will be
	   used.  Arguments are	numbered starting at 1.	 If  unaccessed	 argu-
	   ments  in the format	string are interspersed	with ones that are ac-
	   cessed the results will be indeterminate.

       o   Zero	or more	of the following flags:

           A # character specifying that the value should be converted to an
           ``alternate form''. For c, d, i, n, p, s, and u conversions, this
           option has no effect. For o conversions, the precision of the
           number is increased to force the first character of the output
           string to a zero (except if a zero value is printed with an
           explicit precision of zero). For x and X conversions, a non-zero
           result has the string 0x (or 0X for X conversions) prepended to it.
           For e, E, f, g, and G conversions, the result will always contain a
           decimal point, even if no digits follow it (normally, a decimal
           point appears in the results of those conversions only if a digit
           follows). For g and G conversions, trailing zeros are not removed
           from the result as they would otherwise be.

           A zero `0' character specifying zero padding. For all conversions
           except n, the converted value is padded on the left with zeros
           rather than blanks. If a precision is given with a numeric
           conversion (d, i, o, u, x, and X), the `0' flag is ignored.

	   A negative field width flag `-' indicates the converted value is to
	   be  left adjusted on	the field boundary.  Except for	n conversions,
	   the converted value is padded on the	right with blanks, rather than
	   on the left with blanks or zeros.  A	`-' overrides a	 `0'  if  both
	   are given.

	   A  space,  specifying that a	blank should be	left before a positive
	   number produced by a	signed conversion (d, e, E, f, g, G, or	i).

	   A `+' character specifying that a sign always be  placed  before  a
	   number produced by a	signed conversion.  A `+' overrides a space if
	   both	are used.

       o   An  optional	decimal	digit string specifying	a minimum field	width.
	   If the converted value has fewer characters than the	 field	width,
	   it  will  be	padded with spaces on the left (or right, if the left-
	   adjustment flag has been given) to fill out the field width.

       o   An optional precision, in the form of a period `.' followed	by  an
	   optional  digit  string. If the digit string	is omitted, the	preci-
	   sion	is taken as zero. This gives the minimum number	of  digits  to
	   appear  for	d, i, o, u, x, and X conversions, the number of	digits
	   to appear after the decimal-point for e, E, and f conversions,  the
	   maximum  number  of	significant digits for g and G conversions, or
	   the maximum number of characters to be printed from a string	for  s
	   conversions.

       o   The optional character h, specifying that a following d, i, o, u,
           x, or X conversion corresponds to a `"short int"' or `"unsigned
           short int"' argument, or that a following n conversion corresponds
           to a pointer to a `"short int"' argument.

       o   The optional character l (ell) specifying that a following d, i, o,
           u, x, or X conversion corresponds to a `"long int"' or
           `"unsigned long int"' argument, or that a following n conversion
           corresponds to a pointer to a `"long int"' argument.

       o   The optional	character q, specifying	that a following d, i,	o,  u,
	   x, or X conversion corresponds to a `"quad int"' or `"unsigned quad
	   int"'  argument,  or	that a following n conversion corresponds to a
	   pointer to a	`"quad int"' argument.

       o   The character L specifying that a following e, E, f,	g, or  G  con-
	   version corresponds to a `"long double"' argument.

       o   A character that specifies the type of conversion to	be applied.

       A  field	 width	or precision, or both, may be indicated	by an asterisk
       `*' or an asterisk followed by one or more decimal digits and a `$' in-
       stead of	a digit	string.	 In this case, an  `"int"'  argument  supplies
       the  field  width or precision.	A negative field width is treated as a
       left adjustment flag followed by	a positive  field  width;  a  negative
       precision is treated as though it were missing.	If a single format di-
       rective	mixes positional (`nn$') and non-positional arguments, the re-
       sults are undefined.

       The conversion specifiers and their meanings are:

       diouxX
	   The `"int"' (or  appropriate	 variant)  argument  is	 converted  to
	   signed decimal (d and i), unsigned octal (o), unsigned decimal (u),
	   or unsigned hexadecimal (x and X) notation.	The letters abcdef are
	   used	 for  x	conversions; the letters ABCDEF	are used for X conver-
	   sions.  The precision, if any, gives	the minimum number  of	digits
	   that	 must appear; if the converted value requires fewer digits, it
	   is padded on	the left with zeros.

       DOU The `"long int"' argument is converted to signed decimal, unsigned
	   octal, or unsigned decimal, as if the format	had been ld, lo, or lu
	   respectively.  These	conversion characters are deprecated, and will
	   eventually disappear.

       eE  The	`"double"'  argument  is  rounded  and	converted in the style
	   `[-]d.ddde+-dd' where there is one digit before  the	 decimal-point
	   character  and the number of	digits after it	is equal to the	preci-
	   sion; if the	precision is missing, it is taken as 6;	if the	preci-
	   sion	 is zero, no decimal-point character appears.  An E conversion
	   uses	the letter E (rather than e) to	introduce the  exponent.   The
	   exponent always contains at least two digits; if the	value is zero,
	   the exponent	is 00.

       f   The `"double"' argument is rounded and converted to decimal
           notation in the style `[-]ddd.ddd' where the number of digits after
           the decimal-point character is equal to the precision
           specification. If the precision is missing, it is taken as 6; if
           the precision is explicitly zero, no decimal-point character
           appears. If a decimal point appears, at least one digit appears
           before it.

       g   The	`"double"'  argument  is converted in style f or e (or E for G
	   conversions).  The precision	specifies the  number  of  significant
	   digits.   If	 the  precision	is missing, 6 digits are given;	if the
	   precision is	zero, it is treated as 1.  Style e is used if the  ex-
	   ponent from its conversion is less than -4 or greater than or equal
	   to  the  precision.	Trailing zeros are removed from	the fractional
	   part	of the result; a decimal point appears only if it is  followed
	   by at least one digit.

       c   The `"int"' argument is converted to an `"unsigned char"', and the
           resulting character is written.

       s   The `"char *"' argument is expected to be a pointer to an array  of
	   character  type  (pointer  to a string).  Characters	from the array
	   are written up to (but not including) a terminating	"NUL"  charac-
	   ter;	if a precision is specified, no	more than the number specified
	   are	written.   If  a precision is given, no	null character need be
	   present; if the precision is	not specified, or is greater than  the
	   size	of the array, the array	must contain a terminating "NUL" char-
	   acter.

       p   The `"void *"' pointer argument is printed in hexadecimal (as if by
           `%#x' or `%#lx').

       n   The number of characters written so far is stored into the  integer
	   indicated by	the `"int *"' (or variant) pointer argument.  No argu-
	   ment	is converted.

       %   A `%' is written. No argument is converted. The complete conversion
           specification is `%%'.

       In no case does a non-existent or small field width cause truncation of
       a  field;  if the result	of a conversion	is wider than the field	width,
       the field is expanded to	contain	the conversion result.
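
       A few of the flags, field widths and precisions described above are
       illustrated below with the ISO C function snprintf(3), which follows
       the same conventions for these cases. That str_format(3) produces
       identical results is assumed here, not verified. The helper name
       format_samples is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: render a few flag, width and precision
 * combinations with the standard snprintf(), one per line. */
static int format_samples(char *buf, size_t size)
{
    return snprintf(buf, size,
                    "[%5d]\n"     /* minimum field width, right adjusted */
                    "[%-5d]\n"    /* '-' flag: left adjusted */
                    "[%05d]\n"    /* '0' flag: zero padding */
                    "[%+d]\n"     /* '+' flag: always print a sign */
                    "[%#x]\n"     /* '#' flag: alternate form with 0x */
                    "[%.2s]\n"    /* precision limits the string length */
                    "[%*d]\n",    /* field width taken from an int argument */
                    42, 42, 42, 42, 255u, "abcdef", 4, 7);
}
```

       The brackets make the padding visible when the buffer is printed.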

EXAMPLES
       The following snippets present a few selected use cases of OSSP str:

       Splice a	String into Another
            char v1[] = "foo bar quux"; /* writable buffer with enough room */
            char *v2 = "baz";
            str_splice(v1, 3, 5, v2, 0);
            /* now we have v1 = "foobazquux" */
            ...

       Tokenize	a String
            char var[] = " foo \t \" bar 'baz'\" q'uu'x #comment";
            char *tok, *p;
            p = var;
            while ((tok = str_token(&p, " \t", "\"'", "#", STR_STRIPQUOTES)) != NULL) {
                /* here we enter three times:
                   1. tok = "foo"
                   2. tok = " bar 'baz'"
                   3. tok = "quux" */
                ...
            }

       Match a String
	    char *var =	"foo:bar";
            if (str_parse(var, "m/^.+?:.+$/") > 0) {
		/* var matched */
		...
	    }

       Match a String and Go Ahead with	Details
	    char *var =	"foo:bar";
	    char *cp, *v1, *v2;
	    if (str_parse(var, "m/^(.+?):(.+)$/b", &cp,	&v1, &v2) > 0) {
		...
		/* now we have:
		   cp =	"foo\0bar\0" and v1 and	v2 pointing
		   into	it, i.e., v1 = "foo", v2 = "bar" */
		...
		free(cp);
	    }

       Substitute Text in a String
	    char *var =	"foo:bar";
	    char *subst	= "quux";
	    char *new;
	    str_parse(var, "s/^(.+?):(.+)$/$1-%s-$2/", &new, subst);
	    ...
            /* now we have: var = "foo:bar", new = "foo-quux-bar" */
	    ...
	    free(new);

       Format a	String
            char *v0 = "abc..."; /* length not guessable */
            char *v1 = "foo";
            unsigned int v2 = 0xDEAD;
            int v3 = 42;
            char *cp;
            int n;

            /* first pass: a NULL buffer only calculates the length */
            n = str_format(NULL, 0, "%s|%5s-%X-%04d", v0, v1, v2, v3);
            cp = malloc(n + 1); /* one extra byte for the terminating NUL */
            str_format(cp, n + 1, "%s|%5s-%X-%04d", v0, v1, v2, v3);
            /* now we have cp = "abc...|  foo-DEAD-0042" */
            ...
            free(cp);

SEE ALSO
       string(3), printf(3), perlre(1).

HISTORY
       OSSP str was written in November and December 1999 by Ralf S.
       Engelschall for the OSSP project. Various existing code was used and
       recycled as building blocks: for the str_token(3) implementation an
       ancient strtok(3) flavor from William Deich (1991) was cleaned up and
       adjusted. As the background parsing engine for str_parse(3) a heavily
       stripped	down version of	Philip Hazel's Perl Compatible Regular Expres-
       sion (PCRE) library (initially version 2.08 and now 3.9)	was used.  The
       str_format(3)  implementation was based on Panos	Tsirigotis' sprintf(3)
       code as adjusted	by the Apache Software Foundation (ASF)	1998. The for-
       matting engine was stripped down	and enhanced to	support	 internal  ex-
       tensions	which were required by str_format(3) and str_parse(3).

AUTHOR
	Ralf S.	Engelschall
	rse@engelschall.com
	www.engelschall.com

12-Oct-2005			  Str 0.9.12				str(3)
