libinn_uwildmat(3)	  InterNetNews Documentation	    libinn_uwildmat(3)

NAME
       uwildmat, uwildmat_simple, uwildmat_poison - Perform wildmat matching

SYNOPSIS
	   #include <inn/libinn.h>

	   bool	uwildmat(const char *text, const char *pattern);

	   bool	uwildmat_simple(const char *text, const	char *pattern);

	   enum	uwildmat uwildmat_poison(const char *text, const char *pattern);

DESCRIPTION
       uwildmat	compares text against the wildmat expression pattern,
       returning true if and only if the expression matches the	text.  "@" has
       no special meaning in pattern when passed to uwildmat.  Both text and
       pattern are assumed to be in the	UTF-8 character	encoding, although
       malformed UTF-8 sequences are treated in	a way that attempts to be
       mostly compatible with single-octet character sets like ISO 8859-1.
       (In other words,	if you try to match ISO	8859-1 text with these
       routines	everything should work as expected unless the ISO 8859-1 text
       contains	valid UTF-8 sequences, which thankfully	is somewhat rare.)

       uwildmat_simple is identical to uwildmat except that neither "!" nor
       "," has any special meaning and pattern is always treated as a single
       pattern.  This function exists solely to support legacy interfaces like
       NNTP's XPAT command, and should be avoided when implementing new
       features.

       uwildmat_poison works similarly to uwildmat, except that	"@" as the
       first character of one of the patterns in the expression	(see below)
       "poisons" the match if it matches.  uwildmat_poison returns
       UWILDMAT_MATCH if the expression	matches	the text, UWILDMAT_FAIL	if it
       doesn't,	and UWILDMAT_POISON if the expression doesn't match because a
       poisoned	pattern	matched	the text.  These enumeration constants are
       defined in the inn/libinn.h header.

WILDMAT	EXPRESSIONS
       A wildmat expression follows rules similar to those of shell filename
       wildcards but with some additions and changes.  A wildmat expression is
       composed	of one or more wildmat patterns	separated by commas.  Each
       character in the	wildmat	pattern	matches	a literal occurrence of	that
       same character in the text, with	the exception of the following
       metacharacters:

       ?       Matches	 any   single  character  (including  a	 single	 UTF-8
	       multibyte character, so "?" can match more than one byte).

       *       Matches any sequence of zero or more characters.

       \       Turns off any special meaning of the following character; the
               following character will match itself in the text.  "\" will
               escape any character, including another backslash or a comma
               that would otherwise separate a pattern from the next pattern
               in an expression.  Note that "\" is not special inside a
               character set (no metacharacters are).

       [...]   A  character set, which matches any single character that falls
	       within that set.	 The  presence	of  a  character  between  the
	       brackets	 adds  that character to the set; for example, "[amv]"
	       specifies the set containing the	characters "a",	"m", and  "v".
               A range of characters may be specified using "-"; for example,
               "[0-5abc]" is equivalent to "[012345abc]".  Characters are
               ordered as defined in the UTF-8 character set.  If the start
               character of a range falls after its ending character in that
               order, the results of attempting a match with that pattern are
               undefined.

	       In order	to include a literal "]" character in the set, it must
	       be the first character of the set (possibly following "^"); for
	       example,	 "[]a]"	 matches  either  "]"  or  "a".	  To include a
	       literal "-" character in	the set, it must be either  the	 first
	       or  the last character of the set.  Backslashes have no special
	       meaning inside a	character set, nor do any other	of the wildmat
	       metacharacters.

       [^...]  A negated character set.	 Follows the same rules	as a character
	       set above, but matches any character not	contained in the  set.
	       So,  for	 example, "[^]-]" matches any character	except "]" and
	       "-".

       In addition, "!"	(and possibly "@") have	special	meaning	as  the	 first
       character of a pattern; see below.
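
       As a rough illustration of the metacharacters above, the following C
       sketch matches a single pattern against a text, stepping through both
       one UTF-8 character at a time so that "?" and character sets see whole
       characters rather than bytes.  It is not the libinn implementation:
       wm_match_one, utf8_len, utf8_decode, match_set, and match_here are
       hypothetical names, malformed sequences are simply treated as single
       bytes, and rejecting an unterminated set is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Byte length of the UTF-8 character at p.  Malformed or truncated
 * sequences count as one byte (an assumption of this sketch). */
static size_t utf8_len(const unsigned char *p)
{
    size_t n, i;

    if (*p < 0x80) return 1;
    else if ((*p & 0xE0) == 0xC0) n = 2;
    else if ((*p & 0xF0) == 0xE0) n = 3;
    else if ((*p & 0xF8) == 0xF0) n = 4;
    else return 1;
    for (i = 1; i < n; i++)
        if ((p[i] & 0xC0) != 0x80)   /* also stops at the terminating NUL */
            return 1;
    return n;
}

/* Decode the character at p and store its byte length in *len. */
static unsigned long utf8_decode(const unsigned char *p, size_t *len)
{
    size_t n = utf8_len(p), i;
    unsigned long cp;

    *len = n;
    if (n == 1) return *p;
    cp = *p & (0xFF >> (n + 1));
    for (i = 1; i < n; i++)
        cp = (cp << 6) | (p[i] & 0x3F);
    return cp;
}

/* p points just past '['.  Returns true if cp is accepted by the set and
 * stores the position after the closing ']' in *end.  An unterminated set
 * never matches and leaves *end NULL. */
static bool match_set(const unsigned char *p, unsigned long cp,
                      const unsigned char **end)
{
    bool negated = false, found = false, first = true;
    unsigned long lo, hi;
    size_t n;

    *end = NULL;
    if (*p == '^') { negated = true; p++; }
    while (*p != '\0' && (first || *p != ']')) {
        first = false;                /* a leading ']' is a literal */
        lo = hi = utf8_decode(p, &n);
        p += n;
        if (p[0] == '-' && p[1] != ']' && p[1] != '\0') {
            hi = utf8_decode(p + 1, &n);   /* range lo-hi */
            p += 1 + n;
        }
        if (cp >= lo && cp <= hi) found = true;
    }
    if (*p != ']') return false;
    *end = p + 1;
    return found != negated;
}

/* Match all of text t against pattern p (anchored at both ends). */
static bool match_here(const unsigned char *t, const unsigned char *p)
{
    const unsigned char *end;
    unsigned long tc;
    size_t tn, pn;

    while (*p != '\0') {
        if (*p == '*') {
            while (*p == '*') p++;
            if (*p == '\0') return true;
            for (;; t += utf8_len(t)) {   /* try every possible split */
                if (match_here(t, p)) return true;
                if (*t == '\0') return false;
            }
        }
        if (*t == '\0') return false;
        tc = utf8_decode(t, &tn);
        if (*p == '?') {
            p++;                          /* any one whole character */
        } else if (*p == '[') {
            if (!match_set(p + 1, tc, &end)) return false;
            p = end;
        } else {
            if (*p == '\\' && p[1] != '\0') p++;   /* escaped literal */
            if (utf8_decode(p, &pn) != tc) return false;
            p += pn;
        }
        t += tn;
    }
    return *t == '\0';
}

bool wm_match_one(const char *text, const char *pattern)
{
    assert(text != NULL && pattern != NULL);
    return match_here((const unsigned char *) text,
                      (const unsigned char *) pattern);
}
```

       In this sketch, wm_match_one("\xc3\xa9", "?") is true because the
       two-byte sequence is consumed as one character, while the pattern "??"
       fails against the same text.  The recursive handling of "*" is the
       simplest correct approach, not an efficient one.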

       When  matching  a  wildmat  expression  against	some text, each	comma-
       separated pattern is matched in order from left to right.  In order  to
       match,  the  pattern  must  match the whole text; in regular expression
       terminology, it's implicitly anchored at	both  the  beginning  and  the
       end.   For  example,  the  pattern  "a"	matches	 only the text "a"; it
       doesn't match "ab" or "ba" or even  "aa".   If  none  of	 the  patterns
       match,  the  whole  expression  doesn't	match.	Otherwise, whether the
       expression matches is determined	entirely  by  the  rightmost  matching
       pattern;	 the  expression matches the text if and only if the rightmost
       matching	pattern	is not negated.

       For example, consider the text "news.misc".  The	expression "*" matches
       this text, of course,  as  does	"comp.*,news.*"	 (because  the	second
       pattern matches).  "news.*,!news.misc" does not match this text because
       both  patterns  match, meaning that the rightmost takes precedence, and
       the rightmost matching pattern is negated.   "news.*,!news.misc,*.misc"
       does  match  this  text,	 since	the  rightmost matching	pattern	is not
       negated.

       Note that the expression "!news.misc" can't match anything.  Either the
       pattern doesn't match, in which case no patterns match and the
       expression doesn't match, or the pattern does match, in which case the
       expression doesn't match because the pattern is negated.
       "*,!news.misc", on the other hand, is a useful expression that matches
       anything except "news.misc".

       "!" has significance only as the	first character	of a pattern; anywhere
       else in the pattern, it matches a literal "!"  in  the  text  like  any
       other non-metacharacter.
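
       The rightmost-matching-pattern rule can be sketched in C as follows.
       This is an illustration only, not the libinn code: expr_matches is a
       hypothetical name, and POSIX fnmatch(3) stands in for matching a single
       pattern, so character set syntax and UTF-8 handling differ from real
       wildmat.  The sketch also assumes that every unescaped comma separates
       patterns, even inside brackets.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Evaluate a comma-separated expression against text.  Patterns are tried
 * left to right and each match overwrites the result, so the rightmost
 * matching pattern decides the outcome. */
bool expr_matches(const char *text, const char *expr)
{
    char *copy, *pat, *p;
    bool matched = false;          /* no pattern matched: no match */

    assert(text != NULL && expr != NULL);
    copy = malloc(strlen(expr) + 1);
    if (copy == NULL)
        return false;
    strcpy(copy, expr);

    pat = copy;
    while (pat != NULL) {
        char *next = NULL;
        bool negated;

        /* Cut the expression at the next unescaped comma. */
        for (p = pat; *p != '\0'; p++) {
            if (*p == '\\' && p[1] != '\0') {
                p++;               /* skip the escaped character */
            } else if (*p == ',') {
                *p = '\0';
                next = p + 1;
                break;
            }
        }

        /* "!" is special only as the first character of a pattern. */
        negated = (*pat == '!');
        if (fnmatch(negated ? pat + 1 : pat, text, 0) == 0)
            matched = !negated;
        pat = next;
    }
    free(copy);
    return matched;
}
```

       With this sketch, expr_matches("news.misc", "news.*,!news.misc") is
       false, and adding a third pattern "*.misc" makes it true again, as in
       the examples above.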

       If  the uwildmat_poison interface is used, then "@" behaves the same as
       "!" except that if an expression	fails to match because	the  rightmost
       matching	pattern	began with "@",	UWILDMAT_POISON	is returned instead of
       UWILDMAT_FAIL.
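
       The three-way result can be sketched by remembering which kind of
       pattern matched last.  The code below is a hedged illustration, not
       the libinn implementation: sk_poison and the SK_* constants are
       hypothetical stand-ins for uwildmat_poison and its enumeration, the
       patterns are assumed to be already split at commas, and fnmatch(3)
       again substitutes for single-pattern matching.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

enum sk_result { SK_FAIL, SK_MATCH, SK_POISON };

/* Try each pattern in order.  A match records the outcome implied by the
 * pattern's prefix, so the rightmost matching pattern wins. */
enum sk_result sk_poison(const char *text, const char *const *pats,
                         size_t count)
{
    enum sk_result result = SK_FAIL;
    size_t i;

    assert(text != NULL && pats != NULL);
    for (i = 0; i < count; i++) {
        const char *pat = pats[i];
        enum sk_result on_match = SK_MATCH;

        if (*pat == '!') {
            on_match = SK_FAIL;      /* negated pattern */
            pat++;
        } else if (*pat == '@') {
            on_match = SK_POISON;    /* negated and poisoned */
            pat++;
        }
        if (fnmatch(pat, text, 0) == 0)
            result = on_match;
    }
    return result;
}
```

       For the text "news.misc", the patterns "news.*" and "@news.misc" give
       SK_POISON in this sketch, while the same patterns give SK_MATCH for
       "news.admin" and SK_FAIL for "comp.lang.c".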

       If  the	uwildmat_simple	 interface is used, the	matching rules are the
       same as above except that none of "!", "@", or  ","  have  any  special
       meaning at all and only match those literal characters.

BUGS
       All of these functions internally convert the passed arguments to const
       unsigned	 char  pointers.   The	only reason why	they take regular char
       pointers	instead	of unsigned char is for	the  convenience  of  INN  and
       other  callers  that  may  not  be  using unsigned char everywhere they
       should.	In a future revision, the public interface should  be  changed
       to just take unsigned char pointers.
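
       To show why the conversion matters, the short C sketch below inspects
       bytes through an unsigned char pointer.  count_high_bytes is a
       hypothetical helper, not part of libinn.  On platforms where plain
       char is signed, bytes of 0x80 and above read as negative values, so
       the same comparison on a plain char would never succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Count the bytes in s that have the high bit set (non-ASCII octets). */
size_t count_high_bytes(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t n = 0;

    assert(s != NULL);
    for (; *p != '\0'; p++)
        if (*p >= 0x80)    /* reliable only because *p is unsigned */
            n++;
    return n;
}
```

       A caller can keep passing ordinary char strings, as with the functions
       documented here, while the cast inside keeps byte comparisons
       well-defined.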

HISTORY
       Written by Rich $alz <rsalz@uunet.uu.net> in 1986, and posted to	Usenet
       several	times  since then, most	notably	in comp.sources.misc in	March,
       1991.

       Lars Mathiesen <thorinn@diku.dk>	enhanced  the  multi-asterisk  failure
       mode in early 1991.

       Rich and	Lars increased the efficiency of star patterns and reposted it
       to comp.sources.misc in April, 1991.

       Robert  Elz  <kre@munnari.oz.au>	 added	minus  sign  and close bracket
       handling	in June, 1991.

       Russ  Allbery  <eagle@eyrie.org>	 added	support	 for   comma-separated
       patterns	 and  the  "!"	and  "@"  metacharacters  to  the core wildmat
       routines	in July, 2000.	He also	added support  for  UTF-8  characters,
       changed	the  default  behavior	to  assume  that both the text and the
       pattern are in UTF-8, and largely rewrote this documentation to	expand
       and clarify the description of how a wildmat expression matches.

       Please  note  that the interfaces to these functions are	named uwildmat
       and the like rather than	wildmat	to distinguish them from  the  wildmat
       function	 provided  by Rich $alz's original implementation.  While this
       code is heavily based on	 Rich's	 original  code,  it  has  substantial
       differences,  including	the extension to support UTF-8 characters, and
       has noticeable functionality changes.  Any bugs present	in  it	aren't
       Rich's fault.

SEE ALSO
       grep(1),	fnmatch(3), regex(3), regexp(3).

INN 2.8.0			  2022-02-06		    libinn_uwildmat(3)
