Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
UTF(3)			   Library Functions Manual			UTF(3)

NAME
       urecomp,	 ureexec, ureerror, urefree - UTF Regular Expression function-
       ality

SYNOPSIS
       #include	<ure.h>

       int urecomp(ure_t *up, char *exp, int cflags);

       int ureexec(ure_t *up, char *string, int	matchc,	urematch_t *matchv, int	eflags,	char *collseq);

       int ureerror(int	errcode, ure_t *up, char *buf, int size);

       int urefree(ure_t *up);

DESCRIPTION
       The URE routines	are utf(3)-aware regular expression routines.  urecomp
       is used to compile an expression	and ureexec is used to match the  com-
       piled  expression against a character string.  Matching can be done us-
       ing a collation sequence	other than English, which is the  default.  To
       do this,	use the	collseq	argument to the	ureexec	function to point to a
       UTF  string  which  is the key to the desired collation sequence.  This
       collation sequence must correspond to the utf  representation  of  that
       language	 in the	langcoll.utf file.  If this argument is	NULL, then the
       environment variable UTFCOLLSEQ will be used to determine the collation
       sequence.  If this too is NULL, then  the  default  collation  sequence
       (English)  is  used.  It	is also	possible, but not recommended, to call
       the urecollseq function directly.

       ureerror	is used	to format an error code	which can be returned by  ure-
       comp  or	ureexec.  urefree is used to free any space that was allocated
       by urecomp.

       Character ranges	are defined at execution time, not compile time.  Case
       insensitivity is	defined	at execution time, rather  than	 compile-time,
       which  obviates	the need to recompile expressions when case (in)sensi-
       tivity is the only difference.

       These routines are by no	means quick - the need	to  handle  characters
       which  may  be  more than 8 bits	wide, plus the overhead	of calculating
       ranges of characters at execution time make this	unavoidable.  However,
       functionality  was  the	goal  with  these routines, not	sheer blinding
       speed.

FLAGS
       The cflags flag to urecomp is there simply to provide a POSIX-interface
       to the URE functions.  It can take the URE_ICASE	value, meaning	ignore
       case  sensitivity  when matching	expressions every time this expression
       is used.	 This is not advised - it would	be better to ignore this flag,
       and then	use the	URE_ICASE flag to ureexec, giving  more	 control  over
       case-sensitivity.   Note	 that  extended	regular	expressions are	always
       used (there does	not seem to be any point in providing  extended	 func-
       tionality,  only	 to  provide a way of ignoring it).  In	addition, new-
       line matching is	always done, and case-sensitivity is best  decided  at
       ureexec time.

       The  eflags  flag  to ureexec can take the following values: URE_ICASE,
       URE_NOTBOL.  URE_ICASE means perform the	matching of the	expression  in
       a  case-insensitive manner, and uses the	current	language collation se-
       quence (see below). If none is specified, English is the	default.

       URE_NOTBOL is used when the string passed to ureexec should not match a
       '^' metacharacter.

RETURN VALUES
       A successful compilation	will result in URE_SUCCESS being  returned  by
       urecomp.	  urecomp  returns  URE_ERR_NULL_ARG if	it's passed a null ex-
       pression	to compile.  urecomp returns URE_ERR_TOO_BIG if	the given  ex-
       pression	 turns	out  to	be too big when	compiled (although this	should
       not happen).  If	urecomp	is unable to allocate enough  storage  on  the
       heap to store the compiled expression, URE_ERR_OUT_OF_SPACE will	be re-
       turned.	Other error codes are possible,	depending on the error encoun-
       tered, usually as part of a badly-formed	regular	expression.

       ureexec returns URE_SUCCESS if a	match was found, and URE_NOMATCH if no
       match was found.	Other error codes are possibly returned, for  self-ex-
       planatory reasons: URE_ERR_NULL_PARAM, URE_ERR_BAD_MAGIC.

       ureerror	 can be	used to	get a textual representation of	the error mes-
       sage.

EXAMPLE
       /* get the file into memory */
       static char *
       fgetfile(FILE *fp, int *size)
       {
	    struct stat	   s;
	    char      *cp;
	    int	      cc;

	    (void) fstat(fileno(fp), &s);
	    *size = s.st_size;
	    cp = (char *) malloc(*size + 1);
	    if (cp == (char *) NULL) {
		 (void)	fprintf(stderr,	"Memory	problems.0);
		 exit(1);
	    }
	    cc = fread(cp, sizeof(char), *size,	fp);
	    if (cc != *size) {
		 free(cp);
		 return	(char *) NULL;
	    }
	    cp[cc] = 0;
	    return cp;
       }

       /* do a utf regexp search for each file */
       int
       dofile(ure_t *sp, char *f, int eflags, int pname, int plineno, int pline, char *collseq)
       {
	    urematch_t	   matchv[10];
	    char *buf;
	    char *cp;
	    Rune r;
	    char ebuf[BUFSIZ];
	    char done;
	    FILE *fp;
	    int	 ucc;
	    int	 err;
	    int	 i;

	    if ((fp = fopen(f, "r")) ==	(FILE *) NULL) {
		 return	0;
	    }
	    if ((buf = fgetfile(fp, &ucc)) == (char *) NULL) {
		 return	0;
	    }
	    cp = buf;
	    for	(done =	0 ; !done ; ) {
		 switch	(err = ureexec(sp, cp, 10, matchv, eflags, collseq)) {
		 case URE_SUCCESS:
		      if (pname) {
			   printf("%s:", f);
		      }
		      if (plineno) {
			   printf("%d:", LineNum(buf, &cp[matchv[0].rm_so]));
		      }
		      if (!pline) {
			   (void) fclose(fp);
			   return 1;
		      }
		      PrintLine(cp, sp,	&cp[matchv[0].rm_so], &cp[matchv[0].rm_eo]);
		      cp = utfrune(&cp[matchv[0].rm_eo], '0);
		      if (cp ==	(char *) NULL) {
			   done	= 1;
		      }
		      i	= chartorune(&r, cp);
		      cp += i;
		      if (r == 0) {
			   done	= 1;
		      }
		      break;
		 case URE_NOMATCH:
		      done = 1;
		      break;
		 default:
		      ureerror(err, sp,	ebuf, sizeof(ebuf));
		      (void) fprintf(stderr, "Bad execution: %s0, ebuf);
		      done = 1;
		 }
	    }
	    (void) fclose(fp);
	    free(buf);
	    return 1;
       }

       extern int     optind;
       extern char    *optarg;

       int
       main(int	argc, char **argv)
       {
	    ure_t     u;
	    char errmsg[BUFSIZ];
	    char *collseq;
	    int	 plineno;
	    int	 pline;
	    int	 eflags;
	    int	 err;
	    int	 i;

	    eflags = 0;
	    plineno = 0;
	    pline = 1;
	    while ((i =	getopt(argc, argv, "a:iln")) !=	-1) {
		 switch(i) {
		 case 'a':
		      collseq =	optarg;
		      break;
		 case 'i':
		      eflags |=	URE_ICASE;
		      break;
		 case 'l':
		      pline = 0;
		      break;
		 case 'n':
		      plineno =	1;
		      break;
		 }
	    }
	    if ((err = urecomp(&u, argv[optind], 0)) !=	URE_SUCCESS) {
		 (void)	ureerror(err, &u, errmsg, sizeof(errmsg));
		 (void)	fprintf(stderr,	"can't compile ure `%s', %s0,
				argv[optind], errmsg);
		 exit(1);
	    }
	    for	(i = optind + 1	; i < argc ; i++) {
		 dofile(&u, argv[i], eflags, (optind < argc - 1), plineno, pline, collseq);
	    }
	    urefree(&u);
	    exit(0);
       }

BUGS
       What software would be complete without bugs?

AUTHOR
       Written	by  Alistair  Crooks   (agc@amdahl.com,	  or   agc@westley.de-
       mon.co.uk),  and	 based	on Henry Spencer's original regular expression
       code. I very much doubt that he would recognise his code	now,  or  that
       he would	want to.

									UTF(3)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=ure&sektion=3&manpath=FreeBSD+Ports+15.0>

home | help