hspell(3)			     Ivrix			     hspell(3)

NAME
       hspell - Hebrew spellchecker (C API)

SYNOPSIS
       #include <hspell.h>

       int hspell_init(struct dict_radix **dictp, int flags);

       void hspell_uninit(struct dict_radix *dictp);

       int hspell_check_word(struct dict_radix *dict, const char *word,
           int *preflen);

       void hspell_trycorrect(struct dict_radix *dict, const char *word,
           struct corlist *cl);

       int corlist_init(struct corlist *cl);

       int corlist_free(struct corlist *cl);

       int corlist_n(struct corlist *cl);

       char *corlist_str(struct corlist *cl, int i);

       unsigned int hspell_is_canonic_gimatria(const char *word);

       typedef int hspell_word_split_callback_func(const char *word,
           const char *baseword, int preflen, int prefspec);

       int hspell_enum_splits(struct dict_radix *dict, const char *word,
           hspell_word_split_callback_func *enumf);

       void hspell_set_dictionary_path(const char *path);

       const char *hspell_get_dictionary_path(void);

DESCRIPTION
       This manual describes the C API of the Hspell Hebrew spellchecker.
       Please refer to hspell(1) for a description of the Hspell project, its
       spelling standard, and how it works.

       The hspell_init() function must be called first to initialize the
       Hspell library. It sets up some global structures (see the CAVEATS
       section) and then reads the necessary dictionary files, whose location
       is compiled into the library unless it is changed with
       hspell_set_dictionary_path(). The 'dictp' parameter is a pointer to a
       struct dict_radix* object, which is modified to point to a newly
       allocated dictionary. A typical hspell_init() call therefore looks like

	  struct dict_radix *dict;
	  hspell_init(&dict, flags);

       Note that the (struct dict_radix*) type is an opaque pointer: the
       library user has no access to the individual fields of this structure.

       The 'flags' parameter can contain a bitwise OR of several flags that
       modify Hspell's default behavior. Turning on HSPELL_OPT_HE_SHEELA
       allows Hspell to recognize the interrogative He prefix (he ha-she'ela).
       HSPELL_OPT_DEFAULT is a synonym for turning on no special flag, i.e.,
       it evaluates to 0.

       hspell_init() returns 0 on success, or a negative number on error.
       Currently, the only error is -1, meaning the dictionary files could not
       be read.
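       Because hspell_init() reports failure only through its return value, a
       caller may want to turn that value into a readable message. The
       function below is a hypothetical helper, not part of the Hspell API; it
       is a sketch that relies only on the return values documented above and
       treats any other value as unknown.

```c
/* Hypothetical helper, not part of the Hspell API: describe a value
   returned by hspell_init(), using only the codes this manual documents. */
const char *describe_hspell_init_result(int rc)
{
    if (rc == 0)
        return "success";
    if (rc == -1)
        return "dictionary files could not be read";
    return "unknown error";   /* any other value is undocumented */
}
```

       A caller might then use it as follows (HSPELL_OPT_HE_SHEELA is chosen
       here only as an example flag):

	  int rc = hspell_init(&dict, HSPELL_OPT_HE_SHEELA);
	  if (rc != 0)
	      fprintf(stderr, "hspell: %s\n", describe_hspell_init_result(rc));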

       The hspell_uninit() function undoes the effects of hspell_init(),
       freeing any memory that was allocated during initialization.

       The hspell_check_word() function checks whether a certain word is a
       correct Hebrew word (possibly with prefix particles attached in a
       syntactically correct manner). It returns 1 if the word is correct, or
       0 if it is incorrect.

       The 'word' parameter should be a single Hebrew word in the iso8859-8
       encoding, possibly containing the ASCII quote or double-quote
       characters (signifying the geresh and gershayim used in Hebrew for
       abbreviations, acronyms, and a few foreign sounds). If the calling
       program works with other encodings, it must convert the word to
       iso8859-8 first. In particular, cp1255 (the MS-Windows Hebrew encoding)
       extensions to iso8859-8, such as niqqud characters and the geresh and
       gershayim characters, are currently not recognized and must be removed
       from the word before calling hspell_check_word().
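       As a sketch of the conversion described above, the hypothetical
       function below rewrites a single cp1255-encoded word into a form that
       hspell_check_word() accepts. It assumes the usual cp1255 layout: Hebrew
       letters at 0xE0-0xFA (identical to iso8859-8), niqqud and other marks
       at 0xC0-0xD2 (a range that also holds the maqaf and paseq punctuation,
       which a word splitter would normally have treated as separators
       already), geresh at 0xD7 and gershayim at 0xD8. The text above says
       geresh and gershayim must be removed; because hspell_check_word()
       accepts the ASCII quote and double quote in their place, this sketch
       maps them to ASCII instead. Which of the two is preferable is left to
       the caller.

```c
/* Hypothetical sketch: convert one cp1255-encoded word to the iso8859-8
   form hspell_check_word() expects. Assumes the standard cp1255 layout
   described above. 'out' must have room for strlen(in) + 1 bytes; the
   result is never longer than the input. */
void cp1255_word_to_hspell(const char *in, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    char *d = out;

    for (; *s != '\0'; s++) {
        if (*s >= 0xC0 && *s <= 0xD2)
            continue;             /* drop niqqud and other marks */
        else if (*s == 0xD7)
            *d++ = '\'';          /* cp1255 geresh -> ASCII quote */
        else if (*s == 0xD8)
            *d++ = '"';           /* cp1255 gershayim -> ASCII double quote */
        else
            *d++ = (char)*s;      /* letters (same as iso8859-8) and ASCII */
    }
    *d = '\0';
}
```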

       Through the 'preflen' pointer, the function writes back the number of
       characters it recognized as a prefix particle; the rest of 'word' is a
       stand-alone word. Because Hebrew words can typically be read in several
       different ways, this feature (which reports just one prefix, from one
       possible reading) is usually not very useful, and it is likely to be
       removed in a future version.

       The hspell_enum_splits() function provides a way to get all possible
       splits of the given 'word' into an optional prefix particle and a
       stand-alone word. For each possible and legal split (some words cannot
       accept certain prefixes), the user-supplied callback function 'enumf'
       is called. The callback receives the whole word, the stand-alone word
       ('baseword'), the length of the prefix ('preflen'), and a bitfield
       ('prefspec') describing which types of words this prefix can attach
       to. Note that in some cases a word beginning with the letter waw gets
       this waw doubled when a prefix is added, so sometimes
       strlen(word) != strlen(baseword) + preflen.
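       A callback passed as 'enumf' must match hspell_word_split_callback_func.
       The sketch below (record_split, split_count and last_split are
       hypothetical names) counts the splits it is given and keeps the most
       recent one as "prefix+baseword". It assumes the prefix occupies the
       first 'preflen' characters of 'word', and it uses 'baseword' as given
       rather than the rest of 'word', because of the doubled-waw case. This
       manual does not say how Hspell uses the callback's return value, so the
       sketch simply returns 0.

```c
#include <stdio.h>

/* Hypothetical state for this sketch: how many splits were reported and
   the most recent one, formatted as "prefix+baseword". */
static int split_count;
static char last_split[256];

/* Same signature as hspell_word_split_callback_func. Assumes the prefix
   is the first preflen characters of word; baseword is used as given
   because a doubled waw can make it differ from word + preflen.
   prefspec is ignored here. */
static int record_split(const char *word, const char *baseword,
                        int preflen, int prefspec)
{
    (void)prefspec;
    split_count++;
    snprintf(last_split, sizeof last_split, "%.*s+%s",
             preflen, word, baseword);
    return 0;   /* return-value semantics are not documented here */
}
```

       With a dictionary from hspell_init(), it would be used as:

	  split_count = 0;
	  hspell_enum_splits(dict, word, record_split);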

       The hspell_trycorrect() function tries to find a list of possible
       corrections for an incorrect word. Because the word density in Hebrew
       is high (a random string of letters, especially a short one, has a high
       probability of being a correct word), this function tries corrections
       based on the assumption of a spelling error (replacement of letters
       that sound alike, missing or spurious immot qri'a), not a typo (a
       finger slipping on the keyboard, etc.); see also CAVEATS.

       hspell_trycorrect() returns the correction list in a structure of type
       struct corlist. This structure must first be initialized with a call
       to corlist_init() and later freed with corlist_free(). The corlist_n()
       macro returns the number of words held in an initialized corlist, and
       corlist_str() returns the i'th word. Accordingly, here is an example
       usage of hspell_trycorrect():

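	  struct corlist cl;
	  int i;

	  /* w is the misspelled word; dict was set up by hspell_init() */
	  printf ("Found misspelled word %s. Possible corrections:\n", w);
	  corlist_init (&cl);
	  hspell_trycorrect (dict, w, &cl);
	  for (i=0; i<corlist_n(&cl); i++) {
	      printf ("%s\n", corlist_str(&cl, i));
	  }
	  corlist_free (&cl);      /* release the correction list */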

       The hspell_is_canonic_gimatria() function checks whether the given word
       is a canonic gimatria, i.e., the proper way to write in gimatria the
       number it represents. The caller might want to accept canonic gimatria
       as proper Hebrew words, even if hspell_check_word() reported such a
       word to be non-existent. hspell_is_canonic_gimatria() returns the
       number represented as gimatria in 'word' if it is indeed proper
       gimatria (in canonic form), or 0 otherwise.
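       To illustrate what "the number it represents" means, the hypothetical
       function below simply adds up the letter values of an iso8859-8 word
       (alef = 1 through tav = 400, with final letters counted like their
       regular forms), skipping the ASCII quote and double quote. It is not
       Hspell's implementation and performs no canonicity check: it yields 15
       both for tet-vav, the conventional spelling of 15, and for yod-he.
       Deciding between such spellings is exactly what
       hspell_is_canonic_gimatria() is for. The sketch also ignores other uses
       of the geresh, such as marking thousands.

```c
/* Hypothetical sketch, not Hspell's implementation: value of one
   iso8859-8 Hebrew letter (0xE0 alef .. 0xFA tav), final forms counted
   like regular ones; 0 for any other byte. */
static unsigned letter_value(unsigned char c)
{
    static const unsigned values[27] = {
        1, 2, 3, 4, 5, 6, 7, 8, 9,      /* alef .. tet (0xE0-0xE8)        */
        10, 20, 20, 30, 40, 40, 50, 50, /* yod, final kaf, kaf, lamed,
                                           final mem, mem, final nun, nun */
        60, 70, 80, 80, 90, 90,         /* samekh, ayin, final pe, pe,
                                           final tsadi, tsadi             */
        100, 200, 300, 400              /* qof, resh, shin, tav           */
    };
    return (c >= 0xE0 && c <= 0xFA) ? values[c - 0xE0] : 0;
}

/* Plain sum of letter values, skipping ASCII ' and " (geresh and
   gershayim). Returns 0 if word contains any other character. No check
   of canonic form is made. */
unsigned gimatria_sum(const char *word)
{
    unsigned sum = 0;

    for (const unsigned char *p = (const unsigned char *)word; *p != '\0'; p++) {
        if (*p == '\'' || *p == '"')
            continue;
        unsigned v = letter_value(*p);
        if (v == 0)
            return 0;               /* not a Hebrew letter */
        sum += v;
    }
    return sum;
}
```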

       hspell_init() normally reads the dictionary files from a path compiled
       into the library. This makes sense when the library's code and the
       dictionaries are distributed together, but in some scenarios the
       library user might want to use Hspell dictionaries that are already
       present on the system at an arbitrary path. The function
       hspell_set_dictionary_path() sets this path, and should be called
       before hspell_init(). The given path is that of the word list; the
       other input files are found at that path with a suffix appended.
       hspell_get_dictionary_path() returns the current path. On many
       installations, this defaults to "/usr/local/share/hspell/hebrew.wgz".

LINKING
       On most systems, the Hspell library is compiled to use the Zlib library
       for reading the compressed dictionaries. Therefore, a program linking
       with the Hspell library must also be linked with the Zlib library
       (usually by adding "-lz" to the compilation line).

       Programs that use autoconf to search for the Hspell library should
       remember to tell AC_CHECK_LIB to also link with the -lz library (via
       its other-libraries argument) when checking for -lhspell.

CAVEATS
       While the API described here has been stable for years, it may change
       in the future. Users are encouraged to compare the values of the
       integer macros HSPELL_VERSION_MAJOR and HSPELL_VERSION_MINOR to those
       expected by the writer of the program. A third macro,
       HSPELL_VERSION_EXTRA, contains a string that can describe subrelease
       modifications (e.g., beta versions).
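       One way to perform the comparison suggested above is a small helper
       such as the hypothetical version_at_least() below, which orders
       versions by major number first and minor number second.

```c
/* Hypothetical helper: nonzero if version major.minor is at least
   want_major.want_minor, comparing major numbers first. */
int version_at_least(int major, int minor, int want_major, int want_minor)
{
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

       For example, a program written against Hspell 1.4 might check:

	  if (!version_at_least(HSPELL_VERSION_MAJOR, HSPELL_VERSION_MINOR, 1, 4))
	      fprintf(stderr, "warning: Hspell headers older than 1.4\n");

       Because these are integer macros, the same comparison can also be made
       at compile time with a preprocessor #if test.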

       The current Hspell C API is very low-level, in the sense that it leaves
       the user to implement many features that some users expect a
       spell-checker to provide. For example, it does not provide any
       facilities for a user-defined personal dictionary. It also has separate
       functions for checking valid Hebrew words and valid gimatria, and no
       function that does both. It is assumed that the caller (for example, a
       larger spell-checking library or a word processor) already has these
       facilities. If not, you may wish to look at the sources of hspell(1)
       for an example implementation.

       Currently there is no concept of separate Hspell "contexts" in an
       application. Some of the context is global for the entire application:
       currently, a single list of legal prefix particles is kept, and the
       dictionary read by hspell_init() is always read from a single global
       location. This may be solved in a later version, e.g., by switching to
       an API like:

	  context = hspell_new_context();
	  hspell_set_dictionary_path(context, "/some/path/hebrew.wgz");
	  hspell_init(context, flags);
	  ...
	  hspell_check_word(context, word, preflenp);

       Note that despite the global context mentioned above, after
       initialization all functions described here are thread-safe, because
       they only read the dictionary data and never write to it.

       hspell_trycorrect() is not as powerful as it could be: typos and
       certain kinds of spelling mistakes do not yield useful correction
       suggestions. Besides supporting more types of corrections,
       hspell_trycorrect() needs a better way to rank the corrections by
       likelihood, as an unordered list of 100 corrections would be just as
       useful (or rather, useless) as none.

       In some cases of errors during hspell_init(), warning messages are
       printed to the standard error stream. This is a bad thing for a
       library to do.

       There are too many CAVEATS in this manual.

VERSION
       The version of hspell described by this manual page is 1.4.

COPYRIGHT
       Copyright (C) 2000-2017, Nadav Har'El <nyh@math.technion.ac.il> and
       Dan Kenigsberg <danken@cs.technion.ac.il>.

       Hspell is free software, released under the GNU Affero General Public
       License (AGPL) version 3. Note that not only the programs in the
       distribution, but also the dictionary files and the generated word
       lists, are licensed under the AGPL. There is no warranty of any kind.

       See the LICENSE file for more information and the exact license terms.

       The latest version of this software can be found at
       http://hspell.ivrix.org.il/

SEE ALSO
       hspell(1)

Hspell 1.4			 24 June 2017			     hspell(3)
