CATDVI(1)		    General Commands Manual		     CATDVI(1)

NAME
       catdvi - a DVI to plain text converter

SYNOPSIS
       catdvi [-d debuglevel, --debug=debuglevel]
              [-e outenc, --output-encoding=outenc]
              [-p pagespec, --first-page=pagespec]
              [-l pagespec, --last-page=pagespec]
              [-N, --list-page-numbers] [-s, --sequential]
              [-U, --show-unknown-glyphs] [-h, --help] [--version]
              [--copyright] [dvi-file]

DESCRIPTION
       This manual page documents catdvi version 0.14.

       catdvi reads the DVI (typesetter DeVice Independent) file dvi-file and
       dumps a plain text approximation of the document it describes to
       stdout. If the argument dvi-file is omitted or is a dash (`-'), catdvi
       reads from stdin. Several output encodings (character sets for the
       plain text output) are supported, most notably UTF-8.

       The current version of catdvi is a work in progress; it may not be
       robust enough for production use, but it already works well with
       linear English text. Many mathematical symbols (e.g. the uppercase
       Greek letters) and moderately complex formulae also come out right.

       The program needs to read the TFM (TeX Font Metric) files corresponding
       to the fonts used in the DVI file. These are searched for (and, if
       necessary and possible, created on the fly) through the Kpathsea
       library.

       In order to translate a DVI file to text correctly, catdvi must know
       the input encoding of each font used in it, i.e. a meaning-preserving
       mapping from font code points to Unicode. Many different font
       encodings are in use. At the time of writing, catdvi understands the
       following input encodings:

       `TEX TEXT'
              Knuth's original font encoding, also known as OT1.

       `TEX TEXT WITHOUT F-LIGATURES'
              A variant of the above.

       `EXTENDED TEX FONT ENCODING - LATIN'
              The Cork encoding, also known as T1.

       `TEX MATH ITALIC'
              The encoding of Knuth's math italic fonts, also known as OML.

       `TEX MATH SYMBOLS'
              The encoding of Knuth's math symbol fonts, also known as OMS.

       `TEX MATH EXTENSION' (most of it)
              The encoding of Knuth's math extension fonts (big operators,
              brackets, etc.), also known as OMX.

       `TEX TYPEWRITER TEXT'
              The encoding of Knuth's typewriter type fonts.

       `LATEX SYMBOLS'
              The encoding of the lasy fonts.

       Henrik Theiling's European currency symbol (`eurosym') font.

       `TEX TEXT COMPANION SYMBOLS 1---TS1' (almost everything)
              The encoding of the text companion fonts.

       Martin Vogel's symbol (`MarVoSym') font.
              Both the 1998 and the 2000 versions are supported as far as
              possible; about half of the symbols are not representable in
              Unicode.

       `BLACKBOARD'
              The encoding of the blackboard bold math (`bbm') fonts.

       All AMS fonts except the Cyrillic ones.
              This includes the AMS math symbols group A and group B, Euler
              Fraktur, Euler cursive, Euler script and Euler compatible
              extension fonts.

       It is impossible to translate unmarked-up DVI perfectly into plain
       text, since DVI only describes the layout of a page, while a
       translator such as this one really needs to know where words and
       paragraphs end and, more importantly, which glyphs should be aligned
       vertically and which should not. The current alignment algorithm tries
       to preserve the relative horizontal positions of word beginnings; this
       works well in most cases. Word breaks are detected using simple
       heuristics; paragraphs are not detected at all (and no paragraph fill
       is attempted).

       The price of alignment is that the output will likely be more than 80
       columns wide, even though catdvi tries very hard not to use more
       columns than strictly necessary. Output is usually narrower than 120
       columns and almost always narrower than 132 columns. It may be a good
       idea to switch your terminal to one of these widths if possible.

OPTIONS
       The program follows the usual GNU command line syntax, with long
       options starting with two dashes.
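       The Python sketch below only illustrates this GNU convention; it is
       not catdvi's own option parser. It uses the standard
       getopt.gnu_getopt function with option tables reconstructed from this
       manual page (an assumption), and shows how --first-page==5 is split
       at the first equals sign, leaving =5 as the page specification.

```python
import getopt

# Option tables reconstructed from this manual page (an assumption, not
# copied from catdvi's source). A trailing ':' or '=' means "takes a value".
SHORT_OPTIONS = "d:e:p:l:NsUh"
LONG_OPTIONS = [
    "debug=", "output-encoding=", "first-page=", "last-page=",
    "list-page-numbers", "sequential", "show-unknown-glyphs",
    "help", "version", "copyright",
]


def parse_command_line(argv):
    """Split argv into (option, value) pairs and remaining operands.

    gnu_getopt, like GNU getopt_long, allows options and operands to be
    intermixed, and splits a long option at its first '=' only.
    """
    return getopt.gnu_getopt(argv, SHORT_OPTIONS, LONG_OPTIONS)


if __name__ == "__main__":
    options, operands = parse_command_line(
        ["--first-page==5", "-e", "utf-8", "foo.dvi"])
    print(options)   # [('--first-page', '=5'), ('-e', 'utf-8')]
    print(operands)  # ['foo.dvi']
```

       Because only the first equals sign separates option from value, a
       single equals sign (--first-page=5) would request TeX page 5 rather
       than physical page 5.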

       -d debuglevel, --debug=debuglevel
              Set the debug output level to debuglevel (default is 10).
              Larger values produce more debug output; 0 produces none at
              all. The highest debug output level currently used is 150.

       -e outenc, --output-encoding=outenc
              Specify the encoding of the output character set. outenc can
              be one of the numbers or names from the table below; names are
              case-insensitive. The following output encodings should be
              available:

	      0: UTF-8
	      1: US-ASCII
	      2: ISO-8859-1
	      3: ISO-8859-15

              The command catdvi --help (see below) gives a more up-to-date
              list of all compiled-in output encodings. The default encoding
              is 1 (US-ASCII).
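              To make the number-or-name rule concrete, here is a
              hypothetical Python sketch of the lookup. The function name
              resolve_output_encoding and the fixed table are assumptions
              taken from the list above, not catdvi's implementation; a
              real build may compile in a different set of encodings.

```python
# Hypothetical sketch: the table mirrors the one in this manual page; a real
# catdvi build may compile in a different list (see `catdvi --help`).
OUTPUT_ENCODINGS = ["UTF-8", "US-ASCII", "ISO-8859-1", "ISO-8859-15"]
DEFAULT_OUTPUT_ENCODING = 1  # US-ASCII, the documented default


def resolve_output_encoding(outenc=None):
    """Map an -e/--output-encoding argument to a table index.

    outenc may be a table number ("0") or a name compared
    case-insensitively ("utf-8"); None selects the default.
    """
    if outenc is None:
        return DEFAULT_OUTPUT_ENCODING
    if outenc.isdigit():
        index = int(outenc)
        if index < len(OUTPUT_ENCODINGS):
            return index
        raise ValueError(f"no output encoding number {index}")
    for index, name in enumerate(OUTPUT_ENCODINGS):
        if name.casefold() == outenc.casefold():
            return index
    raise ValueError(f"unknown output encoding {outenc!r}")
```

              With this table, -e 0, -e utf-8 and -e UTF-8 all select the
              same encoding, and omitting -e selects US-ASCII.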

       -p pagespec, --first-page=pagespec
              Do not output pages before page pagespec. Pages can be
              specified in three different ways; the first two are exactly
              the same as for dvips(1).

              A (possibly negative) number num specifies a TeX page number,
              which is stored as the so-called count0 value in the DVI file
              for every page. Plain TeX uses negative page numbers for
              roman-numbered front matter (title page, preface, TOC, etc.),
              so count0 values compare as
                     -1 < -2 < -3 < ... < 1 < 2 < 3 < ...
              There may be several pages with the same count0 value in a
              single DVI file. This usually happens in documents with a
              per-chapter page numbering scheme.

              A number prefixed by an equals sign (`=num') specifies a
              physical page, i.e. the num-th page appearing in the DVI file.
              Numbering starts with 1. Note that with the long form of the
              option you actually need two equals signs, one as part of the
              long option and one as part of the page specification.
              Example:
                     catdvi --first-page==5 foo.dvi

              The third form of a page specification, two numbers separated
              by a colon (`num1:num2'), is useful for documents with
              separately numbered parts, e.g. chapters. It refers to the page
              with count0 value equal to num2 that catdvi believes to be in
              part num1. Since part numbers are not stored in the DVI file,
              the program has to guess them: an internal chapter counter is
              incremented by one every time the count0 value of the current
              page is not greater (in the above ordering) than that of the
              previous page. The counter is initialized to 1 if the first
              page has a negative count0 value and to 0 otherwise. (A
              document with separately numbered parts will probably have
              separately numbered front matter as well, and then this rule
              keeps the internal counter equal to the real-world part
              numbers.)
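              The three page specification forms and the chapter-counter
              guess can be sketched in Python as follows. This is an
              illustration under stated assumptions, not catdvi's code: the
              names count0_key, guess_parts and find_page are hypothetical,
              a plain num is assumed to select the first page with that
              count0 value, and placing 0 just before 1 in the ordering is
              an assumption this manual page does not state.

```python
def count0_key(count0):
    """Sort key for the count0 order -1 < -2 < ... < 1 < 2 < ...

    Negative values come first, ordered by absolute value. Placing 0
    just before 1 is an assumption.
    """
    return (0, -count0) if count0 < 0 else (1, count0)


def guess_parts(count0_values):
    """Guess each page's part number with the chapter-counter rule."""
    if not count0_values:
        return []
    part = 1 if count0_values[0] < 0 else 0
    parts = [part]
    for previous, current in zip(count0_values, count0_values[1:]):
        # A count0 that is not greater than its predecessor's starts a new part.
        if count0_key(current) <= count0_key(previous):
            part += 1
        parts.append(part)
    return parts


def find_page(pagespec, count0_values):
    """Return the 1-based physical page selected by pagespec, or None."""
    if pagespec.startswith("="):                      # `=num': physical page
        number = int(pagespec[1:])
        return number if 1 <= number <= len(count0_values) else None
    if ":" in pagespec:                               # `num1:num2': part and count0
        part_text, value_text = pagespec.split(":", 1)
        wanted_part, wanted_value = int(part_text), int(value_text)
        parts = guess_parts(count0_values)
        for physical, (part, value) in enumerate(
                zip(parts, count0_values), start=1):
            if part == wanted_part and value == wanted_value:
                return physical
        return None
    wanted_value = int(pagespec)                      # `num': TeX page (count0)
    for physical, value in enumerate(count0_values, start=1):
        if value == wanted_value:
            return physical
    return None
```

              For the page sequence -1, -2, 1, 2, 1, 2 (front matter, then
              two chapters), guess_parts yields 1, 1, 1, 1, 2, 2, so `2:1'
              selects physical page 5 and `=3' selects physical page 3.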

       -l pagespec, --last-page=pagespec
              Do not output pages after page pagespec. Pages are specified
              exactly as for the --first-page option above.

       -N, --list-page-numbers
              Instead of the contents of pages, output their physical page
              count, count0 value and chapter count (see the --first-page
              option above for definitions of these).

       -s, --sequential
              Do not attempt to reproduce the page layout; output glyphs in
              the order they appear in the DVI file. This may be useful with
              multi-column page layouts, for example.

       -U, --show-unknown-glyphs
              Show the Unicode number of unknown glyphs instead of `?'.

       -h, --help
              Show usage information and a list of available output
              encodings, then exit.

       --version
	      Show version information and exit.

       --copyright
	      Show copyright information and exit.

ENVIRONMENT
       The usual environment variables TFMFONTS, TEXFONTS, etc. for Kpathsea
       font search and creation apply. Refer to the Kpathsea documentation
       for details.

SEE ALSO
       xdvi(1), dvips(1), tex(1), mktextfm(1), the Kpathsea texinfo
       documentation, utf-8(7).

BUGS
       These things do not work (yet):

       •      No rules (the filled rectangles DVI uses for horizontal and
              vertical lines) are converted.

       •      Extensible recipes (very large brackets, braces, etc. built out
              of several smaller pieces) are not properly handled.

       •      Complicated math formulae are sometimes misaligned (mostly due
              to a lack of appropriate word-break heuristics).

       •      Some fonts and font encodings are not recognised yet.

       •      Most mathematical symbols have no representation in the
              available output character sets other than UTF-8, and hence
              show up as `?' unless the UTF-8 output encoding is selected. A
              textual transcription would be desirable.
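       The effect described in the last item can be reproduced with
       Python's standard codecs, offered purely as an illustration (catdvi
       does its own character mapping). The helper name to_output_charset is
       hypothetical; errors="replace" substitutes `?' for every character the
       target encoding cannot represent.

```python
def to_output_charset(text, encoding):
    """Round-trip text through encoding, replacing unrepresentable characters with '?'.

    Illustration only: this uses Python's codecs, not catdvi's own mapping.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


SAMPLE = "Σ x ≤ ∞, café"
US_ASCII = to_output_charset(SAMPLE, "us-ascii")      # every non-ASCII symbol lost
LATIN_1 = to_output_charset(SAMPLE, "iso-8859-1")     # accented letters survive
UTF_8 = to_output_charset(SAMPLE, "utf-8")            # everything survives
```

       With this sample, US-ASCII yields "? x ? ?, caf?", ISO-8859-1 keeps
       the é but not the math symbols, and UTF-8 preserves the whole string.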

       Watch out for these:

       •      If there is a space where it does not belong, or no space where
              there should be one, report this as a bug: send the DVI file to
              the catdvi maintainer, stating where in the file the bug is
              seen.

AUTHORS
       catdvi was written by Antti-Juhani Kaijanaho <gaia@iki.fi>, based on a
       skeletal version by J.H.M. Dassen (Ray). Bjoern Brill
       <brill@fs.math.uni-frankfurt.de> made further improvements and
       currently maintains the program.

       The manual page was compiled by Bjoern Brill, using material written by
       the first two program authors.

				8 November 2002			     CATDVI(1)
