FreeBSD Manual Pages
ascii2uni(1)                General Commands Manual                ascii2uni(1)

NAME
       ascii2uni - convert 7-bit ASCII representations to UTF-8 Unicode

SYNOPSIS
       ascii2uni [options] (<input file name>)

DESCRIPTION
       ascii2uni converts various 7-bit ASCII representations to UTF-8. It reads from the standard input and writes to the standard output. The representations understood are listed below under the command line options. If no format is specified, standard hexadecimal format (e.g. 0x00e9) is assumed.

COMMAND LINE OPTIONS
       -a <format>
              Convert from the specified format. Formats may be specified by means of the following arbitrary single-character codes, by means of names such as "SGML_decimal", or by examples of the desired format.

              A   Convert hexadecimal numbers with prefix U in angle brackets (e.g. <U00E9>).
              B   Convert \x-escaped hexadecimal numbers (e.g. \x00E9).
              C   Convert \x-escaped hexadecimal numbers in braces (e.g. \x{00E9}).
              D   Convert decimal HTML numeric character references (e.g. &#233;).
              E   Convert hexadecimal numbers with prefix U (e.g. U00E9).
              F   Convert hexadecimal numbers with prefix u (e.g. u00E9).
              G   Convert hexadecimal numbers in single quotes with prefix X (e.g. X'00E9').
              H   Convert hexadecimal HTML numeric character references (e.g. &#x00E9;).
              I   Convert hexadecimal UTF-8 with each byte's hex preceded by an =-sign (e.g. =C3=A9). This is the Quoted-Printable format defined by RFC 2045.
              J   Convert hexadecimal UTF-8 with each byte's hex preceded by a %-sign (e.g. %C3%A9). This is the URI escape format defined by RFC 2396.
              K   Convert octal UTF-8 with each byte escaped by a backslash (e.g. \303\251).
              L   Convert \U followed by eight hex digits or \u followed by four hex digits. \UXXXXXXXX encoding a character within the BMP (U+0000-U+FFFF) is converted, but a warning is issued since this violates the WWW specification.
              M   Convert hexadecimal SGML numeric character references (e.g. \#xE9;).
              N   Convert decimal SGML numeric character references (e.g. \#233;).
              O   Convert octal escapes for the three low bytes in big-endian order (e.g. \000\000\351).
              P   Convert hexadecimal numbers with prefix U+ (e.g. U+00E9).
              Q   Convert HTML character entities (e.g. &eacute;).
              R   Convert raw hexadecimal numbers (e.g. 00E9). Requires the -p flag.
              S   Convert hexadecimal escapes for the three low bytes in big-endian order (e.g. \x00\x00\xE9).
              T   Convert decimal escapes for the three low bytes in big-endian order (e.g. \d000\d000\d233).
              U   Convert \u-escaped hexadecimal numbers (e.g. \u00E9).
              V   Convert \u-escaped decimal numbers (e.g. \u00233).
              X   Convert standard hexadecimal numbers (e.g. 0x00E9).
              Y   Convert all three types of HTML escape: hexadecimal and decimal character references and character entities.
              0   Convert hexadecimal UTF-8 with each byte's hex enclosed within angle brackets (e.g. <C3><A9>).
              1   Convert Common Lisp format hexadecimal numbers (e.g. #x00E9).
              2   Convert Perl format decimal numbers with prefix v (e.g. v233).
              3   Convert hexadecimal numbers with prefix $ (e.g. $00E9).
              4   Convert PostScript format hexadecimal numbers with prefix 16# (e.g. 16#00E9).
              5   Convert Common Lisp format hexadecimal numbers with prefix #16r (e.g. #16r00E9).
              6   Convert Ada format hexadecimal numbers with prefix 16# and suffix # (e.g. 16#00E9#).
              7   Convert Apache log format hexadecimal UTF-8 with each byte's hex preceded by a backslash-x (e.g. \xC3\xA9).
              8   Convert Microsoft OOXML format hexadecimal numbers with prefix _x and suffix _ (e.g. _x00E9_).
              9   Convert %\u-escaped hexadecimal numbers (e.g. %\u00E9).

       -h     Help. Print the usage message and exit.

       -v     Print program version information and exit.

       -m     Accept deprecated HTML entities lacking the final semicolon, e.g. "&eacute" in place of "&eacute;".

       -p     Pure. Assume that the input consists entirely of escapes except for arbitrary (but non-null) amounts of separating whitespace.

       -q     Be quiet. Do not chat unnecessarily.

       -Z <format>
              Convert input using the supplied format.
              The format specified will be used as the format string in a call to sscanf(3) with a single argument consisting of a pointer to an unsigned long integer. For example, to obtain the same results as with -a U, the format would be: \u%04X.

       If the format is Quoted-Printable, then, in accordance with RFC 2045, an equal-sign at the end of an input line marks a soft line break: both the equal-sign and the immediately following newline are skipped. Strictly speaking, this is not conversion of an ASCII escape to Unicode.

       All options that accept hexadecimal input recognize both upper- and lower-case hexadecimal digits.

EXIT STATUS
       The following values are returned on exit:

       0 SUCCESS
              The input was successfully converted.

       3 INFO
              The user requested information such as the version number or usage synopsis, and this has been provided.

       5 BAD OPTION
              An incorrect option flag was given on the command line.

       7 OUT OF MEMORY
              A request for additional memory failed.

       8 BAD RECORD
              An ill-formed record was detected in the input.

SEE ALSO
       uni2ascii(1)

AUTHOR
       Bill Poser <billposer@alum.mit.edu>

LICENSE
       GNU General Public License

                                 August, 2011                      ascii2uni(1)
