MSORT(1)			 User Commands			      MSORT(1)

NAME
       msort - sort records in complex ways

SYNOPSIS
       msort <options> [<input file>]

DESCRIPTION
       msort is a program for sorting text files in sophisticated ways. It
       was developed initially for alphabetizing dictionaries of languages in
       which the ordering may be quite different from that of English, but it
       has many other uses.

       msort allows you	to sort	blocks of text delimited in a number  of  ways
       rather  than just lines and to specify particular fields	of a record as
       sort keys using either their position, counted from either end,	or  by
       matching	regular	expressions to their tags.

       msort  is capable of sorting on multiple	keys, so that when two records
       tie on one key, the tie may be broken on	another. Any or	all  keys  may
       be  optional.   How  absent  optional  keys are ordered with respect to
       present keys may	be set separately for each key.
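
       The tie-breaking behaviour described above can be pictured with a
       short Python sketch. It is an illustration of the idea only, not
       msort's implementation; the record layout (surname, optional year)
       and the absent_first parameter are assumptions made for the example.

```python
# Illustration only: multi-key sorting with an optional second key.
# The record layout and the absent_first parameter are hypothetical.

def sort_key(record, absent_first=True):
    """Build a two-level key: surname first, then an optional year."""
    surname, year = record
    if year is None:
        # The leading flag decides whether absent keys sort before (0)
        # or after (2) present keys, which carry the flag 1.
        year_key = (0, 0) if absent_first else (2, 0)
    else:
        year_key = (1, year)
    return (surname, year_key)

records = [("Smith", 1970), ("Jones", None), ("Smith", None), ("Smith", 1950)]

# Records that tie on the surname are ordered by the year.
absent_before = sorted(records, key=sort_key)
absent_after = sorted(records, key=lambda r: sort_key(r, absent_first=False))

print(absent_before)
print(absent_after)
```

       In the first result the Smith record without a year precedes the
       dated Smith records; in the second it follows them.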

       msort allows you to specify arbitrary sort orders and to define a
       virtually unlimited number of multigraphs of effectively unlimited
       length. The sort order and multigraphs are defined separately for each
       key. If your system has locale support, you can also use locale
       collation rules instead of specifying your own sort order.
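
       To show what a custom sort order with multigraphs means in practice,
       here is a minimal Python sketch. It assumes a traditional
       Spanish-style alphabet in which "ch" and "ll" are single collating
       units; the ORDER list and the collation_key function are
       hypothetical and do not reflect msort's sort order file format or
       its internals.

```python
# Illustration only: a custom collation order with two multigraphs.
# ORDER is an assumed alphabet, not an msort sort order file.
ORDER = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
         "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
         "x", "y", "z"]
RANK = {unit: i for i, unit in enumerate(ORDER)}
MAX_LEN = max(len(unit) for unit in ORDER)

def collation_key(word):
    """Split a word into collating units, longest match first."""
    key = []
    i = 0
    while i < len(word):
        for length in range(MAX_LEN, 0, -1):
            unit = word[i:i + length]
            if unit in RANK:
                key.append(RANK[unit])
                i += len(unit)
                break
        else:
            # Characters missing from ORDER sort after everything in it.
            key.append(len(ORDER) + ord(word[i]))
            i += 1
    return key

words = ["cuna", "chico", "dama", "llama", "luz", "cena"]
ordered = sorted(words, key=collation_key)
print(ordered)
```

       With this order "chico" follows every word beginning with a plain
       "c", and "llama" follows "luz".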

       msort  provides twelve types of key comparison: lexicographic, numeric,
       numeric string, hybrid, by string length, by angle, by date, by	domain
       name, by	time, by ISO8601 date/time stamp, by month name, and random.

       Which month names are recognized depends on the -s flag and on locale
       support. If the -s flag is used
       on the same key and its argument	is the name of a file, the month names
       are read	from the file, which should be in the same format  as  a  sort
       order definition	file. If the -s	flag is	used and its argument is a lo-
       cale  name,  the	month names recognized will be the month names and ab-
       breviations associated with the specified locale. If the	-s flag	is not
       used the	month names recognized will be the month names	and  abbrevia-
       tions  associated with the current locale. If your system does not have
       locale support and you do not use the -s	flag to	read the  month	 names
       from a file, the	month names recognized will be the English month names
       and abbreviations.

       msort  can  reverse  the	characters in a	key, allowing it to be used to
       generate	reverse	dictionaries.
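
       A reverse dictionary orders words by their endings. The following
       Python sketch shows the effect of reversing the characters of a key
       before comparison; it illustrates the concept and is not taken from
       msort itself.

```python
# Illustration only: sort words on their reversed spelling so that
# words sharing an ending are grouped together.
words = ["walking", "baker", "talking", "maker", "sing"]

reverse_sorted = sorted(words, key=lambda w: w[::-1])
print(reverse_sorted)
```

       The "-ing" words end up adjacent, as do the "-aker" words.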

       A choice	of sorting algorithms is provided.

       msort fully supports Unicode. The text to be sorted, and	all specifica-
       tions, should be	in UTF-8 Unicode. (If you have plain ASCII text,  this
       is  not	a problem as ASCII is a	subset of Unicode.) Full Unicode case-
       folding is available, in	Turkic and non-Turkic variants.	 Unicode  nor-
       malization is performed before sorting.

       For usage information, execute msort with no arguments.

       Full  information about msort is	currently to be	found in the reference
       manual, which is	distributed as a PDF (Portable Document	Format)	 file.
       If  a  copy  is not available locally, you can download it from msort's
       home page:
       http://billposer.org/Software/msort.html

OPTIONS
   Informational options
       -h,--help
	      Print usage message

       -v,--version
	      Print version message

       -D,--defaults
	      List defaults

       -F,--general-options
	      List general command line	options

       -G,--gnu-equivalences
	      List equivalents for GNU sort command line options.

       -H,--informational-options
	      List informational command line options

       -K,--key-specific-options
	      List key-specific	command	line options

       -L,--limits
	      List limits

       -N,--number-systems
	      List the supported number	systems.

   General options
       -b,--block
	      A	record is terminated by	two or more newlines
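
              As a rough picture of block records, the Python sketch below
              splits text wherever two or more newlines occur and sorts the
              resulting blocks as units. The split_blocks helper is
              hypothetical and only approximates the behaviour described
              here.

```python
import re

def split_blocks(text):
    """Split text into records separated by two or more newlines."""
    return [block for block in re.split(r"\n{2,}", text.strip("\n")) if block]

text = "zebra\nanimal\n\n\napple\nfruit\n\nmango\nfruit\n"

# Each block stays intact; only the order of the blocks changes.
blocks = sorted(split_blocks(text))
print(blocks)
```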

       -l,--line
	      A	record consists	of a single line

       -r,--record-separator <separator>
              A record is terminated by the specified separator character

       -O,--fixed-size-record <bytes>
	      A	record consists	of the specified number	of bytes.

       -d,--field-separators <character>+
	      Fields are delimited by the named	character(s)

       -w,--whole
	      Sort on the entire text of the record

       -a,--algorithm <algorithm>
	      Use the specified	sort algorithm.	The choices  are:  I(nsertion-
	      Sort), M(ergeSort), Q(uickSort), and S(hellSort).	 Note that In-
	      sertionSort and MergeSort	are stable, while QuickSort and	Shell-
	      Sort are unstable. The default is	QuickSort.
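
              Stability matters when records compare equal on the sort key.
              The Python sketch below uses Python's built-in sorted(),
              which is guaranteed to be stable, to show what a stable
              algorithm preserves; it does not exercise msort's own
              algorithms.

```python
# Illustration only: a stable sort keeps records with equal keys in
# their original relative order.
records = [("pear", 2), ("apple", 1), ("fig", 2), ("kiwi", 1)]

by_count = sorted(records, key=lambda r: r[1])
print(by_count)
```

              "apple" still precedes "kiwi" and "pear" still precedes
              "fig", as in the input. An unstable algorithm gives no such
              guarantee.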

       -M,--initial-maximum-records <records>
	      Set initial maximum number of records

       -m,--line-end-carriage-return
	      End-of-line  in  the  input  data	 is  marked by Carriage	Return
	      (0x0D) as	on the Macintosh rather	than by	Line Feed (0x0A) as on
	      Unix systems.

       -I,--invert-globally
	      Invert sense of comparisons globally

       -B,--BMP
              All characters lie within the Basic Multilingual Plane (that
              is, none has a value greater than 0xFFFF).

       -Z,--skip-first-record
	      Copy the first record in the input to the	output without sorting
	      it. This is useful for sorting files with	a header.

       -p,--reserve-private-use-area
	      Do  not  make internal use of the	Private	Use areas. By default,
	      multigraphs are assigned internally to codepoints	in the Supple-
	      mentary Private Use areas	if full	Unicode	is in use or to	 code-
	      points in	the Private Use	area if	input is restricted to the Ba-
	      sic  Multilingual	Plane by means of the -B option. If your input
	      makes use	of the Private Use areas, this option prevents	inter-
	      ference  with  your input. In this case, multigraphs will	be as-
	      signed to	the Low	and High Surrogate areas (0xD800-0xDFFF). Note
	      that this	limits the number of multigraphs to 2,048.

       -P,--random-seed	<seed>
	      Set the seed for the random number generator. If not  set	 here,
	      it  is  set  to a	value determined by the	time. The seed used is
	      reported in the log. This	option allows runs to be replicated.

       -Q,--check-only
	      Check whether the	input is already sorted. Do not	 generate  any
	      output.	Exit status is 0 if input is already sorted, 11	if not
	      sorted.

       -1,--in <input file name>
              Read the input from the named file.

       -2,--out	<output	file name>
	      If the output file is the	same as	the input file,	the input file
	      will be overwritten. The input file will not be  overwritten  if
	      the run is unsuccessful.

       -j,--suppress-log
	      Suppress	output	to the log. If this flag is given before there
	      is any output to the log from a command line flag, nothing  will
	      be written to the	log and	the log	file will not be created. If a
	      command  line  flag  generates a log message before this flag is
	      processed, the log file will be created but no log messages will
	      be written to it once this flag is processed. To guarantee  that
	      no  attempt  will	 be  made  to  open a log file,	give this flag
	      first.

       -q,--quiet
	      Be quiet - do not	chat while working

       -u,--unicode-normalization <mode>
	      Select Unicode normalization mode. The choices of	 mode  are:  c
	      for  normalization  form	C  (NFC),  d  for normalization	form D
	      (NFD), C for normalization form KC (NFKC), D  for	 normalization
	      form KD (NFKD), and n for	no normalization. The default is NFC.

   Key-specific options
       -e,--character-range <m,n>
	      Sort on characters m through n. Positive indices start from one.
	      Negative	indices	 indicate  position with respect to the	end of
	      the record.  For example,	the range 3,-2 consists	of  the	 third
	      character	through	the next-to-last character.
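
              The index arithmetic can be sketched in Python as follows.
              The character_range function is a hypothetical helper written
              for this example; it mirrors the rules stated above rather
              than msort's code.

```python
def character_range(record, m, n):
    """Return characters m through n, counting from one.

    Negative indices count from the end: -1 is the last character.
    """
    length = len(record)
    # Convert each index to a 0-based offset from the start.
    start = m - 1 if m > 0 else length + m
    end = n - 1 if n > 0 else length + n
    return record[start:end + 1]

# The range 3,-2: third character through next-to-last character.
key = character_range("abcdef", 3, -2)
print(key)
```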

       -n,--position <POS>(,<POS>)
	      Sort  on	the specified POS or contiguous	range of POSs, where a
	      POS is of	the form  <field  number>(.<character  number>).  Both
	      counts  begin  at	 one.  Field numbers but not character numbers
	      may be negative, in which	case they are counted from the	right.
	      Thus,  1.2  is  the second character of the first	field; -2.1 is
	      the first	character of the next to last field.
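
              A small Python sketch of how a POS might be resolved is given
              below. The char_at helper, the colon separator, and the
              choice to return the whole field when no character number is
              given are assumptions made for the example.

```python
def char_at(record, pos, sep=":"):
    """Resolve a POS of the form <field>(.<character>) within a record."""
    field_part, _, char_part = pos.partition(".")
    fields = record.split(sep)
    field_no = int(field_part)
    # Positive field numbers count from the left starting at one;
    # negative field numbers count from the right.
    field = fields[field_no - 1] if field_no > 0 else fields[field_no]
    if not char_part:
        return field  # assumption: no character number means the whole field
    return field[int(char_part) - 1]

record = "alpha:beta:gamma"
second_of_first = char_at(record, "1.2")
first_of_next_to_last = char_at(record, "-2.1")
print(second_of_first, first_of_next_to_last)
```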

       -t,--tag	<tag regexp>
	      Sort on the field	with the specified tag

       -o,--optional <comparison>
              The key is optional. The argument (<, =, or >) specifies how a
              record lacking the key compares to records in which the key is
              present.

       -C,--fold-case
	      Fold case

       -z,--fold-case-turkic
	      Fold case	with additional	Turkic conversions.

       -c,--comparison-type <comparison type>
              a(ngle), l(exicographic), i(so8601 date/time), t(ime), D(omain
              name/email address), d(ate), m(onth name), n(umeric), N(umeric
              string), s(ize), h(ybrid), r(andom)

       -y,--number-system <number system>
              Specifies the number system expected for this key. This affects
              only numeric and numeric string keys. There are two special
              values. If the number system is "all", records may contain any
              number system that msort can interpret, and different records
              may contain different number systems. If the number system is
              "any", records may contain any number system that msort can
              interpret, but all records must make use of the same number
              system; msort sets the number system on the basis of the first
              record.

       -f,--date-format	<date format>
	      Permutation of ymd with separators, e.g. y-m-d for international
	      date format, m/d/y for American date format, or a	permutation of
	      yd with separators, e.g. y-d, for	day-of-year dates.  All	 three
	      components  may  be  numbers in any available number system. The
	      month field may also be a	month name, determined by the same de-
	      vices as independent month name fields.
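
              The idea of a date format as a permutation of components can
              be sketched in Python. The date_key helper is hypothetical;
              it handles only components written with ASCII digits and
              ignores month names and other number systems.

```python
import re

def date_key(text, fmt):
    """Map a date to a (year, month, day) tuple according to fmt.

    fmt is a permutation of y, m, d (or y, d) with separators,
    for example "y-m-d", "m/d/y", or "y-d".
    """
    letters = re.findall(r"[ymd]", fmt)
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    parts = dict(zip(letters, numbers))
    # Day-of-year dates have no month component; use 0 as a placeholder.
    return (parts["y"], parts.get("m", 0), parts["d"])

american = date_key("12/25/2009", "m/d/y")
international = date_key("2010-01-05", "y-m-d")
day_of_year = date_key("2010-032", "y-d")
print(american, international, day_of_year)
```

              Comparing the resulting tuples orders dates chronologically
              regardless of how the components were arranged in the input.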

       -W,--sort-order-file-separators <file name>
	      Read the list of characters to be	treated	as separators  in  the
	      sort order definition file.

       -S,--substitutions <file	name>
	      Read substitutions from named file

       -s,--sort-order <file name>|<locale name>|"locale"
	      If  the  argument	is a file name,	it is taken to be a sort order
	      file and the sort	order for the key is read from	the  file.  If
	      the  argument is a locale	name, the collation rules for that lo-
	      cale are used. If	the argument is	"locale", the collation	 rules
	      for the current locale are used.

       -T,--transformations <(d)(e)(s)>
	      Apply  the specified transformations.  d specifies that diacrit-
	      ics are to be stripped. Separately encoded combining  diacritics
	      are  removed.  Characters	 with diacritics represented by	single
	      codepoints are replaced with the corresponding  ASCII  character
	      without  the  diacritics,	if there is one.  e specifies that en-
	      closed characters, that is, characters within circles or	paren-
	      theses,  are  to	be replaced with the corresponding plain ASCII
	      character	if there is one.  s specifies that characters in  spe-
	      cial  styles  are	 to  be	 replaced with the corresponding plain
	      ASCII character if there is one. Stylistic equivalents  include:
	      small  capitals (e.g. U+1D04), script forms (e.g.	U+212C), black
	      letter forms (e.g.  U+212D),  Arabic  presentation  forms	 (e.g.
	      U+FE81),	Hebrew	presentation  forms  (e.g.  U+FB1D), fullwidth
	      forms (e.g. U+FF01), halfwidth  forms  (e.g.  U+FF7B),  and  the
	      mathematical alphanumeric	symbols	(e.g. U+1D400).
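
              Transformations of this kind can be approximated with
              Python's unicodedata module. The sketch below is an analogy,
              not msort's implementation: strip_diacritics decomposes the
              text and drops combining marks, and NFKC normalization is
              used as a stand-in for some of the enclosed and stylistic
              replacements. NFKC does not cover every case listed above
              (small capitals, for example) and changes some characters
              that msort may leave alone.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

plain = strip_diacritics("caf\u00e9 na\u00efve")

# Compatibility normalization maps several styled or enclosed forms
# to the corresponding plain character.
fullwidth_a = unicodedata.normalize("NFKC", "\uff21")       # fullwidth A
math_bold_a = unicodedata.normalize("NFKC", "\U0001d400")   # mathematical bold A
circled_a = unicodedata.normalize("NFKC", "\u24b6")         # circled A

print(plain, fullwidth_a, math_bold_a, circled_a)
```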

       -x,--exclusion-file <file name>
	      Read exclusions from named file

       -X,--exclude-characters <exclusions>
	      Exclude specified	characters

       -i,--invert-locally
	      Invert sense of comparisons

       -R,--reverse-key
	      Reverse characters of key

       -A,--first-character-only
	      Ignore all but the first character of the	field, after substitu-
	      tions, exclusions, etc.

       Note: long options may not be available on your system.

SEE ALSO
       sort(1),	uninum(3)

AUTHOR
       Bill Poser (billposer@alum.mit.edu)

LICENSE
       GNU General Public License (http://www.gnu.org/licenses/gpl.html), ver-
       sion 3.

msort				 January 2010			      MSORT(1)
