TAR(5)			      File Formats Manual			TAR(5)

NAME
       tar -- format of	tape archive files

DESCRIPTION
       The tar archive format collects any number of files, directories, and
       other file system objects (symbolic links, device nodes, etc.) into a
       single stream of bytes.  The format was originally designed to be used
       with tape drives that operate with fixed-size blocks, but it is now
       widely used as a general packaging mechanism.

   General Format
       A tar archive consists of a series of 512-byte records.	Each file sys-
       tem  object requires a header record which stores basic metadata	(path-
       name, owner, permissions, etc.) and zero	or more	records	containing any
       file data.  The end of the archive is indicated by two records consist-
       ing entirely of zero bytes.
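
       The record arithmetic above can be sketched in C as follows.  This
       is an illustration only: the function names are hypothetical and
       not part of any tar implementation, and the sketch assumes that
       the final data record of a file is padded out to a full 512 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512

/*
 * Number of 512-byte data records that follow the header of an entry
 * carrying `size` bytes of file data (assumption: the last record is
 * padded to a full record).
 */
uint64_t
tar_data_records(uint64_t size)
{
	return size / TAR_RECORD_SIZE + (size % TAR_RECORD_SIZE != 0);
}

/*
 * Returns 1 if the record consists entirely of zero bytes.  Two such
 * records in a row indicate the end of the archive.
 */
int
tar_record_is_zero(const unsigned char rec[TAR_RECORD_SIZE])
{
	size_t i;

	for (i = 0; i < TAR_RECORD_SIZE; i++)
		if (rec[i] != 0)
			return 0;
	return 1;
}
```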

       For compatibility with tape drives that use fixed block sizes, programs
       that read or write tar files always read	or write  a  fixed  number  of
       records	with each I/O operation.  These	"blocks" are always a multiple
       of the record size.  The	most common block size--and the	 maximum  sup-
       ported  by  historic  implementations--is  10240	 bytes	or 20 records.
       (Note: the terms	"block"	and "record" here are not  entirely  standard;
       this  document  follows	the  convention	established by John Gilmore in
       documenting pdtar.)
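
       As a rough sketch of the blocking rule, the following hypothetical
       helper rounds a record count up to a whole number of blocks.  The
       name and signature are assumptions made for this example; the
       default of 20 records per block is the common value given above.

```c
#include <stdint.h>

#define TAR_RECORD_SIZE   512u
#define TAR_BLOCK_RECORDS 20u	/* common default: 10240-byte blocks */

/*
 * Round `records` up to a whole number of blocks and return the
 * resulting archive length in bytes.  `records` counts every header
 * record, data record, and end-of-archive record.
 */
uint64_t
tar_blocked_length(uint64_t records, unsigned records_per_block)
{
	uint64_t blocks;

	blocks = (records + records_per_block - 1) / records_per_block;
	return blocks * records_per_block * TAR_RECORD_SIZE;
}
```

       For example, an archive holding a single empty file needs one
       header record plus the two zero records, which still occupies one
       full 10240-byte block when written with the default block size.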

   Old-Style Archive Format
       The original tar	archive	format has been	extended many times to include
       additional information that various implementors	found necessary.  This
       section describes the variant implemented by the	tar  command  included
       in  Version  7 AT&T UNIX, which is one of the earliest widely-used ver-
       sions of	the tar	program.

       The header record for an	old-style tar archive consists of the  follow-
       ing:

	     struct header_old_tar {
		     char name[100];
		     char mode[8];
		     char uid[8];
		     char gid[8];
		     char size[12];
		     char mtime[12];
		     char checksum[8];
		     char linkflag[1];
		     char linkname[100];
		     char pad[255];
	     };

       All unused bytes in the header record are filled with nulls.

       name    Pathname, stored	as a null-terminated string.  Early tar	imple-
	       mentations  only	 stored	 regular files (including hardlinks to
	       those files).  One common early convention used a trailing  "/"
	       character to indicate a directory name, allowing	directory per-
	       missions	and owner information to be archived and restored.

       mode    File mode, stored as an octal number in ASCII.

       uid, gid
	       User id and group id of owner, as octal numbers in ASCII.

       size    Size of file, as an octal number in ASCII.  For regular files
               only, this indicates the amount of data that follows the
               header.  In particular, this field was ignored by early tar
               implementations when extracting hardlinks.  Modern writers
               should always store a zero length for hardlink entries.

       mtime   Modification  time  of file, as an octal	number in ASCII.  This
	       indicates the number of seconds since the start of  the	epoch,
	       00:00:00	UTC January 1, 1970.  Note that	negative values	should
	       be avoided here,	as they	are handled inconsistently.

       checksum
	       Header  checksum,  stored as an octal number in ASCII.  To com-
	       pute the	checksum, set the checksum field to all	 spaces,  then
	       sum  all	 bytes	in the header using unsigned arithmetic.  This
	       field should be stored as six octal digits followed by  a  null
	       and a space character.  Note that many early implementations of
	       tar  used  signed  arithmetic for the checksum field, which can
	       cause interoperability problems when transferring archives  be-
	       tween systems.  Modern robust readers compute the checksum both
	       ways and	accept the header if either computation	matches.

       linkflag, linkname
               In order to preserve hardlinks and conserve tape, a file with
               multiple links is written to the archive only the first time
               it is encountered.  Each subsequent time it is encountered,
               the linkflag is set to an ASCII `1' and the linkname field
               holds the first name under which this file appears.  (Note
               that regular files have a null value in the linkflag field.)
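
       The checksum rule described for the checksum field can be sketched
       as follows.  The function names are hypothetical.  The offset 148
       is derived from the struct above (100 + 8 + 8 + 8 + 12 + 12), and
       the stored value is assumed to have been parsed from its octal
       text already.

```c
#include <stddef.h>

#define TAR_CHECKSUM_OFFSET 148	/* bytes before checksum[8] in the header */
#define TAR_CHECKSUM_LENGTH 8

/*
 * Sum all 512 header bytes, counting the checksum field as spaces.
 * If `as_signed` is non-zero, bytes above 127 are treated as negative,
 * as in the early implementations that used signed arithmetic.
 */
long
tar_header_sum(const unsigned char hdr[512], int as_signed)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < 512; i++) {
		if (i >= TAR_CHECKSUM_OFFSET &&
		    i < TAR_CHECKSUM_OFFSET + TAR_CHECKSUM_LENGTH)
			sum += ' ';
		else if (as_signed && hdr[i] >= 128)
			sum += (long)hdr[i] - 256;
		else
			sum += hdr[i];
	}
	return sum;
}

/* Accept the header if either computation matches the stored value. */
int
tar_checksum_ok(const unsigned char hdr[512], long stored)
{
	return stored == tar_header_sum(hdr, 0) ||
	    stored == tar_header_sum(hdr, 1);
}
```

       The two sums differ only when the header contains bytes with the
       high bit set, which is why archives holding only 7-bit ASCII
       names rarely exposed the signed-arithmetic problem.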

       Early  tar  implementations varied in how they terminated these fields.
       The tar command in Version 7 AT&T UNIX used the	following  conventions
       (this  is  also documented in early BSD manpages): the pathname must be
       null-terminated;	the mode, uid, and gid fields must end in a space  and
       a  null byte; the size and mtime	fields must end	in a space; the	check-
       sum is terminated by a null and a space.	 Early implementations	filled
       the numeric fields with leading spaces.	This seems to have been	common
       practice	 until	the  IEEE Std 1003.1-1988 ("POSIX.1") standard was re-
       leased.	For best portability, modern implementations should  fill  the
       numeric fields with leading zeros.
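
       Because of these variations, a reader generally has to be lenient
       when parsing numeric fields.  The following is a minimal sketch of
       such a parser, not taken from any particular implementation; the
       function name and its error convention (returning -1) are
       assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Parse an octal header field of `len` bytes.  Leading spaces are
 * skipped (leading zeros simply contribute nothing), and parsing stops
 * at the first space, null byte, or end of field.  Returns -1 if a
 * non-octal byte is found or the value would not fit in an int64_t.
 */
int64_t
tar_parse_octal(const char *field, size_t len)
{
	int64_t value = 0;
	size_t i = 0;

	while (i < len && field[i] == ' ')
		i++;
	for (; i < len && field[i] != ' ' && field[i] != '\0'; i++) {
		if (field[i] < '0' || field[i] > '7')
			return -1;
		if (value > (INT64_MAX >> 3))
			return -1;
		value = (value << 3) | (field[i] - '0');
	}
	return value;
}
```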

   Pre-POSIX Archives
       An  early draft of IEEE Std 1003.1-1988 ("POSIX.1") served as the basis
       for John	Gilmore's pdtar	program	and many system	 implementations  from
       the  late  1980s	 and early 1990s.  These archives generally follow the
       POSIX ustar format described below with the following variations:
       o   The magic value is "ustar " (note the following space).  The
           version field contains a space character followed by a null.
       o   The numeric fields are generally filled with leading spaces (not
           leading zeros as recommended in the final standard).
       o   The prefix field is often not used, limiting pathnames to the
           100 characters of old-style archives.

   POSIX ustar Archives
       IEEE  Std 1003.1-1988 ("POSIX.1") defined a standard tar	file format to
       be read and written by compliant	implementations	of tar(1).  This  for-
       mat  is	often called the "ustar" format, after the magic value used in
       the header.  (The name is an acronym for	"Unix Standard TAR".)  It  ex-
       tends the historic format with new fields:

	     struct header_posix_ustar {
		     char name[100];
		     char mode[8];
		     char uid[8];
		     char gid[8];
		     char size[12];
		     char mtime[12];
		     char checksum[8];
		     char typeflag[1];
		     char linkname[100];
		     char magic[6];
		     char version[2];
		     char uname[32];
		     char gname[32];
		     char devmajor[8];
		     char devminor[8];
		     char prefix[155];
		     char pad[12];
	     };

       typeflag
	       Type  of	entry.	POSIX extended the earlier linkflag field with
	       several new type	values:
	       "0"     Regular file.  NULL should be treated as	a synonym, for
		       compatibility purposes.
	       "1"     Hard link.
	       "2"     Symbolic	link.
	       "3"     Character device	node.
	       "4"     Block device node.
	       "5"     Directory.
	       "6"     FIFO node.
	       "7"     Reserved.
	       Other   A POSIX-compliant implementation	must treat any	unrec-
		       ognized	typeflag value as a regular file.  In particu-
		       lar, writers should ensure  that	 all  entries  have  a
		       valid  filename so that they can	be restored by readers
		       that do not support the corresponding  extension.   Up-
		       percase letters "A" through "Z" are reserved for	custom
		       extensions.  Note that sockets and whiteout entries are
		       not archivable.
               The meaning of the size field depends on the entry type.  For
               regular files, it indicates the amount of data following the
               header.  For directories, it may be used to indicate the total
               size of all files in the directory, for use by operating
               systems that pre-allocate directory space.  For all other
               types, it should be set to zero by writers and ignored by
               readers.

       magic   Contains	the magic value	"ustar"	followed by a NULL byte	to in-
	       dicate that this	is a POSIX standard archive.  Full  compliance
	       requires	the uname and gname fields be properly set.

       version
	       Version.	  This	should	be "00"	(two copies of the ASCII digit
	       zero) for POSIX standard	archives.

       uname, gname
	       User and	group names, as	null-terminated	ASCII strings.	 These
	       should  be  used	 in preference to the uid/gid values when they
	       are set and the corresponding names exist on the	system.

       devmajor, devminor
	       Major and minor numbers for character device  or	 block	device
	       entry.

       prefix  First  part of pathname.	 If the	pathname is too	long to	fit in
	       the 100 bytes provided by the standard format, it can be	 split
	       at  any	/ character with the first portion going here.	If the
	       prefix field is not empty, the reader will prepend  the	prefix
	       value and a / character to the regular name field to obtain the
	       full pathname.

       Note that all unused bytes must be set to NULL.

       Field  termination  is  specified slightly differently by POSIX than by
       previous	implementations.  The magic, uname, and	gname fields must have
       a trailing NULL.	 The pathname, linkname, and prefix fields must	have a
       trailing	NULL unless they fill the entire field.	 (In particular, it is
       possible	to store a 256-character pathname if it	happens	to have	a / as
       the 156th character.)  POSIX requires numeric fields to be  zero-padded
       in  the	front,	and  allows them to be terminated with either space or
       NULL characters.
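
       Since the name and prefix fields may lack a trailing NULL when
       they are completely full, a reader cannot treat them as ordinary C
       strings.  The following sketch shows one way to rebuild the full
       pathname; the function names are hypothetical, and the caller is
       assumed to supply an output buffer of at least 257 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Length of a header field that may lack a trailing null when full. */
static size_t
field_length(const char *field, size_t max)
{
	const char *end = memchr(field, '\0', max);

	return end != NULL ? (size_t)(end - field) : max;
}

/*
 * Build the full pathname from the ustar prefix[155] and name[100]
 * fields.  `out` must hold at least 257 bytes (155 + '/' + 100 + null).
 * Returns the length of the resulting pathname.
 */
size_t
ustar_full_path(const char prefix[155], const char name[100],
    char out[257])
{
	size_t plen = field_length(prefix, 155);
	size_t nlen = field_length(name, 100);
	size_t n = 0;

	if (plen > 0) {
		memcpy(out, prefix, plen);
		n = plen;
		out[n++] = '/';
	}
	memcpy(out + n, name, nlen);
	n += nlen;
	out[n] = '\0';
	return n;
}
```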

       Currently, most tar implementations comply with the ustar format, occa-
       sionally	extending it by	adding new fields to the blank area at the end
       of the header record.
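
       A reader can tell these header variants apart by inspecting the
       magic and version fields.  The sketch below is one possible
       approach; the enum and function names are hypothetical, and the
       offsets 257 and 263 are derived from the struct above.  Note that
       GNU tar archives use the same magic and version values as the
       pre-POSIX format and are therefore reported as TAR_PRE_POSIX here.

```c
#include <string.h>

enum tar_flavor { TAR_OLD, TAR_POSIX_USTAR, TAR_PRE_POSIX };

#define TAR_MAGIC_OFFSET   257	/* magic[6] follows linkname[100] */
#define TAR_VERSION_OFFSET 263	/* version[2] follows magic[6] */

/* Classify a 512-byte header record by its magic and version fields. */
enum tar_flavor
tar_classify(const unsigned char hdr[512])
{
	const unsigned char *magic = hdr + TAR_MAGIC_OFFSET;
	const unsigned char *version = hdr + TAR_VERSION_OFFSET;

	/* POSIX: "ustar" plus a null byte, then two ASCII zeros. */
	if (memcmp(magic, "ustar", 6) == 0 &&
	    memcmp(version, "00", 2) == 0)
		return TAR_POSIX_USTAR;
	/* Pre-POSIX: "ustar" plus a space, then a space and a null. */
	if (memcmp(magic, "ustar ", 6) == 0 &&
	    memcmp(version, " ", 2) == 0)
		return TAR_PRE_POSIX;
	return TAR_OLD;
}
```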

   Pax Interchange Format
       There are many attributes that cannot be	portably stored	in a POSIX us-
       tar  archive.   IEEE  Std  1003.1-2001  ("POSIX.1")  defined   a	  "pax
       interchange  format"  that  uses	two new	types of entries to hold text-
       formatted metadata that applies to following entries.  Note that	a  pax
       interchange  format  archive  is	a ustar	archive	in every respect.  The
       new data	is stored in ustar-compatible archive entries that use the "x"
       or "g" typeflag.	 In particular,	 older	implementations	 that  do  not
       fully  support  these extensions	will extract the metadata into regular
       files, where the	metadata can be	examined as necessary.

       An entry	in a pax interchange format archive consists  of  one  or  two
       standard	 ustar	entries, each with its own header and data.  The first
       optional	entry stores the extended attributes for the following	entry.
       This optional first entry has an "x" typeflag and a size field that
       indicates the total size of the extended attributes.  The extended
       attributes themselves are stored as a series of text-format lines
       encoded in the portable UTF-8 encoding.  Each line consists of a
       decimal number, a space, a key string, an equals sign, a value string,
       and a newline.  The decimal number indicates the length of the entire
       line, including the initial length field and the trailing newline.  An
       example of such a line is:
	     25	ctime=1084839148.1212\n
       Keys in all lowercase are standard keys.	 Vendors  can  add  their  own
       keys  by	prefixing them with an all uppercase vendor name and a period.
       Note that, unlike the historic header, numeric values are stored	 using
       decimal,	not octal.  A description of some common keys follows:

       atime, ctime, mtime
	       File  access,  inode  change,  and  modification	 times.	 These
	       fields can be negative or include a decimal point and  a	 frac-
	       tional value.

       uname, uid, gname, gid
               User name, group name, and numeric UID and GID values.  The
               user name and group name stored here are encoded in UTF-8 and
               can thus include non-ASCII characters.  The UID and GID fields
               can be of arbitrary length.

       linkpath
               The full path of the linked-to file.  Note that this is encoded
               in UTF-8 and can thus include non-ASCII characters.

       path    The full pathname of the entry.  Note that this is encoded in
               UTF-8 and can thus include non-ASCII characters.

       realtime.*, security.*
	       These keys are reserved and may be used for future standardiza-
	       tion.

       size    The  size  of  the file.	 Note that there is no length limit on
	       this field, allowing conforming archives	to  store  files  much
	       larger than the historic	8GB limit.

       SCHILY.*
	       Vendor-specific	attributes  used by Joerg Schilling's star im-
	       plementation.

       SCHILY.acl.access, SCHILY.acl.default
	       Stores the access and default ACLs as textual strings in	a for-
	       mat that	is an extension	of the format  specified  by  POSIX.1e
	       draft  17.  In particular, each user or group access specifica-
	       tion can	include	a fourth colon-separated field	with  the  nu-
	       meric  UID  or GID.  This allows	ACLs to	be restored on systems
	       that may	not have complete user or group	information  available
	       (such  as when NIS/YP or	LDAP services are temporarily unavail-
	       able).

       SCHILY.devminor,	SCHILY.devmajor
	       The full	minor and major	numbers	for device nodes.

       SCHILY.dev, SCHILY.ino, SCHILY.nlinks
	       The device number, inode	number,	and link count for the	entry.
	       In particular, note that	a pax interchange format archive using
	       Joerg Schilling's SCHILY.* extensions can store all of the data
	       from struct stat.

       LIBARCHIVE.xattr.namespace.key
	       Libarchive stores POSIX.1e-style	extended attributes using keys
	       of  this	 form.	 The  key  value is URL-encoded: All non-ASCII
	       characters and the two special characters "=" and "%"  are  en-
	       coded as	"%" followed by	two uppercase hexadecimal digits.  The
	       value  of  this	key is the extended attribute value encoded in
	       base 64.	 XXX Detail the	base-64	format here XXX

       VENDOR.*
	       XXX document other vendor-specific extensions XXX
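
       The key encoding described for LIBARCHIVE.xattr keys can be
       sketched as follows.  This illustrates the rule as stated above
       and is not libarchive's actual code; the function name and its
       error convention are assumptions, and only the byte classes named
       in the text (non-ASCII, "=" and "%") are encoded.

```c
#include <stddef.h>

/*
 * Percent-encode `key` into `out`, which has room for `outsize` bytes
 * including the terminating null.  Bytes >= 0x80, '=' and '%' become
 * "%XX" with uppercase hexadecimal digits.  Returns the encoded
 * length, or (size_t)-1 if `out` is too small.
 */
size_t
xattr_key_encode(const char *key, char *out, size_t outsize)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	for (; *key != '\0'; key++) {
		unsigned char c = (unsigned char)*key;

		if (c >= 0x80 || c == '=' || c == '%') {
			if (outsize < 4 || n > outsize - 4)
				return (size_t)-1;
			out[n++] = '%';
			out[n++] = hex[c >> 4];
			out[n++] = hex[c & 0x0F];
		} else {
			if (outsize < 2 || n > outsize - 2)
				return (size_t)-1;
			out[n++] = (char)c;
		}
	}
	if (n >= outsize)
		return (size_t)-1;
	out[n] = '\0';
	return n;
}
```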

       Any values stored in an extended attribute override the corresponding
       values in the regular tar header.  Note that compliant readers should
       ignore the regular fields when they are overridden.  This is important,
       as existing archivers are known to store non-compliant values in the
       standard header fields in this situation.  There are no limits on
       length for any of these fields.  In particular, numeric fields can be
       arbitrarily large.  All text fields are encoded in UTF-8.  Compliant
       writers should store only portable 7-bit ASCII characters in the
       standard ustar header and use extended attributes whenever a text value
       contains non-ASCII characters.
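
       The extended attribute line layout described earlier in this
       section (decimal length, space, key, equals sign, value, newline)
       can be parsed along the following lines.  This is a sketch under
       stated assumptions: the function name and its calling convention
       are hypothetical, and the key and value are returned as pointers
       into the caller's buffer rather than as copies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Parse one extended attribute line starting at `buf`, of which
 * `avail` bytes are readable.  On success, returns the line length and
 * sets the key/value pointers and lengths (not null-terminated).
 * Returns 0 if the line is malformed or extends past `avail`.
 */
size_t
pax_parse_record(const char *buf, size_t avail,
    const char **key, size_t *keylen,
    const char **value, size_t *valuelen)
{
	size_t len = 0, i = 0;
	const char *eq;

	/* Decimal length of the whole line, including this number. */
	while (i < avail && buf[i] >= '0' && buf[i] <= '9') {
		if (len > (SIZE_MAX - 9) / 10)
			return 0;	/* would overflow size_t */
		len = len * 10 + (size_t)(buf[i] - '0');
		i++;
	}
	if (i == 0 || i >= avail || buf[i] != ' ')
		return 0;
	i++;	/* skip the space */
	if (len > avail || len < i + 2 || buf[len - 1] != '\n')
		return 0;
	/* The first '=' separates key from value. */
	eq = memchr(buf + i, '=', len - 1 - i);
	if (eq == NULL)
		return 0;
	*key = buf + i;
	*keylen = (size_t)(eq - *key);
	*value = eq + 1;
	*valuelen = (size_t)(buf + len - 1 - *value);
	return len;
}
```

       A reader would call this repeatedly, advancing by the returned
       length each time, until the size given in the x entry's header
       has been consumed.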

       In addition to the x entry described above, the pax interchange	format
       also supports a g entry.	 The g entry is	identical in format, but spec-
       ifies  attributes that serve as defaults	for all	subsequent archive en-
       tries.  The g entry is not widely used.

       Besides the new x and g entries,	the pax	interchange format has	a  few
       other  minor  variations	from the earlier ustar format.	The most trou-
       bling one is that hardlinks are permitted to have data following	 them.
       This allows readers to restore any hardlink to a	file without having to
       rewind  the archive to find an earlier entry.  However, it creates com-
       plications for robust readers, as it is no longer clear whether or  not
       they should ignore the size field for hardlink entries.

   GNU Tar Archives
       The GNU tar program started with	a pre-POSIX format similar to that de-
       scribed earlier and has extended	it using several different mechanisms:
       It added	new fields to the empty	space in the header (some of which was
       later used by POSIX for conflicting purposes); it allowed the header to
       be  continued  over  multiple  records; and it defined new entries that
       modify following	entries	(similar in principle to the x entry described
       above, but each GNU special entry is single-purpose,  unlike  the  gen-
       eral-purpose  x	entry).	  As  a	result,	GNU tar	archives are not POSIX
       compatible, although more lenient POSIX-compliant readers can  success-
       fully extract most GNU tar archives.

	     struct header_gnu_tar {
		     char name[100];
		     char mode[8];
		     char uid[8];
		     char gid[8];
		     char size[12];
		     char mtime[12];
		     char checksum[8];
		     char typeflag[1];
		     char linkname[100];
		     char magic[6];
		     char version[2];
		     char uname[32];
		     char gname[32];
		     char devmajor[8];
		     char devminor[8];
		     char atime[12];
		     char ctime[12];
		     char offset[12];
		     char longnames[4];
		     char unused[1];
		     struct {
			     char offset[12];
			     char numbytes[12];
		     } sparse[4];
		     char isextended[1];
		     char realsize[12];
		     char pad[17];
	     };

       typeflag
	       GNU  tar	uses the following special entry types,	in addition to
	       those defined by	POSIX:

	       7       GNU tar treats type "7" records identically to type "0"
		       records,	except on one obscure RTOS where they are used
		       to indicate the pre-allocation of a contiguous file  on
		       disk.

	       D       This  indicates	a  directory entry.  Unlike the	POSIX-
		       standard	"5" typeflag, the header is followed  by  data
		       records	listing	 the names of files in this directory.
		       Each name is preceded by	an ASCII "Y" if	 the  file  is
		       stored in this archive or "N" if	the file is not	stored
		       in  this	archive.  Each name is terminated with a null,
		       and an extra null marks the end of the name list.   The
		       purpose	of  this entry is to support incremental back-
		       ups; a program restoring	from such an archive may  wish
		       to  delete  files on disk that did not exist in the di-
		       rectory when the	archive	was made.

		       Note that the "D" typeflag specifically violates	POSIX,
		       which requires that unrecognized	typeflags be  restored
		       as normal files.	 In this case, restoring the "D" entry
		       as  a  file could interfere with	subsequent creation of
		       the like-named directory.

	       K       The data	for this entry is a long linkname for the fol-
		       lowing regular entry.

	       L       The data	for this entry is a long pathname for the fol-
		       lowing regular entry.

	       M       This is a continuation of the last file on the previous
		       volume.	GNU multi-volume archives guarantee that  each
		       volume  begins  with  a	valid entry header.  To	ensure
		       this, a file may	be split, with part stored at the  end
		       of  one volume, and part	stored at the beginning	of the
		       next volume.  The "M" typeflag indicates	that this  en-
		       try  continues an existing file.	 Such entries can only
		       occur as	the first or second entry in an	 archive  (the
		       latter only if the first	entry is a volume label).  The
		       size  field  specifies  the  size  of  this entry.  The
		       offset field at	bytes  369-380	specifies  the	offset
		       where  this  file  fragment begins.  The	realsize field
		       specifies the total size	of the file (which must	 equal
		       size  plus  offset).   When  extracting,	GNU tar	checks
		       that the	header file name is the	one it	is  expecting,
		       that  the header	offset is in the correct sequence, and
		       that the	sum of offset and size is equal	 to  realsize.
		       FreeBSD's version of GNU	tar does not handle the	corner
		       case of an archive's being continued in the middle of a
		       long name or other extension header.

	       N       Type  "N"  records  are no longer generated by GNU tar.
		       They contained a	list of	files to be  renamed  or  sym-
		       linked  after  extraction;  this	was originally used to
		       support long names.  The	contents of this record	are  a
		       text  description  of the operations to be done,	in the
		       form "Rename %s to %s\n"	or "Symlink %s	to  %s\n";  in
		       either  case,  both  filenames  are escaped using K&R C
		       syntax.

	       S       This is a "sparse"  regular  file.   Sparse  files  are
		       stored as a series of fragments.	 The header contains a
		       list  of	 fragment  offset/length  pairs.  If more than
		       four such entries are required, the header is  extended
		       as  necessary  with "extra" header extensions (an older
		       format that is no longer	used), or "sparse" extensions.

	       V       The name	field should be	interpreted as	a  tape/volume
		       header name.  This entry	should generally be ignored on
		       extraction.

       magic   The magic field holds the five characters "ustar" followed by a
	       space.  Note that POSIX ustar archives have a trailing null.

       version
	       The  version  field holds a space character followed by a null.
	       Note that POSIX ustar archives use  two	copies	of  the	 ASCII
	       digit "0".

       atime, ctime
	       The time	the file was last accessed and the time	of last	change
	       of file information, stored in octal as with mtime.

       longnames
	       This field is apparently	no longer used.

       Sparse offset / numbytes
	       Each  such  structure  specifies	 a single fragment of a	sparse
	       file.  The two fields store values as octal numbers.  The frag-
	       ments are each padded  to  a  multiple  of  512	bytes  in  the
	       archive.	  On  extraction,  the	list of	fragments is collected
	       from the	header (including any extension	headers), and the data
	       is then read and	written	to the file at appropriate offsets.

       isextended
	       If this is set to non-zero, the header will be followed by  ad-
	       ditional	 "sparse  header"  records.  Each such record contains
	       information about as many as 21	additional  sparse  blocks  as
	       shown here:

		     struct gnu_sparse_header {
			     struct {
				     char offset[12];
				     char numbytes[12];
			     } sparse[21];
			     char    isextended[1];
			     char    padding[7];
		     };

       realsize
	       A  binary  representation  of  the file's complete size,	with a
	       much larger range than the POSIX	 file  size.   In  particular,
	       with  M	type files, the	current	entry is only a	portion	of the
	       file.  In that case, the	POSIX size  field  will	 indicate  the
	       size  of	this entry; the	realsize field will indicate the total
	       size of the file.
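
       The sparse bookkeeping described above can be sketched as follows.
       The names are hypothetical and this is not GNU tar's code.  The
       first helper counts the "sparse header" records needed beyond the
       four slots in the main header; the second sums the archived data
       size, assuming (as stated above) that each fragment is padded to
       a multiple of 512 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define GNU_SPARSE_IN_HEADER 4	/* sparse[4] in struct header_gnu_tar */
#define GNU_SPARSE_IN_EXT    21	/* sparse[21] in struct gnu_sparse_header */

/* One decoded offset/numbytes pair. */
struct sparse_fragment {
	uint64_t offset;
	uint64_t numbytes;
};

/* Number of extension records needed to describe `nfragments`. */
size_t
gnu_sparse_ext_records(size_t nfragments)
{
	if (nfragments <= GNU_SPARSE_IN_HEADER)
		return 0;
	return (nfragments - GNU_SPARSE_IN_HEADER + GNU_SPARSE_IN_EXT - 1) /
	    GNU_SPARSE_IN_EXT;
}

/* Total bytes of fragment data in the archive, including padding. */
uint64_t
gnu_sparse_archived_bytes(const struct sparse_fragment *frags, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += (frags[i].numbytes + 511) / 512 * 512;
	return total;
}
```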

   Solaris Tar
       XXX More	Details	Needed XXX

       Solaris	tar  (beginning	 with  SunOS  XXX  5.7	??  XXX)  supports  an
       "extended" format that is fundamentally similar to pax interchange for-
       mat, with the following differences:
       o   Extended attributes are stored in an entry whose type is X, not x,
           as used by pax interchange format.  The detailed format of this
           entry appears to be the same as detailed above for the x entry.
       o   An additional A entry is used to store an ACL for the following
           regular entry.  The body of this entry contains a seven-digit
           octal number (whose value is 01000000 plus the number of ACL
           entries) followed by a zero byte, followed by the textual ACL
           description.

   Other Extensions
       One common extension, utilized by GNU tar, star,	and  other  newer  tar
       implementations,	permits	binary numbers in the standard numeric fields.
       This  is	 flagged by setting the	high bit of the	first character.  This
       permits 95-bit values for the length and	time fields and	63-bit	values
       for  the	uid, gid, and device numbers.  GNU tar supports	this extension
       for the length, mtime, ctime, and atime fields.	Joerg Schilling's star
       program supports	this extension for all numeric fields.	Note that this
       extension is largely obsoleted by the extended  attribute  record  pro-
       vided by	the pax	interchange format.
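
       A reader might decode such binary fields roughly as shown below.
       This sketch assumes the remaining bits are stored most significant
       byte first, handles non-negative values only, and reports values
       that do not fit in 64 bits as an error.  The function name and
       return convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a numeric field that uses the binary extension.
 * Returns 1 and stores the value in `*out` on success, 0 if the high
 * bit of the first byte is clear (the caller should parse the field as
 * octal instead), or -1 if the value does not fit in 64 bits.
 * Assumption: bytes are big-endian; negative values are not handled.
 */
int
tar_parse_binary(const unsigned char *field, size_t len, uint64_t *out)
{
	uint64_t value;
	size_t i;

	if (len == 0 || (field[0] & 0x80) == 0)
		return 0;
	value = field[0] & 0x7F;	/* drop the flag bit */
	for (i = 1; i < len; i++) {
		if (value >> 56)
			return -1;	/* next shift would lose bits */
		value = (value << 8) | field[i];
	}
	*out = value;
	return 1;
}
```

       An 8-byte field always fits (7 + 56 = 63 bits), while a 12-byte
       field can in principle carry up to 95 bits, which is why this
       sketch needs the overflow check.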

       Another	early  GNU extension allowed base-64 values rather than	octal.
       This extension was short-lived and such archives	are almost never seen.
       However,	there is still code in GNU tar to support them;	this  code  is
       responsible  for	 a very	cryptic	warning	message	that is	sometimes seen
       when GNU	tar encounters a damaged archive.

SEE ALSO
       ar(1), pax(1), tar(1)

STANDARDS
       The tar utility is no longer a part of POSIX or the Single UNIX
       Specification.  It last appeared in Version 2 of the Single UNIX
       Specification ("SUSv2").  It has been supplanted in subsequent
       standards by pax(1).  The ustar format is currently part of the
       specification for the pax(1) utility.  The pax interchange file format
       is new with IEEE Std 1003.1-2001 ("POSIX.1").

HISTORY
       A  tar  command appeared	in Seventh Edition Unix, which was released in
       January,	1979.  It replaced the tp program  from	 Fourth	 Edition  Unix
       which  in  turn replaced	the tap	program	from First Edition Unix.  John
       Gilmore's pdtar public-domain implementation (circa  1987)  was	highly
       influential and formed the basis of GNU tar.  Joerg Schilling's star
       archiver	is another open-source (GPL)  archiver	(originally  developed
       circa 1985) which features complete support for pax interchange format.

FreeBSD	7.0			 May 20, 2004				TAR(5)
