dcm2xml(1)			  OFFIS	DCMTK			    dcm2xml(1)

NAME
       dcm2xml - Convert DICOM file and	data set to XML

SYNOPSIS
       dcm2xml [options] dcmfile-in [xmlfile-out]

DESCRIPTION
       The  dcm2xml utility converts the contents of a DICOM file (file	format
       or raw data set)	to XML (Extensible Markup  Language).  There  are  two
       output  formats.	 The  first  one  is  specific	to  DCMTK with its DTD
       (Document Type Definition)  described  in  the  file  dcm2xml.dtd.  The
       second  one  refers  to the 'Native DICOM Model'	which is specified for
       the DICOM Application Hosting service found in DICOM part 19.

       If dcm2xml reads a raw data set (DICOM data without a file format
       meta-header), it will attempt to guess the transfer syntax by examining
       the first few bytes of the file. It is not always possible to guess the
       transfer syntax correctly, so it is better to convert a data set to a
       file format whenever possible (using the dcmconv utility). It is also
       possible to use the -f and -t[ieb] options to force dcm2xml to read a
       data set with a particular transfer syntax.
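
       As a minimal sketch, the invocation described above could be assembled
       from a Python script as follows. The helper name build_dcm2xml_argv,
       its keyword values and the file names are hypothetical; only the option
       names are taken from this page. The sketch builds the argument list and
       does not run the tool.

```python
from typing import List, Optional


def build_dcm2xml_argv(dcmfile_in: str,
                       xmlfile_out: Optional[str] = None,
                       native: bool = False,
                       raw_dataset_xfer: Optional[str] = None) -> List[str]:
    """Assemble 'dcm2xml [options] dcmfile-in [xmlfile-out]' as a list."""
    # Hypothetical short keywords mapped to the -t[ieb] options of this page.
    xfer_options = {
        "implicit": "--read-xfer-implicit",
        "little": "--read-xfer-little",
        "big": "--read-xfer-big",
    }
    argv = ["dcm2xml"]
    if native:
        argv.append("--native-format")
    if raw_dataset_xfer is not None:
        # Force reading a raw data set with a particular transfer syntax.
        argv += ["--read-dataset", xfer_options[raw_dataset_xfer]]
    argv.append(dcmfile_in)
    if xmlfile_out is not None:
        argv.append(xmlfile_out)  # if omitted, the XML goes to stdout
    return argv


if __name__ == "__main__":
    print(build_dcm2xml_argv("raw.bin", "raw.xml", raw_dataset_xfer="implicit"))
```

       On a system where dcm2xml is installed, the resulting list could be
       passed to subprocess.run(argv, check=True).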

PARAMETERS
       dcmfile-in   DICOM input	filename to be converted ("-" for stdin)

       xmlfile-out  XML	output filename	(default: stdout)

OPTIONS
   general options
	 -h    --help
		 print this help text and exit

	       --version
		 print version information and exit

	       --arguments
		 print expanded	command	line arguments

	 -q    --quiet
		 quiet mode, print no warnings and errors

	 -v    --verbose
		 verbose mode, print processing	details

	 -d    --debug
		 debug mode, print debug information

	 -ll   --log-level  [l]evel: string constant
		 (fatal, error,	warn, info, debug, trace)
		 use level l for the logger

	 -lc   --log-config  [f]ilename: string
		 use config file f for the logger

   input options
       input file format:

	 +f    --read-file
		 read file format or data set (default)

	 +fo   --read-file-only
		 read file format only

	 -f    --read-dataset
		 read data set without file meta information

       input transfer syntax:

	 -t=   --read-xfer-auto
		 use TS	recognition (default)

	 -td   --read-xfer-detect
		 ignore	TS specified in	the file meta header

	 -te   --read-xfer-little
		 read with explicit VR little endian TS

	 -tb   --read-xfer-big
		 read with explicit VR big endian TS

	 -ti   --read-xfer-implicit
		 read with implicit VR little endian TS

       long tag	values:

	 +M    --load-all
		 load very long	tag values (e.g. pixel data)

	 -M    --load-short
		 do not	load very long values (default)

	 +R    --max-read-length  [k]bytes: integer (4..4194302, default: 4)
		 set threshold for long	values to k kbytes

   processing options
       specific	character set:

	 +Cr   --charset-require
		 require declaration of	extended charset (default)

	 +Ca   --charset-assume	 [c]harset: string
		 assume	charset	c if no	extended charset declared

	 +Cc   --charset-check-all
		 check all data	elements with string values
		 (default: only	PN, LO,	LT, SH,	ST, UC and UT)

		 # this	option is only used for	the extended check whether
		 # the Specific	Character Set (0008,0005) attribute should be
		 # present, but	not for	the conversion of unaffected element
		 # values to UTF-8 (e.g. element values	with a VR of CS)

	 +U8   --convert-to-utf8
		 convert all element values that are affected
		 by Specific Character Set (0008,0005) to UTF-8

		 # requires support from an underlying character encoding
		 # library (see	output of --version on which one is available)

   output options
       general XML format:

	 -dtk  --dcmtk-format
		 output	in DCMTK-specific format (default)

	 -nat  --native-format
		 output	in Native DICOM	Model format (part 19)

	 +Xn   --use-xml-namespace
		 add XML namespace declaration to root element

       DCMTK-specific format (not with --native-format):

	 +Xd   --add-dtd-reference
		 add reference to document type	definition (DTD)

	 +Xe   --embed-dtd-content
		 embed document	type definition	into XML document

	 +Xf   --use-dtd-file  [f]ilename: string
		 use specified DTD file	(only with +Xe)
		 (default: /usr/local/share/dcmtk-<VERSION>/dcm2xml.dtd)

	 +Wn   --write-element-name
		 write name of the DICOM data elements (default)

	 -Wn   --no-element-name
		 do not	write name of the DICOM	data elements

	 +Wb   --write-binary-data
		 write binary data of OB and OW	elements
		 (default: off,	be careful with	--load-all)

       encoding	of binary data:

	 +Eh   --encode-hex
		 encode	binary data as hex numbers
		 (default for DCMTK-specific format)

	 +Eu   --encode-uuid
		 encode	binary data as a UUID reference
		 (default for Native DICOM Model)

	 +Eb   --encode-base64
		 encode	binary data as Base64 (RFC 2045, MIME)

DCMTK Format
       The  basic  structure  of  the DCMTK-specific XML output	created	from a
       DICOM file looks	like the following:

       <?xml version="1.0" encoding="ISO-8859-1"?>
       <!DOCTYPE file-format SYSTEM "dcm2xml.dtd">
       <file-format xmlns="http://dicom.offis.de/dcmtk">
	 <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
	   <element tag="0002,0000" vr="UL" vm="1" len="4"
		    name="MetaElementGroupLength">
	     166
	   </element>
	   ...
	   <element tag="0002,0013" vr="SH" vm="1" len="16"
		    name="ImplementationVersionName">
	     OFFIS_DCMTK_353
	   </element>
	 </meta-header>
	 <data-set xfer="1.2.840.10008.1.2" name="Little Endian	Implicit">
	   <element tag="0008,0005" vr="CS" vm="1" len="10"
		    name="SpecificCharacterSet">
	     ISO_IR 100
	   </element>
	   ...
	   <sequence tag="0028,3010" vr="SQ" card="2" name="VOILUTSequence">
	     <item card="3">
	       <element	tag="0028,3002"	vr="xs"	vm="3" len="6"
			name="LUTDescriptor">
                 256\0\8
	       </element>
	       ...
	     </item>
	     ...
	   </sequence>
	   ...
	   <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
		    name="PixelData" loaded="no" binary="hidden">
	   </element>
	 </data-set>
       </file-format>

       The 'file-format' and 'meta-header' tags	 are  absent  for  DICOM  data
       sets.
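
       The structure shown above can be read with any XML parser. The
       following is a minimal sketch using Python's xml.etree.ElementTree; the
       sample document is a shortened, hand-written variant of the example and
       not real tool output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the structure shown above (hand-written).
SAMPLE = """<?xml version="1.0" encoding="ISO-8859-1"?>
<file-format xmlns="http://dicom.offis.de/dcmtk">
  <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
    <element tag="0002,0013" vr="SH" vm="1" len="16"
             name="ImplementationVersionName">OFFIS_DCMTK_353</element>
  </meta-header>
  <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
    <element tag="0008,0005" vr="CS" vm="1" len="10"
             name="SpecificCharacterSet">ISO_IR 100</element>
    <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
             name="PixelData" loaded="no" binary="hidden"></element>
  </data-set>
</file-format>
"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def list_elements(xml_bytes: bytes):
    """Return (tag, vr, name, value) tuples for all 'element' nodes."""
    root = ET.fromstring(xml_bytes)
    result = []
    for node in root.iter():
        if local_name(node.tag) == "element":
            value = (node.text or "").strip()
            result.append((node.get("tag"), node.get("vr"),
                           node.get("name"), value))
    return result


if __name__ == "__main__":
    for entry in list_elements(SAMPLE.encode("iso-8859-1")):
        print(entry)
```

       Stripping the namespace prefix is a design choice of this sketch; it
       keeps the same code usable whether or not the root element carries a
       namespace declaration.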

   XML Encoding
       Attributes with very large value fields (e.g. pixel data) are not
       loaded by default. They can be identified by the additional attribute
       'loaded' with a value of 'no' (see example above). The command line
       option --load-all forces all value fields to be loaded, including the
       very long ones.

       Furthermore, binary data of OB and OW attributes are not written to the
       XML output file by default. These elements can be identified by the
       additional attribute 'binary' with a value of 'hidden' (default is
       'no'). The command line option --write-binary-data causes binary value
       fields to be printed as well (attribute value is 'yes' or 'base64').
       However, be careful when using this option together with --load-all
       because of the large amounts of pixel data that might be printed to the
       output. Please note that in this context element values with a VR of
       OD, OF, OL and OV are not regarded as 'binary data'.

       Multiple values (i.e. where the DICOM value multiplicity is greater
       than 1) are separated by a backslash '\' (except for Base64 encoded
       data). The 'len' attribute indicates the number of bytes for the
       particular value field as stored in the DICOM data set, i.e. it might
       deviate from the XML encoded value length, e.g. because non-significant
       padding has been removed. If this attribute is missing in 'sequence' or
       'item' start tags, the corresponding DICOM element has been stored with
       undefined length.
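
       The attribute conventions of this section can be sketched in Python as
       follows. The function name describe_element and the keys of the
       returned dictionary are hypothetical; the handling of 'loaded',
       'binary', 'len' and the backslash separator follows the description
       above.

```python
import xml.etree.ElementTree as ET


def describe_element(node: ET.Element) -> dict:
    """Summarize one 'element' node of the DCMTK-specific format."""
    binary = node.get("binary", "no")     # 'no' is the documented default
    loaded = node.get("loaded") != "no"   # 'loaded' only appears with value 'no'
    text = (node.text or "").strip()
    if not text:
        values = []
    elif binary == "base64":
        values = [text]                   # Base64 data is not backslash-separated
    else:
        values = text.split("\\")         # multiple values: split on backslash
    return {
        "loaded": loaded,
        "binary": binary,
        "values": values,
        "stored_length": node.get("len"),  # bytes in the DICOM data set, or None
    }


if __name__ == "__main__":
    lut = ET.fromstring(
        '<element tag="0028,3002" vr="xs" vm="3" len="6" '
        'name="LUTDescriptor">256\\0\\8</element>')
    print(describe_element(lut))
```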

Native DICOM Model Format
       The  description	 of  the Native	DICOM Model format can be found	in the
       DICOM standard, part 19 ('Application Hosting').

   Bulk	Data
       Binary data, i.e. DICOM element values with Value Representations (VR)
       of OB or OW, as well as OD, OF, OL, OV and UN values, are by default
       not written to the XML output because of their size. Instead, for each
       element, a new Universally Unique Identifier (UUID) is generated and
       written as an attribute of a <BulkData> XML element. So far, there is
       no possibility to write an additional file to hold the binary data for
       each of the binary data chunks. This is not required by the standard;
       however, it might be useful for implementing an Application Hosting
       interface, so this feature may be available in future versions of
       dcm2xml.

       In addition, Supplement 163 (Store Over	the  Web  by  Representational
       State  Transfer	Services)  introduces a	new <InlineBinary> XML element
       that allows for encoding	binary data as Base64. Currently, the  command
       line  option  --encode-base64  enables  this encoding for the following
       VRs: OB,	OD, OF,	OL, OV,	OW and UN.
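
       A consumer of the Native DICOM Model output has to distinguish the two
       representations described above. The following Python sketch shows one
       way to do so. The element and attribute names other than <BulkData> and
       <InlineBinary> (DicomAttribute, tag, uuid) follow a reading of DICOM
       part 19 and are assumptions as far as this page is concerned; the
       sample document is hand-written, carries no XML namespace, and is not
       real tool output.

```python
import base64
import xml.etree.ElementTree as ET

# Hand-written sample; tags, keywords and the UUID are illustrative only.
SAMPLE = """<NativeDicomModel>
  <DicomAttribute tag="00420011" vr="OB" keyword="EncapsulatedDocument">
    <InlineBinary>SGVsbG8=</InlineBinary>
  </DicomAttribute>
  <DicomAttribute tag="7FE00010" vr="OW" keyword="PixelData">
    <BulkData uuid="11111111-2222-3333-4444-555555555555"/>
  </DicomAttribute>
</NativeDicomModel>
"""


def binary_values(xml_text: str) -> dict:
    """Map each tag to decoded bytes (InlineBinary) or a UUID string (BulkData)."""
    result = {}
    for attr in ET.fromstring(xml_text).iter("DicomAttribute"):
        inline = attr.find("InlineBinary")
        bulk = attr.find("BulkData")
        if inline is not None:
            # Base64 text as produced with --encode-base64.
            result[attr.get("tag")] = base64.b64decode(inline.text or "")
        elif bulk is not None:
            # Only a reference is available; the data itself is not in the XML.
            result[attr.get("tag")] = bulk.get("uuid")
    return result


if __name__ == "__main__":
    print(binary_values(SAMPLE))
```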

   Known Issues
       In addition to what is written in the above  section  on	 'Bulk	Data',
       there  are  further known issues	with the current implementation	of the
       Native DICOM Model format. For example, large element values with a  VR
       other  than OB, OD, OF, OL, OV, OW or UN	are currently never written as
       bulk data, although it  might  be  useful,  e.g.	 for  very  long  text
       elements	(especially UT)	or very	long numeric fields (of	various	VRs).

NOTES
   Character Encoding
       The  XML	 character encoding is determined automatically	from the DICOM
       attribute (0008,0005) 'Specific	Character  Set'	 using	the  following
       mapping:

       ASCII         (ISO_IR 6)    =>  "UTF-8"
       UTF-8         "ISO_IR 192"  =>  "UTF-8"
       ISO Latin 1   "ISO_IR 100"  =>  "ISO-8859-1"
       ISO Latin 2   "ISO_IR 101"  =>  "ISO-8859-2"
       ISO Latin 3   "ISO_IR 109"  =>  "ISO-8859-3"
       ISO Latin 4   "ISO_IR 110"  =>  "ISO-8859-4"
       ISO Latin 5   "ISO_IR 148"  =>  "ISO-8859-9"
       ISO Latin 9   "ISO_IR 203"  =>  "ISO-8859-15"
       Cyrillic      "ISO_IR 144"  =>  "ISO-8859-5"
       Arabic        "ISO_IR 127"  =>  "ISO-8859-6"
       Greek         "ISO_IR 126"  =>  "ISO-8859-7"
       Hebrew        "ISO_IR 138"  =>  "ISO-8859-8"

       If  this	DICOM attribute	is missing in the input	file, although needed,
       option --charset-assume can be used to specify an appropriate character
       set manually (using one of the DICOM defined  terms).  For  reasons  of
       backward	 compatibility	with  previous	versions  of  this  tool,  the
       following terms are also	supported  and	mapped	automatically  to  the
       associated  DICOM  defined  terms:  latin-1, latin-2, latin-3, latin-4,
       latin-5,	latin-9, cyrillic, arabic, greek, hebrew.
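
       For scripts that need to predict the encoding declared in the XML
       output, the table above can be written as a lookup. This is a sketch
       only: the names XML_ENCODING, LEGACY_TERMS and xml_encoding_for are
       hypothetical, and the pairing of the legacy terms with DICOM defined
       terms is inferred from the character set names in the table rather
       than stated explicitly on this page.

```python
from typing import Optional

# DICOM defined term of (0008,0005) -> XML character encoding (table above).
XML_ENCODING = {
    "ISO_IR 6": "UTF-8",
    "ISO_IR 192": "UTF-8",
    "ISO_IR 100": "ISO-8859-1",
    "ISO_IR 101": "ISO-8859-2",
    "ISO_IR 109": "ISO-8859-3",
    "ISO_IR 110": "ISO-8859-4",
    "ISO_IR 148": "ISO-8859-9",
    "ISO_IR 203": "ISO-8859-15",
    "ISO_IR 144": "ISO-8859-5",
    "ISO_IR 127": "ISO-8859-6",
    "ISO_IR 126": "ISO-8859-7",
    "ISO_IR 138": "ISO-8859-8",
}

# Legacy terms accepted by --charset-assume (pairing inferred, see lead-in).
LEGACY_TERMS = {
    "latin-1": "ISO_IR 100",
    "latin-2": "ISO_IR 101",
    "latin-3": "ISO_IR 109",
    "latin-4": "ISO_IR 110",
    "latin-5": "ISO_IR 148",
    "latin-9": "ISO_IR 203",
    "cyrillic": "ISO_IR 144",
    "arabic": "ISO_IR 127",
    "greek": "ISO_IR 126",
    "hebrew": "ISO_IR 138",
}


def xml_encoding_for(term: str) -> Optional[str]:
    """Return the XML encoding for a character set term, or None if unmapped."""
    defined_term = LEGACY_TERMS.get(term, term)
    return XML_ENCODING.get(defined_term)


if __name__ == "__main__":
    print(xml_encoding_for("ISO_IR 148"))
    print(xml_encoding_for("latin-9"))
```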

       Multiple	 character  sets  using	 code  extension  techniques  are  not
       supported.  If  needed, option --convert-to-utf8	can be used to convert
       the DICOM file or data set to UTF-8 encoding prior to the conversion to
       XML format. This	is also	useful for DICOMDIR files where	each directory
       record can have a different character set.

       If no mapping is	defined	and option --convert-to-utf8 is	not used, non-
       ASCII characters	and those below	#32 are	stored as '&#nnn;' where 'nnn'
       refers to the numeric  character	 code.	This  might  lead  to  invalid
       character  entity  references  (such as '&#27;' for ESC)	and will cause
       most XML	parsers	to reject the document.
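
       The effect of such a character reference can be checked with a
       conforming parser. The sketch below uses Python's
       xml.etree.ElementTree; the helper name is_well_formed and the two
       sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # A reference to a non-ASCII character above #32 is accepted.
    print(is_well_formed("<element>caf&#233;</element>"))
    # A reference to ESC (#27) is not a legal XML 1.0 character.
    print(is_well_formed("<element>&#27;$B</element>"))
```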

LOGGING
       The level of logging output of the various command line tools and
       underlying libraries can be specified by the user. By default, only
       errors and warnings are written to the standard error stream. With
       option --verbose, informational messages such as processing details are
       also reported. Option --debug can be used to get more details on the
       internal activity, e.g. for debugging purposes. Other logging levels
       can be selected using option --log-level. In --quiet mode, only fatal
       errors are reported. In such very severe error events, the application
       will usually terminate. For more details on the different logging
       levels, see the documentation of module 'oflog'.
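
       The relation between the general options and the logging levels can be
       summarized in a small Python model. It is a reading of the paragraph
       above and not taken from the oflog sources: the names LEVELS,
       LEVEL_FOR_OPTION, DEFAULT_LEVEL and reported are hypothetical, and
       treating the default as level 'warn' is an assumption based on "only
       errors and warnings".

```python
# Levels accepted by --log-level, ordered from most to least severe.
LEVELS = ["fatal", "error", "warn", "info", "debug", "trace"]

# Level implied by each shortcut option (assumption, see lead-in).
LEVEL_FOR_OPTION = {
    "--quiet": "fatal",
    "--verbose": "info",
    "--debug": "debug",
}
DEFAULT_LEVEL = "warn"


def reported(message_level: str, active_level: str = DEFAULT_LEVEL) -> bool:
    """Return True if a message of message_level passes the active level."""
    return LEVELS.index(message_level) <= LEVELS.index(active_level)


if __name__ == "__main__":
    for option, level in LEVEL_FOR_OPTION.items():
        visible = [name for name in LEVELS if reported(name, level)]
        print(option, "->", visible)
```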

       If the logging output should be written to a file (optionally with
       logfile rotation), to syslog (Unix) or to the event log (Windows),
       option --log-config can be used. This configuration file also allows
       for directing only certain messages to a particular output stream and
       for filtering certain messages based on the module or application where
       they are generated. An example configuration file is provided in
       <etcdir>/logger.cfg.

COMMAND	LINE
       All  command  line  tools  use  the  following notation for parameters:
       square brackets enclose optional	 values	 (0-1),	 three	trailing  dots
       indicate	 that multiple values are allowed (1-n), a combination of both
       means 0 to n values.

       Command line options are distinguished from parameters by a leading '+'
       or '-' sign. Usually, order and position of command line options are
       arbitrary (i.e. they can appear anywhere). However, if options are
       mutually exclusive, the rightmost appearance is used. This behavior
       conforms to the standard evaluation rules of common Unix shells.

       In addition, one or more command files can be specified using an '@'
       sign as a prefix to the filename (e.g. @command.txt). Such a command
       argument is replaced by the content of the corresponding text file
       (multiple whitespace characters are treated as a single separator
       unless they appear between two quotation marks) prior to any further
       evaluation. Please note that a command file cannot contain another
       command file. This simple but effective approach allows one to
       summarize common combinations of options/parameters and avoids long
       and confusing command lines (an example is provided in file
       <datadir>/dumppat.txt).
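
       The command file mechanism can be approximated in Python as shown
       below. This is an illustration of the rules stated above, not the
       parser used by DCMTK: the function name expand_command_files and the
       read_text callback are hypothetical, and shlex.split is used as a
       stand-in that also removes the quotation marks and interprets
       backslashes, which the real tool may handle differently.

```python
import shlex
from typing import Callable, List


def read_file(filename: str) -> str:
    """Default reader: return the content of a command file."""
    with open(filename, "r", encoding="utf-8") as handle:
        return handle.read()


def expand_command_files(argv: List[str],
                         read_text: Callable[[str], str] = read_file) -> List[str]:
    """Replace each '@file' argument by the whitespace-separated file content."""
    expanded = []
    for arg in argv:
        if arg.startswith("@") and len(arg) > 1:
            # Runs of whitespace act as one separator, except inside quotes.
            # The result is not scanned again: command files do not nest.
            expanded.extend(shlex.split(read_text(arg[1:])))
        else:
            expanded.append(arg)
    return expanded


if __name__ == "__main__":
    fake_files = {"command.txt": '--native-format   +Ca "ISO_IR 100"\n'}
    print(expand_command_files(["dcm2xml", "@command.txt", "in.dcm"],
                               fake_files.__getitem__))
```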

ENVIRONMENT
       The dcm2xml utility  will  attempt  to  load  DICOM  data  dictionaries
       specified  in the DCMDICTPATH environment variable. By default, i.e. if
       the  DCMDICTPATH	 environment   variable	  is   not   set,   the	  file
       <datadir>/dicom.dic  will be loaded unless the dictionary is built into
       the application (default	for Windows).

       The  default  behavior  should  be  preferred   and   the   DCMDICTPATH
       environment  variable  only used	when alternative data dictionaries are
       required. The DCMDICTPATH environment variable has the same  format  as
       the  Unix  shell	PATH variable in that a	colon (':') separates entries.
       On Windows systems, a semicolon (';') is	used as	a separator. The  data
       dictionary  code	 will  attempt	to  load  each	file  specified	in the
       DCMDICTPATH environment variable. It is an error	if no data  dictionary
       can be loaded.
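
       The lookup order described above can be sketched in Python as follows.
       The function name dictionary_candidates and its parameters are
       hypothetical, the default path has to be supplied by the caller because
       <datadir> depends on the installation, and the sketch only lists the
       candidate files without loading them.

```python
import os
from typing import List, Mapping


def dictionary_candidates(env: Mapping[str, str],
                          default_path: str,
                          windows: bool = False) -> List[str]:
    """List the data dictionary files that would be tried, in order."""
    value = env.get("DCMDICTPATH")
    if value is None:
        # Variable not set: fall back to the default dictionary file.
        return [default_path]
    # Same format as PATH: ':' on Unix, ';' on Windows.
    separator = ";" if windows else ":"
    return [entry for entry in value.split(separator) if entry]


if __name__ == "__main__":
    # '<datadir>' is kept as a placeholder; substitute the real directory.
    print(dictionary_candidates(os.environ, "<datadir>/dicom.dic",
                                windows=(os.name == "nt")))
```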

       Depending  on  the  command line	options	specified, the dcm2xml utility
       will attempt to load character set mapping tables.  This	 happens  when
       DCMTK  was compiled with	the oficonv library (which is the default) and
       the mapping tables are not built	into the library (default  when	 DCMTK
       uses shared libraries).

       The  mapping  table  files  are	expected  in  DCMTK's  <datadir>.  The
       DCMICONVPATH environment	variable can be	used to	 specify  a  different
       location.  If  a	 different location is specified, those	mapping	tables
       also replace any	built-in tables.

FILES
       <datadir>/dcm2xml.dtd - Document	Type Definition	(DTD) file

SEE ALSO
       xml2dcm(1), dcmconv(1)

COPYRIGHT
       Copyright (C) 2002-2024 by OFFIS	e.V., Escherweg	 2,  26121  Oldenburg,
       Germany.

Version	3.6.9			Wed Dec	11 2024			    dcm2xml(1)
