OSMIUM-EXTRACT(1)					     OSMIUM-EXTRACT(1)

NAME
       osmium-extract -	create geographical extracts from an OSM file

SYNOPSIS
       osmium extract --config CONFIG-FILE [OPTIONS] OSM-FILE
       osmium extract --bbox LEFT,BOTTOM,RIGHT,TOP [OPTIONS] OSM-FILE
       osmium extract --polygon	POLYGON-FILE [OPTIONS] OSM-FILE

DESCRIPTION
       Create  geographical  extracts  from an OSM data	file or	an OSM history
       file.  The region (geographical extent) can be given as a bounding  box
       or as a (multi)polygon.

       There are three ways of calling this command:

       * Specify a config file with the --config/-c option.  It can define
         any number of regions you want to cut out.  See the CONFIG FILE
         section for details.

       * Specify a bounding box to cut out with the --bbox/-b option.

       * Specify a (multi)polygon to cut out with the --polygon/-p option.

       The input file is assumed to be	ordered	 in  the  usual	 order:	 nodes
       first, then ways, then relations.

       If  the	--with-history/-H  option  is used, the	command	will work cor-
       rectly for history files.  This currently works for  the	 complete_ways
       strategy	only.  The simple or smart strategies do not work with history
       files.	A  history  extract  will contain every	version	of all objects
       with at least one version in the	region.	 Generating a history  extract
       is somewhat slower than a normal	data extract.

       Osmium will make	sure that all nodes on the vertices of the boundary of
       the region will be in the extract, but nodes that happen	to be directly
       on  the	boundary,  but between those vertices, might end up in the ex-
       tract or	not.  In almost	all cases this will be good enough, but	if you
       want to make really sure	you got	everything, use	a small	buffer	around
       your region.

       By default no bounds will be set	in the header of the output file.  Use
       the --set-bounds	option if you need this.

       Note that osmium extract will never clip any OSM objects, i.e. it will
       not remove node references outside the region from ways or unused
       relation members from relations.  This means you might get objects that
       are not reference-complete.  It has the advantage that you can use
       osmium merge to merge several extracts without problems.

OPTIONS
       -b, --bbox=LONG1,LAT1,LONG2,LAT2
              Set the bounding box to cut out.  Cannot be used with
              --polygon/-p, --config/-c, or --directory/-d.  The coordinates
              LONG1,LAT1 are those of one arbitrary corner, the coordinates
              LONG2,LAT2 those of the opposite corner.

       -c, --config=FILE
              Set the name of the config file.  Cannot be used with the
              --bbox/-b or --polygon/-p option.  If this is set, the
              --output/-o and --output-format/-f options are ignored, because
              they are set in the config file.

       --clean=ATTR
              Clean the attribute ATTR (one of version, timestamp, changeset,
              uid, user) from the data before writing it out again.  The
              attribute will be set to 0 (the user will be set to the empty
              string).  This option can be given multiple times.  Depending
              on the output format these attributes might show up as 0 or not
              show up at all.

       -d, --directory=DIRECTORY
              Output directory.  Output file names in the config file are
              relative to this directory.  Overrides the setting of the same
              name in the config file.  This option is ignored when the
              --bbox/-b or --polygon/-p options are used; set the output
              directory and name with the --output/-o option in that case.

       -H, --with-history
	      Specify  that  the  input	 file  is  a history file.  The	output
	      file(s) will also	be history file(s).

       -p, --polygon=POLYGON_FILE
              Set the polygon to cut out based on the contents of the file.
              The file has to be a GeoJSON, poly, or OSM file as described in
              the (MULTI)POLYGON FILE FORMATS section.  It has to have the
              right suffix to be detected correctly.  Cannot be used with
              --bbox/-b, --config/-c, or --directory/-d.

       -s, --strategy=STRATEGY
              Use the given strategy to extract the region.  For possible
              values and details see the STRATEGIES section.  Default is
              "complete_ways".

       -S, --option=OPTION=VALUE
	      Set  a named option for the strategy.  If	needed you can specify
	      this option multiple times to set	several	options.

       --set-bounds
	      Set the bounds field in the header.  The bounds are set  to  the
	      bbox or envelope of the polygon specified	for the	extract.  Note
	      that  strategies other than "simple" can put nodes outside those
	      bounds into the output file.

COMMON OPTIONS
       -h, --help
	      Show usage help.

       -v, --verbose
	      Set verbose mode.	 The program  will  output  information	 about
	      what it is doing to STDERR.

INPUT OPTIONS
       -F, --input-format=FORMAT
              The format of the input file(s).  Can be used to set the input
              format if it can't be autodetected from the file name(s).  This
              sets the format for all input files; there is no way to set the
              format for some input files only.  See osmium-file-formats(5)
              or the libosmium manual for details.

OUTPUT OPTIONS
       -f, --output-format=FORMAT
	      The  format  of  the output file.	 Can be	used to	set the	output
	      file format if it	can't be autodetected  from  the  output  file
	      name.   See  osmium-file-formats(5)  or the libosmium manual for
	      details.

       --fsync
	      Call fsync after writing	the  output  file  to  force  flushing
	      buffers to disk.

       --generator=NAME
	      The  name	and version of the program generating the output file.
	      It will be added to the header of	the output file.   Default  is
	      "osmium/"	and the	version	of osmium.

       -o, --output=FILE
              Name of the output file.  Default is '-' (STDOUT).

       -O, --overwrite
	      Allow  an	 existing output file to be overwritten.  Normally os-
	      mium will	refuse to write	over an	existing file.

       --output-header=OPTION=VALUE
              Add output header option.  This command line option can be used
              multiple times for different OPTIONs.  See the
              osmium-output-headers(5) man page for a list of available
              header options.  For some commands you can use the special
              format "OPTION!" (i.e. an exclamation mark after the OPTION and
              no value set) to set the value to the same as in the input
              file.

CONFIG FILE
       The  config file	mainly specifies the file names	and the	regions	of the
       extracts	that should be created.

       The config file is in JSON format.  The top level is an object which
       contains at least an "extracts" array.  It can also contain a
       "directory" entry which names the directory where all the output files
       will be created:

	      {
		  "extracts": [...],
		  "directory": "/tmp/"
	      }

       The extracts array specifies the extracts that should be created.  Each
       item in the array is an object with at least an "output" member naming
       the output file and a region defined in a "bbox", "polygon", or
       "multipolygon" member.  An optional "description" can be added; it is
       not used by the program but can help with documenting the file
       contents.  You can add an optional "output_format" if the format
       cannot be detected from the "output" file name.  Run "osmium help
       file-formats" to get a description of allowed formats.

       The  optional  "output_header"  allows  you  to set additional OSM file
       header settings such as the "generator".	 If you	set  the  value	 of  a
       file  header setting to null, the output	header will be set to the same
       header from the input file.

	      "extracts": [
		  {
		      "output":	"hamburg.osm.pbf",
		      "output_format": "pbf",
		      "description": "optional description",
		      "bbox": ...
		  },
		  {
		      "output":	"berlin.osm.pbf",
		      "description": "optional description",
		      "polygon": ...
		  },
		  {
		      "output":	"munich.osm.pbf",
		      "output_header": {
			  "generator": "MyExtractor/1.0",
			  "osmosis_replication_timestamp": null
		      },
		      "description": "optional description",
		      "multipolygon": ...
		  }
	      ]
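       A config file of this shape can also be generated by a script.  The
       following Python sketch is not part of osmium; it only builds the
       structure described above with the standard json module.  The helper
       name make_config, the example values, and the check that each extract
       has exactly one region member are assumptions made for illustration.

```python
import json


def make_config(directory, extracts):
    """Return config file content as a JSON string.

    Assumption of this sketch: every extract carries an "output" name
    and exactly one of the region members described in the text.
    """
    region_keys = {"bbox", "polygon", "multipolygon"}
    for extract in extracts:
        if "output" not in extract:
            raise ValueError("extract is missing 'output'")
        if len(region_keys.intersection(extract)) != 1:
            raise ValueError("need exactly one of bbox/polygon/multipolygon")
    return json.dumps({"directory": directory, "extracts": extracts}, indent=4)


config_text = make_config(
    "/tmp/",
    [
        {
            "output": "munich.osm.pbf",
            "description": "Bounding box specified in array format",
            "bbox": [11.35, 48.05, 11.73, 48.25],
        },
    ],
)
print(config_text)
```

       The printed text can be saved to a file and passed to the --config/-c
       option.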

       There are several formats for specifying	the regions:

       bbox:

       A bounding box in one of	two formats.  The first	is a simple array with
       four real numbers, the first two	specifying the coordinates of an arbi-
       trary corner, the second	two specifying the coordinates of the opposite
       corner.

	      {
		  "output": "munich.osm.pbf",
		  "description": "Bounding box specified in array format",
		  "bbox": [11.35, 48.05, 11.73,	48.25]
	      }

       The second format uses an object	instead	of an array:

	      {
		  "output": "dresden.osm.pbf",
		  "description": "Bounding box specified in object format",
		  "bbox": {
		      "left": 13.57,
		      "right": 13.97,
		      "top": 51.18,
		      "bottom":	50.97
		  }
	      }
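       Both formats describe the same rectangle.  As a rough illustration,
       this Python sketch converts the array format into the object format.
       It assumes that longitude comes first in each corner (as in the
       --bbox/-b option) and that "left" and "bottom" are the smaller values;
       the function name bbox_to_object is hypothetical.

```python
def bbox_to_object(corners):
    """Convert [long1, lat1, long2, lat2] (any two opposite corners)
    into the object form with left/right/top/bottom members."""
    long1, lat1, long2, lat2 = corners
    return {
        "left": min(long1, long2),
        "right": max(long1, long2),
        "top": max(lat1, lat2),
        "bottom": min(lat1, lat2),
    }


# The corners may be given in any order.
print(bbox_to_object([13.97, 50.97, 13.57, 51.18]))
```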

       polygon:

       A polygon, either specified inline in the config	file or	read  from  an
       external	 file.	See the	(MULTI)POLYGON FILE FORMATS section for	exter-
       nal files.  If specified	inline this is a nested	array, the outer array
       defining	the polygon, the next array the	rings and the innermost	arrays
       the coordinates.	 This format is	the same as in GeoJSON files.

       In this example there is	only one outer ring:

	      "polygon": [[
		  [9.613465, 53.58071],
		  [9.647599, 53.59655],
		  [9.649288, 53.61059],
		  [9.613465, 53.58071]
	      ]]

       In each ring, the last set of coordinates should	be  the	 same  as  the
       first set, closing the ring.
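       If the rings are produced by a script, closing them can be automated.
       A minimal Python sketch, assuming each ring is a list of
       [longitude, latitude] pairs; the function name close_ring and the
       minimum of three coordinates are assumptions of the sketch, not rules
       stated by osmium:

```python
def close_ring(ring):
    """Return a copy of the ring whose last coordinate equals the first."""
    if len(ring) < 3:
        raise ValueError("a ring needs at least three coordinates")
    closed = [list(point) for point in ring]
    if closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    return closed


ring = [[9.613465, 53.58071], [9.647599, 53.59655], [9.649288, 53.61059]]
polygon = [close_ring(ring)]  # one outer ring, as in the example above
```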

       multipolygon:

       A multipolygon, either specified	inline in the config file or read from
       an  external file.  See the (MULTI)POLYGON FILE FORMATS section for ex-
       ternal files.  If specified inline this is a nested  array,  the	 outer
       array  defining the multipolygon, the next array	the polygons, the next
       the rings and the innermost arrays the coordinates.  This format	is the
       same as in GeoJSON files.

       In this example there is	one outer and one inner	ring:

	      "multipolygon": [[[
		  [6.847, 50.987],
		  [6.910, 51.007],
		  [7.037, 50.953],
		  [6.967, 50.880],
		  [6.842, 50.925],
		  [6.847, 50.987]
	      ],[
		  [6.967, 50.954],
		  [6.969, 50.920],
		  [6.932, 50.928],
		  [6.934, 50.950],
		  [6.967, 50.954]
	      ]]]

       In each ring, the last set of coordinates should	be  the	 same  as  the
       first set, closing the ring.

       Osmium must check every node in the input data and find out which
       bounding boxes or (multi)polygons it is in.  This is very cheap for
       bounding boxes, but more expensive for (multi)polygons, and it becomes
       more expensive the more vertices the (multi)polygon has.  Use bounding
       boxes or simplified polygons where possible.

       Note that bounding boxes	or (multi)polygons are not allowed to span the
       -180/180	 degree	 line.	 If you	need this, cut out the regions on each
       side and	use osmium merge to join the resulting files.
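       One way to prepare the two regions for a bounding box is sketched
       below in Python.  The convention that a box with west greater than
       east crosses the -180/180 line is an assumption of this sketch only;
       osmium itself does not accept such a box.

```python
def split_at_antimeridian(west, south, east, north):
    """Split a box into boxes that stay within -180..180 degrees.

    Assumption: west > east means the box crosses the -180/180 line.
    Each returned box is [left, bottom, right, top].
    """
    if west <= east:
        return [[west, south, east, north]]
    return [[west, south, 180.0, north], [-180.0, south, east, north]]


print(split_at_antimeridian(170.0, -20.0, -175.0, -10.0))
```

       Each resulting box can then be extracted separately and the output
       files joined with osmium merge.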

(MULTI)POLYGON FILE FORMATS
       External	files describing a (multi)polygon are specified	in the	config
       file  using the "file_name" and "file_type" properties on the "polygon"
       or "multipolygon" object:

	      "polygon": {
		  "file_name": "berlin.geojson",
		  "file_type": "geojson"
	      }

       If file names don't start with a	slash (/), they	are interpreted	 rela-
       tive  to	the directory where the	config file is.	 If the	"file_type" is
       missing,	Osmium will try	to  autodetect	it  from  the  suffix  of  the
       "file_name".
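       The path rule can be illustrated with a short Python sketch.  It
       mirrors only the rule stated above and is not osmium's own code; the
       function name and the example paths are made up.

```python
import posixpath


def resolve_polygon_file(config_path, file_name):
    """Names starting with a slash are kept as they are; other names are
    taken relative to the directory containing the config file."""
    if file_name.startswith("/"):
        return file_name
    return posixpath.join(posixpath.dirname(config_path), file_name)


print(resolve_polygon_file("/data/conf/extracts.json", "berlin.geojson"))
print(resolve_polygon_file("/data/conf/extracts.json", "/srv/poly/berlin.poly"))
```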

       The following file types	are supported:

       geojson
	      GeoJSON  file  containing	exactly	one Feature of type Polygon or
	      MultiPolygon, or a FeatureCollection with	the first  Feature  of
	      type  Polygon  or	 MultiPolygon.	 Everything  except the	actual
	      geometry (of the first Feature) is ignored.

       poly   A poly file as described in
              https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format .
              This wiki page also mentions several sources for such poly
              files.

       osm    An OSM file containing one or more multipolygon or boundary
              relations together with all the nodes and ways needed.  Any OSM
              file format (XML, PBF, ...) supported by Osmium can be used
              here, but the correct suffix must be used so that the file
              format is detected correctly.  Files for this can easily be
              obtained by searching for the area on OSM and then downloading
              the full relation using a URL like
              https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full .
              Or you can use osmium getid -r to get a specific relation from
              an OSM file.  Note that both these approaches can get you very
              detailed boundaries which can take quite a while to cut out.
              Consider simplifying the boundary before use.

       If  there  are several (multi)polygons in a poly	file or	OSM file, they
       will be merged.	The (multi)polygons must not  overlap,	otherwise  the
       result is undefined.

STRATEGIES
       osmium  extract can use different strategies for	creating the extracts.
       Depending on the	strategy different objects will	 end  up  in  the  ex-
       tracts.	The strategies differ in how much memory they need and how of-
       ten  they  need to read the input file.	The choice of strategy depends
       on how you want to use the generated extracts and how much  memory  and
       time you	have.

       The default strategy is complete_ways.

       Strategy	simple
	      Runs  in	a single pass.	The extract will contain all nodes in-
	      side the region and all ways referencing those nodes as well  as
	      all  relations  referencing  any nodes or	ways already included.
	      Ways crossing the	region boundary	 will  not  be	reference-com-
	      plete.  Relations	will not be reference-complete.	 This strategy
	      is fast, because it reads	the input only once, but the result is
	      not  enough  for	most  use cases.  It is	the only strategy that
	      will work	when reading from a socket  or	pipe.	This  strategy
	      will not work for	history	files.

       Strategy	complete_ways
	      Runs  in	two passes.  The extract will contain all nodes	inside
	      the region and all ways referencing those	nodes as well  as  all
	      nodes  referenced	 by those ways.	 The extract will also contain
	      all relations referenced by nodes	inside the region or ways  al-
	      ready  included  and,  recursively, their	parent relations.  The
	      ways are reference-complete, but the relations are not.

       Strategy	smart
	      Runs in three passes.  The extract will contain all nodes	inside
	      the region and all ways referencing those	nodes as well  as  all
	      nodes  referenced	 by those ways.	 The extract will also contain
	      all relations referenced by nodes	inside the region or ways  al-
	      ready  included  and,  recursively, their	parent relations.  The
	      extract will also	contain	all nodes and ways (and	the nodes they
	      reference) referenced by	relations  tagged  "type=multipolygon"
	      directly referencing any nodes in	the region or ways referencing
	      nodes  in	 the region.  The ways are reference-complete, and all
	      multipolygon relations referencing nodes in the regions or  ways
	      that have	nodes in the region are	reference-complete.  Other re-
	      lations are not reference-complete.
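       The difference between the simple and complete_ways strategies can be
       shown on a toy data set.  The Python sketch below is an illustration
       only: it works on in-memory dictionaries, ignores relations entirely,
       and does not reflect how osmium reads files or manages memory.  All
       names and IDs are made up.

```python
def simple(nodes_in_region, ways):
    """Nodes inside the region plus every way referencing one of them."""
    kept_ways = {way_id for way_id, refs in ways.items()
                 if nodes_in_region & set(refs)}
    return set(nodes_in_region), kept_ways


def complete_ways(nodes_in_region, ways):
    """Like simple, but also add all nodes referenced by the kept ways,
    so that the ways are reference-complete."""
    kept_nodes, kept_ways = simple(nodes_in_region, ways)
    for way_id in kept_ways:
        kept_nodes.update(ways[way_id])
    return kept_nodes, kept_ways


ways = {100: [1, 2, 3], 101: [4, 5]}  # way ID -> referenced node IDs
inside = {1, 2}                       # node IDs inside the region

print(simple(inside, ways))
print(complete_ways(inside, ways))
```

       Way 100 crosses the boundary: with simple its node 3 is missing from
       the result, with complete_ways it is included.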

       For the complete_ways strategy you can set the option
       "-S relations=false", in which case no relations will be written to the
       output file.

       The smart strategy allows the following strategy	options:

       Use "-S types=TYPE,..." to change the types of relations that should be
       reference-complete.  Instead of just relations tagged
       "type=multipolygon", you can either get all relations (use
       "-S types=any") or give a list of types to the -S option:
       "-S types=multipolygon,route".  Note that boundary relations in
       particular can be huge, so if you include them, be aware that your
       result might be huge.

       Use "-S complete-partial-relations=X" to	 force	completion  of	partly
       completed relations.  If	this is	set, all relations that	have more than
       X  percent of their members already in the extract will have their full
       set of members in the extract.  So this allows completing  almost  com-
       plete relations.	 It can	be useful for instance to make sure a boundary
       relation	is complete even if some of it is outside the polygon used for
       extraction.

       Use "-S tags=PATTERN,..." to only complete relations that have a tag
       matching one of the PATTERNs.  For example, if you use
       "-S tags=landuse,natural=wood,natural=water", everything tagged
       landuse=* or natural=wood or natural=water is added to the result, but
       no other relations.

       You can combine the "-S types", "-S complete-partial-relations", and
       "-S tags" options.  The options will be interpreted as
       "(types OR complete-partial-relations) AND tags".
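       The combination rule can be written out as a boolean expression.  The
       Python sketch below only illustrates that rule; it is not osmium's
       implementation.  It assumes that a missing tags option accepts every
       relation, and it models only the two pattern forms mentioned above
       ("key" and "key=value").  The data layout and function names are made
       up.

```python
def tag_matches(tags, patterns):
    """True if any pattern ("key" or "key=value") matches the tags."""
    for pattern in patterns:
        key, sep, value = pattern.partition("=")
        if key in tags and (not sep or tags[key] == value):
            return True
    return False


def should_complete(relation, in_extract, types=("multipolygon",),
                    partial_percent=None, tag_patterns=None):
    """Evaluate (types OR complete-partial-relations) AND tags."""
    tags = relation["tags"]
    members = relation["members"]
    by_type = "any" in types or tags.get("type") in types
    by_partial = False
    if partial_percent is not None and members:
        present = sum(1 for member in members if member in in_extract)
        by_partial = 100.0 * present / len(members) > partial_percent
    by_tags = tag_patterns is None or tag_matches(tags, tag_patterns)
    return (by_type or by_partial) and by_tags


boundary = {
    "tags": {"type": "boundary", "boundary": "administrative"},
    "members": [1, 2, 3, 4],
}
already_in_extract = {1, 2, 3}  # 75 percent of the members

print(should_complete(boundary, already_in_extract))
print(should_complete(boundary, already_in_extract, partial_percent=50))
```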

DIAGNOSTICS
       osmium extract exits with exit code

       0      if everything went alright,

       1      if there was an error processing the data, or

       2      if  there	 was a problem with the	command	line arguments,	config
	      file or polygon files.
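       A calling script can use these codes to decide how to react.  A
       minimal Python sketch, assuming osmium is installed and on the PATH;
       the function names are hypothetical and the message texts paraphrase
       the list above:

```python
import subprocess

EXIT_MEANINGS = {
    0: "everything went alright",
    1: "error processing the data",
    2: "problem with the command line arguments, config file or polygon files",
}


def describe_exit_code(code):
    """Map an exit code of osmium extract to the meaning listed above."""
    return EXIT_MEANINGS.get(code, "exit code not documented in this page")


def run_extract(extra_args):
    """Run osmium extract with the given arguments and describe the result."""
    result = subprocess.run(["osmium", "extract", *extra_args])
    return result.returncode, describe_exit_code(result.returncode)
```

       For example, run_extract(["-b", "11.35,48.05,11.73,48.25",
       "germany-latest.osm.pbf", "-o", "munich.osm.pbf"]) returns the exit
       code together with its description.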

MEMORY USAGE
       Memory usage of osmium extract depends on the number of extracts and on
       the strategy used.  For the simple strategy it will be at least the
       number of extracts times the highest node ID used divided by 8.  The
       complete_ways strategy needs twice that, and the smart strategy a bit
       more.
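       As a worked example of this rule of thumb, the Python sketch below
       computes the lower bound.  It assumes the result is in bytes (which
       the text does not state explicitly), and the highest node ID in the
       example is an arbitrary round number, not a measured value.  No figure
       is given above for the smart strategy, so the sketch refuses to guess
       one.

```python
def min_memory_bytes(num_extracts, highest_node_id, strategy="complete_ways"):
    """Lower bound from the rule: extracts * highest node ID / 8.

    Assumption: the unit is bytes. complete_ways needs twice the
    amount of simple; no formula is documented for smart.
    """
    factors = {"simple": 1, "complete_ways": 2}
    if strategy not in factors:
        raise ValueError("no formula documented for strategy " + strategy)
    return factors[strategy] * (num_extracts * highest_node_id // 8)


# 10 extracts, hypothetical highest node ID of 12 billion
print(min_memory_bytes(10, 12_000_000_000, "simple"))
print(min_memory_bytes(10, 12_000_000_000, "complete_ways"))
```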

       If you want to split a large file into many extracts, do	this  in  sev-
       eral  steps.   First create several larger extracts and then split them
       again and again into smaller pieces.

LIMITS
       You cannot have more than 500 extracts, although chances are that you
       will run out of memory long before that.  See MEMORY USAGE.

EXAMPLES
       See the example config files in the  extract-example-config  directory.
       To try it:

	      osmium extract -v	-c extract-example-config/extracts.json	\
		  germany-latest.osm.pbf

       Extract the city	of Karlsruhe using a boundary polygon:

	      osmium extract -p	karlsruhe-boundary.osm.bz2 germany-latest.osm.pbf \
		  -o karlsruhe.osm.pbf

       Extract the city	of Munich using	a bounding box:

	      osmium extract -b	11.35,48.05,11.73,48.25	germany-latest.osm.pbf \
		  -o munich.osm.pbf
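       When such commands are assembled by a script, building the argument
       list explicitly avoids quoting problems.  A small Python sketch using
       only options documented in this page; the function name is
       hypothetical:

```python
import shlex


def bbox_extract_args(bbox, input_file, output_file):
    """Build the argument list for a bounding box extract."""
    bbox_text = ",".join(str(value) for value in bbox)
    return ["osmium", "extract", "-b", bbox_text, input_file, "-o", output_file]


args = bbox_extract_args(
    [11.35, 48.05, 11.73, 48.25], "germany-latest.osm.pbf", "munich.osm.pbf"
)
print(shlex.join(args))
```

       The list can be passed to subprocess.run() as is.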

SEE ALSO
       * osmium(1), osmium-file-formats(5), osmium-output-headers(5),
         osmium-getid(1), osmium-merge(1)

       * Osmium website <https://osmcode.org/osmium-tool/>

COPYRIGHT
       Copyright (C) 2013-2023 Jochen Topf <jochen@topf.org>.

       License GPLv3+: GNU GPL version 3 or later
       <https://gnu.org/licenses/gpl.html>.  This is free software: you are
       free to change and redistribute it.  There is NO WARRANTY, to the
       extent permitted by law.

CONTACT
       If you have any questions or  want  to  report  a  bug,	please	go  to
       https://osmcode.org/contact.html

AUTHORS
       Jochen Topf <jochen@topf.org>.

				    1.16.0		     OSMIUM-EXTRACT(1)
