vcflib(1)			vcflib (index)			     vcflib(1)

NAME
       vcflib index

DESCRIPTION
       vcflib contains tools and libraries for dealing with the Variant Call
       Format (VCF), a flat-file, tab-delimited textual format intended to
       describe reference-indexed variations between individuals.

       VCF provides a common interchange format for the description of
       variation in individuals and populations of samples, and has become the
       de facto standard reporting format for a wide array of genomic variant
       detectors.

       vcflib provides methods to manipulate and interpret sequence variation
       as it can be described by VCF. It is both:

       - an API for parsing and operating on records of genomic variation as
         it can be described by the VCF format, and

       - a collection of command-line utilities for executing complex
         manipulations on VCF files.

       The API itself provides a quick and extremely permissive method to read
       and write VCF files. Extensions and applications of the library provided
       in the included utilities (*.cpp) comprise the vast bulk of the
       library's utility for most users.
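
       As an illustration of the permissive, tab-delimited parsing described
       above, the following Python sketch splits one VCF data line into its
       fixed columns, INFO entries, and per-sample FORMAT values. It is not
       vcflib's C++ API: VcfRecord and parse_vcf_line are hypothetical names,
       and the sketch assumes a well-formed data line with at least the eight
       fixed columns (header lines are not handled).

```python
# Illustrative, permissive parser for one VCF data line (not vcflib's API).
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class VcfRecord:
    chrom: str
    pos: int                                  # 1-based POS as written in the file
    id: str
    ref: str
    alt: List[str]                            # empty list when ALT is "."
    qual: Optional[float]                     # None when QUAL is "."
    filter: str
    info: Dict[str, Union[str, bool]]         # flag keys map to True
    samples: List[Dict[str, str]] = field(default_factory=list)


def parse_vcf_line(line: str) -> VcfRecord:
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info_col = cols[:8]

    info: Dict[str, Union[str, bool]] = {}
    if info_col != ".":
        for entry in info_col.split(";"):
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True   # no "=" means a flag

    samples: List[Dict[str, str]] = []
    if len(cols) > 9:
        keys = cols[8].split(":")                # FORMAT column
        for sample_col in cols[9:]:
            # zip() stops at the shorter list, so trailing keys a sample omits are dropped
            samples.append(dict(zip(keys, sample_col.split(":"))))

    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alt=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=filt,
        info=info,
        samples=samples,
    )


if __name__ == "__main__":
    line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\tGT:DP\t0|0:1\t1|0:8"
    rec = parse_vcf_line(line)
    print(rec.pos, rec.alt, rec.info["DB"], rec.samples[1]["GT"])
```

       Keeping INFO and sample values as raw strings mirrors the "permissive"
       approach: type conversion is deferred to whichever tool needs a field.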

   filter
       filter command				  description
       --------------------------------------------------------------------------
       vcffilter				  Filter the specified VCF file
						  using the given set of
						  filters.
       vcfuniq					  List unique genotypes. Similar
						  to GNU uniq, but aimed at VCF
						  records. vcfuniq removes
						  records which have the same
						  position, REF, and ALT as the
						  previous record in a sorted
						  VCF file. Note that it does
						  not adjust/combine genotypes
						  in the output, but simply
						  takes the first record. See
						  also vcfcreatemulti for
						  combining records.
       vcfuniqalleles				  List unique alleles. For each
						  record, remove any duplicate
						  alternate alleles that may
						  have resulted from merging
						  separate VCF files.
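
       The record-level rule that vcfuniq applies can be sketched in Python as
       follows. This is an illustrative re-implementation, not vcflib code:
       uniq_records is a hypothetical name, records are assumed to be lists of
       VCF columns, "position" is assumed to mean the CHROM and POS pair, and
       REF and ALT are compared as raw column strings.

```python
from typing import Iterable, Iterator, Sequence


def uniq_records(records: Iterable[Sequence[str]]) -> Iterator[Sequence[str]]:
    """Yield records whose (CHROM, POS, REF, ALT) differs from the previous record.

    Only columns 0, 1, 3 and 4 are compared, so the ID, QUAL, INFO and sample
    columns of the first record in a run of duplicates are the ones kept.
    """
    previous_key = None
    for cols in records:
        key = (cols[0], cols[1], cols[3], cols[4])
        if key != previous_key:      # first record of a new run
            yield cols
        previous_key = key


rows = [
    ["1", "100", ".", "A", "G", "50"],
    ["1", "100", ".", "A", "G", "10"],   # duplicate site: dropped
    ["1", "100", ".", "A", "T", "5"],    # different ALT: kept
    ["1", "200", ".", "C", "T", "7"],
]
for row in uniq_records(rows):
    print("\t".join(row))
```

       Because only the previous record is remembered, duplicates are detected
       only when they are adjacent, which is why the input must be sorted.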

   metrics
       metrics command				  description
       --------------------------------------------------------------------------
       vcfcheck					  Validate integrity and
						  identity of the VCF by
						  verifying that the VCF
						  record's REF matches a given
						  reference file.
       vcfdistance				  Adds a tag to each variant
						  record which indicates the
						  distance to the nearest
						  variant (the tag name
						  defaults to
						  BasesToClosestVariant if no
						  custom name is given).
       vcfentropy				  Annotate VCF records with the
						  Shannon entropy of flanking
						  sequence. Annotates the output
						  VCF file with, for each
						  record, EntropyLeft,
						  EntropyRight, and
						  EntropyCenter, which are the
						  entropies of the sequence of
						  the given window size to the
						  left, right, and center of the
						  record. Also adds EntropyRef
						  and EntropyAlt for each alt.
       vcfhetcount				  Calculate the heterozygosity
						  rate: count the number of
						  alternate alleles in
						  heterozygous genotypes in all
						  records in the VCF file.
       vcfhethomratio				  Generates the het/hom ratio
						  for each individual in the
						  file.
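
       The Shannon entropy that vcfentropy reports can be computed from
       character counts as H = -sum(p_i * log2(p_i)). The sketch below assumes
       base-2 logarithms and counts every character in the window (including
       N); vcfentropy's own choices of log base, window bounds, and handling of
       ambiguous bases are not documented here, so treat them as assumptions.
       shannon_entropy is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the character composition of `seq`."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq.upper())          # case-insensitive base counts
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical usage: entropy of the w bases to the left of 0-based offset i.
reference = "ACGTACGTAAAAAAAA"
i, w = 12, 8
print(shannon_entropy(reference[max(0, i - w):i]))
```

       A homopolymer window scores 0 bits and a window with all four bases in
       equal proportion scores the maximum of 2 bits, so low values flag the
       low-complexity context the annotation is meant to expose.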

   phenotype
       phenotype command			  description
       --------------------------------------------------------------------------
       permuteGPAT++				  permuteGPAT++ is a method for
						  adding empirical p-values to a
						  GPAT++ score.
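
       An empirical p-value compares an observed score with scores generated
       under a null model (for example, by permutation). The Python sketch
       below shows only that final comparison, using the common add-one
       estimate (1 + number of null scores >= observed) / (1 + number of null
       scores). It is a generic illustration: how permuteGPAT++ generates its
       null scores and counts exceedances is not described on this page, and
       empirical_p_value is a hypothetical name.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Upper-tail empirical p-value with the add-one correction.

    Counts null scores at least as large as `observed`; the +1 terms treat the
    observed score as one draw from the null, so the result is never 0.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


print(empirical_p_value(0.9, [0.1, 0.5, 0.95, 0.2]))  # 2 / 5
```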

   genotype
       genotype	command				  description
       --------------------------------------------------------------------------
       abba-baba				  abba-baba calculates the tree
						  pattern for four individuals.
						  This tool assumes the
						  reference is ancestral and
						  ignores non-abba-baba sites.
						  The output is a boolean value
						  (1 = true, 0 = false) for abba
						  and baba. The tree argument
						  should be specified from the
						  most basal taxon to the most
						  derived.
       hapLrt					  HapLRT is a likelihood ratio
						  test for haplotype lengths.
						  The lengths are modeled with
						  an exponential distribution.
						  The sign denotes whether the
						  target (1) or the background
						  (-1) has longer haplotypes.
       normalize-iHS				  Normalizes iHS or XP-EHH
						  scores.
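
       The site classification that abba-baba reports can be sketched as
       follows. The sketch assumes each of the four individuals contributes a
       single allele coded 0 (reference, treated as ancestral "A") or 1
       (alternate, derived "B"), listed in the same order as the letters of the
       pattern; how the tool reduces genotypes to one allele per taxon is not
       covered, and abba_baba is a hypothetical name.

```python
from typing import Sequence, Tuple


def abba_baba(alleles: Sequence[int]) -> Tuple[int, int]:
    """Return (abba, baba) flags for one site across four taxa.

    Assumes one allele per taxon, coded 0 = reference (ancestral, "A") and
    1 = alternate (derived, "B"), given in the same order as the pattern letters.
    Any other pattern is not an abba/baba site and yields (0, 0).
    """
    if len(alleles) != 4 or any(a not in (0, 1) for a in alleles):
        raise ValueError("expected four alleles coded 0 or 1")
    pattern = "".join("A" if a == 0 else "B" for a in alleles)
    return int(pattern == "ABBA"), int(pattern == "BABA")


for site in ([0, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 1]):
    print(site, abba_baba(site))
```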

   transformation
       transformation command			  description
       --------------------------------------------------------------------------
       dumpContigsFromHeader			  Dump contigs from header
       smoother					  smoother is a method for
						  window smoothing many of the
						  GPAT++ formats.
       vcf2dag					  Modify VCF to be able to build
						  a directed acyclic graph (DAG).
       vcf2fasta				  Generates sample_seq:N.fa for
						  each sample, reference
						  sequence, and chromosomal copy
						  N in [0,1... ploidy]. Each
						  sequence in the FASTA file is
						  named using the same pattern
						  used for the file name,
						  allowing them to be combined.
       vcf2tsv					  Converts VCF to per-allele or
						  per-genotype tab-delimited
						  format, using a null string to
						  replace empty values in the
						  table. Specifying -g will
						  output one line per sample
						  with genotype information.
						  When there is more than one
						  alt allele, there will be
						  multiple rows, one for each
						  allele, and the INFO will
						  match the `A' index.
       vcfaddinfo				  Adds INFO fields from the
						  second file which are not
						  present in the first VCF file.
       vcfafpath				  Display genotype paths
       vcfallelicprimitives			  WARNING: this tool is
						  considered legacy and is only
						  retained for older workflows.
						  It will emit a warning! Even
						  though it can use the WFA, you
						  should use vcfwave instead.
       vcfannotate				  Intersect the records in the
						  VCF file with targets provided
						  in a BED file. Intersections
						  are done on the reference
						  sequences in the VCF file. If
						  no VCF filename is specified
						  on the command line (last
						  argument), the VCF is read
						  from stdin.
       vcfannotategenotypes			  Examine genotype
						  correspondence. Annotate
						  genotypes in the first file
						  with genotypes in the second,
						  adding the genotype as another
						  flag to each sample field in
						  the first file. annotation-tag
						  is the name of the sample flag
						  which is added to store the
						  annotation. Also adds a
						  `has_variant' flag for sites
						  where the second file has a
						  variant.
       vcfbreakmulti				  If multiple alleles are
						  specified in a single record,
						  break the record into multiple
						  lines, preserving
						  allele-specific INFO fields.
       vcfcat					  Concatenates VCF files
       vcfclassify				  Creates a new VCF where each
						  variant is tagged by allele
						  class: snp, ts/tv, indel, mnp.
       vcfcleancomplex				  Removes reference-matching
						  sequence from complex alleles
						  and adjusts records to reflect
						  positional change.
       vcfcombine				  Combine VCF files
						  positionally, combining
						  samples when sites and alleles
						  are identical. Any number of
						  VCF files may be combined. The
						  INFO field and other columns
						  are taken from one of the
						  combined files when records in
						  multiple files match. Alleles
						  must have identical ordering
						  to be combined into one
						  record. If they do not,
						  multiple records will be
						  emitted.
       vcfcommonsamples				  Generates each record in the
						  first file, removing samples
						  not present in the second.
       vcfcreatemulti				  Go through a sorted VCF and,
						  when overlapping alleles are
						  represented across multiple
						  records, merge them into a
						  single multi-ALT record. See
						  the documentation for more
						  information.
       vcfecho					  Echo VCF to stdout (simple
						  demo).
       vcfevenregions				  Generates a list of regions,
						  e.g. chr20:10..30, using the
						  variant density information
						  provided in the VCF file to
						  ensure that the regions have
						  even numbers of variants. This
						  can be used to reduce the
						  variance in runtime when
						  dividing variant detection or
						  genotyping by genomic
						  coordinates.
       vcffixup					  Generates a VCF stream where
						  AC and NS have been generated
						  for each record using sample
						  genotypes.
       vcfflatten				  Removes multi-allelic sites by
						  picking the most common
						  alternate. Requires allele
						  frequency specification `AF'
						  and use of `G' and `A' to
						  specify the fields which vary
						  according to the Allele or
						  Genotype. The VCF file may be
						  specified on the command line
						  or piped as stdin.
       vcfgeno2alleles				  Modifies the genotype field to
						  provide the literal alleles
						  rather than indexes.
       vcfgeno2haplo				  Convert genotype-based phased
						  alleles within -window-size
						  into haplotype alleles. Will
						  break haplotype construction
						  when encountering non-phased
						  genotypes on input.
       vcfgenosamplenames			  Get sample names.
       vcfglbound				  Adjust GLs so that the maximum
						  GL is 0 by dividing all GLs
						  for each sample by the max.
       vcfglxgt					  Set genotypes using the
						  maximum genotype likelihood
						  for each sample.
       vcfindex					  Adds an index number to the
						  INFO field (id=position).
       vcfinfo2qual				  Sets QUAL from the INFO field
						  tag keyed by [key]. The VCF
						  file may be omitted and read
						  from stdin. The average of the
						  field is used if it contains
						  multiple values.
       vcfinfosummarize				  Take annotations given in the
						  per-sample fields and add the
						  mean, median, min, or max to
						  the site-level INFO.
       vcfintersect				  VCF 1.0.12 set analysis
       vcfkeepgeno				  Reduce file size by removing
						  FORMAT fields not listed on
						  the command line from sample
						  specifications in the output.
       vcfkeepinfo				  Decrease file size by removing
						  INFO fields not listed on the
						  command line.
       vcfkeepsamples				  Outputs each record in the VCF
						  file, removing samples not
						  listed on the command line.
       vcfld					  Compute LD
       vcfleftalign				  Left-align indels and complex
						  variants in the input using a
						  pairwise ref/alt alignment
						  followed by a heuristic,
						  iterative left realignment
						  process that shifts indel
						  representations to their
						  absolute leftmost (5') extent.
       vcflength				  Add length info field
       vcfnullgenofields			  Makes the FORMAT for each
						  variant line the same (uses
						  all the FORMAT fields
						  described in the header).
						  Fills out per-sample fields to
						  match FORMAT. Expands GT values
						  of `.' to the number of alleles
						  based on ploidy (e.g. `./.'
						  for diploid).
       vcfnumalt				  Outputs a VCF stream where
						  NUMALT has been generated for
						  each record using sample
						  genotypes.
       vcfoverlay				  Overlay records in the input
						  VCF files, with file order as
						  precedence.
       vcfprimers				  For each VCF record, extract
						  the flanking sequences and
						  write them to stdout as FASTA
						  records suitable for
						  alignment.
       vcfqual2info				  Puts QUAL into an INFO field
						  tag keyed by [key].
       vcfremap					  For each alternate allele,
						  attempt to realign against the
						  reference with a lowered gap
						  open penalty. If realignment
						  is possible, adjust the CIGAR
						  and reference/alternate
						  alleles. Observe how different
						  alignment parameters,
						  including context and
						  entropy-dependent ones,
						  influence variant
						  classification and
						  interpretation.
       vcfremoveaberrantgenotypes		  Strips samples which are
						  homozygous but have
						  observations implying
						  heterozygosity. Remove samples
						  for which the reported
						  genotype (GT) and observation
						  counts (AO, RO) disagree.
       vcfremovesamples				  Outputs each record in the VCF
						  file, removing samples listed
						  on the command line.
       vcfsample2info				  Take annotations given in the
						  per-sample fields and add the
						  mean, median, min, or max to
						  the site-level INFO.
       vcfsamplediff				  Establish putative somatic
						  variants using reported
						  differences between germline
						  and somatic samples. Tags each
						  record where the listed sample
						  genotypes differ with . The
						  first sample is assumed to be
						  germline, the second somatic.
						  Each record is tagged with
						  ={germline,somatic,loh} to
						  specify the type of variant
						  given the genotype difference
						  between the two samples.
       vcfsamplenames				  List sample names
       vcfstreamsort				  Sorts the input (either stdin
						  or file) using a streaming
						  sort algorithm. Guarantees
						  that the positional order is
						  correct provided out-of-order
						  variants are no more than 100
						  positions in the VCF file
						  apart.
       vcfwave					  Realign reference and
						  alternate alleles with WFA,
						  parsing out the `primitive'
						  alleles into multiple VCF
						  records. New records have IDs
						  that reference the source
						  record ID. Genotypes/samples
						  are handled correctly.
						  Deletions generate
						  haploid/missing genotypes at
						  overlapping sites.
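
       The core of what vcfbreakmulti does can be sketched in Python: each ALT
       allele becomes its own record, and allele-specific INFO values (those
       declared Number=A in the header) are reduced to the matching allele.
       This is a simplified illustration, not vcflib's implementation: the
       dict-based record layout and break_multi are hypothetical, Number=R and
       Number=G fields and sample genotypes are not handled, and all other INFO
       values are copied unchanged.

```python
from typing import Any, Dict, List, Set


def break_multi(record: Dict[str, Any], per_allele_keys: Set[str]) -> List[Dict[str, Any]]:
    """Split a multi-ALT record into one record per ALT allele.

    `record` is a hypothetical dict with "chrom", "pos", "ref", "alt" (list of
    strings) and "info" (dict mapping each key to a list of string values).
    Keys in `per_allele_keys` (Number=A fields) keep only the value for the
    current allele; every other INFO key is copied unchanged.
    """
    pieces = []
    for i, alt in enumerate(record["alt"]):
        info = {
            key: [values[i]] if key in per_allele_keys else list(values)
            for key, values in record["info"].items()
        }
        # Shallow-copy the record, then replace ALT and INFO for this allele.
        pieces.append({**record, "alt": [alt], "info": info})
    return pieces


rec = {"chrom": "1", "pos": 100, "ref": "A", "alt": ["G", "T"],
       "info": {"AC": ["3", "1"], "DP": ["20"]}}
for piece in break_multi(rec, {"AC"}):
    print(piece["alt"], piece["info"])
```

       Building fresh ALT and INFO containers for every piece keeps the input
       record unmodified, so the same record can be split more than once.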

   statistics
       statistics command			  description
       --------------------------------------------------------------------------
       bFst					  bFst is a Bayesian approach to
						  Fst. Importantly, bFst
						  accounts for genotype
						  uncertainty in the model using
						  genotype likelihoods. For a
						  more detailed description see:
						  `A Bayesian approach to
						  inferring population structure
						  from dominant markers' by
						  Holsinger et al., Molecular
						  Ecology Vol. 11, issue 7,
						  2002. The likelihood function
						  has been modified to use
						  genotype likelihoods provided
						  by variant callers. There are
						  five free parameters estimated
						  in the model: each
						  subpopulation's allele
						  frequency and Fis (fixation
						  index, within each
						  subpopulation), a free
						  parameter for the total
						  population's allele
						  frequency, and Fst.
       genotypeSummary				  Generates a table of genotype
						  counts. Summarizes genotype
						  counts for bi-allelic SNVs and
						  indels.
       iHS					  iHS calculates the integrated
						  haplotype score, which
						  measures the relative decay of
						  extended haplotype
						  homozygosity (EHH) for the
						  reference and alternative
						  alleles at a site (see: Voight
						  et al. 2006, Szpiech &
						  Hernandez 2014).
       meltEHH
       pFst					  pFst is a probabilistic
						  approach for detecting
						  differences in allele
						  frequencies between two
						  populations.
       pVst					  pVst calculates vst, a measure
						  of CNV stratification.
       permuteSmooth				  permuteSmooth is a method for
						  adding empirical p-values to
						  smoothed wcFst scores.
       plotHaps					  plotHaps provides formatted
						  output that can be used with
						  `bin/plotHaplotypes.R'.
       popStats					  General population genetic
						  statistics for each SNP.
       segmentFst				  segmentFst creates genomic
						  segments (BED file) for
						  regions with high wcFst.
       segmentIhs				  Creates genomic segments (BED
						  file) for regions with high
						  wcFst.
       sequenceDiversity			  The sequenceDiversity program
						  calculates two popular metrics
						  of haplotype diversity: pi and
						  extended haplotype
						  homozygosity (eHH). Pi is
						  calculated using the Nei and
						  Li 1979 formulation. eHH is a
						  convenient way to think about
						  haplotype diversity. When
						  eHH = 0, all haplotypes in the
						  window are unique, and when
						  eHH = 1, all haplotypes in the
						  window are identical.
       vcfaltcount				  Count the number of alternate
						  alleles in all records in the
						  VCF file.
       vcfcountalleles				  Count	alleles
       vcfgenosummarize				  Adds summary statistics to
						  each record summarizing
						  qualities reported in called
						  genotypes. Uses: RO (reference
						  observation count), QR
						  (quality sum of reference
						  observations), AO (alternate
						  observation count), QA
						  (quality sum of alternate
						  observations).
       vcfgenotypecompare			  Adds statistics to the INFO
						  field of the VCF file
						  describing the amount of
						  discrepancy between the
						  genotypes (GT) in the VCF file
						  and the genotypes reported in
						  the . Use this after
						  vcfannotategenotypes to get
						  correspondence statistics for
						  two VCFs.
       vcfgenotypes				  Report the genotypes for each
						  sample, for each variant in
						  the VCF. Convert the numerical
						  representation of genotypes
						  provided by the GT field to a
						  human-readable genotype
						  format.
       vcfparsealts				  Alternate allele parsing
						  method. This method uses
						  pairwise alignment of REF and
						  ALTs to determine component
						  allelic primitives for each
						  alternate allele.
       vcfrandom				  Generate a random VCF	file
       vcfrandomsample				  Randomly sample sites from an
						  input VCF file, which may be
						  provided as stdin. Scale the
						  sampling probability by the
						  field specified in KEY. This
						  may be used to provide uniform
						  sampling across allele
						  frequencies, for instance.
       vcfroc					  Generates a pseudo-ROC curve
						  using sensitivity and
						  specificity estimated against
						  a putative truth set.
						  Thresholding is provided by
						  successive QUAL cutoffs.
       vcfsitesummarize				  Summarize by site
       vcfstats					  Prints statistics about
						  variants in the input VCF
						  file.
       wcFst					  wcFst is Weir & Cockerham's
						  Fst for two populations.
						  Negative values are VALID;
						  they are sites which can be
						  treated as zero Fst. For more
						  information see Evolution,
						  Vol. 38, No. 6, Nov 1984.
						  Specifically, wcFst uses
						  equations 1, 2, 3, and 4.
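
       The two sequenceDiversity metrics can be illustrated on a window of
       haplotype strings. The sketch below uses the standard haplotype
       homozygosity form, the sum over distinct haplotypes of C(n_h, 2) divided
       by C(n, 2), which gives 0 when all haplotypes are unique and 1 when all
       are identical, and estimates pi as the mean number of pairwise
       differences per site over all haplotype pairs (Nei and Li 1979).
       Whether sequenceDiversity normalizes pi per site, and how it defines its
       windows, are assumptions here; both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import Sequence


def haplotype_homozygosity(haplotypes: Sequence[str]) -> float:
    """Chance that two distinct haplotypes drawn at random are identical.

    Equals 0 when every haplotype is unique and 1 when all are identical.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def nucleotide_diversity(haplotypes: Sequence[str]) -> float:
    """Mean pairwise differences per site over all haplotype pairs.

    Assumes every haplotype string covers the same sites (equal lengths).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    sites = len(haplotypes[0])
    diffs = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    return diffs / comb(n, 2) / sites


window = ["0101", "0101", "0110", "1110"]
print(haplotype_homozygosity(window), nucleotide_diversity(window))
```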

SOURCE CODE
       See the source code repository at https://github.com/vcflib/vcflib

CREDIT
       Citations are the bread and butter of Science. If you are using this
       software in your research and want to support our future work, please
       cite the following publication:

       A spectrum of free software tools for processing the VCF variant call
       format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar
       (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009123).
       Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P (2022), PLoS
       Comput Biol 18(5): e1009123. https://doi.org/10.1371/journal.pcbi.1009123

LICENSE
       Copyright 2011-2025 (C) Erik Garrison and vcflib contributors.
       Copyright 2020-2025 (C) Pjotr Prins. MIT licensed.

AUTHORS
       Erik Garrison and vcflib contributors.

vcflib								     vcflib(1)