FreeBSD Manual Pages

home | help
FASDA-FOLD-CHANGE(1)	    General Commands Manual	  FASDA-FOLD-CHANGE(1)

NAME
       fasda  fold-change  -  Compute fold-change and P-values from normalized
       counts

SYNOPSIS
       fasda fold-change [--output file.txt] [--near-exact]\
	   normalized-counts1.tsv  normalized-counts2.tsv \
	   [normalized-counts3.tsv ...]

OPTIONS
       --output	file.txt
	      Report fold-change and P-values to file.txt instead of  the  de-
	      fault stdout.

       --near-exact
	      Force  calculation  of  near-exact P-values, rather than the de-
	      fault Mann-Whitney (Wilcoxon) for	 8  or	more  replicates.   Be
	      aware that this will take	a long time.

DESCRIPTION
       fasda  fold-change  computes  fold-change  and P-values for two or more
       conditions.  The	input is two or	more tab-separated value  (TSV)	 files
       containing  normalized counts for all conditions.  These	files are gen-
       erally produced by fasda	normalize which	is turn	takes its  input  from
       kallisto	or fasda abundance output.

       The Mann-Whitney	U-test (A.K.A. Wilcoxon	rank sum test) is used to com-
       pute  P-values  for  a minimum of 8 replicates per condition.  Exact P-
       values are computed for 2 to 4 replicates and near-exact	P-values for 5
       to 7 replicates (the enormous space of possible count  pairs  is	 down-
       sampled to keep run time	within reason).

       For each	pair of	conditions, fasda fold-change reports the mean normal-
       ized  counts  for  each condition, the standard deviation / mean	counts
       for each	condition, the	percent	 agreement  across  replicates	as  to
       whether	the  fold-change  is  up  or down, the fold-change using total
       counts for each condition, and the  P-value  for	 this  set  of	counts
       across the two conditions.

       Feature		       MNC1    MNC2  SD/C1  SD/C2  FC 1-2 log2(FC) P-val
       YPL071C_mRNA	       29.6    31.1    0.3    0.2    1.05    0.07  0.72611
       YLL050C_mRNA	      399.5   543.8    0.2    0.2    1.36    0.44  0.02857
       YMR172W_mRNA	       60.6    83.3    0.2    0.2    1.38    0.46  0.02167
       YOR185C_mRNA	       57.1    56.1    0.3    0.2    0.98   -0.03  0.92562
       YLL032C_mRNA	       33.9    21.6    0.3    0.3    0.64   -0.65  0.14138
       YBR225W_mRNA	       61.4    75.4    0.2    0.2    1.23    0.30  0.11281

       P-values	 will  generally  be  lower when fold-changes are higher, when
       mean normalized counts are higher and when standard deviation is	lower.
       We report standard deviation divided by mean normalized counts to  pro-
       vide  an	 immediate  sense of how variable the counts are across	repli-
       cates for each feature.	E.g. the actual	standard deviation for	condi-
       tion  1	in  YPL071C_mRNA  (using rounded output) would be 34.4 * 0.4 =
       13.76.

Interpreting Results
       P-values	from any differential analysis tool should never be taken  too
       seriously. There	are countless uncontrollable biological	variables that
       can  affect  the	 RNA  abundance	 in  a	cell.  There are also numerous
       sources of experimental error in	sample prep and	 sequencing  that  can
       lead  to	 inaccuracy  in	read counts.  Technical	replicates (replicates
       from the	same biological	sample)	and spike-in controls can reveal  some
       of these	technical issues, but do not address biological	variations.

       Another problem is that many biology experiments	use only 3 replicates.
       We  simply  cannot  draw	high confidence	from any statistics based on 3
       samples.

       P-value calculations typically make  the	 same  assumptions  about  all
       genes.	In reality, a 2-fold change in expression could	be hugely sig-
       nificant	for one	gene under certain conditions and completely  meaning-
       less  for  a  different gene or different conditions.  Statistical rou-
       tines have no knowledge of the biology that determines this.

       There is	huge variability on the	computational side as well.   Well-es-
       tablished  differential	analysis  tools	commonly report	very different
       sets   of   genes   as	differentially	 expressed.    Li,    et    al
       (https://doi.org/10.1186/s13059-022-02648-4)  reported  that  23.71% to
       75% of the DEGs identified by DESeq2 were missed	by edgeR.  In one data
       set tested, DESeq2 and edgeR had	only an	8% overlap in  the  DEGs  they
       identified.

       Hence,  simply  assuming	 that  P-values	 <  0.05 represent significant
       changes while others do not would be foolish.  Rather than try too hard
       to produce adjusted P-values that you can take  on  faith,  we  provide
       simple,	honest	statistics  and	leave it to you	to consider them care-
       fully.

       From our	preliminary experiments, we have found that  P-values  between
       0.05  and  0.20 are often questionable and may warrant a	closer look at
       the raw data.  A	quick look at the  individual  read  counts  for  each
       replicate  in  these cases can be enlightening.	You will usually see a
       high variance in	counts across individuals, sometimes  with  up-regula-
       tion in some individuals	and down-regulation in others.

       For  example,  consider the following gene, for which Sleuth reported a
       P-value of 0.0132, while	the exact P-value computed by FASDA is	0.116.
       The  kallisto estimated counts show that	one replicate was up-regulated
       almost 4-fold, another almost 2-fold, and the third was slightly	 down-
       regulated.  A P-value of	0.01 would not likely make anyone suspect this
       situation.   0.116,  on	the  other hand, tells us that there is	a good
       chance this is significant, but maybe we	should take a minute or	two to
       look at the read	counts and consider the	biology	behind them.  This  is
       a  tiny investment that will help us better decide whether a costly ex-
       perimental verification is warranted.

		     FASDA		       Sleuth
	       Feature	  MNC1	  MNC2	FC   1-2    SC1	   SC2	SFC    SPV
       ENSMUST00000017610  7620.5 16006.7 2.1 0.116  191.4  433.5  2.3 0.0132

       Kallisto	estimated counts:

		 R1	 R2	 R3
       MNC1    5382.16	8567.4 6986.43
       MNC2    21519.9 16196.7 6307.52

       Conversely, Sleuth produced a P-value of	0.12 for the following,	 which
       looks like a slam-dunk given the	counts.

		     FASDA		       Sleuth
	       Feature	  MNC1	  MNC2	FC   1-2    SC1	   SC2	SFC    SPV
       ENSMUST00000036928  1165.0  4064.3 3.5 0.024   70.5  300.4  4.3 0.1203

		 R1	 R2	 R3
       MNC1    1045.41 942.707 1220.02
       MNC2    2835.7  4167.81 3718.39

       The  bottom  line is, while the 0.05 rule is a good one mathematically,
       we cannot count on experimental and  computational  results  reflecting
       the biology with	that much accuracy.  Give the results of any differen-
       tial  analysis  a  generous  margin of error, and examine the data more
       closely for anything within that	margin.

FILES
       abundance.tsv: Input with normalized counts for all replicates of 1 condition
       file.txt: Output	containing fold-change and P-values

SEE ALSO
       fasda-abundance(1), fasda-normalize(1)

BUGS
       Please report bugs to the author	and send patches in unified diff  for-
       mat.  (man diff for more	information)

AUTHOR
       J. Bacon

							  FASDA-FOLD-CHANGE(1)
Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=fasda-fold-change&sektion=1&manpath=FreeBSD+Ports+15.0>
home | help
Header And Logo

Peripheral Links

Site Navigation

FreeBSD Manual Pages

Header And Logo

Peripheral Links

Search

Site Navigation

FreeBSD Manual Pages