HPL_pdpancrT(3)		     HPL Library Functions	       HPL_pdpancrT(3)

       HPL_pdpancrT - Crout panel factorization.

       #include	"hpl.h"

       void HPL_pdpancrT( HPL_T_panel * PANEL, const int M, const int N,
       const int ICOFF, double * WORK );

       HPL_pdpancrT factorizes	a panel	of columns that	is a  sub-array	 of  a
       larger  one-dimensional	panel  A using the Crout variant of the	 usual
       one-dimensional algorithm.  The lower triangular	N0-by-N0  upper	 block
       of the panel is stored in transpose form.

       Bi-directional exchange is used to perform the swap::broadcast
       operations at once for one column in the panel.  This results in
       fewer, slightly larger messages than usual.  On P processes and
       assuming bi-directional links, the running time of this function
       can be approximated by (when N is equal to N0):

	  N0 * log_2( P	) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
	  N0^2 * ( M - N0/3 ) *	gam2-3

       where M is the local number of rows of the panel, lat and bdwth are
       the latency and bandwidth of the network for double precision real
       words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
       rate of execution.  The recursive algorithm indeed makes it possible
       to achieve almost Level 3 BLAS performance in the panel
       factorization.  On a large number of modern machines, however, this
       operation is latency bound, meaning that its cost can be estimated
       by the latency portion alone, N0 * log_2(P) * lat.  Mono-directional
       links will double this communication cost.

       Note that one iteration of the main loop is unrolled.  The local
       computation of the absolute value max of the next column is
       performed just after its update by the current column.  This brings
       the current column through cache only once at each step.  The
       current implementation does not perform any blocking for this
       sequence of BLAS operations; however, the design allows for plugging
       in an optimal (machine-specific) specialized BLAS-like kernel.  This
       idea was suggested to us by Fred Gustavson, IBM T.J. Watson Research
       Center.
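
       The idea of searching for the pivot candidate right after the
       update can be illustrated with a simplified single-pass loop.  This
       is only a sketch under stated assumptions: it is not HPL's code,
       the function name is hypothetical, and the update is reduced to a
       plain scaled subtraction of one column from another rather than
       the BLAS calls a real implementation would use.

```c
#include <math.h>
#include <stddef.h>

/* Illustrative only, not HPL code: subtract alpha * cur[i] from next[i]
 * and, in the same pass, track the index of the entry of next[] with the
 * largest absolute value.  Returns 0 when m is 0. */
size_t update_and_locate_max(double *next, const double *cur,
                             double alpha, size_t m)
{
    size_t imax = 0;
    double vmax = -1.0;           /* below any absolute value */

    for (size_t i = 0; i < m; ++i) {
        next[i] -= alpha * cur[i];        /* update by the current column */
        double a = fabs(next[i]);         /* inspect it while it is hot   */
        if (a > vmax) {
            vmax = a;
            imax = i;
        }
    }
    return imax;
}
```

       Fusing the two steps means that each entry is touched while it is
       still in cache, instead of being reloaded by a separate search
       pass.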

       PANEL   (local input/output)    HPL_T_panel *
	       On entry,  PANEL	 points	to the data structure  containing  the
	       panel information.

       M       (local input)	       const int
	       On entry,  M specifies the local	number of rows of sub(A).

       N       (local input)	       const int
	       On entry,  N specifies the local	number of columns of sub(A).

       ICOFF   (global input)	       const int
	       On  entry,  ICOFF specifies the row and column offset of	sub(A)
	       in A.

       WORK    (local workspace)       double *
	       On entry, WORK is a work array of size at least 2*(4+2*N0).
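
       The minimum workspace length can be computed before the call.  The
       helpers below are a hypothetical sketch, not part of HPL; N0 is
       taken to be the panel width used in the description above, and the
       length is counted in double precision words.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of HPL: minimum number of doubles
 * required for WORK, that is 2*(4+2*N0).  Returns 0 for a negative N0. */
size_t pancr_work_len(int n0)
{
    if (n0 < 0)
        return 0;
    return 2 * (4 + 2 * (size_t)n0);
}

/* Allocates a zero-initialized work array of the minimum length.
 * Returns NULL on allocation failure or if n0 is negative; the caller
 * releases the array with free(). */
double *pancr_work_alloc(int n0)
{
    size_t len = pancr_work_len(n0);
    if (len == 0)
        return NULL;
    return calloc(len, sizeof(double));
}
```

       With N0 = 32, for example, the minimum length is 2*(4+64) = 136
       doubles.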

       HPL_dlocmax(3), HPL_dlocswpN(3), HPL_dlocswpT(3), HPL_pdmxswp(3),
       HPL_pdpancrN(3), HPL_pdpanllN(3), HPL_pdpanllT(3), HPL_pdpanrlN(3),
       HPL_pdpanrlT(3).

HPL 2.3			       December	2, 2018		       HPL_pdpancrT(3)

