HPL_pdrpancrT(3)	     HPL Library Functions	      HPL_pdrpancrT(3)

       HPL_pdrpancrT - Crout recursive panel factorization.

       #include	"hpl.h"

       void HPL_pdrpancrT( HPL_T_panel * PANEL, const int M, const int N,
       const int ICOFF, double * WORK );

       HPL_pdrpancrT recursively  factorizes  a	panel  of columns  using   the
       recursive   Crout   variant  of	the  usual one-dimensional  algorithm.
       The lower triangular N0-by-N0  upper block of the panel	is  stored  in
       transpose form.

       Bi-directional	exchange   is  used  to	 perform  the  swap::broadcast
       operations  at once  for	one column in the panel.  This	results	 in  a
       lower  number  of slightly larger  messages than	usual.	On P processes
       and assuming bi-directional links,  the running time of	this  function
       can be approximated by (when N is equal to N0):

	  N0 * log_2( P	) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
	  N0^2 * ( M - N0/3 ) *	gam2-3

       where M is the local number of rows of the panel, lat and bdwth are
       the latency and bandwidth of the network for double precision real
       words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
       rate of execution. The recursive algorithm makes it possible to
       achieve nearly Level 3 BLAS performance in the panel factorization.
       On a large number of modern machines, however, this operation is
       latency bound, meaning that its cost can be estimated by the latency
       portion alone, N0 * log_2(P) * lat. Mono-directional links double
       this communication cost.

       PANEL   (local input/output)    HPL_T_panel *
	       On  entry,   PANEL  points to the data structure	containing the
	       panel information.

       M       (local input)	       const int
	       On entry,  M specifies the local	number of rows of sub(A).

       N       (local input)	       const int
	       On entry,  N specifies the local	number of columns of sub(A).

       ICOFF   (global input)	       const int
	       On entry, ICOFF specifies the row and column offset  of	sub(A)
	       in A.

       WORK    (local workspace)       double *
	       On entry, WORK is a work array of size at least 2*(4+2*N0).
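
       The minimum workspace size can be computed before calling the
       routine. The following is a minimal sketch with hypothetical helper
       names (they are not part of HPL); it only encodes the 2*(4+2*N0)
       bound stated above and allocates with plain malloc().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not an HPL routine): minimum number of doubles
 * required in WORK for a panel of width N0, i.e. 2*(4+2*N0). */
size_t panel_work_len(int N0)
{
    return 2 * (4 + 2 * (size_t)N0);
}

/* Hypothetical helper: allocate a WORK array of the minimum size.
 * Returns NULL on allocation failure; the caller frees the result. */
double *panel_work_alloc(int N0)
{
    return malloc(panel_work_len(N0) * sizeof(double));
}
```

       For example, a panel width of N0 = 32 requires at least 136 doubles.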

       HPL_dlocmax (3), HPL_dlocswpN (3), HPL_dlocswpT (3), HPL_pdmxswp (3),
       HPL_pdpancrN (3), HPL_pdpancrT (3), HPL_pdpanllN (3), HPL_pdpanllT (3),
       HPL_pdpanrlN (3), HPL_pdpanrlT (3), HPL_pdrpancrN (3),
       HPL_pdrpanllN (3), HPL_pdrpanllT (3), HPL_pdrpanrlN (3),
       HPL_pdrpanrlT (3), HPL_pdfact (3).

HPL 2.3			       December	2, 2018		      HPL_pdrpancrT(3)

