HPL_pdpancrT(3)		     HPL Library Functions	       HPL_pdpancrT(3)

NAME
       HPL_pdpancrT - Crout panel factorization.

SYNOPSIS
       #include	"hpl.h"

       void HPL_pdpancrT( HPL_T_panel *	PANEL, const int M, const int N, const
       int ICOFF, double * WORK	);

DESCRIPTION
       HPL_pdpancrT  factorizes	  a  panel of columns that is a	sub-array of a
       larger one-dimensional panel  A using the Crout variant of  the	 usual
       one-dimensional	algorithm.   The lower triangular N0-by-N0 upper block
       of the panel is stored in transpose form.

       Bi-directional  exchange	 is  used  to  perform	 the   swap::broadcast
       operations   at	once  for one column in	the panel.  This  results in a
       lower number of slightly	larger	messages than usual.  On  P  processes
       and  assuming  bi-directional links,  the running time of this function
       can be approximated by (when N is equal to N0):

	  N0 * log_2( P	) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
	  N0^2 * ( M - N0/3 ) *	gam2-3

       where M is the local number of rows of the panel, lat and bdwth are
       the latency and bandwidth of the network for double precision real
       words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
       rate of execution. The recursive algorithm makes it possible to come
       close to Level 3 BLAS performance in the panel factorization. On a
       large number of modern machines, however, this operation is latency
       bound, meaning that its cost can be estimated by the latency portion
       alone, N0 * log_2(P) * lat. Mono-directional links double this
       communication cost.

       Note that one iteration of the main loop is unrolled. The local
       computation of the absolute value max of the next column is performed
       just after its update by the current column. This brings the
       current column through cache only once at each step. The current
       implementation does not perform any blocking for this sequence of
       BLAS operations; however, the design allows for plugging in an optimal
       (machine-specific) specialized BLAS-like kernel. This idea was
       suggested to us by Fred Gustavson, IBM T.J. Watson Research Center.
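
       The idea of searching for the absolute value max right after the
       update can be illustrated with a simplified sketch. It does not
       reproduce the HPL source: the function name is hypothetical, and the
       update is reduced to a single axpy-style operation purely to show
       both steps sharing one pass over the data.

```c
#include <math.h>   /* fabs */
#include <stddef.h>

/* Hypothetical illustration (not an HPL routine): update the next column
 * with the current one and locate its entry of largest absolute value in
 * the same loop, so each element is touched once while it is in cache.
 * Returns the index of the maximum (0 when m == 0). */
size_t update_and_find_absmax(double *next, const double *cur,
                              double alpha, size_t m)
{
    size_t imax = 0;
    double vmax = -1.0;               /* below any absolute value */

    for (size_t i = 0; i < m; ++i) {
        next[i] -= alpha * cur[i];    /* simplified column update */
        double a = fabs(next[i]);     /* pivot candidate, same pass */
        if (a > vmax) {
            vmax = a;
            imax = i;
        }
    }
    return imax;
}
```

       A machine-specific kernel of the kind mentioned above would replace
       this loop with a tuned equivalent.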

ARGUMENTS
       PANEL   (local input/output)    HPL_T_panel *
	       On  entry,   PANEL  points to the data structure	containing the
	       panel information.

       M       (local input)	       const int
	       On entry,  M specifies the local	number of rows of sub(A).

       N       (local input)	       const int
	       On entry,  N specifies the local	number of columns of sub(A).

       ICOFF   (global input)	       const int
	       On entry, ICOFF specifies the row and column offset  of	sub(A)
	       in A.

       WORK    (local workspace)       double *
	       On entry, WORK is a work array of size at least 2*(4+2*N0).
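
       A caller might size the workspace as sketched below. The helper
       names are hypothetical, and the sketch assumes the stated size counts
       double precision elements, which the man page does not say
       explicitly.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not an HPL routine): minimum WORK length,
 * 2*(4+2*N0), assumed to be a count of doubles. Returns 0 for N0 < 0. */
size_t pancr_work_len(int N0)
{
    if (N0 < 0)
        return 0;
    return 2 * (4 + 2 * (size_t)N0);
}

/* Hypothetical helper: allocate a workspace of the minimum length.
 * Returns NULL on failure; the caller releases it with free(). */
double *pancr_alloc_work(int N0)
{
    size_t len = pancr_work_len(N0);
    if (len == 0)
        return NULL;
    return malloc(len * sizeof(double));
}
```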

SEE ALSO
       HPL_dlocmax (3),	 HPL_dlocswpN (3),  HPL_dlocswpT (3), HPL_pdmxswp (3),
       HPL_pdpancrN (3), HPL_pdpanllN (3), HPL_pdpanllT	(3), HPL_pdpanrlN (3),
       HPL_pdpanrlT (3).

HPL 2.3			       December	2, 2018		       HPL_pdpancrT(3)
