GMX-NONBONDED-BENCHMARK(1)	    GROMACS	    GMX-NONBONDED-BENCHMARK(1)

NAME
       gmx-nonbonded-benchmark - Benchmarking tool for the non-bonded pair
       kernels.

SYNOPSIS
	  gmx nonbonded-benchmark [-o [<.csv>]]	[-size <int>] [-nt <int>]
		       [-simd <enum>] [-coulomb	<enum>]	[-[no]table]
		       [-combrule <enum>] [-[no]halflj]	[-[no]energy]
		       [-[no]all] [-cutoff <real>] [-iter <int>]
		       [-warmup	<int>] [-[no]cycles] [-[no]time]

DESCRIPTION
       gmx nonbonded-benchmark runs benchmarks for one or more so-called Nbnxm
       non-bonded pair kernels. The non-bonded pair kernels are the most
       compute-intensive part of MD simulations and usually comprise 60 to 90
       percent of the runtime. For this reason they are highly optimized and
       several different setups are available to compute the same physical
       interactions. In addition, there are different physical treatments of
       Coulomb interactions and optimizations for atoms without Lennard-Jones
       interactions. There are also different physical treatments of
       Lennard-Jones interactions, but only a plain cut-off is supported in
       this tool, as that is by far the most common treatment. Finally, while
       force output is always necessary, energy output is only required at
       certain steps. In total there are 12 relevant combinations of options.
       The combinations double to 24 when two different SIMD setups are
       supported. These combinations can be run with a single invocation using
       the -all option. The performance of each kernel is affected by caching
       behavior, which is determined by the hardware used together with the
       system size and the cut-off radius. The larger the number of atoms per
       thread, the more L1 cache is needed to avoid L1 cache misses. The
       cut-off radius mainly affects data reuse: a larger cut-off results in
       more data reuse and makes the kernel less sensitive to cache misses.
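
       For illustration only (the system size and output file name below are
       arbitrary examples, not recommendations), a run covering all option
       combinations on a four times larger system, with results also written
       to a csv file, could look like:

	  gmx nonbonded-benchmark -all -size 4 -o results.csv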

       OpenMP parallelization is used to utilize multiple hardware threads
       within a compute node. In these benchmarks the threads do not interact
       directly, apart from starting and closing a single OpenMP parallel
       region per iteration; they do, however, interact indirectly by sharing
       and evicting data in shared caches. The number of threads to use is
       set with the -nt option. Thread affinity is important, especially with
       SMT and shared caches. Affinities can be set through the OpenMP
       library using the GOMP_CPU_AFFINITY environment variable.
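
       As an example, with the GNU OpenMP runtime the benchmark threads could
       be pinned to the first eight hardware threads as follows (the thread
       numbering is machine dependent and chosen here only for illustration):

	  GOMP_CPU_AFFINITY="0-7" gmx nonbonded-benchmark -nt 8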

       The benchmark tool times one or more kernels by running them repeatedly
       for a number of iterations set by the -iter option. An initial kernel
       call is done to avoid additional initial cache misses. Times are
       recorded in cycles read from efficient, high-accuracy counters in the
       CPU. Note that these often do not correspond to actual clock cycles.
       For each kernel, the tool reports the total number of cycles, cycles
       per iteration, and (total and useful) pair interactions per cycle.
       Because a cluster pair list is used instead of an atom pair list,
       interactions are also computed for some atom pairs that are beyond the
       cut-off distance. These pairs are not useful (except for additional
       buffering, but that is not of interest here), only a side effect of
       the cluster-pair setup. The SIMD 2xMM kernel has a higher useful pair
       ratio than the 4xM kernel due to a smaller cluster size, but a lower
       total pair throughput. It is best to run this, or any, benchmark with
       locked CPU clocks, as thermal throttling can significantly affect
       performance. If that is not an option, the -warmup option can be used
       to run initial, untimed iterations to warm up the processor.
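
       As an illustration (the iteration counts are arbitrary), a longer run
       with untimed warmup iterations and cycles-per-pair reporting could be
       started with:

	  gmx nonbonded-benchmark -iter 1000 -warmup 100 -cycles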

       The most relevant regime is between 0.1 and 1 millisecond per
       iteration. Thus it is useful to run with system sizes that cover both
       ends of this regime.
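
       For example, two runs near the small and large end of this range could
       use -size 1 and -size 30 (3000 and 90000 atoms); which sizes actually
       fall into this regime depends on the hardware, and these values are
       given only as an illustration:

	  gmx nonbonded-benchmark -size 1
	  gmx nonbonded-benchmark -size 30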

       The -simd and -table options select different implementations to
       compute the same physics. The choice of these options should ideally
       be optimized for the target hardware. Historically, we only found
       tabulated Ewald correction to be useful on 2-wide SIMD or 4-wide SIMD
       without FMA support. As all modern architectures are wider and support
       FMA, we do not use tables by default. The only exceptions are kernels
       without SIMD, which only support tables. Options -coulomb, -combrule
       and -halflj depend on the force field and composition of the simulated
       system. The optimization of computing Lennard-Jones interactions for
       only half of the atoms in a cluster is useful for water, as most water
       models do not use Lennard-Jones interactions on hydrogen atoms. In the
       MD engine, any cluster where at most half of the atoms have LJ
       interactions will automatically use this kernel. Finally, the -energy
       option selects the computation of energies, which are usually only
       needed infrequently.
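
       For example, an arbitrary combination of these physics options, chosen
       here purely for illustration, could be benchmarked with:

	  gmx nonbonded-benchmark -coulomb reaction-field -combrule lb -halflj -energy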

OPTIONS
       Options to specify output files:

       -o [<.csv>] (nonbonded-benchmark.csv) (Optional)
	      Also output results in csv format

       Other options:

       -size <int> (1)
	      The system size is 3000 atoms times this value

       -nt <int> (1)
	      The number of OpenMP threads to use

       -simd <enum> (auto)
	      SIMD type, auto runs all supported SIMD setups or	no  SIMD  when
	      SIMD is not supported: auto, no, 4xm, 2xmm

       -coulomb	<enum> (ewald)
	      The functional form for the Coulomb interactions: ewald,
	      reaction-field

       -[no]table (no)
	      Use lookup table for Ewald correction instead of analytical

       -combrule <enum>	(geometric)
	      The LJ combination rule: geometric, lb, none

       -[no]halflj (no)
	      Use optimization for LJ on half of the atoms

       -[no]energy (no)
	      Compute energies in addition to forces

       -[no]all	(no)
	      Run all 12 combinations of options for coulomb, halflj, combrule

       -cutoff <real> (1)
	      Pair-list	and interaction	cut-off	distance

       -iter <int> (100)
	      The number of iterations for each	kernel

       -warmup <int> (0)
	      The number of iterations for initial warmup

       -[no]cycles (no)
	      Report cycles/pair instead of pairs/cycle

       -[no]time (no)
	      Report microseconds instead of cycles
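
       Several of these options can be combined in a single invocation; as a
       purely illustrative example (all values arbitrary):

	  gmx nonbonded-benchmark -simd 2xmm -table -nt 4 -iter 500 -time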

SEE ALSO
       gmx(1)

       More information about GROMACS is available at
       <http://www.gromacs.org/>.

COPYRIGHT
       2025, GROMACS development team

2025.0				 Feb 10, 2025	    GMX-NONBONDED-BENCHMARK(1)
