Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
acct_gather.conf(5)	   Slurm Configuration File	   acct_gather.conf(5)

NAME
       acct_gather.conf	- Slurm	configuration file for the acct_gather plugins

DESCRIPTION
       acct_gather.conf	is a UTF8 formatted file which defines parameters used
       by  Slurm's  acct_gather	 related plugins.  The file will always	be lo-
       cated in	the same directory as the slurm.conf.

       Parameter names are case	insensitive but	parameter values are case sen-
       sitive.	Any text following a "#" in the	configuration file is  treated
       as  a  comment  through the end of that line.  The size of each line in
       the file	is limited to 1024 characters.

       Changes to the configuration file take effect upon restart of the Slurm
       daemons.

       The following acct_gather.conf parameters are defined  to  control  the
       general behavior	of various plugins in Slurm.

       The  acct_gather.conf  file  is different than other Slurm .conf	files.
       Each plugin defines which options are  available.  Each	plugin	to  be
       loaded must be specified	in the slurm.conf under	the following configu-
       ration entries:

        AcctGatherEnergyType (plugin type=acct_gather_energy)
        AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
        AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
        AcctGatherProfileType (plugin type=acct_gather_profile)

       If  the	respective plugin for an option	is not loaded then that	option
       will be unknown to Slurm, causing the daemon to	fatal  on  initializa-
       tion.   If  you	decide to change plugin	types in slurm.conf, also make
       sure to change the related options in acct_gather.conf.

acct_gather_energy/gpu
       Required	entry in slurm.conf:
	      AcctGatherEnergyType=acct_gather_energy/gpu
       This plugin doesn't read	any options from acct_gather.conf.
       Dataset provided	by the plugin is: Energy.

acct_gather_energy/IPMI
       Required	entry in slurm.conf:
	      AcctGatherEnergyType=acct_gather_energy/ipmi

       Options used for	acct_gather_energy/ipmi	are as follows:

	      EnergyIPMIFrequency=<number>
			This parameter is the number of	 seconds  between  BMC
			access samples.

	      EnergyIPMICalcAdjustment=<yes|no>
			If  set	to "yes", the consumption between the last BMC
			access sample and a step consumption update is approx-
			imated to get more accurate task consumption.  The ad-
			justment is made at the	step start and each  time  the
			consumption  is	 updated,  including the step end. The
			approximations are not accumulated, only the first and
			last adjustments are used to calculated	 the  consump-
			tion. The default is "no".

	      EnergyIPMIPowerSensors=<key=values>
			Optionally  specify  the  ids  of the sensors to used.
			Multiple <key=values> can be set with ";"  separators.
			The  key  "Node"  is mandatory and is used to know the
			consumed energy	for nodes  (scontrol  show  node)  and
			jobs  (sacct).	 Other keys are	optional and are named
			by administrator.  These keys  are  useful  only  when
			profile	 is  activated	for  energy to store power (in
			watt) of each key.  <values> are integers except  when
			using  DCMI. Multiple values can be set	with "," sepa-
			rators.	The sum	of the listed sensors is used for each
			key.   EnergyIPMIPowerSensors  is  optional,   default
			value  is  "Node=<value>" where	"<value>" is the id of
			the first power	sensor returned	by ipmi-sensors.
			i.e.
			EnergyIPMIPowerSen-
			sors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
			EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
			EnergyIPMIPowerSensors=Node=1280

			Data Center Manageability Interface -  acct_gather_en-
			ergy/ipmi  supports  gathering power data through DCMI
			IPMI extension commands.  When	configured,  the  ipmi
			plugin	will  query  the  DCMI using the "System Power
			mode" or the "Enhanced System Power  Statistics	 mode"
			flags depending	on the configuration.

			To configure one or the	other, the special sensor val-
			ues DCMI or DCMI_ENHANCED can be used, for example:
			EnergyIPMIPowerSensors=Node=DCMI
			EnergyIPMIPowerSensors=Node=DCMI_ENHANCED

	      The following acct_gather.conf parameters	are defined to control
	      the IPMI config default values for libipmiconsole.

	      EnergyIPMIUsername=USERNAME
			Specify	BMC Username.

	      EnergyIPMIPassword=PASSWORD
			Specify	BMC Password.
       Datasets	provided by the	plugin have name: <IPMI_SENSOR_LABEL>Power.

   NOTES:
       This plugin requires the	freeipmi development files to be installed and
       linkable	 at  configure time. The plugin	will not build otherwise. When
       building	the RPM, rpmbuild ... --with freeipmi can be specified to  ex-
       plicitly	check for these	dependencies.

acct_gather_energy/rapl
       Required	entry in slurm.conf:
	      AcctGatherEnergyType=acct_gather_energy/rapl
       This plugin doesn't read	any options from acct_gather.conf.
       Dataset provided	by the plugin is: Power.

acct_gather_energy/XCC
       Required	entry in slurm.conf:
	      AcctGatherEnergyType=acct_gather_energy/xcc

       Options used for	acct_gather_energy/xcc include only in-band communica-
       tions with XClarity Controller, thus a reduced set of configurations is
       supported:

	      EnergyIPMIFrequency=<number>
			This  parameter	 is  the number	of seconds between XCC
			access samples.	 Default is 30 seconds.

	      EnergyIPMITimeout=<number>
			Timeout, in seconds, for  initializing	the  IPMI  XCC
			context	for a new gathering thread. Default is 10 sec-
			onds.
       Datasets	provided by the	plugin are: Energy, CurrPower.

acct_gather_filesystem/lustre
       Required	entry in slurm.conf:
	      AcctGatherFilesystemType=acct_gather_filesystem/lustre
       This plugin doesn't read	any options from acct_gather.conf.
       Datasets	provided by the	plugin are: Reads, ReadMB, Writes, WriteMB.

acct_gather_profile/HDF5
       Required	entry in slurm.conf:
	      AcctGatherProfileType=acct_gather_profile/hdf5

       Options used for	acct_gather_profile/hdf5 are as	follows:

	      ProfileHDF5Dir=<path>
		     This  parameter  is  the  path  to	the shared folder into
		     which the acct_gather_profile plugin will write  detailed
		     data (usually as an HDF5 file).  The directory is assumed
		     to	 be  on	a file system shared by	the controller and all
		     compute nodes. This is a required parameter.

	      ProfileHDF5Default
		     A comma-delimited list of data types to be	collected  for
		     each job submission.  Allowed values are:

		     All     All data types are	collected. (Cannot be combined
			     with other	values.)

		     None    No	data types are collected. This is the default.
			     (Cannot be	combined with other values.)

		     Energy  Energy data is collected.

		     Filesystem
			     File system (Lustre) data is collected.

		     Network Network (InfiniBand) data is collected.

		     Task    Task (I/O,	Memory,	...) data is collected.

acct_gather_profile/InfluxDB
       Required	entry in slurm.conf:
	      AcctGatherProfileType=acct_gather_profile/influxdb

       The  InfluxDB  plugin  provides the same	information as the HDF5	plugin
       but will	instead	send information to the	configured InfluxDB server.

       The InfluxDB plugin is designed against 1.x protocol of	InfluxDB.  Any
       site  running a v2.x InfluxDB server will need to configure a v1.x com-
       patibility endpoint along with the correct user and password authoriza-
       tion. Token authentication is not currently supported.

   Options:
       ProfileInfluxDBDatabase
	      InfluxDB v1.x database name where	profiling information is to be
	      written.	InfluxDB v2.x bucket name where	profiling  information
	      is to be written.

       ProfileInfluxDBDefault
	      A	 comma-delimited  list	of data	types to be collected for each
	      job submission.  Allowed values are:

	      All	All data types are collected. Cannot be	combined  with
			other values.

	      None	No  data  types	 are  collected.  This is the default.
			Cannot be combined with	other values.

	      Energy	Energy data is collected.

	      Filesystem
			File system (Lustre) data is collected.

	      Network	Network	(InfiniBand) data is collected.

	      Task	Task (I/O, Memory, ...)	data is	collected.

       ProfileInfluxDBHost=<hostname>:<port>
	      The hostname of the machine where	the InfluxDB instance is  exe-
	      cuted  and  the  port used by the	HTTP API. The port used	by the
	      HTTP API is the one  configured  through	the  bind-address  in-
	      fluxdb.conf option in the	[http] section.	 Example:
	      ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
	      Password	for  username  configured  in ProfileInfluxDBUser. Re-
	      quired in	v2.x and optional in v1.x InfluxDB.

       ProfileInfluxDBRTPolicy
	      The InfluxDB v1.x	retention policy name for the database config-
	      ured in ProfileInfluxDBDatabase option. The InfluxDB v2.x	reten-
	      tion policy bucket name for the database configured in  Profile-
	      InfluxDBDatabase option.

       ProfileInfluxDBUser
	      InfluxDB	username  that	should	be  used to gain access	to the
	      database configured in ProfileInfluxDBDatabase. Required in v2.x
	      and optional in v1.x InfluxDB.  This is only needed if  InfluxDB
	      v1.x  is	configured  with  authentication enabled in the	[http]
	      config section and a user	has been granted at least WRITE	access
	      to the database. See also	ProfileInfluxDBPass.

       ProfileInfluxDBTimeout=<seconds>
	      The maximum time in seconds that an HTTP query to	 the  InfluxDB
	      server  can  take.  After	this timeout the data is discarded. Be
	      aware that a long	timeout	can drain your nodes if	 the  InfluxDB
	      server  is  unresponsive and, when terminating the job, the last
	      dataset takes more than UnkillableStepTimeout to be sent.	Inter-
	      nally, that option sets CURLOPT_TIMEOUT library option.  Default
	      is 10 seconds.

   NOTES:
       This  plugin requires the libcurl development files to be installed and
       linkable	at configure time. The plugin will not build otherwise.

       Information on how to install and configure InfluxDB and	 manage	 data-
       bases,  retention  policies  and	such is	available on the official web-
       page.

       Collected information is	written	from every compute node	 where	a  job
       runs  to	the InfluxDB instance listening	on the ProfileInfluxDBHost. In
       order to	avoid overloading the InfluxDB instance	with incoming  connec-
       tion  requests, the plugin uses an internal buffer which	is filled with
       samples.	Once the buffer	is full, a HTTP	API write request is performed
       and the buffer is emptied to hold subsequent samples. A	final  request
       is also performed when a	task ends even if the buffer isn't full.

       Failed  HTTP API	write requests are silently discarded. This means that
       collected profile information in	the plugin buffer is lost if it	 can't
       be written to the InfluxDB database for any reason.

       Plugin messages are logged along	with the slurmstepd logs to SlurmdLog-
       File.  In  order	 to troubleshoot any issues, it	is recommended to tem-
       porarily	increase the slurmd debug level	to debug3 and add  Profile  to
       the  debug  flags.  This	 can be	accomplished by	setting	the slurm.conf
       SlurmdDebug and DebugFlags respectively or dynamically through scontrol
       setdebug	and setdebugflags.

       Grafana can be used to create charts based on  the  data	 held  by  In-
       fluxDB.	This kind of tool permits one to create	dashboards, tables and
       other graphics using the	stored time series.

acct_gather_interconnect/OFED
       Required	entry in slurm.conf:
	      AcctGatherInterconnectType=acct_gather_interconnect/ofed

       Options used for	acct_gather_interconnect/ofed are as follows:

	      InfinibandOFEDPort=<number>
			This parameter represents the port number of the local
			Infiniband  card  that we are willing to monitor.  The
			default	port is	1.
       Datasets	provided by the	plugin:	PacketsIn, PacketsOut, InMB, OutMB

acct_gather_interconnect/sysfs
       Required	entry in slurm.conf:
	      AcctGatherInterconnectType=acct_gather_interconnect/sysfs

       Options used for	acct_gather_interconnect/sysfs are as follows:

	      SysfsInterfaces=<interfaces>
			Comma-separated	list of	 interface  names  to  collect
			statistics from. Usage from all	listed interfaces will
			be  summed  together, and is not broken	down individu-
			ally.
       Datasets	provided by the	plugin:	PacketsIn, PacketsOut, InMB, OutMB

EXAMPLE
       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for	acct_gather_energy/impi	plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for	acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for	acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING
       Copyright (C) 2012-2013 Bull.  Copyright	 (C)  2012-2022	 SchedMD  LLC.
       Produced	at Bull	(cf, DISCLAIMER).

       This  file  is  part  of	Slurm, a resource management program.  For de-
       tails, see <https://slurm.schedmd.com/>.

       Slurm is	free software; you can redistribute it and/or modify it	 under
       the  terms  of  the GNU General Public License as published by the Free
       Software	Foundation; either version 2 of	the License, or	(at  your  op-
       tion) any later version.

       Slurm  is  distributed  in the hope that	it will	be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
       FITNESS	FOR  A	PARTICULAR PURPOSE. See	the GNU	General	Public License
       for more	details.

SEE ALSO
       slurm.conf(5)

December 2023		   Slurm Configuration File	   acct_gather.conf(5)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=acct_gather.conf&sektion=5&manpath=FreeBSD+Ports+14.3.quarterly>

home | help