Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
acct_gather.conf(5)	   Slurm Configuration File	   acct_gather.conf(5)

NAME
       acct_gather.conf	- Slurm	configuration file for the acct_gather plugins

DESCRIPTION
       acct_gather.conf	is a UTF8 formatted file which defines parameters used
       by  Slurm's  acct_gather	 related plugins.  The file will always	be lo-
       cated in	the same directory as the slurm.conf.

       Parameter names are case	insensitive but	parameter values are case sen-
       sitive.	Any text following a "#" in the	configuration file is  treated
       as  a  comment  through the end of that line.  The size of each line in
       the file	is limited to 1024 characters.

       Changes to the configuration file take effect upon restart of the Slurm
       daemons.

       The following acct_gather.conf parameters are defined  to  control  the
       general behavior	of various plugins in Slurm.

       The  acct_gather.conf  file  is different than other Slurm .conf	files.
       Each plugin defines which options are  available.  Each	plugin	to  be
       loaded must be specified	in the slurm.conf under	the following configu-
       ration entries:

        AcctGatherEnergyType (plugin type=acct_gather_energy)
        AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
        AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
        AcctGatherProfileType (plugin type=acct_gather_profile)

       If  the	respective plugin for an option	is not loaded then that	option
       will be unknown to Slurm, causing the daemon to	fatal  on  initializa-
       tion.   If  you	decide to change plugin	types in slurm.conf, also make
       sure to change the related options in acct_gather.conf.

acct_gather_energy/gpu
       Required	entry in slurm.conf:
	      AcctGatherEnergyType=acct_gather_energy/gpu
       This plugin doesn't read	any options from acct_gather.conf.
       Dataset provided	by the plugin is: Energy.

acct_gather_energy/IPMI
       Required	entry in slurm.conf:
	      AcctGatherEnergyType=acct_gather_energy/ipmi

       Options used for	acct_gather_energy/ipmi	are as follows:

	      EnergyIPMIFrequency=<number>
			This parameter is the number of	 seconds  between  BMC
			access	samples.  Ideally it should be higher or equal
			to JobAcctGatherFrequency, otherwise the JobAcctGather
			plugin will get	repeated values	in successive polls.

	      EnergyIPMICalcAdjustment=<yes|no>
			If set to "yes", the consumption between the last  BMC
			access sample and a step consumption update is approx-
			imated to get more accurate task consumption.  The ad-
			justment  is  made at the step start and each time the
			consumption is updated,	including the  step  end.  The
			approximations are not accumulated, only the first and
			last  adjustments  are used to calculated the consump-
			tion. The default is "no".

	      EnergyIPMIPowerSensors=<key=values>
			Optionally specify the ids of  the  sensors  to	 used.
			Multiple  <key=values> can be set with ";" separators.
			The key	"Node" is mandatory and	is used	 to  know  the
			consumed  energy  for  nodes  (scontrol	show node) and
			jobs (sacct).  Other keys are optional and  are	 named
			by  administrator.   These  keys  are useful only when
			profile	is activated for energy	 to  store  power  (in
			watt)  of each key.  <values> are integers except when
			using DCMI. Multiple values can	be set with ","	 sepa-
			rators.	The sum	of the listed sensors is used for each
			key.	EnergyIPMIPowerSensors	is  optional,  default
			value is "Node=<value>"	where "<value>"	is the	id  of
			the first power	sensor returned	by ipmi-sensors.
			i.e.
			EnergyIPMIPowerSen-
			sors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
			EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
			EnergyIPMIPowerSensors=Node=1280

			Data  Center Manageability Interface - acct_gather_en-
			ergy/ipmi supports gathering power data	 through  DCMI
			IPMI  extension	 commands.  When  configured, the ipmi
			plugin will query the DCMI  using  the	"System	 Power
			mode"  or  the "Enhanced System	Power Statistics mode"
			flags depending	on the configuration.

			To configure one or the	other, the special sensor val-
			ues DCMI or DCMI_ENHANCED can be used, for example:
			EnergyIPMIPowerSensors=Node=DCMI
			EnergyIPMIPowerSensors=Node=DCMI_ENHANCED

	      The following acct_gather.conf parameters	are defined to control
	      the IPMI config default values for libipmiconsole.

	      EnergyIPMIUsername=USERNAME
			Specify	BMC Username.

	      EnergyIPMIPassword=PASSWORD
			Specify	BMC Password.
       Datasets	provided by the	plugin have name: <IPMI_SENSOR_LABEL>Power.

   NOTES:
       This plugin requires the	freeipmi development files to be installed and
       linkable	at configure time. The plugin will not build  otherwise.  When
       building	 the RPM, rpmbuild ... --with freeipmi can be specified	to ex-
       plicitly	check for these	dependencies.

acct_gather_energy/rapl
       Required	entry in slurm.conf:
	      AcctGatherEnergyType=acct_gather_energy/rapl
       This plugin doesn't read	any options from acct_gather.conf.
       Dataset provided	by the plugin is: Power.

acct_gather_energy/XCC
       Required	entry in slurm.conf:
	      AcctGatherEnergyType=acct_gather_energy/xcc

       Options used for	acct_gather_energy/xcc include only in-band communica-
       tions with XClarity Controller, thus a reduced set of configurations is
       supported:

	      EnergyIPMIFrequency=<number>
			This parameter is the number of	 seconds  between  XCC
			access samples.	 Default is 30 seconds.

	      EnergyIPMITimeout=<number>
			Timeout,  in  seconds,	for  initializing the IPMI XCC
			context	for a new gathering thread. Default is 10 sec-
			onds.
       Datasets	provided by the	plugin are: Energy, CurrPower.

acct_gather_filesystem/lustre
       Required	entry in slurm.conf:
	      AcctGatherFilesystemType=acct_gather_filesystem/lustre
       This plugin doesn't read	any options from acct_gather.conf.
       Datasets	provided by the	plugin are: Reads, ReadMB, Writes, WriteMB.

acct_gather_profile/HDF5
       Required	entry in slurm.conf:
	      AcctGatherProfileType=acct_gather_profile/hdf5

       Options used for	acct_gather_profile/hdf5 are as	follows:

	      ProfileHDF5Dir=<path>
		     This parameter is the path	 to  the  shared  folder  into
		     which  the	acct_gather_profile plugin will	write detailed
		     data (usually as an HDF5 file).  The directory is assumed
		     to	be on a	file system shared by the controller  and  all
		     compute nodes. This is a required parameter.

	      ProfileHDF5Default
		     A	comma-delimited	list of	data types to be collected for
		     each job submission.  Allowed values are:

		     All     All data types are	collected. (Cannot be combined
			     with other	values.)

		     None    No	data types are collected. This is the default.
			     (Cannot be	combined with other values.)

		     Energy  Energy data is collected.

		     Filesystem
			     File system (Lustre) data is collected.

		     Network Network (InfiniBand) data is collected.

		     Task    Task (I/O,	Memory,	...) data is collected.

acct_gather_profile/InfluxDB
       Required	entry in slurm.conf:
	      AcctGatherProfileType=acct_gather_profile/influxdb

       The InfluxDB plugin provides the	same information as  the  HDF5	plugin
       but will	instead	send information to the	configured InfluxDB server.

       The  InfluxDB  plugin is	designed against 1.x protocol of InfluxDB. Any
       site running a v2.x InfluxDB server will	need to	configure a v1.x  com-
       patibility endpoint along with the correct user and password authoriza-
       tion. Token authentication is not currently supported.

   Options:
       ProfileInfluxDBDatabase
	      InfluxDB v1.x database name where	profiling information is to be
	      written.	 InfluxDB v2.x bucket name where profiling information
	      is to be written.

       ProfileInfluxDBDefault
	      A	comma-delimited	list of	data types to be  collected  for  each
	      job submission.  Allowed values are:

	      All	All  data types	are collected. Cannot be combined with
			other values.

	      None	No data	types are  collected.  This  is	 the  default.
			Cannot be combined with	other values.

	      Energy	Energy data is collected.

	      Filesystem
			File system (Lustre) data is collected.

	      Network	Network	(InfiniBand) data is collected.

	      Task	Task (I/O, Memory, ...)	data is	collected.

       ProfileInfluxDBExtraTags=<tags>
	      A	comma-delimited	list of	additional tags	to attach to every In-
	      fluxDB	      measurement.	   Requires	    ProfileIn-
	      fluxDBFlags=grouped_fields.  Valid tags are:

	      cluster	The Slurm cluster name.

	      sluid	The SLUID of the job.

	      uid	The user ID of the user	who ran	the job.

	      user	The user name of the user who ran the job.

       ProfileInfluxDBFlags=<flags>
	      A	comma-delimited	list of	flags that tune	 the  InfluxDB	plugin
	      behavior.	 Valid flags are:

	      grouped_fields
			Emit  all  fields from a single	sample as one InfluxDB
			measurement line, with the dataset name	(e.g., "task",
			"network", "filesystem", "energy") as the measurement,
			rather than one	measurement per	 field.	 This  reduces
			InfluxDB  series  cardinality  and  removes  ambiguity
			caused by identical field names	coming from  different
			plugins.

       ProfileInfluxDBFrequency=<seconds>
	      How often	in seconds data	should be sent to the InfluxDB server.
	      Note  that  in practice, profile data will be sent no more often
	      than every JobAcctGatherFrequency	seconds. A value of 0 disables
	      buffering	in the plugin so that data is  sent  to	 the  InfluxDB
	      server  as  soon	it is collected. In all	cases, data is sent to
	      InfluxDB whenever	the plugin's internal buffer is	 full  and  at
	      the end of each job step.	Default	is 30 seconds.

       ProfileInfluxDBHost=<hostname>:<port>
	      The  hostname of the machine where the InfluxDB instance is exe-
	      cuted and	the port used by the HTTP API. The port	 used  by  the
	      HTTP  API	 is  the  one  configured through the bind-address in-
	      fluxdb.conf option in the	[http] section.	 Example:
	      ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
	      Password for username  configured	 in  ProfileInfluxDBUser.  Re-
	      quired in	v2.x and optional in v1.x InfluxDB.

       ProfileInfluxDBRTPolicy
	      The InfluxDB v1.x	retention policy name for the database config-
	      ured  in	ProfileInfluxDBDatabase	 option.  If  unset,  InfluxDB
	      writes to	the default retention policy.

	      For InfluxDB v2.x, this value is passed on the  v1-compatibility
	      /write  endpoint and, together with ProfileInfluxDBDatabase, se-
	      lects a DBRP mapping that	resolves to a v2 bucket.  Recent  v2.x
	      releases auto-create a default DBRP mapping per bucket, so leav-
	      ing this option unset is usually sufficient.

       ProfileInfluxDBUser
	      InfluxDB	username  that	should	be  used to gain access	to the
	      database configured in ProfileInfluxDBDatabase. Required in v2.x
	      and optional in v1.x InfluxDB.  This is only needed if  InfluxDB
	      v1.x  is	configured  with  authentication enabled in the	[http]
	      config section and a user	has been granted at least WRITE	access
	      to the database. See also	ProfileInfluxDBPass.

       ProfileInfluxDBTimeout=<seconds>
	      The maximum time in seconds that an HTTP query to	 the  InfluxDB
	      server  can  take.  After	this timeout the data is discarded. Be
	      aware that a long	timeout	can drain your nodes if	 the  InfluxDB
	      server  is  unresponsive and, when terminating the job, the last
	      dataset takes more than UnkillableStepTimeout to be sent.	Inter-
	      nally, that option sets CURLOPT_TIMEOUT library option.  Default
	      is 10 seconds.

   NOTES:
       This  plugin requires the libcurl development files to be installed and
       linkable	at configure time. The plugin will not build otherwise.

       Information on how to install and configure InfluxDB and	 manage	 data-
       bases,  retention  policies  and	such is	available on the official web-
       page.

       Collected information is	written	from every compute node	 where	a  job
       runs  to	the InfluxDB instance listening	on the ProfileInfluxDBHost. In
       order to	avoid overloading the InfluxDB instance	with incoming  connec-
       tion  requests, the plugin uses an internal buffer which	is filled with
       samples.	Once the buffer	is full, a HTTP	API write request is performed
       and the buffer is emptied to hold subsequent samples. A	final  request
       is also performed when a	task ends even if the buffer isn't full.

       Failed  HTTP API	write requests are silently discarded. This means that
       collected profile information in	the plugin buffer is lost if it	 can't
       be written to the InfluxDB database for any reason.

       Plugin messages are logged along	with the slurmstepd logs to SlurmdLog-
       File.  In  order	 to troubleshoot any issues, it	is recommended to tem-
       porarily	increase the slurmd debug level	to debug3 and add  Profile  to
       the  debug  flags.  This	 can be	accomplished by	setting	the slurm.conf
       SlurmdDebug and DebugFlags respectively or dynamically through scontrol
       setdebug	and setdebugflags.

       Grafana can be used to create charts based on  the  data	 held  by  In-
       fluxDB.	This kind of tool permits one to create	dashboards, tables and
       other graphics using the	stored time series.

acct_gather_interconnect/OFED
       Required	entry in slurm.conf:
	      AcctGatherInterconnectType=acct_gather_interconnect/ofed

       Options used for	acct_gather_interconnect/ofed are as follows:

	      InfinibandOFEDPort=<number>
			This parameter represents the port number of the local
			Infiniband  card  that we are willing to monitor.  The
			default	port is	1.
       Datasets	provided by the	plugin:	PacketsIn, PacketsOut, InMB, OutMB

acct_gather_interconnect/sysfs
       Required	entry in slurm.conf:
	      AcctGatherInterconnectType=acct_gather_interconnect/sysfs

       Options used for	acct_gather_interconnect/sysfs are as follows:

	      SysfsInterfaces=<interfaces>
			Comma-separated	list of	 interface  names  to  collect
			statistics from. Usage from all	listed interfaces will
			be  summed  together, and is not broken	down individu-
			ally.
       Datasets	provided by the	plugin:	PacketsIn, PacketsOut, InMB, OutMB

EXAMPLE
       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for	acct_gather_energy/impi	plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for	acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for	acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING
       Copyright (C) 2012-2013 Bull.  Copyright	 (C)  2012-2022	 SchedMD  LLC.
       Produced	at Bull	(cf, DISCLAIMER).

       This  file  is  part  of	Slurm, a resource management program.  For de-
       tails, see <https://slurm.schedmd.com/>.

       Slurm is	free software; you can redistribute it and/or modify it	 under
       the  terms  of  the GNU General Public License as published by the Free
       Software	Foundation; either version 2 of	the License, or	(at  your  op-
       tion) any later version.

       Slurm  is  distributed  in the hope that	it will	be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
       FITNESS	FOR  A	PARTICULAR PURPOSE. See	the GNU	General	Public License
       for more	details.

SEE ALSO
       slurm.conf(5)

Slurm 26.05		   Slurm Configuration File	   acct_gather.conf(5)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=acct_gather.conf&sektion=5&manpath=FreeBSD+Ports+15.1.quarterly>

home | help