gres.conf(5)		   Slurm Configuration File		  gres.conf(5)

NAME
       gres.conf  -  Slurm configuration file for Generic RESource (GRES) man-
       agement.

DESCRIPTION
       gres.conf is an ASCII file which	describes the configuration of Generic
       RESource(s) (GRES) on each compute node.	 If the	 GRES  information  in
       the  slurm.conf	file  does  not	fully describe those resources,	then a
       gres.conf file should be	included  on  each  compute  node.  For	 cloud
       nodes,  a  gres.conf  file that includes	all the	cloud nodes must be on
       all cloud nodes and the controller. The file will always	be located  in
       the same	directory as slurm.conf.

       If  the	GRES  information in the slurm.conf file fully describes those
       resources (i.e. no "Cores", "File" or "Links" specification is required
       for that	GRES type or that information is automatically detected), that
       information may be omitted from the gres.conf file and only the config-
       uration information in the slurm.conf file will be used.	 The gres.conf
       file may	be omitted completely if the configuration information in  the
       slurm.conf file fully describes all GRES.

       If  using  the  gres.conf  file	to describe the	resources available to
       nodes, the first	parameter on the line should be	NodeName. If configur-
       ing Generic Resources without specifying	nodes, the first parameter  on
       the line	should be Name.

       Parameter  names	are case insensitive.  Any text	following a "#"	in the
       configuration file is treated as	a comment  through  the	 end  of  that
       line.   Changes	to  the	configuration file take	effect upon restart of
       Slurm daemons, daemon receipt of	the SIGHUP signal, or execution	of the
       command "scontrol reconfigure" unless otherwise noted.

       NOTE: Slurm support for gres/[mps|shard] requires the use of the
       select/cons_tres plugin.	 For more information on how to	configure
       MPS, see https://slurm.schedmd.com/gres.html#MPS_Management.  For more
       information on how to configure Sharding, see
       https://slurm.schedmd.com/gres.html#Sharding.

       For more information on GRES scheduling in general, see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
	      The  hardware  detection mechanisms to enable for	automatic GRES
	      configuration.  Currently, the options are:

	      nrt    Automatically detect AWS Trainium/Inferentia devices.

	      nvml   Automatically detect NVIDIA  GPUs.	 Requires  the	NVIDIA
		     Management	Library	(NVML).

	      off    Do	 not  automatically  detect any	GPUs. Used to override
		     other options.

	      oneapi Automatically  detect  Intel  GPUs.  Requires  the	 Intel
		     Graphics Compute Runtime for oneAPI Level Zero and	OpenCL
		     Driver (oneapi).

	      rsmi   Automatically  detect  AMD	GPUs. Requires the ROCm	System
		     Management	Interface (ROCm	SMI) Library.

	      AutoDetect can be	on a line by itself, in	 which	case  it  will
	      globally	apply  to  all lines in	gres.conf by default. In addi-
	      tion, AutoDetect can be combined with NodeName to	only apply  to
	      certain  nodes.  Node-specific AutoDetects will trump the	global
	      AutoDetect. A node-specific AutoDetect only needs	to  be	speci-
	      fied  once  per  node.  If specified multiple times for the same
	      nodes, they must all be the same value. To unset AutoDetect  for
	      a	 node  when a global AutoDetect	is set,	simply set it to "off"
	      in a  node-specific  GRES	 line.	 E.g.:	NodeName=tux3  AutoDe-
	      tect=off	Name=gpu  File=/dev/nvidia[0-3].  AutoDetect cannot be
	      used with	cloud nodes.

	      AutoDetect will automatically detect files, cores, links, and
	      any other hardware.  If a parameter such as File, Cores, or
	      Links is specified when AutoDetect is used, then the specified
	      values are used to sanity check the auto-detected values.  If
	      there is a mismatch, then the node's state is set to invalid
	      and the node is drained.
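
	      As a minimal sketch (the node name and device paths are
	      hypothetical), a global AutoDetect line can be combined with an
	      explicit File list that Slurm will sanity check against what
	      NVML detects:

		     AutoDetect=nvml
		     NodeName=tux0 Name=gpu File=/dev/nvidia[0-1]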

       Count  Number of	resources of this name/type available  on  this	 node.
	      The  default value is set	to the number of File values specified
	      (if any),	otherwise the default value is one. A suffix  of  "K",
	      "M", "G",	"T" or "P" may be used to multiply the number by 1024,
	      1048576,	  1073741824,	etc.   respectively.	For   example:
	      "Count=10G".

       Cores  Optionally specify the core index	numbers	for the	specific cores
	      which can	use this resource.  For	example, it  may  be  strongly
	      preferable  to  use  specific  cores  with specific GRES devices
	      (e.g. on a NUMA architecture).  While Slurm can track and
	      assign resources at the CPU or thread level, the scheduling
	      algorithms it uses to co-allocate GRES devices with CPUs
	      operate at a socket level (or NUMA level with
	      numa_node_as_socket) for job allocations.  Therefore it is not
	      possible to preferentially assign GRES with different specific
	      CPUs on the same socket (or NUMA node with
	      numa_node_as_socket), and this option should generally be used
	      to identify all cores on some socket.  However, job step
	      allocations that request a portion of the job's resources with
	      --exact, and task binding through --gpu-bind, both look at
	      cores directly, so more specific core identification may be
	      useful for them.

	      Multiple cores may be specified using a comma-delimited list  or
	      a	 range	may be specified using a "-" separator (e.g. "0,1,2,3"
	      or "0-3").  If  a	 job  specifies	 --gres-flags=enforce-binding,
	      then  only  the  identified  cores  can  be  allocated with each
	      generic resource.	This will tend to improve performance of jobs,
	      but delay	the allocation of resources to them.  If specified and
	      a	job is not submitted with the --gres-flags=enforce-binding op-
	      tion the identified cores	will be	preferred for scheduling  with
	      each generic resource.

	      If  --gres-flags=disable-binding is specified, then any core can
	      be used with the resources, which	also increases	the  speed  of
	      Slurm's  scheduling  algorithm  but  can degrade the application
	      performance.  The	--gres-flags=disable-binding  option  is  cur-
	      rently  required to use more CPUs	than are bound to a GRES (e.g.
	      if a GPU is bound	to the CPUs on one socket,  but	 resources  on
	      more  than one socket are	required to run	the job).  If any core
	      can be effectively used with the resources, then do not  specify
	      the  cores  option  for  improved	 speed in the Slurm scheduling
	      logic.  A	restart	of the slurmctld is needed for changes to  the
	      Cores option to take effect.

	      NOTE: Since Slurm	must be	able to	perform	resource management on
	      heterogeneous  clusters having various processing	unit numbering
	      schemes, a logical core index must be specified instead  of  the
	      physical	core  index.  That logical core	index might not	corre-
	      spond to your physical core index	number.	 Core 0	 will  be  the
	      first  core on the first socket, while core 1 will be the	second
	      core on the first	socket.	 This  numbering  coincides  with  the
	      logical  core  number (Core L#) seen in "lstopo -l" command out-
	      put.
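
	      As an illustration (node name, device paths, and core layout
	      all hypothetical), on a two-socket node with logical cores 0-7
	      on socket 0 and cores 8-15 on socket 1, each GPU would
	      typically be tied to every core of its local socket:

		     NodeName=tux0 Name=gpu File=/dev/nvidia0 Cores=0-7
		     NodeName=tux0 Name=gpu File=/dev/nvidia1 Cores=8-15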

       File   Fully qualified pathname of the device files associated  with  a
	      resource.	 The name can include a	numeric	range suffix to	be in-
	      terpreted	by Slurm (e.g. File=/dev/nvidia[0-3]).

	      This  field  is generally	required if enforcement	of generic re-
	      source allocations is to be supported (i.e. prevents users  from
	      making  use  of  resources  allocated to a different user).  En-
	      forcement	of the	file  allocation  relies  upon	Linux  Control
	      Groups  (cgroups)	 and  Slurm's  task/cgroup  plugin, which will
	      place the	allocated files	into the job's cgroup and prevent  use
	      of  other	 files.	 Please	see Slurm's Cgroups Guide for more in-
	      formation: https://slurm.schedmd.com/cgroups.html.

	      If File is specified then	Count must be either set to the	number
	      of file names specified or not set (the  default	value  is  the
	      number of files specified).  The exception to this is
	      MPS/Sharding.  For either of these GRES, each GPU is identified
	      by its device file using the File parameter, and Count
	      specifies the number of entries that correspond to that GPU.
	      For MPS this is typically 100 or some multiple of 100; for
	      Sharding it is typically the maximum number of jobs that can
	      simultaneously share that GPU.

	      If  using	a card with Multi-Instance GPU functionality, use Mul-
	      tipleFiles instead. File and MultipleFiles are  mutually	exclu-
	      sive.

	      NOTE: File is required for all gpu typed GRES.

	      NOTE:  If	 you specify the File parameter	for a resource on some
	      node, the	option must be specified on all	nodes and  Slurm  will
	      track  the  assignment  of  each specific	resource on each node.
	      Otherwise	Slurm will only	track a	count of  allocated  resources
	      rather than the state of each individual device file.

	      NOTE:  Drain  a  node  before changing the count of records with
	      File parameters (e.g. if you want	to add or remove GPUs  from  a
	      node's  configuration).  Failure to do so	will result in any job
	      using those GRES being aborted.

	      NOTE: When specifying File, Count	is limited in size  (currently
	      1024) for	each node.
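
	      For example, a hypothetical sharding layout (device paths and
	      counts are illustrative) in which up to 8 jobs may
	      simultaneously share each of two GPUs:

		     Name=shard Count=8 File=/dev/nvidia0
		     Name=shard Count=8 File=/dev/nvidia1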

       Flags  Optional flags that can be specified to change configured	behav-
	      ior of the GRES.

	      Allowed values at	present	are:

	      CountOnly		  Do not attempt to load a plugin of the
				  GRES type, as this GRES will only be used
				  to track counts of GRES used.  This avoids
				  attempting to load a non-existent plugin,
				  which can affect filesystems with
				  high-latency metadata operations for
				  non-existent files.

	      explicit		  If the flag is set, the GRES is not
				  allocated to the job as part of a
				  whole-node allocation (--exclusive or
				  OverSubscribe=EXCLUSIVE set on the
				  partition) unless it was explicitly
				  requested by the job.

	      one_sharing	  To be used on a shared gres.	If using a
				  shared gres (mps) on top of a sharing gres
				  (gpu), only allow one of the sharing gres
				  to be used by the shared gres.  This is
				  the default for MPS.

				  NOTE:	 If a gres has this flag configured it
				  is global, so	all other nodes	with that gres
				  will have this flag implied.	This  flag  is
				  not  compatible  with	all_sharing for	a spe-
				  cific	gres.

	      all_sharing	  To be	used on	a shared gres. This is the op-
				  posite of one_sharing	and can	be used	to al-
				  low all sharing gres (gpu) on	a node	to  be
				  used for shared gres (mps).

				  NOTE:	 If a gres has this flag configured it
				  is global, so	all other nodes	with that gres
				  will have this flag implied.	This  flag  is
				  not  compatible  with	one_sharing for	a spe-
				  cific	gres.

	      nvidia_gpu_env	  Set  environment  variable  CUDA_VISIBLE_DE-
				  VICES	for all	GPUs on	the specified node(s).

	      amd_gpu_env	  Set  environment  variable  ROCR_VISIBLE_DE-
				  VICES	for all	GPUs on	the specified node(s).

	      intel_gpu_env	  Set  environment  variable  ZE_AFFINITY_MASK
				  for all GPUs on the specified	node(s).

	      opencl_env	  Set  environment variable GPU_DEVICE_ORDINAL
				  for all GPUs on the specified	node(s).

	      no_gpu_env	  Set no GPU-specific  environment  variables.
				  This	is mutually exclusive to all other en-
				  vironment-related flags.

	      If   no	environment-related   flags   are   specified,	  then
	      nvidia_gpu_env,  amd_gpu_env, intel_gpu_env, and opencl_env will
	      be implicitly set	by default.  If	AutoDetect is used  and	 envi-
	      ronment-related  flags  are  not specified, then AutoDetect=nvml
	      will set nvidia_gpu_env, AutoDetect=rsmi will  set  amd_gpu_env,
	      and AutoDetect=oneapi will set intel_gpu_env.  Conversely, spec-
	      ified environment-related	flags will always override AutoDetect.

	      Environment-related flags	set on one GRES	line will be inherited
	      by  the  GRES  line  directly below it if	no environment-related
	      flags are	specified on that line and if it is of the same	 node,
	      name,  and  type.	Environment-related flags must be the same for
	      GRES of the same node, name, and type.

	      Note that	there is a known issue with the	AMD ROCm runtime where
	      ROCR_VISIBLE_DEVICES is processed	 first,	 and  then  CUDA_VISI-
	      BLE_DEVICES  is  processed.  To avoid the	issues caused by this,
	      set Flags=amd_gpu_env for	AMD GPUs so only  ROCR_VISIBLE_DEVICES
	      is set.
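
	      For example (node names and device paths hypothetical), AMD GPU
	      lines might set only ROCR_VISIBLE_DEVICES to sidestep the ROCm
	      runtime issue described above:

		     NodeName=tux[8-11] Name=gpu File=/dev/dri/renderD[128-131] Flags=amd_gpu_env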

       Links  A	comma-delimited	list of	numbers	identifying the	number of con-
	      nections between this device and other devices to	allow cosched-
	      uling  of	 better	connected devices.  This is an ordered list in
	      which the	number of connections this specific device has to  de-
	      vice number 0 would be in	the first position, the	number of con-
	      nections	it has to device number	1 in the second	position, etc.
	      A	-1 indicates the device	itself and a 0	indicates  no  connec-
	      tion.   If  specified,  then this	line can only contain a	single
	      GRES device (i.e.	can only contain a single file via File).

	      This is an optional value	and is	usually	 automatically	deter-
	      mined  if	AutoDetect is enabled.	A typical use case would be to
	      identify GPUs having NVLink connectivity.	 Note that  for	 GPUs,
	      the  minor number	assigned by the	OS and used in the device file
	      (i.e. the	X in /dev/nvidiaX) is not necessarily the same as  the
	      device number/index. The device number is	created	by sorting the
	      GPUs by PCI bus ID and then numbering them starting from the
	      smallest bus ID.	See
	      https://slurm.schedmd.com/gres.html#GPU_Management
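
	      As a sketch (device paths and link counts hypothetical), two
	      GPUs joined by a pair of NVLink connections could be described
	      as follows; -1 marks each device's own position and 2 indicates
	      two connections to the other device, with each line naming a
	      single device file as Links requires:

		     Name=gpu File=/dev/nvidia0 Links=-1,2
		     Name=gpu File=/dev/nvidia1 Links=2,-1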

       MultipleFiles
	      Fully  qualified	pathname of the	device files associated	with a
	      resource.	 Graphics cards	using Multi-Instance GPU  (MIG)	 tech-
	      nology will present multiple device files	that should be managed
	      as a single generic resource.  The file names can be a
	      comma-separated list, or the name can include a numeric range
	      suffix (e.g. MultipleFiles=/dev/nvidia[0-3]).

	      Drain a node before changing the count of	records	with the  Mul-
	      tipleFiles  parameter, such as when adding or removing GPUs from
	      a	node's configuration.  Failure to do so	will result in any job
	      using those GRES being aborted.

	      When not using GPUs with MIG functionality,  use	File  instead.
	      MultipleFiles and	File are mutually exclusive.
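
	      For example (device paths and the Type value hypothetical), a
	      MIG instance whose parent GPU and capability devices must be
	      handed to the job together could be declared as:

		     Name=gpu Type=a100_mig MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap21,/dev/nvidia-caps/nvidia-cap22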

       Name   Name of the generic resource. Any	desired	name may be used.  The
	      name  must  match	 a  value  in  GresTypes  in slurm.conf.  Each
	      generic resource has an optional plugin which  can  provide  re-
	      source-specific functionality.  Generic resources	that currently
	      include an optional plugin are:

	      gpu    Graphics Processing Unit

	      mps    CUDA Multi-Process	Service	(MPS)

	      nic    Network Interface Card

	      shard  Shards of a gpu
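
	      Each Name used in gres.conf must also appear in GresTypes in
	      slurm.conf, e.g. (names illustrative):

		     # slurm.conf would contain: GresTypes=gpu,mps,bandwidth
		     Name=gpu File=/dev/nvidia[0-3]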

       NodeName
	      An  optional  NodeName  specification  can be used to permit one
	      gres.conf	file to	be used	for all	compute	nodes in a cluster  by
	      specifying  the  node(s)	that  each  line should	apply to.  The
	      NodeName specification can use a Slurm hostlist specification as
	      shown in the example below.

       Type   An optional arbitrary string identifying the type	of generic re-
	      source.  For example, this might be used to identify a  specific
	      model  of	 GPU,  which  users can	then specify in	a job request.
	      For changes to the Type option to	take effect  with  a  scontrol
	      reconfig	all  affected slurmd daemons must be responding	to the
	      slurmctld.  Otherwise a restart of the slurmctld and slurmd dae-
	      mons is required.

	      NOTE: If using autodetect	functionality and defining the Type in
	      your gres.conf file, the Type specified should  match  or	 be  a
	      substring	 of the	value that is detected,	using an underscore in
	      lieu of any spaces.
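
	      As an illustration (the detected name and node are
	      hypothetical): if AutoDetect reports a GPU whose detected type
	      string is "tesla_v100-sxm2-16gb", either that full value or a
	      substring of it would be a valid Type:

		     AutoDetect=nvml
		     NodeName=tux0 Name=gpu Type=v100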

EXAMPLES
       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define	GPU devices with MPS support, with AutoDetect sanity checking
       ##################################################################
       AutoDetect=nvml
       Name=gpu	Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu	Type=tesla  File=/dev/nvidia1 COREs=2,3
       Name=mps	Count=100 File=/dev/nvidia0 COREs=0,1
       Name=mps	Count=100  File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu	Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla  File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of	service
       Name=gpu	Type=tesla  File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf	file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on	nodes tux0-tux15
       # NodeName=tux[0-15]  Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2]  Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3  Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15]  Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU	configuration information
       # for all nodes except one
       ##################################################################
       AutoDetect=nvml
       NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Specify some nodes with NVML, some with RSMI, and some	with no	AutoDetect
       ##################################################################
       NodeName=tux[0-7] AutoDetect=nvml
       NodeName=tux[8-11] AutoDetect=rsmi
       NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define	'bandwidth' GRES to use	as a way to limit the
       # resource use on these nodes for workflow purposes
       ##################################################################
       NodeName=tux[0-7] Name=bandwidth	Type=lustre Count=4G Flags=CountOnly

COPYING
       Copyright (C) 2010 The Regents of the University	of  California.	  Pro-
       duced at	Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This  file  is  part  of	Slurm, a resource management program.  For de-
       tails, see <https://slurm.schedmd.com/>.

       Slurm is	free software; you can redistribute it and/or modify it	 under
       the  terms  of  the GNU General Public License as published by the Free
       Software	Foundation; either version 2 of	the License, or	(at  your  op-
       tion) any later version.

       Slurm  is  distributed  in the hope that	it will	be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
       FITNESS	FOR  A	PARTICULAR PURPOSE. See	the GNU	General	Public License
       for more	details.

SEE ALSO
       slurm.conf(5)

August 2023		   Slurm Configuration File		  gres.conf(5)
