sdiag(1)			Slurm Commands			      sdiag(1)

NAME
       sdiag - Scheduling diagnostic tool for Slurm

SYNOPSIS
       sdiag

DESCRIPTION
       sdiag reports statistics about slurmctld execution: threads, agents,
       jobs, and scheduling algorithms. The data is intended to help adjust
       configuration parameters or queue policies, chiefly by showing how
       Slurm behaves on high-throughput systems.

       It has two execution modes. The default mode, --all, shows the counters
       and statistics explained below; the other mode, --reset, resets those
       values.

       By default, values are reset at midnight UTC.

       The  first  block  of information is related to global slurmctld	execu-
       tion:

       Server thread count
              The number of currently active slurmctld threads. A high number
              indicates a heavy load of events such as job submissions, job
              dispatches, and job completions. If this value is often close
              to MAX_SERVER_THREADS, it could point to a potential bottleneck.

       Agent queue size
              Slurm is designed with scalability in mind, and sending messages
              to thousands of nodes is not a trivial task. The agent mechanism
              helps control communication between slurmctld and the slurmd
              daemons on a best-effort basis. This value is the count of
              enqueued outgoing RPC requests in an internal retry list.

       Agent count
              Number of agent threads. Each agent thread can in turn create a
              group of up to 2 + AGENT_THREAD_COUNT active threads at a time.

       Agent thread count
	      Total count of active threads created by all the agent threads.

       DBD Agent queue size
	      Slurm  queues  up	 the  messages	intended  for the SlurmDBD and
	      processes	them in	a separate thread. If the SlurmDBD,  or	 data-
	      base, is down then this number will increase.

              The maximum queue size is configured in slurm.conf with
              MaxDBDMsgs. If this number grows beyond half of the maximum
              queue size, the slurmdbd and the database should be investigated
              immediately.
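
              The half-of-maximum rule above can be expressed as a small
              check. This is only a sketch: the function name is hypothetical,
              and the caller is assumed to supply both the observed queue size
              and the MaxDBDMsgs value from its own slurm.conf.

```python
def dbd_queue_needs_attention(queue_size: int, max_dbd_msgs: int) -> bool:
    """Return True when the DBD agent queue exceeds half of MaxDBDMsgs.

    Hypothetical helper: both numbers must be provided by the caller
    (queue_size from the sdiag report, max_dbd_msgs from slurm.conf).
    """
    if max_dbd_msgs <= 0:
        raise ValueError("max_dbd_msgs must be positive")
    if queue_size < 0:
        raise ValueError("queue_size must not be negative")
    # "More than half of the max queue size" is the threshold in the text.
    return queue_size > max_dbd_msgs / 2
```

              For instance, with an illustrative limit of 10000 messages, a
              queue of 5001 is flagged while a queue of exactly 5000 is not.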

       Jobs submitted
              Number of jobs submitted since last reset.

       Jobs started
	      Number  of  jobs	started	 since last reset. This	includes back-
	      filled jobs.

       Jobs completed
	      Number of	jobs completed since last reset.

       Jobs canceled
	      Number of	jobs canceled since last reset.

       Jobs failed
              Number of jobs that failed due to slurmd or other internal
              issues since last reset.

       Job states ts:
	      Lists  the timestamp of when the following job state counts were
	      gathered.

       Jobs pending:
              Number of jobs pending at the time of the timestamp above.

       Jobs running:
              Number of jobs running at the time of the timestamp above.

       Jobs running ts:
              Timestamp of when the running job count was taken.

       The next block of information is related to the main scheduling
       algorithm, which is based on job priorities. A scheduling cycle
       acquires the job_write_lock lock, then tries to allocate resources to
       pending jobs, starting with the highest-priority job and proceeding in
       descending priority order. Once a job cannot get resources, the loop
       continues, but only for jobs requesting other partitions. Jobs with
       dependencies or affected by account limits are not processed.
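
       The loop described above can be pictured with a toy model. The sketch
       below is not Slurm's implementation: the function name and the tuple
       layout are hypothetical, each job is assumed to request exactly one
       partition, partitions are assumed not to share nodes, and jobs held by
       dependencies or limits are assumed to have been filtered out already.

```python
def schedule_cycle(pending, free_nodes):
    """Toy priority-order scheduling pass (illustrative only).

    pending:    list of (name, priority, partition, nodes) tuples.
    free_nodes: dict mapping partition name to idle node count.
    Returns the names of jobs started, in the order they were started.
    """
    free = dict(free_nodes)   # work on a copy; the caller's dict is untouched
    blocked = set()           # partitions where a job already failed to start
    started = []
    # Highest priority first, then descending.
    for name, _prio, partition, nodes in sorted(
        pending, key=lambda job: job[1], reverse=True
    ):
        if partition in blocked:
            # A higher-priority job in this partition could not start,
            # so lower-priority jobs there are skipped for this cycle.
            continue
        if free.get(partition, 0) >= nodes:
            free[partition] = free.get(partition, 0) - nodes
            started.append(name)
        else:
            blocked.add(partition)
    return started
```

       In this model, one blocked job stops further starts in its own
       partition for the rest of the cycle, while other partitions are still
       examined.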

       Last cycle
	      Time in microseconds for last scheduling cycle.

       Max cycle
	      Maximum time in microseconds for any scheduling cycle since last
	      reset.

       Total cycles
	      Total run	time in	microseconds for all scheduling	 cycles	 since
	      last reset.  Scheduling is performed periodically	and (depending
	      upon  configuration)  when  a  job is submitted or a job is com-
	      pleted.

       Mean cycle
	      Mean time	in microseconds	for all	scheduling cycles  since  last
	      reset.

       Mean depth cycle
              Mean cycle depth, where depth is the number of jobs processed
              in a scheduling cycle.

       Cycles per minute
              Number of scheduling cycles executed per minute.

       Last queue length
              Length of the pending job queue.

       The next block of information is related to the backfill scheduling
       algorithm. A backfill scheduling cycle acquires locks on the job, node,
       and partition objects, then tries to allocate resources to pending
       jobs. Jobs are processed in priority order. If a job cannot get
       resources, the algorithm calculates when it could get them, yielding a
       future start time for the job. The next job is then processed: the
       algorithm tries to allocate resources to it without affecting the
       previous jobs, and again calculates a future start time if resources
       are not currently available. Each additional job takes longer to
       process because more higher-priority jobs must not be affected. The
       algorithm itself takes measures to avoid a long execution cycle and to
       avoid holding all the locks for too long.
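
       To make the idea concrete, here is a minimal sketch of a backfill-style
       planner. It is a toy model, not the Slurm backfill scheduler: it
       assumes a single pool of identical nodes, known job durations, and
       integer time units, and all names (Reservation, earliest_start, plan)
       are hypothetical. Each job, taken in priority order, receives the
       earliest start time that does not disturb reservations made for the
       jobs before it.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    start: int   # inclusive start time
    end: int     # exclusive end time
    nodes: int   # nodes held during [start, end)


def nodes_in_use(reservations, t):
    """Nodes held at time t by the given reservations."""
    return sum(r.nodes for r in reservations if r.start <= t < r.end)


def earliest_start(reservations, total_nodes, nodes, duration, now=0):
    """Earliest time the job fits without touching existing reservations."""
    # Free capacity only increases when a reservation ends, so the only
    # start times worth testing are "now" and the reservation end times.
    candidates = sorted({now} | {r.end for r in reservations if r.end > now})
    for t in candidates:
        # Usage can only rise at t itself or where a reservation starts
        # inside the job's window, so checking those points is sufficient.
        points = [t] + [r.start for r in reservations
                        if t < r.start < t + duration]
        if all(nodes_in_use(reservations, p) + nodes <= total_nodes
               for p in points):
            return t
    raise ValueError("job requests more nodes than the pool contains")


def plan(total_nodes, running, pending):
    """pending: list of (name, nodes, duration) in priority order."""
    reservations = list(running)
    starts = {}
    for name, nodes, duration in pending:
        t = earliest_start(reservations, total_nodes, nodes, duration)
        reservations.append(Reservation(t, t + duration, nodes))
        starts[name] = t
    return starts
```

       With four nodes, two of them busy until time 10, a four-node job must
       wait until time 10, but a lower-priority two-node job lasting 10 units
       can start immediately because it finishes before the reservation
       begins. Every new job is checked against all earlier reservations,
       which is why later jobs cost more to process in this model as well.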

       Total backfilled	jobs (since last slurm start)
	      Number of	jobs started thanks to backfilling  since  last	 slurm
	      start.

       Total backfilled	jobs (since last stats cycle start)
              Number of jobs started thanks to backfilling since the last
              time stats were reset. By default these values are reset at
              midnight UTC.

       Total backfilled	heterogeneous job components
	      Number  of  heterogeneous	job components started thanks to back-
	      filling since last Slurm start.

       Total cycles
              Number of backfill scheduling cycles since last reset.

       Last cycle when
              Time when the last backfill scheduling cycle happened, in the
              format "weekday Month MonthDay hour:minute.seconds year".

       Last cycle
              Time in microseconds of the last backfill scheduling cycle. It
              counts only execution time, excluding sleep time inside a
              scheduling cycle when it executes for an extended period of
              time. Note that locks are released during the sleep time so
              that other work can proceed.

       Max cycle
              Time in microseconds of the longest backfill scheduling cycle
              since last reset. It counts only execution time, excluding
              sleep time inside a scheduling cycle when it executes for an
              extended period of time. Note that locks are released during
              the sleep time so that other work can proceed.

       Mean cycle
	      Mean time	in microseconds	of backfilling scheduling cycles since
	      last reset.

       Last depth cycle
	      Number of	processed jobs during last backfilling scheduling  cy-
	      cle. It counts every job even if that job	can not	be started due
	      to dependencies or limits.

       Last depth cycle	(try sched)
              Number of jobs processed during the last backfilling scheduling
              cycle. It counts only jobs with a chance to start using
              available resources. These jobs consume more scheduling time
              than jobs that are found to be unable to start due to
              dependencies or limits.

       Depth Mean
	      Mean  count  of jobs processed during all	backfilling scheduling
	      cycles since last	reset.	Jobs which are found to	be  ineligible
	      to  run  when examined by	the backfill scheduler are not counted
	      (e.g. jobs submitted to multiple partitions and already started,
	      jobs which have reached a	QOS or account limit such  as  maximum
	      running jobs for an account, etc).

       Depth Mean (try sched)
	      The  subset  of Depth Mean that the backfill scheduler attempted
	      to schedule.

       Last queue length
	      Number of	jobs pending to	be processed by	backfilling algorithm.
	      A	job is counted once for	each partition it is queued to use.  A
	      pending job array	will normally be counted as one	job (tasks  of
	      a	job array which	have already been started/requeued or individ-
	      ually  modified will already have	individual job records and are
	      each counted as a	separate job).

       Queue length Mean
	      Mean count of jobs pending to be processed by backfilling	 algo-
	      rithm.   A  job is counted once for each partition it requested.
	      A	pending	job array will normally	be counted as one  job	(tasks
	      of a job array which have	already	been started/requeued or indi-
	      vidually	modified  will already have individual job records and
	      are each counted as a separate job).

       Last table size
	      Count of different time slots tested by the  backfill  scheduler
	      in its last iteration.

       Mean table size
	      Mean count of different time slots tested	by the backfill	sched-
	      uler.  Larger counts increase the	time required for the backfill
	      operation.   The table size is influenced	by many	scheduling pa-
	      rameters,	 including:  bf_min_age_reserve,  bf_min_prio_reserve,
	      bf_resolution, and bf_window.

       Latency for 1000	calls to gettimeofday()
	      Latency of 1000 calls to the gettimeofday() syscall in microsec-
	      onds, as measured	at controller startup.

       The next blocks of information report the most frequently issued remote
       procedure calls (RPCs), calls made for the slurmctld daemon to perform
       some action. The fourth block reports the RPCs issued by message type.
       To interpret those RPC codes, look them up in the Slurm source file
       src/common/slurm_protocol_defs.h. The report includes the number of
       times each RPC is invoked, the total time consumed by all of those
       RPCs, and the average time consumed by each RPC in microseconds. The
       fifth block reports the RPCs issued by user ID: the total number of
       RPCs each user has issued, the total time consumed by all of those
       RPCs, and the average time consumed by each RPC in microseconds. RPC
       statistics are collected for the life of the slurmctld process unless
       explicitly reset with --reset.

       The sixth block of information, labeled Pending RPC Statistics, shows
       information about pending outgoing RPCs on the slurmctld agent queue.
       The first section of this block shows the types of RPCs on the queue
       and the count of each. The second section shows up to the first 25
       individual RPCs pending on the agent queue, including the type and the
       destination host list. This information is cached and refreshed only
       at 30-second intervals.
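
       For scripted monitoring, the counters can be pulled out of the report
       text. The sketch below assumes the plain-text report prints each
       counter as a "Label: value" line with an integer value; the function
       name is hypothetical and the sample text is made up for illustration,
       not captured from a real cluster. Pairs are returned as a list rather
       than a dictionary because labels such as "Last cycle" appear in more
       than one block.

```python
import re

# A label (no colon inside it), a colon, then a bare non-negative integer.
_COUNTER_LINE = re.compile(r"^\s*([^:]+?):\s*(\d+)\s*$")


def parse_counters(text):
    """Return (label, value) pairs for every 'Label: <integer>' line."""
    pairs = []
    for line in text.splitlines():
        match = _COUNTER_LINE.match(line)
        if match:
            pairs.append((match.group(1).strip(), int(match.group(2))))
    return pairs


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
Server thread count: 3
Agent queue size:    0
Jobs submitted: 120
Last cycle when: some timestamp text
Last cycle: 250
"""

if __name__ == "__main__":
    for label, value in parse_counters(SAMPLE):
        print(f"{label} = {value}")
```

       Lines whose value is not a plain integer, such as timestamps, are
       skipped by this sketch rather than guessed at.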

OPTIONS
       -a, --all
	      Get  and	report information. This is the	default	mode of	opera-
	      tion.

       -M, --cluster=<string>
	      The cluster to issue commands to.	Only one cluster name  may  be
	      specified.  Note that the	SlurmDBD must be up for	this option to
	      work properly.

       -h, --help
	      Print description	of options and exit.

       --json, --json=list, --json=<data_parser>
	      Dump information as JSON using the default data_parser plugin or
	      explicit data_parser with	parameters. Sorting and	formatting ar-
	      guments will be ignored.

       -r, --reset
	      Reset  scheduler and RPC counters	to 0. Only supported for Slurm
	      operators	and administrators.

       -i, --sort-by-id
	      Sort Remote Procedure Call (RPC) data by	message	 type  ID  and
	      user ID.

       -t, --sort-by-time
	      Sort Remote Procedure Call (RPC) data by total run time.

       -T, --sort-by-time2
	      Sort Remote Procedure Call (RPC) data by average run time.

       --usage
	      Print list of options and	exit.

       -V, --version
	      Print current version number and exit.

       --yaml, --yaml=list, --yaml=<data_parser>
	      Dump information as YAML using the default data_parser plugin or
	      explicit data_parser with	parameters. Sorting and	formatting ar-
	      guments will be ignored.
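
       When sdiag is launched from a program, the options above map onto an
       argument list. The helper below is a hypothetical sketch that uses
       only flags listed in this section; it drops the sort flag when a
       structured format is requested, since sorting arguments are ignored
       in that case.

```python
_SORT_FLAGS = {
    "id": "--sort-by-id",
    "time": "--sort-by-time",
    "time2": "--sort-by-time2",
}
_FORMAT_FLAGS = {"json": "--json", "yaml": "--yaml"}


def build_sdiag_argv(sort=None, cluster=None, structured=None):
    """Build an argv list for sdiag (hypothetical helper, illustrative).

    sort:       None, "id", "time", or "time2".
    cluster:    None or a single cluster name.
    structured: None, "json", or "yaml".
    """
    if sort is not None and sort not in _SORT_FLAGS:
        raise ValueError(f"unknown sort key: {sort!r}")
    if structured is not None and structured not in _FORMAT_FLAGS:
        raise ValueError(f"unknown format: {structured!r}")

    argv = ["sdiag"]
    if cluster is not None:
        argv.append(f"--cluster={cluster}")
    if structured is not None:
        # Sorting and formatting arguments are ignored with --json/--yaml,
        # so the sort flag is left out entirely.
        argv.append(_FORMAT_FLAGS[structured])
    elif sort is not None:
        argv.append(_SORT_FLAGS[sort])
    return argv
```

       The resulting list can be handed to a process launcher without going
       through a shell.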

PERFORMANCE
       Executing  sdiag	 sends a remote	procedure call to slurmctld. If	enough
       calls from sdiag	or other Slurm client commands that send remote	proce-
       dure calls to the slurmctld daemon come in at once, it can result in  a
       degradation  of performance of the slurmctld daemon, possibly resulting
       in a denial of service.

       Do not run sdiag	or other Slurm client commands that send remote	proce-
       dure calls to slurmctld from loops in shell scripts or other  programs.
       Ensure  that programs limit calls to sdiag to the minimum necessary for
       the information you are trying to gather.
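
       One way to keep the call rate down in a monitoring program is to cache
       the report and reuse it until a minimum interval has passed. The class
       below is a sketch under assumptions: the class name and the 60-second
       default are hypothetical, and the default runner simply invokes sdiag
       with no options.

```python
import subprocess
import time


def _run_sdiag():
    """Run sdiag once and return its standard output as text."""
    result = subprocess.run(
        ["sdiag"], capture_output=True, text=True, check=True
    )
    return result.stdout


class CachedSdiag:
    """Return a cached report unless min_interval seconds have elapsed."""

    def __init__(self, min_interval=60.0, runner=_run_sdiag,
                 clock=time.monotonic):
        self._min_interval = min_interval   # hypothetical default: 60 s
        self._runner = runner               # callable returning report text
        self._clock = clock                 # injectable for testing
        self._text = None
        self._stamp = None

    def get(self):
        now = self._clock()
        expired = (self._stamp is None
                   or now - self._stamp >= self._min_interval)
        if expired:
            # Only here is an RPC actually sent to slurmctld.
            self._text = self._runner()
            self._stamp = now
        return self._text
```

       Callers that poll frequently then share one report per interval
       instead of issuing a new remote procedure call on every poll.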

ENVIRONMENT VARIABLES
       Some sdiag options may be set via environment variables.	These environ-
       ment variables, along with their	corresponding options, are listed  be-
       low.  (Note: Command line options will always override these settings.)

       SLURM_CLUSTERS	   Same	as --cluster

       SLURM_CONF	   The location	of the Slurm configuration file.
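
       The precedence rule above (command line over environment) can be
       sketched as follows. The function name is hypothetical and the code
       only models the lookup order described here; it does not call Slurm.

```python
import os


def effective_cluster(cli_cluster=None, environ=None):
    """Return the cluster name sdiag would target under the stated rule.

    cli_cluster: value given with --cluster, or None if not given.
    environ:     mapping to read SLURM_CLUSTERS from (defaults to os.environ).
    """
    env = os.environ if environ is None else environ
    if cli_cluster is not None:
        # Command line options always override environment settings.
        return cli_cluster
    return env.get("SLURM_CLUSTERS")
```

       With neither source set, the function returns None, which stands for
       the default behavior with no cluster selected explicitly.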

COPYING
       Copyright (C) 2010-2011 Barcelona Supercomputing	Center.
       Copyright (C) 2010-2022 SchedMD LLC.

       Slurm  is free software;	you can	redistribute it	and/or modify it under
       the terms of the	GNU General Public License as published	 by  the  Free
       Software	 Foundation;  either version 2 of the License, or (at your op-
       tion) any later version.

       Slurm is	distributed in the hope	that it	will be	 useful,  but  WITHOUT
       ANY  WARRANTY;  without even the	implied	warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR	PURPOSE. See the GNU  General  Public  License
       for more	details.

SEE ALSO
       sinfo(1), squeue(1), scontrol(1), slurm.conf(5)

May 2023			Slurm Commands			      sdiag(1)
