nvidia-smi(1)			    NVIDIA			 nvidia-smi(1)

NAME
       nvidia-smi - NVIDIA System Management Interface program

SYNOPSIS
       nvidia-smi [OPTION1 [ARG1]] [OPTION2 [ARG2]] ...

DESCRIPTION
       nvidia-smi (also NVSMI) provides monitoring and management capabilities
       for each of NVIDIA's Tesla, Quadro, GRID and GeForce devices from Fermi
       and higher architecture families.  GeForce Titan series devices are
       supported for most functions, with very limited information provided
       for the remainder of the GeForce brand.  NVSMI is a cross-platform tool
       that supports all standard NVIDIA driver-supported Linux distros, as
       well as 64-bit versions of Windows starting with Windows Server 2008 R2.
       Metrics can be consumed directly by users via stdout, or written to a
       file in CSV or XML format for scripting purposes.

       Note  that much of the functionality of NVSMI is	provided by the	under-
       lying NVML C-based library.  See	the NVIDIA developer website link  be-
       low  for	 more  information about NVML.	NVML-based python bindings are
       also available.

       The output of NVSMI is not guaranteed to	be backwards compatible.  How-
       ever,  both  NVML and the Python	bindings are backwards compatible, and
       should be the first choice when writing any tools that  must  be	 main-
       tained across NVIDIA driver releases.

       NVML SDK: http://developer.nvidia.com/nvidia-management-library-nvml/

       Python bindings:	http://pypi.python.org/pypi/nvidia-ml-py/
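
       For example, the following invocations illustrate the stdout and file-
       based consumption modes described above (the file name gpu_log.xml is
       only a placeholder):

              nvidia-smi                        # human-readable summary on stdout
              nvidia-smi -q -x -f gpu_log.xml   # full query written to a file as XML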

OPTIONS
   GENERAL OPTIONS
   -h, --help
       Print usage information and exit.

   --version
       Print version information and exit.

   LIST	OPTIONS
   -L, --list-gpus
       List each of the	NVIDIA GPUs in the system, along with their UUIDs.

   -B, --list-excluded-gpus
       List  each  of the excluded NVIDIA GPUs in the system, along with their
       UUIDs.

   SUMMARY OPTIONS
   Show	a summary of GPUs connected to the system.
   [any	one of]
   -i, --id=ID
       Target a	specific GPU.

   -f FILE, --filename=FILE
       Log to the specified file, rather than to stdout.

   -l SEC, --loop=SEC
       Probe until Ctrl+C at specified second interval.

   QUERY OPTIONS
   -q, --query
       Display GPU or Unit info.  Displayed info includes all data  listed  in
       the  (GPU  ATTRIBUTES)  or (UNIT	ATTRIBUTES) sections of	this document.
       Some devices and/or environments don't support all possible informa-
       tion.  Any unsupported data is indicated by "N/A" in the output.  By
       default, information for all available GPUs or Units is displayed.  Use
       the -i option to restrict the output to a single GPU or Unit.

   [plus optionally]
   -u, --unit
       Display Unit data instead of GPU	data.  Unit data is only available for
       NVIDIA S-class Tesla enclosures.

   -i, --id=ID
       Display	data for a single specified GPU	or Unit.  The specified	id may
       be the GPU/Unit's 0-based index in the natural enumeration returned  by
       the driver, the GPU's board serial number, the GPU's UUID, or the GPU's
       PCI  bus	 ID (as	domain:bus:device.function in hex).  It	is recommended
       that users desiring consistency use either UUID or PCI  bus  ID,	 since
       device  enumeration ordering is not guaranteed to be consistent between
       reboots and board serial	number might be	shared between	multiple  GPUs
       on the same board.

   -f FILE, --filename=FILE
       Redirect	 query	output	to  the	specified file in place	of the default
       stdout.	The specified file will	be overwritten.

   -x, --xml-format
       Produce XML output in place of the default human-readable format.  Both
       GPU and Unit query outputs conform to corresponding  DTDs.   These  are
       available via the --dtd flag.

   --dtd
       Use with	-x.  Embed the DTD in the XML output.

   --debug=FILE
       Produces	 an  encrypted debug log for use in submission of bugs back to
       NVIDIA.

   -d TYPE, --display=TYPE
       Display only selected information: MEMORY, UTILIZATION,	ECC,  TEMPERA-
       TURE,  POWER,  CLOCK,  COMPUTE,	PIDS,  PERFORMANCE,  SUPPORTED_CLOCKS,
       PAGE_RETIREMENT,	ACCOUNTING, ENCODER_STATS,  SUPPORTED_GPU_TARGET_TEMP,
       VOLTAGE,	 FBC_STATS,  ROW_REMAPPER, RESET_STATUS, GSP_FIRMWARE_VERSION,
       POWER_SMOOTHING, POWER_PROFILES.  Flags can be combined with a comma,
       e.g. "MEMORY,ECC".  Sampling data with max, min and avg is also re-
       turned for POWER, UTILIZATION and CLOCK display types.  Doesn't work
       with the -u/--unit or -x/--xml-format flags.

   -l SEC, --loop=SEC
       Continuously  report  query data	at the specified interval, rather than
       the default of  just  once.   The  application  will  sleep  in-between
       queries.	  Note	that on	Linux ECC error	or Xid error events will print
       out during the sleep period if the -x flag was not specified.  Pressing
       Ctrl+C at any time will abort the loop, which will otherwise run	indef-
       initely.	 If no argument	is specified for the -l	form a default	inter-
       val of 5	seconds	is used.

   -lms	ms, --loop-ms=ms
       Same as -l,--loop but in	milliseconds.
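
       As an illustrative combination of the query options above (assumes a
       GPU with index 0 is present):

              # Show only memory and ECC information for GPU 0, refreshing
              # every 10 seconds until Ctrl+C is pressed.
              nvidia-smi -q -d MEMORY,ECC -i 0 -l 10

              # Full query for GPU 0 as XML with the DTD embedded.
              nvidia-smi -q -x --dtd -i 0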

   SELECTIVE QUERY OPTIONS
       Allows the caller to pass an explicit list of properties	to query.

   [one	of]
   --query-gpu=
       Information about the GPU.  Pass a comma-separated list of the proper-
       ties you want to query, e.g. --query-gpu=pci.bus_id,persistence_mode.
       Call --help-query-gpu for more info.

   --query-supported-clocks=
       List  of	supported clocks.  Call	--help-query-supported-clocks for more
       info.

   --query-compute-apps=
       List of currently active	 compute  processes.   Call  --help-query-com-
       pute-apps for more info.

   --query-accounted-apps=
       List  of	accounted compute processes.  Call --help-query-accounted-apps
       for more	info.  This query is not supported on vGPU host.

   --query-retired-pages=
       List  of	 GPU  device  memory  pages  that  have	 been  retired.	  Call
       --help-query-retired-pages for more info.

   --query-remapped-rows=
       Information  about  remapped rows.  Call	--help-query-remapped-rows for
       more info.

   [mandatory]
   --format=
       Comma separated list of format options:

             csv - comma separated values (MANDATORY)

             noheader - skip first line with column headers

             nounits -	don't print units for numerical	values
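
       For example, a scripting-friendly query built from the properties men-
       tioned above (one CSV line is printed per GPU):

              nvidia-smi --query-gpu=pci.bus_id,persistence_mode \
                      --format=csv,noheader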

   [plus any of]
   -i, --id=ID
       Display data for	a single specified GPU.	 The specified id may  be  the
       GPU's  0-based index in the natural enumeration returned	by the driver,
       the GPU's board serial number, the GPU's	UUID, or the GPU's PCI bus  ID
       (as  domain:bus:device.function	in hex).  It is	recommended that users
       desiring	consistency use	either UUID or PCI bus ID, since  device  enu-
       meration	 ordering  is  not guaranteed to be consistent between reboots
       and board serial	number might be	shared between multiple	 GPUs  on  the
       same board.

   -f FILE, --filename=FILE
       Redirect	 query	output	to  the	specified file in place	of the default
       stdout.	The specified file will	be overwritten.

   -l SEC, --loop=SEC
       Continuously report query data at the specified interval,  rather  than
       the  default  of	 just  once.   The  application	 will sleep in-between
       queries.	 Note that on Linux ECC	error or Xid error events  will	 print
       out during the sleep period if the -x flag was not specified.  Pressing
       Ctrl+C at any time will abort the loop, which will otherwise run	indef-
       initely.	  If no	argument is specified for the -l form a	default	inter-
       val of 5	seconds	is used.

   -lms	ms, --loop-ms=ms
       Same as -l,--loop but in	milliseconds.

   DEVICE MODIFICATION OPTIONS
   [any	one of]
   -pm,	--persistence-mode=MODE
       Set the persistence mode	for the	target GPUs.  See the (GPU ATTRIBUTES)
       section for a description of persistence	mode.	Requires  root.	  Will
       impact all GPUs unless a	single GPU is specified	using the -i argument.
       The  effect  of this operation is immediate.  However, it does not per-
       sist across reboots.  After each	reboot persistence mode	 will  default
       to "Disabled".  Available on Linux only.
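
       A minimal sketch, assuming the usual 0|DISABLED / 1|ENABLED argument
       form (not spelled out on this page):

              nvidia-smi -i 0 -pm 1     # enable persistence mode on GPU 0 (root)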

   -e, --ecc-config=CONFIG
       Set the ECC mode	for the	target GPUs.  See the (GPU ATTRIBUTES) section
       for  a  description  of ECC mode.  Requires root.  Will impact all GPUs
       unless a	single GPU is specified	using the -i argument.	 This  setting
       takes effect after the next reboot and is persistent.

   -p, --reset-ecc-errors=TYPE
       Reset the ECC error counters for	the target GPUs.  See the (GPU ATTRIB-
       UTES)  section for a description	of ECC error counter types.  Available
       arguments are 0|VOLATILE	or 1|AGGREGATE.	 Requires root.	  Will	impact
       all  GPUs  unless a single GPU is specified using the -i	argument.  The
       effect of this operation	is immediate.  Clearing	 aggregate  counts  is
       not supported on Ampere+.
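
       For example, using the documented 0|VOLATILE argument:

              nvidia-smi -i 0 -p 0      # clear volatile ECC counters on GPU 0 (root)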

   -c, --compute-mode=MODE
       Set  the	 compute  mode	for the	target GPUs.  See the (GPU ATTRIBUTES)
       section for a description of compute mode.  Requires root.  Will	impact
       all GPUs	unless a single	GPU is specified using the -i  argument.   The
       effect  of  this	 operation is immediate.  However, it does not persist
       across reboots.	After each reboot compute  mode	 will  reset  to  "DE-
       FAULT".

   -dm TYPE, --driver-model=TYPE
   -fdm	TYPE, --force-driver-model=TYPE
       Enable or disable TCC driver model.  For	Windows	only.  Requires	admin-
       istrator	 privileges.  -dm will fail if a display is attached, but -fdm
       will force the driver model to change.  Will impact all GPUs  unless  a
       single  GPU  is	specified using	the -i argument.  A reboot is required
       for the change to take place.  See Driver Model for more	information on
       Windows driver models.

	--gom=MODE
       Set GPU Operation Mode: 0/ALL_ON, 1/COMPUTE, 2/LOW_DP.  Supported on
       GK110 M-class and X-class Tesla products from the Kepler family.  Not
       supported on Quadro and Tesla C-class products.  LOW_DP and ALL_ON are
       the only modes supported on GeForce Titan devices.  Requires adminis-
       trator privileges.  See GPU Operation Mode for more information about
       GOM.  GOM changes take effect after reboot.  The reboot requirement
       might be removed in the future.  Compute-only GOMs don't support WDDM
       (Windows Display Driver Model).

   -r, --gpu-reset
       Trigger	a  reset of one	or more	GPUs.  Can be used to clear GPU	HW and
       SW state	in situations that would otherwise require a  machine  reboot.
       Typically  useful  if a double bit ECC error has	occurred.  Optional -i
       switch can be used to target one	or  more  specific  devices.   Without
       this  option,  all  GPUs	are reset.  Requires root.  There can't	be any
       applications using these	devices	(e.g. CUDA application,	 graphics  ap-
       plication  like X server, monitoring application	like other instance of
       nvidia-smi).  There also	can't be any compute applications  running  on
       any other GPU in	the system if individual GPU reset is not feasible.

       Starting	 with the NVIDIA Ampere	architecture, GPUs with	NVLink connec-
       tions can be individually reset.	 On Ampere  NVSwitch  systems,	Fabric
       Manager	is  required to	facilitate reset. On Hopper and	later NVSwitch
       systems,	the dependency on Fabric Manager to facilitate	reset  is  re-
       moved.

       If Fabric Manager is not	running, or if any of the GPUs being reset are
       based  on an architecture preceding the NVIDIA Ampere architecture, any
       GPUs with NVLink	connections to a GPU being reset must also be reset in
       the same	command.  This can be done either by omitting the  -i  switch,
       or  using the -i	switch to specify the GPUs to be reset.	 If the	-i op-
       tion does not specify a complete	set of NVLink GPUs to reset, this com-
       mand will issue an error	identifying the	additional GPUs	that  must  be
       included	in the reset command.

       GPU reset is not	guaranteed to work in all cases. It is not recommended
       for production environments at this time.  In some situations there may
       be  HW  components  on the board	that fail to revert back to an initial
       state following the reset request.  This	is more	likely to be  seen  on
       Fermi-generation	products vs. Kepler, and more likely to	be seen	if the
       reset is	being performed	on a hung GPU.

       Following  a reset, it is recommended that the health of	each reset GPU
       be verified before further use.	If any GPU is not healthy  a  complete
       reset should be instigated by power cycling the node.

       GPU reset operation will	not be supported on MIG	enabled	vGPU guests.

       Visit  http://developer.nvidia.com/gpu-deployment-kit  to  download the
       GDK.
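
       An illustrative reset sequence for a single GPU (assumes GPU index 1
       and that no applications are using it):

              # 1. Stop every process using GPU 1 (CUDA apps, X server,
              #    other nvidia-smi instances), then reset it.
              nvidia-smi -r -i 1

              # 2. Verify the GPU reports a healthy state before reuse.
              nvidia-smi -q -i 1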

   -vm,	--virt-mode=MODE
       Switch GPU Virtualization Mode. Sets GPU	virtualization mode to	3/VGPU
       or  4/VSGA.   Virtualization  mode  of a	GPU can	only be	set when it is
       running on a hypervisor.

   -lgc, --lock-gpu-clocks=MIN_GPU_CLOCK,MAX_GPU_CLOCK
       Specifies <minGpuClock,maxGpuClock> clocks as a pair (e.g. 1500,1500)
       that defines the closest desired locked GPU clock speed in MHz.  Input
       can also be a singular desired clock value (e.g. <GpuClockValue>).  Op-
       tionally, --mode can be supplied to specify the clock locking mode.
       Supported on Volta+.  Requires root.

       --mode=0	(Default)
		      This mode	is the default clock locking mode and provides
		      the  highest  possible frequency accuracies supported by
		      the hardware.

       --mode=1       The clock locking algorithm leverages closed-loop con-
                      trollers to achieve frequency accuracies with improved
                      perf per watt for certain classes of applications.  Due
                      to the convergence latency of closed-loop controllers,
                      the frequency accuracies may be slightly lower than the
                      default mode 0.
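
       An illustrative sequence using the pair and mode values shown above:

              nvidia-smi -i 0 -lgc 1500,1500 --mode=0   # lock GPU 0 clocks at 1500 MHz (root)
              nvidia-smi -i 0 -rgc                      # later, restore default GPU clocks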

   -lmc, --lock-memory-clocks=MIN_MEMORY_CLOCK,MAX_MEMORY_CLOCK
       Specifies  <minMemClock,maxMemClock>  clocks as a pair (e.g. 5100,5100)
       that defines the	range of desired locked	Memory clock speed in MHz. In-
       put can also be a singular desired clock	value (e.g. <MemClockValue>).

   -rgc, --reset-gpu-clocks
       Resets the GPU clocks to	the default value.  Supported on Volta+.   Re-
       quires root.

   -rmc, --reset-memory-clocks
       Resets  the  memory  clocks to the default value.  Supported on Volta+.
       Requires	root.

   -ac,	--applications-clocks=MEM_CLOCK,GRAPHICS_CLOCK
       Specifies maximum <memory,graphics> clocks as a pair (e.g. 2000,800)
       that defines the GPU's speed while running applications.  Supported on
       Maxwell-based GeForce and on Tesla/Quadro/Titan devices from the
       Kepler+ family.  Requires root.
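
       An illustrative sequence using the example pair above (check the sup-
       ported combinations first):

              nvidia-smi -q -d SUPPORTED_CLOCKS -i 0    # list valid <memory,graphics> pairs
              nvidia-smi -i 0 -ac 2000,800              # apply a supported pair (root)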

   -rac, --reset-applications-clocks
       Resets the applications clocks to the default value.  Supported on
       Maxwell-based GeForce and on Tesla/Quadro/Titan devices from the
       Kepler+ family.  Requires root.

   -lmcd, --lock-memory-clocks-deferred
       Specifies the memory clock that	defines	 the  closest  desired	Memory
       Clock  in  MHz.	The memory clock takes effect the next time the	GPU is
       initialized. This can be	guaranteed by unloading	and reloading the ker-
       nel module.  Requires root.

   -rmcd, --reset-memory-clocks-deferred
       Resets the memory clock to default value. Driver	unload and  reload  is
       required	for this to take effect. This can be done by unloading and re-
       loading the kernel module.  Requires root.

   -pl,	--power-limit=POWER_LIMIT
       Specifies the maximum power limit in watts.  Accepts integer and float-
       ing point numbers.  It takes an optional argument --scope.  Only on
       supported devices from the Kepler family.  Requires administrator priv-
       ileges.  The value needs to be between the Min and Max Power Limit as
       reported by nvidia-smi.
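
       An illustrative sequence (the 250 W value is a placeholder and must lie
       between the Min and Max Power Limit reported for the device):

              nvidia-smi -q -d POWER -i 0    # read Min/Max Power Limit first
              nvidia-smi -i 0 -pl 250        # then set the software power limit (root)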

   -sc,	--scope=0/GPU, 1/TOTAL_MODULE
       Specifies the scope of the power limit.  The following are the options:
       0/GPU: only changes the power limit for the GPU.  1/TOTAL_MODULE:
       changes the power limit for the module containing multiple components,
       e.g. GPU and CPU.

   -cc,	--cuda-clocks=MODE
       Overrides or restores default CUDA clocks.  Available arguments are
       0|RESTORE_DEFAULT or 1|OVERRIDE.

   -am,	--accounting-mode=MODE
       Enables or disables GPU Accounting.  With GPU Accounting one can keep
       track of the usage of resources throughout the lifespan of a single
       process.  Only on supported devices from the Kepler family.  Requires
       administrator privileges.  Available arguments are 0|DISABLED or
       1|ENABLED.

   -caa, --clear-accounted-apps
       Clears  all processes accounted so far.	Only on	supported devices from
       Kepler family.  Requires	administrator privileges.
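
       An illustrative accounting workflow using the documented arguments:

              nvidia-smi -i 0 -am 1     # enable accounting on GPU 0 (root)
              # ... run compute workloads, inspect them via --query-accounted-apps= ...
              nvidia-smi -i 0 -caa      # clear the accounted processes (root)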

	--auto-boost-default=MODE
       Set the default auto boost policy to 0/DISABLED or 1/ENABLED, enforcing
       the change only after the last boost client has exited.	Only  on  cer-
       tain  Tesla  devices  from the Kepler+ family and Maxwell-based GeForce
       devices.	 Requires root.

	--auto-boost-permission=MODE
       Allow non-admin/root control over auto boost mode.  Available arguments
       are 0|UNRESTRICTED, 1|RESTRICTED.  Only on certain Tesla	 devices  from
       the Kepler+ family and Maxwell-based GeForce devices.  Requires root.

   -mig, --multi-instance-gpu=MODE
       Enables or disables Multi Instance GPU mode.  Only supported on devices
       based on	the NVIDIA Ampere architecture.	 Requires root.	 Available ar-
       guments are 0|DISABLED or 1|ENABLED.
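
       For example, using the documented 0|DISABLED / 1|ENABLED arguments:

              nvidia-smi -i 0 -mig 1    # enable MIG mode on GPU 0 (root, Ampere or later)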

   -gtt, --gpu-target-temp=MODE
       Set the GPU Target Temperature for a GPU in degrees Celsius.  Requires
       administrator privileges.  The target temperature should be within the
       limits supported by the GPU.  These limits can be retrieved by using
       the query option with SUPPORTED_GPU_TARGET_TEMP.
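
       An illustrative sequence (the value 75 is a placeholder and must fall
       within the limits reported for the device):

              nvidia-smi -q -d SUPPORTED_GPU_TARGET_TEMP -i 0   # read the supported range
              nvidia-smi -i 0 -gtt 75                           # then set the target temperature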

   [plus optionally]
   -i, --id=ID
       Modify a	single specified GPU.  The specified id	may be the  GPU/Unit's
       0-based	index  in  the natural enumeration returned by the driver, the
       GPU's board serial number, the GPU's UUID, or the GPU's PCI bus ID  (as
       domain:bus:device.function  in  hex).  It is recommended	that users de-
       siring consistency use either UUID or PCI bus ID, since device enumera-
       tion ordering is	not guaranteed to be consistent	 between  reboots  and
       board  serial  number might be shared between multiple GPUs on the same
       board.

   -eom, --error-on-warning
       Return a	non-zero error for warnings.

   UNIT	MODIFICATION OPTIONS
   -t, --toggle-led=STATE
       Set the LED indicator state on the front	and back of the	 unit  to  the
       specified  color.   See the (UNIT ATTRIBUTES) section for a description
       of the LED states.  Allowed colors are 0|GREEN and  1|AMBER.   Requires
       root.

   [plus optionally]
   -i, --id=ID
       Modify a	single specified Unit.	The specified id is the	Unit's 0-based
       index in	the natural enumeration	returned by the	driver.

   SHOW	DTD OPTIONS
   --dtd
       Display Device or Unit DTD.

   [plus optionally]
   -f FILE, --filename=FILE
       Redirect	 query	output	to  the	specified file in place	of the default
       stdout.	The specified file will	be overwritten.

   -u, --unit
       Display Unit DTD	instead	of device DTD.

   topo
       Display topology	information about the system.	Use  "nvidia-smi  topo
       -h"  for	more information.  Linux only.	Shows all GPUs NVML is able to
       detect but CPU and NUMA node affinity information will  only  be	 shown
       for  GPUs with Kepler or	newer architectures.  Note: GPU	enumeration is
       the same	as NVML.

   drain
       Display and modify the GPU drain	states.	 A drain state is one in which
       the GPU is no longer accepting new clients, and is used while preparing
       to power	down the GPU. Use "nvidia-smi drain -h"	for more  information.
       Linux only.

   nvlink
       Display nvlink information.  Use	"nvidia-smi nvlink -h" for more	infor-
       mation.

   clocks
       Query and control clocking behavior. Use	"nvidia-smi clocks --help" for
       more information.

   vgpu
       Display	information on GRID virtual GPUs. Use "nvidia-smi vgpu -h" for
       more information.

   mig
       Provides	controls for MIG management. "nvidia-smi mig -h" for more  in-
       formation.

   boost-slider
       Provides	 controls  for	boost  sliders	management. "nvidia-smi	boost-
       slider -h" for more information.

   power-hint
       Provides	queries	for power hint.	"nvidia-smi power-hint	-h"  for  more
       information.

   conf-compute
       Provides	 control  and  queries	for  confidential compute. "nvidia-smi
       conf-compute -h"	for more information.

   power-smoothing
       Provides	controls and  information  for	power  smoothing.  "nvidia-smi
       power-smoothing -h" for more information.

   power-profiles
       Provides controls and information  for  workload  power  profiles.
       "nvidia-smi power-profiles -h" for more information.

   encodersessions
       Display Encoder Sessions	information. "nvidia-smi  encodersessions  -h"
       for more	information.

RETURN VALUE
       The return code reflects whether the operation succeeded or failed and
       the reason for any failure.

             Return code 0 - Success

             Return code 2 - A	supplied argument or flag is invalid

             Return code 3 - The requested operation is not available on tar-
	      get device

             Return code 4 - The current user does not	have permission	to ac-
	      cess this	device or perform this operation

             Return code 6 - A	query to find an object	was unsuccessful

             Return  code  8 -	A device's external power cables are not prop-
	      erly attached

             Return code 9 - NVIDIA driver is not loaded

             Return code 10 - NVIDIA Kernel detected an interrupt issue  with
	      a	GPU

             Return code 12 - NVML Shared Library couldn't be found or	loaded

             Return  code  13	- Local	version	of NVML	doesn't	implement this
	      function

             Return code 14 - infoROM is corrupted

             Return code 15 - The GPU has fallen off the bus or has otherwise
	      become inaccessible

             Return code 255 -	Other error or internal	driver error occurred
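
       A minimal sketch of how a shell script might act on these return codes
       (only codes documented above are matched explicitly):

              nvidia-smi -L > /dev/null
              rc=$?
              case $rc in
                  0) echo "nvidia-smi succeeded" ;;
                  9) echo "NVIDIA driver is not loaded" ;;
                  *) echo "nvidia-smi failed with return code $rc" ;;
              esac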

GPU ATTRIBUTES
       The following list describes all	possible data returned by the  -q  de-
       vice  query  option.   Unless otherwise noted all numerical results are
       base 10 and unitless.

   Timestamp
       The current system timestamp at the time	nvidia-smi was invoked.	  For-
       mat is "Day-of-week Month Day HH:MM:SS Year".

   Driver Version
       The  version  of	 the  installed	NVIDIA display driver.	This is	an al-
       phanumeric string.

   CUDA	Version
       The version of the CUDA toolkit installed on the	system.	  This	is  an
       alphanumeric string.

   Attached GPUs
       The number of NVIDIA GPUs in the	system.

   Product Name
       The  official product name of the GPU.  This is an alphanumeric string.
       For all products.

   Product Brand
       The official brand of the GPU.  This is an  alphanumeric	 string.   For
       all products.

   Product Architecture
       The  official  architecture  name  of the GPU.  This is an alphanumeric
       string.	For all	products.

   Display Mode
       A flag that indicates whether a physical	display	(e.g. monitor) is cur-
       rently connected	to any of the GPU's connectors.	  "Enabled"  indicates
       an attached display.  "Disabled"	indicates otherwise.

   Display Active
       A flag that indicates whether a display is initialized on the GPU
       (e.g. memory is allocated on the device for display).  Display can be
       active even when no monitor is physically attached.  "Enabled" indi-
       cates an active display.  "Disabled" indicates otherwise.

   Persistence Mode
       A flag that indicates whether persistence mode is enabled for the  GPU.
       Value  is either	"Enabled" or "Disabled".  When persistence mode	is en-
       abled the NVIDIA	driver remains loaded even  when  no  active  clients,
       such  as	 X11 or	nvidia-smi, exist.  This minimizes the driver load la-
       tency associated	with running dependent apps, such  as  CUDA  programs.
       For all CUDA-capable products.  Linux only.

   Addressing Mode
       A  field	that indicates which addressing	mode is	currently active.  The
       value is	"ATS" or "HMM" or "None".  When	the mode is "ATS", system  al-
       located	memory	like  malloc  is  addressable from the GPU via Address
       Translation Services.  This means there is effectively a	single set  of
       page  tables used by both the CPU and the GPU.  When the	mode is	"HMM",
       system allocated	memory like malloc is addressable  from	 the  GPU  via
       software-based  mirroring  of  the CPU's	page tables, on	the GPU.  When
       the mode	is "None", neither ATS nor HMM is active.  Linux only.

   MIG Mode
       MIG Mode	configuration status

       Current	      MIG mode currently in use	- NA/Enabled/Disabled

       Pending	      Pending configuration of MIG Mode	- Enabled/Disabled

   Accounting Mode
       A flag that indicates whether accounting	mode is	enabled	for  the  GPU.
       Value  is  either  "Enabled" or "Disabled".  When accounting is enabled
       statistics are calculated for each compute process running on the  GPU.
       Statistics  can	be queried during the lifetime or after	termination of
       the process.  The execution time of a process is reported as 0 while
       the process is running and updated to the actual execution time after
       the process has terminated.  See --help-query-accounted-apps for more
       info.

   Accounting Mode Buffer Size
       Returns the size of the circular buffer that holds the list of pro-
       cesses that can be queried for accounting stats.  This is the maximum
       number of processes that accounting information will be stored for be-
       fore information about the oldest processes is overwritten by informa-
       tion about new processes.

   Driver Model
       On  Windows,  the TCC and WDDM driver models are	supported.  The	driver
       model can be changed with the (-dm) or (-fdm) flags.   The  TCC	driver
       model is optimized for compute applications, i.e. kernel launch times
       will be quicker with TCC.  The WDDM driver model	is designed for	graph-
       ics applications	and  is	 not  recommended  for	compute	 applications.
       Linux does not support multiple driver models, and will always have the
       value of	"N/A".

       Current	      The  driver  model  currently  in	 use.  Always "N/A" on
		      Linux.

       Pending	      The driver model that will be used on the	 next  reboot.
		      Always "N/A" on Linux.

   Serial Number
       This number matches the serial number physically	printed	on each	board.
       It is a globally	unique immutable alphanumeric value.

   GPU UUID
       This  value is the globally unique immutable alphanumeric identifier of
       the GPU.	 It does not correspond	to any physical	label on the board.

   Minor Number
       The minor number	for the	device is such that  the  Nvidia  device  node
       file for	each GPU will have the form /dev/nvidia[minor number].	Avail-
       able only on Linux platform.

   VBIOS Version
       The VBIOS version of the GPU board.

   MultiGPU Board
       Whether or not this GPU is part of a multiGPU board.

   Board ID
       The  unique  board ID assigned by the driver.  If two or	more GPUs have
       the same	board ID and the above "MultiGPU" field	is true	then the  GPUs
       are on the same board.

   Board Part Number
       The unique part number of the GPU's board

   GPU Part Number
       The unique part number of the GPU

   FRU Part Number
       Unique FRU part number of the GPU

   Platform Info
       Platform Information is compute tray platform specific information:
       the GPU's positional index and platform identifying information.

       Chassis Serial Number
		      Serial Number of the chassis containing this GPU.

       Slot Number    The slot number in the chassis containing	this GPU  (in-
		      cludes switches).

       Tray Index     The  tray	 index within the compute slots	in the chassis
		      containing this GPU (does	not include switches).

       Host ID	      Index of the node	within the slot	containing this	GPU.

       Peer Type      Platform indicated NVLink-peer type (e.g.	switch present
		      or not).

       Module Id      ID of this GPU within the	node.

       GPU Fabric GUID
		      Fabric ID	for this GPU.

   Inforom Version
       Version numbers for each	object in the  GPU  board's  inforom  storage.
       The  inforom  is	 a  small, persistent store of configuration and state
       data for	the GPU.  All inforom version fields are numerical.  It	can be
       useful to know these version numbers because some GPU features are only
       available with inforoms of a certain version or higher.

       If any of the fields below return Unknown Error, an additional Inforom
       verification check is performed and an appropriate warning message is
       displayed.

       Image Version  Global version of the infoROM image.  The image version,
                      just like the VBIOS version, uniquely describes the ex-
                      act version of the infoROM flashed on the board, in con-
                      trast to the infoROM object version, which is only an
                      indicator of supported features.

       OEM Object     Version for the OEM configuration	data.

       ECC Object     Version for the ECC recording data.

       Power Management	Object
		      Version for the power management data.

   Inforom BBX Object Flush
       Information about flushing of the blackbox data to the inforom storage.

       Latest Timestamp
		      The timestamp of the latest flush	of the BBX Object dur-
		      ing the current run.

       Latest Duration
		      The duration of the latest flush of the BBX Object  dur-
		      ing the current run.

   GPU Operation Mode
       GOM  allows  one	 to  reduce power usage	and optimize GPU throughput by
       disabling GPU features.

       Each GOM	is designed to meet specific user needs.

       In "All On" mode	everything is enabled and running at full speed.

       The "Compute" mode is designed for running only compute tasks. Graphics
       operations are not allowed.

       The "Low	Double Precision" mode is designed for running graphics	appli-
       cations that don't require high bandwidth double	precision.

       GOM can be changed with the (--gom) flag.

       Supported on GK110 M-class and X-class Tesla products from  the	Kepler
       family.	 Not supported on Quadro and Tesla C-class products.  Low Dou-
       ble Precision and All On	modes are the only modes  available  for  sup-
       ported GeForce Titan products.

       Current	      The GOM currently	in use.

       Pending	      The GOM that will	be used	on the next reboot.

   GPU C2C Mode
       The C2C mode of the GPU.

   GPU Reset Status
       Reset status of the GPU.	This functionality is deprecated.

       Reset Required Requested	functionality has been deprecated

       Drain and Reset Recommended
		      Requested	functionality has been deprecated

   GPU Recovery	Action
       Action to take to clear a fault that previously occurred.  It is not
       intended for determining which fault triggered the recovery action.
       Possible	values:	None, Reset, Reboot, Drain P2P,	Drain and Reset

       None

       No recovery action needed

       Reset

       Example scenario	- Uncontained HBM/SRAM UCE
       The GPU has encountered a fault that requires a reset to	recover.
       Terminate all GPU processes, reset the GPU using	'nvidia-smi  -r',  and
       the GPU can be used again by starting new GPU processes.

       Reboot

       Example scenario	- UVM fatal error
       The GPU has encountered a fault that may have left the OS in an incon-
       sistent state.
       Reboot the operating system to restore the OS back to a consistent
       state.
       Node reboot required.
       Application cannot restart without node reboot.
       OS warm reboot is sufficient (no need for AC/DC cycle).

       Drain P2P

       Example scenario	- N/A
       The GPU has encountered a fault that requires all peer-to-peer  traffic
       to be quiesced.
       Terminate  all GPU processes that conduct peer-to-peer traffic and dis-
       able UVM	persistence mode.
       Disable job scheduling (no new jobs), stop all applications when conve-
       nient, and if persistence mode is enabled, disable it.
       Once all peer-to-peer traffic is drained, query
       NVML_FI_DEV_GET_GPU_RECOVERY_ACTION again, which will return one of the
       other actions.
       If the action is still DRAIN_P2P, then reset the GPU.

       Drain and Reset

       Example scenario	- Contained HBM	UCE
       Reset Recommended.
       The GPU has encountered a fault that causes the GPU to temporarily op-
       erate at a reduced capacity, such as part of its frame buffer memory
       being offlined, or some of its MIG partitions being down.
       No new work should be scheduled on the GPU, but existing work that
       wasn't affected is safe to continue until it finishes or reaches a
       good checkpoint.
       Safe to restart application (memory capacity will be reduced due to dy-
       namic page offlining), but need to eventually reset (to get row remap).
       Asserted only for UCE row remaps.
       After all existing work has drained, reset the GPU to regain its full
       capacity.

   GSP Firmware	Version
       Firmware version of the GSP.  This is an alphanumeric string.

   PCI
       Basic  PCI  info	 for  the device.  Some	of this	information may	change
       whenever	cards are added/removed/moved in a system.  For	all products.

       Bus	      PCI bus number, in hex

       Device	      PCI device number, in hex

       Domain	      PCI domain number, in hex

       Base Classcode PCI Base classcode, in hex

       Sub Classcode  PCI Sub classcode, in hex

       Device Id      PCI vendor device	id, in hex

       Bus Id	      PCI bus id as "domain:bus:device.function", in hex

       Sub System Id  PCI Sub System id, in hex

   GPU Link Info
       The PCIe	link generation	and bus	width

       Max	      The maximum link generation and width possible with this
		      GPU and system configuration.  For example, if  the  GPU
		      supports	a  higher PCIe generation than the system sup-
		      ports then this reports the system PCIe generation.

       Current	      The current link generation and width.  These may	be re-
		      duced when the GPU is not	in use.

   Bridge Chip
       Information related to the Bridge Chip on the device.  The bridge chip
       firmware is only present on certain boards and may display "N/A" for
       some newer multiGPU boards.

       Type           The type of bridge chip.  Reported as N/A if it doesn't
                      exist.

       Firmware Version
                      The firmware version of the bridge chip.  Reported as
                      N/A if it doesn't exist.

   Replays Since Reset
       The number of PCIe replays since	reset.

   Replay Number Rollovers
       The number of PCIe replay number	rollovers since	reset. A replay	number
       rollover	 occurs	 after 4 consecutive replays and results in retraining
       the link.

   Tx Throughput
       The GPU-centric transmission throughput across the  PCIe	 bus  in  MB/s
       over the	past 20ms.  Only supported on Maxwell architectures and	newer.

   Rx Throughput
       The GPU-centric receive throughput across the PCIe bus in MB/s over the
       past 20ms.  Only	supported on Maxwell architectures and newer.

   Atomic Caps
       The PCIe	atomic capabilities of outbound/inbound	operations of the GPU.

   Fan Speed
       The  fan	speed value is the percent of the product's maximum noise tol-
       erance fan speed	that the device's fan is currently intended to run at.
       This value may exceed 100% in certain cases.  Note: The reported	 speed
       is the intended fan speed.  If the fan is physically blocked and	unable
       to  spin,  this output will not match the actual	fan speed.  Many parts
       do not report fan speeds	because	they rely on cooling via fans  in  the
       surrounding enclosure.  For all discrete	products with dedicated	fans.

   Performance State
       The current performance state for the GPU.  States range	from P0	(maxi-
       mum performance)	to P12 (minimum	performance).

   Clocks Event	Reasons
       Retrieves  information about factors that are reducing the frequency of
       clocks.

       If all event reasons are	returned as "Not Active" it means that	clocks
       are running as high as possible.

       Idle	      Nothing  is  running on the GPU and the clocks are drop-
		      ping to Idle state.  This	limiter	may be	removed	 in  a
		      later release.

       Application Clocks Setting
		      GPU  clocks  are limited by applications clocks setting.
		      E.g.  can	 be  changed   using   nvidia-smi   --applica-
		      tions-clocks=

       SW Power	Cap   SW  Power	Scaling	algorithm is reducing the clocks below
		      requested	clocks because the GPU is consuming  too  much
		      power.   E.g.  SW	 power	cap  limit can be changed with
		      nvidia-smi --power-limit=

       HW Slowdown    HW Slowdown (reducing the	core clocks by a factor	 of  2
		      or  more)	 is engaged.  HW Thermal Slowdown and HW Power
		      Brake will be displayed on Pascal+.

		      This is an indicator of:
		      *	Temperature being too high (HW Thermal Slowdown)
		      *	External Power Brake Assertion is triggered  (e.g.  by
		      the system power supply) (HW Power Brake Slowdown)
		      *	 Power draw is too high	and Fast Trigger protection is
		      reducing the clocks

       SW Thermal Slowdown
		      SW Thermal capping algorithm is  reducing	 clocks	 below
		      requested	 clocks	because	GPU temperature	is higher than
		      Max Operating Temp

   Sparse Operation Mode
       A flag that indicates whether sparse operation mode is enabled for  the
       GPU.  Value is either "Enabled" or "Disabled". Reported as "N/A"	if not
       supported.

   FB Memory Usage
       On-board	frame buffer memory information.  Reported total memory	can be
       affected	 by ECC	state.	If ECC does affect the total available memory,
       memory is decreased by several percent, due  to	the  requisite	parity
       bits.   The driver may also reserve a small amount of memory for	inter-
       nal use,	even without active work on the	GPU.  On  systems  where  GPUs
       are  NUMA  nodes,  the  accuracy	 of  FB	memory utilization provided by
       nvidia-smi depends on the memory	accounting of  the  operating  system.
       This is because FB memory is managed by the operating system instead of
       the  NVIDIA  GPU	driver.	 Typically, pages allocated from FB memory are
       not released even after the process terminates to enhance  performance.
       In  scenarios  where  the operating system is under memory pressure, it
       may resort to utilizing FB memory.  Such	actions	can result in discrep-
       ancies in the accuracy of memory	reporting.  For	all products.

       Total	      Total size of FB memory.

       Reserved	      Reserved size of FB memory.

       Used	      Used size	of FB memory.

       Free	      Available	size of	FB memory.

   BAR1	Memory Usage
       BAR1 is used to map the FB (device memory) so that it can  be  directly
       accessed	 by  the CPU or	by 3rd party devices (peer-to-peer on the PCIe
       bus).

       Total	      Total size of BAR1 memory.

       Used	      Used size	of BAR1	memory.

       Free	      Available	size of	BAR1 memory.

   Compute Mode
       The compute mode	flag indicates whether individual or multiple  compute
       applications may	run on the GPU.

       "Default" means multiple	contexts are allowed per device.

       "Exclusive  Process"  means only	one context is allowed per device, us-
       able from multiple threads at a time.

       "Prohibited" means no contexts  are  allowed  per  device  (no  compute
       apps).

       "EXCLUSIVE_PROCESS"  was	 added	in CUDA	4.0.  Prior CUDA releases sup-
       ported  only  one  exclusive  mode,  which  is  equivalent  to  "EXCLU-
       SIVE_THREAD" in CUDA 4.0	and beyond.

       For all CUDA-capable products.
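
       The current compute mode can be inspected with the COMPUTE display
       type, for example:

              nvidia-smi -q -d COMPUTE -i 0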

   Utilization
       Utilization  rates  report  how	busy each GPU is over time, and	can be
       used to determine how much an application is using the GPUs in the sys-
       tem.  Note: On MIG-enabled GPUs,	querying the utilization  of  encoder,
       decoder,	jpeg, ofa, gpu,	and memory is not currently supported.

       Note: During driver initialization when ECC is enabled one can see high
       GPU and Memory Utilization readings.  This is caused by the ECC Memory
       Scrubbing mechanism that is performed during driver initialization.

       GPU	      Percent of time over the past sample period during which
		      one or more kernels was executing	on the GPU.  The  sam-
		      ple  period  may	be between 1 second and	1/6 second de-
		      pending on the product.

       Memory	      Percent of time over the past sample period during which
		      global (device) memory was being read or	written.   The
		      sample period may	be between 1 second and	1/6 second de-
		      pending on the product.

       Encoder	      Percent of time over the past sample period during which
		      the  GPU's  video	 encoder was being used.  The sampling
		      rate is variable and can be obtained  directly  via  the
		      nvmlDeviceGetEncoderUtilization()	API

       Decoder	      Percent of time over the past sample period during which
		      the  GPU's  video	 decoder was being used.  The sampling
		      rate is variable and can be obtained  directly  via  the
		      nvmlDeviceGetDecoderUtilization()	API

       JPEG	      Percent of time over the past sample period during which
		      the  GPU's  JPEG	decoder	 was being used.  The sampling
		      rate is variable and can be obtained  directly  via  the
		      nvmlDeviceGetJpgUtilization() API

       OFA	      Percent of time over the past sample period during which
		      the GPU's	OFA (Optical Flow Accelerator) was being used.
		      The  sampling  rate  is variable and can be obtained di-
		      rectly via the nvmlDeviceGetOfaUtilization() API
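
       The utilization counters above can be sampled from the command line,
       for example (the field names follow --help-query-gpu and are assumed
       here):

              nvidia-smi --query-gpu=utilization.gpu,utilization.memory \
                      --format=csv -l 1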

   Encoder Stats
       Encoder Stats report the	count of active	encoder	sessions,  along  with
       the  average  Frames Per	Second (FPS) and average latency (in microsec-
       onds) for all these active sessions on this device.

       Active Sessions
		      The total	number of active encoder sessions on this  de-
		      vice.

       Average FPS    The average Frames Per Second (FPS) of all active en-
                      coder sessions on this device.

       Average Latency
		      The average latency in microseconds of  all  active  en-
		      coder sessions on	this device.

   DRAM	Encryption Mode
       A  flag that indicates whether DRAM Encryption support is enabled.  May
       be either "Enabled" or "Disabled".  Changes to DRAM Encryption mode re-
       quire a reboot.	Requires Inforom DRAM Encryption object.

       Current	      The DRAM Encryption mode that the	GPU is currently oper-
		      ating under.

       Pending	      The DRAM Encryption mode that the	GPU will operate under
		      after the	next reboot.

   ECC Mode
       A flag that indicates whether ECC support is enabled.   May  be	either
       "Enabled"  or  "Disabled".   Changes to ECC mode	require	a reboot.  Re-
       quires Inforom ECC object version 1.0 or	higher.

       Current	      The ECC mode that	the GPU	is currently operating under.

       Pending	      The ECC mode that	the GPU	will operate under  after  the
		      next reboot.

   ECC Errors
       NVIDIA  GPUs  can provide error counts for various types	of ECC errors.
       Some ECC	errors are either single or double bit,	where single  bit  er-
       rors  are  corrected  and double	bit errors are uncorrectable.  Texture
       memory errors may be correctable	via resend or uncorrectable if the re-
       send fails.  These errors are available across two timescales (volatile
       and aggregate).	Single bit ECC errors are automatically	 corrected  by
       the HW and do not result	in data	corruption.  Double bit	errors are de-
       tected  but not corrected.  Please see the ECC documents	on the web for
       information on compute application behavior when	double bit errors  oc-
       cur.  Volatile error counters track the number of errors	detected since
       the  last driver	load.  Aggregate error counts persist indefinitely and
       thus act	as a lifetime counter.

       A note about volatile counts: On	Windows	this is	 once  per  boot.   On
       Linux  this  can	be more	frequent.  On Linux the	driver unloads when no
       active clients exist.  Hence, if	persistence mode is enabled  or	 there
       is  always a driver client active (e.g. X11), then Linux	also sees per-
       boot behavior.  If not, volatile	counts are reset each time  a  compute
       app is run.

       Tesla and Quadro	products pre-volta can display total ECC error counts,
       as  well	 as  a breakdown of errors based on location on	the chip.  The
       locations are described below.  Location-based data for aggregate error
       counts requires Inforom ECC object version 2.0.	All other  ECC	counts
       require ECC object version 1.0.

       Device Memory  Errors detected in global	device memory.

       Register	File  Errors detected in register file memory.

       L1 Cache	      Errors detected in the L1	cache.

       L2 Cache	      Errors detected in the L2	cache.

       Texture Memory Parity errors detected in	texture	memory.

       Total	      Total  errors detected across entire chip. Sum of	Device
		      Memory, Register File, L1	Cache, L2  Cache  and  Texture
		      Memory.

       On Turing the output is such:

       SRAM Correctable
		      Number  of  correctable  errors  detected	 in any	of the
		      SRAMs

       SRAM Uncorrectable
		      Number of	uncorrectable errors detected in  any  of  the
		      SRAMs

       DRAM Correctable
		      Number of	correctable errors detected in the DRAM

       DRAM Uncorrectable
		      Number of	uncorrectable errors detected in the DRAM

       On Ampere+ the categorization of SRAM errors has been expanded.  SRAM
       errors are now categorized as either parity or SEC-DED (single error
       correctable/double error detectable) depending on which unit hit the
       error.  A histogram has been added that categorizes which unit hit the
       SRAM error.  Additionally, a flag has been added that indicates if the
       threshold for the specific SRAM has been exceeded.

       SRAM Uncorrectable Parity
		      Number  of  uncorrectable	 errors	detected in SRAMs that
		      are parity protected

       SRAM Uncorrectable SEC-DED
		      Number of	uncorrectable errors detected  in  SRAMs  that
		      are SEC-DED protected

       Aggregate Uncorrectable SRAM Sources

       SRAM L2	      Errors that occurred in the L2 cache

       SRAM SM	      Errors that occurred in the SM

       SRAM Microcontroller
		      Errors  that  occurred  in  a  microcontroller  (PMU/GSP
		      etc...)

       SRAM PCIE      Errors that occurred in any PCIE related unit

       SRAM Other     Errors occurring in anything else not covered above

   Page	Retirement
       NVIDIA GPUs can retire pages of GPU device memory when they become  un-
       reliable.   This	 can  happen when multiple single bit ECC errors occur
       for the same page, or on	a double bit ECC error.	 When a	 page  is  re-
       tired,  the NVIDIA driver will hide it such that	no driver, or applica-
       tion memory allocations can access it.

       Double Bit ECC The number of GPU	device memory pages that have been re-
       tired due to a double bit ECC error.

       Single Bit ECC The number of GPU	device memory pages that have been re-
       tired due to multiple single bit	ECC errors.

       Pending Checks if any GPU device	memory pages are pending blacklist  on
       the  next  reboot.   Pages that are retired but not yet blacklisted can
       still be	allocated, and may cause further reliability issues.
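
       Retired and pending page counts can be inspected with the
       PAGE_RETIREMENT display type, for example:

              nvidia-smi -q -d PAGE_RETIREMENT -i 0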

   Row Remapper
       NVIDIA GPUs can remap rows of GPU device	memory when they become	 unre-
       liable.	 This can happen when a	single uncorrectable ECC error or mul-
       tiple correctable ECC errors occur on the same  row.   When  a  row  is
       remapped,  the  NVIDIA  driver  will remap the faulty row to a reserved
       row.  All future	accesses to the	row will access	the reserved  row  in-
       stead of the faulty row.  This feature is available on Ampere+.

       Correctable  Error  The	number	of rows	that have been remapped	due to
       correctable ECC errors.

       Uncorrectable Error The number of rows that have	been remapped  due  to
       uncorrectable ECC errors.

       Pending	Indicates  whether  or not a row is pending remapping. The GPU
       must be reset for the remapping to go into effect.

       Remapping Failure Occurred Indicates whether or not a row remapping has
       failed in the past.

       Bank Remap Availability Histogram Each memory bank has a	 fixed	number
       of  reserved  rows  that	 can be	used for row remapping.	 The histogram
       will classify the remap availability of each bank into  Maximum,	 High,
       Partial,	 Low  and  None.  Maximum availability means that all reserved
       rows are	available for remapping	while None means that no reserved rows
       are available.  Correctable row remappings don't count towards the
       availability histogram since row remappings due to correctable errors
       can be evicted by an uncorrectable row remapping.

   Temperature
       Readings	from temperature sensors on the	board.	All  readings  are  in
       degrees C.  Not all products support all	reading	types.	In particular,
       products	in module form factors that rely on case fans or passive cool-
       ing  do	not  usually  provide temperature readings.  See below for re-
       strictions.

       T.Limit: The T.Limit sensor measures the current margin in degrees Cel-
       sius to the maximum operating temperature.  As such it is not an abso-
       lute temperature reading but rather a relative measurement.

       Not all products	support	T.Limit	sensor readings.

       When supported, nvidia-smi reports the current T.Limit temperature as a
       signed value that counts	down. A	T.Limit	temperature of 0  C  or	 lower
       indicates  that	the GPU	may optimize its clock based on	thermal	condi-
       tions. Further, when the	T.Limit	sensor is supported, available temper-
       ature thresholds	are also reported relative to T.Limit (see below)  in-
       stead of	absolute measurements.

       GPU	      Core  GPU	 temperature.	For  all  discrete and S-class
		      products.

       T.Limit Temp   Current margin in	degrees	Celsius	from the  maximum  GPU
		      operating	temperature.

       Shutdown	Temp  The temperature at which a GPU will shutdown.

       Shutdown	T.Limit	Temp
		      The  T.Limit temperature below which a GPU may shutdown.
                      Since shutdown can only be triggered by the maximum GPU
                      temperature, it is possible for the current T.Limit to
                      be more negative than this threshold.

       Slowdown	Temp  The  temperature at which	a GPU HW will begin optimizing
		      clocks due to thermal conditions,	in order to cool.

       Slowdown	T.Limit	Temp
		      The T.Limit temperature below which a GPU	HW  may	 opti-
		      mize its clocks for thermal conditions. Since this clock
                      adjustment can only be triggered by the maximum GPU tem-
                      perature, it is possible for the current T.Limit to be
                      more negative than this threshold.

       Max Operating Temp
		      The temperature at which GPU SW will optimize its	 clock
		      for thermal conditions.

       Max Operating T.Limit Temp
		      The T.Limit temperature below which GPU SW will optimize
		      its clock	for thermal conditions.

   Power Readings
       Power  readings	help  to  shed light on	the current power usage	of the
       GPU, and	the factors that affect	that usage.  When power	management  is
       enabled the GPU limits power draw under load to fit within a predefined
       power  envelope by manipulating the current performance state.  See be-
       low for limits of availability.	Please note that  power	 readings  are
       not applicable for Pascal and higher GPUs with BA sensor	boards.

       Power State    Power  State  is deprecated and has been renamed to Per-
		      formance State in	2.285.	To maintain XML	compatibility,
		      in XML  format  Performance  State  is  listed  in  both
		      places.

       Power Management
		      A	 flag  that  indicates whether power management	is en-
		      abled.  Either "Supported" or "N/A".   Requires  Inforom
		      PWR object version 3.0 or	higher or Kepler device.

       Instantaneous Power Draw
		      The  last	 measured  power draw for the entire board, in
		      watts.  Only available if	power management is supported.

       Average Power Draw
		      The average power	draw for the entire board for the last
		      second, in watts.	 Only available	if power management is
		      supported.

       Power Limit    The software power limit,	in  watts.   Set  by  software
		      such  as nvidia-smi.  Only available if power management
		      is supported.  Requires Inforom PWR object  version  3.0
		      or  higher  or  Kepler  device.  On Kepler devices Power
		      Limit can	be adjusted using -pl,--power-limit= switches.

       Enforced	Power Limit
		      The  power  management  algorithm's  power  ceiling,  in
		      watts.   Total  board  power  draw is manipulated	by the
		      power management algorithm such that it stays under this
		      value.  This limit is the	minimum	of various limits such
		      as the software limit listed above.  Only	 available  if
		      power  management	 is  supported.	 Requires a Kepler de-
		      vice.  Please note that for boards without INA  sensors,
		      it is the	GPU power draw that is being manipulated.

       Default Power Limit
		      The  default power management algorithm's	power ceiling,
		      in watts.	 Power Limit will be set back to Default Power
		      Limit after driver unload.  Only	on  supported  devices
		      from Kepler family.

       Min Power Limit
		      The  minimum  value in watts that	power limit can	be set
		      to.  Only	on supported devices from Kepler family.

       Max Power Limit
		      The maximum value	in watts that power limit can  be  set
		      to.  Only	on supported devices from Kepler family.

   Power Smoothing
       Power  Smoothing	 related  definitions  and currently set values.  This
       feature allows users to tune power parameters to	minimize power fluctu-
       ations in large datacenter environments.

       Enabled	      Value is "Yes" if	the feature is enabled and "No"	if the
		      feature is not enabled.

       Privilege Level
		      The current privilege for	the user. Value	is 0, 1	or  2.
		      Note  that  the higher the privilege level, the more in-
		      formation	the user will have access to.

       Immediate Ramp Down
		      Values are "Enabled" or "Disabled".  Indicates  if  ramp
		      down  hysteresis value will be honored (when enabled) or
		      ignored (when disabled).

       Current TMP    The last read value of the Total Module Power, in	watts.

       Current TMP Floor
		      The last read value of the Total Module Power floor, in
		      watts.  This value is calculated as TMP Ceiling * (% TMP
		      Floor).

       Max % TMP Floor
		      The highest percentage value to which the Percent TMP
		      Floor can be set.

       Min % TMP Floor
		      The lowest percentage value to which the Percent TMP
		      Floor can be set.

       HW Lifetime % Remaining
		      As this feature is used, the circuitry which drives  the
		      feature  wears down.  This value gives the percentage of
		      the remaining lifetime of	this hardware.

       Number of Preset	Profiles
		      This value is the	total number of	Preset	Profiles  sup-
		      ported.

   Current Profile
       Values for the currently active power smoothing preset profile.

       % TMP Floor    The  percentage of the TMP Ceiling, which	is used	to set
		      the TMP floor, for the currently active preset  profile.
		      For  example,  if	max TMP	is 1000	W, and the % TMP floor
		      is 50%, then the min TMP value  will  be	500  W.	  This
		      value  is	 in  the  range	 [Min  %  TMP Floor, Max % TMP
		      Floor].

       Ramp Up Rate   The ramp up rate,	measured in mW/s,  for	the  currently
		      active preset profile.

       Ramp Down Rate The  ramp	down rate, measured in mW/s, for the currently
		      active preset profile.

       Ramp Down Hysteresis
		      The ramp down hysteresis value, in ms, for the currently
		      active preset profile.

       Active Preset Profile Number
		      The number of the	active preset profile.

   Admin Overrides
       Admin overrides allow users with	sufficient permissions to preempt  the
       values of the currently active preset profile.  If an admin override is
       set  for	one of the fields, then	this value will	be used	instead	of any
       other configured	value.

       % TMP Floor    The admin	override value for % TMP Floor.	 This value is
		      in the range [Min	% TMP Floor, Max % TMP Floor].

       Ramp Up Rate   The admin	override value for ramp	up rate,  measured  in
		      mW/s.

       Ramp Down Rate The admin	override value for ramp	down rate, measured in
		      mW/s.

       Ramp Down Hysteresis
		      The admin	override value for ramp	down hysteresis	value,
		      in ms.

   Workload Power Profiles
       Pre-tuned GPU profiles help to provide immediate, optimized configura-
       tions for datacenter use cases.  This section includes information
       about the currently requested and enforced power profiles.

       Requested Profiles
		      The list of user requested profiles.

       Enforced	Profiles
		      Since  many of the profiles have conflicting goals, some
		      configurations of	requested profiles  are	 incompatible.
		      This  is	the  list  of the requested profiles which are
		      currently	enforced.

   Clocks
       Current frequency at which parts	of the GPU are running.	 All  readings
       are in MHz.

       Graphics	      Current frequency	of graphics (shader) clock.

       SM	      Current	frequency  of  SM  (Streaming  Multiprocessor)
		      clock.

       Memory	      Current frequency	of memory clock.

       Video	      Current frequency	of video (encoder + decoder) clocks.

   Applications	Clocks
       User specified frequency at which applications will run.  Can be
       changed with [-ac | --applications-clocks] switches.

       Graphics	      User specified frequency of graphics (shader) clock.

       Memory	      User specified frequency of memory clock.

   Default Applications	Clocks
       Default frequency at which applications will run.  Application clocks
       can be changed with [-ac | --applications-clocks] switches.  Applica-
       tion clocks can be set to default using [-rac | --reset-applica-
       tions-clocks] switches.

       Graphics	      Default  frequency  of  applications  graphics  (shader)
		      clock.

       Memory	      Default frequency	of applications	memory clock.
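
       For example, assuming the pair 3004,875 (memory,graphics in MHz) ap-
       pears in the GPU's Supported Clocks list, application clocks could be
       set and later restored to their defaults with:

       nvidia-smi -ac 3004,875

       nvidia-smi -rac

       The clock values shown are illustrative and vary by GPU.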

   Max Clocks
       Maximum frequency at which parts of the GPU are designed to run.  All
       readings are in MHz.

       On GPUs from the Fermi family, current P0 clocks (reported in the
       Clocks section) can differ from max clocks by a few MHz.

       Graphics	      Maximum frequency	of graphics (shader) clock.

       SM	      Maximum  frequency  of  SM  (Streaming   Multiprocessor)
		      clock.

       Memory	      Maximum frequency	of memory clock.

       Video	      Maximum frequency	of video (encoder + decoder) clock.

   Clock Policy
       User-specified  settings	 for  automated	 clocking changes such as auto
       boost.

       Auto Boost     Indicates	whether	auto boost mode	is  currently  enabled
		      for  this	GPU (On) or disabled for this GPU (Off). Shows
		      (N/A) if boost is	not supported. Auto boost  allows  dy-
		      namic  GPU clocking based	on power, thermal and utiliza-
		      tion. When auto boost is disabled	the GPU	 will  attempt
		      to  maintain clocks at precisely the Current Application
		      Clocks settings (whenever	a  CUDA	 context  is  active).
		      With  auto  boost	 enabled the GPU will still attempt to
		      maintain this floor, but will opportunistically boost to
		      higher clocks when power,	thermal	and utilization	 head-
		      room  allow.  This  setting persists for the life	of the
		      CUDA context for which it	was requested.	Apps  can  re-
		      quest  a	particular  mode  either via an	NVML call (see
		      NVML SDK)	or by setting the  CUDA	 environment  variable
		      CUDA_AUTO_BOOST.

       Auto Boost Default
		      Indicates	 the  default setting for auto boost mode, ei-
		      ther enabled (On)	or  disabled  (Off).  Shows  (N/A)  if
		      boost  is	 not  supported.  Apps will run	in the default
		      mode if they have	not explicitly requested a  particular
		      mode.  Note: Auto Boost settings can only be modified if
		      "Persistence Mode" is enabled, which is NOT enabled by
		      default.

   Supported clocks
       List of possible memory and graphics clock combinations that the GPU
       can operate on (not taking into account HW brake reduced clocks).
       These are the only clock combinations that can be passed to the --ap-
       plications-clocks flag.  Supported Clocks are listed only when -q -d
       SUPPORTED_CLOCKS switches are provided or in XML format.
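
       For example, the supported clock pairs for a single GPU (index illus-
       trative) can be listed with:

       nvidia-smi -q -i 0 -d SUPPORTED_CLOCKS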

   Voltage
       Current voltage reported	by the GPU. All	units are in mV.

       Graphics	      Current voltage of the graphics unit. This field is dep-
		      recated and always displays "N/A". Voltage will  be  re-
		      moved in a future	release.

   Fabric
       GPU Fabric information

       State

       Indicates  the  state of	the GPU's handshake with the nvidia-fabricman-
       ager (a.k.a. GPU	fabric probe)
       Possible	values:	Completed, In Progress,	Not Started, Not supported

       Status

       Status of the GPU fabric	probe response from the	nvidia-fabricmanager.
       Possible	values:	NVML_SUCCESS or	one of the failure codes.

       Clique ID

       A clique	is a set of GPUs that  can  communicate	 to  each  other  over
       NVLink.
       The GPUs	belonging to the same clique share the same clique ID.
       Clique ID will only be valid for	NVLink multi-node systems.

       Cluster UUID

       UUID of an NVLink multi-node cluster to which this GPU belongs.
       Cluster UUID will be zero for NVLink single-node	systems.

       Health

       Bandwidth - whether the GPU NVLink bandwidth is degraded <True/False>
       Route Recovery in progress - whether NVLink route recovery is in
       progress <True/False>
       Route Unhealthy - whether NVLink route recovery has failed or been
       aborted <True/False>
       Access Timeout Recovery - whether NVLink access timeout recovery is in
       progress <True/False>

   Processes
       List  of	 processes  having  Compute or Graphics	Context	on the device.
       Compute processes are reported on all the fully supported products. Re-
       porting for Graphics processes is limited  to  the  supported  products
       starting	with Kepler architecture.

       Each entry is of the format "<GPU Index> <GPU Instance Index> <Compute
       Instance Index> <PID> <Type> <Process Name> <GPU Memory Usage>"

       GPU Index      Represents NVML Index of the device.

       GPU Instance Index
		      Represents  GPU Instance Index of	the MIG	device (if en-
		      abled).

       Compute Instance	Index
		      Represents Compute Instance Index	of the MIG device  (if
		      enabled).

       PID	      Represents  Process  ID corresponding to the active Com-
		      pute or Graphics context.

       Type	      Displayed	as "C" for Compute Process, "G"	 for  Graphics
		      Process,	"M"  for MPS ("Multi-Process Service") Compute
		      Process, and "C+G" or "M+C" for the process having  both
		      Compute  and  Graphics  or  MPS Compute and Compute con-
		      texts.

       Process Name   Represents process name  for  the	 Compute  or  Graphics
		      process.

       GPU Memory Usage
		      Amount of memory used on the device by the context.  Not
		      available on Windows when running in WDDM mode, because
		      the Windows KMD manages all the memory, not the NVIDIA
		      driver.

   Device Monitoring
       The "nvidia-smi dmon" command-line is used to monitor one or more  GPUs
       (up  to	16 devices) plugged into the system. This tool allows the user
       to see one line of monitoring data per monitoring cycle.	The output  is
       in concise format and easy to interpret in interactive mode. The	output
       data  per  line	is  limited  by	 the terminal size. It is supported on
       Tesla, GRID, Quadro and limited GeForce products	for  Kepler  or	 newer
       GPUs  under  bare  metal	64 bits	Linux. By default, the monitoring data
       includes	Power Usage, Temperature, SM clocks, Memory  clocks  and  Uti-
       lization	 values	for SM,	Memory,	Encoder, Decoder, JPEG and OFA.	It can
       also be configured to report other metrics such as frame	buffer	memory
       usage,  bar1  memory usage, power/thermal violations and	aggregate sin-
       gle/double bit ecc errors. If any of the metrics is not supported on
       the device, or if any other error occurs while fetching a metric, it is
       reported as "-" in the output data. The user can also configure moni-
       toring frequency and
       the number of monitoring	iterations for each run. There is also an  op-
       tion  to	 include date and time at each line. All the supported options
       are exclusive and can be	used together in any order.  Note: On  MIG-en-
       abled  GPUs,  querying  the utilization of encoder, decoder, jpeg, ofa,
       gpu, and	memory is not currently	supported.

       Usage:

       1) Default with no arguments

       nvidia-smi dmon

       Monitors	default	metrics	for up to 16 supported devices under natural
       enumeration (starting with GPU index 0) at a frequency of 1 sec.	Runs
       until terminated	with ^C.

       2) Select one or	more devices

       nvidia-smi dmon -i <device1,device2, .. , deviceN>

       Reports default metrics for the devices selected	by comma separated de-
       vice list. The tool picks up to 16 supported devices from the list un-
       der natural enumeration (starting with GPU index	0).

       3) Select metrics to be displayed

       nvidia-smi dmon -s <metric_group>

       <metric_group> can be one or more from the following:

	   p - Power Usage (in Watts) and GPU/Memory Temperature (in C)	if
       supported

	   u - Utilization (SM,	Memory,	Encoder, Decoder, JPEG and OFA Uti-
       lization	in %)

	   c - Proc and	Mem Clocks (in MHz)

	   v - Power Violations	(in %) and Thermal Violations (as a boolean
       flag)

	   m - Frame Buffer, Bar1 and Confidential Compute protected memory
       usage (in MB)

	   e - ECC (Number of aggregated single	bit, double bit	ecc errors)
       and PCIe	Replay errors

	   t - PCIe Rx and Tx Throughput in MB/s (Maxwell and above)

       4) Configure monitoring iterations

       nvidia-smi dmon -c <number of samples>

       Displays data for the specified number of samples and exits.

       5) Configure monitoring frequency

       nvidia-smi dmon -d <time	in secs>

       Collects	and displays data at every specified monitoring	interval until
       terminated with ^C.

       6) Display date

       nvidia-smi dmon -o D

       Prepends	monitoring data	with date in YYYYMMDD format.

       7) Display time

       nvidia-smi dmon -o T

       Prepends	monitoring data	with time in HH:MM:SS format.

       8) Select GPM metrics to	be displayed

       nvidia-smi dmon --gpm-metrics <gpmMetric1,gpmMetric2,...,gpmMetricN>

       <gpmMetricX> Refer to the documentation for nvmlGpmMetricId_t in	the
       NVML header file

       9) Select which level of	GPM metrics to be displayed

       nvidia-smi dmon --gpm-options <gpmMode>

       <gpmMode> can be	one of the following:

	   d  -	Display	Device Level GPM metrics

	   m  -	Display	MIG Level GPM metrics

	   dm -	Display	Device and MIG Level GPM metrics

	   md -	Display	Device and MIG Level GPM metrics, same as 'dm'

       10) Modify output format

       nvidia-smi dmon --format	<formatSpecifier>

       <formatSpecifier> can be	any comma separated combination	of the follow-
       ing:

	   csv - Format	dmon output as CSV

	   nounit - Remove unit	line from dmon output

	   noheader - Remove header line from dmon output

       11) Help	Information

       nvidia-smi dmon -h

       Displays	help information for using the command line.
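
       For example, the options above can be combined (the device index, met-
       ric groups, interval and sample count below are arbitrary):

       nvidia-smi dmon -i 0 -s puc -d 5 -c 120 -o T

       This monitors GPU 0, printing power/temperature, utilization and clock
       metrics every 5 seconds for 120 samples, with the time prepended to
       each line.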

   Daemon (EXPERIMENTAL)
       The "nvidia-smi daemon" starts a	background process to monitor  one  or
       more GPUs plugged into the system. It monitors the requested GPUs
       every monitoring cycle and logs the data in compressed format to a file
       at the user provided path or the default location /var/log/nvstats/.
       The log file is created with the system's date appended to it, in the
       format nvstats-YYYYMMDD. The flush operation to the log file is done
       every alternate monitoring cycle. The daemon also logs its own PID at
       /var/run/nvsmi.pid. By default, the monitoring data to persist includes
       Power Usage, Temperature, SM clocks, Memory clocks and Utilization val-
       ues  for	 SM,  Memory, Encoder, Decoder,	JPEG and OFA. The daemon tools
       can also	be configured to record	other metrics  such  as	 frame	buffer
       memory usage, bar1 memory usage,	power/thermal violations and aggregate
       single/double bit ecc errors. The default monitoring cycle is set to 10
       secs and	can be configured via command-line. It is supported on	Tesla,
       GRID,  Quadro  and GeForce products for Kepler or newer GPUs under bare
       metal 64	bits Linux. The	daemon requires	root privileges	 to  run,  and
       only  supports running a	single instance	on the system. All of the sup-
       ported options are exclusive and	can be used  together  in  any	order.
       Note: On MIG-enabled GPUs, querying the utilization of encoder, de-
       coder, jpeg, ofa, gpu, and memory is not currently supported.

       Usage:

       1) Default with no arguments

       nvidia-smi daemon

       Runs in the background to monitor default metrics for up	to 16 sup-
       ported devices under natural enumeration	(starting with GPU index 0) at
       a frequency of 10 sec. The date stamped log file	is created at
       /var/log/nvstats/.

       2) Select one or	more devices

       nvidia-smi daemon -i <device1,device2, .. , deviceN>

       Runs in the background to monitor default metrics for the devices se-
       lected by comma separated device	list. The tool picks up	to 16 sup-
       ported devices from the list under natural enumeration (starting	with
       GPU index 0).

       3) Select metrics to be monitored

       nvidia-smi daemon -s <metric_group>

       <metric_group> can be one or more from the following:

	   p - Power Usage (in Watts) and GPU/Memory Temperature (in C)	if
       supported

	   u - Utilization (SM,	Memory,	Encoder, Decoder, JPEG and OFA Uti-
       lization	in %)

	   c - Proc and	Mem Clocks (in MHz)

	   v - Power Violations	(in %) and Thermal Violations (as a boolean
       flag)

	   m - Frame Buffer, Bar1 and Confidential Compute protected memory
       usage (in MB)

	    e -	ECC (Number of aggregated single bit, double bit ecc errors)
       and PCIe	Replay errors

	   t - PCIe Rx and Tx Throughput in MB/s (Maxwell and above)

       4) Configure monitoring frequency

       nvidia-smi daemon -d <time in secs>

       Collects	data at	every specified	monitoring interval until terminated.

       5) Configure log	directory

       nvidia-smi daemon -p <path of directory>

       The log files are created at the	specified directory.

       6) Configure log	file name

       nvidia-smi daemon -j <string to append log file name>

       The command-line	is used	to append the log file name with the user pro-
       vided string.

       7) Terminate the	daemon

       nvidia-smi daemon -t

       This command-line uses the stored PID (at /var/run/nvsmi.pid) to	termi-
       nate the	daemon.	It makes the best effort to stop the daemon and	offers
       no guarantees for it's termination. In case the daemon is not termi-
       nated, then the user can	manually terminate by sending kill signal to
       the daemon. Performing a	GPU reset operation (via nvidia-smi) requires
       all GPU processes to be exited, including the daemon. Users who have
       the daemon open will see	an error to the	effect that the	GPU is busy.

       8) Help Information

       nvidia-smi daemon -h

       Displays	help information for using the command line.
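
       For example, to run the daemon for two GPUs, recording power/tempera-
       ture and utilization data every 30 seconds into a custom directory
       (all values are illustrative; requires root):

       nvidia-smi daemon -i 0,1 -s pu -d 30 -p /tmp/nvstats

       nvidia-smi daemon -t

       The second command later terminates the running daemon using the PID
       stored at /var/run/nvsmi.pid.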

   Replay Mode (EXPERIMENTAL)
       The "nvidia-smi replay" command-line is used to extract/replay all or
       parts of the log file generated by the daemon. By default, the tool
       tries to pull the metrics such as Power Usage, Temperature, SM clocks,
       Memory clocks and Utilization values for SM, Memory, Encoder, Decoder,
       JPEG and OFA. The replay tool can also fetch other metrics such as
       frame buffer memory usage, bar1 memory usage, power/thermal violations
       and aggregate single/double bit ecc errors. There is an option to se-
       lect a set of metrics to replay. If any of the requested metrics is
       not maintained, or was logged as not supported, it is shown as "-" in
       the output. The data produced by this mode is formatted as if the user
       were running the device monitoring utility interactively. The command
       line requires the mandatory option "-f" to specify the complete path
       of the log file name; all the other supported options are exclusive
       and can be used together in any order. Note: On MIG-enabled GPUs,
       querying the utilization of encoder, decoder, jpeg, ofa, gpu, and mem-
       ory is not currently supported.

       Usage:

       1) Specify log file to be replayed

       nvidia-smi replay -f <log file name>

       Fetches monitoring data from the	compressed log file and	allows the
       user to see one line of monitoring data (default	metrics	with time-
       stamp) for each monitoring iteration stored in the log file. A new line
       of monitoring data is replayed every other second irrespective of the
       actual monitoring frequency maintained at the time of collection. It is
       displayed till the end of file or until terminated by ^C.

       2) Filter metrics to be replayed

       nvidia-smi replay -f <path to log file> -s <metric_group>

       <metric_group> can be one or more from the following:

	   p - Power Usage (in Watts) and GPU/Memory Temperature (in C)	if
       supported

	   u - Utilization (SM,	Memory,	Encoder, Decoder, JPEG and OFA Uti-
       lization	in %)

	   c - Proc and	Mem Clocks (in MHz)

	   v - Power Violations	(in %) and Thermal Violations (as a boolean
       flag)

	   m - Frame Buffer, Bar1 and Confidential Compute protected memory
       usage (in MB)

	    e -	ECC (Number of aggregated single bit, double bit ecc errors)
       and PCIe	Replay errors

	   t - PCIe Rx and Tx Throughput in MB/s (Maxwell and above)

       3) Limit	replay to one or more devices

       nvidia-smi replay -f <log file> -i <device1,device2, .. , deviceN>

       Limits reporting	of the metrics to the set of devices selected by comma
       separated device	list. The tool skips any of the	devices	not maintained
       in the log file.

       4) Restrict the time frame between which	data is	reported

       nvidia-smi replay -f <log file> -b <start time in HH:MM:SS format> -e
       <end time in HH:MM:SS format>

       This option allows the data to be limited to the specified time range.
       Specifying the time as 0 with the -b or -e option implies the start or
       end of the file, respectively.

       5) Redirect replay information to a log file

       nvidia-smi replay -f <log file> -r <output file name>

       This option takes log file as an	input and extracts the information re-
       lated to	default	metrics	in the specified output	file.

       6) Help Information

       nvidia-smi replay -h

       Displays	help information for using the command line.
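
       For example, to replay power/temperature and clock metrics for GPU 0
       from a daemon log, restricted to a morning time window (the file name
       and times are illustrative):

       nvidia-smi replay -f /var/log/nvstats/nvstats-20240101 -s pc -i 0
       -b 09:00:00 -e 10:30:00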

   Process Monitoring
       The "nvidia-smi pmon" command-line  is  used  to	 monitor  compute  and
       graphics	 processes  running  on	 one  or  more GPUs (up	to 16 devices)
       plugged into the	system.	This tool allows the user to see  the  statis-
       tics  for  all the running processes on each device at every monitoring
       cycle. The output is in concise format and easy to interpret in	inter-
       active  mode. The output	data per line is limited by the	terminal size.
       It is supported on Tesla, GRID, Quadro and limited GeForce products for
       Kepler or newer GPUs under bare metal 64	bits Linux.  By	 default,  the
       monitoring data for each	process	includes the pid, command name and av-
       erage  utilization values for SM, Memory, Encoder and Decoder since the
       last monitoring cycle. It can also be configured	to report frame	buffer
       memory usage for	each process. If there is no process running  for  the
       device, then all the metrics are reported as "-" for the device. If
       any of the metrics is not supported on the device, or if any other er-
       ror occurs while fetching a metric, it is also reported as "-" in the
       output data. The
       user can	also configure monitoring frequency and	the number of monitor-
       ing  iterations	for  each run. There is	also an	option to include date
       and time	at each	line. All the supported	options	are exclusive and  can
       be used together	in any order.  Note: On	MIG-enabled GPUs, querying the
       utilization of encoder, decoder,	jpeg, ofa, gpu,	and memory is not cur-
       rently supported.

       Usage:

       1) Default with no arguments

       nvidia-smi pmon

       Monitors	all the	processes running on each device for up	to 16 sup-
       ported devices under natural enumeration	(starting with GPU index 0) at
       a frequency of 1	sec. Runs until	terminated with	^C.

       2) Select one or	more devices

       nvidia-smi pmon -i <device1,device2, .. , deviceN>

       Reports statistics for all the processes	running	on the devices se-
       lected by comma separated device	list. The tool picks up	to 16 sup-
       ported devices from the list under natural enumeration (starting	with
       GPU index 0).

       3) Select metrics to be displayed

       nvidia-smi pmon -s <metric_group>

       <metric_group> can be one or more from the following:

	   u - Utilization (SM,	Memory,	Encoder, Decoder, JPEG,	and OFA	Uti-
       lization	for the	process	in %). Reports average utilization since last
       monitoring cycle.

	   m - Frame Buffer and	Confidential Compute protected memory usage
       (in MB).	Reports	instantaneous value for	memory usage.

       4) Configure monitoring iterations

       nvidia-smi pmon -c <number of samples>

       Displays data for the specified number of samples and exits.

       5) Configure monitoring frequency

       nvidia-smi pmon -d <time	in secs>

       Collects	and displays data at every specified monitoring	interval until
       terminated with ^C. The monitoring frequency must be between 1 to 10
       secs.

       6) Display date

       nvidia-smi pmon -o D

       Prepends	monitoring data	with date in YYYYMMDD format.

       7) Display time

       nvidia-smi pmon -o T

       Prepends	monitoring data	with time in HH:MM:SS format.

       8) Help Information

       nvidia-smi pmon -h

       Displays	help information for using the command line.
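
       For example, to watch per-process utilization and frame buffer usage
       on GPU 0 every 5 seconds for 60 samples, with the time prepended (all
       values are illustrative):

       nvidia-smi pmon -i 0 -s um -d 5 -c 60 -o T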

   Topology
       List  topology information about	the system's GPUs, how they connect to
       each other, their CPU and memory	affinities as well as  qualified  NICs
       capable of RDMA.

       Note: On	some systems, a	NIC is used as a PCI bridge for	the NVLINK
       switches	and is not useful from a networking or RDMA point of view. The
       nvidia-smi topo command will filter the NIC's ports/PCIe	sub-functions
       out of the topology matrix by examining the NIC's sysfs entries.	On
       some kernel versions, nvidia-smi	requires root privileges to read these
       sysfs entries.

       Usage:

       1) Topology connections and affinities matrix between the GPUs and NICs
       in the system

       nvidia-smi topo -m

       Displays	a matrix of connections	between	all GPUs and NICs in the sys-
       tem along with CPU/memory affinities for	the GPUs with the following
       legend:

       Legend:

			X    = Self
			SYS   =	 Connection traversing PCIe as well as the SMP
		      interconnect between NUMA	nodes (e.g., QPI/UPI)
			NODE = Connection traversing PCIe as well as  the  in-
		      terconnect between PCIe Host Bridges within a NUMA node
			PHB   =	 Connection  traversing	PCIe as	well as	a PCIe
		      Host Bridge (typically the CPU)
			PXB  = Connection traversing  multiple	PCIe  switches
		      (without traversing the PCIe Host	Bridge)
			PIX  = Connection traversing a single PCIe switch
			NV#  = Connection traversing a bonded set of # NVLinks

       Note: This command may also display bonded NICs which may not be	RDMA
       capable.

       2) PCI-only topology connections	and affinities matrix between GPUs and
       NICs in the system

       nvidia-smi topo -mp

       Displays	a matrix of PCI-only connections between all GPUs and NICs in
       the system along	with CPU/memory	affinities for the GPUs	with the same
       legend as the "nvidia-smi topo -m" command. This	command	excludes
       NVLINK connections and shows PCI	connections between GPUs.

       3) Display GPUs with affinity to	a given	CPU

       nvidia-smi topo -c <CPU number>

       Shows all the GPUs with an affinity to the specified CPU	number.

       4) Display the nearest GPUs for a given traversal path

       nvidia-smi topo -n <traversal path> -i <deviceID>

       Shows all the GPUs connected with the given GPU using the specified
       traversal path. The traversal path values are:
	   0 = A single PCIe switch on a dual GPU board
	   1 = A single PCIe switch
	   2 = Multiple PCIe switches
	   3 = A PCIe host bridge
	   4 = An on-CPU interconnect link between PCIe host bridges
	   5 = An SMP interconnect link between NUMA nodes

       5) Direct PCIe path traversal for a pair	of GPUs

       nvidia-smi topo -p -i <deviceID1>,<deviceID2>

       Shows the most direct PCIe path traversal for a given pair of GPUs.

       6) P2P Status Matrix

       nvidia-smi topo -p2p <capability>

       Shows the P2P status between all	GPUs, given a capability. Capability
       values are:
					 r - p2p read capability
					 w - p2p write capability
					 n - p2p nvlink	capability
					 a - p2p atomics capability
					 p - p2p pcie capability

       7) NUMA ID of the nearest CPU for a given GPU

       nvidia-smi topo -C -i <deviceID>

       Shows the NUMA ID of the	nearest	CPU for	a GPU represented by the de-
       vice ID.

       8) NUMA ID of the nearest memory	node for a given GPU

       nvidia-smi topo -M -i <deviceID>

       Shows the NUMA ID of the	nearest	memory for a GPU represented by	the
       device ID.

       9) NUMA ID of the GPU

       nvidia-smi topo -gnid -i	<deviceID>

       Shows the NUMA ID of the	GPU represented	by the device ID, if applica-
       ble. Displays N/A otherwise.

       10) Topology connections	between	the GPUs and NVME devices in the sys-
       tem

       nvidia-smi topo -nvme

       Displays	a matrix of PCI	connections between all	GPUs and NVME devices
       in the system with the following	legend:

		      Legend:

			X    = Self
			SYS  = Connection traversing PCIe as well as  the  SMP
		      interconnect between NUMA	nodes (e.g., QPI/UPI)
			NODE  =	 Connection traversing PCIe as well as the in-
		      terconnect between PCIe Host Bridges within a NUMA node
			PHB  = Connection traversing PCIe as well  as  a  PCIe
		      Host Bridge (typically the CPU)
			PXB   =	 Connection  traversing	 multiple PCIe bridges
		      (without traversing the PCIe Host	Bridge)
			PIX  = Connection traversing at	 most  a  single  PCIe
		      bridge
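
       For example (the CPU and device numbers are illustrative):

       nvidia-smi topo -c 0

       nvidia-smi topo -n 2 -i 0

       nvidia-smi topo -p2p r

       The first command lists the GPUs with affinity to CPU 0, the second
       lists the GPUs reachable from GPU 0 across multiple PCIe switches
       (traversal path 2), and the third prints the peer-to-peer read capa-
       bility matrix.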

   Nvlink
       The  "nvidia-smi	 nvlink"  command-line	is  used  to  manage the GPU's
       Nvlinks.	It provides options to set and query Nvlink information.

       Usage:

       1) Display help menu

       nvidia-smi nvlink -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi nvlink -i <GPU IDs>

       nvidia-smi nvlink --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) Select a specific NvLink

       nvidia-smi nvlink -l <GPU Nvlink	Id>

       nvidia-smi nvlink --list	<GPU Nvlink Id>

       Selects a specific Nvlink of the	GPU for	the given command, if valid.
       If not used, the given command-line option applies to all of the GPU's
       Nvlinks.

       4) Query	Nvlink Status

       nvidia-smi nvlink -s

       nvidia-smi nvlink --status

       Get the status of the GPU's Nvlinks.

       If Active, the Bandwidth	of the links will be displayed.

       If the link is present but Not Active, it will show the link as Inac-
       tive.

       If the link is in Sleep state, it will show as Sleep.

       5) Query	Nvlink capabilities

       nvidia-smi nvlink -c

       nvidia-smi nvlink --capabilities

       Get the GPU's Nvlink capabilities.

       6) Query	the Nvlink's remote node PCI bus

       nvidia-smi nvlink -p

       nvidia-smi nvlink -pcibusid

       Get the Nvlink's	remote node PCI	bus ID.

       7) Query	the Nvlink's remote link info

       nvidia-smi nvlink -R

       nvidia-smi nvlink -remotelinkinfo

       Get the remote device PCI bus ID	and NvLink ID for a link.

       8) Set Nvlink Counter Control is	DEPRECATED

       9) Get Nvlink Counter Control is	DEPRECATED

       10) Get Nvlink Counters is DEPRECATED, -gt/--getthroughput should be
       used instead

       11) Reset Nvlink	counters is DEPRECATED

       12) Query Nvlink	Error Counters

       nvidia-smi nvlink -e

       nvidia-smi nvlink --errorcounters

       Get the Nvlink error counters.

       For NVLink 4

       Replay Errors - count the number	of replay 'events' that	occurred

       Recovery	Errors - count the number of link recovery events

       CRC Errors - count the number of	CRC errors in received packets

       For NVLink 5

       Tx packets - Total Tx packets on	the link

       Tx bytes	- Total	Tx bytes on the	link

       Rx packets - Total Rx packets on	the link

       Rx bytes	- Total	Rx bytes on the	link

       Malformed packet	Errors - Number	of packets Rx on a link	where packets
       are malformed

       Buffer overrun Errors - Number of packets that were discarded on	Rx due
       to buffer overrun

       Rx Errors - Total number	of packets with	errors Rx on a link

       Rx remote Errors	- Total	number of packets Rx - stomp/EBP marker

       Rx General Errors - Total number	of packets Rx with header mismatch

       Local link integrity Errors - Total number of times that	the count of
       local errors exceeded a threshold

       Tx discards - Total number of tx	error packets that were	discarded

       Link recovery successful	events - Number	of times link went from	Up to
       recovery, succeeded and link came back up

       Link recovery failed events - Number of times link went from Up to re-
       covery, failed and link was declared down

       Total link recovery events - Number of times link went from Up to re-
       covery, irrespective of the result

       Effective Errors	- Sum of the number of errors in each Nvlink packet

       Effective BER - BER for symbol errors

       Symbol Errors - Number of errors	in rx symbols

       Symbol BER - BER	for symbol errors

       FEC Errors - [0-15] - count of symbol errors that are corrected

       13) Query Nvlink	CRC error counters

       nvidia-smi nvlink -ec

       nvidia-smi nvlink --crcerrorcounters

       Get the Nvlink per-lane CRC/ECC error counters.

       CRC - NVLink 4 and before - Total Rx CRC	errors on an NVLink Lane

       ECC - NVLink 4 -	Total Rx ECC errors on an NVLink Lane

       Deprecated NVLink 5 onwards

       14) Reset Nvlink	Error Counters

       nvidia-smi nvlink -re

       nvidia-smi nvlink --reseterrorcounters

       Reset all Nvlink	error counters to zero.

       NvLink 5	NOT SUPPORTED

       15) Query Nvlink	throughput counters

       nvidia-smi nvlink -gt <Data Type>

       nvidia-smi nvlink --getthroughput <Data Type>

       <Data Type> can be one of the following:

	d - Tx and Rx data payload in KiB.

	r - Tx and Rx raw payload and protocol overhead	in KiB.

       16) Set Nvlink Low Power	thresholds

       nvidia-smi nvlink -sLowPwrThres <Threshold>

       nvidia-smi nvlink --setLowPowerThreshold	<Threshold>

       Set the Nvlink Low Power	Threshold, in units of 100us, before the links
       go into Low Power Mode.

       17) Get Nvlink Low Power	Info

       nvidia-smi nvlink -gLowPwrInfo

       nvidia-smi nvlink --getLowPowerInfo

       Query the Nvlink's Low Power Info.

       18) Set Nvlink Bandwidth	mode

       nvidia-smi nvlink -sBwMode <Bandwidth Mode>

       nvidia-smi nvlink --setBandwidthMode <Bandwidth Mode>

       Set the Nvlink Bandwidth	mode for all GPUs. This	is DEPRECATED for
       Blackwell+.

       The options are:

	FULL - All links are at	max Bandwidth.

	OFF - Bandwidth	is not used. P2P is via	PCIe bus.

	MIN - Bandwidth	is at minimum speed.

	HALF - Bandwidth is at around half of FULL speed.

	3QUARTER - Bandwidth is	at around 75% of FULL speed.

       19) Get Nvlink Bandwidth	mode

       nvidia-smi nvlink -gBwMode

       nvidia-smi nvlink --getBandwidthMode

       Get the Nvlink Bandwidth mode for all GPUs. This is DEPRECATED for
       Blackwell+.

       20) Query for Nvlink Bridge

       nvidia-smi nvlink -cBridge

       nvidia-smi nvlink --checkBridge

       Query for Nvlink	Bridge presence.

       21) Set the GPU's Nvlink	Width

       nvidia-smi nvlink -sLWidth <Link	Width>

       nvidia-smi nvlink --setLinkWidth	<Link Width>

       Set the GPU's Nvlink width, which will keep that number of links
       Active and put the rest to sleep.

       <Link Width> can	be one of the following:

	values - List possible Link Widths to be set.

	A numerical Link Width from the list returned by the "values" option.

       22) Get the GPU's Nvlink	Width

       nvidia-smi nvlink -gLWidth

       nvidia-smi nvlink --getLinkWidth

       Query the GPU's Nvlink Width.
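
       For example, to show the link status and the data-payload throughput
       counters for GPU 0 (the index is illustrative):

       nvidia-smi nvlink -s -i 0

       nvidia-smi nvlink -gt d -i 0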

   C2C
       The "nvidia-smi c2c" command-line is  used  to  manage  the  GPU's  C2C
       Links. It provides options to query C2C Link information.

       Usage:

       1) Display help menu

       nvidia-smi c2c -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi c2c -i <GPU IDs>

       nvidia-smi c2c --id <GPU	IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) Select a specific C2C	Link

       nvidia-smi c2c -l <GPU C2C Id>

       nvidia-smi c2c --list <GPU C2C Id>

       Selects a specific C2C Link of the GPU for the given command, if	valid.
       If not used, the given command-line option applies to all of the GPU's
       C2C Links.

       4) Query	C2C Link Status

       nvidia-smi c2c -s

       nvidia-smi c2c --status

       Get the status of the GPU's C2C Links. If active, the Bandwidth of the
       links will be displayed.

   vGPU	Management
       The  "nvidia-smi	 vgpu" command reports on GRID vGPUs executing on sup-
       ported GPUs and hypervisors (refer to driver  release  notes  for  sup-
       ported  platforms).  Summary reporting provides basic information about
       vGPUs currently executing on the	system.	Additional options provide de-
       tailed reporting	of vGPU	properties, per-vGPU reporting of SM,  Memory,
       Encoder,	 Decoder,  Jpeg, and OFA utilization, and per-GPU reporting of
       supported and creatable vGPUs. Periodic reports	can  be	 automatically
       generated  by  specifying a configurable	loop frequency to any command.
       Note: On	MIG-enabled GPUs, querying the	utilization  of	 encoder,  de-
       coder, jpeg, ofa, gpu, and memory is not	currently supported.

       Usage:

       1) Help Information

       nvidia-smi vgpu -h

       Displays	help information for using the command line.

       2) Default with no arguments

       nvidia-smi vgpu

       Reports summary of all the vGPUs	currently active on each device.

       3) Display detailed info	on currently active vGPUs

       nvidia-smi vgpu -q

       Collects	and displays information on currently active vGPUs on each de-
       vice, including driver version, utilization, and	other information.

       4) Select one or	more devices

       nvidia-smi vgpu -i <device1,device2, .. , deviceN>

       Reports summary for all the vGPUs currently active on the devices se-
       lected by comma-separated device	list.

       5) Display supported vGPUs

       nvidia-smi vgpu -s

       Displays	vGPU types supported on	each device. Use the -v	/ --verbose
       option to show detailed info on each vGPU type.

       6) Display creatable vGPUs

       nvidia-smi vgpu -c

       Displays	vGPU types creatable on	each device. This varies dynamically,
       depending on the	vGPUs already active on	the device. Use	the -v /
       --verbose option	to show	detailed info on each vGPU type.

       7) Report utilization for currently active vGPUs.

       nvidia-smi vgpu -u

       Reports average utilization (SM,	Memory,	Encoder, Decoder, Jpeg,	and
       OFA) for	each active vGPU since last monitoring cycle. The default cy-
       cle time	is 1 second, and the command runs until	terminated with	^C. If
       a device	has no active vGPUs, its metrics are reported as "-".

       8) Configure loop frequency

       nvidia-smi vgpu [-s -c -q -u] -l	<time in secs>

       Collects	and displays data at a specified loop interval until termi-
       nated with ^C. The loop frequency must be between 1 and 10 secs.	When
       no time is specified, the loop frequency	defaults to 5 secs.

       9) Display GPU engine usage

       nvidia-smi vgpu -p

       Display GPU engine usage	of currently active processes running in the
       vGPU VMs.

       10) Display migration capabilities.

       nvidia-smi vgpu -m

       Display pGPU's migration/suspend/resume capability.

       11) Display the vGPU Software scheduler state.

       nvidia-smi vgpu -ss

       Display the information about vGPU Software scheduler state.

       12) Display the vGPU Software scheduler capabilities.

       nvidia-smi vgpu -sc

       Display the list of supported vGPU scheduler policies returned along
       with the other capability values, if the engine is of Graphics type.
       For other engine types, the policy is BEST EFFORT and the other capa-
       bilities will be zero. If ARR is supported and enabled, the scheduling
       frequency and averaging factor are applicable; otherwise timeSlice is
       applicable.

       13) Display the vGPU Software scheduler logs.

       nvidia-smi vgpu -sl

       Display the vGPU	Software scheduler runlist logs.

       nvidia-smi --query-vgpu-scheduler-logs=[input parameters]

       Display the vGPU	Software scheduler runlist logs	in CSV format.

       14) Set the vGPU	Software scheduler state.

       nvidia-smi vgpu --set-vgpu-scheduler-state [options]

       Set the vGPU Software scheduler policy and states.

       15) Display Nvidia Encoder session info.

       nvidia-smi vgpu -es

       Display the information about encoder sessions for currently running
       vGPUs.

       16) Display accounting statistics.

       nvidia-smi vgpu --query-accounted-apps=[input parameters]

       Display accounting stats	for compute/graphics processes.

       To find list of properties which	can be queried,	run - 'nvidia-smi
       --help-query-accounted-apps'.

       17) Display Nvidia Frame	Buffer Capture session info.

       nvidia-smi vgpu -fs

       Display the information about FBC sessions for currently	running	vGPUs.

       Note: Horizontal resolution, vertical resolution, average FPS and av-
       erage latency data for an FBC session may be zero if there are no new
       frames captured since the session started.

       18) Set vGPU heterogeneous mode.

       nvidia-smi vgpu -shm

       Set vGPU	heterogeneous mode of the device for timesliced	vGPUs with
       different framebuffer sizes.
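
       For example, to report detailed information for the vGPUs active on
       GPU 0, refreshing every 5 seconds (values illustrative):

       nvidia-smi vgpu -q -i 0 -l 5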

   MIG Management
       The  privileged "nvidia-smi mig"	command-line is	used to	manage MIG-en-
       abled GPUs. It provides options to create, list	and  destroy  GPU  in-
       stances and compute instances.

       Usage:

       1) Display help menu

       nvidia-smi mig -h

       Displays	help menu for using the	command-line.

       2) Select one or	more GPUs

       nvidia-smi mig -i <GPU IDs>

       nvidia-smi mig --id <GPU	IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) Select one or	more GPU instances

       nvidia-smi mig -gi <GPU instance	IDs>

       nvidia-smi mig --gpu-instance-id	<GPU instance IDs>

       Selects one or more GPU instances using the given comma-separated GPU
       instance	IDs. If	not used, the given command-line option	applies	to all
       of the GPU instances.

       4) Select one or	more compute instances

       nvidia-smi mig -ci <compute instance IDs>

       nvidia-smi mig --compute-instance-id <compute instance IDs>

       Selects one or more compute instances using the given comma-separated
       compute instance	IDs. If	not used, the given command-line option	ap-
       plies to	all of the compute instances.

       5) List GPU instance profiles

       nvidia-smi mig -lgip -i <GPU IDs>

       nvidia-smi mig --list-gpu-instance-profiles --id	<GPU IDs>

       Lists GPU instance profiles, their availability and IDs.	Profiles de-
       scribe the supported types of GPU instances, including all of the GPU
       resources they exclusively control.

       6) List GPU instance possible placements

       nvidia-smi mig -lgipp -i	<GPU IDs>

       nvidia-smi mig --list-gpu-instance-possible-placements --id <GPU	IDs>

       Lists GPU instance possible placements. Possible	placements describe
       the locations of	the supported types of GPU instances within the	GPU.

       7) Create GPU instance

       nvidia-smi mig -cgi <GPU	instance specifiers> -i	<GPU IDs>

       nvidia-smi mig --create-gpu-instance <GPU instance specifiers> --id
       <GPU IDs>

       Creates GPU instances for the given GPU instance	specifiers. A GPU in-
       stance specifier	comprises a GPU	instance profile name or ID and	an op-
       tional placement	specifier consisting of	a colon	and a placement	start
       index. The command fails	if the GPU resources required to allocate the
       requested GPU instances are not available, or if	the placement index is
       not valid for the given profile.

       8) Create a GPU instance	along with the default compute instance

       nvidia-smi mig -cgi <GPU	instance profile IDs or	names> -i <GPU IDs> -C

       nvidia-smi mig --create-gpu-instance <GPU instance profile IDs or
       names> --id <GPU	IDs> --default-compute-instance

       9) List GPU instances

       nvidia-smi mig -lgi -i <GPU IDs>

       nvidia-smi mig --list-gpu-instances --id	<GPU IDs>

       Lists GPU instances and their IDs.

       10) Destroy GPU instance

       nvidia-smi mig -dgi -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi mig --destroy-gpu-instances --gpu-instance-id	<GPU instance
       IDs> --id <GPU IDs>

       Destroys	GPU instances. The command fails if the	requested GPU instance
       is in use by an application.

       11) List	compute	instance profiles

       nvidia-smi mig -lcip -gi	<GPU instance IDs> -i <GPU IDs>

       nvidia-smi mig --list-compute-instance-profiles --gpu-instance-id <GPU
       instance	IDs> --id <GPU IDs>

       Lists compute instance profiles,	their availability and IDs. Profiles
       describe	the supported types of compute instances, including all	of the
       GPU resources they share	or exclusively control.

       12) List	compute	instance possible placements

       nvidia-smi mig -lcipp -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi mig --list-compute-instance-possible-placements --gpu-in-
       stance-id <GPU instance IDs> --id <GPU IDs>

       Lists compute instance possible placements. Possible placements de-
       scribe the locations of the supported types of compute instances	within
       the GPU instance.

       13) Create compute instance

       nvidia-smi mig -cci <compute instance profile IDs or names> -gi <GPU
       instance	IDs> -i	<GPU IDs>

       nvidia-smi mig --create-compute-instance	<compute instance profile IDs
       or names> --gpu-instance-id <GPU	instance IDs> --id <GPU	IDs>

       Creates compute instances for the given compute instance specifiers. A
       compute instance	specifier comprises a compute instance profile name or
       ID and an optional placement specifier consisting of a colon and	a
       placement start index. The command fails	if the GPU resources required
       to allocate the requested compute instances are not available, or if
       the placement index is not valid	for the	given profile.

       14) List	compute	instances

       nvidia-smi mig -lci -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi mig --list-compute-instances --gpu-instance-id <GPU instance
       IDs> --id <GPU IDs>

       Lists compute instances and their IDs.

       15) Destroy compute instance

       nvidia-smi mig -dci -ci <compute	instance IDs> -gi <GPU instance	IDs>
       -i <GPU IDs>

       nvidia-smi mig --destroy-compute-instance --compute-instance-id <com-
       pute instance IDs> --gpu-instance-id <GPU instance IDs> --id <GPU IDs>

       Destroys	compute	instances. The command fails if	the requested compute
       instance	is in use by an	application.
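
       As an end-to-end illustration (the profile name 1g.5gb and the place-
       ment index are examples only; the names, IDs and placements valid for
       a given GPU are listed by -lgip and -lgipp):

       nvidia-smi mig -cgi 1g.5gb,1g.5gb:4 -i 0 -C

       nvidia-smi mig -lgi -i 0

       nvidia-smi mig -dci -i 0

       nvidia-smi mig -dgi -i 0

       The first command creates two GPU instances on GPU 0 (the second at
       placement start index 4) together with their default compute in-
       stances, the second lists the resulting GPU instances, and the last
       two destroy all compute instances and then all GPU instances on that
       GPU.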

   Boost Slider
       The privileged "nvidia-smi boost-slider"	command-line is	used to	manage
       boost  slider  on  GPUs.	 It provides options to	list and control boost
       sliders.

       Usage:

       1) Display help menu

       nvidia-smi boost-slider -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi boost-slider -i <GPU IDs>

       nvidia-smi boost-slider --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) List boost sliders

       nvidia-smi boost-slider -l

       nvidia-smi boost-slider --list

       List all	boost sliders for the selected devices.

       4) Set video boost slider

       nvidia-smi boost-slider --vboost	<value>

       Set the video boost slider for the selected devices.

   Power Hint
       The privileged "nvidia-smi power-hint" command-line is  used  to	 query
       power hint on GPUs.

       Usage:

       1) Display help menu

       nvidia-smi power-hint -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi power-hint -i <GPU IDs>

       nvidia-smi power-hint --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) List power hint info

       nvidia-smi power-hint -l

       nvidia-smi power-hint --list-info

       List power hint info for the selected devices.

       4) Query	power hint

       nvidia-smi power-hint -gc <value> -t <value> -p <profile ID>

       nvidia-smi power-hint --graphics-clock <value> --temperature <value>
       --profile <profile ID>

       Query power hint	with graphics clock, temperature and profile id.

       5) Query power hint with memory clock

       nvidia-smi power-hint -gc <value> -mc <value> -t <value> -p <profile
       ID>

       nvidia-smi power-hint --graphics-clock <value> --memory-clock <value>
       --temperature <value> --profile <profile ID>

       Query power hint	with graphics clock, memory clock, temperature and
       profile id.

   Confidential	Compute
       The  "nvidia-smi	conf-compute" command-line is used to manage confiden-
       tial compute. It	provides options to set	and  query  confidential  com-
       pute.

       Usage:

       1) Display help menu

       nvidia-smi conf-compute -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi conf-compute -i <GPU IDs>

       nvidia-smi conf-compute --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) Query	confidential compute CPU capability

       nvidia-smi conf-compute -gc

       nvidia-smi conf-compute --get-cpu-caps

       Get confidential	compute	CPU capability.

       4) Query	confidential compute GPUs capability

       nvidia-smi conf-compute -gg

       nvidia-smi conf-compute --get-gpus-caps

       Get confidential	compute	GPUs capability.

       5) Query	confidential compute devtools mode

       nvidia-smi conf-compute -d

       nvidia-smi conf-compute --get-devtools-mode

       Get confidential	compute	DevTools mode.

       6) Query	confidential compute environment

       nvidia-smi conf-compute -e

       nvidia-smi conf-compute --get-environment

       Get confidential	compute	environment.

       7) Query	confidential compute feature status

       nvidia-smi conf-compute -f

       nvidia-smi conf-compute --get-cc-feature

       Get confidential	compute	CC feature status.

       8) Query	confidential compute GPU protected/unprotected memory sizes

       nvidia-smi conf-compute -gm

       nvidia-smi conf-compute --get-mem-size-info

       Get confidential	compute	GPU protected/unprotected memory sizes.

       9) Set confidential compute GPU unprotected memory size

       nvidia-smi conf-compute -sm <value>

       nvidia-smi conf-compute --set-unprotected-mem-size <value>

       Set confidential	compute	GPU unprotected	memory size in KiB. Requires
       root.

       10) Set confidential compute GPUs ready state

       nvidia-smi conf-compute -srs <value>

       nvidia-smi conf-compute --set-gpus-ready-state <value>

       Set confidential	compute	GPUs ready state. The value must be 1 to set
       the ready state and 0 to	unset it. Requires root.

       11) Query confidential compute GPUs ready state

       nvidia-smi conf-compute -grs

       nvidia-smi conf-compute --get-gpus-ready-state

       Get confidential	compute	GPUs ready state.

       12) Set Confidential Compute Key	Rotation Max Attacker Advantage

       nvidia-smi conf-compute -skr <value>

       nvidia-smi conf-compute --set-key-rotation-max-attacker-advantage
       <value>

       Set Confidential	Compute	Key Rotation Max Attacker Advantage

       13) Display Confidential	Compute	Key Rotation Threshold Info

       nvidia-smi conf-compute -gkr

       nvidia-smi conf-compute --get-key-rotation-threshold-info

       Display Confidential Compute Key	Rotation Threshold Info

       14) Display Confidential	Compute	Multi-GPU Mode

       nvidia-smi conf-compute -mgm

       nvidia-smi conf-compute --get-multigpu-mode

       Display Confidential Compute Multi-GPU Mode

       15) Display Confidential	Compute	Detailed Info

       nvidia-smi conf-compute -q

       nvidia-smi conf-compute --query-conf-compute

       Display Confidential Compute Detailed Info
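
       For example, to check the CPU and GPU confidential compute capabili-
       ties and then mark the GPUs ready for work (requires root):

       nvidia-smi conf-compute -gc

       nvidia-smi conf-compute -gg

       nvidia-smi conf-compute -srs 1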

   GPU Performance Monitoring (GPM) Stream State
       The "nvidia-smi gpm" command-line is used to manage the GPU perfor-
       mance monitoring unit. It provides options to query and set the stream
       state.

       Usage:

       1) Display help menu

       nvidia-smi gpm -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi gpm -i <GPU IDs>

       nvidia-smi gpm --id <GPU	IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) Query	GPU performance	monitoring stream state

       nvidia-smi gpm -g

       nvidia-smi gpm --get-stream-state

       Get gpm stream state for	the selected devices.

       4) Set GPU performance monitoring stream	state

       nvidia-smi gpm -s <value>

       nvidia-smi gpm --set-stream-state <value>

       Set gpm stream state for	the selected devices.

   GPU PCI section
       The "nvidia-smi pci" command-line is used to manage GPU	PCI  counters.
       It provides options to query and	clear PCI counters.

       Usage:

       1) Display help menu

       nvidia-smi pci -h

       Displays	help menu for using the	command-line.

       2) Query	PCI error counters

       nvidia-smi pci -i <GPU index> -gErrCnt

       Query PCI error counters	of a GPU

       3) Clear	PCI error counters

       nvidia-smi pci -i <GPU index> -cErrCnt

       Clear PCI error counters	of a GPU

       4) Query	PCI counters

       nvidia-smi pci -i <GPU index> -gCnt

       Query PCI RX and	TX counters of a GPU

   Power Smoothing
       The  "nvidia-smi	 power-smoothing" command-line is used to manage Power
       Smoothing related data on the GPU. It provides  options	to  set	 Power
       Smoothing related data and query	the preset profile definitions.

       Usage:

       1) Display help menu

       nvidia-smi power-smoothing -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi power-smoothing -i <GPU IDs>

       nvidia-smi power-smoothing --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) List one Preset Profile ID

       nvidia-smi power-smoothing -p <Profile ID>

       nvidia-smi power-smoothing --profile <Profile ID>

       Selects a Preset	Profile	ID for which to	update a value.	This is	re-
       quired when updating a Preset Profile parameter and prohibited in all
       other cases.

       4) Set Active Preset Profile ID

       nvidia-smi power-smoothing -spp <Profile	ID>

       nvidia-smi power-smoothing --set-preset-profile <Profile	ID>

       Activate the desired Preset Profile ID.

       5) Update percentage Total Module Power (TMP) floor

       nvidia-smi power-smoothing -ptf <Percentage> -p <Profile	ID>

       nvidia-smi power-smoothing --percent-tmp-floor <Percentage> --profile
       <Profile	ID>

       Sets the percentage TMP floor to the given value for a given Preset
       Profile ID. The desired percentage should be from 0 to 100, given in
       the form "AB.CD", with a maximum of two decimal places of precision.
       For example, to set the value to 34.56%, input 34.56. The input may
       also contain zero or one decimal place of precision. This option re-
       quires a profile ID as an argument (see the example at the end of this
       section).

       6) Update Ramp-Up Rate

       nvidia-smi power-smoothing -rur <value> -p <Profile ID>

       nvidia-smi power-smoothing --ramp-up-rate <value> --profile <Profile
       ID>

       Sets the	Ramp-Up	Rate to	the desired value for a	given Preset Profile
       ID. The rate given must be in the units of mW/s.	This option requires a
       profile ID as an	argument.

       7) Update Ramp-Down Rate

       nvidia-smi power-smoothing -rdr <value> -p <Profile ID>

       nvidia-smi power-smoothing --ramp-down-rate <value> --profile <Profile
       ID>

       Sets the	Ramp-Down Rate to the desired value for	a given	Preset Profile
       ID. The rate given must be in the units of mW/s.	This option requires a
       profile ID as an	argument.

       8) Update Ramp-Down Hysteresis

       nvidia-smi power-smoothing -rdh <value> -p <Profile ID>

       nvidia-smi power-smoothing --ramp-down-hysteresis <value> --profile
       <Profile	ID>

       Sets the Ramp-Down Hysteresis to the desired value for a given Preset
       Profile ID. The value given must be in units of ms. This option re-
       quires a profile ID as an argument.

       9) Display the Preset Profile definitions for all Profile IDs

       nvidia-smi power-smoothing -ppd

       nvidia-smi power-smoothing --print-profile-definitions

       Displays all values for each Preset Profile ID.

       10) Set Feature State

       nvidia-smi power-smoothing -s <state>

       nvidia-smi power-smoothing --state <state>

       Sets the state of the feature to either 0/DISABLED or 1/ENABLED.
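
       For example, a minimal sketch that activates Preset Profile 0 on
       GPU 0, sets that profile's percentage TMP floor to 75.5%, and then
       enables the feature (the GPU index, profile ID and percentage are
       illustrative):

       nvidia-smi power-smoothing -i 0 --set-preset-profile 0
       nvidia-smi power-smoothing -i 0 --profile 0 --percent-tmp-floor 75.5
       nvidia-smi power-smoothing -i 0 --state 1
       nvidia-smi power-smoothing -i 0 --print-profile-definitions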

   Power Profiles
       The "nvidia-smi power-profiles" command-line is used to manage Workload
       Power  Profiles	related	data on	the GPU. It provides options to	update
       Power Profiles data and query the supported Power Profiles.

       Usage:

       1) Display help menu

       nvidia-smi power-profiles -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi power-profiles -i <GPU IDs>

       nvidia-smi power-profiles --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option	ap-
       plies to	all of the supported GPUs.

       3) List Power Profiles

       nvidia-smi power-profiles -l

       nvidia-smi power-profiles --list

       List all	Workload Power Profiles	supported by the device.

       4) List Detailed Power Profiles info

       nvidia-smi power-profiles -ld

       nvidia-smi power-profiles --list-detailed

       List all	Workload Power Profiles	supported by the device	along with
       their metadata. This includes the Profile ID, the Priority (where a
       lower number indicates a	higher priority), and Profiles that conflict
       with the given profile. If two or more conflicting profiles are
       requested, not all may be enforced.

       5) Get Requested Profiles

       nvidia-smi power-profiles -gr

       nvidia-smi power-profiles --get-requested

       Get a list of all currently requested Power Profiles. Note that if any
       of the profiles conflict, then not all may be enforced.

       6) Set Requested Profiles

       nvidia-smi power-profiles -sr <Profile ID(s)>

       nvidia-smi power-profiles --set-requested <Profile ID(s)>

       Adds the input profile(s) to the list of requested Power Profiles. The
       input is a comma-separated list of profile IDs with no spaces (see the
       example at the end of this section).

       7) Clear Requested Profiles

       nvidia-smi power-profiles -cr <Profile ID(s)>

       nvidia-smi power-profiles --clear-requested <Profile ID(s)>

       Removes the input profile(s) from the list of requested Power Profiles.
       The input is a comma-separated list of profile IDs with no spaces.

       8) Get Enforced Profiles

       nvidia-smi power-profiles -ge

       nvidia-smi power-profiles --get-enforced

       Get a list of all currently enforced Power Profiles. Note that this
       list may	differ from the	requested Profiles list	if multiple conflict-
       ing profiles are	selected.
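
       For example, a minimal sketch that inspects the supported profiles on
       GPU 0, requests profiles 1 and 3, and then checks which of them are
       actually enforced (the GPU index and profile IDs are illustrative):

       nvidia-smi power-profiles -i 0 --list-detailed
       nvidia-smi power-profiles -i 0 --set-requested 1,3
       nvidia-smi power-profiles -i 0 --get-enforced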

UNIT ATTRIBUTES
       The following list describes all	possible data returned by  the	-q  -u
       unit  query  option.   Unless otherwise noted all numerical results are
       base 10 and unitless.

   Timestamp
       The current system timestamp at the time	nvidia-smi was invoked.	  For-
       mat is "Day-of-week Month Day HH:MM:SS Year".

   Driver Version
       The  version  of	 the  installed	NVIDIA display driver.	Format is "Ma-
       jor-Number.Minor-Number".

   HIC Info
       Information about any Host Interface Cards (HIC)	that are installed  in
       the system.

       Firmware	Version
		      The version of the firmware running on the HIC.

   Attached Units
       The number of attached Units in the system.

   Product Name
       The  official product name of the unit.	This is	an alphanumeric	value.
       For all S-class products.

   Product Id
       The product identifier for the unit.  This is an	alphanumeric value  of
       the form	"part1-part2-part3".  For all S-class products.

   Product Serial
       The  immutable globally unique identifier for the unit.	This is	an al-
       phanumeric value.  For all S-class products.

   Firmware Version
       The version of the firmware running on the unit.	 Format	is "Major-Num-
       ber.Minor-Number".  For all S-class products.

   LED State
       The LED indicator is used to flag systems with potential	problems.   An
       LED color of AMBER indicates an issue.  For all S-class products.

       Color	      The  color of the	LED indicator.	Either "GREEN" or "AM-
		      BER".

       Cause	      The reason for the current LED color.  The cause may  be
		      listed as	any combination	of "Unknown", "Set to AMBER by
		      host  system",  "Thermal	sensor failure", "Fan failure"
		      and "Temperature exceeds critical	limit".

   Temperature
       Temperature readings for	important components of	the Unit.   All	 read-
       ings  are in degrees C.	Not all	readings may be	available.  For	all S-
       class products.

       Intake	      Air temperature at the unit intake.

       Exhaust	      Air temperature at the unit exhaust point.

       Board	      Air temperature across the unit board.

   PSU
       Readings	for the	unit power supply.  For	all S-class products.

       State	      Operating	state of the PSU.  The power supply state  can
		      be  any  of  the	following: "Normal", "Abnormal", "High
		      voltage",	"Fan failure", "Heatsink  temperature",	 "Cur-
		      rent   limit",   "Voltage	 below	UV  alarm  threshold",
		      "Low-voltage", "I2C remote  off  command",  "MOD_DISABLE
		      input" or	"Short pin transition".

       Voltage	      PSU voltage setting, in volts.

       Current	      PSU current draw,	in amps.

   Fan Info
       Fan  readings  for  the	unit.	A reading is provided for each fan, of
       which there can be many.	 For all S-class products.

       State	      The state	of the fan, either "NORMAL" or "FAILED".

       Speed	      For a healthy fan, the fan's speed in RPM.

   Attached GPUs
       A list of PCI bus ids that correspond to	each of	the GPUs  attached  to
       the  unit.   The	bus ids	have the form "domain:bus:device.function", in
       hex.  For all S-class products.

NOTES
       On Linux, NVIDIA	device files may be modified by	nvidia-smi if  run  as
       root.  Please see the relevant section of the driver README file.

       The  -a	and -g arguments are now deprecated in favor of	-q and -i, re-
       spectively.  However, the old arguments still work for this release.

EXAMPLES
   nvidia-smi -q
       Query attributes	for all	GPUs once, and display in plain	text  to  std-
       out.

   nvidia-smi --format=csv,noheader --query-gpu=uuid,persistence_mode
       Query UUID and persistence mode of all GPUs in the system.
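
       The CSV form is convenient for scripting; for example, a minimal POSIX
       shell sketch that prints one line per GPU (the loop body is
       illustrative):

       nvidia-smi --format=csv,noheader --query-gpu=uuid,persistence_mode |
           while IFS=', ' read -r uuid mode; do
               printf '%s: persistence mode %s\n' "$uuid" "$mode"
           done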

   nvidia-smi -q -d ECC,POWER -i 0 -l 10 -f out.log
       Query  ECC  errors and power consumption	for GPU	0 at a frequency of 10
       seconds,	indefinitely, and record to the	file out.log.

   "nvidia-smi			 -c		       1		    -i
       GPU-b2f5f1b745e3d23d-65a3a26d-097db358-7303e0b6-149642ff3d219f8587cde3a8"

       Set   the  compute  mode	 to  "PROHIBITED"  for	GPU  with  UUID	 "GPU-
       b2f5f1b745e3d23d-65a3a26d-097db358-7303e0b6-149642ff3d219f8587cde3a8".

   nvidia-smi -q -u -x --dtd
       Query attributes	for all	Units once, and	display	in XML format with em-
       bedded DTD to stdout.

   nvidia-smi --dtd -u -f nvsmi_unit.dtd
       Write the Unit DTD to nvsmi_unit.dtd.

   nvidia-smi -q -d SUPPORTED_CLOCKS
       Display supported clocks	of all GPUs.

   nvidia-smi -i 0 --applications-clocks 2500,745
       Set applications	clocks to 2500 MHz memory, and 745 MHz graphics.

   nvidia-smi mig -cgi 19
       Create a	MIG GPU	instance on profile ID 19.

   nvidia-smi mig -cgi 19:2
       Create a	MIG GPU	instance on profile ID 19 at placement start index 2.

   nvidia-smi boost-slider -l
       List all	boost sliders for all GPUs.

   nvidia-smi boost-slider --vboost 1
       Set vboost to value 1 for all GPUs.

   nvidia-smi power-hint -l
       List clock range, temperature range and	supported  profiles  of	 power
       hint.

   nvidia-smi power-hint -gc 1350 -t 60 -p 0
       Query power hint	with graphics clock at 1350MHz,	temperature at 60C and
       profile ID at 0.

   nvidia-smi power-hint -gc 1350 -mc 1215 -t n5 -p 1
       Query  power  hint  with	 graphics  clock  at  1350MHz, memory clock at
       1215MHz,	temperature at -5C and profile ID at 1.

CHANGE LOG
	 === Known Issues ===

	 * On systems where GPUs are NUMA nodes, the accuracy of FB memory
       utilization provided by nvidia-smi depends on the memory accounting of
       the operating system. This is because FB memory is managed by the
       operating system instead of the NVIDIA GPU driver. Typically, pages
       allocated from FB memory are not released even after the process
       terminates, to enhance performance. In scenarios where the operating
       system is under memory pressure, it may resort to utilizing FB memory.
       Such actions can result in discrepancies in the accuracy of memory
       reporting.

	 *  On	Linux  GPU  Reset can't	be triggered when there	is pending GOM
       change.

	 * On Linux GPU	Reset may not successfully change pending ECC mode.  A
       full reboot may be required to enable the mode change.

	 *  On	Linux  platforms that configure	NVIDIA GPUs as NUMA nodes, en-
       abling persistence mode or resetting GPUs may print  "Warning:  persis-
       tence  mode  is	disabled on device" if nvidia-persistenced is not run-
       ning, or	if nvidia-persistenced cannot access files in the NVIDIA  dri-
       ver's  procfs  directory	 for the device	(/proc/driver/nvidia/gpus/<PCI
       Config Address>/). During GPU reset and driver reload,  this  directory
       will  be	 deleted  and  recreated,  and	outstanding  references	to the
       deleted directory, such as mounts or shells, can	prevent	processes from
       accessing files in the new directory.

	 * Updated "nvidia-smi -q" to print both  "Instantaneous  Power	 Draw"
       and  "Average  Power  Draw"  in all cases where "Power Draw" used to be
       printed.

	 *  ===	Changes	between	nvidia-smi v570	Update and v565	===

	 * Added new cmdline options "-idth" and "-Width" to "nvidia-smi
       nvlink"

	 * Added new ability to display NVLink sleep state with "nvidia-smi
       nvlink" for Blackwell and onward generations

	 * Added new query GPU options for average/instant module power	 draw:
       "nvidia-smi --query-gpu=module.power.draw.{average,instant}"

	 *  Added  new query GPU options for default/max/min module power lim-
       its:		"nvidia-smi		 --query-gpu=module.power.{de-
       fault_limit,max_limit,min_limit}"

	 *  Added  new	query GPU options for module power limits: "nvidia-smi
       --query-gpu=module.power.limit"

	 * Added new query GPU	options	 for  enforced	module	power  limits:
       "nvidia-smi --query-gpu=module.enforced.power.limit"

	 * Added new query GPU aliases for GPU Power options

	 *  Added  a new command to get	confidential compute info: "nvidia-smi
       conf-compute -q"

	 * Added new Power Profiles section in nvidia-smi -q and corresponding
       -d display flag POWER_PROFILES

	 * Added new Power  Profiles  option  "nvidia-smi  power-profiles"  to
       get/set power profiles related information.

	 * Added the platform information query	to "nvidia-smi -q"

	 *  Added  the	platform  information query to "nvidia-smi --query-gpu
       platform"

	 * Added new Power Smoothing option  "nvidia-smi  power-smoothing"  to
       set power smoothing related values.

	 *  Added new Power Smoothing section in nvidia-smi -q and correspond-
       ing -d display flag POWER_SMOOTHING

	 *  Deprecated	graphics  voltage  value  from	Voltage	  section   of
       nvidia-smi -q. Voltage now always displays as "N/A" and will be removed
       in a future release.

	 *  Added  new	topo  option  nvidia-smi topo -nvme to display GPUs vs
       NVMes connecting	path.

	 * Changed help	string for the command "nvidia-smi topo	-p2p -p"  from
       "prop" to "pcie"	to better describe the p2p capability.

	 * Added new command "nvidia-smi pci -gCnt" to query PCIe RX/TX	Bytes.

	 *  Added  EGM	capability  display  under new Capabilities section in
       nvidia-smi -q command.

	 * Added multiGpuMode display via "nvidia-smi conf-compute
       --get-multigpu-mode" or "nvidia-smi conf-compute -mgm"

	 * GPU Reset Status in "nvidia-smi -q" has been deprecated. The GPU
       Recovery Action field provides all the necessary actions

	 * "nvidia-smi -q" will now display DRAM encryption state

	 * Added "nvidia-smi -den/--dram-encryption 0/1" to disable/enable
       DRAM encryption

	 * Added new statuses to NVIDIA fabric health. "nvidia-smi -q" will
       display 3 new fields in Fabric Health: Route Recovery in progress,
       Route Unhealthy and Access Timeout Recovery

	 * In "nvidia-smi -q" Platform Info, RACK GUID has been changed to
       RACK Serial Number

	 * In "nvidia-smi --query-gpu" a new option for gpu_recovery_action is
       added

	 * Added new counters for NVLink5 in "nvidia-smi nvlink -e":

	   - Effective Errors to get the sum of the number of errors in each
       NVLink packet

	   - Effective BER to get the Effective BER for effective errors

	   - FEC Errors 0 to 15 to get the count of symbol errors that are
       corrected

	 * Added a new output field called "GPU	Fabric GUID" to	 the  "nvidia-
       smi -q" output

	 *  Added a new	property called	"platform.gpu_fabric_guid" to "nvidia-
       smi --query-gpu"

	 *  ===	Changes	between	nvidia-smi v565	Update and v560	===

	 * Added the reporting of vGPU homogeneous mode	to "nvidia-smi -q".

	 * Added the reporting of homogeneous vGPU placements  to  "nvidia-smi
       vgpu -s -v", complementing the existing reporting of heterogeneous vGPU
       placements.

	 *  ===	Changes	between	nvidia-smi v560	Update and v555	===

	 * Added "Atomic Caps Inbound" in the PCI section of "nvidia-smi -q".

	 *  Updated  ECC and row remapper output for options "--query-gpu" and
       "--query-remapped-rows".

	 * Added support for events including ECC single-bit error storm, DRAM
       retirement, DRAM	retirement failure, contained/nonfatal poison and  un-
       contained/fatal poison.

	 *  Added  support  in "nvidia-smi nvlink -e" to display NVLink5 error
       counters

	 *  ===	Changes	between	nvidia-smi v550	Update and v545	===

	 * Added a new cmdline option to print out version information:	--ver-
       sion

	 * Added ability to print out only the GSP firmware version with
       "nvidia-smi -q -d". Example commandline: nvidia-smi -q -d
       GSP_FIRMWARE_VERSION

	 *  Added  support  to	query  pci.baseClass  and  pci.subClass.   See
       nvidia-smi --help-query-gpu for details.

	 * Added PCI base and sub classcodes to	"nvidia-smi -q"	output.

	 * Added new cmdline option "--format" to "nvidia-smi dmon" to support
       "csv", "nounit" and "noheader" format specifiers

	 *  Added a new	cmdline	option "--gpm-options" to "nvidia-smi dmon" to
       support GPM metrics report in MIG mode

	 * Added the NVJPG and NVOFA utilization report	to "nvidia-smi pmon"

	 * Added the NVJPG and NVOFA utilization report	to "nvidia-smi	-q  -d
       utilization"

	 *  Added  the	NVJPG and NVOFA	utilization report to "nvidia-smi vgpu
       -q" to report NVJPG/NVOFA utilization on	active vgpus

	 * Added the NVJPG and NVOFA utilization report	 to  "nvidia-smi  vgpu
       -u" to periodically report NVJPG/NVOFA utilization on active vgpus

	 * Added the NVJPG and NVOFA utilization report to "nvidia-smi vgpu
       -p" to periodically report NVJPG/NVOFA utilization on running processes
       of active vgpus

	 *  Added a new	cmdline	option "-shm" to "nvidia-smi vgpu" to set vGPU
       heterogeneous mode

	 * Added the reporting of vGPU heterogeneous mode in "nvidia-smi -q"

	 * Added ability for "nvidia-smi mig -lgip" and "nvidia-smi mig
       -lgipp" to work without requiring MIG to be enabled

	 *  Added support to query confidential	compute	key rotation threshold
       info.

	 * Added support to set	confidential compute key rotation max attacker
       advantage.

	 * Added a new cmdline option "--sparse-operation-mode"	to "nvidia-smi
       clocks" to set the sparse operation mode

	 * Added the reporting of sparse operation mode	to "nvidia-smi	-q  -d
       PERFORMANCE"

	 *  ===	Changes	between	nvidia-smi v535	Update and v545	===

	 *  Added  support  to	query the timestamp and	duration of the	latest
       flush of	the BBX	object to the inforom storage.

	 * Added support for reporting out GPU Memory power usage.

	 *  ===	Changes	between	nvidia-smi v535	Update and v530	===

	 * Updated the SRAM error status reported in the ECC query "nvidia-smi
       -q -d ECC"

	 * Added support to query and report the GPU  JPEG  and	 OFA  (Optical
       Flow Accelerator) utilizations.

	 * Removed deprecated "stats" command.

	 * Added support to set	the vGPU software scheduler state.

	 * Renamed counter collection unit to gpu performance monitoring.

	 * Added new C2C Mode reporting	to device query.

	 * Added back clock_throttle_reasons to	--query-gpu to not break back-
       wards compatibility

	 * Added support to get confidential compute CPU capability and GPU
       capability.

	 * Added support to set	confidential compute  unprotected  memory  and
       GPU ready state.

	 * Added support to get	confidential compute memory info and GPU ready
       state.

	 *  Added support to display confidential compute devtools mode, envi-
       ronment and feature status.

	 *  ===	Changes	between	nvidia-smi v525	Update and v530	===

	 * Added support to query power.draw.average  and  power.draw.instant.
       See nvidia-smi --help-query-gpu for details.

	 * Added support to get	the vGPU software scheduler state.

	 * Added support to get	the vGPU software scheduler logs.

	 * Added support to get	the vGPU software scheduler capabilities.

	 * Renamed Clock Throttle Reasons to Clock Event Reasons.

	 *  ===	Changes	between	nvidia-smi v520	Update and v525	===

	 *  Added  support  to	query  and  set	counter	collection unit	stream
       state.

	 *  ===	Changes	between	nvidia-smi v470	Update and v510	===

	 * Add new "Reserved" memory reporting to the FB memory	output

	 *  ===	Changes	between	nvidia-smi v465	Update and v470	===

	 * Added support to query power	hint

	 *  ===	Changes	between	nvidia-smi v460	Update and v465	===

	 * Removed support for -acp,--application-clock-permissions option

	 *  ===	Changes	between	nvidia-smi v450	Update and v460	===

	 * Add option to specify placement when	creating a MIG GPU instance.

	 * Added support to query and control boost slider

	 *  ===	Changes	between	nvidia-smi v445	Update and v450	===

	 * Added --lock-memory-clock and --reset-memory-clock commands to lock
       to the closest min/max Memory clock provided and the ability to reset
       the Memory clock

	 * Allow fan speeds greater than 100% to be reported

	 * Added topo support to display NUMA node affinity for	GPU devices

	 * Added support to create MIG instances using profile names

	 * Added support to create the default compute instance	while creating
       a GPU instance

	 * Added support to query and disable MIG mode on Windows

	 * Removed support of the GPU reset (-r) command on MIG-enabled vGPU
       guests

	 *  ===	Changes	between	nvidia-smi v418	Update and v445	===

	 * Added support for Multi Instance GPU	(MIG)

	 * Added support to individually reset NVLink-capable  GPUs  based  on
       the NVIDIA Ampere architecture

	 *  ===	Changes	between	nvidia-smi v361	Update and v418	===

	 *  Support for	Volta and Turing architectures,	bug fixes, performance
       improvements, and new features

	 *  ===	Changes	between	nvidia-smi v352	Update and v361	===

	 * Added nvlink	support	to expose the publicly available  NVLINK  NVML
       APIs

	 * Added clocks	sub-command with synchronized boost support

	 * Updated nvidia-smi stats to report GPU temperature metric

	 * Updated nvidia-smi dmon to support PCIe throughput

	 * Updated nvidia-smi daemon/replay to support PCIe throughput

	 *  Updated  nvidia-smi	dmon, daemon and replay	to support PCIe	Replay
       Errors

	 * Added GPU part numbers in nvidia-smi	-q

	 * Removed support for exclusive thread	compute	mode

	 * Added Video (encoder/decode)	clocks to the Clocks  and  Max	Clocks
       display of nvidia-smi -q

	 * Added memory	temperature output to nvidia-smi dmon

	 * Added --lock-gpu-clock and --reset-gpu-clock commands to lock to
       the closest min/max GPU clock provided and to reset the clock

	 * Added --cuda-clocks to override or restore default CUDA clocks

	 === Changes between nvidia-smi	v346 Update and	v352 ===

	 * Added topo support to display affinities per	GPU

	 * Added topo support to display neighboring GPUs for a	given level

	 * Added topo support to show pathway between two given	GPUs

	 * Added "nvidia-smi pmon"  command-line  for  process	monitoring  in
       scrolling format

	 * Added "--debug" option to produce an	encrypted debug	log for	use in
       submission of bugs back to NVIDIA

	 * Fixed reporting of Used/Free	memory under Windows WDDM mode

	 * The accounting stats are updated to include both running and
       terminated processes. The execution time of a running process is
       reported as 0 and updated to the actual value when the process is
       terminated.

	 === Changes between nvidia-smi	v340 Update and	v346 ===

	 * Added reporting of PCIe replay counters

	 * Added support for reporting Graphics	processes via nvidia-smi

	 * Added reporting of PCIe utilization

	 * Added dmon command-line for device monitoring in scrolling format

	 * Added daemon	command-line to	run in background and monitor  devices
       as a daemon process. Generates dated log	files at /var/log/nvstats/

	 *  Added  replay command-line to replay/extract the stat files	gener-
       ated by the daemon tool

	 === Changes between nvidia-smi	v331 Update and	v340 ===

	 * Added reporting of temperature threshold information.

	 * Added reporting of brand information	(e.g. Tesla, Quadro, etc.)

	 * Added support for K40d and K80.

	 * Added reporting of max, min and avg for samples (power,
       utilization, clock changes). Example commandline: nvidia-smi -q -d
       power,utilization,clock

	 * Added nvidia-smi stats interface  to	 collect  statistics  such  as
       power, utilization, clock changes, xid events and perf capping counters
       with  a	notion	of  time attached to each sample. Example commandline:
       nvidia-smi stats

	 * Added support for collectively reporting metrics on more than one
       GPU. Used with comma-separated values for the "-i" option. Example:
       nvidia-smi -i 0,1,2

	 *  Added  support for displaying the GPU encoder and decoder utiliza-
       tions

	 * Added nvidia-smi topo interface to display the GPUDirect communica-
       tion matrix (EXPERIMENTAL)

	 * Added support for displaying the GPU board ID and whether or not it
       is a multiGPU board

	 * Removed user-defined	throttle reason	from XML output

	 === Changes between nvidia-smi	v5.319 Update and v331 ===

	 * Added reporting of minor number.

	 * Added reporting BAR1	memory size.

	 * Added reporting of bridge chip firmware.

	 ===  Changes  between	nvidia-smi v4.319 Production and v4.319	Update
       ===

	 * Added new --applications-clocks-permission switch to	change permis-
       sion requirements for setting and resetting applications	clocks.

	 === Changes between nvidia-smi	v4.304 and v4.319 Production ===

	 * Added reporting of Display Active state and	updated	 documentation
       to clarify how it differs from Display Mode and Display Active state

	 *  For	 consistency on	multi-GPU boards nvidia-smi -L always displays
       UUID instead of serial number

	 * Added machine readable selective reporting. See SELECTIVE QUERY OP-
       TIONS section of	nvidia-smi -h

	 *   Added   queries   for   page   retirement	  information.	   See
       --help-query-retired-pages and -d PAGE_RETIREMENT

	 *  Renamed  Clock Throttle Reason User	Defined	Clocks to Applications
       Clocks Setting

	 * On error, return codes have distinct	non zero values	for each error
       class. See RETURN VALUE section

	 * nvidia-smi -i can now query information from	healthy	GPU when there
       is a problem with other GPU in the system

	 * All messages	that point to a	problem	with a GPU print pci bus id of
       a GPU at	fault

	 * New flag --loop-ms for querying information at  higher  rates  than
       once a second (can have negative	impact on system performance)

	 * Added queries for accounting processes. See
       --help-query-accounted-apps and -d ACCOUNTING

	 * Added the enforced power limit to the query output

	 === Changes between nvidia-smi	v4.304 RC and v4.304 Production	===

	 * Added reporting of GPU Operation Mode (GOM)

	 * Added new --gom switch to set GPU Operation Mode

	 === Changes between nvidia-smi	v3.295 and v4.304 RC ===

	 * Reformatted non-verbose output due to user feedback.	 Removed pend-
       ing information from table.

	 * Print out helpful message if	initialization	fails  due  to	kernel
       module not receiving interrupts

	 *  Better  error  handling when NVML shared library is	not present in
       the system

	 * Added new --applications-clocks switch

	 * Added new filter to --display switch. Run with -d  SUPPORTED_CLOCKS
       to list possible	clocks on a GPU

	 * When	reporting free memory, calculate it from the rounded total and
       used memory so that values add up

	 *  Added  reporting of	power management limit constraints and default
       limit

	 * Added new --power-limit switch

	 * Added reporting of texture memory ECC errors

	 * Added reporting of Clock Throttle Reasons

	 === Changes between nvidia-smi	v2.285 and v3.295 ===

	 * Clearer error reporting for running commands	(like changing compute
       mode)

	 * When	running	commands on multiple  GPUs  at	once  N/A  errors  are
       treated as warnings.

	 * nvidia-smi -i now also supports UUID

	 *  UUID  format changed to match UUID standard	and will report	a dif-
       ferent value.

	 === Changes between nvidia-smi	v2.0 and v2.285	===

	 * Report VBIOS	version.

	 * Added -d/--display flag to filter parts of data

	 * Added reporting of PCI Sub System ID

	 * Updated docs	to indicate we support M2075 and C2075

	 * Report HIC HWBC firmware version with -u switch

	 * Report max(P0) clocks next to current clocks

	 * Added --dtd flag to print the device	or unit	DTD

	 * Added message when NVIDIA driver is not running

	 * Added reporting of PCIe link	generation (max	and current), and link
       width (max and current).

	 * Getting the pending driver model now works for non-admin users

	 * Added support for running nvidia-smi	on Windows Guest accounts

	 * Running nvidia-smi without the -q command will output a non-verbose
       version of -q instead of help

	 *  Fixed  parsing  of	-l/--loop=  argument (default value, 0,	to big
       value)

	 * Changed format of pciBusId (to XXXX:XX:XX.X - this change was visi-
       ble in 280)

	 * Parsing of busId for	-i command is less restrictive.	You  can  pass
       0:2:0.0 or 0000:02:00 and other variations

	 * Changed versioning scheme to	also include "driver version"

	 * XML format always conforms to DTD, even when	error conditions occur

	 * Added support for single and double bit ECC events and XID errors
       (enabled by default with the -l flag; disabled for the -x flag)

	 * Added device	reset -r --gpu-reset flags

	 * Added listing of compute running processes

	 * Renamed power state to performance state. Deprecated	support	exists
       in XML output only.

	 * Updated DTD version number to 2.0 to	match the updated XML output

SEE ALSO
       On     Linux,	 the	 driver	    README     is     installed	    as
       /usr/share/doc/NVIDIA_GLX-1.0/README.txt

AUTHOR
       NVIDIA Corporation

COPYRIGHT
       Copyright 2011-2025 NVIDIA Corporation.

nvidia-smi 570.124		   2025/2/25			 nvidia-smi(1)
