nvidia-smi(1)			     NVSMI			 nvidia-smi(1)

NAME
       nvidia-smi - NVIDIA System Management Interface program

SYNOPSIS
       nvidia-smi [OPTION1 [ARG1]] [OPTION2 [ARG2]] ...

DESCRIPTION
       nvidia-smi (also NVSMI) provides monitoring and management capabilities
       for each of NVIDIA's Tesla, Quadro, GRID and GeForce devices from the
       Fermi and higher architecture families. GeForce Titan series devices
       are supported for most functions, with very limited information
       provided for the remainder of the GeForce brand. NVSMI is a
       cross-platform tool that supports all standard NVIDIA driver-supported
       Linux distros, as well as 64-bit versions of Windows starting with
       Windows Server 2008 R2. Metrics can be consumed directly by users via
       stdout, or written to a file in CSV or XML format for scripting
       purposes.

       Note that much of the functionality of NVSMI is provided	by the
       underlying NVML C-based library.	See the	NVIDIA developer website link
       below for more information about	NVML. NVML-based python	bindings are
       also available.

       The output of NVSMI is not guaranteed to	be backwards compatible.
       However,	both NVML and the Python bindings are backwards	compatible,
       and should be the first choice when writing any tools that must be
       maintained across NVIDIA	driver releases.

       NVML SDK: https://docs.nvidia.com/deploy/nvml-api/index.html

       Python bindings:	http://pypi.python.org/pypi/nvidia-ml-py/

OPTIONS
   GENERAL OPTIONS
   -h, --help
       Print usage information and exit.

   --version
       Print version information and exit.

   LIST	OPTIONS
   -L, --list-gpus
       List each of the	NVIDIA GPUs in the system, along with their UUIDs.

   -B, --list-excluded-gpus
       List each of the	excluded NVIDIA	GPUs in	the system, along with their
       UUIDs.

   SUMMARY OPTIONS
   Show	a summary of GPUs connected to the system.
   -col, --columns
       Show a summary of GPUs connected	to the system in a multi-column
       format.

   [any	one of]
   -i, --id=ID
       Target a	specific GPU.

   -f FILE, --filename=FILE
       Log to the specified file, rather than to stdout.

   -l SEC, --loop=SEC
       Probe at the specified interval, in seconds, until Ctrl+C.

   QUERY OPTIONS
   -q, --query
       Display GPU or Unit info. Displayed info	includes all data listed in
       the (GPU	ATTRIBUTES) or (UNIT ATTRIBUTES) sections of this document.
       Some devices and/or environments	don't support all possible
       information. Any	unsupported data is indicated by a "N/A" in the
       output. By default information for all available	GPUs or	Units is
       displayed. Use the -i option to restrict	the output to a	single GPU or
       Unit.

   [plus optionally]
   -u, --unit
       Display Unit data instead of GPU	data. Unit data	is only	available for
       NVIDIA S-class Tesla enclosures.

   -i, --id=ID
       Display data for	a single specified GPU or Unit.	The specified id may
       be the GPU/Unit's 0-based index in the natural enumeration returned by
       the driver, the GPU's board serial number, the GPU's UUID, or the GPU's
       PCI bus ID (as domain:bus:device.function in hex). It is	recommended
       that users desiring consistency use either UUID or PCI bus ID, since
       device enumeration ordering is not guaranteed to	be consistent between
       reboots and board serial	number might be	shared between multiple	GPUs
       on the same board.
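
       As a sketch of the PCI bus ID form accepted by -i (the ID below is
       illustrative, not from a real device), the components can be split
       out in a shell script:

       ```shell
       # A PCI bus ID has the form domain:bus:device.function, all in hex.
       pci='00000000:01:00.0'                 # illustrative value
       domain=$(echo "$pci" | cut -d: -f1)
       bus=$(echo "$pci" | cut -d: -f2)
       devfn=$(echo "$pci" | cut -d: -f3)
       echo "domain=$domain bus=$bus device.function=$devfn"
       ```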

   -f FILE, --filename=FILE
       Redirect	query output to	the specified file in place of the default
       stdout. The specified file will be overwritten.

   -x, --xml-format
       Produce XML output in place of the default human-readable format. Both
       GPU and Unit query outputs conform to corresponding DTDs. These are
       available via the --dtd flag.

   --dtd
       Use with	-x. Embed the DTD in the XML output.

   --debug=FILE
       Produces an encrypted debug log for use when submitting bug reports
       to NVIDIA.

   -d TYPE, --display=TYPE
       Display only selected information: MEMORY, UTILIZATION, ECC,
       TEMPERATURE, POWER, CLOCK, COMPUTE, PIDS, PERFORMANCE,
       SUPPORTED_CLOCKS, PAGE_RETIREMENT, ACCOUNTING, ENCODER_STATS,
       SUPPORTED_GPU_TARGET_TEMP, VOLTAGE, FBC_STATS, ROW_REMAPPER,
       GSP_FIRMWARE_VERSION, POWER_SMOOTHING, POWER_PROFILES. Flags can be
       combined with commas, e.g. "MEMORY,ECC". Sampling data with max, min
       and avg is also returned for the POWER, UTILIZATION and CLOCK display
       types. Doesn't work with the -u/--unit or -x/--xml-format flags.
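
       For example, display types from the list above can be combined with
       commas; a script might assemble the flag list like this (a sketch,
       with the final invocation shown as a comment since it requires an
       NVIDIA driver to run):

       ```shell
       # Assemble a comma-separated display-type list for -d:
       types="MEMORY ECC UTILIZATION"
       flags=$(echo "$types" | tr ' ' ',')
       echo "$flags"
       # The resulting invocation would be:
       #   nvidia-smi -q -d "$flags"
       ```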

   -l SEC, --loop=SEC
       Continuously report query data at the specified interval, rather	than
       the default of just once. The application will sleep in-between
       queries.	Note that on Linux ECC error or	Xid error events will print
       out during the sleep period if the -x flag was not specified. Pressing
       Ctrl+C at any time will abort the loop, which will otherwise run
       indefinitely. If	no argument is specified for the -l form a default
       interval	of 5 seconds is	used.

   -lms	ms, --loop-ms=ms
       Same as -l,--loop but in	milliseconds.

   SELECTIVE QUERY OPTIONS
       Allows the caller to pass an explicit list of properties	to query.

   [one	of]
   --query-gpu=
       Information about the GPU. Pass a comma-separated list of properties
       to query, e.g. --query-gpu=pci.bus_id,persistence_mode. Call
       --help-query-gpu for more info.

   --query-supported-clocks=
       List of supported clocks. Call --help-query-supported-clocks for	more
       info.

   --query-compute-apps=
       List of currently active compute processes. Call
       --help-query-compute-apps for more info.

   --query-accounted-apps=
       List of accounted compute processes. Call --help-query-accounted-apps
       for more	info. This query is not	supported on vGPU host.

   --query-retired-pages=
       List of GPU device memory pages that have been retired. Call
       --help-query-retired-pages for more info.

   --query-remapped-rows=
       Information about remapped rows.	Call --help-query-remapped-rows	for
       more info.

   [mandatory]
   --format=
       Comma-separated list of format options:

        csv - comma separated values (MANDATORY)

        noheader - skip first line with column	headers

        nounits - don't print units for numerical values
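
       A typical selective query pairs --query-gpu with --format=csv; the
       sketch below parses a sample output line (the sample values are
       illustrative, not from a real device, and the query itself needs an
       NVIDIA driver to run):

       ```shell
       # The query itself:
       #   nvidia-smi --query-gpu=name,memory.used,memory.total \
       #              --format=csv,noheader,nounits
       # Sample output line, illustrative only:
       sample='NVIDIA A100-SXM4-40GB, 2048, 40960'
       # With nounits the numeric fields are plain numbers, so they can be
       # computed on directly:
       pct=$(echo "$sample" | awk -F', ' '{printf "%.0f", $2 / $3 * 100}')
       echo "memory used: ${pct}%"
       ```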

   [plus any of]
   -i, --id=ID
       Display	data  for  a single specified GPU. The specified id may	be the
       GPU's 0-based index in the natural enumeration returned by the  driver,
       the  GPU's board	serial number, the GPU's UUID, or the GPU's PCI	bus ID
       (as domain:bus:device.function in hex). It is  recommended  that	 users
       desiring	 consistency  use  either  UUID	 or  PCI  bus ID, since	device
       enumeration ordering is not guaranteed to be consistent between reboots
       and board serial	number might be	shared between multiple	 GPUs  on  the
       same board.

   -f FILE, --filename=FILE
       Redirect	 query	output	to  the	specified file in place	of the default
       stdout. The specified file will be overwritten.

   -l SEC, --loop=SEC
       Continuously report query data at the specified interval,  rather  than
       the  default  of	 just  once.  The  application	will  sleep in-between
       queries.	Note that on Linux ECC error or	Xid error  events  will	 print
       out  during the sleep period if the -x flag was not specified. Pressing
       Ctrl+C at any time will	abort  the  loop,  which  will	otherwise  run
       indefinitely.  If  no  argument	is specified for the -l	form a default
       interval	of 5 seconds is	used.

   -lms	ms, --loop-ms=ms
       Same as -l,--loop but in	milliseconds.

   DEVICE MODIFICATION OPTIONS
   [any	one of]
   -pm,	--persistence-mode=MODE
       Set the persistence mode	for the	target GPUs. See the (GPU  ATTRIBUTES)
       section	for  a	description  of	 persistence mode. Requires root. Will
       impact all GPUs unless a	single GPU is specified	using the -i argument.
       The effect of this operation is immediate. However, it does not persist
       across reboots. After each reboot  persistence  mode  will  default  to
       "Disabled". Available on	Linux only.

   -e, --ecc-config=CONFIG
       Set  the	ECC mode for the target	GPUs. See the (GPU ATTRIBUTES) section
       for a description of ECC	mode. Requires	root.  Will  impact  all  GPUs
       unless  a  single  GPU is specified using the -i	argument. This setting
       takes effect after the next reboot and is persistent.

   -p, --reset-ecc-errors=TYPE
       Reset the ECC  error  counters  for  the	 target	 GPUs.	See  the  (GPU
       ATTRIBUTES)  section  for  a  description  of  ECC error	counter	types.
       Available arguments are 0|VOLATILE or 1|AGGREGATE. Requires root.
       Will impact all GPUs unless a single GPU is specified using the -i
       argument. The effect of this operation is immediate. Clearing
       aggregate counts is not supported on Ampere+.

   -c, --compute-mode=MODE
       Set the compute mode for	the target  GPUs.  See	the  (GPU  ATTRIBUTES)
       section	for  a description of compute mode. Requires root. Will	impact
       all GPUs	unless a single	GPU is specified using the  -i	argument.  The
       effect  of  this	 operation  is immediate. However, it does not persist
       across reboots. After each reboot compute mode will reset to "DEFAULT".

   -dm TYPE, --driver-model=TYPE
   -fdm	TYPE, --force-driver-model=TYPE
       Enable  or  disable  TCC	 driver	 model.	 For  Windows  only.  Requires
       administrator  privileges.  -dm will fail if a display is attached, but
       -fdm will force the driver model	to change. Will	impact all GPUs	unless
       a single	GPU is specified using the -i argument.	A reboot  is  required
       for  the	change to take place. See Driver Model for more	information on
       Windows driver models. An error message indicates that  retrieving  the
       field failed.

   --gom=MODE
       Set GPU Operation Mode: 0/ALL_ON, 1/COMPUTE, 2/LOW_DP. Supported on
       GK110 M-class and X-class Tesla products from the Kepler family. Not
       supported on Quadro and Tesla C-class products. LOW_DP and ALL_ON are
       the only modes supported on GeForce Titan devices. Requires
       administrator privileges. See GPU Operation Mode for more information
       about GOM. GOM changes take effect after reboot. The reboot
       requirement might be removed in the future. Compute-only GOMs don't
       support WDDM (Windows Display Driver Model).

   -r, --gpu-reset
       Trigger a reset of one or more GPUs. Can	be used	to clear GPU HW	and SW
       state  in  situations  that  would  otherwise require a machine reboot.
       Typically useful	if a double bit	ECC error has  occurred.  Optional  -i
       switch can be used to target one	or more	specific devices. Without this
       option,	all  GPUs  are	reset.	Requires  root.	 There	can't  be  any
       applications using  these  devices  (e.g.  CUDA	application,  graphics
       application  like  X server, monitoring application like	other instance
       of nvidia-smi). There also can't	be any compute applications running on
       any other GPU in	the system if individual GPU reset is not feasible.

       Starting	 with  the  NVIDIA  Ampere  architecture,  GPUs	 with	NVLink
       connections  can	 be  individually  reset.  On Ampere NVSwitch systems,
       Fabric Manager is required to facilitate	reset.	On  Hopper  and	 later
       NVSwitch	 systems, the dependency on Fabric Manager to facilitate reset
       is removed.

       If Fabric Manager is not	running, or if any of the GPUs being reset are
       based on	an architecture	preceding the NVIDIA Ampere architecture,  any
       GPUs with NVLink	connections to a GPU being reset must also be reset in
       the same	command. This can be done either by omitting the -i switch, or
       using  the  -i switch to	specify	the GPUs to be reset. If the -i	option
       does not	specify	a complete set of NVLink GPUs to reset,	 this  command
       will  issue  an	error  identifying  the	 additional  GPUs that must be
       included	in the reset command.

       Specific	details	are outlined in	the tables below:

       NVSwitch systems:

        GPU Family | Fabric Manager running        | Fabric Manager not running
       ------------|-------------------------------|----------------------------
        Pre-Ampere | All PEER connected GPUs must  | All PEER connected GPUs
                   | be reset in the same command. | must be reset in the same
                   |                               | command.
        Ampere+    | Each GPU can be reset         | All PEER connected GPUs
                   | individually.                 | must be reset in the same
                   |                               | command.

       Direct-connected NVLink systems (Fabric Manager is not supported, as
       no NVSwitch HW is present):

        GPU Family | Capabilities
       ------------|----------------------------------------------------------
        Pre-Ampere | All PEER connected GPUs must be reset in the same command.
        Ampere+    | Each GPU can be reset individually.

       GPU reset is not	guaranteed to work in all cases. It is not recommended
       for  production environments at this time. In some situations there may
       be HW components	on the board that fail to revert back  to  an  initial
       state  following	 the  reset request. This is more likely to be seen on
       Fermi-generation	products vs. Kepler, and more likely to	be seen	if the
       reset is	being performed	on a hung GPU.

       Following a reset, it is	recommended that the health of each reset  GPU
       be  verified  before  further use. If any GPU is	not healthy a complete
       reset should be instigated by power cycling the node.

       A reset triggered without extra arguments performs a default Function
       Level Reset (FLR). To issue a Bus Reset, use -r bus. On certain
       platforms only Function Level Reset is possible.

       GPU reset is not supported on MIG-enabled vGPU guests.

       Visit http://developer.nvidia.com/gpu-deployment-kit  to	 download  the
       GDK.

   -vm,	--virt-mode=MODE
       Switch GPU Virtualization Mode. Sets the GPU virtualization mode to
       3/VGPU or 4/VSGA. The virtualization mode of a GPU can only be set
       when it is running on a hypervisor.

   -lgc, --lock-gpu-clocks=MIN_GPU_CLOCK,MAX_GPU_CLOCK
       Specifies <minGpuClock,maxGpuClock> clocks as a pair (e.g. 1500,1500)
       that defines the closest desired locked GPU clock speed in MHz. Input
       can also be a single desired clock value (e.g. <GpuClockValue>).
       Optionally, --mode can be supplied to specify the clock locking mode.
       Supported on Volta+. Requires root.

       --mode=0 (Default)
		      This mode is the default clock locking mode and provides
		      the highest possible frequency accuracy supported by
		      the hardware.

       --mode=1	      The clock locking algorithm leverages closed-loop
		      controllers to achieve frequency accuracy with
		      improved perf per watt for certain classes of
		      applications. Due to the convergence latency of
		      closed-loop controllers, the frequency accuracy may be
		      slightly lower than in default mode 0.
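
       To illustrate the argument form (the clock values are illustrative
       and must be clocks the device actually supports; see the
       SUPPORTED_CLOCKS display type):

       ```shell
       # Lock min and max GPU clocks to the same value (needs root to run):
       #   nvidia-smi -lgc 1500,1500
       # A single value is also accepted and locks both ends:
       #   nvidia-smi -lgc 1500
       # Splitting the MIN,MAX pair as a script would:
       arg='1500,1500'
       min=${arg%%,*}
       max=${arg##*,}
       echo "min=${min} MHz, max=${max} MHz"
       ```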

   -lmc, --lock-memory-clocks=MIN_MEMORY_CLOCK,MAX_MEMORY_CLOCK
       Specifies <minMemClock,maxMemClock> clocks as a pair (e.g. 5100,5100)
       that defines the range of desired locked Memory clock speed in MHz.
       Input can also be a single desired clock value (e.g.
       <MemClockValue>). Requires root. Note: this option does not work on
       GPUs based on NVIDIA Hopper architectures; to lock memory clocks on
       those systems use --lock-memory-clocks-deferred instead.

   -rgc, --reset-gpu-clocks
       Resets the GPU clocks  to  the  default	value.	Supported  on  Volta+.
       Requires	root.

   -rmc, --reset-memory-clocks
       Resets  the  memory  clocks  to the default value. Supported on Volta+.
       Requires	root.

   -ac,	--applications-clocks=MEM_CLOCK,GRAPHICS_CLOCK
       This option is deprecated and will be removed in a future CUDA
       release. Please use -lmc for locking memory clocks and -lgc for
       locking graphics clocks. Specifies maximum <memory,graphics> clocks
       as a pair (e.g. 2000,800) that defines the GPU's speed while running
       applications on a GPU. Supported on Maxwell-based GeForce and from
       the Kepler+ family in Tesla/Quadro/Titan devices. Requires root.

   -rac, --reset-applications-clocks
       This option is deprecated and will be removed in a future CUDA
       release. Resets the applications clocks to the default value.
       Supported on Maxwell-based GeForce and from the Kepler+ family in
       Tesla/Quadro/Titan devices. Requires root.

   -lmcd, --lock-memory-clocks-deferred
       Specifies the closest desired Memory Clock in MHz. The memory clock
       takes effect the next time the GPU is initialized. This can be
       guaranteed by unloading and reloading the kernel module. Requires
       root.

   -rmcd, --reset-memory-clocks-deferred
       Resets  the  memory clock to default value. Driver unload and reload is
       required	for this to take effect. This can be  done  by	unloading  and
       reloading the kernel module. Requires root.

   -pl,	--power-limit=POWER_LIMIT
       Specifies the maximum power limit in watts. Accepts integer and
       floating point numbers. It accepts an optional --scope argument. Only
       on supported devices from the Kepler family. The value must be
       between the Min and Max Power Limit reported by nvidia-smi. Requires
       root.

   -sc,	--scope=0/GPU, 1/TOTAL_MODULE
       Specifies the scope of the power limit. The options are: 0/GPU, which
       changes the power limit for the GPU only, and 1/TOTAL_MODULE, which
       changes the power limit for the module containing multiple
       components, e.g. GPU and CPU.

   -cc,	--cuda-clocks=MODE
       Overrides or restores default CUDA  clocks.  Available  arguments  are:
       0|RESTORE_DEFAULT or 1|OVERRIDE. Requires root.

   -am,	--accounting-mode=MODE
       Enables or disables GPU Accounting. With GPU Accounting one can keep
       track of usage of resources throughout the lifespan of a single
       process. Only on supported devices from the Kepler family. Requires
       administrator privileges. Available arguments are 0|DISABLED or
       1|ENABLED.

   -caa, --clear-accounted-apps
       Clears all processes accounted so far. Only on supported	 devices  from
       Kepler family. Requires administrator privileges.

   --auto-boost-default=MODE
       This option is deprecated and will be removed in	a future CUDA release.
       Set the default auto boost policy to 0/DISABLED or 1/ENABLED, enforcing
       the change only after the last boost client has exited. Only on certain
       Tesla  devices  from  the  Kepler+  family  and	Maxwell-based  GeForce
       devices.	Requires root.

   --auto-boost-permission=MODE
       This option is deprecated and will be removed in	a future CUDA release.
       Allow non-admin/root control over auto boost mode. Available  arguments
       are 0|UNRESTRICTED, 1|RESTRICTED. Only on certain Tesla devices from
       the Kepler+ family and Maxwell-based GeForce devices. Requires root.

   -mig, --multi-instance-gpu=MODE
       Enables or disables Multi Instance GPU mode. Only supported on  devices
       based  on  the  NVIDIA  Ampere  architecture.  Requires root. Available
       arguments are 0|DISABLED or 1|ENABLED.

   -gtt, --gpu-target-temp=MODE
       Set the GPU Target Temperature for a GPU in degrees Celsius. The
       target temperature should be within the limits supported by the GPU;
       these limits can be retrieved by using the query option with
       SUPPORTED_GPU_TARGET_TEMP. Requires root.

   --set-hostname=hostname
       Set the hostname associated with the device. Maximum length is 64
       characters (including the terminating NULL character). Requires
       root.

   --get-hostname
       Retrieves the hostname associated with the device.

   [plus optionally]
   -i, --id=ID
       Modify  a  single specified GPU.	The specified id may be	the GPU/Unit's
       0-based index in	the natural enumeration	returned by  the  driver,  the
       GPU's  board serial number, the GPU's UUID, or the GPU's	PCI bus	ID (as
       domain:bus:device.function  in  hex).  It  is  recommended  that	 users
       desiring	 consistency  use  either  UUID	 or  PCI  bus ID, since	device
       enumeration ordering is not guaranteed to be consistent between reboots
       and board serial	number might be	shared between multiple	 GPUs  on  the
       same board.

   -eom, --error-on-warning
       Return a	non-zero error for warnings.

   UNIT	MODIFICATION OPTIONS
   -t, --toggle-led=STATE
       Set  the	 LED  indicator	state on the front and back of the unit	to the
       specified color.	See the	(UNIT ATTRIBUTES) section for a	description of
       the LED states. Allowed colors are 0|GREEN and 1|AMBER. Requires
       root.

   [plus optionally]
   -i, --id=ID
       Modify  a single	specified Unit.	The specified id is the	Unit's 0-based
       index in	the natural enumeration	returned by the	driver.

   SHOW	DTD OPTIONS
   --dtd
       Display Device or Unit DTD.

   [plus optionally]
   -f FILE, --filename=FILE
       Redirect	query output to	the specified file in  place  of  the  default
       stdout. The specified file will be overwritten.

   -u, --unit
       Display Unit DTD	instead	of device DTD.

   topo
       Display topology	information about the system. Use "nvidia-smi topo -h"
       for more	information. Linux only. Shows all GPUs	NVML is	able to	detect
       but  CPU	and NUMA node affinity information will	only be	shown for GPUs
       with Kepler or newer architectures. Note: GPU enumeration is  the  same
       as NVML.

   drain
       Display	and modify the GPU drain states. A drain state is one in which
       the GPU is no longer accepting new clients, and is used while preparing
       to power	down the GPU. Use "nvidia-smi drain -h"	for more  information.
       Linux only.

   nvlink
       Display	nvlink	information.  Use  "nvidia-smi	nvlink	-h"  for  more
       information.

   clocks
       Query and control clocking behavior. Use	"nvidia-smi clocks --help" for
       more information.

   vgpu
       Display information on GRID virtual GPUs. Use "nvidia-smi vgpu -h"  for
       more information.

   mig
       Provides controls for MIG management. Use "nvidia-smi mig -h" for
       more information.

   boost-slider
       Provides controls for boost slider management. Use "nvidia-smi
       boost-slider -h" for more information.

   power-hint
       Provides queries for power hints. Use "nvidia-smi power-hint -h" for
       more information.

   conf-compute
       Provides control and queries for confidential compute. Use
       "nvidia-smi conf-compute -h" for more information.

   power-smoothing
       Provides controls and information for power smoothing. Use
       "nvidia-smi power-smoothing -h" for more information.

   power-profiles
       Provides controls and information for workload power profiles. Use
       "nvidia-smi power-profiles -h" for more information.

   encodersessions
       Display Encoder Sessions information. Use "nvidia-smi
       encodersessions -h" for more information.

RETURN VALUE
       The return code reflects whether the operation succeeded or failed
       and, on failure, the reason for the failure.

        Return code 0   - Success

        Return code 2   - A supplied argument or flag is invalid

        Return code 3   - The requested operation is not available on the
                          target device

        Return code 4   - The current user does not have permission to
                          access this device or perform this operation

        Return code 6   - A query to find an object was unsuccessful

        Return code 8   - A device's external power cables are not properly
                          attached

        Return code 9   - NVIDIA driver is not loaded

        Return code 10  - NVIDIA Kernel detected an interrupt issue with a
                          GPU

        Return code 12  - NVML Shared Library couldn't be found or loaded

        Return code 13  - Local version of NVML doesn't implement this
                          function

        Return code 14  - infoROM is corrupted

        Return code 15  - The GPU has fallen off the bus or has otherwise
                          become inaccessible

        Return code 255 - Other error or internal driver error occurred
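
       In a script, the return code can be checked after the call; the
       sketch below maps a code to a message (rc is set by hand here for
       illustration, since running nvidia-smi requires an NVIDIA driver; a
       real script would use rc=$? immediately after the invocation):

       ```shell
       rc=4   # illustrative; normally: nvidia-smi ...; rc=$?
       case "$rc" in
         0)   msg="success" ;;
         2)   msg="invalid argument or flag" ;;
         3)   msg="operation not available on target device" ;;
         4)   msg="insufficient permission" ;;
         6)   msg="object query unsuccessful" ;;
         9)   msg="NVIDIA driver not loaded" ;;
         255) msg="other or internal driver error" ;;
         *)   msg="see nvidia-smi(1) RETURN VALUE for code $rc" ;;
       esac
       echo "$msg"
       ```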

GPU ATTRIBUTES
       The following list describes all	 possible  data	 returned  by  the  -q
       device  query  option. Unless otherwise noted all numerical results are
       base 10 and unitless.

   Timestamp
       The current system timestamp at the time	nvidia-smi was invoked.	Format
       is "Day-of-week Month Day HH:MM:SS Year".

   Driver Version
       The version  of	the  installed	NVIDIA	display	 driver.  This	is  an
       alphanumeric string.

   CUDA	Version
       The  latest  CUDA version supported by the driver. This is usually, but
       not always, the version of the CUDA toolkit installed  on  the  system.
       This is an alphanumeric string.

   Attached GPUs
       The number of NVIDIA GPUs in the	system.

   Product Name
       The  official  product name of the GPU. This is an alphanumeric string.
       For all products.

   Product Brand
       The official brand of the GPU. This is an alphanumeric string. For  all
       products.

   Product Architecture
       The  official  architecture  name  of  the GPU. This is an alphanumeric
       string. For all products.

   Display Mode
       This field is deprecated, and will be removed in	a future release.

   Display Attached
       A flag that indicates whether a	physical  display  (e.g.  monitor)  is
       currently  connected to any of the GPU's	connectors. "Yes" indicates an
       attached	display. "No" indicates	otherwise.

   Display Active
       A flag that indicates whether a display is initialized on the GPU
       (e.g. memory is allocated on the device for display). A display can
       be active even when no monitor is physically attached. "Enabled"
       indicates an active display. "Disabled" indicates otherwise.

   Persistence Mode
       A flag that indicates whether persistence mode is enabled for the  GPU.
       Value  is  either  "Enabled"  or	 "Disabled".  When persistence mode is
       enabled the NVIDIA driver remains loaded	even when no  active  clients,
       such  as	 X11  or  nvidia-smi,  exist.  This  minimizes the driver load
       latency associated with running dependent apps, such as CUDA  programs.
       For all CUDA-capable products. Linux only.

   Addressing Mode
       A  field	 that indicates	which addressing mode is currently active. The
       value is	"ATS" or "HMM" or "None".  When	 the  mode  is	"ATS",	system
       allocated  memory  like	malloc is addressable from the GPU via Address
       Translation Services. This means	there is effectively a single  set  of
       page  tables  used by both the CPU and the GPU. When the	mode is	"HMM",
       system allocated	memory like malloc is addressable  from	 the  GPU  via
       software-based mirroring	of the CPU's page tables, on the GPU. When the
       mode is "None", neither ATS nor HMM is active. Linux only.

   MIG Mode
       MIG Mode	configuration status

       Current	      MIG mode currently in use	- NA/Enabled/Disabled

       Pending	      Pending configuration of MIG Mode	- Enabled/Disabled

   MIG Device
       When  MIG  is  enabled,	each  MIG  device has the following attributes
       displayed:

       Index	      Unique identifier	for this MIG device within its	parent
		      GPU.

       GPU Instance ID
		      Identifier  of  the  GPU	instance  that this MIG	device
		      belongs to.

       Compute Instance	ID
		      Identifier  of  the  compute  instance  within  the  GPU
		      instance.

       Device Attributes
		      Hardware engines allocated to this MIG device. These are
		      shared  among compute instances associated with the same
		      GPU instance.

	   Multiprocessor count

	       Number of SMs (Streaming	Multiprocessors)

	   Copy	Engine count

	       Number of copy engines

	   Encoder count

	       Number of video encoders

	   Decoder count

	       Number of video decoders

	   OFA count

	       Number of OFAs (Optical Flow Accelerators)

	   JPG count

	       Number of JPEG encoders/decoders

       ECC Errors     ECC error	counts for this	MIG device.

	   SRAM	uncorrectable errors

	       Number of uncorrectable errors detected in any of the SRAMs.

       Shared FB Memory	Usage
		      FB memory	allocation and usage of	this MIG device.  This
		      is  shared  among	 the compute instances associated with
		      the same GPU instance.

	   Total

	       Total size of FB	memory.

	   Reserved

	       Reserved	size of	FB memory.

	   Used

	       Used size of FB memory.

	   Free

	       Available size of FB memory.

       Shared BAR1 Memory
		      BAR1 memory allocation and usage	of  this  MIG  device.
		      This  is	shared	among the compute instances associated
		      with the same GPU	instance.

	   Total

	       Total size of BAR1 memory.

	   Used

	       Used size of BAR1 memory.

	   Free

	       Available size of BAR1 memory.

   Accounting Mode
       A flag that indicates whether accounting	mode is	enabled	for  the  GPU.
       Value  is  either  "Enabled"  or	"Disabled". When accounting is enabled
       statistics are calculated for each compute process running on the  GPU.
       Statistics  can	be queried during the lifetime or after	termination of
       the process. The	execution time of process is reported as 0  while  the
       process	is in running state and	updated	to actual execution time after
       the process has terminated. See	--help-query-accounted-apps  for  more
       info.

   Accounting Mode Buffer Size
       Returns the size of the circular buffer that holds the list of
       processes that can be queried for accounting stats. This is the
       maximum number of processes for which accounting information will be
       stored before information about the oldest processes is overwritten
       by information about new processes.

   Driver Model
       On Windows, the TCC and WDDM driver models are  supported.  The	driver
       model  can  be  changed	with the (-dm) or (-fdm) flags.	The TCC	driver
       model is	optimized for compute applications. I.E. kernel	 launch	 times
       will  be	 quicker  with	TCC.  The  WDDM	 driver	 model is designed for
       graphics	applications and is not	recommended for	compute	 applications.
       Linux does not support multiple driver models, and will always have the
       value of	"N/A".

       Current	      The  driver  model  currently  in	 use.  Always "N/A" on
		      Linux.

       Pending	      The driver model that will be used on the	 next  reboot.
		      Always "N/A" on Linux.

   Serial Number
       This number matches the serial number physically	printed	on each	board.
       It is a globally	unique immutable alphanumeric value.

   GPU UUID
       This  value is the globally unique immutable alphanumeric identifier of
       the GPU.	It does	not correspond to any physical label on	the board.

   GPU PDI
       This value is the Per Device Identifier of the GPU. It is a 64-bit
       value that provides a uniqueness guarantee for the GPU.

   Minor Number
       The minor number for the device is such that the NVIDIA device node
       file for each GPU has the form /dev/nvidia[minor number]. Available
       only on Linux platforms.
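       The minor-number-to-device-node convention can be sketched as follows
       (a minimal illustration, not an exhaustive enumeration of NVIDIA
       device nodes):

```python
import re

def device_node(minor):
    """Device node path for a GPU with the given minor number, following
    the /dev/nvidia[minor number] convention described above."""
    return f"/dev/nvidia{minor}"

def minor_from_node(path):
    """Inverse mapping: recover the minor number from a device node path,
    or None for non-GPU nodes such as /dev/nvidiactl."""
    m = re.fullmatch(r"/dev/nvidia(\d+)", path)
    return int(m.group(1)) if m else None

print(device_node(0))                   # -> /dev/nvidia0
print(minor_from_node("/dev/nvidia3"))  # -> 3
```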

   VBIOS Version
       The VBIOS version of the GPU board.

   MultiGPU Board
       Whether or not this GPU is part of a multiGPU board.

   Board ID
       The  unique  board  ID assigned by the driver. If two or	more GPUs have
       the same	board ID and the above "MultiGPU" field	is true	then the  GPUs
       are on the same board.

   Board Part Number
       The unique part number of the GPU's board.

   GPU Part Number
       The unique part number of the GPU.

   FRU Part Number
       The unique FRU part number of the GPU.

   Platform Info
       Platform information is compute-tray-specific information: the GPU's
       positional index and platform-identifying information.

       Chassis Serial Number

       Serial Number of	the chassis containing this GPU.

       Slot Number

       The slot	number in the chassis containing this GPU (includes switches).

       Tray Index

       The tray	index within the compute slots in the chassis containing  this
       GPU (does not include switches).

       Host ID

       Index of	the node within	the slot containing this GPU.

       Peer Type

       Platform	indicated NVLink-peer type (e.g. switch	present	or not).

       Module Id

       ID of this GPU within the node.

       GPU Fabric GUID

       Fabric ID for this GPU.

   Inforom Version
       Version numbers for each	object in the GPU board's inforom storage. The
       inforom	is  a  small, persistent store of configuration	and state data
       for the GPU. All	inforom	version	fields are numerical. It can be	useful
       to know these version  numbers  because	some  GPU  features  are  only
       available with inforoms of a certain version or higher.

       If any of the fields below returns Unknown Error, an additional
       inforom verification check is performed and an appropriate warning
       message is displayed.

       Image Version  Global version of the infoROM image. The image version,
                      like the VBIOS version, uniquely identifies the exact
                      version of the infoROM flashed on the board, in
                      contrast to the infoROM object versions, which only
                      indicate supported features.

       OEM Object     Version for the OEM configuration	data.

       ECC Object     Version for the ECC recording data.

       Power Management	Object
		      Version for the power management data.

       Inforom checksum	validation
                      Inforom checksum validation ("valid", "invalid",
                      "N/A"). Only available via
                      --query-gpu=inforom.checksum_validation.

   Inforom BBX Object Flush
       Information about flushing of the blackbox data to the inforom storage.

       Latest Timestamp
		      The  timestamp  of  the  latest  flush of	the BBX	Object
		      during the current run.

       Latest Duration
		      The duration of the  latest  flush  of  the  BBX	Object
		      during the current run.

   GPU Operation Mode
       GOM  allows  one	 to  reduce power usage	and optimize GPU throughput by
       disabling GPU features.

       Each GOM	is designed to meet specific user needs.

       In "All On" mode	everything is enabled and running at full speed.

       The "Compute" mode is designed for running only compute tasks. Graphics
       operations are not allowed.

       The "Low	Double	Precision"  mode  is  designed	for  running  graphics
       applications that don't require high bandwidth double precision.

       GOM can be changed with the (--gom) flag.

       Supported  on  GK110 M-class and	X-class	Tesla products from the	Kepler
       family. Not supported on	Quadro and Tesla C-class products. Low	Double
       Precision  and  All On modes are	the only modes available for supported
       GeForce Titan products.

       Current	      The GOM currently	in use.

       Pending	      The GOM that will	be used	on the next reboot.

   GPU C2C Mode
       The C2C mode of the GPU.

   GPU Reset Status
       Reset status of the GPU.	This functionality is deprecated.

       Reset Required Requested functionality has been deprecated.

       Drain and Reset Recommended
                      Requested functionality has been deprecated.

   GPU Recovery	Action
       The action to take to clear a fault that previously occurred. It is
       not intended for determining which fault triggered the recovery
       action.
       Possible values: None, Reset, Reboot, Drain P2P, Drain and Reset

       None

       No recovery action needed

       Reset

       Example scenario	- Uncontained HBM/SRAM UCE
       The GPU has encountered a fault that requires a reset to	recover.
       Terminate  all  GPU processes, reset the	GPU using 'nvidia-smi -r', and
       the GPU can be used again by starting new GPU processes.

       Reboot

       Example scenario	- UVM fatal error
       The GPU has encountered a fault that may have left the OS in an
       inconsistent state.
       Reboot the operating system to restore the OS to a consistent state.
       A node reboot is required.
       Applications cannot restart without a node reboot.
       An OS warm reboot is sufficient (no need for an AC/DC cycle).

       Drain P2P

       Example scenario	- N/A
       The  GPU	has encountered	a fault	that requires all peer-to-peer traffic
       to be quiesced.
       Terminate all GPU  processes  that  conduct  peer-to-peer  traffic  and
       disable UVM persistence mode.
       Disable	job  scheduling	 (no  new  jobs),  stop	 all applications when
       convenient, if persistence mode is enabled, disable it
       Once all peer-to-peer traffic is drained, query
       NVML_FI_DEV_GET_GPU_RECOVERY_ACTION again, which will return one of
       the other actions.
       If the result is still Drain P2P, reset the GPU.

       Drain and Reset

       Example scenario	- Contained HBM	UCE
       Reset Recommended.
       The GPU has encountered a fault that causes the GPU to temporarily
       operate at reduced capacity, such as part of its frame buffer memory
       being offlined or some of its MIG partitions being down.
       No new work should be scheduled on the GPU, but existing work that was
       not affected is safe to continue until it finishes or reaches a good
       checkpoint.
       It is safe to restart the application (memory capacity will be reduced
       due to dynamic page offlining), but the GPU eventually needs a reset
       (to apply row remapping).
       Asserted only for UCE row remaps.
       After all existing work has drained, reset the GPU to regain its full
       capacity.

   GSP Firmware	Version
       Firmware	version	of GSP.	This is	an alphanumeric	string.

   PCI
       Basic PCI info for the device. Some  of	this  information  may	change
       whenever	cards are added/removed/moved in a system. For all products.

       Bus	      PCI bus number, in hex

       Device	      PCI device number, in hex

       Domain	      PCI domain number, in hex

       Base Classcode PCI Base classcode, in hex

       Sub Classcode  PCI Sub classcode, in hex

       Device Id      PCI vendor device	id, in hex

       Sub System Id  PCI Sub System id, in hex

       Bus Id	      PCI bus id as "domain:bus:device.function", in hex
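       The "domain:bus:device.function" bus id string can be split into its
       numeric fields; a minimal sketch (the sample bus id is illustrative):

```python
from dataclasses import dataclass

@dataclass
class BusId:
    domain: int
    bus: int
    device: int
    function: int

def parse_bus_id(bus_id):
    """Split a "domain:bus:device.function" PCI bus id (all fields in hex)
    into numeric components, e.g. "00000000:65:00.0"."""
    domain, bus, rest = bus_id.split(":")
    device, function = rest.split(".")
    return BusId(int(domain, 16), int(bus, 16),
                 int(device, 16), int(function, 16))

bid = parse_bus_id("00000000:65:00.0")
print(bid.bus)  # -> 101 (0x65)
```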

   GPU Link information
       The PCIe	link generation	and bus	width

       Current	      The  current  link  generation  and  width. These	may be
		      reduced when the GPU is not in use.

       Max	      The maximum link generation and width possible with this
		      GPU and system configuration. For	example,  if  the  GPU
		      supports	a  higher  PCIe	 generation  than  the	system
		      supports then this reports the system PCIe generation.

   Bridge Chip
       Information related to the bridge chip on the device. The bridge chip
       firmware is only present on certain boards and may display "N/A" for
       some newer multiGPU boards.

       Type           The type of the bridge chip. Reported as N/A if none
                      exists.

       Firmware Version
                      The firmware version of the bridge chip. Reported as
                      N/A if none exists.

   Replays Since Reset
       The number of PCIe replays since	reset.

   Replay Number Rollovers
       The number of PCIe replay number	rollovers since	reset. A replay	number
       rollover	 occurs	 after 4 consecutive replays and results in retraining
       the link.

   Tx Throughput
       The GPU-centric transmission throughput across the  PCIe	 bus  in  MB/s
       over the	past 20ms. Only	supported on Maxwell architectures and newer.

   Rx Throughput
       The GPU-centric receive throughput across the PCIe bus in MB/s over the
       past 20ms. Only supported on Maxwell architectures and newer.

   Atomic Caps
       The PCIe	atomic capabilities of outbound/inbound	operations of the GPU.

   Fan Speed
       The  fan	 speed	value  is  the	percent	of the product's maximum noise
       tolerance fan speed that	the device's fan is currently intended to  run
       at.  This  value	 may  exceed 100% in certain cases. Note: The reported
       speed is	the intended fan speed.	If the fan is physically  blocked  and
       unable  to  spin, this output will not match the	actual fan speed. Many
       parts do	not report fan speeds because they rely	on cooling via fans in
       the surrounding enclosure. For all  discrete  products  with  dedicated
       fans.

   Performance State
       The  current  performance  state	 for  the  GPU.	 States	 range from P0
       (maximum	performance) to	P12 (minimum performance).
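       Performance state strings order naturally by their numeric suffix; a
       small sketch for comparing them:

```python
def pstate_level(pstate):
    """Numeric level of a performance state string such as "P0" or "P12";
    lower levels mean higher performance (P0 is maximum performance)."""
    if not (pstate.startswith("P") and pstate[1:].isdigit()):
        raise ValueError(f"not a performance state: {pstate!r}")
    return int(pstate[1:])

# P0 is a higher-performance state than P8.
print(pstate_level("P0") < pstate_level("P8"))  # -> True
```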

   Clocks Event	Reasons
       Retrieves information about factors that	are reducing the frequency  of
       clocks.

       If  all event reasons are returned as "Not Active" it means that	clocks
       are running as high as possible.

       Idle	      This option is deprecated	 and  will  be	removed	 in  a
		      future  CUDA  release. Nothing is	running	on the GPU and
		      the clocks are dropping to Idle state.

       Application Clocks Setting
		      This option is deprecated	 and  will  be	removed	 in  a
		      future   CUDA   release.	 GPU  clocks  are  limited  by
		      applications clocks setting. E.g.	can be	changed	 using
		      nvidia-smi  --applications-clocks=<Desired Clock Freq in
		      MHz>

       SW Power	Cap   SW Power Scaling algorithm is reducing the clocks	 below
		      requested	 clocks	 because the GPU is consuming too much
		      power. E.g. SW power  cap	 limit	can  be	 changed  with
		      nvidia-smi --power-limit=<Power Limit Value in W>

       HW Slowdown    This option will be removed in a future CUDA release.
                      HW Slowdown is engaged, reducing the core clocks by a
                      factor of 2 or more. It is active if either HW Thermal
                      Slowdown or HW Power Brake is active.

       HW Thermal Slowdown
		      HW Thermal Slowdowns are reducing	the core clocks	 by  a
		      factor of	2 or more due to temperature being too high.

       HW Power	Brake External Power Brake Assertion is	triggered (e.g.	by the
		      system power supply).

       Sync Boost     This  GPU	 has  been  added  to  a Sync boost group with
		      nvidia-smi or DCGM in order to maximize performance  per
		      watt.  All  GPUs	will be	limited	by the frequency which
		      can be achieved by the slowest GPU. Look at the throttle
		      reasons for other	GPUs in	the system to  see  why	 those
		      GPUs are holding this one	at lower clocks.

       SW Thermal Slowdown
                      SW Thermal capping algorithm is reducing clocks below
                      requested clocks because GPU temperature is higher than
                      the Max Operating Temp.

       Display Clock Setting
		      This field will be removed in a future CUDA release. GPU
		      clocks are limited by current setting of Display clocks.
		      Only supported on	Volta devices.

   Clock Event Reasons Counters
       Counters,  in  microseconds,  for  the amount of	time factors have been
       reducing	the frequency of clocks.

       SW Power	Capping
		      Amount of	time SW	Power Scaling  algorithm  has  reduced
		      the  clocks  below  requested clocks because the GPU was
		      consuming	too much power.

       Sync Boost Group
		      Amount of	time the  clock	 frequency  of	this  GPU  was
		      reduced  to  match the minimum possible clock across the
		      sync boost group.

       SW Thermal Slowdown
		      Amount of	time SW	Thermal	capping	algorithm has  reduced
		      clocks  below  requested	clocks because GPU temperature
		      was higher than Max Operating Temp.

       HW Thermal Slowdown
		      Amount of	time HW	Thermal	Slowdown was engaged, reducing
		      the core clocks by  a  factor  of	 2  or	more,  due  to
		      temperature being	too high.

       HW Power	Braking
		      Amount  of  time	External  Power	 Brake	Assertion  was
		      triggered	(e.g. by the system power supply).

   Sparse Operation Mode
       A flag that indicates whether sparse operation mode is enabled for  the
       GPU.  Value is either "Enabled" or "Disabled". Reported as "N/A"	if not
       supported.

   FB Memory Usage
       On-board	frame buffer memory information. Reported total	memory can  be
       affected	 by  ECC state.	If ECC does affect the total available memory,
       memory is decreased by several percent, due  to	the  requisite	parity
       bits. The driver	may also reserve a small amount	of memory for internal
       use,  even  without  active  work on the	GPU. On	systems	where GPUs are
       NUMA nodes, the accuracy	of FB memory utilization provided  by  nvidia-
       smi  depends  on	the memory accounting of the operating system. This is
       because FB memory is managed by the operating  system  instead  of  the
       NVIDIA  GPU  driver.  Typically,	pages allocated	from FB	memory are not
       released	even after the process terminates to enhance  performance.  In
       scenarios  where	 the operating system is under memory pressure,	it may
       resort to utilizing FB memory. Such actions can result in discrepancies
       in the accuracy of memory reporting. For	all products.

       Total	      Total size of FB memory.

       Reserved	      Reserved size of FB memory.

       Used	      Used size	of FB memory.

       Free	      Available	size of	FB memory.
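       The FB memory fields can be cross-checked against each other. The
       sketch below parses a hypothetical --query-gpu capture; the field
       names and the sample values are illustrative:

```python
# Hypothetical output of a query such as:
#   nvidia-smi --query-gpu=memory.total,memory.reserved,memory.used,memory.free \
#              --format=csv,noheader,nounits
# Values (in MiB) are illustrative, not taken from a live system.
sample = "81559, 551, 1024, 79984"

total, reserved, used, free = (int(v) for v in sample.split(","))

# Used + free + reserved accounts for the whole frame buffer (up to rounding).
print(used + free + reserved == total)  # -> True
```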

   BAR1	Memory Usage
       BAR1 is used to map the FB (device memory) so that it can  be  directly
       accessed	 by  the CPU or	by 3rd party devices (peer-to-peer on the PCIe
       bus).

       Total	      Total size of BAR1 memory.

       Used	      Used size	of BAR1	memory.

       Free	      Available	size of	BAR1 memory.

   Compute Mode
       The compute mode	flag indicates whether individual or multiple  compute
       applications may	run on the GPU.

       "Default" means multiple	contexts are allowed per device.

       "Exclusive  Process"  means  only  one  context	is allowed per device,
       usable from multiple threads at a time.

       "Prohibited" means no contexts  are  allowed  per  device  (no  compute
       apps).

       "EXCLUSIVE_PROCESS"   was  added	 in  CUDA  4.0.	 Prior	CUDA  releases
       supported  only	one   exclusive	  mode,	  which	  is   equivalent   to
       "EXCLUSIVE_THREAD" in CUDA 4.0 and beyond.

       For all CUDA-capable products.

   Utilization
       Utilization  rates  report  how	busy each GPU is over time, and	can be
       used to determine how much an application is  using  the	 GPUs  in  the
       system. Note: On	MIG-enabled GPUs, querying the utilization of encoder,
       decoder,	jpeg, ofa, gpu,	and memory is not currently supported.

       Note: During driver initialization, when ECC is enabled, one can see
       high GPU and Memory Utilization readings. This is caused by the ECC
       memory scrubbing mechanism that runs during driver initialization.

       GPU	      Percent of time over the past sample period during which
		      one or more kernels was executing	on the GPU. The	sample
		      period may be between 1 second and 1/6 second  depending
		      on the product.

       Memory	      Percent of time over the past sample period during which
		      global  (device)	memory	was being read or written. The
		      sample period may	be between 1  second  and  1/6	second
		      depending	on the product.

       Encoder	      Percent of time over the past sample period during which
		      the  GPU's  video	 encoder  was being used. The sampling
		      rate is variable and can be obtained  directly  via  the
		      nvmlDeviceGetEncoderUtilization()	API

       Decoder	      Percent of time over the past sample period during which
		      the  GPU's  video	 decoder  was being used. The sampling
		      rate is variable and can be obtained  directly  via  the
		      nvmlDeviceGetDecoderUtilization()	API

       JPEG	      Percent of time over the past sample period during which
		      the GPU's	JPEG decoder was being used. The sampling rate
		      is  variable  and	 can  be  obtained  directly  via  the
		      nvmlDeviceGetJpgUtilization() API

       OFA	      Percent of time over the past sample period during which
		      the GPU's	OFA (Optical Flow Accelerator) was being used.
		      The sampling  rate  is  variable	and  can  be  obtained
		      directly via the nvmlDeviceGetOfaUtilization() API

   Encoder Stats
       Encoder	Stats  report the count	of active encoder sessions, along with
       the  average  Frames  Per  Second  (FPS)	 and   average	 latency   (in
       microseconds) for all these active sessions on this device.

       Active Sessions
		      The  total  number  of  active  encoder sessions on this
		      device.

       Average FPS    The average Frames Per Second (FPS) of all active
                      encoder sessions on this device.

       Average Latency
		      The  average  latency  in	 microseconds  of  all	active
		      encoder sessions on this device.

   DRAM	Encryption Mode
       A flag that indicates whether DRAM Encryption support is	 enabled.  May
       be  either  "Enabled"  or  "Disabled".  Changes to DRAM Encryption mode
       require a reboot. Requires Inforom ECC object.

       Current	      The DRAM Encryption  mode	 that  the  GPU	 is  currently
		      operating	under.

       Pending	      The DRAM Encryption mode that the	GPU will operate under
		      after the	next reboot.

   ECC Mode
       A  flag	that  indicates	 whether ECC support is	enabled. May be	either
       "Enabled" or "Disabled".	Changes	to ECC mode require a reboot. Requires
       Inforom ECC object version 1.0 or higher.

       Current	      The ECC mode that	the GPU	is currently operating under.

       Pending	      The ECC mode that	the GPU	will operate under  after  the
		      next reboot.

   ECC Errors
       NVIDIA  GPUs  can provide error counts for various types	of ECC errors.
       Some ECC	errors are either single  or  double  bit,  where  single  bit
       errors  are  corrected and double bit errors are	uncorrectable. Texture
       memory errors may be correctable	via resend  or	uncorrectable  if  the
       resend	fails.	These  errors  are  available  across  two  timescales
       (volatile and aggregate).  Single  bit  ECC  errors  are	 automatically
       corrected  by  the  HW and do not result	in data	corruption. Double bit
       errors are detected but not corrected. Please see the ECC documents  on
       the web for information on compute application behavior when double bit
       errors  occur.  Volatile	 error	counters  track	 the  number of	errors
       detected	since the last driver load.  Aggregate	error  counts  persist
       indefinitely and	thus act as a lifetime counter.

       A  note	about  volatile	 counts:  On Windows this is once per boot. On
       Linux this can be more frequent.	On Linux the driver  unloads  when  no
       active clients exist. Hence, if persistence mode	is enabled or there is
       always a	driver client active (e.g. X11), then Linux also sees per-boot
       behavior.  If not, volatile counts are reset each time a	compute	app is
       run.
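       The volatile/aggregate distinction matters when computing error rates.
       Since aggregate counters are lifetime counters, new errors over an
       interval are simply the difference between two aggregate readings
       (counter names and values below are illustrative):

```python
def errors_in_interval(before, after):
    """New errors between two aggregate readings, per counter.
    Counter names are illustrative, not NVML field names."""
    return {k: after[k] - before[k] for k in before}

before = {"sram_uncorrectable": 0, "dram_correctable": 12}
after = {"sram_uncorrectable": 1, "dram_correctable": 15}
print(errors_in_interval(before, after))
# -> {'sram_uncorrectable': 1, 'dram_correctable': 3}
```

       Volatile counters cannot be used this way across driver reloads, since
       they reset to zero each time the driver loads.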

       Pre-Volta Tesla and Quadro products can display total ECC error counts,
       as well as a breakdown of errors	based on location  on  the  chip.  The
       locations  are described	below. Location-based data for aggregate error
       counts requires Inforom ECC object version 2.0. All  other  ECC	counts
       require ECC object version 1.0.

       Device Memory  Errors detected in global	device memory.

       Register	File  Errors detected in register file memory.

       L1 Cache	      Errors detected in the L1	cache.

       L2 Cache	      Errors detected in the L2	cache.

       Texture Memory Parity errors detected in	texture	memory.

       Total	      Total  errors detected across entire chip. Sum of	Device
		      Memory, Register File, L1	Cache, L2  Cache  and  Texture
		      Memory.

       On Turing the output is as follows:

       SRAM Correctable
		      Number  of  correctable  errors  detected	 in any	of the
		      SRAMs

       SRAM Uncorrectable
		      Number of	uncorrectable errors detected in  any  of  the
		      SRAMs

       DRAM Correctable
		      Number of	correctable errors detected in the DRAM

       DRAM Uncorrectable
		      Number of	uncorrectable errors detected in the DRAM

       On Ampere and later, the categorization of SRAM errors has been
       expanded. SRAM errors are now categorized as either parity or SEC-DED
       (single-error correction/double-error detection), depending on which
       unit hit the error. A histogram has been added that categorizes which
       unit hit the SRAM error. Additionally, a flag has been added that
       indicates whether the threshold for the specific SRAM has been
       exceeded.

       SRAM Uncorrectable Parity
		      Number of	uncorrectable errors detected  in  SRAMs  that
		      are parity protected

       SRAM Uncorrectable SEC-DED
		      Number  of  uncorrectable	 errors	detected in SRAMs that
		      are SEC-DED protected

       Aggregate Uncorrectable SRAM Sources
		      Details about the	 sources  of  Aggregate	 uncorrectable
		      SRAM errors

       SRAM L2	      Errors that occurred in the L2 cache

       SRAM SM	      Errors that occurred in the SM

       SRAM Microcontroller
		      Errors  that  occurred  in  a  microcontroller  (PMU/GSP
		      etc...)

       SRAM PCIE      Errors that occurred in any PCIE-related unit

       SRAM Other     Errors occurring in anything else not covered above

       If one of the repair flags is pending, check the	 GPU  Recovery	action
       and take	the appropriate	steps.

       Channel Repair Pending
		      Indicates	if a Channel repair is pending

       TPC Repair Pending
		      Indicates	if a TPC repair	is pending

       Unrepairable Memory
		      Indicates	if there is unrepairable memory

   Page	Retirement
       NVIDIA  GPUs  can  retire  pages	 of GPU	device memory when they	become
       unreliable. This	can happen when	multiple single	bit ECC	 errors	 occur
       for  the	 same  page,  or  on  a	 double	 bit ECC error.	When a page is
       retired, the NVIDIA driver will hide it such that no driver or
       application memory allocations can access it.

       Double  Bit  ECC	 The  number of	GPU device memory pages	that have been
       retired due to a	double bit ECC error.

       Single Bit ECC The number of GPU	device memory  pages  that  have  been
       retired due to multiple single bit ECC errors.

       Pending Checks whether any GPU device memory pages are pending
       blacklisting on the next reboot. Pages that are retired but not yet
       blacklisted can still be allocated, and may cause further reliability
       issues.

   Row Remapper
       NVIDIA  GPUs  can  remap	 rows  of  GPU	device memory when they	become
       unreliable. This	can happen when	a single uncorrectable	ECC  error  or
       multiple	 correctable  ECC  errors occur	on the same row. When a	row is
       remapped, the NVIDIA driver will remap the faulty row to a reserved
       row. All future accesses to the row will access the reserved row
       instead of the faulty row. This feature is available on Ampere and
       later.

       Correctable Error The number of rows that have  been  remapped  due  to
       correctable ECC errors.

       Uncorrectable  Error  The number	of rows	that have been remapped	due to
       uncorrectable ECC errors.

       Pending Indicates whether or not	a row is pending  remapping.  The  GPU
       must be reset for the remapping to go into effect.

       Remapping Failure Occurred Indicates whether or not a row remapping has
       failed in the past.

       Bank  Remap  Availability Histogram Each	memory bank has	a fixed	number
       of reserved rows	that can be used for row remapping. The	histogram will
       classify	the remap  availability	 of  each  bank	 into  Maximum,	 High,
       Partial,	 Low  and  None.  Maximum availability means that all reserved
       rows are	available for remapping	while None means that no reserved rows
       are available. Correctable row remappings don't count towards the
       availability histogram, since a row remapped due to correctable errors
       can be evicted by an uncorrectable row remapping.

   Temperature
       Readings	from temperature sensors on the	board.	All  readings  are  in
       degrees	C.  Not	all products support all reading types.	In particular,
       products	in module form factors that  rely  on  case  fans  or  passive
       cooling	do  not	 usually  provide  temperature readings. See below for
       restrictions.

       T.Limit: The T.Limit sensor measures the current margin in degrees
       Celsius to the maximum operating temperature. As such, it is not an
       absolute temperature reading but a relative measurement.

       Not all products	support	T.Limit	sensor readings.

       When supported, nvidia-smi reports the current T.Limit temperature as a
       signed value that counts	down. A	T.Limit	temperature of 0  C  or	 lower
       indicates  that	the  GPU  may  optimize	 its  clock  based  on thermal
       conditions. Further, when the T.Limit sensor  is	 supported,  available
       temperature  thresholds	are  also  reported  relative  to T.Limit (see
       below) instead of absolute measurements.
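       Since both the current T.Limit reading and the thresholds are reported
       on the same relative scale, a reading can be classified by simple
       comparison. A minimal sketch; the threshold values are illustrative:

```python
def tlimit_status(current, slowdown_threshold, shutdown_threshold):
    """Classify a current T.Limit reading (a margin in degrees C counting
    down toward 0) against T.Limit thresholds, which nvidia-smi also
    reports relative to T.Limit. Values here are illustrative."""
    if current <= shutdown_threshold:
        return "shutdown possible"
    if current <= slowdown_threshold:
        return "hw slowdown possible"
    if current <= 0:
        return "sw clock optimization possible"
    return "within limits"

print(tlimit_status(25, -5, -10))   # -> within limits
print(tlimit_status(-6, -5, -10))   # -> hw slowdown possible
```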

       GPU Current Temp
		      Core GPU	temperature.  For  all	discrete  and  S-class
		      products.

       GPU T.Limit Temp
		      Current  margin  in degrees Celsius from the maximum GPU
		      operating	temperature.

       GPU Shutdown Temp
		      The temperature at which a GPU will shutdown.

       GPU Shutdown T.Limit Temp
                      The T.Limit temperature below which a GPU may shutdown.
                      Since shutdown can only be triggered by the maximum GPU
                      temperature, it is possible for the current T.Limit to
                      be more negative than this threshold.

       GPU Slowdown Temp
		      The temperature at which a GPU HW	will begin  optimizing
		      clocks due to thermal conditions,	in order to cool.

       GPU Slowdown T.Limit Temp
                      The T.Limit temperature at or below which GPU HW may
                      optimize its clocks for thermal conditions. Since this
                      clock adjustment can only be triggered by the maximum
                      GPU temperature, it is possible for the current T.Limit
                      to be more negative than this threshold.

       GPU Max Operating Temp
		      The temperature at which GPU SW will optimize its	 clock
		      for thermal conditions.

       GPU Max Operating T.Limit Temp
		      The T.Limit temperature below which GPU SW will optimize
		      its clock	for thermal conditions.

       Memory Current Temp
		      Current  temperature  of	GPU  memory. Only available on
		      supported	devices.

       Memory Max Operating Temp
		      The temperature at which GPU SW will optimize its	memory
		      clocks  for  thermal  conditions.	 Only	available   on
		      supported	devices.

   GPU Power Readings
       Power  readings	help  to  shed light on	the current power usage	of the
       GPU, and	the factors that affect	that usage. When power	management  is
       enabled the GPU limits power draw under load to fit within a predefined
       power envelope by manipulating the current performance state. See below
       for limits of availability.

       Average Power Draw
		      The average power	draw for the entire board for the last
		      second,  in  watts.  Only	 supported  on	Ampere (except
		      GA100) or	newer devices.

       Instantaneous Power Draw
		      The last measured	power draw for the  entire  board,  in
		      watts.

       Requested Power Limit
		      The  power limit requested by software, in watts.	Set by
		      software such as nvidia-smi. Power Limit can be adjusted
		      using -pl,--power-limit= switches.

       Enforced	Power Limit
		      The  power  management  algorithm's  power  ceiling,  in
		      watts.  Total  board  power  draw	 is manipulated	by the
		      power management algorithm such that it stays under this
		      value. This limit	is the minimum of various limits  such
		      as the software limit listed above.

       Default Power Limit
		      The  default power management algorithm's	power ceiling,
		      in watts.	Power Limit will be set	back to	Default	 Power
		      Limit after driver unload.

       Min Power Limit
		      The  minimum  value in watts that	power limit can	be set
		      to.

       Max Power Limit
		      The maximum value	in watts that power limit can  be  set
		      to.
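       The relation between the enforced limit and the other limits can be
       sketched directly from the definitions above (the wattage values are
       illustrative):

```python
def enforced_limit(limits):
    """The enforced power limit is the minimum of the applicable ceilings,
    such as the software-requested limit and the default limit.
    Values (in watts) are illustrative."""
    return min(limits)

# Requested 250 W, default 300 W, hardware maximum 350 W -> 250 W enforced.
print(enforced_limit([250.0, 300.0, 350.0]))  # -> 250.0
```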

   Module Power	Readings
       Power  readings	help  to  shed light on	the current power usage	of the
       Module, and the factors that affect that usage. A module is a GPU
       plus a supported NVIDIA CPU plus other power-consuming components.
       When power management is enabled, the Module limits power draw under
       load to fit
       within  a  predefined  power  envelope  by  manipulating	 the   current
       performance state. Supported on Hopper and newer	datacenter products.

       Average Power Draw
		      The  average  power  draw	 for the entire	module for the
		      last second, in watts.

       Instantaneous Power Draw
		      The last measured	power draw for the entire  module,  in
		      watts.

       Requested Power Limit
		      The power	limit requested	by software, in	watts, for the
		      whole  module. Set by software such as nvidia-smi. Power
		      Limit can	be adjusted using -pl,--power-limit=  switches
		      with -s/--scope=1.

       Enforced	Power Limit
		      The  power  management  algorithm's  power  ceiling,  in
		      watts. Total module power	draw  is  manipulated  by  the
		      power management algorithm such that it stays under this
		      value.  This limit is the	minimum	of various limits such
		      as the software limit listed above.

       Default Power Limit
		      The default power	management algorithm's power  ceiling,
		      in watts.	Module Power Limit will	be set back to Default
		      Power Limit after	driver unload.

       Min Power Limit
		      The  minimum  value in watts that	module power limit can
		      be set to.

       Max Power Limit
		      The maximum value	in watts that module power  limit  can
		      be set to.

   GPU Memory Power Readings
       Information about GPU memory power consumption.

       Average Power Draw
		      The average power	draw for the GPU memory	subsystem over
		      the last second, in watts.

       Instantaneous Power Draw
		      The   last  measured  power  draw	 for  the  GPU	memory
		      subsystem, in watts.

   Power Smoothing
       Power Smoothing related definitions  and	 currently  set	 values.  This
       feature	allows	users  to  tune	 power	parameters  to	minimize power
       fluctuations in large datacenter	environments.

       Enabled	      Value is "Yes" if	the feature is enabled and "No"	if the
		      feature is not enabled.

       Delayed Power Smoothing Supported
		      Value is "Yes" if	the Delayed Power Smoothing feature is
		      supported	and "No" if the	feature	is not supported.

       Privilege Level
		      The current privilege for	the user. Value	is 0, 1	or  2.
		      Note  that  the  higher  the  privilege  level, the more
		      information the user will	have access to.

       Immediate Ramp Down
		      Values are "Enabled" or "Disabled".  Indicates  if  ramp
		      down  hysteresis value will be honored (when enabled) or
		      ignored (when disabled).

       Current TMP    The last read value of the Total Module Power, in	watts.

       Current TMP Floor
		      The last read value of the Total Module Power floor, in
		      watts.

       Max % TMP Floor
		      The  highest  percentage value for which the Percent TMP
		      Floor can	be set.

       Min % TMP Floor
		      The lowest percentage value for which  the  Percent  TMP
		      Floor can	be set.

       HW Lifetime % Remaining
		      As  this feature is used,	the circuitry which drives the
		      feature wears down. This value gives the	percentage  of
		      the remaining lifetime of	this hardware.

       Current Primary Power Floor
		      The  current value of the	primary	power floor, in	watts.
		      This value is calculated as TMP Ceiling * (% TMP Floor
		      value).
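       For concreteness, the calculation above can be sketched in Python.
       The function name is hypothetical, and clamping the percentage to the
       [Min % TMP Floor, Max % TMP Floor] range is an assumption based on the
       stated valid range.

```python
# Sketch of the Current Primary Power Floor calculation described above:
# TMP Ceiling * (% TMP Floor), with the percentage clamped to the
# [Min % TMP Floor, Max % TMP Floor] range. Illustrative only.

def primary_power_floor(tmp_ceiling_w, pct_tmp_floor, min_pct, max_pct):
    pct = max(min_pct, min(pct_tmp_floor, max_pct))  # clamp the percentage
    return tmp_ceiling_w * pct / 100.0

# e.g. a 1000 W ceiling with a 50% TMP floor yields a 500 W primary floor
print(primary_power_floor(1000.0, 50.0, 0.0, 100.0))  # 500.0
```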

       Current Secondary Power Floor
		      The  current  value  of  the  secondary  power floor, in
		      watts. This is the power floor that  is  applied	during
		      active  workload	periods	 on the	GPU when primary floor
		      activation window	multiplier is set to a non-zero	value.

       Min Primary Floor Activation Offset
		      This is the  minimum  primary  floor  activation	offset
		      accepted	by  the	 driver	 specified in watts. This is a
		      static field.

       Min Primary Floor Activation Point
		      This is the minimum  absolute  raw  value	 specified  in
		      watts  that  the	driver	will use for switching between
		      primary and secondary floor. This	point is calculated as
		      'secondary  power	 floor	+  primary  floor   activation
		      offset',	and  then  computed  value  is floored to 'min
		      primary floor activation point' by  the  driver  at  run
		      time.  This  value  is  used  to avoid setting of	switch
		      point too	low accidentally.
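       The flooring behavior described above can be modeled as follows
       (illustrative sketch; the function name is hypothetical):

```python
# Sketch of how the switch point between the secondary and primary floors
# is derived, per the description above: 'secondary power floor + primary
# floor activation offset', floored to 'min primary floor activation point'
# so the switch point cannot accidentally be set too low.

def activation_point(secondary_floor_w, offset_w, min_activation_point_w):
    computed = secondary_floor_w + offset_w
    return max(computed, min_activation_point_w)  # never below the minimum

print(activation_point(400.0, 50.0, 500.0))  # 500.0 (floored to the minimum)
print(activation_point(480.0, 50.0, 500.0))  # 530.0
```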

       Window Multiplier
		      This is the multiplier unit specified in	ms  for	 other
		      multipliers  in  the  profile  (primary floor activation
		      window  multiplier  and  primary	floor  target	window
		      multiplier). This	is a static field.

       Number of Preset	Profiles
		      This  value  is  the  total  number  of  Preset Profiles
		      supported.

   Current Profile
       Values for the currently active power smoothing preset profile.

       % TMP Floor
		      The percentage of	the TMP	Ceiling, which is used to  set
		      the  TMP floor, for the currently	active preset profile.
		      For example, if max TMP is 1000 W, and the %  TMP	 floor
		      is 50%, then the min TMP value will be 500 W. This value
		      is in the	range [Min % TMP Floor,	Max % TMP Floor].

       Ramp Up Rate   The  ramp	 up  rate, measured in mW/s, for the currently
		      active preset profile.

       Ramp Down Rate The ramp down rate, measured in mW/s, for	the  currently
		      active preset profile.

       Ramp Down Hysteresis
		      The ramp down hysteresis value, in ms, for the currently
		      active preset profile.

       Secondary Power Floor
		      The  secondary  power  floor, measured in	watts, for the
		      currently	active preset profile. This is the power floor
		      that will	be applied during active workload  periods  on
		      the  GPU when primary floor activation window multiplier
		      is set to	a non-zero value.

       Primary Floor Activation	Window Multiplier
		      The time multiplier for the  activation  moving  average
		      window  size  for	 the  currently	active preset profile.
		      This is the 'X' ms time multiplier  for  the  activation
		      moving   average	window	size.  The  activation	moving
		      average is  compared  against  the  (secondary  floor  +
		      primary  floor  activation offset	value) to determine if
		      the controller should switch from	the secondary floor to
		      the primary  floor.  Setting  this  to  0	 will  disable
		      switching	to the secondary floor.
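       The controller decision described above can be sketched as a simple
       predicate. This is a simplified illustration of the stated comparison,
       not the driver's actual control loop; the function name is
       hypothetical.

```python
# Illustrative decision rule: the activation moving average is compared
# against (secondary floor + primary floor activation offset); a window
# multiplier of 0 disables switching to the secondary floor entirely.

def use_secondary_floor(activation_moving_avg_w, secondary_floor_w,
                        activation_offset_w, window_multiplier):
    if window_multiplier == 0:
        return False  # switching to the secondary floor is disabled
    # stay on the secondary floor while the moving average is below the
    # activation point; switch to the primary floor once it rises above it
    return activation_moving_avg_w < secondary_floor_w + activation_offset_w

print(use_secondary_floor(400.0, 450.0, 50.0, 1))  # True  (below 500 W)
print(use_secondary_floor(600.0, 450.0, 50.0, 1))  # False (above 500 W)
```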

       Primary Floor Target Window Multiplier
		      The time multiplier for the target moving	average	window
		      size  for	 the  currently	active preset profile. This is
		      the 'X' ms time multiplier for the target	moving average
		      window size. When	set  to	 non-zero  value,  the	target
		      moving  average power determines the primary floor. When
		      set to 0,	driver will use	the Floor  percentage  instead
		      to derive	the primary floor.

       Primary Floor Activation	Offset
		      The  primary Floor Activation Offset, measured in	watts,
		      for the currently	active preset profile. If  the	target
		      moving average falls below the secondary floor plus this
		      offset, the primary floor	will be	activated.

       Active Preset Profile Number
		      The number of the	active preset profile.

   Admin Overrides
       Admin  overrides	allow users with sufficient permissions	to preempt the
       values of the currently active preset profile. If an admin override  is
       set  for	one of the fields, then	this value will	be used	instead	of any
       other configured	value.

       % TMP Floor
		      The admin	override value for % TMP Floor.	This value  is
		      in the range [Min	% TMP Floor, Max % TMP Floor].

       Ramp Up Rate   The  admin  override value for ramp up rate, measured in
		      mW/s.

       Ramp Down Rate The admin	override value for ramp	down rate, measured in
		      mW/s.

       Ramp Down Hysteresis
		      The admin	override value for ramp	down hysteresis	value,
		      in ms.

       Secondary Power Floor
		      The admin	override  value	 for  secondary	 power	floor,
		      measured	in watts. This is the power floor that will be
		      applied during active workload periods on	the  GPU  when
		      primary  floor  activation window	multiplier is set to a
		      non-zero value.

       Primary Floor Activation	Window Multiplier
		      The admin	override value for primary time	multiplier for
		      the activation moving average window size. This  is  the
		      'X' ms time multiplier for the activation	moving average
		      window  size.  The activation moving average is compared
		      against the (secondary floor + primary floor  activation
		      offset  value)  to  determine  if	 the controller	should
		      switch from the secondary	floor to  the  primary	floor.
		      Setting	this  to  0  will  disable  switching  to  the
		      secondary	floor.

       Primary Floor Target Window Multiplier
		      The admin	override value for primary time	multiplier for
		      the target moving	average	window size. This is  the  'X'
		      ms  time multiplier for the target moving	average	window
		      size. When set to	 non-zero  value,  the	target	moving
		      average  power determines	the primary floor. When	set to
		      0, driver	will  use  the	Floor  percentage  instead  to
		      derive the primary floor.

       Primary Floor Activation	Offset
		      The  admin  override  value for primary Floor Activation
		      Offset, measured in watts. If the	target moving  average
		      falls  below  the	 secondary floor plus this offset, the
		      primary floor will be activated.

   Workload Power Profiles
       Pre-tuned GPU profiles help to provide immediate, optimized
       configurations for datacenter use cases. This section includes
       information about the currently requested and enforced power profiles.

       Requested Profiles
		      The list of user requested profiles.

       Enforced	Profiles
		      Since many of the	profiles have conflicting goals,  some
		      configurations  of  requested profiles are incompatible.
		      This is the list of the  requested  profiles  which  are
		      currently	enforced.

   EDPp	Multiplier
       The  EDPp  multiplier  expressed	as a percentage. This feature is meant
       for system administrators and cannot be configured via NVML or  nvidia-
       smi.

   Clocks
       Current	frequency  at which parts of the GPU are running. All readings
       are in MHz. Note that it is possible for clocks to report a lower
       frequency than the lowest frequency that can be set by SW due to HW
       optimizations in certain scenarios.

       Graphics	      Current frequency	of graphics (shader) clock.

       SM	      Current  frequency  of  SM  (Streaming   Multiprocessor)
		      clock.

       Memory	      Current frequency	of memory clock.

       Video	      Current frequency	of video (encoder + decoder) clocks.

   Applications	Clocks
       Applications Clocks will be removed in a future CUDA release. Please
       use -lmc/-lgc for locking memory/graphics clocks and -rmc/-rgc to reset
       memory/graphics clocks. User specified frequency at which applications
       will run. Can be changed with [-ac | --applications-clocks] switches.

       Graphics	      User specified frequency of graphics (shader) clock.

       Memory	      User specified frequency of memory clock.

   Default Applications	Clocks
       Default frequency at which applications will run. Application clocks
       can be changed with [-ac | --applications-clocks] switches.
       Application clocks can be set to default using [-rac | --reset-
       applications-clocks] switches.

       Graphics	      Default  frequency  of  applications  graphics  (shader)
		      clock.

       Memory	      Default frequency	of applications	memory clock.

   Deferred Clocks
       Deferred clocks are clocks that will be applied after the next driver
       load.

       Memory	      The Memory Clock value in MHz that takes effect the next
		      time the GPU is initialized. This can be guaranteed by
		      unloading and reloading the kernel module.

   Max Clocks
       Maximum frequency at which parts of the GPU are designed to run. All
       readings are in MHz. Current P0 clocks (reported in Clocks section) can
       differ from max clocks by a few MHz.

       Graphics	      Maximum frequency	of graphics (shader) clock.

       SM	      Maximum	frequency  of  SM  (Streaming  Multiprocessor)
		      clock.

       Memory	      Maximum frequency	of memory clock.

       Video	      Maximum frequency	of video (encoder + decoder) clock.

   Max Customer	Boost Clocks
       Maximum customer	boost frequency	at which parts of the GPU are designed
       to run. All readings are	in MHz.

       Graphics	      Maximum customer boost frequency	of  graphics  (shader)
		      clock.

   Clock Policy
       User-specified  settings	 for  automated	 clocking changes such as auto
       boost.

       Auto Boost     Indicates	whether	auto boost mode	is  currently  enabled
		      for  this	GPU (On) or disabled for this GPU (Off). Shows
		      (N/A) if boost  is  not  supported.  Auto	 boost	allows
		      dynamic	GPU  clocking  based  on  power,  thermal  and
		      utilization. When	auto boost is disabled	the  GPU  will
		      attempt  to  maintain  clocks  at	 precisely the Current
		      Application Clocks settings (whenever a CUDA context  is
		      active).	With  auto  boost  enabled  the	GPU will still
		      attempt	to   maintain	 this	 floor,	   but	  will
		      opportunistically	 boost	to  higher  clocks when	power,
		      thermal and utilization  headroom	 allow.	 This  setting
		      persists	for  the life of the CUDA context for which it
		      was requested. Apps can request a	particular mode	either
		      via an NVML call (see NVML SDK) or by setting  the  CUDA
		      environment  variable  CUDA_AUTO_BOOST.  This feature is
		      deprecated and will be removed in	a future CUDA release.

       Auto Boost Default
		      Indicates	the  default  setting  for  auto  boost	 mode,
		      either  enabled  (On)  or	disabled (Off).	Shows (N/A) if
		      boost is not supported. Apps will	 run  in  the  default
		      mode  if they have not explicitly	requested a particular
		      mode. Note: Auto Boost settings can only be modified  if
		      "Persistence  Mode" is enabled, which is NOT by default.
		      This feature is deprecated and  will  be	removed	 in  a
		      future CUDA release.

   Fabric
       GPU Fabric information

       State

       Indicates   the	 state	 of  the  GPU's	 handshake  with  the  nvidia-
       fabricmanager (a.k.a. GPU fabric	probe)
       Possible	values:	Completed, In Progress,	Not Started, Not supported

       Status

       Status of the GPU fabric	probe response from the	nvidia-fabricmanager.
       Possible	values:	NVML_SUCCESS or	one of the failure codes.

       Clique ID

       A clique	is a set of GPUs that  can  communicate	 to  each  other  over
       NVLink.
       The GPUs	belonging to the same clique share the same clique ID.
       Clique ID will only be valid for	NVLink multi-node systems.

       Cluster UUID

       UUID of an NVLink multi-node cluster to which this GPU belongs.
       Cluster UUID will be zero for NVLink single-node	systems.

       Health

       Summary - Summary of Fabric Health <Healthy, Unhealthy, Limited
       Capacity>
       Bandwidth - whether the GPU NVLink bandwidth is degraded <True/False>
       Route Recovery in progress - whether NVLink route recovery is in
       progress <True/False>
       Route Unhealthy - whether NVLink route recovery failed or was aborted
       <True/False>
       Access Timeout Recovery - whether NVLink access timeout recovery is in
       progress <True/False>
       Incorrect Configuration - Incorrect Configuration status <Incorrect
       SystemGuid, Incorrect Chassis Serial Number, No Partition, Insufficient
       Nvlink Resources, Incompatible GPU Firmware, Invalid Location, None>

   Processes
       List  of	 processes  having  Compute or Graphics	Context	on the device.
       Compute processes are reported on all  the  fully  supported  products.
       Reporting  for  Graphics	processes is limited to	the supported products
       starting	with Kepler architecture.

       Each Entry is of	format "<GPU Index> <GI	Index> <CI Index> <PID>	<Type>
       <Process	Name> <GPU Memory Usage>"

       GPU Index      Represents NVML Index of the device.

       GPU Instance Index
		      Represents GPU Instance Index  of	 the  MIG  device  (if
		      enabled).

       Compute Instance	Index
		      Represents  Compute Instance Index of the	MIG device (if
		      enabled).

       PID	      Represents  Process  ID  corresponding  to  the	active
		      Compute or Graphics context.

       Type	      Displayed	 as  "C" for Compute Process, "G" for Graphics
		      Process, "M" for MPS ("Multi-Process  Service")  Compute
		      Process,	and "C+G" or "M+C" for the process having both
		      Compute  and  Graphics  or  MPS  Compute	 and   Compute
		      contexts.

       Process Name   Represents process name for the Compute, MPS Compute, or
		      Graphics process.

       GPU Memory Usage
		      Amount of memory used by the GPU context, which
		      represents FB memory usage for discrete GPUs or system
		      memory usage for integrated GPUs. Not available on
		      Windows when running in WDDM mode because the Windows
		      KMD, not the NVIDIA driver, manages all the memory.

   Device Monitoring
       The  "nvidia-smi	dmon" command-line is used to monitor one or more GPUs
       (up to 16 devices) plugged into the system. This	tool allows  the  user
       to  see one line	of monitoring data per monitoring cycle. The output is
       in concise format and easy to interpret in interactive mode. The	output
       data per	line is	limited	by the	terminal  size.	 It  is	 supported  on
       Tesla,  GRID,  Quadro  and limited GeForce products for Kepler or newer
       GPUs under bare metal 64	bits Linux. By default,	 the  monitoring  data
       includes	 Power	Usage,	Temperature,  SM  clocks,  Memory  clocks  and
       Utilization values for SM, Memory, Encoder, Decoder, JPEG and  OFA.  It
       can  also  be  configured  to report other metrics such as frame	buffer
       memory usage, bar1 memory usage,	power/thermal violations and aggregate
       single/double bit ecc errors. If any of the metrics is not supported
       on the device, or any other error occurs in fetching the metric, it is
       reported as "-" in the output data. The user can also configure
       monitoring frequency and the number of monitoring iterations for each
       run. There is also an option to include date and time at each line.
       None of the supported options are mutually exclusive; they can be used
       together in any order. Note: On MIG-enabled GPUs, querying the
       utilization of encoder, decoder, jpeg, ofa, gpu, and memory is not
       currently supported.
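       For scripting against dmon output, a minimal parser that honors the
       "-" convention above might look like this. It is an illustrative
       sketch: the actual column layout depends on the metric groups
       selected, so it splits fields generically.

```python
# Parse one data line of `nvidia-smi dmon` output, treating "-" (metric
# unsupported or fetch failed) as a missing value (None).

def parse_dmon_line(line):
    def field(tok):
        if tok == "-":
            return None        # metric unsupported or error while fetching
        try:
            return float(tok)  # most dmon fields are numeric
        except ValueError:
            return tok         # keep any non-numeric field as-is
    return [field(tok) for tok in line.split()]

print(parse_dmon_line("0 84 45 - 93 3 0 0"))
# [0.0, 84.0, 45.0, None, 93.0, 3.0, 0.0, 0.0]
```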

       Usage:

       1) Default with no arguments

       nvidia-smi dmon

       Monitors	default	metrics	for up to 16 supported devices under natural
       enumeration (starting with GPU index 0) at a frequency of 1 sec.	Runs
       until terminated	with ^C.

       2) Select one or	more devices

       nvidia-smi dmon -i <device1,device2, .. , deviceN>

       Reports default metrics for the devices selected	by comma separated
       device list. The	tool picks up to 16 supported devices from the list
       under natural enumeration (starting with	GPU index 0).

       3) Select metrics to be displayed

       nvidia-smi dmon -s <metric_group>

       <metric_group> can be one or more from the following:

       p - Power Usage (in Watts) and GPU/Memory Temperature (in C) if
       supported

       u - Utilization (SM, Memory, Encoder, Decoder, JPEG and OFA Utilization
       in %)

       c - Proc	and Mem	Clocks (in MHz)

       v - Power Violations (in	%) and Thermal Violations (as a	boolean	flag)

       m - Frame Buffer, Bar1 and Confidential Compute protected memory	usage
       (in MB)

       e - ECC (Number of aggregated single bit, double	bit ecc	errors)	and
       PCIe Replay errors

       t - PCIe	Rx and Tx Throughput in	MB/s (Maxwell and above)

       4) Configure monitoring iterations

       nvidia-smi dmon -c <number of samples>

       Displays data for the specified number of samples and exits.

       5) Configure monitoring frequency

       nvidia-smi dmon -d <time	in secs>

       Collects	and displays data at every specified monitoring	interval until
       terminated with ^C.

       6) Display date

       nvidia-smi dmon -o D

       Prepends	monitoring data	with date in YYYYMMDD format.

       7) Display time

       nvidia-smi dmon -o T

       Prepends	monitoring data	with time in HH:MM:SS format.

       8) Select GPM metrics to	be displayed

       nvidia-smi dmon --gpm-metrics <gpmMetric1,gpmMetric2,...,gpmMetricN>

       <gpmMetricX> Refer to the documentation for nvmlGpmMetricId_t in	the
       NVML header file

       9) Select which level of	GPM metrics to be displayed

       nvidia-smi dmon --gpm-options <gpmMode>

       <gpmMode> can be	one of the following:

       d - Display Device Level	GPM metrics

       m - Display MIG Level GPM metrics

       dm - Display Device and MIG Level GPM metrics

       md - Display Device and MIG Level GPM metrics, same as 'dm'

       10) Modify output format

       nvidia-smi dmon --format	<formatSpecifier>

       <formatSpecifier> can be	any comma separated combination	of the
       following:

       csv - Format dmon output	as CSV

       nounit -	Remove unit line from dmon output

       noheader	- Remove header	line from dmon output

       11) Help	Information

       nvidia-smi dmon -h

       Displays	help information for using the command line.

   Daemon (EXPERIMENTAL)
       The  "nvidia-smi	 daemon" starts	a background process to	monitor	one or
       more GPUs plugged in to the system.  It	monitors  the  requested  GPUs
       every  monitoring  cycle	 and logs the file in compressed format	at the
       user provided path or the default location  at  /var/log/nvstats/.  The
       log file	is created with	system's date appended to it and of the	format
       nvstats-YYYYMMDD.  The  flush  operation	 to the	log file is done every
       alternate  monitoring  cycle.  Daemon  also  logs  it's	own   PID   at
       /var/run/nvsmi.pid. By default, the monitoring data to persist includes
       Power  Usage,  Temperature,  SM	clocks,	 Memory	clocks and Utilization
       values for SM, Memory, Encoder, Decoder,	JPEG and OFA. The daemon tools
       can also	be configured to record	other metrics  such  as	 frame	buffer
       memory usage, bar1 memory usage,	power/thermal violations and aggregate
       single/double  bit ecc errors.The default monitoring cycle is set to 10
       secs and	can be configured via command-line. It is supported on	Tesla,
       GRID,  Quadro  and GeForce products for Kepler or newer GPUs under bare
       metal 64	bits Linux. The	daemon requires	root privileges	 to  run,  and
       only  supports  running	a  single  instance  on	the system. All	of the
       supported options are exclusive and can be used together	in any	order.
       Note:  On  MIG-enabled  GPUs,  querying	the  utilization  of  encoder,
       decoder,	jpeg, ofa, gpu,	and memory is not currently supported. Usage:

       1) Default with no arguments

       nvidia-smi daemon

       Runs in the background to monitor default metrics for up	to 16
       supported devices under natural enumeration (starting with GPU index 0)
       at a frequency of 10 sec. The date stamped log file is created at
       /var/log/nvstats/.
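       The date-stamped log path can be reproduced as follows. This is an
       illustrative sketch of the naming scheme described above
       (nvstats-YYYYMMDD under /var/log/nvstats/), not part of nvidia-smi.

```python
# Build the daemon's default date-stamped log path: nvstats-YYYYMMDD in
# /var/log/nvstats/ (or a user-provided directory via -p).
import datetime
import os

def daemon_log_path(directory="/var/log/nvstats", when=None):
    when = when or datetime.date.today()
    return os.path.join(directory, "nvstats-" + when.strftime("%Y%m%d"))

print(daemon_log_path(when=datetime.date(2024, 3, 5)))
# /var/log/nvstats/nvstats-20240305
```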

       2) Select one or	more devices

       nvidia-smi daemon -i <device1,device2, .. , deviceN>

       Runs in the background to monitor default metrics for the devices
       selected	by comma separated device list.	The tool picks up to 16
       supported devices from the list under natural enumeration (starting
       with GPU	index 0).

       3) Select metrics to be monitored

       nvidia-smi daemon -s <metric_group>

       <metric_group> can be one or more from the following:

       p - Power Usage (in Watts) and GPU/Memory Temperature (in C) if
       supported

       u - Utilization (SM, Memory, Encoder, Decoder, JPEG and OFA Utilization
       in %)

       c - Proc	and Mem	Clocks (in MHz)

       v - Power Violations (in	%) and Thermal Violations (as a	boolean	flag)

       m - Frame Buffer, Bar1 and Confidential Compute protected memory	usage
       (in MB)

       e - ECC (Number of aggregated single bit, double	bit ecc	errors)	and
       PCIe Replay errors

       t - PCIe	Rx and Tx Throughput in	MB/s (Maxwell and above)

       4) Configure monitoring frequency

       nvidia-smi daemon -d <time in secs>

       Collects	data at	every specified	monitoring interval until terminated.

       5) Configure log	directory

       nvidia-smi daemon -p <path of directory>

       The log files are created at the	specified directory.

       6) Configure log	file name

       nvidia-smi daemon -j <string to append log file name>

       The command-line	is used	to append the log file name with the user
       provided	string.

       7) Terminate the	daemon

       nvidia-smi daemon -t

       This command-line uses the stored PID (at /var/run/nvsmi.pid) to
       terminate the daemon. It makes a best effort to stop the daemon and
       offers no guarantees for its termination. In case the daemon is not
       terminated, the user can terminate it manually by sending a kill
       signal to the daemon. Performing a GPU reset operation (via nvidia-
       smi) requires all GPU processes to be exited, including the daemon.
       Users who have the daemon open will see an error to the effect that
       the GPU is busy.

       8) Help Information

       nvidia-smi daemon -h

       Displays	help information for using the command line.

   Replay Mode (EXPERIMENTAL)
       The "nvidia-smi replay" command-line is used to extract/replay  all  or
       parts  of  log file generated by	the daemon. By default,	the tool tries
       to pull the metrics such	as Power Usage,	Temperature, SM	clocks,	Memory
       clocks and Utilization values for SM, Memory,  Encoder,	Decoder,  JPEG
       and  OFA.  The  replay  tool can	also fetch other metrics such as frame
       buffer memory usage, bar1 memory	usage,	power/thermal  violations  and
       aggregate  single/double	bit ecc	errors.	There is an option to select a
       set of metrics to replay,  If  any  of  the  requested  metric  is  not
       maintained  or  logged  as  not-supported then it's shown as "-"	in the
       output. The format of data produced by this mode	is such	that the  user
       is  running  the	 device	 monitoring utility interactively. The command
       line requires mandatory option "-f" to specify complete path of the log
       filename, all the other supported options are exclusive and can be used
       together	 in  any  order.  Note:	 On  MIG-enabled  GPUs,	 querying  the
       utilization  of	encoder,  decoder,  jpeg,  ofa,	gpu, and memory	is not
       currently supported. Usage:

       1) Specify log file to be replayed

       nvidia-smi replay -f <log file name>

       Fetches monitoring data from the	compressed log file and	allows the
       user to see one line of monitoring data (default	metrics	with time-
       stamp) for each monitoring iteration stored in the log file. A new line
       of monitoring data is replayed every other second irrespective of the
       actual monitoring frequency maintained at the time of collection. It is
       displayed till the end of file or until terminated by ^C.

       2) Filter metrics to be replayed

       nvidia-smi replay -f <path to log file> -s <metric_group>

       <metric_group> can be one or more from the following:

       p - Power Usage (in Watts) and GPU/Memory Temperature (in C) if
       supported

       u - Utilization (SM, Memory, Encoder, Decoder, JPEG and OFA Utilization
       in %)

       c - Proc	and Mem	Clocks (in MHz)

       v - Power Violations (in	%) and Thermal Violations (as a	boolean	flag)

       m - Frame Buffer, Bar1 and Confidential Compute protected memory	usage
       (in MB)

       e - ECC (Number of aggregated single bit, double	bit ecc	errors)	and
       PCIe Replay errors

       t - PCIe	Rx and Tx Throughput in	MB/s (Maxwell and above)

       3) Limit	replay to one or more devices

       nvidia-smi replay -f <log file> -i <device1,device2, .. , deviceN>

       Limits reporting	of the metrics to the set of devices selected by comma
       separated device	list. The tool skips any of the	devices	not maintained
       in the log file.

       4) Restrict the time frame between which	data is	reported

       nvidia-smi replay -f <log file> -b <start time in HH:MM:SS format> -e
       <end time in HH:MM:SS format>

       This option allows the data to be limited between the specified time
       range. Specifying time as 0 with	-b or -e option	implies	start or end
       file respectively.
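       The -b/-e semantics can be modeled as follows. This is an illustrative
       sketch; in_replay_range is a hypothetical helper, not part of
       nvidia-smi.

```python
# Keep a record when its HH:MM:SS timestamp falls within [begin, end];
# passing "0" for either bound means start of file or end of file.

def in_replay_range(timestamp, begin, end):
    def secs(t):
        h, m, s = (int(x) for x in t.split(":"))
        return h * 3600 + m * 60 + s
    t = secs(timestamp)
    lo = 0 if begin == "0" else secs(begin)          # -b 0: from file start
    hi = 24 * 3600 if end == "0" else secs(end)      # -e 0: to file end
    return lo <= t <= hi

print(in_replay_range("10:15:00", "10:00:00", "11:00:00"))  # True
print(in_replay_range("09:59:59", "10:00:00", "0"))         # False
```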

       5) Redirect replay information to a log file

       nvidia-smi replay -f <log file> -r <output file name>

       This option takes log file as an	input and extracts the information
       related to default metrics in the specified output file.

       6) Help Information

       nvidia-smi replay -h

       Displays	help information for using the command line.

   Process Monitoring
       The "nvidia-smi pmon" command-line  is  used  to	 monitor  compute  and
       graphics	 processes  running  on	 one  or  more GPUs (up	to 16 devices)
       plugged into  the  system.  This	 tool  allows  the  user  to  see  the
       statistics  for	all  the  running  processes  on  each device at every
       monitoring cycle. The output is in concise format and easy to interpret
       in interactive mode. The	 output	 data  per  line  is  limited  by  the
       terminal	 size.	It  is	supported  on  Tesla, GRID, Quadro and limited
       GeForce products	for Kepler or newer GPUs  under	 bare  metal  64  bits
       Linux.  By  default,  the monitoring data for each process includes the
       pid, command name  and  average	utilization  values  for  SM,  Memory,
       Encoder	and  Decoder  since  the last monitoring cycle.	It can also be
       configured to report frame buffer memory	usage  for  each  process.  If
       there is no process running for the device, then all the metrics are
       reported as "-" for the device. If any of the metrics is not supported
       on the device, or any other error occurs in fetching the metric, it is
       also reported as "-" in the output data. The user can also configure
       monitoring frequency and the number of monitoring iterations for each
       run. There is also an option to include date and time at each line.
       None of the supported options are mutually exclusive; they can be used
       together in any order. Note: On MIG-enabled GPUs, querying the
       utilization of encoder, decoder, jpeg, ofa, gpu, and memory is not
       currently supported.

       Usage:

       1) Default with no arguments

       nvidia-smi pmon

       Monitors	all the	processes running on each device for up	to 16
       supported devices under natural enumeration (starting with GPU index 0)
       at a frequency of 1 sec.	Runs until terminated with ^C.

       2) Select one or	more devices

       nvidia-smi pmon -i <device1,device2, .. , deviceN>

       Reports statistics for all the processes	running	on the devices
       selected	by comma separated device list.	The tool picks up to 16
       supported devices from the list under natural enumeration (starting
       with GPU	index 0).

       3) Select metrics to be displayed

       nvidia-smi pmon -s <metric_group>

       <metric_group> can be one or more from the following:

       u - Utilization (SM, Memory, Encoder, Decoder, JPEG, and	OFA
       Utilization for the process in %). Reports average utilization since
       last monitoring cycle.

       m - Frame Buffer	and Confidential Compute protected memory usage	(in
       MB). Reports instantaneous value	for memory usage.

       4) Configure monitoring iterations

       nvidia-smi pmon -c <number of samples>

       Displays data for the specified number of samples and exits.

       5) Configure monitoring frequency

       nvidia-smi pmon -d <time	in secs>

       Collects and displays data at every specified monitoring interval
       until terminated with ^C. The monitoring frequency must be between 1
       and 10 secs.

       6) Display date

       nvidia-smi pmon -o D

       Prepends	monitoring data	with date in YYYYMMDD format.

       7) Display time

       nvidia-smi pmon -o T

       Prepends	monitoring data	with time in HH:MM:SS format.

       8) Help Information

       nvidia-smi pmon -h

       Displays	help information for using the command line.

   Topology
       List topology information about the system's GPUs, how they connect  to
       each  other,  their CPU and memory affinities as	well as	qualified NICs
       capable of RDMA.

       Note: On	some systems, a	NIC is used as a PCI  bridge  for  the	NVLINK
       switches	and is not useful from a networking or RDMA point of view. The
       nvidia-smi  topo	command	will filter the	NIC's ports/PCIe sub-functions
       out of the topology matrix by examining the  NIC's  sysfs  entries.  On
       some kernel versions, nvidia-smi	requires root privileges to read these
       sysfs entries.

       Usage:

       Topology	connections and	affinities matrix between the GPUs and NICs in
       the system

       nvidia-smi topo -m

       Displays a matrix of connections between all GPUs and NICs (including
       their data-direct devices if applicable) in the system, along with
       CPU/memory affinities for the GPUs, with the following legend:

       Legend:
	X = Self
	SYS  =	Connection  traversing	PCIe  as  well as the SMP interconnect
       between NUMA nodes (e.g., QPI/UPI)
	NODE = Connection traversing PCIe as well as the interconnect  between
       PCIe Host Bridges within	a NUMA node
	PHB  =	Connection  traversing	PCIe  as  well	as  a PCIe Host	Bridge
       (typically the CPU)
	PXB = Connection traversing multiple PCIe switches (without traversing
       the PCIe	Host Bridge)
	PIX = Connection traversing a single PCIe switch
	NV# = Connection traversing a bonded set of # NVLinks

       Note: This command may also display bonded NICs which may not be	RDMA
       capable.
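
       The matrix form lends itself to scripting. The sketch below parses a
       'topo -m'-style matrix into a lookup of link types; the sample layout
       is a simplified assumption — real output appends CPU/NUMA affinity
       columns that a caller would need to strip first:

```python
def parse_topo_matrix(text):
    """Parse a 'topo -m'-style connection matrix into {(row, col): link}."""
    lines = [l for l in text.splitlines() if l.strip()]
    cols = lines[0].split()          # header row: device names
    links = {}
    for line in lines[1:]:
        parts = line.split()         # row label followed by link values
        for col, val in zip(cols, parts[1:]):
            links[(parts[0], col)] = val
    return links

# Simplified sample (an assumption, affinity columns omitted):
sample = """\
      GPU0  GPU1  NIC0
GPU0  X     NV2   PHB
GPU1  NV2   X     SYS
NIC0  PHB   SYS   X
"""
links = parse_topo_matrix(sample)
print(links[("GPU0", "GPU1")])   # → NV2
print(links[("GPU1", "NIC0")])   # → SYS
```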

       nvidia-smi topo -mp

       Displays	a matrix of PCI-only connections between all GPUs and NICs in
       the system along	with CPU/memory	affinities for the GPUs	with the same
       legend as the 'nvidia-smi topo -m' command. This	command	excludes
       NVLINK connections and shows PCI	connections between GPUs.

       nvidia-smi topo -c <CPU number>

       Shows all the GPUs with an affinity to the specified CPU	number.

       nvidia-smi topo -n <traversal_path> -i <deviceID>

       Shows  all  the	GPUs  connected	with the given GPU using the specified
       traversal path. The traversal path values are:
	0 = A single PCIe switch on a dual GPU board
	1 = A single PCIe switch
	2 = Multiple PCIe switches
	3 = A PCIe host	bridge
	4 = An on-CPU interconnect link	between	PCIe host bridges
	5 = An SMP interconnect	link between NUMA nodes

       nvidia-smi topo -p -i <deviceID1>,<deviceID2>

       Shows the most direct PCIe path traversal for a given pair of GPUs.

       nvidia-smi topo -p2p <capability>

       Shows the P2P status between all	GPUs, given a  capability.  Capability
       values are:
	r - p2p	read capability
	w - p2p	write capability
	n - p2p	nvlink capability
	a - p2p	atomics	capability
	p - p2p	pcie capability

       nvidia-smi topo -C -i <deviceID>

       Shows the NUMA ID of the	nearest	CPU for	a GPU represented by the
       device ID.

       nvidia-smi topo -M -i <deviceID>

       Shows the NUMA ID of the	nearest	memory for a GPU represented by	the
       device ID.

       nvidia-smi topo -gnid -i	<deviceID>

       Shows the NUMA ID of the	GPU represented	by the device ID, if
       applicable. Displays N/A	otherwise.

       nvidia-smi topo -nvme

       Displays	 a matrix of PCI connections between all GPUs and NVME devices
       in the system with the following	legend:

       Legend:
	X = Self
	SYS = Connection traversing PCIe  as  well  as	the  SMP  interconnect
       between NUMA nodes (e.g., QPI/UPI)
	NODE  =	Connection traversing PCIe as well as the interconnect between
       PCIe Host Bridges within	a NUMA node
	PHB = Connection traversing  PCIe  as  well  as	 a  PCIe  Host	Bridge
       (typically the CPU)
	PXB  = Connection traversing multiple PCIe bridges (without traversing
       the PCIe	Host Bridge)
	PIX = Connection traversing at most a single PCIe bridge

   Nvlink
       The "nvidia-smi nvlink"	command-line  is  used	to  manage  the	 GPU's
       Nvlinks.	It provides options to set and query Nvlink information.

       Usage:

       1) Display help menu

       nvidia-smi nvlink -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi nvlink -i <GPU IDs>

       nvidia-smi nvlink --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) Select a specific NvLink

       nvidia-smi nvlink -l <GPU Nvlink	Id>

       nvidia-smi nvlink --list	<GPU Nvlink Id>

       Selects a specific Nvlink of the GPU for the given command, if valid.
       If not used, the given command-line option applies to all of the GPU's
       Nvlinks.

       4) Query	Nvlink Status

       nvidia-smi nvlink -s

       nvidia-smi nvlink --status

       Get the status of the GPU's Nvlinks.

       If Active, the Bandwidth	of the links will be displayed.

       If the link is present but Not Active, it will show the link as
       Inactive.

       If the link is in Sleep state, it will show as Sleep.

       5) Query	Nvlink capabilities

       nvidia-smi nvlink -c

       nvidia-smi nvlink --capabilities

       Get the GPU's Nvlink capabilities.

       6) Query	the Nvlink's remote node PCI bus

       nvidia-smi nvlink -p

       nvidia-smi nvlink -pcibusid

       Get the Nvlink's	remote node PCI	bus ID.

       7) Query	the Nvlink's remote link info

       nvidia-smi nvlink -R

       nvidia-smi nvlink -remotelinkinfo

       Get the remote device PCI bus ID	and NvLink ID for a link.

       8) Set Nvlink Counter Control is	DEPRECATED

       9) Get Nvlink Counter Control is	DEPRECATED

       10) Get Nvlink Counters is DEPRECATED, -gt/--getthroughput should be
       used instead

       11) Reset Nvlink	counters is DEPRECATED

       12) Query Nvlink	Error Counters

       nvidia-smi nvlink -e

       nvidia-smi nvlink --errorcounters

       Get the Nvlink error counters.

       For NVLink 4

       Replay Errors - count the number	of replay 'events' that	occurred

       Recovery	Errors - count the number of link recovery events

       CRC Errors - count the number of	CRC errors in received packets

       For NVLink 5

       Tx packets - Total Tx packets on	the link

       Tx bytes	- Total	Tx bytes on the	link

       Rx packets - Total Rx packets on	the link

       Rx bytes	- Total	Rx bytes on the	link

       Malformed packet Errors - Number of malformed packets received on a
       link

       Buffer overrun Errors - Number of packets that were discarded on	Rx due
       to buffer overrun

       Rx Errors - Total number	of packets with	errors Rx on a link

       Rx remote Errors - Total number of packets received with a stomp/EBP
       marker

       Rx General Errors - Total number	of packets Rx with header mismatch

       Local link integrity Errors - Total number of times that	the count of
       local errors exceeded a threshold

       Tx discards - Total number of tx	error packets that were	discarded

       Link recovery successful	events - Number	of times link went from	Up to
       recovery, succeeded and link came back up

       Link recovery failed events - Number of times link went from Up to
       recovery, failed	and link was declared down

       Total link recovery events - Number of times link went from Up to
       recovery, irrespective of the result

       Effective Errors	- Sum of the number of errors in each Nvlink packet

       Effective BER - BER for effective errors

       Symbol Errors - Number of errors	in rx symbols

       Symbol BER - BER	for symbol errors

       FEC Errors - [0-15] - count of symbol errors that are corrected

       13) Query Nvlink	CRC error counters

       nvidia-smi nvlink -ec

       nvidia-smi nvlink --crcerrorcounters

       Get the Nvlink per-lane CRC/ECC error counters.

       CRC - NVLink 4 and before - Total Rx CRC	errors on an NVLink Lane

       ECC - NVLink 4 -	Total Rx ECC errors on an NVLink Lane

       Deprecated from NVLink 5 onwards.

       14) Reset Nvlink	Error Counters

       nvidia-smi nvlink -re

       nvidia-smi nvlink --reseterrorcounters

       Reset all Nvlink	error counters to zero.

       Not supported on NVLink 5.

       15) Query Nvlink	throughput counters

       nvidia-smi nvlink -gt <Data Type>

       nvidia-smi nvlink --getthroughput <Data Type>

       <Data Type> can be one of the following:

       d - Tx and Rx data payload in KiB.

       r - Tx and Rx raw payload and protocol overhead in KiB.
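
       Because '-gt' reports cumulative counters, bandwidth must be derived
       from two readings taken a known interval apart. A minimal sketch —
       sampling and parsing of the actual output are left out, and the dict
       shape is an assumption about how a caller stores the readings:

```python
def nvlink_bandwidth_kib_s(before, after, interval_s):
    """Turn two cumulative '-gt d' readings into per-link (Tx, Rx) KiB/s."""
    rates = {}
    for link, (tx0, rx0) in before.items():
        tx1, rx1 = after[link]
        # Counters are cumulative, so the rate is the delta over the interval.
        rates[link] = ((tx1 - tx0) / interval_s, (rx1 - rx0) / interval_s)
    return rates

# Two hypothetical readings for link 0 taken 2 s apart:
rates = nvlink_bandwidth_kib_s({0: (1000, 2000)}, {0: (5000, 8000)}, 2.0)
print(rates[0])  # → (2000.0, 3000.0)
```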

       16) Set Nvlink Low Power	thresholds

       nvidia-smi nvlink -sLowPwrThres <Threshold>

       nvidia-smi nvlink --setLowPowerThreshold	<Threshold>

       Set the Nvlink Low Power threshold at which the links go into Low
       Power Mode.

       Threshold ranges	and units can be found using -gLowPwrInfo.

       17) Get Nvlink Low Power	Info

       nvidia-smi nvlink -gLowPwrInfo

       nvidia-smi nvlink --getLowPowerInfo

       Query the Nvlink's Low Power Info.

       18) Set Nvlink Bandwidth	mode

       nvidia-smi nvlink -sBwMode <Bandwidth Mode>

       nvidia-smi nvlink --setBandwidthMode <Bandwidth Mode>

       Set the Nvlink Bandwidth	mode for all GPUs. This	is DEPRECATED for
       Blackwell+.

       The options are:

       FULL - All links	are at max Bandwidth.

       OFF - Bandwidth is not used. P2P	is via PCIe bus.

       MIN - Bandwidth is at minimum speed.

       HALF - Bandwidth	is at around half of FULL speed.

       3QUARTER	- Bandwidth is at around 75% of	FULL speed.

       19) Get Nvlink Bandwidth	mode

       nvidia-smi nvlink -gBwMode

       nvidia-smi nvlink --getBandwidthMode

       Get the Nvlink Bandwidth mode for all GPUs. This is DEPRECATED for
       Blackwell+.

       20) Query for Nvlink Bridge

       nvidia-smi nvlink -cBridge

       nvidia-smi nvlink --checkBridge

       Query for Nvlink	Bridge presence.

       21) Set the GPU's Nvlink	Width

       nvidia-smi nvlink -sLWidth <Link	Width>

       nvidia-smi nvlink --setLinkWidth	<Link Width>

       Set the GPU's Nvlink width, which keeps that number of links Active
       and puts the rest to sleep.

       <Link Width> can be one of the following:

       values - List the possible Link Widths that can be set.

       A numerical value taken from the list reported by 'values'.

       22) Get the GPU's Nvlink	Width

       nvidia-smi nvlink -gLWidth

       nvidia-smi nvlink --getLinkWidth

       Query the GPU's Nvlink Width.

       23) Get the GPU's Nvlink	Device Information

       nvidia-smi nvlink -info

       nvidia-smi nvlink --info

       Query the GPU's Nvlink device information.

   C2C
       The  "nvidia-smi	 c2c"  command-line  is	 used  to manage the GPU's C2C
       Links. It provides options to query C2C Link information.

       Usage:

       1) Display help menu

       nvidia-smi c2c -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi c2c -i <GPU IDs>

       nvidia-smi c2c --id <GPU	IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) Select a specific C2C	Link

       nvidia-smi c2c -l <GPU C2C Id>

       nvidia-smi c2c --list <GPU C2C Id>

       Selects a specific C2C Link of the GPU for the given command, if valid.
       If not used, the given command-line option applies to all of the GPU's
       C2C Links.

       4) Query	C2C Link Status

       nvidia-smi c2c -s

       nvidia-smi c2c --status

       Get the status of the GPU's C2C Links. If active, the Bandwidth of the
       links will be displayed.

       5) Query	C2C Link Error Counters

       nvidia-smi c2c -e

       nvidia-smi c2c -errorCounters

       Display the C2C Link error counters.

       6) Query	C2C Link Power Info

       nvidia-smi c2c -gLowPwrInfo

       nvidia-smi c2c -getLowPowerInfo

       Display the C2C Link Power state.

   vGPU	Management
       The "nvidia-smi vgpu"  command  reports	on  GRID  vGPUs	 executing  on
       supported  GPUs	and  hypervisors  (refer  to  driver release notes for
       supported platforms).  Summary  reporting  provides  basic  information
       about  vGPUs  currently	executing  on  the  system. Additional options
       provide detailed	reporting of vGPU properties,  per-vGPU	 reporting  of
       SM,  Memory,  Encoder,  Decoder,	Jpeg, and OFA utilization, and per-GPU
       reporting of supported and creatable vGPUs.  Periodic  reports  can  be
       automatically  generated	by specifying a	configurable loop frequency to
       any command. Note: On MIG-enabled GPUs,	querying  the  utilization  of
       encoder,	  decoder,  jpeg,  ofa,	 gpu,  and  memory  is	not  currently
       supported.

       Usage:

       1) Help Information

       nvidia-smi vgpu -h

       Displays	help information for using the command line.

       2) Default with no arguments

       nvidia-smi vgpu

       Reports summary of all the vGPUs	currently active on each device.

       3) Display detailed info	on currently active vGPUs

       nvidia-smi vgpu -q

       Collects	and displays information on currently active vGPUs on each
       device, including driver	version, utilization, and other	information.

       4) Select one or	more devices

       nvidia-smi vgpu -i <device1,device2, .. , deviceN>

       Reports summary for all the vGPUs currently active on the devices
       selected	by comma-separated device list.

       5) Display supported vGPUs

       nvidia-smi vgpu -s

       Displays	vGPU types supported on	each device. Use the -v	/ --verbose
       option to show detailed info on each vGPU type.

       6) Display creatable vGPUs

       nvidia-smi vgpu -c

       Displays	vGPU types creatable on	each device. This varies dynamically,
       depending on the	vGPUs already active on	the device. Use	the -v /
       --verbose option	to show	detailed info on each vGPU type.

       7) Report utilization for currently active vGPUs.

       nvidia-smi vgpu -u

       Reports average utilization (SM,	Memory,	Encoder, Decoder, Jpeg,	and
       OFA) for	each active vGPU since last monitoring cycle. The default
       cycle time is 1 second, and the command runs until terminated with ^C.
       If a device has no active vGPUs,	its metrics are	reported as "-".

       8) Configure loop frequency

       nvidia-smi vgpu [-s -c -q -u] -l	<time in secs>

       Collects	and displays data at a specified loop interval until
       terminated with ^C. The loop frequency must be between 1	and 10 secs.
       When no time is specified, the loop frequency defaults to 5 secs.

       9) Display GPU engine usage

       nvidia-smi vgpu -p

       Display GPU engine usage	of currently active processes running in the
       vGPU VMs.

       10) Display migration capabilities.

       nvidia-smi vgpu -m

       Display pGPU's migration/suspend/resume capability.

       11) Display the vGPU Software scheduler state.

       nvidia-smi vgpu -ss

       Display the information about vGPU Software scheduler state.

       12) Display the vGPU Software scheduler capabilities.

       nvidia-smi vgpu -sc

       Display the list of supported vGPU scheduler policies, returned along
       with the other capability values, if the engine is of Graphics type.
       For other engine types, the policy is BEST EFFORT and the other
       capability values are zero. If ARR is supported and enabled, the
       scheduling frequency and averaging factor apply; otherwise, timeSlice
       applies.

       13) Display the vGPU Software scheduler logs.

       nvidia-smi vgpu -sl

       Display the vGPU	Software scheduler runlist logs.

       nvidia-smi --query-vgpu-scheduler-logs=[input parameters]

       Display the vGPU	Software scheduler runlist logs	in CSV format.

       14) Set the vGPU	Software scheduler state.

       nvidia-smi vgpu --set-vgpu-scheduler-state [options]

       Set the vGPU Software scheduler policy and states.

       15) Display NVIDIA Encoder session info.

       nvidia-smi vgpu -es

       Display the information about encoder sessions for currently running
       vGPUs.

       16) Display accounting statistics.

       nvidia-smi vgpu --query-accounted-apps=[input parameters]

       Display accounting stats	for compute/graphics processes.

       To find the list of properties which can be queried, run 'nvidia-smi
       --help-query-accounted-apps'.

       17) Display NVIDIA Frame	Buffer Capture session info.

       nvidia-smi vgpu -fs

       Display the information about FBC sessions for currently	running	vGPUs.

       Note: Horizontal resolution, vertical resolution, average FPS, and
       average latency data for an FBC session may be zero if no new frames
       have been captured since the session started.

       18) Set vGPU heterogeneous mode.

       nvidia-smi vgpu -shm

       Set vGPU	heterogeneous mode of the device for timesliced	vGPUs with
       different framebuffer sizes.

       19) Set vGPU MIG	timeslice mode.

       nvidia-smi vgpu -smts

       Set vGPU	MIG timeslice mode of the device.

       20) Display the currently creatable vGPU	types on the user provided GPU
       Instance

       nvidia-smi vgpu -c -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi vgpu -c --gpu-instance-id <GPU instance IDs> --id <GPU IDs>

       Provide comma-separated values to select more than one GPU instance.
       Specifying the target GPU index is MANDATORY for the given GPU
       instances.

       21) Display detailed information	of the currently active	vGPU instances
       on the user provided GPU	Instance

       nvidia-smi vgpu -q -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi vgpu -q --gpu-instance-id <GPU instance IDs> --id <GPU IDs>

       Provide comma-separated values to select more than one GPU instance.
       Specifying the target GPU index is MANDATORY for the given GPU
       instances.

       22) Display the vGPU scheduler state on the user	provided GPU Instance

       nvidia-smi vgpu -ss -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi vgpu -ss --gpu-instance-id <GPU instance IDs>	--id <GPU IDs>

       Provide comma-separated values to select more than one GPU instance.
       Specifying the target GPU index is MANDATORY for the given GPU
       instances.

       23) Get the vGPU	heterogeneous mode on the user provided	GPU Instance

       nvidia-smi vgpu -ghm -gi	<GPU instance IDs> -i <GPU IDs>

       nvidia-smi vgpu -ghm --gpu-instance-id <GPU instance IDs> --id <GPU
       IDs>

       Provide comma-separated values to select more than one GPU instance.
       Specifying the target GPU index is MANDATORY. If no GPU instance ID is
       given, the command-line option applies to all of the GPU instances.

       24) Set the vGPU	heterogeneous mode on the user provided	GPU Instance

       nvidia-smi vgpu -shm -gi	<GPU instance IDs> -i <GPU IDs>

       nvidia-smi vgpu -shm --gpu-instance-id <GPU instance IDs> --id <GPU
       IDs>

       Provide comma-separated values to select more than one GPU instance.
       Specifying the target GPU index is MANDATORY for the given GPU
       instances.

       25) Set the vGPU	Software scheduler state on the	user provided GPU
       Instance.

       nvidia-smi vgpu set-vgpu-scheduler-state	[options] -gi <GPU instance
       IDs> -i <GPU IDs>

       nvidia-smi vgpu set-vgpu-scheduler-state	[options] --gpu-instance-id
       <GPU instance IDs> --id <GPU IDs>

       Provide comma-separated values to select more than one GPU instance.
       Specifying the target GPU index is MANDATORY for the given GPU
       instances.

       26) Display the vGPU scheduler logs on the user provided	GPU Instance

       nvidia-smi vgpu -sl -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi vgpu -sl --gpu-instance-id <GPU instance IDs>	--id <GPU IDs>

       Provide comma-separated values to select more than one GPU instance.
       Specifying the target GPU index is MANDATORY for the given GPU
       instances.

       nvidia-smi vgpu --query-gpu-instance-vgpu-scheduler-logs=[input
       parameters] -gi <GPU instance IDs> -i <GPU IDs>

       Display the vGPU	Software scheduler logs	in CSV format on the user
       provided	GPU Instance.

       27) Display detailed information	of the currently creatable vGPU	types
       on the user provided GPU	Instance

       nvidia-smi vgpu -c -v -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi vgpu -c -v --gpu-instance-id <GPU instance IDs> --id <GPU
       IDs>

       Provide comma-separated values to select more than one GPU instance.
       Specifying the target GPU index is MANDATORY for the given GPU
       instances.

   MIG Management
       The privileged "nvidia-smi mig" command-line is	used  to  manage  MIG-
       enabled	GPUs.  It  provides  options  to  create, list and destroy GPU
       instances and compute instances.

       Usage:

       1) Display help menu

       nvidia-smi mig -h

       Displays	help menu for using the	command-line.

       2) Select one or	more GPUs

       nvidia-smi mig -i <GPU IDs>

       nvidia-smi mig --id <GPU	IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) Select one or	more GPU instances

       nvidia-smi mig -gi <GPU instance	IDs>

       nvidia-smi mig --gpu-instance-id	<GPU instance IDs>

       Selects one or more GPU instances using the given comma-separated GPU
       instance	IDs. If	not used, the given command-line option	applies	to all
       of the GPU instances.

       4) Select one or	more compute instances

       nvidia-smi mig -ci <compute instance IDs>

       nvidia-smi mig --compute-instance-id <compute instance IDs>

       Selects one or more compute instances using the given comma-separated
       compute instance	IDs. If	not used, the given command-line option
       applies to all of the compute instances.

       5) List GPU instance profiles

       nvidia-smi mig -lgip -i <GPU IDs>

       nvidia-smi mig --list-gpu-instance-profiles --id	<GPU IDs>

       Lists GPU instance profiles, their availability and IDs.	Profiles
       describe	the supported types of GPU instances, including	all of the GPU
       resources they exclusively control.

       6) List GPU instance possible placements

       nvidia-smi mig -lgipp -i	<GPU IDs>

       nvidia-smi mig --list-gpu-instance-possible-placements --id <GPU	IDs>

       Lists GPU instance possible placements. Possible	placements describe
       the locations of	the supported types of GPU instances within the	GPU.

       7) Create GPU instance

       nvidia-smi mig -cgi <GPU	instance specifiers> -i	<GPU IDs>

       nvidia-smi mig --create-gpu-instance <GPU instance specifiers> --id
       <GPU IDs>

       Creates GPU instances for the given GPU instance	specifiers. A GPU
       instance	specifier comprises a GPU instance profile name	or ID and an
       optional	placement specifier consisting of a colon and a	placement
       start index. The	command	fails if the GPU resources required to
       allocate	the requested GPU instances are	not available, or if the
       placement index is not valid for	the given profile.
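
       A GPU instance specifier is easy to split programmatically. The helper
       below is a sketch — the function name is hypothetical, and nvidia-smi
       does this parsing internally; profile names such as '1g.5gb' are
       examples, not an exhaustive list:

```python
def parse_gi_specifier(spec):
    """Split a GPU instance specifier into (profile, placement_start).

    Per the manual, a specifier is a profile name or ID, optionally
    followed by a colon and a placement start index.
    """
    profile, sep, placement = spec.partition(":")
    return profile, (int(placement) if sep else None)

print(parse_gi_specifier("1g.5gb:4"))  # → ('1g.5gb', 4)
print(parse_gi_specifier("9"))         # → ('9', None)
```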

       8) Create a GPU instance	along with the default compute instance

       nvidia-smi mig -cgi <GPU	instance profile IDs or	names> -i <GPU IDs> -C

       nvidia-smi mig --create-gpu-instance <GPU instance profile IDs or
       names> --id <GPU	IDs> --default-compute-instance

       9) List GPU instances

       nvidia-smi mig -lgi -i <GPU IDs>

       nvidia-smi mig --list-gpu-instances --id	<GPU IDs>

       Lists GPU instances and their IDs.

       10) Destroy GPU instance

       nvidia-smi mig -dgi -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi mig --destroy-gpu-instances --gpu-instance-id	<GPU instance
       IDs> --id <GPU IDs>

       Destroys	GPU instances. The command fails if the	requested GPU instance
       is in use by an application.

       11) List	compute	instance profiles

       nvidia-smi mig -lcip -gi	<GPU instance IDs> -i <GPU IDs>

       nvidia-smi mig --list-compute-instance-profiles --gpu-instance-id <GPU
       instance	IDs> --id <GPU IDs>

       Lists compute instance profiles,	their availability and IDs. Profiles
       describe	the supported types of compute instances, including all	of the
       GPU resources they share	or exclusively control.

       12) List	compute	instance possible placements

       nvidia-smi mig -lcipp -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi mig --list-compute-instance-possible-placements --gpu-
       instance-id <GPU	instance IDs> --id <GPU	IDs>

       Lists compute instance possible placements. Possible placements
       describe	the locations of the supported types of	compute	instances
       within the GPU instance.

       13) Create compute instance

       nvidia-smi mig -cci <compute instance profile IDs or names> -gi <GPU
       instance	IDs> -i	<GPU IDs>

       nvidia-smi mig --create-compute-instance	<compute instance profile IDs
       or names> --gpu-instance-id <GPU	instance IDs> --id <GPU	IDs>

       Creates compute instances for the given compute instance specifiers. A
       compute instance	specifier comprises a compute instance profile name or
       ID and an optional placement specifier consisting of a colon and	a
       placement start index. The command fails	if the GPU resources required
       to allocate the requested compute instances are not available, or if
       the placement index is not valid	for the	given profile.

       14) List	compute	instances

       nvidia-smi mig -lci -gi <GPU instance IDs> -i <GPU IDs>

       nvidia-smi mig --list-compute-instances --gpu-instance-id <GPU instance
       IDs> --id <GPU IDs>

       Lists compute instances and their IDs.

       15) Destroy compute instance

       nvidia-smi mig -dci -ci <compute	instance IDs> -gi <GPU instance	IDs>
       -i <GPU IDs>

       nvidia-smi mig --destroy-compute-instance --compute-instance-id
       <compute	instance IDs> --gpu-instance-id	<GPU instance IDs> --id	<GPU
       IDs>

       Destroys	compute	instances. The command fails if	the requested compute
       instance	is in use by an	application.

   Boost Slider
       The privileged "nvidia-smi boost-slider" command-line is used to manage
       boost sliders on GPUs. It provides options to list and control boost
       sliders.

       Usage:

       1) Display help menu

       nvidia-smi boost-slider -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi boost-slider -i <GPU IDs>

       nvidia-smi boost-slider --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) List boost sliders

       nvidia-smi boost-slider -l

       nvidia-smi boost-slider --list

       List all	boost sliders for the selected devices.

       4) Set video boost slider

       nvidia-smi boost-slider --vboost	<value>

       Set the video boost slider for the selected devices.

   Power Hint
       The  privileged	"nvidia-smi  power-hint" command-line is used to query
       power hint on GPUs.

       Usage:

       1) Display help menu

       nvidia-smi power-hint -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi power-hint -i <GPU IDs>

       nvidia-smi power-hint --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) List power hint info

       nvidia-smi power-hint -l

       nvidia-smi power-hint --list-info

       List power hint info for the selected devices.

       4) Query	power hint

       nvidia-smi power-hint -gc <value> -t <value> -p <profile ID>

       nvidia-smi power-hint --graphics-clock <value> --temperature <value>
       --profile <profile ID>

       Query power hint	with graphics clock, temperature and profile id.

       5) Query	power hint

       nvidia-smi power-hint -gc <value> -mc <value> -t <value> -p <profile
       ID>

       nvidia-smi power-hint --graphics-clock <value> --memory-clock <value>
       --temperature <value> --profile <profile ID>

       Query power hint	with graphics clock, memory clock, temperature and
       profile id.

   Confidential	Compute
       The  "nvidia-smi	 conf-compute"	command-line   is   used   to	manage
       confidential compute. It	provides options to set	and query confidential
       compute.

       Usage:

       1) Display help menu

       nvidia-smi conf-compute -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi conf-compute -i <GPU IDs>

       nvidia-smi conf-compute --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) Query	confidential compute CPU capability

       nvidia-smi conf-compute -gc

       nvidia-smi conf-compute --get-cpu-caps

       Get confidential	compute	CPU capability.

       4) Query	confidential compute GPUs capability

       nvidia-smi conf-compute -gg

       nvidia-smi conf-compute --get-gpus-caps

       Get confidential	compute	GPUs capability.

       5) Query	confidential compute devtools mode

       nvidia-smi conf-compute -d

       nvidia-smi conf-compute --get-devtools-mode

       Get confidential	compute	DevTools mode.

       6) Query	confidential compute environment

       nvidia-smi conf-compute -e

       nvidia-smi conf-compute --get-environment

       Get confidential	compute	environment.

       7) Query	confidential compute feature status

       nvidia-smi conf-compute -f

       nvidia-smi conf-compute --get-cc-feature

       Get confidential	compute	CC feature status.

       8) Query	confidential compute GPU protected/unprotected memory sizes

       nvidia-smi conf-compute -gm

       nvidia-smi conf-compute --get-mem-size-info

       Get confidential	compute	GPU protected/unprotected memory sizes.

       9) Set confidential compute GPU unprotected memory size

       nvidia-smi conf-compute -sm <value>

       nvidia-smi conf-compute --set-unprotected-mem-size <value>

       Set confidential	compute	GPU unprotected	memory size in KiB. Requires
       root.
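
       Since '-sm' takes its value in KiB, larger units must be converted
       first. A trivial, hypothetical convenience helper (the CLI itself only
       accepts KiB):

```python
def unprotected_mem_kib(size, unit="KiB"):
    """Convert a size to the KiB value expected by 'conf-compute -sm'."""
    factors = {"KiB": 1, "MiB": 1024, "GiB": 1024 * 1024}
    return size * factors[unit]

print(unprotected_mem_kib(2, "GiB"))  # → 2097152
```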

       10) Set confidential compute GPUs ready state

       nvidia-smi conf-compute -srs <value>

       nvidia-smi conf-compute --set-gpus-ready-state <value>

       Set confidential	compute	GPUs ready state. The value must be 1 to set
       the ready state and 0 to	unset it. Requires root.

       11) Query confidential compute GPUs ready state

       nvidia-smi conf-compute -grs

       nvidia-smi conf-compute --get-gpus-ready-state

       Get confidential	compute	GPUs ready state.

       12) Set Confidential Compute Key	Rotation Max Attacker Advantage

       nvidia-smi conf-compute -skr <value>

       nvidia-smi conf-compute --set-key-rotation-max-attacker-advantage

       Set Confidential	Compute	Key Rotation Max Attacker Advantage.

       13) Display Confidential	Compute	Key Rotation Threshold Info

       nvidia-smi conf-compute -gkr

       nvidia-smi conf-compute --get-key-rotation-threshold-info

       Display Confidential Compute Key	Rotation Threshold Info.

       14) Display Confidential	Compute	Multi-GPU Mode

       nvidia-smi conf-compute -mgm

       nvidia-smi conf-compute --get-multigpu-mode

       Display Confidential Compute Multi-GPU Mode.

       15) Display Confidential	Compute	Detailed Info

       nvidia-smi conf-compute -q

       nvidia-smi conf-compute --query-conf-compute

       Display Confidential Compute Detailed Info.
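
       For scripting around these commands, arguments can be validated before
       the tool is invoked. A minimal Python sketch for the ready-state value
       (the helper name is hypothetical; only the documented values 0 and 1
       are accepted, and running the resulting command still requires root):

```python
def cc_ready_state_argv(value):
    """Build the argv for 'nvidia-smi conf-compute -srs <value>'.

    Per the man page, the value must be 1 (set the ready state) or
    0 (unset it); anything else is rejected before nvidia-smi runs.
    """
    if value not in (0, 1):
        raise ValueError("ready state must be 0 or 1")
    return ["nvidia-smi", "conf-compute", "-srs", str(value)]
```

       The list form can be passed to subprocess.run() without shell quoting.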

   GPU Performance Monitoring (GPM) Stream State
       The "nvidia-smi gpm" command-line is used to manage the GPU performance
       monitoring unit. It provides options to query and set the stream state.

       Usage:

       1) Display help menu

       nvidia-smi gpm -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi gpm -i <GPU IDs>

       nvidia-smi gpm --id <GPU	IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) Query	GPU performance	monitoring stream state

       nvidia-smi gpm -g

       nvidia-smi gpm --get-stream-state

       Get gpm stream state for	the selected devices.

       4) Set GPU performance monitoring stream	state

       nvidia-smi gpm -s <value>

       nvidia-smi gpm --set-stream-state <value>

       Set gpm stream state for	the selected devices.
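
       The '-i' selector above takes a comma-separated list with no embedded
       spaces; a script can assemble it from mixed indexes and UUIDs. A small
       sketch (the helper name is hypothetical):

```python
def gpu_selector(ids):
    """Join GPU indexes, PCI bus IDs or UUIDs into the comma-separated
    form expected by 'nvidia-smi gpm -i' (and other subcommands)."""
    parts = [str(i).strip() for i in ids]
    if not parts or any(not p or "," in p for p in parts):
        raise ValueError("invalid GPU identifier list")
    return ",".join(parts)
```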

   GPU PCI section
       The "nvidia-smi pci" command-line is used to manage GPU	PCI  counters.
       It provides options to query and	clear PCI counters.

       Usage:

       1) Display help menu

       nvidia-smi pci -h

       Displays	help menu for using the	command-line.

       2) Query	PCI error counters

       nvidia-smi pci -i <GPU index> -gErrCnt

       Query PCI error counters	of a GPU

       3) Clear	PCI error counters

       nvidia-smi pci -i <GPU index> -cErrCnt

       Clear PCI error counters	of a GPU

       4) Query	PCI counters

       nvidia-smi pci -i <GPU index> -gCnt

       Query PCI RX and	TX counters of a GPU

   Power Smoothing
       The  "nvidia-smi	 power-smoothing" command-line is used to manage Power
       Smoothing related data on the GPU. It provides  options	to  set	 Power
       Smoothing related data and query	the preset profile definitions.

       Usage:

       1) Display help menu

       nvidia-smi power-smoothing -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi power-smoothing -i <GPU IDs>

       nvidia-smi power-smoothing --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) Select a Preset Profile ID

       nvidia-smi power-smoothing -p <Profile ID>

       nvidia-smi power-smoothing --profile <Profile ID>

       Selects a Preset	Profile	ID for which to	update a value.	This is
       required	when updating a	Preset Profile parameter and prohibited	in all
       other cases.

       4) Set Active Preset Profile ID

       nvidia-smi power-smoothing -spp <Profile	ID>

       nvidia-smi power-smoothing --set-preset-profile <Profile	ID>

       Activate the desired Preset Profile ID. Requires root.

       5) Update percentage Total Module Power (TMP) floor

       nvidia-smi power-smoothing -ptf <Percentage> -p <Profile	ID>

       nvidia-smi power-smoothing --percent-tmp-floor <Percentage> --profile
       <Profile	ID>

       Sets the percentage TMP floor to the input value for a given Preset
       Profile ID. The percentage must be between 0 and 100, given in the
       form "AB.CD", with a maximum of two decimal places of precision. For
       example, to set the value to 34.56%, input 34.56. The input may also
       contain zero or one decimal places. This option requires a profile ID
       as an argument. Requires root.

       6) Update Ramp-Up Rate

       nvidia-smi power-smoothing -rur <value> -p <Profile ID>

       nvidia-smi power-smoothing --ramp-up-rate <value> --profile <Profile
       ID>

       Sets the	Ramp-Up	Rate to	the desired value for a	given Preset Profile
       ID. The rate given must be in the units of mW/s.	This option requires a
       profile ID as an	argument. Requires root.

       7) Update Ramp-Down Rate

       nvidia-smi power-smoothing -rdr <value> -p <Profile ID>

       nvidia-smi power-smoothing --ramp-down-rate <value> --profile <Profile
       ID>

       Sets the	Ramp-Down Rate to the desired value for	a given	Preset Profile
       ID. The rate given must be in the units of mW/s.	This option requires a
       profile ID as an	argument. Requires root.

       8) Update Ramp-Down Hysteresis

       nvidia-smi power-smoothing -rdh <value> -p <Profile ID>

       nvidia-smi power-smoothing --ramp-down-hysteresis <value> --profile
       <Profile	ID>

       Sets the Ramp-Down Hysteresis to the desired value for a given Preset
       Profile ID. The value given must be in units of ms. This option
       requires a profile ID as an argument. Requires root.

       9) Display the Preset Profile definitions for all Profile IDs

       nvidia-smi power-smoothing -ppd

       nvidia-smi power-smoothing --print-profile-definitions

       Displays all values for each Preset Profile ID.

       10) Set Feature State

       nvidia-smi power-smoothing -s <state>

       nvidia-smi power-smoothing --state <state>

       Sets the	state of the feature to	either 0/DISABLED or 1/ENABLED.
       Requires	root.
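
       The percentage-floor format described above (0 to 100, at most two
       decimal places) is easy to check before calling the tool. A hedged
       sketch; the helper name is hypothetical:

```python
import re

# "AB.CD" form: up to three integer digits, optional 1-2 decimal places.
_TMP_FLOOR = re.compile(r"^\d{1,3}(\.\d{1,2})?$")

def valid_tmp_floor(text):
    """True if 'text' is acceptable for 'power-smoothing -ptf':
    a value from 0 to 100 with at most two decimal places."""
    return bool(_TMP_FLOOR.match(text)) and 0.0 <= float(text) <= 100.0
```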

   Power Profiles
       The "nvidia-smi power-profiles" command-line is used to manage Workload
       Power  Profiles	related	data on	the GPU. It provides options to	update
       Power Profiles data and query the supported Power Profiles.

       Usage:

       1) Display help menu

       nvidia-smi power-profiles -h

       Displays	help menu for using the	command-line.

       2) List one or more GPUs

       nvidia-smi power-profiles -i <GPU IDs>

       nvidia-smi power-profiles --id <GPU IDs>

       Selects one or more GPUs	using the given	comma-separated	GPU indexes,
       PCI bus IDs or UUIDs. If	not used, the given command-line option
       applies to all of the supported GPUs.

       3) List Power Profiles

       nvidia-smi power-profiles -l

       nvidia-smi power-profiles --list

       List all	Workload Power Profiles	supported by the device.

       4) List Detailed	Power Profiles info

       nvidia-smi power-profiles -ld

       nvidia-smi power-profiles --list-detailed

       List all	Workload Power Profiles	supported by the device	along with
       their metadata. This includes the Profile ID, the Priority (where a
       lower number indicates a	higher priority), and Profiles that conflict
       with the	given profile. If two or more conflicting profiles are
       requested, not all may be enforced.

       5) Get Requested	Profiles

       nvidia-smi power-profiles -gr

       nvidia-smi power-profiles --get-requested

       Get a list of all currently requested Power Profiles. Note that if any
       of the profiles conflict, then not all may be enforced.

       6) Set Requested	Profiles

       nvidia-smi power-profiles -sr <Profile ID>

       nvidia-smi power-profiles --set-requested <Profile ID(s)>

       Adds the	input profile(s) to the	list of	requested Power	Profiles. The
       input is	a comma	separated list of profile IDs with no spaces. Requires
       root.

       7) Clear	Requested Profiles

       nvidia-smi power-profiles -cr <Profile ID>

       nvidia-smi power-profiles --clear-requested <Profile ID(s)>

       Removes the input profile(s) from the list of requested Power Profiles.
       The input is a comma separated list of profile IDs with no spaces.
       Requires	root.

       8) Get Enforced Profiles

       nvidia-smi power-profiles -ge

       nvidia-smi power-profiles --get-enforced

       Get a list of all currently enforced Power Profiles. Note that this
       list may	differ from the	requested Profiles list	if multiple
       conflicting profiles are	selected.
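
       Because conflicting profiles may be dropped, comparing the requested
       and enforced lists tells a script which profiles were not applied. A
       sketch that assumes both lists are plain comma-separated profile IDs
       like the '-sr' input (the man page does not pin down the exact
       '-gr'/'-ge' output format, so treat this as illustrative):

```python
def unenforced_profiles(requested, enforced):
    """Return requested profile IDs absent from the enforced list,
    i.e. profiles dropped because of conflicts."""
    enforced_ids = {p for p in enforced.split(",") if p}
    return [p for p in requested.split(",") if p and p not in enforced_ids]
```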

   GPU RUSD section
       The "nvidia-smi rusd" command-line is used to manage GPU RUSD
       settings. RUSD is a Read-only User Shared Data buffer that holds GPU
       metrics.

       Usage:

       1) Display help menu

       nvidia-smi rusd -h

       Displays	help menu for using the	command-line. Example:

       nvidia-smi rusd -h

	   rusd	-- RUSD	settings section

	   Usage: nvidia-smi rusd [options]

	   Options include:
	   [-h | --help]: Display help information
	   [-i | --id]:	Enumeration index, PCI bus ID or UUID.

	   [-spm | --set-polling-mask]:	Set polling mask for the given comma-separated list of metric groups
	       Groups are "none", "clock", "performance", "memory", "power", "thermal",	"pci", "fan", "proc_util", "all"

       2) Set RUSD poll	mask

       nvidia-smi rusd -i <GPU index> -spm <mask_value>

       Set RUSD poll mask. Example:

       nvidia-smi rusd -spm all
       nvidia-smi rusd -spm clock,performance
       nvidia-smi rusd -spm none
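
       The '-spm' group list can be validated against the names shown in the
       help text before invoking the tool. A minimal sketch (helper name
       hypothetical):

```python
# Metric groups listed by 'nvidia-smi rusd -h'.
_RUSD_GROUPS = {"none", "clock", "performance", "memory", "power",
                "thermal", "pci", "fan", "proc_util", "all"}

def rusd_mask(groups):
    """Build the comma-separated mask for 'nvidia-smi rusd -spm',
    rejecting names outside the documented group set."""
    bad = [g for g in groups if g not in _RUSD_GROUPS]
    if bad:
        raise ValueError("unknown metric group(s): %s" % ", ".join(bad))
    return ",".join(groups)
```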

   GPU PRM section
       The "nvidia-smi prm" command-line is used to read GPU PRM registers and
       counters. This option  is  only	available  on  GPUs  based  on	NVIDIA
       Blackwell or newer architectures.

       Usage:

       1) Display help menu

       nvidia-smi prm -h

       Displays	the help menu for using	the command-line. Example:

       nvidia-smi prm -h
	   [-h | --help]: Display help information
	   [-i | --index]: GPU index; mandatory	if "-n,	--name"	is selected
	   [-l | --list]: List all supported PRM registers and counters
	   [-n | --name]: PRM Register name; mandatory if any of "-f" or "-p" are selected
	   [-f | --info]: List all supported PRM parameters for	the given register or counter
	   [-p | --params]: PRM	input parameters, if any; parameters are a comma-separated list	of <key>=<value> pairs

       2) List supported PRM registers and counters

       nvidia-smi prm --list

       Displays	the list of supported GPU PRM registers	and counters. Example:

       nvidia-smi prm --list
       Supported PRM registers:
		   GHPKT
		   MCAM
		   MGIR
		   MLPC
		   MORD
		   MPSCR
		   MTCAP
		   MTECR
		   MTEIM
		   MTEWE
		   MTIE
		   MTIM
		   MTRC_CAP
		   MTRC_CONF
		   MTRC_CTRL
		   MTSR
		   PAOS
		   PDDR
		   PGUID
		   PLIB
		   PLTC
		   PMAOS
		   PMLP
		   PMTU
		   PPAOS
		   PPCNT
		   PPHCR
		   PPLM
		   PPLR
		   PPRM
		   PPRT
		   PPSLC
		   PPSLS
		   PPTT
		   PTYS
		   SLRG
		   SLTP
	   Supported PRM counters:
		   link_down_events
		   oper_recovery
		   plr_rcv_code_err
		   plr_rcv_codes
		   plr_rcv_uncorrectable_code
		   plr_retry_codes
		   plr_sync_events
		   plr_xmit_codes
		   plr_xmit_retry_events
		   port_xmit_wait
		   successful_recovery_events
		   time_between_last_2_recoveries
		   time_since_last_recovery
		   total_successful_recovery_events

       3) List supported input parameters for a	given PRM register or counter

       nvidia-smi prm -n <register> -f or nvidia-smi prm -c <counter> -f

       Lists  the  supported  input  parameters	 (if  any)  for	 the given PRM
       register	or counter. Example:

       nvidia-smi prm -n PPCNT -f
       Supported PRM parameters	for register PPCNT:
	       grp
	       port_type
	       lp_msb
	       pnat
	       local_port
	       swid
	       prio_tc
	       grp_profile
	       plane_ind
	       counters_cap
	       lp_gl
	       clr

       Note that some registers	do not take any	input parameters; in this case
       the output of the above command will be '[NONE]'. Example:

       nvidia-smi prm -n MGIR -f
       Supported PRM parameters	for register MGIR:
	       [NONE]

       4) Read GPU PRM register

       nvidia-smi prm -i <GPU-index> -n	<register> -p <Comma-separated list of
       key EQUALS value	pairs>

       Reads the specified GPU PRM register with the  given  input  parameters
       and  outputs  to	 the  screen. Note that	the output may not include all
       information in the register. Example:

       nvidia-smi prm -i 0 -n PPCNT -p local_port=1,pnat=1,grp=35
       PPCNT:
	       grp = 35, port_type = 0,	lp_msb = 0, pnat = 1, local_port = 1, swid = 0
	       prio_tc = 0, grp_profile	= 0, plane_ind = 0, counters_cap = 0, lp_gl = 0, clr = 0

       5) Read GPU PRM counter

       nvidia-smi prm -i <GPU-index> -c	<counter> -p <Comma-separated list  of
       key EQUALS value	pairs>

       Reads the specified GPU PRM counter with	the given input	parameters and
       outputs to the screen. Example:

	   nvidia-smi prm -i 0 -c plr_rcv_codes	-p "local_port=1"
	   plr_rcv_codes ==> 0x64aace03ff
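
       Counter output in the 'name ==> 0x...' form shown above can be parsed
       back into an integer for trending or alerting. A sketch (helper name
       hypothetical):

```python
def parse_prm_counter(line):
    """Parse a line like 'plr_rcv_codes ==> 0x64aace03ff'
    into a (name, value) pair with the value as an int."""
    name, sep, value = line.partition("==>")
    if not sep:
        raise ValueError("not a counter line: %r" % line)
    return name.strip(), int(value.strip(), 16)
```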

   System on Chip section
       The "nvidia-smi soc" command-line is used to manage system on chip
       (SoC) metrics. It provides options to query SoC metrics. This SoC
       section is only available on Tegra Linux systems.

       Usage:

       1) Display help menu

       nvidia-smi soc -h

       Displays	help menu for using the	command-line.

       Example:

       nvidia-smi soc -h
	   soc -- System on Chip section

	   Usage: nvidia-smi soc [options]

	   Options include:
	   [-h | --help]: Display help information
	   [-q | --query]: Query SoC metrics

       2) Query SoC metrics

       nvidia-smi soc -q

       Query SoC metrics.

       Example:

       nvidia-smi soc -q

       Memory:
	   MemTotal: 128.83 GiB
	   MemFree: 89.43 GiB
       CPU:
	   cpu0:
	       clock: 972MHz
	       utilization: 0%
	   cpu1:
	       clock: 972MHz
	       utilization: 0%
	   cpu2:
	       clock: 972MHz
	       utilization: 0%
	   cpu3:
	       clock: 972MHz
	       utilization: 0%
	   cpu4:
	       clock: 972MHz
	       utilization: 0%
	   cpu5:
	       clock: 972MHz
	       utilization: 0%
	   cpu6:
	       clock: 972MHz
	       utilization: 0%
	   cpu7:
	       clock: 972MHz
	       utilization: 0%
	   cpu8:
	       clock: 1350MHz
	       utilization: 0%
	   cpu9:
	       clock: 1674MHz
	       utilization: 0%
	   cpu10:
	       clock: 972MHz
	       utilization: 0%
	   cpu11:
	       clock: 972MHz
	       utilization: 0%
	   cpu12:
	       clock: 972MHz
	       utilization: 0%
	   cpu13:
	       clock: 972MHz
	       utilization: 0%
       Memory Controller:
	   utilization:	0%
	   clock: 4266MHz
       Video Image Compositor:
	   state: off
       Programmable Vision Accelerator:
	   state: off
       Audio Processing	Engine:
	   Clock: 300 MHz
       Thermal info:
	   cpu-thermal:	59.22C
	   tj-thermal: 60.41C
	   soc012-thermal: 58.47C
	   soc345-thermal: 60.41C
       Power info:
	   VDD_GPU: 5145 mW
	   VDD_CPU_SOC_MSS: 5937 mW
	   VIN_SYS_5V0:	4939 mW
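
       The indented output above lends itself to a simple section split
       before deeper parsing. A rough sketch handling only the outermost
       level (helper name hypothetical; deeper levels such as per-CPU clocks
       are left as raw 'key: value' lines):

```python
def parse_soc_sections(text):
    """Group 'nvidia-smi soc -q' output into {section: [lines]},
    where sections are the unindented headers such as 'Memory:'."""
    sections, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():            # top-level header line
            current = line.strip().rstrip(":")
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    return sections
```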

UNIT ATTRIBUTES
       The  following  list  describes all possible data returned by the -q -u
       unit query option. Unless otherwise noted  all  numerical  results  are
       base 10 and unitless.

   Timestamp
       The current system timestamp at the time	nvidia-smi was invoked.	Format
       is "Day-of-week Month Day HH:MM:SS Year".

   Driver Version
       The  version  of	the installed NVIDIA display driver. Format is "Major-
       Number.Minor-Number".

   HIC Info
       Information about any Host Interface Cards (HIC)	that are installed  in
       the system.

       Firmware	Version
		      The version of the firmware running on the HIC.

   Attached Units
       The number of attached Units in the system.

   Product Name
       The  official  product name of the unit.	This is	an alphanumeric	value.
       For all S-class products.

   Product Id
       The product identifier for the unit. This is an alphanumeric  value  of
       the form	"part1-part2-part3". For all S-class products.

   Product Serial
       The  immutable  globally	 unique	 identifier  for  the unit. This is an
       alphanumeric value. For all S-class products.

   Firmware Version
       The version of the firmware running on  the  unit.  Format  is  "Major-
       Number.Minor-Number". For all S-class products.

   LED State
       The  LED	 indicator is used to flag systems with	potential problems. An
       LED color of AMBER indicates an issue. For all S-class products.

       Color	      The color	 of  the  LED  indicator.  Either  "GREEN"  or
		      "AMBER".

       Cause	      The  reason  for the current LED color. The cause	may be
		      listed as	any combination	of "Unknown", "Set to AMBER by
		      host system", "Thermal sensor  failure",	"Fan  failure"
		      and "Temperature exceeds critical	limit".

   Temperature
       Temperature readings for	important components of	the Unit. All readings
       are  in	degrees	 C. Not	all readings may be available. For all S-class
       products.

       Intake	      Air temperature at the unit intake.

       Exhaust	      Air temperature at the unit exhaust point.

       Board	      Air temperature across the unit board.

   PSU
       Readings	for the	unit power supply. For all S-class products.

       State	      Operating	state of the PSU. The power supply  state  can
		      be  any  of  the	following: "Normal", "Abnormal", "High
		      voltage",	  "Fan	 failure",   "Heatsink	 temperature",
		      "Current	limit",	 "Voltage  below  UV alarm threshold",
		      "Low-voltage", "I2C remote  off  command",  "MOD_DISABLE
		      input" or	"Short pin transition".

       Voltage	      PSU voltage setting, in volts.

       Current	      PSU current draw,	in amps.

   Fan Info
       Fan readings for	the unit. A reading is provided	for each fan, of which
       there can be many. For all S-class products.

       State	      The state	of the fan, either "NORMAL" or "FAILED".

       Speed	      For a healthy fan, the fan's speed in RPM.

   Attached GPUs
       A  list	of PCI bus ids that correspond to each of the GPUs attached to
       the unit. The bus ids have the  form  "domain:bus:device.function",  in
       hex. For	all S-class products.
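
       A bus id in this form can be decoded into its numeric parts for
       sorting or matching against sysfs paths. A sketch (helper name
       hypothetical):

```python
def parse_bus_id(bus_id):
    """Split a PCI bus id 'domain:bus:device.function' (hex fields,
    as listed under Attached GPUs) into four integers."""
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)
```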

NOTES
       On  Linux,  NVIDIA device files may be modified by nvidia-smi if	run as
       root. Please see	the relevant section of	the driver README file.

       The -a and -g arguments are now deprecated  in  favor  of  -q  and  -i,
       respectively. However, the old arguments	still work for this release.

EXAMPLES
   nvidia-smi -q
       Query  attributes  for  all  GPUs  once,	 and  display in plain text to
       stdout.

   nvidia-smi --format=csv,noheader --query-gpu=uuid,persistence_mode
       Query UUID and persistence mode of all GPUs in the system.
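
       The csv,noheader output is convenient to post-process with Python's
       csv module. A sketch; the helper name and the sample UUID below are
       illustrative, not real device data:

```python
import csv
import io

def parse_gpu_csv(text, fields):
    """Turn 'nvidia-smi --format=csv,noheader' output into a list of
    dicts keyed by the queried field names. nvidia-smi separates
    values with ', ', so each cell is stripped after splitting."""
    return [{f: v.strip() for f, v in zip(fields, row)}
            for row in csv.reader(io.StringIO(text))]
```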

   nvidia-smi -q -d ECC,POWER -i 0 -l 10 -f out.log
       Query ECC errors	and power consumption for GPU 0	at a frequency	of  10
       seconds,	indefinitely, and record to the	file out.log.

   nvidia-smi -c 1 -i GPU-b2f5f1b745e3d23d-65a3a26d-097db358-7303e0b6-149642ff3d219f8587cde3a8
       Set the compute mode to "PROHIBITED" for GPU with UUID "GPU-
       b2f5f1b745e3d23d-65a3a26d-097db358-7303e0b6-149642ff3d219f8587cde3a8".

   nvidia-smi -q -u -x --dtd
       Query  attributes  for  all  Units once,	and display in XML format with
       embedded	DTD to stdout.

   nvidia-smi --dtd -u -f nvsmi_unit.dtd
       Write the Unit DTD to nvsmi_unit.dtd.

   nvidia-smi -q -d SUPPORTED_CLOCKS
       Display supported clocks	of all GPUs.

   nvidia-smi -i 0 --applications-clocks 2500,745
       Set applications	clocks to 2500 MHz memory, and 745 MHz graphics.

   nvidia-smi mig -cgi 19
       Create a	MIG GPU	instance on profile ID 19.

   nvidia-smi mig -cgi 19:2
       Create a	MIG GPU	instance on profile ID 19 at placement start index 2.

   nvidia-smi boost-slider -l
       List all	boost sliders for all GPUs.

   nvidia-smi boost-slider --vboost 1
       Set vboost to value 1 for all GPUs.

   nvidia-smi power-hint -l
       List clock range, temperature range and	supported  profiles  of	 power
       hint.

   nvidia-smi power-hint -gc 1350 -t 60 -p 0
       Query power hint	with graphics clock at 1350MHz,	temperature at 60C and
       profile ID at 0.

   nvidia-smi power-hint -gc 1350 -mc 1215 -t -5 -p 1
       Query power hint with graphics clock at 1350MHz, memory clock at
       1215MHz, temperature at -5C and profile ID at 1.

DEPRECATION AND REMOVAL NOTICES
   Features deprecated and/or removed between nvidia-smi v580 Update and v575

        Removed deprecated graphics voltage value  from  Voltage  section  of
	 'nvidia-smi -q'

        Removed deprecated GPU	Reset Status from 'nvidia-smi -q' output

        Deprecated GPU	Fabric State and Status	from 'nvidia-smi -q'

CHANGE LOG
   Known Issues

        On  systems  where  GPUs  are	NUMA  nodes, the accuracy of FB	memory
	 utilization provided by nvidia-smi depends on the  memory  accounting
	 of  the operating system. This	is because FB memory is	managed	by the
	 operating system instead of the NVIDIA	GPU driver.  Typically,	 pages
	 allocated  from  FB  memory  are  not released	even after the process
	 terminates to enhance performance. In scenarios where	the  operating
	 system	 is  under  memory  pressure,  it  may	resort to utilizing FB
	 memory. Such actions can result in discrepancies in the  accuracy  of
	 memory	reporting.

        On  Linux  GPU	 Reset	can't  be  triggered when there	is pending GOM
	 change.

        On Linux GPU Reset may	not successfully change	pending	 ECC  mode.  A
	 full reboot may be required to	enable the mode	change.

        On Linux platforms that configure NVIDIA GPUs as NUMA nodes, enabling
	 persistence  mode  or	resetting GPUs may print 'Warning: persistence
	 mode is disabled on device' if	nvidia-persistenced is not running, or
	 if nvidia-persistenced	cannot access files  in	 the  NVIDIA  driver's
	 procfs directory for the device (/proc/driver/nvidia/gpus/<PCI
	 address>/). During GPU reset and driver reload, this
	 directory  will  be deleted and recreated, and	outstanding references
	 to the	deleted	directory, such	 as  mounts  or	 shells,  can  prevent
	 processes from	accessing files	in the new directory.

         There might be a slight discrepancy between volatile/aggregate ECC
	 counters if a recovery action was not taken.

        The GPU hostname commands are currently only supported	on  compatible
	 GB200 platforms.

   Changes between nvidia-smi v590 Update and v580

        Added support for inclusion of	NIC data-direct	devices	in 'nvidia-smi
	 topo -m'

        Added	support	 to  display System on Chip metrics via	a new command:
	 'nvidia-smi soc' (support only	on Tegra Linux system)

        Added support for setting RUSD	(Read only User	Shared Data)  settings
	 via a new command: 'nvidia-smi	rusd'

        Deprecated Applications Clocks, including:

        Current  Applications	Clocks	frequencies  for  Memory  and Graphics
	 clocks

        Default Applications  Clocks  frequencies  for	 Memory	 and  Graphics
	 clocks

        The  -ac option to set	Applications Clocks frequencies	for Memory and
	 Graphics clocks

        The -rac option to reset Applications Clocks frequencies  for	Memory
	 and Graphics clocks

        Added Nvlink version to 'nvidia-smi nvlink -info' output

        Added new option 'nvidia-smi power-profiles -or' to set and overwrite
	 the requested power profiles.

        Added	new field 'EDPp	Multipler' to 'nvidia-smi -q', which expresses
	 the EDPp ratio	as a percentage.

         Added new field '--query-gpu=edpp_multipler' to retrieve the
	 multiplier.

        Added Unrepairable memory status to ECC field:	'nvidia-smi -q -d ECC'

        Modified  the	'FB  Memory  Usage', 'BAR1 Memory Usage' fields	in the
	 'nvidia-smi -q' output	to 'Shared  FB	Memory	Usage',	 'Shared  BAR1
	 Usage'	respectively to	indicate they are shared among the MIG devices
	 associated with the same GPU instance.

        Added	a  new	sub-option '-ei' to 'nvidia-smi	vgpu -sl' to query the
	 vGPU software scheduler logs on the user provided engine.

        Added new '--query-gpu' options for Delayed Power Smoothing:

	  power_smoothing.supported

	  power_smoothing.primary_power_floor

	  power_smoothing.secondary_power_floor

	  power_smoothing.min_primary_floor_activation_offset

	  power_smoothing.min_primary_floor_activation_point

	  power_smoothing.window_multiplier

	  power_smoothing.curr_profile.secondary_power_floor

	  power_smoothing.curr_profile.primary_floor_act_window_multiplier

	  power_smoothing.curr_profile.primary_floor_tar_window_multiplier

	  power_smoothing.curr_profile.primary_floor_act_offset

	  power_smoothing.admin_override.secondary_power_floor

	  power_smoothing.admin_override.primary_floor_act_window_multiplier

	  power_smoothing.admin_override.primary_floor_tar_window_multiplier

	  power_smoothing.admin_override.primary_floor_act_offset

        Added	4  new	configurable  profile  fields  in  'nvidia-smi	power-
	 smoothing'.

   Changes between nvidia-smi v580 Update and v575

        Added	Device NVLINK Encryption status	in the new nvlink info command
	 'nvidia-smi nvlink -info'

         Added Multi-GPU mode NVLink Encryption (NVLE) status in 'nvidia-smi
	 conf-compute -mgm' and 'nvidia-smi conf-compute -q'

        Added	Nvlink	Firmware  Version  info	 to  the  nvlink  info command
	 'nvidia-smi nvlink -info'

        Added Channel/TPC repair pending flags	to ECC field:  'nvidia-smi  -q
	 -d ECC'

        Removed  deprecated  graphics	voltage	 value from Voltage section of
	 'nvidia-smi -q'

        Removed deprecated GPU	Reset Status from 'nvidia-smi -q' output

        Added a new option to read GPU	PRM registers: 'nvidia-smi prm'

        Added a new  'Bus'  reset  option  to	the  existing  reset  command:
	 'nvidia-smi -r	bus'

        Added	a  new	output	field  called 'GPU PDI'	to the 'nvidia-smi -q'
	 output

        Added a new cmdline option  '--columns'  or  '-col'  to  display  the
	 summary in multi-column format.

        Modified  the	'Memory-Usage',	'BAR1-Usage' headers in	the MIG	device
	 table to 'Shared Memory-Usage', 'Shared BAR1-Usage'  respectively  to
	 indicate  they	 are  shared among the MIG devices associated with the
	 same GPU instance.

        Updated GPU Fabric output from	'nvidia-smi -q'	output:

	  Added Incorrect Configuration and Summary fields to	Fabric	Health
	   output

        Added support for NVIDIA Jetson Thor platform

	  Note	 that  the  following  features	are currently not supported on
	   Jetson Thor:

	    Clock queries and commands

	    Power queries and commands

	    Thermal and temperature queries

	    Per-process utilization via 'nvidia-smi pmon'

	    SOC memory	utilization

        Added new Incorrect Configuration Strings to Fabric Health output

	  Incompatible	Gpu Firmware

	  Invalid Location

        Added new command line	options	'--get-hostname' and  '--set-hostname'
	 to get	and set	GPU hostnames, respectively.

        Added a new command to	read GPU PRM counters: 'nvidia-smi prm -c'

   Changes between nvidia-smi v575 Update and v570

        Added new --query-gpu option inforom.checksum_validation to check the
	 inforom      checksum	    validation	   (nvidia-smi	   --query-gpu
	 inforom.checksum_validation)

        Updated 'nvidia-smi -q' to print both 'Instantaneous Power Draw'  and
	 'Average  Power  Draw'	 in  all  cases	 where 'Power Draw' used to be
	 printed.

        Added support to nvidia-smi c2c -e to display C2C Link	Errors

        Added support to nvidia-smi c2c  -gLowPwrInfo	to  display  C2C  Link
	 Power state

         Added new fields for Clock Event Reason Counters which can be
	 queried with 'nvidia-smi -q' or with the 'nvidia-smi -q -d
	 PERFORMANCE' display flag.

        Added new query GPU options for Clock Event Reason Counters: 'nvidia-
	 smi							      --query-
	 gpu=clocks_event_reasons_counters.{sw_power_cap,sw_thermal_slowdown,sync_boost,hw_thermal_slowdown,hw_power_brake_slowdown}'

        Added	new  fields  for  MIG  timeslicing  which  can be queried with
	 'nvidia-smi -q'

        Added a new cmdline option '-smts' to 'nvidia-smi vgpu' to  set  vGPU
	 MIG timeslice mode

        Added	a  new	sub-option  '-gi' to 'nvidia-smi vgpu -c' to query the
	 currently creatable vGPU types	on the user provided GPU Instance

        Added a new  sub-option  '-gi'	 to  'nvidia-smi  vgpu	-q'  to	 query
	 detailed  information	of  the	currently active vGPU instances	on the
	 user provided GPU Instance

        Added a new sub-option	'-gi' to 'nvidia-smi vgpu -ss'	to  query  the
	 vGPU software scheduler state on the user provided GPU	Instance

        Added	a  new	sub-option '-gi' to 'nvidia-smi	vgpu -sl' to query the
	 vGPU software scheduler logs on the user provided GPU Instance

        Added a new cmdline option '-ghm' to 'nvidia-smi vgpu'	 to  get  vGPU
	 heterogeneous mode on the user	provided GPU Instance

        Added	a  new	sub-option  '-gi' to 'nvidia-smi vgpu -shm' to set the
	 vGPU heterogeneous mode on the	user provided GPU Instance

        Added new field for max instances  per	 GPU  Instance	which  can  be
	 queried with 'nvidia-smi vgpu -s -v'

        Added a new sub-option	'-gi' to 'nvidia-smi vgpu set-scheduler-state'
	 to  set  the  vGPU  software scheduler	state on the user provided GPU
	 Instance.

        Added a new sub-option	'-gi' to 'nvidia-smi  vgpu  -c	-v'  to	 query
	 detailed information of the creatable vGPU types on the user provided
	 GPU Instance

         Added a new cmdline option '--query-gpu-instance-vgpu-scheduler-logs'
	 to 'nvidia-smi	vgpu' to get the vGPU software scheduler logs  on  the
	 user provided GPU Instance in CSV format. See nvidia-smi vgpu --help-
	 gpu-instance-vgpu-query-scheduler-logs	for details.

   Changes between nvidia-smi v570 Update and v565

         Added new cmdline options '-sLWidth' and '-gLWidth' to 'nvidia-smi
	 nvlink'

         Added new ability to display NVLink sleep state with 'nvidia-smi
	 nvlink -s' for Blackwell and onward generations

        Added	new  query  GPU	options	for average/instant module power draw:
	 'nvidia-smi --query-gpu=module.power.draw.{average,instant}'

        Added new query GPU options for default/max/min module	power  limits:
	 'nvidia-smi						      --query-
	 gpu=module.power.{default_limit,max_limit,min_limit}'

        Added new query GPU options  for  module  power  limits:  'nvidia-smi
	 --query-gpu=module.power.limit'

        Added	new  query  GPU	 options  for  enforced	 module	 power limits:
	 'nvidia-smi --query-gpu=module.enforced.power.limit'

        Added new query GPU aliases for GPU Power options

        Added a new command to	get  confidential  compute  info:  'nvidia-smi
	 conf-compute -q'

        Added	new  Power Profiles section in nvidia-smi -q and corresponding
	 -d display flag POWER_PROFILES

        Added	new  Power  Profiles  option  'nvidia-smi  power-profiles'  to
	 get/set power profiles	related	information.

        Added the platform information	query to 'nvidia-smi -q'

        Added	the  platform  information  query  to  'nvidia-smi --query-gpu
	 platform'

        Added new Power Smoothing option 'nvidia-smi power-smoothing' to  set
	 power smoothing related values.

        Added	new Power Smoothing section in nvidia-smi -q and corresponding
	 -d display flag POWER_SMOOTHING

        Deprecated graphics voltage value from	Voltage	section	of  nvidia-smi
	 -q.  Voltage  now  always  displays as	'N/A' and will be removed in a
	 future	release.

        Added new topo	option nvidia-smi topo -nvme to	display	GPUs vs	 NVMes
	 connecting path.

        Changed  help	string	for the	command	'nvidia-smi topo -p2p -p' from
	 'prop'	to 'pcie' to better describe the p2p capability.

        Added new command 'nvidia-smi pci -gCnt' to query PCIe	RX/TX Bytes.

        Added EGM  capability	display	 under	new  Capabilities  section  in
	 nvidia-smi -q command.

         Added multiGpuMode display via 'nvidia-smi conf-compute
	 --get-multigpu-mode' or 'nvidia-smi conf-compute -mgm'

        GPU  Reset  Status in nvidia-smi -q has been deprecated. GPU Recovery
	 action	provides all the necessary actions

         'nvidia-smi -q' will now display DRAM encryption state

        Added 'nvidia-smi -den/--dram-encryption 0/1' to disable/enable DRAM
	 encryption

        Added new statuses to NVIDIA fabric health. nvidia-smi -q will
	 display 3 new fields in Fabric Health: Route Recovery in progress,
	 Route Unhealthy and Access Timeout Recovery

        In nvidia-smi -q Platform Info, RACK GUID has been changed to RACK
	 Serial Number

        Added a new gpu_recovery_action option to nvidia-smi --query-gpu

        Added new counters for NVLink5 in 'nvidia-smi nvlink -e':

	  Effective Errors - the sum of the number of errors in each NVLink
	   packet

	  Effective BER - the effective BER for the effective errors

	  FEC Errors 0 to 15 - the count of symbol errors that are corrected

        Added a new output field called 'GPU Fabric GUID' to the  'nvidia-smi
	 -q' output

        Added a new property called 'platform.gpu_fabric_guid'	to 'nvidia-smi
	 --query-gpu'

        Updated 'nvidia-smi nvlink -gLowPwrInfo' command to display the Power
	 Threshold Range and Units

   Changes between nvidia-smi v560 Update and v565

        Added the reporting of	vGPU homogeneous mode to 'nvidia-smi -q'.

        Added	the  reporting	of  homogeneous	vGPU placements	to 'nvidia-smi
	 vgpu -s -v', complementing the	existing  reporting  of	 heterogeneous
	 vGPU placements.

   Changes between nvidia-smi v555 Update and v560

        Added 'Atomic Caps Inbound' in	the PCI	section	of 'nvidia-smi -q'.

        Updated  ECC  and  row	 remapper output for options '--query-gpu' and
	 '--query-remapped-rows'.

        Added support for events including ECC	single-bit error  storm,  DRAM
	 retirement,  DRAM  retirement	failure, contained/nonfatal poison and
	 uncontained/fatal poison.

        Added support in 'nvidia-smi nvlink  -e'  to  display	NVLink5	 error
	 counters

   Changes between nvidia-smi v545 Update and v550

        Added	a  new	cmdline	 option	 to  print  out	 version  information:
	 --version

        Added ability to print out only the GSP firmware version with
	 'nvidia-smi -q -d'. Example commandline: nvidia-smi -q -d
	 GSP_FIRMWARE_VERSION

        Added support to query	pci.baseClass and pci.subClass.	See nvidia-smi
	 --help-query-gpu for details.

        Added PCI base	and sub	classcodes to 'nvidia-smi -q' output.

        Added	new  cmdline option '--format' to 'nvidia-smi dmon' to support
	 'csv',	'nounit' and 'noheader'	format specifiers
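
        Output produced with the 'csv' and 'noheader' specifiers is
	 straightforward to consume from scripts. A minimal Python sketch,
	 using a hard-coded sample line in place of live nvidia-smi output
	 (the device name and values here are purely illustrative):

```python
import csv
import io

# Illustrative sample only; real data would come from a command such as
# 'nvidia-smi --query-gpu=name,power.draw --format=csv,noheader,nounit',
# and the values vary per system.
sample_output = "NVIDIA H100, 351.72\nNVIDIA H100, 348.10\n"

# skipinitialspace drops the space nvidia-smi prints after each comma.
reader = csv.reader(io.StringIO(sample_output), skipinitialspace=True)
rows = [(name, float(power_w)) for name, power_w in reader]
for name, power_w in rows:
    print(f"{name}: {power_w:.1f} W")
```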

        Added a new cmdline option '--gpm-options' to	'nvidia-smi  dmon'  to
	 support GPM metrics report in MIG mode

        Added the NVJPG and NVOFA utilization report to 'nvidia-smi pmon'

        Added	the  NVJPG  and	 NVOFA utilization report to 'nvidia-smi -q -d
	 utilization'

        Added the NVJPG and NVOFA utilization report to 'nvidia-smi vgpu  -q'
	 to report NVJPG/NVOFA utilization on active vgpus

        Added	the NVJPG and NVOFA utilization	report to 'nvidia-smi vgpu -u'
	 to periodically report	NVJPG/NVOFA utilization	on active vgpus

        Added the NVJPG and NVOFA utilization report to 'nvidia-smi vgpu -p'
	 to periodically report NVJPG/NVOFA utilization on running processes
	 of active vgpus

        Added	a  new	cmdline	option '-shm' to 'nvidia-smi vgpu' to set vGPU
	 heterogeneous mode

        Added the reporting of	vGPU heterogeneous mode	in 'nvidia-smi -q'

        Added ability for 'nvidia-smi mig -lgip' and 'nvidia-smi mig -lgipp'
	 to work without requiring MIG to be enabled

        Added	support	 to  query confidential	compute	key rotation threshold
	 info.

        Added support to set confidential compute key rotation	 max  attacker
	 advantage.

        Added	a  new cmdline option '--sparse-operation-mode'	to 'nvidia-smi
	 clocks' to set	the sparse operation mode

        Added the reporting of	sparse operation mode  to  'nvidia-smi	-q  -d
	 PERFORMANCE'

   Changes between nvidia-smi v535 Update and v545

        Added support to query	the timestamp and duration of the latest flush
	 of the	BBX object to the inforom storage.

        Added support for reporting out GPU Memory power usage.

   Changes between nvidia-smi v530 Update and v535

        Updated  the  SRAM error status reported in the ECC query 'nvidia-smi
	 -q -d ECC'

        Added support to query	and report the GPU JPEG	and OFA	(Optical  Flow
	 Accelerator) utilizations.

        Removed deprecated 'stats' command.

        Added support to set the vGPU software	scheduler state.

        Renamed counter collection unit to gpu	performance monitoring.

        Added new C2C Mode reporting to device	query.

        Added back clock_throttle_reasons to --query-gpu to preserve
	 backwards compatibility

        Added support to get confidential compute CPU and GPU capabilities.

        Added	support	to set confidential compute unprotected	memory and GPU
	 ready state.

        Added support to get confidential compute memory info and  GPU	 ready
	 state.

        Added	 support   to  display	confidential  compute  devtools	 mode,
	 environment and feature status.

   Changes between nvidia-smi v525 Update and v530

        Added support to query	power.draw.average and power.draw.instant. See
	 nvidia-smi --help-query-gpu for details.

        Added support to get the vGPU software	scheduler state.

        Added support to get the vGPU software	scheduler logs.

        Added support to get the vGPU software	scheduler capabilities.

        Renamed Clock Throttle	Reasons	to Clock Event Reasons.

   Changes between nvidia-smi v520 Update and v525

        Added support to query	and set	counter	collection unit	stream state.

   Changes between nvidia-smi v470 Update and v510

        Added new 'Reserved' memory reporting to the FB memory output

   Changes between nvidia-smi v465 Update and v470

        Added support to query	power hint

   Changes between nvidia-smi v460 Update and v465

        Removed support for -acp,--application-clock-permissions option

   Changes between nvidia-smi v450 Update and v460

        Added option to specify placement when creating a MIG GPU instance.

        Added support to query	and control boost slider

   Changes between nvidia-smi v445 Update and v450

        Added --lock-memory-clock and --reset-memory-clock commands to lock
	 the Memory clock to the closest min/max value provided and to reset
	 the Memory clock

        Allow fan speeds greater than 100% to be reported

        Added topo support to display NUMA node affinity for GPU devices

        Added support to create MIG instances using profile names

        Added support to create the default compute instance while creating a
	 GPU instance

        Added support to query	and disable MIG	mode on	Windows

        Removed support of GPU	reset(-r) command on MIG enabled vGPU guests

   Changes between nvidia-smi v418 Update and v445

        Added support for Multi Instance GPU (MIG)

        Added	support	to individually	reset NVLink-capable GPUs based	on the
	 NVIDIA	Ampere architecture

   Changes between nvidia-smi v361 Update and v418

        Support for Volta and Turing architectures,  bug  fixes,  performance
	 improvements, and new features

   Changes between nvidia-smi v352 Update and v361

        Added	nvlink	support	 to  expose the	publicly available NVLINK NVML
	 APIs

        Added clocks sub-command with synchronized boost support

        Updated nvidia-smi stats to report GPU	temperature metric

        Updated nvidia-smi dmon to support PCIe throughput

        Updated nvidia-smi daemon/replay to support PCIe throughput

        Updated nvidia-smi dmon, daemon and replay  to	 support  PCIe	Replay
	 Errors

        Added GPU part	numbers	in nvidia-smi -q

        Removed support for exclusive thread compute mode

        Added	Video  (encoder/decode)	 clocks	 to  the Clocks	and Max	Clocks
	 display of nvidia-smi -q

        Added memory temperature output to nvidia-smi dmon

        Added --lock-gpu-clock and --reset-gpu-clock commands to lock the
	 GPU clock to the closest min/max value provided and to reset it

        Added --cuda-clocks to	override or restore default CUDA clocks

   Changes between nvidia-smi v346 Update and v352

        Added topo support to display affinities per GPU

        Added topo support to display neighboring GPUs	for a given level

        Added topo support to show pathway between two	given GPUs

        Added	'nvidia-smi  pmon'  command-line  for  process	monitoring  in
	 scrolling format

        Added '--debug' option	to produce an encrypted	debug log for  use  in
	 submission of bugs back to NVIDIA

        Fixed reporting of Used/Free memory under Windows WDDM	mode

        The accounting stats are updated to include both running and
	 terminated processes. The execution time of a running process is
	 reported as 0 and updated to the actual value when the process
	 terminates.
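
        The accounting behavior described above can be modeled with a short
	 sketch; the class and method names below are hypothetical, not part
	 of NVML or nvidia-smi:

```python
class AccountingStats:
    """Hypothetical model of the documented behavior: a running process
    reports an execution time of 0; the real value is recorded only once
    the process terminates."""

    def __init__(self):
        self._exec_ms = {}  # pid -> execution time in ms (0 while running)

    def start(self, pid):
        self._exec_ms[pid] = 0  # running: reported as 0

    def terminate(self, pid, exec_time_ms):
        self._exec_ms[pid] = exec_time_ms  # updated to the actual value

    def execution_time(self, pid):
        return self._exec_ms[pid]


stats = AccountingStats()
stats.start(4242)
running_time = stats.execution_time(4242)    # 0 while still running
stats.terminate(4242, 1337)
final_time = stats.execution_time(4242)      # actual value after exit
```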

   Changes between nvidia-smi v340 Update and v346

        Added reporting of PCIe replay	counters

        Added support for reporting Graphics processes	via nvidia-smi

        Added reporting of PCIe utilization

        Added dmon command-line for device monitoring in scrolling format

        Added daemon command-line to run in background	and monitor devices as
	 a daemon process. Generates dated log files at	/var/log/nvstats/

        Added replay command-line to replay/extract the stat files  generated
	 by the	daemon tool

   Changes between nvidia-smi v331 Update and v340

        Added reporting of temperature	threshold information.

        Added reporting of brand information (e.g. Tesla, Quadro, etc.)

        Added support for K40d	and K80.

        Added reporting of max, min and avg for samples (power, utilization,
	 clock changes). Example commandline: nvidia-smi -q -d
	 power,utilization,clock

        Added nvidia-smi stats	interface to collect statistics	such as	power,
	 utilization, clock changes, xid events	and perf capping counters with
	 a  notion  of	time  attached	to  each  sample. Example commandline:
	 nvidia-smi stats

        Added support for collectively reporting metrics on more than one
	 GPU. Used with a comma-separated list and the '-i' option. Example:
	 nvidia-smi -i 0,1,2

        Added support for displaying the GPU encoder and decoder utilizations

        Added	nvidia-smi   topo   interface	to   display   the   GPUDirect
	 communication matrix (EXPERIMENTAL)

        Added support for displaying the GPU board ID and whether or not it
	 is a multiGPU board

        Removed user-defined throttle reason from XML output

   Changes between nvidia-smi v5.319 Update and	v331

        Added reporting of minor number.

        Added reporting BAR1 memory size.

        Added reporting of bridge chip	firmware.

   Changes between nvidia-smi v4.319 Production	and v4.319 Update

        Added	 new   --applications-clocks-permission	  switch   to	change
	 permission  requirements  for	setting	 and  resetting	  applications
	 clocks.

   Changes between nvidia-smi v4.304 and v4.319	Production

        Added reporting of Display Active state and updated documentation to
	 clarify how it differs from Display Mode

        For consistency on multi-GPU boards  nvidia-smi  -L  always  displays
	 UUID instead of serial	number

        Added	machine	 readable  selective  reporting.  See  SELECTIVE QUERY
	 OPTIONS section of nvidia-smi -h

        Added queries for  page  retirement  information.  See	 --help-query-
	 retired-pages and -d PAGE_RETIREMENT

        Renamed  Clock	 Throttle  Reason  User	Defined	Clocks to Applications
	 Clocks	Setting

        On error, return codes have distinct non-zero values for each error
	 class. See RETURN VALUE section

        nvidia-smi -i can now query information from a healthy GPU when
	 there is a problem with another GPU in the system

        All messages that point to a problem with a GPU print the PCI bus ID
	 of the GPU at fault

        New flag --loop-ms for	querying information at	higher rates than once
	 a second (can have negative impact on system performance)

        Added queries for accounting processes. See
	 --help-query-accounted-apps and -d ACCOUNTING

        Added the enforced power limit	to the query output

   Changes between nvidia-smi v4.304 RC	and v4.304 Production

        Added reporting of GPU	Operation Mode (GOM)

        Added new --gom switch	to set GPU Operation Mode

   Changes between nvidia-smi v3.295 and v4.304	RC

        Reformatted  non-verbose output due to	user feedback. Removed pending
	 information from table.

        Print out helpful message  if	initialization	fails  due  to	kernel
	 module	not receiving interrupts

        Better	 error handling	when NVML shared library is not	present	in the
	 system

        Added new --applications-clocks switch

        Added new filter to --display switch. Run with	-d SUPPORTED_CLOCKS to
	 list possible clocks on a GPU

        When reporting	free memory, calculate it from the rounded  total  and
	 used memory so	that values add	up

        Added	reporting  of  power  management limit constraints and default
	 limit

        Added new --power-limit switch

        Added reporting of texture memory ECC errors

        Added reporting of Clock Throttle Reasons

   Changes between nvidia-smi v2.285 and v3.295

        Clearer error reporting for running commands (like  changing  compute
	 mode)

        When running commands on multiple GPUs	at once	N/A errors are treated
	 as warnings.

        nvidia-smi -i now also	supports UUID

        UUID  format  changed	to  match  UUID	 standard  and	will  report a
	 different value.

   Changes between nvidia-smi v2.0 and v2.285

        Report	VBIOS version.

        Added -d/--display flag to filter parts of data

        Added reporting of PCI	Sub System ID

        Updated docs to indicate we support M2075 and C2075

        Report	HIC HWBC firmware version with -u switch

        Report	max(P0)	clocks next to current clocks

        Added --dtd flag to print the device or unit DTD

        Added message when NVIDIA driver is not running

        Added reporting of PCIe link generation (max and current),  and  link
	 width (max and	current).

        Getting pending driver model works for non-admin users

        Added support for running nvidia-smi on Windows Guest accounts

        Running nvidia-smi without the -q command will output a non-verbose
	 version of -q instead of help

        Fixed parsing of -l/--loop= argument (default value, 0, too-big
	 values)

        Changed format	of pciBusId (to	XXXX:XX:XX.X - this change was visible
	 in 280)

        Parsing  of  busId  for  -i command is	less restrictive. You can pass
	 0:2:0.0 or 0000:02:00 and other variations
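
        A sketch of what such lenient parsing might look like, normalizing
	 abbreviated forms like '0:2:0.0' or '0000:02:00' to the full
	 XXXX:XX:XX.X format mentioned above. This is an illustration, not
	 nvidia-smi's actual parser:

```python
def normalize_pci_bus_id(bus_id: str) -> str:
    """Normalize a PCI bus ID to XXXX:XX:XX.X (domain:bus:device.function).

    Accepts variations such as '0:2:0.0' or '0000:02:00'; the domain and
    function default to 0 when omitted. Illustrative only.
    """
    head, _, function = bus_id.partition(".")
    parts = head.split(":")
    if len(parts) == 2:          # domain omitted, e.g. '02:00'
        parts = ["0"] + parts
    domain, bus, device = parts
    function = function or "0"   # function omitted, e.g. '0000:02:00'
    return (f"{int(domain, 16):04x}:{int(bus, 16):02x}:"
            f"{int(device, 16):02x}.{int(function, 16):x}")


print(normalize_pci_bus_id("0:2:0.0"))     # 0000:02:00.0
print(normalize_pci_bus_id("0000:02:00"))  # 0000:02:00.0
```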

        Changed versioning scheme to also include 'driver version'

        XML format always conforms to DTD, even when error conditions occur

        Added support for single- and double-bit ECC events and XID errors
	 (enabled by default with -l flag; disabled for -x flag)

        Added device reset flags: -r, --gpu-reset

        Added listing of compute running processes

        Renamed  power	 state to performance state. Deprecated	support	exists
	 in XML	output only.

        Updated DTD version number to 2.0 to match the	updated	XML output

SEE ALSO
       On     Linux,	 the	 driver	    README     is     installed	    as
       /usr/share/doc/NVIDIA_GLX-1.0/README.txt

AUTHOR
       NVIDIA Corporation

COPYRIGHT
       Copyright 2011-2025 NVIDIA Corporation

Version	nvidia-smi 590.48	Mon Dec	8 2025			 nvidia-smi(1)
