xen-tscmode(7)			      Xen			xen-tscmode(7)

NAME
       xen-tscmode - Xen TSC (time stamp counter) and timekeeping discussion

OVERVIEW
       As of Xen 4.0, a new config option called tsc_mode may be specified for
       each domain.  The default for tsc_mode handles the vast majority of
       hardware and software environments.  This document is intended for Xen
       users and administrators who may need to select a non-default
       tsc_mode.

       Proper selection of tsc_mode depends on an understanding not only of
       the guest operating system (OS), but also of every application that
       will ever run on this guest OS.  This is because tsc_mode applies
       equally to both the OS and ALL apps that are running on this domain,
       now or in the future.

       Key questions to	be answered for	the OS and/or each application are:

          Does	the OS/app use the rdtsc instruction at	all?  (We will explain
	   below how to	determine this.)

          At  what  frequency is the rdtsc instruction	executed by either the
	   OS or any running apps?  If the  sum	 exceeds  about	 10,000	 rdtsc
	   instructions	 per  second  per processor, we	call this a "high-TSC-
	   frequency"  OS/app/environment.   (This  is	relatively  rare,  and
	   developers of OS's and apps that are	high-TSC-frequency are usually
	   aware of it.)

          If  the  OS/app does	use rdtsc, will	it behave incorrectly if "time
	   goes	backwards" or if the frequency of the  TSC  suddenly  changes?
	   If  so,  we	call this a "TSC-sensitive" app	or OS; otherwise it is
	   "TSC-resilient".

       This last is the	US$64,000 question as it may be	 very  difficult  (or,
       for  legacy  apps,  even	 impossible)  to  predict all possible failure
       cases.  As a result, unless proven otherwise, any app that  uses	 rdtsc
       must  be	 assumed  to be	TSC-sensitive and, as we will see, this	is the
       default starting	in Xen 4.0.

       Xen's tsc_mode parameter determines the circumstances under which the
       rdtsc family of instructions is executed "natively" vs emulated.
       Roughly speaking, native means rdtsc is fast but TSC-sensitive apps
       may, under unpredictable circumstances, run incorrectly; emulated means
       there is some performance degradation (unobservable in most cases), but
       TSC-sensitive apps will always run correctly.  Prior to Xen 4.0, all
       rdtsc instructions were native: "fast but potentially incorrect".
       Starting with Xen 4.0, the default is that all rdtsc instructions are
       "correct but potentially slow".  The tsc_mode parameter provides an
       intelligent default but allows system administrators to choose how
       rdtsc instructions are executed on a per-domain basis.

       The non-default choices for tsc_mode are:

          tsc_mode=1 (always emulate).

	   All	rdtsc  instructions are	emulated; this is the best choice when
	   TSC-sensitive apps are running and it is  necessary	to  understand
	   worst-case	performance   degradation   for	 a  specific  hardware
	   environment.

          tsc_mode=2 (never emulate).

	   This	is the same as prior to	Xen 4.0	and is the best	choice	if  it
	   is  certain	that all apps running in this VM are TSC-resilient and
	   highest performance is required.

          tsc_mode=3 (PVRDTSCP).

	   This	mode has been removed.

       If tsc_mode is left  unspecified	 (or  set  to  tsc_mode=0),  a	hybrid
       algorithm  is  utilized	to ensure correctness while providing the best
       performance possible given:

          the requirement of correctness,

          the underlying hardware, and

          whether or not the VM has been saved/restored/migrated

       To understand this in more detail, the rest of this  document  must  be
       read.

DETERMINING RDTSC FREQUENCY
       To  determine the frequency of rdtsc instructions that are emulated, an
       "xl" command can	be used	by a privileged	user of	domain0.  The command:

	   # xl	debug-key s; xl	dmesg |	tail

       provides	information about TSC usage in each domain where TSC emulation
       is currently enabled.

TSC HISTORY
       To understand tsc_mode completely, some background on TSC is required:

       The x86 "timestamp counter", or TSC,  is	 a  64-bit  register  on  each
       processor  that increases monotonically.	 Historically, TSC incremented
       every processor cycle, but on recent  processors,  it  increases	 at  a
       constant	 rate even if the processor changes frequency (for example, to
       reduce processor	power usage).  TSC is known by x86 programmers as  the
       fastest,	 highest-precision measurement of the passage of time so it is
       often used as a foundation for performance monitoring.  And since it is
       guaranteed to be monotonically increasing and, at 64 bits, is
       guaranteed not to wrap around within 10 years, it is sometimes used as
       a random number or a unique sequence identifier, such as to stamp
       transactions so they can be replayed in a specific order.
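
       The wraparound claim can be checked with simple arithmetic.  The
       following is a minimal sketch; the 3GHz tick rate is an assumed
       example value and the helper name years_until_wrap is hypothetical.

```python
# Estimate how long a fixed-width counter takes to wrap at a given tick rate.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_wrap(tick_hz: float, bits: int = 64) -> float:
    """Years before a counter of `bits` width, starting at zero, wraps."""
    return (2 ** bits) / tick_hz / SECONDS_PER_YEAR


if __name__ == "__main__":
    # 3 GHz is an assumed example rate, not a figure from this document.
    print(f"{years_until_wrap(3e9):.0f} years")
```

       At that assumed rate the counter takes on the order of two centuries
       to wrap, comfortably beyond the 10 years stated above.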

       On  most	 older	SMP  and  early	 multi-core  machines,	TSC  was   not
       synchronized  between  processors.  Thus	if an application were to read
       the TSC on  one	processor,  then  was  moved  by  the  OS  to  another
       processor,  then	 read  TSC  again,  it	might  appear  that "time went
       backwards".   This  loss	 of  monotonicity  resulted  in	 many  obscure
       application   bugs   when   TSC-sensitive   apps	 were  ported  from  a
       uniprocessor to an SMP environment; as a	result,	many  applications  --
       especially  in the Windows world	-- removed their dependency on TSC and
       replaced	their timestamp	needs with OS-specific functions, losing  both
       performance  and	 precision.  On	some more recent generations of	multi-
       core machines, especially multi-socket multi-core machines, the TSC was
       synchronized but	if one	processor  were	 to  enter  certain  low-power
       states,	its TSC	would stop, destroying the synchrony and again causing
       obscure	bugs.	This  reinforced  decisions  to	 avoid	use   of   TSC
       altogether.  On the most recent generations of multi-core machines,
       however, the hardware keeps TSC synchronized across all processors in
       all power states, even on multi-socket machines, and provides a flag
       that indicates that TSC is synchronized and "invariant".  Thus TSC is
       once again useful for applications, and even newer operating systems
       are using and depending upon TSC for critical timekeeping tasks when
       running on these recent machines.

       We will refer to	hardware that ensures TSC  is  both  synchronized  and
       invariant  as  "TSC-safe"  and any hardware on which TSC	is not (or may
       not remain) synchronized	as "TSC-unsafe".

       As a result of TSC's sordid history, two	classes	 of  applications  use
       TSC:  old  applications	designed  for  single processors, and the most
       recent  enterprise  applications	 which	require	 high-frequency	 high-
       precision timestamping.

       We will refer to apps that might break if running on a TSC-unsafe
       machine as "TSC-sensitive"; we will refer to apps that don't use TSC,
       or that use TSC in a way for which monotonicity and frequency
       invariance are unimportant, as "TSC-resilient".

       The emergence of	virtualization once again  complicates	the  usage  of
       TSC.   When  features  such  as	save/restore  or  live	migration  are
       employed, a guest OS and	all its	currently running applications may  be
       invisibly transported to	an entirely different physical machine.	 While
       TSC  may	 be  "safe"  on	 one  machine, it is essentially impossible to
       precisely synchronize TSC across	a  data	 center	 or  even  a  pool  of
       machines.  As a result, when run	in a virtualized environment, rare and
       obscure	"time  going  backwards"  problems  might once again occur for
       those TSC-sensitive applications.  Worse, if a guest OS moves from, for
       example,	a 3GHz machine to a 1.5GHz machine, attempts by	an  OS/app  to
       measure	time  intervals	 with TSC may without notice be	incorrect by a
       factor of two.
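
       The factor-of-two error can be illustrated with a short calculation.
       This is only a sketch of the arithmetic; the function name
       apparent_interval is hypothetical, and it assumes an app that
       calibrated its TSC frequency once, before the guest moved.

```python
def apparent_interval(real_seconds: float,
                      actual_tsc_hz: float,
                      assumed_tsc_hz: float) -> float:
    """Interval an app computes when it divides a TSC delta by a stale rate.

    actual_tsc_hz  -- rate at which TSC really ticks on the current machine
    assumed_tsc_hz -- rate the app measured on the machine it started on
    """
    ticks = real_seconds * actual_tsc_hz
    return ticks / assumed_tsc_hz


if __name__ == "__main__":
    # Guest calibrated on a 3GHz machine, now running on a 1.5GHz machine.
    print(apparent_interval(1.0, 1.5e9, 3.0e9))  # prints 0.5
```

       One real second is reported as half a second, with no indication to
       the app that anything changed.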

       The rdtsc (read timestamp counter) instruction is used to read the  TSC
       register.   The	rdtscp	instruction  is	 a  variant of rdtsc on	recent
       processors.  We	refer  to  these  together  as	the  rdtsc  family  of
       instructions, or just "rdtsc".  Instructions in the rdtsc family are
       non-privileged, but privileged software may set a processor control bit
       (for example, the TSD bit in control register CR4) to cause rdtsc
       family instructions to trap.  This trap can be detected by Xen, which
       can then transparently "emulate" the results of the rdtsc instruction
       and return control to the code following the rdtsc instruction.

       To provide a "safe" TSC,	i.e. to	ensure both  TSC  monotonicity	and  a
       fixed  rate,  Xen  provides  rdtsc emulation whenever necessary or when
       explicitly specified by a per-VM	configuration option.	TSC  emulation
       is  relatively  slow  --	 roughly  15-20	 times	slower	than the rdtsc
       instruction when	executed natively.  However,  except  when  an	OS  or
       application  uses  the rdtsc instruction	at a high frequency (e.g. more
       than about 10,000 times per second  per	processor),  this  performance
       degradation  is	not  noticeable	 (i.e.	<0.3%).	 And, TSC emulation is
       nearly  always  faster  than  OS-provided  alternatives	(e.g.  Linux's
       gettimeofday).  For environments where it is certain that all apps are
       TSC-resilient (i.e. "TSC-safeness" is not necessary) and highest
       performance is a requirement, TSC emulation may be entirely disabled
       (tsc_mode==2).
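
       The relationship between rdtsc frequency and overhead can be sketched
       as follows.  The function name emulation_overhead and the 300ns extra
       cost per emulated rdtsc are assumptions for illustration: 300ns is
       simply the upper bound implied by the figures above (0.3% of one
       second spread over 10,000 instructions), not a measured value.

```python
def emulation_overhead(rdtsc_per_sec: float, extra_ns_per_rdtsc: float) -> float:
    """Fraction of one processor's time spent on extra emulation cost.

    rdtsc_per_sec      -- rdtsc instructions executed per second per processor
    extra_ns_per_rdtsc -- assumed extra cost of one emulated rdtsc, in ns
    """
    return rdtsc_per_sec * extra_ns_per_rdtsc * 1e-9


if __name__ == "__main__":
    # Threshold rate from the text, with the assumed 300ns cost.
    print(f"{emulation_overhead(10_000, 300):.2%}")
    # A hypothetical high-TSC-frequency app at 1,000,000 rdtsc per second.
    print(f"{emulation_overhead(1_000_000, 300):.2%}")
```

       Because the overhead grows linearly with the rdtsc rate, only
       high-TSC-frequency workloads are likely to notice emulation.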

       The default mode (tsc_mode==0) checks TSC-safeness of the underlying
       hardware on which the virtual machine is launched.  If it is TSC-safe,
       rdtsc will execute at hardware speed; if it is not, rdtsc will be
       emulated.  Once a virtual machine is saved/restored or migrated,
       however, there are two possibilities: TSC remains native if the source
       physical machine and target physical machine have the same TSC
       frequency (or, for HVM/PVH guests, if TSC scaling support is
       available); otherwise TSC is emulated.  Note that, though emulated, the
       "apparent" TSC frequency will be the TSC frequency of the initial
       physical machine, even after migration.
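
       The decision described in this document can be summarized as a small
       model.  This is a simplified sketch of the prose only, not Xen's
       actual implementation; the function name rdtsc_emulated and all of
       its parameters are hypothetical.

```python
def rdtsc_emulated(tsc_mode: int,
                   host_tsc_safe: bool,
                   migrated: bool = False,
                   same_tsc_freq: bool = True,
                   hvm_or_pvh: bool = False,
                   tsc_scaling: bool = False) -> bool:
    """Return True if rdtsc would be emulated under the rules in the text.

    A simplified model of this document's description, not Xen source code.
    """
    if tsc_mode == 1:          # always emulate
        return True
    if tsc_mode == 2:          # never emulate
        return False
    if tsc_mode != 0:
        raise ValueError("only tsc_mode 0, 1 and 2 are modelled")

    # Default mode: emulate when the launch hardware is TSC-unsafe.
    if not host_tsc_safe:
        return True
    if not migrated:
        return False
    # After save/restore or migration, stay native only if the frequency
    # matches or (for HVM/PVH guests) hardware TSC scaling is available.
    if same_tsc_freq or (hvm_or_pvh and tsc_scaling):
        return False
    return True


if __name__ == "__main__":
    print(rdtsc_emulated(0, host_tsc_safe=True))
    print(rdtsc_emulated(0, host_tsc_safe=True, migrated=True,
                         same_tsc_freq=False))
```

       The first call models a fresh launch on TSC-safe hardware (native);
       the second models migration to a host with a different TSC frequency
       and no scaling support (emulated).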

       Finally,	 tsc_mode==1  always  enables TSC emulation, regardless	of the
       underlying physical hardware. The "apparent" TSC	frequency will be  the
       TSC  frequency  of  the initial physical	machine, even after migration.
       This mode is useful to measure any performance degradation that might
       be encountered by a tsc_mode==0 domain after migration occurs.

       Note that while Xen ensures that	 an  emulated  TSC  is	"safe"	across
       migration,  it  does  not  ensure that it continues to tick at the same
       rate during the actual migration.  As an	oversimplified example,	if TSC
       is ticking once per second in a guest, and the guest is saved when  the
       TSC  is 1000, then restored 30 seconds later, TSC is only guaranteed to
       be greater than or equal	to 1001, not precisely 1030.  This has some OS
       implications as will be seen in the next	section.

TSC INVARIANT BIT and NO_MIGRATE
       Related to TSC emulation, the "TSC Invariant" bit is architecturally
       defined in a cpuid bit on the most recent x86 processors.  If set, TSC
       invariance ensures that the TSC is "safe"; that is, it will increment
       at a constant rate regardless of power events, will be synchronized
       across all processors, and was properly initialized to zero on all
       processors at boot-time by system hardware/BIOS.  As long as system
       software never writes to TSC, TSC will be safe and continuously
       incremented at a fixed rate and thus can be used as a system
       "clocksource".

       This  bit is used by some OS's, and specifically	by Linux starting with
       version 2.6.30(?),  to  select  TSC  as	a  system  clocksource.	  Once
       selected,  TSC  remains	the  Linux  system clocksource unless manually
       overridden.  In a virtualized environment, since	it is not possible  to
       synchronize  TSC	 across	 all  the machines in a	pool or	data center, a
       migration may "break" TSC as a usable clocksource; while	time will  not
       go  backwards,  it  may	not  track wallclock time well enough to avoid
       certain time-sensitive consequences.  As	a result, Xen can only	expose
       the  TSC	 Invariant  bit	to a guest OS if it is certain that the	domain
       will never migrate.  As of Xen 4.0, the "no_migrate=1" VM configuration
       option may  be  specified  to  disable  migration.   If	no_migrate  is
       selected	 and  the  VM  is  running  on	a  physical  machine with "TSC
       Invariant",  Linux  2.6.30+  will  safely  use  TSC   as	  the	system
       clocksource.   But,  attempts  to  migrate or, once saved, restore this
       domain will fail.

       There is	another	cpuid-related complication: The	x86 cpuid  instruction
       is  non-privileged.   HVM  domains  are	configured to always trap this
       instruction to Xen, where Xen can "filter" the result.  In a PV OS, all
       cpuid instructions have been replaced by	a  paravirtualized  equivalent
       of the cpuid instruction	("pvcpuid") and	also trap to Xen.  But apps in
       a  PV guest that	use a cpuid instruction	execute	it directly, without a
       trap to Xen.  As	a result, an app may directly examine the physical TSC
       Invariant cpuid bit and make decisions based on that bit.

HARDWARE TSC SCALING
       Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read by
       guest rdtsc/rdtscp to increase at a frequency different from the host
       TSC frequency.

       If an HVM container in default TSC mode (tsc_mode=0) is created on a
       host that provides constant TSC, its guest TSC frequency will be the
       same as the host's.  If it is later migrated to another host that
       provides constant TSC and supports Intel VMX TSC scaling/AMD SVM TSC
       ratio, its guest TSC frequency will be the same before and after
       migration.

       For the above HVM container in default TSC mode (tsc_mode=0), if the
       above hosts support rdtscp, both guest rdtsc and rdtscp instructions
       will be executed natively before and after migration.

AUTHORS
       Dan Magenheimer <dan.magenheimer@oracle.com>

4.19.2-pre			  2025-02-17			xen-tscmode(7)
