VOTEQUORUM(5)	  Corosync Cluster Engine Programmer's Manual	 VOTEQUORUM(5)

NAME
       votequorum - Votequorum Configuration Overview

OVERVIEW
       The votequorum service is part of the corosync project. This service
       can be optionally loaded into the nodes of a corosync cluster to avoid
       split-brain situations. It does this by assigning a number of votes to
       each system in the cluster and ensuring that cluster operations are
       only allowed to proceed when a majority of the votes is present. The
       service must be loaded into all nodes or none. If it is loaded into
       only a subset of the cluster nodes the results will be unpredictable.

       The following corosync.conf extract will enable the votequorum service
       within corosync:

       quorum {
	   provider: corosync_votequorum
       }

       votequorum reads its configuration from corosync.conf. Some values can
       be changed at runtime, others are only read at corosync startup. It is
       very important that those values are consistent across all the nodes
       participating in the cluster or votequorum behavior will be
       unpredictable.

       votequorum requires an expected_votes value to function; this can be
       provided in two ways. The number of expected votes is calculated
       automatically when the nodelist { } section is present in
       corosync.conf, or expected_votes can be specified in the quorum { }
       section. If neither is present, votequorum is disabled. If both are
       present at the same time, the quorum.expected_votes value overrides
       the one calculated from the nodelist (an example combining both is
       shown below).

       Example (no nodelist) of an 8 node cluster (each node has 1 vote):

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
       }

       Example (with nodelist) of a 3 node cluster (each node has 1 vote):

       quorum {
	   provider: corosync_votequorum
       }

       nodelist	{
	   node	{
	       ring0_addr: 192.168.1.1
	   }
	   node	{
	       ring0_addr: 192.168.1.2
	   }
	   node	{
	       ring0_addr: 192.168.1.3
	   }
       }
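
       Example (nodelist plus explicit expected_votes) illustrating the
       override described above; a sketch in which three nodes are listed but
       the explicit quorum.expected_votes value of 5 is the one that takes
       effect (the addresses are placeholders):

       quorum {
           provider: corosync_votequorum
           expected_votes: 5
       }

       nodelist {
           node {
               ring0_addr: 192.168.1.1
           }
           node {
               ring0_addr: 192.168.1.2
           }
           node {
               ring0_addr: 192.168.1.3
           }
       }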

SPECIAL FEATURES
       two_node: 1

       Enables two node cluster operations (default: 0).

       The "two node cluster" is a use case that requires special
       consideration. With a standard two node cluster, where each node has a
       single vote, there are 2 votes in the cluster. Using the simple
       majority calculation (50% of the votes + 1) to calculate quorum, the
       quorum would be 2. This means that both nodes would always have to be
       alive for the cluster to be quorate and operate.

       When two_node: 1 is enabled, quorum is set artificially to 1.

       Example configuration 1:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 2
	   two_node: 1
       }

       Example configuration 2:

       quorum {
	   provider: corosync_votequorum
	   two_node: 1
       }

       nodelist	{
	   node	{
	       ring0_addr: 192.168.1.1
	   }
	   node	{
	       ring0_addr: 192.168.1.2
	   }
       }

       NOTES: enabling two_node: 1 automatically enables wait_for_all. It is
       still possible to override wait_for_all by explicitly setting it to 0.
       If more than 2 nodes join the cluster, the two_node option is
       automatically disabled.
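
       A sketch of that override (note that with quorum artificially set to 1
       and wait_for_all disabled, a freshly started node can become quorate
       on its own):

       quorum {
           provider: corosync_votequorum
           expected_votes: 2
           two_node: 1
           wait_for_all: 0
       }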

       wait_for_all: 1

       Enables the Wait For All (WFA) feature (default: 0).

       The general behaviour of votequorum is to switch a cluster from
       inquorate to quorate as soon as possible. For example, in an 8 node
       cluster where every node has 1 vote, expected_votes is set to 8 and
       quorum is 5 (50% of the votes + 1). As soon as 5 (or more) nodes are
       visible to each other, the partition of 5 (or more) becomes quorate
       and can start operating.

       When WFA is enabled, the cluster will be quorate for the first time
       only after all nodes have been visible at least once at the same time.

       This feature has the advantage of avoiding some startup race
       conditions, at the cost that all nodes need to be up at the same time
       at least once before the cluster can operate.

       A common startup race condition, based on the above example, is that
       as soon as 5 nodes become quorate, the other 3 nodes, which are still
       offline, will be fenced.

       WFA is very useful when combined with last_man_standing (see below).

       Example configuration:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   wait_for_all: 1
       }

       last_man_standing: 1 / last_man_standing_window: 10000

       Enables the Last Man Standing (LMS) feature (default: 0). The
       last_man_standing_window tunable (default: 10 seconds, expressed in
       ms) controls how long votequorum waits after a node loss before
       recalculating (see below).

       The general behaviour of votequorum is to set expected_votes and
       quorum at startup (unless modified by the user at runtime, see below)
       and use those values during the whole lifetime of the cluster.

       Take, for example, an 8 node cluster where each node has 1 vote:
       expected_votes is set to 8 and quorum to 5. This condition allows a
       total failure of 3 nodes. If a 4th node fails, the cluster becomes
       inquorate and it will stop providing services.

       Enabling LMS allows the cluster to dynamically recalculate
       expected_votes and quorum under specific circumstances. It is
       essential to enable WFA when using LMS in High Availability clusters
       (a combined example is shown below).

       Using the above 8 node cluster example, with LMS enabled the cluster
       can retain quorum and continue operating by losing, in a cascading
       fashion, up to 6 nodes, with only 2 remaining active.

       Example chain of events:
       1) cluster is fully operational with 8 nodes.
          (expected_votes: 8 quorum: 5)

       2) 3 nodes die, cluster is quorate with 5 nodes.

       3) after the last_man_standing_window timer expires,
          expected_votes and quorum are recalculated.
          (expected_votes: 5 quorum: 3)

       4) at this point, 2 more nodes can die and the
          cluster will still be quorate with 3.

       5) once again, after the last_man_standing_window
          timer expires, expected_votes and quorum are
          recalculated.
          (expected_votes: 3 quorum: 2)

       6) at this point, 1 more node can die and the
          cluster will still be quorate with 2.

       7) once more, after the last_man_standing_window timer
          expires, expected_votes and quorum are recalculated.
          (expected_votes: 2 quorum: 2)

       NOTES: In order for the cluster to downgrade automatically from a 2
       node to a 1 node cluster, the auto_tie_breaker feature must also be
       enabled (see below). If auto_tie_breaker is not enabled, and one more
       failure occurs, the remaining node will not be quorate. LMS does not
       work with asymmetric voting schemes; each node must have exactly 1
       vote. LMS is also incompatible with quorum devices: if
       last_man_standing is specified in corosync.conf then the quorum device
       will be disabled.

       Example configuration 1:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   last_man_standing: 1
       }

       Example configuration 2 (increase timeout to 20 seconds):

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   last_man_standing: 1
	   last_man_standing_window: 20000
       }
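
       Example configuration 3 (a sketch combining LMS with WFA, as
       recommended above for High Availability clusters):

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           last_man_standing: 1
           wait_for_all: 1
       }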

       auto_tie_breaker: 1

       Enables the Auto Tie Breaker (ATB) feature (default: 0).

       The general behaviour of votequorum allows the simultaneous failure of
       up to 50% - 1 of the nodes, assuming each node has 1 vote.

       When ATB is enabled, the cluster can suffer up to 50% of the nodes
       failing at the same time, in a deterministic fashion. By default, the
       cluster partition (the set of nodes) that is still in contact with the
       node that has the lowest nodeid will remain quorate; the other nodes
       will be inquorate. This behaviour can be changed by also specifying

       auto_tie_breaker_node: lowest|highest|<list of node IDs>

       `lowest' is the default; `highest' is similar, in that if the current
       set of nodes contains the highest nodeid then it will remain quorate.
       Alternatively, it is possible to specify a particular node ID or a
       list of node IDs that will be required to maintain quorum. If a
       (space-separated) list is given, the nodes are evaluated in order: if
       the first node is present then it will be used to determine the
       quorate partition; if that node is not in either half (i.e. it was not
       in the cluster before the split) then the second node ID will be
       checked, and so on. ATB is incompatible with quorum devices: if
       auto_tie_breaker is specified in corosync.conf then the quorum device
       will be disabled.

       Example configuration 1:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   auto_tie_breaker: 1
	   auto_tie_breaker_node: lowest
       }

       Example configuration 2:
       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   auto_tie_breaker: 1
	   auto_tie_breaker_node: 1 3 5
       }
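
       Example configuration 3 (a sketch using the `highest' setting
       described above):

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           auto_tie_breaker: 1
           auto_tie_breaker_node: highest
       }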

       allow_downscale:	1

       Enables the Allow Downscale (AD) feature (default: 0).

       THIS FEATURE IS INCOMPLETE AND CURRENTLY	UNSUPPORTED.

       The general behaviour of votequorum is to never decrease expected
       votes or quorum.

       When AD is enabled, both expected votes and quorum are recalculated
       when a node leaves the cluster in a clean state (normal corosync
       shutdown process), down to the configured expected_votes.

       Example use case:

       1) N node cluster (where N is any value higher than 3)

       2) expected_votes set to 3 in corosync.conf

       3) only 3 nodes are running

       4) the admin needs to increase processing power and adds 10 nodes

       5) internal expected_votes is automatically set to 13

       6) minimum expected_votes is 3 (from configuration)

       - up to this point this is standard votequorum behavior -

       7) once the work is done, the admin wants to remove nodes from the
          cluster

       8) using an ordered shutdown the admin can reduce the cluster size
          automatically back to 3, but not below 3, where normal quorum
          operation will work as usual.

       Example configuration:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 3
	   allow_downscale: 1
       }

       allow_downscale implicitly enables EVT (see below).

       expected_votes_tracking:	1

       Enables the Expected Votes Tracking (EVT) feature (default: 0).

       Expected Votes Tracking stores the highest-seen value of expected
       votes on disk and uses that as the minimum value for expected votes in
       the absence of any higher authority (e.g. a currently quorate
       cluster). This is useful for the case where a group of nodes becomes
       detached from the main cluster and, after a restart, could otherwise
       have enough votes to provide quorum; this can happen after using
       allow_downscale.

       Note that even if the in-memory version of expected_votes is reduced,
       e.g. by removing nodes or using corosync-quorumtool, the stored value
       will still be the highest value seen - it never gets reduced.

       The value is held in the file ev_tracking (stored in the directory
       configured in system.state_dir, or /var/lib/corosync/ when unset),
       which can be deleted if you really do need to reduce the expected
       votes for any reason, such as when the node has been moved to a
       different cluster.
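
       Example configuration (a minimal sketch; EVT is enabled implicitly by
       allow_downscale as noted above, but it can also be set explicitly, and
       the expected_votes value here is purely illustrative):

       quorum {
           provider: corosync_votequorum
           expected_votes: 3
           expected_votes_tracking: 1
       }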

VARIOUS NOTES
       * WFA / LMS / ATB / AD can be combined with one another.

       * In order to change the default number of votes for a node there are
       two options:

       1) nodelist:

       nodelist	{
	   node	{
	       ring0_addr: 192.168.1.1
	       quorum_votes: 3
	   }
	   ....
       }

       2) quorum section (deprecated):

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 2
	   votes: 2
       }

       In the event that both nodelist and quorum { votes: } are defined, the
       value from the nodelist will be used.

       * Only votes, quorum_votes, expected_votes and two_node can be changed
       at runtime. Everything else requires a cluster restart.

BUGS
       No known bugs at the time of writing. The authors are from outer
       space. Deal with it.

SEE ALSO
       corosync(8),  corosync.conf(5),	corosync-quorumtool(8),	 corosync-qde-
       vice(8),	votequorum_overview(3)

corosync Man Page		  2018-12-14			 VOTEQUORUM(5)
