SAM_OVERVIEW(3)	  Corosync Cluster Engine Programmer's Manual  SAM_OVERVIEW(3)

NAME
       sam_overview - Overview of the Simple Availability Manager

OVERVIEW
       The SAM library provides a tool to check the health of an application.
       The main purpose of SAM is to restart a local process when it fails to
       respond to a healthcheck request within a configured time interval.

       During sam_initialize(3), a duplicate copy of the process is created
       using the fork(2) system call.  This duplicate process contains the
       logic for executing the SAM server.  The SAM server is responsible for
       requesting healthchecks from the active process and for controlling
       the lifecycle of the active process when it fails.  If the active
       process fails to respond to a healthcheck request sent by the SAM
       server, the active process is sent a user-configurable signal (default
       SIGTERM) to request shutdown of the application.  After a configured
       time interval, the process is forcibly killed with a SIGKILL signal.
       Once the active process terminates, the SAM server creates a new
       active process.
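
       The warn-then-kill sequence described above can be sketched with
       plain POSIX calls.  This is an illustrative model and not the libsam
       implementation: stop_with_escalation(), demo_stop() and the 10 ms
       polling step are assumptions made for the example.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for ms milliseconds, resuming if a signal interrupts the sleep. */
void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Reduce a wait status to the terminating signal, or 0 for a normal exit. */
int ending_signal(int status)
{
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/*
 * Hypothetical helper: send warn_signal, allow grace_ms for a clean exit,
 * then send SIGKILL.  Returns the signal that ended the child, 0 if the
 * child exited on its own, or -1 on error.
 */
int stop_with_escalation(pid_t pid, int warn_signal, long grace_ms)
{
    int status;
    long waited;

    if (kill(pid, warn_signal) == -1)
        return -1;

    /* Poll for termination during the grace period. */
    for (waited = 0; waited < grace_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return ending_signal(status);
        if (r == -1)
            return -1;
        sleep_ms(10);
    }

    /* Grace period is over: force termination. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return ending_signal(status);
}

/* Demo: fork a child that ignores or honours SIGTERM, then stop it. */
int demo_stop(int child_ignores_warning)
{
    int ready[2];
    char c = 0;
    pid_t pid;

    if (pipe(ready) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0) {
        close(ready[0]);
        signal(SIGTERM, child_ignores_warning ? SIG_IGN : SIG_DFL);
        if (write(ready[1], &c, 1) != 1)   /* tell the parent we are set up */
            _exit(1);
        for (;;)
            pause();
    }
    close(ready[1]);
    if (read(ready[0], &c, 1) != 1) {
        /* Child ended before signalling readiness; it is reaped below. */
    }
    close(ready[0]);
    return stop_with_escalation(pid, SIGTERM, 100);
}
```

       With these helpers, demo_stop(1) models an application that ignores
       the warning and is ended by SIGKILL, while demo_stop(0) models one
       that is ended by the warning signal itself.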

       The Simple Availability Manager is meant to be used in conjunction with
       the cpg service.  Used together, they make it possible to restart a
       cpg process that fails healthchecking during operation.

       The main features of SAM include:

              o  A configurable recovery policy.

              o  A configurable time interval for health check operations.

              o  A notification via signal before recovery action is taken.

              o  A mechanism to indicate to the application the number of
                 times an active process has been created by the SAM server.

              o  Both application driven health checking and event driven
                 health checking.

Initializing SAM
       The SAM library is initialized by sam_initialize(3).  sam_initialize(3)
       may only be called once per process.  Calling it more than once has
       undefined results and is neither recommended nor tested.

Setting warning callback
       A user-configurable signal (default SIGTERM) is sent to the application
       when a recovery action is planned.  The signal can be changed with
       sam_warn_signal_set(3).  The application can use the signal(3) function
       to install a handler for this signal.

       There are no special constraints on which SAM APIs may be called in a
       warning callback.  After time_interval expires, a SIGKILL signal is
       sent to the active process to force its termination.

Registering the active process
       The active process is registered with SAM by calling sam_register(3).
       This function should be called only once in a process.  After a
       recovery action is taken, the new active process begins execution at
       the next line of code in the user process after sam_register(3).
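
       The restart behaviour can be modelled with fork(2): every new instance
       resumes at the statement following the fork, much as a new active
       process resumes after sam_register(3).  The sketch below is a model
       under that assumption, not libsam code; worker_fn, flaky_worker() and
       supervise() are hypothetical, and numbering instances from 1 is a
       choice made for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker type: receives its instance number, returns an exit code. */
typedef int (*worker_fn)(unsigned int instance_id);

/* Demo worker: the first two instances fail, the third succeeds. */
int flaky_worker(unsigned int instance_id)
{
    return instance_id < 3 ? 1 : 0;
}

/*
 * Fork a worker and fork a replacement whenever it ends abnormally.
 * Returns the instance number that exited cleanly, or 0 if max_instances
 * were used up or fork/waitpid failed.
 */
unsigned int supervise(worker_fn worker, unsigned int max_instances)
{
    unsigned int id;

    for (id = 1; id <= max_instances; id++) {
        int status;
        pid_t pid = fork();

        if (pid == -1)
            return 0;
        if (pid == 0)
            _exit(worker(id));   /* each new instance continues from here */
        if (waitpid(pid, &status, 0) != pid)
            return 0;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return id;
        /* Abnormal end: loop around and create the next instance. */
    }
    return 0;
}
```

       The instance number passed to the worker plays the role of the count
       of active processes created so far, which lets an application decide
       to give up after too many restarts.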

Enabling event driven healthchecking
       Two types of healthchecking are available to the user.  In the first
       model, the user application sends healthchecks on its own initiative
       during normal operation, using sam_hc_send(3).  It is never requested
       to healthcheck, and if the active process does not send a healthcheck
       within the time interval, the process is restarted.

       A more useful mechanism for healthchecking is event driven
       healthchecking.  Because this model is directed by the SAM server, it
       isn't necessary to guess at intervals or add timers to the active
       process in order to signal that a healthcheck operation is successful.
       To use event driven healthchecking, the sam_hc_callback_register(3)
       function should be called.
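
       The event driven model can be pictured as a request and reply over a
       pair of pipes, with the requesting side waiting no longer than the
       configured interval.  The sketch below illustrates that idea with
       POSIX calls only; it does not use or reproduce libsam internals, and
       hc_callback_fn, check_once() and the one-byte protocol are assumptions
       made for the example.

```c
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: returns 0 when the application is healthy. */
typedef int (*hc_callback_fn)(void);

/* Demo callbacks: one answers at once, one takes far too long. */
int always_healthy(void) { return 0; }
int slow_check(void) { sleep(2); return 0; }

/* Child side: answer every request byte with the callback's verdict. */
void child_loop(int req_fd, int rep_fd, hc_callback_fn cb)
{
    char c;

    while (read(req_fd, &c, 1) == 1) {
        char reply = (cb() == 0) ? 'y' : 'n';

        if (write(rep_fd, &reply, 1) != 1)
            break;
    }
    _exit(0);
}

/*
 * Fork a child that runs child_loop, send one healthcheck request and wait
 * up to timeout_ms for the reply.  Returns 1 for a healthy reply in time,
 * 0 for a timeout or unhealthy reply, -1 on setup error.
 */
int check_once(hc_callback_fn cb, int timeout_ms)
{
    int req[2], rep[2];
    int healthy = 0;
    char c = '?';
    struct pollfd pfd;
    pid_t pid;

    if (pipe(req) == -1)
        return -1;
    if (pipe(rep) == -1) {
        close(req[0]); close(req[1]);
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(req[0]); close(req[1]);
        close(rep[0]); close(rep[1]);
        return -1;
    }
    if (pid == 0) {
        close(req[1]); close(rep[0]);
        child_loop(req[0], rep[1], cb);
    }
    close(req[0]); close(rep[1]);

    if (write(req[1], &c, 1) == 1) {
        pfd.fd = rep[0];
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN) &&
            read(rep[0], &c, 1) == 1)
            healthy = (c == 'y');
    }

    /* The demo ends here, so the child is always stopped and reaped. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    close(req[1]); close(rep[0]);
    return healthy;
}
```

       A result of 0 from check_once() corresponds to the situation in which
       a recovery action would be started.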

Quorum integration
       SAM has special policies (SAM_RECOVERY_POLICY_QUIT and
       SAM_RECOVERY_POLICY_RESTART) for integration with the quorum service.
       These policies change SAM behaviour in two respects:

              o  The call to sam_start(3) blocks until corosync becomes
                 quorate.

              o  The user-selected recovery action is taken immediately after
                 loss of quorum.

Storing user data
       Sometimes data must survive from one instance of the process to the
       next.  Files or databases can be used for this, but a much simpler
       in-memory solution is provided by the sam_data_store(3),
       sam_data_restore(3) and sam_data_getsize(3) functions.

Confdb integration
       SAM has a policy flag for confdb system integration
       (SAM_RECOVERY_POLICY_CONFDB).  If a process is registered with this
       flag, a new confdb object PROCESS_NAME:PID is created with the
       following keys:

              o  recovery - either quit or restart, depending on the policy

              o  poll_period - period of health checking in milliseconds

              o  last_updated - timestamp (in nanoseconds) of the last health
                 check

              o  state - state of the process (one of registered, started,
                 failed, waiting for quorum)

       The object is automatically deleted if the process exits while health
       checking is stopped.

       Confdb integration with the corosync watchdog can be used in an
       implicit or an explicit way.

       The implicit way is achieved by setting the recovery policy to QUIT
       and letting the process exit while health checking is started.  In
       that case the object is not deleted and the corosync watchdog takes
       the required action.

       The explicit way is useful when the developer can handle some
       non-fatal failures of the application.  This mode is achieved by
       setting the policy to RESTART and using SAM in the same way as without
       Confdb integration.  If a real failure must be reported (for example,
       too many restarts in total or per second), sam_mark_failed(3) can be
       called to let the corosync watchdog take the required action.

BUGS
SEE ALSO
       sam_initialize(3),      sam_data_getsize(3),	  sam_data_restore(3),
       sam_data_store(3),  sam_finalize(3),  sam_mark_failed(3), sam_start(3),
       sam_stop(3), sam_register(3),  sam_warn_signal_set(3),  sam_hc_send(3),
       sam_hc_callback_register(3)

corosync Man Page		  21/05/2010		       SAM_OVERVIEW(3)
