FreeBSD Manual Pages

fi_trigger(3)		       Libfabric v1.15.1		 fi_trigger(3)

NAME
       fi_trigger - Triggered operations

SYNOPSIS
	      #include <rdma/fi_trigger.h>

DESCRIPTION
       Triggered operations allow an application to queue a data transfer
       request that is deferred until a specified condition is met.  A typical
       use is to send a message only after receiving all input data.
       Triggered operations can help reduce the latency needed to initiate a
       transfer by removing the need to return control to an application
       before the data transfer starts.

       An endpoint must be created with the FI_TRIGGER capability in order for
       triggered operations to be specified.  Such an endpoint is referred to
       as a trigger-able endpoint.  A triggered operation is requested by
       specifying the FI_TRIGGER flag as part of the operation.

       Any data transfer operation is potentially trigger-able, subject to
       provider constraints.  Trigger-able endpoints are initialized such that
       only those provider-supported interfaces that are trigger-able are
       available.

       Triggered operations require that applications use struct
       fi_triggered_context as their per-operation context parameter or, if
       the provider requires the FI_CONTEXT2 mode, struct
       fi_triggered_context2.  The use of struct fi_triggered_context[2]
       replaces struct fi_context[2], if required by the provider.  Although
       struct fi_triggered_context[2] is not opaque to the application, the
       provider may modify the contents of the structure once it has been
       submitted as an operation.  This structure has requirements similar to
       those of struct fi_context[2]: it must be allocated by the application
       and remain valid until the corresponding operation completes or is
       successfully canceled.

       Struct fi_triggered_context[2] is used to specify the condition that
       must be met before the triggered data transfer is initiated.  If the
       condition is met when the request is made, then the data transfer may
       be initiated immediately.  The format of struct fi_triggered_context[2]
       is described below.

              struct fi_triggered_context {
                  enum fi_trigger_event event_type;   /* trigger type */
                  union {
                      struct fi_trigger_threshold threshold;
                      struct fi_trigger_xpu xpu;
                      void *internal[3]; /* reserved */
                  } trigger;
              };

              struct fi_triggered_context2 {
                  enum fi_trigger_event event_type;   /* trigger type */
                  union {
                      struct fi_trigger_threshold threshold;
                      struct fi_trigger_xpu xpu;
                      void *internal[7]; /* reserved */
                  } trigger;
              };

       The triggered context indicates the type of event assigned to the
       trigger, along with a union of trigger details that depends on the
       event type.
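
       Because the application allocates the triggered context and must keep
       it valid until the operation completes or is canceled, one common C
       pattern is to embed it in a per-request structure and recover that
       structure from the context pointer later.  The sketch below is a
       self-contained model: demo_triggered_context, app_request, and
       request_from_context are hypothetical names standing in for struct
       fi_triggered_context and application code, so it compiles without
       libfabric.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring the layout shown above; a real program
 * includes <rdma/fi_trigger.h> and uses struct fi_triggered_context. */
struct demo_trigger_threshold {
    void  *cntr;        /* stands in for struct fid_cntr * */
    size_t threshold;
};

struct demo_triggered_context {
    int event_type;     /* stands in for enum fi_trigger_event */
    union {
        struct demo_trigger_threshold threshold;
        void *internal[3];
    } trigger;
};

/* Application-owned request: the context lives inside it, so it stays
 * valid for exactly as long as the request itself. */
struct app_request {
    int id;
    struct demo_triggered_context ctx;
};

/* Recover the owning request from the context pointer that a completion
 * hands back (the usual container_of idiom). */
static struct app_request *request_from_context(void *op_context)
{
    assert(op_context != NULL);
    return (struct app_request *)((char *)op_context -
                                  offsetof(struct app_request, ctx));
}
```

       A real program would embed struct fi_triggered_context (or
       fi_triggered_context2) in the request and pass the address of that
       member as the operation context.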

COMPLETION BASED TRIGGERS
       Completion based triggers defer a data transfer until one or more
       related data transfers complete.  For example, a send operation may be
       deferred until a receive operation completes, indicating that the data
       to be transferred is now available.

       The following trigger event related to completion based transfers is
       defined.

       FI_TRIGGER_THRESHOLD
              This indicates that the data transfer operation will be deferred
              until an event counter crosses an application-specified
              threshold value.  The threshold is specified using struct
              fi_trigger_threshold:

              struct fi_trigger_threshold {
                  struct fid_cntr *cntr; /* event counter to check */
                  size_t threshold;      /* threshold value */
              };

       Threshold operations are triggered in increasing order of their
       threshold values.  This is true even if the counter increments by a
       value greater than 1.  If two triggered operations have the same
       threshold, they will be triggered in the order in which they were
       submitted to the endpoint.
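
       The ordering rule can be modeled without libfabric.  The following
       sketch assumes an operation becomes eligible once the counter value is
       greater than or equal to its threshold; pending_op and fire_ready are
       hypothetical names, not part of the API.  When the counter jumps past
       several thresholds at once, eligible operations fire in increasing
       threshold order, with ties broken by submission order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of queued threshold triggers; not libfabric code. */
struct pending_op {
    size_t threshold;  /* value from struct fi_trigger_threshold */
    size_t seq;        /* submission order on the endpoint */
    int    fired;      /* 1 once the operation has been started */
};

/* Fire every pending op whose threshold is <= counter, lowest threshold
 * first and ties in submission order.  Indices of fired ops are written
 * to out[]; the return value is how many fired. */
static size_t fire_ready(struct pending_op *ops, size_t n, size_t counter,
                         size_t *out)
{
    size_t nfired = 0;

    assert(ops != NULL && out != NULL);
    for (;;) {
        size_t best = n;  /* n means "nothing eligible" */
        for (size_t i = 0; i < n; i++) {
            if (ops[i].fired || ops[i].threshold > counter)
                continue;
            if (best == n || ops[i].threshold < ops[best].threshold ||
                (ops[i].threshold == ops[best].threshold &&
                 ops[i].seq < ops[best].seq))
                best = i;
        }
        if (best == n)
            return nfired;
        ops[best].fired = 1;
        out[nfired++] = best;
    }
}
```

       For example, with thresholds 3, 1, 5, 3, and 7 submitted in that order
       and the counter jumping from 0 to 5, the operations fire in the order
       1, 3 (first submitted), 3 (second submitted), 5; the operation with
       threshold 7 stays queued.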

XPU TRIGGERS
       XPU based triggers work in conjunction with heterogeneous memory
       (FI_HMEM capability).  XPU triggers define a split execution model for
       specifying a data transfer separately from initiating the transfer.
       Unlike completion triggers, the user controls the timing of when the
       transfer starts by writing data into a trigger variable location.

       XPU transfers allow the requesting and triggering to occur on separate
       computational domains.  For example, a process running on the host CPU
       can set up a data transfer, with a compute kernel running on a GPU
       signaling the start of the transfer.  XPU refers to a CPU, GPU, FPGA,
       or other acceleration device with some level of computational ability.

       Endpoints must be created with both the FI_TRIGGER and FI_XPU
       capabilities to use XPU triggers.  XPU trigger-enabled endpoints only
       support XPU triggered operations.  The behavior of mixing XPU triggered
       operations with normal data transfers or non-XPU triggered operations
       is not defined by the API and is subject to provider support and
       implementation.

       The use of XPU triggers requires coordination between the fabric
       provider, the application, and the submitting XPU.  As a result,
       hardware implementation details need to be conveyed across the
       computational domains.  The XPU trigger API abstracts those details.
       When submitting an XPU trigger operation, the user identifies the XPU
       where the triggering will occur.  The triggering XPU must match the
       location of the local memory regions.  For example, if triggering will
       be done by a GPU kernel, the type of GPU and its local identifier are
       given.  As output, the fabric provider will return a list of variables
       and corresponding values.  The XPU signals that the data transfer is
       safe to initiate by writing the given values to the specified variable
       locations.  The number of variables and their sizes are provider
       specific.

       XPU trigger operations are submitted using the FI_TRIGGER flag with
       struct fi_triggered_context or struct fi_triggered_context2, as
       required by the provider.  The trigger event_type is:

       FI_TRIGGER_XPU
              Indicates that the data transfer operation will be deferred
              until the user writes provider-specified data to
              provider-indicated memory locations.  The user indicates which
              device will initiate the write.  The struct fi_trigger_xpu is
              used to convey both input and output data regarding the
              signaling of the trigger.

              struct fi_trigger_var {
                  enum fi_datatype datatype;
                  int count;
                  void *addr;
                  union {
                      uint8_t val8;
                      uint16_t val16;
                      uint32_t val32;
                      uint64_t val64;
                      uint8_t *data;
                  } value;
              };

              struct fi_trigger_xpu {
                  int count;
                  enum fi_hmem_iface iface;
                  union {
                      uint64_t reserved;
                      int cuda;
                      int ze;
                  } device;
                  struct fi_trigger_var *var;
              };

       On input to a triggered operation, the iface field indicates the
       software interface that will be used to write the variables.  The
       device union specifies the device identifier.  For valid iface and
       device values, see fi_mr(3).  The iface and device must match the iface
       and device of any local HMEM memory regions.  Count should be set to
       the number of fi_trigger_var structures available, with the var field
       pointing to an array of struct fi_trigger_var.  The user is responsible
       for ensuring that there are sufficient fi_trigger_var structures
       available and that they are of an appropriate size.  The count and size
       of fi_trigger_var structures can be obtained by calling fi_getopt() on
       the endpoint with the FI_OPT_XPU_TRIGGER option.  See fi_endpoint(3)
       for details.

       Each referenced fi_trigger_var structure should have its datatype field
       initialized, and its count field set to the number of values referenced
       by the struct fi_trigger_var.  If the count is 1, one of the val fields
       (val8, val16, etc.) will be used to return the necessary data.  If the
       count is greater than 1, the data field will return all data needed to
       signal the trigger; it must reference a buffer large enough to hold the
       returned bytes.
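
       A minimal sketch of preparing the input array, assuming the number of
       variables and the maximum bytes per variable have already been obtained
       (for example through fi_getopt() with FI_OPT_XPU_TRIGGER, not shown).
       demo_trigger_var mirrors struct fi_trigger_var, DEMO_UINT8 and
       DEMO_UINT64 stand in for FI_UINT8 and FI_UINT64, and trigger_var_set
       and its functions are hypothetical.  The data buffers are owned outside
       the union, because on output the provider may report a count of 1 and
       return the value through a val field, overwriting the data pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of struct fi_trigger_var (not the libfabric type). */
enum demo_datatype { DEMO_UINT8, DEMO_UINT16, DEMO_UINT32, DEMO_UINT64 };

struct demo_trigger_var {
    enum demo_datatype datatype;
    int count;
    void *addr;
    union {
        uint8_t val8;
        uint16_t val16;
        uint32_t val32;
        uint64_t val64;
        uint8_t *data;
    } value;
};

/* Hypothetical owner of the var array and its byte buffers. */
struct trigger_var_set {
    int nvars;
    struct demo_trigger_var *vars;
    uint8_t *pool;  /* backing storage for value.data, owned here */
};

/* Prepare nvars entries that can each hold up to max_bytes.
 * Returns 0 on success, -1 if allocation fails. */
static int trigger_var_set_init(struct trigger_var_set *set, int nvars,
                                int max_bytes)
{
    assert(set != NULL && nvars > 0 && max_bytes > 0);
    set->nvars = nvars;
    set->pool = NULL;
    set->vars = calloc((size_t)nvars, sizeof(*set->vars));
    if (set->vars == NULL)
        return -1;
    if (max_bytes > 8) {
        /* More than 64 bits: every entry needs its own byte buffer. */
        set->pool = calloc((size_t)nvars, (size_t)max_bytes);
        if (set->pool == NULL) {
            free(set->vars);
            set->vars = NULL;
            return -1;
        }
    }
    for (int i = 0; i < nvars; i++) {
        if (set->pool != NULL) {
            set->vars[i].datatype = DEMO_UINT8;
            set->vars[i].count = max_bytes;
            set->vars[i].value.data = set->pool + (size_t)i * (size_t)max_bytes;
        } else {
            /* Fits in a val field; a 64-bit slot is assumed here. */
            set->vars[i].datatype = DEMO_UINT64;
            set->vars[i].count = 1;
        }
    }
    return 0;
}

static void trigger_var_set_fini(struct trigger_var_set *set)
{
    free(set->pool);  /* safe even if value.data was overwritten */
    free(set->vars);
    set->pool = NULL;
    set->vars = NULL;
}
```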

       On output, the provider will set the fi_trigger_xpu count to the number
       of fi_trigger_var variables that must be signaled.  This count will be
       less than or equal to the input value.  The provider will initialize
       each valid fi_trigger_var entry with the information needed to signal
       the trigger.  The datatype indicates the size of the data that must be
       written.  Valid datatype values are FI_UINT8, FI_UINT16, FI_UINT32, and
       FI_UINT64.  For signal variables of 64 bits or less, the count field
       will be 1.  If a trigger requires writing more than 64 bits, the
       datatype field will be set to FI_UINT8, with count set to the number of
       bytes that must be written.  The data that must be written to signal
       the start of an operation is returned through either the val fields of
       the value union or the data array.

       Users signal the start of a transfer by writing the returned data to
       the given memory address.  The write must occur from the specified
       input XPU location (based on the iface and device fields).  If a
       transfer cannot be initiated for some reason, such as an error
       occurring before the transfer can start, the triggered operation should
       be canceled to release any allocated resources.  If multiple variables
       are specified, they must be updated in order.
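
       The signaling step can be sketched as follows.  This model performs the
       writes with ordinary CPU stores through volatile pointers; in practice
       the writes must come from the XPU identified by iface and device (for
       example from a GPU kernel), and real code may also need device-specific
       memory fences between variables, which are omitted here.  The demo_*
       types mirror the structures above and, like write_trigger_var and
       signal_trigger, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of struct fi_trigger_var (not the libfabric type). */
enum demo_datatype { DEMO_UINT8, DEMO_UINT16, DEMO_UINT32, DEMO_UINT64 };

struct demo_trigger_var {
    enum demo_datatype datatype;
    int count;
    void *addr;
    union {
        uint8_t val8;
        uint16_t val16;
        uint32_t val32;
        uint64_t val64;
        uint8_t *data;
    } value;
};

/* Write one variable with the width given by datatype.  When count > 1
 * the provider returned a byte array (datatype DEMO_UINT8) in value.data. */
static void write_trigger_var(const struct demo_trigger_var *v)
{
    assert(v != NULL && v->addr != NULL && v->count >= 1);
    if (v->count > 1) {
        memcpy(v->addr, v->value.data, (size_t)v->count);
        return;
    }
    switch (v->datatype) {
    case DEMO_UINT8:  *(volatile uint8_t *)v->addr  = v->value.val8;  break;
    case DEMO_UINT16: *(volatile uint16_t *)v->addr = v->value.val16; break;
    case DEMO_UINT32: *(volatile uint32_t *)v->addr = v->value.val32; break;
    case DEMO_UINT64: *(volatile uint64_t *)v->addr = v->value.val64; break;
    }
}

/* Signal a trigger: multiple variables are updated in array order. */
static void signal_trigger(const struct demo_trigger_var *vars, int count)
{
    for (int i = 0; i < count; i++)
        write_trigger_var(&vars[i]);
}
```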

       Note that the provider will not modify the fi_trigger_xpu or
       fi_trigger_var structures after returning from the data transfer call.

       In order to support multiple provider implementations, users should
       trigger data transfer operations in the same order that they are queued
       and should serialize the writing of triggers that reference the same
       endpoint.  Providers may return the same trigger variable for multiple
       data transfer requests.
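
       One way to follow both recommendations on the host side is a
       per-endpoint FIFO of pending triggers guarded by a lock, so triggers are
       written one at a time and in submission order.  The sketch below is
       hypothetical (ep_trigger_queue and its functions are not libfabric
       APIs) and uses a C11 atomic_flag spinlock; it only serializes CPU
       threads, whereas triggers written by a GPU kernel would need equivalent
       ordering on the device.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

/* Hypothetical record of one queued XPU trigger (not libfabric). */
struct pending_trigger {
    volatile uint64_t *addr;  /* variable returned by the provider */
    uint64_t value;           /* value that starts the transfer */
};

struct ep_trigger_queue {
    atomic_flag lock;                       /* serializes trigger writes */
    struct pending_trigger q[MAX_PENDING];  /* ring buffer */
    size_t head, tail;                      /* FIFO in submission order */
};

static void eq_lock(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ; /* spin until the current writer releases the flag */
}

static void eq_unlock(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

static void ep_trigger_queue_init(struct ep_trigger_queue *eq)
{
    atomic_flag_clear(&eq->lock);
    eq->head = eq->tail = 0;
}

/* Record a trigger right after its operation is queued. */
static int ep_trigger_push(struct ep_trigger_queue *eq,
                           volatile uint64_t *addr, uint64_t value)
{
    assert(eq != NULL && addr != NULL);
    eq_lock(&eq->lock);
    if (eq->tail - eq->head == MAX_PENDING) {
        eq_unlock(&eq->lock);
        return -1;  /* queue full */
    }
    eq->q[eq->tail % MAX_PENDING] = (struct pending_trigger){ addr, value };
    eq->tail++;
    eq_unlock(&eq->lock);
    return 0;
}

/* Fire the oldest pending trigger; returns -1 if none are pending. */
static int ep_trigger_fire_next(struct ep_trigger_queue *eq)
{
    eq_lock(&eq->lock);
    if (eq->head == eq->tail) {
        eq_unlock(&eq->lock);
        return -1;
    }
    struct pending_trigger t = eq->q[eq->head % MAX_PENDING];
    *t.addr = t.value;  /* one writer at a time, in queue order */
    eq->head++;
    eq_unlock(&eq->lock);
    return 0;
}
```

       Because a provider may hand back the same variable for several
       requests, writing out of order could start the wrong transfer; the
       FIFO keeps the writes matched to the order in which the operations were
       queued.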

DEFERRED WORK QUEUES
       The following feature and description are enhancements to triggered
       operation support.

       The deferred work queue interface is designed to provide primitive
       constructs that can be used to implement application-level collective
       operations.  Deferred work queues are a more advanced form of triggered
       operation.  They allow an application to queue operations to a deferred
       work queue that is associated with the domain.  Note that the deferred
       work queue is a conceptual construct, rather than an implementation
       requirement.  Deferred work requests consist of three main components:
       an event or condition that must first be met, an operation to perform,
       and a completion notification.

       Because deferred work requests are posted directly to the domain, they
       can support a broader set of conditions and operations.  Deferred work
       requests are submitted using struct fi_deferred_work.  That structure,
       along with the corresponding operation structures (referenced through
       the op union) used to describe the work, must remain valid until the
       operation completes or is canceled.  The format of the deferred work
       request is as follows:

              struct fi_deferred_work {
                  struct fi_context2          context;

                  uint64_t                    threshold;
                  struct fid_cntr             *triggering_cntr;
                  struct fid_cntr             *completion_cntr;

                  enum fi_trigger_op          op_type;

                  union {
                      struct fi_op_msg            *msg;
                      struct fi_op_tagged         *tagged;
                      struct fi_op_rma            *rma;
                      struct fi_op_atomic         *atomic;
                      struct fi_op_fetch_atomic   *fetch_atomic;
                      struct fi_op_compare_atomic *compare_atomic;
                      struct fi_op_cntr           *cntr;
                  } op;
              };

       Once a work request has been posted to the deferred work queue, it will
       remain on the queue until the triggering counter (success plus error
       counter values) has reached the indicated threshold.  If the triggering
       condition has already been met at the time the work request is queued,
       the operation will be initiated immediately.
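
       The triggering condition counts both successful and failed
       completions.  A small model of that rule follows, using hypothetical
       model_cntr and model_work types in place of struct fid_cntr and struct
       fi_deferred_work: a request is started once success plus error reaches
       its threshold, and calling progress_work() right after queuing covers
       the case where the condition already holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter model: success and error values are kept apart. */
struct model_cntr {
    uint64_t success;
    uint64_t error;
};

/* Hypothetical subset of a deferred work request. */
struct model_work {
    uint64_t threshold;
    const struct model_cntr *triggering_cntr;
    int started;
};

/* Eligible once success + error has reached the threshold. */
static int work_ready(const struct model_work *w)
{
    assert(w != NULL && w->triggering_cntr != NULL);
    return w->triggering_cntr->success + w->triggering_cntr->error >=
           w->threshold;
}

/* Start every queued request whose condition now holds. */
static size_t progress_work(struct model_work *work, size_t n)
{
    size_t started = 0;

    for (size_t i = 0; i < n; i++) {
        if (!work[i].started && work_ready(&work[i])) {
            work[i].started = 1;
            started++;
        }
    }
    return started;
}
```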

       On the completion of a deferred data transfer, the specified completion
       counter will be incremented by one.  Note that deferred counter
       operations do not update the completion counter; only the counter
       specified through the fi_op_cntr is modified.  The completion_cntr field
       must be NULL for counter operations.
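
       The completion rules above can be modeled as follows.  The types and
       functions are hypothetical, and rejecting an invalid counter operation
       with -EINVAL is an assumption of this sketch; the rule itself only
       states that completion_cntr must be NULL for counter operations.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation kinds (not enum fi_trigger_op). */
enum model_op { MODEL_OP_SEND, MODEL_OP_WRITE, MODEL_OP_CNTR_ADD };

struct model_work {
    enum model_op op_type;
    uint64_t *completion_cntr;  /* stands in for struct fid_cntr * */
    uint64_t *target_cntr;      /* counter named by a counter operation */
    uint64_t  add_value;        /* amount a counter operation adds */
};

/* Counter operations must not name a completion counter.  The -EINVAL
 * return value is a choice made by this model. */
static int validate_work(const struct model_work *w)
{
    assert(w != NULL);
    if (w->op_type == MODEL_OP_CNTR_ADD &&
        (w->completion_cntr != NULL || w->target_cntr == NULL))
        return -EINVAL;
    return 0;
}

/* Apply the side effects of a finished request. */
static void complete_work(const struct model_work *w)
{
    if (w->op_type == MODEL_OP_CNTR_ADD)
        *w->target_cntr += w->add_value;  /* only the target changes */
    else if (w->completion_cntr != NULL)
        (*w->completion_cntr)++;          /* data transfers add one */
}
```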

       Because deferred work targets support of collective communication
       operations, posted work requests do not generate any completions at the
       endpoint by default.  For example, completed operations are not written
       to the EP's completion queue, nor do they update the EP counter (unless
       the EP counter is explicitly referenced as the completion_cntr).  An
       application may request EP completions by specifying the FI_COMPLETION
       flag as part of the operation.

       It is the responsibility of the application to detect and handle
       situations that could prevent a deferred work request's condition from
       being met.  For example, if a work request depends on the successful
       completion of a data transfer operation that fails, then the
       application must cancel the work request.

       To submit a deferred work request, applications should use the domain's
       fi_control function with command FI_QUEUE_WORK and struct
       fi_deferred_work as the fi_control arg parameter.  To cancel a deferred
       work request, use fi_control with command FI_CANCEL_WORK and the
       corresponding struct fi_deferred_work to cancel.  The fi_control
       command FI_FLUSH_WORK will cancel all queued work requests.
       FI_FLUSH_WORK may be used to flush all work queued to the domain, or
       may be used to cancel all requests waiting on a specific
       triggering_cntr.
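
       The two forms of FI_FLUSH_WORK can be modeled as a filter over the
       queued requests.  In this hypothetical sketch (queued_work and
       flush_work are not libfabric names) a NULL counter means "flush
       everything"; that convention belongs to the model, and how the real
       fi_control() call identifies a specific triggering_cntr is not shown
       here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a queued deferred work request. */
struct queued_work {
    const void *triggering_cntr;  /* stands in for struct fid_cntr * */
    int queued;                   /* 1 while waiting on the queue */
};

/* Cancel every queued request, or only those waiting on cntr when cntr
 * is non-NULL.  Returns how many requests were canceled. */
static size_t flush_work(struct queued_work *q, size_t n, const void *cntr)
{
    size_t canceled = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (q[i].queued && (cntr == NULL || q[i].triggering_cntr == cntr)) {
            q[i].queued = 0;
            canceled++;
        }
    }
    return canceled;
}
```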

       Deferred work requests are not acted upon by the provider until the
       associated event has occurred, although certain validation checks may
       still occur when a request is submitted.  Referenced data buffers are
       not read or otherwise accessed, but the provider may validate fabric
       objects, such as endpoints and counters, and check that input
       parameters fall within supported ranges.  If a specific request is not
       supported by the provider, it will fail the operation with -FI_ENOSYS.

SEE ALSO
       fi_getinfo(3), fi_endpoint(3), fi_mr(3), fi_alias(3), fi_cntr(3)

AUTHORS
       OpenFabrics.

Libfabric Programmer's Manual	  2021-11-20			 fi_trigger(3)