fi_msg(3)			   @VERSION@			     fi_msg(3)

       fi_msg -	Message	data transfer operations

       fi_recv / fi_recvv / fi_recvmsg
	      Post a buffer to receive an incoming message

       fi_send / fi_sendv / fi_sendmsg / fi_inject / fi_senddata
              Initiate an operation to send a message

	      #include <rdma/fi_endpoint.h>

	      ssize_t fi_recv(struct fid_ep *ep, void *	buf, size_t len,
		  void *desc, fi_addr_t	src_addr, void *context);

	      ssize_t fi_recvv(struct fid_ep *ep, const	struct iovec *iov, void	**desc,
		  size_t count,	fi_addr_t src_addr, void *context);

	      ssize_t fi_recvmsg(struct	fid_ep *ep, const struct fi_msg	*msg,
		  uint64_t flags);

	      ssize_t fi_send(struct fid_ep *ep, const void *buf, size_t len,
		  void *desc, fi_addr_t	dest_addr, void	*context);

	      ssize_t fi_sendv(struct fid_ep *ep, const	struct iovec *iov,
		  void **desc, size_t count, fi_addr_t dest_addr, void *context);

	      ssize_t fi_sendmsg(struct	fid_ep *ep, const struct fi_msg	*msg,
		  uint64_t flags);

	      ssize_t fi_inject(struct fid_ep *ep, const void *buf, size_t len,
		  fi_addr_t dest_addr);

	      ssize_t fi_senddata(struct fid_ep	*ep, const void	*buf, size_t len,
		  void *desc, uint64_t data, fi_addr_t dest_addr, void *context);

	      ssize_t fi_injectdata(struct fid_ep *ep, const void *buf,	size_t len,
		  uint64_t data, fi_addr_t dest_addr);

       ep     Fabric endpoint on which to initiate send or post receive buffer.

       buf    Data buffer to send or receive.

       len    Length of data buffer to send or receive, specified in bytes.
              Valid transfers are from 0 bytes up to the endpoint's maximum
              message size (fi_ep_attr::max_msg_size).

       iov    Vectored data buffer.

       count  Count of vectored	data entries.

       desc   Descriptor associated with the data buffer.  See fi_mr(3).

       data   Remote CQ	data to	transfer with the sent message.

       dest_addr
              Destination address for connectionless transfers.  Ignored for
              connected endpoints.

       src_addr
              Source address to receive from for connectionless transfers.
              Applies only to connectionless endpoints with the
              FI_DIRECTED_RECV capability enabled, otherwise this field is
              ignored.  If set to FI_ADDR_UNSPEC, any source address may match.

       msg    Message descriptor for send and receive operations.

       flags  Additional flags to apply	for the	send or	receive	operation.

       context
              User specified pointer to associate with the operation.  This
              parameter is ignored if the operation will not generate a
              successful completion, unless an op flag specifies that the
              context parameter be used for required input.

       The send functions -- fi_send, fi_sendv, fi_sendmsg, fi_inject, and
       fi_senddata -- transmit a message from one endpoint to another.  The
       main difference between the send functions is the number and type of
       parameters that they accept as input.  Otherwise, they perform the
       same general function.  Messages sent using fi_msg operations are
       received by a remote endpoint into a buffer posted to receive such
       messages.

       The receive functions -- fi_recv, fi_recvv, fi_recvmsg -- post a data
       buffer to an endpoint to receive inbound messages.  Like the send
       operations, receive operations operate asynchronously.  Users should
       not touch the posted data buffer(s) until the receive operation has
       completed.

       An  endpoint must be enabled before an application can post send	or re-
       ceive operations	to it.	For connected endpoints, receive  buffers  may
       be  posted  prior  to  connect  or accept being called on the endpoint.
       This ensures that buffers are available to receive incoming data	 imme-
       diately after the connection has	been established.

       Completed message operations are reported to the user through one or
       more event collectors associated with the endpoint.  Users provide a
       context that is associated with each operation and is returned to the
       user as part of the event completion.  See fi_cq(3) for completion
       event details.

       The call	fi_send	transfers the data contained in	the user-specified da-
       ta buffer to a remote endpoint, with  message  boundaries  being	 main-
       tained.	 For connection	based endpoints	(FI_EP_MSG) the	local endpoint
       must be connected to a remote endpoint or destination before fi_send is
       called.	 Unless	the endpoint has been configured differently, the data
       buffer passed into fi_send must not be touched by the application until
       the fi_send call	completes asynchronously.

       The fi_sendv call adds support for a scatter-gather list to fi_send.
       The fi_sendv call transfers the set of data buffers referenced by the
       iov parameter to a remote endpoint as a single message.
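
       Because the buffers in a scatter-gather list travel as a single
       message, their combined length is what counts against any message size
       limit.  The following is a minimal sketch of a helper an application
       might run before calling fi_sendv.  The helper name and the max_len
       parameter are illustrative and not part of libfabric; the caller is
       assumed to pass the endpoint's maximum message size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Sum the lengths of an iovec array, as a caller might do before handing
 * the array to fi_sendv.  Stores the total in *total and returns 0, or
 * returns -1 if the sum would overflow size_t or exceeds max_len.
 * max_len is an assumption of this sketch: the caller is expected to pass
 * the endpoint's maximum message size.
 */
static int iov_total_len(const struct iovec *iov, size_t count,
                         size_t max_len, size_t *total)
{
    size_t sum = 0;

    assert(iov != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (iov[i].iov_len > SIZE_MAX - sum)
            return -1;          /* sum would overflow size_t */
        sum += iov[i].iov_len;
    }
    if (sum > max_len)
        return -1;              /* too large for a single message */
    *total = sum;
    return 0;
}
```

       A return value of 0 means the vector describes a message that fits
       within the supplied limit; the same iov and count can then be passed
       to fi_sendv.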

       The fi_sendmsg call supports data transfers over	both connected and un-
       connected endpoints, with the ability to	control	the send operation per
       call  through  the  use	of  flags.   The  fi_sendmsg  function takes a
       struct fi_msg as	input.

              struct fi_msg {
                  const struct iovec *msg_iov;   /* scatter-gather array */
                  void               **desc;     /* local request descriptors */
                  size_t             iov_count;  /* # elements in msg_iov */
                  fi_addr_t          addr;       /* optional endpoint address */
                  void               *context;   /* user-defined context */
                  uint64_t           data;       /* optional message data */
              };
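
       As an illustration of how the fields fit together, the sketch below
       fills a struct fi_msg that describes a header and a payload as one
       message.  So that it compiles without libfabric installed, it declares
       stand-ins for struct fi_msg (copied from this page) and fi_addr_t
       (assumed here to be a 64-bit integer); real code would include
       <rdma/fi_endpoint.h> instead.  The function name fill_two_part_msg is
       hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Stand-ins so the sketch builds without libfabric.  Real code would use
 * the definitions from <rdma/fi_endpoint.h>; the fi_addr_t typedef below
 * is an assumption of this sketch.
 */
typedef uint64_t fi_addr_t;

struct fi_msg {
    const struct iovec *msg_iov;
    void **desc;
    size_t iov_count;
    fi_addr_t addr;
    void *context;
    uint64_t data;
};

/*
 * Describe a header + payload pair as a single message.  iov must have
 * room for two entries.  The buffers it references must not be touched
 * until the send completes.
 */
static void fill_two_part_msg(struct fi_msg *msg, struct iovec iov[2],
                              void *hdr, size_t hdr_len,
                              void *payload, size_t payload_len,
                              fi_addr_t dest, void *context)
{
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = payload;
    iov[1].iov_len = payload_len;

    memset(msg, 0, sizeof(*msg));
    msg->msg_iov = iov;
    msg->desc = NULL;       /* assumes no memory descriptors are required */
    msg->iov_count = 2;
    msg->addr = dest;       /* ignored for connected endpoints */
    msg->context = context;
}
```

       The filled structure would then be passed to fi_sendmsg together with
       any per-call flags.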

       The send	inject call is an optimized version of fi_send with  the  fol-
       lowing characteristics.	The data buffer	is available for reuse immedi-
       ately on	return from the	call, and no CQ	entry will be written  if  the
       transfer	completes successfully.

       Conceptually,  this means that the fi_inject function behaves as	if the
       FI_INJECT transfer flag were set, selective  completions	 are  enabled,
       and  the	 FI_COMPLETION	flag is	not specified.	Note that the CQ entry
       will be suppressed even if the default behavior of the endpoint	is  to
       write CQ	entries	for all	successful completions.	 See the flags discus-
       sion below for more details.  The requested message size	 that  can  be
       used with fi_inject is limited by inject_size.

       The send	data call is similar to	fi_send, but allows for	the sending of
       remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the transfer.

       The inject data call is similar to fi_inject, but allows for the
       sending of remote CQ data (see FI_REMOTE_CQ_DATA flag) as part of the
       transfer.

       The fi_recv call	posts a	data buffer to the receive queue of the	corre-
       sponding	 endpoint.  Posted receives are	searched in the	order in which
       they were posted	in order to match sends.  Message boundaries are main-
       tained.	 The  order in which the receives complete is dependent	on the
       endpoint	type and protocol.  For	unconnected  endpoints,	 the  src_addr
       parameter can be	used to	indicate that a	buffer should be posted	to re-
       ceive incoming data from	a specific remote endpoint.

       The fi_recvv call adds support for a scatter-gather list to fi_recv.
       The fi_recvv call posts the set of data buffers referenced by the iov
       parameter to receive incoming data.

       The fi_recvmsg call supports posting buffers over  both	connected  and
       unconnected  endpoints,	with the ability to control the	receive	opera-
       tion per	call through the use of	flags.	The fi_recvmsg function	 takes
       a struct	fi_msg as input.

       The fi_recvmsg and fi_sendmsg calls allow the user to specify flags
       which can change the default message handling of the endpoint.  Flags
       specified with fi_recvmsg / fi_sendmsg override most flags previously
       configured with the endpoint, except where noted (see fi_endpoint.3).
       The following flags are usable with fi_recvmsg and/or fi_sendmsg.

       FI_REMOTE_CQ_DATA
              Applies to fi_sendmsg and fi_senddata.  Indicates that remote CQ
              data is available and should be sent as part of the request.
              See fi_getinfo for additional details on FI_REMOTE_CQ_DATA.

       FI_CLAIM
              Applies to posted receive operations for endpoints configured
              for FI_BUFFERED_RECV or FI_VARIABLE_MSG.  This flag is used to
              retrieve a message that was buffered by the provider.  See the
              Buffered Receives section for details.

       FI_COMPLETION
              Indicates that a completion entry should be generated for the
              specified operation.  The endpoint must be bound to a completion
              queue with FI_SELECTIVE_COMPLETION that corresponds to the
              specified operation, or this flag is ignored.

       FI_DISCARD
              Applies to posted receive operations for endpoints configured
              for FI_BUFFERED_RECV or FI_VARIABLE_MSG.  This flag is used to
              free a message that was buffered by the provider.  See the
              Buffered Receives section for details.

	      Indicates	 that the user has additional requests that will imme-
	      diately be posted	after the current call returns.	 Use  of  this
	      flag  may	 improve performance by	enabling the provider to opti-
	      mize its access to the fabric hardware.

       FI_INJECT
              Applies to fi_sendmsg.  Indicates that the outbound data buffer
              should be returned to the user immediately after the send call
              returns, even if the operation is handled asynchronously.  This
              may require that the underlying provider implementation copy the
              data into a local buffer and transfer out of that buffer.  This
              flag can only be used with messages smaller than inject_size.

       FI_MULTI_RECV
              Applies to posted receive operations.  This flag allows the user
              to post a single buffer that will receive multiple incoming
              messages.  Received messages will be packed into the receive
              buffer until the buffer has been consumed.  Use of this flag may
              cause a single posted receive operation to generate multiple
              events as messages are placed into the buffer.  The placement of
              received data into the buffer may be subject to provider
              specific alignment restrictions.

       The buffer will be released by the provider when the available buffer
       space falls below the specified minimum (see FI_OPT_MIN_MULTI_RECV).
       Note that an entry to the associated receive completion queue will
       always be generated when the buffer has been consumed, even if other
       receive completions have been suppressed (i.e.  the Rx context has
       been configured for FI_SELECTIVE_COMPLETION).  See the FI_MULTI_RECV
       completion flag in fi_cq(3).
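
       The release rule can be pictured with a small bookkeeping model.  This
       is only a sketch of the behavior described above, not provider code:
       the structure and function names are illustrative, min_free stands for
       the FI_OPT_MIN_MULTI_RECV setting, and provider specific alignment
       padding is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bookkeeping for one buffer posted with FI_MULTI_RECV (illustrative). */
struct multi_recv_buf {
    size_t size;        /* total bytes posted */
    size_t used;        /* bytes already filled by received messages */
    size_t min_free;    /* threshold, cf. FI_OPT_MIN_MULTI_RECV */
};

/*
 * Account for one received message of msg_len bytes placed in the buffer.
 * Returns true when the remaining space has fallen below min_free, which
 * is the point at which the buffer would be released; returns false while
 * the buffer can still accept messages.
 */
static bool multi_recv_place(struct multi_recv_buf *b, size_t msg_len)
{
    assert(msg_len <= b->size - b->used);   /* message must fit */
    b->used += msg_len;
    return (b->size - b->used) < b->min_free;
}
```

       With a 1024 byte buffer and a 256 byte minimum, two 400 byte messages
       leave 224 bytes free, so the second placement is the one that releases
       the buffer.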

	      Applies to fi_sendmsg.  Indicates	that a	completion  should  be
	      generated	when the source	buffer(s) may be reused.

	      Applies  to  fi_sendmsg.	Indicates that a completion should not
	      be generated until the operation has been	successfully transmit-
	      ted and is no longer being tracked by the	provider.

              Applies to fi_sendmsg.  Indicates that a completion should be
              generated when the operation has been processed by the
              destination.

	      Applies  to  transmits.  Indicates that the requested operation,
	      also known as the	fenced operation, and any operation posted af-
	      ter the fenced operation will be deferred	until all previous op-
	      erations targeting the same peer endpoint	have completed.	 Oper-
	      ations  posted after the fencing will see	and/or replace the re-
	      sults of any operations initiated	prior to the fenced operation.

       The ordering of operations starting at the posting of the fenced	opera-
       tion  (inclusive)  to the posting of a subsequent fenced	operation (ex-
       clusive)	is controlled by the endpoint's	ordering semantics.

	      Applies to transmits.  This  flag	 indicates  that  the  address
	      specified	 as  the  data transfer	destination is a multicast ad-
	      dress.  This flag	must be	used in	all  multicast	transfers,  in
	      conjunction with a multicast fi_addr_t.

Buffered Receives
       Buffered	receives indicate that the networking layer allocates and man-
       ages the	data buffers used to receive network data transfers.  As a re-
       sult,  received	messages  must be copied from the network buffers into
       application buffers for processing.  However,  applications  can	 avoid
       this  copy  if  they are	able to	process	the message in place (directly
       from the	networking buffers).

       Handling	buffered receives differs based	on the size of the message be-
       ing  sent.  In general, smaller messages	are passed directly to the ap-
       plication for processing.  However, for large messages, an  application
       will  only  receive  the	 start of the message and must claim the rest.
       The details for how small messages are reported and large messages  may
       be claimed are described	below.

       When  a provider	receives a message, it will write an entry to the com-
       pletion queue associated	with the receiving endpoint.   For  discussion
       purposes,  the  completion  queue  is  assumed  to  be  configured  for
       FI_CQ_FORMAT_DATA.  Since buffered receives are not associated with ap-
       plication  posted  buffers,  the	 CQ  entry  op_context will point to a
       struct fi_recv_context.

              struct fi_recv_context {
                  struct fid_ep *ep;
                  void *context;
              };

       The 'ep'	field will point to the	receiving endpoint or Rx context,  and
       'context'  will be NULL.	 The CQ	entry's	'buf' will point to a provider
       managed buffer where the	start of the received message is located,  and
       'len' will be set to the	total size of the message.

       The maximum size of a message that a provider can buffer is limited by
       FI_OPT_BUFFERED_LIMIT.  This threshold can be obtained and may be
       adjusted by the application using the fi_getopt and fi_setopt calls,
       respectively.  Any adjustments must be made prior to enabling the
       endpoint.  The CQ entry 'buf' will point to a buffer of received data.
       If the sent message is larger than the buffered amount, the CQ entry
       'flags' will have the FI_MORE bit set.  When the FI_MORE bit is set,
       'buf' will reference at least FI_OPT_BUFFERED_MIN bytes of data (see
       fi_endpoint.3 for more info).

       After  being notified that a buffered receive has arrived, applications
       must either claim or discard the	message.   Typically,  small  messages
       are  processed and discarded, while large messages are claimed.	Howev-
       er, an application is free to claim or discard any  message  regardless
       of message size.

       To  claim  a message, an	application must post a	receive	operation with
       the FI_CLAIM flag set.  The struct fi_recv_context returned as part  of
       the  notification  must be provided as the receive operation's context.
       The struct fi_recv_context contains a  'context'	 field.	  Applications
       may  modify  this  field	prior to claiming the message.	When the claim
       operation completes, a standard receive completion entry	will be	gener-
       ated on the completion queue.  The 'context' of the associated CQ entry
       will be set to the 'context' value passed in through  the  fi_recv_con-
       text structure, and the CQ entry	flags will have	the FI_CLAIM bit set.

       Buffered	 receives that are not claimed must be discarded by the	appli-
       cation when it is done processing the CQ	entry data.  To	discard	a mes-
       sage,  an application must post a receive operation with	the FI_DISCARD
       flag set.  The struct fi_recv_context returned as part of the notifica-
       tion  must  be  provided	 as the	receive	operation's context.  When the
       FI_DISCARD flag is set for a receive operation, the receive input  buf-
       fer(s) and length parameters are	ignored.

       IMPORTANT:  Buffered  receives must be claimed or discarded in a	timely
       manner.	Failure	to do so may result in increased memory	usage for net-
       work  buffering	or  communication stalls.  Once	a buffered receive has
       been claimed or discarded,  the	original  CQ  entry  'buf'  or	struct
       fi_recv_context data may	no longer be accessed by the application.
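
       The claim or discard decision is an application policy.  The sketch
       below shows one possible policy based on the CQ entry fields discussed
       above.  It is illustrative only: DEMO_FI_MORE is a placeholder bit
       (real code takes FI_MORE from the libfabric headers), and the
       claim_above threshold and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Placeholder bit for this sketch only.  Real code uses FI_MORE from the
 * libfabric headers; its actual value is not assumed here.
 */
#define DEMO_FI_MORE (UINT64_C(1) << 0)

enum recv_action { RECV_DISCARD, RECV_CLAIM };

/*
 * Pick an action for a buffered receive notification.
 *   cq_flags     'flags' from the CQ entry
 *   msg_len      'len' from the CQ entry (total size of the message)
 *   claim_above  application-chosen size threshold (hypothetical knob)
 * A message that was only partially buffered (FI_MORE set) is claimed so
 * that the rest of its data can be received.  A fully buffered message is
 * claimed only if it is larger than the threshold; otherwise it is
 * expected to be processed in place and then discarded.
 */
static enum recv_action choose_action(uint64_t cq_flags, size_t msg_len,
                                      size_t claim_above)
{
    if (cq_flags & DEMO_FI_MORE)
        return RECV_CLAIM;
    return msg_len > claim_above ? RECV_CLAIM : RECV_DISCARD;
}
```

       Whichever action is chosen, the application would then post a receive
       with FI_CLAIM or FI_DISCARD, passing the struct fi_recv_context from
       the notification as the operation's context.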

       The  use	 of  the  FI_CLAIM  and	FI_DISCARD operation flags is also de-
       scribed with  respect  to  tagged  message  transfers  in  fi_tagged.3.
       Buffered	 receives  of  tagged messages will include the	message	tag as
       part of the CQ entry, if	available.

       The handling of buffered	receives follows all message ordering restric-
       tions  assigned	to an endpoint.	 For example, completions may indicate
       the order in which received messages arrived at the receiver  based  on
       the endpoint attributes.

Variable Length	Messages
       Variable length messages, or simply variable messages, are transfers
       where the size of the message is unknown to the receiver prior to the
       message being sent; the recipient does not know how much data to
       expect until the message arrives.  They are most commonly used when
       the size of message transfers varies greatly, with very large messages
       interspersed with much smaller messages, making receive side message
       buffering difficult to manage.  Variable messages are not subject to
       max message length restrictions (i.e.  struct fi_ep_attr::max_msg_size
       limits), and may be up to the maximum value of size_t (i.e.  SIZE_MAX)
       in length.

       Variable length message support requests that the provider allocate
       and manage the network message buffers.  As a result, the application
       requirements and provider behavior are identical to those defined for
       supporting the FI_BUFFERED_RECV mode bit.  See the Buffered Receives
       section above for details.  The main difference is that buffered
       receives are limited by the fi_ep_attr::max_msg_size threshold,
       whereas variable length messages are not.

       Support for variable messages is	indicated through the  FI_VARIABLE_MSG
       capability bit.

       If  an endpoint has been	configured with	FI_MSG_PREFIX, the application
       must include buffer space of size msg_prefix_size, as specified by  the
       endpoint	 attributes.  The prefix buffer	must occur at the start	of the
       data referenced by the buf parameter, or	be referenced by the first  IO
       vector.	 Message prefix	space cannot be	split between multiple IO vec-
       tors.  The size of the prefix buffer should be included as part of  the
       total buffer length.
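
       The layout requirement can be sketched as follows: one contiguous
       allocation with the prefix space at the front and the application
       payload after it.  The structure and function names are illustrative
       and not part of libfabric; prefix_size is assumed to be the
       msg_prefix_size value taken from the endpoint attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A buffer laid out as [ prefix | payload ] (illustrative type). */
struct prefixed_buf {
    void *base;         /* start of buffer: what is passed as buf */
    size_t total_len;   /* prefix + payload: what is passed as len */
    void *payload;      /* application data starts here */
};

/*
 * Allocate one contiguous buffer with prefix_size bytes reserved at the
 * front.  prefix_size is assumed to be the endpoint's msg_prefix_size.
 * Returns 0 on success, -1 on size overflow or allocation failure.
 * The caller releases the buffer with free(pb->base).
 */
static int prefixed_buf_alloc(struct prefixed_buf *pb, size_t prefix_size,
                              size_t payload_len)
{
    if (payload_len > SIZE_MAX - prefix_size)
        return -1;
    pb->total_len = prefix_size + payload_len;
    pb->base = malloc(pb->total_len ? pb->total_len : 1);
    if (pb->base == NULL)
        return -1;
    pb->payload = (char *)pb->base + prefix_size;
    return 0;
}
```

       The application writes or reads its data at payload, while base and
       total_len are the values handed to the send or receive call.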

       Returns 0 on success.  On error,	a negative value corresponding to fab-
       ric errno is returned.  Fabric errno values are defined in  rdma/fi_er-

       See the discussion below for details on handling FI_EAGAIN.

       -FI_EAGAIN
              Indicates that the underlying provider currently lacks the
              resources needed to initiate the requested operation.  The
              reasons for a provider returning FI_EAGAIN are varied.  However,
              common reasons include insufficient internal buffering or full
              processing queues.

       Insufficient  internal  buffering  is  often associated with operations
       that use	FI_INJECT.  In such cases,  additional	buffering  may	become
       available as posted operations complete.

       Full  processing	 queues	may be a temporary state related to local pro-
       cessing (for example, a large message is	being transferred), or may  be
       the  result of flow control.  In	the latter case, the queues may	remain
       blocked until additional	resources are made  available  at  the	remote
       side of the transfer.

       In  all	cases, the operation may be retried after additional resources
       become available.  It is	strongly recommended that  applications	 check
       for transmit and	receive	completions after receiving FI_EAGAIN as a re-
       turn value, independent of the operation	which failed.  This is partic-
       ularly  important  in  cases  where manual progress is employed,	as ac-
       knowledgements or flow control messages may need	to be processed	in or-
       der to resume execution.
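
       The recommended pattern, retrying while checking for completions
       between attempts, can be sketched as below.  To stay self-contained
       the sketch does not call libfabric: post and progress are caller
       supplied callbacks that would wrap a call such as fi_send and a read
       of the transmit and receive completion queues (see fi_cq(3)).
       DEMO_FI_EAGAIN, the callback types, and the demo_* helpers are
       assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Stand-in for FI_EAGAIN so the sketch builds without libfabric; real code
 * would include <rdma/fi_errno.h>.  Equating it with EAGAIN is an
 * assumption of this sketch.
 */
#define DEMO_FI_EAGAIN EAGAIN

typedef ssize_t (*post_fn)(void *arg);      /* wraps fi_send, fi_recv, ... */
typedef void (*progress_fn)(void *arg);     /* checks Tx and Rx completions */

/*
 * Call post() until it stops returning -DEMO_FI_EAGAIN or max_tries
 * attempts have been made, running progress() after each busy attempt so
 * that completions, acknowledgements and flow control messages get
 * processed.  Returns the last value returned by post().
 */
static ssize_t post_with_retry(post_fn post, progress_fn progress,
                               void *arg, unsigned max_tries)
{
    ssize_t ret = -DEMO_FI_EAGAIN;

    for (unsigned i = 0; i < max_tries && ret == -DEMO_FI_EAGAIN; i++) {
        ret = post(arg);
        if (ret == -DEMO_FI_EAGAIN)
            progress(arg);
    }
    return ret;
}

/* Simulated provider for demonstration: busy for the first 'busy' calls. */
struct demo_state {
    unsigned busy;      /* remaining attempts that will report EAGAIN */
    unsigned polls;     /* number of times progress was driven */
};

static ssize_t demo_post(void *arg)
{
    struct demo_state *s = arg;

    if (s->busy > 0) {
        s->busy--;
        return -DEMO_FI_EAGAIN;
    }
    return 0;
}

static void demo_progress(void *arg)
{
    struct demo_state *s = arg;

    s->polls++;
}
```

       Bounding the number of attempts is a design choice of the sketch; an
       application may prefer to retry indefinitely or to return to its event
       loop and retry later.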

       fi_getinfo(3), fi_endpoint(3), fi_domain(3), fi_cq(3)


Libfabric Programmer's Manual	  2019-09-27			     fi_msg(3)

