.\" Automatically generated by Pandoc 2.9.2.1
.\"
.TH "fi_efa" "7" "2022\-12\-11" "Libfabric Programmer\[cq]s Manual" "#VERSION#"
.hy
.SH NAME
.PP
fi_efa - The Amazon Elastic Fabric Adapter (EFA) Provider
.SH OVERVIEW
.PP
The EFA provider supports the Elastic Fabric Adapter (EFA) device on
Amazon EC2.
EFA provides reliable and unreliable datagram send/receive with direct
hardware access from userspace (OS bypass).
.SH SUPPORTED FEATURES
.PP
The following features are supported:
.TP
\f[I]Endpoint types\f[R]
The provider supports the \f[I]FI_EP_DGRAM\f[R] endpoint type, and the
\f[I]FI_EP_RDM\f[R] endpoint type on a new Scalable (unordered) Reliable
Datagram (SRD) protocol.
SRD provides support for reliable datagrams and more complete error
handling than typically seen with other Reliable Datagram (RD)
implementations.
The EFA provider performs segmentation and reassembly of out-of-order
packets to provide send-after-send ordering guarantees to applications
via its \f[I]FI_EP_RDM\f[R] endpoint.
.TP
\f[I]RDM Endpoint capabilities\f[R]
The following data transfer interfaces are supported via the
\f[I]FI_EP_RDM\f[R] endpoint: \f[I]FI_MSG\f[R], \f[I]FI_TAGGED\f[R], and
\f[I]FI_RMA\f[R].
\f[I]FI_SEND\f[R], \f[I]FI_RECV\f[R], \f[I]FI_DIRECTED_RECV\f[R],
\f[I]FI_MULTI_RECV\f[R], and \f[I]FI_SOURCE\f[R] capabilities are
supported.
The endpoint provides send-after-send guarantees for data operations.
The \f[I]FI_EP_RDM\f[R] endpoint does not have a maximum message size.
.TP
\f[I]DGRAM Endpoint capabilities\f[R]
The DGRAM endpoint supports only the \f[I]FI_MSG\f[R] capability, with a
maximum message size equal to the MTU of the underlying hardware
(approximately 8 KiB).
.TP
\f[I]Address vectors\f[R]
The provider supports \f[I]FI_AV_TABLE\f[R] and \f[I]FI_AV_MAP\f[R]
address vector types.
\f[I]FI_EVENT\f[R] is unsupported.
.TP
\f[I]Completion events\f[R]
The provider supports \f[I]FI_CQ_FORMAT_CONTEXT\f[R],
\f[I]FI_CQ_FORMAT_MSG\f[R], and \f[I]FI_CQ_FORMAT_DATA\f[R].
\f[I]FI_CQ_FORMAT_TAGGED\f[R] is supported on the RDM endpoint.
Wait objects are not currently supported.
.TP
\f[I]Modes\f[R]
The provider requires the use of \f[I]FI_MSG_PREFIX\f[R] when running
over the DGRAM endpoint, and requires \f[I]FI_MR_LOCAL\f[R] for all
memory registrations on the DGRAM endpoint.
.TP
\f[I]Memory registration modes\f[R]
The RDM endpoint does not require memory registration for send and
receive operations, i.e.\ it does not require \f[I]FI_MR_LOCAL\f[R].
Applications may specify \f[I]FI_MR_LOCAL\f[R] in the MR mode flags in
order to use descriptors provided by the application.
The \f[I]FI_EP_DGRAM\f[R] endpoint only supports \f[I]FI_MR_LOCAL\f[R].
.TP
\f[I]Progress\f[R]
RDM and DGRAM endpoints support \f[I]FI_PROGRESS_MANUAL\f[R].
The EFA provider erroneously claims support for
\f[I]FI_PROGRESS_AUTO\f[R], despite not properly supporting automatic
progress.
Unfortunately, some Libfabric consumers also ask for
\f[I]FI_PROGRESS_AUTO\f[R] when they only require
\f[I]FI_PROGRESS_MANUAL\f[R], and fixing this bug would break those
applications.
This will be fixed in a future version of the EFA provider by adding
proper support for \f[I]FI_PROGRESS_AUTO\f[R].
.TP
\f[I]Threading\f[R]
The RDM endpoint supports \f[I]FI_THREAD_SAFE\f[R]; the DGRAM endpoint
supports \f[I]FI_THREAD_DOMAIN\f[R], i.e.\ the provider is not thread
safe when using the DGRAM endpoint.
.SH LIMITATIONS
.PP
The DGRAM endpoint does not support \f[I]FI_ATOMIC\f[R] interfaces.
For RMA operations, completion events for RMA targets
(\f[I]FI_RMA_EVENT\f[R]) are not supported.
The DGRAM endpoint does not fully protect against resource overruns, so
resource management is disabled for this endpoint
(\f[I]FI_RM_DISABLED\f[R]).
.PP
No support for selective completions.
.PP
No support for counters for the DGRAM endpoint.
.PP
No support for inject.
.PP
When using FI_HMEM for either CUDA or Neuron buffers, the provider
requires peer-to-peer transaction support between the EFA device and the
FI_HMEM device.
Therefore, the FI_HMEM_P2P_DISABLED option is not supported by the EFA
provider.
.SH PROVIDER SPECIFIC ENDPOINT LEVEL OPTION
.TP
\f[I]FI_OPT_EFA_RNR_RETRY\f[R]
Defines the number of receiver not ready (RNR) retries.
The application can use it to reset the RNR retry counter by calling
fi_setopt.
Note that this option must be set before the endpoint is enabled;
otherwise, the call will fail.
Also note that this option applies only to the RDM endpoint.
.SH RUNTIME PARAMETERS
.TP
\f[I]FI_EFA_TX_SIZE\f[R]
Maximum number of transmit operations before the provider returns
-FI_EAGAIN.
For the RDM endpoint only, setting this value higher than the default
causes transmit operations to be queued when the transmit queue is
full.
.TP
\f[I]FI_EFA_RX_SIZE\f[R]
Maximum number of receive operations before the provider returns
-FI_EAGAIN.
.TP
\f[I]FI_EFA_TX_IOV_LIMIT\f[R]
Maximum number of IOVs for a transmit operation.
.TP
\f[I]FI_EFA_RX_IOV_LIMIT\f[R]
Maximum number of IOVs for a receive operation.
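.PP
As a minimal sketch, assuming the runtime parameters are supplied to the
provider as environment variables (the usual libfabric convention), an
application could export two of them before its first libfabric call.
The helper name set_efa_params and the values passed to it are
hypothetical and chosen only for illustration.
.IP
.nf
\f[C]
```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <stdlib.h>

/* Hypothetical helper: export EFA runtime parameters for this process.
 * Assumes the provider reads them from the environment, so it should run
 * before the first libfabric call (for example, fi_getinfo).
 * Returns 0 on success, -1 if setenv() fails. */
static int set_efa_params(const char *tx_size, const char *rx_size)
{
    /* The third argument (1) overwrites any value already set. */
    if (setenv("FI_EFA_TX_SIZE", tx_size, 1) != 0)
        return -1;
    if (setenv("FI_EFA_RX_SIZE", rx_size, 1) != 0)
        return -1;
    return 0;
}
```
\f[R]
.fi
.PP
The same effect can be had by setting the variables in the shell that
launches the application, which avoids any ordering concerns inside the
program.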
.SH RUNTIME PARAMETERS SPECIFIC TO RDM ENDPOINT
.PP
These OFI runtime parameters apply only to the RDM endpoint.
.TP
\f[I]FI_EFA_RX_WINDOW_SIZE\f[R]
Maximum number of MTU-sized messages that can be in flight from any
single endpoint as part of a long message data transfer.
.TP
\f[I]FI_EFA_TX_QUEUE_SIZE\f[R]
Depth of transmit queue opened with the NIC.
This may not be set to a value greater than what the NIC supports.
.TP
\f[I]FI_EFA_RECVWIN_SIZE\f[R]
Size of the out-of-order reorder buffer (in messages).
Messages received outside this window will result in an error.
.TP
\f[I]FI_EFA_CQ_SIZE\f[R]
Size of any completion queue (CQ) created, in number of entries.
.TP
\f[I]FI_EFA_MR_CACHE_ENABLE\f[R]
Enables use of the MR cache and in-line registration instead of a bounce
buffer for IOVs larger than max_memcpy_size.
Defaults to true.
When disabled, only a bounce buffer is used.
.TP
\f[I]FI_EFA_MR_MAX_CACHED_COUNT\f[R]
Sets the maximum number of memory registrations that can be cached at
any time.
.TP
\f[I]FI_EFA_MR_MAX_CACHED_SIZE\f[R]
Sets the maximum amount of memory that cached memory registrations can
hold onto at any time.
.TP
\f[I]FI_EFA_MAX_MEMCPY_SIZE\f[R]
Threshold size for switching between a memory copy into a pre-registered
bounce buffer and memory registration of the user buffer.
.TP
\f[I]FI_EFA_MTU_SIZE\f[R]
Overrides the default MTU size of the device.
.TP
\f[I]FI_EFA_RX_COPY_UNEXP\f[R]
Enables the use of a separate pool of bounce-buffers to copy unexpected
messages out of the pre-posted receive buffers.
.TP
\f[I]FI_EFA_RX_COPY_OOO\f[R]
Enables the use of a separate pool of bounce-buffers to copy
out-of-order RTS packets out of the pre-posted receive buffers.
.TP
\f[I]FI_EFA_MAX_TIMEOUT\f[R]
Maximum timeout (in microseconds) for backoff to a peer after a receiver
not ready (RNR) error.
.TP
\f[I]FI_EFA_TIMEOUT_INTERVAL\f[R]
Time interval (in microseconds) for the base timeout to use for
exponential backoff to a peer after a receiver not ready (RNR) error.
.TP
\f[I]FI_EFA_ENABLE_SHM_TRANSFER\f[R]
Enables the SHM provider to handle communication between all intra-node
processes.
SHM transfer is disabled when \f[C]ptrace protection\f[R] is turned on.
Turn ptrace protection off to enable SHM transfer.
.TP
\f[I]FI_EFA_SHM_AV_SIZE\f[R]
Defines the maximum number of entries in the SHM provider\[cq]s address
vector.
.TP
\f[I]FI_EFA_SHM_MAX_MEDIUM_SIZE\f[R]
Defines the switch point between small/medium messages and large
messages.
Messages larger than this switch point will be transferred with the
large message protocol.
NOTE: This parameter is now deprecated.
.TP
\f[I]FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE\f[R]
The maximum size for inter EFA messages to be sent using the medium
message protocol.
Messages that fit in one packet will be sent as eager messages.
Messages whose sizes are smaller than this value will be sent using the
medium message protocol.
Other messages will be sent using the CTS-based long message protocol.
.TP
\f[I]FI_EFA_FORK_SAFE\f[R]
Enable fork() support.
This may have a small performance impact and should only be set when
required.
Applications that need to register regions backed by huge pages and
also need fork support are not supported.
.TP
\f[I]FI_EFA_RUNT_SIZE\f[R]
The maximum number of bytes that will be eagerly sent by in-flight
messages that use the runting read message protocol (Default: 307200).
.TP
\f[I]FI_EFA_SET_CUDA_SYNC_MEMOPS\f[R]
Sets CU_POINTER_ATTRIBUTE_SYNC_MEMOPS for CUDA pointers.
(Default: 1)
.TP
\f[I]FI_EFA_INTER_MIN_READ_MESSAGE_SIZE\f[R]
The minimum message size in bytes for inter EFA read message protocol.
If the instance supports RDMA read, messages whose size is larger than
this value will be sent using the read message protocol.
(Default: 1048576)
.TP
\f[I]FI_EFA_INTER_MIN_READ_WRITE_SIZE\f[R]
The minimum message size for inter EFA write to use the read write
protocol.
If the firmware supports RDMA read, and FI_EFA_USE_DEVICE_RDMA is 1,
write requests whose size is larger than this value will use the read
write protocol (Default: 65536).
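.PP
The size-based protocol choice described under
\f[I]FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE\f[R] can be sketched as
follows.
This is an illustration only, not the provider\[cq]s implementation: the
enum and function names are hypothetical, the read-based protocols are
left out, and treating a message of exactly one packet\[cq]s payload as
eager is an assumption.
.IP
.nf
\f[C]
```c
#include <stddef.h>

/* Protocol names for this sketch only; not identifiers from the provider. */
enum sketch_protocol {
    SKETCH_EAGER,    /* message fits in one packet */
    SKETCH_MEDIUM,   /* smaller than the medium-message limit */
    SKETCH_LONG_CTS  /* everything else: CTS-based long message protocol */
};

/*
 * Hypothetical selection function.
 * max_packet_payload stands in for the number of message bytes that fit
 * in one packet; max_medium_size stands in for the value of
 * FI_EFA_INTER_MAX_MEDIUM_MESSAGE_SIZE.
 */
static enum sketch_protocol
sketch_select_protocol(size_t msg_size, size_t max_packet_payload,
                       size_t max_medium_size)
{
    if (msg_size <= max_packet_payload)
        return SKETCH_EAGER;
    if (msg_size < max_medium_size)
        return SKETCH_MEDIUM;
    return SKETCH_LONG_CTS;
}
```
\f[R]
.fi
.PP
Under these assumptions, raising the medium-message limit moves more
messages from the long message protocol to the medium message protocol,
while messages that fit in one packet remain eager.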
.SH SEE ALSO
.PP
\f[C]fabric\f[R](7), \f[C]fi_provider\f[R](7), \f[C]fi_getinfo\f[R](3)
.SH AUTHORS
OpenFabrics.