.TH "hwlocality_memattrs" 3 "Fri Dec 16 2022" "Version 2.9.0" "Hardware Locality (hwloc)" \" -*- nroff -*- .ad l .nh .SH NAME hwlocality_memattrs \- Comparing memory node attributes for finding where to allocate on .SH SYNOPSIS .br .PP .SS "Data Structures" .in +1c .ti -1c .RI "struct \fBhwloc_location\fP" .br .in -1c .SS "Typedefs" .in +1c .ti -1c .RI "typedef unsigned \fBhwloc_memattr_id_t\fP" .br .in -1c .SS "Enumerations" .in +1c .ti -1c .RI "enum \fBhwloc_memattr_id_e\fP { \fBHWLOC_MEMATTR_ID_CAPACITY\fP, \fBHWLOC_MEMATTR_ID_LOCALITY\fP, \fBHWLOC_MEMATTR_ID_BANDWIDTH\fP, \fBHWLOC_MEMATTR_ID_READ_BANDWIDTH\fP, \fBHWLOC_MEMATTR_ID_WRITE_BANDWIDTH\fP, \fBHWLOC_MEMATTR_ID_LATENCY\fP, \fBHWLOC_MEMATTR_ID_READ_LATENCY\fP, \fBHWLOC_MEMATTR_ID_WRITE_LATENCY\fP, \fBHWLOC_MEMATTR_ID_MAX\fP }" .br .ti -1c .RI "enum \fBhwloc_location_type_e\fP { \fBHWLOC_LOCATION_TYPE_CPUSET\fP, \fBHWLOC_LOCATION_TYPE_OBJECT\fP }" .br .ti -1c .RI "enum \fBhwloc_local_numanode_flag_e\fP { \fBHWLOC_LOCAL_NUMANODE_FLAG_LARGER_LOCALITY\fP, \fBHWLOC_LOCAL_NUMANODE_FLAG_SMALLER_LOCALITY\fP, \fBHWLOC_LOCAL_NUMANODE_FLAG_ALL\fP }" .br .in -1c .SS "Functions" .in +1c .ti -1c .RI "int \fBhwloc_memattr_get_by_name\fP (\fBhwloc_topology_t\fP topology, const char *name, \fBhwloc_memattr_id_t\fP *id)" .br .ti -1c .RI "int \fBhwloc_get_local_numanode_objs\fP (\fBhwloc_topology_t\fP topology, struct \fBhwloc_location\fP *location, unsigned *nr, \fBhwloc_obj_t\fP *nodes, unsigned long flags)" .br .ti -1c .RI "int \fBhwloc_memattr_get_value\fP (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, \fBhwloc_obj_t\fP target_node, struct \fBhwloc_location\fP *initiator, unsigned long flags, hwloc_uint64_t *value)" .br .ti -1c .RI "int \fBhwloc_memattr_get_best_target\fP (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, struct \fBhwloc_location\fP *initiator, unsigned long flags, \fBhwloc_obj_t\fP *best_target, hwloc_uint64_t *value)" .br .ti -1c .RI "int \fBhwloc_memattr_get_best_initiator\fP (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, \fBhwloc_obj_t\fP target, unsigned long flags, struct \fBhwloc_location\fP *best_initiator, hwloc_uint64_t *value)" .br .in -1c .SH "Detailed Description" .PP Platforms with heterogeneous memory require ways to decide whether a buffer should be allocated on 'fast' memory (such as HBM), 'normal' memory (DDR) or even 'slow' but large-capacity memory (non-volatile memory)\&. These memory nodes are called 'Targets' while the CPU accessing them is called the 'Initiator'\&. Access performance depends on their locality (NUMA platforms) as well as the intrinsic performance of the targets (heterogeneous platforms)\&. .PP The following attributes describe the performance of memory accesses from an Initiator to a memory Target, for instance their latency or bandwidth\&. Initiators performing these memory accesses are usually some PUs or Cores (described as a CPU set)\&. Hence a Core may choose where to allocate a memory buffer by comparing the attributes of different target memory nodes nearby\&. .PP There are also some attributes that are system-wide\&. Their value does not depend on a specific initiator performing an access\&. The memory node Capacity is an example of such attribute without initiator\&. .PP One way to use this API is to start with a cpuset describing the Cores where a program is bound\&. The best target NUMA node for allocating memory in this program on these Cores may be obtained by passing this cpuset as an initiator to \fBhwloc_memattr_get_best_target()\fP with the relevant memory attribute\&. For instance, if the code is latency limited, use the Latency attribute\&. .PP A more flexible approach consists in getting the list of local NUMA nodes by passing this cpuset to \fBhwloc_get_local_numanode_objs()\fP\&. Attribute values for these nodes, if any, may then be obtained with \fBhwloc_memattr_get_value()\fP and manually compared with the desired criteria\&. .PP \fBSee also\fP .RS 4 An example is available in doc/examples/memory-attributes\&.c in the source tree\&. .RE .PP \fBNote\fP .RS 4 The API also supports specific objects as initiator, but it is currently not used internally by hwloc\&. Users may for instance use it to provide custom performance values for host memory accesses performed by GPUs\&. .PP The interface actually also accepts targets that are not NUMA nodes\&. .RE .PP .SH "Typedef Documentation" .PP .SS "typedef unsigned \fBhwloc_memattr_id_t\fP" .PP A memory attribute identifier\&. May be either one of \fBhwloc_memattr_id_e\fP or a new id returned by \fBhwloc_memattr_register()\fP\&. .SH "Enumeration Type Documentation" .PP .SS "enum \fBhwloc_local_numanode_flag_e\fP" .PP Flags for selecting target NUMA nodes\&. .PP \fBEnumerator\fP .in +1c .TP \fB\fIHWLOC_LOCAL_NUMANODE_FLAG_LARGER_LOCALITY \fP\fP Select NUMA nodes whose locality is larger than the given cpuset\&. For instance, if a single PU (or its cpuset) is given in \fCinitiator\fP, select all nodes close to the package that contains this PU\&. .TP \fB\fIHWLOC_LOCAL_NUMANODE_FLAG_SMALLER_LOCALITY \fP\fP Select NUMA nodes whose locality is smaller than the given cpuset\&. For instance, if a package (or its cpuset) is given in \fCinitiator\fP, also select nodes that are attached to only a half of that package\&. .TP \fB\fIHWLOC_LOCAL_NUMANODE_FLAG_ALL \fP\fP Select all NUMA nodes in the topology\&. The initiator \fCinitiator\fP is ignored\&. .SS "enum \fBhwloc_location_type_e\fP" .PP Type of location\&. .PP \fBEnumerator\fP .in +1c .TP \fB\fIHWLOC_LOCATION_TYPE_CPUSET \fP\fP Location is given as a cpuset, in the location cpuset union field\&. .TP \fB\fIHWLOC_LOCATION_TYPE_OBJECT \fP\fP Location is given as an object, in the location object union field\&. .SS "enum \fBhwloc_memattr_id_e\fP" .PP Memory node attributes\&. .PP \fBEnumerator\fP .in +1c .TP \fB\fIHWLOC_MEMATTR_ID_CAPACITY \fP\fP The "Capacity" is returned in bytes (local_memory attribute in objects)\&. Best capacity nodes are nodes with \fBhigher capacity\fP\&. .PP No initiator is involved when looking at this attribute\&. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP\&. .TP \fB\fIHWLOC_MEMATTR_ID_LOCALITY \fP\fP The "Locality" is returned as the number of PUs in that locality (e\&.g\&. the weight of its cpuset)\&. Best locality nodes are nodes with \fBsmaller locality\fP (nodes that are local to very few PUs)\&. Poor locality nodes are nodes with larger locality (nodes that are local to the entire machine)\&. .PP No initiator is involved when looking at this attribute\&. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP\&. .TP \fB\fIHWLOC_MEMATTR_ID_BANDWIDTH \fP\fP The "Bandwidth" is returned in MiB/s, as seen from the given initiator location\&. Best bandwidth nodes are nodes with \fBhigher bandwidth\fP\&. .PP The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&. .PP This is the average bandwidth for read and write accesses\&. If the platform provides individual read and write bandwidths but no explicit average value, hwloc computes and returns the average\&. .TP \fB\fIHWLOC_MEMATTR_ID_READ_BANDWIDTH \fP\fP The "ReadBandwidth" is returned in MiB/s, as seen from the given initiator location\&. Best bandwidth nodes are nodes with \fBhigher bandwidth\fP\&. .PP The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&. .TP \fB\fIHWLOC_MEMATTR_ID_WRITE_BANDWIDTH \fP\fP The "WriteBandwidth" is returned in MiB/s, as seen from the given initiator location\&. Best bandwidth nodes are nodes with \fBhigher bandwidth\fP\&. .PP The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&. .TP \fB\fIHWLOC_MEMATTR_ID_LATENCY \fP\fP The "Latency" is returned as nanoseconds, as seen from the given initiator location\&. Best latency nodes are nodes with \fBsmaller latency\fP\&. .PP The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_LOWER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&. .PP This is the average latency for read and write accesses\&. If the platform provides individual read and write latencies but no explicit average value, hwloc computes and returns the average\&. .TP \fB\fIHWLOC_MEMATTR_ID_READ_LATENCY \fP\fP The "ReadLatency" is returned as nanoseconds, as seen from the given initiator location\&. Best latency nodes are nodes with \fBsmaller latency\fP\&. .PP The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_LOWER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&. .TP \fB\fIHWLOC_MEMATTR_ID_WRITE_LATENCY \fP\fP The "WriteLatency" is returned as nanoseconds, as seen from the given initiator location\&. Best latency nodes are nodes with \fBsmaller latency\fP\&. .PP The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_LOWER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&. .SH "Function Documentation" .PP .SS "int hwloc_get_local_numanode_objs (\fBhwloc_topology_t\fP topology, struct \fBhwloc_location\fP * location, unsigned * nr, \fBhwloc_obj_t\fP * nodes, unsigned long flags)" .PP Return an array of local NUMA nodes\&. By default only select the NUMA nodes whose locality is exactly the given \fClocation\fP\&. More nodes may be selected if additional flags are given as a OR'ed set of \fBhwloc_local_numanode_flag_e\fP\&. .PP If \fClocation\fP is given as an explicit object, its CPU set is used to find NUMA nodes with the corresponding locality\&. If the object does not have a CPU set (e\&.g\&. I/O object), the CPU parent (where the I/O object is attached) is used\&. .PP On input, \fCnr\fP points to the number of nodes that may be stored in the \fCnodes\fP array\&. On output, \fCnr\fP will be changed to the number of stored nodes, or the number of nodes that would have been stored if there were enough room\&. .PP \fBNote\fP .RS 4 Some of these NUMA nodes may not have any memory attribute values and hence not be reported as actual targets in other functions\&. .PP The number of NUMA nodes in the topology (obtained by \fBhwloc_bitmap_weight()\fP on the root object nodeset) may be used to allocate the \fCnodes\fP array\&. .PP When an object CPU set is given as locality, for instance a Package, and when flags contain both \fBHWLOC_LOCAL_NUMANODE_FLAG_LARGER_LOCALITY\fP and \fBHWLOC_LOCAL_NUMANODE_FLAG_SMALLER_LOCALITY\fP, the returned array corresponds to the nodeset of that object\&. .RE .PP .SS "int hwloc_memattr_get_best_initiator (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, \fBhwloc_obj_t\fP target, unsigned long flags, struct \fBhwloc_location\fP * best_initiator, hwloc_uint64_t * value)" .PP Return the best initiator for the given attribute and target NUMA node\&. If the attribute does not relate to a specific initiator (it does not have the flag \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP), \fC-1\fP is returned and \fCerrno\fP is set to \fCEINVAL\fP\&. .PP If \fCvalue\fP is non \fCNULL\fP, the corresponding value is returned there\&. .PP If multiple initiators have the same attribute values, only one is returned (and there is no way to clarify how that one is chosen)\&. Applications that want to detect initiators with identical/similar values, or that want to look at values for multiple attributes, should rather get all values using \fBhwloc_memattr_get_value()\fP and manually select the initiator they consider the best\&. .PP The returned initiator should not be modified or freed, it belongs to the topology\&. .PP \fCflags\fP must be \fC0\fP for now\&. .PP If there are no matching initiators, \fC-1\fP is returned with \fCerrno\fP set to \fCENOENT\fP; .SS "int hwloc_memattr_get_best_target (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, struct \fBhwloc_location\fP * initiator, unsigned long flags, \fBhwloc_obj_t\fP * best_target, hwloc_uint64_t * value)" .PP Return the best target NUMA node for the given attribute and initiator\&. If the attribute does not relate to a specific initiator (it does not have the flag \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP), location \fCinitiator\fP is ignored and may be \fCNULL\fP\&. .PP If \fCvalue\fP is non \fCNULL\fP, the corresponding value is returned there\&. .PP If multiple targets have the same attribute values, only one is returned (and there is no way to clarify how that one is chosen)\&. Applications that want to detect targets with identical/similar values, or that want to look at values for multiple attributes, should rather get all values using \fBhwloc_memattr_get_value()\fP and manually select the target they consider the best\&. .PP \fCflags\fP must be \fC0\fP for now\&. .PP If there are no matching targets, \fC-1\fP is returned with \fCerrno\fP set to \fCENOENT\fP; .PP \fBNote\fP .RS 4 The initiator \fCinitiator\fP should be of type \fBHWLOC_LOCATION_TYPE_CPUSET\fP when refering to accesses performed by CPU cores\&. \fBHWLOC_LOCATION_TYPE_OBJECT\fP is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs\&. .RE .PP .SS "int hwloc_memattr_get_by_name (\fBhwloc_topology_t\fP topology, const char * name, \fBhwloc_memattr_id_t\fP * id)" .PP Return the identifier of the memory attribute with the given name\&. .SS "int hwloc_memattr_get_value (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, \fBhwloc_obj_t\fP target_node, struct \fBhwloc_location\fP * initiator, unsigned long flags, hwloc_uint64_t * value)" .PP Return an attribute value for a specific target NUMA node\&. If the attribute does not relate to a specific initiator (it does not have the flag \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP), location \fCinitiator\fP is ignored and may be \fCNULL\fP\&. .PP \fCflags\fP must be \fC0\fP for now\&. .PP \fBNote\fP .RS 4 The initiator \fCinitiator\fP should be of type \fBHWLOC_LOCATION_TYPE_CPUSET\fP when refering to accesses performed by CPU cores\&. \fBHWLOC_LOCATION_TYPE_OBJECT\fP is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs\&. .RE .PP .SH "Author" .PP Generated automatically by Doxygen for Hardware Locality (hwloc) from the source code\&.