NAME
libedac - EDAC error reporting library
SYNOPSIS
#include <edac.h>
cc ... -ledac
edac_handle * edac_handle_create (void);
void edac_handle_destroy (edac_handle *edac);
int edac_handle_init (edac_handle *edac);
unsigned int edac_mc_count (edac_handle *edac);
int edac_handle_reset (edac_handle *edac);
int edac_error_totals (edac_handle *edac, struct edac_totals *totals);
edac_mc * edac_next_mc (edac_handle *edac);
int edac_mc_get_info (edac_mc *mc, struct edac_mc_info *info);
edac_mc *edac_next_mc_info (edac_handle *edac,
struct edac_mc_info *info);
int edac_mc_reset (edac_mc *mc);
edac_csrow * edac_next_csrow (edac_mc *mc);
int edac_csrow_get_info (edac_csrow *csrow,
struct edac_csrow_info *info);
edac_csrow * edac_next_csrow_info (edac_mc *mc,
struct edac_csrow_info *info);
const char * edac_strerror (edac_handle *edac);
edac_for_each_mc_info (edac_handle *edac, edac_mc *mc,
struct edac_mc_info info) { ... }
edac_for_each_csrow_info (edac_mc *mc, edac_csrow *csrow,
struct edac_csrow_info info) { ... }
DESCRIPTION
The libedac library offers a simple programming interface to the information
that in-kernel EDAC (Error Detection and Correction) drivers export in sysfs.
The edac-util(8) utility uses libedac to report errors in a user-friendly
manner from the command line.
On most systems, EDAC errors are recorded in sysfs on a per-memory-controller
(MC) basis, and each memory controller is further subdivided into csrows and
channels. The libedac library provides a way to loop through multiple MCs and
their csrows, reading information about each component from sysfs along the
way. A single call is also provided to retrieve the total error counts for the
machine.
To use libedac, an edac_handle must first be created with
edac_handle_create(). Once the handle is created, sysfs data can be loaded
into it with edac_handle_init(). A final call to edac_handle_destroy() frees
all memory and open files associated with the handle.
edac_handle_create() returns NULL if it fails to allocate memory.
The edac_strerror() function returns a descriptive string for the last error
that occurred on the libedac handle edac.
The edac_error_totals() function returns the total counts of memory and PCI
errors in the totals structure passed to it. This structure is of type
struct edac_totals, which has the form:
struct edac_totals {
    unsigned int ce_total;          /* Total corrected errors */
    unsigned int ue_total;          /* Total uncorrected errors */
    unsigned int pci_parity_total;  /* Total PCI parity errors */
};
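A common use of the totals is a quick health check. The sketch below classifies a filled-in struct edac_totals into "no errors", "corrected only", or "uncorrected or PCI parity errors present". It is a hypothetical helper, not part of libedac: the struct is a local stand-in mirroring the layout documented above (real code should include <edac.h> instead), and counting PCI parity errors as serious is a policy assumption of this sketch.

```c
/* Local stand-in mirroring the documented struct edac_totals.
 * Assumption: in real code, include <edac.h> instead of redefining it. */
struct edac_totals {
    unsigned int ce_total;          /* Total corrected errors */
    unsigned int ue_total;          /* Total uncorrected errors */
    unsigned int pci_parity_total;  /* Total PCI parity errors */
};

/* Hypothetical helper (not part of libedac).
 * Returns 2 if any uncorrected memory error or PCI parity error was
 * counted (a policy choice of this sketch), 1 if only corrected errors
 * were counted, and 0 if every count is zero. */
int totals_severity (const struct edac_totals *t)
{
    if (t->ue_total > 0 || t->pci_parity_total > 0)
        return 2;
    if (t->ce_total > 0)
        return 1;
    return 0;
}
```

In a program linked with -ledac, the structure would be filled by edac_error_totals (edac, &totals) on an initialized handle before calling the helper.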
Systems may have one or more memory controllers (MCs) with EDAC information.
The number of MCs detected by EDAC drivers can be queried with
edac_mc_count(), while edac_next_mc() returns a handle to the next memory
controller in the libedac handle's internal list. This memory controller is
represented by the opaque edac_mc type. edac_next_mc() returns NULL when there
are no further memory controllers to return. Thus the following code is
another way to count all EDAC MCs (assuming the libedac handle edac has
already been initialized):
int i = 0;
edac_mc *mc;
edac_handle_reset (edac);   /* start again from the first MC */
while ((mc = edac_next_mc (edac)))
    i++;
return (i);
To query information about an edac_mc, use the edac_mc_get_info() function.
It fills in the given info structure, which is of type struct edac_mc_info:
struct edac_mc_info {
    char id[];                     /* Id of memory controller */
    char mc_name[];                /* Name of MC */
    unsigned int size_mb;          /* Amount of RAM in MB */
    unsigned int ce_count;         /* Corrected error count */
    unsigned int ce_noinfo_count;  /* Corrected errors with no DIMM info */
    unsigned int ue_count;         /* Uncorrected error count */
    unsigned int ue_noinfo_count;  /* Uncorrected errors with no DIMM info */
};
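As an illustration of the fields above, the sketch below formats one struct edac_mc_info as a single summary line. The struct is a local stand-in: the page shows the name arrays as id[] and mc_name[] without a size, so MC_NAME_LEN is an assumed value, and format_mc_info() is a hypothetical helper rather than a libedac function.

```c
#include <stdio.h>

#define MC_NAME_LEN 64  /* assumed size; the real <edac.h> defines its own */

/* Local stand-in mirroring the documented struct edac_mc_info. */
struct edac_mc_info {
    char id[MC_NAME_LEN];          /* Id of memory controller */
    char mc_name[MC_NAME_LEN];     /* Name of MC */
    unsigned int size_mb;          /* Amount of RAM in MB */
    unsigned int ce_count;         /* Corrected error count */
    unsigned int ce_noinfo_count;  /* Corrected errors with no DIMM info */
    unsigned int ue_count;         /* Uncorrected error count */
    unsigned int ue_noinfo_count;  /* Uncorrected errors with no DIMM info */
};

/* Hypothetical helper: write a one-line summary of an MC into buf.
 * Like snprintf(), it never overflows buf and returns the length the
 * full line would have needed, so truncation can be detected. */
int format_mc_info (const struct edac_mc_info *info, char *buf, size_t len)
{
    return snprintf (buf, len,
                     "%s (%s): %u MB, CE=%u (noinfo %u), UE=%u (noinfo %u)",
                     info->id, info->mc_name, info->size_mb,
                     info->ce_count, info->ce_noinfo_count,
                     info->ue_count, info->ue_noinfo_count);
}
```

With the real library, info would come from edac_mc_get_info() or edac_next_mc_info(), and the stand-in definition would be dropped in favor of <edac.h>.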
The edac_next_mc_info() function can be used to loop through all EDAC memory
controllers and obtain MC information in a single call; it combines
edac_next_mc() and edac_mc_get_info().
The edac_handle_reset() function resets the internal memory controller
iterator in the libedac handle, so a subsequent call to edac_next_mc()
returns the first EDAC MC again.
A convenience macro, edac_for_each_mc_info(), defines a for loop that iterates
through all memory controller objects for a given EDAC handle, returning the
MC information in the info structure on each iteration. For example (assuming
an initialized libedac handle edac):
edac_mc *mc;
struct edac_mc_info info;
int count = 0;
edac_for_each_mc_info (edac, mc, info) {
    count++;
    printf ("MC info: id=%s name=%s\n", info.id, info.mc_name);
}
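To show how a for-loop macro can sit on top of a "reset, then next until NULL" iterator, here is a self-contained toy model. It is not libedac's implementation: struct toy_handle, toy_handle_reset(), toy_next_mc_info() and toy_for_each_mc_info() are hypothetical stand-ins that only mimic the documented behavior of edac_handle_reset() and edac_next_mc_info(). The sketch also assumes the macro should always start from the first MC, so it resets the cursor before the first fetch.

```c
#include <stddef.h>

/* Hypothetical per-MC information: just an id in this toy model. */
struct toy_mc_info { int id; };

/* Hypothetical handle: a caller-owned list of MC ids plus a cursor. */
struct toy_handle {
    const int *ids;   /* MC ids owned by the caller */
    size_t count;     /* number of MCs */
    size_t cursor;    /* index of the next MC to return */
};

/* Rewind the cursor so the next call returns the first MC again. */
void toy_handle_reset (struct toy_handle *h)
{
    h->cursor = 0;
}

/* Fill *info for the next MC and return a non-NULL token for it,
 * or return NULL once every MC has been returned. */
const int *toy_next_mc_info (struct toy_handle *h, struct toy_mc_info *info)
{
    if (h->cursor >= h->count)
        return NULL;
    info->id = h->ids[h->cursor];
    return &h->ids[h->cursor++];
}

/* For-loop macro: reset, fetch the first MC, and keep fetching until
 * the iterator returns NULL. info is passed by name, not by pointer. */
#define toy_for_each_mc_info(h, mc, info)                          \
    for (toy_handle_reset (h), (mc) = toy_next_mc_info ((h), &(info)); \
         (mc) != NULL;                                              \
         (mc) = toy_next_mc_info ((h), &(info)))
```

Because the macro resets before iterating, running it twice visits every MC both times, even if an earlier loop stopped partway through the list.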
Each EDAC memory controller may have one or more csrows associated with it.
Similar to the MC iterator functions described above, the edac_next_csrow()
function allows libedac users to loop through all csrows within a given MC.
Once the last csrow has been returned, the function returns NULL.
The edac_csrow_get_info() function fills in information about an edac_csrow
in the given edac_csrow_info structure, which has the contents:
struct edac_csrow_info {
    char id[];                /* CSROW Identity (e.g. csrow0) */
    unsigned int size_mb;     /* CSROW size in MB */
    unsigned int ce_count;    /* Total corrected errors */
    unsigned int ue_count;    /* Total uncorrected errors */
    struct edac_channel channel[EDAC_MAX_CHANNELS];
};
struct edac_channel {
    int valid;                /* Is this channel valid */
    unsigned int ce_count;    /* Corrected error count */
    int dimm_label_valid;     /* Is DIMM label valid? */
    char dimm_label[];        /* DIMM name */
};
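Since only some entries of channel[] are populated, code that walks the channels should check the valid flag. The sketch below sums the corrected-error counts of the valid channels in one csrow. The structs are local stand-ins for the documented ones (channel must be declared before the csrow struct for this to compile), EDAC_MAX_CHANNELS and NAME_LEN are assumed values, and csrow_channel_ce_sum() is a hypothetical helper. Whether the channel counts add up to the csrow's own ce_count is not stated on this page, so the helper only reports the sum.

```c
#ifndef EDAC_MAX_CHANNELS
#define EDAC_MAX_CHANNELS 4   /* assumed value; use the one from <edac.h> */
#endif
#define NAME_LEN 64           /* assumed size for id and dimm_label */

/* Local stand-ins mirroring the documented structures. */
struct edac_channel {
    int valid;                /* Is this channel valid */
    unsigned int ce_count;    /* Corrected error count */
    int dimm_label_valid;     /* Is DIMM label valid? */
    char dimm_label[NAME_LEN];/* DIMM name */
};

struct edac_csrow_info {
    char id[NAME_LEN];        /* CSROW Identity (e.g. csrow0) */
    unsigned int size_mb;     /* CSROW size in MB */
    unsigned int ce_count;    /* Total corrected errors */
    unsigned int ue_count;    /* Total uncorrected errors */
    struct edac_channel channel[EDAC_MAX_CHANNELS];
};

/* Hypothetical helper: sum ce_count over the valid channels of a
 * csrow and store the number of valid channels in *nvalid. */
unsigned int csrow_channel_ce_sum (const struct edac_csrow_info *csi,
                                   int *nvalid)
{
    unsigned int sum = 0;
    int i;

    *nvalid = 0;
    for (i = 0; i < EDAC_MAX_CHANNELS; i++) {
        if (!csi->channel[i].valid)
            continue;   /* skip channels that are not populated */
        sum += csi->channel[i].ce_count;
        (*nvalid)++;
    }
    return sum;
}
```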
The edac_next_csrow_info() function combines edac_next_csrow() and
edac_csrow_get_info() for convenience.
The edac_mc_reset() function resets the internal csrow iterator of an
edac_mc.
A convenience macro, edac_for_each_csrow_info(), defines a for loop that
iterates through all csrow objects in an EDAC memory controller, returning the
csrow information in the info structure on each iteration.
EXAMPLES
Initialize a libedac handle:
edac_handle *edac;
if (!(edac = edac_handle_create ())) {
    fprintf (stderr, "edac_handle_create: Out of memory!\n");
    exit (1);
}
if (edac_handle_init (edac) < 0) {
    fprintf (stderr, "Unable to get EDAC data: %s\n",
             edac_strerror (edac));
    exit (1);
}
/* edac_mc_count() returns unsigned int, so use %u */
printf ("EDAC initialized with %u MCs\n", edac_mc_count (edac));
edac_handle_destroy (edac);
Report all DIMM labels for MC:csrow:channel combinations (assuming an
initialized libedac handle edac):
edac_mc *mc;
edac_csrow *csrow;
struct edac_mc_info mci;
struct edac_csrow_info csi;
edac_for_each_mc_info (edac, mc, mci) {
    edac_for_each_csrow_info (mc, csrow, csi) {
        const char *label[2] = { "unset", "unset" };
        if (csi.channel[0].dimm_label_valid)
            label[0] = csi.channel[0].dimm_label;
        if (csi.channel[1].dimm_label_valid)
            label[1] = csi.channel[1].dimm_label;
        printf ("%s:%s:ch0 = %s\n", mci.id, csi.id, label[0]);
        printf ("%s:%s:ch1 = %s\n", mci.id, csi.id, label[1]);
    }
}
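The example above hard-codes two channels. A variant that loops over all EDAC_MAX_CHANNELS entries and skips channels whose valid flag is zero could look like the sketch below. It is hypothetical: struct edac_channel is a local stand-in for the documented structure, EDAC_MAX_CHANNELS and DIMM_LABEL_LEN are assumed values, and channel_label() and report_labels() are illustrative helpers, not libedac functions.

```c
#include <stdio.h>

#ifndef EDAC_MAX_CHANNELS
#define EDAC_MAX_CHANNELS 4   /* assumed value; use the one from <edac.h> */
#endif
#define DIMM_LABEL_LEN 64     /* assumed size of dimm_label */

/* Local stand-in mirroring the documented struct edac_channel. */
struct edac_channel {
    int valid;                        /* Is this channel valid */
    unsigned int ce_count;            /* Corrected error count */
    int dimm_label_valid;             /* Is DIMM label valid? */
    char dimm_label[DIMM_LABEL_LEN];  /* DIMM name */
};

/* Return the channel's DIMM label, or "unset" if it is not valid,
 * the same fallback used in the example above. */
const char *channel_label (const struct edac_channel *ch)
{
    return ch->dimm_label_valid ? ch->dimm_label : "unset";
}

/* Print "mc:csrow:chN = label" for every valid channel of one csrow
 * and return the number of lines printed. */
int report_labels (FILE *fp, const char *mc_id, const char *csrow_id,
                   const struct edac_channel ch[EDAC_MAX_CHANNELS])
{
    int i, printed = 0;

    for (i = 0; i < EDAC_MAX_CHANNELS; i++) {
        if (!ch[i].valid)
            continue;   /* unpopulated channel: nothing to report */
        fprintf (fp, "%s:%s:ch%d = %s\n", mc_id, csrow_id, i,
                 channel_label (&ch[i]));
        printed++;
    }
    return printed;
}
```

Inside the edac_for_each_csrow_info() loop, the call would be report_labels (stdout, mci.id, csi.id, csi.channel), with the stand-in definitions replaced by <edac.h>.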
SEE ALSO
edac-util(8),
edac-ctl(8)