NAME
lamtrace - Unload LAM trace data.
SYNOPSIS
lamtrace [-hkvR] [-mpi] [-l listno] [-f #secs] [filename]
[nodes] [processes]
OPTIONS
- -h
- Print useful information on this command.
- -k
- Copy and do not remove trace data.
- -v
- Be verbose.
- -R
- Delete all trace data from the specified nodes.
- -l listno
- Unload only from the given list number.
- -mpi
- Unload trace data for an MPI application.
- -f #secs
- Signal target processes to flush trace data to the daemon. Then wait #secs
before unloading.
- filename
- Place trace data into this file (default: def.lamtr).
DESCRIPTION
The -t option of mpirun(1) and loadgo(1) allows the application to generate
execution traces. These traces are first stored in a buffer within each
application process. When the buffer fills, and again when the application
terminates, the buffer is flushed to the trace daemon (a structural
component within the LAM daemon). The trace daemon collects data only up
to a pre-compiled limit. Beyond this limit, the oldest traces are
discarded in favor of newer ones.
After an application has finished, the record of its execution is stored in the
trace daemon of each node that was running the application. The lamtrace
command can be used to retrieve these traces and store them in one file for
display by a performance visualization tool, such as xmpi(1). If the
application was started by xmpi(1), lamtrace is not normally needed, as the
equivalent functionality is invoked with a button.
Incomplete trace data can be unloaded while the application is running. The
output file must not exist prior to invoking lamtrace. This is a good
situation to use the -k option, which preserves the trace daemon's contents
after unloading. Each subsequent unload will then get the entire run's trace
data up to the present time.
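A minimal sketch of such a snapshot follows. The helper name snapshot_traces,
the LAMTRACE override variable and the timestamped file name are illustrative
assumptions, not part of LAM; only the -k option and the requirement that the
output file not already exist come from this page.

```shell
#!/bin/sh
# LAMTRACE can be overridden for a dry run; it is a convention of this
# sketch, not something lamtrace itself reads.
LAMTRACE=${LAMTRACE:-lamtrace}

# snapshot_traces FILE
# Hypothetical helper: copy the traces collected so far into FILE.
snapshot_traces() {
    outfile=$1
    # The output file must not exist before lamtrace is invoked.
    if [ -e "$outfile" ]; then
        echo "snapshot_traces: $outfile already exists" >&2
        return 1
    fi
    # -k copies the trace data and leaves it in the trace daemons, so a
    # later snapshot again covers the run from its start.
    "$LAMTRACE" -k "$outfile"
}

# Example call (the file name pattern is only a suggestion):
#   snapshot_traces "run-$(date +%Y%m%d-%H%M%S).lamtr"
```

Giving every snapshot a fresh name sidesteps the restriction on existing
output files without deleting earlier snapshots.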
A running process is likely to be holding the most recent trace data in an
internal buffer. A standard LAM signal, LAM_SIGTRACE (see doom(1)), causes
trace-enabled processes to flush the internal trace buffer to the daemon. The
-f option tells lamtrace to send this signal to all target processes before
unloading trace data. This creates a race between the target processes storing
trace data to the daemon and the unloading procedure. lamtrace does not
resolve the race itself: the user must supply, as the parameter to -f, a delay
long enough for the flush to complete.
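The flush-then-wait sequence might be wrapped as follows. This is a sketch
under stated assumptions: the helper name flush_and_unload and the LAMTRACE
override are hypothetical, the delay is assumed to be a whole number of
seconds, and combining -f with -k is assumed to be permitted as the synopsis
suggests.

```shell
#!/bin/sh
# LAMTRACE can be overridden for a dry run; a convention of this sketch only.
LAMTRACE=${LAMTRACE:-lamtrace}

# flush_and_unload SECS FILE
# Hypothetical helper: signal the target processes to flush their
# buffers, wait SECS seconds, then copy the traces into FILE.
flush_and_unload() {
    secs=$1
    outfile=$2
    # Assumption: the delay is a non-negative whole number of seconds.
    case $secs in
        ''|*[!0-9]*)
            echo "flush_and_unload: delay must be a whole number of seconds" >&2
            return 2
            ;;
    esac
    # -f SECS sends the flush signal and waits; -k keeps the data in the
    # trace daemons because the application is still running.
    "$LAMTRACE" -k -f "$secs" "$outfile"
}

# Example call (values are illustrative):
#   flush_and_unload 5 midrun.lamtr
```

A suitable delay depends on how much data each process holds and on system
load, so it is worth erring on the generous side.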
Trace data are organized by node, process identifier and list number. A process
can store traces on any node, although the local node is the obvious, least
intrusive choice. The process can identify itself in any meaningful way
(getpid(2) is a good idea). The list number is also chosen by the process.
These values may be set by an instrumented library, such as libmpi(3), or
directly by the application with lam_rtrstore(2). Unloading is as flexible as
storing: the -l option selects the list number, and standard LAM command line
mnemonics select nodes and processes.
Dropping old traces when a pre-compiled volume limit is reached only happens for
positive list numbers. Traces in negatively numbered lists will be collected
until the underlying system runs out of memory. Do not use negative list
numbers for high volume trace data.
If no process selection is given on the command line, trace data will be
unloaded for all processes on each specified node.
LAM, its trace daemon and lamtrace are all unaware of the format and
meaning of traces.
The -R option does not unload trace data. It causes the target trace daemons to
free the memory occupied by trace data in the given list. If all lists are
specified (no -l option), the trace daemon is effectively reset to its state
after initiating LAM.
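The two uses of -R described above can be sketched as small wrappers. The
function names and the LAMTRACE override are hypothetical; the node mnemonic
n30 and list number 5 are borrowed from the EXAMPLES section.

```shell
#!/bin/sh
# LAMTRACE can be overridden for a dry run; a convention of this sketch only.
LAMTRACE=${LAMTRACE:-lamtrace}

# free_trace_list LISTNO NODES...
# Hypothetical helper: free the memory held by one list on the given nodes.
free_trace_list() {
    list=$1
    shift
    "$LAMTRACE" -R -l "$list" "$@"
}

# reset_trace_daemons NODES...
# Hypothetical helper: with no -l option, all lists are freed on the
# given nodes.
reset_trace_daemons() {
    "$LAMTRACE" -R "$@"
}

# Example calls:
#   free_trace_list 5 n30
#   reset_trace_daemons n30
```

Neither call writes an output file, because -R deletes trace data without
unloading it.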
Unloading MPI Trace Data
A special capability, selected by the -mpi option, exists to search for and
unload only the trace data generated by an MPI application. For this purpose,
lamtrace is aware of the particular reserved list numbers that libmpi(3) uses
to store traces. It begins by searching all specified nodes and processes (the
whole LAM multicomputer, if nothing is specified) for a special trace generated
by process rank 0 in MPI_COMM_WORLD of an MPI application. This special trace
contains the node and process identifiers of all processes in that
MPI_COMM_WORLD communicator. lamtrace then uses the node / process information
to collect all trace data generated by libmpi(3).
If multiple world communicators exist within LAM's trace daemons, the first one
found is used. Multiple worlds may be present due to multiple concurrent
applications, trace data from a previous run that was not removed (with either
lamtrace or lamclean(1)), or an application that spawns processes. A particular
world communicator can be selected by giving lamtrace the precise node and
process location of its rank 0 process.
The -mpi option is not compatible with the -l option.
EXAMPLES
- lamtrace -v -mpi mytraces
- Unload trace data into the file "mytraces" from the first MPI
application found in a search of the entire LAM multicomputer. Report on
important steps as they are done.
- lamtrace n30 -l 5 p21367
- Unload trace data from list 5 of process ID 21367 on node 30. Operate
silently.
- lamtrace -mpi n30 p21367
- Unload trace data from the MPI application world group whose process rank
0 has PID 21367 and is/was running on node 30.
BUGS
Since trace data can be unloaded during an application's execution, there should
be a way to incrementally append to an output file. This is a bit tricky with
-mpi, but it can be done.
FILES
- def.lamtr
- default output file
SEE ALSO
mpirun(1), loadgo(1), lam_rtrstore(2),
lamclean(1), libmpi(3),
xmpi(1)