.TH "resource_monitor" 1 "" "CCTools 7.0.9 FINAL" "Cooperative Computing Tools"
.SH NAME
.LP
\fBresource_monitor\fP - monitors the cpu, memory, io, and disk usage of a tree of processes.
.SH SYNOPSIS
.LP
\FC\fBresource_monitor [options] -- command [command-options]\fP\FT
.SH DESCRIPTION
.LP
\fBresource_monitor\fP is a tool to monitor the computational resources used by the process created by the command given as an argument, and by all its descendants. The monitor works 'indirectly', that is, by observing how the environment changed while a process was running. All the information reported should therefore be considered an estimate (in contrast with direct methods, such as ptrace). It has been tested in Linux, FreeBSD, and Darwin, and can be used automatically by \FCmakeflow\FT and \FCwork queue\FT applications.
.P
Additionally, the user can specify maximum resource limits in the form of a file, or a string given at the command line. If one of the resources goes over its specified limit, the monitor terminates the task and reports which resource exceeded its limit.
.P
In systems that support it, \fBresource_monitor\fP wraps some libc functions to obtain a better estimate of the resources used.
.P
Currently, the monitor does not support interactive applications. That is, if a process issues a read call from standard input, and standard input has not been redirected, then the process tree is terminated. This is likely to change in future versions of the tool.
.P
\fBresource_monitor\fP generates up to three log files: a summary file encoded as JSON with the maximum values of the resources used, a time-series that shows the resources used at given time intervals, and a list of files that were opened during execution.
.P
The summary file is a JSON document with the following fields. Unless indicated otherwise, each field is an array with two values: a number that describes the measurement, and a string describing the units (e.g., \FC[ measurement, "units" ]\FT).
.fam C
.nf
.nh
.IP "" 8
command:                  the command line given as an argument
start:                    time at start of execution, since the epoch
end:                      time at end of execution, since the epoch
exit_type:                one of "normal", "signal" or "limit" (a string)
signal:                   number of the signal that terminated the process
                          Only present if exit_type is signal
cores:                    maximum number of cores used
cores_avg:                number of cores as cpu_time/wall_time
exit_status:              final status of the parent process
max_concurrent_processes: the maximum number of processes running concurrently
total_processes:          count of all of the processes created
wall_time:                duration of execution, end - start
cpu_time:                 user+system time of the execution
virtual_memory:           maximum virtual memory across all processes
memory:                   maximum resident size across all processes
swap_memory:              maximum swap usage across all processes
bytes_read:               amount of data read from disk
bytes_written:            amount of data written to disk
bytes_received:           amount of data read from network interfaces
bytes_sent:               amount of data written to network interfaces
bandwidth:                maximum bandwidth used
total_files:              total maximum number of files and directories of
                          all the working directories in the tree
disk:                     size of all working directories in the tree
limits_exceeded:          resources over the limit with -l, -L options (JSON object)
peak_times:               seconds from start when a maximum occurred (JSON object)
snapshots:                List of intermediate measurements, identified by
                          snapshot_name (JSON object)
.fi
.hy
.fam
.P
The time-series log has a row per time sample.
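.P
As a rough illustration of the summary layout described above, the following Python sketch separates the fields that are two-value arrays (a number and a unit string) from the remaining fields. It assumes the summary has already been read into a string. The function name load_summary, the SAMPLE document, and the unit strings in it are hypothetical and are not taken from the tool's output.
.fam C
.nf
.nh
```python
import json


def load_summary(text):
    """Split a summary document into (value, unit) pairs and other fields."""
    raw = json.loads(text)
    measured, other = {}, {}
    for name, field in raw.items():
        # A measurement is a two-element array: a number, then a unit string.
        is_pair = (
            isinstance(field, list)
            and len(field) == 2
            and isinstance(field[0], (int, float))
            and not isinstance(field[0], bool)
            and isinstance(field[1], str)
        )
        if is_pair:
            measured[name] = (field[0], field[1])
        else:
            # Strings and JSON objects (e.g. exit_type) are kept unchanged.
            other[name] = field
    return measured, other


# Hypothetical sample: the values and unit strings are invented for illustration.
SAMPLE = json.dumps({
    "command": "sleep 1",
    "exit_type": "normal",
    "wall_time": [1.5, "s"],
    "memory": [12, "MB"],
})

if __name__ == "__main__":
    measured, other = load_summary(SAMPLE)
    for name, (value, unit) in sorted(measured.items()):
        print(name, value, unit)
```
.fi
.hy
.fam
.P
Fields that are not number and unit pairs, such as command or exit_type, are returned unchanged in the second dictionary. The sketch reads only the summary file; the time-series log is a separate file whose rows are described next.
.P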
For each row, the columns have the following meaning (all columns are integers):
.fam C
.nf
.nh
.IP "" 8
wall_clock                the sample time, since the epoch, in microseconds
cpu_time                  accumulated user + kernel time, in microseconds
cores                     current number of cores used
max_concurrent_processes  concurrent processes at the time of the sample
virtual_memory            current virtual memory size, in MB
memory                    current resident memory size, in MB
swap_memory               current swap usage, in MB
bytes_read                accumulated number of bytes read, in bytes
bytes_written             accumulated number of bytes written, in bytes
bytes_received            accumulated number of bytes received, in bytes
bytes_sent                accumulated number of bytes sent, in bytes
bandwidth                 current bandwidth, in bps
total_files               current number of files and directories, across all
                          working directories in the tree
disk                      current size of working directories in the tree, in MB
.fi
.hy
.fam
.P
.SH OPTIONS
.LP
.LP
.TP
\fB-d\fP, \fB-\-debug\fP=\fI\fP
.
Enable debugging for this subsystem.
.TP
\fB-o\fP, \fB-\-debug-file\fP=\fI\fP
.
Write debugging output to this file. By default, debugging is sent to stderr (":stderr"). You may specify logs be sent to stdout (":stdout"), to the system syslog (":syslog"), or to the systemd journal (":journal").
.TP
.B \ -v,--version
.
Show version string.
.TP
.B \ -h,--help
.
Show help text.
.TP
\fB-i\fP, \fB-\-interval\fP=\fI\fP
.
Maximum interval between observations, in seconds (default=1).
.TP
.B \ --pid=pid
.
Track pid instead of executing a command line (warning: less precise measurements).
.TP
.B \ --accurate-short-processes
.
Accurately measure short running processes (adds overhead).
.TP
\fB-c\fP, \fB-\-sh\fP=\fI\fP
.
Read command line from \FCstr\FT, and execute as '/bin/sh -c \FCstr\FT'.
.TP
\fB-l\fP, \fB-\-limits-file\fP=\fI\fP
.
Read resource limits from maxfile, a list of var: value pairs.
.TP
\fB-L\fP, \fB-\-limits\fP=\fI\fP
.
String of the form "var: value, var: value" to specify resource limits.
(This option can be specified multiple times.)
.TP
.B \ -f, --child-in-foreground
.
Keep the monitored process in foreground (for interactive use).
.TP
\fB-O\fP, \fB-\-with-output-files\fP=\fI