NAME
reporting - Grid Engine reporting file format
DESCRIPTION
A Grid Engine system writes a reporting file,
$SGE_ROOT/$SGE_CELL/common/reporting, if
reporting=true is specified in the
reporting_params. The file is written at intervals of the
flush_time specified in the same place. The reporting file contains
data that can be used for accounting, monitoring and analysis purposes. It
contains information about the cluster (hosts, queues, load values,
consumables, etc.), about the jobs running in the cluster and about sharetree
configuration and usage. All information is time-related, and events are dumped
to the reporting file at a configurable interval. This allows "real
time" monitoring of the cluster status as well as historical analysis.
The reporting file is an ASCII file. Each line contains one record, and the
fields of a record are separated by a delimiter (:). The reporting file
contains records of different types. Each record type has a specific record
structure.
The first two fields are common to all reporting records:
- time
- The time when the record was created. All time values described here are
the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC).
- record type
- The type of the reporting record (for example new_job, host or acct).
The different types of records and their structure are described below, eliding
the leading
time and
record type fields in each case.
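As a minimal sketch of the common record layout, the Python function below splits one line on the `:` delimiter into the shared time and record type fields plus a list of the remaining, record-specific fields. The names `Record` and `split_record` and the sample line are illustrative, and the sketch assumes that no field value itself contains a `:`.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One reporting-file line: the two common fields plus the rest."""
    time: int          # seconds since the Unix epoch
    record_type: str   # e.g. "new_job", "queue", "host", "acct"
    fields: list[str]  # remaining fields; layout depends on record_type


def split_record(line: str) -> Record:
    # Assumption: ':' only ever appears as the field delimiter.
    parts = line.rstrip("\n").split(":")
    if len(parts) < 2:
        raise ValueError(f"not a reporting record: {line!r}")
    return Record(int(parts[0]), parts[1], parts[2:])
```

For example, `split_record("1700000000:queue:all.q:node01:1700000000:u")` yields a `queue` record whose `fields` are the qname, hostname, report_time and state described below.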
new_job
The new_job record is written whenever a new job enters the system (usually by a
submission command). It has the following fields:
- submission_time
- Time when the job was submitted.
- job_number
- The job number.
- task_number
- The array task id. Always has the value -1 for new_job records (as array
tasks haven't been created at that stage).
- pe_taskid
- The task id of parallel tasks. Always has the value "none" for
new_job records.
- job_name
- The job name (from the -N submission option).
- owner
- The job owner.
- group
- The Unix group of the job owner.
- project
- The project the job is running in.
- department
- The department the job owner is in.
- account
- The account string specified for the job (from the -A submission
option).
- priority
- The job priority (from the -p submission option).
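The new_job layout above maps directly onto a tuple of field names. The sketch below (hypothetical helper `parse_new_job`, invented sample values) zips the fields that follow the two common ones into a dictionary and checks the field count. All values are kept as strings, and it again assumes no field contains a `:`.

```python
NEW_JOB_FIELDS = (
    "submission_time", "job_number", "task_number", "pe_taskid",
    "job_name", "owner", "group", "project", "department",
    "account", "priority",
)


def parse_new_job(line: str) -> dict[str, str]:
    parts = line.rstrip("\n").split(":")
    time, record_type, rest = parts[0], parts[1], parts[2:]
    if record_type != "new_job":
        raise ValueError(f"expected new_job, got {record_type!r}")
    if len(rest) != len(NEW_JOB_FIELDS):
        raise ValueError(f"expected {len(NEW_JOB_FIELDS)} fields, got {len(rest)}")
    record = dict(zip(NEW_JOB_FIELDS, rest))
    record["time"] = time  # the common leading time field
    return record
```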
job_log
If
joblog=true is specified in the
reporting_params, a job_log
record is written whenever a job, an array task or a PE task changes status. A
status change can be the transition from pending to running, but can also be
triggered by user actions, like suspension of a job. It has the following
fields:
- event_time
- Time when the event was generated.
- event
- A one-word description of the event.
- job_number
- The job number.
- task_number
- The array task id. Always has the value -1 for new_job records (as array
tasks haven't been created at that stage).
- pe_taskid
- The task id of parallel tasks. Always has the value "none" for
new_job records.
- state
- The state of the job after the event was processed.
- user
- The user who initiated the event (or special usernames
"qmaster", "scheduler" and "execd" for
actions of the system itself like scheduling jobs, executing jobs
etc.).
- host
- The host from which the action was initiated (e.g. the submit host, the
qmaster host, etc.).
- state_time
- Reserved field for later use.
- submission_time
- Time when the job was submitted.
- job_name
- The job name (from the -N submission option).
- owner
- The job owner.
- group
- The Unix group of the job owner.
- project
- The project the job is running in.
- department
- The department the job owner is in.
- account
- The account string specified for the job (from the -A submission
option).
- priority
- The job priority (from the -p submission option).
- message
- A message describing the reported action.
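Because message is the last job_log field and is free text, a parser can limit the number of splits so that any `:` inside the message stays in that field. This is a defensive assumption, not documented behavior; the helper name `parse_job_log` and the sample line are hypothetical.

```python
JOB_LOG_FIELDS = (
    "event_time", "event", "job_number", "task_number", "pe_taskid",
    "state", "user", "host", "state_time", "submission_time", "job_name",
    "owner", "group", "project", "department", "account", "priority",
    "message",
)


def parse_job_log(line: str) -> dict[str, str]:
    # time + record type + 18 job_log fields = 20 parts, i.e. 19 splits.
    # Capping the splits keeps any ':' in the trailing message intact.
    maxsplit = 2 + len(JOB_LOG_FIELDS) - 1
    parts = line.rstrip("\n").split(":", maxsplit)
    if len(parts) != 2 + len(JOB_LOG_FIELDS) or parts[1] != "job_log":
        raise ValueError(f"not a job_log record: {line!r}")
    record = dict(zip(JOB_LOG_FIELDS, parts[2:]))
    record["time"] = parts[0]
    return record
```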
acct
Records of type acct are accounting records. Normally, they are written whenever
a job, a task of an array job, or a task of a parallel job terminates.
However, for long-running jobs an intermediate acct record is created once a
day after midnight. This results in multiple accounting records for a
particular job and allows for fine-grained resource usage monitoring over
time.
Accounting records have the same structure as the records of the accounting
file, with the addition of the leading time and type fields (which are not used
in the accounting file).
queue
Records of type queue contain state information for queues (queue instances). A
queue record has the following fields:
- qname
- The cluster queue name.
- hostname
- The hostname of a specific queue instance.
- report_time
- The time when a state change was triggered.
- state
- The new queue state. The possible states are single-letter values, as
reported by with the -q option.
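Since a queue record is written when a state change is triggered, the latest known state of each queue instance can be reconstructed by keeping the most recent report per (qname, hostname) pair. A minimal sketch, assuming report_time is an integer number of seconds as stated above; the function name and the state letters in any sample data are placeholders.

```python
from collections.abc import Iterable


def latest_queue_states(lines: Iterable[str]) -> dict[tuple[str, str], tuple[int, str]]:
    """Map (qname, hostname) -> (report_time, state), keeping the most
    recent queue record per queue instance; other record types are skipped."""
    latest: dict[tuple[str, str], tuple[int, str]] = {}
    for line in lines:
        parts = line.rstrip("\n").split(":")
        if len(parts) != 6 or parts[1] != "queue":
            continue
        qname, hostname, report_time, state = parts[2:]
        key = (qname, hostname)
        t = int(report_time)
        if key not in latest or t >= latest[key][0]:
            latest[key] = (t, state)
    return latest
```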
queue_consumable
A queue_consumable record contains information about queue consumable values in
addition to queue state information:
- qname
- The cluster queue name.
- hostname
- The hostname of a specific queue instance.
- report_time
- The time when a state change was triggered.
- state
- The new queue state.
- consumables
- Description of consumable values. Descriptions of multiple consumables
are separated by spaces. A consumable description has the format
name=actual_value=configured_value.
Consumables are only logged if
log_consumables=true is specified in the
reporting_params, or the consumable is specified in the local or global
report_variables.
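The consumables field can be decoded by splitting first on spaces and then on `=`. The sketch below (hypothetical function `parse_consumables`) returns the actual and configured values as strings, since consumable values are not necessarily plain numbers (memory sizes, for instance). It makes no assumption about what actual_value measures, and the sample values in the test are invented. host_consumable records use the same format.

```python
def parse_consumables(field: str) -> dict[str, tuple[str, str]]:
    """Parse 'name=actual=configured name2=...' into
    {name: (actual_value, configured_value)}."""
    result: dict[str, tuple[str, str]] = {}
    for item in field.split():  # descriptions are space-separated
        # Raises ValueError if an item does not have exactly the
        # name=actual_value=configured_value shape (at least two '=').
        name, actual, configured = item.split("=", 2)
        result[name] = (actual, configured)
    return result
```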
host
A host record contains information about hosts and host load values. It contains
the following information:
- hostname
- The name of the host.
- report_time
- The time when the reported information was generated.
- state
- The new host state. Currently, Grid Engine doesn't track a host state; the
field is reserved for future use. Always contains the value X.
- load values
- Description of load values. Descriptions of multiple load values are
separated by spaces. A load value description has the format
name=actual_value.
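A host record can be parsed in two steps: split off the four host-specific fields, then decode the space-separated name=actual_value pairs. In this sketch (hypothetical function `parse_host_record`, invented sample load values) the line is split at most five times so the load-value field is kept whole, and each pair is split on its first `=` only.

```python
def parse_host_record(line: str) -> dict:
    # Layout: time:host:hostname:report_time:state:load values
    parts = line.rstrip("\n").split(":", 5)
    if len(parts) != 6 or parts[1] != "host":
        raise ValueError(f"not a host record: {line!r}")
    _, _, hostname, report_time, state, load_field = parts
    loads: dict[str, str] = {}
    for item in load_field.split():
        name, _, value = item.partition("=")  # name=actual_value
        loads[name] = value
    return {
        "hostname": hostname,
        "report_time": int(report_time),
        "state": state,  # always "X" per the description above
        "loads": loads,
    }
```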
host_consumable
A host_consumable record contains information about hosts and host consumables.
Host consumables can, for example, be licenses. It contains the following
information:
- hostname
- The name of the host.
- report_time
- The time when the reported information was generated.
- state
- The new host state. Currently, Grid Engine doesn't track a host state; the
field is reserved for future use. Always contains the value X.
- consumables
- Description of consumable values. Descriptions of multiple consumables
are separated by spaces. A consumable description has the format
name=actual_value=configured_value.
Consumables are only logged if
log_consumables=true is specified in the
reporting_params, or the consumable is specified in the local or global
report_variables.
sharelog
The Grid Engine qmaster can dump information about sharetree configuration and
use to the reporting file. The
reporting_params can specify
sharelog, which sets the interval at which sharetree information is
dumped. It is set in the format HH:MM:SS. A value of 00:00:00 configures
qmaster not to dump sharetree information. Intervals of several minutes up to
hours are sensible values for this parameter. The record contains the
following fields:
- current time
- The present time
- usage time
- The time used so far
- node name
- The node name
- user name
- The user name
- project name
- The project name
- shares
- The total shares
- job count
- The job count
- level
- The percentage of shares used
- total
- The adjusted percentage of shares used
- long target share
- The long target percentage of resource shares used
- short target share
- The short target percentage of resource shares used
- actual share
- The actual percentage of resource shares used
- usage
- The combined shares used
- cpu
- The number of CPU seconds used
- mem
- The time integral of memory used (in GB seconds)
- io
- The IO done (in GB)
- long target cpu
- The long target cpu used
- long target mem
- The long target memory used
- long target io
- The long target IO used
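The sharelog interval is given as HH:MM:SS, with 00:00:00 disabling sharetree dumps. A small sketch (hypothetical function name) that converts the setting to seconds and maps the disabled value to `None`; it only checks that there are exactly three integer components and does not validate their ranges.

```python
from typing import Optional


def sharelog_interval_seconds(value: str) -> Optional[int]:
    """Convert a sharelog interval 'HH:MM:SS' to seconds.
    Returns None for 00:00:00, which disables sharetree dumps."""
    # Unpacking raises ValueError unless there are exactly three parts.
    hours, minutes, seconds = (int(p) for p in value.split(":"))
    total = hours * 3600 + minutes * 60 + seconds
    return None if total == 0 else total
```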
new_ar
A new_ar record contains information about advance reservation objects. An entry
of this type is added whenever an advance reservation is created. It contains
the following information:
- submission_time
- The time when the advance reservation was created.
- ar_number
- The advance reservation number identifying the reservation.
- ar_owner
- The owner of the advance reservation.
ar_attribute
The ar_attribute record is written whenever a new advance reservation is added
or an attribute of an existing advance reservation changes. It has the
following fields:
- event_time
- The time when the event was generated.
- submission_time
- The time when the advance reservation was created.
- ar_number
- The advance reservation number identifying the reservation.
- ar_name
- Name of the advance reservation.
- ar_account
- An account string which was specified during the creation of the advance
reservation.
- ar_start_time
- The start time of the advance reservation.
- ar_end_time
- The end time of the advance reservation.
- ar_granted_pe
- The parallel environment which was selected for an advance
reservation.
- ar_granted_resources
- The granted resources which were selected for an advance reservation.
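Because ar_start_time and ar_end_time are epoch times, the reserved duration follows directly from an ar_attribute record. A minimal sketch (hypothetical function `parse_ar_attribute`); it caps the number of splits so the last field, whose exact syntax is not specified here, is kept whole. The granted-resources string in the test is an invented placeholder.

```python
AR_ATTRIBUTE_FIELDS = (
    "event_time", "submission_time", "ar_number", "ar_name", "ar_account",
    "ar_start_time", "ar_end_time", "ar_granted_pe", "ar_granted_resources",
)


def parse_ar_attribute(line: str) -> dict:
    # time + record type + 9 fields = 11 parts, i.e. at most 10 splits.
    maxsplit = 2 + len(AR_ATTRIBUTE_FIELDS) - 1
    parts = line.rstrip("\n").split(":", maxsplit)
    if len(parts) != 2 + len(AR_ATTRIBUTE_FIELDS) or parts[1] != "ar_attribute":
        raise ValueError(f"not an ar_attribute record: {line!r}")
    rec: dict = dict(zip(AR_ATTRIBUTE_FIELDS, parts[2:]))
    # Reserved duration in seconds, from the two epoch timestamps.
    rec["duration"] = int(rec["ar_end_time"]) - int(rec["ar_start_time"])
    return rec
```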
ar_log
The ar_log record is written whenever an advance reservation changes status. A
status change can be from pending to active, but can also be triggered by
system events such as a host outage. It has the following fields:
- ar_state_change_time
- The time when the event occurred which caused a state change.
- submission_time
- The time when the advance reservation was created.
- ar_number
- The advance reservation number identifying the reservation.
- ar_state
- The new state.
- ar_event
- An event id identifying the event which caused the state change.
- ar_message
- A message describing the event which caused the state change.
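The ar_log records of one advance reservation can be grouped into a time-ordered state history. In this sketch (hypothetical function `ar_state_history`) the state letters, event ids and messages in the test data are placeholders, and the split is capped at seven so a `:` in the trailing message would stay in that field.

```python
from collections import defaultdict
from collections.abc import Iterable


def ar_state_history(lines: Iterable[str]) -> dict[int, list[tuple[int, str, str]]]:
    """Group ar_log records by ar_number as time-ordered lists of
    (ar_state_change_time, ar_state, ar_message)."""
    history: dict[int, list[tuple[int, str, str]]] = defaultdict(list)
    for line in lines:
        # time + record type + 6 fields = 8 parts, i.e. at most 7 splits.
        parts = line.rstrip("\n").split(":", 7)
        if len(parts) != 8 or parts[1] != "ar_log":
            continue  # skip other record types
        change_time, _submitted, ar_number, state, _event, message = parts[2:]
        history[int(ar_number)].append((int(change_time), state, message))
    for entries in history.values():
        entries.sort()  # order by state change time
    return dict(history)
```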
ar_acct
The ar_acct records are accounting records which are written for every queue
instance whenever an advance reservation terminates. Advance reservation
accounting records comprise the following fields:
- ar_termination_time
- The time when the advance reservation terminated.
- submission_time
- The time when the advance reservation was created.
- ar_number
- The advance reservation number identifying the reservation.
- ar_qname
- The cluster queue reserved by the advance reservation.
- ar_hostname
- The name of the execution host.
- ar_slots
- The number of slots which were reserved.
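Since one ar_acct record is written per queue instance, the total number of slots held by a reservation is the sum of ar_slots over its records. A minimal sketch (hypothetical function `reserved_slots_per_ar`, invented sample lines) that assumes ar_number and ar_slots are plain integers.

```python
from collections import Counter
from collections.abc import Iterable


def reserved_slots_per_ar(lines: Iterable[str]) -> dict[int, int]:
    """Sum ar_slots over all ar_acct records of each advance reservation."""
    totals: Counter = Counter()
    for line in lines:
        # time:ar_acct:termination:submission:ar_number:qname:hostname:slots
        parts = line.rstrip("\n").split(":")
        if len(parts) != 8 or parts[1] != "ar_acct":
            continue
        totals[int(parts[4])] += int(parts[7])
    return dict(totals)
```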
FILES
$SGE_ROOT/$SGE_CELL/common/reporting
SEE ALSO
COPYRIGHT
See for a full statement of rights and permissions.