COLMUX(1) | colmux | COLMUX(1) |
NAME¶
colmux - multiplex communications to multiple systems running collectl from a single system
SYNOPSIS¶
colmux [-command "collectl-switches... [-p filespec]"] [-address addr1[,addr2,...]|-addr filename] [-cols col1[,col2...]] | [-column num]
DESCRIPTION¶
This utility gathers data generated by collectl on multiple systems and multiplexes it into a single consolidated format. It runs in essentially 2 distinct modes. The first is known as real-time mode, because data is retrieved and displayed in real time. The second is playback mode, because data is played back from existing collectl data files.
There are also 2 general formats for the data being displayed. The first is a multi-line display in which the data is displayed in the native form that collectl displays it, except that it is sorted by a distinct column, essentially allowing one to see the TOP producers of that data. The second format is a single-line display in which one or more distinct data elements from each source are displayed on the same line. This latter format is never sorted, but rather positionally organized by the name of the system that generated it.
Collectl will then be executed, using any optional switches specified by -command, on each of the systems specified by -address, OR on the systems whose addresses are read from a file if the target of that switch is a filename rather than a list of hosts, OR on the local system if -address is not specified. See collectl for details of the various switches. In some cases certain collectl switches will not make sense in a colmux environment and if chosen will generate an error.
Further, if hosts are specified with -address, they should be individual addresses or hostnames separated by commas. In turn, any of them can be in what those familiar with pdsh would recognize as -w format.
Colmux will then execute the collectl command, gather the results from all sources for a particular interval and display them one result per line, sorted by the specified column, OR all on the same line in groups specified by -cols. The number of lines displayed is set to the size of the terminal window by default, but can be changed using -lines.
The one exception is the use of -nosort, which only applies to the playback of existing collectl raw files. In this mode all records for a particular interval will be displayed and the sorting bypassed, making this a speedy and convenient mechanism for gathering all data from all systems in one place for potential further processing.
Colmux will never modify the size of the terminal window, so to see more or wider lines either expand the window or override the number of display lines and run it again. If the number of display lines is set greater than the terminal height or to 0, colmux will no longer overlay the previous window and will simply run in a continuous scrolling mode.
Common Switches
-address list|pdsh|filename
Specify any combination of addresses as hostnames OR in
pdsh -w format OR a filename containing a list of hostnames/addresses, 1 per
line. You MUST have passwordless ssh access to these nodes. If a different
username is required, be sure to specify addresses in username@host format,
noting you do not have to have the same username on each host. If specified,
these usernames will override those specified with the -username switch. rsh
access is not supported.
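To make the pdsh -w style of host list more concrete, here is a minimal Python sketch of how a specification such as n[1-10,15] might be expanded into individual hostnames. It is not taken from colmux or pdsh. The function names split_outside_brackets and expand_hosts are hypothetical, and the sketch assumes at most one bracket group per name and ignores zero-padded ranges, which real pdsh syntax supports.

```python
import re


def split_outside_brackets(spec):
    """Split a host list on commas that are not inside [...]."""
    parts, depth, current = [], 0, []
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand_hosts(spec):
    """Expand a simplified pdsh -w style list into hostnames (assumption:
    one bracket group per name, no zero padding)."""
    hosts = []
    for item in split_outside_brackets(spec):
        match = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", item)
        if not match:
            hosts.append(item)  # plain hostname or username@host
            continue
        prefix, ranges, suffix = match.groups()
        for part in ranges.split(","):
            low, _, high = part.partition("-")
            high = high or low  # a single number such as "15"
            for n in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{n}{suffix}")
    return hosts


if __name__ == "__main__":
    print(expand_hosts("n[1-3,15]"))
    print(expand_hosts("abc,def,xyz"))
```

A username@host prefix passes through unchanged in this sketch, since everything before the bracket is treated as an opaque prefix.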
-command switches
One can specify virtually any collectl command here, in
either real-time or playback mode. Some switches may only be used during one mode
or the other and colmux will usually let you know if you specify an invalid
combination or an otherwise restricted switch. Only those directly affecting
colmux are listed below:
--from, --thru
Limit the timeframe for data being played back, noting
you can include both the from and thru times with the --from switch if you
separate them with a hyphen.
-o time-format
This is a "magic" switch in that it not only
tells collectl how to display dates/times (no other options are permitted
using -o other than those from the set [dDTm]), it also tells colmux how to
display dates/times.
In single line mode, the timestamp will either come from the host system in
real-time mode OR the first host when run in playback mode. This is the most
common use/need for this switch.
In real-time/top mode this switch is not allowed since colmux simply reports the
current time of the system it is running on.
When playing back multi-line formatted data from one or more files, a
timestamp for each interval is reported, consisting of the time of that
interval. When this switch is included, each line will be tagged with an
appropriate timestamp since on rare occasions they may not necessarily all be
identical.
-p playback-file
This switch tells colmux to run in playback mode. The
filename should include the directory location and is usually specified with
wild cards, limiting the selected file(s) to a specific date. When those files
are on the same host (-address is not specified), they may be for multiple
hosts, but when the files are on remote hosts they must all be for that
unique host. If the file specification includes the string TODAY or YESTERDAY,
it will be replaced with *yyyymmdd* for that date.
-P
Run collectl in plot-format. This allows one to specify
just about any combination of subsystems since all data is always displayed on
a single line. However, due to the lack of formatting, this also makes no
sense for multi-line displays and is therefore only supported in single-line
format.
-help
Show a brief help message and exit.
-hostwidth n
By default, colmux sets the hostwidth to 8, unless it sees
something wider and for most situations this is sufficient. However, if one
specifies hostnames that are aliases of the longer hostname, colmux has no way
of knowing the real hostlengths until after it starts receiving data from
collectl and the formatting will be off if the hostnames are longer than the
default. To overcome this problem, use this switch to force the hostname to be
wider.
-lines
Change the number of lines that are displayed for each
interval in multi-line mode. The default will be determined by the terminal
size returned by the linux resize command if present. If that command is not
present, the size will be initially set to 24. If -lines is greater than the
terminal size or 0, top-like behavior will not be used when in real-time mode.
In single-line format, this switch controls the number of lines displayed
between headers. A value of 0 will only display the header one time.
-noescape
Colmux uses brute-force screen formatting, that is it
generates its own VT100 escape sequences to clear lines and/or move the
cursor. On some occasions you may want to disable these sequences if you wish
to recode the output and do your own post-processing of it. This switch will
do just that.
-port
Sometimes a remote version of collectl is already using
the default socket. This allows one to start another instance and override
that value.
-test
This tells colmux to execute the specified collectl
command either locally or on the first remote system specified by -address,
print the associated header with the selected column(s) highlighted and also
include each column name along with its ordinal number, making it fairly easy
to make sure you've selected the right column(s).
-username name
Use this username for ALL ssh commands. It can be
overridden for specific hosts by specifying them with the -address switch with
the desired hostnames.
-version
Display the version and exit. It will also report if
Term::ReadKey is installed and if so what its version number is.
Playback Mode Specific
The following additional switches only apply to playback mode. There are no
real-time mode specific switches.
-delay seconds
Introduce a delay between intervals in seconds. You can
specify fractional values. Not using this switch will cause the output to be
displayed as fast as it can be rendered.
-home
Move the cursor to the home position (upper left-hand
corner) of the display to use a top-like display format. This ONLY applies to
multi-line mode when in playback mode and provides a mechanism for displaying
recorded data in a top-like fashion.
-hostfilter addr[,addr]
When playing back files for multiple hosts on the local
system, sometimes you do not want to play back ALL the host files. This filter
allows you to specify only those hosts which you want to process. The format
of the list of addresses is specified in the same way as -address except that
you cannot specify a filename.
-nosort
Intended primarily for output that would be redirected to
a file, do not sort or include any escape sequences in the output.
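The TODAY and YESTERDAY substitution described for the -p playback file specification can be illustrated with a short Python sketch. This is only an illustration of the documented behaviour (each keyword becomes *yyyymmdd* for the matching date) and not the code colmux uses. The function name expand_date_keywords and its today parameter are hypothetical.

```python
from datetime import date, timedelta


def expand_date_keywords(filespec, today=None):
    """Replace TODAY and YESTERDAY with *yyyymmdd* wildcards.

    'today' can be passed explicitly (an assumption made here so the
    sketch is easy to try out); it defaults to the current date.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return (
        filespec
        .replace("TODAY", f"*{today:%Y%m%d}*")
        .replace("YESTERDAY", f"*{yesterday:%Y%m%d}*")
    )


if __name__ == "__main__":
    print(expand_date_keywords("/var/log/collectl/YESTERDAY"))
```

The resulting string is still a wildcard pattern, so it selects every file in the directory whose name contains that date.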
Multi-Line Format
When there is more output than will fit on the screen,
colmux includes the text:
Displaying: lines xx thru yy out of zz
on the right side of the top line of the display, where xx is typically 1.
However, once colmux is running, one might want to look at subsequent lines, i.e.
those below the bottom of the screen and therefore invisible. If the ReadKey
module is installed, one can simply use the PageDown key to move down the
display and the PageUp key to move in the other direction. If ReadKey is not
installed, typing the multi-key sequences pd<ENTER> or pu<ENTER>
will cause the same thing to happen.
-colhelp
When you wish to change the sort column and the arrow
keys aren't available to you, it may be cumbersome to identify the number of
the column to type in followed by RETURN. This switch tells colmux to display the
numbers over each column, eliminating the need to manually count them and find
the one you want.
-column num
Set the sort column to this number. The column numbering
is determined by the columns returned by collectl for the requested command.
Since date/time columns are optional for non-plot data, their inclusion will
change the numbering of the columns so if you are not sure you selected the
correct column, you should first execute your command with -test included.
You can also change the column number interactively with the RIGHT/LEFT arrow
keys IF the ReadKey module is installed (see colmux -version) OR simply type
it in followed by the <ENTER> key.
-nobold
Do not highlight the selected column. This may be useful
when redirecting output to a file and you do not want the associated escape
sequences to be written to it.
-reverse
Reverse the default sort order. You can also change the
direction of the sort interactively with the UP/DOWN arrow keys IF the ReadKey
module is installed (see colmux -version)
OR simply type the r key and <ENTER>.
-zero
Do not display any rows with 0 in the sort column. You
can also type z<ENTER> interactively.
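The multi-line switches above combine into a simple pipeline: pick a sort column, optionally drop zero rows, sort, and keep as many rows as fit. The following Python sketch shows that idea only. It is not colmux's implementation, and the function top_rows, its parameters and the default of 24 lines are illustrative assumptions. It also assumes columns are numbered from 1 across the fields of each row and that the default order puts the largest values first.

```python
def top_rows(rows, column, lines=24, reverse=False, zero=False):
    """Return the rows to display for one interval.

    rows    : list of tuples/lists, one per host or device
    column  : 1-based sort column (assumed numbering)
    lines   : how many rows to keep; 0 means keep everything
    reverse : flip the default (largest first) order, like -reverse
    zero    : drop rows whose sort column is 0, like -zero
    """
    index = column - 1
    if zero:
        rows = [row for row in rows if row[index] != 0]
    ordered = sorted(rows, key=lambda row: row[index], reverse=not reverse)
    return ordered[:lines] if lines > 0 else ordered


if __name__ == "__main__":
    sample = [("n1", 5), ("n2", 0), ("n3", 9)]
    for row in top_rows(sample, column=2, zero=True):
        print(row)
```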
Single-Line Format
-col1000
Divide each column by 1000 before display
-colk
Divide each column by 1024 before display
-collog10
Remap large numbers to a smaller number of values by
taking the log10 of them and further transforming by the following mapping:
0,1 to 0, 10 to 10, 100 to 20, 1000 to 30, 10000 to 40, ... 1e9 to 90.
-cols num,...
Group all data together for each host by column
number(s). As with -column, you can confirm the correct column(s) have been
selected by first running with -test.
-colnodet
Do not show data for individual hosts, just display the
totals.
-colnodiv num,...
Do not divide the specified column numbers by 1000 or
1024 when -col1000 or -colk is specified, and do not apply the -collog10
transformation to them when it is specified. A typical usage is if you want to look at cpu loads as well as
network or disk stats in which case you may want to divide the latter by 1024
but not the cpu.
-coltotal
Include the totals for each column to the right.
-colwidth
Set the output columns to this width, typically used in
conjunction with -col1000 or -colk to allow more hosts to fit onto the same
line. It can also be used if the columns are too narrow for the host names
used as column headers and you have room to display wider names.
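As a rough illustration of how the -col1000, -colk, -collog10 and -colnodiv switches relate, the Python sketch below applies one scaling mode to a value unless its column is exempt. It is a sketch under stated assumptions, not colmux's code. The names collog10 and transform are hypothetical, the rounding of values that fall between the documented points of the log10 mapping is a guess, and whether colmux shows fractional results after dividing is not specified here.

```python
import math


def collog10(value):
    """Map 0,1 -> 0, 10 -> 10, 100 -> 20, ... 1e9 -> 90.

    Values between the documented points are rounded to the nearest
    integer (an assumption; the documentation only lists exact powers).
    """
    if value < 1:
        return 0
    return round(10 * math.log10(value))


def transform(value, column, mode=None, nodiv=()):
    """Scale one value according to the selected mode.

    mode  : None, "col1000", "colk" or "collog10"
    nodiv : column numbers left untouched, as with -colnodiv
    """
    if mode is None or column in nodiv:
        return value
    if mode == "col1000":
        return value / 1000
    if mode == "colk":
        return value / 1024
    if mode == "collog10":
        return collog10(value)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Column 1 holds a cpu load, column 6 a memory size in KB.
    print(transform(50, 1, mode="colk", nodiv={1}))
    print(transform(2048, 6, mode="colk", nodiv={1}))
```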
Exception Reporting Specific
In single-line format, rather than wait for all hosts to report their data,
colmux simply reports the last data seen when the time to generate a line of
output has come. In most cases, these do reflect the most recent data values
but in times of load, the data may be late getting to colmux and so a previous
value may be reported. If the age of that data exceeds a defined number of
intervals, the default is currently 2, an exception value will be reported of
-1. At other times it has been seen where kernel/driver bugs may cause
incorrect values to be reported as negative numbers and those values are also
reported as -1. Both the age and exception values can be changed with the
following switches.
-age number
When initially starting up and all hosts have not yet
reported any data, colmux will display a -1 to indicate no data has been seen
yet. If during processing a host fails to report in -age intervals, the
default is 2, colmux will also report a -1 indicating the data is stale.
-negdataval val
In some cases, there could be erroneous data reported as
negative numbers (though sometimes negative numbers are valid). When
specified, replace any negative numbers with this value.
-nodataval val
This switch allows you to change the -1 that is normally
reported for missing or stale data to the specified value, most commonly
0.
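The exception rules in this section can be summarised as a small decision function. The Python sketch below is an interpretation of the text rather than colmux's actual logic. The name report_value and its parameters are hypothetical, and it assumes that data becomes stale once it is more than the -age number of intervals old and that negative values are replaced only when a -negdataval value has been given.

```python
def report_value(last_value, last_interval, current_interval,
                 age=2, nodataval=-1, negdataval=None):
    """Decide what to print for one host in single-line format.

    last_value       : most recent value seen, or None if none yet
    last_interval    : interval number in which it arrived
    current_interval : interval number of the line being printed
    """
    if last_value is None:
        return nodataval  # host has not reported anything yet
    if current_interval - last_interval > age:
        return nodataval  # data is considered stale
    if last_value < 0 and negdataval is not None:
        return negdataval  # suspicious negative value replaced
    return last_value


if __name__ == "__main__":
    print(report_value(5, last_interval=10, current_interval=11))
    print(report_value(5, last_interval=7, current_interval=11))
```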
Diagnostics
The following switches are intended more for diagnostic purposes than normal
operation, though are also worth using on appropriate occasions.
-debug val
This switch is for generating diagnostic information at
various levels. It is actually a bit mask, whose values are listed at the
beginning of colmux itself. Perhaps the most useful value is 1 as it will
cause colmux to display all the remote commands issued to each host in the
address list and can often reveal problems when things don't seem to be
working correctly.
-nocheck
This switch was initially included in an earlier version
when remote host checking was causing problems in some cases and by skipping
those checks, colmux would run more reliably. While it is felt that as of
V3.2.0 these reachability checks are now reliable and should not be skipped,
this switch has been left in place.
-quiet
By default, and when -nocheck is not specified, colmux checks
the versions of all collectl instances against that of the first node found to
be running collectl and if different, reports the mismatch. This switch
suppresses that warning.
-reachable
By default, when a node is found to not be reachable,
colmux will remove it from its list of hosts and continue execution. This
switch will tell colmux to exit instead if any host is not reachable.
Miscellaneous
There are 3 switches whose descriptions don't really fit anywhere else:
-colbin path
On rare occasions, such as testing a patch to collectl in
a copy NOT in /usr/bin, you may want to tell colmux to use that copy instead
of the standard one. Use this switch to point to that copy. Naturally that
copy must exist in that location on all systems.
-keepalive secs
Colmux uses ssh to start collectl on each remote machine
and then communications between collectl and colmux occur over a socket.
Normally, ssh is configured to timeout after an interval of inactivity, such
as 30 minutes, which means a long-running colmux session will begin to lose
connections when this interval is reached. By specifying a keepalive interval,
you're telling the ssh to send a periodic keepalive to the other end so that
connection doesn't get dropped.
-timeout secs
By default, colmux waits up to 10 seconds for remote
instances of collectl to connect back. On slower networks or when a very large
number of instances have been started, they may fail to connect back in time.
This switch will extend that timeout, but it also requires that collectl V3.6.4
be used because earlier versions do not support this feature.
WHAT HAS CHANGED WITH VERSION 3?¶
Users of Version 2 will find this to look like a new utility, though in actuality only a couple of enhancements have been made to the functionality, which include:
sorting of multi-line data
Rather than simply report all the data for all hosts specified, something very few people actually used, only the top-n hosts will now have their data reported, sorted by the column specified by -column.
ability to play back data from collectl files
Simply add -p to the collectl command and the associated file(s) for the same day will be played back and the data reported in either multi- or single-line format.
new features, including -test to show which column(s) are selected
Instead of manually counting which column(s) you wish to select for sorting or single-line mode, -test will show you the column numbering, which can be particularly useful for wide lines. Additional switches for enhanced multi-line formatting have also been included.
several changes to single line mode
new way to request prefacing lines with
timestamps: Simply add the desired time format using -o to the collectl
command
no longer need -w for non-plot data: colmux is smart enough to recognize
fields that end in K/M/G and convert them to the appropriate values before
sorting. However it will still display them in their original forms. Further,
you can even sort on non-numeric fields such as device names and many of the
fields reported for process data.
several switches eliminated
Yes, it is hard to believe but a number of switches have
been eliminated either because their functionality is encompassed in other
mechanisms or their function has been deemed obsolete.
-date, -mmdd, -time: time formats now handled with -o in collectl command
-hosts, -machines: use -address
-rsh: nobody uses rsh anymore
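The handling of fields ending in K/M/G mentioned in this section can be pictured as a sort key that converts such fields to numbers for comparison while leaving the displayed text alone. The Python sketch below is an illustration only and is not taken from colmux. The name sort_key is hypothetical, the use of 1024-based multipliers is an assumption, and non-numeric fields simply fall back to string comparison.

```python
import re

# Assumed multipliers; the documentation does not say whether K means
# 1000 or 1024.
_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

_NUMERIC = re.compile(r"(-?\d+(?:\.\d+)?)([KMG]?)")


def sort_key(field):
    """Build a key so numeric fields sort by value, others by text.

    Numeric fields (optionally suffixed with K, M or G) sort before
    non-numeric ones so the two kinds never get compared directly.
    """
    match = _NUMERIC.fullmatch(field)
    if match:
        number, suffix = match.groups()
        return (0, float(number) * _MULTIPLIERS.get(suffix, 1), "")
    return (1, 0.0, field)


if __name__ == "__main__":
    fields = ["2M", "900K", "1G", "512"]
    # The original strings are kept for display; only the key is numeric.
    print(sorted(fields, key=sort_key))
```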
PLAYBACK MODE RESTRICTIONS¶
All logs being played back must have been collected using the same interval, as colmux only looks at the first file/host to determine the appropriate value.
It is assumed all clocks are reasonably well synchronized, as colmux uses time to determine which data is to be displayed as a set.
All files must be in the same directory on all systems and that directory must be included in the playback file specification.
All files on a remote host must be for that host only.
EXAMPLES¶
Run collectl on 3 nodes, showing CPU, Disk and Network statistics once a second and sorted by column 1, which happens to be total cpu.
colmux -addr abc,def,xyz
Dynamically display top processes on nodes n1-n10 of a cluster once a second, sorted by column 5.
colmux -addr n[1-10] -command "-sZ :1" -column 5
Do the same for yesterday, between the hours of 5AM and 6AM, being sure to stall for 1/2 second between intervals. Note, if you leave off -addr you could put all the logs into /var/log/collectl on the local host and play them back from there.
colmux -addr n[1-10] -command "-sZ -p/var/log/collectl/YESTERDAY -from 05:00-06:00" -column 5 -delay .5
Look at the amount of mapped and slab memory consumed on nodes n1-n10 and n15 in real-time, every 2 seconds using single-line format. Include totals and preface each line with the time. Since memory sizes tend to be rather large, divide each by 1024 so we see MB rather than KB. Note that the column numbers are always displayed in ascending order regardless of their order in -cols. To be sure, first test the column numbers.
colmux -addr n[1-10,15] -command "-sm -i2 -oT" -cols 6,7 -coltot -colk -test
RESTRICTIONS¶
colmux requires passwordless ssh between the node it is running on and those it is monitoring. Also be sure the port you are using for communications (the default is 2655) is open.
KNOWN PROBLEMS¶
See the source code.
AUTHOR¶
This program was written by Mark Seger (mark.seger@hp.com).
SEE ALSO¶
http://collectl-utils.sourceforge.net/colmux.html
DECEMBER 2010 | LOCAL |