.\" Copyright (c) 2014 Stijn van Dongen
.TH "clm info2" 1 "16 May 2014" "clm info2 14-137" "USER COMMANDS "
.po 2m
.de ZI
.\" Zoem Indent/Itemize macro I.
.br
'in +\\$1
.nr xa 0
.nr xa -\\$1
.nr xb \\$1
.nr xb -\\w'\\$2'
\h'|\\n(xau'\\$2\h'\\n(xbu'\\
..
.de ZJ
.br
.\" Zoem Indent/Itemize macro II.
'in +\\$1
'in +\\$2
.nr xa 0
.nr xa -\\$2
.nr xa -\\w'\\$3'
.nr xb \\$2
\h'|\\n(xau'\\$3\h'\\n(xbu'\\
..
.if n .ll -2m
.am SH
.ie n .in 4m
.el .in 8m
..
.SH NAME
clm info2 \- compute performance measures for graphs and clusterings\&.

clm info2 is not in fact a standalone program\&. This manual
page documents the behaviour and options of the clm program when
invoked in mode \fIinfo2\fP\&. The options \fB-h\fP, \fB--apropos\fP,
\fB--version\fP, \fB-set\fP, \fB--nop\fP are accessible
in all \fBclm\fP modes\&. They are described
in the \fBclm\fP manual page\&.
.SH SYNOPSIS

\fBclm info2\fP [options] <graph file> <cluster file> <cluster file>*

\fBclm info2\fP
\fB[-o\fP fname (\fIwrite to file \fBfname\fP\fP)\fB]\fP
\fB[-pi\fP f (\fIapply inflation beforehand\fP)\fB]\fP
\fB[--list\fP (\fIlist efficiency for all nodes\fP)\fB]\fP
\fB[-tf\fP spec (\fIapply tf-spec to input matrix\fP)\fB]\fP
\fB[-cl-ceil\fP <num> (\fIskip clusters of size exceeding \fB<num>\fP\fP)\fB]\fP
\fB[-cat-max\fP <num> (\fIdo at most \fB<num>\fP tree levels\fP)\fB]\fP
\fB[-cl-tree\fP fname (\fIexpect file with nested clusterings\fP)\fB]\fP
\fB[-t\fP <int> (\fIuse <int> threads\fP)\fB]\fP
\fB[-J\fP <intJ> (\fIa total of <intJ> jobs are used\fP)\fB]\fP
\fB[-j\fP <intj> (\fIthis job has index <intj>\fP)\fB]\fP
\fB[-h\fP (\fIprint synopsis, exit\fP)\fB]\fP
\fB[--apropos\fP (\fIprint synopsis, exit\fP)\fB]\fP
\fB[--version\fP (\fIprint version, exit\fP)\fB]\fP
<matrix file> <cluster file> <cluster file>*
.SH DESCRIPTION

\fBclm info2\fP is a streamlined and updated version of \fBclm info\fP\&. The
latter outputs a key-value format listing a number of measures\&. In contrast,
\fBclm info2\fP only outputs the so-called efficiency criterion, a quality
index for networks and clusterings\&. This criterion can be generated for
each node independently with the \fB--list\fP option, indicating how
well a clustering captures the neighbour distribution of a given node\&.

\fBclm info2\fP can utilise threading and job dispatching\&. This may be useful
when dealing with very large graphs\&.

Multiple clusterings can be supplied on the command line\&.
Output is tabular, with one row per clustering, in the order
in which the clusterings were supplied\&. Multiple columns
result only if node-wise output is requested with \fB--list\fP\&.
By default a single number is produced for each clustering:
the mean of all node-wise scores for that clustering\&.
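
As an illustration, the two invocations below score two clusterings of the
same graph, first as one mean per clustering and then node-wise\&. The input
and output file names are placeholders chosen for this example; only the
options themselves are taken from this page\&.

.nf \fC
clm info2 -o eff.txt graph.mci clus1.mci clus2.mci
clm info2 --list -o eff-nodes.txt graph.mci clus1.mci clus2.mci
.fi \fR

In both cases the output contains two rows, the first for clus1\&.mci and
the second for clus2\&.mci\&.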

The \fBefficiency\fP factor is described in [1] (see
the \fBREFERENCES\fP section)\&. It tries to balance the dual aims of
capturing a lot of edges or edge weights and keeping the cluster footprint
or area fraction small\&. The efficiency number has several appealing
mathematical properties, cf\&. [1]\&.
.SH OPTIONS

.ZI 2m "\fB-o\fP fname (\fIoutput file name\fP)"
\&
.br
Write the output to the file \fBfname\fP\&.
.in -2m

.ZI 2m "\fB-pi\fP f (\fIapply inflation beforehand\fP)"
\&
.br
Apply inflation to the graph matrix and compute the performance
measures for the result\&.
.in -2m

.ZI 2m "\fB-tf\fP <tf-spec> (\fItransform input matrix values\fP)"
\&
.br
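Transform the values of the input matrix according to <tf-spec>
before the performance measures are computed\&. The transformation
syntax is described in \fBmcxio(5)\fP\&.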
.in -2m

.ZI 2m "\fB--list\fP (\fIlist efficiency for all nodes\fP)"
\&
.br
The efficiency scores for all nodes are given on a single line\&.
Each clustering specified corresponds to a single line\&.
.in -2m

.ZI 2m "\fB-cl-tree\fP fname (\fIexpect file with nested clusterings (cone format)\fP)"
\&
'in -2m
.ZI 2m "\fB-cl-ceil\fP <num> (\fIskip (nested) clusters of size exceeding <num>\fP)"
\&
'in -2m
'in +2m
\&
.br
The specified file should contain a hierarchy of nested
clusterings such as those generated by \fBmclcm\fP\&. The output is then
in a special format, undocumented but easy to understand\&.
Its purpose is to help cherrypick a single clustering
from a tree, in conjunction with the slightly experimental
and undocumented program \fBmlmfifofum\fP\&.

The measure is very slow to compute for large clusters, and for such
clusters it will generally fall outside any interesting range (i\&.e\&. it will be small)\&.
Use \fB-cl-ceil\fP to skip clusters exceeding the specified size;
\fBclm info2\fP will then proceed directly to their subclusters if they exist\&.
.in -2m

.ZI 2m "\fB-cat-max\fP num (\fIdo at most num levels\fP)"
\&
.br
This option only has an effect when used with \fB-cl-tree\fP\&.
\fBclm info2\fP will start at the most fine-grained level, working upwards\&.
.in -2m

.ZI 2m "\fB-t\fP <int> (\fIuse <int> threads\fP)"
\&
'in -2m
.ZI 2m "\fB-j\fP <intj> (\fIthis job has index <intj>\fP)"
\&
'in -2m
.ZI 2m "\fB-J\fP <intJ> (\fIa total of <intJ> jobs are used\fP)"
\&
'in -2m
'in +2m
\&
.br
For very large graphs (millions of nodes) and clusterings with large
clusters it may be helpful to allow this program to use multiple CPUs\&.
Additionally it is possible to spread the computation over multiple
jobs/machines\&. These three options are described in the \fBclmprotocols\fP manual page\&.
The following three sets of options, each given to a separate command, define three jobs, each running four threads\&.

.di ZV
.in 0
.nf \fC
-t 4 -J 3 -j 0 -o out\&.0
-t 4 -J 3 -j 1 -o out\&.1
-t 4 -J 3 -j 2 -o out\&.2
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

The output can then be collected with

.di ZV
.in 0
.nf \fC
clxdo add_table out\&.[0-2]
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
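
The per-job command lines above follow a fixed pattern, so they can be
generated by a small shell function\&. The sketch below is illustrative only:
the function name dispatch_info2, the file names, and the dry-run
approach (printing the commands rather than running them) are assumptions
made for this example and are not part of mcl\&. Only the options \fB-t\fP,
\fB-J\fP, \fB-j\fP and \fB-o\fP are taken from this page\&.

.nf \fC
# Hypothetical helper: print one clm info2 command line per job.
# Arguments: <number of jobs> <threads per job> <graph file> <cluster file>
dispatch_info2() {
  njobs=$1; threads=$2; graph=$3; clus=$4
  j=0
  # Job indices run from 0 to njobs - 1, one output file per job.
  while [ "$j" -lt "$njobs" ]; do
    echo "clm info2 -t $threads -J $njobs -j $j -o out.$j $graph $clus"
    j=$((j + 1))
  done
}

# Dry run for three jobs of four threads each (placeholder file names).
dispatch_info2 3 4 graph.mci clus.mci
.fi \fR

Piping the printed lines to \fBsh\fP, or submitting each line to a job
scheduler, runs the jobs; the per-job output files can then be combined
as shown above\&.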

.in -2m
.SH AUTHOR

Stijn van Dongen\&.
.SH SEE ALSO

\fBmclfamily(7)\fP for an overview of all the documentation
and the utilities in the mcl family\&.
.SH REFERENCES

[1] Stijn van Dongen\&. \fIPerformance criteria for graph clustering and Markov
cluster experiments\fP\&. Technical Report INS-R0012, National Research
Institute for Mathematics and Computer Science in the Netherlands,
Amsterdam, May 2000\&.
.br
http://www\&.cwi\&.nl/ftp/CWIreports/INS/INS-R0012\&.ps\&.Z