.\" Copyright (c) 2014 Stijn van Dongen
.TH "clm info2" 1 "16 May 2014" "clm info2 14-137" "USER COMMANDS "
.po 2m
.de ZI
.\" Zoem Indent/Itemize macro I.
.br
'in +\\$1
.nr xa 0
.nr xa -\\$1
.nr xb \\$1
.nr xb -\\w'\\$2'
\h'|\\n(xau'\\$2\h'\\n(xbu'\\
..
.de ZJ
.br
.\" Zoem Indent/Itemize macro II.
'in +\\$1
'in +\\$2
.nr xa 0
.nr xa -\\$2
.nr xa -\\w'\\$3'
.nr xb \\$2
\h'|\\n(xau'\\$3\h'\\n(xbu'\\
..
.if n .ll -2m
.am SH
.ie n .in 4m
.el .in 8m
..
.SH NAME
clm info2 \- compute performance measures for graphs and clusterings\&.

clminfo2 is not in actual fact a program\&. This manual
page documents the behaviour and options of the clm program when
invoked in mode \fIinfo2\fP\&. The options \fB-h\fP, \fB--apropos\fP,
\fB--version\fP, \fB-set\fP, \fB--nop\fP are accessible
in all \fBclm\fP modes\&. They are described
in the \fBclm\fP manual page\&.
.SH SYNOPSIS

\fBclm info2\fP [options] graph-file cluster-file*

\fBclm info2\fP
\fB[-o\fP fname (\fIwrite to file \fBfname\fP\fP)\fB]\fP
\fB[-pi\fP f (\fIapply inflation beforehand\fP)\fB]\fP
\fB[--list\fP (\fIlist efficiency for all nodes\fP)\fB]\fP
\fB[-tf\fP spec (\fIapply tf-spec to input matrix\fP)\fB]\fP
\fB[-cl-ceil\fP num (\fIskip clusters of size exceeding \fBnum\fP\fP)\fB]\fP
\fB[-cat-max\fP num (\fIdo at most \fBnum\fP tree levels\fP)\fB]\fP
\fB[-cl-tree\fP fname (\fIexpect file with nested clusterings\fP)\fB]\fP
\fB[-t\fP num (\fIuse num threads\fP)\fB]\fP
\fB[-J\fP num (\fIa total of num jobs are used\fP)\fB]\fP
\fB[-j\fP num (\fIthis job has index num\fP)\fB]\fP
\fB[-h\fP (\fIprint synopsis, exit\fP)\fB]\fP
\fB[--apropos\fP (\fIprint synopsis, exit\fP)\fB]\fP
\fB[--version\fP (\fIprint version, exit\fP)\fB]\fP
graph-file cluster-file*
.SH DESCRIPTION

\fBclm info2\fP is a streamlined and updated version of \fBclm info\fP\&.
The latter writes a number of measures in a key-value format\&.
In contrast, \fBclm info2\fP outputs only the so-called efficiency criterion,
a quality index for networks and clusterings\&.
This criterion can be generated for each node independently with
the \fB--list\fP option, indicating how well a clustering captures
the neighbour distribution of a given node\&.
\fBclm info2\fP can utilise threading and job dispatching\&. This may be
useful when dealing with very large graphs\&.

Multiple clusterings can be supplied on the command line\&. Output is tabular,
each row corresponding to a clustering, in the order supplied on
the command line\&. Multiple columns result only if node-wise output
is requested with \fB--list\fP\&. By default a single number is produced for each
individual clustering: the mean of all node-wise scores for that clustering\&.

The \fBefficiency\fP factor is described in [1] (see the \fBREFERENCES\fP section)\&.
It tries to balance the dual aims of capturing a lot of edges or edge weights
and keeping the cluster footprint or area fraction small\&. The efficiency
number has several appealing mathematical properties, cf\&. [1]\&.
.SH OPTIONS

.ZI 2m "\fB-o\fP fname (\fIoutput file name\fP)"
\&
.br
Write the output to the file \fBfname\fP\&.
.in -2m

.ZI 2m "\fB-pi\fP f (\fIapply inflation beforehand\fP)"
\&
.br
Apply inflation to the graph matrix and compute the performance
measures for the result\&.
.in -2m

.ZI 2m "\fB-tf\fP spec (\fItransform input matrix values\fP)"
\&
.br
Transform the input matrix values according to the given
transformation specification before the measures are computed\&.
.in -2m

.ZI 2m "\fB--list\fP (\fIlist efficiency for all nodes\fP)"
\&
.br
Output the efficiency score of every node\&.
Each clustering specified yields a single line containing
the scores for all nodes\&.
.in -2m

.ZI 2m "\fB-cl-tree\fP fname (\fIexpect file with nested clusterings (cone format)\fP)"
\&
'in -2m
.ZI 2m "\fB-cl-ceil\fP num (\fIskip (nested) clusters of size exceeding num\fP)"
\&
'in -2m
'in +2m
\&
.br
The specified file should contain a hierarchy of nested clusterings
such as generated by \fBmclcm\fP\&. The output is then in a special
format, undocumented but easy to understand\&.
Its purpose is to help cherry-pick
a single clustering from a tree, in conjunction with the
slightly experimental and undocumented program \fBmlmfifofum\fP\&.
The measure that is used is very slow to compute for large clusters,
and generally it will be outside any interesting range (i\&.e\&. it will
be small)\&. Use \fB-cl-ceil\fP to skip clusters exceeding the specified
size; \fBclm info2\fP will then proceed directly to subclusters if they exist\&.
.in -2m

.ZI 2m "\fB-cat-max\fP num (\fIdo at most num levels\fP)"
\&
.br
This only has effect when used with \fB-cl-tree\fP\&. \fBclm info2\fP
will start at the most fine-grained level, working upwards\&.
.in -2m

.ZI 2m "\fB-t\fP num (\fIuse num threads\fP)"
\&
'in -2m
.ZI 2m "\fB-j\fP num (\fIthis job has index num\fP)"
\&
'in -2m
.ZI 2m "\fB-J\fP num (\fIa total of num jobs are used\fP)"
\&
'in -2m
'in +2m
\&
.br
For very large graphs (millions of nodes) and clusterings with large
clusters it may be helpful to allow this program to use multiple CPUs\&.
Additionally it is possible to spread the computation over multiple
jobs/machines\&.
These three options are described in the \fBclmprotocols\fP manual page\&.
The following three sets of options, each given to a separate command,
define three jobs, each running four threads\&.

.di ZV
.in 0
.nf \fC
-t 4 -J 3 -j 0 -o out\&.0
-t 4 -J 3 -j 1 -o out\&.1
-t 4 -J 3 -j 2 -o out\&.2
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

The output can then be collected with

.di ZV
.in 0
.nf \fC
clxdo add_table out\&.[0-2]
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

.in -2m
.SH AUTHOR

Stijn van Dongen\&.
.SH SEE ALSO

\fBmclfamily(7)\fP for an overview of all the documentation
and the utilities in the mcl family\&.
.SH REFERENCES

[1] Stijn van Dongen\&. \fIPerformance criteria for graph clustering and Markov
cluster experiments\fP\&. Technical Report INS-R0012, National Research
Institute for Mathematics and Computer Science in the Netherlands,
Amsterdam, May 2000\&.
.br
http://www\&.cwi\&.nl/ftp/CWIreports/INS/INS-R0012\&.ps\&.Z
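.SH EXAMPLE

The job options \fB-t\fP, \fB-J\fP and \fB-j\fP described under \fBOPTIONS\fP
can be combined in a small shell script\&. What follows is a minimal sketch
rather than a prescribed procedure\&. The file names \fCgraph\&.mci\fP and
\fCclus\&.mci\fP are hypothetical, and the argument order (graph first,
then clustering) is an assumption\&. The script starts three background jobs
of four threads each, waits for all of them to finish, and then collects
the partial output files\&.

.di ZV
.in 0
.nf \fC
# graph\&.mci and clus\&.mci are hypothetical input files;
# the order of the two trailing arguments is an assumption\&.
for j in 0 1 2; do
   clm info2 -t 4 -J 3 -j $j -o out\&.$j graph\&.mci clus\&.mci &
done
wait
clxdo add_table out\&.[0-2]
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR

When the jobs run on separate machines the loop does not apply; each
machine then runs one command with its own \fB-j\fP index\&.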