.\" Copyright (c) 2022 Stijn van Dongen
.TH "clm info" 1 "9 Oct 2022" "clm info 22-282" "USER COMMANDS "
.po 2m
.de ZI
.\" Zoem Indent/Itemize macro I.
.br
'in +\\$1
.nr xa 0
.nr xa -\\$1
.nr xb \\$1
.nr xb -\\w'\\$2'
\h'|\\n(xau'\\$2\h'\\n(xbu'\\
..
.de ZJ
.br
.\" Zoem Indent/Itemize macro II.
'in +\\$1
'in +\\$2
.nr xa 0
.nr xa -\\$2
.nr xa -\\w'\\$3'
.nr xb \\$2
\h'|\\n(xau'\\$3\h'\\n(xbu'\\
..
.if n .ll -2m
.am SH
.ie n .in 4m
.el .in 8m
..
.SH NAME
clm_info \- compute performance measures for graphs and clusterings\&.

\fBclm info\fP is not in actual fact a separate program\&. This manual
page documents the behaviour and options of the \fBclm\fP program when
invoked in mode \fIinfo\fP\&. The options \fB-h\fP, \fB--apropos\fP,
\fB--version\fP, \fB-set\fP, and \fB--nop\fP are accessible
in all \fBclm\fP modes\&. They are described
in the \fBclm\fP manual page\&.
.SH SYNOPSIS
\fBclm info\fP [options] *

\fBclm info\fP
\fB[-o\fP fname (\fIwrite to file \fBfname\fP\fP)\fB]\fP
\fB[-pi\fP f (\fIapply inflation beforehand\fP)\fB]\fP
\fB[-tf\fP spec (\fIapply tf-spec to input matrix\fP)\fB]\fP
\fB[-cl-tree\fP fname (\fIexpect file with nested clusterings\fP)\fB]\fP
\fB[-cat-max\fP num (\fIdo at most \fBnum\fP tree levels\fP)\fB]\fP
\fB[-cl-ceil\fP (\fIskip clusters exceeding a given size\fP)\fB]\fP
\fB[--node-self-measures\fP (\fIdump measures for native cluster\fP)\fB]\fP
\fB[--node-all-measures\fP (\fIdump measures for incident clusters\fP)\fB]\fP
\fB[-h\fP (\fIprint synopsis, exit\fP)\fB]\fP
\fB[--apropos\fP (\fIprint synopsis, exit\fP)\fB]\fP
\fB[--version\fP (\fIprint version, exit\fP)\fB]\fP
*
.SH DESCRIPTION
\fBclm info\fP computes several numbers indicative of the efficiency with
which a clustering captures the edge mass of a given graph\&.
Use it in conjunction with \fBclm dist\fP to determine
which clusterings you accept\&.
See the EXAMPLES section in \fBclm dist\fP for an example of
\fBclm dist\fP and \fBclm info\fP (and \fBclm meet\fP) usage\&.
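
As a rough illustration of two of the numbers computed by \fBclm info\fP, the mass fraction and the area fraction defined further down in this section, here is a minimal Python sketch. The function names mass_fraction and area_fraction and the input representation (a list of weighted edges and a list of node sets) are hypothetical and not part of \fBclm\fP or mcl. The sketch assumes an undirected graph without self-loops, and it does not attempt the efficiency measure, whose definition is given in [1].

```python
# Hypothetical helpers illustrating the mass fraction and area fraction
# described in this manual page; they are not part of clm or mcl.
# Assumes an undirected graph without self-loops.


def mass_fraction(edges, cluster_of):
    """Weight of captured edges divided by the weight of all edges.

    edges: iterable of (u, v, weight) tuples, each undirected edge once.
    cluster_of: dict mapping each node to its cluster id.
    An edge is captured when both of its nodes share a cluster.
    """
    edges = list(edges)
    total = sum(w for _, _, w in edges)
    captured = sum(w for u, v, w in edges if cluster_of[u] == cluster_of[v])
    return captured / total if total else 0.0


def area_fraction(clusters, n_nodes):
    """Sum of n*(n-1) over cluster sizes n, divided by N*(N-1)."""
    denom = n_nodes * (n_nodes - 1)
    covered = sum(len(c) * (len(c) - 1) for c in clusters)
    return covered / denom if denom else 0.0


if __name__ == "__main__":
    clusters = [{0, 1, 2}, {3, 4, 5}]
    cluster_of = {node: i for i, c in enumerate(clusters) for node in c}
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (2, 3, 1.0)]
    # Five of the six unit-weight edges lie within a cluster.
    print(mass_fraction(edges, cluster_of))
    # (3*2 + 3*2) / (6*5) = 12 / 30
    print(area_fraction(clusters, 6))
```

Listing each edge in both directions, as a symmetric graph matrix does, doubles both sums in mass_fraction and leaves the ratio unchanged. A clustering into singletons gives an area fraction of 0 and a single cluster containing all nodes gives 1, in line with the fine-grained versus coarse reading of the area fraction.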
Output can be generated for multiple clusterings at the same time\&.

The \fBefficiency\fP factor is described in [1] (see the \fBREFERENCES\fP section)\&.
It balances the dual aims of capturing many edges (or much edge weight)
and keeping the cluster footprint, or area fraction, small\&.
The efficiency number has several appealing mathematical properties, cf\&. [1]\&.
It is related to, but not derivable from, the second and third numbers,
the \fImass fraction\fP and the \fIarea fraction\fP\&.

The \fBmass fraction\fP is defined as follows\&. Let \fBe\fP be an edge of the graph\&.
The clustering \fIcaptures\fP \fBe\fP if the two nodes associated with \fBe\fP
are in the same cluster\&. The mass fraction is the joint weight of all captured
edges divided by the joint weight of all edges in the input graph\&.

The \fBarea fraction\fP is roughly the sum of the squares of the sizes of all
clusters in the clustering, divided by the square of the number of nodes in the graph\&.
It is only roughly that, because the actual formula uses the quantity
\fBN\fP*(\fBN-1\fP) wherever the description above uses the square of a quantity \fBN\fP\&.
A low area fraction indicates a fine-grained clustering; a high area fraction
indicates a coarse one\&.
.SH OPTIONS
.ZI 2m "\fB-o\fP fname (\fIoutput file name\fP)"
\&
.br
Write output to file \fBfname\fP\&.
.in -2m
.ZI 2m "\fB-pi\fP f (\fIapply inflation beforehand\fP)"
\&
.br
Apply inflation with parameter \fBf\fP to the graph matrix and compute
the performance measures for the result\&.
.in -2m
.ZI 2m "\fB-tf\fP spec (\fItransform input matrix values\fP)"
\&
.br
Apply the transformation specification \fBspec\fP to the input matrix values\&.
.in -2m
.ZI 2m "\fB-cl-tree\fP fname (\fIexpect file with nested clusterings (cone format)\fP)"
\&
'in -2m
.ZI 2m "\fB-cl-ceil\fP (\fIskip (nested) clusters exceeding a given size\fP)"
\&
'in -2m
'in +2m
\&
.br
The specified file should contain a hierarchy of nested clusterings such as
generated by \fBmclcm\fP\&. The output is then in a special format,
undocumented but easy to understand\&.
Its purpose is to help cherry-pick a single clustering from a tree, in conjunction
with the slightly experimental and undocumented program \fBmlmfifofum\fP\&.
The measure that is used is very slow to compute for large clusters,
and for such clusters it will generally be outside any interesting range
(i\&.e\&. it will be small)\&.
Use \fB-cl-ceil\fP to skip clusters exceeding the specified size;
\fBclm info\fP will then proceed directly to their subclusters, if any exist\&.
.in -2m
.ZI 2m "\fB-cat-max\fP num (\fIdo at most num levels\fP)"
\&
.br
This only has effect when used with \fB-cl-tree\fP\&.
\fBclm info\fP will start at the most fine-grained level, working upwards\&.
.in -2m
.ZI 2m "\fB--node-all-measures\fP (\fIdump node-wise criteria for all incident clusters\fP)"
\&
'in -2m
.ZI 2m "\fB--node-self-measures\fP (\fIdump node-wise criteria for native cluster\fP)"
\&
'in -2m
'in +2m
\&
.br
These options produce output in a key-value format, where the keys have
the following meaning\&.
.di ZV
.in 0
.nf \fC
 nm  file name (redundant unless multiple cluster files are provided)
 ni  node index
 ci  cluster index
 nn  number of neighbours of this node (constant for a given node)
 nc  cluster size (constant for a given cluster)
 ef  efficiency for this node/cluster combination
 em  max-efficiency for this node/cluster combination
 mf  mass fraction: percentage of edge weights for this node in this cluster
 ma  total mass of edge weights for this node in this cluster
 xn  number of neighbours of the node that are not in the cluster
 xc  number of nodes in the cluster that are not a neighbour of the node
 ns  number of neighbours of the node that are also in this cluster
 ti  ratio of the maximum of the edge weights for neighbours of this node
     that are in this cluster to the maximum of the edge weights for
     neighbours of this node that are NOT in this cluster
 al  (alien) 1 if the node is not native to the cluster, 0 if the node is native
.fi \fR
.in
.di
.ne \n(dnu
.nf \fC
.ZV
.fi \fR
.in -2m
.SH AUTHOR
Stijn van Dongen\&.
.SH SEE ALSO
\fBmclfamily(7)\fP for an overview of all the documentation
and the utilities in the mcl family\&.
.SH REFERENCES
[1] Stijn van Dongen\&. \fIPerformance criteria for graph clustering and Markov
cluster experiments\fP\&. Technical Report INS-R0012, National Research
Institute for Mathematics and Computer Science in the Netherlands,
Amsterdam, May 2000\&.
.br
http://www\&.cwi\&.nl/ftp/CWIreports/INS/INS-R0012\&.ps\&.Z