.\" Copyright (c) 2014 Stijn van Dongen .TH "mcx q" 1 "16 May 2014" "mcx q 14-137" "USER COMMANDS " .po 2m .de ZI .\" Zoem Indent/Itemize macro I. .br 'in +\\$1 .nr xa 0 .nr xa -\\$1 .nr xb \\$1 .nr xb -\\w'\\$2' \h'|\\n(xau'\\$2\h'\\n(xbu'\\ .. .de ZJ .br .\" Zoem Indent/Itemize macro II. 'in +\\$1 'in +\\$2 .nr xa 0 .nr xa -\\$2 .nr xa -\\w'\\$3' .nr xb \\$2 \h'|\\n(xau'\\$3\h'\\n(xbu'\\ .. .if n .ll -2m .am SH .ie n .in 4m .el .in 8m .. .SH NAME mcxquery \- compute simple graph statistics .SH SYNOPSIS mcxquery is not in actual fact a program\&. This manual page documents the behaviour and options of the mcx program when invoked in mode \fIquery\fP\&. The options \fB-h\fP, \fB--apropos\fP, \fB--version\fP, \fB-set\fP, \fB--nop\fP, \fB-progress\fP\ \&\fI\fP are accessible in all \fBmcx\fP modes\&. They are described in the \fBmcx\fP manual page\&. \fBmcxquery\fP \fB[-abc\fP (\fIspecify label input\fP)\fB]\fP \fB[-imx\fP (\fIspecify matrix input\fP)\fB]\fP \fB[-o\fP (\fIoutput file name\fP)\fB]\fP \fB[-tab\fP (\fIuse tab file\fP)\fB]\fP \fB[--node-attr\fP (\fIoutput node degree and weight attributes\fP)\fB]\fP \fB[-vary-threshold\fP (\fIanalyze graph at similarity cutoffs\fP)\fB]\fP \fB[-vary-knn\fP (\fIanalyze graph for varying k-NN\fP)\fB]\fP \fB[-vary-ceil\fP (\fIanalyze graph for varying ceil reductions\fP)\fB]\fP \fB[--no-legend\fP (\fIdo not output explanatory legend\fP)\fB]\fP \fB[--reduce\fP (\fIuse reduced matrix\fP)\fB]\fP \fB[--test-metric\fP (\fItest whether graph distance is metric\fP)\fB]\fP \fB[--test-cycle\fP (\fItest whether graph contains cycles\fP)\fB]\fP \fB[-test-cycle\fP (\fItest cycles, report cycles\fP)\fB]\fP \fB[--vary-correlation\fP (\fIanalyze graph at correlation cutoffs\fP)\fB]\fP \fB[--clcf\fP (\fIinclude clustering coefficient analysis\fP)\fB]\fP \fB[--eff\fP (\fIinclude efficiency criterion\fP)\fB]\fP \fB[-div\fP (\fIcluster size separating value\fP)\fB]\fP \fB[--dim\fP (\fIreport native format and dimensions\fP)\fB]\fP \fB[--values\fP (\fIoutput all arc entries/weights, unsorted\fP)\fB]\fP \fB[--values-sorted\fP (\fIoutput all entries/weights, sorted\fP)\fB]\fP \fB[-values-hist\fP (\fIweight histogram\fP)\fB]\fP \fB[-degrees-hist\fP (\fIdegrees histogram\fP)\fB]\fP \fB[--output-table\fP (\fIoutput logical tab separated table without key\fP)\fB]\fP \fB[-t\fP (\fInumber of threads to use\fP)\fB]\fP \fB[-icl\fP (\fIinput clustering\fP)\fB]\fP \fB[-tf\fP spec (\fIapply tf-spec to input matrix\fP)\fB]\fP \fB[-h\fP (\fIprint synopsis, exit\fP)\fB]\fP \fB[--apropos\fP (\fIprint synopsis, exit\fP)\fB]\fP \fB[--version\fP (\fIprint version, exit\fP)\fB]\fP .SH DESCRIPTION The default \fBmcxquery\fP output is a list of summary statistics for each node\&. These are its node degree, the mean, minimum, maximum and median edge weight\&. If supplied with a clustering, the output will additionally list the cluster size and cluster label for each node\&. Additionally, \fBmcxquery\fP can be used to analyse a graph at different similarity cutoffs or at varying parameters of edge reduction strategies such as mutual nearest neighbour reduction\&. Attributes supplied across different thresholds are the number of connected components, the number of singletons, and statistics (median, average, iqr) on node degrees and edge weights\&. Typically this is done on a graph constructed using a very permissive threshold\&. For example, one can create a graph from array expression data using \fBmcxarray\fP with a very low pearson correlation cutoff such as\ \&0\&.5 Then \fBmcxquery\fP can be used to analyze the graph at increasingly stringent thresholds of\ \&0\&.50, 0\&.55, 0\&.60\ \&\&.\&.\ \&0\&.95\&. Other tasks that \fBmcxquery\fP be used for include: .ZI 2m "\(bu" Produce a histogram of edge weights\&. .in -2m .ZI 2m "\(bu" Produce a histogram of edge node degrees\&. .in -2m .ZI 2m "\(bu" Output all edge weights\&. .in -2m .ZI 2m "\(bu" Test whether the graph weight encodes a metric (for edge weights that encode distances rather than similarites)\&. .in -2m .ZI 2m "\(bu" Test whether the graph has a cycle\&. .in -2m .SH OPTIONS .ZI 2m "\fB-abc\fP (\fIlabel input\fP)" \& .br The file name for input that is in label format\&. .in -2m .ZI 2m "\fB-imx\fP (\fIinput matrix\fP)" \& .br The file name for input that is in mcl native matrix format\&. .in -2m .ZI 2m "\fB-o\fP (\fIoutput file name\fP)" \& .br Set the name of the file where output should be written to\&. .in -2m .ZI 2m "\fB-tab\fP (\fIuse tab file\fP)" \& .br This option causes the output to be printed with the labels found in the tab file\&. .in -2m .ZI 2m "\fB--dim\fP (\fIreport native format and dimensions\fP)" \& .br This will report the matrix format (either interchange or binary) and the matrix dimensions\&. For a graph the two reported dimensions should be equal\&. .in -2m .ZI 2m "\fB--values\fP (\fIoutput all entries/weights, unsorted\fP)" \& 'in -2m .ZI 2m "\fB--values-sorted\fP (\fIoutput all entries/weights, sorted\fP)" \& 'in -2m .ZI 2m "\fB-values-hist\fP (\fIoutput weight histogram\fP)" \& 'in -2m .ZI 2m "\fB-values-hist\fP (\fIoutput weight histogram\fP)" \& 'in -2m .ZI 2m "\fB-degrees-hist\fP (\fIdegrees histogram\fP)" \& 'in -2m 'in +2m \& .br These options are fairly self-documenting\&. The result of both \fB-edges-hist\fP and \fB-degrees-hist\fP is a tab separated table of bin offsets and bin counts\&. When using \fB-edges-hist\fP\ \&\fI\fP the program will create a histogram ranging from the smallest to the largest edge weight\&. .in -2m .ZI 2m "\fB--output-table\fP (\fIoutput logical tab separated table without key\fP)" \& .br This option causes table output such as provided by \fB--vary-correlation\fP to be output in a logical tab-separated format rather than pretty-printed\&. .in -2m .ZI 2m "\fB-vary-threshold\fP (\fIanalyze graphs at similarity cutoffs\fP)" \& .br The graph is analysed at different edge weight thresholds, going from \fB\fP to \fB\fP in \fB\fP steps\&. .in -2m .ZI 2m "\fB--vary-correlation\fP (\fIanalyze graphs at correlation cutoffs\fP)" \& .br This instructs \fBmcxquery\fP to use a threshold list suitable for use with graphs in which the edge weight similarities are correlations\&. The list starts at 0\&.2 and ends at 1\&.0 using increments of 0\&.05\&. If a different start or increment is required it can be achieved by using the \fB-vary-threshold\fP option\&. For example, a start of\ \&0\&.10 and an increment of\ \&0\&.02 are obtained by issuing \fB-vary-threshold\fP\ \&\fB:\&.1/1\&.0/45\fP\&. .in -2m .ZI 2m "defopt{--no-legend}{do not output explanatory legend}" \& .br For a fully parseable output format use \fB--output-table\fP\&. .in -2m .ZI 2m "\fB--clcf\fP (\fIinclude clustering coefficient analysis\fP)" \& 'in -2m .ZI 2m "\fB--eff\fP (\fIinclude efficiency criterion\fP)" \& 'in -2m 'in +2m \& .br These options can be used to compute additional characteristics in the analysis of thresholded graphs with \fB--vary-correlation\fP and \fB-vary-threshold\fP\&. For large graphs these are relatively time-consuming to compute\&. More information and a reference for the efficiency criterion can be found in \fBclminfo(1)\fP\&. .in -2m .ZI 2m "\fB-vary-knn\fP (\fIanalyze graphs for varying k-NN\fP)" \& 'in -2m .ZI 2m "\fB-vary-ceil\fP (\fIanalyze graphs for varying ceil reductions\fP)" \& 'in -2m .ZI 2m "\fB--reduce\fP (\fIuse reduced matrix\fP)" \& 'in -2m 'in +2m \& .br These options cause analysis of a graph as it is subjected to reductions across a range of parameters\&. Refer to \fBmcxio(5)\fP for a description of these reductions\&. The analyses starts at the \fIend\fP argument, and progresses towards the \fIstart\fP argument using decrements of size \fIstep\fP\&. By default the reduction is always computed relative to the start matrix, i\&.e\&. the input matrix after \fB-tf\fP transformations have optionally been applied\&. Specifying \fB--reduce\fP causes this to change so that each new reduction is calculated relative to the reduction just computed\&. For graphs with ties among edge weights it may be useful to use \fB-tf\fP\ \&\fB\&'#tug()\&'\fP\&. This will add small perturbations to the edge weights and have the effect of breaking ties\&. By default perturbations are computed using the cosine between the vectors of neighbours of the two nodes incident to an edge\&. This can be changed to a random perturbation with \fB-tf\fP\ \&\fB\&'#rug()\&'\fP\&. .in -2m .ZI 2m "\fB--test-cycle\fP (\fItest whether graph contains cycles\fP)" \& 'in -2m .ZI 2m "\fB-test-cycle\fP (\fItest cycles, report cyclees\fP)" \& 'in -2m 'in +2m \& .br Test whether the input graph contains cycles\&. With the second option nodes that are part of a cycle are output, up to a maximum of \fI\fP nodes\&. Use \fI\fP=\fB-1\fP to output all such nodes\&. .in -2m .ZI 2m "\fB--test-metric\fP (\fItest whether graph distance is metric\fP)" \& .br This tests all possible triangle relationships\&. .in -2m .ZI 2m "\fB-div\fP (\fIcluster size separating value\fP)" \& .br When analyzing graphs at different thresholds with one of the options above, \fBmcxquery\fP reports the percentage of nodes contained in clusters not exceeding a specified size, by default\ \&3\&. This number can be changed using the \fB-div\fP option\&. .in -2m .ZI 2m "\fB-tf\fP (\fItransform input matrix values\fP)" \& .br Transform the input matrix values according to the syntax described in \fBmcxio(5)\fP\&. .in -2m .ZI 2m "\fB-t\fP (\fInumber of threads to use\fP)" \& .br This has an effect only when using the \fB-vary-knn\fP option, and is only useful on multi-CPU machines\&. .in -2m .ZI 2m "\fB--node-attr\fP (\fIoutput node degree and weight attributes\fP)" \& .br Output is in the form of a tab separated file\&. The option \fB-icl\fP can be used in conjuction\&. .in -2m .ZI 2m "\fB-icl\fP (\fIinput clustering\fP)" \& .br Output for each node the size of the cluster it is in\&. This option can be used in conjunction with \fB--node-attr\fP\&. .in -2m .SH SEE ALSO \fBmcxio(5)\fP, and \fBmclfamily(7)\fP for an overview of all the documentation and the utilities in the mcl family\&.