.\" -*- mode: troff; coding: utf-8 -*- .\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>. .ie n \{\ . ds C` "" . ds C' "" 'br\} .el\{\ . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" ======================================================================== .\" .IX Title "Stats 3pm" .TH Stats 3pm 2024-03-07 "perl v5.38.2" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH NAME PDL::Stats \- a collection of statistics modules in Perl Data Language, with a quick\-start guide for non\-PDL people. .SH DESCRIPTION .IX Header "DESCRIPTION" Loads modules named below, making the functions available in the current namespace. .PP Properly formatted documentations online at http://pdl\-stats.sf.net .SH SYNOPSIS .IX Header "SYNOPSIS" .Vb 2 \& use PDL::LiteF; # loads less modules \& use PDL::NiceSlice; # preprocessor for easier pdl indexing syntax \& \& use PDL::Stats; \& \& # Is equivalent to the following: \& \& use PDL::Stats::Basic; \& use PDL::Stats::GLM; \& use PDL::Stats::Kmeans; \& use PDL::Stats::TS; \& \& # and the following if installed; \& \& use PDL::Stats::Distr; \& use PDL::GSL::CDF; .Ve .SH "QUICK-START FOR NON-PDL PEOPLE" .IX Header "QUICK-START FOR NON-PDL PEOPLE" Enjoy PDL::Stats without having to dive into PDL, just wet your feet a little. Three key words two concepts and an icing on the cake, you should be well on your way there. .SS pdl .IX Subsection "pdl" The magic word that puts PDL::Stats at your disposal. pdl creates a PDL numeric data object (a pdl, or ndarray) from perl array or array ref. All PDL::Stats methods, unless meant for regular perl array, can then be called from the data object. .PP .Vb 1 \& my @y = 0..5; \& \& my $y = pdl @y; \& \& # a simple function \& \& my $stdv = $y\->stdv; \& \& # you can skip the intermediate $y \& \& my $stdv = stdv( pdl @y ); \& \& # a more complex method, skipping intermediate $y \& \& my @x1 = qw( y y y n n n ); \& my @x2 = qw( 1 0 1 0 1 0 ) \& \& # do a two\-way analysis of variance with y as DV and x1 x2 as IVs \& \& my %result = pdl(@y)\->anova( \e@x1, \e@x2 ); \& print "$_\et$result{$_}\en" for (sort keys %result); .Ve .PP If you have a list of list, ie array of array refs, pdl will create a multi-dimensional data object. .PP .Vb 1 \& my @a = ( [1,2,3,4], [0,1,2,3], [4,5,6,7] ); \& \& my $a = pdl @a; \& \& print $a . $a\->info; \& \& # here\*(Aqs what you will get \& \& [ \& [1 2 3 4] \& [0 1 2 3] \& [4 5 6 7] \& ] \& PDL: Double D [4,3] .Ve .PP PDL::Stats puts observations in the first dimension and variables in the second dimension, ie pdl [obs, var]. In PDL::Stats the above example represents 4 observations on 3 variables. .PP .Vb 1 \& # you can do all kinds of fancy stuff on such a 2D pdl. \& \& my %result = $a\->kmeans( {NCLUS=>2} ); \& print "$_\et$result{$_}\en" for (sort keys %result); .Ve .PP Make sure the array of array refs is rectangular. If the array refs are of unequal sizes, pdl will pad it out with 0s to match the longest list. .SS info .IX Subsection "info" Tells you the data type (yes pdls are typed, but you shouldn't have to worry about it here*) and dimensionality of the pdl, as seen in the above example. I find it a big help for my sanity to keep track of the dimensionality of a pdl. As mentioned above, PDL::Stats uses 2D pdl with observation x variable dimensionality. .PP *pdl uses double precision by default. If you are working with things like epoch time, then you should probably use pdl(long, \f(CW@epoch\fR) to maintain the precision. .SS list .IX Subsection "list" Come back to the perl reality from the PDL wonder land. list turns a pdl data object into a regular perl list. Caveat: list produces a flat list. The dimensionality of the data object is lost. .SS Signature .IX Subsection "Signature" This is not a function, but a concept. You will see something like this frequently in the pod: .PP .Vb 1 \& stdv \& \& Signature: (a(n); float+ [o]b()) .Ve .PP The signature tells you what the function expects as input and what kind of output it produces. a(n) means it expects a 1D pdl with n elements; [o] is for output, b() means its a scalar. So stdv will take your 1D list and give back a scalar. float+ you can ignore; but if you insist, it means the output is at float or double precision. The name a or b or c is not important. What's important is the thing in the parenthesis. .PP .Vb 1 \& corr \& \& Signature: (a(n); b(n); float+ [o]c()) .Ve .PP Here the function corr takes two inputs, two 1D pdl with the same numbers of elements, and gives back a scalar. .PP .Vb 1 \& t_test \& \& Signature: (a(n); b(m); float+ [o]t(); [o]d()) .Ve .PP Here the function t_test can take two 1D pdls of unequal size (n==m is certainly fine), and give back two scalars, t\-value and degrees of freedom. Yes we accommodate t\-tests with unequal sample sizes. .PP .Vb 1 \& assign \& \& Signature: (data(o,v); centroid(c,v); byte [o]cluster(o,c)) .Ve .PP Here is one of the most complicated signatures in the package. This is a function from Kmeans. assign takes data of observation x variable dimensions, and a centroid of cluster x variable dimensions, and returns an observation x cluster membership pdl (indicated by 1s and 0s). .PP Got the idea? Then we can see how PDL does its magic :) .SS Threading .IX Subsection "Threading" Another concept. The first thing to know is that, threading is optional. .PP PDL threading means automatically repeating the operation on extra elements or dimensions fed to a function. For a function with a signature like this .PP .Vb 1 \& gsl_cdf_tdist_P \& \& Signature: (double x(); double nu(); [o]out()) .Ve .PP the signatures says that it takes two scalars as input, and returns a scalar as output. If you need to look up the p\-values for a list of t's, with the same degrees of freedom 19, .PP .Vb 1 \& my @t = ( 1.65, 1.96, 2.56 ); \& \& my $p = gsl_cdf_tdist_P( pdl(@t), 19 ); \& \& print $p . "\en" . $p\->info; \& \& # here\*(Aqs what you will get \& \& [0.94231136 0.96758551 0.99042586] \& PDL: Double D [3] .Ve .PP The same function is repeated on each element in the list you provided. If you had different degrees of freedoms for the t's, .PP .Vb 1 \& my @df = (199, 39, 19); \& \& my $p = gsl_cdf_tdist_P( pdl(@t), pdl(@df) ); \& \& print $p . "\en" . $p\->info; \& \& # here\*(Aqs what you will get \& \& [0.94973979 0.97141553 0.99042586] \& PDL: Double D [3] .Ve .PP The df's are automatically matched with the t's to give you the results. .PP An example of threading thru extra dimension(s): .PP .Vb 1 \& stdv \& \& Signature: (a(n); float+ [o]b()) .Ve .PP if the input is of 2D, say you want to compute the stdv for each of the 3 variables, .PP .Vb 1 \& my @a = ( [1,1,3,4], [0,1,2,3], [4,5,6,7] ); \& \& # pdl @a is pdl dim [4,3] \& \& my $sd = stdv( pdl @a ); \& \& print $sd . "\en" . $sd\->info; \& \& # this is what you will get \& \& [ 1.2990381 1.118034 1.118034] \& PDL: Double D [3] .Ve .PP Here the function was given an input with an extra dimension of size 3, so it repeats the stdv operation on the extra dimension 3 times, and gives back a 1D pdl of size 3. .PP Threading works for arbitrary number of dimensions, but it's best to refrain from higher dim pdls unless you have already decided to become a PDL wiz / witch. .PP Not all PDL::Stats methods thread. As a rule of thumb, if a function has a signature attached to it, it threads. .SS perldl .IX Subsection "perldl" Essentially a perl shell with "use PDL;" at start up. Comes with the PDL installation. Very handy to try out pdl operations, or just plain perl. print is shortened to p to avoid injury from excessive typing. \f(CW\*(C`my\*(C'\fR goes out of scope at the end of (multi)line input, so mostly you will have to drop the good practice of my here. .SS "For more info" .IX Subsection "For more info" PDL::Course .SH AUTHOR .IX Header "AUTHOR" ~~~~~~~~~~~~ ~~~~~ ~~~~~~~~ ~~~~~ ~~~ `` ><((("> .PP Copyright (C) 2009\-2015 Maggie J. Xiong .PP All rights reserved. There is no warranty. You are allowed to redistribute this software / documentation as described in the file COPYING in the PDL distribution.