.\" Automatically generated by Pod::Man 4.07 (Pod::Simple 3.32)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.if !\nF .nr F 0
.if \nF>0 \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    if !\nF==2 \{\
.        nr % 0
.        nr F 2
.    \}
.\}
.\" ========================================================================
.\"
.IX Title "PARALLELCPU 1p"
.TH PARALLELCPU 1p "2016-10-10" "perl v5.24.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
PDL::ParallelCPU \- Parallel Processor MultiThreading Support in PDL (Experimental)
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
\&\s-1PDL\s0 has support (currently experimental) for splitting up numerical processing
between multiple parallel processor threads (or pthreads) using the \fIset_autopthread_targ\fR
and \fIset_autopthread_size\fR functions.
This can improve processing performance (by a factor of 2 to 4 or more in most cases)
by taking advantage of multi-core and/or multi-processor machines.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&  use PDL;
\&  
\&  # Set target of 4 parallel pthreads to create, with a lower limit of
\&  #  5Meg elements for splitting processing into parallel pthreads.
\&  set_autopthread_targ(4);
\&  set_autopthread_size(5);
\&  
\&  $a = zeroes(5000,5000); # Create 25Meg element array
\&  
\&  $b = $a + 5; # Processing will be split up into multiple pthreads
\&  
\&  # Get the actual number of pthreads for the last
\&  #  processing operation.
\&  $actualPthreads = get_autopthread_actual();
.Ve
.SH "Terminology"
.IX Header "Terminology"
The use of the term \fIthreading\fR can be confusing with \s-1PDL,\s0 because it can refer to \fI\s-1PDL\s0 threading\fR,
as defined in the PDL::Threading docs, or to \fIprocessor multi-threading\fR.
.PP
To reduce confusion with the existing \s-1PDL\s0 threading terminology, this document uses 
\&\fBpthreading\fR to refer to \fIprocessor multi-threading\fR, which is the use of multiple processor threads
to split up numerical processing into parallel operations.
.SH "Functions that control PDL PThreads"
.IX Header "Functions that control PDL PThreads"
This is a brief listing and description of the \s-1PDL\s0 pthreading functions. See the PDL::Core docs
for detailed information.
.IP "set_autopthread_targ" 5
.IX Item "set_autopthread_targ"
Set the target number of processor-threads (pthreads) for multi-threaded processing. Setting the target
to 0 means that no pthreading will occur.
.Sp
See PDL::Core for details.
.IP "set_autopthread_size" 5
.IX Item "set_autopthread_size"
Set the minimum size (in Meg-elements, i.e. units of 2**20 elements) of the largest \s-1PDL\s0 involved in a function
for auto-pthreading to be performed. For small PDLs, it probably isn't worth starting multiple pthreads, so this function
defines a minimum threshold below which auto-pthreading won't be attempted.
.Sp
See PDL::Core for details.
.IP "get_autopthread_actual" 5
.IX Item "get_autopthread_actual"
Get the actual number of pthreads executed for the last \s-1PDL\s0 processing function.
.Sp
See PDL::get_autopthread_actual for details.
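.PP
As a rough illustration of the \fIset_autopthread_size\fR threshold, the following sketch does the
Meg-element arithmetic in plain Perl. The helper \fImeets_autopthread_size\fR is hypothetical (it is not
part of \s-1PDL\s0), and it assumes that a \s-1PDL\s0 exactly at the threshold still qualifies:
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use v5.10;
\&
\& # Hypothetical helper, not part of PDL. Assumes 1 Meg = 2**20 elements
\& # and that a PDL exactly at the threshold still qualifies.
\& sub meets_autopthread_size {
\&     my ($size_meg, @dims) = @_;
\&     my $nelem = 1;
\&     $nelem *= $_ for @dims;
\&     return $nelem >= $size_meg * 2**20 ? 1 : 0;
\& }
\&
\& # 5000 x 5000 = 25,000,000 elements, about 23.8 Meg\-elements
\& say meets_autopthread_size(5, 5000, 5000);   # prints 1
\& say meets_autopthread_size(5, 1000, 1000);   # prints 0
.Ve
.PP
With a size of 5, a 5000 by 5000 array is large enough to be considered for pthreading,
while a 1000 by 1000 array (just under 1 Meg-element) is not.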
.SH "Global Control of PDL PThreading using Environment Variables"
.IX Header "Global Control of PDL PThreading using Environment Variables"
\&\s-1PDL\s0 PThreading can be turned on globally, without modifying existing code, by setting the
environment variables \fB\s-1PDL_AUTOPTHREAD_TARG\s0\fR and \fB\s-1PDL_AUTOPTHREAD_SIZE\s0\fR before running a \s-1PDL\s0 script.
These environment variables are checked when \s-1PDL\s0 starts up, and calls to the \fIset_autopthread_targ\fR and
\&\fIset_autopthread_size\fR functions are made with the environment variables' values.
.PP
For example, if the environment variable \fB\s-1PDL_AUTOPTHREAD_TARG\s0\fR is set to 3, and \fB\s-1PDL_AUTOPTHREAD_SIZE\s0\fR is
set to 10, then any \s-1PDL\s0 script will run as if the following lines were at the top of the file:
.PP
.Vb 2
\& set_autopthread_targ(3);
\& set_autopthread_size(10);
.Ve
.SH "How It Works"
.IX Header "How It Works"
The auto-pthreading process works by analyzing threaded array dimensions in \s-1PDL\s0 operations
and splitting up processing based on the thread dimension sizes and the desired number of
pthreads (i.e. the pthread target set with \fIset_autopthread_targ\fR). The offsets and increments that \s-1PDL\s0 uses to step
through the data in memory are modified for each pthread, so each one sees a different set of data when
performing processing.
.PP
\&\fBExample\fR
.PP
.Vb 1
\& $a = sequence(20,4,3); # Small 3\-D Array, size 20,4,3
\& 
\& # Setup auto\-pthreading:
\& set_autopthread_targ(2); # Target of 2 pthreads
\& set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded
\&
\& # This will be split up into 2 pthreads
\& $c = maximum($a);
.Ve
.PP
For the above example, the \fImaximum\fR function has a signature of \f(CW\*(C`(a(n); [o]c())\*(C'\fR, which means that the first
dimension of \f(CW$a\fR (size 20) is a \fICore\fR dimension of the \fImaximum\fR function. The other dimensions of \f(CW$a\fR (sizes 4 and 3)
are \fIthreaded\fR dimensions (i.e. they will be threaded over in the \fImaximum\fR function).
.PP
The auto-pthreading algorithm examines the threaded dims of size (4,3) and picks the 4 dimension, 
since it is evenly divisible by the autopthread_targ of 2. The processing of the maximum function is then 
split into two pthreads on the size\-4 dimension, with dim indexes 0,2 processed by one pthread
and dim indexes 1,3 processed by the other pthread.
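.PP
The split described above can be sketched in plain Perl. The helper \fIsplit_dim_indexes\fR is
hypothetical and only models which indexes each pthread visits. The interleaved order is taken from the
example above; \s-1PDL\s0 itself gets this effect by adjusting offsets and increments rather than by
building index lists:
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use v5.10;
\&
\& # Hypothetical helper, not PDL\*(Aqs internal code: deal the indexes of one
\& # threaded dim out to $n pthreads in interleaved (round\-robin) order.
\& sub split_dim_indexes {
\&     my ($dim_size, $n) = @_;
\&     my @per_pthread = map { [] } 1 .. $n;
\&     push @{ $per_pthread[$_ % $n] }, $_ for 0 .. $dim_size \- 1;
\&     return @per_pthread;
\& }
\&
\& # The size\-4 dim from the example above, split across 2 pthreads
\& say join(",", @$_) for split_dim_indexes(4, 2);   # prints 0,2 then 1,3
.Ve
.PP
Seen this way, pthread number k starts at index k and steps by the number of pthreads, which is
one way to picture a modified offset and increment.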
.SH "Limitations"
.IX Header "Limitations"
.SS "Must have \s-1POSIX\s0 Threads Enabled"
.IX Subsection "Must have POSIX Threads Enabled"
Auto-PThreading only works if your \s-1PDL\s0 installation was compiled with \s-1POSIX\s0 threads enabled. This is normally
the case if you are running on Linux or another Unix variant.
.SS "Non-Threadsafe Code"
.IX Subsection "Non-Threadsafe Code"
Not all the libraries that \s-1PDL\s0 interfaces to are thread-safe, i.e. they aren't written to operate
in a multi-threaded environment without crashing or causing side-effects. Some examples in the \s-1PDL\s0
core are the \fIfft\fR function and the \fIpnmout\fR functions.
.PP
To operate properly with these types of functions, the PPCode flag \fBNoPthread\fR has been introduced to mark
a function as \fInot\fR being pthread-safe. See the \s-1PDL::PP\s0 docs for details.
.SS "Size of \s-1PDL\s0 Dimensions and PThread Target"
.IX Subsection "Size of PDL Dimensions and PThread Target"
Due to the way a \s-1PDL\s0 is split up for operation using multiple pthreads, the size of a dimension
must be evenly divisible by the pthread target. For example, if a \s-1PDL\s0 has threaded dimension sizes
of (4,3,3) and the pthread target has been set to 2 with \fIset_autopthread_targ\fR, then the first threaded dimension (size 4) will
be picked to be split up into two pthreads of size 2 and 2. However, if the threaded dimension sizes are
(3,3,3) and the pthread target is still 2, then pthreading won't occur, because no threaded dimensions
are divisible by 2.
.PP
The algorithm that picks the actual number of pthreads has some smarts (but could probably be improved)
to adjust down from the pthread target to get a number of pthreads that can evenly divide one of the
threaded dimensions. For example, if a \s-1PDL\s0 has threaded dimension sizes of (9,2,2) and the
pthread target is 4, the algorithm will see that no dimension is divisible by 4, then adjust
down the target to 3, resulting in splitting up the first threaded dimension (size 9) into 3 pthreads.
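.PP
The adjust-down behaviour can be sketched in plain Perl. The helper \fIpick_pthreads\fR is hypothetical
and is only a minimal model of the rule described here. It assumes the target is lowered one step at a
time and that the first evenly divisible dimension wins; the real algorithm in \s-1PDL\s0 may choose differently:
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use v5.10;
\&
\& # Hypothetical helper, not PDL\*(Aqs internal code: step down from the target
\& # until some threaded dim is evenly divisible. Returns (pthreads, dim index),
\& # or an empty list when no split into 2 or more pthreads is possible.
\& sub pick_pthreads {
\&     my ($targ, @dims) = @_;
\&     for (my $n = $targ; $n >= 2; $n\-\-) {
\&         for my $i (0 .. $#dims) {
\&             return ($n, $i) if $dims[$i] % $n == 0;
\&         }
\&     }
\&     return;
\& }
\&
\& say join(",", pick_pthreads(2, 4, 3, 3));   # prints 2,0
\& say join(",", pick_pthreads(4, 9, 2, 2));   # prints 3,0
.Ve
.PP
For threaded dimension sizes of (3,3,3) and a target of 2, this helper returns an empty list, which
corresponds to the case where pthreading does not occur.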
.SS "Speed improvement might be less than you expect."
.IX Subsection "Speed improvement might be less than you expect."
If you have an 8\-core machine and call \fIset_autopthread_targ\fR with 8 to generate 8 parallel pthreads, you
probably won't get an 8X improvement in speed, due to memory bandwidth issues. Even though you have 8 separate
CPUs crunching away on data, you will have (for most common machine architectures) common \s-1RAM\s0 that now becomes
your bottleneck. For simple calculations (e.g. simple additions) you can run into a performance limit at about
4 pthreads. For more complex calculations the limit will be higher.
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright 2011 John Cerney. You can distribute and/or
modify this document under the same terms as the current Perl license.
.PP
See: http://dev.perl.org/licenses/