- bullseye 1:2.025-1
- testing 1:2.080-3+b1
- unstable 1:2.081-1
- experimental 1:2.081-1~exp1

PARALLELCPU(1p) | User Contributed Perl Documentation | PARALLELCPU(1p) |

# NAME¶

PDL::ParallelCPU - Parallel processor multi-threading support in PDL

# DESCRIPTION¶

PDL has support for splitting up numerical processing between
multiple parallel processor threads (or pthreads) using the
*set_autopthread_targ* and *set_autopthread_size* functions. This
can improve processing performance (by greater than 2-4X in most cases) by
taking advantage of multi-core and/or multi-processor machines.

As of 2.059, "online_cpus" in PDL::Core is used to set the number of threads used if "PDL_AUTOPTHREAD_TARG" is not set.

# SYNOPSIS¶

use PDL; # Set target of 4 parallel pthreads to create, with a lower limit of # 5Meg elements for splitting processing into parallel pthreads. set_autopthread_targ(4); set_autopthread_size(5); $x = zeroes(5000,5000); # Create 25Meg element array $y = $x + 5; # Processing will be split up into multiple pthreads # Get the actual number of pthreads for the last # processing operation. $actualPthreads = get_autopthread_actual(); # Or compare these to see CPU usage (first one only 1 pthread, second one 10) # in the PDL shell: $x = ones(10,1000,10000); set_autopthread_targ(1); $y = sin($x)*cos($x); p get_autopthread_actual; $x = ones(10,1000,10000); set_autopthread_targ(10); $y = sin($x)*cos($x); p get_autopthread_actual;

# Terminology¶

To reduce the confusion that existed in PDL before 2.075, this
document uses **pthreading** to refer to *processor
multi-threading*, which is the use of multiple processor threads to split
up numerical processing into parallel operations.

# Functions that control PDL pthreads¶

This is a brief listing and description of the PDL pthreading functions, see the PDL::Core docs for detailed information.

- set_autopthread_targ
- Set the target number of processor-threads (pthreads) for multi-threaded
processing. Setting auto_pthread_targ to 0 means that no pthreading will
occur.
See PDL::Core for details.

- set_autopthread_size
- Set the minimum size (in Meg-elements or 2**20 elements) of the largest
PDL involved in a function where auto-pthreading will be performed. For
small PDLs, it probably isn't worth starting multiple pthreads, so this
function is used to define a minimum threshold where auto-pthreading won't
be attempted.
See PDL::Core for details.

- get_autopthread_actual
- Get the actual number of pthreads executed for the last pdl processing
function.
See PDL::get_autopthread_actual for details.

# Global Control of PDL pthreading using Environment Variables¶

PDL pthreading can be globally turned on, without modifying
existing code by setting environment variables **PDL_AUTOPTHREAD_TARG**
and **PDL_AUTOPTHREAD_SIZE** before running a PDL script. These
environment variables are checked when PDL starts up and calls to
*set_autopthread_targ* and *set_autopthread_size* functions made
with the environment variable's values.

For example, if the environment var **PDL_AUTOPTHREAD_TARG** is
set to 3, and **PDL_AUTOPTHREAD_SIZE** is set to 10, then any pdl script
will run as if the following lines were at the top of the file:

set_autopthread_targ(3); set_autopthread_size(10);

# How It Works¶

The auto-pthreading process works by analyzing broadcast array dimensions in PDL operations (those above the operation's "signature" dimensions) and splitting up processing according to those and the desired number of pthreads (i.e. the pthread target or pthread_targ). The offsets, increments, and dimension-sizes (in case the whole dimension does not divide neatly by the number of pthreads) that PDL uses to step thru the data in memory are modified for each pthread so each one sees a different set of data when performing processing.

**Example**

$x = sequence(20,4,3); # Small 3-D Array, size 20,4,3 # Setup auto-pthreading: set_autopthread_targ(2); # Target of 2 pthreads set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded # This will be split up into 2 pthreads $c = maximum($x);

For the above example, the *maximum* function has a signature
of "(a(n); [o]c())", which means that the
first dimension of $x (size 20) is a *Core*
dimension of the *maximum* function. The other dimensions of
$x (size 4,3) are *broadcast* dimensions (i.e.
will be broadcasted-over in the *maximum* function.

The auto-pthreading algorithm examines the broadcasted dims of
size (4,3) and picks the 4 dimension, since it is evenly divisible by the
autopthread_targ of 2. The processing of the maximum function is then split
into two pthreads on the size-4 dimension, with dim indexes 0,2 processed by
one pthread

and dim indexes 1,3 processed by the other pthread.

# Limitations¶

## Must have POSIX Threads Enabled¶

Auto-pthreading only works if your PDL installation was compiled with POSIX threads enabled. This is normally the case if you are running on Windows, Linux, MacOS X, or other unix variants.

## Non-Threadsafe Code¶

Not all the libraries that PDL intefaces to are thread-safe, i.e.
they aren't written to operate in a multi-threaded environment without
crashing or causing side-effects. Some examples in the PDL core is the
*fft* function and the *pnmout* functions.

To operate properly with these types of functions, the PPCode flag
**NoPthread** has been introduced to indicate a function as *not*
being pthread-safe. See PDL::PP docs for details.

## Size of PDL Dimensions and pthread Target¶

As of PDL 2.058, the broadcasted dimension sizes do not need to divide exactly by the pthread target, although if one does, it will be used.

If no dimension is as large as the pthread target, the number of pthreads will be the size of the largest broadcasted dimension.

In order to minimise idle CPUs on the last iteration at the end of
the broadcasted dimension, the algorithm that picks the dimension to pthread
on aims for the largest remainder in dividing the pthread target into the
sizes of the broadcasted dimensions. For example, if a PDL has broadcasted
dimension sizes of (9,6,2) and the *auto_pthread_targ* is 4, the
algorithm will pick the 1-th (size 6), as that will leave a remainder of 2
(leaving 2 idle at the end) in preference to one with size 9, which would
leave 3 idle.

## Speed improvement might be less than you expect.¶

If you have an 8-core machine and call *auto_pthread_targ*
with 8 to generate 8 parallel pthreads, you probably won't get a 8X
improvement in speed, due to memory bandwidth issues. Even though you have 8
separate CPUs crunching away on data, you will have (for most common machine
architectures) common RAM that now becomes your bottleneck. For simple
calculations (e.g simple additions) you can run into a performance limit at
about 4 pthreads. For more CPU-bound calculations the limit will be
higher.

# COPYRIGHT¶

Copyright 2011 John Cerney. You can distribute and/or modify this document under the same terms as the current Perl license.

2022-10-26 | perl v5.36.0 |