.TH "Parmap" 3 2014-10-09 OCamldoc "" .SH NAME Parmap \- Module Parmap: efficient parallel map, fold and mapfold on lists and arrays on multicores. .SH Module Module Parmap .SH Documentation .sp Module .BI "Parmap" : .B sig end .sp Module .B Parmap : efficient parallel map, fold and mapfold on lists and arrays on multicores\&. .sp All the primitives allow to control the granularity of the parallelism via an optional parameter .B chunksize : if .B chunksize is omitted, the input sequence is split evenly among the available cores; if .B chunksize is specified, the input data is split in chunks of size .B chunksize and dispatched to the available cores using an on demand strategy that ensures automatic load balancing\&. .sp A specific primitive .B array_float_parmap is provided for fast operations on float arrays\&. .sp .sp .sp .sp .PP .B === .B Setting and getting the default value for ncores .B === .PP .I val set_default_ncores : .B int -> unit .sp .sp .I val get_default_ncores : .B unit -> int .sp .sp .PP .B === .B Sequence type, subsuming lists and arrays .B === .PP .I type .B 'a .I sequence = | L .B of .B 'a list | A .B of .B 'a array .sp .sp .PP .B === The parmapfold, parfold and parmap generic functions, for efficiency reasons, .B convert the input data into an array internally, so we provide the \&'a sequence type .B to allow passing an array directly as input\&. .B If you want to perform a parallel map operation on an array, use array_parmap or array_float_parmap instead\&. === .PP .PP .B === .B Parallel mapfold .B === .PP .I val parmapfold : .B ?init:(int -> unit) -> .B ?finalize:(unit -> unit) -> .B ?ncores:int -> .B ?chunksize:int -> .B ('a -> 'b) -> .B 'a sequence -> ('b -> 'c -> 'c) -> 'c -> ('c -> 'c -> 'c) -> 'c .sp .B parmapfold ~ncores:n f (L l) op b concat computes .B List\&.fold_right op (List\&.map f l) b by forking .B n processes on a multicore machine\&. You need to provide the extra .B concat operator to combine the partial results of the fold computed on each core\&. If \&'b = \&'c, then .B concat may be simply .B op \&. The order of computation in parallel changes w\&.r\&.t\&. sequential execution, so this function is only correct if .B op and .B concat are associative and commutative\&. If the optional .B chunksize parameter is specified, the processes compute the result in an on\-demand fashion on blocks of size .B chunksize \&. .B parmapfold ~ncores:n f (A a) op b concat computes .B Array\&.fold_right op (Array\&.map f a) b .sp .sp .PP .B === .B Parallel fold .B === .PP .I val parfold : .B ?init:(int -> unit) -> .B ?finalize:(unit -> unit) -> .B ?ncores:int -> .B ?chunksize:int -> .B ('a -> 'b -> 'b) -> 'a sequence -> 'b -> ('b -> 'b -> 'b) -> 'b .sp .B parfold ~ncores:n op (L l) b concat computes .B List\&.fold_right op l b by forking .B n processes on a multicore machine\&. You need to provide the extra .B concat operator to combine the partial results of the fold computed on each core\&. If \&'b = \&'c, then .B concat may be simply .B op \&. The order of computation in parallel changes w\&.r\&.t\&. sequential execution, so this function is only correct if .B op and .B concat are associative and commutative\&. If the optional .B chunksize parameter is specified, the processes compute the result in an on\-demand fashion on blocks of size .B chunksize \&. .B parfold ~ncores:n op (A a) b concat similarly computes .B Array\&.fold_right op a b \&. 
.sp
.sp
.PP
.B ===
.B Parallel map
.B ===
.PP
.I val parmap :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int -> ?chunksize:int -> ('a -> 'b) -> 'a sequence -> 'b list
.sp
.B parmap ~ncores:n f (L l)
computes
.B List\&.map f l
by forking
.B n
processes on a multicore machine\&.
.B parmap ~ncores:n f (A a)
computes
.B Array\&.map f a
by forking
.B n
processes on a multicore machine\&. If the optional
.B chunksize
parameter is specified, the processes compute the result in an on\-demand fashion on blocks of size
.B chunksize
; this provides automatic load balancing for unbalanced computations, but the order of the result is no longer guaranteed to be preserved\&.
.sp
.sp
.PP
.B ===
.B Parallel iteration
.B ===
.PP
.I val pariter :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int -> ?chunksize:int -> ('a -> unit) -> 'a sequence -> unit
.sp
.B pariter ~ncores:n f (L l)
computes
.B List\&.iter f l
by forking
.B n
processes on a multicore machine\&.
.B pariter ~ncores:n f (A a)
computes
.B Array\&.iter f a
by forking
.B n
processes on a multicore machine\&. If the optional
.B chunksize
parameter is specified, the processes perform the computation in an on\-demand fashion on blocks of size
.B chunksize
; this provides automatic load balancing for unbalanced computations\&.
.sp
.sp
.PP
.B ===
.B Parallel mapfold, indexed
.B ===
.PP
.I val parmapifold :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int ->
.B ?chunksize:int ->
.B (int -> 'a -> 'b) ->
.B 'a sequence -> ('b -> 'c -> 'c) -> 'c -> ('c -> 'c -> 'c) -> 'c
.sp
Like parmapfold, but the map function receives the index of the mapped element as an extra argument\&.
.sp
.sp
.PP
.B ===
.B Parallel map, indexed
.B ===
.PP
.I val parmapi :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int ->
.B ?chunksize:int -> (int -> 'a -> 'b) -> 'a sequence -> 'b list
.sp
Like parmap, but the map function receives the index of the mapped element as an extra argument\&.
.sp
.sp
.PP
.B ===
.B Parallel iteration, indexed
.B ===
.PP
.I val pariteri :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int ->
.B ?chunksize:int -> (int -> 'a -> unit) -> 'a sequence -> unit
.sp
Like pariter, but the iterated function receives the index of the sequence element as an extra argument\&.
.sp
.sp
.PP
.B ===
.B Parallel map on arrays
.B ===
.PP
.I val array_parmap :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int -> ?chunksize:int -> ('a -> 'b) -> 'a array -> 'b array
.sp
.B array_parmap ~ncores:n f a
computes
.B Array\&.map f a
by forking
.B n
processes on a multicore machine\&. If the optional
.B chunksize
parameter is specified, the processes compute the result in an on\-demand fashion on blocks of size
.B chunksize
; this provides automatic load balancing for unbalanced computations, but the order of the result is no longer guaranteed to be preserved\&.
.sp
.sp
.PP
.B ===
.B Parallel map on arrays, indexed
.B ===
.PP
.I val array_parmapi :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int -> ?chunksize:int -> (int -> 'a -> 'b) -> 'a array -> 'b array
.sp
Like array_parmap, but the map function receives the index of the mapped element as an extra argument\&.
.sp
.sp
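.PP
As a minimal sketch (illustrative values only; it assumes the parmap package is linked), the map primitives above can be used as follows:
.PP
.RS
.nf
(* Double every element of a list; with ~chunksize the chunks are
   dispatched on demand, so the order of the result list may differ
   from the order of the input. *)
let doubled : int list =
  Parmap.parmap ~ncores:4 ~chunksize:2
    (fun x -> 2 * x)
    (Parmap.L [1; 2; 3; 4; 5; 6; 7; 8])

(* Map over an array directly, obtaining an array back. *)
let squares : int array =
  Parmap.array_parmap ~ncores:4 (fun x -> x * x) [| 1; 2; 3; 4 |]

(* Indexed variant: pair each element with its position. *)
let indexed : (int * string) list =
  Parmap.parmapi ~ncores:2 (fun i s -> (i, s)) (Parmap.L ["a"; "b"; "c"])
.fi
.RE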
.PP
.B ===
.B Parallel map on float arrays
.B ===
.PP
.I exception WrongArraySize
.sp
.sp
.I type buf
.sp
.sp
.I val init_shared_buffer :
.B float array -> buf
.sp
.B init_shared_buffer a
creates a new memory\-mapped shared buffer big enough to hold a float array of the size of
.B a
\&. This buffer can be reused in a series of calls to
.B array_float_parmap
, avoiding the cost of reallocating it each time\&.
.sp
.sp
.I val array_float_parmap :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int ->
.B ?chunksize:int ->
.B ?result:float array ->
.B ?sharedbuffer:buf -> ('a -> float) -> 'a array -> float array
.sp
.B array_float_parmap ~ncores:n f a
computes
.B Array\&.map f a
by forking
.B n
processes on a multicore machine, preallocating the resulting array as shared memory; this allows significantly more efficient computation than calling the generic array_parmap function\&. If the optional
.B chunksize
parameter is specified, the processes compute the result in an on\-demand fashion on blocks of size
.B chunksize
; this provides automatic load balancing for unbalanced computations, *and* the order of the result is still guaranteed to be preserved\&.
.sp
If you already have an array in which to store the result, you can squeeze out some more CPU cycles by passing it as the optional parameter
.B result
: this avoids the creation of a result array, which can be costly for very large data sets\&. Raises
.B WrongArraySize
if
.B result
is too small to hold the data\&.
.sp
It is possible to share the same preallocated shared memory space across calls, by initialising the space with
.B init_shared_buffer a
and passing the result as the optional
.B sharedbuffer
parameter to each subsequent call to
.B array_float_parmap
\&. Raises
.B WrongArraySize
if
.B sharedbuffer
is too small to hold the input data\&.
.sp
.sp
.PP
.B ===
.B Parallel map on float arrays, indexed
.B ===
.PP
.I val array_float_parmapi :
.B ?init:(int -> unit) ->
.B ?finalize:(unit -> unit) ->
.B ?ncores:int ->
.B ?chunksize:int ->
.B ?result:float array ->
.B ?sharedbuffer:buf -> (int -> 'a -> float) -> 'a array -> float array
.sp
Like array_float_parmap, but the map function receives the index of the mapped element as an extra argument\&.
.sp
.sp
.PP
.B ===
.B Debugging
.B ===
.PP
.I val debugging :
.B bool -> unit
.sp
Enable or disable debugging code in the library; default: false\&.
.sp
.sp
.PP
.B ===
.B Helper function for redirection of stdout and stderr
.B ===
.PP
.I val redirect :
.B ?path:string -> id:int -> unit
.sp
Helper function that redirects stdout and stderr to files located in the directory
.B path
, with names of the form
.B stdout\&.NNN
and
.B stderr\&.NNN
, where NNN is the id of the core in use\&. Useful when writing initialisation functions to be passed as the
.B init
argument to the parallel combinators\&. The default value for
.B path
is /tmp/\&.parmap\&.PPPP, with PPPP the process id of the main program\&.
.PP
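.PP
As an illustration (a sketch only: the core count, chunk size and the log directory /tmp/parmap_logs are arbitrary example values, and the directory is assumed to exist), init_shared_buffer, array_float_parmap and redirect can be combined as follows:
.PP
.RS
.nf
(* Compute square roots over a float array, reusing one shared
   buffer across two calls and redirecting the output of each
   worker to per-core files. *)
let () =
  let input = Array.init 100_000 float_of_int in
  let buf = Parmap.init_shared_buffer input in
  let roots =
    Parmap.array_float_parmap
      ~init:(fun id -> Parmap.redirect ~path:"/tmp/parmap_logs" ~id)
      ~ncores:4 ~chunksize:10_000 ~sharedbuffer:buf
      sqrt input
  in
  (* Reuse the same shared buffer for a second pass over the input. *)
  let halved =
    Parmap.array_float_parmap ~ncores:4 ~sharedbuffer:buf
      (fun x -> x /. 2.0) input
  in
  Printf.printf "roots.(81) = %f, halved.(10) = %f" roots.(81) halved.(10);
  print_newline ()
.fi
.RE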