.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "PARALLEL_BOOK 7"
.TH PARALLEL_BOOK 7 2024-03-22 20240222 parallel
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "Why should you read this book?"
.IX Header "Why should you read this book?"
If you write shell scripts to do the same processing for different
input, then GNU \fBparallel\fR will make your life easier and make your
scripts run faster.
.PP
The book is written so you get the juicy parts first: The goal is that
you read just enough to get you going. GNU \fBparallel\fR has an
overwhelming amount of special features to help in different
situations, and to avoid overloading you with information, the most
used features are presented first.
.PP
All the examples are tested in Bash, and most will work in other
shells, too, but there are a few exceptions. Bash is therefore
recommended while testing out the examples.
.SH "Learn GNU Parallel in 5 minutes"
.IX Header "Learn GNU Parallel in 5 minutes"
You just need to run commands in parallel. You do not care about fine
tuning.
.PP
To get going please run this to make some example files:
.PP
.Vb 2
\& # If your system does not have \*(Aqseq\*(Aq, replace \*(Aqseq\*(Aq with \*(Aqjot\*(Aq
\& seq 5 | parallel seq {} \*(Aq>\*(Aq example.{}
.Ve
.SS "Input sources"
.IX Subsection "Input sources"
GNU \fBparallel\fR reads values from input sources. One input source is
the command line. The values are put after \fB:::\fR:
.PP
.Vb 1
\& parallel echo ::: 1 2 3 4 5
.Ve
.PP
This makes it easy to run the same program on some files:
.PP
.Vb 1
\& parallel wc ::: example.*
.Ve
.PP
If you give multiple \fB:::\fRs, GNU \fBparallel\fR will generate all
combinations:
.PP
.Vb 1
\& parallel wc ::: \-l \-c ::: example.*
.Ve
.PP
GNU \fBparallel\fR can also read the values from stdin (standard input):
.PP
.Vb 1
\& seq 5 | parallel echo
.Ve
.SS "Building the command line"
.IX Subsection "Building the command line"
The command line is put before the \fB:::\fR. It can contain a command
and options for the command:
.PP
.Vb 1
\& parallel wc \-l ::: example.*
.Ve
.PP
The command can contain multiple programs.
Just remember to quote characters that are interpreted by the shell
(such as \fB;\fR):
.PP
.Vb 1
\& parallel echo counting lines\*(Aq;\*(Aq wc \-l ::: example.*
.Ve
.PP
The value will normally be appended to the command, but can be placed
anywhere by using the replacement string \fB{}\fR:
.PP
.Vb 1
\& parallel echo counting {}\*(Aq;\*(Aq wc \-l {} ::: example.*
.Ve
.PP
When using multiple input sources you use the positional replacement
strings \fB{1}\fR and \fB{2}\fR:
.PP
.Vb 1
\& parallel echo count {1} in {2}\*(Aq;\*(Aq wc {1} {2} ::: \-l \-c ::: example.*
.Ve
.PP
You can check what will be run with \fB\-\-dry\-run\fR:
.PP
.Vb 1
\& parallel \-\-dry\-run echo count {1} in {2}\*(Aq;\*(Aq wc {1} {2} ::: \-l \-c ::: example.*
.Ve
.PP
Doing this for every command is a good idea until you are comfortable
with GNU \fBparallel\fR.
.SS "Controlling the output"
.IX Subsection "Controlling the output"
The output will be printed as soon as the command completes. This
means the output may come in a different order than the input:
.PP
.Vb 1
\& parallel sleep {}\*(Aq;\*(Aq echo {} done ::: 5 4 3 2 1
.Ve
.PP
You can force GNU \fBparallel\fR to print in the order of the values with
\&\fB\-\-keep\-order\fR/\fB\-k\fR. This will still run the commands in parallel.
The output of the later jobs will be delayed until the earlier jobs
are printed:
.PP
.Vb 1
\& parallel \-k sleep {}\*(Aq;\*(Aq echo {} done ::: 5 4 3 2 1
.Ve
.SS "Controlling the execution"
.IX Subsection "Controlling the execution"
If your jobs are compute intensive, you will most likely run one job
for each core in the system. This is the default for GNU \fBparallel\fR.
.PP
But sometimes you want more jobs running. You control the number of
job slots with \fB\-j\fR.
Give \fB\-j\fR the number of jobs to run in parallel:
.PP
.Vb 4
\& parallel \-j50 \e
\&   wget https://ftpmirror.gnu.org/parallel/parallel\-{1}{2}22.tar.bz2 \e
\&   ::: 2012 2013 2014 2015 2016 \e
\&   ::: 01 02 03 04 05 06 07 08 09 10 11 12
.Ve
.SS "Pipe mode"
.IX Subsection "Pipe mode"
GNU \fBparallel\fR can also pass blocks of data to commands on stdin
(standard input):
.PP
.Vb 1
\& seq 1000000 | parallel \-\-pipe wc
.Ve
.PP
This can be used to process big text files. By default GNU \fBparallel\fR
splits on \en (newline) and passes a block of around 1 MB to each job.
.SS "That's it"
.IX Subsection "That's it"
You have now learned the basic use of GNU \fBparallel\fR. This will
probably cover most of your use cases.
.PP
The rest of this document will go into more detail on each of the
sections and cover special use cases.
.SH "Learn GNU Parallel in an hour"
.IX Header "Learn GNU Parallel in an hour"
In this part we will dive deeper into what you learned in the first 5
minutes.
.PP
To get going please run this to make some example files:
.PP
.Vb 2
\& seq 6 > seq6
\& seq 6 \-1 1 > seq\-6
.Ve
.SS "Input sources"
.IX Subsection "Input sources"
In addition to the command line, input sources can also be stdin
(standard input or '\-'), files, and fifos, and they can be mixed.
Files are given after \fB\-a\fR or \fB::::\fR.
So these all do the same:
.PP
.Vb 8
\& parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 ::: 6 5 4 3 2 1
\& parallel echo Dice1={1} Dice2={2} :::: <(seq 6) :::: <(seq 6 \-1 1)
\& parallel echo Dice1={1} Dice2={2} :::: seq6 seq\-6
\& parallel echo Dice1={1} Dice2={2} :::: seq6 :::: seq\-6
\& parallel \-a seq6 \-a seq\-6 echo Dice1={1} Dice2={2}
\& parallel \-a seq6 echo Dice1={1} Dice2={2} :::: seq\-6
\& parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 :::: seq\-6
\& cat seq\-6 | parallel echo Dice1={1} Dice2={2} :::: seq6 \-
.Ve
.PP
If stdin (standard input) is the only input source, you do not need
the '\-':
.PP
.Vb 1
\& cat seq6 | parallel echo Dice1={1}
.Ve
.PP
\fILinking input sources\fR
.IX Subsection "Linking input sources"
.PP
You can link multiple input sources with \fB:::+\fR and \fB::::+\fR:
.PP
.Vb 2
\& parallel echo {1}={2} ::: I II III IV V VI :::+ 1 2 3 4 5 6
\& parallel echo {1}={2} ::: I II III IV V VI ::::+ seq6
.Ve
.PP
The \fB:::+\fR (and \fB::::+\fR) will link each value to the corresponding
value in the previous input source, so value number 3 from the first
input source will be linked to value number 3 from the second input
source.
.PP
You can combine \fB:::+\fR and \fB:::\fR, so you link 2 input sources, but
generate all combinations with other input sources:
.PP
.Vb 2
\& parallel echo Dice1={1}={2} Dice2={3}={4} ::: I II III IV V VI ::::+ seq6 \e
\&   ::: VI V IV III II I ::::+ seq\-6
.Ve
.SS "Building the command line"
.IX Subsection "Building the command line"
\fIThe command\fR
.IX Subsection "The command"
.PP
The command can be a script, a binary or a Bash function if the
function is exported using \fBexport \-f\fR:
.PP
.Vb 6
\& # Works only in Bash
\& my_func() {
\&   echo in my_func "$1"
\& }
\& export \-f my_func
\& parallel my_func ::: 1 2 3
.Ve
.PP
If the command is complex, it often improves readability to make it
into a function.
.PP
\fIThe replacement strings\fR
.IX Subsection "The replacement strings"
.PP
GNU \fBparallel\fR has some replacement strings to make it easier to
refer to the input read from the input sources.
.PP
If the input is \fBmydir/mysubdir/myfile.myext\fR then:
.PP
.Vb 7
\& {}   = mydir/mysubdir/myfile.myext
\& {.}  = mydir/mysubdir/myfile
\& {/}  = myfile.myext
\& {//} = mydir/mysubdir
\& {/.} = myfile
\& {#}  = the sequence number of the job
\& {%}  = the job slot number
.Ve
.PP
When a job is started it gets a sequence number that starts at 1 and
increases by 1 for each new job. The job also gets assigned a slot
number. This number is from 1 to the number of jobs running in
parallel. It is unique between the running jobs, but is re-used as
soon as a job finishes.
.PP
\fIThe positional replacement strings\fR
.IX Subsection "The positional replacement strings"
.PP
The replacement strings have corresponding positional replacement
strings. If the value from the 3rd input source is
\&\fBmydir/mysubdir/myfile.myext\fR:
.PP
.Vb 5
\& {3}   = mydir/mysubdir/myfile.myext
\& {3.}  = mydir/mysubdir/myfile
\& {3/}  = myfile.myext
\& {3//} = mydir/mysubdir
\& {3/.} = myfile
.Ve
.PP
So the number of the input source is simply prepended inside the {}'s.
.SH "Replacement strings"
.IX Header "Replacement strings"
\&\-\-plus replacement strings
.PP
change the replacement string (\-I \-\-extensionreplace
\-\-basenamereplace \-\-dirnamereplace \-\-basenameextensionreplace
\-\-seqreplace \-\-slotreplace)
.PP
\&\-\-header with named replacement string
.PP
{= =}
.PP
Dynamic replacement strings
.SS "Defining replacement strings"
.IX Subsection "Defining replacement strings"
.SS "Copying environment"
.IX Subsection "Copying environment"
env_parallel
.SS "Controlling the output"
.IX Subsection "Controlling the output"
\fIparset\fR
.IX Subsection "parset"
.PP
\&\fBparset\fR is a shell function to get the output from GNU \fBparallel\fR
into shell variables.
.PP
\&\fBparset\fR is fully supported for \fBBash/Zsh/Ksh\fR and partially supported
for \fBash/dash\fR. I will assume you run \fBBash\fR.
.PP
To activate \fBparset\fR you have to run:
.PP
.Vb 1
\& . \`which env_parallel.bash\`
.Ve
.PP
(replace \fBbash\fR with your shell's name).
.PP
Then you can run:
.PP
.Vb 2
\& parset a,b,c seq ::: 4 5 6
\& echo "$c"
.Ve
.PP
or:
.PP
.Vb 2
\& parset \*(Aqa b c\*(Aq seq ::: 4 5 6
\& echo "$c"
.Ve
.PP
If you give a single variable, this will become an array:
.PP
.Vb 2
\& parset arr seq ::: 4 5 6
\& echo "${arr[1]}"
.Ve
.PP
\&\fBparset\fR has one limitation: If it reads from a pipe, the output will
be lost.
.PP
.Vb 2
\& echo This will not work | parset myarr echo
\& echo Nothing: "${myarr[*]}"
.Ve
.PP
Instead you can do this:
.PP
.Vb 3
\& echo This will work > tempfile
\& parset myarr echo < tempfile
\& echo "${myarr[*]}"
.Ve
.PP
sql cvs
.SS "Controlling the execution"
.IX Subsection "Controlling the execution"
\&\-\-dryrun \-v
.SS "Remote execution"
.IX Subsection "Remote execution"
For this section you must have \fBssh\fR access with no password to 2
servers: \fR\f(CB$server1\fR\fB\fR and \fB\fR\f(CB$server2\fR\fB\fR.
.PP
.Vb 2
\& server1=server.example.com
\& server2=server2.example.net
.Ve
.PP
So you must be able to do this:
.PP
.Vb 2
\& ssh $server1 echo works
\& ssh $server2 echo works
.Ve
.PP
It can be set up by running 'ssh\-keygen \-t dsa; ssh-copy-id \f(CW$server1\fR'
and using an empty passphrase. Or you can use \fBssh-agent\fR.
.PP
\fIWorkers\fR
.IX Subsection "Workers"
.PP
\fI\-\-transferfile\fR
.IX Subsection "--transferfile"
.PP
\&\fB\-\-transferfile\fR \fIfilename\fR will transfer \fIfilename\fR to the worker.
\fIfilename\fR can contain a replacement string:
.PP
.Vb 3
\& parallel \-S $server1,$server2 \-\-transferfile {} wc ::: example.*
\& parallel \-S $server1,$server2 \-\-transferfile {2} \e
\&   echo count {1} in {2}\*(Aq;\*(Aq wc {1} {2} ::: \-l \-c ::: example.*
.Ve
.PP
A shorthand for \fB\-\-transferfile {}\fR is \fB\-\-transfer\fR.
.PP
\fI\-\-return\fR
.IX Subsection "--return"
.PP
\fI\-\-cleanup\fR
.IX Subsection "--cleanup"
.PP
A shorthand for \fB\-\-transfer \-\-return {} \-\-cleanup\fR is \fB\-\-trc {}\fR.
.SS "Pipe mode"
.IX Subsection "Pipe mode"
\&\-\-pipepart
.SS "That's it"
.IX Subsection "That's it"
.SH "Advanced usage"
.IX Header "Advanced usage"
parset fifo, cmd substitution, arrayelements, array with var names and
cmds, env_parset
.PP
env_parallel
.PP
Interfacing with R.
.PP
Interfacing with JSON/jq
.PP
.Vb 12
\& 4dl() {
\&   board="$(printf \-\- \*(Aq%s\*(Aq "${1}" | cut \-d \*(Aq/\*(Aq \-f4)"
\&   thread="$(printf \-\- \*(Aq%s\*(Aq "${1}" | cut \-d \*(Aq/\*(Aq \-f6)"
\&   wget \-qO\- "https://a.4cdn.org/${board}/thread/${thread}.json" |
\&     jq \-r \*(Aq
\&       .posts
\&       | map(select(.tim != null))
\&       | map((.tim | tostring) + .ext)
\&       | map("https://i.4cdn.org/\*(Aq"${board}"\*(Aq/"+.)[]
\&     \*(Aq |
\&     parallel \-\-gnu \-j 0 wget \-nv
\& }
.Ve
.PP
Interfacing with XML/?
.PP
Interfacing with HTML/?
.SS "Controlling the execution"
.IX Subsection "Controlling the execution"
\&\-\-termseq
.SS "Remote execution"
.IX Subsection "Remote execution"
seq 10 | parallel \-\-sshlogin 'ssh \-i "key.pem" a@b.com' echo
.PP
seq 10 | PARALLEL_SSH='ssh \-i "key.pem"' parallel \-\-sshlogin a@b.com echo
.PP
seq 10 | parallel \-\-ssh 'ssh \-i "key.pem"' \-\-sshlogin a@b.com echo
.PP
ssh-agent
.PP
The sshlogin file format
.PP
Check if servers are up