.\" Man page generated from reStructuredText.
.
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "TOIL" "1" "Nov 20, 2023" "5.9.0" "Toil"
.SH NAME
toil \- Toil Documentation
.sp
Toil is an open\-source pure\-Python workflow engine that lets people write better pipelines.
.sp
Check out our \fI\%website\fP for a comprehensive list of Toil\(aqs features and read our \fI\%paper\fP to learn what Toil can do
in the real world.  Please subscribe to our low\-volume \fI\%announce\fP mailing list and feel free to also join us on \fI\%GitHub\fP and \fI\%Gitter\fP\&.
.sp
If using Toil for your research, please cite
.INDENT 0.0
.INDENT 3.5
Vivian, J., Rao, A. A., Nothaft, F. A., Ketchum, C., Armstrong, J., Novak, A., … Paten, B. (2017).
Toil enables reproducible, open source, big biomedical data analyses. Nature Biotechnology, 35(4), 314–316.
\fI\%http://doi.org/10.1038/nbt.3772\fP
.UNINDENT
.UNINDENT
.SH QUICKSTART EXAMPLES
.SS Running a basic workflow
.sp
A Toil workflow can be run with just two steps:
.INDENT 0.0
.IP 1. 3
Copy and paste the following code block into a new file called \fBhelloWorld.py\fP:
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from toil.common import Toil
from toil.job import Job


def helloWorld(message, memory=\(dq1G\(dq, cores=1, disk=\(dq1G\(dq):
    return f\(dqHello, world!, here\(aqs a message: {message}\(dq


if __name__ == \(dq__main__\(dq:
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.clean = \(dqalways\(dq
    with Toil(options) as toil:
        output = toil.start(Job.wrapFn(helloWorld, \(dqYou did it!\(dq))
    print(output)

.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP 2. 3
Specify the name of the \fI\%job store\fP and run the workflow:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
python3 helloWorld.py file:my\-job\-store
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Congratulations! You\(aqve run your first Toil workflow using the default \fI\%Batch System\fP, \fBsingleMachine\fP,
using the \fBfile\fP job store.
.sp
Toil uses batch systems to manage the jobs it creates.
.sp
The \fBsingleMachine\fP batch system is primarily used to prepare and debug workflows on a
local machine. Once validated, try running them on a full\-fledged batch system (see \fI\%Batch System API\fP).
Toil supports many different batch systems such as \fI\%Apache Mesos\fP and Grid Engine; its versatility makes it
easy to run your workflow in all kinds of places.
.sp
Toil is totally customizable! Run \fBpython3 helloWorld.py \-\-help\fP to see a complete list of available options.
.sp
For something beyond a \(dqHello, world!\(dq example, refer to \fI\%A (more) real\-world example\fP\&.
.SS Running a basic CWL workflow
.sp
The \fI\%Common Workflow Language\fP (CWL) is an emerging standard for writing
workflows that are portable across multiple workflow engines and platforms.
Running CWL workflows using Toil is easy.
.INDENT 0.0
.IP 1. 3
Copy and paste the following code block into \fBexample.cwl\fP:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
stdout: output.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  output:
    type: stdout
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
and this code into \fBexample\-job.yaml\fP:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
message: Hello world!
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 2. 3
To run the workflow simply enter
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil\-cwl\-runner example.cwl example\-job.yaml
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Your output will be in \fBoutput.txt\fP:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ cat output.txt
Hello world!
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.sp
To learn more about CWL, see the \fI\%CWL User Guide\fP (from where this example was
shamelessly borrowed).
.sp
To run this workflow on an AWS cluster have a look at \fI\%Running a CWL Workflow on AWS\fP\&.
.sp
For information on using CWL with Toil see the section \fI\%CWL in Toil\fP
.SS Running a basic WDL workflow
.sp
The \fI\%Workflow Description Language\fP (WDL) is another emerging language for writing workflows that are portable across multiple workflow engines and platforms.
Running WDL workflows using Toil is still in alpha, and currently experimental.  Toil currently supports basic workflow syntax (see \fI\%WDL in Toil\fP for more details and examples).  Here we go over running a basic WDL helloworld workflow.
.INDENT 0.0
.IP 1. 3
Copy and paste the following code block into \fBwdl\-helloworld.wdl\fP:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
    workflow write_simple_file {
      call write_file
    }
    task write_file {
      String message
      command { echo ${message} > wdl\-helloworld\-output.txt }
      output { File test = \(dqwdl\-helloworld\-output.txt\(dq }
    }

and this code into \(ga\(gawdl\-helloworld.json\(ga\(ga::

    {
      \(dqwrite_simple_file.write_file.message\(dq: \(dqHello world!\(dq
    }
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 2. 3
To run the workflow simply enter
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil\-wdl\-runner wdl\-helloworld.wdl wdl\-helloworld.json
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Your output will be in \fBwdl\-helloworld\-output.txt\fP:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ cat wdl\-helloworld\-output.txt
Hello world!
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.sp
To learn more about WDL, see the main \fI\%WDL website\fP .
.SS A (more) real\-world example
.sp
For a more detailed example and explanation, we\(aqve developed a sample pipeline
that merge\-sorts a temporary file. This is not supposed to be an efficient
sorting program, rather a more fully worked example of what Toil is capable of.
.SS Running the example
.INDENT 0.0
.IP 1. 3
Download \fBthe example code\fP
.IP 2. 3
Run it with the default settings:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py file:jobStore
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The workflow created a file called \fBsortedFile.txt\fP in your current directory.
Have a look at it and notice that it contains a whole lot of sorted lines!
.sp
This workflow does a smart merge sort on a file it generates, \fBfileToSort.txt\fP\&. The sort is \fIsmart\fP
because each step of the process\-\-\-splitting the file into separate chunks, sorting these chunks, and merging them
back together\-\-\-is compartmentalized into a \fBjob\fP\&. Each job can specify its own resource requirements and will
only be run after the jobs it depends upon have run. Jobs without dependencies will be run in parallel.
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
Delete \fBfileToSort.txt\fP before moving on to #3. This example introduces options that specify dimensions for
\fBfileToSort.txt\fP, if it does not already exist. If it exists, this workflow will use the existing file and
the results will be the same as #2.
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP 3. 3
Run with custom options:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py file:jobStore \e
             \-\-numLines=5000 \e
             \-\-lineLength=10 \e
             \-\-overwriteOutput=True \e
             \-\-workDir=/tmp/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Here we see that we can add our own options to a Toil script. As noted above, the first two
options, \fB\-\-numLines\fP and \fB\-\-lineLength\fP, determine the number of lines and how many characters are in each line.
\fB\-\-overwriteOutput\fP causes the current contents of \fBsortedFile.txt\fP to be overwritten, if it already exists.
The last option, \fB\-\-workDir\fP, is an option built into Toil to specify where temporary files unique to a job are kept.
.UNINDENT
.SS Describing the source code
.sp
To understand the details of what\(aqs going on inside.
Let\(aqs start with the \fBmain()\fP function. It looks like a lot of code, but don\(aqt worry\-\-\-we\(aqll break it down piece by
piece.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
def main(options=None):
    if not options:
        # deal with command line arguments
        parser = ArgumentParser()
        Job.Runner.addToilOptions(parser)
        parser.add_argument(\(aq\-\-numLines\(aq, default=defaultLines, help=\(aqNumber of lines in file to sort.\(aq, type=int)
        parser.add_argument(\(aq\-\-lineLength\(aq, default=defaultLineLen, help=\(aqLength of lines in file to sort.\(aq, type=int)
        parser.add_argument(\(dq\-\-fileToSort\(dq, help=\(dqThe file you wish to sort\(dq)
        parser.add_argument(\(dq\-\-outputFile\(dq, help=\(dqWhere the sorted output will go\(dq)
        parser.add_argument(\(dq\-\-overwriteOutput\(dq, help=\(dqWrite over the output file if it already exists.\(dq, default=True)
        parser.add_argument(\(dq\-\-N\(dq, dest=\(dqN\(dq,
                            help=\(dqThe threshold below which a serial sort function is used to sort file. \(dq
                                 \(dqAll lines must of length less than or equal to N or program will fail\(dq,
                            default=10000)
        parser.add_argument(\(aq\-\-downCheckpoints\(aq, action=\(aqstore_true\(aq,
                            help=\(aqIf this option is set, the workflow will make checkpoints on its way through\(aq
                                 \(aqthe recursive \(dqdown\(dq part of the sort\(aq)
        parser.add_argument(\(dq\-\-sortMemory\(dq, dest=\(dqsortMemory\(dq,
                        help=\(dqMemory for jobs that sort chunks of the file.\(dq,
                        default=None)

        parser.add_argument(\(dq\-\-mergeMemory\(dq, dest=\(dqmergeMemory\(dq,
                        help=\(dqMemory for jobs that collate results.\(dq,
                        default=None)

        options = parser.parse_args()
    if not hasattr(options, \(dqsortMemory\(dq) or not options.sortMemory:
        options.sortMemory = sortMemory
    if not hasattr(options, \(dqmergeMemory\(dq) or not options.mergeMemory:
        options.mergeMemory = sortMemory

    # do some input verification
    sortedFileName = options.outputFile or \(dqsortedFile.txt\(dq
    if not options.overwriteOutput and os.path.exists(sortedFileName):
        print(f\(aqOutput file {sortedFileName} already exists.  \(aq
              f\(aqDelete it to run the sort example again or use \-\-overwriteOutput=True\(aq)
        exit()

    fileName = options.fileToSort
    if options.fileToSort is None:
        # make the file ourselves
        fileName = \(aqfileToSort.txt\(aq
        if os.path.exists(fileName):
            print(f\(aqSorting existing file: {fileName}\(aq)
        else:
            print(f\(aqNo sort file specified. Generating one automatically called: {fileName}.\(aq)
            makeFileToSort(fileName=fileName, lines=options.numLines, lineLen=options.lineLength)
    else:
        if not os.path.exists(options.fileToSort):
            raise RuntimeError(\(dqFile to sort does not exist: %s\(dq % options.fileToSort)

    if int(options.N) <= 0:
        raise RuntimeError(\(dqInvalid value of N: %s\(dq % options.N)

    # Now we are ready to run
    with Toil(options) as workflow:
        sortedFileURL = \(aqfile://\(aq + os.path.abspath(sortedFileName)
        if not workflow.options.restart:
            sortFileURL = \(aqfile://\(aq + os.path.abspath(fileName)
            sortFileID = workflow.importFile(sortFileURL)
            sortedFileID = workflow.start(Job.wrapJobFn(setup,
                                                        sortFileID,
                                                        int(options.N),
                                                        options.downCheckpoints,
                                                        options=options,
                                                        memory=sortMemory))
        else:
            sortedFileID = workflow.restart()
        workflow.exportFile(sortedFileID, sortedFileURL)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
First we make a parser to process command line arguments using the \fI\%argparse\fP module. It\(aqs important that we add the
call to \fBJob.Runner.addToilOptions()\fP to initialize our parser with all of Toil\(aqs default options. Then we add
the command line arguments unique to this workflow, and parse the input. The help message listed with the arguments
should give you a pretty good idea of what they can do.
.sp
Next we do a little bit of verification of the input arguments. The option \fB\-\-fileToSort\fP allows you to specify a file
that needs to be sorted. If this option isn\(aqt given, it\(aqs here that we make our own file with the call to
\fBmakeFileToSort()\fP\&.
.sp
Finally we come to the context manager that initializes the workflow. We create a path to the input file prepended with
\fB\(aqfile://\(aq\fP as per the documentation for \fI\%toil.common.Toil()\fP when staging a file that is stored locally. Notice
that we have to check whether or not the workflow is restarting so that we don\(aqt import the file more than once.
Finally we can kick off the workflow by calling \fI\%toil.common.Toil.start()\fP on the job \fBsetup\fP\&. When the workflow
ends we capture its output (the sorted file\(aqs fileID) and use that in \fBtoil.common.Toil.exportFile()\fP to move the
sorted file from the job store back into \(dquserland\(dq.
.sp
Next let\(aqs look at the job that begins the actual workflow, \fBsetup\fP\&.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
def setup(job, inputFile, N, downCheckpoints, options):
    \(dq\(dq\(dq
    Sets up the sort.
    Returns the FileID of the sorted file
    \(dq\(dq\(dq
    RealtimeLogger.info(\(dqStarting the merge sort\(dq)
    return job.addChildJobFn(down,
                             inputFile, N, \(aqroot\(aq,
                             downCheckpoints,
                             options = options,
                             preemptible=True,
                             memory=sortMemory).rv()

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBsetup\fP really only does two things. First it writes to the logs using \fBJob.log()\fP and then
calls \fBaddChildJobFn()\fP\&. Child jobs run directly after the current job. This function turns the \(aqjob function\(aq
\fBdown\fP into an actual job and passes in the inputs including an optional resource requirement, \fBmemory\fP\&. The job
doesn\(aqt actually get run until the call to \fBJob.rv()\fP\&. Once the job \fBdown\fP finishes, its output is returned here.
.sp
Now we can look at what \fBdown\fP does.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
def down(job, inputFileStoreID, N, path, downCheckpoints, options, memory=sortMemory):
    \(dq\(dq\(dq
    Input is a file, a subdivision size N, and a path in the hierarchy of jobs.
    If the range is larger than a threshold N the range is divided recursively and
    a follow on job is then created which merges back the results else
    the file is sorted and placed in the output.
    \(dq\(dq\(dq

    RealtimeLogger.info(\(dqDown job starting: %s\(dq % path)

    # Read the file
    inputFile = job.fileStore.readGlobalFile(inputFileStoreID, cache=False)
    length = os.path.getsize(inputFile)
    if length > N:
        # We will subdivide the file
        RealtimeLogger.critical(\(dqSplitting file: %s of size: %s\(dq
                % (inputFileStoreID, length))
        # Split the file into two copies
        midPoint = getMidPoint(inputFile, 0, length)
        t1 = job.fileStore.getLocalTempFile()
        with open(t1, \(aqw\(aq) as fH:
            fH.write(copySubRangeOfFile(inputFile, 0, midPoint+1))
        t2 = job.fileStore.getLocalTempFile()
        with open(t2, \(aqw\(aq) as fH:
            fH.write(copySubRangeOfFile(inputFile, midPoint+1, length))
        # Call down recursively. By giving the rv() of the two jobs as inputs to the follow\-on job, up,
        # we communicate the dependency without hindering concurrency.
        result = job.addFollowOnJobFn(up,
                                    job.addChildJobFn(down, job.fileStore.writeGlobalFile(t1), N, path + \(aq/0\(aq,
                                                      downCheckpoints, checkpoint=downCheckpoints, options=options,
                                                      preemptible=True, memory=options.sortMemory).rv(),
                                    job.addChildJobFn(down, job.fileStore.writeGlobalFile(t2), N, path + \(aq/1\(aq,
                                                      downCheckpoints, checkpoint=downCheckpoints, options=options,
                                                      preemptible=True, memory=options.mergeMemory).rv(),
                                    path + \(aq/up\(aq, preemptible=True, options=options, memory=options.sortMemory).rv()
    else:
        # We can sort this bit of the file
        RealtimeLogger.critical(\(dqSorting file: %s of size: %s\(dq
                % (inputFileStoreID, length))
        # Sort the copy and write back to the fileStore
        shutil.copyfile(inputFile, inputFile + \(aq.sort\(aq)
        sort(inputFile + \(aq.sort\(aq)
        result = job.fileStore.writeGlobalFile(inputFile + \(aq.sort\(aq)

    RealtimeLogger.info(\(dqDown job finished: %s\(dq % path)
    return result

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Down is the recursive part of the workflow. First we read the file into the local filestore by calling
\fBjob.fileStore.readGlobalFile()\fP\&. This puts a copy of the file in the temp directory for this particular job. This
storage will disappear once this job ends. For a detailed explanation of the filestore, job store, and their interfaces
have a look at \fI\%Managing files within a workflow\fP\&.
.sp
Next \fBdown\fP checks the base case of the recursion: is the length of the input file less than \fBN\fP (remember \fBN\fP
was an option we added to the workflow in \fBmain\fP)? In the base case, we just sort the file, and return the file ID
of this new sorted file.
.sp
If the base case fails, then the file is split into two new tempFiles using \fBjob.fileStore.getLocalTempFile()\fP and
the helper function \fBcopySubRangeOfFile\fP\&. Finally we add a follow on Job \fBup\fP with \fBjob.addFollowOnJobFn()\fP\&.
We\(aqve already seen child jobs. A follow\-on Job is a job that runs after the current job and \fIall\fP of its children (and their children and follow\-ons) have
completed. Using a follow\-on makes sense because \fBup\fP is responsible for merging the files together and we don\(aqt want
to merge the files together until we \fIknow\fP they are sorted. Again, the return value of the follow\-on job is requested
using \fBJob.rv()\fP\&.
.sp
Looking at \fBup\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
def up(job, inputFileID1, inputFileID2, path, options, memory=sortMemory):
    \(dq\(dq\(dq
    Merges the two files and places them in the output.
    \(dq\(dq\(dq

    RealtimeLogger.info(\(dqUp job starting: %s\(dq % path)

    with job.fileStore.writeGlobalFileStream() as (fileHandle, outputFileStoreID):
        fileHandle = codecs.getwriter(\(aqutf\-8\(aq)(fileHandle)
        with job.fileStore.readGlobalFileStream(inputFileID1) as inputFileHandle1:
            inputFileHandle1 = codecs.getreader(\(aqutf\-8\(aq)(inputFileHandle1)
            with job.fileStore.readGlobalFileStream(inputFileID2) as inputFileHandle2:
                inputFileHandle2 = codecs.getreader(\(aqutf\-8\(aq)(inputFileHandle2)
                RealtimeLogger.info(\(dqMerging %s and %s to %s\(dq
                    % (inputFileID1, inputFileID2, outputFileStoreID))
                merge(inputFileHandle1, inputFileHandle2, fileHandle)
        # Cleanup up the input files \- these deletes will occur after the completion is successful.
        job.fileStore.deleteGlobalFile(inputFileID1)
        job.fileStore.deleteGlobalFile(inputFileID2)

        RealtimeLogger.info(\(dqUp job finished: %s\(dq % path)

        return outputFileStoreID

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
we see that the two input files are merged together and the output is written to a new file using
\fBjob.fileStore.writeGlobalFileStream()\fP\&. After a little cleanup, the output file is returned.
.sp
Once the final \fBup\fP finishes and all of the \fBrv()\fP promises are fulfilled, \fBmain\fP receives the sorted file\(aqs ID
which it uses in \fBexportFile\fP to send it to the user.
.sp
There are other things in this example that we didn\(aqt go over such as \fI\%Checkpoints\fP and the details of much of
the \fI\%Toil Class API\fP\&.
.sp
At the end of the script the lines
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
if __name__ == \(aq__main__\(aq
    main()
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
are included to ensure that the main function is only run once in the \(aq__main__\(aq process
invoked by you, the user.
In Toil terms, by invoking the script you created the \fIleader process\fP
in which the \fBmain()\fP
function is run. A \fIworker process\fP is a separate process whose sole purpose
is to host the execution of one or more jobs defined in that script. In any Toil
workflow there is always one leader process, and potentially many worker processes.
.sp
When using the single\-machine batch system (the default), the worker processes will be running
on the same machine as the leader process. With full\-fledged batch systems like
Mesos the worker processes will typically be started on separate machines. The
boilerplate ensures that the pipeline is only started once\-\-\-on the leader\-\-\-but
not when its job functions are imported and executed on the individual workers.
.sp
Typing \fBpython3 sort.py \-\-help\fP will show the complete list of
arguments for the workflow which includes both Toil\(aqs and ones defined inside
\fBsort.py\fP\&. A complete explanation of Toil\(aqs arguments can be
found in \fI\%Commandline Options\fP\&.
.SS Logging
.sp
By default, Toil logs a lot of information related to the current environment
in addition to messages from the batch system and jobs. This can be configured
with the \fB\-\-logLevel\fP flag. For example, to only log \fBCRITICAL\fP level
messages to the screen:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py file:jobStore \e
             \-\-logLevel=critical \e
             \-\-overwriteOutput=True
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This hides most of the information we get from the Toil run. For more detail,
we can run the pipeline with \fB\-\-logLevel=debug\fP to see a comprehensive
output. For more information, see \fI\%Commandline Options\fP\&.
.SS Error Handling and Resuming Pipelines
.sp
With Toil, you can recover gracefully from a bug in your pipeline without losing
any progress from successfully completed jobs. To demonstrate this, let\(aqs add
a bug to our example code to see how Toil handles a failure and how we can
resume a pipeline after that happens. Add a bad assertion at line 52 of the
example (the first line of \fBdown()\fP):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
def down(job, inputFileStoreID, N, downCheckpoints, memory=sortMemory):
    ...
    assert 1 == 2, \(dqTest error!\(dq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
When we run the pipeline, Toil will show a detailed failure log with a traceback:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py file:jobStore
\&...
\-\-\-TOIL WORKER OUTPUT LOG\-\-\-
\&...
m/j/jobonrSMP    Traceback (most recent call last):
m/j/jobonrSMP      File \(dqtoil/src/toil/worker.py\(dq, line 340, in main
m/j/jobonrSMP        job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
m/j/jobonrSMP      File \(dqtoil/src/toil/job.py\(dq, line 1270, in _runner
m/j/jobonrSMP        returnValues = self._run(jobGraph, fileStore)
m/j/jobonrSMP      File \(dqtoil/src/toil/job.py\(dq, line 1217, in _run
m/j/jobonrSMP        return self.run(fileStore)
m/j/jobonrSMP      File \(dqtoil/src/toil/job.py\(dq, line 1383, in run
m/j/jobonrSMP        rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
m/j/jobonrSMP      File \(dqtoil/example.py\(dq, line 30, in down
m/j/jobonrSMP        assert 1 == 2, \(dqTest error!\(dq
m/j/jobonrSMP    AssertionError: Test error!
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If we try and run the pipeline again, Toil will give us an error message saying
that a job store of the same name already exists. By default, in the event of a
failure, the job store is preserved so that the workflow can be restarted,
starting from the previously failed jobs. We can restart the pipeline by running
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py file:jobStore \e
             \-\-restart \e
             \-\-overwriteOutput=True
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
We can also change the number of times Toil will attempt to retry a failed job:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py file:jobStore \e
             \-\-retryCount 2 \e
             \-\-restart \e
             \-\-overwriteOutput=True
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You\(aqll now see Toil attempt to rerun the failed job until it runs out of tries.
\fB\-\-retryCount\fP is useful for non\-systemic errors, like downloading a file that
may experience a sporadic interruption, or some other non\-deterministic failure.
.sp
To successfully restart our pipeline, we can edit our script to comment out
line 30, or remove it, and then run
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py file:jobStore \e
             \-\-restart \e
             \-\-overwriteOutput=True
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The pipeline will run successfully, and the job store will be removed on the
pipeline\(aqs completion.
.SS Collecting Statistics
.sp
Please see the \fI\%Stats Command\fP section for more on gathering runtime and resource info on jobs.
.SS Launching a Toil Workflow in AWS
.sp
After having installed the \fBaws\fP extra for Toil during the \fI\%Installation\fP and set up AWS
(see \fI\%Preparing your AWS environment\fP), the user can run the basic \fBhelloWorld.py\fP script (\fI\%Running a basic workflow\fP)
on a VM in AWS just by modifying the run command.
.sp
Note that when running in AWS, users can either run the workflow on a single instance or run it on a
cluster (which is running across multiple containers on multiple AWS instances).  For more information
on running Toil workflows on a cluster, see \fI\%Running in AWS\fP\&.
.sp
Also!  Remember to use the \fI\%Destroy\-Cluster Command\fP command when finished to destroy the cluster!  Otherwise things may not be cleaned up properly.
.INDENT 0.0
.IP 1. 3
Launch a cluster in AWS using the \fI\%Launch\-Cluster Command\fP command:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil launch\-cluster <cluster\-name> \e
             \-\-keyPairName <AWS\-key\-pair\-name> \e
             \-\-leaderNodeType t2.medium \e
             \-\-zone us\-west\-2a
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The arguments \fBkeyPairName\fP, \fBleaderNodeType\fP, and \fBzone\fP are required to launch a cluster.
.IP 2. 3
Copy \fBhelloWorld.py\fP to the \fB/tmp\fP directory on the leader node using the \fI\%Rsync\-Cluster Command\fP command:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil rsync\-cluster \-\-zone us\-west\-2a <cluster\-name> helloWorld.py :/tmp
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note that the command requires defining the file to copy as well as the target location on the cluster leader node.
.IP 3. 3
Login to the cluster leader node using the \fI\%Ssh\-Cluster Command\fP command:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil ssh\-cluster \-\-zone us\-west\-2a <cluster\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note that this command will log you in as the \fBroot\fP user.
.IP 4. 3
Run the Toil script in the cluster:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 /tmp/helloWorld.py aws:us\-west\-2:my\-S3\-bucket
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In this particular case, we create an S3 bucket called \fBmy\-S3\-bucket\fP in
the \fBus\-west\-2\fP availability zone to store intermediate job results.
.sp
Along with some other \fBINFO\fP log messages, you should get the following output in your terminal window:
\fBHello, world!, here\(aqs a message: You did it!\fP\&.
.IP 5. 3
Exit from the SSH connection.
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ exit
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 6. 3
Use the \fI\%Destroy\-Cluster Command\fP command to destroy the cluster:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil destroy\-cluster \-\-zone us\-west\-2a <cluster\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note that this command will destroy the cluster leader
node and any resources created to run the job, including the S3 bucket.
.UNINDENT
.SS Running a CWL Workflow on AWS
.sp
After having installed the \fBaws\fP and \fBcwl\fP extras for Toil during the \fI\%Installation\fP and set up AWS
(see \fI\%Preparing your AWS environment\fP), the user can run a CWL workflow with Toil on AWS.
.sp
Also!  Remember to use the \fI\%Destroy\-Cluster Command\fP command when finished to destroy the cluster!  Otherwise things may not be cleaned up properly.
.INDENT 0.0
.IP 1. 3
First launch a node in AWS using the \fI\%Launch\-Cluster Command\fP command:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil launch\-cluster <cluster\-name> \e
             \-\-keyPairName <AWS\-key\-pair\-name> \e
             \-\-leaderNodeType t2.medium \e
             \-\-zone us\-west\-2a
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 2. 3
Copy \fBexample.cwl\fP and \fBexample\-job.yaml\fP from the \fI\%CWL example\fP to the node using
the \fI\%Rsync\-Cluster Command\fP command:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
toil rsync\-cluster \-\-zone us\-west\-2a <cluster\-name> example.cwl :/tmp
toil rsync\-cluster \-\-zone us\-west\-2a <cluster\-name> example\-job.yaml :/tmp
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 3. 3
SSH into the cluster\(aqs leader node using the \fI\%Ssh\-Cluster Command\fP utility:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil ssh\-cluster \-\-zone us\-west\-2a <cluster\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 4. 3
Once on the leader node, it\(aqs a good idea to update and install the following:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
sudo apt\-get update
sudo apt\-get \-y upgrade
sudo apt\-get \-y dist\-upgrade
sudo apt\-get \-y install git
sudo pip install mesos.cli
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 5. 3
Now create a new \fBvirtualenv\fP with the \fB\-\-system\-site\-packages\fP option and activate:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
virtualenv \-\-system\-site\-packages venv
source venv/bin/activate
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 6. 3
Now run the CWL workflow:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil\-cwl\-runner \e
             \-\-provisioner aws \e
             \-\-jobStore aws:us\-west\-2a:any\-name \e
             /tmp/example.cwl /tmp/example\-job.yaml
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBTIP:\fP
.INDENT 3.0
.INDENT 3.5
When running a CWL workflow on AWS, input files can be provided either on the
local file system or in S3 buckets using \fBs3://\fP URI references. Final output
files will be copied to the local file system of the leader node.
.UNINDENT
.UNINDENT
.IP 7. 3
Finally, log out of the leader node and from your local computer, destroy the cluster:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil destroy\-cluster \-\-zone us\-west\-2a <cluster\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.SS Running a Workflow with Autoscaling \- Cactus
.sp
\fI\%Cactus\fP is a reference\-free, whole\-genome multiple alignment
program that can be run on any of the cloud platforms Toil supports.
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
\fBCloud Independence\fP:
.sp
This example provides a \(dqcloud agnostic\(dq view of running Cactus with Toil. Most options will not change between cloud providers.
However, each provisioner has unique inputs for  \fB\-\-leaderNodeType\fP, \fB\-\-nodeType\fP and \fB\-\-zone\fP\&.
We recommend the following:
.INDENT 0.0
.INDENT 3.5
.TS
center;
|l|l|l|l|.
_
T{
Option
T}	T{
Used in
T}	T{
AWS
T}	T{
Google
T}
_
T{
\fB\-\-leaderNodeType\fP
T}	T{
launch\-cluster
T}	T{
t2.medium
T}	T{
n1\-standard\-1
T}
_
T{
\fB\-\-zone\fP
T}	T{
launch\-cluster
T}	T{
us\-west\-2a
T}	T{
us\-west1\-a
T}
_
T{
\fB\-\-zone\fP
T}	T{
cactus
T}	T{
us\-west\-2
T}
_
T{
\fB\-\-nodeType\fP
T}	T{
cactus
T}	T{
c3.4xlarge
T}	T{
n1\-standard\-8
T}
_
.TE
.UNINDENT
.UNINDENT
.sp
When executing \fBtoil launch\-cluster\fP with \fBgce\fP specified for \fB\-\-provisioner\fP, the option \fB\-\-boto\fP must
be specified and given a path to your .boto file. See \fI\%Running in Google Compute Engine (GCE)\fP for more information about the \fB\-\-boto\fP option.
.UNINDENT
.UNINDENT
.sp
Also!  Remember to use the \fI\%Destroy\-Cluster Command\fP command when finished to destroy the cluster!  Otherwise things may not be cleaned up properly.
.INDENT 0.0
.IP 1. 4
Download \fBpestis.tar.gz\fP
.IP 2. 4
Launch a leader node using the \fI\%Launch\-Cluster Command\fP command:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil launch\-cluster <cluster\-name> \e
             \-\-provisioner <aws, gce> \e
             \-\-keyPairName <key\-pair\-name> \e
             \-\-leaderNodeType <type> \e
             \-\-zone <zone>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 4.0
.INDENT 3.5
\fBA Helpful Tip\fP
.sp
When using AWS, setting the environment variable eliminates having to specify the \fB\-\-zone\fP option
for each command. This will be supported for GCE in the future.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export TOIL_AWS_ZONE=us\-west\-2c
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.IP 3. 4
Create appropriate directory for uploading files:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil ssh\-cluster \-\-provisioner <aws, gce> <cluster\-name>
$ mkdir /root/cact_ex
$ exit
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 4. 4
Copy the required files, i.e., seqFile.txt (a text file containing the locations of the input sequences as
well as their phylogenetic tree, see
\fI\%here\fP), organisms\(aq genome sequence
files in FASTA format, and configuration files (e.g. blockTrim1.xml, if desired), up to the leader node:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil rsync\-cluster \-\-provisioner <aws, gce> <cluster\-name> pestis\-short\-aws\-seqFile.txt :/root/cact_ex
$ toil rsync\-cluster \-\-provisioner <aws, gce> <cluster\-name> GCF_000169655.1_ASM16965v1_genomic.fna :/root/cact_ex
$ toil rsync\-cluster \-\-provisioner <aws, gce> <cluster\-name> GCF_000006645.1_ASM664v1_genomic.fna :/root/cact_ex
$ toil rsync\-cluster \-\-provisioner <aws, gce> <cluster\-name> GCF_000182485.1_ASM18248v1_genomic.fna :/root/cact_ex
$ toil rsync\-cluster \-\-provisioner <aws, gce> <cluster\-name> GCF_000013805.1_ASM1380v1_genomic.fna :/root/cact_ex
$ toil rsync\-cluster \-\-provisioner <aws, gce> <cluster\-name> setup_leaderNode.sh :/root/cact_ex
$ toil rsync\-cluster \-\-provisioner <aws, gce> <cluster\-name> blockTrim1.xml :/root/cact_ex
$ toil rsync\-cluster \-\-provisioner <aws, gce> <cluster\-name> blockTrim3.xml :/root/cact_ex
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 5. 4
Log in to the leader node:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil ssh\-cluster \-\-provisioner <aws, gce> <cluster\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 6. 4
Set up the environment of the leader node to run Cactus:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ bash /root/cact_ex/setup_leaderNode.sh
$ source cact_venv/bin/activate
(cact_venv) $ cd cactus
(cact_venv) $ pip install \-\-upgrade .
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 7. 4
Run \fI\%Cactus\fP as an autoscaling workflow:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
(cact_venv) $ TOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:3.14.0 cactus \e
                  \-\-provisioner <aws, gce> \e
                  \-\-nodeType <type> \e
                  \-\-maxNodes 2 \e
                  \-\-minNodes 0 \e
                  \-\-retry 10 \e
                  \-\-batchSystem mesos \e
                  \-\-logDebug \e
                  \-\-logFile /logFile_pestis3 \e
                  \-\-configFile \e
                  /root/cact_ex/blockTrim3.xml <aws, google>:<zone>:cactus\-pestis \e
                  /root/cact_ex/pestis\-short\-aws\-seqFile.txt \e
                  /root/cact_ex/pestis_output3.hal
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 4.0
.INDENT 3.5
\fBPieces of the Puzzle\fP:
.sp
\fBTOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:3.14.0\fP \-\-\- specifies the version of Toil being used, 3.14.0;
if the latest one is desired, please eliminate.
.sp
\fB\-\-nodeType\fP \-\-\- determines the instance type used for worker nodes. The instance type specified here must be on
the same cloud provider as the one specified with \fB\-\-leaderNodeType\fP
.sp
\fB\-\-maxNodes 2\fP \-\-\- creates up to two instances of the type specified with \fB\-\-nodeType\fP and
launches Mesos worker containers inside them.
.sp
\fB\-\-logDebug\fP \-\-\- equivalent to \fB\-\-logLevel DEBUG\fP\&.
.sp
\fB\-\-logFile /logFile_pestis3\fP \-\-\- writes logs in a file named \fIlogFile_pestis3\fP under \fB/\fP folder.
.sp
\fB\-\-configFile\fP \-\-\- this is not required depending on whether a specific configuration file is intended to run
the alignment.
.sp
\fB<aws, google>:<zone>:cactus\-pestis\fP \-\-\- creates a bucket, named \fBcactus\-pestis\fP, with the specified cloud provider to store intermediate job files and metadata.
\fBNOTE\fP: If you want to use a GCE\-based jobstore, specify \fBgoogle\fP here, not \fBgce\fP\&.
.sp
The result file, named \fBpestis_output3.hal\fP, is stored under \fB/root/cact_ex\fP folder of the leader node.
.sp
Use \fBcactus \-\-help\fP to see all the Cactus and Toil flags available.
.UNINDENT
.UNINDENT
.IP 8. 4
Log out of the leader node:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
(cact_venv) $ exit
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 9. 4
Download the resulted output to local machine:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil rsync\-cluster \e
             \-\-provisioner <aws, gce> <cluster\-name> \e
             :/root/cact_ex/pestis_output3.hal \e
             <path\-of\-folder\-on\-local\-machine>
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 10. 4
Destroy the cluster:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil destroy\-cluster \-\-provisioner <aws, gce> <cluster\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.SH INTRODUCTION
.sp
Toil runs in various environments, including \fI\%locally\fP and \fI\%in the cloud\fP
(Amazon Web Services and Google Compute Engine).  Toil also supports two DSLs: \fI\%CWL\fP and
(Amazon Web Services and Google Compute Engine).  Toil also supports two DSLs: \fI\%CWL\fP and
\fI\%WDL\fP (experimental).
.sp
Toil is built in a modular way so that it can be used on lots of different systems, and with different configurations.
The three configurable pieces are the
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
\fI\%Job Store API\fP: A filepath or url that can host and centralize all files for a workflow (e.g. a local folder, or an AWS s3 bucket url).
.IP \(bu 2
\fI\%Batch System API\fP: Specifies either a local single\-machine or a currently supported HPC environment (lsf, parasol, mesos, slurm, torque, htcondor, kubernetes, or grid_engine).  Mesos is a special case, and is launched for cloud environments.
.IP \(bu 2
\fI\%Provisioner\fP: For running in the cloud only.  This specifies which cloud provider provides instances to do the \(dqwork\(dq of your workflow.
.UNINDENT
.UNINDENT
.UNINDENT
.SS Job Store
.sp
The job store is a storage abstraction which contains all of the information used in a Toil run. This centralizes all
of the files used by jobs in the workflow and also the details of the progress of the run. If a workflow crashes
or fails, the job store contains all of the information necessary to resume with minimal repetition of work.
.sp
Several different job stores are supported, including the file job store and cloud job stores.
.SS File Job Store
.sp
The file job store is for use locally, and keeps the workflow information in a directory on the machine where the
workflow is launched.  This is the simplest and most convenient job store for testing or for small runs.
.sp
For an example that uses the file job store, see \fI\%Running a basic workflow\fP\&.
.SS Cloud Job Stores
.sp
Toil currently supports the following cloud storage systems as job stores:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
\fI\%AWS Job Store\fP: An AWS S3 bucket formatted as \(dqaws:<zone>:<bucketname>\(dq where only numbers, letters, and dashes are allowed in the bucket name.  Example: \fIaws:us\-west\-2:my\-aws\-jobstore\-name\fP\&.
.IP \(bu 2
\fI\%Google Job Store\fP: A Google Cloud Storage bucket formatted as \(dqgce:<zone>:<bucketname>\(dq where only numbers, letters, and dashes are allowed in the bucket name.  Example: \fIgce:us\-west2\-a:my\-google\-jobstore\-name\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
These use cloud buckets to house all of the files. This is useful if there are several different
worker machines all running jobs that need to access the job store.
.SS Batch System
.sp
A Toil batch system is either a local single\-machine (one computer) or a currently supported
HPC cluster of computers (lsf, parasol, mesos, slurm, torque, htcondor, or grid_engine).  Mesos
is a special case, and is launched for cloud environments.  These environments manage individual
worker nodes under a leader node to process the work required in a workflow.  The leader and its
workers all coordinate their tasks and files through a centralized job store location.
.sp
See \fI\%Batch System API\fP for a more detailed description of different batch systems.
.SS Provisioner
.sp
The Toil provisioner provides a tool set for running a Toil workflow on a particular cloud platform.
.sp
The \fI\%Cluster Utilities\fP are command line tools used to provision nodes in your desired cloud platform.
They allows you to launch nodes, ssh to the leader, and rsync files back and forth.
.sp
For detailed instructions for using the provisioner see \fI\%Running in AWS\fP or \fI\%Running in Google Compute Engine (GCE)\fP\&.
.SH COMMANDLINE OPTIONS
.sp
A quick way to see all of Toil\(aqs commandline options is by executing the following on a toil script:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil example.py \-\-help
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For a basic toil workflow, Toil has one mandatory argument, the job store.  All other arguments are optional.
.SS The Job Store
.sp
Running toil scripts requires a filepath or url to a centralizing location for all of the files of the workflow.
This is Toil\(aqs one required positional argument: the job store.  To use the \fI\%quickstart\fP example,
if you\(aqre on a node that has a large \fB/scratch\fP volume, you can specify that the jobstore be created there by
executing: \fBpython3 HelloWorld.py /scratch/my\-job\-store\fP, or more explicitly,
\fBpython3 HelloWorld.py file:/scratch/my\-job\-store\fP\&.
.sp
Syntax for specifying different job stores:
.INDENT 0.0
.INDENT 3.5
Local: \fBfile:job\-store\-name\fP
.sp
AWS: \fBaws:region\-here:job\-store\-name\fP
.sp
Google: \fBgoogle:projectID\-here:job\-store\-name\fP
.UNINDENT
.UNINDENT
.sp
Different types of job store options can be found below.
.SS Commandline Options
.sp
\fBCore Toil Options\fP
Options to specify the location of the Toil workflow and turn on stats collation
about the performance of jobs.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.BI \-\-workDir \ WORKDIR
Absolute path to directory where temporary files
generated during the Toil run should be placed. Standard
output and error from batch system jobs (unless \-\-noStdOutErr)
will be placed in this directory. A cache directory
may be placed in this directory. Temp files and folders
will be placed in a toil\-<workflowID> within workDir.
The workflowID is generated by Toil and will be reported
in the workflow logs. Default is determined by the variables
(TMPDIR, TEMP, TMP) via mkdtemp. This directory needs to
exist on all machines running jobs; if capturing standard
output and error from batch system jobs is desired, it will
generally need to be on a shared file system. When
sharing a cache between containers on a host, this
directory must be shared between the containers.
.TP
.BI \-\-coordinationDir \ COORDINATION_DIR
Absolute path to directory where Toil will keep state
and lock files. When sharing a cache between containers
on a host, this directory must be shared between the
containers.
.TP
.B  \-\-noStdOutErr
Do not capture standard output and error from batch system jobs.
.TP
.B  \-\-stats
Records statistics about the toil workflow to be used
by \(aqtoil stats\(aq.
.TP
.BI \-\-clean\fB= STATE
Determines the deletion of the jobStore upon
completion of the program. Choices: \(aqalways\(aq,
\(aqonError\(aq,\(aqnever\(aq, or \(aqonSuccess\(aq. The \-\e\-stats option
requires information from the jobStore upon completion
so the jobStore will never be deleted with that flag.
If you wish to be able to restart the run, choose
\(aqnever\(aq or \(aqonSuccess\(aq. Default is \(aqnever\(aq if stats is
enabled, and \(aqonSuccess\(aq otherwise
.TP
.BI \-\-cleanWorkDir \ STATE
Determines deletion of temporary worker directory upon
completion of a job. Choices: \(aqalways\(aq, \(aqonError\(aq, \(aqnever\(aq,
or \(aqonSuccess\(aq. Default = always. WARNING: This option
should be changed for debugging only. Running a full
pipeline with this option could fill your disk with
intermediate data.
.TP
.BI \-\-clusterStats \ FILEPATH
If enabled, writes out JSON resource usage statistics
to a file. The default location for this file is the
current working directory, but an absolute path can
also be passed to specify where this file should be
written. This option only applies when using scalable
batch systems.
.TP
.B  \-\-restart
If \-\e\-restart is specified then will attempt to restart
existing workflow at the location pointed to by the
\-\e\-jobStore option. Will raise an exception if the
workflow does not exist.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBLogging Options\fP
Toil hides stdout and stderr by default except in case of job failure. Log
levels in toil are based on priority from the logging module:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B  \-\-logOff
Only CRITICAL log levels are shown.
Equivalent to \fB\-\-logLevel=OFF\fP or \fB\-\-logLevel=CRITICAL\fP\&.
.TP
.B  \-\-logCritical
Only CRITICAL log levels are shown.
Equivalent to \fB\-\-logLevel=OFF\fP or \fB\-\-logLevel=CRITICAL\fP\&.
.TP
.B  \-\-logError
Only ERROR, and CRITICAL log levels are shown.
Equivalent to \fB\-\-logLevel=ERROR\fP\&.
.TP
.B  \-\-logWarning
Only WARN, ERROR, and CRITICAL log levels are shown.
Equivalent to \fB\-\-logLevel=WARNING\fP\&.
.TP
.B  \-\-logInfo
All log statements are shown, except DEBUG.
Equivalent to \fB\-\-logLevel=INFO\fP\&.
.TP
.B  \-\-logDebug
All log statements are shown.
Equivalent to \fB\-\-logLevel=DEBUG\fP\&.
.TP
.BI \-\-logLevel\fB= LOGLEVEL
May be set to: \fBOFF\fP (or \fBCRITICAL\fP),
\fBERROR\fP, \fBWARN\fP (or \fBWARNING\fP), \fBINFO\fP, or \fBDEBUG\fP\&.
.TP
.BI \-\-logFile \ FILEPATH
Specifies a file path to write the logging output to.
.TP
.B  \-\-rotatingLogging
Turn on rotating logging, which prevents log files from
getting too big (set using \fB\-\-maxLogFileSize BYTESIZE\fP).
.TP
.BI \-\-maxLogFileSize \ BYTESIZE
The maximum size of a job log file to keep (in bytes),
log files larger than this will be truncated to the last
X bytes. Setting this option to zero will prevent any
truncation. Setting this option to a negative value will
truncate from the beginning. Default=62.5KiB
Sets the maximum log file size in bytes (\fB\-\-rotatingLogging\fP must be active).
.TP
.BI \-\-log\-dir \ DIRPATH
For CWL and local file system only. Log stdout and stderr (if tool requests stdout/stderr) to the DIRPATH.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBBatch System Options\fP
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.BI \-\-batchSystem \ BATCHSYSTEM
The type of batch system to run the job(s) with,
currently can be one of aws_batch, parasol, single_machine,
grid_engine, lsf, mesos, slurm, tes, torque,
htcondor, kubernetes. (default: single_machine)
.TP
.B  \-\-disableAutoDeployment
Should auto\-deployment of the user script be deactivated?
If True, the user script/package should be present at
the same location on all workers.  Default = False.
.TP
.BI \-\-maxLocalJobs \ MAXLOCALJOBS
For batch systems that support a local queue for
housekeeping jobs (Mesos, GridEngine, htcondor, lsf,
slurm, torque).  Specifies the maximum number of these
housekeeping jobs to run on the local system.  The
default (equal to the number of cores) is a maximum of
concurrent local housekeeping jobs.
.TP
.B  \-\-manualMemArgs
Do not add the default arguments: \(aqhv=MEMORY\(aq &
\(aqh_vmem=MEMORY\(aq to the qsub call, and instead rely on
TOIL_GRIDGENGINE_ARGS to supply alternative arguments.
Requires that TOIL_GRIDGENGINE_ARGS be set.
.TP
.B  \-\-runCwlInternalJobsOnWorkers
Whether to run CWL internal jobs (e.g. CWLScatter) on
the worker nodes instead of the primary node. If false
(default), then all such jobs are run on the primary node.
Setting this to true can speed up the pipeline for very
large workflows with many sub\-workflows and/or scatters,
provided that the worker pool is large enough.
.TP
.B  \-\-coalesceStatusCalls
Coalese status calls to prevent the batch system from
being overloaded. Currently only supported for LSF.
.TP
.BI \-\-statePollingWait \ STATEPOLLINGWAIT
Time, in seconds, to wait before doing a scheduler
query for job state. Return cached results if within
the waiting period. Only works for grid engine batch
systems such as gridengine, htcondor, torque, slurm,
and lsf.
.TP
.BI \-\-parasolCommand \ PARASOLCOMMAND
The name or path of the parasol program. Will be
looked up on PATH unless it starts with a
slash. (default: parasol)
.TP
.BI \-\-parasolMaxBatches \ PARASOLMAXBATCHES
Maximum number of job batches the Parasol batch is
allowed to create. One batch is created for jobs with
a unique set of resource requirements. (default: 1000)
.TP
.BI \-\-mesosEndpoint \ MESOSENDPOINT
The host and port of the Mesos server separated by a
colon. (default: <leader IP>:5050)
.TP
.BI \-\-kubernetesHostPath \ KUBERNETES_HOST_PATH
Path on Kubernetes hosts to use as shared inter\-pod temp
directory.
.TP
.BI \-\-kubernetesOwner \ KUBERNETES_OWNER
Username to mark Kubernetes jobs with.
.TP
.BI \-\-kubernetesServiceAccount \ KUBERNETES_SERVICE_ACCOUNT
Service account to run jobs as.
.TP
.BI \-\-kubernetesPodTimeout \ KUBERNETES_POD_TIMEOUT
Seconds to wait for a scheduled Kubernetes pod to
start running. (default: 120s)
.TP
.BI \-\-tesEndpoint \ TES_ENDPOINT
The http(s) URL of the TES server.
(default: \fI\%http:/\fP/<leader IP>:8000)
.TP
.BI \-\-tesUser \ TES_USER
User name to use for basic authentication to TES server.
.TP
.BI \-\-tesPassword \ TES_PASSWORD
Password to use for basic authentication to TES server.
.TP
.BI \-\-tesBearerToken \ TES_BEARER_TOKEN
Bearer token to use for authentication to TES server.
.TP
.BI \-\-awsBatchRegion \ AWS_BATCH_REGION
The AWS region containing the AWS Batch queue to submit
to.
.TP
.BI \-\-awsBatchQueue \ AWS_BATCH_QUEUE
The name or ARN of the AWS Batch queue to submit to.
.TP
.BI \-\-awsBatchJobRoleArn \ AWS_BATCH_JOB_ROLE_ARN
The ARN of an IAM role to run AWS Batch jobs as, so they
can e.g. access a job store. Must be assumable by
ecs\-tasks.amazonaws.com
.TP
.BI \-\-scale \ SCALE
A scaling factor to change the value of all submitted
tasks\(aq submitted cores. Used in single_machine batch
system. Useful for running workflows on smaller
machines than they were designed for, by setting a
value less than 1. (default: 1)
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBData Storage Options\fP
Allows configuring Toil\(aqs data storage.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B  \-\-linkImports
When using a filesystem based job store, CWL input files
are by default symlinked in. Specifying this option
instead copies the files into the job store, which may
protect them from being modified externally. When not
specified and as long as caching is enabled, Toil will
protect the file automatically by changing the permissions
to read\-only.
.TP
.B  \-\-moveExports
When using a filesystem based job store, output files
are by default moved to the output directory, and a
symlink to the moved exported file is created at the
initial location. Specifying this option instead copies
the files into the output directory. Applies to
filesystem\-based job stores only.
.TP
.B  \-\-disableCaching
Disables caching in the file store. This flag must be
set to use a batch system that does not support
cleanup, such as Parasol.
.TP
.BI \-\-caching \ BOOL
Set caching options. This must be set to \(dqfalse\(dq
to use a batch system that does not support
cleanup, such as Parasol. Set to \(dqtrue\(dq if caching
is desired.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBAutoscaling Options\fP
Allows the specification of the minimum and maximum number of nodes in an
autoscaled cluster, as well as parameters to control the level of provisioning.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.BI \-\-provisioner \ CLOUDPROVIDER
The provisioner for cluster auto\-scaling. This is the
main Toil \-\e\-provisioner option, and defaults to None
for running on single_machine and non\-auto\-scaling batch
systems. The currently supported choices are \(aqaws\(aq or
\(aqgce\(aq.
.TP
.BI \-\-nodeTypes \ NODETYPES
Specifies a list of comma\-separated node types, each of which is
composed of slash\-separated instance types, and an optional spot
bid set off by a colon, making the node type preemptible. Instance
types may appear in multiple node types, and the same node type
may appear as both preemptible and non\-preemptible.
.INDENT 7.0
.TP
.B Valid argument specifying two node types:
c5.4xlarge/c5a.4xlarge:0.42,t2.large
.TP
.B Node types:
c5.4xlarge/c5a.4xlarge:0.42 and t2.large
.TP
.B Instance types:
c5.4xlarge, c5a.4xlarge, and t2.large
.TP
.B Semantics:
Bid $0.42/hour for either c5.4xlarge or c5a.4xlarge instances,
treated interchangeably, while they are available at that price,
and buy t2.large instances at full price
.UNINDENT
.TP
.BI \-\-minNodes \ MINNODES
Minimum number of nodes of each type in the cluster,
if using auto\-scaling. This should be provided as a
comma\-separated list of the same length as the list of
node types. default=0
.TP
.BI \-\-maxNodes \ MAXNODES
Maximum number of nodes of each type in the cluster,
if using autoscaling, provided as a comma\-separated
list. The first value is used as a default if the list
length is less than the number of nodeTypes.
default=10
.TP
.BI \-\-targetTime \ TARGETTIME
Sets how rapidly you aim to complete jobs in seconds.
Shorter times mean more aggressive parallelization.
The autoscaler attempts to scale up/down so that it
expects all queued jobs will complete within targetTime
seconds. (Default: 1800)
.TP
.BI \-\-betaInertia \ BETAINERTIA
A smoothing parameter to prevent unnecessary
oscillations in the number of provisioned nodes. This
controls an exponentially weighted moving average of the
estimated number of nodes. A value of 0.0 disables any
smoothing, and a value of 0.9 will smooth so much that
few changes will ever be made.  Must be between 0.0 and
0.9. (Default: 0.1)
.TP
.BI \-\-scaleInterval \ SCALEINTERVAL
The interval (seconds) between assessing if the scale of
the cluster needs to change. (Default: 60)
.TP
.BI \-\-preemptibleCompensation \ PREEMPTIBLECOMPENSATION
The preference of the autoscaler to replace
preemptible nodes with non\-preemptible nodes, when
preemptible nodes cannot be started for some reason.
Defaults to 0.0. This value must be between 0.0 and
1.0, inclusive. A value of 0.0 disables such
compensation, a value of 0.5 compensates two missing
preemptible nodes with a non\-preemptible one. A value
of 1.0 replaces every missing pre\-emptable node with a
non\-preemptible one.
.TP
.BI \-\-nodeStorage \ NODESTORAGE
Specify the size of the root volume of worker nodes
when they are launched in gigabytes. You may want to
set this if your jobs require a lot of disk space. The
default value is 50.
.TP
.BI \-\-nodeStorageOverrides \ NODESTORAGEOVERRIDES
Comma\-separated list of nodeType:nodeStorage that are used
to override the default value from \-\e\-nodeStorage for the
specified nodeType(s). This is useful for heterogeneous
jobs where some tasks require much more disk than others.
.TP
.B  \-\-metrics
Enable the prometheus/grafana dashboard for monitoring
CPU/RAM usage, queue size, and issued jobs.
.TP
.B  \-\-assumeZeroOverhead
Ignore scheduler and OS overhead and assume jobs can use every
last byte of memory and disk on a node when autoscaling.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBService Options\fP
Allows the specification of the maximum number of service jobs in a cluster. By
keeping this limited we can avoid nodes occupied with services causing deadlocks.
(Not for CWL).
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.BI \-\-maxServiceJobs \ MAXSERVICEJOBS
The maximum number of service jobs that can be run
concurrently, excluding service jobs running on
preemptible nodes. default=9223372036854775807
.TP
.BI \-\-maxPreemptibleServiceJobs \ MAXPREEMPTIBLESERVICEJOBS
The maximum number of service jobs that can run
concurrently on preemptible nodes.
default=9223372036854775807
.TP
.BI \-\-deadlockWait \ DEADLOCKWAIT
Time, in seconds, to tolerate the workflow running only
the same service jobs, with no jobs to use them, before
declaring the workflow to be deadlocked and stopping.
default=60
.TP
.BI \-\-deadlockCheckInterval \ DEADLOCKCHECKINTERVAL
Time, in seconds, to wait between checks to see if the
workflow is stuck running only service jobs, with no
jobs to use them. Should be shorter than
\-\e\-deadlockWait. May need to be increased if the batch
system cannot enumerate running jobs quickly enough, or
if polling for running jobs is placing an unacceptable
load on a shared cluster. default=30
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBResource Options\fP
The options to specify default cores/memory requirements (if not specified by
the jobs themselves), and to limit the total amount of memory/cores requested
from the batch system.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.BI \-\-defaultMemory \ INT
The default amount of memory to request for a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Standard suffixes
like K, Ki, M, Mi, G or Gi are supported. Default is
2.0G
.TP
.BI \-\-defaultCores \ FLOAT
The default number of CPU cores to dedicate a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Fractions of a
core (for example 0.1) are supported on some batch
systems, namely Mesos and singleMachine. Default is
1.0
.TP
.BI \-\-defaultDisk \ INT
The default amount of disk space to dedicate a job.
Only applicable to jobs that do not specify an
explicit value for this requirement. Standard suffixes
like K, Ki, M, Mi, G or Gi are supported. Default is
2.0G
.TP
.BI \-\-defaultAccelerators \ ACCELERATOR
The default amount of accelerators to request for a
job. Only applicable to jobs that do not specify an
explicit value for this requirement. Each accelerator
specification can have a type (gpu [default], nvidia,
amd, cuda, rocm, opencl, or a specific model like
nvidia\-tesla\-k80), and a count [default: 1]. If both a
type and a count are used, they must be separated by a
colon. If multiple types of accelerators are used, the
specifications are separated by commas. Default is [].
.TP
.BI \-\-defaultPreemptible \ BOOL
Make all jobs able to run on preemptible (spot) nodes
by default.
.TP
.BI \-\-maxCores \ INT
The maximum number of CPU cores to request from the
batch system at any one time. Standard suffixes like
K, Ki, M, Mi, G or Gi are supported.
.TP
.BI \-\-maxMemory \ INT
The maximum amount of memory to request from the batch
system at any one time. Standard suffixes like K, Ki,
M, Mi, G or Gi are supported.
.TP
.BI \-\-maxDisk \ INT
The maximum amount of disk space to request from the
batch system at any one time. Standard suffixes like
K, Ki, M, Mi, G or Gi are supported.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBOptions for rescuing/killing/restarting jobs.\fP
The options for jobs that either run too long/fail or get lost (some batch
systems have issues!).
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.BI \-\-retryCount \ RETRYCOUNT
Number of times to retry a failing job before giving
up and labeling job failed. default=1
.TP
.B  \-\-enableUnlimitedPreemptibleRetries
If set, preemptible failures (or any failure due to an
instance getting unexpectedly terminated) will not count
towards job failures and \-\e\-retryCount.
.TP
.B  \-\-doubleMem
If set, batch jobs which die due to reaching memory
limit on batch schedulers will have their memory
doubled and they will be retried. The remaining
retry count will be reduced by 1. Currently only
supported by LSF. default=False.
.TP
.BI \-\-maxJobDuration \ MAXJOBDURATION
Maximum runtime of a job (in seconds) before we kill
it (this is a lower bound, and the actual time before
killing the job may be longer).
.TP
.BI \-\-rescueJobsFrequency \ RESCUEJOBSFREQUENCY
Period of time to wait (in seconds) between checking
for missing/overlong jobs, that is jobs which get lost
by the batch system. Expert parameter.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBLog Management Options\fP
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.BI \-\-maxLogFileSize \ MAXLOGFILESIZE
The maximum size of a job log file to keep (in bytes),
log files larger than this will be truncated to the
last X bytes. Setting this option to zero will prevent
any truncation. Setting this option to a negative
value will truncate from the beginning. Default=62.5 K
.TP
.BI \-\-writeLogs \ FILEPATH
Write worker logs received by the leader into their
own files at the specified path. Any non\-empty standard
output and error from failed batch system jobs will also
be written into files at this path. The current working
directory will be used if a path is not specified
explicitly. Note: By default only the logs of failed
jobs are returned to leader. Set log level to \(aqdebug\(aq or
enable \-\e\-writeLogsFromAllJobs to get logs back from
successful jobs, and adjust \-\e\-maxLogFileSize to
control the truncation limit for worker logs.
.TP
.BI \-\-writeLogsGzip \ FILEPATH
Identical to \-\e\-writeLogs except the logs files are
gzipped on the leader.
.TP
.BI \-\-writeMessages \ FILEPATH
File to send messages from the leader\(aqs message bus to.
.TP
.B  \-\-realTimeLogging
Enable real\-time logging from workers to leader.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBMiscellaneous Options\fP
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B  \-\-disableChaining
Disables chaining of jobs (chaining uses one job\(aqs
resource allocation for its successor job if
possible).
.TP
.B  \-\-disableJobStoreChecksumVerification
Disables checksum verification for files transferred
to/from the job store. Checksum verification is a safety
check to ensure the data is not corrupted during transfer.
Currently only supported for non\-streaming AWS files
.TP
.BI \-\-sseKey \ SSEKEY
Path to file containing 32 character key to be used
for server\-side encryption on awsJobStore or
googleJobStore. SSE will not be used if this flag is
not passed.
.TP
.BI \-\-setEnv \ NAME\fR,\fB \ \-e \ NAME
NAME=VALUE or NAME, \-e NAME=VALUE or NAME are also valid.
Set an environment variable early on in the worker. If
VALUE is omitted, it will be looked up in the current
environment. Independently of this option, the worker
will try to emulate the leader\(aqs environment before
running a job, except for some variables known to vary
across systems. Using this option, a variable can be
injected into the worker process itself before it is
started.
.TP
.BI \-\-servicePollingInterval \ SERVICEPOLLINGINTERVAL
Interval of time service jobs wait between polling for
the existence of the keep\-alive flag (default=60)
.TP
.B  \-\-forceDockerAppliance
Disables sanity checking the existence of the docker
image specified by TOIL_APPLIANCE_SELF, which Toil uses
to provision mesos for autoscaling.
.TP
.BI \-\-statusWait \ INT
Seconds to wait between reports of running jobs.
(default=3600)
.TP
.B  \-\-disableProgress
Disables the progress bar shown when standard error is
a terminal.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBDebug Options\fP
Debug options for finding problems or helping with testing.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B  \-\-debugWorker
Experimental no forking mode for local debugging.
Specifically, workers are not forked and stderr/stdout
are not redirected to the log. (default=False)
.TP
.B  \-\-disableWorkerOutputCapture
Let worker output go to worker\(aqs standard out/error
instead of per\-job logs.
.TP
.BI \-\-badWorker \ BADWORKER
For testing purposes randomly kill \-\e\-badWorker
proportion of jobs using SIGKILL. (Default: 0.0)
.TP
.BI \-\-badWorkerFailInterval \ BADWORKERFAILINTERVAL
When killing the job pick uniformly within the interval
from 0.0 to \-\e\-badWorkerFailInterval seconds after the
worker starts. (Default: 0.01)
.TP
.BI \-\-kill_polling_interval \ KILL_POLLING_INTERVAL
Interval of time (in seconds) the leader waits between
polling for the kill flag inside the job store set by
the \(dqtoil kill\(dq command. (default=5)
.UNINDENT
.UNINDENT
.UNINDENT
.SS Restart Option
.sp
In the event of failure, Toil can resume the pipeline by adding the argument
\fB\-\-restart\fP and rerunning the python script. Toil pipelines (but not CWL
pipelines) can even be edited and resumed which is useful for development or
troubleshooting.
.SS Running Workflows with Services
.sp
Toil supports jobs, or clusters of jobs, that run as \fIservices\fP to other
\fIaccessor\fP jobs. Example services include server databases or Apache Spark
Clusters. As service jobs exist to provide services to accessor jobs their
runtime is dependent on the concurrent running of their accessor jobs. The dependencies
between services and their accessor jobs can create potential deadlock scenarios,
where the running of the workflow hangs because only service jobs are being
run and their accessor jobs can not be scheduled because of too limited resources
to run both simultaneously. To cope with this situation Toil attempts to
schedule services and accessors intelligently, however to avoid a deadlock
with workflows running service jobs it is advisable to use the following parameters:
.INDENT 0.0
.IP \(bu 2
\fB\-\-maxServiceJobs\fP: The maximum number of service jobs that can be run concurrently, excluding service jobs running on preemptible nodes.
.IP \(bu 2
\fB\-\-maxPreemptibleServiceJobs\fP: The maximum number of service jobs that can run concurrently on preemptible nodes.
.UNINDENT
.sp
Specifying these parameters so that at a maximum cluster size there will be
sufficient resources to run accessors in addition to services will ensure that
such a deadlock can not occur.
.sp
If too low a limit is specified then a deadlock can occur in which toil can
not schedule sufficient service jobs concurrently to complete the workflow.
Toil will detect this situation if it occurs and throw a
\fBtoil.DeadlockException\fP exception. Increasing the cluster size
and these limits will resolve the issue.
.SS Setting Options directly with the Toil Script
.sp
It\(aqs good to remember that commandline options can be overridden in the Toil script itself.  For example,
\fI\%toil.job.Job.Runner.getDefaultOptions()\fP can be used to run toil with all default options, and in this example,
it will override commandline args to run the default options and always run with the \(dq./toilWorkflow\(dq directory
specified as the jobstore:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
options = Job.Runner.getDefaultOptions(\(dq./toilWorkflow\(dq) # Get the options object

with Toil(options) as toil:
    toil.start(Job())  # Run the script
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
However, each option can be explicitly set within the script by supplying arguments (in this example, we are setting
\fBlogLevel = \(dqDEBUG\(dq\fP (all log statements are shown) and \fBclean=\(dqALWAYS\(dq\fP (always delete the jobstore) like so:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
options = Job.Runner.getDefaultOptions(\(dq./toilWorkflow\(dq) # Get the options object
options.logLevel = \(dqDEBUG\(dq # Set the log level to the debug level.
options.clean = \(dqALWAYS\(dq # Always delete the jobStore after a run

with Toil(options) as toil:
    toil.start(Job())  # Run the script
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
However, the usual incantation is to accept commandline args from the user with the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
parser = Job.Runner.getDefaultArgumentParser() # Get the parser
options = parser.parse_args() # Parse user args to create the options object

with Toil(options) as toil:
    toil.start(Job())  # Run the script
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Which can also, of course, then accept script supplied arguments as before (which will overwrite any user supplied args):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
parser = Job.Runner.getDefaultArgumentParser() # Get the parser
options = parser.parse_args() # Parse user args to create the options object
options.logLevel = \(dqDEBUG\(dq # Set the log level to the debug level.
options.clean = \(dqALWAYS\(dq # Always delete the jobStore after a run

with Toil(options) as toil:
    toil.start(Job())  # Run the script
.ft P
.fi
.UNINDENT
.UNINDENT
.SH TOIL DEBUGGING
.sp
Toil has a number of tools to assist in debugging.  Here we provide help in working through potential problems that a user might encounter in attempting to run a workflow.
.SS Introspecting the Jobstore
.sp
Note: Currently these features are only implemented for use locally (single machine) with the fileJobStore.
.sp
To view what files currently reside in the jobstore, run the following command:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil debug\-file file:path\-to\-jobstore\-directory \e
      \-\-listFilesInJobStore
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
When run from the commandline, this should generate a file containing the contents of the job store (in addition to
displaying a series of log messages to the terminal).  This file is named \(dqjobstore_files.txt\(dq by default and will be
generated in the current working directory.
.sp
If one wishes to copy any of these files to a local directory, one can run for example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil debug\-file file:path\-to\-jobstore \e
      \-\-fetch overview.txt *.bam *.fastq \e
      \-\-localFilePath=/home/user/localpath
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To fetch \fBoverview.txt\fP, and all \fB\&.bam\fP and \fB\&.fastq\fP files.  This can be used to recover previously used input and output
files for debugging or reuse in other workflows, or use in general debugging to ensure that certain outputs were imported
into the jobStore.
.SS Stats and Status
.sp
See \fI\%Stats Command\fP for more about gathering statistics about job success, runtime, and resource usage from workflows.
.SS Using a Python debugger
.sp
If you execute a workflow using the \fB\-\-debugWorker\fP flag, Toil will not fork in order to run jobs, which means
you can either use \fI\%pdb\fP, or an \fI\%IDE that supports debugging Python\fP as you would normally. Note that the \fB\-\-debugWorker\fP flag will
only work with the \fBsingleMachine\fP batch system (the default), and not any of the custom job schedulers.
.SH RUNNING IN THE CLOUD
.sp
Toil supports Amazon Web Services (AWS) and Google Compute Engine (GCE) in the cloud and has autoscaling capabilities
that can adapt to the size of your workflow, whether your workflow requires 10 instances or 20,000.
.sp
Toil does this by creating a virtual cluster with \fI\%Apache Mesos\fP\&.  \fI\%Apache Mesos\fP requires a leader node to coordinate
the workflow, and worker nodes to execute the various tasks within the workflow.  As the workflow runs, Toil will
\(dqautoscale\(dq, creating and terminating workers as needed to meet the demands of the workflow.
.sp
Once a user is familiar with the basics of running toil locally (specifying a \fI\%jobStore\fP, and
how to write a toil script), they can move on to the guides below to learn how to translate these workflows into cloud
ready workflows.
.SS Managing a Cluster of Virtual Machines (Provisioning)
.sp
Toil can launch and manage a cluster of virtual machines to run using the \fIprovisioner\fP to run a workflow
distributed over several nodes. The provisioner also has the ability to automatically scale up or down the size of
the cluster to handle dynamic changes in computational demand (autoscaling).  Currently we have working provisioners
with AWS and GCE (Azure support has been deprecated).
.sp
Toil uses \fI\%Apache Mesos\fP as the \fI\%Batch System\fP\&.
.sp
See here for instructions for \fI\%Running in AWS\fP\&.
.sp
See here for instructions for \fI\%Running in Google Compute Engine (GCE)\fP\&.
.SS Storage (Toil jobStore)
.sp
Toil can make use of cloud storage such as AWS or Google buckets to take care of storage needs.
.sp
This is useful when running Toil in single machine mode on any cloud platform since it allows you to
make use of their integrated storage systems.
.sp
For an overview of the job store see \fI\%Job Store\fP\&.
.sp
For instructions configuring a particular job store see:
.INDENT 0.0
.IP \(bu 2
\fI\%AWS Job Store\fP
.IP \(bu 2
\fI\%Google Job Store\fP
.UNINDENT
.SH CLOUD PLATFORMS
.SS Running on Kubernetes
.sp
\fI\%Kubernetes\fP is a very popular container orchestration tool that has become a \fIde facto\fP cross\-cloud\-provider API for accessing cloud resources. Major cloud providers like \fI\%Amazon\fP, \fI\%Microsoft\fP, Kubernetes owner \fI\%Google\fP, and \fI\%DigitalOcean\fP have invested heavily in making Kubernetes work well on their platforms, by writing their own deployment documentation and developing provider\-managed Kubernetes\-based products. Using \fI\%minikube\fP, Kubernetes can even be run on a single machine.
.sp
Toil supports running Toil workflows against a Kubernetes cluster, either in the cloud or deployed on user\-owned hardware.
.SS Preparing your Kubernetes environment
.INDENT 0.0
.IP 1. 3
\fBGet a Kubernetes cluster\fP
.sp
To run Toil workflows on Kubernetes, you need to have a Kubernetes cluster set up. This will not be covered here, but there are many options available, and which one you choose will depend on which cloud ecosystem if any you use already, and on pricing. If you are just following along with the documentation, use \fBminikube\fP on your local machine.
.sp
\fBNote that currently the only way to run a Toil workflow on Kubernetes is to use the AWS Job Store, so your Kubernetes workflow will currently have to store its data in Amazon\(aqs cloud regardless of where you run it. This can result in significant egress charges from Amazon if you run it outside of Amazon.\fP
.sp
Kubernetes Cluster Providers:
.INDENT 3.0
.IP \(bu 2
Your own institution
.IP \(bu 2
\fI\%Amazon EKS\fP
.IP \(bu 2
\fI\%Microsoft Azure AKS\fP
.IP \(bu 2
\fI\%Google GKE\fP
.IP \(bu 2
\fI\%DigitalOcean Kubernetes\fP
.IP \(bu 2
\fI\%minikube\fP
.UNINDENT
.IP 2. 3
\fBGet a Kubernetes context on your local machine\fP
.sp
There are two main ways to run Toil workflows on Kubernetes. You can either run the Toil leader on a machine outside the cluster, with jobs submitted to and run on the cluster, or you can submit the Toil leader itself as a job and have it run inside the cluster. Either way, you will need to configure your own machine to be able to submit jobs to the Kubernetes cluster. Generally, this involves creating and populating a file named \fB\&.kube/config\fP in your user\(aqs home directory, and specifying the cluster to connect to, the certificate and token information needed for mutual authentication, and the Kubernetes namespace within which to work. However, Kubernetes configuration can also be picked up from other files in the \fB\&.kube\fP directory, environment variables, and the enclosing host when running inside a Kubernetes\-managed container.
.sp
You will have to do different things here depending on where you got your Kubernetes cluster:
.INDENT 3.0
.IP \(bu 2
\fI\%Configuring for Amazon EKS\fP
.IP \(bu 2
\fI\%Configuring for Microsoft Azure AKS\fP
.IP \(bu 2
\fI\%Configuring for Google GKE\fP
.IP \(bu 2
\fI\%Configuring for DigitalOcean Kubernetes Clusters\fP
.IP \(bu 2
\fI\%Configuring for minikube\fP
.UNINDENT
.sp
Toil\(aqs internal Kubernetes configuration logic mirrors that of the \fBkubectl\fP command. Toil workflows will use the current \fBkubectl\fP context to launch their Kubernetes jobs.
.IP 3. 3
\fBIf running the Toil leader in the cluster, get a service account\fP
.sp
If you are going to run your workflow\(aqs leader within the Kubernetes cluster (see \fI\%Option 1: Running the Leader Inside Kubernetes\fP), you will need a service account in your chosen Kubernetes namespace. Most namespaces should have a service account named \fBdefault\fP which should work fine. If your cluster requires you to use a different service account, you will need to obtain its name and use it when launching the Kubernetes job containing the Toil leader.
.IP 4. 3
\fBSet up appropriate permissions\fP
.sp
Your local Kubernetes context and/or the service account you are using to run the leader in the cluster will need to have certain permissions in order to run the workflow. Toil needs to be able to interact with jobs and pods in the cluster, and to retrieve pod logs. You as a user may need permission to set up an AWS credentials secret, if one is not already available. Additionally, it is very useful for you as a user to have permission to interact with nodes, and to shell into pods.
.sp
The appropriate permissions may already be available to you and your service account by default, especially in managed or ease\-of\-use\-optimized setups such as EKS or minikube.
.sp
However, if the appropriate permissions are not already available, you or your cluster administrator will have to grant them manually. The following \fBRole\fP (\fBtoil\-user\fP) and \fBClusterRole\fP (\fBnode\-reader\fP), to be applied with \fBkubectl apply \-f filename.yaml\fP, should grant sufficient permissions to run Toil workflows when bound to your account and the service account used by Toil workflows. Be sure to replace \fBYOUR_NAMESPACE_HERE\fP with the namespace you are running your workflows in
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: YOUR_NAMESPACE_HERE
  name: toil\-user
rules:
\- apiGroups: [\(dq*\(dq]
  resources: [\(dq*\(dq]
  verbs: [\(dqexplain\(dq, \(dqget\(dq, \(dqwatch\(dq, \(dqlist\(dq, \(dqdescribe\(dq, \(dqlogs\(dq, \(dqattach\(dq, \(dqexec\(dq, \(dqport\-forward\(dq, \(dqproxy\(dq, \(dqcp\(dq, \(dqauth\(dq]
\- apiGroups: [\(dqbatch\(dq]
  resources: [\(dq*\(dq]
  verbs: [\(dqget\(dq, \(dqwatch\(dq, \(dqlist\(dq, \(dqcreate\(dq, \(dqrun\(dq, \(dqset\(dq, \(dqdelete\(dq]
\- apiGroups: [\(dq\(dq]
  resources: [\(dqsecrets\(dq, \(dqpods\(dq, \(dqpods/attach\(dq, \(dqpodtemplates\(dq, \(dqconfigmaps\(dq, \(dqevents\(dq, \(dqservices\(dq]
  verbs: [\(dqpatch\(dq, \(dqget\(dq, \(dqupdate\(dq, \(dqwatch\(dq, \(dqlist\(dq, \(dqcreate\(dq, \(dqrun\(dq, \(dqset\(dq, \(dqdelete\(dq, \(dqexec\(dq]
\- apiGroups: [\(dq\(dq]
  resources: [\(dqpods\(dq, \(dqpods/log\(dq]
  verbs: [\(dqget\(dq, \(dqlist\(dq]
\- apiGroups: [\(dq\(dq]
  resources: [\(dqpods/exec\(dq]
  verbs: [\(dqcreate\(dq]
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node\-reader
rules:
\- apiGroups: [\(dq\(dq]
  resources: [\(dqnodes\(dq]
  verbs: [\(dqget\(dq, \(dqlist\(dq, \(dqdescribe\(dq]
\- apiGroups: [\(dq\(dq]
  resources: [\(dqnamespaces\(dq]
  verbs: [\(dqget\(dq, \(dqlist\(dq, \(dqdescribe\(dq]
\- apiGroups: [\(dqmetrics.k8s.io\(dq]
  resources: [\(dq*\(dq]
  verbs: [\(dq*\(dq]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To bind a user or service account to the \fBRole\fP or \fBClusterRole\fP and actually grant the permissions, you will need a \fBRoleBinding\fP and a \fBClusterRoleBinding\fP, respectively. Make sure to fill in the namespace, username, and service account name, and add more user stanzas if your cluster is to support multiple Toil users.
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: toil\-developer\-member
  namespace: toil
subjects:
\- kind: User
  name: YOUR_KUBERNETES_USERNAME_HERE
  apiGroup: rbac.authorization.k8s.io
\- kind: ServiceAccount
  name: YOUR_SERVICE_ACCOUNT_NAME_HERE
  namespace: YOUR_NAMESPACE_HERE
roleRef:
  kind: Role
  name: toil\-user
  apiGroup: rbac.authorization.k8s.io
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read\-nodes
subjects:
\- kind: User
  name: YOUR_KUBERNETES_USERNAME_HERE
  apiGroup: rbac.authorization.k8s.io
\- kind: ServiceAccount
  name: YOUR_SERVICE_ACCOUNT_NAME_HERE
  namespace: YOUR_NAMESPACE_HERE
roleRef:
  kind: ClusterRole
  name: node\-reader
  apiGroup: rbac.authorization.k8s.io
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.SS AWS Job Store for Kubernetes
.sp
Currently, the only job store, which is what Toil uses to exchange data between jobs, that works with jobs running on Kubernetes is the AWS Job Store. This requires that the Toil leader and Kubernetes jobs be able to connect to and use Amazon S3 and Amazon SimpleDB. It also requires that you have an Amazon Web Services account.
.INDENT 0.0
.IP 1. 3
\fBGet access to AWS S3 and SimpleDB\fP
.sp
In your AWS account, you need to create an AWS access key. First go to the IAM dashboard; for \(dqus\-west1\(dq, the link would be:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
https://console.aws.amazon.com/iam/home?region=us\-west\-1#/home
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Then create an access key, and save the Access Key ID and the Secret Key. As documented in \fI\%the AWS documentation\fP:
.INDENT 3.0
.IP 1. 3
On the IAM Dashboard page, choose your account name in the navigation bar, and then choose My Security Credentials.
.IP 2. 3
Expand the Access keys (access key ID and secret access key) section.
.IP 3. 3
Choose Create New Access Key. Then choose Download Key File to save the access key ID and secret access key to a file on your computer. After you close the dialog box, you can\(aqt retrieve this secret access key again.
.UNINDENT
.sp
Make sure that, if your AWS infrastructure requires your user to authenticate with a multi\-factor authentication (MFA) token, you obtain a second secret key and access key that don\(aqt have this requirement. The secret key and access key used to populate the Kubernetes secret that allows the jobs to contact the job store need to be usable without human intervention.
.IP 2. 3
\fBConfigure AWS access from the local machine\fP
.sp
This only really needs to happen if you run the leader on the local machine. But we need the files in place to fill in the secret in the next step. Run:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ aws configure
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Then when prompted, enter your secret key and access key. This should create a file \fB~/.aws/credentials\fP that looks like this:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
[default]
aws_access_key_id =  BLAH
aws_secret_access_key =  blahblahblah
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 3. 3
\fBCreate a Kubernetes secret to give jobs access to AWS\fP
.UNINDENT
.INDENT 0.0
.INDENT 3.5
Go into the directory where the \fBcredentials\fP file is:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ cd ~/.aws
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Then, create a Kubernetes secret that contains it. We\(aqll call it \fBaws\-credentials\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl create secret generic aws\-credentials \-\-from\-file credentials
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SS Configuring Toil for your Kubernetes environment
.sp
To configure your workflow to run on Kubernetes, you will have to configure several environment variables, in addition to passing the \fB\-\-batchSystem kubernetes\fP option. Doing the research to figure out what values to give these variables may require talking to your cluster provider.
.INDENT 0.0
.IP 1. 3
\fBTOIL_AWS_SECRET_NAME\fP is the most important, and \fBmust\fP be set to the secret that contains your AWS \fBcredentials\fP file, \fBif\fP your cluster nodes don\(aqt otherwise have access to S3 and SimpleDB (such as through IAM roles). This is required for the AWS job store to work, which is currently the only job store that can be used on Kubernetes. In this example we are using \fBaws\-credentials\fP\&.
.IP 2. 3
\fBTOIL_KUBERNETES_HOST_PATH\fP \fBcan\fP be set to allow Toil jobs on the same physical host to share a cache. It should be set to a path on the host where the shared cache should be stored. It will be mounted as \fB/var/lib/toil\fP, or at \fBTOIL_WORKDIR\fP if specified, inside the container. This path must already exist on the host, and must have as much free space as your Kubernetes node offers to jobs. In this example, we are using \fB/data/scratch\fP\&. To actually make use of caching, make sure not to use \fB\-\-disableCaching\fP\&.
.IP 3. 3
\fBTOIL_KUBERNETES_OWNER\fP \fBshould\fP be set to the username of the user running the Toil workflow. The jobs that Toil creates will include this username, so they can be more easily recognized, and cleaned up by the user if anything happens to the Toil leader. In this example we are using \fBdemo\-user\fP\&.
.UNINDENT
.sp
Note that Docker containers cannot be run inside of unprivileged Kubernetes pods (which are themselves containers). The Docker daemon does not (yet) support this. Other tools, such as Singularity in its user\-namespace mode, are able to run containers from within containers. If using Singularity to run containerized tools, and you want downloaded container images to persist between Toil jobs, you will also want to set \fBTOIL_KUBERNETES_HOST_PATH\fP and make sure that Singularity is downloading its containers under the Toil work directory (\fB/var/lib/toil\fP buy default) by setting \fBSINGULARITY_CACHEDIR\fP\&. However, you will need to make sure that no two jobs try to download the same container at the same time; Singularity has no synchronization or locking around its cache, but the cache is also not safe for simultaneous access by multiple Singularity invocations. Some Toil workflows use their own custom workaround logic for this problem; this work is likely to be made part of Toil in a future release.
.SS Running workflows
.sp
To run the workflow, you will need to run the Toil leader process somewhere. It can either be run inside Kubernetes as a Kubernetes job, or outside Kubernetes as a normal command.
.SS Option 1: Running the Leader Inside Kubernetes
.sp
Once you have determined a set of environment variable values for your workflow run, write a YAML file that defines a Kubernetes job to run your workflow with that configuration. Some configuration items (such as your username, and the name of your AWS credentials secret) need to be written into the YAML so that they can be used from the leader as well.
.sp
Note that the leader pod will need your workflow script, its other dependencies, and Toil all installed. An easy way to get Toil installed is to start with the Toil appliance image for the version of Toil you want to use. In this example, we use \fBquay.io/ucsc_cgl/toil:5.5.0\fP\&.
.sp
Here\(aqs an example YAML file to run a test workflow:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
apiVersion: batch/v1
kind: Job
metadata:
  # It is good practice to include your username in your job name.
  # Also specify it in TOIL_KUBERNETES_OWNER
  name: demo\-user\-toil\-test
# Do not try and rerun the leader job if it fails

spec:
 backoffLimit: 0
 template:
   spec:
     # Do not restart the pod when the job fails, but keep it around so the
     # log can be retrieved
     restartPolicy: Never
     volumes:
     \- name: aws\-credentials\-vol
       secret:
         # Make sure the AWS credentials are available as a volume.
         # This should match TOIL_AWS_SECRET_NAME
         secretName: aws\-credentials
     # You may need to replace this with a different service account name as
     # appropriate for your cluster.
     serviceAccountName: default
     containers:
     \- name: main
       image: quay.io/ucsc_cgl/toil:5.5.0
       env:
       # Specify your username for inclusion in job names
       \- name: TOIL_KUBERNETES_OWNER
         value: demo\-user
       # Specify where to find the AWS credentials to access the job store with
       \- name: TOIL_AWS_SECRET_NAME
         value: aws\-credentials
       # Specify where per\-host caches should be stored, on the Kubernetes hosts.
       # Needs to be set for Toil\(aqs caching to be efficient.
       \- name: TOIL_KUBERNETES_HOST_PATH
         value: /data/scratch
       volumeMounts:
       # Mount the AWS credentials volume
       \- mountPath: /root/.aws
         name: aws\-credentials\-vol
       resources:
         # Make sure to set these resource limits to values large enough
         # to accommodate the work your workflow does in the leader
         # process, but small enough to fit on your cluster.
         #
         # Since no request values are specified, the limits are also used
         # for the requests.
         limits:
           cpu: 2
           memory: \(dq4Gi\(dq
           ephemeral\-storage: \(dq10Gi\(dq
       command:
       \- /bin/bash
       \- \-c
       \- |
         # This Bash script will set up Toil and the workflow to run, and run them.
         set \-e
         # We make sure to create a work directory; Toil can\(aqt hot\-deploy a
         # script from the root of the filesystem, which is where we start.
         mkdir /tmp/work
         cd /tmp/work
         # We make a virtual environment to allow workflow dependencies to be
         # hot\-deployed.
         #
         # We don\(aqt really make use of it in this example, but for workflows
         # that depend on PyPI packages we will need this.
         #
         # We use \-\-system\-site\-packages so that the Toil installed in the
         # appliance image is still available.
         virtualenv \-\-python python3 \-\-system\-site\-packages venv
         . venv/bin/activate
         # Now we install the workflow. Here we\(aqre using a demo workflow
         # script from Toil itself.
         wget https://raw.githubusercontent.com/DataBiosphere/toil/releases/4.1.0/src/toil/test/docs/scripts/tutorial_helloworld.py
         # Now we run the workflow. We make sure to use the Kubernetes batch
         # system and an AWS job store, and we set some generally useful
         # logging options. We also make sure to enable caching.
         python3 tutorial_helloworld.py \e
             aws:us\-west\-2:demouser\-toil\-test\-jobstore \e
             \-\-batchSystem kubernetes \e
             \-\-realTimeLogging \e
             \-\-logInfo
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You can save this YAML as \fBleader.yaml\fP, and then run it on your Kubernetes installation with:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl apply \-f leader.yaml
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To monitor the progress of the leader job, you will want to read its logs. If you are using a Kubernetes dashboard such as \fI\%k9s\fP, you can simply find the pod created for the job in the dashboard, and view its logs there. If not, you will need to locate the pod by hand.
.SS Monitoring and Debugging Kubernetes Jobs and Pods
.sp
The following techniques are most useful for looking at the pod which holds the Toil leader, but they can also be applied to individual Toil jobs on Kubernetes, even when the leader is outside the cluster.
.sp
Kubernetes names pods for jobs by appending a short random string to the name of the job. You can find the name of the pod for your job by doing:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl get pods | grep demo\-user\-toil\-test
demo\-user\-toil\-test\-g5496                                         1/1     Running     0          2m
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Assuming you have set \fBTOIL_KUBERNETES_OWNER\fP correctly, you should be able to find all of your workflow\(aqs pods by searching for your username:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl get pods | grep demo\-user
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If the status of a pod is anything other than \fBPending\fP, you will be able to view its logs with:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl logs demo\-user\-toil\-test\-g5496
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This will dump the pod\(aqs logs from the beginning to now and terminate. To follow along with the logs from a running pod, add the \fB\-f\fP option:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl logs \-f demo\-user\-toil\-test\-g5496
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A status of \fBImagePullBackoff\fP suggests that you have requested to use an image that is not available. Check the \fBimage\fP section of your YAML if you are looking at a leader, or the value of \fBTOIL_APPLIANCE_SELF\fP if you are delaying with a worker job. You also might want to check your Kubernetes node\(aqs Internet connectivity and DNS function; in Kubernetes, DNS depends on system\-level pods which can be terminated or evicted in cases of resource oversubscription, just like user workloads.
.sp
If your pod seems to be stuck \fBPending\fP, \fBContainerCreating\fP, you can get information on what is wrong with it by using \fBkubectl describe pod\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl describe pod demo\-user\-toil\-test\-g5496
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Pay particular attention to the \fBEvents:\fP section at the end of the output. An indication that a job is too big for the available nodes on your cluster, or that your cluster is too busy for your jobs, is \fBFailedScheduling\fP events:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Type     Reason            Age                  From               Message
\-\-\-\-     \-\-\-\-\-\-            \-\-\-\-                 \-\-\-\-               \-\-\-\-\-\-\-
Warning  FailedScheduling  13s (x79 over 100m)  default\-scheduler  0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient ephemeral\-storage, 4 Insufficient memory.
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If a pod is running but seems to be behaving erratically, or seems stuck, you can shell into it and look around:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl exec \-ti demo\-user\-toil\-test\-g5496 /bin/bash
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
One common cause of stuck pods is attempting to use more memory than allowed by Kubernetes (or by the Toil job\(aqs memory resource requirement), but in a way that does not trigger the Linux OOM killer to terminate the pod\(aqs processes. In these cases, the pod can remain stuck at nearly 100% memory usage more or less indefinitely, and attempting to shell into the pod (which needs to start a process within the pod, using some of its memory) will fail. In these cases, the recommended solution is to kill the offending pod and increase its (or its Toil job\(aqs) memory requirement, or reduce its memory needs by adapting user code.
.SS When Things Go Wrong
.sp
The Toil Kubernetes batch system includes cleanup code to terminate worker jobs when the leader shuts down. However, if the leader pod is removed by Kubernetes, is forcibly killed or otherwise suffers a sudden existence failure, it can go away while its worker jobs live on. It is not recommended to restart a workflow in this state, as jobs from the previous invocation will remain running and will be trying to modify the job store concurrently with jobs from the new invocation.
.sp
To clean up dangling jobs, you can use the following snippet:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ kubectl get jobs | grep demo\-user | cut \-f1 \-d\(aq \(aq | xargs \-n10 kubectl delete job
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This will delete all jobs with \fBdemo\-user\fP\(aqs username in their names, in batches of 10. You can also use the UUID that Toil assigns to a particular workflow invocation in the filter, to clean up only the jobs pertaining to that workflow invocation.
.SS Option 2: Running the Leader Outside Kubernetes
.sp
If you don\(aqt want to run your Toil leader inside Kubernetes, you can run it locally instead. This can be useful when developing a workflow; files can be hot\-deployed from your local machine directly to Kubernetes. However, your local machine will have to have (ideally role\-assumption\- and MFA\-free) access to AWS, and access to Kubernetes. Real time logging will not work unless your local machine is able to listen for incoming UDP packets on arbitrary ports on the address it uses to contact the IPv4 Internet; Toil does no NAT traversal or detection.
.sp
Note that if you set \fBTOIL_WORKDIR\fP when running your workflow like this, it will need to be a directory that exists both on the host and in the Toil appliance.
.sp
Here is an example of running our test workflow leader locally, outside of Kubernetes:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export TOIL_KUBERNETES_OWNER=demo\-user  # This defaults to your local username if not set
$ export TOIL_AWS_SECRET_NAME=aws\-credentials
$ export TOIL_KUBERNETES_HOST_PATH=/data/scratch
$ virtualenv \-\-python python3 \-\-system\-site\-packages venv
$ . venv/bin/activate
$ wget https://raw.githubusercontent.com/DataBiosphere/toil/releases/4.1.0/src/toil/test/docs/scripts/tutorial_helloworld.py
$ python3 tutorial_helloworld.py \e
      aws:us\-west\-2:demouser\-toil\-test\-jobstore \e
      \-\-batchSystem kubernetes \e
      \-\-realTimeLogging \e
      \-\-logInfo
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Running CWL Workflows
.sp
Running CWL workflows on Kubernetes can be challenging, because executing CWL can require \fBtoil\-cwl\-runner\fP to orchestrate containers of its own, within a Kubernetes job running in the Toil appliance container.
.sp
Normally, running a CWL workflow should Just Work, as long as the workflow\(aqs Docker containers are able to be executed with Singularity, your Kubernetes cluster does not impose extra capability\-based confinement (i.e. SELinux, AppArmor) that interferes with Singularity\(aqs use of user\-mode namespaces, and you make sure to configure Toil so that its workers know where to store their data within the Kubernetes pods (which would be done for you if using a Toil\-managed cluster). For example, you should be able to run a CWL workflow like this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export TOIL_KUBERNETES_OWNER=demo\-user  # This defaults to your local username if not set
$ export TOIL_AWS_SECRET_NAME=aws\-credentials
$ export TOIL_KUBERNETES_HOST_PATH=/data/scratch
$ virtualenv \-\-python python3 \-\-system\-site\-packages venv
$ . venv/bin/activate
$ pip install toil[kubernetes,cwl]==5.8.0
$ toil\-cwl\-runner  \e
     \-\-jobStore  aws:us\-west\-2:demouser\-toil\-test\-jobstore \e
     \-\-batchSystem kubernetes \e
     \-\-realTimeLogging \e
     \-\-logInfo \e
     \-\-disableCaching \e
     path/to/cwl/workflow \e
     path/to/cwl/input/object
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Additional \fBcwltool\fP options that your workflow might require, such as \fB\-\-no\-match\-user\fP, can be passed to \fBtoil\-cwl\-runner\fP, which inherits most \fBcwltool\fP options.
.SS AppArmor and Singularity
.sp
Kubernetes clusters based on Ubuntu hosts often will have AppArmor enabled on the host. AppArmor is a capability\-based security enhancement system that integrates with the Linux kernel to enforce lists of things which programs may or may not do, called \fBprofiles\fP\&. For example, an AppArmor profile could be applied to a web server process to stop it from using the \fBmount()\fP system call to manipulate the filesystem, because it has no business doing that under normal circumstances but might attempt to do it if compromised by hackers.
.sp
Kubernetes clusters also often use Docker as the backing container runtime, to run pod containers. When AppArmor is enabled, Docker will load an AppArmor profile and apply it to all of its containers by default, with the ability for the profile to be overridden on a per\-container basis. This profile unfortunately prevents some of the \fImount()\fP system calls that Singularity uses to set up user\-mode containers from working inside the pod, even though these calls would be allowed for an unprivileged user under normal circumstances.
.sp
On the UCSC Kubernetes cluster, \fI\%we configure our Ubuntu hosts with an alternative default AppArmor profile for Docker containers\fP which allows these calls. Other solutions include turning off AppArmor on the host, configuring Kubernetes with a container runtime other than Docker, or \fI\%using Kubernetes\(aqs AppArmor integration\fP to apply a more permissive profile or the \fBunconfined\fP profile to pods that Toil launches.
.sp
Toil does not yet have a way to apply a \fBcontainer.apparmor.security.beta.kubernetes.io/runner\-container: unconfined\fP annotation to its pods, \fI\%as described in the Kubernetes AppArmor documentation\fP\&. This feature is tracked in \fI\%issue #4331\fP\&.
.SS Running in AWS
.sp
Toil jobs can be run on a variety of cloud platforms. Of these, Amazon Web
Services (AWS) is currently the best\-supported solution. Toil provides the
\fI\%Cluster Utilities\fP to conveniently create AWS clusters, connect to the leader
of the cluster, and then launch a workflow. The leader handles distributing
the jobs over the worker nodes and autoscaling to optimize costs.
.sp
The \fI\%Running a Workflow with Autoscaling\fP section details how to create a cluster and run a workflow
that will dynamically scale depending on the workflow\(aqs needs.
.sp
The \fI\%Static Provisioning\fP section explains how a static cluster (one that
won\(aqt automatically change in size) can be created and provisioned (grown, shrunk, destroyed, etc.).
.SS Preparing your AWS environment
.sp
To use Amazon Web Services (AWS) to run Toil or to just use S3 to host the files
during the computation of a workflow, first set up and configure an account with AWS:
.INDENT 0.0
.IP 1. 4
If necessary, create and activate an \fI\%AWS account\fP
.IP 2. 4
Next, generate a key pair for AWS with the command (do NOT generate your key pair with the Amazon browser):
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ ssh\-keygen \-t rsa
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 3. 4
This should prompt you to save your key.  Please save it in
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
~/.ssh/id_rsa
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 4. 4
Now move this to where your OS can see it as an authorized key:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 5. 4
Next, you\(aqll need to add your key to the \fIssh\-agent\fP:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ eval \(gassh\-agent \-s\(ga
$ ssh\-add
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If your key has a passphrase, you will be prompted to enter it here once.
.IP 6. 4
You\(aqll also need to chmod your private key (good practice but also enforced by AWS):
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ chmod 400 id_rsa
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 7. 4
Now you\(aqll need to add the key to AWS via the browser.  For example, on us\-west1, this address would accessible at:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
https://us\-west\-1.console.aws.amazon.com/ec2/v2/home?region=us\-west\-1#KeyPairs:sort=keyName
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 8. 4
Now click on the \(dqImport Key Pair\(dq button to add your key:
.INDENT 4.0
.INDENT 3.5
\fI\%Adding an Amazon Key Pair\fP.UNINDENT
.UNINDENT
.IP 9. 4
Next, you need to create an AWS access key.  First go to the IAM dashboard, again; for \(dqus\-west1\(dq, the example link would be here:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
https://console.aws.amazon.com/iam/home?region=us\-west\-1#/home
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 10. 4
The directions (transcribed from: \fI\%https://docs.aws.amazon.com/general/latest/gr/managing\-aws\-access\-keys.html\fP ) are now:
.INDENT 4.0
.INDENT 3.5
.INDENT 0.0
.IP 1. 3
On the IAM Dashboard page, choose your account name in the navigation bar, and then choose My Security Credentials.
.IP 2. 3
Expand the Access keys (access key ID and secret access key) section.
.IP 3. 3
Choose Create New Access Key. Then choose Download Key File to save the access key ID and secret access key to a file on your computer. After you close the dialog box, you can\(aqt retrieve this secret access key again.
.UNINDENT
.UNINDENT
.UNINDENT
.IP 11. 4
Now you should have a newly generated \(dqAWS Access Key ID\(dq and \(dqAWS Secret Access Key\(dq.  We can now install the AWS CLI and make sure that it has the proper credentials:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ pip install awscli \-\-upgrade \-\-user
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 12. 4
Now configure your AWS credentials with:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ aws configure
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 13. 4
Add your \(dqAWS Access Key ID\(dq and \(dqAWS Secret Access Key\(dq from earlier and your region and output format:
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
\(dq AWS Access Key ID [****************Q65Q]: \(dq
\(dq AWS Secret Access Key [****************G0ys]: \(dq
\(dq Default region name [us\-west\-1]: \(dq
\(dq Default output format [json]: \(dq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This will create the files \fI~/.aws/config\fP and \fI~/.aws/credentials\fP\&.
.IP 14. 4
If not done already, install toil (example uses version 5.3.0, but we recommend the latest release):
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ virtualenv venv
$ source venv/bin/activate
$ pip install toil[all]==5.3.0
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 15. 4
Now that toil is installed and you are running a virtualenv, an example of launching a toil leader node would be the following
(again, note that we set TOIL_APPLIANCE_SELF to toil version 5.3.0 in this example, but please set the version to
the installed version that you are using if you\(aqre using a different version):
.INDENT 4.0
.INDENT 3.5
.sp
.nf
.ft C
$ TOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:5.3.0 \e
      toil launch\-cluster clustername \e
      \-\-leaderNodeType t2.medium \e
      \-\-zone us\-west\-1a \e
      \-\-keyPairName id_rsa
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.sp
To further break down each of these commands:
.INDENT 0.0
.INDENT 3.5
\fBTOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:latest\fP \-\-\- This is optional.  It specifies a mesos docker image that we maintain with the latest version of toil installed on it.  If you want to use a different version of toil, please specify the image tag you need from \fI\%https://quay.io/repository/ucsc_cgl/toil?tag=latest&tab=tags\fP\&.
.sp
\fBtoil launch\-cluster\fP \-\-\- Base command in toil to launch a cluster.
.sp
\fBclustername\fP \-\-\- Just choose a name for your cluster.
.sp
\fB\-\-leaderNodeType t2.medium\fP \-\-\- Specify the leader node type.  Make a t2.medium (2CPU; 4Gb RAM; $0.0464/Hour).  List of available AWS instances: \fI\%https://aws.amazon.com/ec2/pricing/on\-demand/\fP
.sp
\fB\-\-zone us\-west\-1a\fP \-\-\- Specify the AWS zone you want to launch the instance in.  Must have the same prefix as the zone in your awscli credentials (which, in the example of this tutorial is: \(dqus\-west\-1\(dq).
.sp
\fB\-\-keyPairName id_rsa\fP \-\-\- The name of your key pair, which should be \(dqid_rsa\(dq if you\(aqve followed this tutorial.
.UNINDENT
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
You can set the \fBTOIL_AWS_TAGS\fP environment variable to a JSON object to specify arbitrary tags for AWS resources.
For example, if you \fBexport TOIL_AWS_TAGS=\(aq{\(dqproject\-name\(dq: \(dqvariant\-calling\(dq}\(aq\fP in your shell before using Toil,
AWS resources created by Toil will be tagged with a \fBproject\-name\fP tag with the value \fBvariant\-calling\fP\&.
.UNINDENT
.UNINDENT
.SS AWS Job Store
.sp
Using the AWS job store is straightforward after you\(aqve finished \fI\%Preparing your AWS environment\fP;
all you need to do is specify the prefix for the job store name.
.sp
To run the sort example \fI\%sort example\fP with the AWS job store you would type
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py aws:us\-west\-2:my\-aws\-sort\-jobstore
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Toil Provisioner
.sp
The Toil provisioner is included in Toil alongside the \fB[aws]\fP extra and
allows us to spin up a cluster.
.sp
Getting started with the provisioner is simple:
.INDENT 0.0
.IP 1. 3
Make sure you have Toil installed with the AWS extras. For detailed instructions see \fI\%Installing Toil with Extra Features\fP\&.
.IP 2. 3
You will need an AWS account and you will need to save your AWS credentials on your local
machine. For help setting up an AWS account see
\fI\%here\fP\&. For
setting up your AWS credentials follow instructions
\fI\%here\fP\&.
.UNINDENT
.sp
The Toil provisioner is built around the Toil Appliance, a Docker image that bundles
Toil and all its requirements (e.g. Mesos). This makes deployment simple across
platforms, and you can even simulate a cluster locally (see \fI\%Developing with Docker\fP for details).
.INDENT 0.0
.INDENT 3.5
.IP "Choosing Toil Appliance Image"
.sp
When using the Toil provisioner, the appliance image will be automatically chosen
based on the pip\-installed version of Toil on your system. That choice can be
overridden by setting the environment variables \fBTOIL_DOCKER_REGISTRY\fP and \fBTOIL_DOCKER_NAME\fP or
\fBTOIL_APPLIANCE_SELF\fP\&. See \fI\%Environment Variables\fP for more information on these variables. If
you are developing with autoscaling and want to test and build your own
appliance have a look at \fI\%Developing with Docker\fP\&.
.UNINDENT
.UNINDENT
.sp
For information on using the Toil Provisioner have a look at \fI\%Running a Workflow with Autoscaling\fP\&.
.SS Details about Launching a Cluster in AWS
.sp
Using the provisioner to launch a Toil leader instance is simple using the \fBlaunch\-cluster\fP command. For example,
to launch a cluster named \(dqmy\-cluster\(dq with a t2.medium leader in the us\-west\-2a zone, run
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil launch\-cluster my\-cluster \e
             \-\-leaderNodeType t2.medium \e
             \-\-zone us\-west\-2a \e
             \-\-keyPairName <your\-AWS\-key\-pair\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The cluster name is used to uniquely identify your cluster and will be used to
populate the instance\(aqs \fBName\fP tag. Also, the Toil provisioner will
automatically tag your cluster with an \fBOwner\fP tag that corresponds to your
keypair name to facilitate cost tracking. In addition, the \fBToilNodeType\fP tag
can be used to filter \(dqleader\(dq vs. \(dqworker\(dq nodes in your cluster.
.sp
The leaderNodeType is an \fI\%EC2 instance type\fP\&. This only affects the leader node.
.sp
The \fB\-\-zone\fP parameter specifies which EC2 availability zone to launch the cluster in.
Alternatively, you can specify this option via the \fBTOIL_AWS_ZONE\fP environment variable.
Note: the zone is different from an EC2 region. A region corresponds to a geographical area
like \fBus\-west\-2 (Oregon)\fP, and availability zones are partitions of this area like
\fBus\-west\-2a\fP\&.
.sp
By default, Toil creates an IAM role for each cluster with sufficient permissions
to perform cluster operations (e.g. full S3, EC2, and SDB access). If the default permissions
are not sufficient for your use case (e.g. if you need access to ECR), you may create a
custom IAM role with all necessary permissions and set the \fB\-\-awsEc2ProfileArn\fP parameter
when launching the cluster. Note that your custom role must at least have
\fI\%these permissions\fP in order for the Toil cluster to function properly.
.sp
In addition, Toil creates a new security group with the same name as the cluster name with
default rules (e.g. opens port 22 for SSH access). If you require additional security groups,
you may use the \fB\-\-awsEc2ExtraSecurityGroupId\fP parameter when launching the cluster.
\fBNote:\fP Do not use the same name as the cluster name for the extra security groups as
any security group matching the cluster name will be deleted once the cluster is destroyed.
.sp
For more information on options try:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil launch\-cluster \-\-help
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Static Provisioning
.sp
Toil can be used to manage a cluster in the cloud by using the \fI\%Cluster Utilities\fP\&.
The cluster utilities also make it easy to run a toil workflow directly on this
cluster. We call this static provisioning because the size of the cluster does not
change. This is in contrast with \fI\%Running a Workflow with Autoscaling\fP\&.
.sp
To launch worker nodes alongside the leader we use the \fB\-w\fP option:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil launch\-cluster my\-cluster \e
             \-\-leaderNodeType t2.small \-z us\-west\-2a \e
             \-\-keyPairName your\-AWS\-key\-pair\-name \e
             \-\-nodeTypes m3.large,t2.micro \-w 1,4
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This will spin up a leader node of type t2.small with five additional workers \-\-\- one m3.large instance and four t2.micro.
.sp
Currently static provisioning is only possible during the cluster\(aqs creation.
The ability to add new nodes and remove existing nodes via the native provisioner is
in development. Of course the cluster can always be deleted with the
\fI\%Destroy\-Cluster Command\fP utility.
.SS Uploading Workflows
.sp
Now that our cluster is launched, we use the \fI\%Rsync\-Cluster Command\fP utility to copy
the workflow to the leader. For a simple workflow in a single file this might
look like
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil rsync\-cluster \-z us\-west\-2a my\-cluster toil\-workflow.py :/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
If your toil workflow has dependencies have a look at the \fI\%Auto\-Deployment\fP
section for a detailed explanation on how to include them.
.UNINDENT
.UNINDENT
.SS Running a Workflow with Autoscaling
.sp
Autoscaling is a feature of running Toil in a cloud whereby additional cloud instances are launched to run the workflow.
Autoscaling leverages Mesos containers to provide an execution environment for these workflows.
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
Make sure you\(aqve done the AWS setup in \fI\%Preparing your AWS environment\fP\&.
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP 1. 3
Download \fBsort.py\fP
.IP 2. 3
Launch the leader node in AWS using the \fI\%Launch\-Cluster Command\fP command:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil launch\-cluster <cluster\-name> \e
             \-\-keyPairName <AWS\-key\-pair\-name> \e
             \-\-leaderNodeType t2.medium \e
             \-\-zone us\-west\-2a
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 3. 3
Copy the \fBsort.py\fP script up to the leader node:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil rsync\-cluster \-z us\-west\-2a <cluster\-name> sort.py :/root
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 4. 3
Login to the leader node:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil ssh\-cluster \-z us\-west\-2a <cluster\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 5. 3
Run the script as an autoscaling workflow:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 /root/sort.py aws:us\-west\-2:<my\-jobstore\-name> \e
      \-\-provisioner aws \e
      \-\-nodeTypes c3.large \e
      \-\-maxNodes 2 \e
      \-\-batchSystem mesos
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
In this example, the autoscaling Toil code creates up to two instances of type \fIc3.large\fP and launches Mesos
slave containers inside them. The containers are then available to run jobs defined by the \fIsort.py\fP script.
Toil also creates a bucket in S3 called \fIaws:us\-west\-2:autoscaling\-sort\-jobstore\fP to store intermediate job
results. The Toil autoscaler can also provision multiple different node types, which is useful for workflows
that have jobs with varying resource requirements. For example, one could execute the script with
\fB\-\-nodeTypes c3.large,r3.xlarge \-\-maxNodes 5,1\fP, which would allow the provisioner to create up to five
c3.large nodes and one r3.xlarge node for memory\-intensive jobs. In this situation, the autoscaler would avoid
creating the more expensive r3.xlarge node until needed, running most jobs on the c3.large nodes.
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP 1. 3
View the generated file to sort:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ head fileToSort.txt
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 2. 3
View the sorted file:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ head sortedFile.txt
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.sp
For more information on other autoscaling (and other) options have a look at \fI\%Commandline Options\fP and/or run
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 my\-toil\-script.py \-\-help
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBIMPORTANT:\fP
.INDENT 0.0
.INDENT 3.5
Some important caveats about starting a toil run through an ssh session are
explained in the \fI\%Ssh\-Cluster Command\fP section.
.UNINDENT
.UNINDENT
.SS Preemptibility
.sp
Toil can run on a heterogeneous cluster of both preemptible and non\-preemptible nodes. Being a preemptible node simply
means that the node may be shut down at any time, while jobs are running. These jobs can then be restarted later
somewhere else.
.sp
A node type can be specified as preemptible by adding a \fI\%spot bid\fP to its entry in the list of node types provided with
the \fB\-\-nodeTypes\fP flag. If spot instance prices rise above your bid, the preemptible node whill be shut down.
.sp
Individual jobs can explicitly specify whether they should be run on preemptible nodes via the boolean \fBpreemptible\fP
resource requirement, if this is not specified, the job will not run on preemptible nodes even if preemptible nodes
are available unless specified with the \fB\-\-defaultPreemptible\fP flag. The \fB\-\-defaultPreemptible\fP flag will allow
jobs without a \fBpreemptible\fP requirement to run on preemptible machines. For example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python /root/sort.py aws:us\-west\-2:<my\-jobstore\-name> \e
      \-\-provisioner aws \e
      \-\-nodeTypes c3.4xlarge:2.00 \e
      \-\-maxNodes 2 \e
      \-\-batchSystem mesos \e
      \-\-defaultPreemptible
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.IP "Specify Preemptibility Carefully"
.sp
Ensure that your choices for \fB\-\-nodeTypes\fP and \fB\-\-maxNodes <>\fP make
sense for your workflow and won\(aqt cause it to hang. You should make sure the
provisioner is able to create nodes large enough to run the largest job
in the workflow, and that non\-preemptible node types are allowed if there are
non\-preemptible jobs in the workflow.
.UNINDENT
.UNINDENT
.sp
Finally, the \fB\-\-preemptibleCompensation\fP flag can be used to handle cases where preemptible nodes may not be
available but are required for your workflow. With this flag enabled, the autoscaler will attempt to compensate
for a shortage of preemptible nodes of a certain type by creating non\-preemptible nodes of that type, if
non\-preemptible nodes of that type were specified in \fB\-\-nodeTypes\fP\&.
.SS Using MinIO and S3\-Compatible object stores
.sp
Toil can be configured to access files stored in an \fI\%S3\-compatible object store\fP such as \fI\%MinIO\fP\&. The following environment variables can be used to configure the S3 connection used:
.INDENT 0.0
.IP \(bu 2
\fBTOIL_S3_HOST\fP: the IP address or hostname to use for connecting to S3
.IP \(bu 2
\fBTOIL_S3_PORT\fP: the port number to use for connecting to S3, if needed
.IP \(bu 2
\fBTOIL_S3_USE_SSL\fP: enable or disable the usage of SSL for connecting to S3 (\fBTrue\fP by default)
.UNINDENT
.sp
Examples:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
TOIL_S3_HOST=127.0.0.1
TOIL_S3_PORT=9010
TOIL_S3_USE_SSL=False
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Dashboard
.sp
Toil provides a dashboard for viewing the RAM and CPU usage of each node, the number of
issued jobs of each type, the number of failed jobs, and the size of the jobs queue. To launch this dashboard
for a toil workflow, include the \fB\-\-metrics\fP flag in the toil script command. The dashboard can then be viewed
in your browser at localhost:3000 while connected to the leader node through \fBtoil ssh\-cluster\fP:
.sp
To change the default port number, you can use the \fB\-\-grafana_port\fP argument:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil ssh\-cluster \-z us\-west\-2a \-\-grafana_port 8000 <cluster\-name>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
On AWS, the dashboard keeps track of every node in the cluster to monitor CPU and RAM usage, but it
can also be used while running a workflow on a single machine. The dashboard uses Grafana as the
front end for displaying real\-time plots, and Prometheus for tracking metrics exported by toil:
[image]
.sp
In order to use the dashboard for a non\-released toil version, you will have to build the containers locally with
\fBmake docker\fP, since the prometheus, grafana, and mtail containers used in the dashboard are tied to a specific toil
version.
.SS Running in Google Compute Engine (GCE)
.sp
Toil supports a provisioner with Google, and a \fI\%Google Job Store\fP\&. To get started, follow instructions
for \fI\%Preparing your Google environment\fP\&.
.SS Preparing your Google environment
.sp
Toil supports using the \fI\%Google Cloud Platform\fP\&. Setting this up is easy!
.INDENT 0.0
.IP 1. 3
Make sure that the \fBgoogle\fP extra (\fI\%Installing Toil with Extra Features\fP) is installed
.IP 2. 3
Follow \fI\%Google\(aqs Instructions\fP to download credentials and set the
\fBGOOGLE_APPLICATION_CREDENTIALS\fP environment variable
.IP 3. 3
Create a new ssh key with the proper format.  To create a new ssh key run the command
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ ssh\-keygen \-t rsa \-f ~/.ssh/id_rsa \-C [USERNAME]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
where \fB[USERNAME]\fP is something like \fBjane@example.com\fP\&. Make sure to leave your password blank.
.sp
\fBWARNING:\fP
.INDENT 3.0
.INDENT 3.5
This command could overwrite an old ssh key you may be using.  If you have an existing ssh key
you would like to use, it will need to be called id_rsa and it needs to have no password set.
.UNINDENT
.UNINDENT
.sp
Make sure only you can read the SSH keys:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ chmod 400 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 4. 3
Add your newly formatted public key to Google. To do this, log into your Google Cloud account
and go to \fI\%metadata\fP section under the Compute tab.
[image]
.sp
Near the top of the screen click on \(aqSSH Keys\(aq, then edit, add item, and paste the key. Then save:
[image]
.UNINDENT
.sp
For more details look at Google\(aqs instructions for \fI\%adding SSH keys\fP\&.
.SS Google Job Store
.sp
To use the Google Job Store you will need to set the
\fBGOOGLE_APPLICATION_CREDENTIALS\fP environment variable by following \fI\%Google\(aqs instructions\fP\&.
.sp
Then to run the sort example with the Google job store you would type
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 sort.py google:my\-project\-id:my\-google\-sort\-jobstore
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Running a Workflow with Autoscaling
.sp
\fBWARNING:\fP
.INDENT 0.0
.INDENT 3.5
Google Autoscaling is in beta!
.UNINDENT
.UNINDENT
.sp
The steps to run a GCE workflow are similar to those of AWS (\fI\%Running a Workflow with Autoscaling\fP), except you will
need to explicitly specify the \fB\-\-provisioner gce\fP option which otherwise defaults to \fBaws\fP\&.
.INDENT 0.0
.IP 1. 3
Download \fBsort.py\fP
.IP 2. 3
Launch the leader node in GCE using the \fI\%Launch\-Cluster Command\fP command:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil launch\-cluster <CLUSTER\-NAME> \e
             \-\-provisioner gce \e
             \-\-leaderNodeType n1\-standard\-1 \e
             \-\-keyPairName <SSH\-KEYNAME> \e
             \-\-zone us\-west1\-a
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Where \fB<SSH\-KEYNAME>\fP is the first part of \fB[USERNAME]\fP used when setting up your ssh key.
For example if \fB[USERNAME]\fP was \fI\%jane@example.com\fP, \fB<SSH\-KEYNAME>\fP should be \fBjane\fP\&.
.sp
The \fB\-\-keyPairName\fP option is for an SSH key that was added to the Google account. If your ssh
key \fB[USERNAME]\fP was \fBjane@example.com\fP, then your key pair name will be just \fBjane\fP\&.
.IP 3. 3
Upload the sort example and ssh into the leader:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
(venv) $ toil rsync\-cluster \-\-provisioner gce <CLUSTER\-NAME> sort.py :/root
(venv) $ toil ssh\-cluster \-\-provisioner gce <CLUSTER\-NAME>
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 4. 3
Run the workflow:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 /root/sort.py  google:<PROJECT\-ID>:<JOBSTORE\-NAME> \e
      \-\-provisioner gce \e
      \-\-batchSystem mesos \e
      \-\-nodeTypes n1\-standard\-2 \e
      \-\-maxNodes 2
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 5. 3
Clean up:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ exit  # this exits the ssh from the leader node
(venv) $ toil destroy\-cluster \-\-provisioner gce <CLUSTER\-NAME>
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.SS Cluster Utilities
.sp
There are several utilities used for starting and managing a Toil cluster using the AWS provisioner. They are installed
via the \fB[aws]\fP or \fB[google]\fP extra. For installation details see \fI\%Toil Provisioner\fP\&. The cluster utilities
are used for \fI\%Running in AWS\fP and are comprised of \fBtoil launch\-cluster\fP, \fBtoil rsync\-cluster\fP,
\fBtoil ssh\-cluster\fP, and \fBtoil destroy\-cluster\fP entry points.
.sp
Cluster commands specific to \fBtoil\fP are:
.INDENT 0.0
.INDENT 3.5
\fBstatus\fP \-\-\- Reports runtime and resource usage for all jobs in a specified jobstore (workflow must have originally been run using the \-\e\-stats option).
.sp
\fBstats\fP \-\-\- Inspects a job store to see which jobs have failed, run successfully, etc.
.sp
\fBdestroy\-cluster\fP \-\-\- For autoscaling.  Terminates the specified cluster and associated resources.
.sp
\fBlaunch\-cluster\fP \-\-\- For autoscaling.  This is used to launch a toil leader instance with the specified provisioner.
.sp
\fBrsync\-cluster\fP \-\-\- For autoscaling.  Used to transfer files to a cluster launched with \fBtoil launch\-cluster\fP\&.
.sp
\fBssh\-cluster\fP \-\-\- SSHs into the toil appliance container running on the leader of the cluster.
.sp
\fBclean\fP \-\-\- Delete the job store used by a previous Toil workflow invocation.
.sp
\fBkill\fP \-\-\- Kills any running jobs in a rogue toil.
.UNINDENT
.UNINDENT
.sp
For information on a specific utility run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
toil launch\-cluster \-\-help
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
for a full list of its options and functionality.
.sp
The cluster utilities can be used for \fI\%Running in Google Compute Engine (GCE)\fP and \fI\%Running in AWS\fP\&.
.sp
\fBTIP:\fP
.INDENT 0.0
.INDENT 3.5
By default, all of the cluster utilities expect to be running on AWS. To run with Google
you will need to specify the \fB\-\-provisioner gce\fP option for each utility.
.UNINDENT
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
Boto must be \fI\%configured\fP with AWS credentials before using cluster utilities.
.sp
\fI\%Running in Google Compute Engine (GCE)\fP contains instructions for
.UNINDENT
.UNINDENT
.SS Stats Command
.sp
To use the stats command, a workflow must first be run using the \fB\-\-stats\fP option.  Using this command makes certain
that toil does not delete the job store, no matter what other options are specified (i.e. normally the option
\fB\-\-clean=always\fP would delete the job, but \fB\-\-stats\fP will override this).
.sp
An example of this would be running the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
python3 discoverfiles.py file:my\-jobstore \-\-stats
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Where \fBdiscoverfiles.py\fP is the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import subprocess

from toil.common import Toil
from toil.job import Job


class discoverFiles(Job):
    \(dq\(dq\(dqViews files at a specified path using ls.\(dq\(dq\(dq

    def __init__(self, path, *args, **kwargs):
        self.path = path
        super().__init__(*args, **kwargs)

    def run(self, fileStore):
        if os.path.exists(self.path):
            subprocess.check_call([\(dqls\(dq, self.path])


def main():
    options = Job.Runner.getDefaultArgumentParser().parse_args()
    options.clean = \(dqalways\(dq

    job1 = discoverFiles(path=\(dq/sys/\(dq, displayName=\(aqsysFiles\(aq)
    job2 = discoverFiles(path=os.path.expanduser(\(dq~\(dq), displayName=\(aquserFiles\(aq)
    job3 = discoverFiles(path=\(dq/tmp/\(dq)

    job1.addChild(job2)
    job2.addChild(job3)

    with Toil(options) as toil:
        if not toil.options.restart:
            toil.start(job1)
        else:
            toil.restart()


if __name__ == \(aq__main__\(aq:
    main()

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Notice the \fBdisplayName\fP key, which can rename a job, giving it an alias when it is finally displayed in stats.
Running this workflow file should record three job names: \fBsysFiles\fP (job1), \fBuserFiles\fP (job2), and \fBdiscoverFiles\fP (job3).
To see the runtime and resources used for each job when it was run, type
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
toil stats file:my\-jobstore
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This should output the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Batch System: singleMachine
Default Cores: 1  Default Memory: 2097152K
Max Cores: 9.22337e+18
Total Clock: 0.56  Total Runtime: 1.01
Worker
    Count |                                    Time* |                                    Clock |                                     Wait |                                   Memory
        n |      min    med*     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total
        1 |     0.14    0.14    0.14    0.14    0.14 |     0.13    0.13    0.13    0.13    0.13 |     0.01    0.01    0.01    0.01    0.01 |      76K     76K     76K     76K     76K
Job
 Worker Jobs  |     min    med    ave    max
              |       3      3      3      3
    Count |                                    Time* |                                    Clock |                                     Wait |                                   Memory
        n |      min    med*     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total
        3 |     0.01    0.06    0.05    0.07    0.14 |     0.00    0.06    0.04    0.07    0.12 |     0.00    0.01    0.00    0.01    0.01 |      76K     76K     76K     76K    229K
 sysFiles
    Count |                                    Time* |                                    Clock |                                     Wait |                                   Memory
        n |      min    med*     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total
        1 |     0.01    0.01    0.01    0.01    0.01 |     0.00    0.00    0.00    0.00    0.00 |     0.01    0.01    0.01    0.01    0.01 |      76K     76K     76K     76K     76K
 userFiles
    Count |                                    Time* |                                    Clock |                                     Wait |                                   Memory
        n |      min    med*     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total
        1 |     0.06    0.06    0.06    0.06    0.06 |     0.06    0.06    0.06    0.06    0.06 |     0.01    0.01    0.01    0.01    0.01 |      76K     76K     76K     76K     76K
 discoverFiles
    Count |                                    Time* |                                    Clock |                                     Wait |                                   Memory
        n |      min    med*     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total |      min     med     ave     max   total
        1 |     0.07    0.07    0.07    0.07    0.07 |     0.07    0.07    0.07    0.07    0.07 |     0.00    0.00    0.00    0.00    0.00 |      76K     76K     76K     76K     76K
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Once we\(aqre done, we can clean up the job store by running
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
toil clean file:my\-jobstore
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Status Command
.sp
Continuing the example from the stats section above, if we ran our workflow with the command
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
python3 discoverfiles.py file:my\-jobstore \-\-stats
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
We could interrogate our jobstore with the status command, for example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
toil status file:my\-jobstore
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If the run was successful, this would not return much valuable information, something like
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
2018\-01\-11 19:31:29,739 \- toil.lib.bioio \- INFO \- Root logger is at level \(aqINFO\(aq, \(aqtoil\(aq logger at level \(aqINFO\(aq.
2018\-01\-11 19:31:29,740 \- toil.utils.toilStatus \- INFO \- Parsed arguments
2018\-01\-11 19:31:29,740 \- toil.utils.toilStatus \- INFO \- Checking if we have files for Toil
The root job of the job store is absent, the workflow completed successfully.
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Otherwise, the \fBstatus\fP command should return the following:
.INDENT 0.0
.INDENT 3.5
There are \fBx\fP unfinished jobs, \fBy\fP parent jobs with children, \fBz\fP jobs with services, \fBa\fP services, and \fBb\fP totally failed jobs currently in  \fBc\fP\&.
.UNINDENT
.UNINDENT
.SS Clean Command
.sp
If a Toil pipeline didn\(aqt finish successfully, or was run using \fB\-\-clean=always\fP or \fB\-\-stats\fP, the job store will exist
until it is deleted. \fBtoil clean <jobStore>\fP ensures that all artifacts associated with a job store are removed.
This is particularly useful for deleting AWS job stores, which reserves an SDB domain as well as an S3 bucket.
.sp
The deletion of the job store can be modified by the \fB\-\-clean\fP argument, and may be set to \fBalways\fP, \fBonError\fP,
\fBnever\fP, or \fBonSuccess\fP (default).
.sp
Temporary directories where jobs are running can also be saved from deletion using the \fB\-\-cleanWorkDir\fP, which has
the same options as \fB\-\-clean\fP\&.  This option should only be run when debugging, as intermediate jobs will fill up
disk space.
.SS Launch\-Cluster Command
.sp
Running \fBtoil launch\-cluster\fP starts up a leader for a cluster. Workers can be
added to the initial cluster by specifying the \fB\-w\fP option.  An example would be
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil launch\-cluster my\-cluster \e
      \-\-leaderNodeType t2.small \-z us\-west\-2a \e
      \-\-keyPairName your\-AWS\-key\-pair\-name \e
      \-\-nodeTypes m3.large,t2.micro \-w 1,4
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Options are listed below.  These can also be displayed by running
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil launch\-cluster \-\-help
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
launch\-cluster\(aqs main positional argument is the clusterName.  This is simply the name of your cluster.  If it does not
exist yet, Toil will create it for you.
.sp
\fBLaunch\-Cluster Options\fP
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B  \-\-help
\-h also accepted.  Displays this help menu.
.TP
.BI \-\-tempDirRoot \ TEMPDIRROOT
Path to the temporary directory where all temp
files are created, by default uses the current working
directory as the base.
.TP
.B  \-\-version
Display version.
.TP
.BI \-\-provisioner \ CLOUDPROVIDER
\-p CLOUDPROVIDER also accepted.  The provisioner for
cluster auto\-scaling.  Both AWS and GCE are
currently supported.
.TP
.BI \-\-zone \ ZONE
\-z ZONE also accepted.  The availability zone of the leader. This
parameter can also be set via the TOIL_AWS_ZONE or TOIL_GCE_ZONE
environment variables, or by the ec2_region_name
parameter in your .boto file if using AWS, or derived from the
instance metadata if using this utility on an existing
EC2 instance.
.TP
.BI \-\-leaderNodeType \ LEADERNODETYPE
Non\-preemptable node type to use for the cluster
leader.
.TP
.BI \-\-keyPairName \ KEYPAIRNAME
The name of the AWS or ssh key pair to include on the
instance.
.TP
.BI \-\-owner \ OWNER
The owner tag for all instances. If not given, the value in
TOIL_OWNER_TAG will be used, or else the value of
\-\-keyPairName.
.TP
.BI \-\-boto \ BOTOPATH
The path to the boto credentials directory. This is
transferred to all nodes in order to access the AWS
jobStore from non\-AWS instances.
.TP
.BI \-\-tag \ KEYVALUE
KEYVALUE is specified as KEY=VALUE. \-t KEY=VALUE also
accepted.  Tags are added to the AWS cluster for this
node and all of its children.
Tags are of the form: \-t key1=value1 \-\-tag key2=value2.
Multiple tags are allowed and each tag needs its own
flag. By default the cluster is tagged with:
{ \(dqName\(dq: clusterName, \(dqOwner\(dq: IAM username }.
.TP
.BI \-\-vpcSubnet \ VPCSUBNET
VPC subnet ID to launch cluster leader in. Uses default
subnet if not specified. This subnet needs to have auto
assign IPs turned on.
.TP
.BI \-\-nodeTypes \ NODETYPES
Comma\-separated list of node types to create while
launching the leader. The syntax for each node type
depends on the provisioner used. For the AWS
provisioner this is the name of an EC2 instance type
followed by a colon and the price in dollars to bid for
a spot instance, for example \(aqc3.8xlarge:0.42\(aq. Must
also provide the \-\-workers argument to specify how
many workers of each node type to create.
.TP
.BI \-\-workers \ WORKERS
\-w WORKERS also accepted.  Comma\-separated list of the
number of workers of each node type to launch alongside
the leader when the cluster is created. This can be
useful if running toil without auto\-scaling but with
need of more hardware support.
.TP
.BI \-\-leaderStorage \ LEADERSTORAGE
Specify the size (in gigabytes) of the root volume for
the leader instance. This is an EBS volume.
.TP
.BI \-\-nodeStorage \ NODESTORAGE
Specify the size (in gigabytes) of the root volume for
any worker instances created when using the \-w flag.
This is an EBS volume.
.TP
.BI \-\-nodeStorageOverrides \ NODESTORAGEOVERRIDES
Comma\-separated list of nodeType:nodeStorage that are used
to override the default value from \-\-nodeStorage for the
specified nodeType(s). This is useful for heterogeneous jobs
where some tasks require much more disk than others.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
\fBLogging Options\fP
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.TP
.B  \-\-logOff
Same as \-\e\-logCritical.
.TP
.B  \-\-logCritical
Turn on logging at level CRITICAL and above. (default
is INFO)
.TP
.B  \-\-logError
Turn on logging at level ERROR and above. (default is
INFO)
.TP
.B  \-\-logWarning
Turn on logging at level WARNING and above. (default
is INFO)
.TP
.B  \-\-logInfo
Turn on logging at level INFO and above. (default is
INFO)
.TP
.B  \-\-logDebug
Turn on logging at level DEBUG and above. (default is
INFO)
.TP
.BI \-\-logLevel \ LOGLEVEL
Log at given level (may be either OFF (or CRITICAL),
ERROR, WARN (or WARNING), INFO or DEBUG). (default is
INFO)
.TP
.BI \-\-logFile \ LOGFILE
File to log in.
.TP
.B  \-\-rotatingLogging
Turn on rotating logging, which prevents log files
getting too big.
.UNINDENT
.UNINDENT
.UNINDENT
.SS Ssh\-Cluster Command
.sp
Toil provides the ability to ssh into the leader of the cluster. This
can be done as follows:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil ssh\-cluster CLUSTER\-NAME\-HERE
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This will open a shell on the Toil leader and is used to start an
\fI\%Running a Workflow with Autoscaling\fP run. Issues with docker prevent using \fBscreen\fP and \fBtmux\fP
when sshing the cluster (The shell doesn\(aqt know that it is a TTY which prevents
it from allocating a new screen session). This can be worked around via
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ script
$ screen
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Simply running \fBscreen\fP within \fBscript\fP will get things working properly again.
.sp
Finally, you can execute remote commands with the following syntax:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil ssh\-cluster CLUSTER\-NAME\-HERE remoteCommand
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
It is not advised that you run your Toil workflow using remote execution like this
unless a tool like \fI\%nohup\fP is used to ensure the
process does not die if the SSH connection is interrupted.
.sp
For an example usage, see \fI\%Running a Workflow with Autoscaling\fP\&.
.SS Rsync\-Cluster Command
.sp
The most frequent use case for the \fBrsync\-cluster\fP utility is deploying your
Toil script to the Toil leader. Note that the syntax is the same as traditional
\fI\%rsync\fP with the exception of the hostname before
the colon. This is not needed in \fBtoil rsync\-cluster\fP since the hostname is automatically
determined by Toil.
.sp
Here is an example of its usage:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil rsync\-cluster CLUSTER\-NAME\-HERE \e
   ~/localFile :/remoteDestination
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Destroy\-Cluster Command
.sp
The \fBdestroy\-cluster\fP command is the advised way to get rid of any Toil cluster
launched using the \fI\%Launch\-Cluster Command\fP command. It ensures that all attached nodes, volumes,
security groups, etc. are deleted. If a node or cluster is shut down using Amazon\(aqs online portal
residual resources may still be in use in the background. To delete a cluster run
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil destroy\-cluster CLUSTER\-NAME\-HERE
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Kill Command
.sp
To kill all currently running jobs for a given jobstore, use the command
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
toil kill file:my\-jobstore
.ft P
.fi
.UNINDENT
.UNINDENT
.SH HPC ENVIRONMENTS
.sp
Toil is a flexible framework that can be leveraged in a variety of environments, including high\-performance computing (HPC) environments.
Toil provides support for a number of batch systems, including \fI\%Grid Engine\fP, \fI\%Slurm\fP, \fI\%Torque\fP and \fI\%LSF\fP, which are popular schedulers used in these environments.
Toil also supports \fI\%HTCondor\fP, which is a popular scheduler for high\-throughput computing (HTC).
To use one of these batch systems specify the \(dq\-\e\-batchSystem\(dq argument to the toil script.
.sp
Due to the cost and complexity of maintaining support for these schedulers we currently consider them to be \(dqcommunity supported\(dq, that is the core development team does not regularly test or develop support for these systems. However, there are members of the Toil community currently deploying Toil in HPC environments and we welcome external contributions.
.sp
Developing the support of a new or existing batch system involves extending the abstract batch system class \fI\%toil.batchSystems.abstractBatchSystem.AbstractBatchSystem\fP\&.
.SS Standard Output/Error from Batch System Jobs
.sp
Standard output and error from batch system jobs (except for the Parasol and Mesos batch systems) are redirected to files in the \fBtoil\-<workflowID>\fP directory created within the temporary directory specified by the \fB\-\-workDir\fP option; see \fI\%Commandline Options\fP\&.
Each file is named as follows: \fBtoil_job_<Toil job ID>_batch_<name of batch system>_<job ID from batch system>_<file description>.log\fP, where \fB<file description>\fP is \fBstd_output\fP for standard output, and \fBstd_error\fP for standard error.
HTCondor will also write job event log files with \fB<file description> = job_events\fP\&.
.sp
If capturing standard output and error is desired, \fB\-\-workDir\fP will generally need to be on a shared file system; otherwise if these are written to local temporary directories on each node (e.g. \fB/tmp\fP) Toil will not be able to retrieve them.
Alternatively, the \fB\-\-noStdOutErr\fP option forces Toil to discard all standard output and error from batch system jobs.
.SH CWL IN TOIL
.sp
The Common Workflow Language (CWL) is an emerging standard for writing workflows
that are portable across multiple workflow engines and platforms.
Toil has full support for the CWL v1.0, v1.1, and v1.2 standards.
.SS Running CWL Locally
.sp
The \fItoil\-cwl\-runner\fP command provides cwl\-parsing functionality using cwltool, and leverages the job\-scheduling and
batch system support of Toil.
.sp
To run in local batch mode, provide the CWL file and the input object file:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil\-cwl\-runner example.cwl example\-job.yml
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For a simple example of CWL with Toil see \fI\%Running a basic CWL workflow\fP\&.
.SS Note for macOS + Docker + Toil
.sp
When invoking CWL documents that make use of Docker containers if you see errors that
look like
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
docker: Error response from daemon: Mounts denied:
The paths /var/...tmp are not shared from OS X and are not known to Docker.
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
you may need to add
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
export TMPDIR=/tmp/docker_tmp
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
either in your startup file (\fB\&.bashrc\fP) or add it manually in your shell before invoking
toil.
.SS Detailed Usage Instructions
.sp
Help information can be found by using this toil command:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil\-cwl\-runner \-h
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A more detailed example shows how we can specify both Toil and cwltool arguments for our workflow:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil\-cwl\-runner \e
    \-\-singularity \e
    \-\-jobStore my_jobStore \e
    \-\-batchSystem lsf \e
    \-\-workDir \(gapwd\(ga \e
    \-\-outdir \(gapwd\(ga \e
    \-\-logFile cwltoil.log \e
    \-\-writeLogs \(gapwd\(ga \e
    \-\-logLevel DEBUG \e
    \-\-retryCount 2 \e
    \-\-maxLogFileSize 20000000000 \e
    \-\-stats \e
    standard_bam_processing.cwl \e
    inputs.yaml
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In this example, we set the following options, which are all passed to Toil:
.sp
\fB\-\-singularity\fP: Specifies that all jobs with Docker format containers
specified should be run using the Singularity container engine instead of the
Docker container engine.
.sp
\fB\-\-jobStore\fP: Path to a folder which doesn\(aqt exist yet, which will contain the
Toil jobstore and all related job\-tracking information.
.sp
\fB\-\-batchSystem\fP: Use the specified HPC or Cloud\-based cluster platform.
.sp
\fB\-\-workDir\fP: The directory where all temporary files will be created for the
workflow. A subdirectory of this will be set as the \fB$TMPDIR\fP environment
variable and this subdirectory can be referenced using the CWL parameter
reference \fB$(runtime.tmpdir)\fP in CWL tools and workflows.
.sp
\fB\-\-outdir\fP: Directory where final \fBFile\fP and \fBDirectory\fP outputs will be
written. References to these and other output types will be in the JSON object
printed to the stdout stream after workflow execution.
.sp
\fB\-\-logFile\fP: Path to the main logfile with logs from all jobs.
.sp
\fB\-\-writeLogs\fP: Directory where all job logs will be stored.
.sp
\fB\-\-retryCount\fP: How many times to retry each Toil job.
.sp
\fB\-\-maxLogFileSize\fP: Logs that get larger than this value will be truncated.
.sp
\fB\-\-stats\fP: Save resources usages in json files that can be collected with the
\fBtoil stats\fP command after the workflow is done.
.sp
\fB\-\-disable\-streaming\fP: Does not allow streaming of input files. This is enabled
by default for files marked with \fBstreamable\fP flag True and only for remote files
when the jobStore is not on local machine.
.SS Running CWL in the Cloud
.sp
To run in cloud and HPC configurations, you may need to provide additional
command line parameters to select and configure the batch system to use.
.sp
To run a CWL workflow in AWS with toil see \fI\%Running a CWL Workflow on AWS\fP\&.
.SS Running CWL within Toil Scripts
.sp
A CWL workflow can be run indirectly in a native Toil script. However, this is not the \fI\%standard\fP way to run
CWL workflows with Toil and doing so comes at the cost of job efficiency. For some use cases, such as running one process on
multiple files, it may be useful. For example, if you want to run a CWL workflow with 3 YML files specifying different
samples inputs, it could look something like:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import subprocess
import tempfile

from toil.common import Toil
from toil.job import Job


def initialize_jobs(job):
    job.fileStore.logToMaster(\(aqinitialize_jobs\(aq)


def runQC(job, cwl_file, cwl_filename, yml_file, yml_filename, outputs_dir, output_num):
    job.fileStore.logToMaster(\(dqrunQC\(dq)
    tempDir = job.fileStore.getLocalTempDir()

    cwl = job.fileStore.readGlobalFile(cwl_file, userPath=os.path.join(tempDir, cwl_filename))
    yml = job.fileStore.readGlobalFile(yml_file, userPath=os.path.join(tempDir, yml_filename))

    subprocess.check_call([\(dqtoil\-cwl\-runner\(dq, cwl, yml])

    output_filename = \(dqoutput.txt\(dq
    output_file = job.fileStore.writeGlobalFile(output_filename)
    job.fileStore.readGlobalFile(output_file, userPath=os.path.join(outputs_dir, \(dqsample_\(dq + output_num + \(dq_\(dq + output_filename))
    return output_file


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_cwlexample\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq
    with Toil(options) as toil:

        # specify the folder where the cwl and yml files live
        inputs_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), \(dqcwlExampleFiles\(dq)
        # specify where you wish the outputs to be written
        outputs_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), \(dqcwlExampleFiles\(dq)

        job0 = Job.wrapJobFn(initialize_jobs)

        cwl_filename = \(dqhello.cwl\(dq
        cwl_file = toil.importFile(\(dqfile://\(dq + os.path.abspath(os.path.join(inputs_dir, cwl_filename)))

        # add list of yml config inputs here or import and construct from file
        yml_files = [\(dqhello1.yml\(dq, \(dqhello2.yml\(dq, \(dqhello3.yml\(dq]
        i = 0
        for yml in yml_files:
            i = i + 1
            yml_file = toil.importFile(\(dqfile://\(dq + os.path.abspath(os.path.join(inputs_dir, yml)))
            yml_filename = yml
            job = Job.wrapJobFn(runQC, cwl_file, cwl_filename, yml_file, yml_filename, outputs_dir, output_num=str(i))
            job0.addChild(job)

        toil.start(job0)

.ft P
.fi
.UNINDENT
.UNINDENT
.SS Running CWL workflows with InplaceUpdateRequirement
.sp
Some CWL workflows use the \fBInplaceUpdateRequirement\fP feature, which requires
that operations on files have visible side effects that Toil\(aqs file store
cannot support. If you need to run a workflow like this, you can make sure that
all of your worker nodes have a shared filesystem, and use the
\fB\-\-bypass\-file\-store\fP option to \fBtoil\-cwl\-runner\fP\&. This will make it leave
all CWL intermediate files on disk and share them between jobs using file
paths, instead of storing them in the file store and downloading them when jobs
need them.
.SS Toil & CWL Tips
.sp
\fBSee logs for just one job by using the full log file\fP
.sp
This requires knowing the job\(aqs toil\-generated ID, which can be found in the log files.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
cat cwltoil.log | grep jobVM1fIs
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBGrep for full tool commands from toil logs\fP
.sp
This gives you a more concise view of the commands being run (note that this information is only available from
Toil when running with \fI\-\-logDebug\fP).
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
pcregrep \-M \(dq\e[job .*\e.cwl.*$\en(.*        .*$\en)*\(dq cwltoil.log
#         ^allows for multiline matching
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBFind Bams that have been generated for specific step while pipeline is running:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
find . | grep \-P \(aq^./out_tmpdir.*_MD\e.bam$\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBSee what jobs have been run\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
cat log/cwltoil.log | grep \-oP \(dq\e[job .*.cwl\e]\(dq | sort | uniq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
or:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
cat log/cwltoil.log | grep \-i \(dqissued job\(dq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBGet status of a workflow\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil status /home/johnsoni/TEST_RUNS_3/TEST_run/tmp/jobstore\-09ae0acc\-c800\-11e8\-9d09\-70106fb1697e
<hostname> 2018\-10\-04 15:01:44,184 MainThread INFO toil.lib.bioio: Root logger is at level \(aqINFO\(aq, \(aqtoil\(aq logger at level \(aqINFO\(aq.
<hostname> 2018\-10\-04 15:01:44,185 MainThread INFO toil.utils.toilStatus: Parsed arguments
<hostname> 2018\-10\-04 15:01:47,081 MainThread INFO toil.utils.toilStatus: Traversing the job graph gathering jobs. This may take a couple of minutes.

Of the 286 jobs considered, there are 179 jobs with children, 107 jobs ready to run, 0 zombie jobs, 0 jobs with services, 0 services, and 0 jobs with log files currently in file:/home/user/jobstore\-09ae0acc\-c800\-11e8\-9d09\-70106fb1697e.
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBToil Stats\fP
.sp
You can get run statistics broken down by CWL file. This only works once the workflow is finished:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil stats /path/to/jobstore
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The output will contain CPU, memory, and walltime information for all CWL job types:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
<hostname> 2018\-10\-15 12:06:19,003 MainThread INFO toil.lib.bioio: Root logger is at level \(aqINFO\(aq, \(aqtoil\(aq logger at level \(aqINFO\(aq.
<hostname> 2018\-10\-15 12:06:19,004 MainThread INFO toil.utils.toilStats: Parsed arguments
<hostname> 2018\-10\-15 12:06:19,004 MainThread INFO toil.utils.toilStats: Checking if we have files for toil
<hostname> 2018\-10\-15 12:06:19,004 MainThread INFO toil.utils.toilStats: Checked arguments
Batch System: lsf
Default Cores: 1  Default Memory: 10485760K
Max Cores: 9.22337e+18
Total Clock: 106608.01  Total Runtime: 86634.11
Worker
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
     1659 |     0.00    0.80  264.87 12595.59 439424.40 |     0.00    0.46   449.05 42240.74 744968.80 |  \-35336.69     0.16   \-184.17  4230.65 \-305544.39 |      48K    223K   1020K  40235K 1692300K
Job
 Worker Jobs  |     min    med    ave    max
              |    1077   1077   1077   1077
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
     1077 |     0.04    1.18  407.06 12593.43 438404.73 |     0.01    0.28   691.17 42240.35 744394.14 |  \-35336.83     0.27   \-284.11  4230.49 \-305989.41 |     135K    268K   1633K  40235K 1759734K
 ResolveIndirect
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
      205 |     0.04    0.07    0.16     2.29     31.95 |     0.01    0.02     0.02     0.14      3.60 |       0.02     0.05      0.14     2.28      28.35 |     190K    266K    256K    314K   52487K
 CWLGather
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
       40 |     0.05    0.17    0.29     1.90     11.62 |     0.01    0.02     0.02     0.05      0.80 |       0.03     0.14      0.27     1.88      10.82 |     188K    265K    250K    316K   10039K
 CWLWorkflow
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
      205 |     0.09    0.40    0.98    13.70    200.82 |     0.04    0.15     0.16     1.08     31.78 |       0.04     0.26      0.82    12.62     169.04 |     190K    270K    257K    316K   52826K
 file:///home/johnsoni/pipeline_0.0.39/ACCESS\-Pipeline/cwl_tools/expression_tools/group_waltz_files.cwl
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
       99 |     0.29    0.49    0.59     2.50     58.11 |     0.14    0.26     0.29     1.04     28.95 |       0.14     0.22      0.29     1.48      29.16 |     135K    135K    135K    136K   13459K
 file:///home/johnsoni/pipeline_0.0.39/ACCESS\-Pipeline/cwl_tools/expression_tools/make_sample_output_dirs.cwl
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
       11 |     0.34    0.52    0.74     2.63      8.18 |     0.20    0.30     0.41     1.17      4.54 |       0.14     0.20      0.33     1.45       3.65 |     136K    136K    136K    136K    1496K
 file:///home/johnsoni/pipeline_0.0.39/ACCESS\-Pipeline/cwl_tools/expression_tools/consolidate_files.cwl
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
        8 |     0.31    0.59    0.71     1.80      5.69 |     0.18    0.35     0.37     0.63      2.94 |       0.13     0.27      0.34     1.17       2.75 |     136K    136K    136K    136K    1091K
 file:///home/johnsoni/pipeline_0.0.39/ACCESS\-Pipeline/cwl_tools/bwa\-mem/bwa\-mem.cwl
    Count |                                       Time* |                                        Clock |                                              Wait |                                    Memory
        n |      min    med*     ave      max     total |      min     med      ave      max     total |        min      med       ave      max      total |      min     med     ave     max    total
       22 |   895.76 3098.13 3587.34 12593.43  78921.51 |  2127.02 7910.31  8123.06 16959.13 178707.34 |  \-11049.84 \-3827.96  \-4535.72    19.49  \-99785.83 |    5659K   5950K   5854K   6128K  128807K
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBUnderstanding toil log files\fP
.sp
There is a \fIworker_log.txt\fP file for each job, this file is written to while the job is running, and deleted after the job finishes. The contents are printed to the main log file and transferred to a log file in the \fI\-\-logDir\fP folder once the job is completed successfully.
.sp
The new log file will be named something like:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
file:<path to cwl tool>.cwl_<job ID>.log

file:\-\-\-home\-johnsoni\-pipeline_1.1.14\-ACCESS\-\-Pipeline\-cwl_tools\-marianas\-ProcessLoopUMIFastq.cwl_I\-O\-jobfGsQQw000.log
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This is the toil job command with spaces replaced by dashes.
.SH WDL IN TOIL
.sp
Support is still in the alpha phase and should be able to handle basic wdl files.  See the specification below for more
details.
.SS How to Run a WDL file in Toil
.sp
Recommended best practice when running wdl files is to first use the Broad\(aqs wdltool for syntax validation and generating
the needed json input file.  Full documentation can be found on the \fI\%repository\fP, and a precompiled jar binary can be
downloaded here: \fI\%wdltool\fP (this requires \fI\%java7\fP).
.sp
That means two steps.  First, make sure your wdl file is valid and devoid of syntax errors by running
.sp
\fBjava \-jar wdltool.jar validate example_wdlfile.wdl\fP
.sp
Second, generate a complementary json file if your wdl file needs one.  This json will contain keys for every necessary
input that your wdl file needs to run:
.sp
\fBjava \-jar wdltool.jar inputs example_wdlfile.wdl\fP
.sp
When this json template is generated, open the file, and fill in values as necessary by hand.  WDL files all require
json files to accompany them.  If no variable inputs are needed, a json file containing only \(aq{}\(aq may be required.
.sp
Once a wdl file is validated and has an appropriate json file, workflows can be run in toil using:
.sp
\fBtoil\-wdl\-runner example_wdlfile.wdl example_jsonfile.json\fP
.sp
See options below for more parameters.
.SS ENCODE Example from ENCODE\-DCC
.sp
To follow this example, you will need docker installed.  The original workflow can be found here:
\fI\%https://github.com/ENCODE\-DCC/pipeline\-container\fP
.sp
We\(aqve included the wdl file and data files in the toil repository needed to run this example.  First, download
the example \fI\%code\fP and unzip.  The file needed is \(dqtestENCODE/encode_mapping_workflow.wdl\(dq.
.sp
Next, use \fI\%wdltool\fP (this requires \fI\%java7\fP) to validate this file:
.sp
\fBjava \-jar wdltool.jar validate encode_mapping_workflow.wdl\fP
.sp
Next, use wdltool to generate a json file for this wdl file:
.sp
\fBjava \-jar wdltool.jar inputs encode_mapping_workflow.wdl\fP
.sp
This json file once opened should look like this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
\(dqencode_mapping_workflow.fastqs\(dq: \(dqArray[File]\(dq,
\(dqencode_mapping_workflow.trimming_parameter\(dq: \(dqString\(dq,
\(dqencode_mapping_workflow.reference\(dq: \(dqFile\(dq
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The trimming_parameter should be set to \(aqnative\(aq.
Download the example \fI\%code\fP and unzip.  Inside are two data files required for the run
.sp
\fBENCODE_data/reference/GRCh38_chr21_bwa.tar.gz\fP
\fBENCODE_data/ENCFF000VOL_chr21.fq.gz\fP
.sp
Editing the json to include these as inputs, the json should now look something like this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
\(dqencode_mapping_workflow.fastqs\(dq: [\(dq/path/to/unzipped/ENCODE_data/ENCFF000VOL_chr21.fq.gz\(dq],
\(dqencode_mapping_workflow.trimming_parameter\(dq: \(dqnative\(dq,
\(dqencode_mapping_workflow.reference\(dq: \(dq/path/to/unzipped/ENCODE_data/reference/GRCh38_chr21_bwa.tar.gz\(dq
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The wdl and json files can now be run using the command
.sp
\fBtoil\-wdl\-runner encode_mapping_workflow.wdl encode_mapping_workflow.json\fP
.sp
This should deposit the output files in the user\(aqs current working directory (to change this, specify a new directory
with the \(aq\-o\(aq option).
.SS GATK Examples from the Broad
.sp
Simple examples of WDL can be found on the Broad\(aqs website as tutorials:
\fI\%https://software.broadinstitute.org/wdl/documentation/topic?name=wdl\-tutorials\fP\&.
.sp
One can follow along with these tutorials, write their own wdl files following the directions and run them using either
cromwell or toil.  For example, in tutorial 1, if you\(aqve followed along and named your wdl file \(aqhelloHaplotypeCall.wdl\(aq,
then once you\(aqve validated your wdl file using \fI\%wdltool\fP (this requires \fI\%java7\fP) using
.sp
\fBjava \-jar wdltool.jar validate helloHaplotypeCaller.wdl\fP
.sp
and generated a json file (and subsequently typed in appropriate filepaths* and variables) using
.sp
\fBjava \-jar wdltool.jar inputs helloHaplotypeCaller.wdl\fP
.INDENT 0.0
.IP \(bu 2
Absolute filepath inputs are recommended for local testing.
.UNINDENT
.sp
then the wdl script can be run using
.sp
\fBtoil\-wdl\-runner helloHaplotypeCaller.wdl helloHaplotypeCaller_inputs.json\fP
.SS toilwdl.py Options
.sp
\fB\(aq\-o\(aq\fP or \fB\(aq\-\e\-outdir\(aq\fP: Specifies the output folder, and defaults to the current working directory if
not specified by the user.
.sp
\fB\(aq\-\e\-dev_mode\(aq\fP: Creates \(dqAST.out\(dq, which holds a printed AST of the wdl file and \(dqmappings.out\(dq, which holds the
printed task, workflow, csv, and tsv dictionaries generated by the parser. Also saves the compiled toil python workflow
file for debugging.
.sp
Any number of arbitrary options may also be specified.  These options will not be parsed immediately, but passed down
as toil options once the wdl/json files are processed.  For valid toil options, see the documentation:
\fI\%http://toil.readthedocs.io/en/latest/running/cliOptions.html\fP
.SS Running WDL within Toil Scripts
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
A cromwell.jar file is needed in order to run a WDL workflow.
.UNINDENT
.UNINDENT
.sp
A WDL workflow can be run indirectly in a native Toil script. However, this is not the \fI\%standard\fP way to run
WDL workflows with Toil and doing so comes at the cost of job efficiency. For some use cases, such as running one process on
multiple files, it may be useful. For example, if you want to run a WDL workflow with 3 JSON files specifying different
samples inputs, it could look something like:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import subprocess
import tempfile

from toil.common import Toil
from toil.job import Job


def initialize_jobs(job):
    job.fileStore.logToMaster(\(dqinitialize_jobs\(dq)


def runQC(job, wdl_file, wdl_filename, json_file, json_filename, outputs_dir, jar_loc,output_num):
    job.fileStore.logToMaster(\(dqrunQC\(dq)
    tempDir = job.fileStore.getLocalTempDir()

    wdl = job.fileStore.readGlobalFile(wdl_file, userPath=os.path.join(tempDir, wdl_filename))
    json = job.fileStore.readGlobalFile(json_file, userPath=os.path.join(tempDir, json_filename))

    subprocess.check_call([\(dqjava\(dq,\(dq\-jar\(dq,jar_loc,\(dqrun\(dq,wdl,\(dq\-\-inputs\(dq,json])

    output_filename = \(dqoutput.txt\(dq
    output_file = job.fileStore.writeGlobalFile(outputs_dir + output_filename)
    job.fileStore.readGlobalFile(output_file, userPath=os.path.join(outputs_dir, \(dqsample_\(dq + output_num + \(dq_\(dq + output_filename))
    return output_file


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_wdlexample\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:

        # specify the folder where the wdl and json files live
        inputs_dir = \(dqwdlExampleFiles/\(dq
        # specify where you wish the outputs to be written
        outputs_dir = \(dqwdlExampleFiles/\(dq
        # specify the location of your cromwell jar
        jar_loc = os.path.abspath(\(dqwdlExampleFiles/cromwell\-35.jar\(dq)

        job0 = Job.wrapJobFn(initialize_jobs)

        wdl_filename = \(dqhello.wdl\(dq
        wdl_file = toil.importFile(\(dqfile://\(dq + os.path.abspath(os.path.join(inputs_dir, wdl_filename)))


        # add list of yml config inputs here or import and construct from file
        json_files = [\(dqhello1.json\(dq, \(dqhello2.json\(dq, \(dqhello3.json\(dq]
        i = 0
        for json in json_files:
            i = i + 1
            json_file = toil.importFile(\(dqfile://\(dq + os.path.join(inputs_dir, json))
            json_filename = json
            job = Job.wrapJobFn(runQC, wdl_file, wdl_filename, json_file, json_filename, outputs_dir, jar_loc, output_num=str(i))
            job0.addChild(job)

        toil.start(job0)

.ft P
.fi
.UNINDENT
.UNINDENT
.SS WDL Specifications
.sp
WDL language specifications can be found here: \fI\%https://github.com/broadinstitute/wdl/blob/develop/SPEC.md\fP
.sp
Implementing support for more features is currently underway, but a basic roadmap so far is:
.INDENT 0.0
.TP
.B CURRENTLY IMPLEMENTED:
.INDENT 7.0
.IP \(bu 2
Scatter
.IP \(bu 2
Many Built\-In Functions
.IP \(bu 2
Docker Calls
.IP \(bu 2
Handles Priority, and Output File Wrangling
.IP \(bu 2
Currently Handles Primitives and Arrays
.UNINDENT
.TP
.B TO BE IMPLEMENTED:
.INDENT 7.0
.IP \(bu 2
Integrate Cloud Autoscaling Capacity More Robustly
.IP \(bu 2
WDL Files That \(dqImport\(dq Other WDL Files (Including URI Handling for \(aq\fI\%http://\fP\(aq and \(aq\fI\%https://\fP\(aq)
.UNINDENT
.UNINDENT
.SH WORKFLOW EXECUTION SERVICE (WES)
.sp
The GA4GH Workflow Execution Service (WES) is a standardized API for submitting and monitoring workflows.
Toil has experimental support for setting up a WES server and executing CWL, WDL, and Toil workflows using the WES API.
More information about the WES API specification can be found \fI\%here\fP\&.
.sp
To get started with the Toil WES server, make sure that the \fBserver\fP extra (\fI\%Installing Toil with Extra Features\fP) is installed.
.SS Preparing your WES environment
.sp
The WES server requires \fI\%Celery\fP to distribute and execute workflows. To set up Celery:
.INDENT 0.0
.IP 1. 3
Start RabbitMQ, which is the broker between the WES server and Celery workers:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
docker run \-d \-\-name wes\-rabbitmq \-p 5672:5672 rabbitmq:3.9.5
.ft P
.fi
.UNINDENT
.UNINDENT
.IP 2. 3
Start Celery workers:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
celery \-A toil.server.celery_app worker \-\-loglevel=INFO
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.SS Starting a WES server
.sp
To start a WES server on the default port 8080, run the Toil command:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil server
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The WES API will be hosted on the following URL:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
http://localhost:8080/ga4gh/wes/v1
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To use another port, e.g.: 3000, you can specify the \fB\-\-port\fP argument:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil server \-\-port 3000
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
There are many other command line options. Help information can be found by using this command:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ toil server \-\-help
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Below is a detailed summary of all server\-specific options:
.INDENT 0.0
.TP
.B  \-\-debug
Enable debug mode.
.TP
.B  \-\-bypass_celery
Skip sending workflows to Celery and just run them under the server.
For testing.
.TP
.BI \-\-host \ HOST
The host interface that the Toil server binds on. (default:
\(dq127.0.0.1\(dq).
.TP
.BI \-\-port \ PORT
The port that the Toil server listens on. (default: 8080).
.TP
.B  \-\-swagger_ui
If True, the swagger UI will be enabled and hosted on the
\fI{api_base_path}/ui\fP endpoint. (default: False)
.TP
.B  \-\-cors
Enable Cross Origin Resource Sharing (CORS). This should only be
turned on if the server is intended to be used by a website or
domain. (default: False).
.TP
.BI \-\-cors_origins \ CORS_ORIGIN
Ignored if \-//\-cors is False. This sets the allowed origins for
CORS. For details about CORS and its security risks, see the
\fI\%GA4GH docs on CORS\fP\&. (default: \(dq*\(dq).
.TP
.BI \-\-workers \ WORKERS\fR,\fB \ \-w \ WORKERS
Ignored if \-\e\-debug is True. The number of worker processes
launched by the WSGI server. (default: 2).
.TP
.BI \-\-work_dir \ WORK_DIR
The directory where workflows should be stored. This directory
should be empty or only contain previous workflows. (default:
\(aq./workflows\(aq).
.TP
.BI \-\-state_store \ STATE_STORE
The local path or S3 URL where workflow state metadata should be
stored. (default: in \-\e\-work_dir)
.TP
.BI \-\-opt \ OPT\fR,\fB \ \-o \ OPT
Specify the default parameters to be sent to the workflow engine for
each run.  Options taking arguments must use = syntax. Accepts
multiple values. Example: \-\e\-opt=\-\e\-logLevel=CRITICAL \-\e\-opt=\-\e\-workDir=/tmp.
.TP
.BI \-\-dest_bucket_base \ DEST_BUCKET_BASE
Direct CWL workflows to save output files to dynamically generated
unique paths under the given URL. Supports AWS S3.
.TP
.BI \-\-wes_dialect \ DIALECT
Restrict WES responses to a dialect compatible with clients that do
not fully implement the WES standard. (default: \(aqstandard\(aq)
.UNINDENT
.SS Running the Server with \fIdocker\-compose\fP
.sp
Instead of manually setting up the server components (\fBtoil server\fP, RabbitMQ, and Celery), you can use the following
\fBdocker\-compose.yml\fP file to orchestrate and link them together.
.sp
Make sure to change the credentials for basic authentication by updating the
\fBtraefik.http.middlewares.auth.basicauth.users\fP label. The passwords can be generated with tools like \fBhtpasswd\fP
\fI\%like this\fP\&. (Note that single \fB$\fP signs need to be replaced with \fB$$\fP in the yaml file).
.sp
When running on a different host other than \fBlocalhost\fP, make sure to change the \fBHost\fP to your
tartget host in the \fBtraefik.http.routers.wes.rule\fP and \fBtraefik.http.routers.wespublic.rule\fP labels.
.sp
You can also change \fB/tmp/toil\-workflows\fP if you want Toil workflows to live somewhere else, and create the directory
before starting the server.
.sp
In order to run workflows that require Docker, the \fBdocker.sock\fP socket must be mounted as volume for Celery.
Additionally, the \fBTOIL_WORKDIR\fP directory (defaults to: \fB/var/lib/toil\fP) and \fB/var/lib/cwl\fP (if running CWL
workflows with \fBDockerRequirement\fP) should exist on the host and also be mounted as volumes.
.sp
Also make sure to run it behind a firewall; it opens up the Toil server on port 8080 to anyone who connects.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# docker\-compose.yml
version: \(dq3.8\(dq

services:
  rabbitmq:
    image: rabbitmq:3.9.5
    hostname: rabbitmq
  celery:
    image: ${TOIL_APPLIANCE_SELF}
    volumes:
      \- /var/run/docker.sock:/var/run/docker.sock
      \- /var/lib/docker:/var/lib/docker
      \- /var/lib/toil:/var/lib/toil
      \- /var/lib/cwl:/var/lib/cwl
      \- /tmp/toil\-workflows:/tmp/toil\-workflows
    command: celery \-\-broker=amqp://guest:guest@rabbitmq:5672// \-A toil.server.celery_app worker \-\-loglevel=INFO
    depends_on:
      \- rabbitmq
  wes\-server:
    image: ${TOIL_APPLIANCE_SELF}
    volumes:
      \- /tmp/toil\-workflows:/tmp/toil\-workflows
    environment:
      \- TOIL_WES_BROKER_URL=amqp://guest:guest@rabbitmq:5672//
    command: toil server \-\-host 0.0.0.0 \-\-port 8000 \-\-work_dir /tmp/toil\-workflows
    expose:
      \- 8000
    labels:
      \- \(dqtraefik.enable=true\(dq
      \- \(dqtraefik.http.routers.wes.rule=Host(\(galocalhost\(ga)\(dq
      \- \(dqtraefik.http.routers.wes.entrypoints=web\(dq
      \- \(dqtraefik.http.routers.wes.middlewares=auth\(dq
      \- \(dqtraefik.http.middlewares.auth.basicauth.users=test:$$2y$$12$$ci.4U63YX83CwkyUrjqxAucnmi2xXOIlEF6T/KdP9824f1Rf1iyNG\(dq
      \- \(dqtraefik.http.routers.wespublic.rule=Host(\(galocalhost\(ga) && Path(\(ga/ga4gh/wes/v1/service\-info\(ga)\(dq
    depends_on:
      \- rabbitmq
      \- celery
  traefik:
    image: traefik:v2.2
    command:
      \- \(dq\-\-providers.docker\(dq
      \- \(dq\-\-providers.docker.exposedbydefault=false\(dq
      \- \(dq\-\-entrypoints.web.address=:8080\(dq
    ports:
      \- \(dq8080:8080\(dq
    volumes:
      \- /var/run/docker.sock:/var/run/docker.sock

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Further customization can also be made as needed. For example, if you have a domain, you can
\fI\%set up HTTPS with Let\(aqs Encrypt\fP\&.
.sp
Once everything is configured, simply run \fBdocker\-compose up\fP to start the containers. Run \fBdocker\-compose down\fP to
stop and remove all containers.
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
\fBdocker\-compose\fP is not installed on the Toil appliance by default. See the following section to set up the WES
server on a Toil cluster.
.UNINDENT
.UNINDENT
.SS Running on a Toil cluster
.sp
To run the server on a Toil leader instance on EC2:
.INDENT 0.0
.IP 1. 3
Launch a Toil cluster with the \fBtoil launch\-cluster\fP command with the AWS provisioner
.IP 2. 3
SSH into your cluster with the \fB\-\-sshOption=\-L8080:localhost:8080\fP option to forward port \fB8080\fP
.IP 3. 3
Install Docker Compose by running the following commands from the \fI\%Docker docs\fP:
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
curl \-L \(dqhttps://github.com/docker/compose/releases/download/1.29.2/docker\-compose\-$(uname \-s)\-$(uname \-m)\(dq \-o /usr/local/bin/docker\-compose
chmod +x /usr/local/bin/docker\-compose

# check installation
docker\-compose \-\-version
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
or, install a different version of Docker Compose by changing \fB\(dq1.29.2\(dq\fP to another version.
.IP 4. 3
Copy the \fBdocker\-compose.yml\fP file from (\fI\%Running the Server with docker\-compose\fP) to an empty directory, and modify the
configuration as needed.
.IP 5. 3
Now, run \fBdocker\-compose up \-d\fP to start the WES server in detach mode on the Toil appliance.
.IP 6. 3
To stop the server, run \fBdocker\-compose down\fP\&.
.UNINDENT
.SS WES API Endpoints
.sp
As defined by the GA4GH WES API specification, the following endpoints with base path \fBga4gh/wes/v1/\fP are supported
by Toil:
.TS
center;
|l|l|.
_
T{
GET /service\-info
T}	T{
Get information about the Workflow Execution Service.
T}
_
T{
GET /runs
T}	T{
List the workflow runs.
T}
_
T{
POST /runs
T}	T{
Run a workflow. This endpoint creates a new workflow
run and returns a \fBrun_id\fP to monitor its progress.
T}
_
T{
GET /runs/{run_id}
T}	T{
Get detailed info about a workflow run.
T}
_
T{
POST /runs/{run_id}/cancel
T}	T{
Cancel a running workflow.
T}
_
T{
GET /runs/{run_id}/status
T}	T{
Get the status (overall state) of a workflow run.
T}
_
.TE
.sp
When running the WES server with the \fBdocker\-compose\fP setup above, most endpoints (except \fBGET /service\-info\fP) will
be protected with basic authentication. Make sure to set the \fBAuthorization\fP header with the correct credentials when
submitting or retrieving a workflow.
.SS Submitting a Workflow
.sp
Now that the WES API is up and running, we can submit and monitor workflows remotely using the WES API endpoints. A
workflow can be submitted for execution using the \fBPOST /runs\fP endpoint.
.sp
As a quick example, we can submit the example CWL workflow from \fI\%Running a basic CWL workflow\fP to our WES API:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
# example.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
stdout: output.txt
inputs:
 message:
   type: string
   inputBinding:
     position: 1
outputs:
 output:
   type: stdout
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
using cURL:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ curl \-\-location \-\-request POST \(aqhttp://localhost:8080/ga4gh/wes/v1/runs\(aq \e
    \-\-user test:test \e
    \-\-form \(aqworkflow_url=\(dqexample.cwl\(dq\(aq \e
    \-\-form \(aqworkflow_type=\(dqcwl\(dq\(aq \e
    \-\-form \(aqworkflow_type_version=\(dqv1.0\(dq\(aq \e
    \-\-form \(aqworkflow_params=\(dq{\e\(dqmessage\e\(dq: \e\(dqHello world!\e\(dq}\(dq\(aq \e
    \-\-form \(aqworkflow_attachment=@\(dq./toil_test_files/example.cwl\(dq\(aq
{
  \(dqrun_id\(dq: \(dq4deb8beb24894e9eb7c74b0f010305d1\(dq
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note that the \fB\-\-user\fP argument is used to attach the basic authentication credentials along with the request. Make
sure to change \fBtest:test\fP to the username and password you configured for your WES server. Alternatively, you can
also set the \fBAuthorization\fP header manually as \fB\(dqAuthorization: Basic base64_encoded_auth\(dq\fP\&.
.sp
If the workflow is submitted successfully, a JSON object containing a \fBrun_id\fP will be returned. The \fBrun_id\fP is a
unique identifier of your requested workflow, which can be used to monitor or cancel the run.
.sp
There are a few required parameters that have to be set for all workflow submissions, which are the following:
.TS
center;
|l|l|.
_
T{
workflow_url
T}	T{
The URL of the workflow to run. This can refer to a file
from \fBworkflow_attachment\fP\&.
T}
_
T{
workflow_type
T}	T{
The type of workflow language. Toil currently supports one
of the following: \fB\(dqCWL\(dq\fP, \fB\(dqWDL\(dq\fP, or \fB\(dqpy\(dq\fP\&. To run
a Toil native python script, set this to \fB\(dqpy\(dq\fP\&.
T}
_
T{
workflow_type_version
T}	T{
The version of the workflow language. Supported versions
can be found by accessing the \fBGET /service\-info\fP
endpoint of your WES server.
T}
_
T{
workflow_params
T}	T{
A JSON object that specifies the inputs of the workflow.
T}
_
.TE
.sp
Additionally, the following optional parameters are also available:
.TS
center;
|l|l|.
_
T{
workflow_attachment
T}	T{
A list of files associated with the workflow run.
T}
_
T{
workflow_engine_parameters
T}	T{
A JSON key\-value map of workflow engine parameters
to send to the runner.
.sp
Example:
\fB{\(dq\-\-logLevel\(dq: \(dqINFO\(dq, \(dq\-\-workDir\(dq: \(dq/tmp/\(dq}\fP
T}
_
T{
tags
T}	T{
A JSON key\-value map of metadata associated with the
workflow.
T}
_
.TE
.sp
For more details about these parameters, refer to the \fI\%Run Workflow section\fP in the WES API spec.
.SS Upload multiple files
.sp
Looking at the body of the request of the previous example, note that the \fBworkflow_url\fP is a relative URL that refers
to the \fBexample.cwl\fP file uploaded from the local path \fB\&./toil_test_files/example.cwl\fP\&.
.sp
To specify the file name (or subdirectory) of the remote destination file, set the \fBfilename\fP field in the
\fBContent\-Disposition\fP header. You could also upload more than one file by providing the \fBworkflow_attachment\fP
parameter multiple times with different files.
.sp
This can be shown by the following example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ curl \-\-location \-\-request POST \(aqhttp://localhost:8080/ga4gh/wes/v1/runs\(aq \e
    \-\-user test:test \e
    \-\-form \(aqworkflow_url=\(dqexample.cwl\(dq\(aq \e
    \-\-form \(aqworkflow_type=\(dqcwl\(dq\(aq \e
    \-\-form \(aqworkflow_type_version=\(dqv1.0\(dq\(aq \e
    \-\-form \(aqworkflow_params=\(dq{\e\(dqmessage\e\(dq: \e\(dqHello world!\e\(dq}\(dq\(aq \e
    \-\-form \(aqworkflow_attachment=@\(dq./toil_test_files/example.cwl\(dq\(aq \e
    \-\-form \(aqworkflow_attachment=@\(dq./toil_test_files/2.fasta\(dq;filename=inputs/test.fasta\(aq \e
    \-\-form \(aqworkflow_attachment=@\(dq./toil_test_files/2.fastq\(dq;filename=inputs/test.fastq\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
On the server, the execution directory would have the following structure from the above request:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
execution/
├── example.cwl
├── inputs
│     ├── test.fasta
|     └── test.fastq
└── wes_inputs.json
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Specify Toil options
.sp
To pass Toil\-specific parameters to the workflow, you can include the \fBworkflow_engine_parameters\fP parameter along
with your request.
.sp
For example, to set the logging level to \fBINFO\fP, and change the working directory of the workflow, simply include the
following as \fBworkflow_engine_parameters\fP:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{\(dq\-\-logLevel\(dq: \(dqINFO\(dq, \(dq\-\-workDir\(dq: \(dq/tmp/\(dq}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
These options would be appended at the end of existing parameters during command construction, which would override the
default parameters if provided. (Default parameters that can be passed multiple times would not be overridden).
.SS Monitoring a Workflow
.sp
With the \fBrun_id\fP returned when submitting the workflow, we can check the status or get the full logs of the workflow
run.
.SS Checking the state
.sp
The \fBGET /runs/{run_id}/status\fP endpoint can be used to get a simple result with the overall state of your run:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ curl \-\-user test:test http://localhost:8080/ga4gh/wes/v1/runs/4deb8beb24894e9eb7c74b0f010305d1/status
{
  \(dqrun_id\(dq: \(dq4deb8beb24894e9eb7c74b0f010305d1\(dq,
  \(dqstate\(dq: \(dqRUNNING\(dq
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The possible states here are: \fBQUEUED\fP, \fBINITIALIZING\fP, \fBRUNNING\fP, \fBCOMPLETE\fP, \fBEXECUTOR_ERROR\fP,
\fBSYSTEM_ERROR\fP, \fBCANCELING\fP, and \fBCANCELED\fP\&.
.SS Getting the full logs
.sp
To get the detailed information about a workflow run, use the \fBGET /runs/{run_id}\fP endpoint:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ curl \-\-user test:test http://localhost:8080/ga4gh/wes/v1/runs/4deb8beb24894e9eb7c74b0f010305d1
{
  \(dqrun_id\(dq: \(dq4deb8beb24894e9eb7c74b0f010305d1\(dq,
  \(dqrequest\(dq: {
    \(dqworkflow_attachment\(dq: [
      \(dqexample.cwl\(dq
    ],
    \(dqworkflow_url\(dq: \(dqexample.cwl\(dq,
    \(dqworkflow_type\(dq: \(dqcwl\(dq,
    \(dqworkflow_type_version\(dq: \(dqv1.0\(dq,
    \(dqworkflow_params\(dq: {
      \(dqmessage\(dq: \(dqHello world!\(dq
    }
  },
  \(dqstate\(dq: \(dqRUNNING\(dq,
  \(dqrun_log\(dq: {
    \(dqcmd\(dq: [
      \(dqtoil\-cwl\-runner \-\-outdir=/home/toil/workflows/4deb8beb24894e9eb7c74b0f010305d1/outputs \-\-jobStore=file:/home/toil/workflows/4deb8beb24894e9eb7c74b0f010305d1/toil_job_store /home/toil/workflows/4deb8beb24894e9eb7c74b0f010305d1/execution/example.cwl /home/workflows/4deb8beb24894e9eb7c74b0f010305d1/execution/wes_inputs.json\(dq
    ],
    \(dqstart_time\(dq: \(dq2021\-08\-30T17:35:50Z\(dq,
    \(dqend_time\(dq: null,
    \(dqstdout\(dq: null,
    \(dqstderr\(dq: null,
    \(dqexit_code\(dq: null
  },
  \(dqtask_logs\(dq: [],
  \(dqoutputs\(dq: {}
}
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Canceling a run
.sp
To cancel a workflow run, use the \fBPOST /runs/{run_id}/cancel\fP endpoint:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ curl \-\-location \-\-request POST \(aqhttp://localhost:8080/ga4gh/wes/v1/runs/4deb8beb24894e9eb7c74b0f010305d1/cancel\(aq \e
      \-\-user test:test
{
  \(dqrun_id\(dq: \(dq4deb8beb24894e9eb7c74b0f010305d1\(dq
}
.ft P
.fi
.UNINDENT
.UNINDENT
.SH DEVELOPING A WORKFLOW
.sp
This tutorial walks through the features of Toil necessary for developing a
workflow using the Toil Python API.
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
\(dqscript\(dq and \(dqworkflow\(dq will be used interchangeably
.UNINDENT
.UNINDENT
.SS Scripting Quick Start
.sp
To begin, consider this short toil script which illustrates defining a
workflow:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


def helloWorld(message, memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq):
    return f\(dqHello, world!, here\(aqs a message: {message}\(dq


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_quickstart\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqOFF\(dq
    options.clean = \(dqalways\(dq

    hello_job = Job.wrapFn(helloWorld, \(dqWoot\(dq)

    with Toil(options) as toil:
        print(toil.start(hello_job))  # prints \(dqHello, world!, ...\(dq

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The workflow consists of a single job. The resource requirements for that job
are (optionally) specified by keyword arguments (memory, cores, disk). The
script is run using \fI\%toil.job.Job.Runner.getDefaultOptions()\fP\&. Below we
explain the components of this code in detail.
.SS Job Basics
.sp
The atomic unit of work in a Toil workflow is a \fI\%Job\fP\&.
User scripts inherit from this base class to define units of work. For example,
here is a more long\-winded class\-based version of the job in the quick start
example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from toil.job import Job

class HelloWorld(Job):
    def __init__(self, message):
        Job.__init__(self,  memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq)
        self.message = message

    def run(self, fileStore):
        return f\(dqHello, world! Here\(aqs a message: {self.message}\(dq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In the example a class, HelloWorld, is defined. The constructor requests 2
gigabytes of memory, 2 cores and 3 gigabytes of local disk to complete the work.
.sp
The \fI\%toil.job.Job.run()\fP method is the function the user overrides to get
work done. Here it just returns a message.
.sp
It is also possible to log a message using \fI\%toil.job.Job.log()\fP, which will
be registered in the log output of the leader process of the workflow:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
\&...
    def run(self, fileStore):
        self.log(f\(dqHello, world! Here\(aqs a message: {self.message}\(dq)
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Invoking a Workflow
.sp
We can add to the previous example to turn it into a complete workflow by
adding the necessary function calls to create an instance of HelloWorld and to
run this as a workflow containing a single job. For example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


class HelloWorld(Job):
    def __init__(self, message):
        Job.__init__(self, memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq)
        self.message = message

    def run(self, fileStore):
        return f\(dqHello, world!, here\(aqs a message: {self.message}\(dq


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_invokeworkflow\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqOFF\(dq
    options.clean = \(dqalways\(dq

    hello_job = HelloWorld(\(dqWoot\(dq)

    with Toil(options) as toil:
        print(toil.start(hello_job))

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBNOTE:\fP
.INDENT 0.0
.INDENT 3.5
Do not include a \fI\&.\fP in the name of your python script (besides \fI\&.py\fP at the end).
This is to allow toil to import the types and  functions defined in your file while starting a new process.
.UNINDENT
.UNINDENT
.sp
This uses the \fI\%toil.common.Toil\fP class, which is used to run and resume
Toil workflows. It is used as a context manager and allows for preliminary
setup, such as staging of files into the job store on the leader node. An
instance of the class is initialized by specifying an options object.
The actual workflow is then invoked by calling the
\fI\%toil.common.Toil.start()\fP method, passing the root job of the workflow,
or, if a workflow is being restarted, \fI\%toil.common.Toil.restart()\fP should
be used. Note that the context manager should have explicit if else branches
addressing restart and non restart cases. The boolean value for these if else
blocks is toil.options.restart.
.sp
For example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job

class HelloWorld(Job):
    def __init__(self, message):
        Job.__init__(self,  memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq)
        self.message = message

    def run(self, fileStore):
        return f\(dqHello, world!, I have a message: {self.message}\(dq


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_invokeworkflow2\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        if not toil.options.restart:
            job = HelloWorld(\(dqWoot!\(dq)
            output = toil.start(job)
        else:
            output = toil.restart()
    print(output)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The call to \fI\%toil.job.Job.Runner.getDefaultOptions()\fP creates a set of
default options for the workflow. The only argument is a description of how to
store the workflow\(aqs state in what we call a \fIjob\-store\fP\&. Here the job\-store is
contained in a directory within the current working directory called
\(dqtoilWorkflowRun\(dq. Alternatively this string can encode other ways to store the
necessary state, e.g. an S3 bucket object store location. By default
the job\-store is deleted if the workflow completes successfully.
.sp
The workflow is executed in the final line, which creates an instance of
HelloWorld and runs it as a workflow. Note all Toil workflows start from a
single starting job, referred to as the \fIroot\fP job. The return value of the
root job is returned as the result of the completed workflow (see promises
below to see how this is a useful feature!).
.SS Specifying Commandline Arguments
.sp
To allow command line control of the options we can use the
\fI\%toil.job.Job.Runner.getDefaultArgumentParser()\fP
method to create a \fI\%argparse.ArgumentParser\fP object which can be used to
parse command line options for a Toil script. For example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from toil.common import Toil
from toil.job import Job


class HelloWorld(Job):
    def __init__(self, message):
        Job.__init__(self,  memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq)
        self.message = message

    def run(self, fileStore):
        return \(dqHello, world!, here\(aqs a message: %s\(dq % self.message


if __name__ == \(dq__main__\(dq:
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.logLevel = \(dqOFF\(dq
    options.clean = \(dqalways\(dq

    hello_job = HelloWorld(\(dqWoot\(dq)

    with Toil(options) as toil:
        print(toil.start(hello_job))

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Creates a fully fledged script with all the options Toil exposed as command
line arguments. Running this script with \(dq\-\-help\(dq will print the full list of
options.
.sp
Alternatively an existing \fI\%argparse.ArgumentParser\fP or
\fI\%optparse.OptionParser\fP object can have Toil script command line options
added to it with the \fI\%toil.job.Job.Runner.addToilOptions()\fP method.
.SS Resuming a Workflow
.sp
In the event that a workflow fails, either because of programmatic error within
the jobs being run, or because of node failure, the workflow can be resumed.
Workflows can only not be reliably resumed if the job\-store itself becomes
corrupt.
.sp
Critical to resumption is that jobs can be rerun, even if they have apparently
completed successfully. Put succinctly, a user defined job should not corrupt
its input arguments. That way, regardless of node, network or leader failure
the job can be restarted and the workflow resumed.
.sp
To resume a workflow specify the \(dqrestart\(dq option in the options object passed
to \fI\%toil.common.Toil.start()\fP\&. If node failures are expected it can
also be useful to use the integer \(dqretryCount\(dq option, which will attempt to
rerun a job retryCount number of times before marking it fully failed.
.sp
In the common scenario that a small subset of jobs fail (including retry
attempts) within a workflow Toil will continue to run other jobs until it can
do no more, at which point \fI\%toil.common.Toil.start()\fP will raise a
\fBtoil.leader.FailedJobsException\fP exception. Typically at this point
the user can decide to fix the script and resume the workflow or delete the
job\-store manually and rerun the complete workflow.
.SS Functions and Job Functions
.sp
Defining jobs by creating class definitions generally involves the boilerplate
of creating a constructor. To avoid this the classes
\fI\%toil.job.FunctionWrappingJob\fP and
\fBtoil.job.JobFunctionWrappingTarget\fP allow functions to be directly
converted to jobs. For example, the quick start example (repeated here):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


def helloWorld(message, memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq):
    return f\(dqHello, world!, here\(aqs a message: {message}\(dq


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_quickstart\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqOFF\(dq
    options.clean = \(dqalways\(dq

    hello_job = Job.wrapFn(helloWorld, \(dqWoot\(dq)

    with Toil(options) as toil:
        print(toil.start(hello_job))  # prints \(dqHello, world!, ...\(dq

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Is equivalent to the previous example, but using a function to define the job.
.sp
The function call:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Job.wrapFn(helloWorld, \(dqWoot\(dq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Creates the instance of the \fBtoil.job.FunctionWrappingTarget\fP that wraps
the function.
.sp
The keyword arguments \fImemory\fP, \fIcores\fP and \fIdisk\fP allow resource requirements
to be specified as before. Even if they are not included as keyword arguments
within a function header they can be passed as arguments when wrapping a
function as a job and will be used to specify resource requirements.
.sp
We can also use the function wrapping syntax to a \fIjob function\fP, a function
whose first argument is a reference to the wrapping job. Just like a \fIself\fP
argument in a class, this allows access to the methods of the wrapping job, see
\fBtoil.job.JobFunctionWrappingTarget\fP\&. For example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


def helloWorld(job, message):
    job.log(f\(dqHello world, I have a message: {message}\(dq)


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_jobfunctions\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    hello_job = Job.wrapJobFn(helloWorld, \(dqWoot!\(dq)

    with Toil(options) as toil:
        toil.start(hello_job)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Here \fBhelloWorld()\fP is a job function. It uses the \fI\%toil.job.Job.log()\fP
to log a message that will
be printed to the output console. Here the only subtle difference to note is
the line:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
hello_job = Job.wrapJobFn(helloWorld, \(dqWoot\(dq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Which uses the function \fI\%toil.job.Job.wrapJobFn()\fP to wrap the job function
instead of \fI\%toil.job.Job.wrapFn()\fP which wraps a vanilla function.
.SS Workflows with Multiple Jobs
.sp
A \fIparent\fP job can have \fIchild\fP jobs and \fIfollow\-on\fP jobs. These relationships
are specified by methods of the job class, e.g. \fI\%toil.job.Job.addChild()\fP
and \fI\%toil.job.Job.addFollowOn()\fP\&.
.sp
Considering a set of jobs the nodes in a job graph and the child and follow\-on
relationships the directed edges of the graph, we say that a job B that is on a
directed path of child/follow\-on edges from a job \fBA\fP in the job graph is a
\fIsuccessor\fP of \fBA\fP, similarly \fBA\fP is a \fIpredecessor\fP of \fBB\fP\&.
.sp
A parent job\(aqs child jobs are run directly after the parent job has completed,
and in parallel. The follow\-on jobs of a job are run after its child jobs and
their successors have completed. They are also run in parallel. Follow\-ons
allow the easy specification of cleanup tasks that happen after a set of
parallel child tasks. The following shows a simple example that uses the
earlier \fBhelloWorld()\fP job function:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from toil.common import Toil
from toil.job import Job


def helloWorld(job, message, memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq):
    job.log(f\(dqHello world, I have a message: {message}\(dq)


if __name__ == \(dq__main__\(dq:
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    j1 = Job.wrapJobFn(helloWorld, \(dqfirst\(dq)
    j2 = Job.wrapJobFn(helloWorld, \(dqsecond or third\(dq)
    j3 = Job.wrapJobFn(helloWorld, \(dqsecond or third\(dq)
    j4 = Job.wrapJobFn(helloWorld, \(dqlast\(dq)

    j1.addChild(j2)
    j1.addChild(j3)
    j1.addFollowOn(j4)

    with Toil(options) as toil:
        toil.start(j1)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In the example four jobs are created, first \fBj1\fP is run, then \fBj2\fP and
\fBj3\fP are run in parallel as children of \fBj1\fP, finally \fBj4\fP is run as a
follow\-on of \fBj1\fP\&.
.sp
There are multiple short hand functions to achieve the same workflow, for
example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from toil.common import Toil
from toil.job import Job


def helloWorld(job, message, memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq):
    job.log(f\(dqHello world, I have a message: {message}\(dq)


if __name__ == \(dq__main__\(dq:
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    j1 = Job.wrapJobFn(helloWorld, \(dqfirst\(dq)
    j2 = j1.addChildJobFn(helloWorld, \(dqsecond or third\(dq)
    j3 = j1.addChildJobFn(helloWorld, \(dqsecond or third\(dq)
    j4 = j1.addFollowOnJobFn(helloWorld, \(dqlast\(dq)

    with Toil(options) as toil:
        toil.start(j1)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Equivalently defines the workflow, where the functions
\fI\%toil.job.Job.addChildJobFn()\fP and \fI\%toil.job.Job.addFollowOnJobFn()\fP
are used to create job functions as children or follow\-ons of an earlier job.
.sp
Jobs graphs are not limited to trees, and can express arbitrary directed acyclic
graphs. For a precise definition of legal graphs see
\fI\%toil.job.Job.checkJobGraphForDeadlocks()\fP\&. The previous example could be
specified as a DAG as follows:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from toil.common import Toil
from toil.job import Job


def helloWorld(job, message, memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq):
    job.log(f\(dqHello world, I have a message: {message}\(dq)


if __name__ == \(dq__main__\(dq:
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    j1 = Job.wrapJobFn(helloWorld, \(dqfirst\(dq)
    j2 = j1.addChildJobFn(helloWorld, \(dqsecond or third\(dq)
    j3 = j1.addChildJobFn(helloWorld, \(dqsecond or third\(dq)
    j4 = j2.addChildJobFn(helloWorld, \(dqlast\(dq)
    j3.addChild(j4)

    with Toil(options) as toil:
        toil.start(j1)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note the use of an extra child edge to make \fBj4\fP a child of both \fBj2\fP and
\fBj3\fP\&.
.SS Dynamic Job Creation
.sp
The previous examples show a workflow being defined outside of a job. However,
Toil also allows jobs to be created dynamically within jobs. For example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


def binaryStringFn(job, depth, message=\(dq\(dq):
    if depth > 0:
        job.addChildJobFn(binaryStringFn, depth\-1, message + \(dq0\(dq)
        job.addChildJobFn(binaryStringFn, depth\-1, message + \(dq1\(dq)
    else:
        job.log(f\(dqBinary string: {message}\(dq)


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_dynamic\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        toil.start(Job.wrapJobFn(binaryStringFn, depth=5))

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The job function \fBbinaryStringFn\fP logs all possible binary strings of length
\fBn\fP (here \fBn=5\fP), creating a total of \fB2^(n+2) \- 1\fP jobs dynamically and
recursively. Static and dynamic creation of jobs can be mixed in a Toil
workflow, with jobs defined within a job or job function being created at
run time.
.SS Promises
.sp
The previous example of dynamic job creation shows variables from a parent job
being passed to a child job. Such forward variable passing is naturally
specified by recursive invocation of successor jobs within parent jobs. This
can also be achieved statically by passing around references to the return
variables of jobs. In Toil this is achieved with promises, as illustrated in
the following example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


def fn(job, i):
    job.log(\(dqi is: %s\(dq % i, level=100)
    return i + 1


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_promises\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    j1 = Job.wrapJobFn(fn, 1)
    j2 = j1.addChildJobFn(fn, j1.rv())
    j3 = j1.addFollowOnJobFn(fn, j2.rv())

    with Toil(options) as toil:
        toil.start(j1)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Running this workflow results in three log messages from the jobs: \fBi is 1\fP
from \fBj1\fP, \fBi is 2\fP from \fBj2\fP and \fBi is 3\fP from \fBj3\fP\&.
.sp
The return value from the first job is \fIpromised\fP to the second job by the call
to \fI\%toil.job.Job.rv()\fP in the following line:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
j2 = j1.addChildFn(fn, j1.rv())
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The value of \fBj1.rv()\fP is a \fIpromise\fP, rather than the actual return value of
the function, because \fBj1\fP for the given input has at that point not been
evaluated. A promise (\fI\%toil.job.Promise\fP) is essentially a pointer to
for the return value that is replaced by the actual return value once it has
been evaluated. Therefore, when \fBj2\fP is run the promise becomes 2.
.sp
Promises also support indexing of return values:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
def parent(job):
    indexable = Job.wrapJobFn(fn)
    job.addChild(indexable)
    job.addFollowOnFn(raiseWrap, indexable.rv(2))

def raiseWrap(arg):
    raise RuntimeError(arg) # raises \(dq2\(dq

def fn(job):
    return (0, 1, 2, 3)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Promises can be quite useful. For example, we can combine dynamic job creation
with promises to achieve a job creation process that mimics the functional
patterns possible in many programming languages:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


def binaryStrings(job, depth, message=\(dq\(dq):
    if depth > 0:
        s = [job.addChildJobFn(binaryStrings, depth \- 1, message + \(dq0\(dq).rv(),
             job.addChildJobFn(binaryStrings, depth \- 1, message + \(dq1\(dq).rv()]
        return job.addFollowOnFn(merge, s).rv()
    return [message]


def merge(strings):
    return strings[0] + strings[1]


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_promises2\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.loglevel = \(dqOFF\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        print(toil.start(Job.wrapJobFn(binaryStrings, depth=5)))

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The return value \fBl\fP of the workflow is a list of all binary strings of
length 10, computed recursively. Although a toy example, it demonstrates how
closely Toil workflows can mimic typical programming patterns.
.SS Promised Requirements
.sp
Promised requirements are a special case of \fI\%Promises\fP that allow a job\(aqs
return value to be used as another job\(aqs resource requirements.
.sp
This is useful when, for example, a job\(aqs storage requirement is determined by a
file staged to the job store by an earlier job:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job, PromisedRequirement


def parentJob(job):
    downloadJob = Job.wrapJobFn(stageFn, \(dqfile://\(dq + os.path.realpath(__file__), cores=0.1, memory=\(aq32M\(aq, disk=\(aq1M\(aq)
    job.addChild(downloadJob)

    analysis = Job.wrapJobFn(analysisJob,
                             fileStoreID=downloadJob.rv(0),
                             disk=PromisedRequirement(downloadJob.rv(1)))
    job.addFollowOn(analysis)


def stageFn(job, url, cores=1):
    importedFile = job.fileStore.import_file(url)
    return importedFile, importedFile.size


def analysisJob(job, fileStoreID, cores=2):
    # now do some analysis on the file
    pass


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_requirements\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        toil.start(Job.wrapJobFn(parentJob))

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note that this also makes use of the \fBsize\fP attribute of the \fI\%FileID\fP object.
This promised requirements mechanism can also be used in combination with an aggregator for
multiple jobs\(aq output values:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
def parentJob(job):
    aggregator = []
    for fileNum in range(0, 10):
        downloadJob = Job.wrapJobFn(stageFn, \(dqfile://\(dq + os.path.realpath(__file__), cores=0.1, memory=\(aq32M\(aq, disk=\(aq1M\(aq)
        job.addChild(downloadJob)
        aggregator.append(downloadJob)

    analysis = Job.wrapJobFn(analysisJob,
                             fileStoreID=downloadJob.rv(0),
                             disk=PromisedRequirement(lambda xs: sum(xs), [j.rv(1) for j in aggregator]))
    job.addFollowOn(analysis)
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.IP "Limitations"
.sp
Just like regular promises, the return value must be determined prior to
scheduling any job that depends on the return value. In our example above, notice
how the dependent jobs were follow ons to the parent while promising jobs are
children of the parent. This ordering ensures that all promises are
properly fulfilled.
.UNINDENT
.UNINDENT
.SS FileID
.sp
The \fBtoil.fileStore.FileID\fP class is a small wrapper around Python\(aqs builtin string class. It is used to
represent a file\(aqs ID in the file store, and has a \fBsize\fP attribute that is the
file\(aqs size in bytes. This object is returned by \fBimportFile\fP and \fBwriteGlobalFile\fP\&.
.SS Managing files within a workflow
.sp
It is frequently the case that a workflow will want to create files, both
persistent and temporary, during its run. The
\fI\%toil.fileStores.abstractFileStore.AbstractFileStore\fP class is used by
jobs to manage these files in a manner that guarantees cleanup and resumption
on failure.
.sp
The \fI\%toil.job.Job.run()\fP method has a file store instance as an argument.
The following example shows how this can be used to create temporary files that
persist for the length of the job, be placed in a specified local disk of the
node and that will be cleaned up, regardless of failure, when the job finishes:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


class LocalFileStoreJob(Job):
    def run(self, fileStore):
        # self.tempDir will always contain the name of a directory within the allocated disk space reserved for the job
        scratchDir = self.tempDir

        # Similarly create a temporary file.
        scratchFile = fileStore.getLocalTempFile()


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_managing\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    # Create an instance of FooJob which will have at least 2 gigabytes of storage space.
    j = LocalFileStoreJob(disk=\(dq2G\(dq)

    # Run the workflow
    with Toil(options) as toil:
        toil.start(j)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Job functions can also access the file store for the job. The equivalent of the
\fBLocalFileStoreJob\fP class is
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
def localFileStoreJobFn(job):
    scratchDir = job.tempDir
    scratchFile = job.fileStore.getLocalTempFile()
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note that the \fBfileStore\fP attribute is accessed as an attribute of the
\fBjob\fP argument.
.sp
In addition to temporary files that exist for the duration of a job, the file
store allows the creation of files in a \fIglobal\fP store, which persists during
the workflow and are globally accessible (hence the name) between jobs. For
example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


def globalFileStoreJobFn(job):
    job.log(\(dqThe following example exercises all the methods provided \(dq
            \(dqby the toil.fileStores.abstractFileStore.AbstractFileStore class\(dq)

    # Create a local temporary file.
    scratchFile = job.fileStore.getLocalTempFile()

    # Write something in the scratch file.
    with open(scratchFile, \(aqw\(aq) as fH:
        fH.write(\(dqWhat a tangled web we weave\(dq)

    # Write a copy of the file into the file\-store; fileID is the key that can be used to retrieve the file.
    # This write is asynchronous by default
    fileID = job.fileStore.writeGlobalFile(scratchFile)

    # Write another file using a stream; fileID2 is the
    # key for this second file.
    with job.fileStore.writeGlobalFileStream(cleanup=True) as (fH, fileID2):
        fH.write(b\(dqOut brief candle\(dq)

    # Now read the first file; scratchFile2 is a local copy of the file that is read\-only by default.
    scratchFile2 = job.fileStore.readGlobalFile(fileID)

    # Read the second file to a desired location: scratchFile3.
    scratchFile3 = os.path.join(job.tempDir, \(dqfoo.txt\(dq)
    job.fileStore.readGlobalFile(fileID2, userPath=scratchFile3)

    # Read the second file again using a stream.
    with job.fileStore.readGlobalFileStream(fileID2) as fH:
        print(fH.read())  # This prints \(dqOut brief candle\(dq

    # Delete the first file from the global file\-store.
    job.fileStore.deleteGlobalFile(fileID)

    # It is unnecessary to delete the file keyed by fileID2 because we used the cleanup flag,
    # which removes the file after this job and all its successors have run (if the file still exists)


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_managing2\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        toil.start(Job.wrapJobFn(globalFileStoreJobFn))

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The example demonstrates the global read, write and delete functionality of the
file\-store, using both local copies of the files and streams to read and write
the files. It covers all the methods provided by the file store interface.
.sp
What is obvious is that the file\-store provides no functionality to update an
existing \(dqglobal\(dq file, meaning that files are, barring deletion, immutable.
Also worth noting is that there is no file system hierarchy for files in the
global file store. These limitations allow us to fairly easily support
different object stores and to use caching to limit the amount of network file
transfer between jobs.
.SS Staging of Files into the Job Store
.sp
External files can be imported into or exported out of the job store prior to
running a workflow when the \fI\%toil.common.Toil\fP context manager is used
on the leader. The context manager provides methods
\fBtoil.common.Toil.importFile()\fP, and \fBtoil.common.Toil.exportFile()\fP
for this purpose. The destination and source locations of such files are
described with URLs passed to the two methods.  Local files can be imported and
exported as relative paths, and should be relative to the directory where the
toil workflow is initially run from.
.sp
Using absolute paths and appropriate schema where possible (prefixing with
\(dq\fI\%file://\fP\(dq or \(dqs3:/\(dq for example), make imports and exports less ambiguous
and is recommended.
.sp
A list of the currently supported URLs can be found at
\fBtoil.jobStores.abstractJobStore.AbstractJobStore.importFile()\fP\&. To import
an external file into the job store as a shared file, pass the optional
\fBsharedFileName\fP parameter to that method.
.sp
If a workflow fails for any reason an imported file acts as any other file in
the job store. If the workflow was configured such that it not be cleaned up on
a failed run, the file will persist in the job store and needs not be staged
again when the workflow is resumed.
.sp
Example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


class HelloWorld(Job):
    def __init__(self, id):
        Job.__init__(self, memory=\(dq2G\(dq, cores=2, disk=\(dq3G\(dq)
        self.inputFileID = id

    def run(self, fileStore):
        with fileStore.readGlobalFileStream(self.inputFileID, encoding=\(aqutf\-8\(aq) as fi:
            with fileStore.writeGlobalFileStream(encoding=\(aqutf\-8\(aq) as (fo, outputFileID):
                fo.write(fi.read() + \(aqWorld!\(aq)
        return outputFileID


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_staging\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        if not toil.options.restart:
            ioFileDirectory = os.path.join(os.path.dirname(os.path.abspath(__file__)), \(dqstagingExampleFiles\(dq)
            inputFileID = toil.importFile(\(dqfile://\(dq + os.path.abspath(os.path.join(ioFileDirectory, \(dqin.txt\(dq)))
            outputFileID = toil.start(HelloWorld(inputFileID))
        else:
            outputFileID = toil.restart()

        toil.exportFile(outputFileID, \(dqfile://\(dq + os.path.abspath(os.path.join(ioFileDirectory, \(dqout.txt\(dq)))

.ft P
.fi
.UNINDENT
.UNINDENT
.SS Using Docker Containers in Toil
.sp
Docker containers are commonly used with Toil. The combination of Toil and Docker
allows for pipelines to be fully portable between any platform that has both Toil
and Docker installed. Docker eliminates the need for the user to do any other tool
installation or environment setup.
.sp
In order to use Docker containers with Toil, Docker must be installed on all
workers of the cluster. Instructions for installing Docker can be found on the
\fI\%Docker\fP website.
.sp
When using Toil\-based autoscaling, Docker will be automatically set up
on the cluster\(aqs worker nodes, so no additional installation steps are necessary.
Further information on using Toil\-based autoscaling can be found in the \fI\%Running a Workflow with Autoscaling\fP
documentation.
.sp
In order to use docker containers in a Toil workflow, the container can be built
locally or downloaded in real time from an online docker repository like \fI\%Quay\fP\&. If
the container is not in a repository, the container\(aqs layers must be accessible on
each node of the cluster.
.sp
When invoking docker containers from within a Toil workflow, it is strongly
recommended that you use \fBdockerCall()\fP, a toil job function provided in
\fBtoil.lib.docker\fP\&. \fBdockerCall\fP leverages docker\(aqs own python API,
and provides container cleanup on job failure. When docker containers are
run without this feature, failed jobs can result in resource leaks.  Docker\(aqs
API can be found at \fI\%docker\-py\fP\&.
.sp
In order to use \fBdockerCall\fP, your installation of Docker must be set up to run
without \fBsudo\fP\&. Instructions for setting this up can be found \fI\%here\fP\&.
.sp
An example of a basic \fBdockerCall\fP is below:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
dockerCall(job=job,
            tool=\(aqquay.io/ucsc_cgl/bwa\(aq,
            workDir=job.tempDir,
            parameters=[\(aqindex\(aq, \(aq/data/reference.fa\(aq])
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note the assumption that \fIreference.fa\fP file is located in \fI/data\fP\&. This is Toil\(aqs
standard convention as a mount location to reduce boilerplate when calling \fIdockerCall\fP\&.
Users can choose their own mount locations by supplying a \fIvolumes\fP kwarg to \fIdockerCall\fP,
such as: \fIvolumes={working_dir: {\(aqbind\(aq: \(aq/data\(aq, \(aqmode\(aq: \(aqrw\(aq}}\fP, where \fIworking_dir\fP
is an absolute path on the user\(aqs filesystem.
.sp
\fBdockerCall\fP can also be added to workflows like any other job function:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job
from toil.lib.docker import apiDockerCall

align = Job.wrapJobFn(apiDockerCall,
                      image=\(aqubuntu\(aq,
                      working_dir=os.getcwd(),
                      parameters=[\(aqls\(aq, \(aq\-lha\(aq])

if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_docker\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        toil.start(align)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fI\%cgl\-docker\-lib\fP contains \fBdockerCall\fP\-compatible Dockerized tools that are
commonly used in bioinformatics analysis.
.sp
The documentation provides guidelines for developing your own Docker containers
that can be used with Toil and \fBdockerCall\fP\&. In order for a container to be
compatible with \fBdockerCall\fP, it must have an \fBENTRYPOINT\fP set to a wrapper
script, as described in cgl\-docker\-lib containerization standards.  This can be
set by passing in the optional keyword argument, \(aqentrypoint\(aq.  Example:
.INDENT 0.0
.INDENT 3.5
entrypoint=[\(dq/bin/bash\(dq,\(dq\-c\(dq]
.UNINDENT
.UNINDENT
.sp
dockerCall supports currently the 75 keyword arguments found in the python
\fI\%Docker API\fP, under the \(aqrun\(aq command.
.SS Services
.sp
It is sometimes desirable to run \fIservices\fP, such as a database or server,
concurrently with a workflow. The \fI\%toil.job.Job.Service\fP class provides
a simple mechanism for spawning such a service within a Toil workflow, allowing
precise specification of the start and end time of the service, and providing
start and end methods to use for initialization and cleanup. The following
simple, conceptual example illustrates how services work:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job


class DemoService(Job.Service):
    def start(self, fileStore):
        # Start up a database/service here
        # Return a value that enables another process to connect to the database
        return \(dqloginCredentials\(dq

    def check(self):
        # A function that if it returns False causes the service to quit
        # If it raises an exception the service is killed and an error is reported
        return True

    def stop(self, fileStore):
        # Cleanup the database here
        pass


j = Job()
s = DemoService()
loginCredentialsPromise = j.addService(s)


def dbFn(loginCredentials):
    # Use the login credentials returned from the service\(aqs start method to connect to the service
    pass


j.addChildFn(dbFn, loginCredentialsPromise)


if __name__ == \(dq__main__\(dq:
    jobstore: str = tempfile.mkdtemp(\(dqtutorial_services\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        toil.start(j)

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In this example the DemoService starts a database in the start method,
returning an object from the start method indicating how a client job would
access the database. The service\(aqs stop method cleans up the database, while
the service\(aqs check method is polled periodically to check the service is alive.
.sp
A DemoService instance is added as a service of the root job \fBj\fP, with
resource requirements specified. The return value from
\fI\%toil.job.Job.addService()\fP is a promise to the return value of the
service\(aqs start method. When the promised is fulfilled it will represent how to
connect to the database. The promise is passed to a child job of \fBj\fP, which
uses it to make a database connection. The services of a job are started before
any of its successors have been run and stopped after all the successors of the
job have completed successfully.
.sp
Multiple services can be created per job, all run in parallel. Additionally,
services can define sub\-services using \fBtoil.job.Job.Service.addChild()\fP\&.
This allows complex networks of services to be created, e.g. Apache Spark
clusters, within a workflow.
.SS Checkpoints
.sp
Services complicate resuming a workflow after failure, because they can create
complex dependencies between jobs. For example, consider a service that
provides a database that multiple jobs update. If the database service fails
and loses state, it is not clear that just restarting the service will allow
the workflow to be resumed, because jobs that created that state may have
already finished. To get around this problem Toil supports \fIcheckpoint\fP jobs,
specified as the boolean keyword argument \fBcheckpoint\fP to a job or wrapped
function, e.g.:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
j = Job(checkpoint=True)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A checkpoint job is rerun if one or more of its successors fails its retry
attempts, until it itself has exhausted its retry attempts. Upon restarting a
checkpoint job all its existing successors are first deleted, and then the job
is rerun to define new successors. By checkpointing a job that defines a
service, upon failure of the service the database and the jobs that access the
service can be redefined and rerun.
.sp
To make the implementation of checkpoint jobs simple, a job can only be a
checkpoint if when first defined it has no successors, i.e. it can only define
successors within its run method.
.SS Encapsulation
.sp
Let \fBA\fP be a root job potentially with children and follow\-ons. Without an
encapsulated job the simplest way to specify a job \fBB\fP which runs after \fBA\fP
and all its successors is to create a parent of \fBA\fP, call it \fBAp\fP, and then
make \fBB\fP a follow\-on of \fBAp\fP\&. e.g.:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job

if __name__ == \(dq__main__\(dq:
    # A is a job with children and follow\-ons, for example:
    A = Job()
    A.addChild(Job())
    A.addFollowOn(Job())

    # B is a job which needs to run after A and its successors
    B = Job()

    # The way to do this without encapsulation is to make a parent of A, Ap, and make B a follow\-on of Ap.
    Ap = Job()
    Ap.addChild(A)
    Ap.addFollowOn(B)

    jobstore: str = tempfile.mkdtemp(\(dqtutorial_encapsulations\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        print(toil.start(Ap))

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
An \fIencapsulated job\fP \fBE(A)\fP of \fBA\fP saves making \fBAp\fP, instead we can
write:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import os
import tempfile

from toil.common import Toil
from toil.job import Job

if __name__ == \(dq__main__\(dq:
    # A
    A = Job()
    A.addChild(Job())
    A.addFollowOn(Job())

    # Encapsulate A
    A = A.encapsulate()

    # B is a job which needs to run after A and its successors
    B = Job()

    # With encapsulation A and its successor subgraph appear to be a single job, hence:
    A.addChild(B)

    jobstore: str = tempfile.mkdtemp(\(dqtutorial_encapsulations2\(dq)
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = \(dqINFO\(dq
    options.clean = \(dqalways\(dq

    with Toil(options) as toil:
        print(toil.start(A))

.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note the call to \fI\%toil.job.Job.encapsulate()\fP creates the
\fBtoil.job.Job.EncapsulatedJob\fP\&.
.SS Depending on Toil
.sp
If you are packing your workflow(s) as a pip\-installable distribution on PyPI,
you might be tempted to declare Toil as a dependency in your \fBsetup.py\fP, via
the \fBinstall_requires\fP keyword argument to \fBsetup()\fP\&. Unfortunately, this
does not work, for two reasons: For one, Toil uses Setuptools\(aq \fIextra\fP
mechanism to manage its own optional dependencies. If you explicitly declared a
dependency on Toil, you would have to hard\-code a particular combination of
extras (or no extras at all), robbing the user of the choice what Toil extras
to install. Secondly, and more importantly, declaring a dependency on Toil
would only lead to Toil being installed on the leader node of a cluster, but
not the worker nodes. Auto\-deployment does not work here because Toil cannot
auto\-deploy itself, the classic \(dqWhich came first, chicken or egg?\(dq problem.
.sp
In other words, you shouldn\(aqt explicitly depend on Toil. Document the
dependency instead (as in \(dqThis workflow needs Toil version X.Y.Z to be
installed\(dq) and optionally add a version check to your \fBsetup.py\fP\&. Refer to
the \fBcheck_version()\fP function in the \fBtoil\-lib\fP project\(aqs \fI\%setup.py\fP for
an example. Alternatively, you can also just depend on \fBtoil\-lib\fP and you\(aqll
get that check for free.
.sp
If your workflow depends on a dependency of Toil,
consider not making that dependency explicit either. If you do, you risk a
version conflict between your project and Toil. The \fBpip\fP utility may
silently ignore that conflict, breaking either Toil or your workflow. It is
safest to simply assume that Toil installs that dependency for you. The only
downside is that you are locked into the exact version of that dependency that
Toil declares. But such is life with Python, which, unlike Java, has no means
of dependencies belonging to different software components within the same
process, and whose favored software distribution utility is \fI\%incapable\fP of
properly resolving overlapping dependencies and detecting conflicts.
.SS Best Practices for Dockerizing Toil Workflows
.sp
\fI\%Computational Genomics Lab\fP\(aqs \fI\%Dockstore\fP based production system provides workflow authors a
way to run Dockerized versions of their pipeline in an automated, scalable fashion. To be compatible
with this system of a workflow should meet the following requirements. In addition
to the Docker container, a common workflow language \fI\%descriptor file\fP is needed. For inputs:
.INDENT 0.0
.IP \(bu 2
Only command line arguments should be used for configuring the workflow. If
the workflow relies on a configuration file, like \fI\%Toil\-RNAseq\fP or \fI\%ProTECT\fP, a
wrapper script inside the Docker container can be used to parse the CLI and
generate the necessary configuration file.
.IP \(bu 2
All inputs to the pipeline should be explicitly enumerated rather than implicit.
For example, don\(aqt rely on one FASTQ read\(aqs path to discover the location of its
pair. This is necessary since all inputs are mapped to their own isolated directories
when the Docker is called via Dockstore.
.IP \(bu 2
All inputs must be documented in the CWL descriptor file. Examples of this file can be seen in
both \fI\%Toil\-RNAseq\fP and \fI\%ProTECT\fP\&.
.UNINDENT
.sp
For outputs:
.INDENT 0.0
.IP \(bu 2
All outputs should be written to a local path rather than S3.
.IP \(bu 2
Take care to package outputs in a local and user\-friendly way. For example,
don\(aqt tar up all output if there are specific files that will care to see individually.
.IP \(bu 2
All output file names should be deterministic and predictable. For example,
don\(aqt prepend the name of an output file with PASS/FAIL depending on the outcome
of the pipeline.
.IP \(bu 2
All outputs must be documented in the CWL descriptor file. Examples of this file can be seen in
both \fI\%Toil\-RNAseq\fP and \fI\%ProTECT\fP\&.
.UNINDENT
.SH TOIL CLASS API
.sp
The Toil class configures and starts a Toil run.
.INDENT 0.0
.TP
.B class  toil.common.Toil(options:  \fI\%Namespace\fP)
A context manager that represents a Toil workflow.
.sp
Specifically the batch system, job store, and its configuration.
.INDENT 7.0
.TP
.B __init__(options:  \fI\%Namespace\fP) -> \fI\%None\fP
Initialize a Toil object from the given options.
.sp
Note that this is very light\-weight and that the bulk of the work is
done when the context is entered.
.INDENT 7.0
.TP
.B Parameters
\fBoptions\fP \-\- command line options specified by the user
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B start(rootJob:  \fI\%Job\fP) -> \fI\%Any\fP
Invoke a Toil workflow with the given job as the root for an initial run.
.sp
This method must be called in the body of a \fBwith Toil(...) as toil:\fP
statement. This method should not be called more than once for a workflow
that has not finished.
.INDENT 7.0
.TP
.B Parameters
\fBrootJob\fP \-\- The root job of the workflow
.TP
.B Returns
The root job\(aqs return value
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B restart() -> \fI\%Any\fP
Restarts a workflow that has been interrupted.
.INDENT 7.0
.TP
.B Returns
The root job\(aqs return value
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  getJobStore(locator:  \fI\%str\fP) -> \fI\%AbstractJobStore\fP
Create an instance of the concrete job store implementation that matches the given locator.
.INDENT 7.0
.TP
.B Parameters
\fBlocator\fP (\fI\%str\fP) \-\- The location of the job store to be represent by the instance
.TP
.B Returns
an instance of a concrete subclass of AbstractJobStore
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B static  createBatchSystem(config:  Config) -> \fI\%AbstractBatchSystem\fP
Create an instance of the batch system specified in the given config.
.INDENT 7.0
.TP
.B Parameters
\fBconfig\fP \-\- the current configuration
.TP
.B Returns
an instance of a concrete subclass of AbstractBatchSystem
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B import_file(src_uri:  \fI\%str\fP, shared_file_name:  \fI\%str\fP, symlink:  \fI\%bool\fP  =  False) -> \fI\%None\fP
.TP
.B import_file(src_uri:  \fI\%str\fP, shared_file_name:  \fI\%None\fP  =  None, symlink:  \fI\%bool\fP  =  False) -> \fI\%FileID\fP
Import the file at the given URL into the job store.
.sp
See \fBtoil.jobStores.abstractJobStore.AbstractJobStore.importFile()\fP for a
full description
.UNINDENT
.INDENT 7.0
.TP
.B export_file(file_id:  \fI\%FileID\fP, dst_uri:  \fI\%str\fP) -> \fI\%None\fP
Export file to destination pointed at by the destination URL.
.sp
See \fBtoil.jobStores.abstractJobStore.AbstractJobStore.exportFile()\fP for a
full description
.UNINDENT
.INDENT 7.0
.TP
.B static  normalize_uri(uri:  \fI\%str\fP, check_existence:  \fI\%bool\fP  =  False) -> \fI\%str\fP
Given a URI, if it has no scheme, prepend \(dqfile:\(dq.
.INDENT 7.0
.TP
.B Parameters
\fBcheck_existence\fP \-\- If set, raise an error if a URI points to
a local file that does not exist.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B static  getToilWorkDir(configWorkDir:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%str\fP
Return a path to a writable directory under which per\-workflow directories exist.
.sp
This directory is always required to exist on a machine, even if the Toil
worker has not run yet.  If your workers and leader have different temp
directories, you may need to set TOIL_WORKDIR.
.INDENT 7.0
.TP
.B Parameters
\fBconfigWorkDir\fP \-\- Value passed to the program using the \-\-workDir flag
.TP
.B Returns
Path to the Toil work directory, constant across all machines
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  get_toil_coordination_dir(config_work_dir:  \fI\%Optional\fP[\fI\%str\fP], config_coordination_dir:  \fI\%Optional\fP[\fI\%str\fP]) -> \fI\%str\fP
Return a path to a writable directory, which will be in memory if
convenient. Ought to be used for file locking and coordination.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBconfig_work_dir\fP \-\- Value passed to the program using the
\-\-workDir flag
.IP \(bu 2
\fBconfig_coordination_dir\fP \-\- Value passed to the program using the
\-\-coordinationDir flag
.UNINDENT
.TP
.B Returns
Path to the Toil coordination directory. Ought to be on a
POSIX filesystem that allows directories containing open files to be
deleted.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  getLocalWorkflowDir(workflowID:  \fI\%str\fP, configWorkDir:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%str\fP
Return the directory where worker directories and the cache will be located for this workflow on this machine.
.INDENT 7.0
.TP
.B Parameters
\fBconfigWorkDir\fP \-\- Value passed to the program using the \-\-workDir flag
.TP
.B Returns
Path to the local workflow directory on this machine
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  get_local_workflow_coordination_dir(workflow_id:  \fI\%str\fP, config_work_dir:  \fI\%Optional\fP[\fI\%str\fP], config_coordination_dir:  \fI\%Optional\fP[\fI\%str\fP]) -> \fI\%str\fP
Return the directory where coordination files should be located for
this workflow on this machine. These include internal Toil databases
and lock files for the machine.
.sp
If an in\-memory filesystem is available, it is used. Otherwise, the
local workflow directory, which may be on a shared network filesystem,
is used.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBworkflow_id\fP \-\- Unique ID of the current workflow.
.IP \(bu 2
\fBconfig_work_dir\fP \-\- Value used for the work directory in the
current Toil Config.
.IP \(bu 2
\fBconfig_coordination_dir\fP \-\- Value used for the coordination
directory in the current Toil Config.
.UNINDENT
.TP
.B Returns
Path to the local workflow coordination directory on this
machine.
.UNINDENT
.UNINDENT
.UNINDENT
.SH JOB STORE API
.sp
The job store interface is an abstraction layer that that hides the specific details of file storage,
for example standard file systems, S3, etc. The \fI\%AbstractJobStore\fP
API is implemented to support a give file store, e.g. S3. Implement this API to support a new file store.
.INDENT 0.0
.TP
.B class  toil.jobStores.abstractJobStore.AbstractJobStore(locator:  \fI\%str\fP)
Represents the physical storage for the jobs and files in a Toil workflow.
.sp
JobStores are responsible for storing \fI\%toil.job.JobDescription\fP
(which relate jobs to each other) and files.
.sp
Actual \fI\%toil.job.Job\fP objects are stored in files, referenced by
JobDescriptions. All the non\-file CRUD methods the JobStore provides deal
in JobDescriptions and not full, executable Jobs.
.sp
To actually get ahold of a \fI\%toil.job.Job\fP, use
\fI\%toil.job.Job.loadJob()\fP with a JobStore and the relevant JobDescription.
.INDENT 7.0
.TP
.B __init__(locator:  \fI\%str\fP) -> \fI\%None\fP
Create an instance of the job store.
.sp
The instance will not be fully functional until either \fI\%initialize()\fP
or \fI\%resume()\fP is invoked. Note that the \fI\%destroy()\fP method may
be invoked on the object with or without prior invocation of either of
these two methods.
.sp
Takes and stores the locator string for the job store, which will be
accessible via self.locator.
.UNINDENT
.INDENT 7.0
.TP
.B initialize(config:  Config) -> \fI\%None\fP
Initialize this job store.
.sp
Create the physical storage for this job store, allocate a workflow ID
and persist the given Toil configuration to the store.
.INDENT 7.0
.TP
.B Parameters
\fBconfig\fP \-\- the Toil configuration to initialize this job store with.
The given configuration will be updated with the newly
allocated workflow ID.
.TP
.B Raises
\fI\%JobStoreExistsException\fP \-\- if the physical storage for this job store
already exists
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B write_config() -> \fI\%None\fP
Persists the value of the \fI\%AbstractJobStore.config\fP attribute to the
job store, so that it can be retrieved later by other instances of this class.
.UNINDENT
.INDENT 7.0
.TP
.B resume() -> \fI\%None\fP
Connect this instance to the physical storage it represents and load the Toil configuration
into the \fI\%AbstractJobStore.config\fP attribute.
.INDENT 7.0
.TP
.B Raises
\fI\%NoSuchJobStoreException\fP \-\- if the physical storage for this job store doesn\(aqt exist
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  config:  Config
Return the Toil configuration associated with this job store.
.UNINDENT
.INDENT 7.0
.TP
.B property  locator:  \fI\%str\fP
Get the locator that defines the job store, which can be used to
connect to it.
.UNINDENT
.INDENT 7.0
.TP
.B setRootJob(rootJobStoreID:  \fI\%FileID\fP) -> \fI\%None\fP
Set the root job of the workflow backed by this job store.
.UNINDENT
.INDENT 7.0
.TP
.B set_root_job(job_id:  \fI\%FileID\fP) -> \fI\%None\fP
Set the root job of the workflow backed by this job store.
.INDENT 7.0
.TP
.B Parameters
\fBjob_id\fP \-\- The ID of the job to set as root
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B load_root_job() -> \fI\%JobDescription\fP
Loads the JobDescription for the root job in the current job store.
.INDENT 7.0
.TP
.B Raises
\fI\%toil.job.JobException\fP \-\- If no root job is set or if the root job doesn\(aqt exist in
this job store
.TP
.B Returns
The root job.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B create_root_job(job_description:  \fI\%JobDescription\fP) -> \fI\%JobDescription\fP
Create the given JobDescription and set it as the root job in this job store.
.INDENT 7.0
.TP
.B Parameters
\fBjob_description\fP \-\- JobDescription to save and make the root job.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B get_root_job_return_value() -> \fI\%Any\fP
Parse the return value from the root job.
.sp
Raises an exception if the root job hasn\(aqt fulfilled its promise yet.
.UNINDENT
.INDENT 7.0
.TP
.B import_file(src_uri:  \fI\%str\fP, shared_file_name:  \fI\%str\fP, hardlink:  \fI\%bool\fP  =  False, symlink:  \fI\%bool\fP  =  False) -> \fI\%None\fP
.TP
.B import_file(src_uri:  \fI\%str\fP, shared_file_name:  \fI\%None\fP  =  None, hardlink:  \fI\%bool\fP  =  False, symlink:  \fI\%bool\fP  =  False) -> \fI\%FileID\fP
Imports the file at the given URL into job store. The ID of the newly imported file is
returned. If the name of a shared file name is provided, the file will be imported as
such and None is returned. If an executable file on the local filesystem is uploaded, its
executability will be preserved when it is downloaded.
.sp
Currently supported schemes are:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
.INDENT 2.0
.TP
.B \(aqs3\(aq for objects in Amazon S3
e.g. s3://bucket/key
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B \(aqfile\(aq for local files
e.g. \fI\%file:///local/file/path\fP
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B \(aqhttp\(aq
e.g. \fI\%http://someurl.com/path\fP
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B \(aqgs\(aq
e.g. gs://bucket/file
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBsrc_uri\fP (\fI\%str\fP) \-\- URL that points to a file or object in the storage mechanism of a
supported URL scheme e.g. a blob in an AWS s3 bucket.
.IP \(bu 2
\fBshared_file_name\fP (\fI\%str\fP) \-\- Optional name to assign to the imported file within the job store
.UNINDENT
.TP
.B Returns
The jobStoreFileID of the imported file or None if shared_file_name was given
.TP
.B Return type
\fI\%toil.fileStores.FileID\fP or None
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B export_file(file_id:  \fI\%FileID\fP, dst_uri:  \fI\%str\fP) -> \fI\%None\fP
Exports file to destination pointed at by the destination URL. The exported file will be
executable if and only if it was originally uploaded from an executable file on the
local filesystem.
.sp
Refer to \fI\%AbstractJobStore.import_file()\fP documentation for currently supported URL schemes.
.sp
Note that the helper method _exportFile is used to read from the source and write to
destination. To implement any optimizations that circumvent this, the _exportFile method
should be overridden by subclasses of AbstractJobStore.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBfile_id\fP (\fI\%str\fP) \-\- The id of the file in the job store that should be exported.
.IP \(bu 2
\fBdst_uri\fP (\fI\%str\fP) \-\- URL that points to a file or object in the storage mechanism of a
supported URL scheme e.g. a blob in an AWS s3 bucket.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  list_url(src_uri:  \fI\%str\fP) -> \fI\%List\fP[\fI\%str\fP]
List the directory at the given URL. Returned path components can be
joined with \(aq/\(aq onto the passed URL to form new URLs. Those that end in
\(aq/\(aq correspond to directories. The provided URL may or may not end with
\(aq/\(aq.
.sp
Currently supported schemes are:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
.INDENT 2.0
.TP
.B \(aqs3\(aq for objects in Amazon S3
e.g. s3://bucket/prefix/
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B \(aqfile\(aq for local files
e.g. \fI\%file:///local/dir/path/\fP
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B Parameters
\fBsrc_uri\fP (\fI\%str\fP) \-\- URL that points to a directory or prefix in the storage mechanism of a
supported URL scheme e.g. a prefix in an AWS s3 bucket.
.TP
.B Returns
A list of URL components in the given directory, already URL\-encoded.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  get_is_directory(src_uri:  \fI\%str\fP) -> \fI\%bool\fP
Return True if the thing at the given URL is a directory, and False if
it is a file. The URL may or may not end in \(aq/\(aq.
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  read_from_url(src_uri:  \fI\%str\fP, writable:  \fI\%IO\fP[\fI\%bytes\fP]) -> \fI\%Tuple\fP[\fI\%int\fP,  \fI\%bool\fP]
Read the given URL and write its content into the given writable stream.
.INDENT 7.0
.TP
.B Returns
The size of the file in bytes and whether the executable permission bit is set
.TP
.B Return type
Tuple[\fI\%int\fP, \fI\%bool\fP]
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  classmethod  get_size(src_uri:  \fI\%ParseResult\fP) -> \fI\%None\fP
Get the size in bytes of the file at the given URL, or None if it cannot be obtained.
.INDENT 7.0
.TP
.B Parameters
\fBsrc_uri\fP \-\- URL that points to a file or object in the storage
mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  destroy() -> \fI\%None\fP
The inverse of \fI\%initialize()\fP, this method deletes the physical storage represented
by this instance. While not being atomic, this method \fIis\fP at least idempotent,
as a means to counteract potential issues with eventual consistency exhibited by the
underlying storage mechanisms. This means that if the method fails (raises an exception),
it may (and should be) invoked again. If the underlying storage mechanism is eventually
consistent, even a successful invocation is not an ironclad guarantee that the physical
storage vanished completely and immediately. A successful invocation only guarantees that
the deletion will eventually happen. It is therefore recommended to not immediately reuse
the same job store location for a new Toil workflow.
.UNINDENT
.INDENT 7.0
.TP
.B get_env() -> \fI\%Dict\fP[\fI\%str\fP,  \fI\%str\fP]
Returns a dictionary of environment variables that this job store requires to be set in
order to function properly on a worker.
.INDENT 7.0
.TP
.B Return type
\fI\%dict\fP[\fI\%str\fP,\fI\%str\fP]
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B clean(jobCache:  \fI\%Optional\fP[\fI\%Dict\fP[\fI\%Union\fP[\fI\%str\fP,  TemporaryID],  \fI\%JobDescription\fP]]  =  None) -> \fI\%JobDescription\fP
Function to cleanup the state of a job store after a restart.
.sp
Fixes jobs that might have been partially updated. Resets the try counts
and removes jobs that are not successors of the current root job.
.INDENT 7.0
.TP
.B Parameters
\fBjobCache\fP \-\- if a value it must be a dict
from job ID keys to JobDescription object values. Jobs will be loaded
from the cache (which can be downloaded from the job store in a batch)
instead of piecemeal when recursed into.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  assign_job_id(job_description:  \fI\%JobDescription\fP) -> \fI\%None\fP
Get a new jobStoreID to be used by the described job, and assigns it to the JobDescription.
.sp
Files associated with the assigned ID will be accepted even if the JobDescription has never been created or updated.
.INDENT 7.0
.TP
.B Parameters
\fBjob_description\fP (\fI\%toil.job.JobDescription\fP) \-\- The JobDescription to give an ID to
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B batch() -> \fI\%Iterator\fP[\fI\%None\fP]
If supported by the batch system, calls to create() with this context
manager active will be performed in a batch after the context manager
is released.
.UNINDENT
.INDENT 7.0
.TP
.B abstract  create_job(job_description:  \fI\%JobDescription\fP) -> \fI\%JobDescription\fP
Writes the given JobDescription to the job store. The job must have an ID assigned already.
.sp
Must call jobDescription.pre_update_hook()
.INDENT 7.0
.TP
.B Returns
The JobDescription passed.
.TP
.B Return type
\fI\%toil.job.JobDescription\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  job_exists(job_id:  \fI\%str\fP) -> \fI\%bool\fP
Indicates whether a description of the job with the specified jobStoreID exists in the job store
.INDENT 7.0
.TP
.B Return type
\fI\%bool\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  get_public_url(file_name:  \fI\%str\fP) -> \fI\%str\fP
Returns a publicly accessible URL to the given file in the job store. The returned URL may
expire as early as 1h after its been returned. Throw an exception if the file does not
exist.
.INDENT 7.0
.TP
.B Parameters
\fBfile_name\fP (\fI\%str\fP) \-\- the jobStoreFileID of the file to generate a URL for
.TP
.B Raises
\fI\%NoSuchFileException\fP \-\- if the specified file does not exist in this job store
.TP
.B Return type
\fI\%str\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  get_shared_public_url(shared_file_name:  \fI\%str\fP) -> \fI\%str\fP
Differs from \fBgetPublicUrl()\fP in that this method is for generating URLs for shared
files written by \fBwriteSharedFileStream()\fP\&.
.sp
Returns a publicly accessible URL to the given file in the job store. The returned URL
starts with \(aqhttp:\(aq,  \(aqhttps:\(aq or \(aqfile:\(aq. The returned URL may expire as early as 1h
after its been returned. Throw an exception if the file does not exist.
.INDENT 7.0
.TP
.B Parameters
\fBshared_file_name\fP (\fI\%str\fP) \-\- The name of the shared file to generate a publically accessible url for.
.TP
.B Raises
\fI\%NoSuchFileException\fP \-\- raised if the specified file does not exist in the store
.TP
.B Return type
\fI\%str\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  load_job(job_id:  \fI\%str\fP) -> \fI\%JobDescription\fP
Loads the description of the job referenced by the given ID, assigns it
the job store\(aqs config, and returns it.
.sp
May declare the job to have failed (see
\fI\%toil.job.JobDescription.setupJobAfterFailure()\fP) if there is
evidence of a failed update attempt.
.INDENT 7.0
.TP
.B Parameters
\fBjob_id\fP \-\- the ID of the job to load
.TP
.B Raises
\fI\%NoSuchJobException\fP \-\- if there is no job with the given ID
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  update_job(job_description:  \fI\%JobDescription\fP) -> \fI\%None\fP
Persists changes to the state of the given JobDescription in this store atomically.
.sp
Must call jobDescription.pre_update_hook()
.INDENT 7.0
.TP
.B Parameters
\fBjob\fP (\fI\%toil.job.JobDescription\fP) \-\- the job to write to this job store
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  delete_job(job_id:  \fI\%str\fP) -> \fI\%None\fP
Removes the JobDescription from the store atomically. You may not then
subsequently call load(), write(), update(), etc. with the same
jobStoreID or any JobDescription bearing it.
.sp
This operation is idempotent, i.e. deleting a job twice or deleting a non\-existent job
will succeed silently.
.INDENT 7.0
.TP
.B Parameters
\fBjob_id\fP (\fI\%str\fP) \-\- the ID of the job to delete from this job store
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B jobs() -> \fI\%Iterator\fP[\fI\%JobDescription\fP]
Best effort attempt to return iterator on JobDescriptions for all jobs
in the store. The iterator may not return all jobs and may also contain
orphaned jobs that have already finished successfully and should not be
rerun. To guarantee you get any and all jobs that can be run instead
construct a more expensive ToilState object
.INDENT 7.0
.TP
.B Returns
Returns iterator on jobs in the store. The iterator may or may not contain all jobs and may contain
invalid jobs
.TP
.B Return type
Iterator[toil.job.jobDescription]
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  write_file(local_path:  \fI\%str\fP, job_id:  \fI\%Optional\fP[\fI\%str\fP]  =  None, cleanup:  \fI\%bool\fP  =  False) -> \fI\%str\fP
Takes a file (as a path) and places it in this job store. Returns an ID that can be used
to retrieve the file at a later time.  The file is written in a atomic manner.  It will
not appear in the jobStore until the write has successfully completed.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBlocal_path\fP (\fI\%str\fP) \-\- the path to the local file that will be uploaded to the job store.
The last path component (basename of the file) will remain
associated with the file in the file store, if supported, so
that the file can be searched for by name or name glob.
.IP \(bu 2
\fBjob_id\fP (\fI\%str\fP) \-\- the id of a job, or None. If specified, the may be associated
with that job in a job\-store\-specific way. This may influence the returned ID.
.IP \(bu 2
\fBcleanup\fP (\fI\%bool\fP) \-\- Whether to attempt to delete the file when the job
whose jobStoreID was given as jobStoreID is deleted with
jobStore.delete(job). If jobStoreID was not given, does nothing.
.UNINDENT
.TP
.B Raises
.INDENT 7.0
.IP \(bu 2
\fI\%ConcurrentFileModificationException\fP \-\- if the file was modified concurrently during
an invocation of this method
.IP \(bu 2
\fI\%NoSuchJobException\fP \-\- if the job specified via jobStoreID does not exist
.UNINDENT
.UNINDENT
.sp
FIXME: some implementations may not raise this
.INDENT 7.0
.TP
.B Returns
an ID referencing the newly created file and can be used to read the
file in the future.
.TP
.B Return type
\fI\%str\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  write_file_stream(job_id:  \fI\%Optional\fP[\fI\%str\fP]  =  None, cleanup:  \fI\%bool\fP  =  False, basename:  \fI\%Optional\fP[\fI\%str\fP]  =  None, encoding:  \fI\%Optional\fP[\fI\%str\fP]  =  None, errors:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%Iterator\fP[\fI\%Tuple\fP[\fI\%IO\fP[\fI\%bytes\fP],  \fI\%str\fP]]
Similar to writeFile, but returns a context manager yielding a tuple of
1) a file handle which can be written to and 2) the ID of the resulting
file in the job store. The yielded file handle does not need to and
should not be closed explicitly.  The file is written in a atomic manner.
It will not appear in the jobStore until the write has successfully
completed.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBjob_id\fP (\fI\%str\fP) \-\- the id of a job, or None. If specified, the may be associated
with that job in a job\-store\-specific way. This may influence the returned ID.
.IP \(bu 2
\fBcleanup\fP (\fI\%bool\fP) \-\- Whether to attempt to delete the file when the job
whose jobStoreID was given as jobStoreID is deleted with
jobStore.delete(job). If jobStoreID was not given, does nothing.
.IP \(bu 2
\fBbasename\fP (\fI\%str\fP) \-\- If supported by the implementation, use the given
file basename so that when searching the job store with a query
matching that basename, the file will be detected.
.IP \(bu 2
\fBencoding\fP (\fI\%str\fP) \-\- the name of the encoding used to encode the file. Encodings are the same
as for encode(). Defaults to None which represents binary mode.
.IP \(bu 2
\fBerrors\fP (\fI\%str\fP) \-\- an optional string that specifies how encoding errors are to be handled. Errors
are the same as for open(). Defaults to \(aqstrict\(aq when an encoding is specified.
.UNINDENT
.TP
.B Raises
.INDENT 7.0
.IP \(bu 2
\fI\%ConcurrentFileModificationException\fP \-\- if the file was modified concurrently during
an invocation of this method
.IP \(bu 2
\fI\%NoSuchJobException\fP \-\- if the job specified via jobStoreID does not exist
.UNINDENT
.UNINDENT
.sp
FIXME: some implementations may not raise this
.INDENT 7.0
.TP
.B Returns
a context manager yielding a file handle which can be written to and an ID that references
the newly created file and can be used to read the file in the future.
.TP
.B Return type
Iterator[Tuple[IO[\fI\%bytes\fP], \fI\%str\fP]]
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  get_empty_file_store_id(job_id:  \fI\%Optional\fP[\fI\%str\fP]  =  None, cleanup:  \fI\%bool\fP  =  False, basename:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%str\fP
Creates an empty file in the job store and returns its ID.
Call to fileExists(getEmptyFileStoreID(jobStoreID)) will return True.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBjob_id\fP (\fI\%str\fP) \-\- the id of a job, or None. If specified, the may be associated
with that job in a job\-store\-specific way. This may influence the returned ID.
.IP \(bu 2
\fBcleanup\fP (\fI\%bool\fP) \-\- Whether to attempt to delete the file when the job
whose jobStoreID was given as jobStoreID is deleted with
jobStore.delete(job). If jobStoreID was not given, does nothing.
.IP \(bu 2
\fBbasename\fP (\fI\%str\fP) \-\- If supported by the implementation, use the given
file basename so that when searching the job store with a query
matching that basename, the file will be detected.
.UNINDENT
.TP
.B Returns
a jobStoreFileID that references the newly created file and can be used to reference the
file in the future.
.TP
.B Return type
\fI\%str\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  read_file(file_id:  \fI\%str\fP, local_path:  \fI\%str\fP, symlink:  \fI\%bool\fP  =  False) -> \fI\%None\fP
Copies or hard links the file referenced by jobStoreFileID to the given
local file path. The version will be consistent with the last copy of
the file written/updated. If the file in the job store is later
modified via updateFile or updateFileStream, it is
implementation\-defined whether those writes will be visible at
localFilePath.  The file is copied in an atomic manner.  It will not
appear in the local file system until the copy has completed.
.sp
The file at the given local path may not be modified after this method returns!
.sp
Note!  Implementations of readFile need to respect/provide the executable attribute on FileIDs.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBfile_id\fP (\fI\%str\fP) \-\- ID of the file to be copied
.IP \(bu 2
\fBlocal_path\fP (\fI\%str\fP) \-\- the local path indicating where to place the contents of the
given file in the job store
.IP \(bu 2
\fBsymlink\fP (\fI\%bool\fP) \-\- whether the reader can tolerate a symlink. If set to true, the job
store may create a symlink instead of a full copy of the file or a hard link.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  read_file_stream(file_id:  \fI\%Union\fP[\fI\%FileID\fP,  \fI\%str\fP], encoding:  \fI\%Literal\fP[None]  =  None, errors:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%ContextManager\fP[\fI\%IO\fP[\fI\%bytes\fP]]
.TP
.B abstract  read_file_stream(file_id:  \fI\%Union\fP[\fI\%FileID\fP,  \fI\%str\fP], encoding:  \fI\%str\fP, errors:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%ContextManager\fP[\fI\%IO\fP[\fI\%str\fP]]
Similar to readFile, but returns a context manager yielding a file handle which can be
read from. The yielded file handle does not need to and should not be closed explicitly.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBfile_id\fP (\fI\%str\fP) \-\- ID of the file to get a readable file handle for
.IP \(bu 2
\fBencoding\fP (\fI\%str\fP) \-\- the name of the encoding used to decode the file. Encodings are the same as
for decode(). Defaults to None which represents binary mode.
.IP \(bu 2
\fBerrors\fP (\fI\%str\fP) \-\- an optional string that specifies how encoding errors are to be handled. Errors
are the same as for open(). Defaults to \(aqstrict\(aq when an encoding is specified.
.UNINDENT
.TP
.B Returns
a context manager yielding a file handle which can be read from
.TP
.B Return type
Iterator[Union[IO[\fI\%bytes\fP], IO[\fI\%str\fP]]]
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  delete_file(file_id:  \fI\%str\fP) -> \fI\%None\fP
Deletes the file with the given ID from this job store. This operation is idempotent, i.e.
deleting a file twice or deleting a non\-existent file will succeed silently.
.INDENT 7.0
.TP
.B Parameters
\fBfile_id\fP (\fI\%str\fP) \-\- ID of the file to delete
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B fileExists(jobStoreFileID:  \fI\%str\fP) -> \fI\%bool\fP
Determine whether a file exists in this job store.
.UNINDENT
.INDENT 7.0
.TP
.B abstract  file_exists(file_id:  \fI\%str\fP) -> \fI\%bool\fP
Determine whether a file exists in this job store.
.INDENT 7.0
.TP
.B Parameters
\fBfile_id\fP \-\- an ID referencing the file to be checked
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getFileSize(jobStoreFileID:  \fI\%str\fP) -> \fI\%int\fP
Get the size of the given file in bytes.
.UNINDENT
.INDENT 7.0
.TP
.B abstract  get_file_size(file_id:  \fI\%str\fP) -> \fI\%int\fP
Get the size of the given file in bytes, or 0 if it does not exist when queried.
.sp
Note that job stores which encrypt files might return overestimates of
file sizes, since the encrypted file may have been padded to the
nearest block, augmented with an initialization vector, etc.
.INDENT 7.0
.TP
.B Parameters
\fBfile_id\fP (\fI\%str\fP) \-\- an ID referencing the file to be checked
.TP
.B Return type
\fI\%int\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B updateFile(jobStoreFileID:  \fI\%str\fP, localFilePath:  \fI\%str\fP) -> \fI\%None\fP
Replaces the existing version of a file in the job store.
.UNINDENT
.INDENT 7.0
.TP
.B abstract  update_file(file_id:  \fI\%str\fP, local_path:  \fI\%str\fP) -> \fI\%None\fP
Replaces the existing version of a file in the job store.
.sp
Throws an exception if the file does not exist.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBfile_id\fP \-\- the ID of the file in the job store to be updated
.IP \(bu 2
\fBlocal_path\fP \-\- the local path to a file that will overwrite the current
version in the job store
.UNINDENT
.TP
.B Raises
.INDENT 7.0
.IP \(bu 2
\fI\%ConcurrentFileModificationException\fP \-\- if the file was modified
concurrently during an invocation of this method
.IP \(bu 2
\fI\%NoSuchFileException\fP \-\- if the specified file does not exist
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  update_file_stream(file_id:  \fI\%str\fP, encoding:  \fI\%Optional\fP[\fI\%str\fP]  =  None, errors:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%Iterator\fP[\fI\%IO\fP[\fI\%Any\fP]]
Replaces the existing version of a file in the job store. Similar to writeFile, but
returns a context manager yielding a file handle which can be written to. The
yielded file handle does not need to and should not be closed explicitly.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBfile_id\fP (\fI\%str\fP) \-\- the ID of the file in the job store to be updated
.IP \(bu 2
\fBencoding\fP (\fI\%str\fP) \-\- the name of the encoding used to encode the file. Encodings are the same
as for encode(). Defaults to None which represents binary mode.
.IP \(bu 2
\fBerrors\fP (\fI\%str\fP) \-\- an optional string that specifies how encoding errors are to be handled. Errors
are the same as for open(). Defaults to \(aqstrict\(aq when an encoding is specified.
.UNINDENT
.TP
.B Raises
.INDENT 7.0
.IP \(bu 2
\fI\%ConcurrentFileModificationException\fP \-\- if the file was modified concurrently during
an invocation of this method
.IP \(bu 2
\fI\%NoSuchFileException\fP \-\- if the specified file does not exist
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  write_shared_file_stream(shared_file_name:  \fI\%str\fP, encrypted:  \fI\%Optional\fP[\fI\%bool\fP]  =  None, encoding:  \fI\%Optional\fP[\fI\%str\fP]  =  None, errors:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%Iterator\fP[\fI\%IO\fP[\fI\%bytes\fP]]
Returns a context manager yielding a writable file handle to the global file referenced
by the given name.  File will be created in an atomic manner.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBshared_file_name\fP (\fI\%str\fP) \-\- A file name matching AbstractJobStore.fileNameRegex, unique within
this job store
.IP \(bu 2
\fBencrypted\fP (\fI\%bool\fP) \-\- True if the file must be encrypted, None if it may be encrypted or
False if it must be stored in the clear.
.IP \(bu 2
\fBencoding\fP (\fI\%str\fP) \-\- the name of the encoding used to encode the file. Encodings are the same
as for encode(). Defaults to None which represents binary mode.
.IP \(bu 2
\fBerrors\fP (\fI\%str\fP) \-\- an optional string that specifies how encoding errors are to be handled. Errors
are the same as for open(). Defaults to \(aqstrict\(aq when an encoding is specified.
.UNINDENT
.TP
.B Raises
\fI\%ConcurrentFileModificationException\fP \-\- if the file was modified concurrently during
an invocation of this method
.TP
.B Returns
a context manager yielding a writable file handle
.TP
.B Return type
Iterator[IO[\fI\%bytes\fP]]
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  read_shared_file_stream(shared_file_name:  \fI\%str\fP, encoding:  \fI\%Optional\fP[\fI\%str\fP]  =  None, errors:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%Iterator\fP[\fI\%IO\fP[\fI\%bytes\fP]]
Returns a context manager yielding a readable file handle to the global file referenced
by the given name.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBshared_file_name\fP (\fI\%str\fP) \-\- A file name matching AbstractJobStore.fileNameRegex, unique within
this job store
.IP \(bu 2
\fBencoding\fP (\fI\%str\fP) \-\- the name of the encoding used to decode the file. Encodings are the same
as for decode(). Defaults to None which represents binary mode.
.IP \(bu 2
\fBerrors\fP (\fI\%str\fP) \-\- an optional string that specifies how encoding errors are to be handled. Errors
are the same as for open(). Defaults to \(aqstrict\(aq when an encoding is specified.
.UNINDENT
.TP
.B Returns
a context manager yielding a readable file handle
.TP
.B Return type
Iterator[IO[\fI\%bytes\fP]]
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  write_logs(msg:  \fI\%str\fP) -> \fI\%None\fP
Stores a message as a log in the jobstore.
.INDENT 7.0
.TP
.B Parameters
\fBmsg\fP (\fI\%str\fP) \-\- the string to be written
.TP
.B Raises
\fI\%ConcurrentFileModificationException\fP \-\- if the file was modified concurrently during
an invocation of this method
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  read_logs(callback:  \fI\%Callable\fP[[\&...],  \fI\%Any\fP], read_all:  \fI\%bool\fP  =  False) -> \fI\%int\fP
Reads logs accumulated by the write_logs() method. For each log this method calls the
given callback function with the message as an argument (rather than returning logs directly,
this method must be supplied with a callback which will process log messages).
.sp
Only unread logs will be read unless the read_all parameter is set.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBcallback\fP (\fICallable\fP) \-\- a function to be applied to each of the stats file handles found
.IP \(bu 2
\fBread_all\fP (\fI\%bool\fP) \-\- a boolean indicating whether to read the already processed stats files
in addition to the unread stats files
.UNINDENT
.TP
.B Raises
\fI\%ConcurrentFileModificationException\fP \-\- if the file was modified concurrently during
an invocation of this method
.TP
.B Returns
the number of stats files processed
.TP
.B Return type
\fI\%int\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B write_leader_pid() -> \fI\%None\fP
Write the pid of this process to a file in the job store.
.sp
Overwriting the current contents of pid.log is a feature, not a bug of
this method. Other methods will rely on always having the most current
pid available. So far there is no reason to store any old pids.
.UNINDENT
.INDENT 7.0
.TP
.B read_leader_pid() -> \fI\%int\fP
Read the pid of the leader process to a file in the job store.
.INDENT 7.0
.TP
.B Raises
\fI\%NoSuchFileException\fP \-\- If the PID file doesn\(aqt exist.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B write_leader_node_id() -> \fI\%None\fP
Write the leader node id to the job store. This should only be called
by the leader.
.UNINDENT
.INDENT 7.0
.TP
.B read_leader_node_id() -> \fI\%str\fP
Read the leader node id stored in the job store.
.INDENT 7.0
.TP
.B Raises
\fI\%NoSuchFileException\fP \-\- If the node ID file doesn\(aqt exist.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B write_kill_flag(kill:  \fI\%bool\fP  =  False) -> \fI\%None\fP
Write a file inside the job store that serves as a kill flag.
.sp
The initialized file contains the characters \(dqNO\(dq. This should only be
changed when the user runs the \(dqtoil kill\(dq command.
.sp
Changing this file to a \(dqYES\(dq triggers a kill of the leader process. The
workers are expected to be cleaned up by the leader.
.UNINDENT
.INDENT 7.0
.TP
.B read_kill_flag() -> \fI\%bool\fP
Read the kill flag from the job store, and return True if the leader
has been killed. False otherwise.
.UNINDENT
.INDENT 7.0
.TP
.B default_caching() -> \fI\%bool\fP
Jobstore\(aqs preference as to whether it likes caching or doesn\(aqt care about it.
Some jobstores benefit from caching, however on some local configurations it can be flaky.
.sp
see \fI\%https://github.com/DataBiosphere/toil/issues/4218\fP
.UNINDENT
.UNINDENT
.SH TOIL JOB API
.sp
Functions to wrap jobs and return values (promises).
.SS FunctionWrappingJob
.sp
The subclass of Job for wrapping user functions.
.INDENT 0.0
.TP
.B class  toil.job.FunctionWrappingJob(userFunction, *args, **kwargs)
Job used to wrap a function. In its \fIrun\fP method the wrapped function is called.
.INDENT 7.0
.TP
.B __init__(userFunction, *args, **kwargs)
.INDENT 7.0
.TP
.B Parameters
\fBuserFunction\fP (\fIcallable\fP) \-\- The function to wrap. It will be called with \fB*args\fP and
\fB**kwargs\fP as arguments.
.UNINDENT
.sp
The keywords \fBmemory\fP, \fBcores\fP, \fBdisk\fP, \fBaccelerators\(ga,
\(ga\(gapreemptible\fP and \fBcheckpoint\fP are reserved keyword arguments that
if specified will be used to determine the resources required for the
job, as \fI\%toil.job.Job.__init__()\fP\&. If they are keyword arguments to
the function they will be extracted from the function definition, but
may be overridden by the user (as you would expect).
.UNINDENT
.INDENT 7.0
.TP
.B run(fileStore)
Override this function to perform work and dynamically create successor jobs.
.INDENT 7.0
.TP
.B Parameters
\fBfileStore\fP \-\- Used to create local and globally sharable temporary
files and to send log messages to the leader process.
.TP
.B Returns
The return value of the function can be passed to other jobs by means of
\fI\%toil.job.Job.rv()\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.SS JobFunctionWrappingJob
.sp
The subclass of FunctionWrappingJob for wrapping user job functions.
.INDENT 0.0
.TP
.B class  toil.job.JobFunctionWrappingJob(userFunction, *args, **kwargs)
A job function is a function whose first argument is a \fI\%Job\fP
instance that is the wrapping job for the function. This can be used to
add successor jobs for the function and perform all the functions the
\fI\%Job\fP class provides.
.sp
To enable the job function to get access to the
\fI\%toil.fileStores.abstractFileStore.AbstractFileStore\fP instance (see
\fI\%toil.job.Job.run()\fP), it is made a variable of the wrapping job called
fileStore.
.sp
To specify a job\(aqs resource requirements the following default keyword arguments
can be specified:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
memory
.IP \(bu 2
disk
.IP \(bu 2
cores
.IP \(bu 2
accelerators
.IP \(bu 2
preemptible
.UNINDENT
.UNINDENT
.UNINDENT
.sp
For example to wrap a function into a job we would call:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
Job.wrapJobFn(myJob, memory=\(aq100k\(aq, disk=\(aq1M\(aq, cores=0.1)
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B run(fileStore)
Override this function to perform work and dynamically create successor jobs.
.INDENT 7.0
.TP
.B Parameters
\fBfileStore\fP \-\- Used to create local and globally sharable temporary
files and to send log messages to the leader process.
.TP
.B Returns
The return value of the function can be passed to other jobs by means of
\fI\%toil.job.Job.rv()\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.SS EncapsulatedJob
.sp
The subclass of Job for \fIencapsulating\fP a job, allowing a subgraph of jobs to be treated as a single job.
.INDENT 0.0
.TP
.B class  toil.job.EncapsulatedJob(job, unitName=None)
A convenience Job class used to make a job subgraph appear to be a single job.
.sp
Let A be the root job of a job subgraph and B be another job we\(aqd like to run after A
and all its successors have completed, for this use encapsulate:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#  Job A and subgraph, Job B
A, B = A(), B()
Aprime = A.encapsulate()
Aprime.addChild(B)
#  B will run after A and all its successors have completed, A and its subgraph of
# successors in effect appear to be just one job.
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If the job being encapsulated has predecessors (e.g. is not the root job), then the encapsulated
job will inherit these predecessors. If predecessors are added to the job being encapsulated
after the encapsulated job is created then the encapsulating job will NOT inherit these
predecessors automatically. Care should be exercised to ensure the encapsulated job has the
proper set of predecessors.
.sp
The return value of an encapsulated job (as accessed by the \fI\%toil.job.Job.rv()\fP function)
is the return value of the root job, e.g. A().encapsulate().rv() and A().rv() will resolve to
the same value after A or A.encapsulate() has been run.
.INDENT 7.0
.TP
.B __init__(job, unitName=None)
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBjob\fP (\fI\%toil.job.Job\fP) \-\- the job to encapsulate.
.IP \(bu 2
\fBunitName\fP (\fI\%str\fP) \-\- human\-readable name to identify this job instance.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addChild(childJob)
Add a childJob to be run as child of this job.
.sp
Child jobs will be run directly after this job\(aqs
\fI\%toil.job.Job.run()\fP method has completed.
.INDENT 7.0
.TP
.B Returns
childJob: for call chaining
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addService(service, parentService=None)
Add a service.
.sp
The \fI\%toil.job.Job.Service.start()\fP method of the service will be called
after the run method has completed but before any successors are run.
The service\(aqs \fI\%toil.job.Job.Service.stop()\fP method will be called once
the successors of the job have been run.
.sp
Services allow things like databases and servers to be started and accessed
by jobs in a workflow.
.INDENT 7.0
.TP
.B Raises
\fI\%toil.job.JobException\fP \-\- If service has already been made the child
of a job or another service.
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBservice\fP \-\- Service to add.
.IP \(bu 2
\fBparentService\fP \-\- Service that will be started before \(aqservice\(aq is
started. Allows trees of services to be established. parentService must be a service
of this job.
.UNINDENT
.TP
.B Returns
a promise that will be replaced with the return value from
\fI\%toil.job.Job.Service.start()\fP of service in any successor of the job.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addFollowOn(followOnJob)
Add a follow\-on job.
.sp
Follow\-on jobs will be run after the child jobs and their successors have been run.
.INDENT 7.0
.TP
.B Returns
followOnJob for call chaining
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B rv(*path) -> \fI\%Promise\fP
Create a \fIpromise\fP (\fI\%toil.job.Promise\fP).
.sp
The \(dqpromise\(dq representing a return value of the job\(aqs run method, or,
in case of a function\-wrapping job, the wrapped function\(aqs return value.
.INDENT 7.0
.TP
.B Parameters
\fBpath\fP (\fI(\fP\fIAny\fP\fI)\fP) \-\- Optional path for selecting a component of the promised return value.
If absent or empty, the entire return value will be used. Otherwise, the first
element of the path is used to select an individual item of the return value. For
that to work, the return value must be a list, dictionary or of any other type
implementing the \fI__getitem__()\fP magic method. If the selected item is yet another
composite value, the second element of the path can be used to select an item from
it, and so on. For example, if the return value is \fI[6,{\(aqa\(aq:42}]\fP, \fI\&.rv(0)\fP would
select \fI6\fP , \fIrv(1)\fP would select \fI{\(aqa\(aq:3}\fP while \fIrv(1,\(aqa\(aq)\fP would select \fI3\fP\&. To
select a slice from a return value that is slicable, e.g. tuple or list, the path
element should be a \fIslice\fP object. For example, assuming that the return value is
\fI[6, 7, 8, 9]\fP then \fI\&.rv(slice(1, 3))\fP would select \fI[7, 8]\fP\&. Note that slicing
really only makes sense at the end of path.
.TP
.B Returns
A promise representing the return value of this jobs \fI\%toil.job.Job.run()\fP
method.
.TP
.B Return type
\fI\%toil.job.Promise\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B prepareForPromiseRegistration(jobStore)
Set up to allow this job\(aqs promises to register themselves.
.sp
Prepare this job (the promisor) so that its promises can register
themselves with it, when the jobs they are promised to (promisees) are
serialized.
.sp
The promissee holds the reference to the promise (usually as part of the
job arguments) and when it is being pickled, so will the promises it refers
to. Pickling a promise triggers it to be registered with the promissor.
.UNINDENT
.UNINDENT
.SS Promise
.sp
The class used to reference return values of jobs/services not yet run/started.
.INDENT 0.0
.TP
.B class  toil.job.Promise(*args)
References a return value from a method as a \fIpromise\fP before the method itself is run.
.sp
References a return value from a \fI\%toil.job.Job.run()\fP or
\fI\%toil.job.Job.Service.start()\fP method as a \fIpromise\fP before the method itself is run.
.sp
Let T be a job. Instances of \fI\%Promise\fP (termed a \fIpromise\fP) are returned by T.rv(),
which is used to reference the return value of T\(aqs run function. When the promise is passed
to the constructor (or as an argument to a wrapped function) of a different, successor job
the promise will be replaced by the actual referenced return value. This mechanism allows a
return values from one job\(aqs run method to be input argument to job before the former job\(aqs
run function has been executed.
.INDENT 7.0
.TP
.B filesToDelete  =  {}
A set of IDs of files containing promised values when we know we won\(aqt need them anymore
.UNINDENT
.INDENT 7.0
.TP
.B __init__(job:  \fI\%Job\fP, path:  \fI\%Any\fP)
Initialize this promise.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBjob\fP (\fI\%Job\fP) \-\- the job whose return value this promise references
.IP \(bu 2
\fBpath\fP \-\- see \fI\%Job.rv()\fP
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B class  toil.job.PromisedRequirement(valueOrCallable, *args)
Class for dynamically allocating job function resource requirements.
.sp
(involving \fI\%toil.job.Promise\fP instances.)
.sp
Use when resource requirements depend on the return value of a parent function.
PromisedRequirements can be modified by passing a function that takes the
\fI\%Promise\fP as input.
.sp
For example, let f, g, and h be functions. Then a Toil workflow can be
defined as follows::
A = Job.wrapFn(f)
B = A.addChildFn(g, cores=PromisedRequirement(A.rv())
C = B.addChildFn(h, cores=PromisedRequirement(lambda x: 2*x, B.rv()))
.INDENT 7.0
.TP
.B __init__(valueOrCallable, *args)
Initialize this Promised Requirement.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBvalueOrCallable\fP \-\- A single Promise instance or a function that
takes args as input parameters.
.IP \(bu 2
\fBargs\fP (\fI\%int\fP\fI or \fP\fI\&.Promise\fP) \-\- variable length argument list
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getValue()
Return PromisedRequirement value.
.UNINDENT
.INDENT 7.0
.TP
.B static  convertPromises(kwargs:  \fI\%Dict\fP[\fI\%str\fP,  \fI\%Any\fP]) -> \fI\%bool\fP
Return True if reserved resource keyword is a Promise or PromisedRequirement instance.
.sp
Converts Promise instance to PromisedRequirement.
.INDENT 7.0
.TP
.B Parameters
\fBkwargs\fP \-\- function keyword arguments
.UNINDENT
.UNINDENT
.UNINDENT
.SH JOB METHODS API
.sp
Jobs are the units of work in Toil which are composed into workflows.
.INDENT 0.0
.TP
.B class  toil.job.Job(memory:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP]]  =  None, cores:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%float\fP]]  =  None, disk:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP]]  =  None, accelerators:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%Mapping\fP[\fI\%str\fP,  \fI\%Any\fP],  AcceleratorRequirement,  \fI\%Sequence\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%Mapping\fP[\fI\%str\fP,  \fI\%Any\fP],  AcceleratorRequirement]]]]  =  None, preemptible:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%bool\fP]]  =  None, preemptable:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%bool\fP]]  =  None, unitName:  \fI\%Optional\fP[\fI\%str\fP]  =  \(aq\(aq, checkpoint:  \fI\%Optional\fP[\fI\%bool\fP]  =  False, displayName:  \fI\%Optional\fP[\fI\%str\fP]  =  \(aq\(aq, descriptionClass:  \fI\%Optional\fP[\fI\%str\fP]  =  None)
Class represents a unit of work in toil.
.INDENT 7.0
.TP
.B __init__(memory:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP]]  =  None, cores:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%float\fP]]  =  None, disk:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP]]  =  None, accelerators:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%Mapping\fP[\fI\%str\fP,  \fI\%Any\fP],  AcceleratorRequirement,  \fI\%Sequence\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%Mapping\fP[\fI\%str\fP,  \fI\%Any\fP],  AcceleratorRequirement]]]]  =  None, preemptible:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%bool\fP]]  =  None, preemptable:  \fI\%Optional\fP[\fI\%Union\fP[\fI\%str\fP,  \fI\%int\fP,  \fI\%bool\fP]]  =  None, unitName:  \fI\%Optional\fP[\fI\%str\fP]  =  \(aq\(aq, checkpoint:  \fI\%Optional\fP[\fI\%bool\fP]  =  False, displayName:  \fI\%Optional\fP[\fI\%str\fP]  =  \(aq\(aq, descriptionClass:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%None\fP
Job initializer.
.sp
This method must be called by any overriding constructor.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBmemory\fP (\fI\%int\fP\fI or \fP\fIstring convertible by toil.lib.conversions.human2bytes to an int\fP) \-\- the maximum number of bytes of memory the job will require to run.
.IP \(bu 2
\fBcores\fP (\fI\%float\fP\fI, \fP\fI\%int\fP\fI, or \fP\fIstring convertible by toil.lib.conversions.human2bytes to an int\fP) \-\- the number of CPU cores required.
.IP \(bu 2
\fBdisk\fP (\fI\%int\fP\fI or \fP\fIstring convertible by toil.lib.conversions.human2bytes to an int\fP) \-\- the amount of local disk space required by the job, expressed in bytes.
.IP \(bu 2
\fBaccelerators\fP (\fI\%int\fP\fI, \fP\fIstring\fP\fI, \fP\fI\%dict\fP\fI, or \fP\fI\%list\fP\fI of \fP\fIthose. Strings and dicts must be parseable by AcceleratorRequirement.parse.\fP) \-\- the computational accelerators required by the job. If a string, can be a string of a number, or a string specifying a model, brand, or API (with optional colon\-delimited count).
.IP \(bu 2
\fBpreemptible\fP (\fI\%bool\fP\fI, \fP\fIint in {0\fP\fI, \fP\fI1}\fP\fI, or \fP\fIstring in {\(aqfalse\(aq\fP\fI, \fP\fI\(aqtrue\(aq} in any case\fP) \-\- if the job can be run on a preemptible node.
.IP \(bu 2
\fBpreemptable\fP \-\- legacy preemptible parameter, for backwards compatibility with workflows not using the preemptible keyword
.IP \(bu 2
\fBunitName\fP (\fI\%str\fP) \-\- Human\-readable name for this instance of the job.
.IP \(bu 2
\fBcheckpoint\fP (\fI\%bool\fP) \-\- if any of this job\(aqs successor jobs completely fails,
exhausting all their retries, remove any successor jobs and rerun this job to restart the
subtree. Job must be a leaf vertex in the job graph when initially defined, see
\fBtoil.job.Job.checkNewCheckpointsAreCutVertices()\fP\&.
.IP \(bu 2
\fBdisplayName\fP (\fI\%str\fP) \-\- Human\-readable job type display name.
.IP \(bu 2
\fBdescriptionClass\fP (\fIclass\fP) \-\- Override for the JobDescription class used to describe the job.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  jobStoreID
Get the ID of this Job.
.INDENT 7.0
.TP
.B Return type
\fI\%str\fP|toil.job.TemporaryID
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  description
Expose the JobDescription that describes this job.
.INDENT 7.0
.TP
.B Return type
\fI\%toil.job.JobDescription\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  disk:  \fI\%int\fP
The maximum number of bytes of disk the job will require to run.
.INDENT 7.0
.TP
.B Return type
\fI\%int\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  memory
The maximum number of bytes of memory the job will require to run.
.INDENT 7.0
.TP
.B Return type
\fI\%int\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  cores
.INDENT 7.0
.INDENT 3.5
The number of CPU cores required.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B Return type
\fI\%int\fP|\fI\%float\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  accelerators
.INDENT 7.0
.INDENT 3.5
Any accelerators, such as GPUs, that are needed.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B Return type
\fI\%list\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  preemptible
Whether the job can be run on a preemptible node.
.INDENT 7.0
.TP
.B Return type
\fI\%bool\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  checkpoint
Determine if the job is a checkpoint job or not.
.INDENT 7.0
.TP
.B Return type
\fI\%bool\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B assignConfig(config:  Config)
Assign the given config object.
.sp
It will be used by various actions implemented inside the Job class.
.INDENT 7.0
.TP
.B Parameters
\fBconfig\fP \-\- Config object to query
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B run(fileStore:  \fI\%AbstractFileStore\fP) -> \fI\%Any\fP
Override this function to perform work and dynamically create successor jobs.
.INDENT 7.0
.TP
.B Parameters
\fBfileStore\fP \-\- Used to create local and globally sharable temporary
files and to send log messages to the leader process.
.TP
.B Returns
The return value of the function can be passed to other jobs by means of
\fI\%toil.job.Job.rv()\fP\&.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addChild(childJob:  \fI\%Job\fP) -> \fI\%Job\fP
Add a childJob to be run as child of this job.
.sp
Child jobs will be run directly after this job\(aqs
\fI\%toil.job.Job.run()\fP method has completed.
.INDENT 7.0
.TP
.B Returns
childJob: for call chaining
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B hasChild(childJob:  \fI\%Job\fP) -> \fI\%bool\fP
Check if childJob is already a child of this job.
.INDENT 7.0
.TP
.B Returns
True if childJob is a child of the job, else False.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addFollowOn(followOnJob:  \fI\%Job\fP) -> \fI\%Job\fP
Add a follow\-on job.
.sp
Follow\-on jobs will be run after the child jobs and their successors have been run.
.INDENT 7.0
.TP
.B Returns
followOnJob for call chaining
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B hasPredecessor(job:  \fI\%Job\fP) -> \fI\%bool\fP
Check if a given job is already a predecessor of this job.
.UNINDENT
.INDENT 7.0
.TP
.B hasFollowOn(followOnJob:  \fI\%Job\fP) -> \fI\%bool\fP
Check if given job is already a follow\-on of this job.
.INDENT 7.0
.TP
.B Returns
True if the followOnJob is a follow\-on of this job, else False.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addService(service:  \fI\%Service\fP, parentService:  \fI\%Optional\fP[\fI\%Service\fP]  =  None) -> \fI\%Promise\fP
Add a service.
.sp
The \fI\%toil.job.Job.Service.start()\fP method of the service will be called
after the run method has completed but before any successors are run.
The service\(aqs \fI\%toil.job.Job.Service.stop()\fP method will be called once
the successors of the job have been run.
.sp
Services allow things like databases and servers to be started and accessed
by jobs in a workflow.
.INDENT 7.0
.TP
.B Raises
\fI\%toil.job.JobException\fP \-\- If service has already been made the child
of a job or another service.
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBservice\fP \-\- Service to add.
.IP \(bu 2
\fBparentService\fP \-\- Service that will be started before \(aqservice\(aq is
started. Allows trees of services to be established. parentService must be a service
of this job.
.UNINDENT
.TP
.B Returns
a promise that will be replaced with the return value from
\fI\%toil.job.Job.Service.start()\fP of service in any successor of the job.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B hasService(service:  \fI\%Service\fP) -> \fI\%bool\fP
Return True if the given Service is a service of this job, and False otherwise.
.UNINDENT
.INDENT 7.0
.TP
.B addChildFn(fn:  \fI\%Callable\fP, *args, **kwargs) -> \fI\%FunctionWrappingJob\fP
Add a function as a child job.
.INDENT 7.0
.TP
.B Parameters
\fBfn\fP \-\- Function to be run as a child job with \fB*args\fP and \fB**kwargs\fP as         arguments to this function. See toil.job.FunctionWrappingJob for reserved         keyword arguments used to specify resource requirements.
.TP
.B Returns
The new child job that wraps fn.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addFollowOnFn(fn:  \fI\%Callable\fP, *args, **kwargs) -> \fI\%FunctionWrappingJob\fP
Add a function as a follow\-on job.
.INDENT 7.0
.TP
.B Parameters
\fBfn\fP \-\- Function to be run as a follow\-on job with \fB*args\fP and \fB**kwargs\fP as         arguments to this function. See toil.job.FunctionWrappingJob for reserved         keyword arguments used to specify resource requirements.
.TP
.B Returns
The new follow\-on job that wraps fn.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addChildJobFn(fn:  \fI\%Callable\fP, *args, **kwargs) -> \fI\%FunctionWrappingJob\fP
Add a job function as a child job.
.sp
See \fI\%toil.job.JobFunctionWrappingJob\fP for a definition of a job function.
.INDENT 7.0
.TP
.B Parameters
\fBfn\fP \-\- Job function to be run as a child job with \fB*args\fP and \fB**kwargs\fP as         arguments to this function. See toil.job.JobFunctionWrappingJob for reserved         keyword arguments used to specify resource requirements.
.TP
.B Returns
The new child job that wraps fn.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addFollowOnJobFn(fn:  \fI\%Callable\fP, *args, **kwargs) -> \fI\%FunctionWrappingJob\fP
Add a follow\-on job function.
.sp
See \fI\%toil.job.JobFunctionWrappingJob\fP for a definition of a job function.
.INDENT 7.0
.TP
.B Parameters
\fBfn\fP \-\- Job function to be run as a follow\-on job with \fB*args\fP and \fB**kwargs\fP as         arguments to this function. See toil.job.JobFunctionWrappingJob for reserved         keyword arguments used to specify resource requirements.
.TP
.B Returns
The new follow\-on job that wraps fn.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property  tempDir:  \fI\%str\fP
Shortcut to calling \fBjob.fileStore.getLocalTempDir()\fP\&.
.sp
Temp dir is created on first call and will be returned for first and future calls
:return: Path to tempDir. See \fIjob.fileStore.getLocalTempDir\fP
.UNINDENT
.INDENT 7.0
.TP
.B log(text:  \fI\%str\fP, level=20) -> \fI\%None\fP
Convenience wrapper for \fBfileStore.logToMaster()\fP\&.
.UNINDENT
.INDENT 7.0
.TP
.B static  wrapFn(fn, *args, **kwargs)
Makes a Job out of a function.         Convenience function for constructor of \fI\%toil.job.FunctionWrappingJob\fP\&.
.INDENT 7.0
.TP
.B Parameters
\fBfn\fP \-\- Function to be run with \fB*args\fP and \fB**kwargs\fP as arguments.         See toil.job.JobFunctionWrappingJob for reserved keyword arguments used         to specify resource requirements.
.TP
.B Returns
The new function that wraps fn.
.TP
.B Return type
\fI\%toil.job.FunctionWrappingJob\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B static  wrapJobFn(fn, *args, **kwargs)
Makes a Job out of a job function.         Convenience function for constructor of \fI\%toil.job.JobFunctionWrappingJob\fP\&.
.INDENT 7.0
.TP
.B Parameters
\fBfn\fP \-\- Job function to be run with \fB*args\fP and \fB**kwargs\fP as arguments.         See toil.job.JobFunctionWrappingJob for reserved keyword arguments used         to specify resource requirements.
.TP
.B Returns
The new job function that wraps fn.
.TP
.B Return type
\fI\%toil.job.JobFunctionWrappingJob\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B encapsulate(name=None)
Encapsulates the job, see \fI\%toil.job.EncapsulatedJob\fP\&.
Convenience function for constructor of \fI\%toil.job.EncapsulatedJob\fP\&.
.INDENT 7.0
.TP
.B Parameters
\fBname\fP (\fI\%str\fP) \-\- Human\-readable name for the encapsulated job.
.TP
.B Returns
an encapsulated version of this job.
.TP
.B Return type
\fI\%toil.job.EncapsulatedJob\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B rv(*path) -> \fI\%Any\fP
Create a \fIpromise\fP (\fI\%toil.job.Promise\fP).
.sp
The \(dqpromise\(dq representing a return value of the job\(aqs run method, or,
in case of a function\-wrapping job, the wrapped function\(aqs return value.
.INDENT 7.0
.TP
.B Parameters
\fBpath\fP (\fI(\fP\fIAny\fP\fI)\fP) \-\- Optional path for selecting a component of the promised return value.
If absent or empty, the entire return value will be used. Otherwise, the first
element of the path is used to select an individual item of the return value. For
that to work, the return value must be a list, dictionary or of any other type
implementing the \fI__getitem__()\fP magic method. If the selected item is yet another
composite value, the second element of the path can be used to select an item from
it, and so on. For example, if the return value is \fI[6,{\(aqa\(aq:42}]\fP, \fI\&.rv(0)\fP would
select \fI6\fP , \fIrv(1)\fP would select \fI{\(aqa\(aq:3}\fP while \fIrv(1,\(aqa\(aq)\fP would select \fI3\fP\&. To
select a slice from a return value that is slicable, e.g. tuple or list, the path
element should be a \fIslice\fP object. For example, assuming that the return value is
\fI[6, 7, 8, 9]\fP then \fI\&.rv(slice(1, 3))\fP would select \fI[7, 8]\fP\&. Note that slicing
really only makes sense at the end of path.
.TP
.B Returns
A promise representing the return value of this jobs \fI\%toil.job.Job.run()\fP
method.
.TP
.B Return type
\fI\%toil.job.Promise\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B prepareForPromiseRegistration(jobStore:  \fI\%AbstractJobStore\fP) -> \fI\%None\fP
Set up to allow this job\(aqs promises to register themselves.
.sp
Prepare this job (the promisor) so that its promises can register
themselves with it, when the jobs they are promised to (promisees) are
serialized.
.sp
The promissee holds the reference to the promise (usually as part of the
job arguments) and when it is being pickled, so will the promises it refers
to. Pickling a promise triggers it to be registered with the promissor.
.UNINDENT
.INDENT 7.0
.TP
.B checkJobGraphForDeadlocks()
Ensures that a graph of Jobs (that hasn\(aqt yet been saved to the
JobStore) doesn\(aqt contain any pathological relationships between jobs
that would result in deadlocks if we tried to run the jobs.
.sp
See \fI\%toil.job.Job.checkJobGraphConnected()\fP,
\fBtoil.job.Job.checkJobGraphAcyclic()\fP and
\fI\%toil.job.Job.checkNewCheckpointsAreLeafVertices()\fP for more info.
.INDENT 7.0
.TP
.B Raises
\fI\%toil.job.JobGraphDeadlockException\fP \-\- if the job graph
is cyclic, contains multiple roots or contains checkpoint jobs that are
not leaf vertices when defined (see \fBtoil.job.Job.checkNewCheckpointsAreLeaves()\fP).
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getRootJobs() -> \fI\%Set\fP[\fI\%Job\fP]
Returns the set of root job objects that contain this job.
A root job is a job with no predecessors (i.e. which are not children, follow\-ons, or services).
.sp
Only deals with jobs created here, rather than loaded from the job store.
.UNINDENT
.INDENT 7.0
.TP
.B checkJobGraphConnected()
.INDENT 7.0
.TP
.B Raises
\fI\%toil.job.JobGraphDeadlockException\fP \-\- if \fI\%toil.job.Job.getRootJobs()\fP does         not contain exactly one root job.
.UNINDENT
.sp
As execution always starts from one root job, having multiple root jobs will         cause a deadlock to occur.
.sp
Only deals with jobs created here, rather than loaded from the job store.
.UNINDENT
.INDENT 7.0
.TP
.B checkJobGraphAcylic()
.INDENT 7.0
.TP
.B Raises
\fI\%toil.job.JobGraphDeadlockException\fP \-\- if the connected component         of jobs containing this job contains any cycles of child/followOn dependencies         in the \fIaugmented job graph\fP (see below). Such cycles are not allowed         in valid job graphs.
.UNINDENT
.sp
A follow\-on edge (A, B) between two jobs A and B is equivalent         to adding a child edge to B from (1) A, (2) from each child of A,         and (3) from the successors of each child of A. We call each such edge         an edge an \(dqimplied\(dq edge. The augmented job graph is a job graph including         all the implied edges.
.sp
For a job graph G = (V, E) the algorithm is \fBO(|V|^2)\fP\&. It is \fBO(|V| + |E|)\fP for         a graph with no follow\-ons. The former follow\-on case could be improved!
.sp
Only deals with jobs created here, rather than loaded from the job store.
.UNINDENT
.INDENT 7.0
.TP
.B checkNewCheckpointsAreLeafVertices()
A checkpoint job is a job that is restarted if either it fails, or if any of         its successors completely fails, exhausting their retries.
.sp
A job is a leaf it is has no successors.
.sp
A checkpoint job must be a leaf when initially added to the job graph. When its         run method is invoked it can then create direct successors. This restriction is made
to simplify implementation.
.sp
Only works on connected components of jobs not yet added to the JobStore.
.INDENT 7.0
.TP
.B Raises
\fI\%toil.job.JobGraphDeadlockException\fP \-\- if there exists a job being added to the graph for which         checkpoint=True and which is not a leaf.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B defer(function, *args, **kwargs)
Register a deferred function, i.e. a callable that will be invoked after the current
attempt at running this job concludes. A job attempt is said to conclude when the job
function (or the \fI\%toil.job.Job.run()\fP method for class\-based jobs) returns, raises an
exception or after the process running it terminates abnormally. A deferred function will
be called on the node that attempted to run the job, even if a subsequent attempt is made
on another node. A deferred function should be idempotent because it may be called
multiple times on the same node or even in the same process. More than one deferred
function may be registered per job attempt by calling this method repeatedly with
different arguments. If the same function is registered twice with the same or different
arguments, it will be called twice per job attempt.
.sp
Examples for deferred functions are ones that handle cleanup of resources external to
Toil, like Docker containers, files outside the work directory, etc.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBfunction\fP (\fIcallable\fP) \-\- The function to be called after this job concludes.
.IP \(bu 2
\fBargs\fP (\fI\%list\fP) \-\- The arguments to the function
.IP \(bu 2
\fBkwargs\fP (\fI\%dict\fP) \-\- The keyword arguments to the function
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getTopologicalOrderingOfJobs()
.INDENT 7.0
.TP
.B Returns
a list of jobs such that for all pairs of indices i, j for which i < j,         the job at index i can be run before the job at index j.
.UNINDENT
.sp
Only considers jobs in this job\(aqs subgraph that are newly added, not loaded from the job store.
.sp
Ignores service jobs.
.INDENT 7.0
.TP
.B Return type
\fI\%list\fP[\fI\%Job\fP]
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B saveBody(jobStore)
Save the execution data for just this job to the JobStore, and fill in
the JobDescription with the information needed to retrieve it.
.sp
The Job\(aqs JobDescription must have already had a real jobStoreID assigned to it.
.sp
Does not save the JobDescription.
.INDENT 7.0
.TP
.B Parameters
\fBjobStore\fP (\fI\%toil.jobStores.abstractJobStore.AbstractJobStore\fP) \-\- The job store
to save the job body into.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B saveAsRootJob(jobStore:  \fI\%AbstractJobStore\fP) -> \fI\%JobDescription\fP
Save this job to the given jobStore as the root job of the workflow.
.INDENT 7.0
.TP
.B Returns
the JobDescription describing this job.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  loadJob(jobStore:  \fI\%AbstractJobStore\fP, jobDescription:  \fI\%JobDescription\fP) -> \fI\%Job\fP
Retrieves a \fI\%toil.job.Job\fP instance from a JobStore
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBjobStore\fP \-\- The job store.
.IP \(bu 2
\fBjobDescription\fP \-\- the JobDescription of the job to retrieve.
.UNINDENT
.TP
.B Returns
The job referenced by the JobDescription.
.UNINDENT
.UNINDENT
.UNINDENT
.SS JobDescription
.sp
The class used to store all the information that the Toil Leader ever needs to
know about a Job.
.INDENT 0.0
.TP
.B class  toil.job.JobDescription(requirements:  \fI\%Mapping\fP[\fI\%str\fP,  \fI\%Union\fP[\fI\%int\fP,  \fI\%str\fP,  \fI\%bool\fP]], jobName:  \fI\%str\fP, unitName:  \fI\%str\fP  =  \(aq\(aq, displayName:  \fI\%str\fP  =  \(aq\(aq, command:  \fI\%Optional\fP[\fI\%str\fP]  =  None)
Stores all the information that the Toil Leader ever needs to know about a Job.
.sp
(requirements information, dependency information, commands to issue,
etc.)
.sp
Can be obtained from an actual (i.e. executable) Job object, and can be
used to obtain the Job object from the JobStore.
.sp
Never contains other Jobs or JobDescriptions: all reference is by ID.
.sp
Subclassed into variants for checkpoint jobs and service jobs that have
their specific parameters.
.INDENT 7.0
.TP
.B __init__(requirements:  \fI\%Mapping\fP[\fI\%str\fP,  \fI\%Union\fP[\fI\%int\fP,  \fI\%str\fP,  \fI\%bool\fP]], jobName:  \fI\%str\fP, unitName:  \fI\%str\fP  =  \(aq\(aq, displayName:  \fI\%str\fP  =  \(aq\(aq, command:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%None\fP
Create a new JobDescription.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBrequirements\fP \-\- Dict from string to number, string, or bool
describing the resource requirements of the job. \(aqcores\(aq, \(aqmemory\(aq,
\(aqdisk\(aq, and \(aqpreemptible\(aq fields, if set, are parsed and broken out
into properties. If unset, the relevant property will be
unspecified, and will be pulled from the assigned Config object if
queried (see \fBtoil.job.Requirer.assignConfig()\fP).
.IP \(bu 2
\fBjobName\fP \-\- Name of the kind of job this is. May be used in job
store IDs and logging. Also used to let the cluster scaler learn a
model for how long the job will take. Ought to be the job class\(aqs
name if no real user\-defined name is available.
.IP \(bu 2
\fBunitName\fP \-\- Name of this instance of this kind of job. May
appear with jobName in logging.
.IP \(bu 2
\fBdisplayName\fP \-\- A human\-readable name to identify this
particular job instance. Ought to be the job class\(aqs name
if no real user\-defined name is available.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B serviceHostIDsInBatches() -> \fI\%Iterator\fP[\fI\%List\fP[\fI\%str\fP]]
Find all batches of service host job IDs that can be started at the same time.
.sp
(in the order they need to start in)
.UNINDENT
.INDENT 7.0
.TP
.B successorsAndServiceHosts() -> \fI\%Iterator\fP[\fI\%str\fP]
Get an iterator over all child, follow\-on, and service job IDs.
.UNINDENT
.INDENT 7.0
.TP
.B allSuccessors()
Get an iterator over all child and follow\-on job IDs.
.UNINDENT
.INDENT 7.0
.TP
.B property  services
Get a collection of the IDs of service host jobs for this job, in arbitrary order.
.sp
Will be empty if the job has no unfinished services.
.UNINDENT
.INDENT 7.0
.TP
.B nextSuccessors() -> \fI\%List\fP[\fI\%str\fP]
Return the collection of job IDs for the successors of this job that are ready to run.
.sp
If those jobs have multiple predecessor relationships, they may still
be blocked on other jobs.
.sp
Returns None when at the final phase (all successors done), and an
empty collection if there are more phases but they can\(aqt be entered yet
(e.g. because we are waiting for the job itself to run).
.UNINDENT
.INDENT 7.0
.TP
.B property  stack:  \fI\%Tuple\fP[\fI\%Tuple\fP[\fI\%str\fP,  \&...],  \&...]
Get IDs of successors that need to run still.
.sp
Batches of successors are in reverse order of the order they need to run in.
.sp
Some successors in each batch may have already been finished. Batches may be empty.
.sp
Exists so that code that used the old stack list immutably can work
still. New development should use nextSuccessors(), and all mutations
should use filterSuccessors() (which automatically removes completed
phases).
.INDENT 7.0
.TP
.B Returns
Batches of successors that still need to run, in reverse
order. An empty batch may exist under a non\-empty batch, or at the top
when the job itself is not done.
.TP
.B Return type
\fI\%tuple\fP(\fI\%tuple\fP(\fI\%str\fP))
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B filterSuccessors(predicate:  \fI\%Callable\fP[[\fI\%str\fP],  \fI\%bool\fP]) -> \fI\%None\fP
Keep only successor jobs for which the given predicate function approves.
.sp
The predicate function is called with the job\(aqs ID.
.sp
Treats all other successors as complete and forgets them.
.UNINDENT
.INDENT 7.0
.TP
.B filterServiceHosts(predicate:  \fI\%Callable\fP[[\fI\%str\fP],  \fI\%bool\fP]) -> \fI\%None\fP
Keep only services for which the given predicate approves.
.sp
The predicate function is called with the service host job\(aqs ID.
.sp
Treats all other services as complete and forgets them.
.UNINDENT
.INDENT 7.0
.TP
.B clear_nonexistent_dependents(job_store:  \fI\%AbstractJobStore\fP) -> \fI\%None\fP
Remove all references to child, follow\-on, and associated service jobs
that do not exist (i.e. have been completed and removed) in the given
job store.
.UNINDENT
.INDENT 7.0
.TP
.B clear_dependents() -> \fI\%None\fP
Remove all references to child, follow\-on, and associated service jobs.
.UNINDENT
.INDENT 7.0
.TP
.B is_subtree_done() -> \fI\%bool\fP
Return True if the job appears to be done, and all related child,
follow\-on, and service jobs appear to be finished and removed.
.UNINDENT
.INDENT 7.0
.TP
.B replace(other:  \fI\%JobDescription\fP) -> \fI\%None\fP
Take on the ID of another JobDescription, retaining our own state and type.
.sp
When updated in the JobStore, we will save over the other JobDescription.
.sp
Useful for chaining jobs: the chained\-to job can replace the parent job.
.sp
Merges cleanup state from the job being replaced into this one.
.INDENT 7.0
.TP
.B Parameters
\fBother\fP \-\- Job description to replace.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addChild(childID:  \fI\%str\fP) -> \fI\%None\fP
Make the job with the given ID a child of the described job.
.UNINDENT
.INDENT 7.0
.TP
.B addFollowOn(followOnID:  \fI\%str\fP) -> \fI\%None\fP
Make the job with the given ID a follow\-on of the described job.
.UNINDENT
.INDENT 7.0
.TP
.B addServiceHostJob(serviceID, parentServiceID=None)
Make the ServiceHostJob with the given ID a service of the described job.
.sp
If a parent ServiceHostJob ID is given, that parent service will be started
first, and must have already been added.
.UNINDENT
.INDENT 7.0
.TP
.B hasChild(childID:  \fI\%str\fP) -> \fI\%bool\fP
Return True if the job with the given ID is a child of the described job.
.UNINDENT
.INDENT 7.0
.TP
.B hasFollowOn(followOnID:  \fI\%str\fP) -> \fI\%bool\fP
Test if the job with the given ID is a follow\-on of the described job.
.UNINDENT
.INDENT 7.0
.TP
.B hasServiceHostJob(serviceID) -> \fI\%bool\fP
Test if the ServiceHostJob is a service of the described job.
.UNINDENT
.INDENT 7.0
.TP
.B renameReferences(renames:  \fI\%Dict\fP[TemporaryID,  \fI\%str\fP]) -> \fI\%None\fP
Apply the given dict of ID renames to all references to jobs.
.sp
Does not modify our own ID or those of finished predecessors.
IDs not present in the renames dict are left as\-is.
.INDENT 7.0
.TP
.B Parameters
\fBrenames\fP \-\- Rename operations to apply.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B addPredecessor() -> \fI\%None\fP
Notify the JobDescription that a predecessor has been added to its Job.
.UNINDENT
.INDENT 7.0
.TP
.B onRegistration(jobStore:  \fI\%AbstractJobStore\fP) -> \fI\%None\fP
Called by the Job saving logic when this JobDescription meets the JobStore and has its ID assigned.
.sp
Overridden to perform setup work (like hooking up flag files for service
jobs) that requires the JobStore.
.INDENT 7.0
.TP
.B Parameters
\fBjobStore\fP \-\- The job store we are being placed into
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B setupJobAfterFailure(exit_status:  \fI\%Optional\fP[\fI\%int\fP]  =  None, exit_reason:  \fI\%Optional\fP[BatchJobExitReason]  =  None)
Reduce the remainingTryCount if greater than zero and set the memory
to be at least as big as the default memory (in case of exhaustion of memory,
which is common).
.sp
Requires a configuration to have been assigned (see \fBtoil.job.Requirer.assignConfig()\fP).
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBexit_status\fP \-\- The exit code from the job.
.IP \(bu 2
\fBexit_reason\fP \-\- The reason the job stopped, if available from the batch system.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getLogFileHandle(jobStore)
Returns a context manager that yields a file handle to the log file.
.sp
Assumes logJobStoreFileID is set.
.UNINDENT
.INDENT 7.0
.TP
.B property  remainingTryCount
The try count set on the JobDescription, or the default based on the
retry count from the config if none is set.
.UNINDENT
.INDENT 7.0
.TP
.B clearRemainingTryCount() -> \fI\%bool\fP
Clear remainingTryCount and set it back to its default value.
.INDENT 7.0
.TP
.B Returns
True if a modification to the JobDescription was made, and
False otherwise.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B pre_update_hook() -> \fI\%None\fP
Called by the job store before pickling and saving a created or updated
version of a job.
.UNINDENT
.INDENT 7.0
.TP
.B get_job_kind() -> \fI\%str\fP
Returns an identifier of the job for use with the message bus.
Either the unit name, job name, or display name, which identifies
the kind of job it is to toil.
.sp
Otherwise returns Unknown Job in case no identifier is available
.UNINDENT
.UNINDENT
.SH JOB.RUNNER API
.sp
The Runner contains the methods needed to configure and start a Toil run.
.INDENT 0.0
.TP
.B class  Job.Runner
Used to setup and run Toil workflow.
.INDENT 7.0
.TP
.B static  getDefaultArgumentParser() -> \fI\%ArgumentParser\fP
Get argument parser with added toil workflow options.
.INDENT 7.0
.TP
.B Returns
The argument parser used by a toil workflow with added Toil options.
.TP
.B Return type
\fI\%argparse.ArgumentParser\fP
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B static  getDefaultOptions(jobStore:  \fI\%str\fP) -> \fI\%Namespace\fP
Get default options for a toil workflow.
.INDENT 7.0
.TP
.B Parameters
\fBjobStore\fP (\fIstring\fP) \-\- A string describing the jobStore             for the workflow.
.TP
.B Returns
The options used by a toil workflow.
.TP
.B Return type
argparse.ArgumentParser values object
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B static  addToilOptions(parser)
Adds the default toil options to an \fI\%optparse\fP or \fI\%argparse\fP
parser object.
.INDENT 7.0
.TP
.B Parameters
\fBparser\fP (\fI\%optparse.OptionParser\fP\fI or \fP\fI\%argparse.ArgumentParser\fP) \-\- Options object to add toil options to.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B static  startToil(job, options)
Run the toil workflow using the given options.
.sp
Deprecated by toil.common.Toil.start.
.sp
(see Job.Runner.getDefaultOptions and Job.Runner.addToilOptions) starting with this
job.
:param toil.job.Job job: root job of the workflow
:raises: toil.leader.FailedJobsException if at the end of function             their remain failed jobs.
:return: The return value of the root job\(aqs run function.
:rtype: Any
.UNINDENT
.UNINDENT
.SH JOB.FILESTORE API
.sp
The AbstractFileStore is an abstraction of a Toil run\(aqs shared storage.
.INDENT 0.0
.TP
.B class  toil.fileStores.abstractFileStore.AbstractFileStore(jobStore:  \fI\%AbstractJobStore\fP, jobDesc:  \fI\%JobDescription\fP, file_store_dir:  \fI\%str\fP, waitForPreviousCommit:  \fI\%Callable\fP[[],  \fI\%Any\fP])
Interface used to allow user code run by Toil to read and write files.
.sp
Also provides the interface to other Toil facilities used by user code,
including:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
normal (non\-real\-time) logging
.IP \(bu 2
finding the correct temporary directory for scratch work
.IP \(bu 2
importing and exporting files into and out of the workflow
.UNINDENT
.UNINDENT
.UNINDENT
.sp
Stores user files in the jobStore, but keeps them separate from actual
jobs.
.sp
May implement caching.
.sp
Passed as argument to the \fI\%toil.job.Job.run()\fP method.
.sp
Access to files is only permitted inside the context manager provided by
\fI\%toil.fileStores.abstractFileStore.AbstractFileStore.open()\fP\&.
.sp
Also responsible for committing completed jobs back to the job store with
an update operation, and allowing that commit operation to be waited for.
.INDENT 7.0
.TP
.B __init__(jobStore:  \fI\%AbstractJobStore\fP, jobDesc:  \fI\%JobDescription\fP, file_store_dir:  \fI\%str\fP, waitForPreviousCommit:  \fI\%Callable\fP[[],  \fI\%Any\fP]) -> \fI\%None\fP
Create a new file store object.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBjobStore\fP \-\- the job store in use for the current Toil run.
.IP \(bu 2
\fBjobDesc\fP \-\- the JobDescription object for the currently
running job.
.IP \(bu 2
\fBfile_store_dir\fP \-\- the per\-worker local temporary directory where
the file store should store local files. Per\-job directories will be
created under here by the file store.
.IP \(bu 2
\fBwaitForPreviousCommit\fP \-\- the waitForCommit method of the previous job\(aqs
file store, when jobs are running in sequence on the same worker. Used to
prevent this file store\(aqs startCommit and the previous job\(aqs
startCommit methods from running at the same time and racing. If
they did race, it might be possible for the later job to be fully
marked as completed in the job store before the eralier job was.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B static  createFileStore(jobStore:  \fI\%AbstractJobStore\fP, jobDesc:  \fI\%JobDescription\fP, file_store_dir:  \fI\%str\fP, waitForPreviousCommit:  \fI\%Callable\fP[[],  \fI\%Any\fP], caching:  \fI\%Optional\fP[\fI\%bool\fP]) -> \fI\%Union\fP[NonCachingFileStore,  CachingFileStore]
Create a concreate FileStore.
.UNINDENT
.INDENT 7.0
.TP
.B static  shutdownFileStore(workflowID:  \fI\%str\fP, config_work_dir:  \fI\%Optional\fP[\fI\%str\fP], config_coordination_dir:  \fI\%Optional\fP[\fI\%str\fP]) -> \fI\%None\fP
Carry out any necessary filestore\-specific cleanup.
.sp
This is a destructive operation and it is important to ensure that there
are no other running processes on the system that are modifying or using
the file store for this workflow.
.sp
This is the intended to be the last call to the file store in a Toil run,
called by the batch system cleanup function upon batch system shutdown.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBworkflowID\fP \-\- The workflow ID for this invocation of the workflow
.IP \(bu 2
\fBconfig_work_dir\fP \-\- The path to the work directory in the Toil Config.
.IP \(bu 2
\fBconfig_coordination_dir\fP \-\- The path to the coordination directory in the Toil Config.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B open(job:  \fI\%Job\fP) -> \fI\%Generator\fP[\fI\%None\fP,  \fI\%None\fP,  \fI\%None\fP]
Create the context manager around tasks prior and after a job has been run.
.sp
File operations are only permitted inside the context manager.
.sp
Implementations must only yield from within \fIwith super().open(job):\fP\&.
.INDENT 7.0
.TP
.B Parameters
\fBjob\fP \-\- The job instance of the toil job to run.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getLocalTempDir() -> \fI\%str\fP
Get a new local temporary directory in which to write files.
.sp
The directory will only persist for the duration of the job.
.INDENT 7.0
.TP
.B Returns
The absolute path to a new local temporary directory. This directory
will exist for the duration of the job only, and is guaranteed
to be deleted once the job terminates, removing all files it
contains recursively.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getLocalTempFile(suffix:  \fI\%Optional\fP[\fI\%str\fP]  =  None, prefix:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%str\fP
Get a new local temporary file that will persist for the duration of the job.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBsuffix\fP \-\- If not None, the file name will end with this string.
Otherwise, default value \(dq.tmp\(dq will be used
.IP \(bu 2
\fBprefix\fP \-\- If not None, the file name will start with this string.
Otherwise, default value \(dqtmp\(dq will be used
.UNINDENT
.TP
.B Returns
The absolute path to a local temporary file. This file will exist
for the duration of the job only, and is guaranteed to be deleted
once the job terminates.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getLocalTempFileName(suffix:  \fI\%Optional\fP[\fI\%str\fP]  =  None, prefix:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%str\fP
Get a valid name for a new local file. Don\(aqt actually create a file at the path.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBsuffix\fP \-\- If not None, the file name will end with this string.
Otherwise, default value \(dq.tmp\(dq will be used
.IP \(bu 2
\fBprefix\fP \-\- If not None, the file name will start with this string.
Otherwise, default value \(dqtmp\(dq will be used
.UNINDENT
.TP
.B Returns
Path to valid file
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  writeGlobalFile(localFileName:  \fI\%str\fP, cleanup:  \fI\%bool\fP  =  False) -> \fI\%FileID\fP
Upload a file (as a path) to the job store.
.sp
If the file is in a FileStore\-managed temporary directory (i.e. from
\fI\%toil.fileStores.abstractFileStore.AbstractFileStore.getLocalTempDir()\fP),
it will become a local copy of the file, eligible for deletion by
\fI\%toil.fileStores.abstractFileStore.AbstractFileStore.deleteLocalFile()\fP\&.
.sp
If an executable file on the local filesystem is uploaded, its executability
will be preserved when it is downloaded again.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBlocalFileName\fP \-\- The path to the local file to upload. The
last path component (basename of the file) will remain
associated with the file in the file store, if supported by the
backing JobStore, so that the file can be searched for by name
or name glob.
.IP \(bu 2
\fBcleanup\fP \-\- if True then the copy of the global file will be deleted once
the job and all its successors have completed running.  If not the global
file must be deleted manually.
.UNINDENT
.TP
.B Returns
an ID that can be used to retrieve the file.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B writeGlobalFileStream(cleanup:  \fI\%bool\fP  =  False, basename:  \fI\%Optional\fP[\fI\%str\fP]  =  None, encoding:  \fI\%Optional\fP[\fI\%str\fP]  =  None, errors:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%Iterator\fP[\fI\%Tuple\fP[WriteWatchingStream,  \fI\%FileID\fP]]
Similar to writeGlobalFile, but allows the writing of a stream to the job store.
The yielded file handle does not need to and should not be closed explicitly.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBencoding\fP \-\- The name of the encoding used to decode the file. Encodings
are the same as for decode(). Defaults to None which represents binary mode.
.IP \(bu 2
\fBerrors\fP \-\- Specifies how encoding errors are to be handled. Errors are the
same as for open(). Defaults to \(aqstrict\(aq when an encoding is specified.
.IP \(bu 2
\fBcleanup\fP \-\- is as in
\fI\%toil.fileStores.abstractFileStore.AbstractFileStore.writeGlobalFile()\fP\&.
.IP \(bu 2
\fBbasename\fP \-\- If supported by the backing JobStore, use the given
file basename so that when searching the job store with a query
matching that basename, the file will be detected.
.UNINDENT
.TP
.B Returns
A context manager yielding a tuple of
1) a file handle which can be written to and
2) the toil.fileStores.FileID of the resulting file in the job store.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B logAccess(fileStoreID:  \fI\%Union\fP[\fI\%FileID\fP,  \fI\%str\fP], destination:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%None\fP
Record that the given file was read by the job.
.sp
(to be announced if the job fails)
.sp
If destination is not None, it gives the path that the file
was downloaded to. Otherwise, assumes that the file was streamed.
.sp
Must be called by \fI\%readGlobalFile()\fP and \fI\%readGlobalFileStream()\fP
implementations.
.UNINDENT
.INDENT 7.0
.TP
.B abstract  readGlobalFile(fileStoreID:  \fI\%str\fP, userPath:  \fI\%Optional\fP[\fI\%str\fP]  =  None, cache:  \fI\%bool\fP  =  True, mutable:  \fI\%bool\fP  =  False, symlink:  \fI\%bool\fP  =  False) -> \fI\%str\fP
Make the file associated with fileStoreID available locally.
.sp
If mutable is True, then a copy of the file will be created locally so
that the original is not modified and does not change the file for other
jobs. If mutable is False, then a link can be created to the file, saving
disk resources. The file that is downloaded will be executable if and only
if it was originally uploaded from an executable file on the local filesystem.
.sp
If a user path is specified, it is used as the destination. If a user path isn\(aqt
specified, the file is stored in the local temp directory with an encoded name.
.sp
The destination file must not be deleted by the user; it can only be
deleted through deleteLocalFile.
.sp
Implementations must call \fI\%logAccess()\fP to report the download.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBfileStoreID\fP \-\- job store id for the file
.IP \(bu 2
\fBuserPath\fP \-\- a path to the name of file to which the global file will
be copied or hard\-linked (see below).
.IP \(bu 2
\fBcache\fP \-\- Described in
\fBtoil.fileStores.CachingFileStore.readGlobalFile()\fP
.IP \(bu 2
\fBmutable\fP \-\- Described in
\fBtoil.fileStores.CachingFileStore.readGlobalFile()\fP
.UNINDENT
.TP
.B Returns
An absolute path to a local, temporary copy of the file keyed
by fileStoreID.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  readGlobalFileStream(fileStoreID:  \fI\%str\fP, encoding:  \fI\%Optional\fP[\fI\%str\fP]  =  None, errors:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%ContextManager\fP[\fI\%Union\fP[\fI\%IO\fP[\fI\%bytes\fP],  \fI\%IO\fP[\fI\%str\fP]]]
Read a stream from the job store; similar to readGlobalFile.
.sp
The yielded file handle does not need to and should not be closed explicitly.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBencoding\fP \-\- the name of the encoding used to decode the file. Encodings
are the same as for decode(). Defaults to None which represents binary mode.
.IP \(bu 2
\fBerrors\fP \-\- an optional string that specifies how encoding errors are
to be handled. Errors are the same as for open(). Defaults to \(aqstrict\(aq
when an encoding is specified.
.UNINDENT
.UNINDENT
.sp
Implementations must call \fI\%logAccess()\fP to report the download.
.INDENT 7.0
.TP
.B Returns
a context manager yielding a file handle which can be read from.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getGlobalFileSize(fileStoreID:  \fI\%Union\fP[\fI\%FileID\fP,  \fI\%str\fP]) -> \fI\%int\fP
Get the size of the file pointed to by the given ID, in bytes.
.sp
If a FileID or something else with a non\-None \(aqsize\(aq field, gets that.
.sp
Otherwise, asks the job store to poll the file\(aqs size.
.sp
Note that the job store may overestimate the file\(aqs size, for example
if it is encrypted and had to be augmented with an IV or other
encryption framing.
.INDENT 7.0
.TP
.B Parameters
\fBfileStoreID\fP \-\- File ID for the file
.TP
.B Returns
File\(aqs size in bytes, as stored in the job store
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  deleteLocalFile(fileStoreID:  \fI\%Union\fP[\fI\%FileID\fP,  \fI\%str\fP]) -> \fI\%None\fP
Delete local copies of files associated with the provided job store ID.
.sp
Raises an OSError with an errno of errno.ENOENT if no such local copies
exist. Thus, cannot be called multiple times in succession.
.sp
The files deleted are all those previously read from this file ID via
readGlobalFile by the current job into the job\(aqs file\-store\-provided
temp directory, plus the file that was written to create the given file
ID, if it was written by the current job from the job\(aqs
file\-store\-provided temp directory.
.INDENT 7.0
.TP
.B Parameters
\fBfileStoreID\fP \-\- File Store ID of the file to be deleted.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  deleteGlobalFile(fileStoreID:  \fI\%Union\fP[\fI\%FileID\fP,  \fI\%str\fP]) -> \fI\%None\fP
Delete local files and then permanently deletes them from the job store.
.sp
To ensure that the job can be restarted if necessary, the delete will not
happen until after the job\(aqs run method has completed.
.INDENT 7.0
.TP
.B Parameters
\fBfileStoreID\fP \-\- the File Store ID of the file to be deleted.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B logToMaster(text:  \fI\%str\fP, level:  \fI\%int\fP  =  20) -> \fI\%None\fP
Send a logging message to the leader. The message will also be         logged by the worker at the same level.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBtext\fP \-\- The string to log.
.IP \(bu 2
\fBlevel\fP \-\- The logging level.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  startCommit(jobState:  \fI\%bool\fP  =  False) -> \fI\%None\fP
Update the status of the job on the disk.
.sp
May start an asynchronous process. Call waitForCommit() to wait on that process.
.INDENT 7.0
.TP
.B Parameters
\fBjobState\fP \-\- If True, commit the state of the FileStore\(aqs job,
and file deletes. Otherwise, commit only file creates/updates.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  waitForCommit() -> \fI\%bool\fP
Blocks while startCommit is running.
.sp
This function is called by this job\(aqs successor to ensure that it does
not begin modifying the job store until after this job has finished doing so.
.sp
Might be called when startCommit is never called on a particular
instance, in which case it does not block.
.INDENT 7.0
.TP
.B Returns
Always returns True
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  classmethod  shutdown(shutdown_info:  \fI\%Any\fP) -> \fI\%None\fP
Shutdown the filestore on this node.
.sp
This is intended to be called on batch system shutdown.
.INDENT 7.0
.TP
.B Parameters
\fBshutdown_info\fP \-\- The implementation\-specific shutdown information,
for shutting down the file store and removing all its state and all job
local temp directories from the node.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B class  toil.fileStores.FileID(fileStoreID:  \fI\%str\fP, *args:  \fI\%Any\fP)
A small wrapper around Python\(aqs builtin string class.
.sp
It is used to represent a file\(aqs ID in the file store, and has a size attribute
that is the file\(aqs size in bytes. This object is returned by importFile and
writeGlobalFile.
.sp
Calls into the file store can use bare strings; size will be queried from
the job store if unavailable in the ID.
.INDENT 7.0
.TP
.B __init__(fileStoreID:  \fI\%str\fP, size:  \fI\%int\fP, executable:  \fI\%bool\fP  =  False) -> \fI\%None\fP
.UNINDENT
.INDENT 7.0
.TP
.B pack() -> \fI\%str\fP
Pack the FileID into a string so it can be passed through external code.
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  unpack(packedFileStoreID:  \fI\%str\fP) -> \fI\%FileID\fP
Unpack the result of pack() into a FileID object.
.UNINDENT
.UNINDENT
.SH BATCH SYSTEM API
.sp
The batch system interface is used by Toil to abstract over different ways of running
batches of jobs, for example Slurm, GridEngine, Mesos, Parasol and a single node. The
\fI\%toil.batchSystems.abstractBatchSystem.AbstractBatchSystem\fP API is implemented to
run jobs using a given job management system, e.g. Mesos.
.SS Batch System Enivronmental Variables
.sp
Environmental variables allow passing of scheduler specific parameters.
.sp
For SLURM there are two environment variables \- the first applies to all jobs,
while the second defined the partition to use for parallel jobs:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
export TOIL_SLURM_ARGS=\(dq\-t 1:00:00 \-q fatq\(dq
export TOIL_SLURM_PE=\(aqmulticore\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For TORQUE there are two environment variables \- one for everything but the resource
requirements, and another \- for resources requirements (without the \fI\-l\fP prefix):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
export TOIL_TORQUE_ARGS=\(dq\-q fatq\(dq
export TOIL_TORQUE_REQS=\(dqwalltime=1:00:00\(dq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For GridEngine (SGE, UGE), there is an additional environmental variable to define the
\fI\%parallel environment\fP
for running multicore jobs:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
export TOIL_GRIDENGINE_PE=\(aqsmp\(aq
export TOIL_GRIDENGINE_ARGS=\(aq\-q batch.q\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For HTCondor, additional parameters can be included in the submit file passed to condor_submit:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
export TOIL_HTCONDOR_PARAMS=\(aqrequirements = TARGET.has_sse4_2 == true; accounting_group = test\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The environment variable is parsed as a semicolon\-separated string of \fBparameter = value\fP pairs.
.SS Batch System API
.INDENT 0.0
.TP
.B class  toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
An abstract base class to represent the interface the batch system must provide to Toil.
.INDENT 7.0
.TP
.B abstract  classmethod  supportsAutoDeployment() -> \fI\%bool\fP
Whether this batch system supports auto\-deployment of the user script itself.
.sp
If it does, the \fI\%setUserScript()\fP can be invoked to set the resource
object representing the user script.
.sp
Note to implementors: If your implementation returns True here, it should also override
.UNINDENT
.INDENT 7.0
.TP
.B abstract  classmethod  supportsWorkerCleanup() -> \fI\%bool\fP
Indicates whether this batch system invokes
\fBBatchSystemSupport.workerCleanup()\fP after the last job for a
particular workflow invocation finishes. Note that the term \fIworker\fP
refers to an entire node, not just a worker process. A worker process
may run more than one job sequentially, and more than one concurrent
worker process may exist on a worker node, for the same workflow. The
batch system is said to \fIshut down\fP after the last worker process
terminates.
.UNINDENT
.INDENT 7.0
.TP
.B setUserScript(userScript:  Resource) -> \fI\%None\fP
Set the user script for this workflow. This method must be called before the first job is
issued to this batch system, and only if \fI\%supportsAutoDeployment()\fP returns True,
otherwise it will raise an exception.
.INDENT 7.0
.TP
.B Parameters
\fBuserScript\fP \-\- the resource object representing the user script
or module and the modules it depends on.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B set_message_bus(message_bus:  MessageBus) -> \fI\%None\fP
Give the batch system an opportunity to connect directly to the message
bus, so that it can send informational messages about the jobs it is
running to other Toil components.
.UNINDENT
.INDENT 7.0
.TP
.B abstract  issueBatchJob(jobDesc:  \fI\%JobDescription\fP, job_environment:  \fI\%Optional\fP[\fI\%Dict\fP[\fI\%str\fP,  \fI\%str\fP]]  =  None) -> \fI\%int\fP
Issues a job with the specified command to the batch system and returns
a unique jobID.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBjobDesc\fP \-\- a toil.job.JobDescription
.IP \(bu 2
\fBjob_environment\fP \-\- a collection of job\-specific environment
variables to be set on the worker.
.UNINDENT
.TP
.B Returns
a unique jobID that can be used to reference the newly issued
job
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  killBatchJobs(jobIDs:  \fI\%List\fP[\fI\%int\fP]) -> \fI\%None\fP
Kills the given job IDs. After returning, the killed jobs will not
appear in the results of getRunningBatchJobIDs. The killed job will not
be returned from getUpdatedBatchJob.
.INDENT 7.0
.TP
.B Parameters
\fBjobIDs\fP \-\- list of IDs of jobs to kill
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  getIssuedBatchJobIDs() -> \fI\%List\fP[\fI\%int\fP]
Gets all currently issued jobs
.INDENT 7.0
.TP
.B Returns
A list of jobs (as jobIDs) currently issued (may be running, or may be
waiting to be run). Despite the result being a list, the ordering should not
be depended upon.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  getRunningBatchJobIDs() -> \fI\%Dict\fP[\fI\%int\fP,  \fI\%float\fP]
Gets a map of jobs as jobIDs that are currently running (not just waiting)
and how long they have been running, in seconds.
.INDENT 7.0
.TP
.B Returns
dictionary with currently running jobID keys and how many seconds they have
been running as the value
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  getUpdatedBatchJob(maxWait:  \fI\%int\fP) -> \fI\%Optional\fP[UpdatedBatchJobInfo]
Returns information about job that has updated its status (i.e. ceased
running, either successfully or with an error). Each such job will be
returned exactly once.
.sp
Does not return info for jobs killed by killBatchJobs, although they
may cause None to be returned earlier than maxWait.
.INDENT 7.0
.TP
.B Parameters
\fBmaxWait\fP \-\- the number of seconds to block, waiting for a result
.TP
.B Returns
If a result is available, returns UpdatedBatchJobInfo.
Otherwise it returns None. wallTime is the number of seconds (a strictly
positive float) in wall\-clock time the job ran for, or None if this
batch system does not support tracking wall time.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getSchedulingStatusMessage() -> \fI\%Optional\fP[\fI\%str\fP]
Get a log message fragment for the user about anything that might be
going wrong in the batch system, if available.
.sp
If no useful message is available, return None.
.sp
This can be used to report what resource is the limiting factor when
scheduling jobs, for example. If the leader thinks the workflow is
stuck, the message can be displayed to the user to help them diagnose
why it might be stuck.
.INDENT 7.0
.TP
.B Returns
User\-directed message about scheduling state.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  shutdown() -> \fI\%None\fP
Called at the completion of a toil invocation.
Should cleanly terminate all worker threads.
.UNINDENT
.INDENT 7.0
.TP
.B setEnv(name:  \fI\%str\fP, value:  \fI\%Optional\fP[\fI\%str\fP]  =  None) -> \fI\%None\fP
Set an environment variable for the worker process before it is launched. The worker
process will typically inherit the environment of the machine it is running on but this
method makes it possible to override specific variables in that inherited environment
before the worker is launched. Note that this mechanism is different to the one used by
the worker internally to set up the environment of a job. A call to this method affects
all jobs issued after this method returns. Note to implementors: This means that you
would typically need to copy the variables before enqueuing a job.
.sp
If no value is provided it will be looked up from the current environment.
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  add_options(parser:  \fI\%Union\fP[\fI\%ArgumentParser\fP,  _ArgumentGroup]) -> \fI\%None\fP
If this batch system provides any command line options, add them to the given parser.
.UNINDENT
.INDENT 7.0
.TP
.B classmethod  setOptions(setOption:  OptionSetter) -> \fI\%None\fP
Process command line or configuration options relevant to this batch system.
.INDENT 7.0
.TP
.B Parameters
\fBsetOption\fP \-\- A function with signature
setOption(option_name, parsing_function=None, check_function=None, default=None, env=None)
returning nothing, used to update run configuration as a side effect.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B getWorkerContexts() -> \fI\%List\fP[\fI\%ContextManager\fP[\fI\%Any\fP]]
Get a list of picklable context manager objects to wrap worker work in,
in order.
.sp
Can be used to ask the Toil worker to do things in\-process (such as
configuring environment variables, hot\-deploying user scripts, or
cleaning up a node) that would otherwise require a wrapping \(dqexecutor\(dq
process.
.UNINDENT
.UNINDENT
.SH JOB.SERVICE API
.sp
The Service class allows databases and servers to be spawned within a Toil workflow.
.INDENT 0.0
.TP
.B class  Job.Service(memory=None, cores=None, disk=None, accelerators=None, preemptible=None, unitName=None)
Abstract class used to define the interface to a service.
.sp
Should be subclassed by the user to define services.
.sp
Is not executed as a job; runs within a ServiceHostJob.
.INDENT 7.0
.TP
.B __init__(memory=None, cores=None, disk=None, accelerators=None, preemptible=None, unitName=None)
Memory, core and disk requirements are specified identically to as in             \fI\%toil.job.Job.__init__()\fP\&.
.UNINDENT
.INDENT 7.0
.TP
.B abstract  start(job)
Start the service.
.INDENT 7.0
.TP
.B Parameters
\fBjob\fP (\fI\%toil.job.Job\fP) \-\- The underlying host job that the service is being run in.
Can be used to register deferred functions, or to access
the fileStore for creating temporary files.
.TP
.B Returns
An object describing how to access the service. The object must be pickleable
and will be used by jobs to access the service (see \fI\%toil.job.Job.addService()\fP).
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B abstract  stop(job)
Stops the service. Function can block until complete.
.INDENT 7.0
.TP
.B Parameters
\fBjob\fP (\fI\%toil.job.Job\fP) \-\- The underlying host job that the service is being run in.
Can be used to register deferred functions, or to access
the fileStore for creating temporary files.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B check()
Checks the service is still running.
.INDENT 7.0
.TP
.B Raises
\fBexceptions.RuntimeError\fP \-\- If the service failed, this will cause the service job to be labeled failed.
.TP
.B Returns
True if the service is still running, else False. If False then the service job will be terminated,
and considered a success. Important point: if the service job exits due to a failure, it should raise a
RuntimeError, not return False!
.UNINDENT
.UNINDENT
.UNINDENT
.SH EXCEPTIONS API
.sp
Toil specific exceptions.
.INDENT 0.0
.TP
.B exception  toil.job.JobException(message:  \fI\%str\fP)
General job exception.
.INDENT 7.0
.TP
.B __init__(message:  \fI\%str\fP) -> \fI\%None\fP
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B exception  toil.job.JobGraphDeadlockException(string)
An exception raised in the event that a workflow contains an unresolvable     dependency, such as a cycle. See \fI\%toil.job.Job.checkJobGraphForDeadlocks()\fP\&.
.INDENT 7.0
.TP
.B __init__(string)
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B exception  toil.jobStores.abstractJobStore.ConcurrentFileModificationException(jobStoreFileID:  \fI\%FileID\fP)
Indicates that the file was attempted to be modified by multiple processes at once.
.INDENT 7.0
.TP
.B __init__(jobStoreFileID:  \fI\%FileID\fP)
.INDENT 7.0
.TP
.B Parameters
\fBjobStoreFileID\fP \-\- the ID of the file that was modified by multiple workers
or processes concurrently
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B exception  toil.jobStores.abstractJobStore.JobStoreExistsException(locator:  \fI\%str\fP)
Indicates that the specified job store already exists.
.INDENT 7.0
.TP
.B __init__(locator:  \fI\%str\fP)
.INDENT 7.0
.TP
.B Parameters
\fBlocator\fP (\fI\%str\fP) \-\- The location of the job store
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B exception  toil.jobStores.abstractJobStore.NoSuchFileException(jobStoreFileID:  \fI\%FileID\fP, customName:  \fI\%Optional\fP[\fI\%str\fP]  =  None, *extra:  \fI\%Any\fP)
Indicates that the specified file does not exist.
.INDENT 7.0
.TP
.B __init__(jobStoreFileID:  \fI\%FileID\fP, customName:  \fI\%Optional\fP[\fI\%str\fP]  =  None, *extra:  \fI\%Any\fP)
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBjobStoreFileID\fP \-\- the ID of the file that was mistakenly assumed to exist
.IP \(bu 2
\fBcustomName\fP \-\- optionally, an alternate name for the nonexistent file
.IP \(bu 2
\fBextra\fP (\fI\%list\fP) \-\- optional extra information to add to the error message
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B exception  toil.jobStores.abstractJobStore.NoSuchJobException(jobStoreID:  \fI\%FileID\fP)
Indicates that the specified job does not exist.
.INDENT 7.0
.TP
.B __init__(jobStoreID:  \fI\%FileID\fP)
.INDENT 7.0
.TP
.B Parameters
\fBjobStoreID\fP (\fI\%str\fP) \-\- the jobStoreID that was mistakenly assumed to exist
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B exception  toil.jobStores.abstractJobStore.NoSuchJobStoreException(locator:  \fI\%str\fP)
Indicates that the specified job store does not exist.
.INDENT 7.0
.TP
.B __init__(locator:  \fI\%str\fP)
.INDENT 7.0
.TP
.B Parameters
\fBlocator\fP (\fI\%str\fP) \-\- The location of the job store
.UNINDENT
.UNINDENT
.UNINDENT
.SH RUNNING TESTS
.sp
Test make targets, invoked as \fB$ make <target>\fP, subject to which
environment variables are set (see \fI\%Running Integration Tests\fP).
.TS
center;
|l|l|.
_
T{
TARGET
T}	T{
DESCRIPTION
T}
_
T{
test
T}	T{
Invokes all tests.
T}
_
T{
integration_test
T}	T{
Invokes only the integration tests.
T}
_
T{
test_offline
T}	T{
Skips building the Docker appliance and only
invokes tests that have no docker dependencies.
T}
_
T{
integration_test_local
T}	T{
Makes integration tests easier to debug locally
by running the integration tests serially and
doesn\(aqt redirect output. This makes it appears on
the terminal as expected.
T}
_
.TE
.sp
Before running tests for the first time, initialize your virtual environment
following the steps in \fI\%Building from Source\fP\&.
.sp
Run all tests (including slow tests):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ make test
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Run only quick tests (as of Jul 25, 2018, this was ~ 20 minutes):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ export TOIL_TEST_QUICK=True; make test
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Run an individual test with:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ make test tests=src/toil/test/sort/sortTest.py::SortTest::testSort
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The default value for \fBtests\fP is \fB\(dqsrc\(dq\fP which includes all tests in the
\fBsrc/\fP subdirectory of the project root. Tests that require a particular
feature will be skipped implicitly. If you want to explicitly skip tests that
depend on a currently installed \fIfeature\fP, use
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ make test tests=\(dq\-m \(aqnot aws\(aq src\(dq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This will run only the tests that don\(aqt depend on the \fBaws\fP extra, even if
that extra is currently installed. Note the distinction between the terms
\fIfeature\fP and \fIextra\fP\&. Every extra is a feature but there are features that are
not extras, such as the \fBgridengine\fP and \fBparasol\fP features.  To skip tests
involving both the \fBparasol\fP feature and the \fBaws\fP extra, use the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ make test tests=\(dq\-m \(aqnot aws and not parasol\(aq src\(dq
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Running Tests with pytest
.sp
Often it is simpler to use pytest directly, instead of calling the \fBmake\fP wrapper.
This usually works as expected, but some tests need some manual preparation. To run a specific test with pytest,
use the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
python3 \-m pytest src/toil/test/sort/sortTest.py::SortTest::testSort
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For more information, see the \fI\%pytest documentation\fP\&.
.SS Running Integration Tests
.sp
These tests are generally only run using in our CI workflow due to their resource requirements and cost. However, they
can be made available for local testing:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
Running tests that make use of Docker (e.g. autoscaling tests and Docker tests) require an appliance image to be
hosted. First, make sure you have gone through the set up found in \fI\%Using Docker with Quay\fP\&.
Then to build and host the appliance image run the \fBmake\fP target \fBpush_docker\fP\&.
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
$ make push_docker
.ft P
.fi
.UNINDENT
.UNINDENT
.IP \(bu 2
Running integration tests require activation via an environment variable as well as exporting information relevant to
the desired tests. Enable the integration tests:
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
$ export TOIL_TEST_INTEGRATIVE=True
.ft P
.fi
.UNINDENT
.UNINDENT
.IP \(bu 2
Finally, set the environment variables for keyname and desired zone:
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
$ export TOIL_X_KEYNAME=[Your Keyname]
$ export TOIL_X_ZONE=[Desired Zone]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Where \fBX\fP is one of our currently supported cloud providers (\fBGCE\fP, \fBAWS\fP).
.IP \(bu 2
See the above sections for guidance on running tests.
.UNINDENT
.UNINDENT
.UNINDENT
.SS Test Environment Variables
.TS
center;
|l|l|.
_
T{
TOIL_TEST_TEMP
T}	T{
An absolute path to a directory where Toil tests
will write their temporary files. Defaults to the
system\(aqs \fI\%standard temporary directory\fP\&.
T}
_
T{
TOIL_TEST_INTEGRATIVE
T}	T{
If \fBTrue\fP, this allows the integration tests to
run. Only valid when running the tests from the
source directory via \fBmake test\fP or
\fBmake test_parallel\fP\&.
T}
_
T{
TOIL_AWS_KEYNAME
T}	T{
An AWS keyname (see \fI\%Preparing your AWS environment\fP), which
is required to run the AWS tests.
T}
_
T{
TOIL_GOOGLE_PROJECTID
T}	T{
A Google Cloud account projectID
(see \fI\%Running in Google Compute Engine (GCE)\fP), which is required to
to run the Google Cloud tests.
T}
_
T{
TOIL_TEST_QUICK
T}	T{
If \fBTrue\fP, long running tests are skipped.
T}
_
.TE
.INDENT 0.0
.INDENT 3.5
.IP "Partial install and failing tests"
.sp
Some tests may fail with an ImportError if the required extras are not installed.
Install Toil with all of the extras
do prevent such errors.
.UNINDENT
.UNINDENT
.SS Using Docker with Quay
.sp
\fI\%Docker\fP is needed for some of the tests. Follow the appropriate
installation instructions for your system on their website to get started.
.sp
When running \fBmake test\fP you might still get the following error:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ make test
Please set TOIL_DOCKER_REGISTRY, e.g. to quay.io/USER.
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To solve, make an account with \fI\%Quay\fP and specify it like so:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ TOIL_DOCKER_REGISTRY=quay.io/USER make test
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
where \fBUSER\fP is your Quay username.
.sp
For convenience you may want to add this variable to your bashrc by running
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ echo \(aqexport TOIL_DOCKER_REGISTRY=quay.io/USER\(aq >> $HOME/.bashrc
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Running Mesos Tests
.sp
If you\(aqre running Toil\(aqs Mesos tests, be sure to create the virtualenv with
\fB\-\-system\-site\-packages\fP to include the Mesos Python bindings. Verify this by
activating the virtualenv and running \fBpip list | grep mesos\fP\&. On macOS,
this may come up empty. To fix it, run the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
for i in /usr/local/lib/python2.7/site\-packages/*mesos*; do ln \-snf $i venv/lib/python2.7/site\-packages/; done
.ft P
.fi
.UNINDENT
.UNINDENT
.SH DEVELOPING WITH DOCKER
.sp
To develop on features reliant on the Toil Appliance (the docker image toil uses for AWS autoscaling), you
should consider setting up a personal registry on \fI\%Quay\fP or \fI\%Docker Hub\fP\&. Because
the Toil Appliance images are tagged with the Git commit they are based on and
because only commits on our master branch trigger an appliance build on Quay,
as soon as a developer makes a commit or dirties the working copy they will no
longer be able to rely on Toil to automatically detect the proper Toil Appliance
image. Instead, developers wishing to test any appliance changes in autoscaling
should build and push their own appliance image to a personal Docker registry.
This is described in the next section.
.SS Making Your Own Toil Docker Image
.sp
\fBNote!\fP  Toil checks if the docker image specified by TOIL_APPLIANCE_SELF
exists prior to launching by using the docker v2 schema.  This should be
valid for any major docker repository, but there is an option to override
this if desired using the option: \fI\-\e\-forceDockerAppliance\fP\&.
.sp
Here is a general workflow (similar instructions apply when using Docker Hub):
.INDENT 0.0
.IP 1. 3
Make some changes to the provisioner of your local version of Toil
.IP 2. 3
Go to the location where you installed the Toil source code and run
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ make docker
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
to automatically build a docker image that can now be uploaded to
your personal \fI\%Quay\fP account. If you have not installed Toil source
code yet see \fI\%Building from Source\fP\&.
.IP 3. 3
If it\(aqs not already you will need Docker installed and need
to \fI\%log into Quay\fP\&. Also you will want to make sure that your Quay
account is public.
.IP 4. 3
Set the environment variable \fBTOIL_DOCKER_REGISTRY\fP to your Quay
account. If you find yourself doing this often you may want to add
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
export TOIL_DOCKER_REGISTRY=quay.io/<MY_QUAY_USERNAME>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
to your \fB\&.bashrc\fP or equivalent.
.IP 5. 3
Now you can run
.INDENT 3.0
.INDENT 3.5
.sp
.nf
.ft C
$ make push_docker
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
which will upload the docker image to your Quay account. Take note of
the image\(aqs tag for the next step.
.IP 6. 3
Finally you will need to tell Toil from where to pull the Appliance
image you\(aqve created (it uses the Toil release you have installed by
default). To do this set the environment variable
\fBTOIL_APPLIANCE_SELF\fP to the url of your image. For more info see
\fI\%Environment Variables\fP\&.
.IP 7. 3
Now you can launch your cluster! For more information see
\fI\%Running a Workflow with Autoscaling\fP\&.
.UNINDENT
.SS Running a Cluster Locally
.sp
The Toil Appliance container can also be useful as a test environment since it
can simulate a Toil cluster locally. An important caveat for this is autoscaling,
since autoscaling will only work on an EC2 instance and cannot (at this time) be
run on a local machine.
.sp
To spin up a local cluster, start by using the following Docker run command to launch
a Toil leader container:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
docker run \e
    \-\-entrypoint=mesos\-master \e
    \-\-net=host \e
    \-d \e
    \-\-name=leader \e
    \-\-volume=/home/jobStoreParentDir:/jobStoreParentDir \e
    quay.io/ucsc_cgl/toil:3.6.0 \e
    \-\-registry=in_memory \e
    \-\-ip=127.0.0.1 \e
    \-\-port=5050 \e
    \-\-allocation_interval=500ms
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A couple notes on this command: the \fB\-d\fP flag tells Docker to run in daemon mode so
the container will run in the background. To verify that the container is running you
can run \fBdocker ps\fP to see all containers. If you want to run your own container
rather than the official UCSC container you can simply replace the
\fBquay.io/ucsc_cgl/toil:3.6.0\fP parameter with your own container name.
.sp
Also note that we are not mounting the job store directory itself, but rather the location
where the job store will be written. Due to complications with running Docker on MacOS, I
recommend only mounting directories within your home directory. The next command will
launch the Toil worker container with similar parameters:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
docker run \e
    \-\-entrypoint=mesos\-slave \e
    \-\-net=host \e
    \-d \e
    \-\-name=worker \e
    \-\-volume=/home/jobStoreParentDir:/jobStoreParentDir \e
    quay.io/ucsc_cgl/toil:3.6.0 \e
    \-\-work_dir=/var/lib/mesos \e
    \-\-master=127.0.0.1:5050 \e
    \-\-ip=127.0.0.1 \e
    —\-attributes=preemptable:False \e
    \-\-resources=cpus:2
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note here that we are specifying 2 CPUs and a non\-preemptable worker. We can
easily change either or both of these in a logical way. To change the number
of cores we can change the 2 to whatever number you like, and to
change the worker to be preemptable we change \fBpreemptable:False\fP to
\fBpreemptable:True\fP\&. Also note that the same volume is mounted into the
worker. This is needed since both the leader and worker write and read
from the job store. Now that your cluster is running, you can run
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
docker exec \-it leader bash
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
to get a shell in your leader \(aqnode\(aq. You can also replace the \fBleader\fP parameter
with \fBworker\fP to get shell access in your worker.
.INDENT 0.0
.INDENT 3.5
.IP "Docker\-in\-Docker issues"
.sp
If you want to run Docker inside this Docker cluster (Dockerized tools, perhaps),
you should also mount in the Docker socket via \fB\-v /var/run/docker.sock:/var/run/docker.sock\fP\&.
This will give the Docker client inside the Toil Appliance access to the Docker engine
on the host. Client/engine version mismatches have been known to cause issues, so we
recommend using Docker version 1.12.3 on the host to be compatible with the Docker
client installed in the Appliance. Finally, be careful where you write files inside
the Toil Appliance \- \(aqchild\(aq Docker containers launched in the Appliance will actually
be siblings to the Appliance since the Docker engine is located on the host. This
means that the \(aqchild\(aq container can only mount in files from the Appliance if
the files are located in a directory that was originally mounted into the Appliance
from the host \- that way the files are accessible to the sibling container. Note:
if Docker can\(aqt find the file/directory on the host it will silently fail and mount
in an empty directory.
.UNINDENT
.UNINDENT
.SH MAINTAINER'S GUIDELINES
.sp
In general, as developers and maintainers of the code, we adhere to the following guidelines:
.INDENT 0.0
.IP \(bu 2
We strive to never break the build on master. All development should be done
on branches, in either the main Toil repository or in developers\(aq forks.
.IP \(bu 2
Pull requests should be used for any and all changes (except truly trivial
ones).
.IP \(bu 2
Pull requests should be in response to issues. If you find yourself making a
pull request without an issue, you should create the issue first.
.UNINDENT
.SS Naming Conventions
.INDENT 0.0
.IP \(bu 2
\fBCommit messages\fP \fIshould\fP be \fI\%great\fP\&. Most importantly, they \fImust\fP:
.INDENT 2.0
.IP \(bu 2
Have a short subject line. If in need of more space, drop down \fBtwo\fP lines
and write a body to explain what is changing and why it has to change.
.IP \(bu 2
Write the subject line as a command: \fIDestroy all humans\fP,
not \fIAll humans destroyed\fP\&.
.IP \(bu 2
Reference the issue being fixed in a Github\-parseable format, such as
\fI(resolves #1234)\fP at the end of the subject line, or \fIThis will fix #1234.\fP
somewhere in the body. If no single commit on its own fixes the issue, the
cross\-reference must appear in the pull request title or body instead.
.UNINDENT
.IP \(bu 2
\fBBranches\fP in the main Toil repository \fImust\fP start with \fBissues/\fP,
followed by the issue number (or numbers, separated by a dash), followed by a
short, lowercase, hyphenated description of the change. (There can be many open
pull requests with their associated branches at any given point in time and
this convention ensures that we can easily identify branches.)
.sp
Say there is an issue numbered #123 titled \fIFoo does not work\fP\&. The branch name
would be \fBissues/123\-fix\-foo\fP and the title of the commit would be
\fIFix foo in case of bar (resolves #123).\fP
.UNINDENT
.SS Pull Requests
.INDENT 0.0
.IP \(bu 2
All pull requests must be reviewed by a person other than the request\(aqs
author. Review the PR by following the \fI\%Reviewing Pull Requests\fP checklist.
.IP \(bu 2
Modified pull requests must be re\-reviewed before merging. \fBNote that Github
does not enforce this!\fP
.IP \(bu 2
Merge pull requests by following the \fI\%Merging Pull Requests\fP checklist.
.IP \(bu 2
When merging a pull request, make sure to update the \fI\%Draft Changelog\fP on
the Github wiki, which we will use to produce the changelog for the next
release. The PR template tells you to do this, so don\(aqt forget. New entries
should go at the bottom.
.IP \(bu 2
Pull requests will not be merged unless CI tests pass.
Gitlab tests are only run on code in the main Toil repository on some branch,
so it is the responsibility of the approving reviewer to make sure that pull
requests from outside repositories are copied to branches in the main
repository. This can be accomplished with (from a Toil clone):
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
\&./contrib/admin/test\-pr theirusername their\-branch issues/123\-fix\-description\-here
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This must be repeated every time the PR submitter updates their PR, after
checking to see that the update is not malicious.
.sp
If there is no issue corresponding to the PR, after which the branch can be
named, the reviewer of the PR should first create the issue.
.sp
Developers who have push access to the main Toil repository are encouraged to
make their pull requests from within the repository, to avoid this step.
.IP \(bu 2
Prefer using \(dqSquash and marge\(dq when merging pull requests to master especially
when the PR contains a \(dqsingle unit\(dq of work (i.e. if one were to rewrite the
PR from scratch with all the fixes included, they would have one commit for
the entire PR). This makes the commit history on master more readable
and easier to debug in case of a breakage.
.sp
When squashing a PR from multiple authors, please add
\fI\%Co\-authored\-by\fP to give credit to all contributing authors.
.sp
See \fI\%Issue #2816\fP for more details.
.UNINDENT
.SS Publishing a Release
.sp
These are the steps to take to publish a Toil release:
.INDENT 0.0
.IP \(bu 2
Determine the release version \fBX.Y.Z\fP\&. This should follow
\fI\%semantic versioning\fP; if user\-workflow\-breaking changes are made, \fBX\fP
should be incremented, and \fBY\fP and \fBZ\fP should be zero. If non\-breaking
changes are made but new functionality is added, \fBX\fP should remain the same
as the last release, \fBY\fP should be incremented, and \fBZ\fP should be zero.
If only patches are released, \fBX\fP and \fBY\fP should be the same as the last
release and \fBZ\fP should be incremented.
.IP \(bu 2
If it does not exist already, create a release branch in the Toil repo
named \fBX.Y.x\fP, where \fBx\fP is a literal lower\-case \(dqx\(dq. For patch releases,
find the existing branch and make sure it is up to date with the patch
commits that are to be released. They may be \fI\%cherry\-picked over\fP from
master.
.IP \(bu 2
On the release branch, edit \fBversion_template.py\fP in the root of the
repository. Find the line that looks like this (slightly different for patch
releases):
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
baseVersion = \(aqX.Y.0a1\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Make it look like this instead:
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
baseVersion = \(aqX.Y.Z\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Commit your change to the branch.
.IP \(bu 2
Tag the current state of the release branch as \fBreleases/X.Y.Z\fP\&.
.IP \(bu 2
Make the Github release \fI\%here\fP, referencing that tag. For a non\-patch
release, fill in the description with the changelog from \fI\%the wiki page\fP,
which you should clear. For a patch release, just describe the patch.
.IP \(bu 2
For a non\-patch release, set up the main branch so that development
builds will declare themselves to be alpha versions of what the next release
will probably be. Edit  \fBversion_template.py\fP in the root of the repository
on the main branch to set \fBbaseVersion\fP like this:
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
baseVersion = \(aqX.Y+1.0a1\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Make sure to replace \fBX\fP and \fBY+1\fP with actual numbers.
.UNINDENT
.SS Using Git Hooks
.sp
In the \fBcontrib/hooks\fP directory, there are two scripts, \fBmypy\-after\-commit.py\fP and
\fBmypy\-before\-push.py\fP, that can be set up as Git hooks to make sure you don\(aqt accidentally
push commits that would immediately fail type\-checking. These are supposed to eliminate the
need to run \fBmake mypy\fP constantly. You can install them into your Git working copy like
this
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
ln \-rs ./contrib/hooks/mypy\-after\-commit.py .git/hooks/post\-commit
ln \-rs ./contrib/hooks/mypy\-before\-push.py .git/hooks/pre\-push
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
After you make a commit, the post\-commit script will start type\-checking it, and if it takes
too long re\-launch the process in the background. When you push, the pre\-push script will see
if the commit you are pushing type\-checked successfully, and if it hasn\(aqt been type\-checked
but is currently checked out, it will be type\-checked. If type\-checking fails, the push will
be aborted.
.sp
Type\-checking will only be performed if you are in a Toil development virtual environment. If
you aren\(aqt, the scripts won\(aqt do anything.
.sp
To bypass or override pre\-push hook, if it is wrong or if you need to push something that
doesn\(aqt typecheck, you can \fBgit push \-\-no\-verify\fP\&. If the scripts get confused about whether
a commit actually typechecks, you can clear out the type\-checking result cache, which is in
\fB/var/run/user/<your UID>/.mypy_toil_result_cache\fP on Linux and in \fB\&.mypy_toil_result_cache\fP
in the Toil repo on Mac.
.sp
To uninstall the scripts, delete \fB\&.git/hooks/post\-commit\fP and \fB\&.git/hooks/pre\-push\fP\&.
.SS Adding Retries to a Function
.sp
See \fI\%toil.lib.retry\fP .
.sp
retry() can be used to decorate any function based on the list of errors one wishes to retry on.
.sp
This list of errors can contain normal Exception objects, and/or RetryCondition objects wrapping Exceptions to
include additional conditions.
.sp
For example, retrying on a one Exception (HTTPError):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from requests import get
from requests.exceptions import HTTPError

@retry(errors=[HTTPError])
def update_my_wallpaper():
    return get(\(aqhttps://www.deviantart.com/\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Or:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from requests import get
from requests.exceptions import HTTPError

@retry(errors=[HTTPError, ValueError])
def update_my_wallpaper():
    return get(\(aqhttps://www.deviantart.com/\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The examples above will retry for the default interval on any errors specified the \(dqerrors=\(dq arg list.
.sp
To retry on specifically 500/502/503/504 errors, you could specify an ErrorCondition object instead, for example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from requests import get
from requests.exceptions import HTTPError

@retry(errors=[
    ErrorCondition(
               error=HTTPError,
               error_codes=[500, 502, 503, 504]
           )])
def update_my_wallpaper():
    return requests.get(\(aqhttps://www.deviantart.com/\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To retry on specifically errors containing the phrase \(dqNotFound\(dq:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from requests import get
from requests.exceptions import HTTPError

@retry(errors=[
    ErrorCondition(
        error=HTTPError,
        error_message_must_include=\(dqNotFound\(dq
    )])
def update_my_wallpaper():
    return requests.get(\(aqhttps://www.deviantart.com/\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To retry on all HTTPError errors EXCEPT an HTTPError containing the phrase \(dqNotFound\(dq:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from requests import get
from requests.exceptions import HTTPError

@retry(errors=[
    HTTPError,
    ErrorCondition(
               error=HTTPError,
               error_message_must_include=\(dqNotFound\(dq,
               retry_on_this_condition=False
           )])
def update_my_wallpaper():
    return requests.get(\(aqhttps://www.deviantart.com/\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
To retry on boto3\(aqs specific status errors, an example of the implementation is:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import boto3
from botocore.exceptions import ClientError

@retry(errors=[
    ErrorCondition(
               error=ClientError,
               boto_error_codes=[\(dqBucketNotFound\(dq]
           )])
def boto_bucket(bucket_name):
    boto_session = boto3.session.Session()
    s3_resource = boto_session.resource(\(aqs3\(aq)
    return s3_resource.Bucket(bucket_name)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Any combination of these will also work, provided the codes are matched to the correct exceptions.  A ValueError will
not return a 404, for example.
.sp
The retry function as a decorator should make retrying functions easier and clearer.  It also encourages
smaller independent functions, as opposed to lumping many different things that may need to be retried on
different conditions in the same function.
.sp
The ErrorCondition object tries to take some of the heavy lifting of writing specific retry conditions
and boil it down to an API that covers all common use\-cases without the user having to write
any new bespoke functions.
.sp
Use\-cases covered currently:
.INDENT 0.0
.IP 1. 3
Retrying on a normal error, like a KeyError.
.IP 2. 3
Retrying on HTTP error codes (use ErrorCondition).
.IP 3. 3
Retrying on boto\(aqs specific status errors, like \(dqBucketNotFound\(dq (use ErrorCondition).
.IP 4. 3
Retrying when an error message contains a certain phrase (use ErrorCondition).
.IP 5. 3
Explicitly NOT retrying on a condition (use ErrorCondition).
.UNINDENT
.sp
If new functionality is needed, it\(aqs currently best practice in Toil to add
functionality to the ErrorCondition itself rather than making a new custom retry method.
.SH PULL REQUEST CHECKLISTS
.sp
This document contains checklists for dealing with PRs. More general PR information is available at \fI\%Pull Requests\fP\&.
.SS Reviewing Pull Requests
.sp
This checklist is to be kept in sync with the checklist in the pull request template.
.sp
When reviewing a PR, do the following:
.INDENT 0.0
.IP \(bu 2
.INDENT 2.0
.TP
.B  Make sure it is coming from \fBissues/XXXX\-fix\-the\-thing\fP in the Toil repo, or from an external repo.
.INDENT 7.0
.IP \(bu 2
 If it is coming from an external repo, make sure to pull it in for CI with:
.INDENT 2.0
.INDENT 3.5
.sp
.nf
.ft C
contrib/admin/test\-pr otheruser theirbranchname issues/XXXX\-fix\-the\-thing
.ft P
.fi
.UNINDENT
.UNINDENT
.IP \(bu 2
 If there is no associated issue, \fI\%create one\fP\&.
.UNINDENT
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B  Read through the code changes. Make sure that it doesn\(aqt have:
.INDENT 7.0
.IP \(bu 2
 Addition of trailing whitespace.
.IP \(bu 2
 New variable or member names in \fBcamelCase\fP that want to be in \fBsnake_case\fP\&.
.IP \(bu 2
 New functions without \fI\%type hints\fP\&.
.IP \(bu 2
 New functions or classes without informative docstrings.
.IP \(bu 2
 Changes to semantics not reflected in the relevant docstrings.
.IP \(bu 2
 New or changed command line options for Toil workflows that are not reflected in \fBdocs/running/cliOptions.rst\fP
.IP \(bu 2
 New features without tests.
.UNINDENT
.UNINDENT
.IP \(bu 2
 Comment on the lines of code where problems exist with a review comment. You can shift\-click the line numbers in the diff to select multiple lines.
.IP \(bu 2
 Finish the review with an overall description of your opinion.
.UNINDENT
.SS Merging Pull Requests
.sp
This checklist is to be kept in sync with the checklist in the pull request template.
.sp
When merging a PR, do the following:
.INDENT 0.0
.IP \(bu 2
 Make sure the PR passes tests.
.IP \(bu 2
 Make sure the PR has been reviewed \fBsince its last modification\fP\&. If not, review it.
.IP \(bu 2
.INDENT 2.0
.TP
.B  Merge with the Github \(dqSquash and merge\(dq feature.
.INDENT 7.0
.IP \(bu 2
.INDENT 2.0
.TP
.B  If there are multiple authors\(aq commits, add \fI\%Co\-authored\-by\fP to give credit to all contributing authors.
.UNINDENT
.UNINDENT
.UNINDENT
.IP \(bu 2
 Copy its recommended changelog entry to the \fI\%Draft Changelog\fP\&.
.IP \(bu 2
 Append the issue number in parentheses to the changelog entry.
.UNINDENT
.SH TOIL ARCHITECTURE
.sp
The following diagram layouts out the software architecture of Toil.
.INDENT 0.0
.INDENT 2.5
[image: Toil's architecture is composed of the leader, the job store, the worker
processes, the batch system, the node provisioner, and the stats and
logging monitor.]
[image]
Figure 1: The basic components of Toil\(aqs architecture..UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B These components are described below:
.INDENT 7.0
.IP \(bu 2
.INDENT 2.0
.TP
.B the leader:
The leader is responsible for deciding which jobs should be run. To do this
it traverses the job graph. Currently this is a single threaded process,
but we make aggressive steps to prevent it becoming a bottleneck
(see \fI\%Read\-only Leader\fP described below).
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B the job\-store:
Handles all files shared between the components. Files in the job\-store
are the means by which the state of the workflow is maintained. Each job
is backed by a file in the job store, and atomic updates to this state
are used to ensure the workflow can always be resumed upon failure. The
job\-store can also store all user files, allowing them to be shared
between jobs. The job\-store is defined by the
\fI\%AbstractJobStore\fP class.
Multiple implementations of this class allow Toil to support different
back\-end file stores, e.g.: S3, network file systems, Google file store, etc.
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B workers:
The workers are temporary processes responsible for running jobs,
one at a time per worker. Each worker process is invoked with a job argument
that it is responsible for running. The worker monitors this job and reports
back success or failure to the leader by editing the job\(aqs state in the file\-store.
If the job defines successor jobs the worker may choose to immediately run them
(see \fI\%Job Chaining\fP below).
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B the batch\-system:
Responsible for scheduling the jobs given to it by the leader, creating
a worker command for each job. The batch\-system is defined by the
\fI\%AbstractBatchSystem\fP class.
Toil uses multiple existing batch systems to schedule jobs, including
Apache Mesos, GridEngine and a multi\-process single node implementation
that allows workflows to be run without any of these frameworks. Toil
can therefore fairly easily be made to run a workflow using an existing
cluster.
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B the node provisioner:
Creates worker nodes in which the batch system schedules workers.
It is defined by the \fBAbstractProvisioner\fP
class.
.UNINDENT
.IP \(bu 2
.INDENT 2.0
.TP
.B the statistics and logging monitor:
Monitors logging and statistics produced by the workers and reports them. Uses the
job\-store to gather this information.
.UNINDENT
.UNINDENT
.UNINDENT
.SS Jobs and JobDescriptions
.sp
As noted in \fI\%Job Basics\fP, a job is the atomic unit of work in a Toil workflow.
User scripts inherit from the \fI\%Job\fP class to define units of work.
These jobs are pickled and stored in the job\-store by the leader, and are retrieved
and un\-pickled by the worker when they are scheduled to run.
.sp
During scheduling, Toil does not work with the actual Job objects. Instead,
\fI\%JobDescription\fP objects are used to store all the information
that the Toil Leader ever needs to know about the Job. This includes requirements
information, dependency information, commands to issue, etc.
.sp
Internally, the JobDescription object is referenced by its jobStoreID, which is
often not human readable. However, the Job and JobDescription objects contain
several human\-readable names that are useful for logging and identification:
.TS
center;
|l|l|.
_
T{
jobName
T}	T{
Name of the kind of job this is. This may be used in job store IDs
and logging. Also used to let the cluster scaler learn a model for
how long the job will take. Defaults to the job class\(aqs name if no
real user\-defined name is available.
.sp
For a \fI\%FunctionWrappingJob\fP, the jobName is
replaced by the wrapped function\(aqs name.
.sp
For a CWL workflow, the jobName is the class name of the internal
job that is running the CWL workflow, such as \fB\(dqCWLJob\(dq\fP\&.
T}
_
T{
unitName
T}	T{
Name of this \fIinstance\fP of this kind of job. If set by the user,
it will appear with the jobName in logging.
.sp
For a CWL workflow, the unitName is set to a descriptive name that
includes the CWL file name and the ID in the file if set.
T}
_
T{
displayName
T}	T{
A human\-readable name to identify this particular job instance.
Used as an identifier of the job class in the stats report.
Defaults to the job class\(aqs name if no real user\-defined name is
available.
.sp
For a CWL workflow, the displayName is the absolute workflow URI.
T}
_
.TE
.SS Optimizations
.sp
Toil implements lots of optimizations designed for scalability.
Here we detail some of the key optimizations.
.SS Read\-only leader
.sp
The leader process is currently implemented as a single thread. Most of the leader\(aqs
tasks revolve around processing the state of jobs, each stored as a file within the job\-store.
To minimise the load on this thread, each worker does as much work as possible
to manage the state of the job it is running. As a result, with a couple of minor exceptions,
the leader process never needs to write or update the state of a job within the job\-store.
For example, when a job is complete and has no further successors the responsible
worker deletes the job from the job\-store, marking it complete. The leader then
only has to check for the existence of the file when it receives a signal from the batch\-system
to know that the job is complete. This off\-loading of state management is orthogonal to
future parallelization of the leader.
.SS Job chaining
.sp
The scheduling of successor jobs is partially managed by the worker, reducing the
number of individual jobs the leader needs to process. Currently this is very
simple: if the there is a single next successor job to run and its resources fit within the
resources of the current job and closely match the resources of the current job then
the job is run immediately on the worker without returning to the leader. Further extensions
of this strategy are possible, but for many workflows which define a series of serial successors
(e.g. map sequencing reads, post\-process mapped reads, etc.) this pattern is very effective
at reducing leader workload.
.SS Preemptable node support
.sp
Critical to running at large\-scale is dealing with intermittent node failures. Toil is
therefore designed to always be resumable providing the job\-store does not become corrupt.
This robustness allows Toil to run on preemptible nodes, which are only available when others are not
willing to pay more to use them. Designing workflows that divide into many short individual jobs
that can use preemptable nodes allows for workflows to be efficiently scheduled and executed.
.SS Caching
.sp
Running bioinformatic pipelines often require the passing of large datasets between jobs. Toil
caches the results from jobs such that child jobs running on the same node can directly use the same
file objects, thereby eliminating the need for an intermediary transfer to the job store. Caching
also reduces the burden on the local disks, because multiple jobs can share a single file.
The resulting drop in I/O allows pipelines to run faster, and, by the sharing of files,
allows users to run more jobs in parallel by reducing overall disk requirements.
.sp
To demonstrate the efficiency of caching, we ran an experimental internal pipeline on 3 samples from
the TCGA Lung Squamous Carcinoma (LUSC) dataset. The pipeline takes the tumor and normal exome
fastqs, and the tumor rna fastq and input, and predicts MHC presented neoepitopes in the patient
that are potential targets for T\-cell based immunotherapies. The pipeline was run individually on
the samples on c3.8xlarge machines on AWS (60GB RAM,600GB SSD storage, 32 cores). The pipeline
aligns the data to hg19\-based references, predicts MHC haplotypes using PHLAT, calls mutations using
2 callers (MuTect and RADIA) and annotates them using SnpEff, then predicts MHC:peptide binding
using the IEDB suite of tools before running an in\-house rank boosting algorithm on the final calls.
.sp
To optimize time taken, The pipeline is written such that mutations are called on a per\-chromosome
basis from the whole\-exome bams and are merged into a complete vcf. Running mutect in parallel on
whole exome bams requires each mutect job to download the complete Tumor and Normal Bams to their
working directories \-\- An operation that quickly fills the disk and limits the parallelizability of
jobs. The script was run in Toil, with and without caching, and Figure 2 shows that the workflow
finishes faster in the cached case while using less disk on average than the uncached run. We
believe that benefits of caching arising from file transfers will be much higher on magnetic
disk\-based storage systems as compared to the SSD systems we tested this on.
.INDENT 0.0
.INDENT 2.5
[image: Graph outlining the efficiency gain from caching.]
[image]
Figure 2: Efficiency gain from caching. The lower half of each plot describes the disk used by
the pipeline recorded every 10 minutes over the duration of the pipeline, and the upper half
shows the corresponding stage of the pipeline that is being processed. Since jobs requesting the
same file shared the same inode, the effective load on the disk is considerably lower than in
the uncached case where every job downloads a personal copy of every file it needs. We see that
in all cases, the uncached run uses almost 300\-400GB more that the cached run in the resource
heavy mutation calling step. We also see a benefit in terms of wall time for each stage since we
eliminate the time taken for file transfers..UNINDENT
.UNINDENT
.SS Toil support for Common Workflow Language
.sp
The CWL document and input document are loaded using the \(aqcwltool.load_tool\(aq
module.  This performs normalization and URI expansion (for example, relative
file references are turned into absolute file URIs), validates the document
against the CWL schema, initializes Python objects corresponding to major
document elements (command line tools, workflows, workflow steps), and performs
static type checking that sources and sinks have compatible types.
.sp
Input files referenced by the CWL document and input document are imported into
the Toil file store.  CWL documents may use any URI scheme supported by Toil
file store, including local files and object storage.
.sp
The \(aqlocation\(aq field of File references are updated to reflect the import token
returned by the Toil file store.
.sp
For directory inputs, the directory listing is stored in Directory object.
Each individual files is imported into Toil file store.
.sp
An initial workflow Job is created from the toplevel CWL document. Then,
control passes to the Toil engine which schedules the initial workflow job to
run.
.sp
When the toplevel workflow job runs, it traverses the CWL workflow and creates
a toil job for each step.  The dependency graph is expressed by making
downstream jobs children of upstream jobs, and initializing the child jobs with
an input object containing the promises of output from upstream jobs.
.sp
Because Toil jobs have a single output, but CWL permits steps to have multiple
output parameters that may feed into multiple other steps, the input to a
CWLJob is expressed with an \(dqindirect dictionary\(dq.  This is a dictionary of
input parameters, where each entry value is a tuple of a promise and a promise
key.  When the job runs, the indirect dictionary is turned into a concrete
input object by resolving each promise into its actual value (which is always a
dict), and then looking up the promise key to get the actual value for the the
input parameter.
.sp
If a workflow step specifies a scatter, then a scatter job is created and
connected into the workflow graph as described above.  When the scatter step
runs, it creates child jobs for each parameterizations of the scatter.  A
gather job is added as a follow\-on to gather the outputs into arrays.
.sp
When running a command line tool, it first creates output and temporary
directories under the Toil local temp dir.  It runs the command line tool using
the single_job_executor from CWLTool, providing a Toil\-specific constructor for
filesystem access, and overriding the default PathMapper to use ToilPathMapper.
.sp
The ToilPathMapper keeps track of a file\(aqs symbolic identifier (the Toil
FileID), its local path on the host (the value returned by readGlobalFile) and
the the location of the file inside the Docker container.
.sp
After executing single_job_executor from CWLTool, it gets back the output
object and status.  If the underlying job failed, raise an exception.  Files
from the output object are added to the file store using writeGlobalFile and
the \(aqlocation\(aq field of File references are updated to reflect the token
returned by the Toil file store.
.sp
When the workflow completes, it returns an indirect dictionary linking to the
outputs of the job steps that contribute to the final output.  This is the
value returned by toil.start() or toil.restart().  This is resolved to get the
final output object.  The files in this object are exported from the file store
to \(aqoutdir\(aq on the host file system, and the \(aqlocation\(aq field of File
references are updated to reflect the final exported location of the output
files.
.SH MINIMUM AWS IAM PERMISSIONS
.sp
Toil requires at least the following permissions in an IAM role to operate on a cluster.
These are added by default when launching a cluster. However, ensure that they are present
if creating a custom IAM role when \fI\%launching a cluster\fP
with the \fB\-\-awsEc2ProfileArn\fP parameter.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
    \(dqVersion\(dq: \(dq2012\-10\-17\(dq,
    \(dqStatement\(dq: [
        {
            \(dqEffect\(dq: \(dqAllow\(dq,
            \(dqAction\(dq: [
                \(dqec2:*\(dq,
                \(dqs3:*\(dq,
                \(dqsdb:*\(dq,
                \(dqiam:PassRole\(dq
            ],
            \(dqResource\(dq: \(dq*\(dq
        }
    ]
}
.ft P
.fi
.UNINDENT
.UNINDENT
.SH AUTO-DEPLOYMENT
.sp
If you want to run your workflow in a distributed environment, on multiple worker machines, either in the cloud or on a
bare\-metal cluster, your script needs to be made available to those other machines. If your script imports other
modules, those modules also need to be made available on the workers. Toil can automatically do that for you, with a
little help on your part. We call this feature \fIauto\-deployment\fP of a workflow.
.sp
Let\(aqs first examine various scenarios of auto\-deploying a workflow, which, as we\(aqll see shortly cannot be
auto\-deployed. Lastly, we\(aqll deal with the issue of declaring \fI\%Toil as a dependency\fP of a
workflow that is packaged as a setuptools distribution.
.sp
Toil can be easily deployed to a remote host. First, assuming you\(aqve followed our \fI\%Preparing your AWS environment\fP section to install Toil
and use it to create a remote leader node on (in this example) AWS, you can now log into this into using
\fI\%Ssh\-Cluster Command\fP and once on the remote host, create and activate a virtualenv (noting to make sure to use the
\fB\-\-system\-site\-packages\fP option!):
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ virtualenv \-\-system\-site\-packages venv
$ . venv/bin/activate
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note the \fB\-\-system\-site\-packages\fP option, which ensures that globally\-installed packages are accessible inside the
virtualenv.  Do not (re)install Toil after this!  The \fB\-\-system\-site\-packages\fP option has already transferred Toil and
the dependencies from your local installation of Toil for you.
.sp
From here, you can install a project and its dependencies:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ tree
\&.
├── util
│\ \  ├── __init__.py
│\ \  └── sort
│\ \      ├── __init__.py
│\ \      └── quick.py
└── workflow
    ├── __init__.py
    └── main.py

3 directories, 5 files
$ pip install matplotlib
$ cp \-R workflow util venv/lib/python2.7/site\-packages
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Ideally, your project would have a \fBsetup.py\fP file (see \fI\%setuptools\fP) which streamlines the installation process:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ tree
\&.
├── util
│\ \  ├── __init__.py
│\ \  └── sort
│\ \      ├── __init__.py
│\ \      └── quick.py
├── workflow
│   ├── __init__.py
│   └── main.py
└── setup.py

3 directories, 6 files
$ pip install .
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Or, if your project has been published to PyPI:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ pip install my\-project
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In each case, we have created a virtualenv with the \fB\-\-system\-site\-packages\fP flag in the \fBvenv\fP subdirectory then
installed the \fBmatplotlib\fP distribution from PyPI along with the two packages that our project consists of. (Again,
both Python and Toil are assumed to be present on the leader and all worker nodes.)
.sp
We can now run our workflow:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ python3 main.py \-\-batchSystem=mesos …
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBIMPORTANT:\fP
.INDENT 0.0
.INDENT 3.5
If workflow\(aqs external dependencies contain native code (i.e. are not pure
Python) then they must be manually installed on each worker.
.UNINDENT
.UNINDENT
.sp
\fBWARNING:\fP
.INDENT 0.0
.INDENT 3.5
Neither \fBpython3 setup.py develop\fP nor \fBpip install \-e .\fP can be used in
this process as, instead of copying the source files, they create \fB\&.egg\-link\fP
files that Toil can\(aqt auto\-deploy. Similarly, \fBpython3 setup.py install\fP
doesn\(aqt work either as it installs the project as a Python \fB\&.egg\fP which is
also not currently supported by Toil (though it \fI\%could be\fP in the future).
.sp
Also note that using the
\fB\-\-single\-version\-externally\-managed\fP flag with \fBsetup.py\fP will
prevent the installation of your package as an \fB\&.egg\fP\&. It will also disable
the automatic installation of your project\(aqs dependencies.
.UNINDENT
.UNINDENT
.SS Auto Deployment with Sibling Modules
.sp
This scenario applies if the user script imports modules that are its siblings:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ cd my_project
$ ls
userScript.py utilities.py
$ ./userScript.py \-\-batchSystem=mesos …
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Here \fBuserScript.py\fP imports additional functionality from \fButilities.py\fP\&.
Toil detects that \fBuserScript.py\fP has sibling modules and copies them to the
workers, alongside the user script. Note that sibling modules will be
auto\-deployed regardless of whether they are actually imported by the user
script–all .py files residing in the same directory as the user script will
automatically be auto\-deployed.
.sp
Sibling modules are a suitable method of organizing the source code of
reasonably complicated workflows.
.SS Auto\-Deploying a Package Hierarchy
.sp
Recall that in Python, a \fI\%package\fP is a directory containing one or more
\fB\&.py\fP files—one of which must be called \fB__init__.py\fP—and optionally other
packages. For more involved workflows that contain a significant amount of
code, this is the recommended way of organizing the source code. Because we use
a package hierarchy, we can\(aqt really refer to the user script as such, we call
it the user \fImodule\fP instead. It is merely one of the modules in the package
hierarchy. We need to inform Toil that we want to use a package hierarchy by
invoking Python\(aqs \fB\-m\fP option. That enables Toil to identify the entire set
of modules belonging to the workflow and copy all of them to each worker. Note
that while using the \fB\-m\fP option is optional in the scenarios above, it is
mandatory in this one.
.sp
The following shell session illustrates this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ cd my_project
$ tree
\&.
├── utils
│\ \  ├── __init__.py
│\ \  └── sort
│\ \      ├── __init__.py
│\ \      └── quick.py
└── workflow
    ├── __init__.py
    └── main.py

3 directories, 5 files
$ python3 \-m workflow.main \-\-batchSystem=mesos …
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Here the user module \fBmain.py\fP does not reside in the current directory, but
is part of a package called \fButil\fP, in a subdirectory of the current
directory. Additional functionality is in a separate module called
\fButil.sort.quick\fP which corresponds to \fButil/sort/quick.py\fP\&. Because we
invoke the user module via \fBpython3 \-m workflow.main\fP, Toil can determine the
root directory of the hierarchy–\fBmy_project\fP in this case–and copy all Python
modules underneath it to each worker. The \fB\-m\fP option is documented \fI\%here\fP
.sp
When \fB\-m\fP is passed, Python adds the current working directory to
\fBsys.path\fP, the list of root directories to be considered when resolving a
module name like \fBworkflow.main\fP\&. Without that added convenience we\(aqd have to
run the workflow as \fBPYTHONPATH=\(dq$PWD\(dq python3 \-m workflow.main\fP\&. This also
means that Toil can detect the root directory of the user module\(aqs package
hierarchy even if it isn\(aqt the current working directory. In other words we
could do this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ cd my_project
$ export PYTHONPATH=\(dq$PWD\(dq
$ cd /some/other/dir
$ python3 \-m workflow.main \-\-batchSystem=mesos …
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Also note that the root directory itself must not be package, i.e. must not
contain an \fB__init__.py\fP\&.
.SS Relying on Shared Filesystems
.sp
Bare\-metal clusters typically mount a shared file system like NFS on each node.
If every node has that file system mounted at the same path, you can place your
project on that shared filesystem and run your user script from there.
Additionally, you can clone the Toil source tree into a directory on that
shared file system and you won\(aqt even need to install Toil on every worker. Be
sure to add both your project directory and the Toil clone to \fBPYTHONPATH\fP\&. Toil
replicates \fBPYTHONPATH\fP from the leader to every worker.
.INDENT 0.0
.INDENT 3.5
.IP "Using a shared filesystem"
.sp
Toil currently only supports a \fBtempdir\fP set to a local, non\-shared directory.
.UNINDENT
.UNINDENT
.SS Toil Appliance
.sp
The term Toil Appliance refers to the Mesos Docker image that Toil uses to simulate the machines in the virtual mesos
cluster.  It\(aqs easily deployed, only needs Docker, and allows for workflows to be run in single\-machine mode and for
clusters of VMs to be provisioned.  To specify a different image, see the Toil \fI\%Environment Variables\fP section.  For more
information on the Toil Appliance, see the \fI\%Running in AWS\fP section.
.SH ENVIRONMENT VARIABLES
.sp
There are several environment variables that affect the way Toil runs.
.TS
center;
|l|l|.
_
T{
TOIL_CHECK_ENV
T}	T{
A flag that determines whether Toil will try to
refer back to a Python virtual environment in
which it is installed when composing commands that
may be run on other hosts. If set to \fBTrue\fP, if
Toil is installed in the current virtual
environment, it will use absolute paths to its own
executables (and the virtual environment must thus
be available on at the same path on all nodes).
Otherwise, Toil internal commands such as
\fB_toil_worker\fP will be resolved according to the
\fBPATH\fP on the node where they are executed. This
setting can be useful in a shared HPC environment,
where users may have their own Toil installations
in virtual environments.
T}
_
T{
TOIL_WORKDIR
T}	T{
An absolute path to a directory where Toil will
write its temporary files. This directory must
exist on each worker node and may be set to a
different value on each worker. The \fB\-\-workDir\fP
command line option overrides this. When using the
Toil docker container, such as on Kubernetes, this
defaults to \fB/var/lib/toil\fP\&. When using Toil
autoscaling with Mesos, this is somewhere inside
the Mesos sandbox. In all other cases, the
system\(aqs \fI\%standard temporary directory\fP is used.
T}
_
T{
TOIL_WORKDIR_OVERRIDE
T}	T{
An absolute path to a directory where Toil will
write its temporary files. This overrides
\fBTOIL_WORKDIR\fP and the  \fB\-\-workDir\fP command
line option.
T}
_
T{
TOIL_COORDINATION_DIR
T}	T{
An absolute path to a directory where Toil will
write its lock files. This directory must exist on
each worker node and may be set to a different
value on each worker. The \fB\-\-coordinationDir\fP
command line option overrides this.
T}
_
T{
TOIL_COORDINATION_DIR_OVERRIDE
T}	T{
An absolute path to a directory where Toil will
write its lock files. This overrides
\fBTOIL_COORDINATION_DIR\fP and the
\fB\-\-coordinationDir\fP command    line option.
T}
_
T{
TOIL_KUBERNETES_HOST_PATH
T}	T{
A path on Kubernetes hosts that will be mounted as
the Toil work directory in the workers, to allow
for shared caching. Will be created if it doesn\(aqt
already exist.
T}
_
T{
TOIL_KUBERNETES_OWNER
T}	T{
A name prefix for easy identification of
Kubernetes jobs. If not set, Toil will use the
current user name.
T}
_
T{
TOIL_KUBERNETES_SERVICE_ACCOUNT
T}	T{
A service account name to apply when creating
Kubernetes pods.
T}
_
T{
TOIL_KUBERNETES_POD_TIMEOUT
T}	T{
Seconds to wait for a scheduled Kubernetes pod to
start running.
T}
_
T{
KUBE_WATCH_ENABLED
T}	T{
A boolean variable that allows for users
to utilize kubernetes watch stream feature
instead of polling for running jobs. Default
value is set to False.
T}
_
T{
TOIL_TES_ENDPOINT
T}	T{
URL to the TES server to run against when using
the \fBtes\fP batch system.
T}
_
T{
TOIL_TES_USER
T}	T{
Username to use with HTTP Basic Authentication to
log into the TES server.
T}
_
T{
TOIL_TES_PASSWORD
T}	T{
Password to use with HTTP Basic Authentication to
log into the TES server.
T}
_
T{
TOIL_TES_BEARER_TOKEN
T}	T{
Token to use to authenticate to the TES server.
T}
_
T{
TOIL_APPLIANCE_SELF
T}	T{
The fully qualified reference for the Toil
Appliance you wish to use, in the form
\fBREPO/IMAGE:TAG\fP\&.
\fBquay.io/ucsc_cgl/toil:3.6.0\fP and
\fBcket/toil:3.5.0\fP are both examples of valid
options. Note that since Docker defaults to
Dockerhub repos, only quay.io repos need to
specify their registry.
T}
_
T{
TOIL_DOCKER_REGISTRY
T}	T{
The URL of the registry of the Toil Appliance
image you wish to use. Docker will use Dockerhub
by default, but the quay.io registry is also
very popular and easily specifiable by setting
this option to \fBquay.io\fP\&.
T}
_
T{
TOIL_DOCKER_NAME
T}	T{
The name of the Toil Appliance image you
wish to use. Generally this is simply \fBtoil\fP but
this option is provided to override this,
since the image can be built with arbitrary names.
T}
_
T{
TOIL_AWS_SECRET_NAME
T}	T{
For the Kubernetes batch system, the name of a
Kubernetes secret which contains a \fBcredentials\fP
file granting access to AWS resources. Will be
mounted as \fB~/.aws\fP inside Kubernetes\-managed
Toil containers. Enables the AWSJobStore to be
used with the Kubernetes batch system, if the
credentials allow access to S3 and SimpleDB.
T}
_
T{
TOIL_AWS_ZONE
T}	T{
Zone to use when using AWS. Also determines region.
Overrides TOIL_AWS_REGION.
T}
_
T{
TOIL_AWS_REGION
T}	T{
Region to use when using AWS.
T}
_
T{
TOIL_AWS_AMI
T}	T{
ID of the AMI to use in node provisioning. If in
doubt, don\(aqt set this variable.
T}
_
T{
TOIL_AWS_NODE_DEBUG
T}	T{
Determines whether to preserve nodes that have
failed health checks. If set to \fBTrue\fP, nodes
that fail EC2 health checks won\(aqt immediately be
terminated so they can be examined and the cause
of failure determined. If any EC2 nodes are left
behind in this manner, the security group will
also be left behind by necessity as it cannot be
deleted until all associated nodes have been
terminated.
T}
_
T{
TOIL_AWS_BATCH_QUEUE
T}	T{
Name or ARN of an AWS Batch Queue to use with the
AWS Batch batch system.
T}
_
T{
TOIL_AWS_BATCH_JOB_ROLE_ARN
T}	T{
ARN of an IAM role to run AWS Batch jobs as with
the AWS Batch batch system. If the jobs are not
run with an IAM role or on machines that have
access to S3 and SimpleDB, the AWS job store will
not be usable.
T}
_
T{
TOIL_GOOGLE_PROJECTID
T}	T{
The Google project ID to use when generating
Google job store names for tests or CWL workflows.
T}
_
T{
TOIL_SLURM_ARGS
T}	T{
Arguments for sbatch for the slurm batch system.
Do not pass CPU or memory specifications here.
Instead, define resource requirements for the job.
There is no default value for this variable.
If neither \fB\-\-export\fP nor \fB\-\-export\-file\fP is
in the argument list, \fB\-\-export=ALL\fP will be
provided.
T}
_
T{
TOIL_SLURM_PE
T}	T{
Name of the slurm partition to use for parallel
jobs.
There is no default value for this variable.
T}
_
T{
TOIL_GRIDENGINE_ARGS
T}	T{
Arguments for qsub for the gridengine batch
system. Do not pass CPU or memory specifications
here. Instead, define resource requirements for
the job. There is no default value for this
variable.
T}
_
T{
TOIL_GRIDENGINE_PE
T}	T{
Parallel environment arguments for qsub and for
the gridengine batch system. There is no default
value for this variable.
T}
_
T{
TOIL_TORQUE_ARGS
T}	T{
Arguments for qsub for the Torque batch system.
Do not pass CPU or memory specifications here.
Instead, define extra parameters for the job such
as queue. Example: \-q medium
Use TOIL_TORQUE_REQS to pass extra values for the
\-l resource requirements parameter.
There is no default value for this variable.
T}
_
T{
TOIL_TORQUE_REQS
T}	T{
Arguments for the resource requirements for Torque
batch system. Do not pass CPU or memory
specifications here. Instead, define extra resource
requirements as a string that goes after the \-l
argument to qsub. Example:
walltime=2:00:00,file=50gb
There is no default value for this variable.
T}
_
T{
TOIL_LSF_ARGS
T}	T{
Additional arguments for the LSF\(aqs bsub command.
Instead, define extra parameters for the job such
as queue. Example: \-q medium.
There is no default value for this variable.
T}
_
T{
TOIL_HTCONDOR_PARAMS
T}	T{
Additional parameters to include in the HTCondor
submit file passed to condor_submit. Do not pass
CPU or memory specifications here. Instead define
extra parameters which may be required by HTCondor.
This variable is parsed as a semicolon\-separated
string of \fBparameter = value\fP pairs. Example:
\fBrequirements = TARGET.has_sse4_2 == true;
accounting_group = test\fP\&.
There is no default value for this variable.
T}
_
T{
TOIL_CUSTOM_DOCKER_INIT_COMMAND
T}	T{
Any custom bash command to run in the Toil docker
container prior to running the Toil services.
Can be used for any custom initialization in the
worker and/or primary nodes such as private docker
docker authentication. Example for AWS ECR:
\fBpip install awscli && eval $(aws ecr get\-login
\-\-no\-include\-email \-\-region us\-east\-1)\fP\&.
T}
_
T{
TOIL_CUSTOM_INIT_COMMAND
T}	T{
Any custom bash command to run prior to starting
the Toil appliance. Can be used for any custom
initialization in the worker and/or primary nodes
such as private docker authentication for the Toil
appliance itself (i.e. from TOIL_APPLIANCE_SELF).
T}
_
T{
TOIL_S3_HOST
T}	T{
the IP address or hostname to use for connecting
to S3. Example: \fBTOIL_S3_HOST=127.0.0.1\fP
T}
_
T{
TOIL_S3_PORT
T}	T{
a port number to use for connecting to S3.
Example: \fBTOIL_S3_PORT=9001\fP
T}
_
T{
TOIL_S3_USE_SSL
T}	T{
enable or disable the usage of SSL for connecting
to S3 (\fBTrue\fP by default).
Example: \fBTOIL_S3_USE_SSL=False\fP
T}
_
T{
TOIL_WES_BROKER_URL
T}	T{
An optional broker URL to use to communicate
between the WES server and Celery task queue. If
unset, \fBamqp://guest:guest@localhost:5672//\fP is
used.
T}
_
T{
TOIL_WES_JOB_STORE_TYPE
T}	T{
Type of job store to use by default for workflows
run via the WES server. Can be \fBfile\fP, \fBaws\fP,
or \fBgoogle\fP\&.
T}
_
T{
TOIL_OWNER_TAG
T}	T{
This will tag cloud resources with a tag reading:
\(dqOwner: $TOIL_OWNER_TAG\(dq. This is used internally
at UCSC to stop a bot we have that terminates
untagged resources.
T}
_
T{
TOIL_AWS_PROFILE
T}	T{
The name of an AWS profile to run TOIL with.
T}
_
T{
TOIL_AWS_TAGS
T}	T{
This will tag cloud resources with any arbitrary
tags given in a JSON format. These are overwritten
in favor of CLI options when using launch cluster.
For information on valid AWS tags, see \fI\%AWS Tags\fP\&.
T}
_
T{
SINGULARITY_DOCKER_HUB_MIRROR
T}	T{
An http or https URL for the Singularity wrapper
in the Toil Docker container to use as a mirror
for Docker Hub.
T}
_
T{
OMP_NUM_THREADS
T}	T{
The number of cores set for OpenMP applications in
the workers. If not set, Toil will use the number
of job threads.
T}
_
T{
GUNICORN_CMD_ARGS
T}	T{
Specify additional Gunicorn configurations for the
Toil WES server. See \fI\%Gunicorn settings\fP\&.
T}
_
.TE
.INDENT 0.0
.IP \(bu 2
\fI\%Index\fP
.IP \(bu 2
\fI\%Search Page\fP
.UNINDENT
.SH AUTHOR
UCSC Computational Genomics Lab
.SH COPYRIGHT
2023 – 2023 UCSC Computational Genomics Lab
.\" Generated by docutils manpage writer.
.