.TH "confuga" 1 "" "CCTools 7.1.2 FINAL" "Cooperative Computing Tools"

.SH NAME
.LP
\fBConfuga\fP - An active storage cluster file system.

.SH SYNOPSIS
.LP
\FC\fBchirp_server --jobs --root=<Confuga URI> [options]\fP\FT

.SH DESCRIPTION
.LP

.PP
Configures and starts a Chirp server to act as the head node for a Confuga
storage cluster.

.PP
For complete details with examples, see the Confuga User's Manual (\fBhttp://ccl.cse.nd.edu/software/manuals/confuga.html\fP).

.SH OPTIONS
.LP

.PP
A Chirp server acting as the Confuga head node uses normal
\fBchirp_server(1)\fP options. In order to run the Chirp server as the
Confuga head node, use the \fB--root\fP switch with the Confuga URI. You must
also enable job execution with the \fB--jobs\fP switch.

.PP
The format for the Confuga URI is:
\fBconfuga:///path/to/workspace?option1=value&option2=value\fP. The workspace
path is the location Confuga maintains metadata and databases for the head
node. Confuga specific options are also passed through the URI, documented
below.  Examples demonstrating how to start Confuga and a small cluster are at
the end of this manual.

.LP
.TP
.BI \ auth \ <method>
.
Enable this method for Head Node to Storage Node authentication. The default is to enable all available authentication mechanisms.
.TP
.BI \ concurrency \ <limit>
.
Limits the number of concurrent jobs executed by the cluster. The default is 0 for limitless.
.TP
.BI \ pull-threshold \ <bytes>
.
Sets the threshold for pull transfers. The default is 128MB.
.TP
.BI \ replication \ <type>
.
Sets the replication mode for satisfying job dependencies. \fBtype\fP may be \fBpush-sync\fP or \fBpush-async-N\fP. The default is \fBpush-async-1\fP.
.TP
.BI \ scheduler \ <type>
.
Sets the scheduler used to assign jobs to storage nodes. The default is \fBfifo-0\fP.
.TP
.BI \ tickets \ <tickets>
.
Sets tickets to use for authenticating with storage nodes. Paths must be absolute.


.SH STORAGE NODES
.LP

.PP
Confuga uses regular Chirp servers as storage nodes. Each storage node is
added to the cluster using the \fBconfuga_adm(1)\fP command.  All storage
node Chirp servers must be run with:


.IP \(bu 4
Ticket authentication enabled (\fB--auth=ticket\fP). Remember by default all authentication mechanisms are enabled.
.IP \(bu 4
Job execution enabled (\fB--jobs\fP).
.IP \(bu 4
Job concurrency of at least two (\fB--job-concurrency=2\fP).


.PP
These options are also suggested but not required:


.IP \(bu 4
More frequent Catalog updates (\fB--catalog-update=30s\fP).
.IP \(bu 4
Project name for the cluster (\fB--project-name=foo\fP).


.PP
You must also ensure that the storage nodes and the Confuga head node are using
the same \fBcatalog_server(1)\fP. By default, this should be the case. The
\fBEXAMPLES\fP section below includes an example cluster using a manually
hosted catalog server.

.SS ADDING STORAGE NODES
.LP

.PP
To add storage nodes to the Confuga cluster, use the \fBconfuga_adm(1)\fP
administrative tool.

.SH EXECUTING WORKFLOWS
.LP

.PP
The easiest way to execute workflows on Confuga is through \fBmakeflow(1)\fP.
Only two options to Makeflow are required, \fB--batch-type\fP and
\fB--working-dir\fP. Confuga uses the Chirp job protocol, so the batch type is
\fBchirp\fP. It is also necessary to define the executing server, the Confuga
Head Node, and the \fInamespace\fP the workflow executes in. For example:

.fam C
.nf
.nh
.IP "" 8
makeflow --batch-type=chirp --working-dir=chirp://confuga.example.com:9094/\fBpath/to/workflow\fP
.fi
.hy
.fam
.P

.PP
The workflow namespace is logically prepended to all file paths defined in the
Makeflow specification. So for example, if you have this Makeflow file:

.fam C
.nf
.nh
.IP "" 8
a: exe
    ./exe > a
.fi
.hy
.fam
.P

.PP
Confuga will execute \fB/path/to/workflow/exe\fP and produce the output file \fB/path/to/workflow/a\fP.

.PP
Unlike other batch systems used with Makeflow, like Condor or Work Queue,
\fIall files used by a workflow must be in the Confuga file system\fP. Condor
and Work Queue both stage workflow files from the submission site to the
execution sites. In Confuga, the entire workflow dataset, including
executables, is already resident.  So when executing a new workflow, you need
to upload the workflow dataset to Confuga. The easiest way to do this is using
the \fBchirp(1)\fP command line tool:

.fam C
.nf
.nh
.IP "" 8
chirp confuga.example.com put workflow/ /path/to/
.fi
.hy
.fam
.P

.PP
Finally, Confuga does not save the \fIstdout\fP or \fIstderr\fP of jobs.
If you want these files for debugging purposes, you must explicitly save them.
To streamline the process, you may use Makeflow's \fB--wrapper\fP options to
save \fIstdout\fP and \fIstderr\fP:

.fam C
.nf
.nh
.IP "" 8
makeflow --batch-type=chirp \\
         --working-dir=chirp://confuga.example.com/ \\
         --wrapper=$'{\\n{}\\n} > stdout.%% 2> stderr.%%' \\
         --wrapper-output='stdout.%%' \\
         --wrapper-output='stderr.%%'
.fi
.hy
.fam
.P

.SH EXAMPLES
.LP

.PP
Launch a head node with Confuga state stored in \fB./confuga.root\fP:

.fam C
.nf
.nh
.IP "" 8
chirp_server --jobs --root="confuga://$(pwd)/confuga.root/"
.fi
.hy
.fam
.P

.PP
Launch a head node with workspace \fB/tmp/confuga.root\fP using storage nodes \fBchirp://localhost:10001\fP and \fBchirp://localhost:10002/u/joe/confuga\fP:

.fam C
.nf
.nh
.IP "" 8
chirp_server --jobs --root='confuga:///tmp/confuga.root/'
confuga_adm confuga:///tmp/confuga.root/ sn-add address localhost:10001
confuga_adm confuga:///tmp/confuga.root/ sn-add -r /u/joe/confuga address localhost:10001
.fi
.hy
.fam
.P

.PP
Run a simple test cluster on your workstation:

.fam C
.nf
.nh
.IP "" 8
# start a catalog server in the background
catalog_server --history=catalog.history \\
               --update-log=catalog.update \\
               --interface=127.0.0.1 \\
               &
# sleep for a time so catalog can start
sleep 1
# start storage node 1 in the background
chirp_server --advertise=localhost \\
             --catalog-name=localhost \\
             --catalog-update=10s \\
             --interface=127.0.0.1 \\
             --jobs \\
             --job-concurrency=10 \\
             --root=./root.1 \\
             --port=9001 \\
             --project-name=test \\
             --transient=./tran.1 \\
             &
# start storage node 2 in the background
chirp_server --advertise=localhost \\
             --catalog-name=localhost \\
             --catalog-update=10s \\
             --interface=127.0.0.1 \\
             --jobs \\
             --job-concurrency=10 \\
             --root=./root.2 \\
             --port=9002 \\
             --project-name=test \\
             --transient=./tran.2 \\
             &
# sleep for a time so catalog can receive storage node status
sleep 5
confuga_adm confuga:///$(pwd)/confuga.root/ sn-add address localhost:9001
confuga_adm confuga:///$(pwd)/confuga.root/ sn-add address localhost:9002
# start the Confuga head node
chirp_server --advertise=localhost \\
             --catalog-name=localhost \\
             --catalog-update=30s \\
             --debug=confuga \\
             --jobs \\
             --root="confuga://$(pwd)/confuga.root/?auth=unix" \\
             --port=9000
.fi
.hy
.fam
.P

.SH COPYRIGHT
.LP
The Cooperative Computing Tools are Copyright (C) 2005-2019 The University of Notre Dame.  This software is distributed under the GNU General Public License.  See the file COPYING for details.

.SH SEE ALSO
.LP
\fBconfuga_adm(1)\fP 
.IP \(bu 4
\fBCooperative Computing Tools Documentation\fP
.IP \(bu 4
\fBChirp User Manual\fP
.IP \(bu 4
\fBchirp(1)\fP  \fBchirp_status(1)\fP  \fBchirp_fuse(1)\fP  \fBchirp_get(1)\fP  \fBchirp_put(1)\fP  \fBchirp_stream_files(1)\fP  \fBchirp_distribute(1)\fP  \fBchirp_benchmark(1)\fP  \fBchirp_server(1)\fP