'\" t
.\" Title: pegasus-s3
.\" Author: [see the "Author" section]
.\" Generator: DocBook XSL Stylesheets v1.79.1
.\" Date: 11/09/2018
.\" Manual: Pegasus Manual
.\" Source: Pegasus 4.4.0
.\" Language: English
.\"
.TH "PEGASUS\-S3" "1" "11/09/2018" "Pegasus 4\&.4\&.0" "Pegasus Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
pegasus-s3 \- Upload, download, and delete objects in Amazon S3
.SH "SYNOPSIS"
.sp
.nf
\fBpegasus\-s3\fR \fBhelp\fR
\fBpegasus\-s3\fR \fBls\fR [options] \fIURL\fR
\fBpegasus\-s3\fR \fBmkdir\fR [options] \fIURL\&...\fR
\fBpegasus\-s3\fR \fBrmdir\fR [options] \fIURL\&...\fR
\fBpegasus\-s3\fR \fBrm\fR [options] [\fIURL\&...\fR]
\fBpegasus\-s3\fR \fBput\fR [options] \fIFILE\fR \fIURL\fR
\fBpegasus\-s3\fR \fBget\fR [options] \fIURL\fR [\fIFILE\fR]
\fBpegasus\-s3\fR \fBlsup\fR [options] \fIURL\fR
\fBpegasus\-s3\fR \fBrmup\fR [options] \fIURL\fR [\fIUPLOAD\fR]
\fBpegasus\-s3\fR \fBcp\fR [options] \fISRC\&...\fR \fIDEST\fR
.fi
.SH "DESCRIPTION"
.sp
\fBpegasus\-s3\fR is a client for the Amazon S3 object storage service and any other storage services that conform to the Amazon S3 API, such as Eucalyptus Walrus\&.
.SH "OPTIONS"
.SS "Global Options"
.PP
\fB\-h\fR, \fB\-\-help\fR
.RS 4
Show help message for subcommand and exit
.RE
.PP
\fB\-d\fR, \fB\-\-debug\fR
.RS 4
Turn on debugging
.RE
.PP
\fB\-v\fR, \fB\-\-verbose\fR
.RS 4
Show progress messages
.RE
.PP
\fB\-C\fR \fIFILE\fR, \fB\-\-conf\fR=\fIFILE\fR
.RS 4
Path to configuration file
.RE
.SS "rm Options"
.PP
\fB\-f\fR, \fB\-\-force\fR
.RS 4
If the URL does not exist, then ignore the error\&.
.RE
.PP
\fB\-F\fR \fIFILE\fR, \fB\-\-file\fR=\fIFILE\fR
.RS 4
File containing a list of URLs to delete
.RE
.SS "put Options"
.PP
\fB\-c\fR \fIX\fR, \fB\-\-chunksize\fR=\fIX\fR
.RS 4
Set the chunk size for multipart uploads to X MB\&. A value of 0 disables multipart uploads\&. The default is 10MB, the minimum is 5MB, and the maximum is 1024MB\&. This parameter only applies to sites that support multipart uploads (see the multipart_uploads configuration parameter in the
\fBCONFIGURATION\fR
section)\&. The maximum number of chunks is 10,000, so if you are uploading a large file, the chunk size is automatically increased to keep the upload within that limit\&. Choose smaller values to reduce the impact of transient failures\&.
.RE
.PP
\fB\-p\fR \fIN\fR, \fB\-\-parallel\fR=\fIN\fR
.RS 4
Use N threads to upload
\fIFILE\fR
in parallel\&. The default value is 4, which enables parallel uploads with 4 threads\&. This parameter is only valid if the site supports multipart uploads and the
\fB\-\-chunksize\fR
parameter is not 0\&. Otherwise parallel uploads are disabled\&.
.RE
.PP
\fB\-b\fR, \fB\-\-create\-bucket\fR
.RS 4
Create the destination bucket if it does not already exist
.RE
.SS "get Options"
.PP
\fB\-c\fR \fIX\fR, \fB\-\-chunksize\fR=\fIX\fR
.RS 4
Set the chunk size for parallel downloads to X MB\&. A value of 0 disables chunked reads\&. This option only applies to sites that support ranged downloads (see the ranged_downloads configuration parameter)\&. The default chunk size is 10MB, the minimum is 1MB, and the maximum is 1024MB\&. Choose smaller values to reduce the impact of transient failures\&.
.RE
.PP
\fB\-p\fR \fIN\fR, \fB\-\-parallel\fR=\fIN\fR
.RS 4
Use N threads to download the file in parallel\&. The default value is 4, which enables parallel downloads with 4 threads\&. This parameter is only valid if the site supports ranged downloads and the
\fB\-\-chunksize\fR
parameter is not 0\&. Otherwise parallel downloads are disabled\&.
.RE
.SS "rmup Options"
.PP
\fB\-a\fR, \fB\-\-all\fR
.RS 4
Cancel all uploads for the specified bucket
.RE
.SS "cp Options"
.PP
\fB\-c\fR, \fB\-\-create\-dest\fR
.RS 4
Create the destination bucket if it does not exist\&.
.RE
.PP
\fB\-r\fR, \fB\-\-recursive\fR
.RS 4
If SRC is a bucket, copy all of the keys in that bucket to DEST\&. In that case DEST must be a bucket\&.
.RE
.PP
\fB\-f\fR, \fB\-\-force\fR
.RS 4
If DEST exists, then overwrite it\&.
.RE
.SH "SUBCOMMANDS"
.sp
\fBpegasus\-s3\fR has several subcommands for different storage service operations\&.
.PP
\fBhelp\fR
.RS 4
The help subcommand lists all available subcommands\&.
.RE
.PP
\fBls\fR
.RS 4
The
\fBls\fR
subcommand lists the contents of a URL\&. If the URL does not contain a bucket, then all the buckets owned by the user are listed\&. If the URL contains a bucket, but no key, then all the keys in the bucket are listed\&. If the URL contains a bucket and a key, then all keys in the bucket that begin with the specified key are listed\&.
.RE
.PP
\fBmkdir\fR
.RS 4
The
\fBmkdir\fR
subcommand creates one or more buckets\&.
.RE
.PP
\fBrmdir\fR
.RS 4
The
\fBrmdir\fR
subcommand deletes one or more buckets from the storage service\&. In order to delete a bucket, the bucket must be empty\&.
.RE
.PP
\fBrm\fR
.RS 4
The
\fBrm\fR
subcommand deletes one or more keys from the storage service\&.
.RE
.PP
\fBput\fR
.RS 4
The
\fBput\fR
subcommand stores the file specified by FILE in the storage service under the bucket and key specified by URL\&. If the URL contains a bucket, but not a key, then the file name is used as the key\&.
.sp
If a transient failure occurs, then the upload will be retried several times before
\fBpegasus\-s3\fR
gives up and fails\&.
.sp
The
\fBput\fR
subcommand can do both chunked and parallel uploads if the service supports multipart uploads (see
\fBmultipart_uploads\fR
in the
\fBCONFIGURATION\fR
section)\&. Currently only Amazon S3 supports multipart uploads\&.
.sp
This subcommand will check the size of the file to make sure it can be stored before attempting to store it\&.
.sp
Chunked uploads are useful to reduce the probability of an upload failing\&. If an upload is chunked, then
\fBpegasus\-s3\fR
issues separate PUT requests for each chunk of the file\&. Specifying smaller chunks (using
\fB\-\-chunksize\fR) will reduce the chances of an upload failing due to a transient error\&. Chunk sizes can range from 5 MB to 1 GB (chunk sizes smaller than 5 MB produced incomplete uploads on Amazon S3)\&. The maximum number of chunks for any single file is 10,000, so if a large file is being uploaded with a small chunk size, then the chunk size will be increased to fit within the 10,000\-chunk limit\&. By default, the file will be split into 10 MB chunks if the storage service supports multipart uploads\&. Chunked uploads can be disabled by specifying a chunk size of 0\&. If the upload is chunked, then each chunk is retried independently under transient failures\&. If any chunk fails permanently, then the upload is aborted\&.
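The interaction between file size and the 10,000\-chunk limit can be sketched with a little shell arithmetic\&. This is only an illustration of the documented limit; the exact rounding pegasus\-s3 applies internally is not specified here, so the numbers are approximate\&.

```shell
# Illustration: a 200 GB file with the default 10 MB chunk size would
# need 20,480 chunks, exceeding the 10,000-chunk limit, so the
# effective chunk size must grow. Ceiling division throughout.
FILE_MB=204800   # 200 GB expressed in MB (hypothetical file)
CHUNK_MB=10      # requested --chunksize
CHUNKS=$(( (FILE_MB + CHUNK_MB - 1) / CHUNK_MB ))
if [ "$CHUNKS" -gt 10000 ]; then
    # smallest chunk size that keeps the upload within 10,000 chunks
    CHUNK_MB=$(( (FILE_MB + 9999) / 10000 ))
fi
echo "effective chunk size: ${CHUNK_MB} MB"
```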
.sp
Parallel uploads can increase performance for services that support multipart uploads\&. In a parallel upload, the file is split into N chunks and each chunk is uploaded concurrently by one of M threads in a first\-come, first\-served fashion\&. If the chunksize is set to 0, then parallel uploads are disabled\&. If M > N, then the actual number of threads used will be reduced to N\&. The number of threads can be specified using the \-\-parallel argument\&. If \-\-parallel is 1, then only a single thread is used\&. The default value is 4\&. There is no maximum number of threads, but it is likely that the link will be saturated by 4 to 8 threads\&.
.sp
Under certain circumstances, when a multipart upload fails it could leave behind data on the server\&. When a failure occurs the
\fBput\fR
subcommand will attempt to abort the upload\&. If the upload cannot be aborted, then a partial upload may remain on the server\&. To check for partial uploads run the
\fBlsup\fR
subcommand\&. If you see an upload that failed in the output of
\fBlsup\fR, then run the
\fBrmup\fR
subcommand to remove it\&.
.RE
.PP
\fBget\fR
.RS 4
The
\fBget\fR
subcommand retrieves an object from the storage service identified by URL and stores it in the file specified by FILE\&. If FILE is not specified, then the key is used as the file name (Note: if the key has slashes, then the file name will be a relative subdirectory, but
\fBpegasus\-s3\fR
will not create the subdirectory if it does not exist)\&.
.sp
If a transient failure occurs, then the download will be retried several times before
\fBpegasus\-s3\fR
gives up and fails\&.
.sp
The
\fBget\fR
subcommand can do both chunked and parallel downloads if the service supports ranged downloads (see
\fBranged_downloads\fR
in the
\fBCONFIGURATION\fR
section)\&. Currently only Amazon S3 has good support for ranged downloads\&. Eucalyptus Walrus supports ranged downloads, but the current release, 1\&.6, is inconsistent with the Amazon interface and has a bug that causes ranged downloads to hang in some cases\&. It is recommended that ranged downloads not be used with Eucalyptus until these issues are resolved\&.
.sp
Chunked downloads can be used to reduce the probability of a download failing\&. When a download is chunked,
\fBpegasus\-s3\fR
issues separate GET requests for each chunk of the file\&. Specifying smaller chunks (using
\fB\-\-chunksize\fR) will reduce the chances that a download will fail due to a transient error\&. Chunk sizes can range from 1 MB to 1 GB\&. By default, a download will be split into 10 MB chunks if the site supports ranged downloads\&. Chunked downloads can be disabled by specifying a
\fB\-\-chunksize\fR
of 0\&. If a download is chunked, then each chunk is retried independently under transient failures\&. If any chunk fails permanently, then the download is aborted\&.
.sp
Parallel downloads can increase performance for services that support ranged downloads\&. In a parallel download, the file to be retrieved is split into N chunks and each chunk is downloaded concurrently by one of M threads in a first\-come, first\-served fashion\&. If the chunksize is 0, then parallel downloads are disabled\&. If M > N, then the actual number of threads used will be reduced to N\&. The number of threads can be specified using the \-\-parallel argument\&. If \-\-parallel is 1, then only a single thread is used\&. The default value is 4\&. There is no maximum number of threads, but it is likely that the link will be saturated by 4 to 8 threads\&.
.RE
.PP
\fBlsup\fR
.RS 4
The
\fBlsup\fR
subcommand lists active multipart uploads\&. The URL specified should point to a bucket\&. This command is only valid if the site supports multipart uploads\&. The output of this command is a list of keys and upload IDs\&.
.sp
This subcommand is used with
\fBrmup\fR
to help recover from failures of multipart uploads\&.
.RE
.PP
\fBrmup\fR
.RS 4
The
\fBrmup\fR
subcommand cancels an active upload\&. The URL specified should point to a bucket, and UPLOAD is the long, complicated upload ID shown by the
\fBlsup\fR
subcommand\&.
.sp
This subcommand is used with
\fBlsup\fR
to recover from failures of multipart uploads\&.
.RE
.PP
\fBcp\fR
.RS 4
The
\fBcp\fR
subcommand copies keys on the server\&. Keys cannot be copied between accounts\&.
.RE
.SH "URL FORMAT"
.sp
All URLs for objects stored in S3 should be specified in the following format:
.sp
.if n \{\
.RS 4
.\}
.nf
s3[s]://USER@SITE[/BUCKET[/KEY]]
.fi
.if n \{\
.RE
.\}
.sp
The protocol part can be \fIs3://\fR or \fIs3s://\fR\&. If \fIs3s://\fR is used, then \fBpegasus\-s3\fR will force the connection to use SSL and override the setting in the configuration file\&. If \fIs3://\fR is used, then whether the connection uses SSL or not is determined by the value of the \fIendpoint\fR variable in the configuration for the site\&.
.sp
The \fIUSER@SITE\fR part is required, but the \fIBUCKET\fR and \fIKEY\fR parts may be optional depending on the context\&.
.sp
The \fIUSER@SITE\fR portion is referred to as the \(lqidentity\(rq, and the \fISITE\fR portion is referred to as the \(lqsite\(rq\&. Both the identity and the site are looked up in the configuration file (see \fBCONFIGURATION\fR) to determine the parameters to use when establishing a connection to the service\&. The site portion is used to find the host and port, whether to use SSL, and other things\&. The identity portion is used to determine which authentication tokens to use\&. This format is designed to enable users to easily use multiple services with multiple authentication tokens\&. Note that neither the \fIUSER\fR nor the \fISITE\fR portion of the URL have any meaning outside of \fBpegasus\-s3\fR\&. They do not refer to real usernames or hostnames, but are rather handles used to look up configuration values in the configuration file\&.
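To make the lookup concrete: for a URL such as s3://juve@magellan/gideon\&.isi\&.edu, pegasus\-s3 would read connection parameters from a site entry named magellan and authentication tokens from an identity entry named juve@magellan\&. The values below are the placeholders from the Example Configuration in the \fBCONFIGURATION\fR section, not real credentials\&.

```ini
# Site entry: endpoint (host, port, SSL) and optional features
[magellan]
endpoint = https://128.55.69.235:8773/services/Walrus

# Identity entry: authentication tokens for "juve" at that site
[juve@magellan]
access_key = quwefahsdpfwlkewqjsdoijldsdf
secret_key = asdfa9wejalsdjfljasldjfasdfa
```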
.sp
The BUCKET portion of the URL is the part between the 3rd and 4th slashes\&. Buckets are part of a global namespace that is shared with other users of the storage service\&. As such, they should be unique\&.
.sp
The KEY portion of the URL is anything after the 4th slash\&. Keys can include slashes, but S3\-like storage services do not have the concept of a directory like regular file systems\&. Instead, keys are treated like opaque identifiers for individual objects\&. So, for example, the keys \fIa/b\fR and \fIa/c\fR have a common prefix, but cannot be said to be in the same \fIdirectory\fR\&.
.sp
Some example URLs are:
.sp
.if n \{\
.RS 4
.\}
.nf
s3://ewa@amazon
s3://juve@skynet/gideon\&.isi\&.edu
s3://juve@magellan/pegasus\-images/centos\-5\&.5\-x86_64\-20101101\&.part\&.1
s3s://ewa@amazon/pegasus\-images/data\&.tar\&.gz
.fi
.if n \{\
.RE
.\}
.SH "CONFIGURATION"
.sp
Each user should specify a configuration file that \fBpegasus\-s3\fR will use to look up connection parameters and authentication tokens\&.
.SS "Search Path"
.sp
This client will look in the following locations, in order, to locate the user\(cqs configuration file:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
The \-C/\-\-conf argument
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
The S3CFG environment variable
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
$HOME/\&.pegasus/s3cfg
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
$HOME/\&.s3cfg
.RE
.sp
If it does not find the configuration file in one of these locations it will fail with an error\&. The $HOME/\&.s3cfg location is only supported for backward\-compatibility\&. $HOME/\&.pegasus/s3cfg should be used instead\&.
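The non\-argument lookup locations above can be sketched as shell commands; the S3CFG path used here is only an illustration\&.

```shell
# The S3CFG environment variable (checked after -C/--conf) can point
# anywhere; this path is hypothetical.
export S3CFG="$HOME/my-s3cfg"

# Otherwise pegasus-s3 falls back to the preferred default location:
mkdir -p "$HOME/.pegasus"
touch "$HOME/.pegasus/s3cfg"
```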
.SS "Configuration File Format"
.sp
The configuration file is in INI format and contains two types of entries\&.
.sp
The first type of entry is a site entry, which specifies the configuration for a storage service\&. This entry specifies the service endpoint that \fBpegasus\-s3\fR should connect to for the site, and some optional features that the site may support\&. Here is an example of a site entry for Amazon S3:
.sp
.if n \{\
.RS 4
.\}
.nf
[amazon]
endpoint = http://s3\&.amazonaws\&.com/
.fi
.if n \{\
.RE
.\}
.sp
The other type of entry is an identity entry, which specifies the authentication information for a user at a particular site\&. Here is an example of an identity entry:
.sp
.if n \{\
.RS 4
.\}
.nf
[pegasus@amazon]
access_key = 90c4143642cb097c88fe2ec66ce4ad4e
secret_key = a0e3840e5baee6abb08be68e81674dca
.fi
.if n \{\
.RE
.\}
.sp
It is important to note that user names and site names used are only logical\(emthey do not correspond to actual hostnames or usernames, but are simply used as a convenient way to refer to the services and identities used by the client\&.
.sp
The configuration file should be saved with limited permissions\&. Only the owner of the file should be able to read from it and write to it (i\&.e\&. it should have permissions of 0600 or 0400)\&. If the file has more liberal permissions, then \fBpegasus\-s3\fR will fail with an error message\&. The purpose of this is to prevent the authentication tokens stored in the configuration file from being accessed by other users\&.
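For example, the required permissions can be set as follows (the path assumes the preferred default location described under Search Path):

```shell
# Restrict the configuration file to owner read/write only;
# pegasus-s3 rejects files with more liberal permissions.
CONF="$HOME/.pegasus/s3cfg"
mkdir -p "$(dirname "$CONF")"
touch "$CONF"
chmod 600 "$CONF"
ls -l "$CONF" | cut -c1-10    # should show -rw-------
```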
.SS "Configuration Variables"
.PP
\fBendpoint\fR (site)
.RS 4
The URL of the web service endpoint\&. If the URL begins with
\fIhttps\fR, then SSL will be used\&.
.RE
.PP
\fBmax_object_size\fR (site)
.RS 4
The maximum size of an object in GB (default: 5GB)
.RE
.PP
\fBmultipart_uploads\fR (site)
.RS 4
Does the service support multipart uploads? (True/False, default: False)
.RE
.PP
\fBranged_downloads\fR (site)
.RS 4
Does the service support ranged downloads? (True/False, default: False)
.RE
.PP
\fBaccess_key\fR (identity)
.RS 4
The access key for the identity
.RE
.PP
\fBsecret_key\fR (identity)
.RS 4
The secret key for the identity
.RE
.SS "Example Configuration"
.sp
This is an example configuration that specifies two sites (amazon and magellan) and three identities (pegasus@amazon, juve@magellan, and voeckler@magellan)\&. For the amazon site the maximum object size is 5TB, and the site supports both multipart uploads and ranged downloads, so both uploads and downloads can be done in parallel\&.
.sp
.if n \{\
.RS 4
.\}
.nf
[amazon]
endpoint = https://s3\&.amazonaws\&.com/
max_object_size = 5120
multipart_uploads = True
ranged_downloads = True
[pegasus@amazon]
access_key = 90c4143642cb097c88fe2ec66ce4ad4e
secret_key = a0e3840e5baee6abb08be68e81674dca
[magellan]
# NERSC Magellan is a Eucalyptus site\&. It doesn\*(Aqt support multipart uploads,
# or ranged downloads (the defaults), and the maximum object size is 5GB
# (also the default)
endpoint = https://128\&.55\&.69\&.235:8773/services/Walrus
[juve@magellan]
access_key = quwefahsdpfwlkewqjsdoijldsdf
secret_key = asdfa9wejalsdjfljasldjfasdfa
[voeckler@magellan]
# Each site can have multiple associated identities
access_key = asdkfaweasdfbaeiwhkjfbaqwhei
secret_key = asdhfuinakwjelfuhalsdflahsdl
.fi
.if n \{\
.RE
.\}
.SH "EXAMPLE"
.sp
List all buckets owned by identity \fIuser@amazon\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 ls s3://user@amazon
.fi
.if n \{\
.RE
.\}
.sp
List the contents of bucket \fIbar\fR for identity \fIuser@amazon\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 ls s3://user@amazon/bar
.fi
.if n \{\
.RE
.\}
.sp
List all objects in bucket \fIbar\fR that start with \fIhello\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 ls s3://user@amazon/bar/hello
.fi
.if n \{\
.RE
.\}
.sp
Create a bucket called \fImybucket\fR for identity \fIuser@amazon\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 mkdir s3://user@amazon/mybucket
.fi
.if n \{\
.RE
.\}
.sp
Delete a bucket called \fImybucket\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 rmdir s3://user@amazon/mybucket
.fi
.if n \{\
.RE
.\}
.sp
Upload a file \fIfoo\fR to bucket \fIbar\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 put foo s3://user@amazon/bar/foo
.fi
.if n \{\
.RE
.\}
.sp
Download an object \fIfoo\fR in bucket \fIbar\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 get s3://user@amazon/bar/foo foo
.fi
.if n \{\
.RE
.\}
.sp
Upload a file in parallel with 4 threads and 100MB chunks:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 put \-\-parallel 4 \-\-chunksize 100 foo s3://user@amazon/bar/foo
.fi
.if n \{\
.RE
.\}
.sp
Download an object in parallel with 4 threads and 100MB chunks:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 get \-\-parallel 4 \-\-chunksize 100 s3://user@amazon/bar/foo foo
.fi
.if n \{\
.RE
.\}
.sp
List all partial uploads for bucket \fIbar\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 lsup s3://user@amazon/bar
.fi
.if n \{\
.RE
.\}
.sp
Remove all partial uploads for bucket \fIbar\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 rmup \-\-all s3://user@amazon/bar
.fi
.if n \{\
.RE
.\}
.SH "RETURN VALUE"
.sp
\fBpegasus\-s3\fR returns a zero exit status if the operation is successful\&. A non\-zero exit status is returned in case of failure\&.
.SH "AUTHOR"
.sp
Gideon Juve
.sp
Pegasus Team \m[blue]\fBhttp://pegasus\&.isi\&.edu\fR\m[]