'\" t .\" Title: pegasus-s3 .\" Author: [see the "Author" section] .\" Generator: DocBook XSL Stylesheets v1.79.1 .\" Date: 11/09/2018 .\" Manual: Pegasus Manual .\" Source: Pegasus 4.4.0 .\" Language: English .\" .TH "PEGASUS\-S3" "1" "11/09/2018" "Pegasus 4\&.4\&.0" "Pegasus Manual" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" pegasus-s3 \- Upload, download, delete objects in Amazon S3 .SH "SYNOPSIS" .sp .nf \fBpegasus\-s3\fR \fBhelp\fR \fBpegasus\-s3\fR \fBls\fR [options] \fIURL\fR \fBpegasus\-s3\fR \fBmkdir\fR [options] \fIURL\&...\fR \fBpegasus\-s3\fR \fBrmdir\fR [options] URL\&... \fBpegasus\-s3\fR \fBrm\fR [options] [\fIURL\&...\fR] \fBpegasus\-s3\fR \fBput\fR [options] \fIFILE\fR \fIURL\fR \fBpegasus\-s3\fR \fBget\fR [options] \fIURL\fR [\fIFILE\fR] \fBpegasus\-s3\fR \fBlsup\fR [options] \fIURL\fR \fBpegasus\-s3\fR \fBrmup\fR [options] \fIURL\fR [\fIUPLOAD\fR] \fBpegasus\-s3\fR \fBcp\fR [options] \fISRC\&...\fR \fIDEST\fR .fi .SH "DESCRIPTION" .sp \fBpegasus\-s3\fR is a client for the Amazon S3 object storage service and any other storage services that conform to the Amazon S3 API, such as Eucalyptus Walrus\&. .SH "OPTIONS" .SS "Global Options" .PP \fB\-h\fR, \fB\-\-help\fR .RS 4 Show help message for subcommand and exit .RE .PP \fB\-d\fR, \fB\-\-debug\fR .RS 4 Turn on debugging .RE .PP \fB\-v\fR, \fB\-\-verbose\fR .RS 4 Show progress messages .RE .PP \fB\-C\fR \fIFILE\fR, \fB\-\-conf\fR=\fIFILE\fR .RS 4 Path to configuration file .RE .SS "rm Options" .PP \fB\-f\fR, \fB\-\-force\fR .RS 4 If the URL does not exist, then ignore the error\&. .RE .PP \fB\-F\fR \fIFILE\fR, \fB\-\-file\fR=\fIFILE\fR .RS 4 File containing a list of URLs to delete .RE .SS "put Options" .PP \fB\-c\fR \fIX\fR, \fB\-\-chunksize\fR=\fIX\fR .RS 4 Set the chunk size for multipart uploads to X MB\&. A value of 0 disables multipart uploads\&. The default is 10MB, the min is 5MB and the max is 1024MB\&. This parameter only applies for sites that support multipart uploads (see multipart_uploads configuration parameter in the \fBCONFIGURATION\fR section)\&. The maximum number of chunks is 10,000, so if you are uploading a large file, then the chunk size is automatically increased to enable the upload\&. Choose smaller values to reduce the impact of transient failures\&. .RE .PP \fB\-p\fR \fIN\fR, \fB\-\-parallel\fR=\fIN\fR .RS 4 Use N threads to upload \fIFILE\fR in parallel\&. The default value is 4, which enables parallel uploads with 4 threads\&. This parameter is only valid if the site supports mulipart uploads and the \fB\-\-chunksize\fR parameter is not 0\&. Otherwise parallel uploads are disabled\&. 
.RE
.PP
\fB\-b\fR, \fB\-\-create\-bucket\fR
.RS 4
Create the destination bucket if it does not already exist
.RE
.SS "get Options"
.PP
\fB\-c\fR \fIX\fR, \fB\-\-chunksize\fR=\fIX\fR
.RS 4
Set the chunk size for parallel downloads to X MB\&. A value of 0 disables chunked reads\&. This option only applies to sites that support ranged downloads (see the \fBranged_downloads\fR configuration parameter)\&. The default chunk size is 10 MB, the minimum is 1 MB, and the maximum is 1024 MB\&. Choose smaller values to reduce the impact of transient failures\&.
.RE
.PP
\fB\-p\fR \fIN\fR, \fB\-\-parallel\fR=\fIN\fR
.RS 4
Use N threads to download \fIURL\fR in parallel\&. The default value is 4, which enables parallel downloads with 4 threads\&. This parameter is only valid if the site supports ranged downloads and the \fB\-\-chunksize\fR parameter is not 0\&. Otherwise parallel downloads are disabled\&.
.RE
.SS "rmup Options"
.PP
\fB\-a\fR, \fB\-\-all\fR
.RS 4
Cancel all uploads for the specified bucket
.RE
.SS "cp Options"
.PP
\fB\-c\fR, \fB\-\-create\-dest\fR
.RS 4
Create the destination bucket if it does not exist\&.
.RE
.PP
\fB\-r\fR, \fB\-\-recursive\fR
.RS 4
If SRC is a bucket, copy all of the keys in that bucket to DEST\&. In that case DEST must be a bucket\&.
.RE
.PP
\fB\-f\fR, \fB\-\-force\fR
.RS 4
If DEST exists, then overwrite it\&.
.RE
.SH "SUBCOMMANDS"
.sp
\fBpegasus\-s3\fR has several subcommands for different storage service operations\&.
.PP
\fBhelp\fR
.RS 4
The \fBhelp\fR subcommand lists all available subcommands\&.
.RE
.PP
\fBls\fR
.RS 4
The \fBls\fR subcommand lists the contents of a URL\&. If the URL does not contain a bucket, then all the buckets owned by the user are listed\&. If the URL contains a bucket, but no key, then all the keys in the bucket are listed\&. If the URL contains a bucket and a key, then all keys in the bucket that begin with the specified key are listed\&.
.RE
.PP
\fBmkdir\fR
.RS 4
The \fBmkdir\fR subcommand creates one or more buckets\&.
.RE
.PP
\fBrmdir\fR
.RS 4
The \fBrmdir\fR subcommand deletes one or more buckets from the storage service\&. In order to delete a bucket, the bucket must be empty\&.
.RE
.PP
\fBrm\fR
.RS 4
The \fBrm\fR subcommand deletes one or more keys from the storage service\&.
.RE
.PP
\fBput\fR
.RS 4
The \fBput\fR subcommand stores the file specified by FILE in the storage service under the bucket and key specified by URL\&. If the URL contains a bucket, but not a key, then the file name is used as the key\&.
.sp
If a transient failure occurs, then the upload will be retried several times before \fBpegasus\-s3\fR gives up and fails\&.
.sp
The \fBput\fR subcommand can do both chunked and parallel uploads if the service supports multipart uploads (see \fBmultipart_uploads\fR in the \fBCONFIGURATION\fR section)\&. Currently only Amazon S3 supports multipart uploads\&.
.sp
This subcommand will check the size of the file to make sure it can be stored before attempting to store it\&.
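.sp
For example, assuming a hypothetical identity \fIuser@amazon\fR and an existing bucket \fIbar\fR, the following stores the local file \fIfoo\fR under the key \fIfoo\fR, because the URL names a bucket but no key:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 put foo s3://user@amazon/bar
.fi
.if n \{\
.RE
.\}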
.sp
Chunked uploads are useful to reduce the probability of an upload failing\&. If an upload is chunked, then \fBpegasus\-s3\fR issues separate PUT requests for each chunk of the file\&. Specifying smaller chunks (using \fB\-\-chunksize\fR) will reduce the chances of an upload failing due to a transient error\&. Chunk sizes can range from 5 MB to 1 GB (chunk sizes smaller than 5 MB produced incomplete uploads on Amazon S3)\&. The maximum number of chunks for any single file is 10,000, so if a large file is being uploaded with a small chunk size, then the chunk size will be increased to fit within the 10,000\-chunk limit\&. By default, the file will be split into 10 MB chunks if the storage service supports multipart uploads\&. Chunked uploads can be disabled by specifying a chunk size of 0\&. If the upload is chunked, then each chunk is retried independently under transient failures\&. If any chunk fails permanently, then the upload is aborted\&.
.sp
Parallel uploads can increase performance for services that support multipart uploads\&. In a parallel upload, the file is split into N chunks and each chunk is uploaded concurrently by one of M threads in a first\-come, first\-served fashion\&. If the chunk size is set to 0, then parallel uploads are disabled\&. If M > N, then the actual number of threads used will be reduced to N\&. The number of threads can be specified using the \fB\-\-parallel\fR argument\&. If \fB\-\-parallel\fR is 1, then only a single thread is used\&. The default value is 4\&. There is no maximum number of threads, but it is likely that the link will be saturated by 4 to 8 threads\&.
.sp
Under certain circumstances, when a multipart upload fails it can leave behind data on the server\&. When a failure occurs, the \fBput\fR subcommand will attempt to abort the upload\&. If the upload cannot be aborted, then a partial upload may remain on the server\&. To check for partial uploads, run the \fBlsup\fR subcommand\&. If you see an upload that failed in the output of \fBlsup\fR, then run the \fBrmup\fR subcommand to remove it\&.
.RE
.PP
\fBget\fR
.RS 4
The \fBget\fR subcommand retrieves an object from the storage service identified by URL and stores it in the file specified by FILE\&. If FILE is not specified, then the key is used as the file name (note: if the key has slashes, then the file name will be a relative subdirectory, but \fBpegasus\-s3\fR will not create the subdirectory if it does not exist)\&.
.sp
If a transient failure occurs, then the download will be retried several times before \fBpegasus\-s3\fR gives up and fails\&.
.sp
The \fBget\fR subcommand can do both chunked and parallel downloads if the service supports ranged downloads (see \fBranged_downloads\fR in the \fBCONFIGURATION\fR section)\&. Currently only Amazon S3 has good support for ranged downloads\&. Eucalyptus Walrus supports ranged downloads, but the current release, 1\&.6, is inconsistent with the Amazon interface and has a bug that causes ranged downloads to hang in some cases\&. It is recommended that ranged downloads not be used with Eucalyptus until these issues are resolved\&.
.sp
Chunked downloads can be used to reduce the probability of a download failing\&. When a download is chunked, \fBpegasus\-s3\fR issues separate GET requests for each chunk of the file\&. Specifying smaller chunks (using \fB\-\-chunksize\fR) will reduce the chances that a download will fail due to a transient error\&. Chunk sizes can range from 1 MB to 1 GB\&. By default, a download will be split into 10 MB chunks if the site supports ranged downloads\&. Chunked downloads can be disabled by specifying a \fB\-\-chunksize\fR of 0\&. If a download is chunked, then each chunk is retried independently under transient failures\&. If any chunk fails permanently, then the download is aborted\&.
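.sp
For example, assuming a hypothetical identity \fIuser@amazon\fR and a bucket \fIbar\fR, the following disables chunked reads when retrieving a small object:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 get \-\-chunksize 0 s3://user@amazon/bar/foo foo
.fi
.if n \{\
.RE
.\}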
.sp
Parallel downloads can increase performance for services that support ranged downloads\&. In a parallel download, the file to be retrieved is split into N chunks and each chunk is downloaded concurrently by one of M threads in a first\-come, first\-served fashion\&. If the chunk size is 0, then parallel downloads are disabled\&. If M > N, then the actual number of threads used will be reduced to N\&. The number of threads can be specified using the \fB\-\-parallel\fR argument\&. If \fB\-\-parallel\fR is 1, then only a single thread is used\&. The default value is 4\&. There is no maximum number of threads, but it is likely that the link will be saturated by 4 to 8 threads\&.
.RE
.PP
\fBlsup\fR
.RS 4
The \fBlsup\fR subcommand lists active multipart uploads\&. The URL specified should point to a bucket\&. This command is only valid if the site supports multipart uploads\&. The output of this command is a list of keys and upload IDs\&.
.sp
This subcommand is used with \fBrmup\fR to help recover from failures of multipart uploads\&.
.RE
.PP
\fBrmup\fR
.RS 4
The \fBrmup\fR subcommand cancels an active upload\&. The URL specified should point to a bucket, and UPLOAD is the long, complicated upload ID shown by the \fBlsup\fR subcommand\&.
.sp
This subcommand is used with \fBlsup\fR to recover from failures of multipart uploads\&.
.RE
.PP
\fBcp\fR
.RS 4
The \fBcp\fR subcommand copies keys on the server\&. Keys cannot be copied between accounts\&.
.RE
.SH "URL FORMAT"
.sp
All URLs for objects stored in S3 should be specified in the following format:
.sp
.if n \{\
.RS 4
.\}
.nf
s3[s]://USER@SITE[/BUCKET[/KEY]]
.fi
.if n \{\
.RE
.\}
.sp
The protocol part can be \fIs3://\fR or \fIs3s://\fR\&. If \fIs3s://\fR is used, then \fBpegasus\-s3\fR will force the connection to use SSL and override the setting in the configuration file\&. If \fIs3://\fR is used, then whether the connection uses SSL is determined by the value of the \fIendpoint\fR variable in the configuration for the site\&.
.sp
The \fIUSER@SITE\fR part is required, but the \fIBUCKET\fR and \fIKEY\fR parts may be optional depending on the context\&.
.sp
The \fIUSER@SITE\fR portion is referred to as the \(lqidentity\(rq, and the \fISITE\fR portion is referred to as the \(lqsite\(rq\&. Both the identity and the site are looked up in the configuration file (see \fBCONFIGURATION\fR) to determine the parameters to use when establishing a connection to the service\&. The site portion is used to find the host and port, whether to use SSL, and other settings\&. The identity portion is used to determine which authentication tokens to use\&. This format is designed to enable users to easily use multiple services with multiple authentication tokens\&. Note that neither the \fIUSER\fR nor the \fISITE\fR portion of the URL has any meaning outside of \fBpegasus\-s3\fR\&. They do not refer to real usernames or hostnames, but are rather handles used to look up configuration values in the configuration file\&.
.sp
The BUCKET portion of the URL is the part between the third and fourth slashes\&. Buckets are part of a global namespace that is shared with other users of the storage service\&. As such, they should be unique\&.
.sp
The KEY portion of the URL is anything after the fourth slash\&. Keys can include slashes, but S3\-like storage services do not have the concept of a directory like regular file systems\&. Instead, keys are treated as opaque identifiers for individual objects\&. So, for example, the keys \fIa/b\fR and \fIa/c\fR have a common prefix, but cannot be said to be in the same \fIdirectory\fR\&.
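.sp
For example, assuming a hypothetical identity \fIuser@amazon\fR and a bucket \fIbar\fR containing the keys \fIa/b\fR and \fIa/c\fR, listing the common prefix returns both keys:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 ls s3://user@amazon/bar/a/
.fi
.if n \{\
.RE
.\}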
.sp
Some example URLs are:
.sp
.if n \{\
.RS 4
.\}
.nf
s3://ewa@amazon
s3://juve@skynet/gideon\&.isi\&.edu
s3://juve@magellan/pegasus\-images/centos\-5\&.5\-x86_64\-20101101\&.part\&.1
s3s://ewa@amazon/pegasus\-images/data\&.tar\&.gz
.fi
.if n \{\
.RE
.\}
.SH "CONFIGURATION"
.sp
Each user should specify a configuration file that \fBpegasus\-s3\fR will use to look up connection parameters and authentication tokens\&.
.SS "Search Path"
.sp
This client will look in the following locations, in order, to locate the user\(cqs configuration file:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
The \-C/\-\-conf argument
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
The S3CFG environment variable
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
$HOME/\&.pegasus/s3cfg
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
$HOME/\&.s3cfg
.RE
.sp
If it does not find the configuration file in one of these locations, it will fail with an error\&. The $HOME/\&.s3cfg location is only supported for backward compatibility; $HOME/\&.pegasus/s3cfg should be used instead\&.
.SS "Configuration File Format"
.sp
The configuration file is in INI format and contains two types of entries\&.
.sp
The first type of entry is a site entry, which specifies the configuration for a storage service\&. This entry specifies the service endpoint that \fBpegasus\-s3\fR should connect to for the site, and some optional features that the site may support\&. Here is an example of a site entry for Amazon S3:
.sp
.if n \{\
.RS 4
.\}
.nf
[amazon]
endpoint = http://s3\&.amazonaws\&.com/
.fi
.if n \{\
.RE
.\}
.sp
The other type of entry is an identity entry, which specifies the authentication information for a user at a particular site\&. Here is an example of an identity entry:
.sp
.if n \{\
.RS 4
.\}
.nf
[pegasus@amazon]
access_key = 90c4143642cb097c88fe2ec66ce4ad4e
secret_key = a0e3840e5baee6abb08be68e81674dca
.fi
.if n \{\
.RE
.\}
.sp
It is important to note that the user names and site names used are only logical\(emthey do not correspond to actual hostnames or usernames, but are simply used as a convenient way to refer to the services and identities used by the client\&.
.sp
The configuration file should be saved with limited permissions\&. Only the owner of the file should be able to read from it and write to it (i\&.e\&. it should have permissions of 0600 or 0400)\&. If the file has more liberal permissions, then \fBpegasus\-s3\fR will fail with an error message\&. The purpose of this is to prevent the authentication tokens stored in the configuration file from being accessed by other users\&.
.SS "Configuration Variables"
.PP
\fBendpoint\fR (site)
.RS 4
The URL of the web service endpoint\&. If the URL begins with \fIhttps\fR, then SSL will be used\&.
.RE
.PP
\fBmax_object_size\fR (site)
.RS 4
The maximum size of an object in GB (default: 5 GB)
.RE
.PP
\fBmultipart_uploads\fR (site)
.RS 4
Does the service support multipart uploads? (True/False, default: False)
.RE
.PP
\fBranged_downloads\fR (site)
.RS 4
Does the service support ranged downloads? (True/False, default: False)
.RE
.PP
\fBaccess_key\fR (identity)
.RS 4
The access key for the identity
.RE
.PP
\fBsecret_key\fR (identity)
.RS 4
The secret key for the identity
.RE
.SS "Example Configuration"
.sp
This is an example configuration that specifies two sites (amazon and magellan) and three identities (pegasus@amazon, juve@magellan, and voeckler@magellan)\&. For the amazon site, the maximum object size is 5 TB, and the site supports both multipart uploads and ranged downloads, so both uploads and downloads can be done in parallel\&.
.sp
.if n \{\
.RS 4
.\}
.nf
[amazon]
endpoint = https://s3\&.amazonaws\&.com/
max_object_size = 5120
multipart_uploads = True
ranged_downloads = True

[pegasus@amazon]
access_key = 90c4143642cb097c88fe2ec66ce4ad4e
secret_key = a0e3840e5baee6abb08be68e81674dca

[magellan]
# NERSC Magellan is a Eucalyptus site\&. It doesn\*(Aqt support multipart uploads,
# or ranged downloads (the defaults), and the maximum object size is 5GB
# (also the default)
endpoint = https://128\&.55\&.69\&.235:8773/services/Walrus

[juve@magellan]
access_key = quwefahsdpfwlkewqjsdoijldsdf
secret_key = asdfa9wejalsdjfljasldjfasdfa

[voeckler@magellan]
# Each site can have multiple associated identities
access_key = asdkfaweasdfbaeiwhkjfbaqwhei
secret_key = asdhfuinakwjelfuhalsdflahsdl
.fi
.if n \{\
.RE
.\}
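.sp
With this configuration, for example, the \fIjuve@magellan\fR identity defined above would be used to list the buckets that juve owns on the magellan site:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 ls s3://juve@magellan
.fi
.if n \{\
.RE
.\}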
.SH "EXAMPLE"
.sp
List all buckets owned by identity \fIuser@amazon\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 ls s3://user@amazon
.fi
.if n \{\
.RE
.\}
.sp
List the contents of bucket \fIbar\fR for identity \fIuser@amazon\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 ls s3://user@amazon/bar
.fi
.if n \{\
.RE
.\}
.sp
List all objects in bucket \fIbar\fR that start with \fIhello\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 ls s3://user@amazon/bar/hello
.fi
.if n \{\
.RE
.\}
.sp
Create a bucket called \fImybucket\fR for identity \fIuser@amazon\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 mkdir s3://user@amazon/mybucket
.fi
.if n \{\
.RE
.\}
.sp
Delete a bucket called \fImybucket\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 rmdir s3://user@amazon/mybucket
.fi
.if n \{\
.RE
.\}
.sp
Upload a file \fIfoo\fR to bucket \fIbar\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 put foo s3://user@amazon/bar/foo
.fi
.if n \{\
.RE
.\}
.sp
Download an object \fIfoo\fR in bucket \fIbar\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 get s3://user@amazon/bar/foo foo
.fi
.if n \{\
.RE
.\}
.sp
Upload a file in parallel with 4 threads and 100 MB chunks:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 put \-\-parallel 4 \-\-chunksize 100 foo s3://user@amazon/bar/foo
.fi
.if n \{\
.RE
.\}
.sp
Download an object in parallel with 4 threads and 100 MB chunks:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 get \-\-parallel 4 \-\-chunksize 100 s3://user@amazon/bar/foo foo
.fi
.if n \{\
.RE
.\}
.sp
List all partial uploads for bucket \fIbar\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 lsup s3://user@amazon/bar
.fi
.if n \{\
.RE
.\}
.sp
Remove all partial uploads for bucket \fIbar\fR:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-s3 rmup \-\-all s3://user@amazon/bar
.fi
.if n \{\
.RE
.\}
.SH "RETURN VALUE"
.sp
\fBpegasus\-s3\fR returns a zero exit status if the operation is successful\&. A non\-zero exit status is returned in case of failure\&.
.SH "AUTHOR"
.sp
Gideon Juve
.sp
Pegasus Team
\m[blue]\fBhttp://pegasus\&.isi\&.edu\fR\m[]