PEGASUS-S3(1)
NAME
pegasus-s3 - Upload, download, delete objects in Amazon S3
SYNOPSIS
pegasus-s3 help
pegasus-s3 ls [options] URL
pegasus-s3 mkdir [options] URL...
pegasus-s3 rmdir [options] URL...
pegasus-s3 rm [options] [URL...]
pegasus-s3 put [options] FILE URL
pegasus-s3 get [options] URL [FILE]
pegasus-s3 lsup [options] URL
pegasus-s3 rmup [options] URL [UPLOAD]
DESCRIPTION
pegasus-s3 is a client for the Amazon S3 object storage service and any other storage services that conform to the Amazon S3 API, such as Eucalyptus Walrus.
OPTIONS
Global Options
-h, --help
Show help message for subcommand and exit
-d, --debug
Turn on debugging
-v, --verbose
Show progress messages
-C FILE, --conf=FILE
Path to configuration file
rm Options
-f, --force
If the URL does not exist, then ignore the error.
-F FILE, --file=FILE
File containing a list of URLs to delete
put Options
-c X, --chunksize=X
Set the chunk size for multipart uploads to X MB. A value of 0 disables multipart uploads. The default is 10MB, the minimum is 5MB, and the maximum is 1024MB. This parameter only applies to sites that support multipart uploads (see the multipart_uploads configuration parameter in the CONFIGURATION section). The maximum number of chunks is 10,000, so if you are uploading a large file, then the chunk size is automatically increased to enable the upload. Choose smaller values to reduce the impact of transient failures.
-p N, --parallel=N
Use N threads to upload FILE in parallel. The default value is 0, which disables parallel uploads. This parameter is only valid if the site supports multipart uploads and the --chunksize parameter is not 0.
-b, --create-bucket
Create the destination bucket if it does not
already exist
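For example, assuming a hypothetical identity user@amazon and a hypothetical local file data.tar.gz, a chunked, parallel upload into a bucket that does not exist yet might look like this:
$ pegasus-s3 put --create-bucket --chunksize 25 --parallel 4 data.tar.gz s3://user@amazon/newbucket/data.tar.gz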
get Options
-c X, --chunksize=X
Set the chunk size for parallel downloads to X MB. A value of 0 avoids chunked reads. This option only applies to sites that support ranged downloads (see the ranged_downloads configuration parameter). The default chunk size is 10MB, the minimum is 1MB, and the maximum is 1024MB. Choose smaller values to reduce the impact of transient failures.
-p N, --parallel=N
Use N threads to download in parallel. The default value is 0, which disables parallel downloads. This parameter is only valid if the site supports ranged downloads and the --chunksize parameter is not 0.
rmup Options
-a, --all
Cancel all uploads for the specified bucket
SUBCOMMANDS
pegasus-s3 has several subcommands for different storage service operations.
help
The help subcommand lists all available subcommands.
ls
The ls subcommand lists the contents of
a URL. If the URL does not contain a bucket, then all the buckets owned by the
user are listed. If the URL contains a bucket, but no key, then all the keys
in the bucket are listed. If the URL contains a bucket and a key, then all
keys in the bucket that begin with the specified key are listed.
mkdir
The mkdir subcommand creates one or
more buckets.
rmdir
The rmdir subcommand deletes one or
more buckets from the storage service. In order to delete a bucket, the bucket
must be empty.
rm
The rm subcommand deletes one or more
keys from the storage service.
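For example, a single key can be removed directly, or many keys can be removed from a list of URLs; the identity, bucket, and the file to-delete.txt below are hypothetical:
$ pegasus-s3 rm s3://user@amazon/bar/foo
$ pegasus-s3 rm --force --file=to-delete.txt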
put
The put subcommand stores the file
specified by FILE in the storage service under the bucket and key specified by
URL. If the URL contains a bucket, but not a key, then the file name is used
as the key.
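For example, both of the following commands (with a hypothetical identity and bucket) should store the local file report.txt under the key report.txt, the second one because the file name is used when the URL has no key:
$ pegasus-s3 put report.txt s3://user@amazon/mybucket/report.txt
$ pegasus-s3 put report.txt s3://user@amazon/mybucket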
If a transient failure occurs, then the upload will be retried several times
before pegasus-s3 gives up and fails.
The put subcommand can do both chunked and parallel uploads if the
service supports multipart uploads (see multipart_uploads in the
CONFIGURATION section). Currently only Amazon S3 supports multipart
uploads.
This subcommand will check the size of the file to make sure it can be stored
before attempting to store it.
Chunked uploads are useful to reduce the probability of an upload failing. If an
upload is chunked, then pegasus-s3 issues separate PUT requests for
each chunk of the file. Specifying smaller chunks (using --chunksize)
will reduce the chances of an upload failing due to a transient error.
Chunk sizes can range from 5 MB to 1 GB (chunk sizes smaller than 5 MB produced
incomplete uploads on Amazon S3). The maximum number of chunks for any single
file is 10,000, so if a large file is being uploaded with a small chunksize,
then the chunksize will be increased to fit within the 10,000 chunk limit. By
default, the file will be split into 10 MB chunks if the storage service
supports multipart uploads. Chunked uploads can be disabled by specifying a
chunksize of 0. If the upload is chunked, then each chunk is retried
independently under transient failures. If any chunk fails permanently, then
the upload is aborted.
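As an illustration, on an unreliable link the chunk size could be lowered to the 5 MB minimum, or chunking disabled entirely for a small object; the identity, bucket, and file names are hypothetical:
$ pegasus-s3 put --chunksize 5 big-archive.tar s3://user@amazon/bar/big-archive.tar
$ pegasus-s3 put --chunksize 0 notes.txt s3://user@amazon/bar/notes.txt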
Parallel uploads can increase performance for services that support multipart
uploads. In a parallel upload the file is split into N chunks and each chunk
is uploaded concurrently by one of M threads in first-come, first-served
fashion. If the chunksize is set to 0, then parallel uploads are disabled. If
M > N, then the actual number of threads used will be reduced to N. The
number of threads can be specified using the --parallel argument. If
--parallel is 0 or 1, then only a single thread is used. The default value is
0. There is no maximum number of threads, but it is likely that the link will
be saturated by 4 threads. Very high-bandwidth, long-delay links may get
better results with up to 8 threads.
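For example, with a hypothetical 30 MB file and the default 10 MB chunk size there are only 3 chunks, so asking for 8 threads results in just 3 being used:
$ pegasus-s3 put --parallel 8 30mb-file.dat s3://user@amazon/bar/30mb-file.dat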
Under certain circumstances, when a multipart upload fails it could leave behind
data on the server. When a failure occurs the put subcommand will
attempt to abort the upload. If the upload cannot be aborted, then a partial
upload may remain on the server. To check for partial uploads run the
lsup subcommand. If you see an upload that failed in the output of
lsup, then run the rmup subcommand to remove it.
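A recovery session might look like the following, where the bucket is hypothetical and UPLOADID stands for the upload ID reported by lsup:
$ pegasus-s3 lsup s3://user@amazon/bar
$ pegasus-s3 rmup s3://user@amazon/bar UPLOADID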
get
The get subcommand retrieves an object
from the storage service identified by URL and stores it in the file specified
by FILE. If FILE is not specified, then the key is used as the file name
(Note: if the key has slashes, then the file name will be a relative
subdirectory, but pegasus-s3 will not create the subdirectory if it
does not exist).
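For example, with a hypothetical key that contains a slash, the target directory must already exist before the object is retrieved without a FILE argument:
$ mkdir -p data
$ pegasus-s3 get s3://user@amazon/bar/data/results.txt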
If a transient failure occurs, then the download will be retried several times
before pegasus-s3 gives up and fails.
The get subcommand can do both chunked and parallel downloads if the
service supports ranged downloads (see ranged_downloads in the
CONFIGURATION section). Currently only Amazon S3 has good support for
ranged downloads. Eucalyptus Walrus supports ranged downloads, but the current
release, 1.6, is inconsistent with the Amazon interface and has a bug that
causes ranged downloads to hang in some cases. It is recommended that ranged
downloads not be used with Eucalyptus until these issues are resolved.
Chunked downloads can be used to reduce the probability of a download failing.
When a download is chunked, pegasus-s3 issues separate GET requests for
each chunk of the file. Specifying smaller chunks (using --chunksize)
will reduce the chances that a download will fail due to a transient error.
Chunk sizes can range from 1 MB to 1 GB. By default, a download will be split
into 10 MB chunks if the site supports ranged downloads. Chunked downloads can
be disabled by specifying a --chunksize of 0. If a download is chunked,
then each chunk is retried independently under transient failures. If any
chunk fails permanently, then the download is aborted.
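For example, chunked reads can be disabled entirely, or the chunk size lowered toward the 1 MB minimum on a connection that drops frequently; the identity, bucket, and key are hypothetical:
$ pegasus-s3 get --chunksize 0 s3://user@amazon/bar/file.dat
$ pegasus-s3 get --chunksize 1 s3://user@amazon/bar/file.dat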
Parallel downloads can increase performance for services that support ranged
downloads. In a parallel download, the file to be retrieved is split into N
chunks and each chunk is downloaded concurrently by one of M threads in a
first-come, first-served fashion. If the chunksize is 0, then parallel
downloads are disabled. If M > N, then the actual number of threads used
will be reduced to N. The number of threads can be specified using the
--parallel argument. If --parallel is 0 or 1, then only a single thread is
used. The default value is 0. There is no maximum number of threads, but it is
likely that the link will be saturated by 4 threads. Very high-bandwidth,
long-delay links may get better results with up to 8 threads.
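For example, a download using the 4 threads that typically saturate a link, with 25 MB chunks (the identity, bucket, and file names are hypothetical):
$ pegasus-s3 get --parallel 4 --chunksize 25 s3://user@amazon/bar/large-file.dat large-file.dat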
lsup
The lsup subcommand lists active
multipart uploads. The URL specified should point to a bucket. This command is
only valid if the site supports multipart uploads. The output of this command
is a list of keys and upload IDs.
This subcommand is used with rmup to help recover from failures of
multipart uploads.
rmup
The rmup subcommand cancels an active upload. The URL specified should point to a bucket, and UPLOAD is the long, complicated upload ID shown by the lsup subcommand.
This subcommand is used with lsup to recover from failures of multipart
uploads.
URL FORMAT
All URLs for objects stored in S3 should be specified in the following format:
s3[s]://USER@SITE[/BUCKET[/KEY]]
For example:
s3://ewa@amazon
s3://juve@skynet/gideon.isi.edu
s3://juve@magellan/pegasus-images/centos-5.5-x86_64-20101101.part.1
s3s://ewa@amazon/pegasus-images/data.tar.gz
CONFIGURATION
Each user should specify a configuration file that pegasus-s3 will use to look up connection parameters and authentication tokens.
Search Path
This client will look in the following locations, in order, to locate the user’s configuration file:
1. The -C/--conf argument
2. The S3CFG environment variable
3. $HOME/.s3cfg
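For example, a configuration file in a non-default location (the path below is hypothetical) can be selected per invocation with -C, or for a whole session through S3CFG:
$ pegasus-s3 ls -C /path/to/s3.conf s3://user@amazon
$ export S3CFG=/path/to/s3.conf
$ pegasus-s3 ls s3://user@amazon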
Configuration File Format
The configuration file is in INI format and contains two types of entries: site entries, such as [amazon] below, which specify connection parameters for a storage service, and identity entries, such as [pegasus@amazon], which specify the authentication tokens for a user at a site. For example:

[amazon]
endpoint = http://s3.amazonaws.com/

[pegasus@amazon]
access_key = 90c4143642cb097c88fe2ec66ce4ad4e
secret_key = a0e3840e5baee6abb08be68e81674dca
Configuration Variables
endpoint (site)
The URL of the web service endpoint. If the URL begins with https, then SSL will be used.
max_object_size (site)
The maximum size of an object in GB (default:
5GB)
multipart_uploads (site)
Does the service support multipart uploads? (True/False, default: False)
ranged_downloads (site)
Does the service support ranged downloads?
(True/False, default: False)
access_key (identity)
The access key for the identity
secret_key (identity)
The secret key for the identity
Example Configuration
This is an example configuration that specifies two sites (amazon and magellan) and three identities (pegasus@amazon, juve@magellan, and voeckler@magellan). For the amazon site the maximum object size is 5TB, and the site supports both multipart uploads and ranged downloads, so both uploads and downloads can be done in parallel.

[amazon]
endpoint = https://s3.amazonaws.com/
max_object_size = 5120
multipart_uploads = True
ranged_downloads = True

[pegasus@amazon]
access_key = 90c4143642cb097c88fe2ec66ce4ad4e
secret_key = a0e3840e5baee6abb08be68e81674dca

[magellan]
# NERSC Magellan is a Eucalyptus site. It doesn't support multipart uploads,
# or ranged downloads (the defaults), and the maximum object size is 5GB
# (also the default)
endpoint = https://128.55.69.235:8773/services/Walrus

[juve@magellan]
access_key = quwefahsdpfwlkewqjsdoijldsdf
secret_key = asdfa9wejalsdjfljasldjfasdfa

[voeckler@magellan]
# Each site can have multiple associated identities
access_key = asdkfaweasdfbaeiwhkjfbaqwhei
secret_key = asdhfuinakwjelfuhalsdflahsdl
EXAMPLE
List all buckets owned by identity user@amazon:
$ pegasus-s3 ls s3://user@amazon
List the contents of bucket bar for identity user@amazon:
$ pegasus-s3 ls s3://user@amazon/bar
List all keys in bucket bar that begin with hello:
$ pegasus-s3 ls s3://user@amazon/bar/hello
Create a bucket called mybucket:
$ pegasus-s3 mkdir s3://user@amazon/mybucket
Delete the (empty) bucket mybucket:
$ pegasus-s3 rmdir s3://user@amazon/mybucket
Upload the file foo to bucket bar under the key foo:
$ pegasus-s3 put foo s3://user@amazon/bar/foo
Download the object foo from bucket bar and save it as foo:
$ pegasus-s3 get s3://user@amazon/bar/foo foo
Upload foo in parallel with 4 threads and 100MB chunks:
$ pegasus-s3 put --parallel 4 --chunksize 100 foo s3://user@amazon/bar/foo
Download foo in parallel with 4 threads and 100MB chunks:
$ pegasus-s3 get --parallel 4 --chunksize 100 s3://user@amazon/bar/foo foo
List all partial multipart uploads for bucket bar:
$ pegasus-s3 lsup s3://user@amazon/bar
Remove all partial multipart uploads for bucket bar:
$ pegasus-s3 rmup --all s3://user@amazon/bar
RETURN VALUE
pegasus-s3 returns a zero exit status if the operation is successful. A non-zero exit status is returned in case of failure.
AUTHOR
Gideon Juve <juve@usc.edu>