.\" Copyright (c) 2015-2022 Ganael LAPLANCHE .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .Dd January 27, 2015 .Dt FPSYNC 1 .Os .Sh NAME .Nm fpsync .Nd Synchronize directories in parallel using fpart and an external tool .Sh SYNOPSIS .Nm .Op Fl p .Op Fl n Ar jobs .Op Fl w Ar wrks .Op Fl m Ar tool .Op Fl T Ar path .Op Fl f Ar files .Op Fl s Ar size .Op Fl E .Op Fl o Ar toolopts .Op Fl O Ar fpartopts .Op Fl S .Op Fl t Ar tmpdir .Op Fl d Ar shdir .Op Fl M Ar mailaddr .Op Fl v .Pa src_dir/ .Pa dst_url/ .Nm .Fl l .Nm .Fl r Ar runid .Op Fl R .Cm [OPTIONS...] .Nm .Fl a Ar runid .Nm .Fl D Ar runid .Sh DESCRIPTION The .Nm tool synchronizes directories in parallel using .Xr fpart 1 and .Xr rsync 1 , .Xr cpio 1 or .Xr tar 1 . It computes subsets of .Pa src_dir/ and spawns jobs to synchronize them to .Pa dst_url/ . .sp Synchronization jobs can be executed either locally or remotely (using SSH workers, see option .Fl w ) and are executed on-the-fly while filesystem crawling goes on. This makes .Nm a good tool for migrating large filesystems. .Sh COMMON OPTIONS .Bl -tag -width indent .It Ic -t Ar tmpdir Set .Nm temporary directory to .Ar tmpdir . This directory remains local and does not need to be shared amongst SSH workers when using the .Fl w option. Default: .Pa /tmp/fpsync .It Ic -d Ar shdir Set .Nm shared directory to .Ar shdir . This option is mandatory when using SSH workers and set by default to .Ar tmpdir when running locally. The specified directory must be an absolute path ; it will be used to handle communications with SSH hosts (sharing partitions and log files) and, as a consequence, must be made available to all participating hosts (e.g. through a r/w NFS mount), including the master one running .Nm . .It Ic -M Ar mailaddr Send an e-mail to .Ar mailaddr after a run. Multiple -space-separated- addresses can be specified. That option requires the .Xr mail 1 client to be installed and configured on the master host running .Nm . .It Fl v Verbose mode. Can be be specified several times to increase verbosity level. .It Fl h Print help .El .Sh SYNCHRONIZATION OPTIONS .Bl -tag -width indent .It Ic -m Ar tool External copy .Ar tool used to synchronize files. Currently supported tools are: .Sy rsync , .Sy cpio , .Sy tar , and .Sy tarify . Default: .Sy rsync . When using .Sy cpio or .Sy tar and more than one worker, directory timestamps may not be replicated. A second pass will fix them. Special tool .Sy tarify generates tarballs into destination directory. .It Ic -T Ar path Specify absolute .Ar path of copy tool (guessed by default). If you force a specific .Ar path , the copy tool must be present at that .Ar path on each worker. That .Ar path cannot be changed when resuming a run. .It Ic -f Ar files Transfer at most .Ar files files or directories per sync job. .Sy 0 means unlimited but you must at least specify one file or size limit. Default: .Sy 2000 .It Ic -s Ar size Transfer at most .Ar size bytes per sync job. .Sy 0 means unlimited but you must at least specify one file or size limit. You can use a human-friendly unit suffix here (k, m, g, t, p). .br Default: .Sy 4294967296 (4 GB) .It Fl E Work on a per-directory basis (rsync tool only). In that mode, .Nm works with lists of directories instead of files. That mode may generate coarse-grained lists but enables .Xr rsync 1 's .Cm --delete option by default ( .Sy WARNING ! ! ! ), making it a good candidate for a final (cleaning) pass after several incremental passes using standard (file) mode. When option .Fl E is specified twice, it enables 'aggressive' mode which isolates erroneous directories and enables recursive synchronization for them. This advanced mode can be useful to try to overcome transcient errors such as Linux SMB client deferring .Fn opendir calls to support compound SMB requests. .It Ic -o Ar toolopts Override default .Xr rsync 1 , .Xr cpio 1 or .Xr tar 1 options with .Ar toolopts . Use this option with care as certain options are incompatible with a parallel usage (e.g. rsync's .Cm --delete ) . Default for rsync: .Dq -lptgoD -v --numeric-ids . Empty for cpio, tar and tarify. .It Ic -O Ar fpartopts Override default .Xr fpart 1 options with .Ar fpartopts . Options and values must be separated by a pipe character. .br Default: .Dq -x|.zfs|-x|.snapshot*|-x|.ckpt . .It Fl S Sudo mode. Use .Xr sudo 8 for filesystem crawling and synchronizations. .It Pa src_dir/ Source directory. It must be absolute and available on all participating hosts (including the master one, running .Nm ) . .It Pa dst_url/ Destination directory or URL (rsync tool only). If a remote URL is provided, it must be supported by .Xr rsync 1 . All participating workers must be able to reach that target. .El .Sh JOB HANDLING AND DISPATCHING OPTIONS .Bl -tag -width indent .It Ic -n Ar jobs Start .Ar jobs concurrent sync jobs (either locally or remotely, see below) per run. Default: .Sy 2 .It Ic -w Ar wrks Use remote SSH .Ar wrks to synchronize files. Synchronization jobs are executed locally when this option is not set. .Ar wrks is a space-separated list of login@machine connection strings and can be specified several times. You must be allowed to connect to those machines using a SSH key to avoid user interaction. .El .Sh RUN HANDLING OPTIONS .Bl -tag -width indent .It Fl p Prepare mode: prepare, test synchronization environment, start .Xr fpart 1 and create partitions but do .Sy not actually start transfers. That mode can be used to create a run that can then be resumed using option .Fl r . .It Fl l List previous runs and their status. .It Ic -r Ar runid Resume run .Ar runid and restart synchronizing remaining partitions from a previous run. .Ar runid is displayed when using verbose mode (see option .Fl v ) or prepare mode (option .Fl p ) and can be retrieved afterwards using option .Fl l . Note that filesystem crawling is skipped when resuming a previous run. As a consequence, options .Fl m , .Fl f , .Fl s , .Fl E , .Fl o , .Fl O , .Fl S , .Pa src_dir/ , and .Pa dst_url/ are ignored. .It Fl R Replay mode: when using option .Fl r , force re-synchronizing run's all partitions instead of remaining ones only. That mode can be useful to skip filesystem crawling when you have to replay a final pass several times and you know directory structure has not changed in the meantime (you may miss files if you use replay mode with a standard, file-based, run). .It Ic -a Ar runid Archive run .Ar runid (including partition files, logs, queue and work directories) to .Ar tmpdir . That option requires the .Xr tar 1 client to be installed on the master host running .Nm . .It Ic -D Ar runid Delete run .Ar runid (including partition files, logs, queue and work directories). .El .Sh RUNNING FPSYNC Each .Nm run generates a unique .Ar runid , which is displayed in verbose mode (see option .Fl v ) and within log files. You can use that .Ar runid to resume a previous run (see option .Fl r ) . .Nm will then restart synchronizing data from the parts that were being synchronized at the time it stopped. .sp This unique feature gives the administrator the ability to stop .Nm and restart it later, without having to restart the whole filesystem crawling and synchronization process. Note that resuming is only possible when filesystem crawling step has finished. .sp During synchronization, you can press CTRL-C to interrupt the process. The first CTRL-C prevents new synchronizations from being submitted and the process will wait for current synchronizations to be finished before exiting. If you press CTRL-C again, current synchronizations will be killed and .Nm will exit immediately. When using option .Fl E to enable directory mode and rsync's .Cm --delete option, keep in mind that killing rsync processes may lead to a situation where certain files have been updated and others not deleted yet (because the deletion process is postponed using rsync's .Cm --delete-after option). .sp On certain systems, CTRL-T can be pressed to get the status of current and remaining parts to be synchronized. This can also be achieved by sending a SIGINFO to the .Nm process. .sp Whether you use verbose mode or not, everything is logged within .Pa shdir/log/ . .Sh EXAMPLES Here are some examples: .Bl -tag -width indent .It Li "fpsync -n 4 /usr/src/ /var/src/" .sp Synchronizes .Pa /usr/src/ to .Pa /var/src/ using 4 local jobs. .It Li "fpsync -n 2 -w login@machine1 -w login@machine2 -d /mnt/fpsync /mnt/src/ /mnt/dst/" .sp Synchronizes .Pa /mnt/src/ to .Pa /mnt/dst/ using 2 concurrent jobs executed remotely on 2 SSH workers (machine1 and machine2). The shared directory is set to .Pa /mnt/fpsync and mounted on the machine running .Nm , as well as on machine1 and machine2. The source directory .Pa ( /mnt/src/ ) is also available on those 3 machines, while the destination directory .Pa ( /mnt/dst/ ) is mounted on SSH workers only (machine1 and machine2). .El .Sh LIMITATIONS Parallelizing .Xr rsync 1 can make several options not usable, such as .Cm --delete . If your source directory is live while .Nm is running, you will have to delete extra files from destination directory. This is traditionally done by using a final -offline- .Xr rsync 1 pass that will use this option, but you can also use .Nm and option .Cm -E to perform the same task using several workers. .sp .Nm enqueues synchronization jobs on disk, within the .Pa tmpdir/queue directory. Be careful to host this queue on a filesystem that can handle fine-grained mtime timestamps (i.e. with a sub-second precision) if you want the queue to be processed in order when .Xr fpart 1 generates several jobs per second. On .Fx , .Xr VFS 9 timestamps' precision can be tuned using the 'vfs.timestamp_precision' sysctl. See .Xr vfs_timestamp 9 . .sp Contrary to .Xr rsync 1 , .Nm enforces the final '/' on the source directory. It means that directory .Sy contents are synchronized, not the source directory itself (i.e. you will not get a subdirectory of the name of the source directory in the target directory after synchronization). .sp Before starting filesystem crawling, .Nm changes its current working directory to .Pa src_dir/ and generates partitions containing .Sy relative paths (all starting with './'). This is important to keep in mind when modifying .Ar toolopts or .Ar fpartopts dealing with file or directory paths. .Sh SEE ALSO .Xr cpio 1 , .Xr fpart 1 , .Xr mail 1 , .Xr rsync 1 , .Xr tar 1 , .Xr sudo 8 .Sh AUTHOR, AVAILABILITY Fpsync has been written by .An Gana\(:el LAPLANCHE and is available under the BSD license on .Lk http://contribs.martymac.org .Sh BUGS No bug known (yet).