.\" Automatically generated by Pod::Man 4.10 (Pod::Simple 3.35) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" ======================================================================== .\" .IX Title "Digest 3pm" .TH Digest 3pm "2018-11-01" "perl v5.28.0" "User Contributed Perl Documentation" .\" For nroff, turn off justification. 
Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" File::RsyncP::Digest \- Perl interface to rsync message digest algorithms .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 1 \& use File::RsyncP::Digest; \& \& $rsDigest = new File::RsyncP::Digest; \& \& # specify rsync protocol version (default is <= 26 \-> buggy digests). \& $rsDigest\->protocol(version); \& \& # file MD4 digests \& $rsDigest\->reset(); \& $rsDigest\->add(LIST); \& $rsDigest\->addfile(HANDLE); \& \& $digest = $rsDigest\->digest(); \& $string = $rsDigest\->hexdigest(); \& \& # Return 32 byte pair of digests (protocol <= 26 and >= 27). \& $digestPair = $rsDigest\->digest2(); \& \& $digest = File::RsyncP::Digest\->hash(SCALAR); \& $string = File::RsyncP::Digest\->hexhash(SCALAR); \& \& # block digests \& $digests = $rsDigest\->blockDigest($data, $blockSize, $md4DigestLen, \& $checksumSeed); \& \& $digests = $rsDigest\->blockDigestUpdate($state, $blockSize, \& $blockLastLen, $md4DigestLen, $checksumSeed); \& \& $digests2 = $rsDigest\->blockDigestExtract($digests16, $md4DigestLen); .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" The \fBFile::RsyncP::Digest\fR module allows you to compute rsync digests, including the \s-1RSA\s0 Data Security Inc. \s-1MD4\s0 Message Digest algorithm, and Adler32 checksums from within Perl programs. .SS "Rsync Digests" .IX Subsection "Rsync Digests" Rsync uses two main digests (or checksums), for checking with very high probability that the underlying data is identical, without the need to exchange the underlying data. .PP The server (remote) side of rsync generates a checksumSeed (usually unix \&\fBtime()\fR) that is exchanged during the protocol startup. This seed is used in both the file and \s-1MD4\s0 checksum calculations. This causes the block and file checksums to change every time Rsync is run. 
.IP "File Digest" 4 .IX Item "File Digest" This is an \s-1MD4\s0 digest of the checksum seed, followed by the entire file's contents. This digest is 128 bits long. The file digest is sent at the end of a file's deltas to ensure that the reconstructed file is correct. This digest is also optionally computed and sent as part of the file list if the \-\-checksum option is specified to rsync. .IP "Block digest" 4 .IX Item "Block digest" Each file is divided into blocks of default length 700 bytes. The digest of each block is formed by computing the Adler32 checksum of the block, and also the \s-1MD4\s0 digest of the block followed by the checksum seed. During phase 1, just the first two bytes of the \s-1MD4\s0 digest are sent, meaning the total digest is 6 bytes or 48 bits (4 bytes for Adler32 and the first 2 bytes of the \s-1MD4\s0 digest). During phase 2 (which is necessary for received files that have an incorrect file digest), the entire \s-1MD4\s0 checksum is used (128 bits) meaning the block digest is 20 bytes or 160 bits. (Prior to rsync protocol \s-1XXX,\s0 the full 20 byte digest was sent every time and there was only a single phase.) .PP This module contains routines for computing file and block digests in a manner that is identical to rsync. .PP Incidentally, rsync contains two bugs in its implementation of \s-1MD4\s0 (up to and including rsync protocol version 26): .IP "\(bu" 4 \&\fBMD4Final()\fR is not called when the data size (ie: file or block size plus 4 bytes for the checksum seed) is a multiple of 64. .IP "\(bu" 4 \&\s-1MD4\s0 is not correct for total data sizes greater than 512MB (2^32 bits). Rsync's \s-1MD4\s0 only maintains the data size using a 32 bit counter, so it overflows for file sizes bigger than 512MB. .PP The effects of these bugs are benign: the \s-1MD4\s0 digest should not be cryptographically weakened and both sides are consistent. 
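.PP As a rough illustration of when the two quirks listed above apply, the following sketch classifies an input by its size. The helper md4_quirks is hypothetical and is not part of File::RsyncP::Digest; it assumes that a non-zero checksum seed contributes 4 bytes to the hashed data and it takes the 512MB threshold directly from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of File::RsyncP::Digest.
# Given a data length in bytes (file or block) and a checksum seed, report
# which of the protocol <= 26 MD4 quirks described above would apply.
sub md4_quirks {
    my ($len, $seed) = @_;

    # Assumption: a non-zero seed adds 4 bytes to the data being hashed.
    my $total = $len + ($seed ? 4 : 0);

    return {
        total_bytes   => $total,
        # Quirk 1: MD4Final() is skipped when the total is a multiple of 64.
        skips_final   => ($total % 64 == 0) ? 1 : 0,
        # Quirk 2: the 32 bit counter overflows beyond 512MB (2^32 bits).
        counter_wraps => ($total > 512 * 1024 * 1024) ? 1 : 0,
    };
}

# A full block of the default length (700 bytes) with a non-zero seed.
my $q = md4_quirks(700, 0x12345678);
printf "total=%d skips_final=%d counter_wraps=%d\n",
    @{$q}{qw(total_bytes skips_final counter_wraps)};
# prints: total=704 skips_final=1 counter_wraps=0
```

.PP Under these assumptions a full 700 byte block plus a 4 byte seed is 704 bytes, which is 11 * 64, so the first quirk would apply to every full block of the default size.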
.PP This module implements both versions of the \s-1MD4\s0 digest: the buggy version for protocol versions <= 26 and the correct version for protocol versions >= 27. The default mode is the buggy version (protocol versions <= 26). .PP You can specify the rsync protocol version to determine which \&\s-1MD4\s0 version is used: .PP .Vb 2 \& # specify rsync protocol version (default is <= 26 \-> buggy digests). \& $rsDigest\->protocol(version); .Ve .PP You can also get both digests in a single call. The result is returned as a single 32 byte scalar: the first 16 bytes are the buggy digest and the second 16 bytes are the correct digest: .PP .Vb 2 \& # Return 32 byte pair of digests (protocol <= 26 and >= 27). \& $digestPair = $rsDigest\->digest2(); .Ve .SS "Usage" .IX Subsection "Usage" A new rsync digest context object is created with the \fBnew\fR operation. Multiple simultaneous digest contexts can be maintained, if desired. .SS "Computing Block Digests" .IX Subsection "Computing Block Digests" After a context is created, the function to compute block checksums is: .PP .Vb 2 \& $digests = $rsDigest\->blockDigest($data, $blockSize, $md4DigestLen, \& $checksumSeed); .Ve .PP The first argument is the data, which can contain as much raw data as you wish (ie: multiple blocks). Both the Adler32 checksum and the \s-1MD4\s0 checksum are computed for each block in data. The partial end block (if present) is also processed. If checksumSeed is non-zero, its 4 bytes are appended to each block before the \s-1MD4\s0 digest is computed. The blockSize is specified in the second argument (default is 700). The third argument, md4DigestLen, specifies how many bytes of the \&\s-1MD4\s0 digest are included in the returned data. Rsync uses a value of 2 for the first pass (meaning 6 bytes of total digests are returned per block), and all 16 bytes for the second pass (meaning 20 bytes of total digests are returned per block). 
The total number of bytes returned is the number of bytes in each digest (Adler32 + partial/complete \s-1MD4\s0) times the number of blocks: .PP .Vb 1 \& (4 + md4DigestLen) * ceil(length(data) / blockSize); .Ve .PP To allow block checksums to be cached (when checksumSeed is unknown), and then quickly updated with the known checksumSeed, the checksum data should be first computed with a digest length of \-1 and a checksumSeed of 0: .PP .Vb 1 \& $state = $rsDigest\->blockDigest($data, $blockSize, \-1, 0); .Ve .PP The returned \f(CW$state\fR should be saved for later retrieval, together with the length of the last partial block (eg: length($data) % \f(CW$blockSize\fR). The length of \f(CW$state\fR depends upon the number of blocks and the block size. In addition to the 16 bytes of \s-1MD4\s0 state, up to 63 bytes of unprocessed data per block are also saved in \f(CW$state\fR. For each block, .PP .Vb 1 \& 16 + ($blockSize % 64) .Ve .PP bytes are saved in \f(CW$state\fR, so \f(CW$state\fR is most compact when \f(CW$blockSize\fR is a multiple of 64. (The last, partial, block might have a smaller block size, requiring up to 63 bytes of state even if \f(CW$blockSize\fR is a multiple of 64.) .PP Once the checksumSeed is known, the updated checksums can then be computed using: .PP .Vb 2 \& $digests = $rsDigest\->blockDigestUpdate($state, $blockSize, \& $blockLastLen, $md4DigestLen, $checksumSeed); .Ve .PP The first argument is the cached state returned by blockDigest. The third argument is the length of the (partial) last block. .PP Alternatively, I hope to add a \-\-checksum\-seed=n option to rsync that allows the checksum seed to be set to 0. This causes the checksum seed to be omitted from the \s-1MD4\s0 calculation and it makes caching the checksums much easier. A zero checksum seed does not weaken the block digest. I'm not sure whether or not it weakens the file digest (the checksum seed is applied at the start of the file digest and end of the block digest). 
In this case, the full 16 byte checksums should be computed using: .PP .Vb 1 \& $digests16 = $rsDigest\->blockDigest($data, $blockSize, 16, 0); .Ve .PP and for phase 1 the 2 byte \s-1MD4\s0 substrings can be extracted with: .PP .Vb 1 \& $digests2 = $rsDigest\->blockDigestExtract($digests16, 2); .Ve .PP The original \f(CW$digests16\fR does not need any additional processing for phase 2. .SS "Computing File Digests" .IX Subsection "Computing File Digests" In addition, functions identical to \fBDigest::MD4\fR are provided that allow rsync's \s-1MD4\s0 file digest to be computed. The checksum seed, if non-zero, is included at the start of the data, before the file's contents are added. .PP The context is updated with the \fBadd\fR operation, which adds the strings contained in the \fI\s-1LIST\s0\fR parameter. Note that \&\f(CW\*(C`add(\*(Aqfoo\*(Aq, \*(Aqbar\*(Aq)\*(C'\fR, \f(CW\*(C`add(\*(Aqfoo\*(Aq)\*(C'\fR followed by \f(CW\*(C`add(\*(Aqbar\*(Aq)\*(C'\fR and \&\f(CW\*(C`add(\*(Aqfoobar\*(Aq)\*(C'\fR should all give the same result. .PP The final \s-1MD4\s0 message digest value is returned by the \fBdigest\fR operation as a 16\-byte binary string. This operation delivers the result of \&\fBadd\fR operations since the last \fBnew\fR or \fBreset\fR operation. Note that the \fBdigest\fR operation is effectively a destructive, read-once operation. Once it has been performed, the context must be \fBreset\fR before being used to calculate another digest value. .PP Several convenience functions are also provided. The \fBaddfile\fR operation takes an open file-handle and reads it until end-of-file in 1024 byte blocks, adding the contents to the context. The file-handle can either be specified by name or passed as a type-glob reference, as shown in the examples below. The \fBhexdigest\fR operation calls \&\fBdigest\fR and returns the result as a printable string of hexadecimal digits. 
This is exactly the same operation as performed by the \&\fBunpack\fR operation in the examples below. .PP The \fBhash\fR operation can act as either a static member function (ie: you invoke it on the \fBFile::RsyncP::Digest\fR class as in the synopsis above) or as a normal virtual function. In both cases it performs the complete \s-1MD4\s0 cycle (reset, add, digest) on the supplied scalar value. This is convenient for handling small quantities of data. When invoked on the class a temporary context is created. When invoked through an already created context object, this context is used. The latter form is slightly more efficient. The \fBhexhash\fR operation is analogous to \&\fBhexdigest\fR. .SH "EXAMPLES" .IX Header "EXAMPLES" .Vb 1 \& use File::RsyncP::Digest; \& \& my $rsDigest = new File::RsyncP::Digest; \& $rsDigest\->add(\*(Aqfoo\*(Aq, \*(Aqbar\*(Aq); \& $rsDigest\->add(\*(Aqbaz\*(Aq); \& my $digest = $rsDigest\->digest(); \& \& print("Rsync MD4 Digest is " . unpack("H*", $digest) . "\en"); .Ve .PP The above example would print out the message .PP .Vb 1 \& Rsync MD4 Digest is 6df23dc03f9b54cc38a0fc1483df6e21 .Ve .PP To compute the rsync phase 1 block checksums (4 + 2 = 6 bytes per block) for a 2000 byte file containing 700 a's, 700 b's and 600 c's, with a checksum seed of 0x12345678: .PP .Vb 1 \& use File::RsyncP::Digest; \& \& my $rsDigest = new File::RsyncP::Digest; \& my $data = ("a" x 700) . ("b" x 700) . ("c" x 600); \& my $digest = $rsDigest\->blockDigest($data, 700, 2, 0x12345678); \& \& print("Rsync block checksums are " . unpack("H*", $digest) . 
"\en"); .Ve .PP This will print: .PP .Vb 1 \& Rsync block checksums are 3c09a624641bf80b0ce3abd208e8645d5b49 .Ve .PP The same result can be achieved in two steps by saving the state, and then finishing the calculation: .PP .Vb 1 \& my $state = $rsDigest\->blockDigest($data, 700, \-1, 0); \& \& my $digest = $rsDigest\->blockDigestUpdate($state, 700, \& length($data) % 700, 2, 0x12345678); .Ve .PP or by computing full-length \s-1MD4\s0 digests, and extracting the 2 byte version: .PP .Vb 2 \& my $digest16 = $rsDigest\->blockDigest($data, 700, 16, 0x12345678); \& my $digest = $rsDigest\->blockDigestExtract($digest16, 2); .Ve .SH "LICENSE" .IX Header "LICENSE" This program is free software: you can redistribute it and/or modify it under the terms of the \s-1GNU\s0 General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. .PP This program is distributed in the hope that it will be useful, but \s-1WITHOUT ANY WARRANTY\s0; without even the implied warranty of \&\s-1MERCHANTABILITY\s0 or \s-1FITNESS FOR A PARTICULAR PURPOSE.\s0 See the \&\s-1GNU\s0 General Public License for more details. .PP You should have received a copy of the \s-1GNU\s0 General Public License along with this program. If not, see . .PP The \s-1MD4\s0 algorithm is defined in \s-1RFC1320.\s0 The basic C code implementing the algorithm is derived from that in the \s-1RFC\s0 and is covered by the following copyright: .Sp .Vb 2 \& MD4 is Copyright (C) 1990\-2, RSA Data Security, Inc. All rights \& reserved. \& \& License to copy and use this software is granted provided that it \& is identified as the "RSA Data Security, Inc. MD4 Message\-Digest \& Algorithm" in all material mentioning or referencing this software \& or this function. \& \& License is also granted to make and use derivative works provided \& that such works are identified as "derived from the RSA Data \& Security, Inc. 
MD4 Message\-Digest Algorithm" in all material \& mentioning or referencing the derived work. \& \& RSA Data Security, Inc. makes no representations concerning either \& the merchantability of this software or the suitability of this \& software for any particular purpose. It is provided "as is" \& without express or implied warranty of any kind. \& \& These notices must be retained in any copies of any part of this \& documentation and/or software. .Ve .PP This copyright does not prohibit distribution of any version of Perl containing this extension under the terms of the \s-1GNU\s0 or Artistic licences. .SH "AUTHOR" .IX Header "AUTHOR" File::RsyncP::Digest was written by Craig Barratt based on Digest::MD4 and the Adler32 implementation was based on rsync 2.5.5. .PP Digest::MD4 was adapted by Mike McCauley (\f(CW\*(C`mikem@open.com.au\*(C'\fR), based entirely on \s-1MD5\-1.7,\s0 written by Neil Winton (\f(CW\*(C`N.Winton@axion.bt.co.uk\*(C'\fR). .PP Rsync was written by Andrew Tridgell and Paul Mackerras. It is available under a \s-1GPL\s0 license. See . .SH "SEE ALSO" .IX Header "SEE ALSO" See for File::RsyncP's SourceForge home page. .PP See File::RsyncP, File::RsyncP::FileIO and File::RsyncP::FileList.