.\" Automatically generated by Pandoc 2.9.2.1
.\"
.TH "bup-margin" "1" "2021-02-06" "Bup 0.32" ""
.hy
.SH NAME
.PP
bup-margin - figure out your deduplication safety margin
.SH SYNOPSIS
.PP
bup margin [options\&...]
.SH DESCRIPTION
.PP
\f[C]bup margin\f[R] iterates through all objects in your bup
repository, calculating the largest number of prefix bits shared between
any two entries.
This number, \f[C]n\f[R], is the length of the longest SHA-1 prefix you
could use and still encounter a collision between your object ids.
.PP
For example, one system that was tested had a collection of 11 million
objects (70 GB), and \f[C]bup margin\f[R] returned 45.
That means a 46-bit hash would be sufficient to avoid all collisions
among that set of objects; each object in that repository could be
uniquely identified by its first 46 bits.
.PP
The number of bits needed seems to increase by about 1 or 2 for every
doubling of the number of objects.
Since SHA-1 hashes have 160 bits, a result of 45 leaves 115 bits of
margin.
Of course, because SHA-1 hashes are essentially random, it\[cq]s
theoretically possible for a repository with far fewer objects to share
many more prefix bits.
.PP
If you\[cq]re paranoid about the possibility of SHA-1 collisions, you
can monitor your repository by running \f[C]bup margin\f[R] occasionally
to see if you\[cq]re getting dangerously close to 160 bits.
.SH OPTIONS
.TP
--predict
Guess the offset into each index file where a particular object will
appear, and report the maximum deviation of the correct answer from the
guess.
This is potentially useful for tuning an interpolation search algorithm.
.TP
--ignore-midx
Don\[cq]t use \f[C].midx\f[R] files; use only \f[C].idx\f[R] files.
This is only really useful when used with \f[C]--predict\f[R].
.SH EXAMPLES
.IP
.nf
\f[C]
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible

Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.

$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)
\f[R]
.fi
.SH SEE ALSO
.PP
\f[C]bup-midx\f[R](1), \f[C]bup-save\f[R](1)
.SH BUP
.PP
Part of the \f[C]bup\f[R](1) suite.
.SH AUTHORS
Avery Pennarun.