.\" Automatically generated by Pandoc 1.17.2
.\"
.TH "bup\-margin" "1" "2017\-04\-01" "Bup debian/0.29\-3" ""
.hy
.SH NAME
.PP
bup\-margin \- figure out your deduplication safety margin
.SH SYNOPSIS
.PP
bup margin [options...]
.SH DESCRIPTION
.PP
\f[C]bup\ margin\f[] iterates through all objects in your bup
repository, calculating the largest number of prefix bits shared between
any two entries.
This number, \f[C]n\f[], is the length of the longest SHA\-1 prefix you
could use and still encounter a collision between your object ids.
.PP
For example, one system that was tested had a collection of 11 million
objects (70 GB), and \f[C]bup\ margin\f[] returned 45.
That means a 46\-bit hash would be sufficient to avoid all collisions
among that set of objects; each object in that repository could be
uniquely identified by its first 46 bits.
.PP
The number of bits needed seems to increase by about 1 or 2 for every
doubling of the number of objects.
Since SHA\-1 hashes have 160 bits, that leaves 115 bits of margin.
Of course, because SHA\-1 hashes are essentially random, it\[aq]s
theoretically possible for a repository with far fewer objects to share
many more prefix bits.
.PP
If you\[aq]re paranoid about the possibility of SHA\-1 collisions, you
can monitor your repository by running \f[C]bup\ margin\f[]
occasionally to see if you\[aq]re getting dangerously close to 160 bits.
.SH OPTIONS
.TP
.B \-\-predict
Guess the offset into each index file where a particular object will
appear, and report the maximum deviation of the correct answer from the
guess.
This is potentially useful for tuning an interpolation search algorithm.
.RS
.RE
.TP
.B \-\-ignore\-midx
Don\[aq]t use \f[C]\&.midx\f[] files; use only \f[C]\&.idx\f[] files.
This is only really useful when used with \f[C]\-\-predict\f[].
.RS
.RE
.SH EXAMPLES
.IP
.nf
\f[C]
$\ bup\ margin
Reading\ indexes:\ 100.00%\ (1612581/1612581),\ done.
40
40\ matching\ prefix\ bits
1.94\ bits\ per\ doubling
120\ bits\ (61.86\ doublings)\ remaining
4.19338e+18\ times\ larger\ is\ possible

Everyone\ on\ earth\ could\ have\ 625878182\ data\ sets
like\ yours,\ all\ in\ one\ repository,\ and\ we\ would
expect\ 1\ object\ collision.

$\ bup\ margin\ \-\-predict
PackIdxList:\ using\ 1\ index.
Reading\ indexes:\ 100.00%\ (1612581/1612581),\ done.
915\ of\ 1612581\ (0.057%)
\f[]
.fi
.SH SEE ALSO
.PP
\f[C]bup\-midx\f[](1), \f[C]bup\-save\f[](1)
.SH BUP
.PP
Part of the \f[C]bup\f[](1) suite.
.SH AUTHORS
Avery Pennarun.
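.SH NOTES
.PP
The summary figures printed in the first example appear to follow from
simple arithmetic on the object count and the number of matching prefix
bits.
The Python sketch below is an illustration only, not bup\[aq]s own
code: the function name \f[C]margin_summary\f[] is hypothetical, and
the world population constant of 6.7e9 is an assumption chosen because
it is consistent with the sample output.
.IP
.nf
```python
import math

SHA1_BITS = 160
# Assumed population figure; it is consistent with the sample output.
POPULATION_OF_EARTH = 6.7e9


def margin_summary(objects, matching_bits):
    """Derive the summary figures from an object count and a prefix length."""
    # Number of times a one-object repository doubled to reach this size.
    doublings = math.log2(objects)
    bits_per_doubling = matching_bits / doublings
    # Bits of SHA-1 not yet needed to tell the objects apart.
    remaining_bits = SHA1_BITS - matching_bits
    remaining_doublings = remaining_bits / bits_per_doubling
    # Growth factor available before the remaining bits are used up.
    times_larger = 2 ** remaining_doublings
    return {
        "bits_per_doubling": bits_per_doubling,
        "remaining_bits": remaining_bits,
        "remaining_doublings": remaining_doublings,
        "times_larger": times_larger,
        "data_sets_per_person": times_larger / POPULATION_OF_EARTH,
    }


if __name__ == "__main__":
    summary = margin_summary(objects=1612581, matching_bits=40)
    for name, value in summary.items():
        print(name, value)
```
.fi
.PP
With the object count (1612581) and prefix length (40) from the first
example, the sketch yields about 1.94 bits per doubling, 120 remaining
bits and about 61.86 remaining doublings, in line with the sample
output.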