.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "DBM::Deep::Internals 3pm" .TH DBM::Deep::Internals 3pm "2023-11-12" "perl v5.36.0" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" DBM::Deep::Internals \- Out of date documentation on DBM::Deep internals .SH "OUT OF DATE" .IX Header "OUT OF DATE" This document is out-of-date. It describes an intermediate file format used during the development from 0.983 to 1.0000. It will be rewritten soon. .PP So far, the description of the header format has been updated. .SH "DESCRIPTION" .IX Header "DESCRIPTION" This is a document describing the internal workings of DBM::Deep. It is not necessary to read this document if you only intend to be a user. This document is intended for people who either want a deeper understanding of specifics of how DBM::Deep works or who wish to help program DBM::Deep. .SH "CLASS LAYOUT" .IX Header "CLASS LAYOUT" DBM::Deep is broken up into five classes in three inheritance hierarchies. .IP "\(bu" 4 DBM::Deep is the parent of DBM::Deep::Array and DBM::Deep::Hash. These classes form the immediate interface to the outside world. They are the classes that provide the \s-1TIE\s0 mechanisms as well as the \s-1OO\s0 methods. .IP "\(bu" 4 DBM::Deep::Engine is the layer that deals with the mechanics of reading and writing to the file. This is where the logic of the file layout is handled. .IP "\(bu" 4 DBM::Deep::File is the layer that deals with the physical file. As a singleton that every other object has a reference to, it also provides a place to handle datastructure-wide items, such as transactions. .SH "FILE LAYOUT" .IX Header "FILE LAYOUT" This describes the 1.0003 and 2.0000 formats, which internally are numbered 3 and 4, respectively. The internal numbers are used in this section. These two formats are almost identical. .PP DBM::Deep uses a tagged file layout. Every section has a tag, a size, and then the data. .SS "File header" .IX Subsection "File header" The file header consists of two parts. The first part is a fixed length of 13 bytes: .PP .Vb 5 \& DPDB h VVVV SSSS \& \e / | \e \e \& \e/ \*(Aq\-\-\-. \e \*(Aq\-\-\- size of the second part of the header \& file \e \*(Aq\-\-\- version \& signature tag .Ve .IP "\(bu" 4 File Signature .Sp The first four bytes are '\s-1DPDB\s0' in network byte order, signifying that this is a DBM::Deep file. .IP "\(bu" 4 File tag .Sp A literal \s-1ASCII\s0 'h', indicating that this is the header. The file used by versions prior to 1.00 had a different fifth byte, allowing the difference to be determined. .IP "\(bu" 4 Version .Sp This is four bytes containing the file version. This lets the file format change over time. .Sp It is packed in network order, so version 4 is stored as \*(L"\e0\e0\e0\ecD\*(R". .IP "\(bu" 4 Header size .Sp The size of the second part of the header, in bytes. This number is also packed in network order. .PP The second part of the header is as follows: .PP .Vb 7 \& S B S T T(TTTTTTTTT...) (SS SS SS SS ...) (continued...) \& | | | | \e | \& | | | \*(Aq\-\-\-\-\-\-\-\-\-\-. \e staleness counters \& | | \*(Aq\-\-\-\-\-\-\-\-. \e txn bitfield \& | \*(Aq\-\-\-\-\-\-. \e number of transactions \& byte size \e data sector size \& max buckets \& \& (continuation...) \& BB(BBBBBB) DD(DDDDDD) II(IIIIII) \& | | | \& | free data | \& free blist free index .Ve .IP "\(bu" 4 Constants .Sp These are the file-wide constants that determine how the file is laid out. They can only be set upon file creation. .Sp The byte size is the number of bytes used to point to an offset elsewhere in the file. This corresponds to the \f(CW\*(C`pack_size\*(C'\fR option. This and the next three values are stored as packed 8\-bit integers (chars), so 2 is represented by \*(L"\ecB\*(R". .Sp \&\f(CW\*(C`max_buckets\*(C'\fR and \f(CW\*(C`data_sector_size\*(C'\fR are documented in the main DBM::Deep man page. The number stored is actually one less than what is passed to the constructor, to allow for a range of 1\-256. .Sp The number of transactions corresponds to the \f(CW\*(C`num_txns\*(C'\fR value passed to the constructor. .IP "\(bu" 4 Transaction information .Sp The transaction bitfield consists of one bit for every available transaction \s-1ID.\s0 It is therefore anywhere from 1 byte to 32 bytes long. .Sp The staleness counters each take two bytes (packed 32\-bit integers), one for each transaction, not including the so-called \s-1HEAD\s0 (the main transaction that all processes share \fIbefore\fR calling \f(CW\*(C`begin_work\*(C'\fR). So these take up 0 to 508 bytes. .Sp Staleness is explained in DBM::Deep::Engine. .IP "\(bu" 4 Freespace information .Sp Pointers into the first free sectors of the various sector sizes (Index, Bucketlist, and Data) are stored here. These are called chains internally, as each free sector points to the next one. .Sp The number of bytes is determined by the byte size, ranging from 2 to 8. .SS "Index" .IX Subsection "Index" The Index parts can be tagged either as Hash, Array, or Index. The latter is if there was a reindexing due to a bucketlist growing too large. The others are the root index for their respective datatypes. The index consists of a tag, a size, and then 256 sections containing file locations. Each section corresponds to each value representable in a byte. .PP The index is used as follows \- whenever a hashed key is being looked up, the first byte is used to determine which location to go to from the root index. Then, if that's also an index, the second byte is used, and so forth until a bucketlist is found. .SS "Bucketlist" .IX Subsection "Bucketlist" This is the part that contains the link to the data section. A bucketlist defaults to being 16 buckets long (modifiable by the \fImax_buckets\fR parameter used when creating a new file). Each bucket contains an \s-1MD5\s0 and a location of the appropriate key section. .SS "Key area" .IX Subsection "Key area" This is the part that handles transactional awareness. There are \&\fImax_buckets\fR sections. Each section contains the location to the data section, a transaction \s-1ID,\s0 and whether that transaction considers this key to be deleted or not. .SS "Data area" .IX Subsection "Data area" This is the part that actual stores the key, value, and class (if appropriate). The layout is: .IP "\(bu" 4 tag .IP "\(bu" 4 length of the value .IP "\(bu" 4 the actual value .IP "\(bu" 4 keylength .IP "\(bu" 4 the actual key .IP "\(bu" 4 a byte indicating if this value has a classname .IP "\(bu" 4 the classname (if one is there) .PP The key is stored after the value because the value is requested more often than the key. .SH "PERFORMANCE" .IX Header "PERFORMANCE" DBM::Deep is written completely in Perl. It also is a multi-process \s-1DBM\s0 that uses the datafile as a method of synchronizing between multiple processes. This is unlike most RDBMSes like MySQL and Oracle. Furthermore, unlike all RDBMSes, DBM::Deep stores both the data and the structure of that data as it would appear in a Perl program. .SS "\s-1CPU\s0" .IX Subsection "CPU" DBM::Deep attempts to be CPU-light. As it stores all the data on disk, DBM::Deep is I/O\-bound, not CPU-bound. .SS "\s-1RAM\s0" .IX Subsection "RAM" DBM::Deep uses extremely little \s-1RAM\s0 relative to the amount of data you can access. You can iterate through a million keys (using \f(CW\*(C`each()\*(C'\fR) without increasing your memory usage at all. .SS "\s-1DISK\s0" .IX Subsection "DISK" DBM::Deep is I/O\-bound, pure and simple. The faster your disk, the faster DBM::Deep will be. Currently, when performing \f(CW\*(C`my $x = $db\->{foo}\*(C'\fR, there are a minimum of 4 seeks and 1332 + N bytes read (where N is the length of your data). (All values assume a medium filesize.) The actions taken are: .IP "1 Lock the file" 4 .IX Item "1 Lock the file" .PD 0 .IP "1 Perform a \fBstat()\fR to determine if the inode has changed" 4 .IX Item "1 Perform a stat() to determine if the inode has changed" .ie n .IP "1 Go to the primary index for the $db (1 seek)" 4 .el .IP "1 Go to the primary index for the \f(CW$db\fR (1 seek)" 4 .IX Item "1 Go to the primary index for the $db (1 seek)" .IP "1 Read the tag/size of the primary index (5 bytes)" 4 .IX Item "1 Read the tag/size of the primary index (5 bytes)" .IP "1 Read the body of the primary index (1024 bytes)" 4 .IX Item "1 Read the body of the primary index (1024 bytes)" .IP "1 Go to the bucketlist for this \s-1MD5\s0 (1 seek)" 4 .IX Item "1 Go to the bucketlist for this MD5 (1 seek)" .IP "1 Read the tag/size of the bucketlist (5 bytes)" 4 .IX Item "1 Read the tag/size of the bucketlist (5 bytes)" .IP "1 Read the body of the bucketlist (144 bytes)" 4 .IX Item "1 Read the body of the bucketlist (144 bytes)" .IP "1 Go to the keys location for this \s-1MD5\s0 (1 seek)" 4 .IX Item "1 Go to the keys location for this MD5 (1 seek)" .IP "1 Read the tag/size of the keys section (5 bytes)" 4 .IX Item "1 Read the tag/size of the keys section (5 bytes)" .IP "1 Read the body of the keys location (144 bytes)" 4 .IX Item "1 Read the body of the keys location (144 bytes)" .IP "1 Go to the data section that corresponds to this transaction \s-1ID.\s0 (1 seek)" 4 .IX Item "1 Go to the data section that corresponds to this transaction ID. (1 seek)" .IP "1 Read the tag/size of the data section (5 bytes)" 4 .IX Item "1 Read the tag/size of the data section (5 bytes)" .IP "1 Read the value for this data (N bytes)" 4 .IX Item "1 Read the value for this data (N bytes)" .IP "1 Unlock the file" 4 .IX Item "1 Unlock the file" .PD .PP Every additional level of indexing (if there are enough keys) requires an additional seek and the reading of 1029 additional bytes. If the value is blessed, an additional 1 seek and 9 + M bytes are read (where M is the length of the classname). .PP Arrays are (currently) even worse because they're considered \*(L"funny hashes\*(R" with the length stored as just another key. This means that if you do any sort of lookup with a negative index, this entire process is performed twice \- once for the length and once for the value. .SH "ACTUAL TESTS" .IX Header "ACTUAL TESTS" .SS "\s-1SPEED\s0" .IX Subsection "SPEED" Obviously, DBM::Deep isn't going to be as fast as some C\-based DBMs, such as the almighty \fIBerkeleyDB\fR. But it makes up for it in features like true multi-level hash/array support, and cross-platform FTPable files. Even so, DBM::Deep is still pretty fast, and the speed stays fairly consistent, even with huge databases. Here is some test data: .PP .Vb 1 \& Adding 1,000,000 keys to new DB file... \& \& At 100 keys, avg. speed is 2,703 keys/sec \& At 200 keys, avg. speed is 2,642 keys/sec \& At 300 keys, avg. speed is 2,598 keys/sec \& At 400 keys, avg. speed is 2,578 keys/sec \& At 500 keys, avg. speed is 2,722 keys/sec \& At 600 keys, avg. speed is 2,628 keys/sec \& At 700 keys, avg. speed is 2,700 keys/sec \& At 800 keys, avg. speed is 2,607 keys/sec \& At 900 keys, avg. speed is 2,190 keys/sec \& At 1,000 keys, avg. speed is 2,570 keys/sec \& At 2,000 keys, avg. speed is 2,417 keys/sec \& At 3,000 keys, avg. speed is 1,982 keys/sec \& At 4,000 keys, avg. speed is 1,568 keys/sec \& At 5,000 keys, avg. speed is 1,533 keys/sec \& At 6,000 keys, avg. speed is 1,787 keys/sec \& At 7,000 keys, avg. speed is 1,977 keys/sec \& At 8,000 keys, avg. speed is 2,028 keys/sec \& At 9,000 keys, avg. speed is 2,077 keys/sec \& At 10,000 keys, avg. speed is 2,031 keys/sec \& At 20,000 keys, avg. speed is 1,970 keys/sec \& At 30,000 keys, avg. speed is 2,050 keys/sec \& At 40,000 keys, avg. speed is 2,073 keys/sec \& At 50,000 keys, avg. speed is 1,973 keys/sec \& At 60,000 keys, avg. speed is 1,914 keys/sec \& At 70,000 keys, avg. speed is 2,091 keys/sec \& At 80,000 keys, avg. speed is 2,103 keys/sec \& At 90,000 keys, avg. speed is 1,886 keys/sec \& At 100,000 keys, avg. speed is 1,970 keys/sec \& At 200,000 keys, avg. speed is 2,053 keys/sec \& At 300,000 keys, avg. speed is 1,697 keys/sec \& At 400,000 keys, avg. speed is 1,838 keys/sec \& At 500,000 keys, avg. speed is 1,941 keys/sec \& At 600,000 keys, avg. speed is 1,930 keys/sec \& At 700,000 keys, avg. speed is 1,735 keys/sec \& At 800,000 keys, avg. speed is 1,795 keys/sec \& At 900,000 keys, avg. speed is 1,221 keys/sec \& At 1,000,000 keys, avg. speed is 1,077 keys/sec .Ve .PP This test was performed on a PowerMac G4 1gHz running Mac \s-1OS X 10.3.2 &\s0 Perl 5.8.1, with an 80GB Ultra \s-1ATA/100 HD\s0 spinning at 7200RPM. The hash keys and values were between 6 \- 12 chars in length. The \s-1DB\s0 file ended up at 210MB. Run time was 12 min 3 sec. .SS "\s-1MEMORY USAGE\s0" .IX Subsection "MEMORY USAGE" One of the great things about DBM::Deep is that it uses very little memory. Even with huge databases (1,000,000+ keys) you will not see much increased memory on your process. DBM::Deep relies solely on the filesystem for storing and fetching data. Here is output from \fItop\fR before even opening a database handle: .PP .Vb 2 \& PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND \& 22831 root 11 0 2716 2716 1296 R 0.0 0.2 0:07 perl .Ve .PP Basically the process is taking 2,716K of memory. And here is the same process after storing and fetching 1,000,000 keys: .PP .Vb 2 \& PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND \& 22831 root 14 0 2772 2772 1328 R 0.0 0.2 13:32 perl .Ve .PP Notice the memory usage increased by only 56K. Test was performed on a 700mHz x86 box running Linux RedHat 7.2 & Perl 5.6.1.