.\" Automatically generated by Pod::Man 4.10 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Mail::Mbox::MessageParser 3pm"
.TH Mail::Mbox::MessageParser 3pm "2018-12-01" "perl v5.28.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Mail::Mbox::MessageParser \- A fast and simple mbox folder reader
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& #!/usr/bin/perl
\&
\& use strict;
\& use warnings;
\& use FileHandle;
\& use Mail::Mbox::MessageParser;
\&
\& # Compression support
\& my $file_name = \*(Aqmail/saved\-mail.xz\*(Aq;
\& my $file_handle = FileHandle\->new($file_name)
\&   or die "Can\*(Aqt open $file_name: $!";
\&
\& # Set up cache. (Not necessary if enable_cache is false.)
\& Mail::Mbox::MessageParser::SETUP_CACHE(
\&   { \*(Aqfile_name\*(Aq => \*(Aq/tmp/cache\*(Aq } );
\&
\& my $folder_reader =
\&   Mail::Mbox::MessageParser\->new( {
\&     \*(Aqfile_name\*(Aq => $file_name,
\&     \*(Aqfile_handle\*(Aq => $file_handle,
\&     \*(Aqenable_cache\*(Aq => 1,
\&     \*(Aqenable_grep\*(Aq => 1,
\&   } );
\&
\& # On failure, new() returns an error string instead of an object
\& die $folder_reader unless ref $folder_reader;
\&
\& # Any newlines or such before the start of the first email
\& my $prologue = $folder_reader\->prologue;
\& print $prologue;
\&
\& # This is the main loop. It\*(Aqs executed once for each email
\& while(!$folder_reader\->end_of_file())
\& {
\&   my $email = $folder_reader\->read_next_email();
\&   print $$email;
\& }
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module implements a fast but simple mbox folder reader. One of three
implementations (Cache, Grep, Perl) will be used, depending on the options
passed by the user and on the system configuration. The first implementation
is a cache-based one that stores information about the emails in a mailbox in
a cache file on the file system. Subsequent accesses are faster because the
mailbox does not need to be analyzed again.
The second is based on \s-1GNU\s0 grep and is significantly faster than the
Perl version for mailboxes that contain very large emails (on the order of
10MB). The third is a fast Perl-based implementation that should work on any
system.
.PP
The Cache implementation is about 6 times faster than the standard Perl
implementation, and the Grep implementation is about 4 times faster. If you
have \s-1GNU\s0 grep, it's best to enable both the Cache and Grep
implementations. When the cache information is available, you get the full
speed of the Cache implementation. Otherwise the Grep implementation is used
instead, which costs about a third of the performance (4 times rather than 6
times the speed of the Perl implementation).
.PP
The overriding requirement for this module is speed. If you need more
sophisticated parsing, use Mail::MboxParser (which is based on this module) or
Mail::Box.
.SS "\s-1METHODS AND FUNCTIONS\s0"
.IX Subsection "METHODS AND FUNCTIONS"
.IP "\s-1SETUP_CACHE\s0(...)" 4
.IX Item "SETUP_CACHE(...)"
.Vb 1
\&  SETUP_CACHE( { \*(Aqfile_name\*(Aq => <cache file name> } );
\&
\&    <cache file name> \- the file name of the cache
.Ve
.Sp
Call this function once to set up the cache before creating any parsers. You
must provide the location of the cache file; there is no default value.
.IP "new(...)" 4
.IX Item "new(...)"
.Vb 7
\&  new( { \*(Aqfile_name\*(Aq => <mailbox file name>,
\&    \*(Aqfile_handle\*(Aq => <mailbox file handle>,
\&    \*(Aqenable_cache\*(Aq => <1 or 0>,
\&    \*(Aqenable_grep\*(Aq => <1 or 0>,
\&    \*(Aqforce_processing\*(Aq => <1 or 0>,
\&    \*(Aqdebug\*(Aq => <1 or 0>,
\&  } );
\&
\&    <mailbox file name>   \- the file name of the mailbox
\&    <mailbox file handle> \- the already opened file handle for the mailbox
\&    enable_cache          \- true to attempt to use the cache implementation
\&    enable_grep           \- true to attempt to use the grep implementation
\&    force_processing      \- true to force processing of files that look invalid
\&    debug                 \- true to print some debugging information to STDERR
.Ve
.Sp
The constructor takes either a file name or a file handle, or both. If the
file handle is not defined, Mail::Mbox::MessageParser will attempt to open the
file using the file name.
You should always pass the file name if you have it, so that the parser can
cache the mailbox information.
.Sp
This module automatically decompresses the mailbox as necessary. If a file
name is available but the file handle is undef, the module calls bzip, bzip2,
gzip, lzip, or xz to decompress the file in memory, provided that the file
name ends with the appropriate suffix. If the file handle is defined, the
module detects the type of compression and applies the correct decompression
program.
.Sp
The Cache, Grep, or Perl implementation of the parser will be loaded,
whichever is most appropriate. For example, the first time you use caching
there is no cache yet, so the Grep implementation can be used instead. The
cache is updated in memory as the Grep implementation parses the mailbox, and
it is written out when the program exits. The file name is optional, but if
it is omitted, \fIenable_cache\fR and \fIenable_grep\fR must both be false.
.Sp
\&\fIforce_processing\fR makes the module process folders that look like binary
files, or whose text does not look like a mailbox.
.Sp
Returns a reference to a Mail::Mbox::MessageParser object on success, and a
scalar describing the error on failure (\*(L"Not a mailbox\*(R",
\&\*(L"Can't open <filename>: <system error>\*(R", or
\&\*(L"Can't execute <uncompress command> for file <filename>\*(R").
.IP "\fBreset()\fR" 4
.IX Item "reset()"
Resets the file handle and all internal state. Note that this does not work
with file handles that are streams, since they cannot be rewound. If there is
enough demand, I may add the ability to store the previously read stream data
internally so that \fIreset()\fR will work correctly.
.IP "\fBendline()\fR" 4
.IX Item "endline()"
Returns \*(L"\en\*(R" or \*(L"\er\en\*(R", depending on the file format.
.IP "\fBprologue()\fR" 4
.IX Item "prologue()"
Returns any newlines or other content at the start of the mailbox prior to the
first email.
.IP "\fBend_of_file()\fR" 4
.IX Item "end_of_file()"
Returns true if the end of the file has been encountered.
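.Sp
The following toy reader walks through a small in-memory mailbox to show what
\&\fIprologue()\fR, \fIendline()\fR, \fIend_of_file()\fR and
\&\fIread_next_email()\fR report. It is a minimal sketch, not the module's own
Perl implementation. It assumes that every email starts at a line beginning
with \*(L"From \*(R" and that no body line does. The helper
\&\f(CW\*(C`toy_reader\*(C'\fR and the keys of the hash it returns are
hypothetical names.
.Sp
.Vb 2
\& use strict;
\& use warnings;
\&
\& # Hypothetical toy reader, NOT the module\*(Aqs code. Assumes every email
\& # starts at a line beginning with "From " and that no body line does.
\& sub toy_reader {
\&     my ($text) = @_;
\&
\&     # Like endline(): "\er\en" if the first line ends that way, else "\en"
\&     my $endline = $text =~ /\eA[^\en]*\er\en/ ? "\er\en" : "\en";
\&
\&     # Like prologue(): everything before the first "From " line
\&     my $start    = $text =~ /^From /m ? $\-[0] : length $text;
\&     my $prologue = substr($text, 0, $start);
\&
\&     # Split the remainder in front of every "From " line
\&     my @emails = split /^(?=From )/m, substr($text, $start);
\&
\&     return {
\&         endline     => $endline,
\&         prologue    => $prologue,
\&         # Like end_of_file(): true once no emails remain
\&         end_of_file => sub { !@emails },
\&         # Like read_next_email(): a reference to the next email, or undef
\&         read_next   => sub {
\&             return undef unless @emails;
\&             my $email = shift @emails;
\&             return \e$email;
\&         },
\&     };
\& }
\&
\& my $mbox = "\en"
\&     . "From alice Mon Jan  1 00:00:00 2018\en"
\&     . "Subject: one\en\enfirst body\en"
\&     . "From bob Tue Jan  2 00:00:00 2018\en"
\&     . "Subject: two\en\ensecond body\en";
\&
\& my $reader = toy_reader($mbox);
\& my $count  = 0;
\& until ($reader\->{end_of_file}\->()) {
\&     my $email = $reader\->{read_next}\->();
\&     $count++;
\&     print "email $count: ", length($$email), " bytes\en";
\& }
.Ve
.Sp
Returning closures keeps the sketch short. The real module exposes these
operations as methods on a Mail::Mbox::MessageParser object.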
.IP "\fBline_number()\fR" 4
.IX Item "line_number()"
Returns the line number for the start of the last email read.
.IP "\fBnumber()\fR" 4
.IX Item "number()"
Returns the number of the last email read, counting from 1 for the first
email.
.IP "\fBlength()\fR" 4
.IX Item "length()"
Returns the length of the last email read.
.IP "\fBoffset()\fR" 4
.IX Item "offset()"
Returns the byte offset of the last email read.
.IP "\fBread_next_email()\fR" 4
.IX Item "read_next_email()"
Returns a reference to a scalar holding the text of the next email in the
mailbox, or undef at the end of the file.
.SH "BUGS"
.IX Header "BUGS"
No known bugs.
.PP
Contact david@coppit.org for bug reports and suggestions.
.SH "AUTHOR"
.IX Header "AUTHOR"
David Coppit <david@coppit.org>.
.SH "LICENSE"
.IX Header "LICENSE"
This code is distributed under the \s-1GNU\s0 General Public License (\s-1GPL\s0)
Version 2. See the file \s-1LICENSE\s0 in the distribution for details.
.SH "HISTORY"
.IX Header "HISTORY"
This code was originally part of the grepmail distribution. See
http://grepmail.sf.net/ for previous versions of grepmail which included early
versions of this code.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Mail::MboxParser, Mail::Box
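.SH "EXAMPLES"
.IX Header "EXAMPLES"
The constructor described under \fInew()\fR decompresses mailboxes
automatically. For an already opened file handle, it must first detect the
compression type. This manual page does not say how the module does that. The
following is only a sketch of one common approach: checking the standard
leading \*(L"magic\*(R" bytes of the gzip, bzip2, xz, and lzip formats. The
function name \f(CW\*(C`compression_type\*(C'\fR is hypothetical, and the
original bzip format is left out.
.PP
.Vb 2
\& use strict;
\& use warnings;
\&
\& # Hypothetical helper, NOT part of Mail::Mbox::MessageParser: guess the
\& # compression format from the first bytes of a seekable file handle.
\& sub compression_type {
\&     my ($fh) = @_;
\&
\&     binmode $fh;
\&     defined read($fh, my $magic, 6) or die "read failed: $!";
\&     seek($fh, 0, 0) or die "seek failed: $!";   # rewind for the next reader
\&
\&     return \*(Aqgzip\*(Aq  if $magic =~ /\eA\ex1f\ex8b/;
\&     return \*(Aqbzip2\*(Aq if $magic =~ /\eABZh/;
\&     return \*(Aqxz\*(Aq    if $magic =~ /\eA\exfd7zXZ\ex00/;
\&     return \*(Aqlzip\*(Aq  if $magic =~ /\eALZIP/;
\&     return \*(Aqnone\*(Aq;
\& }
\&
\& # Usage with in\-memory file handles
\& for my $data ("\ex1f\ex8b\ex08\ex00", "BZh91AY&SY", "From alice\en") {
\&     open(my $fh, \*(Aq<\*(Aq, \e$data) or die "open failed: $!";
\&     print compression_type($fh), "\en";
\& }
.Ve
.PP
The handle is rewound after the check so that it can then be passed to a
decompressor or parser. This works only for seekable handles, which mirrors
the limitation of \fIreset()\fR with streams.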