GREPMAIL(1p) | User Contributed Perl Documentation | GREPMAIL(1p) |
NAME
grepmail - search mailboxes for mail matching a regular expression
SYNOPSIS
grepmail [--help|--version] [-abBDFhHilLmrRuvVw] [-C <cache-file>] [-j <status>] [-s <sizespec>] [-d <date-specification>] [-X <signature-pattern>] [-Y <header-pattern>] [[-e] <pattern>|-E <expr>|-f <pattern-file>] <files...>
DESCRIPTION
grepmail looks for mail messages
containing a pattern, and prints the resulting messages on standard out.
By default grepmail looks in both header and body for the specified
pattern.
When redirected to a file, the result is another mailbox, which can, in turn, be
handled by standard User Agents, such as elm, or even used as input for
another instance of grepmail.
At least one of -E, -e, -d, -s, or -u must be
specified. The pattern is optional if -d, -s, and/or -u
is used. The -e flag is optional if there is no file whose name is the
pattern. The -E option can be used to specify complex search
expressions involving logical operators. (See below.)
If a mailbox cannot be found, grepmail first searches the directory specified
by the MAILDIR environment variable (if one is defined), then searches the
$HOME/mail, $HOME/Mail, and $HOME/Mailbox directories.
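The lookup order described above can be sketched in Python. This is only an illustration of the documented order, not grepmail's own (Perl) code. The function names candidate_mailbox_paths and find_mailbox are hypothetical, and trying the name exactly as given before anything else is an assumption.

```python
import os
from pathlib import Path


def candidate_mailbox_paths(name, environ):
    """Return the paths to try for `name`, in the documented order."""
    candidates = [Path(name)]  # assumption: the name as given is tried first
    maildir = environ.get("MAILDIR")
    if maildir:  # MAILDIR is only consulted when it is defined
        candidates.append(Path(maildir) / name)
    home = environ.get("HOME")
    if home:
        for subdir in ("mail", "Mail", "Mailbox"):
            candidates.append(Path(home) / subdir / name)
    return candidates


def find_mailbox(name, environ=os.environ):
    """Return the first candidate path that exists, or None."""
    for path in candidate_mailbox_paths(name, environ):
        if path.exists():
            return path
    return None
```

Calling find_mailbox("sent-mail") would return the first existing candidate, or None if the mailbox is not found in any of the locations.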
OPTIONS AND ARGUMENTS
Many of the options and arguments are analogous to those of grep.
- pattern
- The pattern to search for in the mail message. May be any
Perl regular expression, but should be quoted on the command line to
protect against globbing (shell expansion). To search for more than one
pattern, use the form "(pattern1|pattern2|...)".
- mailbox
- Mailboxes must be in traditional UNIX "/bin/mail"
mailbox format. The mailboxes may be compressed with gzip or bzip2, in
which case gunzip or bzip2 must be installed on the system.
- -a
- Use arrival date instead of sent date.
- -b
- Asserts that the pattern must match in the body of the email.
- -B
- Print the body but with only minimal ('From ', 'From:', 'Subject:', 'Date:') headers. This flag can be used with -H, in which case it will print only short headers and no email bodies.
- -C
- Specifies the location of the cache file. The default is $HOME/.grepmail-cache.
- -D
- Enable debug mode, which prints diagnostic messages.
- -d
- Date specifications must take one of the following forms:
- a date like "today", "yesterday", "5/18/93", "5 days ago", "5 weeks ago",
- OR "before", "after", or "since", followed by a date as defined above,
- OR "between <date> and <date>", where <date> is defined as above.
- -E
- Specify a complex search expression using logical
operators. The current syntax allows the user to specify search
expressions using Perl syntax. Three values can be used: $email (the
entire email message), $email_header (just the header), or $email_body
(just the body). A search is specified in the form "$email =~
/pattern/", and multiple searches can be combined using
"&&" and "||" for "and" and
"or".
$email_header =~ /^From: .*\@coppit.org/ && $email =~ /grepmail/i
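For readers more familiar with Python than Perl, the example expression above corresponds roughly to the following sketch. It assumes that the header ends at the first blank line and that "^" may match at the start of any header line. Both are assumptions of this illustration rather than statements about grepmail's internals, and the function names are hypothetical.

```python
import re


def split_email(email):
    """Split a message into (header, body) at the first blank line."""
    header, _, body = email.partition("\n\n")
    return header, body


def matches_example(email):
    """Mimic: $email_header =~ /^From: .*\\@coppit.org/ && $email =~ /grepmail/i"""
    header, _body = split_email(email)
    from_coppit = re.search(r"^From: .*@coppit.org", header, re.MULTILINE)
    mentions_grepmail = re.search(r"grepmail", email, re.IGNORECASE)
    return bool(from_coppit and mentions_grepmail)


message = (
    "From: David <david@coppit.org>\n"
    "Subject: release\n"
    "\n"
    "A new GrepMail version is out.\n"
)
print(matches_example(message))
```

The "&&" in the Perl expression becomes "and" here; an "||" expression would use "or" in the same way.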
- -e
- Explicitly specify the search pattern. This is useful for specifying patterns that begin with "-", which would otherwise be interpreted as a flag.
- -f
- Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
- -F
- Force grepmail to process all files and streams as though they were mailboxes. (That is, skip the checks for non-mailbox ASCII files and for binary files that don't look like they are compressed using known schemes.)
- -h
- Asserts that the pattern must match in the header of the email.
- -H
- Print the header but not body of matching emails.
- -i
- Make the search case-insensitive (by analogy to grep -i).
- -j
- Asserts that the email "Status:" header must contain the given flags. Order and case are not important, so use -j AR or -j ra to search for emails which have been read and answered.
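The order- and case-insensitive flag test described for -j can be pictured as a set comparison. The sketch below is illustrative only; status_has_flags is a hypothetical name, and it takes the value of the "Status:" header (for example "RO") rather than a whole message.

```python
def status_has_flags(status_value, required_flags):
    """True if every required flag letter appears in the Status: value."""
    present = set(status_value.upper())
    return set(required_flags.upper()) <= present
```

With this sketch, status_has_flags("RA", "ar") and status_has_flags("AR", "ra") are both true, while status_has_flags("RO", "AR") is false because "A" is missing.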
- -l
- Output the names of files having an email matching the expression (by analogy to grep -l).
- -L
- Follow symbolic links. (Implies -R)
- -M
- Causes grepmail to ignore non-text MIME attachments. This removes false positives resulting from binaries encoded as ASCII attachments.
- -m
- Append "X-Mailfolder: <folder>" to all email headers, indicating which folder contained the matched email.
- -n
- Prefix each line with line number information. If multiple files are specified, the filename will precede the line number. NOTE: When used in conjunction with -m, the X-Mailfolder header has the same line number as the next (blank) line.
- -q
- Quiet mode. Suppress the output of warning messages about non-mailbox files, directories, etc.
- -r
- Generate a report of the names of the files containing emails matching the expression, along with a count of the number of matching emails.
- -R
- Causes grepmail to recurse any directories encountered.
- -s
- Return emails which match the size (in bytes) specified
with this flag. Note that this size includes the length of the header.
- 12345: match size of exactly 12345
- <12345, <=12345, >12345, >=12345: match size less than, less than or equal,
greater than, or greater than or equal to 12345
- 10000-12345: match size between 10000 and 12345 inclusive
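The three size forms listed above can be sketched as a small parser that turns a specification into a test on a byte count. This is a minimal illustration, not grepmail's implementation; size_matcher is a hypothetical name.

```python
import operator
import re

# Maps the optional comparison prefix to a comparison function.
# None means no prefix was given, i.e. an exact size match.
_COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    None: operator.eq,
}


def size_matcher(spec):
    """Turn a size specification into a predicate on a size in bytes."""
    range_match = re.fullmatch(r"(\d+)-(\d+)", spec)
    if range_match:
        low, high = map(int, range_match.groups())
        return lambda size: low <= size <= high  # inclusive range
    compare_match = re.fullmatch(r"(<=|>=|<|>)?(\d+)", spec)
    if compare_match is None:
        raise ValueError(f"unrecognized size specification: {spec!r}")
    compare = _COMPARATORS[compare_match.group(1)]
    limit = int(compare_match.group(2))
    return lambda size: compare(size, limit)
```

For example, size_matcher("2000-3000") accepts 2000 and 3000 but rejects 3001, and size_matcher("<12345") rejects 12345 itself.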
- -S
- Ignore signatures. The signature consists of everything after a line consisting of "-- ".
- -u
- Output only unique emails, by analogy to sort -u. Grepmail determines email uniqueness by the Message-ID header.
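Uniqueness by Message-ID can be sketched with Python's standard email package. This is an illustration under stated assumptions, not grepmail's code: unique_by_message_id is a hypothetical name, the first copy of each Message-ID is the one kept, and messages without a Message-ID header are all kept.

```python
from email import message_from_string


def unique_by_message_id(raw_messages):
    """Keep the first message seen for each Message-ID value."""
    seen = set()
    unique = []
    for raw in raw_messages:
        message_id = message_from_string(raw)["Message-ID"]
        if message_id is not None:
            message_id = message_id.strip()
            if message_id in seen:
                continue  # duplicate of an earlier message
            seen.add(message_id)
        # Assumption: messages lacking a Message-ID are never dropped.
        unique.append(raw)
    return unique
```

Header names are looked up case-insensitively, so "Message-Id:" and "Message-ID:" are treated alike in this sketch.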
- -v
- Invert the sense of the search, by analogy to grep -v. This results in the set of emails printed being the complement of those that would be printed without the -v switch.
- -V
- Print the version and exit.
- -w
- Search for only those lines which contain the pattern as
part of a word group. That is, the start of the pattern must match the
start of a word, and the end of the pattern must match the end of a word.
(Note that the start and end need not be for the same word.)
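One way to picture the -w behavior is a pattern wrapped in word-boundary assertions. The sketch below assumes the pattern begins and ends with word characters and that wrapping it in "\b" is a fair model of the description above; it is not taken from grepmail's source, and word_search is a hypothetical name.

```python
import re


def word_search(pattern, text):
    """True if `pattern` matches starting and ending on word boundaries."""
    wrapped = r"\b(?:" + pattern + r")\b"
    return re.search(wrapped, text) is not None
```

Under this model "mail" matches in "send mail now" but not in "grepmail rocks", and "send mail" matches in "please send mail now" even though its start and end fall in different words.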
- -X
- Specify a regular expression for the signature separator. By default this pattern is '^-- $'.
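The effect of ignoring a signature with the default separator can be sketched as follows. This is an illustration only: strip_signature is a hypothetical name, and the sketch assumes everything from the first separator line onward is discarded before searching.

```python
import re


def strip_signature(body, separator=r"^-- $"):
    """Return the body up to, but not including, the signature separator."""
    match = re.search(separator, body, flags=re.MULTILINE)
    if match is None:
        return body
    return body[: match.start()]
```

Note that the default separator requires the trailing space: a line containing only "--" is not treated as a signature separator.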
- -Y
- Specify a pattern which indicates specific headers to be
searched. The search will automatically treat headers which span multiple
lines as one long line. This flag implies -h.
If the regular expression contains "^TO:" it will be substituted by
^((Original-)?(Resent-)?(To|Cc|Bcc)|(X-Envelope|Apparently(-Resent)?)-To):
which should match all headers with destination addresses.
If the regular expression contains "^FROM_DAEMON:" it will be
substituted by
(^(Mailing-List:|Precedence:.*(junk|bulk|list)|To: Multiple recipients of |(((Resent-)?(From|Sender)|X-Envelope-From):|>?From )([^>]*[^(.%@a-z0-9])?(Post(ma?(st(e?r)?|n)|office)|(send)?Mail(er)?|daemon|m(mdf|ajordomo)|n?uucp|LIST(SERV|proc)|NETSERV|o(wner|ps)|r(e(quest|sponse)|oot)|b(ounce|bs\.smtp)|echo|mirror|s(erv(ices?|er)|mtp(error)?|ystem)|A(dmin(istrator)?|MMGR|utoanswer))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?
which should catch mails coming from most daemons.
If the regular expression contains "^FROM_MAILER:" it will be
substituted by
(^(((Resent-)?(From|Sender)|X-Envelope-From):|>?From)([^>]*[^(.%@a-z0-9])?(Post(ma(st(er)?|n)|office)|(send)?Mail(er)?|daemon|mmdf|n?uucp|ops|r(esponse|oot)|(bbs\.)?smtp(error)?|s(erv(ices?|er)|ystem)|A(dmin(istrator)?|MMGR))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t][^<)]*(\(.*\).*)?)?$([^>]|$))
(a stripped down version of "^FROM_DAEMON:"), which should catch mails
coming from most mailer-daemons.
So, to search for all emails to or from "Andy":
grepmail -Y '(^TO:|^From:)' Andy mailbox
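The treatment of multi-line headers as one long line can be sketched as below. This is a simplified illustration, not grepmail's code: unfold_headers and header_matches are hypothetical names, the "^TO:", "^FROM_DAEMON:" and "^FROM_MAILER:" substitutions are not modeled, and joining a continuation line with a single space is an assumption.

```python
import re


def unfold_headers(header_text):
    """Join continuation lines (those starting with space or tab) to their header."""
    unfolded = re.sub(r"\r?\n[ \t]+", " ", header_text)
    return unfolded.splitlines()


def header_matches(header_text, header_pattern, pattern):
    """True if some header selected by header_pattern also contains pattern."""
    selector = re.compile(header_pattern)
    needle = re.compile(pattern)
    return any(
        selector.search(line) and needle.search(line)
        for line in unfold_headers(header_text)
    )


headers = (
    "From: bob@example.com\n"
    "To: alice@example.com,\n"
    "\tandy@example.com\n"
    "Subject: lunch\n"
)
print(header_matches(headers, r"^(To|From):", "andy"))
```

Without unfolding, the address on the continuation line would not be seen as part of the "To:" header.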
- --help
- Print a help message summarizing the usage.
- --
- All arguments following -- are treated as mail folders.
EXAMPLES
Count the number of emails. ("." matches every email.)
grepmail -r . sent-mail
Get all email between 2000 and 3000 bytes about books:
grepmail books -s 2000-3000 sent-mail
Get all email that you mailed yesterday:
grepmail -d yesterday sent-mail
Get all email that you mailed before the first thursday in June 1998 that pertains to research (requires Date::Manip):
grepmail research -d "before 1st thursday in June 1998" sent-mail
Get all email that you mailed before the first of June 1998 that pertains to research:
grepmail research -d "before 6/1/98" sent-mail
Get all email you received since 8/20/98 that wasn't about research or your job, ignoring case:
grepmail -iv "(research|job)" -d "since 8/20/98" saved-mail
Get all email about mime but not about Netscape. Constrain the search to match the body, since most headers contain the text "mime":
grepmail -b mime saved-mail | grepmail Netscape -v
Print a list of all mailboxes containing a message from Rodney. Constrain the search to the headers, since quoted emails may match the pattern:
grepmail -hl "^From.*Rodney" saved-mail*
Find all emails with the text "Pilot" in both the header and the body:
grepmail -hb "Pilot" saved-mail*
Print a count of the number of messages about grepmail in all saved-mail mailboxes:
grepmail -br grepmail saved-mail*
Remove any duplicates from a mailbox:
grepmail -u saved-mail
Convert a Gnus mailbox to mbox format:
grepmail . gnus-mailbox-dir/* > mbox
Search for all emails to or from an address (taking into account wrapped headers and different header names):
grepmail -Y '(^TO:|^From:)' my@email.address saved-mail
Find all emails from postmasters:
grepmail -Y '^FROM_MAILER:' . saved-mail
FILES
grepmail will not create temporary files while decompressing compressed archives. The last version to do this was 3.5. While the new design uses more memory, the code is much simpler, and there is less chance that email can be read by malicious third parties. Memory usage is determined by the size of the largest email message in the mailbox.
ENVIRONMENT
The MAILDIR environment variable can be used to specify the default mail directory. This directory will be searched if the specified mailbox cannot be found directly. The HOME environment variable is also used to find mailboxes if they cannot be found directly, and to store grepmail state information such as its cache file.
BUGS AND LIMITATIONS
- Patterns containing "$" may cause problems
- Currently I look for "$" followed by a non-word character and replace it with the line ending for the current file (either "\n" or "\r\n"). This may cause problems with complex patterns specified with -E, but I'm not aware of any.
- Mails without bodies cause problems
- According to RFC 822, mail messages need not have message bodies. I've found and removed one bug related to this. I'm not sure if there are others.
- Complex single-point dates not parsed correctly
- If you specify a point date like "September 1,
2004", grepmail creates a date range that includes the entire day of
September 1, 2004. If you specify a complex point date such as
"today", "1st Monday in July", or "9/1/2004 at
0:00" grepmail may parse the time incorrectly.
- File names that look like flags cause problems.
- In some special circumstances, grepmail will be confused by files whose names look like flags. In such cases, use the -e flag to specify the search pattern.
AUTHOR
David Coppit, <david@coppit.org>, http://coppit.org/
SEE ALSO
elm(1), mail(1), grep(1), perl(1), printmail(1), Mail::Internet(3), procmailrc(5). Crocker, D. H., Standard for the Format of Arpa Internet Text Messages, RFC 822.
2010-04-28 | perl v5.10.1 |