'\" t
.\" Title: pdfgrep
.\" Author: [see the "AUTHORS" section]
.\" Generator: DocBook XSL Stylesheets vsnapshot
.\" Date: 11/19/2018
.\" Manual: Pdfgrep Manual
.\" Source: Pdfgrep 2.1.1
.\" Language: English
.\"
.TH "PDFGREP" "1" "11/19/2018" "Pdfgrep 2\&.1\&.1" "Pdfgrep Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
pdfgrep \- search PDF files for a regular expression
.SH "SYNOPSIS"
.sp
.nf
\fBpdfgrep\fR [\fIOPTION\fR\&...] \fIPATTERN\fR [\fIFILE\fR\&...]
\fBpdfgrep\fR [\fIOPTION\fR\&...] [\fB\-e\fR \fIPATTERN\fR | \fB\-f\fR \fIFILE\fR] [\fIFILE\fR\&...]
.fi
.SH "DESCRIPTION"
.sp
Search for \fIPATTERN\fR in each PDF \fIFILE\fR and print matching lines\&. By default, \fIPATTERN\fR is an extended regular expression\&.
.sp
\fBpdfgrep\fR tries to be mostly compatible with \fBGNU grep\fR with some PDF\-specific distinctions and additional options\&. Most notably, \fB\-n\fR prints page instead of line numbers\&.
.SH "OPTIONS"
.SS "General Information"
.PP
\fB\-\-help\fR
.RS 4
Print a short summary of the options\&.
.RE
.PP
\fB\-V\fR, \fB\-\-version\fR
.RS 4
Show version information\&.
.RE
.SS "Pattern Interpretation"
.PP
\fB\-F\fR, \fB\-\-fixed\-strings\fR
.RS 4
Interpret \fIPATTERN\fR as a list of fixed strings separated by newlines, any of which is to be matched\&.
.RE
.PP
\fB\-P\fR, \fB\-\-perl\-regexp\fR
.RS 4
Interpret \fIPATTERN\fR as a Perl compatible regular expression (PCRE)\&. See \fIpcresyntax\fR(3) for a quick overview\&.
.RE
.SS "Matching Control"
.PP
\fB\-e\fR \fIPATTERN\fR, \fB\-\-regexp=\fR\fIPATTERN\fR
.RS 4
Use \fIPATTERN\fR as the pattern to search for\&. If this option is specified multiple times or combined with \fB\-\-file\fR, all patterns are tried in turn until one of them matches\&.
.RE
.PP
\fB\-f\fR \fIFILE\fR, \fB\-\-file=\fR\fIFILE\fR
.RS 4
Read patterns from \fIFILE\fR, one per line\&. If \fIFILE\fR contains multiple patterns or if this option is applied multiple times or combined with \fB\-e\fR, all patterns are tried in turn until one of them matches\&. An empty pattern list matches nothing\&.
.RE
.PP
\fB\-i\fR, \fB\-\-ignore\-case\fR
.RS 4
Ignore case distinctions in both the \fIPATTERN\fR and the input files\&.
.RE
.SS "General Output Control"
.PP
\fB\-c\fR, \fB\-\-count\fR
.RS 4
Suppress normal output\&. Instead print the number of matches for each input file\&. Note that unlike grep, multiple matches on the same page will be counted individually\&.
.RE
.PP
\fB\-p\fR, \fB\-\-page\-count\fR
.RS 4
Like \fB\-c\fR, but prints the number of matches per page\&. Implies \fB\-n\fR\&.
.RE
.PP
\fB\-\-color\fR \fIWHEN\fR
.RS 4
Surround file names, page numbers and matched text with escape sequences to display them in color on the terminal\&. \fIWHEN\fR can be:
.TS
tab(:);
lt lt
lt lt
lt lt.
T{
\fBalways\fR
T}:T{
Always use colors, even when stdout is not a terminal\&.
T}
T{
\fBnever\fR
T}:T{
Do not use colors\&.
T}
T{
\fBauto\fR
T}:T{
Use colors only when stdout is a terminal (this is the default)\&.
T}
.TE
.sp 1
.RE
.PP
\fB\-L\fR, \fB\-\-files\-without\-match\fR
.RS 4
Suppress normal output\&.
Instead print the name of each input file that doesn\(cqt contain a match\&. This works well with \fB\-Z\fR, but many other output options like \fB\-n\fR or \fB\-c\fR are ignored when \fB\-L\fR is specified\&.
.RE
.PP
\fB\-l\fR, \fB\-\-files\-with\-matches\fR
.RS 4
Suppress normal output\&. Instead print the name of each input file that contains a match\&. This works well with \fB\-Z\fR, but many other output options like \fB\-n\fR or \fB\-c\fR are ignored when \fB\-l\fR is specified\&.
.RE
.PP
\fB\-m\fR, \fB\-\-max\-count\fR \fINUM\fR
.RS 4
Stop reading a file after \fINUM\fR matches\&. When the \fB\-c\fR or \fB\-\-count\fR option is also used, \fBpdfgrep\fR does not output a count greater than \fINUM\fR\&.
.RE
.PP
\fB\-o\fR, \fB\-\-only\-matching\fR
.RS 4
Print only the matched part of a line without any surrounding context\&.
.RE
.PP
\fB\-q\fR, \fB\-\-quiet\fR
.RS 4
Suppress all normal output to stdout\&. Exit immediately with exit status 0 if a match is found, even in case of errors\&. Use this if you only care about the presence of matches, not their number or content\&.
.RE
.SS "Line Prefix Control"
.PP
\fB\-H\fR, \fB\-\-with\-filename\fR
.RS 4
Print the file name for each match\&. This is the default setting when there is more than one file to search\&.
.RE
.PP
\fB\-h\fR, \fB\-\-no\-filename\fR
.RS 4
Suppress the prefixing of file names on output\&. This is the default setting when there is only one file to search\&.
.RE
.PP
\fB\-n\fR, \fB\-\-page\-number\fR
.RS 4
Prefix each match with the number of the page where it was found\&.
.RE
.PP
\fB\-Z\fR, \fB\-\-null\fR
.RS 4
Output a null byte (called \fINUL\fR in ASCII and \*(Aq\e0\*(Aq in C) instead of the colon that usually separates a filename from the rest of the line\&. This option makes the output unambiguous in the presence of colons, spaces or newlines in the filename\&. It can be used in conjunction with commands such as \fIxargs\ \&\-0\fR or \fIperl\ \&\-0\fR\&.
.RE
.PP
\fB\-\-match\-prefix\-separator\fR \fISEP\fR
.RS 4
Changes the colon used to separate filename, page number and text in the output to \fISEP\fR, which can be an arbitrary string\&. This is useful when filenames contain colons, but only for interactive usage\&. For scripting, \fB\-\-null\fR should be used\&.
.RE
.SS "Context Control"
.PP
\fB\-A\fR \fINUM\fR, \fB\-\-after\-context=\fR\fINUM\fR
.RS 4
Print \fINUM\fR lines of context after matching lines\&. Contiguous groups of matches are separated by a line containing \fB\-\-\fR\&. With \fB\-o\fR, this option has no effect\&.
.RE
.PP
\fB\-B\fR \fINUM\fR, \fB\-\-before\-context=\fR\fINUM\fR
.RS 4
Print \fINUM\fR lines of context before matching lines\&. Contiguous groups of matches are separated by a line containing \fB\-\-\fR\&. With \fB\-o\fR, this option has no effect\&.
.RE
.PP
\fB\-C\fR \fINUM\fR, \fB\-\-context=\fR\fINUM\fR
.RS 4
Print \fINUM\fR lines of context before and after matching lines\&. Contiguous groups of matches are separated by a line containing \fB\-\-\fR\&. With \fB\-o\fR, this option has no effect\&.
.RE
.SS "File Selection"
.PP
\fB\-r\fR, \fB\-\-recursive\fR
.RS 4
Recursively search all files (restricted by \fB\-\-include\fR and \fB\-\-exclude\fR) under each directory, following symlinks only if they are on the command line\&.
.RE
.PP
\fB\-R\fR, \fB\-\-dereference\-recursive\fR
.RS 4
Same as \fB\-r\fR, but follows all symlinks\&.
.RE
.PP
\fB\-\-exclude=\fR\fIGLOB\fR
.RS 4
Skip files whose base name matches \fIGLOB\fR\&. See \fIglob\fR(7) for wildcards you can use\&. You can use this option multiple times to exclude more patterns\&. It takes precedence over \fB\-\-include\fR\&. Note that includes and excludes apply only to files found via \fB\-\-recursive\fR and not to the argument list\&.
.RE
.PP
\fB\-\-include=\fR\fIGLOB\fR
.RS 4
Only search files whose base name matches \fIGLOB\fR\&. See \fB\-\-exclude\fR for details\&. The default is \fI*\&.pdf\fR\&.
.RE
.SS "Other Options"
.PP
\fB\-\-cache\fR
.RS 4
Use a cache for the rendered text to speed up the operation on large files\&.
.RE
.PP
\fB\-\-password=\fR\fIPASSWORD\fR
.RS 4
Use \fIPASSWORD\fR to decrypt the PDF files\&. Can be specified multiple times; all passwords will be tried on all PDFs\&.
\fBNote\fR that this password will show up in your command history and in the output of \fIps\fR(1), so do not use this option if the security of \fIPASSWORD\fR is important\&.
.RE
.PP
\fB\-\-page\-range=\fR\fIRANGE\fR
.RS 4
Limit search to a specified set of pages\&. \fIRANGE\fR is a comma\-separated list whose items are either single page numbers or range expressions of the form PAGE1\-PAGE2\&. Example: 2\-3,5,7\-10\&.
.RE
.PP
\fB\-\-debug\fR
.RS 4
Enable debug output\&.
\fBNote\fR: Due to limitations of poppler before version 0\&.30\&.0, some debug output is also printed without \fB\-\-debug\fR when using such a poppler version\&.
.RE
.PP
\fB\-\-warn\-empty\fR
.RS 4
Print a warning to \fIstderr\fR if a PDF contains no searchable text\&. This is the case for PDFs that consist only of images, for example scanned documents\&.
.RE
.PP
\fB\-\-unac\fR
.RS 4
Remove accents and ligatures from both the search pattern and the PDF documents\&. This is useful if you want to search for a word containing "ae", but the PDF uses the single character "æ" instead\&. See \fBunac(3)\fR and \fBunaccent(1)\fR for details\&.
.sp
\fBThis option is experimental and only available if pdfgrep is compiled with unac support\&.\fR
.RE
.SH "EXIT STATUS"
.sp
Normally, the exit status is 0 if at least one match is found, 1 if no match is found and 2 if an error occurred\&. But if the \fB\-\-quiet\fR or \fB\-q\fR option is used and a match was found, \fBpdfgrep\fR will return 0 regardless of errors\&.
.SH "ENVIRONMENT VARIABLES"
.sp
The behavior of \fBpdfgrep\fR is affected by the following environment variable\&.
.PP
\fBGREP_COLORS\fR
.RS 4
Specifies the colors and other attributes used to highlight various parts of the output\&. The syntax and values are like \fBGREP_COLORS\fR of \fBgrep\fR\&. See \fIgrep\fR(1) for more details\&. Currently only the capabilities \fBmt\fR, \fBms\fR, \fBmc\fR, \fBfn\fR, \fBln\fR and \fBse\fR are used by \fBpdfgrep\fR, where \fBmt\fR, \fBms\fR and \fBmc\fR have the same effect\&.
.RE
.SH "FILES"
.PP
\fB${XDG_CACHE_HOME}/pdfgrep/\fR*
.RS 4
Cache files written and used when \fB\-\-cache\fR is enabled\&. At most 200 cache entries older than a day are retained\&.
.RE
.SH "EXAMPLES"
.PP
\fBPrint the first ten lines matching \fR\fB\fIpattern\fR\fR\fB and print their page number:\fR
.RS 4
.sp
.if n \{\
.RS 4
.\}
.nf
pdfgrep \-n \-\-max\-count 10 pattern foo\&.pdf
.fi
.if n \{\
.RE
.\}
.RE
.PP
\fBSearch all \&.pdf files whose names begin with \fR\fB\fIfoo\fR\fR\fB recursively in the current directory:\fR
.RS 4
.sp
.if n \{\
.RS 4
.\}
.nf
pdfgrep \-r \-\-include "foo*\&.pdf" pattern
.fi
.if n \{\
.RE
.\}
.RE
.PP
\fBSearch all PDFs in the current directory for \fR\fB\fIfoo\fR\fR\fB that also contain \fR\fB\fIbar\fR\fR\fB:\fR
.RS 4
.sp
.if n \{\
.RS 4
.\}
.nf
pdfgrep \-Z \-\-files\-with\-matches "bar" *\&.pdf | xargs \-0 pdfgrep \-H foo
.fi
.if n \{\
.RE
.\}
.RE
.PP
\fBSearch all \&.pdf files that are smaller than 12M recursively in the current directory:\fR
.RS 4
.sp
.if n \{\
.RS 4
.\}
.nf
find \&. \-name "*\&.pdf" \-size \-12M \-print0 | xargs \-0 pdfgrep pattern
.fi
.if n \{\
.RE
.\}
.sp
Note that in contrast to the previous examples, this task could not be solved with pdfgrep alone, but the Unix tools \fBfind(1)\fR and \fBxargs(1)\fR had to be used\&. That\(cqs because pdfgrep itself doesn\(cqt include options to exclude files by their size\&. But as you see, it doesn\(cqt have to!
.RE
.SH "BUGS"
.SS "Reporting Bugs"
.sp
Bugs can be reported either to the mailing list (pdfgrep\-users@pdfgrep\&.org) or to the bug tracker on GitLab (https://gitlab\&.com/pdfgrep/pdfgrep/issues)\&.
.SH "AUTHORS"
.sp
\fBpdfgrep\fR is maintained by Hans\-Peter Deifel\&.
.sp
See the \fIAUTHORS\fR file in the source for a full list of contributors\&.
.SH "SEE ALSO"
.sp
grep(1), pcre(3), regex(7)
.sp
See pdfgrep\(cqs website https://pdfgrep\&.org for more information, downloads, git repository and more\&.