NAME

CAM::PDF::PageText - Extract text from PDF page tree

SYNOPSIS

   my $pdf = CAM::PDF->new($filename);
   my $pageone_tree = $pdf->getPageContentTree(1);
   print CAM::PDF::PageText->render($pageone_tree);

DESCRIPTION

This module attempts to extract sequential text from a PDF page. This is not a robust process, because PDF text is laid out graphically and may appear in arbitrary order. The module uses a few heuristics to guess which pieces of text belong next to each other, but it is easily fooled by, for example, subscripts, non-horizontal text, font changes, and form fields.

All those disclaimers aside, it is useful for a quick dump of text from a simple PDF file.

LICENSE

Same as CAM::PDF

FUNCTIONS

$pkg->render($pagetree)

$pkg->render($pagetree, $verbose)

Turns a page content tree into a string. This is a class method and should be called as follows:

   CAM::PDF::PageText->render($pagetree);
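To dump the text of a whole document rather than a single page, the call above can be wrapped in a loop over the pages. The following is a minimal sketch, not part of the module: pdf_page_texts is a hypothetical helper name, and the sketch assumes that CAM::PDF->new returns undef on failure with the reason in $CAM::PDF::errstr, and that a page without usable content may yield no content tree.

```perl
use strict;
use warnings;
use CAM::PDF;
use CAM::PDF::PageText;

# Hypothetical helper (not provided by CAM::PDF): return the extracted
# text of every page in $filename as a list, one string per page.
sub pdf_page_texts {
    my ($filename) = @_;

    # Assumption: new() returns undef on failure and records the reason
    # in $CAM::PDF::errstr.
    my $pdf = CAM::PDF->new($filename)
        or die "Cannot open '$filename': $CAM::PDF::errstr\n";

    my @texts;
    for my $pagenum (1 .. $pdf->numPages()) {
        my $tree = $pdf->getPageContentTree($pagenum);

        # Assumption: a page with no usable content may give no tree;
        # record an empty string for it so page indices stay aligned.
        push @texts, defined $tree ? CAM::PDF::PageText->render($tree) : '';
    }
    return @texts;
}
```

A caller could then print the pages separated by form feeds, for example with print join("\f", pdf_page_texts($filename)). The same caveats about the extraction heuristics apply to every page.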

AUTHOR

See CAM::PDF

2022-06-09

perl v5.34.0

Source file:	CAM::PDF::PageText.3pm.en.gz (from libcam-pdf-perl 1.60-5)