NAME¶
gscan2pdf - A GUI to produce PDFs or DjVus from scanned documents
USAGE¶
- 1. Scan one or several pages in with File/Scan
- 2. Create PDF of selected pages with File/Save
REQUIRED ARGUMENTS¶
None
OPTIONS¶
gscan2pdf has the following command-line options:
- --device=<device> Specifies the device to use, instead of getting
the list of devices via the SANE API. This can be useful if the scanner
is on a remote computer which is not broadcasting its existence.
- --help Displays this help page and exits.
- --log=<log file> Specifies a file to store logging messages.
- --(debug|info|warn|error|fatal) Defines the log level. If a log file is
specified, this defaults to 'debug', otherwise 'warn'.
- --version Displays the program version and exits.
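The default log level rule above can be sketched as follows. This is only an illustration of the documented behaviour, not gscan2pdf's own option parsing; the function name effective_log_level is hypothetical.

```python
# Hypothetical sketch of the documented default: 'debug' when a log file
# is given, otherwise 'warn', unless a level option is passed explicitly.
LEVELS = ("debug", "info", "warn", "error", "fatal")


def effective_log_level(args):
    """Return (level, log_file) for a list of command-line arguments."""
    log_file = None
    level = None
    for arg in args:
        if arg.startswith("--log="):
            log_file = arg[len("--log="):]
        elif arg.startswith("--") and arg[2:] in LEVELS:
            level = arg[2:]
    if level is None:
        # No explicit level: the default depends on whether a log file is set.
        level = "debug" if log_file else "warn"
    return level, log_file
```

For example, effective_log_level(["--log=file.log"]) gives the level 'debug', while an empty argument list gives 'warn'.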
Scanning is handled with SANE via scanimage. PDF conversion is done by
PDF::API2. TIFF export is handled by libtiff (faster and smaller memory
footprint for multipage files).
DIAGNOSTICS¶
To diagnose a possible error, start gscan2pdf from the command line with logging
enabled:
"gscan2pdf --log=file.log"
and check file.log.
EXIT STATUS¶
None
CONFIGURATION¶
gscan2pdf creates a text resource file called .gscan2pdf in the user's home
directory. Generally, however, preferences should be changed via the
Edit/Preferences menu, or are captured automatically during normal usage of
the program.
INCOMPATIBILITIES¶
None known.
BUGS AND LIMITATIONS¶
Whilst it is possible to import PDFs, this is intended only for round-tripping
files created by gscan2pdf. Hence, only the images are imported, and all text
is ignored.
Download¶
gscan2pdf is available on Sourceforge
(<http://sourceforge.net/projects/gscan2pdf/files/gscan2pdf/>).
Debian-based¶
If you are using Debian, you should find that sid has the latest version already
packaged.
If you are using an Ubuntu-based system, you can automatically keep up to date
with the latest version via the ppa:
"sudo apt-add-repository ppa:jeffreyratcliffe/ppa"
If you are using Synaptic, then use the menu Edit/Reload Package Information,
search for gscan2pdf in the package list, and lo and behold,
you can install the nice shiny new version.
From the command line:
"sudo apt-get update"
"sudo apt-get install gscan2pdf"
RPMs¶
Download the rpm from Sourceforge, and then install it with
"rpm -i gscan2pdf-version.rpm"
From source¶
The source is hosted in the files section of the gscan2pdf project on
Sourceforge (<http://sourceforge.net/projects/gscan2pdf/files/>).
From the repository¶
gscan2pdf uses Git for its Revision Control System. You can browse the tree at
<http://sourceforge.net/p/gscan2pdf/code/>.
Git users can clone the complete tree with
"git clone git://git.code.sf.net/p/gscan2pdf/code"
Building gscan2pdf from source¶
Having downloaded the source either from a Sourceforge file release, or from the
Git repository, unpack it if necessary with
"tar xvfz gscan2pdf-x.x.x.tar.gz"
"cd gscan2pdf-x.x.x"
"perl Makefile.PL" will create the Makefile. There is a "make test", but this
is not machine-dependent, and therefore really just for my benefit to make
sure I haven't broken the device-dependent options parsing routine.
You can install directly from the source with "make install", but
building the appropriate package for your distribution should be as
straightforward as "make debdist" or "make rpmdist".
However, you will additionally need the rpm, devscripts, fakeroot, debhelper
and gettext packages.
Dependencies¶
The list below looks daunting, but all packages are available from any
reasonably up-to-date distribution. If you are using Synaptic, having
installed gscan2pdf, locate the gscan2pdf entry in Synaptic, right-click it
and you can install them under Recommends. Note also that the library
names given below are the Debian/Ubuntu ones. Distributions using RPM
typically use perl(module) where Debian has libmodule-perl.
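As a rough illustration of the naming convention just mentioned, the sketch below maps a Perl module name to the usual Debian package name and to the RPM capability string. The helper names are hypothetical, and individual distributions may deviate from the pattern for particular packages.

```python
# Hypothetical helpers illustrating the libmodule-perl / perl(module)
# naming convention; real package names can differ in special cases.

def debian_package(module):
    """Map e.g. 'PDF::API2' to 'libpdf-api2-perl'."""
    return "lib" + module.lower().replace("::", "-") + "-perl"


def rpm_capability(module):
    """Map e.g. 'PDF::API2' to 'perl(PDF::API2)'."""
    return "perl(%s)" % module
```

With these, the module Set::IntSpan corresponds to libset-intspan-perl on Debian and to perl(Set::IntSpan) on RPM-based distributions.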
- Required
- libgtk2.0-0 (>= 2.4)
- The GTK+ graphical user interface library.
- libglib-perl (>= 1.100-1)
- Perl interface to the GLib and GObject libraries
- libgtk2-perl (>= 1:1.043-1)
- Perl interface to the 2.x series of the Gimp Toolkit library
- libgtk2-imageview-perl
- Perl bindings to the gtkimageview widget. See
<http://trac.bjourne.webfactional.com/>
- libgtk2-ex-simple-list-perl
- A simple interface to Gtk2's complex MVC list widget
- liblocale-gettext-perl (>= 1.05)
- Using libc functions for internationalisation in Perl
- libpdf-api2-perl
- provides the functions for creating PDF documents in Perl
- libsane
- API library for scanners
- libsane-perl
- Perl bindings for libsane.
- libset-intspan-perl
- manages sets of integers
- libtiff-tools
- TIFF manipulation and conversion tools
- imagemagick
- Image manipulation programs
- perlmagick
- A perl interface to the libMagick graphics routines
- sane-utils
- API library for scanners -- utilities.
- Optional
Support¶
There are two mailing lists for gscan2pdf:
- gscan2pdf-announce
- A low-traffic list for announcements, mostly of new releases. You can
subscribe at
<http://lists.sourceforge.net/lists/listinfo/gscan2pdf-announce>
- gscan2pdf-help
- General support, questions, etc. You can subscribe at
<http://lists.sourceforge.net/lists/listinfo/gscan2pdf-help>
Reporting bugs¶
Before reporting bugs, please read the "FAQs" section.
Please report any bugs found, preferably against the Debian package[1][2]. You
do not need to be a Debian user, or set up an account to do this.
- 1. http://packages.debian.org/sid/gscan2pdf
- 2. http://www.debian.org/Bugs/
Alternatively, there is a bug tracker for the gscan2pdf project on Sourceforge
(<http://sourceforge.net/p/gscan2pdf/_list/tickets?source=navbar>).
Please include the log file created by "gscan2pdf --log=log" with any
new bug report.
Translations¶
gscan2pdf has already been partly translated into several languages. If you
would like to contribute to an existing or new translation, please check out
Rosetta: <https://translations.launchpad.net/gscan2pdf>
Note that the translations for the scanner options are taken directly from
sane-backends. If you would like to contribute to these, you can contact the
sane-devel mailing list (sane-devel@lists.alioth.debian.org)
and have a look at the po/ directory in the source code
<http://www.sane-project.org/cvs.html>.
Alternatively, Ubuntu has its own translation project. For the 9.04 release, the
translations are available at
<https://translations.launchpad.net/ubuntu/jaunty/+source/sane-backends/+pots/sane-backends>
DESCRIPTION¶
File¶
New
Clears the page list.
Open
Opens any format that imagemagick supports. PDFs will have their embedded images
extracted and imported one per page.
Scan
Sets options before scanning via SANE.
Device
Chooses between available scanners.
# Pages
Selects the number of pages, or all pages to scan.
Source document
Selects between single-sided and double-sided pages.
This affects the page numbering. Single-sided scans are numbered consecutively.
Double-sided scans are incremented (or decremented, see below) by 2, i.e. 1,
3, 5, etc.
Side to scan
If double sided is selected above, assuming a non-duplex scanner, i.e. a scanner
that cannot automatically scan both sides of a page, this determines whether
the page number is incremented or decremented by 2.
To scan both sides of three pages, i.e. 6 sides:
- 1. Select:
- # Pages = 3 (or "all" if your scanner can detect when it is out
of paper)
- Double sided
- Facing side
- 2. Scan sides 1, 3 & 5.
- 3. Put the pile back in the scanner, ready to scan the back of the last page.
- 4. Select:
- # Pages = 3 (or "all" if your scanner can detect when it is out
of paper)
- Double sided
- Reverse side
- 5. Scan sides 6, 4 & 2.
- 6. gscan2pdf automatically sorts the pages so that they appear in the
correct order.
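The numbering in the procedure above can be sketched in a few lines. This is an illustration of the described behaviour only, not code from gscan2pdf; the function name page_numbers and the side labels "facing" and "reverse" are assumptions.

```python
# Hypothetical sketch of double-sided page numbering on a non-duplex scanner.

def page_numbers(count, side):
    """Return the page numbers assigned to `count` scanned sheets."""
    if side == "facing":
        # Facing sides count upwards in steps of 2: 1, 3, 5, ...
        return [1 + 2 * i for i in range(count)]
    if side == "reverse":
        # Reverse sides are scanned last sheet first: 2n, 2n-2, ..., 2
        return [2 * count - 2 * i for i in range(count)]
    raise ValueError("side must be 'facing' or 'reverse'")


facing = page_numbers(3, "facing")
reverse = page_numbers(3, "reverse")
# Sorting by page number restores the reading order of all six sides.
ordered = sorted(facing + reverse)
```

For three sheets, the facing pass yields 1, 3, 5, the reverse pass yields 6, 4, 2, and sorting gives 1 to 6.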
Device-dependent options
These, naturally, depend on your scanner. They can include
- Page size.
- Mode (colour/black & white/greyscale)
- Resolution (in PPI)
- Batch-scan
- Guarantees that a "no documents" condition will be returned
after the last scanned page, to prevent endless flatbed scans after a
batch scan.
- Wait-for-button/Button-wait
- After sending the scan command, wait until the button on the scanner is
pressed before actually starting the scan process.
- Source
- Selects the document source. Possible options can include Flatbed or ADF.
On some scanners, this is the only way of generating an out-of-documents
signal.
Save
Saves the selected or all pages as a PDF, DjVu, TIFF, PNG, JPEG, PNM or GIF.
PDF Metadata
Metadata is information that is not visible when viewing the PDF, but is
embedded in the file, and so is searchable and can be examined, typically with
the "Properties" option of the PDF viewer.
The metadata are completely optional, but can also be used to generate the
filename; see Preferences for details.
DjVu
DjVu compresses both black and white and colour images better than PDF. See
<http://www.djvuzone.org/> for more details.
Email as PDF
Attaches the selected or all pages as a PDF to a blank email. This requires
xdg-email, which is in the xdg-utils package. If this is not present, the
option is ghosted out.
Print
Prints the selected or all pages.
Compress temporary files
If your temporary ($TMPDIR) directory is getting full, this function can be
useful: it compresses all images as LZW-compressed TIFFs. These require much
less space than the PNM files that are typically produced by SANE or by
importing a PDF.
Edit¶
Delete
Deletes the selected page.
Renumber
Renumbers the pages from 1..n.
Note that the page order can also be changed by drag and drop in the thumbnail
view.
Select
The select menus can be used to select all, even, odd, blank, dark or modified
pages. Selecting blank or dark pages runs imagemagick to make the decision.
Selecting modified pages selects those which have been modified by threshold,
unsharp, etc., since the last OCR run was made.
Preferences
The preferences menu item allows the control of the default behaviour of various
functions. Most of these are self-explanatory.
Frontend
gscan2pdf supports two frontends, scanimage and scanadf. scanadf support was
added when it was realised that scanadf works better than scanimage with some
scanners. On Debian-based systems, scanadf is in the sane package, not, like
scanimage, in sane-utils. If scanadf is not present, the option is obviously
ghosted out.
In 0.9.27, Perl bindings for SANE were introduced and two further frontends,
scanimage-perl and scanadf-perl (scanimage and scanadf transliterated from C
into Perl) were added.
Before 1.2.0, options available through CLI frontends like scanimage were made
visible as users asked for them. In 1.2.0, all options can be shown or hidden
via Edit/Preferences, along with the ability to specify which options trigger
a reload.
Default filename for PDF files
The following variables are available, which are replaced by the corresponding
metadata:
%a author
%t title
%y document's year
%Y today's year
%m document's month
%M today's month
%d document's day
%D today's day
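A minimal sketch of how such a template could be expanded is shown below. It is not gscan2pdf's implementation; the function name expand_filename, the zero-padded two-digit months and days, and the four-digit years are all assumptions made for the example.

```python
import datetime

# Hypothetical sketch of expanding the filename variables listed above.


def expand_filename(template, author, title, doc_date, today=None):
    """Replace %a, %t, %y, %Y, %m, %M, %d and %D in `template`."""
    today = today or datetime.date.today()
    values = {
        "a": author,
        "t": title,
        "y": "%04d" % doc_date.year,   # document's year
        "Y": "%04d" % today.year,      # today's year
        "m": "%02d" % doc_date.month,  # document's month
        "M": "%02d" % today.month,     # today's month
        "d": "%02d" % doc_date.day,    # document's day
        "D": "%02d" % today.day,       # today's day
    }
    out = []
    i = 0
    while i < len(template):
        # A known variable is a '%' followed by one of the keys above;
        # anything else is copied through unchanged.
        if template[i] == "%" and i + 1 < len(template) and template[i + 1] in values:
            out.append(values[template[i + 1]])
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)
```

For instance, the template "%a %t %y-%m-%d" with author Jane, title Invoice and a document date of 7 March 2014 expands to "Jane Invoice 2014-03-07" under these assumptions.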
View¶
Zoom 100%
Zooms to 1:1. How this appears depends on the desktop resolution.
Zoom to fit
Scales the view such that all the page is visible.
Zoom in
Zoom out
Rotate 90 clockwise
The rotate options require the package imagemagick and, if this is not present,
are ghosted out.
Rotate 180
Rotate 90 anticlockwise
Threshold
Changes all pixels darker than the given value to black; all others become
white.
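The effect of the threshold operation can be illustrated on a small greyscale image. The sketch assumes 8-bit grey levels where 0 is black and 255 is white, and that the threshold is given on the same scale; gscan2pdf itself may express the value differently, and the function name is hypothetical.

```python
# Hypothetical sketch of thresholding a greyscale image held as rows of
# integer grey levels (0 = black, 255 = white).

def threshold(pixels, value, white=255):
    """Pixels darker than `value` become black; all others become white."""
    return [[0 if p < value else white for p in row] for row in pixels]
```

With a threshold of 128, a pixel of grey level 127 becomes black and a pixel of grey level 128 becomes white.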
Unsharp mask
The unsharp option sharpens an image. The image is convolved with a Gaussian
operator of the given radius and standard deviation (sigma). For reasonable
results, radius should be larger than sigma. Use a radius of 0 to have the
method select a suitable radius.
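To make the roles of radius and sigma more concrete, here is a one-dimensional sketch of unsharp masking: the signal is blurred with a normalised Gaussian kernel and the difference between the original and the blurred copy is added back. This is a simplified illustration, not the imagemagick implementation; the amount parameter and the function names are assumptions, and the automatic radius selection for a radius of 0 is not modelled.

```python
import math

# Hypothetical 1-D sketch of unsharp masking with a Gaussian kernel.


def gaussian_kernel(radius, sigma):
    """Return 2*radius+1 Gaussian weights that sum to 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def unsharp_1d(samples, radius, sigma, amount=1.0):
    """Sharpen `samples` by adding back the detail removed by blurring."""
    kernel = gaussian_kernel(radius, sigma)
    n = len(samples)
    out = []
    for i in range(n):
        blurred = 0.0
        for k, w in enumerate(kernel):
            # Clamp indices so that edge samples are repeated at the borders.
            j = min(max(i + k - radius, 0), n - 1)
            blurred += w * samples[j]
        out.append(samples[i] + amount * (samples[i] - blurred))
    return out
```

A flat signal is left unchanged, while a step edge gains an undershoot on the dark side and an overshoot on the bright side, which is what makes the edge look sharper.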
Crop
unpaper
unpaper (see <http://unpaper.berlios.de/>) is a utility for cleaning up a
scan.
OCR (Optical Character Recognition)
The gocr, tesseract, ocropus or cuneiform utilities are used to produce text
from an image.
Each page has an OCR output buffer, which is embedded as plain text behind
the scanned image in the PDF produced. This way, Beagle can index (i.e.
search) the plain text.
In DjVu files, the OCR output buffer is embedded in the hidden text layer. Thus
these can also be indexed by Beagle.
There is an interesting review of OCR software at
<http://web.archive.org/web/20080529012847/http://groundstate.ca/ocr>.
An important conclusion was that 400ppi is necessary for decent results.
Up to v2.04, the only way to tell which languages were available to tesseract
was to look for the language files. Therefore, gscan2pdf checks the path
returned by:
tesseract '' '' -l ''
If there are no language files in the above location, then gscan2pdf assumes
that tesseract v1.0 is installed, which had no language files.
Variables for user-defined tools
The following variables are available:
%i input filename
%o output filename
%r resolution
An image can be modified in-place by just specifying %i.
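The substitution of these variables can be sketched as follows. This is an assumption about how such a command template might be expanded, not gscan2pdf's own code; the function name build_command and the example tool commands are purely illustrative.

```python
import re
import shlex

# Hypothetical sketch of expanding %i, %o and %r in a user-defined tool
# command before running it.


def build_command(template, input_file, output_file, resolution):
    """Split `template` into arguments and substitute the variables."""
    values = {"%i": input_file, "%o": output_file, "%r": str(resolution)}
    args = []
    for token in shlex.split(template):
        # Substitute in a single pass so that a filename containing
        # '%o' or '%r' is not expanded a second time.
        args.append(re.sub(r"%[ior]", lambda m: values[m.group(0)], token))
    return args
```

For example, build_command("convert %i -negate %o", "in.pnm", "out.pnm", 300) returns the argument list convert, in.pnm, -negate, out.pnm. A template that mentions only %i, such as "mogrify -density %r %i", corresponds to the in-place case described above.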
FAQs¶
Why isn't option xyz available in the scan window?¶
Possibly because SANE or your scanner doesn't support it.
If an option listed in the output of "scanimage --help" that you would
like to use isn't available, send me the output and I will look at
implementing it.
I've only got an old flatbed scanner with no automatic sheetfeeder. How do I scan a multipage document?¶
If you are lucky, you have an option like Wait-for-button or Button-wait, where
the scanner will wait for you to press the scan button on the device before it
starts the scan, allowing you to scan multiple pages without touching the
computer.
Otherwise, you have to set the number of pages to scan to 1 and hit the scan
button on the scan window for each page.
Why is option xyz ghosted out?¶
Probably because the package required for that option is not installed. Email as
PDF requires xdg-email (xdg-utils); unpaper and the rotate options require
imagemagick.
Why can I not scan from the flatbed of my HP scanner?¶
Generally for HP scanners with an ADF, to scan from the flatbed, you should set
"# Pages" to "1", and possibly "Batch scan" to
"No".
When I update gscan2pdf using the Update Manager in Ubuntu, why is the list of changes never displayed?¶
As far as I can tell, this is pulled from changelogs.ubuntu.com, and therefore
only the changelogs from official Ubuntu builds are displayed.
Why can gscan2pdf not find my scanner?¶
If your scanner is not connected directly to the machine on which you are
running gscan2pdf and you have not installed the SANE daemon, saned, gscan2pdf
cannot automatically find it. In this case, you can specify the scanner device
on the command line:
"gscan2pdf --device <device>"
How can I search for text in the OCR layer of the finished PDF or DJVU file?¶
pdftotext or djvutxt can extract the text layer from PDF or DJVU files. See the
respective man pages for details.
Having opened a PDF or DJVU file in evince or Acrobat Reader, the search
function will typically find the page with the requested text and highlight
it.
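As a small sketch of the first approach, the code below runs pdftotext and reports which pages contain a search term. It relies on pdftotext writing to standard output when the output file is given as "-" and on its separating pages with form feed characters; the function names are hypothetical, and pdftotext must be installed for search_pdf to work.

```python
import subprocess

# Hypothetical sketch of searching the text layer extracted by pdftotext.


def pages_containing(text, needle):
    """Return 1-based page numbers whose text contains `needle`."""
    pages = text.split("\f")  # pdftotext ends each page with a form feed
    return [n for n, page in enumerate(pages, start=1)
            if needle.lower() in page.lower()]


def search_pdf(path, needle):
    """Extract the text layer of `path` with pdftotext and search it."""
    result = subprocess.run(["pdftotext", path, "-"],
                            check=True, capture_output=True, text=True)
    return pages_containing(result.stdout, needle)
```

The search is case-insensitive and works on plain substrings, which is usually sufficient for OCR output.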
There are various tools for searching or indexing files, including PDF and DJVU:
- (meta) Tracker (<https://projects.gnome.org/tracker/>)
- plone (<http://plone.org/>)
- pdfgrep (<http://pdfgrep.sourceforge.net/>)
- swish-e (<http://www.swish-e.org/>)
- recoll (<http://www.lesbonscomptes.com/recoll/>)
- terrier (<http://terrier.org/>)
See Also¶
Xsane
http://scantailor.sourceforge.net/
Author¶
Jeffrey Ratcliffe (ra28145 at users dot sf dot net)
Thanks to¶
- all the people who have sent patches, translations, bugs and
feedback.
- the GTK2 project for a most excellent graphics toolkit.
- the Gtk2-Perl project for their superb Perl bindings for GTK2.
- the SANE project for scanner access.
- Bjoern Lindqvist for the gtkimageview widget.
- Sourceforge for hosting the project.
LICENSE AND COPYRIGHT¶
Copyright (C) 2006-2014 Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>
This program is free software: you can redistribute it and/or modify it under
the terms of the version 3 GNU General Public License as published by the Free
Software Foundation.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.