NAME
Linklint - fast link checker and website maintenance tool
SYNOPSIS
linklint [-cache directory] [-case] [-checksum] [-concise_url] [-db1..9]
[-delay d] [-doc linkdoc] [-docbase base] [-dont_output xxxx] [-error]
[-flush] [-forward] [-help] [-help_all] [-host hostname[:port]] [-htmlonly]
[-http] [-http_header name:value] [-ignore ignoreset] [-index file]
[-language zz] [-limit n] [-list] [-local linkset] [-map /a=[/b]] [-net]
[-netmod] [-netset] [-no_anchors] [-no_query_string] [-no_warn_index]
[-orphan] [-out file] [-output_frames] [-output_index filename]
[-password realm user:password] [-proxy hostname[:port]] [-quiet]
[-redirect] [-retry] [-root dir] [-silent] [-skip skipset] [-textonly]
[-timeout t] [-url_doc_prefix url/] [-version] [-warn] [-xref] linkset
VERSION
2.3.5 August 13, 2001
DESCRIPTION
This manual page briefly documents the Linklint program, which is an Open Source
Perl program that checks local and remote HTML links.
This manual page was written for the Debian distribution because the original
program does not have a manual page. Instead, it has documentation in the HTML
format; see below.
OPTIONS
Input File Selection
Whether you are doing a local site check or an HTTP site check, you specify
which directories (presumably containing HTML files) to check with one or more
linksets. A linkset uses two wildcard characters, @ and #, to specify one or
more directories, much as the standard * and ? wildcard characters are used to
specify the names of files in one directory.
The @ character matches any string of characters (loosely like "*"), and the
# character (loosely like "?") matches any string of characters except "/".
The best way to understand how @ and # work is to look at a few examples:
the entire site                         /@
the homepage only (default)             /
files in the root directory only        /#
. . . and one directory down            /#/#
files in the sub directory only         /sub/#
files in the sub directory and below    /sub/@
specific files                          /file1 /file2 ...
specific subdirectories                 /sub1/@ /sub2/@ ...
If you specify more than one linkset, files matching any of the linksets will be
checked. HTML files that don't match any of the linksets will be skipped.
Linklint will see if they exist but won't check any of their links.
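The wildcard rules above can be illustrated with a short Python sketch. It is not Linklint's own code (Linklint is written in Perl); the function names linkset_to_regex and matches_any are made up for this example, and the sketch assumes a linkset must match the whole path.

```python
import re

def linkset_to_regex(linkset):
    """Translate a linkset into a regular expression.

    '@' matches any string of characters; '#' matches any string that
    contains no '/'. Every other character is matched literally.
    """
    pieces = []
    for ch in linkset:
        if ch == "@":
            pieces.append(".*")
        elif ch == "#":
            pieces.append("[^/]*")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))


def matches_any(path, linksets):
    """Return True if path matches at least one of the linksets in full."""
    return any(linkset_to_regex(ls).fullmatch(path) for ls in linksets)


print(matches_any("/sub/page.html", ["/sub/#"]))       # True
print(matches_any("/sub/deep/page.html", ["/sub/#"]))  # False
print(matches_any("/sub/deep/page.html", ["/sub/@"]))  # True
```

The second call fails because # cannot cross a "/", which is exactly the difference between /sub/# and /sub/@ in the table of examples.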
Other File Selection Options
- -skip skipset
- Skips HTML files that match skipset.
"Linklint" will make sure these files exist but won't add any of
their links to the list of files to check. Multiple skipsets are
allowed, but each must be preceded with -skip on the command line.
Skipsets use the same wildcard characters as linksets.
- -ignore ignoreset
- Ignores files matching ignoreset.
"Linklint" doesn't even check to see if these files exist.
Multiple ignoresets are allowed, but each must be preceded with
-ignore on the command line. Ignoresets use the same wildcard
characters as linksets.
- -limit n
- Limits checking to n HTML files (default 500). All
HTML files after the first n are skipped.
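How linksets, skipsets, ignoresets and the limit interact can be sketched as follows. This is only a model of the behaviour described above, not Linklint's implementation: the names to_regex, in_any and plan are hypothetical, and the sketch assumes that ignoresets take precedence over skipsets and that files are handled in the order given.

```python
import re

def to_regex(wildcard_set):
    # '@' -> any string, '#' -> any string without '/', the rest literal.
    return re.compile("".join(
        ".*" if c == "@" else "[^/]*" if c == "#" else re.escape(c)
        for c in wildcard_set))

def in_any(path, sets):
    return any(to_regex(s).fullmatch(path) for s in sets)

def plan(html_paths, linksets, skipsets=(), ignoresets=(), limit=500):
    """Label each HTML path 'ignore', 'skip' or 'check'."""
    labels = {}
    checked = 0
    for path in html_paths:
        if in_any(path, ignoresets):
            labels[path] = "ignore"   # existence is not even tested
        elif in_any(path, skipsets) or not in_any(path, linksets):
            labels[path] = "skip"     # existence tested, links not followed
        elif checked >= limit:
            labels[path] = "skip"     # beyond the first `limit` HTML files
        else:
            labels[path] = "check"
            checked += 1
    return labels
```

For instance, plan(paths, ["/@"], skipsets=["/tmp/@"], ignoresets=["/old/@"]) models a command line such as linklint -skip /tmp/@ -ignore /old/@ /@.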
Local Site Checking
If you are developing HTML pages on a computer that does not have an http
server, or if you are developing a simple site that does not use Server
Redirection or extensive CGI, you should use local site checking.
linklint /@
Checks all HTML files in the current directory and below. Assumes that the
current directory is the server root directory so links starting with
"/" default to this directory. You must specify /@ to check
the entire site. See Which Files to Check for details.
linklint -root dir /@
Checks all HTML files in dir and below. This is useful if you want to check
several sites on the same machine or if you don't want to run Linklint in your
public HTML directory.
Other Local Site Options
- -host hostname
- By default "Linklint" assumes all links on your
site that start with "http://" are remote links to other sites.
If you have absolute links to your own site, give "Linklint"
your hostname and links starting with "http://hostname" will be
treated as local files. If you specify -host hostname:port, only
http links to this hostname and port will be treated as local files.
- -case
- Makes sure that the filename case (upper/lower) used in links
inside HTML tags matches the case used by the file system. This is for
Windows only and is very handy if you are porting a site to a Unix
host.
- -orphan
- Checks all directories that contain files used on the site
for unused (orphan) files.
- -index file
- Uses file as the default index file instead of the
default list used by "Linklint". You can specify more than one
file but each one must be preceded by -index on the command line.
If a default index file is not found, "Linklint" uses a listing
of the entire directory. See the Default File section for details.
- -map /a=[/b]
- Substitutes leading /a with /b. For
server-side image maps or to simulate Server Redirection.
- -no_warn_index
- Turns off the "index file not found" warning.
Applies to local site checking only.
- -no_anchors
- Tells "Linklint" to ignore named anchors. This
could ease memory problems for people with large sites who are primarily
interested in missing pages and not missing named anchors. This option
works for both HTTP and local site checks.
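The effect of -host and -map on individual links can be pictured with the Python sketch below. It is an illustration only, not Linklint's code; local_path and apply_map are hypothetical names. Two assumptions are built in: host names compare case-insensitively, and a link without an explicit port is taken to mean port 80 when -host names a port.

```python
from urllib.parse import urlsplit

def local_path(link, host_option):
    """Return the local path for an http link to our own host, else None.

    host_option is the value given to -host: 'hostname' or 'hostname:port'.
    """
    parts = urlsplit(link)
    if parts.scheme != "http" or parts.hostname is None:
        return None
    want_host, _, want_port = host_option.partition(":")
    if parts.hostname != want_host.lower():
        return None
    # With hostname:port, only links to that exact port count as local.
    # Assumption: a link without an explicit port means port 80.
    if want_port and (parts.port or 80) != int(want_port):
        return None
    return parts.path or "/"


def apply_map(path, map_option):
    """Apply one -map /a=[/b] rule: replace a leading /a with /b."""
    old, _, new = map_option.partition("=")
    if path.startswith(old):
        return new + path[len(old):]
    return path
```

With these definitions, local_path("http://www.site.com/a/b.html", "www.site.com") yields "/a/b.html", and apply_map("/a/page.html", "/a=") drops the leading /a because the optional /b part was left empty.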
HTTP Site Checking
If you have a complicated site that uses lots of CGI or Server Redirection, you
should use HTTP site checking. Even though an HTTP site check reads pages via
your HTTP server, you will get the best performance if you do your checking on
a machine that has a high speed connection to your server.
linklint -http -host www.site.com /@
The -http flag tells Linklint to check HTML files on the site
www.site.com via a remote HTTP connection. You must specify a -host whenever
you do an HTTP site check (otherwise Linklint won't know where to get your
pages). You can specify /@ to check the entire site. See Which Files to Check
for details.
HTTP Site Check Options
- -http
- This flag tells Linklint to perform an HTTP site check
instead of a local site check. All files (except server side image maps)
will be read via the HTTP protocol from your web server.
- -host hostname:port
- If you include :port at the end of your hostname,
Linklint uses this port for the HTTP site check.
- -password realm user:password
- Uses user and password as authorization to
enter password protected realm. Realms are named areas of a site
that share a common set of usernames and passwords. If passwords are
needed to check your site, Linklint will tell you which realms need
passwords in warning messages. Enclose the realm in double quotes if it
contains spaces. If no password is given for a specific realm, Linklint
will try using the password for the "DEFAULT" realm
if it was provided.
- -timeout t
- Times out after t seconds (default 15) when getting
files via http. Once data is received, an additional t seconds is
allowed. The timeout is disabled on Windows machines since the Windows
port of Perl does not support the "alarm()" function.
- -delay d
- Delays d seconds between requests to the same host
(default 0). This is a friendly thing to do especially if you are checking
many links on the same host.
- -local linkset
- Gets files that match linkset locally. The default
-local linkset is @.map (which matches any link
ending in .map). This allows Linklint to follow links through
server-side image maps. The default is ignored if you specify your own
-local expressions. You need to specify the -root directory
for this option to work properly.
- -map /a=[/b]
- Substitutes leading /a with /b. For
server-side image maps or to simulate Server Redirection.
- -no_anchors
- Tells "Linklint" to ignore named anchors.
- -no_query_string
- Up until version 2.3.4, Linklint did not use query strings
while doing HTTP site checks. Query strings were removed before making
HTTP requests. As of 2.3.4 query strings in links are used in the
requests. Use the -no_query_string flag to get back the
"old" behavior.
- -http_header Name:value
- Adds the HTTP header Name: value to all HTTP
requests generated by Linklint. You will need to use quotation marks to
hide spaces in the header line from the command line interpreter. Linklint
will automatically add a space after the first colon if there is not one
there already. Multiple (unique) header lines are allowed.
- -language zz
- This option is only useful if you are checking a site that
uses content negotiation to present the same URL in different languages.
Creates an HTTP Request header of the form Accept-Language: zz that
is included as part of all HTTP requests generated by Linklint. Multiple
-language specifications are allowed. This will result in a single
Accept-Language: header that lists all of the languages you have
specified in alphabetical order. Some web sites can use this information
to return pages to you in a specific language.
If you need to get more complicated than this, use the more general purpose
-http_header to create your own header. There is a partial list of
language abbreviations (taken from Debian) included as part of the
Linklint documentation.
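The header handling described for -http_header and -language can be sketched in Python as below. This is a model of the documented behaviour rather than Linklint's code: normalize_header and accept_language are hypothetical names, and joining the languages with ", " (and dropping duplicates) is an assumption based on common HTTP practice, since the text only says the list is alphabetical.

```python
def normalize_header(header):
    """Add a space after the first colon if there is not one already."""
    name, colon, value = header.partition(":")
    if not colon:
        raise ValueError("expected Name:value")
    if not value.startswith(" "):
        value = " " + value
    return name + ":" + value


def accept_language(languages):
    """Combine several -language values into one header line.

    Assumption: codes are de-duplicated and joined with ', '.
    """
    return "Accept-Language: " + ", ".join(sorted(set(languages)))


print(normalize_header("X-Checker:linklint"))  # X-Checker: linklint
print(accept_language(["fr", "de", "en"]))     # Accept-Language: de, en, fr
```

Only the first colon is treated as the separator, so a value that itself contains colons (a URL, for example) is left intact.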
Remote URL Checking
A remote URL check is used to see if a remote URL exists (or has been recently
modified). Links in the remote pages are not checked nor does Linklint look
for named anchors in remote URLs.
Remote URL checking can be used to check all of the "remote" links on
your site (those that link to pages on other sites) or it can check a list of
URLs. There are several ways to specify which remote URLs to check:
linklint http://somehost/file.html
Checks to see if /file.html exists on somehost. Multiple URLs can be
entered on the command line, in an @commandfile, or in an @@httpfile.
Every URL to be checked must begin with "http://". This will disable
site checking.
linklint @@httpfile
Checks all the remote http URLs found in httpfile. Anything in the file starting
with "http://" is considered to be a URL. If the file looks like a
remoteX.txt file generated by Linklint then all failed URLs will be
cross referenced.
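The rule that anything starting with "http://" counts as a URL can be illustrated with a few lines of Python. This is a sketch, not Linklint's parser: urls_from_text is a hypothetical name, and the assumption that a URL ends at the next whitespace, quote or angle bracket is a guess made for the example.

```python
import re

# Assumption: a URL runs from "http://" up to the next whitespace,
# quote character or angle bracket.
URL_RE = re.compile(r"http://[^\s\"'<>]+")

def urls_from_text(text):
    """Return the distinct http:// URLs found in text, in first-seen order."""
    return list(dict.fromkeys(URL_RE.findall(text)))
```

Because the pattern requires the literal prefix "http://", entries using other schemes are passed over, in line with the requirement that every URL to be checked begins with "http://".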
linklint @@ -doc linkdoc
Assuming you have already done a site check and used -doc linkdoc to put
all of your output files in the linkdoc directory, Linklint will check all the
remote links that were found on your site and cross reference all failed URLs
without doing a site check. You can use the -netmod or -netset
flags to enable the status cache.
linklint -net [site check options]
The -net flag tells Linklint to check all remote links after doing either
a local or HTTP site check. If you are having memory problems, don't use
the -net option; instead use one of the @@ options above.
Other Remote URL Options
- -timeout t
- Times out after t seconds (default 15) when getting
files via http. Once data is received, an additional t seconds is
allowed. The timeout is disabled on Windows machines since the Windows
port of Perl does not support the "alarm()" function.
- -delay d
- Delays d seconds between requests to the same host
(default 0). This is a friendly thing to do especially if you are checking
many links on the same host.
- -redirect
- Checks for <meta> redirects in the headers of remote
URLs that are html files. If a redirect is found it is followed. This
feature is disabled if the status cache is used.
- -proxy hostname[:port]
- Sends all remote HTTP requests through the proxy server
hostname and the optional port. This allows you to check
remote URLs or (new with version 2.3.1) your entire site from within a
firewall that has an http proxy server. Some error messages (relating to
host errors) may not be available through a proxy server.
- -concise_url
- Turns off printing successful URLs to STDOUT during remote
link checking.
Status Cache Options
The Status Cache is a very powerful feature. It allows you to keep track of
recent changes in all of the remote (off-site) pages you link to. You can then
use the Linklint output files to quickly check changed pages to see if they
still meet your needs.
The flags below make use of the status cache file linklint.url (kept in your
HOME or LINKLINT directory). This file keeps track of the modification dates
of all the remote URLs that you check.
- -netmod
- Operates just like -net but makes use of the status
cache. Newly checked URLs will be entered in the cache. Linklint will tell
you which (previously cached) URLs have been modified since the last
-netset.
- -netset
- Like -netmod but also resets the last modified
status in the cache for all URLs that checked ok. If you always use
-netset, modified URLs will be reported just once.
- -retry
- Only checks URLs that have a host fail status in the cache.
Sometimes a URL fails because its host is temporarily down. This flag
enables you to recheck just those links. An easy way to recheck all the
cached URLs with host failures is "linklint @@ -retry". Use
"linklint @@linkdoc/remoteX.txt -retry" if you want failed URLs
to be cross referenced.
- -flush
- Removes all URLs from the cache that are not currently
being checked. The -retry flag has no effect on which URLs are
flushed.
- -checksum
- Ensures that every URL that has been modified is reported
as such. This flag can make the remote checking take longer. Many of the
pages that require a checksum are dynamically generated and will always be
reported as modified.
- -cache directory
- Reads and writes the linklint.url cache file in this
directory. The default directory is set by your LINKLINT or HOME
environment variables.
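The difference between -netmod and -netset can be modelled with a small Python sketch. It is a simplification and not Linklint's cache code: check_remote is a hypothetical name, the cache is a plain dictionary instead of the linklint.url file, and the host fail status used by -retry is not modelled at all.

```python
def check_remote(cache, observed, reset=False):
    """Compare observed last-modified values against the cache.

    cache    -- dict mapping URL -> last-modified value from an earlier run
    observed -- dict mapping URL -> value seen now (None if the check failed)
    reset    -- False models -netmod, True models -netset
    Returns the list of previously cached URLs whose value has changed.
    """
    modified = []
    for url, stamp in observed.items():
        if stamp is None:
            continue                  # simplification: failures are not recorded
        if url not in cache:
            cache[url] = stamp        # newly checked URLs enter the cache
        elif cache[url] != stamp:
            modified.append(url)
            if reset:
                cache[url] = stamp    # -netset: report this change only once
    return modified
```

Calling check_remote repeatedly with reset=False keeps reporting the same change, while a single call with reset=True records the new value so that later runs stay quiet until the page changes again.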
Output Options
No output files are generated by default; only progress and a brief summary of
the results are printed to the screen. You can produce complete documentation
(split up into separate files) in a -doc directory, or put selected output in
a single file with -out or by redirecting the standard output. See the Output
File Specification section for a detailed description of all output files.
Multi File Output
- -doc linkdoc
- Sends all output to the linkdoc directory. The
output is divided into separate .txt and .html files.
Complete documentation is always produced regardless of the single file
flags.
The file index.txt contains an index to all the other files;
index.html is an HTML version of the index. The index files for
remote URL checking are url_index.txt and
url_index.html.
- -textonly
- Prevents any HTML files from being created in the
-doc directory.
- -htmlonly
- Erases redundant text files in the -doc directory
after they have been used to create the HTML output files. The files
remote.txt and remoteX.txt are not erased since they can be
used by Linklint to recheck remote URLs.
- -docbase base
- Overrides the default base expression used for
directing a browser to the resources listed in the output HTML files. The
base is prepended to local links in the output HTML files. This only
affects the links in HTML output files; it has no effect on what is
displayed in these files. Ordinarily this flag would only be used during a
local site check to set the base to "http://host".
- -output_frames
- All HTML output data files are linked to from
index.html. If you use this flag then the data files will be
opened up in a new frame (window) which can be handy in some cases since
it always leaves the index.html file open in its own window.
- -output_index filename
- The output index files were previously named
linklint.txt and linklint.html. These have now been changed
to index.txt and index.html. You can use the
-output_index option to change this name back to
"linklint" or to something else.
- -url_doc_prefix url/
- By default, the output files associated with remote URL
checking all start with "url". You can change this with the
-url_doc_prefix option. If the url_doc_prefix contains a
"/" character then the appropriate directory will be created (as
a subdirectory of the -doc directory).
- -dont_output xxxx
- Don't create output files that contain "xxxx".
Can be repeated. Example:
-dont_output "X$"
will suppress the output of all cross reference files.
Single File Output
- -error
- Lists missing files and other errors.
- -out file
- Sends list output and summary information to file.
- -list
- Lists all found files, links, directories etc.
- -warn
- Lists all warnings.
- -xref
- Adds cross references to the lists.
- -forward
- Sorts lists by referring file.
Debug and other Flags
- -db1
- Debugs command line input and linkset expressions.
- -db2
- Prints the name of every file that gets checked (not just
HTML files).
- -db3
- Debugs HTML parser, prints out tags and resulting
links.
- -db4
- Debugs socket connection (kind of).
- -db5
- Not used.
- -db6
- Details last-modified status for remote URLs (requires
-netset or -netmod).
- -db7
- Prints brief debug information while checking remote
URLs.
- -db8
- Prints all http headers while checking remote URLs.
- -db9
- Generates random http errors.
- -version
- Gives version information.
- -help
- Lists a few simple examples of how to use Linklint.
- -help_all
- Lists all help (contained in program) including every input
option.
- -quiet
- Disables printing progress to the screen.
- -silent
- Disables printing summaries to the screen.
AUTHOR
Linklint is written by James B. Bowlin <jbowlin@linklint.org>. This manual
page was written by Denis Barbier <barbier@debian.org> for the Debian
system (but may be used by others) by cut'n'paste from original documentation
written in HTML.