.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.16)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "CHECKLINK 1p"
.TH CHECKLINK 1p "2012-10-31" "perl v5.14.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
checklink \- check the validity of links in an HTML or XHTML document
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
\&\fBchecklink\fR [ \fIoptions\fR ] \fIuri\fR ...
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This manual page briefly documents the \fBchecklink\fR command, a.k.a. the
W3C Link Checker.
.PP
\&\fBchecklink\fR is a program that reads an \s-1HTML\s0 or \s-1XHTML\s0 document,
extracts a list of anchors and links, and checks that no anchor is defined
twice and that all the links are dereferenceable, including the fragments.
It warns about \s-1HTTP\s0 redirects, including directory redirects, and can
recursively check part of a web site.
.PP
The program can be used either as a command line tool or as a \s-1CGI\s0 script.
.SH "OPTIONS"
.IX Header "OPTIONS"
This program follows the usual \s-1GNU\s0 command line syntax, with long options
starting with two dashes (`\-\-').
A summary of options is included below.
.IP "\fB\-?, \-h, \-\-help\fR" 5
.IX Item "-?, -h, --help"
Show summary of options.
.IP "\fB\-V, \-\-version\fR" 5
.IX Item "-V, --version"
Output version information.
.IP "\fB\-s, \-\-summary\fR" 5
.IX Item "-s, --summary"
Show result summary only.
.IP "\fB\-b, \-\-broken\fR" 5
.IX Item "-b, --broken"
Show only the broken links, not the redirects.
.IP "\fB\-e, \-\-directory\fR" 5
.IX Item "-e, --directory"
Hide directory redirects \- e.g. \-> .
.IP "\fB\-r, \-\-recursive\fR" 5
.IX Item "-r, --recursive"
Check the documents linked from the first one.
.IP "\fB\-D, \-\-depth\fR \fIn\fR" 5
.IX Item "-D, --depth n"
Check the documents linked from the first one to depth \fIn\fR
(implies \fB\-\-recursive\fR).
.IP "\fB\-l, \-\-location\fR \fIuri\fR" 5
.IX Item "-l, --location uri"
Scope of the documents checked (implies \fB\-\-recursive\fR).
May be specified multiple times to give multiple recursion bases.
If the \s-1URI\s0 of a candidate document is downwards relative to any of the
bases, it is considered to be within the scope.
If not specified, the default is the base \s-1URI\s0 of the initial document,
for example for it would be .
.IP "\fB\-X, \-\-exclude\fR \fIregexp\fR" 5
.IX Item "-X, --exclude regexp"
Do not check links whose full, canonical URIs match \fIregexp\fR.
Note that this option limits recursion the same way as \fB\-\-exclude\-docs\fR
with the same regular expression would.
.IP "\fB\-\-exclude\-docs\fR \fIregexp\fR" 5
.IX Item "--exclude-docs regexp"
In recursive mode, do not check links in documents whose full, canonical
URIs match \fIregexp\fR.
This option may be specified multiple times.
.IP "\fB\-\-suppress\-redirect\fR \fI\s-1URI\-\s0>\s-1URI\s0\fR" 5
.IX Item "--suppress-redirect URI->URI"
Do not report a redirect from the first to the second \s-1URI\s0.
The \*(L"\->\*(R" is literal text.
This option may be specified multiple times.
Whitespace may be used instead of \*(L"\->\*(R" to separate the URIs.
.IP "\fB\-\-suppress\-redirect\-prefix\fR \fI\s-1URI\-\s0>\s-1URI\s0\fR" 5
.IX Item "--suppress-redirect-prefix URI->URI"
Do not report a redirect from a child of the first \s-1URI\s0 to the same child
of the second \s-1URI\s0.
The \*(L"\->\*(R" is literal text.
This option may be specified multiple times.
Whitespace may be used instead of \*(L"\->\*(R" to separate the URIs.
.IP "\fB\-\-suppress\-temp\-redirects\fR" 5
.IX Item "--suppress-temp-redirects"
Do not report warnings about temporary redirects.
.IP "\fB\-\-suppress\-broken\fR \fI\s-1CODE:URI\s0\fR" 5
.IX Item "--suppress-broken CODE:URI"
Do not report a broken link with the given \s-1CODE\s0.
\&\s-1CODE\s0 is the \s-1HTTP\s0 response code, or \-1 for robots exclusion.
The \*(L":\*(R" is literal text.
This option may be specified multiple times.
Whitespace may be used instead of \*(L":\*(R" to separate the \s-1CODE\s0 and the \s-1URI\s0.
.IP "\fB\-\-suppress\-fragment\fR \fI\s-1URI\s0\fR" 5
.IX Item "--suppress-fragment URI"
Do not report the given broken fragment \s-1URI\s0.
A fragment \s-1URI\s0 contains \*(L"#\*(R".
This option may be specified multiple times.
.IP "\fB\-L, \-\-languages\fR \fIaccept-language\fR" 5
.IX Item "-L, --languages accept-language"
The \f(CW\*(C`Accept\-Language\*(C'\fR \s-1HTTP\s0 header to send.
In command line mode, this header is not sent by default.
The special value \f(CW\*(C`auto\*(C'\fR causes a value to be detected from the
\&\f(CW\*(C`LANG\*(C'\fR environment variable, and sent if found.
In \s-1CGI\s0 mode, the default is to send the value received from the client
as is.
.IP "\fB\-c, \-\-cookies\fR \fIcookie-file\fR" 5
.IX Item "-c, --cookies cookie-file"
Use cookies, loading them from and saving them to \fIcookie-file\fR.
The special value \f(CW\*(C`tmp\*(C'\fR causes non-persistent use of cookies, i.e. they
are used but only stored in memory for the duration of this link checker run.
.IP "\fB\-R, \-\-no\-referer\fR" 5
.IX Item "-R, --no-referer"
Do not send the \f(CW\*(C`Referer\*(C'\fR \s-1HTTP\s0 header.
.IP "\fB\-q, \-\-quiet\fR" 5
.IX Item "-q, --quiet"
No output if no errors are found.
Implies \fB\-\-summary\fR.
.IP "\fB\-v, \-\-verbose\fR" 5
.IX Item "-v, --verbose"
Verbose mode.
.IP "\fB\-i, \-\-indicator\fR" 5
.IX Item "-i, --indicator"
Show progress while parsing as percentage of lines processed.
No indicator is shown for documents containing no linefeeds.
.IP "\fB\-u, \-\-user\fR \fIusername\fR" 5
.IX Item "-u, --user username"
Specify a username for authentication.
.IP "\fB\-p, \-\-password\fR \fIpassword\fR" 5
.IX Item "-p, --password password"
Specify a password for authentication.
.IP "\fB\-\-hide\-same\-realm\fR" 5
.IX Item "--hide-same-realm"
Hide 401 responses that are in the same realm as the document checked.
.IP "\fB\-S, \-\-sleep\fR \fIsecs\fR" 5
.IX Item "-S, --sleep secs"
Sleep the specified number of seconds between requests to each server.
Defaults to 1 second, which is also the minimum allowed.
.IP "\fB\-t, \-\-timeout\fR \fIsecs\fR" 5
.IX Item "-t, --timeout secs"
Timeout for requests, in seconds.
The default is 30.
.IP "\fB\-C, \-\-connection\-cache\fR \fInumber\fR" 5
.IX Item "-C, --connection-cache number"
Maximum number of cached connections.
Using this option overrides the
\&\f(CW\*(C`Connection_Cache_Size\*(C'\fR configuration file parameter; see its
documentation below for the default value and more information.
.IP "\fB\-d, \-\-domain\fR \fIdomain\fR" 5
.IX Item "-d, --domain domain"
Perl regular expression describing the domain to which the authentication
information (if present) will be sent.
The default value can be specified in the configuration file.
See the \f(CW\*(C`Trusted\*(C'\fR entry in the configuration file description below for
more information.
.ie n .IP "\fB\-\-masquerade\fR \fI""real-prefix surrogate-prefix""\fR" 5
.el .IP "\fB\-\-masquerade\fR \fI``real-prefix surrogate-prefix''\fR" 5
.IX Item "--masquerade real-prefix surrogate-prefix"
Perform a simple string substitution: URIs that begin with the string
\&\f(CW\*(C`real\-prefix\*(C'\fR are rewritten using the \f(CW\*(C`surrogate\-prefix\*(C'\fR before being
dereferenced.
Useful for making a local directory masquerade as a remote one.
For example:
.Sp
.Vb 1
\&    \-\-masquerade "http://example.com/x/y/z/ file:///my/local/dir/"
.Ve
.Sp
If the document being checked contains a link to
http://example.com/x/y/z/foo.html, then the local file system will be checked
for file:///my/local/dir/foo.html.
.Sp
\&\fB\-\-masquerade\fR takes a single argument consisting of two URIs, separated
by whitespace.
The quote marks are not part of the argument, but one usual way of providing
a value with embedded whitespace is to enclose it in quotes.
.IP "\fB\-H, \-\-html\fR" 5
.IX Item "-H, --html"
\&\s-1HTML\s0 output.
.SH "FILES"
.IX Header "FILES"
.IP "\fI/etc/w3c/checklink.conf\fR" 5
.IX Item "/etc/w3c/checklink.conf"
The main configuration file.
You can use the W3C_CHECKLINK_CFG environment variable to override the
default location.
.Sp
\&\f(CW\*(C`Trusted\*(C'\fR specifies a regular expression for matching trusted domains
(i.e. domains where \s-1HTTP\s0 basic authentication, if any, will be sent).
The regular expression will be matched case-insensitively against host names.
The default behavior (when unset, that is) is to send the authentication
information only to the host which requests it; usually you don't want to
change this.
For example, the following configures \fIonly\fR the w3.org domain as trusted:
.Sp
.Vb 1
\&    Trusted = \e.w3\e.org$
.Ve
.Sp
\&\f(CW\*(C`Allow_Private_IPs\*(C'\fR is a boolean flag indicating whether checking links
on non-public \s-1IP\s0 addresses is allowed.
The default is true in command line mode and false when run as a \s-1CGI\s0 script.
For example, to disallow checking non-public \s-1IP\s0 addresses, regardless of the
mode, use:
.Sp
.Vb 1
\&    Allow_Private_IPs = 0
.Ve
.Sp
\&\f(CW\*(C`Forbidden_Protocols\*(C'\fR is a comma-separated list of additional
protocols/URI schemes that the link checker is not allowed to use.
The \f(CW\*(C`javascript\*(C'\fR and \f(CW\*(C`mailto\*(C'\fR schemes are always forbidden, and so is
the \f(CW\*(C`file\*(C'\fR scheme when running as a \s-1CGI\s0 script.
.Sp
.Vb 1
\&    Forbidden_Protocols = javascript,mailto
.Ve
.Sp
\&\f(CW\*(C`Markup_Validator_URI\*(C'\fR and \f(CW\*(C`CSS_Validator_URI\*(C'\fR are formatted URIs to
the respective validators.
The \f(CW%s\fR in these will be replaced with the full \*(L"\s-1URI\s0 encoded\*(R" \s-1URI\s0 to the
document being checked, and shown in the link checker results view in the
online/CGI version.
The defaults are:
.Sp
.Vb 4
\&    Markup_Validator_URI =
\&      http://validator.w3.org/check?uri=%s
\&    CSS_Validator_URI =
\&      http://jigsaw.w3.org/css\-validator/validator?uri=%s
.Ve
.Sp
\&\f(CW\*(C`Doc_URI\*(C'\fR is a \s-1URI\s0 used for linking to the documentation, and \s-1CSS\s0 and
JavaScript files in the dynamically generated content of the link checker.
The default is:
.Sp
.Vb 1
\&    Doc_URI = http://validator.w3.org/docs/checklink.html
.Ve
.Sp
\&\f(CW\*(C`Connection_Cache_Size\*(C'\fR is an integer denoting the maximum number of
connections the link checker will keep open at any given time.
The default is:
.Sp
.Vb 1
\&    Connection_Cache_Size = 2
.Ve
.SH "ENVIRONMENT"
.IX Header "ENVIRONMENT"
checklink uses the libwww-perl library, which has a number of environment
variables affecting its behaviour.
See \*(L"\s-1SEE\s0 \s-1ALSO\s0\*(R" for some pointers.
.IP "\fBW3C_CHECKLINK_CFG\fR" 5
.IX Item "W3C_CHECKLINK_CFG"
If set, overrides the path to the configuration file.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
The documentation for this program is available on the web at .
.PP
\&\s-1LWP\s0, Net::FTP, Net::NNTP, Net::IP, perlre.
.SH "AUTHOR"
.IX Header "AUTHOR"
This program was originally written by Hugo Haas , based on
Renaud Bruyeron's \fIchecklink.pl\fR.
It has been enhanced by Ville Skytta\*: and many other volunteers since.
Use the mailing list for feedback, and see for more information.
.PP
This manual page was originally written by Fre\*'de\*'ric Schu\*:tz
for the Debian GNU/Linux system (but may be used by others).
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
This program is licensed under the W3C Software License,
http://www.w3.org/Consortium/Legal/copyright\-software .
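.SH "EXAMPLES"
.IX Header "EXAMPLES"
The prefix substitution described for \fB\-\-masquerade\fR under \s-1OPTIONS\s0 can be
illustrated with a short Python sketch.
It is not taken from \fBchecklink\fR itself; the function names
\&\f(CWmasquerade\fR and \f(CWsplit_masquerade_argument\fR are hypothetical, and
leaving non-matching URIs unchanged is an assumption drawn from the option
description.
.Sp
.Vb 22
```python
# Illustrative sketch only; these helper names are not part of checklink.
def masquerade(uri, real_prefix, surrogate_prefix):
    """Return uri with a leading real_prefix swapped for surrogate_prefix."""
    if uri.startswith(real_prefix):
        return surrogate_prefix + uri[len(real_prefix):]
    # Assumption: URIs that do not begin with real_prefix stay unchanged.
    return uri


def split_masquerade_argument(argument):
    """Split the single option value into its two whitespace-separated URIs."""
    real_prefix, surrogate_prefix = argument.split()
    return real_prefix, surrogate_prefix


real, surrogate = split_masquerade_argument(
    "http://example.com/x/y/z/ file:///my/local/dir/"
)
print(masquerade("http://example.com/x/y/z/foo.html", real, surrogate))
# prints file:///my/local/dir/foo.html
```
.Ve
.Sp
The sample values are the ones used in the \fB\-\-masquerade\fR description, so
the printed result matches the local file named there.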