webcheck(1) | User Commands | webcheck(1) |
NAME¶
webcheck - website link checker
SYNOPSIS¶
webcheck [OPTION]... URL
DESCRIPTION¶
webcheck will check the document at the specified URL for links to other documents, follow these links recursively and generate an HTML report.
- -i, --internal=PATTERN
- Mark URLs matching the PATTERN (perl-type regular
expression) as an internal link. Can be used multiple times. Note that the
PATTERN is matched against the full URL. URLs matching this PATTERN will
be considered internal, even if they match one of the --external PATTERNs.
- -x, --external=PATTERN
- Mark URLs matching the PATTERN (perl-type regular
expression) as an external link. Can be used multiple times. Note that the
PATTERN is matched against the full URL.
- -y, --yank=PATTERN
- Do not check URLs matching the PATTERN (perl-type
regular expression). This is similar to the -x flag, except that webcheck
will not check the matched link at all, whereas with -x it will check the
link but not its children. Can be used multiple times. Note that the
PATTERN is matched against the full URL.
- -b, --base-only
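- Consider any URL not starting with the base URL to be
external. For example, if you run
webcheck -b http://www.example.com/foo
then any URL that does not start with http://www.example.com/foo (for
instance http://www.example.com/) is considered external.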
- -a, --avoid-external
- Avoid external links. Normally if webcheck is examining an
HTML page and it finds a link that points to an external document, it will
check to see if that external document exists. This flag disables that
action.
- --ignore-robots
- Do not retrieve and parse robots.txt files. By default
robots.txt files are retrieved and honored. If you are sure you want to
ignore and override the webmaster's decision this option can be used.
- -q, --quiet, --silent
- Do not print out progress as webcheck traverses a site.
- -d, --debug
- Print debugging information while crawling the site. This
option is mainly useful for developers.
- -o, --output=DIRECTORY
- Output directory. Specifies the directory where
webcheck will write its reports. The default is the current directory or as
specified by config.py. If this directory does not exist it will be
created for you (if possible).
- -c, --continue
- Try to continue from a previous run. When using this option
webcheck will look for a webcheck.dat in the output directory. This file
is read to restore the state from the previous run. This allows webcheck
to continue a previously interrupted run. When this option is used, the
--internal, --external and --yank options will be ignored as well as any
URL arguments. The --base-only and --avoid-external options should be the
same as the previous run.
- -f, --force
- Overwrite files without asking. This option is required for
running webcheck non-interactively.
- -r, --redirects=N
- Redirect depth. The number of redirects webcheck should
follow when following a link. 0 means follow all redirects.
- -u, --userpass=URL
- Specify a URL with username and password information to use
for basic authentication when visiting the site.
- -w, --wait=SECONDS
- Wait SECONDS between document retrievals. Usually
webcheck will process a URL and immediately move on to the next. However,
on some loaded systems it may be desirable to have webcheck pause between
requests. This option can be set to any non-negative number.
- -v, --version
- Show version of program.
- -h, --help
- Show short summary of options.
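The interaction of --internal, --external and --base-only described above can be sketched in Python. This is only an illustration, not webcheck's actual code: the function name classify and its parameters are hypothetical, patterns are assumed to be searched anywhere in the full URL, explicit patterns are assumed to be consulted before the --base-only test, and without -b the sketch assumes that URLs on the same host as the base URL are internal.

```python
import re
from urllib.parse import urlsplit


def classify(url, base_url, internal=(), external=(), base_only=False):
    """Return "internal" or "external" for url (illustrative sketch only)."""
    # Documented: --internal wins even if an --external PATTERN also matches.
    if any(re.search(p, url) for p in internal):
        return "internal"
    if any(re.search(p, url) for p in external):
        return "external"
    if base_only:
        # Documented for -b: URLs not starting with the base URL are external.
        return "internal" if url.startswith(base_url) else "external"
    # ASSUMPTION: without -b, URLs on the same host as the base URL are internal.
    same_host = urlsplit(url).netloc == urlsplit(base_url).netloc
    return "internal" if same_host else "external"


base = "http://www.example.com/foo"
print(classify("http://www.example.com/foo/bar", base, base_only=True))
print(classify("http://www.example.com/", base, base_only=True))
print(classify("http://www.example.com/foo/x", base,
               internal=["/foo/x"], external=["/foo"]))
```

The last call shows the documented precedence: the URL matches both an --internal and an --external PATTERN and is treated as internal. The --yank option is left out because its ordering relative to the other options is not stated here.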
URL CLASSES¶
URLs are divided into two classes: internal and external.
EXAMPLES¶
Check the site www.example.com but consider any path with "/webcheck" in it to be external.
webcheck http://www.example.com/ -x /webcheck
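To see which URLs a PATTERN such as /webcheck would select, it can be tried out with Python's re module. This is a rough sketch that assumes webcheck searches for the pattern anywhere in the full URL; the sample URLs are made up for illustration.

```python
import re

pattern = re.compile("/webcheck")  # the PATTERN passed to -x in the example

urls = [
    "http://www.example.com/webcheck/index.html",
    "http://www.example.com/docs/index.html",
    "http://www.example.com/docs?next=/webcheck",  # match is in the query string
]

# search() looks for the pattern anywhere in the string, not only at the start.
matches = [bool(pattern.search(u)) for u in urls]
for u, m in zip(urls, matches):
    print("external " if m else "unchanged", u)
```

Because the PATTERN is matched against the full URL, the third URL is selected as well, although its path does not contain /webcheck. A narrower pattern may be needed if that is not intended.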
NOTES¶
When checking internal URLs webcheck honors the robots.txt file, identifying itself as user-agent webcheck. Disallowed links will not be checked at all, as if the -y option were specified for that URL. To allow webcheck to crawl parts of a site that are disallowed for other robots, use something like:
User-agent: *
Disallow: /foo
User-agent: webcheck
Allow: /foo
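The effect of such a robots.txt can be checked with the robots.txt parser in Python's standard library. This is a sketch only: webcheck's own parser may differ in details, and the agent name otherbot and the page URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /foo
User-agent: webcheck
Allow: /foo
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/foo/page.html"
# The webcheck-specific group is consulted before the catch-all "*" group.
webcheck_allowed = parser.can_fetch("webcheck", url)
other_allowed = parser.can_fetch("otherbot", url)
print("webcheck:", webcheck_allowed)
print("otherbot:", other_allowed)
```

With these rules the standard library parser allows the user-agent webcheck to fetch the page and denies it to the other agent.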
ENVIRONMENT¶
- <scheme>_proxy
- Proxy url for <scheme>.
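The <scheme>_proxy naming convention can be illustrated with a small helper that collects proxy settings from an environment mapping. The helper proxies_from_env and the proxy address are hypothetical. Whether webcheck also accepts upper-case variable names or a no_proxy variable is not stated here, so the sketch matches lower-case names only.

```python
def proxies_from_env(environ):
    """Collect {scheme: proxy_url} from variables named <scheme>_proxy."""
    suffix = "_proxy"
    return {
        name[: -len(suffix)]: value
        for name, value in environ.items()
        # Skip a bare "_proxy" name and variables that are set but empty.
        if name.endswith(suffix) and len(name) > len(suffix) and value
    }


example_env = {
    "http_proxy": "http://proxy.example.com:3128",
    "https_proxy": "http://proxy.example.com:3128",
    "HOME": "/home/user",
}
proxies = proxies_from_env(example_env)
print(proxies)
```

Passing os.environ instead of example_env would show the proxies configured in the current shell.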
REPORTING BUGS¶
Bug reports should be sent to the mailing list <webcheck-users@lists.arthurdejong.org>. More information on reporting bugs can be found on the webcheck homepage.
COPYRIGHT¶
Copyright © 1998, 1999 Albert Hopkins (marduk)
Sep 2010 | Version 1.10.4 |