.\" Automatically generated by Pod::Man 2.28 (Pod::Simple 3.32) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{ . if \nF \{ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . 
ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "PWGET 1" .TH PWGET 1 "2016-10-19" "perl v5.22.2" "Perl pwget URL fetch utility" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" pwget \- Perl Web URL fetch program .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 5 \& pwget http://example.com/ [URL ...] \& pwget \-\-config $HOME/config/pwget.conf \-\-tag linux \-\-tag emacs .. 
\& pwget \-\-verbose \-\-overwrite http://example.com/ \& pwget \-\-verbose \-\-overwrite \-\-Output ~/dir/ http://example.com/ \& pwget \-\-new \-\-overwrite http://example.com/package\-1.1.tar.gz .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" Automate periodic downloads of files and packages. .PP If you periodically retrieve the latest versions of certain program blocks, this is the Perl script for you. Run it from a cron job, for example once a week, to download the newest versions of files from around the net. Note: .SS "Wget and this program" .IX Subsection "Wget and this program" At this point you may wonder why you would need this Perl program when the \fIwget\fR\|(1) C program has been the standard for ages. Well, 1) Perl is cross platform and more easily extendable 2) you can record file download criteria in a configuration file and use Perl regular expressions to select downloads 3) the program can analyze web pages and \*(L"search\*(R" for only the download links, as instructed 4) last but not least, it can track the newest packages whose names have changed since the last download. There are heuristics to determine the newest file or package according to the file name skeleton defined in the configuration. .PP This program does not replace \fIwget\fR\|(1) because it does not offer as many options as wget, like recursive downloads and date comparing. Use wget for ad hoc downloads and this utility for files that change (new releases of archives) or which you monitor periodically. .SS "Short introduction" .IX Subsection "Short introduction" This small utility makes it possible to keep a list of URLs in a configuration file and periodically retrieve those pages or files with simple commands. This utility is best suited for small batch jobs that download e.g. the most recent versions of software files. If you use a \s-1URL\s0 whose file is already on disk, be sure to supply option \fB\-\-overwrite\fR to allow overwriting existing files.
.PP While you can run this program from the command line to retrieve individual files, the program has been designed to use a separate configuration file via the \&\fB\-\-config\fR option. In the configuration file you can control the downloading with separate directives like \f(CW\*(C`save:\*(C'\fR, which tells the program to save the file under a different name. The simplest way to retrieve the latest version of a package from a remote site is: .PP .Vb 2 \& pwget \-\-new \-\-overwrite \-\-verbose \e \& http://www.example.com/package\-1.00.tar.gz .Ve .PP Do not worry about the filename \f(CW\*(C`package\-1.00.tar.gz\*(C'\fR. The latest version, say, \f(CW\*(C`package\-3.08.tar.gz\*(C'\fR will be retrieved. The option \&\fB\-\-new\fR instructs the program to find a newer version than the one in the provided \s-1URL.\s0 .PP If the \s-1URL\s0 ends with a slash, then the directory listing at the remote machine is stored to file: .PP .Vb 1 \& !path!000root\-file .Ve .PP The content of this file can be either index.html or the directory listing, depending on whether the http or ftp protocol was used. .SH "OPTIONS" .IX Header "OPTIONS" .IP "\fB\-A, \-\-regexp\-content \s-1REGEXP\s0\fR" 4 .IX Item "-A, --regexp-content REGEXP" Analyze the content of the file and match \s-1REGEXP.\s0 The file is downloaded only if the regexp matches the file content. This option will make downloads slow, because the file is read into memory as a single line and then a match is searched against the content. .Sp For example, to download an Emacs Lisp file (.el) written by Mr. Foo, matching in a case insensitive manner: .Sp .Vb 2 \& pwget \-v \-r \*(Aq\e.el$\*(Aq \-A "(?i)Author: Mr. Foo" \e \& http://www.emacswiki.org/elisp/index.html .Ve .IP "\fB\-C, \-\-create\-paths\fR" 4 .IX Item "-C, --create-paths" Create paths that do not exist in \f(CW\*(C`lcd:\*(C'\fR directives. .Sp By default, any \s-1LCD\s0 directive to non-existing directory will interrupt program.
With this option, local directories are created as needed, making it possible to re-create the exact structure as it is in the configuration file. .IP "\fB\-c, \-\-config \s-1FILE\s0\fR" 4 .IX Item "-c, --config FILE" This option can be given multiple times. All configurations are read. .Sp Read URLs from the configuration file. If no configuration file is given, the file pointed to by the environment variable is read. See \s-1ENVIRONMENT.\s0 .Sp The configuration file layout is explained in section \s-1CONFIGURATION FILE\s0 .IP "\fB\-\-chdir \s-1DIRECTORY\s0\fR" 4 .IX Item "--chdir DIRECTORY" Do a \fIchdir()\fR to \s-1DIRECTORY\s0 before any \s-1URL\s0 download starts. This is like doing: .Sp .Vb 2 \& cd DIRECTORY \& pwget http://example.com/index.html .Ve .IP "\fB\-d, \-\-debug [\s-1LEVEL\s0]\fR" 4 .IX Item "-d, --debug [LEVEL]" Turn on debug with positive \s-1LEVEL\s0 number. Zero means no debug. This option turns on \fB\-\-verbose\fR too. .IP "\fB\-e, \-\-extract\fR" 4 .IX Item "-e, --extract" Unpack any files after retrieving them. The commands to unpack typical archive files are defined in the program. Make sure these programs are in your \s-1PATH.\s0 Win32 users are encouraged to install the Cygwin utilities where these programs come standard. Refer to section \s-1SEE ALSO.\s0 .Sp .Vb 6 \& .tar => tar \& .tgz => tar + gzip \& .gz => gzip \& .bz2 => bzip2 \& .xz => xz \& .zip => unzip .Ve .IP "\fB\-F, \-\-firewall \s-1FIREWALL\s0\fR" 4 .IX Item "-F, --firewall FIREWALL" Use \s-1FIREWALL\s0 when accessing files via ftp:// protocol. .IP "\fB\-h, \-\-help\fR" 4 .IX Item "-h, --help" Print help page in text. .IP "\fB\-\-help\-html\fR" 4 .IX Item "--help-html" Print help page in \s-1HTML.\s0 .IP "\fB\-\-help\-man\fR" 4 .IX Item "--help-man" Print help page in Unix manual page format. You want to feed this output to a manual page formatter such as \fInroff\fR\|(1) in order to read it. .Sp Print help page.
.IP "\fB\-m, \-\-mirror \s-1SITE\s0\fR" 4 .IX Item "-m, --mirror SITE" If \s-1URL\s0 points to the Sourceforge download area, use mirror \s-1SITE\s0 for downloading. Alternatively the full \s-1URL\s0 can include the mirror information. An example: .Sp .Vb 1 \& \-\-mirror kent http://downloads.sourceforge.net/foo/foo\-1.0.0.tar.gz .Ve .IP "\fB\-n, \-\-new\fR" 4 .IX Item "-n, --new" Get newest file. This applies to data files, which do not have extension \&.asp or .html. When new releases are announced, the version number in the filename usually tells which is the current one, so getting a hardcoded file with: .Sp .Vb 1 \& pwget \-o \-v http://example.com/dir/program\-1.3.tar.gz .Ve .Sp is not usually practical from an automation point of view. Adding the \&\fB\-\-new\fR option to the command line causes a double pass: a) the whole http://example.com/dir/ is examined for all files and b) files approximately matching the filename program\-1.3.tar.gz are examined and heuristically sorted, and the file with the latest version number is retrieved. .IP "\fB\-\-no\-lcd\fR" 4 .IX Item "--no-lcd" Ignore \f(CW\*(C`lcd:\*(C'\fR directives in configuration file. .Sp In the configuration file, any \f(CW\*(C`lcd:\*(C'\fR directives are obeyed as they are seen. But if you do want to retrieve a \s-1URL\s0 to your current directory, be sure to supply this option. Otherwise the file will end up in the directory pointed to by \f(CW\*(C`lcd:\*(C'\fR. .IP "\fB\-\-no\-save\fR" 4 .IX Item "--no-save" Ignore \f(CW\*(C`save:\*(C'\fR directives in configuration file. If the URLs have \&\f(CW\*(C`save:\*(C'\fR options, they are ignored during fetch. You usually want to combine \fB\-\-no\-lcd\fR with \fB\-\-no\-save\fR .IP "\fB\-\-no\-extract\fR" 4 .IX Item "--no-extract" Ignore \f(CW\*(C`x:\*(C'\fR directives in configuration file.
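The double pass described under --new above (list the directory, then pick the highest version among files that resemble the given name) can be illustrated with a minimal Python sketch. It is not pwget's own code: the pattern VERSIONED and the functions version_key and newest are hypothetical names, and the sketch only assumes the FILE-VERSION.extension naming convention that this manual describes.

```python
import re

# Illustrative pattern for FILE-VERSION.extension, with VERSION as [\d.]+.
# The pattern and the function names are assumptions of this sketch,
# not pwget's actual internals.
VERSIONED = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d[\d.]*\d|\d)(?P<ext>\.[A-Za-z].*)$"
)


def version_key(version):
    """Turn '1.10' into (1, 10) so that versions compare numerically."""
    return tuple(int(part) for part in version.split(".") if part)


def newest(template, listing):
    """Return the file in listing that has the same name and extension
    as template and the highest version, or None if nothing matches."""
    model = VERSIONED.match(template)
    if model is None:
        return None
    best = None
    for filename in listing:
        m = VERSIONED.match(filename)
        if m is None:
            continue  # no "-VERSION" part, e.g. file1234.txt
        if m.group("name") != model.group("name"):
            continue
        if m.group("ext") != model.group("ext"):
            continue
        if best is None or version_key(m.group("version")) > version_key(
            best.group("version")
        ):
            best = m
    return best.group(0) if best else None


listing = [
    "program-1.3.tar.gz",
    "program-1.9.tar.gz",
    "program-1.10.tar.gz",
    "other-9.9.tar.gz",
    "program1234.txt",
]
print(newest("program-1.3.tar.gz", listing))
```

Comparing versions as integer tuples rather than as strings is what lets 1.10 sort after 1.9; a plain string sort would pick 1.9.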
.IP "\fB\-O, \-\-output \s-1DIR\s0\fR" 4 .IX Item "-O, --output DIR" Before retrieving any files, chdir to \s-1DIR.\s0 .IP "\fB\-o, \-\-overwrite\fR" 4 .IX Item "-o, --overwrite" Allow overwriting existing files when retrieving URLs. Combine this with \fB\-\-skip\-version\fR if you periodically update files. .IP "\fB\-\-proxy \s-1PROXY\s0\fR" 4 .IX Item "--proxy PROXY" Use \s-1PROXY\s0 server for \s-1HTTP. \s0(See \fB\-\-Firewall\fR for \s-1FTP.\s0). The port number is optional in the call: .Sp .Vb 2 \& \-\-proxy http://example.com.proxy.com \& \-\-proxy example.com.proxy.com:8080 .Ve .IP "\fB\-p, \-\-prefix \s-1PREFIX\s0\fR" 4 .IX Item "-p, --prefix PREFIX" Add \s-1PREFIX\s0 to all retrieved files. .IP "\fB\-P, \-\-postfix \s-1POSTFIX \s0\fR" 4 .IX Item "-P, --postfix POSTFIX " Add \s-1POSTFIX\s0 to all retrieved files. .IP "\fB\-D, \-\-prefix\-date\fR" 4 .IX Item "-D, --prefix-date" Add an \s-1ISO 8601\s0 \*(L"YYYY\-MM\-DD::\*(R" prefix to all retrieved files. This is added before possible \fB\-\-prefix\-www\fR or \fB\-\-prefix\fR. .IP "\fB\-W, \-\-prefix\-www\fR" 4 .IX Item "-W, --prefix-www" Usually the files are stored with the same name as in the \s-1URL\s0 dir, but if you retrieve files that have identical names you can store each page separately so that the file name is prefixed by the site name. .Sp .Vb 2 \& http://example.com/page.html \-\-> example.com::page.html \& http://example2.com/page.html \-\-> example2.com::page.html .Ve .IP "\fB\-r, \-\-regexp \s-1REGEXP\s0\fR" 4 .IX Item "-r, --regexp REGEXP" Retrieve files matching \s-1REGEXP\s0 at the destination \s-1URL\s0 site. This is like \*(L"Connect to the \s-1URL\s0 and get all files matching \s-1REGEXP\*(R".\s0 Here all gzip compressed files are retrieved from an \s-1HTTP\s0 server directory: .Sp .Vb 1 \& pwget \-v \-r "\e.gz" http://example.com/archive/ .Ve .Sp Caveat: currently works only for http:// URLs.
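The naming that results from --prefix-date and --prefix-www, described above, can be illustrated with a short Python sketch. It only reproduces the mappings shown in this manual (site name, then "::", then file name, with the date placed first); the function local_name and its parameters are hypothetical and do not come from pwget itself. The placement of a plain --prefix string is not modeled.

```python
from datetime import date
from urllib.parse import urlsplit


def local_name(url, prefix_date=False, prefix_www=False, today=None):
    """Build the local file name for url (illustrative sketch only).

    The site prefix is added first and the date prefix goes in front
    of it, matching YYYY-MM-DD::www.example.com::file.pl.
    """
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]  # last path component
    if prefix_www:
        name = f"{parts.netloc}::{name}"
    if prefix_date:
        stamp = (today or date.today()).isoformat()  # YYYY-MM-DD
        name = f"{stamp}::{name}"
    return name


print(local_name("http://example.com/page.html", prefix_www=True))
print(
    local_name(
        "http://www.example.com/file.pl",
        prefix_date=True,
        prefix_www=True,
        today=date(2016, 10, 19),
    )
)
```

The optional today argument exists only so that the result is reproducible; without it the current date is used.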
.IP "\fB\-R, \-\-config\-regexp \s-1REGEXP\s0\fR" 4 .IX Item "-R, --config-regexp REGEXP" Retrieve URLs matching \s-1REGEXP\s0 from configuration file. This cancels \&\fB\-\-tag\fR options in the command line. .IP "\fB\-s, \-\-selftest\fR" 4 .IX Item "-s, --selftest" Run some internal tests. For maintainer or developer only. .IP "\fB\-\-sleep \s-1SECONDS\s0\fR" 4 .IX Item "--sleep SECONDS" Sleep \s-1SECONDS\s0 before next \s-1URL\s0 request. When using regexp based downloads that may return many hits, some sites disallow successive requests within a short period of time. This option makes the program sleep for a number of \s-1SECONDS\s0 between retrievals to overcome 'Service unavailable'. .IP "\fB\-\-stdout\fR" 4 .IX Item "--stdout" Retrieve \s-1URL\s0 and write to stdout. .IP "\fB\-\-skip\-version\fR" 4 .IX Item "--skip-version" Do not download files that have a version number and which already exist on disk. Suppose you have these files and you use option \fB\-\-skip\-version\fR: .Sp .Vb 2 \& package.tar.gz \& file\-1.1.tar.gz .Ve .Sp Only package.tar.gz is retrieved, because file\-1.1.tar.gz contains a version number and the file has not changed since the last retrieval. The idea is that in every release the number in the distribution increases, but there may be distributions which do not contain a version number. At regular intervals you may want to load those packages again, but skip versioned files. In short: this option does not make much sense without the additional option \fB\-\-new\fR .Sp If you want to reload a versioned file again, add option \fB\-\-overwrite\fR. .IP "\fB\-t, \-\-test, \-\-dry\-run\fR" 4 .IX Item "-t, --test, --dry-run" Run in test mode. .IP "\fB\-T, \-\-tag \s-1NAME\s0 [\s-1NAME\s0] ...\fR" 4 .IX Item "-T, --tag NAME [NAME] ..." Search tag \s-1NAME\s0 from the config file and download only entries defined under that tag. Refer to \fB\-\-config \s-1FILE\s0\fR option description. You can give multiple \fB\-\-tag\fR switches.
Combining this option with \fB\-\-regexp\fR does not make sense and the consequences are undefined. .IP "\fB\-v, \-\-verbose [\s-1NUMBER\s0]\fR" 4 .IX Item "-v, --verbose [NUMBER]" Print verbose messages. .IP "\fB\-V, \-\-version\fR" 4 .IX Item "-V, --version" Print version information. .SH "EXAMPLES" .IX Header "EXAMPLES" Get files from site: .PP .Vb 1 \& pwget http://www.example.com/dir/package.tar.gz .. .Ve .PP Display copyright file for package \s-1GNU\s0 make from Debian pages: .PP .Vb 1 \& pwget \-\-stdout \-\-regexp \*(Aqcopyright$\*(Aq http://packages.debian.org/unstable/make .Ve .PP Get all mailing list archive files that match \*(L"gz\*(R": .PP .Vb 1 \& pwget \-\-regexp gz http://example.com/mailing\-list/archive/download/ .Ve .PP Read a directory and store it to filename \s-1YYYY\-MM\-DD::\s0!dir!000root\-file. .PP .Vb 1 \& pwget \-\-prefix\-date \-\-overwrite \-\-verbose http://www.example.com/dir/ .Ve .PP To update to the newest version of the package, but only if it is not already on disk.
The \fB\-\-new\fR option instructs the program to find newer packages, and the filename is only used as a skeleton for files to look for: .PP .Vb 2 \& pwget \-\-overwrite \-\-skip\-version \-\-new \-\-verbose \e \& ftp://ftp.example.com/dir/packet\-1.23.tar.gz .Ve .PP To overwrite file and add a date prefix to the file name: .PP .Vb 2 \& pwget \-\-prefix\-date \-\-overwrite \-\-verbose \e \& http://www.example.com/file.pl \& \& \-\-> YYYY\-MM\-DD::file.pl .Ve .PP To add date and \s-1WWW\s0 site prefix to the filenames: .PP .Vb 2 \& pwget \-\-prefix\-date \-\-prefix\-www \-\-overwrite \-\-verbose \e \& http://www.example.com/file.pl \& \& \-\-> YYYY\-MM\-DD::www.example.com::file.pl .Ve .PP Get all updated files under configuration file's tag updates: .PP .Vb 2 \& pwget \-\-verbose \-\-overwrite \-\-skip\-version \-\-new \-\-tag updates \& pwget \-v \-o \-s \-n \-T updates .Ve .PP Get files as they are read from the configuration file to the current directory, ignoring any \f(CW\*(C`lcd:\*(C'\fR and \f(CW\*(C`save:\*(C'\fR directives: .PP .Vb 3 \& pwget \-\-config $HOME/config/pwget.conf \e \& \-\-no\-lcd \-\-no\-save \-\-overwrite \-\-verbose \e \& http://www.example.com/file.pl .Ve .PP To check the configuration file, run the program with a non-matching regexp; it parses the file and checks the \f(CW\*(C`lcd:\*(C'\fR directives on the way: .PP .Vb 1 \& pwget \-v \-r dummy\-regexp \& \& \-\-> \& \& pwget.DirectiveLcd: LCD [$EUSR/directory ...] \& is not a directory at /users/foo/bin/pwget line 889. .Ve .SH "CONFIGURATION FILE" .IX Header "CONFIGURATION FILE" .SS "Comments" .IX Subsection "Comments" The configuration file is \s-1NOT\s0 Perl code. Comments start with hash character (#). .SS "Variables" .IX Subsection "Variables" At this point, variable expansions happen only in \fBlcd:\fR. Do not try to use them anywhere else, like in URLs. .PP Path variables for \fBlcd:\fR are defined using the following notation; spaces are not allowed in the \s-1VALUE\s0 part (no directory names with spaces).
Variable names are case sensitive. Variables substitute environment variables with the same name. Environment variables are immediately available. .PP .Vb 3 \& VARIABLE = /home/my/dir # define variable \& VARIABLE = $dir/some/file # Use previously defined variable \& FTP = $HOME/ftp # Use environment variable .Ve .PP The right hand side can refer to previously defined variables or existing environment variables. To repeat, this is not Perl code although it may look like it; it is just the syntax allowed in the configuration file. Notice that there is a dollar sign on the right hand side when a variable is referred to, but no dollar sign on the left hand side when a variable is defined. Here is an example of possible configuration file content. The tags are hierarchically ordered without a limit. .PP Warning: remember to use different variable names in separate include files. All variables are global. .SS "Include files" .IX Subsection "Include files" It is possible to include more configuration files with statement .PP .Vb 1 \& INCLUDE .Ve .PP Variable expansions are possible in the file name. There is no limit on how many includes are used or how deep the include structure is. Every file is included only once, so it is safe to have multiple includes of the same file. Every include is read, so put the most important override includes last: .PP .Vb 2 \& INCLUDE # Global \& INCLUDE <$HOME/config/pwget.conf> # HOME overrides it .Ve .PP A special \f(CW\*(C`THIS\*(C'\fR tag means the relative path of the current include file, which makes it possible to include several files from the same directory where the initial include file resides .PP .Vb 1 \& # Start of config at /etc/pwget.conf \& \& # THIS = /etc, current location \& include \& \& # Refers to directory where current user is: the pwd \& include \& \& # end .Ve .SS "Configuration file example" .IX Subsection "Configuration file example" The configuration file can contain many directives, where each directive ends with a colon.
The usage of each directive is best explained by examining the configuration file below and reading the commentary near each directive. .PP .Vb 1 \& # $HOME/config/pwget.conf F\- Perl pwget configuration file \& \& ROOT = $HOME # define variables \& CONF = $HOME/config \& UPDATE = $ROOT/updates \& DOWNL = $ROOT/download \& \& # Include more configuration files. It is possible to \& # split a huge file in pieces and have "linux", \& # "win32", "debian", "emacs" configurations in separate \& # and manageable files. \& \& INCLUDE <$CONF/pwget\-other.conf> \& INCLUDE <$CONF/pwget\-more.conf> \& \& tag1: local\-copies tag1: local # multiple names to this category \& \& lcd: $UPDATE # chdir directive \& \& # This is shown to user with option \-\-verbose \& print: Notice, this site moved YYYY\-MM\-DD, update your bookmarks \& \& file://absolute/dir/file\-1.23.tar.gz \& \& tag1: external \& \& lcd: $DOWNL \& \& tag2: external\-http \& \& http://www.example.com/page.html \& http://www.example.com/page.html save:/dir/dir/page.html \& \& tag2: external\-ftp \& \& ftp://ftp.com/dir/file.txt.gz save:xx\-file.txt.gz login:foo pass:passwd x: \& \& lcd: $HOME/download/package \& \& ftp://ftp.com/dir/package\-1.1.tar.gz new: \& \& tag2: package\-x \& \& lcd: $DOWNL/package\-x \& \& # Person announces new files in his homepage, download all \& # announced files. Unpack everything (x:) and remove any \& # existing directories (xopt:rm) \& \& http://example.com/~foo pregexp:\e.tar\e.gz$ x: xopt:rm \& \& # End of configuration file pwget.conf .Ve .SH "LIST OF DIRECTIVES IN CONFIGURATION FILE" .IX Header "LIST OF DIRECTIVES IN CONFIGURATION FILE" All the directives must be on the same line as the \s-1URL.\s0 The program scans lines and determines all options given on the line for the \s-1URL.\s0 Directives can be overridden by command line options. .IP "\fBcnv:CONVERSION\fR" 4 .IX Item "cnv:CONVERSION" Currently only \fBcnv:text\fR is available. .Sp Convert downloaded page to text.
This option always needs either \fBsave:\fR or \fBrename:\fR, because only those directives change the filename. Here is an example: .Sp .Vb 2 \& http://example.com/dir/file.html cnv:text save:file.txt \& http://example.com/dir/ pregexp:\e.html cnv:text rename:s/html/txt/ .Ve .Sp A \fBtext:\fR shorthand directive can be used instead of \fBcnv:text\fR. .IP "\fBcregexp:REGEXP\fR" 4 .IX Item "cregexp:REGEXP" Download file only if the content matches \s-1REGEXP.\s0 This is the same as option \&\fB\-\-Regexp\-content\fR. In this example, the Emacs Lisp packages (.el) in the directory listing are downloaded, but only if their content indicates that the Author is Mr. Foo: .Sp .Vb 1 \& http://example.com/index.html cregexp:(?i)author:.*Foo pregexp:\e.el$ .Ve .IP "\fBlcd:DIRECTORY\fR" 4 .IX Item "lcd:DIRECTORY" Set local download directory to \s-1DIRECTORY \s0(chdir to it). Any environment variables are substituted in the path name. If this directive is found, it replaces the setting of \fB\-\-Output\fR. If the path is not a directory, terminate with error. See also \fB\-\-Create\-paths\fR and \fB\-\-no\-lcd\fR. .IP "\fBlogin:LOGIN\-NAME\fR" 4 .IX Item "login:LOGIN-NAME" Ftp login name. Default value is \*(L"anonymous\*(R". .IP "\fBmirror:SITE\fR" 4 .IX Item "mirror:SITE" This is relevant only to Sourceforge, which does not allow direct downloads with links. Visit the project's Sourceforge homepage and see which mirrors are available for downloading. .Sp An example: .Sp .Vb 1 \& http://sourceforge.net/projects/austrumi/files/austrumi/austrumi\-1.8.5/austrumi\-1.8.5.iso/download new: mirror:kent .Ve .IP "\fBnew:\fR" 4 .IX Item "new:" Get newest file. This variable is reset to the value of \fB\-\-new\fR after the line has been processed.
Newest means that an \f(CW\*(C`ls\*(C'\fR command is run in the ftp, and something equivalent in \s-1HTTP \s0\*(L"ftp directories\*(R", and any files that resemble the filename are examined, sorted, and the latest one is heuristically determined according to the version number of the file. For example, files that have version information in \s-1YYYYMMDD\s0 format will most likely be retrieved correctly. .Sp Time stamps of the files are not checked. .Sp The only requirement is that filename \f(CW\*(C`must\*(C'\fR follow the universal version numbering standard: .Sp .Vb 1 \& FILE\-VERSION.extension # de facto VERSION is defined as [\ed.]+ \& \& file\-19990101.tar.gz # ok \& file\-1999.0101.tar.gz # ok \& file\-1.2.3.5.tar.gz # ok \& \& file1234.txt # not recognized. Must have "\-" \& file\-0.23d.tar.gz # warning, letters are problematic .Ve .Sp Files that have some alphabetic version indicator at the end of \&\s-1VERSION\s0 may not be handled correctly. Contact the developer and inform him about the de facto standard so that files can be retrieved more intelligently. .Sp \&\fI\s-1NOTE:\s0\fR In order for the \fBnew:\fR directive to know what kind of files to look for, it needs a file template. You can use a direct link to some filename. Here the location \*(L"http://www.example.com/downloads\*(R" is examined and the filename template is taken as \*(L"file\-1.1.tar.gz\*(R" to search for files that might be newer, like \*(L"file\-9.1.10.tar.gz\*(R": .Sp .Vb 1 \& http://www.example.com/downloads/file\-1.1.tar.gz new: .Ve .Sp If the filename appears in a named page, use directive \fBfile:\fR for template.
In this case the \*(L"download.html\*(R" page is examined for files looking like \*(L"file.*tar.gz\*(R" and the latest is searched: .Sp .Vb 1 \& http://www.example.com/project/download.html file:file\-1.1.tar.gz new: .Ve .IP "\fBoverwrite:\fR \fBo:\fR" 4 .IX Item "overwrite: o:" Same as turning on \fB\-\-overwrite\fR .IP "\fBpage:\fR" 4 .IX Item "page:" Read web page and apply commands to it. An example: contact the root page and save it: .Sp .Vb 1 \& http://example.com/~foo page: save:foo\-homepage.html .Ve .Sp In order to find the correct information from the page, other directives are usually supplied to guide the searching. .Sp 1) Adding directive \f(CW\*(C`pregexp:ARCHIVE\-REGEXP\*(C'\fR matches the A \s-1HREF\s0 links in the page. .Sp 2) Adding directive \fBnew:\fR instructs to find newer \s-1VERSIONS\s0 of the file. .Sp 3) Adding directive \f(CW\*(C`file:DOWNLOAD\-FILE\*(C'\fR tells what template to use to construct the downloadable file name. This is needed for the \&\f(CW\*(C`new:\*(C'\fR directive. .Sp 4) A directive \f(CW\*(C`vregexp:VERSION\-REGEXP\*(C'\fR matches the exact location in the page from where the version information is extracted. The default regexp looks for line that says \*(L"The latest version ... is ... N.N\*(R". The regexp must return submatch 2 for the version number. .Sp \&\s-1AN EXAMPLE\s0 .Sp Search for newer files from a \s-1HTTP\s0 directory listing. Examine page http://www.example.com/download/dir for model \f(CW\*(C`package\-1.1.tar.gz\*(C'\fR and find a newer file. E.g. \f(CW\*(C`package\-4.7.tar.gz\*(C'\fR would be downloaded. .Sp .Vb 1 \& http://www.example.com/download/dir/package\-1.1.tar.gz new: .Ve .Sp \&\s-1AN EXAMPLE\s0 .Sp Search for newer files from the content of the page. The directive \&\fBfile:\fR acts as a model for filenames to pay attention to. 
.Sp .Vb 1 \& http://www.example.com/project/download.html new: pregexp:tar.gz file:package\-1.1.tar.gz .Ve .Sp \&\s-1AN EXAMPLE\s0 .Sp Use directive \fBrename:\fR to change the filename before storing it on disk. Here, the version number is attached to the actual filename: .Sp .Vb 2 \& file.el\-1.1 \& file.el\-1.2 .Ve .Sp The directives needed would be as follows; entries have been broken to separate lines for legibility: .Sp .Vb 6 \& http://example.com/files/ \& pregexp:\e.el\-\ed \& vregexp:(file.el\-([\ed.]+)) \& file:file.el\-1.1 \& new: \& rename:s/\-[\ed.]+// .Ve .Sp This effectively reads: \*(L"See if there is a new version of something that looks like file.el\-1.1 and save it under the name file.el by deleting the extra version number at the end of the original filename\*(R". .Sp \&\s-1AN EXAMPLE\s0 .Sp Contact absolute \fBpage:\fR at http://www.example.com/package.html and search A \s-1HREF\s0 urls in the page that match \fBpregexp:\fR. In addition, do another scan and search the version number in the page from the position that matches \fBvregexp:\fR (submatch 2). .Sp After all the pieces have been found, use template \fBfile:\fR to make the retrievable file using the version number found from \fBvregexp:\fR. The actual download location is a combination of \fBpage:\fR and A \s-1HREF \&\s0\fBpregexp:\fR location. .Sp The directives needed would be as follows; entries have been broken to separate lines for legibility: .Sp .Vb 7 \& http://www.example.com/~foo/package.html \& page: \& pregexp: package.tar.gz \& vregexp: ((?i)latest.*?version.*?\eb([\ed][\ed.]+).*) \& file: package\-1.3.tar.gz \& new: \& x: .Ve .Sp An example of web page where the above would apply: .Sp .Vb 2 \& \& \& \& The latest version of package is 2.4.1 It can be \& downloaded in several forms: \& \& Tar file \& ZIP file \& \& \& .Ve .Sp For this example, assume that \f(CW\*(C`package.tar.gz\*(C'\fR is a symbolic link pointing to the latest release file \f(CW\*(C`package\-2.4.1.tar.gz\*(C'\fR.
Thus the actual download location would have been \&\f(CW\*(C`http://www.example.com/~foo/download/files/package\-2.4.1.tar.gz\*(C'\fR. .Sp Why not simply download \f(CW\*(C`package.tar.gz\*(C'\fR? Because then the program can't decide if the version at the page is newer than one stored on disk from the previous download. With version numbers in the file names, the comparison is possible. .IP "\fBpage:find\fR" 4 .IX Item "page:find" \&\s-1FIXME:\s0 This option is obsolete. Do not use. .Sp \&\s-1THIS IS FOR HTTP\s0 only. Use directive \fBregexp:\fR for \s-1FTP\s0 protocols. .Sp This is a more general instruction than the \fBpage:\fR and \fBvregexp:\fR explained above. .Sp Instruct to download every \s-1URL\s0 on \s-1HTML\s0 page matching \fBpregexp:RE\fR. In a typical situation the page maintainer lists his software in the development page. This example would download every tar.gz file in the page. Note that the \s-1REGEXP\s0 is matched against the A \s-1HREF\s0 link content, not the actual text that is displayed on the page: .Sp .Vb 1 \& http://www.example.com/index.html page:find pregexp:\e.tar.gz$ .Ve .Sp You can also use an additional \fBregexp-no:\fR directive if you want to exclude files after the \fBpregexp:\fR has matched a link. .Sp .Vb 1 \& http://www.example.com/index.html page:find pregexp:\e.tar.gz$ regexp\-no:desktop .Ve .IP "\fBpass:PASSWORD\fR" 4 .IX Item "pass:PASSWORD" For \s-1FTP\s0 logins. Default value is \f(CW\*(C`nobody@example.com\*(C'\fR. .IP "\fBpregexp:RE\fR" 4 .IX Item "pregexp:RE" Search A \s-1HREF\s0 links in page matching a regular expression. The regular expression must be a single word with no whitespace. This is incorrect: .Sp .Vb 1 \& pregexp:(this regexp ) .Ve .Sp It must be written as: .Sp .Vb 1 \& pregexp:(this\es+regexp\es) .Ve .IP "\fBprint:MESSAGE\fR" 4 .IX Item "print:MESSAGE" Print the associated message to a user requesting the matching tag name. This directive must be on a separate line inside the tag.
.Sp .Vb 1 \& tag1: linux \& \& print: this download site moved 2002\-02\-02, check your bookmarks. \& http://new.site.com/dir/file\-1.1.tar.gz new: .Ve .Sp The \f(CW\*(C`print:\*(C'\fR directive for tag is shown only if user turns on \-\-verbose mode: .Sp .Vb 1 \& pwget \-v \-T linux .Ve .IP "\fBrename:PERL\-CODE\fR" 4 .IX Item "rename:PERL-CODE" Rename each file using PERL-CODE. The PERL-CODE must be a full Perl program with no spaces anywhere. The following variables are available during the \&\fIeval()\fR of code: .Sp .Vb 3 \& $ARG = current file name \& $url = complete url for the file \& The code must return $ARG which is used for file name .Ve .Sp For example, if a page contains links to .html files that are in fact text files, the following statement would change the file extensions: .Sp .Vb 1 \& http://example.com/dir/ page:find pregexp:\e.html rename:s/html/txt/ .Ve .Sp You can also call function \f(CW\*(C`MonthToNumber($string)\*(C'\fR if the filename contains a written month name, like \f(CW\*(C`2005\-February.mbox\*(C'\fR. The function will convert the name into a number. Many mailing list archives can be downloaded cleanly this way. .Sp .Vb 2 \& # This will download SA\-Exim Mailing list archives: \& http://lists.merlins.org/archives/sa\-exim/ pregexp:\e.txt$ rename:$ARG=MonthToNumber($ARG) .Ve .Sp Here is a more complicated example: .Sp .Vb 1 \& http://www.contactor.se/~dast/svnusers/mbox.cgi pregexp:mbox.*\ed$ rename:my($y,$m)=($url=~/year=(\ed+).*month=(\ed+)/);$ARG="$y\-$m.mbox" .Ve .Sp Let's break that one apart. You may spend some time with this example since the possibilities are limitless. .Sp .Vb 2 \& 1. Connect to page \& http://www.contactor.se/~dast/svnusers/mbox.cgi \& \& 2. Search page for URLs matching regexp \*(Aqmbox.*\ed$\*(Aq. A \& found link could match hrefs like this: \& http://svn.haxx.se/users/mbox.cgi?year=2004&month=12 \& \& 3.
The found link is put into $ARG (same as $_), which can be used \& to extract a suitable mailbox name with Perl code that is \& evaluated. The resulting name must appear in $ARG. Thus the code \& effectively extracts two items from the link to form a mailbox \& name: \& \& my ($y, $m) = ( $url =~ /year=(\ed+).*month=(\ed+)/ ); \& $ARG = "$y\-$m.mbox"; \& \& => 2004\-12.mbox .Ve .Sp Just remember that the Perl code that follows the \f(CW\*(C`rename:\*(C'\fR directive \&\fBmust not\fR contain any spaces. It must all be readable as one string. .IP "\fBregexp:REGEXP\fR" 4 .IX Item "regexp:REGEXP" Get all files in the \s-1FTP\s0 directory matching the regexp. Directive \fBsave:\fR is ignored. .IP "\fBregexp\-no:REGEXP\fR" 4 .IX Item "regexp-no:REGEXP" After the \f(CW\*(C`regexp:\*(C'\fR directive has matched, exclude files that match directive \fBregexp-no:\fR. .IP "\fBRegexp:REGEXP\fR" 4 .IX Item "Regexp:REGEXP" This option is for interactive use. Retrieve all files from an \s-1HTTP\s0 or \s-1FTP\s0 site which match \s-1REGEXP.\s0 .IP "\fBsave:LOCAL\-FILE\-NAME\fR" 4 .IX Item "save:LOCAL-FILE-NAME" Save the file under this name to local disk. .IP "\fBtagN:NAME\fR" 4 .IX Item "tagN:NAME" Downloads can be grouped under \f(CW\*(C`tagN\*(C'\fR so that e.g. option \fB\-\-tag1\fR would start downloading files from that point on until the next \f(CW\*(C`tag1\*(C'\fR is found. There is currently an unlimited number of tag levels: tag1, tag2 and tag3, so that you can arrange your downloads hierarchically in the configuration file. For example, to download all Linux files that you monitor, you would give option \fB\-\-tag linux\fR. To download only the latest \s-1NT\s0 Emacs binary, you would give option \fB\-\-tag emacs-nt\fR. Notice that you do not give the \&\f(CW\*(C`level\*(C'\fR in the option; the program finds it out from the configuration file after the tag name matches. .Sp The downloading stops at the next tag of the \f(CW\*(C`same level\*(C'\fR.
That is, tag2 stops only at the next tag2, when an upper level tag (tag1) is found, or at the end of file. .Sp .Vb 1 \& tag1: linux # All Linux downloads under this category \& \& tag2: sunsite tag2: another\-name\-for\-this\-spot \& \& # List of files to download from here \& \& tag2: ftp.funet.fi \& \& # List of files to download from here \& \& tag1: emacs\-binary \& \& tag2: emacs\-nt \& \& tag2: xemacs\-nt \& \& tag2: emacs \& \& tag2: xemacs .Ve .IP "\fBx:\fR" 4 .IX Item "x:" Extract (unpack) file after download. See also options \fB\-\-unpack\fR and \&\fB\-\-no\-extract\fR. The archive file, say .tar.gz, will be extracted in the current download location (see directive \fBlcd:\fR). .Sp The unpack procedure checks the contents of the archive to see if the package is correctly formed. The de facto archive format is .Sp .Vb 1 \& package\-N.NN.tar.gz .Ve .Sp In the archive, all files are supposed to be stored under the proper subdirectory with version information: .Sp .Vb 4 \& package\-N.NN/doc/README \& package\-N.NN/doc/INSTALL \& package\-N.NN/src/Makefile \& package\-N.NN/src/some\-code.java .Ve .Sp \&\f(CW\*(C`IMPORTANT:\*(C'\fR If the archive does not have a subdirectory for all files, a subdirectory is created and all items are unpacked under it. The default subdirectory name is constructed from the archive name with the current date stamp in the format: .Sp .Vb 1 \& package\-YYYY.MMDD .Ve .Sp If the archive name contains something that looks like a version number, the directory name is constructed from it instead of the current date. .Sp .Vb 1 \& package\-1.43.tar.gz => package\-1.43 .Ve .IP "\fBxx:\fR" 4 .IX Item "xx:" Like directive \fBx:\fR but extract the archive \f(CW\*(C`as is\*(C'\fR, without checking the content of the archive. If you know that it is ok for the archive not to include any subdirectories, use this option to suppress creation of an artificial root package\-YYYY.MMDD.
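.Sp As a sketch only, the two directives might appear in a configuration file like this. The URLs and archive names are hypothetical, and the sketch assumes that \fBx:\fR and \fBxx:\fR are appended to the \s-1URL\s0 line in the same way as the other directives shown in this manual: .Sp .Vb 2 \& http://www.example.com/dir/package\-1.43.tar.gz x: \& http://www.example.com/dir/flat\-files.tar.gz xx: .Ve .Sp Under the rules described for \fBx:\fR, the first archive would end up under package\-1.43 even if it lacks a root directory of its own, while the second would be extracted as is, with no artificial root directory created.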
.IP "\fBxopt:rm\fR" 4 .IX Item "xopt:rm" This option tells the program to remove any previous unpack directory. .Sp Sometimes the files in the archive are all read-only and unpacking the archive a second time, after some period of time, would display: .Sp .Vb 2 \& tar: package\-3.9.5/.cvsignore: Could not create file: \& Permission denied \& \& tar: package\-3.9.5/BUGS: Could not create file: \& Permission denied .Ve .Sp This is not a serious error, because the archive was already on disk and tar did not overwrite previous files. It might be good to inform the archive maintainer that the files have the wrong permissions. It is customary to expect that distributed packages have the writable flag set for all files. .SH "ERRORS" .IX Header "ERRORS" Here is a list of possible error messages and how to deal with them. Turning on \fB\-\-debug\fR will help to understand how the program has interpreted the configuration file or command line options. Pay close attention to the generated output, because it may reveal that a regexp for a site is too loose or too tight. .IP "\fB\s-1ERROR\s0 {\s-1URL\-HERE\s0} Bad file descriptor\fR" 4 .IX Item "ERROR {URL-HERE} Bad file descriptor" This is a \*(L"file not found\*(R" error. You have written the filename incorrectly. Double check the configuration file's line. .SH "BUGS AND LIMITATIONS" .IX Header "BUGS AND LIMITATIONS" \&\f(CW\*(C`Sourceforge note\*(C'\fR: Downloading archive files from Sourceforge requires some trickery because of the redirections and load balancers the site uses. The Sourceforge pages have also undergone many changes during their existence. Due to these changes there exists an ugly hack in the program to use \fIwget\fR\|(1) to get certain information from the site. This could have been implemented in pure Perl, but as of now the developer hasn't had time to remove the \fIwget\fR\|(1) dependency. No doubt it is ironic to use \fIwget\fR\|(1) here. If you have Perl skills, go ahead and look at \fIUrlHttGet()\fR,
\fIUrlHttGetWget()\fR and send patches. .PP The program was initially designed to read options from one line. It is unfortunately not possible to change the program to read configuration file directives from multiple lines, e.g. by using backslashes (\e) to indicate a continued line. .SH "ENVIRONMENT" .IX Header "ENVIRONMENT" The variable \f(CW\*(C`PWGET_CFG\*(C'\fR can point to the root configuration file. The configuration file is read at startup if it exists. .PP .Vb 2 \& export PWGET_CFG=$HOME/conf/pwget.conf # /bin/bash syntax \& setenv PWGET_CFG $HOME/conf/pwget.conf # /bin/csh syntax .Ve .SH "EXIT STATUS" .IX Header "EXIT STATUS" Not defined. .SH "DEPENDENCIES" .IX Header "DEPENDENCIES" External utilities: .PP .Vb 2 \& wget(1) only needed for Sourceforge.net downloads \& see BUGS AND LIMITATIONS .Ve .PP Non-core Perl modules from \s-1CPAN:\s0 .PP .Vb 2 \& LWP::UserAgent \& Net::FTP .Ve .PP The following modules are loaded at run time only if directive \&\fBcnv:text\fR is used. Otherwise these modules are not loaded: .PP .Vb 3 \& HTML::Parse \& HTML::TextFormat \& HTML::FormatText .Ve .PP This module is loaded at run time only if the \s-1HTTPS\s0 scheme is used: .PP .Vb 1 \& Crypt::SSLeay .Ve .SH "SEE ALSO" .IX Header "SEE ALSO" \&\fIlwp\-download\fR\|(1) \&\fIlwp\-mirror\fR\|(1) \&\fIlwp\-request\fR\|(1) \&\fIlwp\-rget\fR\|(1) \&\fIwget\fR\|(1) .SH "AUTHOR" .IX Header "AUTHOR" Jari Aalto .SH "LICENSE AND COPYRIGHT" .IX Header "LICENSE AND COPYRIGHT" Copyright (C) 1996\-2016 Jari Aalto .PP This program is free software; you can redistribute and/or modify the program under the terms of the \s-1GNU\s0 General Public License, either version 2 of the License, or (at your option) any later version.