URLGRABBER(1)
NAME

urlgrabber - a high-level cross-protocol url-grabber

SYNOPSIS

urlgrabber [OPTIONS] URL [FILE]

DESCRIPTION

urlgrabber is a binary program and python module for fetching files. It is designed to be used in programs that need common (but not necessarily simple) url-fetching features.

OPTIONS
--help, -h
show a help page listing the options available to the binary
program.
--copy-local
ignored except for file:// urls, in which case it
specifies whether urlgrab should still make a copy of the file, or simply
point to the existing copy.
--throttle=NUMBER
if it's an int, it's the bytes/second throttle limit. If
it's a float, it is first multiplied by bandwidth. If throttle == 0,
throttling is disabled. If None, the module-level default (which can be set
with set_throttle) is used.
--bandwidth=NUMBER
the nominal max bandwidth in bytes/second. If throttle is
a float and bandwidth == 0, throttling is disabled. If None, the module-level
default (which can be set with set_bandwidth) is used.
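The interaction between throttle and bandwidth can be summarized in a few lines of Python. The following is a minimal illustrative sketch of the rules stated above, not urlgrabber's own code; the function name `effective_limit` is hypothetical:

```python
def effective_limit(throttle, bandwidth=0):
    """Return the effective bytes/second limit, or None when unthrottled.

    Illustrative sketch of the documented rules (not urlgrabber's code):
    - throttle == 0: throttling disabled
    - float throttle: a fraction, multiplied by bandwidth
      (disabled if bandwidth == 0)
    - int throttle: an absolute bytes/second limit
    """
    if throttle == 0:
        return None                  # throttling disabled
    if isinstance(throttle, float):
        if bandwidth == 0:
            return None              # float throttle needs a nominal bandwidth
        return throttle * bandwidth  # fraction of the nominal bandwidth
    return throttle                  # plain int: bytes/second

print(effective_limit(10000))        # absolute limit: 10000
print(effective_limit(0.5, 100000))  # half of bandwidth: 50000.0
print(effective_limit(0.5))          # no bandwidth set: None
```

So `--throttle=10000` caps transfers at 10000 bytes/second, while `--throttle=0.5 --bandwidth=100000` caps them at 50000 bytes/second.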
--range=RANGE
a tuple of the form first_byte,last_byte describing a
byte range to retrieve. Either or both of the values may be specified. If
first_byte is None, byte offset 0 is assumed. If last_byte is None, the last
byte available is assumed. Note that both first and last_byte values are
inclusive so a range of (10,11) would return the 10th and 11th bytes of the
resource.
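Because both endpoints are inclusive, the tuple maps directly onto an HTTP Range header value, which is also inclusive. The helper below is hypothetical, shown only to illustrate the semantics described above:

```python
def range_header(first_byte=None, last_byte=None):
    """Build an HTTP Range header value from an inclusive
    (first_byte, last_byte) pair, following the rules above:
    None first_byte means offset 0, None last_byte means
    'to the end of the resource'.  Hypothetical helper for
    illustration only.
    """
    first = 0 if first_byte is None else first_byte
    last = "" if last_byte is None else last_byte  # empty means "to the end"
    return "bytes=%s-%s" % (first, last)

# (10, 11) covers two bytes: the 10th and 11th of the resource
print(range_header(10, 11))    # bytes=10-11
print(range_header(None, 99))  # bytes=0-99 (the first 100 bytes)
print(range_header(500))       # bytes=500- (byte 500 to the end)
```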
--user-agent=STR
the user-agent string to provide if the url is HTTP.
--retry=NUMBER
the number of times to retry the grab before bailing. If
this is zero, it will retry forever. This was intentional... really, it was
:). If this value is not supplied, or is supplied but is None, retrying does
not occur.
--retrycodes
a sequence of errorcodes (values of e.errno) for which it
should retry. See the doc on URLGrabError for more details on this. retrycodes
defaults to -1,2,4,5,6,7 if not specified explicitly.
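The retry and retrycodes rules above can be sketched as a small loop. This is an illustrative sketch under the stated semantics, not urlgrabber's actual implementation; `grab_with_retry` is a hypothetical helper and the `URLGrabError` class here is a stand-in for the real one:

```python
class URLGrabError(Exception):
    """Stand-in for urlgrabber's URLGrabError; carries an errno."""
    def __init__(self, errno):
        super().__init__("grab failed (errno %d)" % errno)
        self.errno = errno

def grab_with_retry(fetch, retry=None, retrycodes=(-1, 2, 4, 5, 6, 7)):
    """Illustrative sketch of the documented retry rules:
    retry=None means no retrying, retry=0 retries forever,
    and only errors whose errno is in retrycodes are retried.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return fetch()
        except URLGrabError as e:
            if e.errno not in retrycodes or retry is None:
                raise                       # non-retryable, or retrying is off
            if retry != 0 and attempt >= retry:
                raise                       # out of attempts

# hypothetical fetch that fails twice with a retryable errno, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise URLGrabError(4)   # errno 4 is in the default retrycodes
    return "data"

print(grab_with_retry(flaky, retry=5))  # succeeds on the third attempt
```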
MODULE USE EXAMPLES
In its simplest form, urlgrabber can be a replacement for urllib2's open, or even python's file if you're just reading:

    from urlgrabber import urlopen
    fo = urlopen(url)
    data = fo.read()
    fo.close()
    from urlgrabber import urlgrab, urlread
    local_filename = urlgrab(url)  # grab a local copy of the file
    data = urlread(url)            # just read the data into a string
There are two reasons to avoid modifying the default grabber:

* it's a little ugly to modify the default grabber because you have to reach into the module to do it
* you could run into conflicts if different parts of the code modify the default grabber and therefore expect different behavior
    from urlgrabber.grabber import URLGrabber
    g = URLGrabber()
    data = g.urlread(url)
    from urlgrabber.grabber import URLGrabber
    g = URLGrabber(reget='simple')
    local_filename = g.urlgrab(url)
    from urlgrabber.grabber import URLGrabber
    g = URLGrabber(reget='simple')
    local_filename = g.urlgrab(url, filename=None, reget=None)
AUTHORS
Written by:

Michael D. Stenner <mstenner@linux.duke.edu>
Ryan Tomayko <rtomayko@naeblis.cx>

This manual page was written by Kevin Coyner <kevin@rustybear.com> for the Debian system (but may be used by others). It borrows heavily on the documentation included in the urlgrabber module. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.
RESOURCES
Main web site: http://linux.duke.edu/projects/urlgrabber/

04/09/2007