.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.40)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "URI::Fetch 3pm"
.TH URI::Fetch 3pm "2021-09-16" "perl v5.32.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
URI::Fetch \- Smart URI fetching/caching
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\&    use URI::Fetch;
\&    use Cache::File;
\&
\&    ## Simple fetch.
\&    my $res = URI::Fetch\->fetch(\*(Aqhttp://example.com/atom.xml\*(Aq)
\&        or die URI::Fetch\->errstr;
\&    do_something($res\->content) if $res\->is_success;
\&
\&    ## Fetch using specified ETag and Last\-Modified headers.
\&    $res = URI::Fetch\->fetch(\*(Aqhttp://example.com/atom.xml\*(Aq,
\&            ETag => \*(Aq123\-ABC\*(Aq,
\&            LastModified => time \- 3600,
\&    )
\&        or die URI::Fetch\->errstr;
\&
\&    ## Fetch using an on\-disk cache that URI::Fetch manages for you.
\&    my $cache = Cache::File\->new( cache_root => \*(Aq/tmp/cache\*(Aq );
\&    $res = URI::Fetch\->fetch(\*(Aqhttp://example.com/atom.xml\*(Aq,
\&            Cache => $cache
\&    )
\&        or die URI::Fetch\->errstr;
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
\&\fIURI::Fetch\fR is a smart client for fetching \s-1HTTP\s0 pages, notably
syndication feeds (\s-1RSS,\s0 Atom, and others), in a bandwidth\- and
time-saving way. It offers:
.IP "\(bu" 4
\&\s-1GZIP\s0 support
.Sp
If you have \fICompress::Zlib\fR installed, \fIURI::Fetch\fR will automatically
try to download a compressed version of the content, saving bandwidth (and
time).
.IP "\(bu" 4
\&\fILast-Modified\fR and \fIETag\fR support
.Sp
If you use a local cache (see the \fICache\fR parameter to \fIfetch\fR),
\&\fIURI::Fetch\fR will keep track of the \fILast-Modified\fR and \fIETag\fR
headers from the server, so that a page is downloaded again only if it has
been modified since the last time you checked.
.IP "\(bu" 4
Proper understanding of \s-1HTTP\s0 status codes
.Sp
Certain \s-1HTTP\s0 status codes are special, particularly when fetching
syndication feeds, and well-written clients should pay special attention to
them.
\&\fIURI::Fetch\fR can only do so much for you in this regard, but it gives you
the tools to be a well-written client.
.Sp
The response from \fIfetch\fR gives you the raw \s-1HTTP\s0 response code, along
with special handling of four codes:
.RS 4
.IP "\(bu" 4
200 (\s-1OK\s0)
.Sp
Signals that the content of a page/feed was retrieved successfully.
.IP "\(bu" 4
301 (Moved Permanently)
.Sp
Signals that a page/feed has moved permanently, and that your database of
feeds should be updated to reflect the new \s-1URI.\s0
.IP "\(bu" 4
304 (Not Modified)
.Sp
Signals that a page/feed has not changed since it was last fetched.
.IP "\(bu" 4
410 (Gone)
.Sp
Signals that a page/feed is gone and will never be coming back, so you should
stop trying to fetch it.
.RE
.RS 4
.RE
.SS "Change from 0.09"
.IX Subsection "Change from 0.09"
If you make a request using a cache and get back a 304 (Not Modified) response
code, and the content was returned from the cache, then
\&\f(CW\*(C`is_success()\*(C'\fR will return true, and
\&\f(CW\*(C`$response\->content\*(C'\fR will contain the cached content.
.PP
I think this is the right behaviour, given the philosophy of
\&\f(CW\*(C`URI::Fetch\*(C'\fR, but please let me (\s-1NEILB\s0) know if you disagree.
.SH "USAGE"
.IX Header "USAGE"
.ie n .SS "URI::Fetch\->fetch($uri, %param)"
.el .SS "URI::Fetch\->fetch($uri, \f(CW%param\fP)"
.IX Subsection "URI::Fetch->fetch($uri, %param)"
Fetches a page identified by the \s-1URI\s0 \fI\f(CI$uri\fI\fR.
.PP
On success, returns a \fIURI::Fetch::Response\fR object; on failure, returns
\&\f(CW\*(C`undef\*(C'\fR.
.PP
\&\fI\f(CI%param\fI\fR can contain:
.IP "\(bu" 4
LastModified
.IP "\(bu" 4
ETag
.Sp
\&\fILastModified\fR and \fIETag\fR can be supplied to ask the server to return
the full page only if it has changed since the last request. If you're
writing your own feed client, this is recommended practice, because it limits
both your bandwidth use and the server's.
.Sp
If you'd rather not have to store the \fILastModified\fR time and \fIETag\fR
yourself, see the \fICache\fR parameter below (and the \s-1SYNOPSIS\s0 above).
.IP "\(bu" 4
Cache
.Sp
If you'd like \fIURI::Fetch\fR to cache responses between requests, provide
the \fICache\fR parameter with an object supporting the Cache \s-1API\s0 (e.g.
\&\fICache::File\fR, \fICache::Memory\fR). Specifically, the object must support
\&\f(CW\*(C`$cache\->get($key)\*(C'\fR and
\&\f(CW\*(C`$cache\->set($key, $value, $expires)\*(C'\fR.
.Sp
If supplied, \fIURI::Fetch\fR will store the page content, ETag, and
last-modified time of the response in the cache, and will pull the content
from the cache on subsequent requests if the server returns a 304 (Not
Modified) response.
.IP "\(bu" 4
UserAgent
.Sp
Optional. You may provide your own LWP::UserAgent instance. Consider using
LWPx::ParanoidUserAgent if you're fetching URLs given to you by possibly
malicious parties.
.IP "\(bu" 4
NoNetwork
.Sp
Optional. Controls the interaction between the cache and \s-1HTTP\s0 requests
with If\-Modified\-Since/If\-None\-Match headers. Possible behaviors are:
.RS 4
.IP "false (default)" 4
.IX Item "false (default)"
If a page is in the cache, the origin \s-1HTTP\s0 server is always checked for
a fresher copy with an If-Modified-Since and/or If-None-Match header.
.ie n .IP "1" 4
.el .IP "\f(CW1\fR" 4
.IX Item "1"
If set to \f(CW1\fR, the origin \s-1HTTP\s0 server is never contacted, whether
or not the page is in the cache. If the page is missing from the cache, the
fetch method will return undef. If the page is in the cache, that page will be
returned, no matter how old it is. Note that setting this option means the
URI::Fetch::Response object will never have the http_response member set.
.ie n .IP """N"", where N > 1" 4
.el .IP "\f(CWN\fR, where N > 1" 4
.IX Item "N, where N > 1"
The origin \s-1HTTP\s0 server is not contacted \fBif\fR the page is in the cache
\&\fBand\fR the cached page was inserted in the last N seconds.
If the cached copy is older than N seconds, a normal \s-1HTTP\s0 request (full
or conditional) is made.
.RE
.RS 4
.RE
.IP "\(bu" 4
ContentAlterHook
.Sp
Optional. A subref that gets called with a scalar reference to your content
so you can modify the content before it's returned and before it's put in
the cache.
.Sp
For instance, you may want to cache only one section of an \s-1HTML\s0
document, or you may want to take a feed \s-1URL\s0 and cache only a
pre-parsed version of it. If you modify the scalarref given to your hook and
change it into a hashref, scalarref, or some blessed object, that same value
will be returned to you later on not-modified responses.
.IP "\(bu" 4
CacheEntryGrep
.Sp
Optional. A subref that gets called with the \fIURI::Fetch::Response\fR object
about to be cached (with the contents already possibly transformed by your
\&\f(CW\*(C`ContentAlterHook\*(C'\fR). If your subref returns true, the page goes
into the cache. If false, it doesn't.
.IP "\(bu" 4
Freeze
.IP "\(bu" 4
Thaw
.Sp
Optional. Subrefs that get called to serialize and deserialize, respectively,
the data that will be cached. The cached data should be assumed to be an
arbitrary Perl data structure, containing (potentially) references to arrays,
hashes, etc.
.Sp
Freeze should serialize the structure into a scalar; Thaw should deserialize
the scalar into a data structure.
.Sp
By default, \fIStorable\fR will be used for freezing and thawing the cached
data structure.
.IP "\(bu" 4
ForceResponse
.Sp
Optional. A boolean that indicates a \fIURI::Fetch::Response\fR should be
returned regardless of the \s-1HTTP\s0 status. By default \f(CW\*(C`undef\*(C'\fR is
returned when a response is not a \*(L"success\*(R" (2xx codes) or one of the
recognized \s-1HTTP\s0 status codes listed above. The \s-1HTTP\s0 status message
can then be retrieved using the \f(CW\*(C`errstr\*(C'\fR method on the class.
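.PP
The \fICache\fR parameter accepts any object with \f(CW\*(C`get\*(C'\fR and
\&\f(CW\*(C`set\*(C'\fR methods, and \fIFreeze\fR/\fIThaw\fR default to
\&\fIStorable\fR. The following is a minimal sketch of what such a cache and
serializer pair might look like. It is illustrative only: the class name
\&\f(CW\*(C`My::MemoryCache\*(C'\fR is hypothetical, reading \f(CW$expires\fR as a
number of seconds is an assumption (cache libraries differ), and the hash
stored in the cache is a made-up example, not the entry layout
\&\fIURI::Fetch\fR actually uses.
.PP
.Vb 3
\&    use strict;
\&    use warnings;
\&    use Storable qw(nfreeze thaw);
\&
\&    # Hypothetical class: a minimal in\-process cache that provides only
\&    # the two methods the Cache parameter requires.
\&    package My::MemoryCache;
\&
\&    sub new {
\&        my ($class) = @_;
\&        return bless { entries => {} }, $class;
\&    }
\&
\&    # set($key, $value, $expires). Assumption: $expires, if given, is a
\&    # number of seconds from now; undef means the entry never expires.
\&    sub set {
\&        my ($self, $key, $value, $expires) = @_;
\&        my $deadline = defined $expires ? time + $expires : undef;
\&        $self\->{entries}{$key} = { value => $value, deadline => $deadline };
\&        return $value;
\&    }
\&
\&    # get($key) returns the stored value, or undef on a miss. Expired
\&    # entries are deleted and reported as misses.
\&    sub get {
\&        my ($self, $key) = @_;
\&        my $entry = $self\->{entries}{$key} or return;
\&        if (defined $entry\->{deadline} && time >= $entry\->{deadline}) {
\&            delete $self\->{entries}{$key};
\&            return;
\&        }
\&        return $entry\->{value};
\&    }
\&
\&    package main;
\&
\&    # Freeze/Thaw pair built on Storable, like the default: nfreeze turns
\&    # a reference into a portable byte string, thaw rebuilds the structure.
\&    my $freeze = sub { nfreeze($_[0]) };
\&    my $thaw   = sub { thaw($_[0]) };
\&
\&    # Round trip. The hash layout here is made up for illustration.
\&    my $cache = My::MemoryCache\->new;
\&    my $uri   = "http://example.com/atom.xml";
\&    $cache\->set($uri, $freeze\->({ ETag => "123\-ABC", Content => "<feed/>" }), 3600);
\&    my $restored = $thaw\->($cache\->get($uri));
\&    print $restored\->{ETag}, "\en";    # prints 123\-ABC
.Ve
.PP
Objects like these would then be passed to \fIfetch\fR as
\&\f(CW\*(C`Cache => $cache, Freeze => $freeze, Thaw => $thaw\*(C'\fR. Because this
cache lives in process memory, its entries are lost when the program exits;
\&\fICache::File\fR (as in the \s-1SYNOPSIS\s0) keeps them on disk instead.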
.SH "REPOSITORY"
.IX Header "REPOSITORY"
.SH "LICENSE"
.IX Header "LICENSE"
\&\fIURI::Fetch\fR is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.
.SH "AUTHOR & COPYRIGHT"
.IX Header "AUTHOR & COPYRIGHT"
Except where otherwise noted, \fIURI::Fetch\fR is Copyright 2004 Benjamin
Trott, ben+cpan@stupidfool.org. All rights reserved.
.PP
Currently maintained by Neil Bowers.
.SH "CONTRIBUTORS"
.IX Header "CONTRIBUTORS"
.IP "\(bu" 4
Tim Appnel
.IP "\(bu" 4
Mario Domgoergen
.IP "\(bu" 4
Karen Etheridge
.IP "\(bu" 4
Brad Fitzpatrick
.IP "\(bu" 4
Jason Hall
.IP "\(bu" 4
Naoya Ito
.IP "\(bu" 4
Tatsuhiko Miyagawa