
Scrappy::Scraper::Control(3pm) User Contributed Perl Documentation Scrappy::Scraper::Control(3pm)

NAME

Scrappy::Scraper::Control - Scrappy HTTP Request Constraints System

VERSION

version 0.94112090

SYNOPSIS

    #!/usr/bin/perl
    use Scrappy::Scraper::Control;
    my $control = Scrappy::Scraper::Control->new;

    $control->allow('http://search.cpan.org');
    $control->allow('http://search.cpan.org', if => {
        content_type => ['text/html', 'application/x-tar']
    });

    $control->restrict('http://www.cpan.org');

    if ($control->is_allowed('http://search.cpan.org/')) {
        ...
    }
        
    # Constraints are only checked if the is_allowed method is
    # passed an HTTP::Response object.

DESCRIPTION

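Scrappy::Scraper::Control provides HTTP request access control for the Scrappy framework. URLs can be allowed or restricted by domain, optionally subject to constraints such as content type.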
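
To illustrate the idea of domain-based access control, here is a minimal, self-contained sketch in core Perl. It is not the module's implementation: the helper host_of, the %allowed and %restricted hashes, and the resolution policy (a restricted host is always refused; once any host is allowed, only allowed hosts pass) are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the allowed/restricted attributes, keyed by host.
my %allowed;
my %restricted;

# Hypothetical helper: extract the lowercased host from a URL with a regex.
# A real implementation would more likely rely on the URI module.
sub host_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i;
    return defined $host ? lc $host : undef;
}

# Record a host as allowed, storing any constraints passed via "if".
sub allow {
    my ($url, %options) = @_;
    my $host = host_of($url) or return;
    delete $restricted{$host};
    $allowed{$host} = $options{if} || {};
    return 1;
}

# Record a host as restricted, storing any constraints passed via "if".
sub restrict {
    my ($url, %options) = @_;
    my $host = host_of($url) or return;
    delete $allowed{$host};
    $restricted{$host} = $options{if} || {};
    return 1;
}

# Assumed policy: restricted hosts are refused; if any host is allowed,
# only allowed hosts pass; with no rules at all, everything passes.
sub is_allowed {
    my ($url) = @_;
    my $host = host_of($url) or return 0;
    return 0 if exists $restricted{$host};
    return (exists $allowed{$host} ? 1 : 0) if %allowed;
    return 1;
}

allow('http://search.cpan.org');
restrict('http://www.cpan.org');

print is_allowed('http://search.cpan.org/'), "\n";   # prints 1
print is_allowed('http://www.cpan.org'), "\n";       # prints 0
```

Keying the rules by host rather than by full URL matches the shape of the hashrefs shown under ATTRIBUTES, where each entry is a domain name.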

ATTRIBUTES

The following is a list of object attributes available with every Scrappy::Scraper::Control instance.

allowed

The allowed attribute holds a hashref of allowed domains and their constraints.

    my $control = Scrappy::Scraper::Control->new;
    my $allowed = $control->allowed;

    # $allowed is a hashref keyed by domain, e.g.
    #
    # {
    #     'www.foobar.com' => {
    #         methods      => [qw/GET POST PUSH PUT DELETE/],
    #         content_type => ['text/html']
    #     }
    # }
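
As a rough illustration of how a constraint entry shaped like the example above might be evaluated, the following core-Perl sketch checks a request method and a response content type against such a hashref. The function meets_constraints and its matching rules (case-insensitive comparison, Content-Type parameters ignored) are hypothetical and are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the method and content type satisfy
# the constraint hashref, 0 otherwise. Missing keys impose no constraint.
sub meets_constraints {
    my ($constraints, $method, $content_type) = @_;

    if (my $methods = $constraints->{methods}) {
        return 0 unless grep { uc $_ eq uc $method } @$methods;
    }

    if (my $types = $constraints->{content_type}) {
        # Drop parameters such as "; charset=UTF-8" before comparing.
        (my $type = lc $content_type) =~ s/\s*;.*\z//s;
        return 0 unless grep { lc $_ eq $type } @$types;
    }

    return 1;
}

my $constraints = {
    methods      => [qw/GET POST/],
    content_type => ['text/html'],
};

print meets_constraints($constraints, 'GET', 'text/html; charset=UTF-8'), "\n";  # prints 1
print meets_constraints($constraints, 'DELETE', 'text/html'), "\n";              # prints 0
```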

restricted

The restricted attribute holds a hashref of restricted domains and their constraints.

    my $control    = Scrappy::Scraper::Control->new;
    my $restricted = $control->restricted;

    # $restricted is a hashref keyed by domain, e.g.
    #
    # {
    #     'www.foobar.com' => {
    #         methods => [qw/GET POST PUSH PUT DELETE/]
    #     }
    # }

METHODS

allow

    my $control = Scrappy::Scraper::Control->new;

    $control->allow('http://www.perl.org');
    $control->allow('http://search.cpan.org', if => {
        content_type => ['text/html', 'application/x-tar']
    });

restrict

    my $control = Scrappy::Scraper::Control->new;

    $control->restrict('http://www.perl.org');
    $control->restrict('http://search.cpan.org', if => {
        content_type => ['text/html', 'application/x-tar']
    });

is_allowed

    my $control = Scrappy::Scraper::Control->new;

    $control->allow('http://search.cpan.org');
    $control->restrict('http://www.perl.org');

    if (! $control->is_allowed('http://perl.org')) {
        die "Can't get to perl.org";
    }

AUTHOR

Al Newkirk <awncorp@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2010 by awncorp.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

2022-06-17 perl v5.34.0