NAME¶
Lire::W3CExtendedLog - Base implementation of a W3C Extended Log parser
SYNOPSIS¶
use Lire::W3CExtendedLog;
my $parser = Lire::W3CExtendedLog->new();
my $w3c_rec = $parser->parse( $line );
DESCRIPTION¶
This module defines objects able to parse W3C Extended Log Format. This log
format is defined at
http://www.w3.org/TR/WD-logfile.html
All attributes of the created object can be overridden, e.g. by modules
extending the object. The attributes are:
type2regex¶
type2regex is a hash containing key-value pairs like
'name' => '([-_.0-9a-zA-Z]+)'
Keys are all data formats for log file field entries as defined in the W3C
specification: 'integer', 'fixed', 'uri', 'date', 'time' and 'string', along
with 'name' and 'address' types.
identifier2type¶
identifier2type is a hash containing key-value pairs like
'dns' => 'name',
'uri-query' => 'uri',
'ip' =>
Keys are the W3C defined Field identifiers, with their prefixes stripped off.
field2re¶
field2re is a subroutine; when called as
$self->{field2re('c-ip')}
it will return e.g.
'(\d+\.\d+\.\d+\.\d+|-)'
Arguments are taken as found in the Fields directive, so in an ideal world
they are identifiers. It uses type2regex.
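The lookup chain from field name to regex can be sketched as follows. This is a minimal, self-contained illustration and not the module's actual code: the hash contents (including the 'ip' and 'uri' entries), the prefix-stripping pattern and the name field2re_sketch are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical, trimmed-down versions of the two lookup tables.
my %type2regex = (
    name    => '([-_.0-9a-zA-Z]+)',
    address => '(\d+\.\d+\.\d+\.\d+|-)',   # assumed from the c-ip example
    uri     => '(\S+)',                    # assumed, for illustration only
);
my %identifier2type = (
    'dns'       => 'name',
    'uri-query' => 'uri',
    'ip'        => 'address',              # assumed mapping
);

# Map a Fields entry such as 'c-ip' to a capturing regex string.
sub field2re_sketch {
    my ($field) = @_;
    # Strip one of the W3C prefixes (c, s, r, cs, sc, sr, rs, x) and its dash.
    (my $id = $field) =~ s/^(?:cs|sc|sr|rs|c|s|r|x)-//;
    my $type = $identifier2type{$id};
    return defined $type ? $type2regex{$type} : undef;
}

print field2re_sketch('c-ip'), "\n";
```

Keeping the two tables separate means an extending module can, under these assumptions, add a new identifier by pointing it at an existing type without writing a new regex.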
field2decoder¶
field2decoder is a subroutine; it returns one of \&uri_decode,
\&string_decode or undef, depending on, among other things, is_iis. It is
used by build_parser.
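To illustrate what such decoders might do, here is a hedged sketch. The three function names are hypothetical, the decoding rules are assumptions (percent-decoding for URIs, and the quoted-string form of the W3C draft for strings), and the selection rule deliberately ignores is_iis, whose exact effect is not described here.

```perl
use strict;
use warnings;

# Percent-decode a URI value (assumed behaviour of a URI decoder).
sub uri_decode_sketch {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

# Strip the surrounding double quotes and collapse doubled quotes,
# following the quoted <string> form of the W3C draft.
sub string_decode_sketch {
    my ($value) = @_;
    $value =~ s/^"(.*)"$/$1/s;
    $value =~ s/""/"/g;
    return $value;
}

# Choose a decoder by data type; no decoder means "use the value as is".
sub field2decoder_sketch {
    my ($type) = @_;
    return \&uri_decode_sketch    if $type eq 'uri';
    return \&string_decode_sketch if $type eq 'string';
    return;
}

my $decode = field2decoder_sketch('uri');
print $decode->('/docs/my%20file.html'), "\n";
```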
parse¶
parse is the preferred interface to this module. It expects a line as its
argument and either returns a reference to a hash (like &w3c_parser) or
executes &parse_directive.
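The dispatch between records and directives can be sketched like this. It is a simplified stand-in, not the module's code: it assumes that directive lines are recognised by a leading '#', as in the W3C draft, and the name parse_sketch and the two handler arguments are hypothetical.

```perl
use strict;
use warnings;

# Dispatch one input line: directives start with '#', anything else is a record.
# Both handlers are passed in as code refs; they are placeholders.
sub parse_sketch {
    my ($line, $on_directive, $on_record) = @_;
    chomp $line;
    if ($line =~ /^#/) {
        $on_directive->($line);
        return;                      # a directive yields no record
    }
    return $on_record->($line);
}

my @directives;
my $handle_directive = sub { push @directives, $_[0] };
my $handle_record    = sub { return { raw => $_[0] } };

for my $line ("#Version: 1.0\n", "2002-01-15 12:00:01 10.0.0.1\n") {
    my $rec = parse_sketch($line, $handle_directive, $handle_record);
    print defined $rec ? "record: $rec->{raw}\n" : "directive handled\n";
}
```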
parse_directive¶
parse_directive expects a directive as its argument and fills the object's
attributes from it.
w3c_parser¶
w3c_parser is a subroutine; it expects a log line as argument and returns
a reference to a hash, mapping $self->{'fields'} entries to their decoded
values. It uses the &field2re and &field2decoder routines. It is built in
build_parser.
build_parser¶
build_parser is a subroutine; it builds and returns &w3c_parser. It is
called in &parse_directive.
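The idea of building a parser closure from a Fields string can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: build_parser_sketch and the toy regex lookup are hypothetical, fields are assumed to be whitespace-separated, and the decoding step is left out for brevity.

```perl
use strict;
use warnings;

# Build a closure that turns one log line into a hash keyed by field name.
# $fields is the string from a Fields directive; $field2re is a code ref
# mapping a field name to a regex string with exactly one capture group.
sub build_parser_sketch {
    my ($fields, $field2re) = @_;
    my @names = split ' ', $fields;
    my $body  = join '\s+', map { $field2re->($_) } @names;
    my $re    = qr/^$body\s*$/;

    return sub {
        my ($line) = @_;
        my @values = $line =~ $re
            or return;               # line does not match the declared fields
        my %rec;
        @rec{@names} = @values;
        return \%rec;
    };
}

# Toy lookup: an IPv4 address or '-' for c-ip, any non-space run otherwise.
my $lookup = sub {
    my ($name) = @_;
    return $name eq 'c-ip' ? '(\d+\.\d+\.\d+\.\d+|-)' : '(\S+)';
};

my $parser = build_parser_sketch('date time c-ip cs-uri-stem', $lookup);
my $rec    = $parser->('2002-01-15 12:00:01 10.0.0.1 /index.html');
print "$_ = $rec->{$_}\n" for sort keys %$rec;
```

Compiling the regex once per Fields directive, rather than once per line, is the main reason to build the parser as a closure in this sketch.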
log_date and log_time¶
log_date and log_time contain strings constructed from the Date directive.
version and software¶
version and software contain strings constructed from the Version and
Software directives, respectively.
fields¶
fields contains the entire string from the Fields directive.
is_iis¶
is_iis is set when the Software directive contains 'Microsoft Internet' as a
substring. It is used to enable IIS-specific support.
tab_sep¶
tab_sep is set when tabs are found in the Fields directive. We assume tabs
will then be used in the log itself too, and allow unescaped spaces in the
log.
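How the directive-derived attributes above might be filled can be sketched as follows. The hash keys mirror the documented attribute names, but the function name parse_directive_sketch, the directive regex and the storage in a plain hash are assumptions; the split of the Date directive into a date and a time part follows the W3C draft.

```perl
use strict;
use warnings;

# Fill a plain hash from one directive line. Returns nothing for lines
# that are not of the form "#Name: value".
sub parse_directive_sketch {
    my ($state, $line) = @_;
    my ($name, $value) = $line =~ /^#([\w-]+):\s*(.*?)\s*$/
        or return;

    if ($name eq 'Version') {
        $state->{version} = $value;
    }
    elsif ($name eq 'Software') {
        $state->{software} = $value;
        $state->{is_iis}   = index($value, 'Microsoft Internet') >= 0 ? 1 : 0;
    }
    elsif ($name eq 'Date') {
        # The W3C draft writes this directive as "<date> <time>".
        @{$state}{qw(log_date log_time)} = split ' ', $value, 2;
    }
    elsif ($name eq 'Fields') {
        $state->{fields}  = $value;
        $state->{tab_sep} = $value =~ /\t/ ? 1 : 0;
    }
    return $state;
}

my %state;
parse_directive_sketch(\%state, $_) for (
    '#Version: 1.0',
    '#Software: Microsoft Internet Information Services 5.0',
    '#Date: 2002-01-15 12:00:00',
    "#Fields: date\ttime\tc-ip",
);
print "$_ = $state{$_}\n" for sort keys %state;
```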
Summarizing:
&parse --calls--> &parse_directive
       `--calls--> &w3c_parser
&parse_directive --calls--> &build_parser
&build_parser --calls--> &field2decoder
              `--calls--> &field2re
              `--returns--> &w3c_parser
&field2decoder --returns--> &uri_decode, &string_decode
&field2re --uses--> %type2regex
          `--uses--> %identifier2type
BUILDING INHERITING MODULES¶
FIXME. Needs to be written. Steal from w3c_extended2dlf's
Lire::WWW::ExtendedLog, which ISA Lire::W3CExtendedLog.
SEE ALSO¶
w3c_extended2dlf(1),
ms_isa2dlf(1)
AUTHOR¶
Francis J. Lacoste <flacoste@logreport.org>
VERSION¶
$Id: W3CExtendedLog.pm,v 1.18 2006/07/23 13:16:30 vanbaal Exp $
COPYRIGHT¶
Copyright (C) 2001-2002 Stichting LogReport Foundation LogReport@LogReport.org
This file is part of Lire.
Lire is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program (see COPYING); if not, check with
http://www.gnu.org/copyleft/gpl.html.