NAME¶
DateTime::Format::Builder - Create DateTime parser classes and objects.
SYNOPSIS¶
package DateTime::Format::Brief;
our $VERSION = '0.07';
use DateTime::Format::Builder
(
parsers => {
parse_datetime => [
{
regex => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
params => [qw( year month day hour minute second )],
},
{
regex => qr/^(\d{4})(\d\d)(\d\d)$/,
params => [qw( year month day )],
},
],
}
);
DESCRIPTION¶
DateTime::Format::Builder creates DateTime parsers. Many string formats of dates
and times are simple and just require a basic regular expression to extract
the relevant information. Builder provides a simple way to do this without
writing reams of structural code.
Builder provides a number of methods, most of which you'll never need, or at
least rarely need. They're provided more for exposing of the module's innards
to any subclasses, or for when you need to do something slightly beyond what I
expected.
TUTORIAL¶
See DateTime::Format::Builder::Tutorial.
ERROR HANDLING AND BAD PARSES¶
Often, I will speak of "undef" being returned, however that's not
strictly true.
When a simple single specification is given for a method, the method isn't given
a single parser directly. It's given a wrapper that will call
"on_fail()" if the single parser returns "undef". The
single parser must return "undef" so that a multiple parser can work
nicely and actual errors can be thrown from any of the callbacks.
Similarly, any multiple parsers will only call "on_fail()" right at
the end when it's tried all it could.
"on_fail()" (see later) is defined, by default, to throw an error.
Multiple parser specifications can also specify "on_fail" with a
coderef as an argument in the options block. This will take precedence over
the inheritable and over-ridable method.
That said, don't throw real errors from callbacks in multiple parser
specifications unless you really want parsing to stop right there and not try
any other parsers.
In summary: calling a
method will result in either a "DateTime"
object being returned or an error being thrown (unless you've overridden
"on_fail()" or "create_method()", or you've specified a
"on_fail" key to a multiple parser specification).
Individual
parsers (be they multiple parsers or single parsers) will
return either the "DateTime" object or "undef".
SINGLE SPECIFICATIONS¶
A single specification is a hash ref of instructions on how to create a parser.
The precise set of keys and values varies according to parser type. There are
some common ones though:
- •
- length is an optional parameter that can be used to
specify that this particular regex is only applicable to strings of
a certain fixed length. This can be used to make parsers more efficient.
It's strongly recommended that any parser that can use this parameter
does.
You may happily specify the same length twice. The parsers will be tried in
order of specification.
You can also specify multiple lengths by giving it an arrayref of numbers
rather than just a single scalar. If doing so, please keep the number of
lengths to a minimum.
If any specifications without lengths are given and the particular
length parser fails, then the non-length parsers are tried.
This parameter is ignored unless the specification is part of a multiple
parser specification.
- •
- label provides a name for the specification and is
passed to some of the callbacks about to mentioned.
- •
- on_match and on_fail are callbacks. Both
routines will be called with parameters of:
- •
- input, being the input to the parser (after any
preprocessing callbacks).
- •
- label, being the label of the parser, if there is
one.
- •
- self, being the object on which the method has been
invoked (which may just be a class name). Naturally, you can then invoke
your own methods on it do get information you want.
- •
- args, being an arrayref of any passed arguments, if
any. If there were no arguments, then this parameter is not given.
These routines will be called depending on whether the
regex match
succeeded or failed.
- •
- preprocess is a callback provided for cleaning up
input prior to parsing. It's given a hash as arguments with the following
keys:
- •
- input being the datetime string the parser was given
(if using multiple specifications and an overall preprocess then
this is the date after it's been through that preprocessor).
- •
- parsed being the state of parsing so far. Usually
empty at this point unless an overall preprocess was given. Items
may be placed in it and will be given to any postprocessor and
"DateTime->new" (unless the postprocessor deletes it).
- •
- self, args, label as per
on_match and on_fail.
The return value from the routine is what is given to the
regex. Note
that this is last code stop before the match.
Note: mixing
length and a
preprocess that modifies the
length of the input string is probably not what you meant to do. You probably
meant to use the
multiple parser variant of
preprocess which is
done
before any length calculations. This "single parser"
variant of
preprocess is performed
after any length
calculations.
- •
- postprocess is the last code stop before
"DateTime->new()" is called. It's given the same arguments as
preprocess. This allows it to modify the parsed parameters after
the parse and before the creation of the object. For example, you might
use:
{
regex => qr/^(\d\d) (\d\d) (\d\d)$/,
params => [qw( year month day )],
postprocess => \&_fix_year,
}
where "_fix_year" is defined as:
sub _fix_year
{
my %args = @_;
my ($date, $p) = @args{qw( input parsed )};
$p->{year} += $p->{year} > 69 ? 1900 : 2000;
return 1;
}
This will cause the two digit years to be corrected according to the cut
off. If the year was '69' or lower, then it is made into 2069 (or 2045, or
whatever the year was parsed as). Otherwise it is assumed to be 19xx. The
DateTime::Format::Mail module uses code similar to this (only it allows
the cut off to be configured and it doesn't use Builder).
Note: It is very important to return an explicit value from
the postprocess callback. If the return value is false then the
parse is taken to have failed. If the return value is true, then the parse
is taken to have succeeded and "DateTime->new()" is
called.
See the documentation for the individual parsers for their valid keys.
Parsers at the time of writing are:
- •
- DateTime::Format::Builder::Parser::Regex - provides regular
expression based parsing.
- •
- DateTime::Format::Builder::Parser::Strptime - provides
strptime based parsing.
Subroutines / coderefs as specifications.¶
A single parser specification can be a coderef. This was added mostly because it
could be and because I knew someone, somewhere, would want to use it.
If the specification is a reference to a piece of code, be it a subroutine,
anonymous, or whatever, then it's passed more or less straight through. The
code should return "undef" in event of failure (or any false value,
but "undef" is strongly preferred), or a true value in the event of
success (ideally a "DateTime" object or some object that has the
same interface).
This all said, I generally wouldn't recommend using this feature unless you have
to.
Callbacks¶
I mention a number of callbacks in this document.
Any time you see a callback being mentioned, you can, if you like, substitute an
arrayref of coderefs rather than having the straight coderef.
MULTIPLE SPECIFICATIONS¶
These are very easily described as an array of single specifications.
Note that if the first element of the array is an arrayref, then you're
specifying options.
- •
- preprocess lets you specify a preprocessor that is
called before any of the parsers are tried. This lets you do things like
strip off timezones or any unnecessary data. The most common use people
have for it at present is to get the input date to a particular length so
that the length is usable (DateTime::Format::ICal would use it to
strip off the variable length timezone).
Arguments are as for the single parser preprocess variant with
the exception that label is never given.
- •
- on_fail should be a reference to a subroutine that
is called if the parser fails. If this is not provided, the default action
is to call "DateTime::Format::Builder::on_fail", or the
"on_fail" method of the subclass of DTFB that was used to create
the parser.
EXECUTION FLOW¶
Builder allows you to plug in a fair few callbacks, which can make following how
a parse failed (or succeeded unexpectedly) somewhat tricky.
For Single Specifications¶
A single specification will do the following:
User calls parser:
my $dt = $class->parse_datetime( $string );
- 1.
- preprocess is called. It's given $string and a
reference to the parsing workspace hash, which we'll call $p. At this
point, $p is empty. The return value is used as $date for the rest of this
single parser. Anything put in $p is also used for the rest of this single
parser.
- 2.
- regex is applied.
- 3.
- If regex did not match, then on_fail
is called (and is given $date and also label if it was defined).
Any return value is ignored and the next thing is for the single parser to
return "undef".
If regex did match, then on_match is called with the
same arguments as would be given to on_fail. The return value is
similarly ignored, but we then move to step 4 rather than exiting the
parser.
- 4.
- postprocess is called with $date and a filled out
$p. The return value is taken as a indication of whether the parse was a
success or not. If it wasn't a success then the single parser will exit at
this point, returning undef.
- 5.
- "DateTime->new()" is called and the user is
given the resultant "DateTime" object.
See the section on error handling regarding the "undef"s mentioned
above.
For Multiple Specifications¶
With multiple specifications:
User calls parser:
my $dt = $class->complex_parse( $string );
- 1.
- The overall preprocessor is called and is given
$string and the hashref $p (identically to the per parser
preprocess mentioned in the previous flow).
If the callback modifies $p then a copy of $p is given to each of the
individual parsers. This is so parsers won't accidentally pollute each
other's workspace.
- 2.
- If an appropriate length specific parser is found, then it
is called and the single parser flow (see the previous section) is
followed, and the parser is given a copy of $p and the return value of the
overall preprocessor as $date.
If a "DateTime" object was returned so we go straight back to the
user.
If no appropriate parser was found, or the parser returned
"undef", then we progress to step 3!
- 3.
- Any non-length based parsers are tried in the order
they were specified.
For each of those the single specification flow above is performed, and is
given a copy of the output from the overall preprocessor.
If a real "DateTime" object is returned then we exit back to the
user.
If no parser could parse, then an error is thrown.
See the section on error handling regarding the "undef"s mentioned
above.
METHODS¶
In the general course of things you won't need any of the methods. Life often
throws unexpected things at us so the methods are all available for use.
import¶
"import()" is a wrapper for "create_class()". If you specify
the
class option (see documentation for "create_class()") it
will be ignored.
create_class¶
This method can be used as the runtime equivalent of "import()". That
is, it takes the exact same parameters as when one does:
use DateTime::Format::Builder ( blah blah blah )
That can be (almost) equivalently written as:
use DateTime::Format::Builder;
DateTime::Format::Builder->create_class( blah blah blah );
The difference being that the first is done at compile time while the second is
done at run time.
In the tutorial I said there were only two parameters at present. I lied. There
are actually three of them.
- •
- parsers takes a hashref of methods and their parser
specifications. See the tutorial above for details.
Note that if you define a subroutine of the same name as one of the methods
you define here, an error will be thrown.
- •
- constructor determines whether and how to create a
"new()" function in the new class. If given a true value, a
constructor is created. If given a false value, one isn't.
If given an anonymous sub or a reference to a sub then that is used as
"new()".
The default is 1 (that is, create a constructor using our default code which
simply creates a hashref and blesses it).
If your class defines its own "new()" method it will not be
overwritten. If you define your own "new()" and also tell
Builder to define one an error will be thrown.
- •
- verbose takes a value. If the value is undef, then
logging is disabled. If the value is a filehandle then that's where
logging will go. If it's a true value, then output will go to
"STDERR".
Alternatively, call "$DateTime::Format::Builder::verbose()" with
the relevant value. Whichever value is given more recently is adhered to.
Be aware that verbosity is a global wide setting.
- •
- class is optional and specifies the name of the
class in which to create the specified methods.
If using this method in the guise of "import()" then this field
will cause an error so it is only of use when calling as
"create_class()".
- •
- version is also optional and specifies the value to
give $VERSION in the class. It's generally not recommended unless you're
combining with the class option. A "ExtUtils::MakeMaker"
/ "CPAN" compliant version specification is much better.
In addition to creating any of the methods it also creates a "new()"
method that can instantiate (or clone) objects.
SUBCLASSING¶
In the rest of the documentation I've often lied in order to get some of the
ideas across more easily. The thing is, this module's very flexible. You can
get markedly different behaviour from simply subclassing it and overriding
some methods.
create_method¶
Given a parser coderef, returns a coderef that is suitable to be a method.
The default action is to call "on_fail()" in the event of a non-parse,
but you can make it do whatever you want.
on_fail¶
This is called in the event of a non-parse (unless you've overridden
"create_method()" to do something else.
The single argument is the input string. The default action is to call
"croak()". Above, where I've said parsers or methods throw errors,
this is the method that is doing the error throwing.
You could conceivably override this method to, say, return "undef".
USING BUILDER OBJECTS aka USERS USING BUILDER¶
The methods listed in the METHODS section are all you generally need when
creating your own class. Sometimes you may not want a full blown class to
parse something just for this one program. Some methods are provided to make
that task easier.
new¶
The basic constructor. It takes no arguments, merely returns a new
"DateTime::Format::Builder" object.
my $parser = DateTime::Format::Builder->new();
If called as a method on an object (rather than as a class method), then it
clones the object.
my $clone = $parser->new();
clone¶
Provided for those who prefer an explicit "clone()" method rather than
using "new()" as an object method.
my $clone_of_clone = $clone->clone();
parser¶
Given either a single or multiple parser specification, sets the object to have
a parser based on that specification.
$parser->parser(
regex => qr/^ (\d{4}) (\d\d) (\d\d) $/x;
params => [qw( year month day )],
);
The arguments given to "parser()" are handed directly to
"create_parser()". The resultant parser is passed to
"set_parser()".
If called as an object method, it returns the object.
If called as a class method, it creates a new object, sets its parser and
returns that object.
set_parser¶
Sets the parser of the object to the given parser.
$parser->set_parser( $coderef );
Note: this method does not take specifications. It also does not take anything
except coderefs. Luckily, coderefs are what most of the other methods produce.
The method return value is the object itself.
get_parser¶
Returns the parser the object is using.
my $code = $parser->get_parser();
parse_datetime¶
Given a string, it calls the parser and returns the "DateTime" object
that results.
my $dt = $parser->parse_datetime( "1979 07 16" );
The return value, if not a "DateTime" object, is whatever the parser
wants to return. Generally this means that if the parse failed an error will
be thrown.
If you call this function, it will throw an errror.
LONGER EXAMPLES¶
Some longer examples are provided in the distribution. These implement some of
the common parsing DateTime modules using Builder. Each of them are, or were,
drop in replacements for the modules at the time of writing them.
THANKS¶
Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing
DateTime::Format::ICal and DateTime::Format::MySQL, and some much needed
review.
Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for writing
the multilength code (both one length with multiple parsers and single parser
with multiple lengths), blame for the Regex custom constructor code, spotting
a bug in Dispatch, and more much needed review.
Kellan Elliott-McCrea (KELLAN) for even more review, suggestions,
DateTime::Format::W3CDTF and the encouragement to rewrite these docs almost
100%!
Claus Faerber (CFAERBER) for having me get around to fixing the auto-constructor
writing, providing the 'args'/'self' patch, and suggesting the
multi-callbacks.
Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now supports.
Matthew McGillis for pointing out that "on_fail" overriding should be
simpler.
Simon Cozens (SIMON) for saying it was cool.
SUPPORT¶
Support for this module is provided via the datetime@perl.org email list. See
http://lists.perl.org/ for more details.
Alternatively, log them via the CPAN RT system via the web or email:
http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DateTime%3A%3AFormat%3A%3ABuilder
bug-datetime-format-builder@rt.cpan.org
This makes it much easier for me to track things and thus means your problem is
less likely to be neglected.
LICENCE AND COPYRIGHT¶
Copyright X Iain Truskett, 2003. All rights reserved.
This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself, either Perl version 5.000 or, at your option,
any later version of Perl 5 you may have available.
The full text of the licences can be found in the
Artistic and
COPYING files included with this module, or in perlartistic and perlgpl
as supplied with Perl 5.8.1 and later.
AUTHOR¶
Originally written by Iain Truskett <spoon@cpan.org>, who died on December
29, 2003.
Maintained by Dave Rolsky <autarch@urth.org>.
SEE ALSO¶
"datetime@perl.org" mailing list.
http://datetime.perl.org/
perl, DateTime, DateTime::Format::Builder::Tutorial,
DateTime::Format::Builder::Parser