.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.42)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Statistics::R::IO::Parser 3pm"
.TH Statistics::R::IO::Parser 3pm "2022-02-10" "perl v5.34.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Statistics::R::IO::Parser \- Functions for parsing R data files
.SH "VERSION"
.IX Header "VERSION"
version 1.0002
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 3
\&    use feature \*(Aqsay\*(Aq;
\&    use Statistics::R::IO::ParserState;
\&    use Statistics::R::IO::Parser;
\&
\&    my $state = Statistics::R::IO::ParserState\->new(
\&        data => \*(Aqfile.rds\*(Aq
\&    );
\&    say $state\->at;
\&    say $state\->next\->at;
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
You shouldn't create instances of this class; it exists mainly to
handle deserialization of R data files by the \f(CW\*(C`IO\*(C'\fR classes.
.SH "FUNCTIONS"
.IX Header "FUNCTIONS"
This library is inspired by monadic parser frameworks from the
Haskell world, like Packrat or
Parsec. What this means
is that \fIparsers\fR are constructed by combining simpler parsers.
.PP
The library offers a selection of basic parsers and combinators.
Each of these is a function (think of it as a factory) that returns
another function (the actual parser) which receives the current
parsing state (Statistics::R::IO::ParserState) as the argument
and returns a two-element array reference (called for brevity \*(L"a
pair\*(R" in the following text) with the result of the parser in the
first element and the new parser state in the second element. If the
\&\fIparser\fR fails, say if the current state is \*(L"a\*(R" where a number is
expected, it returns \f(CW\*(C`undef\*(C'\fR to signal failure.
.PP
The descriptions of individual functions below use a shorthand
because the above mechanism is implied. Thus, when \f(CW\*(C`any_char\*(C'\fR is
described as \*(L"parses any character\*(R", it really means that calling
\&\f(CW\*(C`any_char\*(C'\fR will return a function that when called with the current
state will return \*(L"a pair of the character...\*(R", etc.
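.PP
As a minimal sketch of this calling convention, the fragment below
builds a parser with \f(CW\*(C`any_char\*(C'\fR and applies it to a state. It assumes
that the functions can be imported by name and that \f(CW\*(C`data\*(C'\fR accepts the
content to be parsed as a plain string; neither is stated above, so
treat both as assumptions.
.PP
.Vb 5
\&    use strict;
\&    use warnings;
\&    use Statistics::R::IO::ParserState;
\&    # Assumption: the parser functions are importable by name.
\&    use Statistics::R::IO::Parser qw(any_char);
\&
\&    # Assumption: \*(Aqdata\*(Aq holds the raw content to parse.
\&    my $state = Statistics::R::IO::ParserState\->new(data => \*(Aqabc\*(Aq);
\&
\&    my $parser = any_char();            # the factory returns the parser
\&    my $pair   = $parser\->($state);     # apply it to the current state
\&    if (defined $pair) {
\&        my ($char, $next_state) = @$pair;
\&        print "parsed: $char\en";
\&    }
\&    else {
\&        print "parser failed\en";       # undef signals failure
\&    }
.Ve
.PP
Checking the returned pair with \f(CW\*(C`defined\*(C'\fR before unpacking it keeps
a failed parse from being dereferenced by mistake.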
.SS "\s-1CHARACTER PARSERS\s0"
.IX Subsection "CHARACTER PARSERS"
.IP "any_char" 4
.IX Item "any_char"
Parses any character, returning a pair of the character at the current
State's position and the new state, advanced by one from the starting
state. If the state is at the end (\f(CW\*(C`$state\-\*(C'\fReof> is true), returns
undef to signal failure.
.ie n .IP "char $c" 4
.el .IP "char \f(CW$c\fR" 4
.IX Item "char $c"
Parses the given character \f(CW$c\fR, returning a pair of the character at
the current State's position if it is equal to \f(CW$c\fR and the new
state, advanced by one from the starting state. If the state is at the
end (\f(CW\*(C`$state\->eof\*(C'\fR is true) or the character at the current position
is not \f(CW$c\fR, returns undef to signal failure.
.ie n .IP "string $s" 4
.el .IP "string \f(CW$s\fR" 4
.IX Item "string $s"
Parses the given string \f(CW$s\fR, returning a pair of the sequence of
characters starting at the current State's position if it is equal to
\&\f(CW$s\fR and the new state, advanced by \f(CW\*(C`length($s)\*(C'\fR from the starting
state. If the state is at the end (\f(CW\*(C`$state\->eof\*(C'\fR is true) or the
string starting at the current position is not \f(CW$s\fR, returns undef to
signal failure.
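.PP
The character parsers might be exercised as in the following sketch,
which again assumes that the functions are importable by name and
that \f(CW\*(C`data\*(C'\fR takes the content as a string. It shows one successful
match with \f(CW\*(C`string\*(C'\fR and one failed match with \f(CW\*(C`char\*(C'\fR, both applied to
the same starting state.
.PP
.Vb 5
\&    use strict;
\&    use warnings;
\&    use Statistics::R::IO::ParserState;
\&    # Assumption: the parser functions are importable by name.
\&    use Statistics::R::IO::Parser qw(char string);
\&
\&    my $state = Statistics::R::IO::ParserState\->new(data => \*(Aqfoobar\*(Aq);
\&
\&    # string() matches a whole prefix and advances by its length.
\&    my $ok = string(\*(Aqfoo\*(Aq)\->($state);
\&    print "next character: ", $ok\->[1]\->at, "\en" if defined $ok;
\&
\&    # char() returns undef because the first character is not \*(Aqx\*(Aq.
\&    my $bad = char(\*(Aqx\*(Aq)\->($state);
\&    print "no match for x\en" unless defined $bad;
.Ve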
.SS "\s-1NUMBER PARSERS\s0"
.IX Subsection "NUMBER PARSERS"
.IP "endianness [$end]" 4
.IX Item "endianness [$end]"
When the \f(CW$end\fR argument is given, this function sets the byte
order used by parsers in the module to be little\- (when \f(CW$end\fR is
\&\*(L"<\*(R") or big-endian (\f(CW$end\fR is \*(L">\*(R"). This function changes
the \fBmodule's\fR state and remains in effect until the next change.
.Sp
When called with no arguments, \f(CW\*(C`endianness\*(C'\fR returns the current
byte order in effect. The starting byte order is big-endian.
.IP "any_uint8, any_uint16, any_uint24, any_uint32" 4
.IX Item "any_uint8, any_uint16, any_uint24, any_uint32"
Parses an 8\-, 16\-, 24\-, or 32\-bit \fIunsigned\fR integer, returning a
pair of the integer starting at the current State's position and the
new state, advanced by 1, 2, 3, or 4 bytes from the starting state,
depending on the parser. The integer value is determined by the
current value of \f(CW\*(C`endianness\*(C'\fR. If there are not enough elements left
in the data from the current position, returns undef to signal
failure.
.ie n .IP "uint8 $n, uint16 $n, uint24 $n, uint32 $n" 4
.el .IP "uint8 \f(CW$n\fR, uint16 \f(CW$n\fR, uint24 \f(CW$n\fR, uint32 \f(CW$n\fR" 4
.IX Item "uint8 $n, uint16 $n, uint24 $n, uint32 $n"
Parses the specified 8\-, 16\-, 24\-, or 32\-bit \fIunsigned\fR integer
\&\f(CW$n\fR, returning a pair of the integer at the current State's
position if it is equal to \f(CW$n\fR and the new state. The new state is
advanced by 1, 2, 3, or 4 bytes from the starting state, depending
on the parser. The integer value is determined by the current value
of \f(CW\*(C`endianness\*(C'\fR. If there are not enough elements left in the data
from the current position or the integer at the current position is
not \f(CW$n\fR, returns undef to signal failure.
.IP "any_int8, any_int16, any_int24, any_int32" 4
.IX Item "any_int8, any_int16, any_int24, any_int32"
Parses an 8\-, 16\-, 24\-, or 32\-bit \fIsigned\fR integer, returning a pair
of the integer starting at the current State's position and the new
state, advanced by 1, 2, 3, or 4 bytes from the starting state,
depending on the parser. The integer value is determined by the
current value of \f(CW\*(C`endianness\*(C'\fR. If there are not enough elements left
in the data from the current position, returns undef to signal
failure.
.ie n .IP "int8 $n, int16 $n, int24 $n, int32 $n" 4
.el .IP "int8 \f(CW$n\fR, int16 \f(CW$n\fR, int24 \f(CW$n\fR, int32 \f(CW$n\fR" 4
.IX Item "int8 $n, int16 $n, int24 $n, int32 $n"
Parses the specified 8\-, 16\-, 24\-, or 32\-bit \fIsigned\fR integer
\&\f(CW$n\fR, returning a pair of the integer at the current State's
position if it is equal to \f(CW$n\fR and the new state. The new state is
advanced by 1, 2, 3, or 4 bytes from the starting state, depending
on the parser. The integer value is determined by the current value
of \f(CW\*(C`endianness\*(C'\fR. If there are not enough elements left in the data
from the current position or the integer at the current position is
not \f(CW$n\fR, returns undef to signal failure.
.IP "any_real32, any_real64" 4
.IX Item "any_real32, any_real64"
Parses a 32\- or 64\-bit real number, returning a pair of the number
starting at the current State's position and the new state, advanced
by 4 or 8 bytes from the starting state, depending on the parser. The
real value is determined by the current value of \f(CW\*(C`endianness\*(C'\fR. If
there are not enough elements left in the data from the current
position, returns undef to signal failure.
.IP "any_int32_na, any_real64_na" 4
.IX Item "any_int32_na, any_real64_na"
Parses a 32\-bit \fIsigned\fR integer or 64\-bit real number, respectively,
but recognizing R\-style missing values (NAs): \s-1INT_MIN\s0 for integers and
a special NaN bit pattern for reals. Returns a pair of the number
value (\f(CW\*(C`undef\*(C'\fR for an \s-1NA\s0) and the new state, advanced by 4 or 8 bytes
from the starting state, depending on the parser. If there are not
enough elements left in the data from the current position, returns
undef to signal failure.
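.PP
The interaction between \f(CW\*(C`endianness\*(C'\fR and the number parsers can be
sketched as follows. The example assumes that the functions are
importable by name and that each character of \f(CW\*(C`data\*(C'\fR is treated as
one byte; the input bytes are built with Perl's \f(CW\*(C`pack\*(C'\fR. It reads the
same two bytes under both byte orders and then shows how an \s-1NA\s0 differs
from a failed parse.
.PP
.Vb 5
\&    use strict;
\&    use warnings;
\&    use Statistics::R::IO::ParserState;
\&    # Assumption: the parser functions are importable by name.
\&    use Statistics::R::IO::Parser qw(endianness any_uint16 any_int32_na);
\&
\&    # Two bytes, 0x01 0x02, packed in network (big\-endian) order.
\&    my $state = Statistics::R::IO::ParserState\->new(
\&        data => pack(\*(Aqn\*(Aq, 0x0102)
\&    );
\&
\&    endianness(\*(Aq>\*(Aq);                    # big\-endian, the starting order
\&    my $big = any_uint16()\->($state);
\&
\&    endianness(\*(Aq<\*(Aq);                    # module\-wide switch to little\-endian
\&    my $little = any_uint16()\->($state);
\&
\&    endianness(\*(Aq>\*(Aq);                    # restore the order for later parsers
\&    print "big: $big\->[0], little: $little\->[0]\en";
\&
\&    # INT_MIN (0x80000000) is the integer NA described above.
\&    my $na_state = Statistics::R::IO::ParserState\->new(
\&        data => pack(\*(AqN\*(Aq, 0x80000000)
\&    );
\&    my $na = any_int32_na()\->($na_state);
\&    # The pair is defined because the parse succeeded; only its value is undef.
\&    print "NA value\en" if defined $na && !defined $na\->[0];
.Ve
.PP
Because \f(CW\*(C`endianness\*(C'\fR changes the module's state, restoring the previous
byte order after a temporary switch avoids surprising later parsers.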
.SS "\s-1SEQUENCING\s0"
.IX Subsection "SEQUENCING"
.ie n .IP "seq $p1, ..." 4
.el .IP "seq \f(CW$p1\fR, ..." 4
.IX Item "seq $p1, ..."
This combinator applies parsers \f(CW$p1\fR, ... in sequence, using the
returned parse state of \f(CW$p1\fR as the input parse state to \f(CW$p2\fR,
etc. Returns a pair of the concatenation of all the parsers'
results and the parsing state returned by the final parser. If any
of the parsers returns undef, \f(CW\*(C`seq\*(C'\fR will return it immediately
without attempting to apply any further parsers.
.ie n .IP "many_till $p, $end" 4
.el .IP "many_till \f(CW$p\fR, \f(CW$end\fR" 4
.IX Item "many_till $p, $end"
This combinator applies a parser \f(CW$p\fR until parser \f(CW$end\fR succeeds.
It does this by alternating applications of \f(CW$end\fR and \f(CW$p\fR; once
\&\f(CW$end\fR succeeds, the function returns the concatenation of results of
preceding applications of \f(CW$p\fR. (Thus, if \f(CW$end\fR succeeds
immediately, the 'result' is an empty list.) Otherwise, \f(CW$p\fR is
applied and must succeed, and the procedure repeats. Returns a pair of
the concatenation of all the \f(CW$p\fR's results and the parsing state
returned by the final parser. If any application of \f(CW$p\fR returns
undef, \f(CW\*(C`many_till\*(C'\fR will return it immediately.
.ie n .IP "count $n, $p" 4
.el .IP "count \f(CW$n\fR, \f(CW$p\fR" 4
.IX Item "count $n, $p"
This combinator applies the parser \f(CW$p\fR exactly \f(CW$n\fR times in
sequence, threading the parse state through each call. Returns a
pair of the concatenation of all the parsers' results and the
parsing state returned by the final application. If any application
of \f(CW$p\fR returns undef, \f(CW\*(C`count\*(C'\fR will return it immediately without
attempting any more applications.
.ie n .IP "with_count [$num_p = any_uint32], $p" 4
.el .IP "with_count [$num_p = any_uint32], \f(CW$p\fR" 4
.IX Item "with_count [$num_p = any_uint32], $p"
This combinator first applies parser \f(CW$num_p\fR to get the number of
times that \f(CW$p\fR should be applied in sequence. If only one argument
is given, \f(CW\*(C`any_uint32\*(C'\fR is used as the default value of \f(CW$num_p\fR.
(So \f(CW\*(C`with_count\*(C'\fR works by getting a number \fI\f(CI$n\fI\fR by applying
\&\f(CW$num_p\fR and then calling \f(CW\*(C`count $n, $p\*(C'\fR.) Returns a pair of the
concatenation of all the parsers' results and the parsing state
returned by the final application. If the initial application of
\&\f(CW$num_p\fR or any application of \f(CW$p\fR returns undef, \f(CW\*(C`with_count\*(C'\fR
will return it immediately without attempting any more applications.
.ie n .IP "choose $p1, ..." 4
.el .IP "choose \f(CW$p1\fR, ..." 4
.IX Item "choose $p1, ..."
This combinator applies parsers \f(CW$p1\fR, ... in sequence until one
of them succeeds, at which point it immediately returns that parser's
result. If all of the parsers fail, \f(CW\*(C`choose\*(C'\fR fails and returns undef.
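.PP
The sequencing combinators compose as in the sketch below, which parses
a hypothetical length-prefixed record: one count byte, that many
characters, and a two-character trailer. The record layout is invented
for illustration, and the example assumes that the functions are
importable by name and that each character of \f(CW\*(C`data\*(C'\fR is one byte.
.PP
.Vb 6
\&    use strict;
\&    use warnings;
\&    use Statistics::R::IO::ParserState;
\&    # Assumption: the parser functions are importable by name.
\&    use Statistics::R::IO::Parser
\&        qw(seq with_count choose char any_char any_uint8);
\&
\&    # Hypothetical record: count byte 3, then "abc", then trailer "XY".
\&    my $state = Statistics::R::IO::ParserState\->new(
\&        data => chr(3) . \*(AqabcXY\*(Aq
\&    );
\&
\&    # with_count reads the count with any_uint8, then applies any_char
\&    # that many times.
\&    my $run = with_count(any_uint8(), any_char())\->($state)
\&        or die "length\-prefixed run failed";
\&    my ($chars, $rest) = @$run;
\&
\&    # choose tries each alternative against the same state; seq threads
\&    # the state from one parser to the next.
\&    my $trailer = seq(choose(char(\*(AqZ\*(Aq), char(\*(AqX\*(Aq)), char(\*(AqY\*(Aq));
\&    my $tail    = $trailer\->($rest);
\&    print "trailer matched\en" if defined $tail;
.Ve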
.SS "\s-1COMBINATORS\s0"
.IX Subsection "COMBINATORS"
.ie n .IP "bind $p1, $f" 4
.el .IP "bind \f(CW$p1\fR, \f(CW$f\fR" 4
.IX Item "bind $p1, $f"
This combinator applies parser \f(CW$p1\fR and, if it succeeds, calls
function \f(CW$f\fR using the first element of \f(CW$p1\fR's result as the
argument. The call to \f(CW$f\fR needs to return a parser, which \f(CW\*(C`bind\*(C'\fR
applies to the parsing state after \f(CW$p1\fR's application.
.Sp
The \f(CW\*(C`bind\*(C'\fR combinator is an essential building block for most
combinators described so far. For instance, \f(CW\*(C`with_count\*(C'\fR can be
written as:
.Sp
.Vb 5
\&    bind($num_p,
\&         sub {
\&             my $n = shift;
\&             count $n, $p;
\&         })
.Ve
.ie n .IP "mreturn $value" 4
.el .IP "mreturn \f(CW$value\fR" 4
.IX Item "mreturn $value"
Returns a parser that when applied returns \f(CW$value\fR without
changing the parsing state.
.ie n .IP "error $message" 4
.el .IP "error \f(CW$message\fR" 4
.IX Item "error $message"
Returns a parser that when applied croaks with the \f(CW$message\fR and
the current parsing state.
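.PP
As a small sketch of how \f(CW\*(C`bind\*(C'\fR and \f(CW\*(C`mreturn\*(C'\fR fit together, the
fragment below wraps \f(CW\*(C`any_char\*(C'\fR in a parser that upper-cases the
character it reads. The name \f(CW$upper_char\fR is made up for this example,
and the code assumes that the functions are importable by name and
that \f(CW\*(C`data\*(C'\fR takes the content as a string.
.PP
.Vb 5
\&    use strict;
\&    use warnings;
\&    use Statistics::R::IO::ParserState;
\&    # Assumption: the parser functions are importable by name.
\&    use Statistics::R::IO::Parser qw(bind mreturn any_char);
\&
\&    # bind passes the value parsed by any_char to the callback, which
\&    # must return a parser; mreturn yields the transformed value without
\&    # moving the state any further.
\&    my $upper_char = bind(any_char(),
\&                          sub {
\&                              my $c = shift;
\&                              mreturn(uc $c);
\&                          });
\&
\&    my $state = Statistics::R::IO::ParserState\->new(data => \*(Aqfoo\*(Aq);
\&    my $pair  = $upper_char\->($state);
\&    print "value: $pair\->[0]\en" if defined $pair;
.Ve
.PP
The state in the resulting pair is the one left by \f(CW\*(C`any_char\*(C'\fR, since
\&\f(CW\*(C`mreturn\*(C'\fR does not change the parsing state.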
.SS "\s-1SINGLETONS\s0"
.IX Subsection "SINGLETONS"
These functions are an interface to ParserState's
singleton-related functions, \*(L"add_singleton\*(R" in Statistics::R::IO::ParserState and
\&\*(L"get_singleton\*(R" in Statistics::R::IO::ParserState. They exist because certain types of
objects in R data files, for instance environments, have to exist as
unique instances, and any subsequent objects that include them refer
to them by a \*(L"reference id\*(R".
.ie n .IP "add_singleton $singleton" 4
.el .IP "add_singleton \f(CW$singleton\fR" 4
.IX Item "add_singleton $singleton"
Adds the \f(CW$singleton\fR to the current parsing state. Returns a pair
of \f(CW$singleton\fR and the new parsing state.
.ie n .IP "get_singleton $ref_id" 4
.el .IP "get_singleton \f(CW$ref_id\fR" 4
.IX Item "get_singleton $ref_id"
Retrieves from the current parse state the singleton identified by
\&\f(CW$ref_id\fR, returning a pair of the singleton and the (unchanged)
state.
.ie n .IP "reserve_singleton $p" 4
.el .IP "reserve_singleton \f(CW$p\fR" 4
.IX Item "reserve_singleton $p"
Preallocates a space for a singleton before running a given parser,
and then assigns the parser's value to the singleton. Returns a pair
of the singleton and the new parse state.
.SH "BUGS AND LIMITATIONS"
.IX Header "BUGS AND LIMITATIONS"
Instances of this class are intended to be immutable. Please do not
try to change their value or attributes.
.PP
There are no known bugs in this module. Please see
Statistics::R::IO for bug reporting.
.SH "SUPPORT"
.IX Header "SUPPORT"
See Statistics::R::IO for support and contact information.
.SH "AUTHOR"
.IX Header "AUTHOR"
Davor Cubranic
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
This software is Copyright (c) 2017 by University of British Columbia.
.PP
This is free software, licensed under:
.PP
.Vb 1
\& The GNU General Public License, Version 3, June 2007
.Ve