.\" Automatically generated by Pod::Man 2.22 (Pod::Simple 3.07)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. nr % 0
. rr F
.\}
.el \{\
. de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Biblio::Citation::Parser::Standard 3pm"
.TH Biblio::Citation::Parser::Standard 3pm "2009-11-15" "perl v5.10.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
\&\fBBiblio::Citation::Parser::Standard\fR \- citation parsing functionality
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 5
\&  use Biblio::Citation::Parser::Standard;
\&  # Parse a simple reference
\&  my $parser = Biblio::Citation::Parser::Standard\->new();
\&  my $metadata = $parser\->parse("M. Jewell (2004) Citation Parsing for Beginners. Journal of Madeup References 4(3).");
\&  print "The title of this article is " . $metadata\->{atitle} . "\en";
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Biblio::Citation::Parser::Standard uses a relatively simple template-matching
technique to extract metadata from citations.
.PP
The Biblio::Citation::Parser::Templates module (Templates.pm) currently
provides almost 400 templates, with more being added regularly. The parser
returns metadata in a form that is easily converted into OpenURLs (see the
Biblio::OpenURL module for an even easier way).
.SH "METHODS"
.IX Header "METHODS"
.ie n .IP "$parser = Biblio::Citation::Parser::Standard\->\fInew()\fR" 4
.el .IP "\f(CW$parser\fR = Biblio::Citation::Parser::Standard\->\fInew()\fR" 4
.IX Item "$parser = Biblio::Citation::Parser::Standard->new()"
The \fInew()\fR method creates a new parser.
.ie n .IP "$reliability = Biblio::Citation::Parser::Standard::get_reliability($template)" 4
.el .IP "\f(CW$reliability\fR = Biblio::Citation::Parser::Standard::get_reliability($template)" 4
.IX Item "$reliability = Biblio::Citation::Parser::Standard::get_reliability($template)"
The \fIget_reliability()\fR method returns a value that indicates how likely
a template is to match correctly. Fields such as page ranges and URLs have high
likelihoods (as they follow rigorous patterns), whereas fields such as titles
and publications have lower likelihoods.
.Sp
The method takes a template as a parameter, but you should rarely need to call
it directly.
.ie n .IP "$concreteness = Biblio::Citation::Parser::Standard::get_concreteness($template)" 4
.el .IP "\f(CW$concreteness\fR = Biblio::Citation::Parser::Standard::get_concreteness($template)" 4
.IX Item "$concreteness = Biblio::Citation::Parser::Standard::get_concreteness($template)"
As with the \fIget_reliability()\fR method, \fIget_concreteness()\fR takes a
template as a parameter and returns a numeric indicator.
In this case, it is the number of non-field characters in the template.
The more 'concrete' a template is, the more likely it is to match well. For
example, '_PUBLICATION_ Vol. _VOLUME_' is a better template than
\&'_PUBLICATION_ _VOLUME_', because in the second case _PUBLICATION_ is likely
to subsume 'Vol.' as well.
.ie n .IP "$string = Biblio::Citation::Parser::Standard::strip_spaces(@strings)" 4
.el .IP "\f(CW$string\fR = Biblio::Citation::Parser::Standard::strip_spaces(@strings)" 4
.IX Item "$string = Biblio::Citation::Parser::Standard::strip_spaces(@strings)"
This is a helper function that removes spaces from all elements of an array.
.ie n .IP "$templates = \fIBiblio::Citation::Parser::Standard::get_templates()\fR" 4
.el .IP "\f(CW$templates\fR = \fIBiblio::Citation::Parser::Standard::get_templates()\fR" 4
.IX Item "$templates = Biblio::Citation::Parser::Standard::get_templates()"
Returns the current template list from the Biblio::Citation::Parser::Templates
module. This is useful for producing status listings.
.ie n .IP "@authors = Biblio::Citation::Parser::Standard::handle_authors($string)" 4
.el .IP "\f(CW@authors\fR = Biblio::Citation::Parser::Standard::handle_authors($string)" 4
.IX Item "@authors = Biblio::Citation::Parser::Standard::handle_authors($string)"
This (rather large) function handles the author fields of a reference. It is
not yet all-inclusive, but it is accurate enough to be usable. It can handle
author lists separated by semicolons, commas, and a few other delimiters, as
well as by '&', 'and', and 'et al'.
.Sp
The method takes an author string as a parameter and returns an array of
extracted names, each in the format '{family => \f(CW$family\fR, given =>
\&\f(CW$given\fR}'.
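.Sp
A minimal usage sketch follows. It assumes that the module is installed and,
as described above, that each returned element is a hash reference with
\&\f(CWfamily\fR and \f(CWgiven\fR keys. The author string is made up for
illustration, and no particular parse result is implied.
.Sp
.Vb 10
\&  use strict;
\&  use warnings;
\&  use Biblio::Citation::Parser::Standard;
\&
\&  # Split an author list into family/given name pairs
\&  my @authors = Biblio::Citation::Parser::Standard::handle_authors(
\&      "Jewell, M.; Smith, J. and Brown, A.");
\&  foreach my $author (@authors) {
\&      print "$author\->{family}, $author\->{given}\en";
\&  }
.Ve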
.ie n .IP "%metadata = $parser\->extract_metadata($reference)" 4
.el .IP "\f(CW%metadata\fR = \f(CW$parser\fR\->extract_metadata($reference)" 4
.IX Item "%metadata = $parser->extract_metadata($reference)"
This is the key method in the Standard module, although users do not normally
call it directly (the \fIparse()\fR method provides a wrapper). It takes a
reference string and returns a hash of the extracted metadata.
.Sp
The method contains a regular expression map that translates field
placeholders such as '_AUFIRST_' and '_ISSN_' into expressions that should
match the corresponding text. It then finds the template that best matches the
reference, preferring the result with the highest concreteness and reliability
(see above), and returns the matched fields in the hash. It also creates a
marked-up version of the reference, which is useful for further formatting.
.ie n .IP "$metadata = $parser\->parse($reference)" 4
.el .IP "\f(CW$metadata\fR = \f(CW$parser\fR\->parse($reference)" 4
.IX Item "$metadata = $parser->parse($reference)"
This method is a wrapper around \fIextract_metadata()\fR. Pass it a reference
string and it returns the extracted metadata as a hash reference.
.SH "NOTES"
.IX Header "NOTES"
The parser provided should not be seen as exhaustive. Further modules will be
released as new techniques are implemented.
.SH "AUTHOR"
.IX Header "AUTHOR"
Mike Jewell