NAME
Text::BibTeX - interface to read and parse BibTeX files
SYNOPSIS
use Text::BibTeX;

my $bibfile = Text::BibTeX::File->new("foo.bib");
my $newfile = Text::BibTeX::File->new(">newfoo.bib");

while (my $entry = Text::BibTeX::Entry->new($bibfile))
{
    next unless $entry->parse_ok;

    # hack on $entry contents, using various
    # Text::BibTeX::Entry methods

    $entry->write ($newfile);
}
DESCRIPTION
The "Text::BibTeX" module serves mainly as a high-level introduction
to the "Text::BibTeX" library, for both code and documentation
purposes. The code loads the two fundamental modules for processing BibTeX
files ("Text::BibTeX::File" and "Text::BibTeX::Entry"),
and this documentation gives a broad overview of the whole library that isn't
available in the documentation for the individual modules that comprise it.
In addition, the "Text::BibTeX" module provides a number of
miscellaneous functions that are useful in processing BibTeX data (especially
the kind that comes from bibliographies as defined by BibTeX 0.99, rather than
generic database files). These functions don't generally fit in the
object-oriented class hierarchy centred around the
"Text::BibTeX::Entry" class, mainly because they are specific to
bibliographic data and operate on generic strings (rather than being tied to a
particular BibTeX entry). These are also documented here, in
"MISCELLANEOUS FUNCTIONS".
Note that every module described here begins with the "Text::BibTeX"
prefix. For brevity, I have dropped this prefix from most class and module
names in the rest of this manual page (and in most of the other manual pages
in the library).
MODULES AND CLASSES
The "Text::BibTeX" library includes a number of modules, many of which
provide classes. Usually, the relationship is simple and obvious: a module
provides a class of the same name---for instance, the
"Text::BibTeX::Entry" module provides the
"Text::BibTeX::Entry" class. There are a few exceptions, though:
most obviously, the "Text::BibTeX" module doesn't provide any
classes itself; it merely loads two modules ("Text::BibTeX::Entry"
and "Text::BibTeX::File") that do. The other exceptions are
mentioned in the descriptions below, and discussed in detail in the
documentation for the respective modules.
The modules are presented roughly in order of increasing specialization: the
first three are essential for any program that processes BibTeX data files,
regardless of what kind of data they hold. The later modules are specialized
for use with bibliographic databases, and serve both to emulate BibTeX 0.99's
standard styles and to provide an example of how to define a database
structure through such specialized modules. Each module is fully documented in
its respective manual page.
- "Text::BibTeX"
- Loads the two fundamental modules ("Entry" and
"File"), and provides a number of miscellaneous functions that
don't fit anywhere in the class hierarchy.
- "Text::BibTeX::File"
- Provides an object-oriented interface to BibTeX database
files. In addition to the obvious attributes of filename and filehandle,
the "file" abstraction manages properties such as the database
structure and options for it.
- "Text::BibTeX::Entry"
- Provides an object-oriented interface to BibTeX entries,
which can be parsed from "File" objects, arbitrary filehandles,
or strings. Manages all the properties of a single entry: type, key,
fields, and values. Also serves as the base class for the structured
entry classes (described in detail in Text::BibTeX::Structure).
- "Text::BibTeX::Value"
- Provides an object-oriented interface to values and
simple values, high-level constructs that can be used to represent
the strings associated with each field in an entry. Normally, field values
are returned simply as Perl strings, with macros expanded and multiple
strings "pasted" together. If desired, you can instruct
"Text::BibTeX" to return "Text::BibTeX::Value"
objects, which give you access to the original form of the data.
- "Text::BibTeX::Structure"
- Provides the "Structure" and
"StructuredEntry" classes, which serve primarily as base classes
for the two kinds of classes that define database structures. Read this
man page for a comprehensive description of the mechanism for implementing
Perl classes analogous to BibTeX "style files".
- "Text::BibTeX::Bib"
- Provides the "BibStructure" and
"BibEntry" classes, which serve two purposes: they fulfill the
same role as the standard style files of BibTeX 0.99, and they give an
example of how to write new database structures. These ultimately derive
from, respectively, the "Structure" and
"StructuredEntry" classes provided by the "Structure"
module.
- "Text::BibTeX::BibSort"
- One of the "BibEntry" class's base classes:
handles the generation of sort keys for sorting prior to output
formatting.
- "Text::BibTeX::BibFormat"
- One of the "BibEntry" class's base classes:
handles the formatting of bibliographic data for output in a markup
language such as LaTeX.
- "Text::BibTeX::Name"
- A class used by the "Bib" structure and specific
to bibliographic data as defined by BibTeX itself: parses individual
author names into "first", "von", "last",
and "jr" parts.
- "Text::BibTeX::NameFormat"
- Also specific to bibliographic data: puts split-up names
(as parsed by the "Name" class) back together in a custom
way.
For your first time through the library, you'll probably want to confine your
reading to Text::BibTeX::File and Text::BibTeX::Entry. The other modules will
come in handy eventually, especially if you need to emulate BibTeX in a fairly
fine-grained way (e.g., parsing names, generating sort keys). But for the
simple database hacks that are the bread and butter of the
"Text::BibTeX" library, the "File" and "Entry"
classes are the bulk of what you'll need. You may also find some of the
material in this manual page useful, namely "CONSTANT VALUES" and
"UTILITY FUNCTIONS".
EXPORTS
The "Text::BibTeX" module has a number of optional exports, most of
them constant values described in "CONSTANT VALUES" below. The
default exports are a subset of these constant values that are used
particularly often, the "entry metatypes" (also accessible via the
export tag "metatypes"). Thus, the following two lines are
equivalent:
use Text::BibTeX;
use Text::BibTeX qw(:metatypes);
Some of the various subroutines provided by the module are also exportable.
"bibloop", "split_list", "purify_string", and
"change_case" are all useful in everyday processing of BibTeX data,
but don't really fit anywhere in the class hierarchy. They may be imported
from "Text::BibTeX" using the "subs" export tag.
"check_class" and "display_list" are also exportable, but
only by name; they are not included in any export tag. (These two mainly exist
for use by other modules in the library.) For instance, to use
"Text::BibTeX" and import the entry metatype constants and the
common subroutines:
use Text::BibTeX qw(:metatypes :subs);
Another group of subroutines exists for direct manipulation of the macro table
maintained by the underlying C library. These functions (see "Macro table
functions", below) allow you to define, delete, and query the value of
BibTeX macros (or "abbreviations"). They may be imported en masse
using the "macrosubs" export tag:
use Text::BibTeX qw(:macrosubs);
CONSTANT VALUES
The "Text::BibTeX" module makes a number of constant values available.
These correspond to the values of various enumerated types in the underlying C
library, btparse, and their meanings are more fully explained in the
btparse documentation.
Each group of constants is optionally exportable using an export tag given in
the descriptions below.
- Entry metatypes
- "BTE_UNKNOWN", "BTE_REGULAR",
"BTE_COMMENT", "BTE_PREAMBLE",
"BTE_MACRODEF". The "metatype" method in the
"Entry" class always returns one of these values. The latter
three describe, respectively, "comment", "preamble",
and "string" entries; "BTE_REGULAR" describes all
other entry types. "BTE_UNKNOWN" should never be seen (it's
mainly useful for C code that might have to detect half-baked data
structures). See also btparse. Export tag: "metatypes".
- AST node types
- "BTAST_STRING", "BTAST_MACRO",
"BTAST_NUMBER". Used to distinguish the three kinds of simple
values---strings, macros, and numbers. The "SimpleValue" class's
"type" method always returns one of these three values. See also
Text::BibTeX::Value, btparse. Export tag: "nodetypes".
- Name parts
- "BTN_FIRST", "BTN_VON",
"BTN_LAST", "BTN_JR", "BTN_NONE". Used to
specify the various parts of a name after it has been split up. These are
mainly useful when using the "NameFormat" class. See also
bt_split_names and bt_format_names. Export tag:
"nameparts".
- Join methods
- "BTJ_MAYTIE", "BTJ_SPACE",
"BTJ_FORCETIE", "BTJ_NOTHING". Used to tell the
"NameFormat" class how to join adjacent tokens together; see
Text::BibTeX::NameFormat and bt_format_names. Export tag:
"joinmethods".
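As a rough illustration of the entry metatypes, the following sketch parses two entries from strings and reports which kind each one is. The sample BibTeX text and the "describe" helper are made up for this example, and it assumes (per Text::BibTeX::Entry) that an empty entry object can be created with "new" and then filled in with "parse_s".

```perl
use strict;
use warnings;
use Text::BibTeX;    # the default import includes the BTE_* metatype constants

# Hypothetical helper: map an entry's metatype to a readable label.
sub describe {
    my ($entry) = @_;
    my $meta = $entry->metatype;
    return 'regular'          if $meta == BTE_REGULAR;
    return 'comment'          if $meta == BTE_COMMENT;
    return 'preamble'         if $meta == BTE_PREAMBLE;
    return 'macro definition' if $meta == BTE_MACRODEF;
    return 'unknown';         # BTE_UNKNOWN should never be seen
}

# Made-up sample data: one regular entry and one @string entry.
my @texts = (
    '@article{key1, title = "A Title", year = 1999}',
    '@string{exj = "Journal of Examples"}',
);

for my $text (@texts) {
    my $entry = Text::BibTeX::Entry->new;
    $entry->parse_s($text);
    next unless $entry->parse_ok;
    print $entry->type, ': ', describe($entry), "\n";
}
```

Comparing against the constants, rather than against the entry type string, keeps the test independent of how the entry type happens to be capitalized in the file.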
UTILITY FUNCTIONS
"Text::BibTeX" provides several functions that operate outside of the
normal class hierarchy. Of these, only "bibloop" is likely to be of
much use to you in writing everyday BibTeX-hacking programs; the other two
("check_class" and "display_list") are mainly provided for
the use of other modules in the library. They are documented here mainly for
completeness, but also because they might conceivably be useful in other
circumstances.
- bibloop (ACTION, FILES [, DEST])
- Loops over all entries in a set of BibTeX files, performing
some caller-supplied action on each entry. FILES should be a reference to
the list of filenames to process, and ACTION a reference to a subroutine
that will be called on each entry. DEST, if given, should be a
"Text::BibTeX::File" object (opened for output) to which entries
might be printed.
The subroutine referenced by ACTION is called with exactly one argument: the
"Text::BibTeX::Entry" object representing the entry currently
being processed. Information about both the entry itself and the file
where it originated is available through this object; see
Text::BibTeX::Entry. The ACTION subroutine is only called if the entry was
successfully parsed; any syntax errors will result in a warning message
being printed, and that entry being skipped. Note that all
successfully parsed entries are passed to the ACTION subroutine, even
"preamble", "string", and "comment" entries.
To skip these pseudo-entries and only process "regular" entries,
your action subroutine should look something like this:
sub action {
    my $entry = shift;
    return unless $entry->metatype == BTE_REGULAR;
    # process $entry ...
}
If your action subroutine needs any more arguments, you can just create a
closure (anonymous subroutine) as a wrapper, and pass it to
"bibloop":
sub action {
    my ($entry, $extra_stuff) = @_;
    # ...
}
my $extra = ...;
Text::BibTeX::bibloop (sub { &action ($_[0], $extra) }, \@files);
If the ACTION subroutine returns a true value and DEST was given, then the
processed entry will be written to DEST.
- check_class (PACKAGE, DESCRIPTION, SUPERCLASS,
METHODS)
- Ensures that a PACKAGE implements a class meeting certain
requirements. First, it inspects Perl's symbol tables to ensure that a
package named PACKAGE actually exists. Then, it ensures that the class
named by PACKAGE derives from SUPERCLASS (using the universal method
"isa"). This derivation might be through multiple inheritance,
or through several generations of a class hierarchy; the only requirement
is that SUPERCLASS is somewhere in PACKAGE's tree of base classes.
Finally, it checks that PACKAGE provides each method listed in METHODS (a
reference to a list of method names). This is done with the universal
method "can", so the methods might actually come from one of
PACKAGE's base classes.
DESCRIPTION should be a brief string describing the class that was expected
to be provided by PACKAGE. It is used for generating warning messages if
any of the class requirements are not met.
This is mainly used by the supervisory code in
"Text::BibTeX::Structure", to ensure that user-supplied
structure modules meet the rules required of them.
- display_list (LIST, QUOTE)
- Converts a list of strings to the grammatical conventions
of a human language (currently, only English rules are supported). LIST
must be a reference to a list of strings. If this list is empty, the empty
string is returned. If it has one element, then just that element is
returned. If it has two elements, then they are joined with the string
" and " and the resulting string is returned. Otherwise, the
list has N elements for N >= 3; elements 1..N-1
are joined with commas, and the final element is tacked on with an
intervening ", and ".
If QUOTE is true, then each string is encased in single quotes before
anything else is done.
This is used elsewhere in the library for two very distinct purposes: for
generating warning messages describing lists of fields that should be
present or are conflicting in an entry, and for generating lists of author
names in formatted bibliographies.
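The joining rules described for "display_list" can be sketched in plain Perl as follows. This is a standalone re-statement of those rules for illustration, not the library's own code; the name "display_list_sketch" is hypothetical, and nothing from "Text::BibTeX" is needed to run it.

```perl
use strict;
use warnings;

# Hypothetical stand-in that follows the English rules described above.
sub display_list_sketch {
    my ($list, $quote) = @_;

    # Quoting happens before anything else is done.
    my @items = $quote ? map { "'$_'" } @$list : @$list;

    return ''                    if @items == 0;
    return $items[0]             if @items == 1;
    return join(' and ', @items) if @items == 2;

    # Three or more: commas between elements 1..N-1, then ", and " plus the last.
    my $last = pop @items;
    return join(', ', @items) . ', and ' . $last;
}

print display_list_sketch([qw(author editor)], 1), "\n";    # 'author' and 'editor'
print display_list_sketch([qw(title year month)], 0), "\n"; # title, year, and month
```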
MISCELLANEOUS FUNCTIONS
In addition to loading the "File" and "Entry" modules,
"Text::BibTeX" loads the XSUB code which bridges the Perl modules to
the underlying C library, btparse. This XSUB code provides a number of
miscellaneous utility functions, most of which are put into other packages in
the "Text::BibTeX" family for use by the corresponding classes. (For
instance, the XSUB code loaded by "Text::BibTeX" provides a function
"Text::BibTeX::Entry::parse", which is actually documented as the
"parse" method of the "Text::BibTeX::Entry" class---see
Text::BibTeX::Entry.) However, for completeness this function---and all the
other functions that become available when you "use
Text::BibTeX"---are at least mentioned here. The only functions from this
group that you're ever likely to use are described in "Generic
string-processing functions".
Startup/shutdown functions
These just initialize and shut down the underlying C library. Don't call either
one of them; the "Text::BibTeX" startup/shutdown code takes care of
it as appropriate. They're just mentioned here for completeness.
- initialize ()
- cleanup ()
Generic string-processing functions
- split_list (STRING, DELIM [, FILENAME [, LINE [,
DESCRIPTION]]])
- Splits a string on a fixed delimiter according to the
BibTeX rules for splitting up lists of names. With BibTeX, the delimiter
is hard-coded as "and"; here, you can supply any string.
Instances of DELIM in STRING are considered delimiters if they are at
brace-depth zero, surrounded by whitespace, and not at the beginning or
end of STRING; the comparison is case-insensitive. See bt_split_names for
full details of how splitting is done (it's not the same as Perl's
"split" function).
Returns the list of strings resulting from splitting STRING on DELIM.
- purify_string (STRING [, OPTIONS])
- "Purifies" STRING in the BibTeX way (usually for
generation of sort keys). See bt_misc for details; note that, unlike the C
interface, "purify_string" does not modify STRING
in-place. A purified copy of the input string is returned.
OPTIONS is currently unused.
- change_case (TRANSFORM, STRING [, OPTIONS])
- Transforms the case of STRING according to TRANSFORM (a
single character, one of 'u', 'l', or 't'). See bt_misc for details;
again, "change_case" differs from the C interface in that STRING
is not modified in-place---the input string is copied, and the transformed
copy is returned.
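A short sketch of the three string functions in use, assuming they are imported with the "subs" export tag. The author list and title are made-up sample data. Since the exact purified and case-changed forms are defined by btparse (see bt_misc), the results are simply printed rather than stated here.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:subs);    # split_list, purify_string, change_case

# Made-up author list. The braced {and} is not at brace-depth zero, so by
# the rules above only the second "and" should act as a delimiter.
my $authors = 'Barnes {and} Noble and Leslie Lamport';
my @names   = split_list($authors, 'and');
print "name: $_\n" for @names;

# Made-up title. Both functions return a new string; $title is left alone.
my $title = 'An Introduction to {BibTeX} Parsing';
print 'upper:    ', change_case('u', $title), "\n";
print 'lower:    ', change_case('l', $title), "\n";
print 'title:    ', change_case('t', $title), "\n";
print 'purified: ', purify_string($title), "\n";
print 'original: ', $title, "\n";
```

Because the input is copied rather than modified, the same string can safely be passed to several of these functions in turn.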
Entry-parsing functions
Although these functions are provided by the "Text::BibTeX" module,
they are actually in the "Text::BibTeX::Entry" package. That's
because they are implemented in C, and thus loaded with the XSUB code that
"Text::BibTeX" loads; however, they are actually methods in the
"Text::BibTeX::Entry" class. Thus, they are documented as methods in
Text::BibTeX::Entry.
- parse (ENTRY_STRUCT, FILENAME, FILEHANDLE)
- parse_s (ENTRY_STRUCT, TEXT)
Macro table functions
These functions allow direct access to the macro table maintained by
btparse, the C library underlying "Text::BibTeX". In the
normal course of events, macro definitions always accumulate, and are only
defined as a result of parsing a macro definition (@string) entry.
btparse never deletes old macro definitions for you, and doesn't have
any built-in default macros. If, for example, you wish to start fresh with new
macros for every file, use "delete_all_macros". If you wish to
pre-define certain macros, use "add_macro_text". (But note that the
"Bib" structure, as part of its mission to emulate BibTeX 0.99,
defines the standard "month name" macros for you.)
See also bt_macros in the btparse documentation for a description of the
C interface to these functions.
- add_macro_text (MACRO, TEXT [, FILENAME [, LINE]])
- Defines a new macro, or redefines an old one. MACRO is the
name of the macro, and TEXT is the text it should expand to. FILENAME and
LINE are just used to generate any warnings about the macro definition.
The only such warning occurs when you redefine an old macro: its value is
overridden, and "add_macro_text()" issues a warning saying
so.
- delete_macro (MACRO)
- Deletes a macro from the macro table. If MACRO isn't
defined, takes no action.
- delete_all_macros ()
- Deletes all macros from the macro table.
- macro_length (MACRO)
- Returns the length of a macro's expansion text. If the
macro is undefined, returns 0; no warning is issued.
- macro_text (MACRO [, FILENAME [, LINE]])
- Returns the expansion text of a macro. If the macro is not
defined, issues a warning and returns "undef". FILENAME and
LINE, if supplied, are used for generating this warning; they should be
supplied if you're looking up the macro as a result of finding it in a
file.
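The macro table functions might be combined along these lines. This is only a sketch: the macro name "exjournal" and its expansion text are invented for the example, and the functions are assumed to be imported with the "macrosubs" export tag.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:macrosubs);

my $macro = 'exjournal';    # hypothetical macro name

# Define the macro by hand instead of parsing an @string entry.
add_macro_text($macro, 'Journal of Examples');

# macro_length returns 0 for an undefined macro without warning, so it
# serves as a quiet existence check before calling macro_text.
if (macro_length($macro) > 0) {
    print "$macro expands to: ", macro_text($macro), "\n";
}

# Remove the macro again; deleting an undefined macro takes no action.
delete_macro($macro);
print "$macro is no longer defined\n" if macro_length($macro) == 0;
```

Checking "macro_length" first avoids the warning that "macro_text" issues for an undefined macro.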
Name-parsing functions
These are both private functions for the use of the "Name" class, and
therefore are put in the "Text::BibTeX::Name" package. You should
use the interface provided by that class for parsing names in the BibTeX
style.
- _split (NAME_STRUCT, NAME, FILENAME, LINE, NAME_NUM,
KEEP_CSTRUCT)
- free (NAME_STRUCT)
These are private functions for the use of the "NameFormat" class, and
therefore are put in the "Text::BibTeX::NameFormat" package. You
should use the interface provided by that class for formatting names in the
BibTeX style.
- create ([PARTS [, ABBREV_FIRST]])
- free (FORMAT_STRUCT)
- _set_text (FORMAT_STRUCT, PART, PRE_PART, POST_PART,
PRE_TOKEN, POST_TOKEN)
- _set_options (FORMAT_STRUCT, PART, ABBREV, JOIN_TOKENS,
JOIN_PART)
- format_name (NAME_STRUCT, FORMAT_STRUCT)
BUGS AND LIMITATIONS
"Text::BibTeX" inherits several limitations from its base C library,
btparse; see "BUGS AND LIMITATIONS" in btparse for details.
In addition, "Text::BibTeX" will not work with a Perl binary built
using the "sfio" library. This is because Perl's I/O abstraction
layer does not extend to third-party C libraries that use stdio, and
btparse most certainly does use stdio.
SEE ALSO
btool_faq, Text::BibTeX::File, Text::BibTeX::Entry, Text::BibTeX::Value
AUTHOR
Greg Ward <gward@python.net>
COPYRIGHT
Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This file is
part of the Text::BibTeX library. This library is free software; you may
redistribute it and/or modify it under the same terms as Perl itself.