NAME
SGMLS - class for postprocessing the output from the sgmls and nsgmls parsers.
SYNOPSIS
    use SGMLS;

    my $parse = new SGMLS(STDIN);

    my $event = $parse->next_event;
    while ($event) {
        SWITCH: {
            ($event->type eq 'start_element') && do {
                my $element = $event->data; # An object of class SGMLS_Element
                [[your code for the beginning of an element]]
                last SWITCH;
            };
            ($event->type eq 'end_element') && do {
                my $element = $event->data; # An object of class SGMLS_Element
                [[your code for the end of an element]]
                last SWITCH;
            };
            ($event->type eq 'cdata') && do {
                my $cdata = $event->data; # A string
                [[your code for character data]]
                last SWITCH;
            };
            ($event->type eq 'sdata') && do {
                my $sdata = $event->data; # A string
                [[your code for system data]]
                last SWITCH;
            };
            ($event->type eq 're') && do {
                [[your code for a record end]]
                last SWITCH;
            };
            ($event->type eq 'pi') && do {
                my $pi = $event->data; # A string
                [[your code for a processing instruction]]
                last SWITCH;
            };
            ($event->type eq 'entity') && do {
                my $entity = $event->data; # An object of class SGMLS_Entity
                [[your code for an external entity]]
                last SWITCH;
            };
            ($event->type eq 'start_subdoc') && do {
                my $entity = $event->data; # An object of class SGMLS_Entity
                [[your code for the beginning of a subdoc entity]]
                last SWITCH;
            };
            ($event->type eq 'end_subdoc') && do {
                my $entity = $event->data; # An object of class SGMLS_Entity
                [[your code for the end of a subdoc entity]]
                last SWITCH;
            };
            ($event->type eq 'conforming') && do {
                [[your code for a conforming document]]
                last SWITCH;
            };
            die "Internal error: unknown event type " . $event->type . "\n";
        }
        $event = $parse->next_event;
    }
DESCRIPTION
The SGMLS package consists of several related classes: see
"SGMLS", "SGMLS_Event", "SGMLS_Element",
"SGMLS_Attribute", "SGMLS_Notation", and
"SGMLS_Entity". All of these classes are available when you specify

    use SGMLS;

Generally, the only object which you will create explicitly will belong to the
"SGMLS" class; all of the others will then be created automatically
for you over the course of the parse. Much fuller documentation is available
in the ".sgml" files in the "DOC/" directory of the
"SGMLS.pm" distribution.
The "SGMLS" class
This class holds a single parse. When you create an instance of it, you specify
a file handle as an argument (if you are reading the output of sgmls or
nsgmls from a pipe, the file handle will ordinarily be "STDIN"):

    my $parse = new SGMLS(STDIN);

The most important method for this class is "next_event", which reads
and returns the next major event from the input stream. It is important to
note that the "SGMLS" class deals with most ESIS events itself: attributes
and entity definitions, for example, are collected and stored automatically
and invisibly to the user. The following list contains
all of the methods for the "SGMLS" class:
- "next_event()": Return an "SGMLS_Event"
object containing the next major event from the SGML parse.
- "element()": Return an "SGMLS_Element"
object containing the current element in the document.
- "file()": Return a string containing the name of
the current SGML source file (this will work only if the "-l"
option was given to sgmls or nsgmls).
- "line()": Return a string containing the current
line number from the source file (this will work only if the "-l"
option was given to sgmls or nsgmls).
- "appinfo()": Return a string containing the
"APPINFO" parameter (if any) from the SGML declaration.
- "notation(NNAME)": Return an
"SGMLS_Notation" object representing the notation named
"NNAME". With newer versions of nsgmls, all notations are
available; otherwise, only the notations which are actually used will be
available.
- "entity(ENAME)": Return an
"SGMLS_Entity" object representing the entity named
"ENAME". With newer versions of nsgmls, all entities are
available; otherwise, only external data entities and internal entities used
as attribute values will be available.
- "ext()": Return a reference to an associative
array for user-defined extensions.
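As an illustration of the "file()" and "line()" methods listed above, here is a minimal sketch of a helper for error messages. The name "source_position" is hypothetical and not part of SGMLS.pm. The list above does not say what these methods return when "-l" was not given, so the sketch assumes either an undefined value or an empty string and treats both as unknown.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): format the current source
# position of a parse as "file:line".
# $parse is expected to be an SGMLS object, or anything else that
# provides file() and line().
sub source_position {
    my ($parse) = @_;
    my $file = $parse->file;
    my $line = $parse->line;

    # Assumption: without -l these are undefined or empty, so fall
    # back to placeholders in either case.
    $file = '(unknown file)' unless defined $file && length $file;
    $line = '?'              unless defined $line && length $line;

    return "$file:$line";
}
```

Inside the event loop from the SYNOPSIS, this could be used as: warn "Unexpected element at " . source_position($parse) . "\n";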
The "SGMLS_Event" class
This class holds a single major event, as generated by the
"next_event" method in the "SGMLS" class. It provides the
following methods:
- "type()": Return a string describing the type of
event, one of "start_element", "end_element",
"cdata", "sdata", "re", "pi",
"entity", "start_subdoc", "end_subdoc", or
"conforming". See "SYNOPSIS", above, for the values
associated with each of these.
- "data()": Return the data associated with the
current event (if any). For "start_element" and
"end_element", returns an "SGMLS_Element" object; for
"entity", "start_subdoc", and "end_subdoc",
returns an "SGMLS_Entity" object; for "cdata",
"sdata", and "pi", returns a string; and for
"re" and "conforming", returns the empty string. See
"SYNOPSIS", above, for an example of this method's use.
- "key()": Return a string key to the event, such
as an element or entity name (otherwise, the same as
"data()").
- "file()": Return the current file name, as in the
"SGMLS" class.
- "line()": Return the current line number, as in
the "SGMLS" class.
- "element()": Return the current element, as in
the "SGMLS" class.
- "parse()": Return the "SGMLS" object
which generated the event.
- "entity(ENAME)": Look up an entity, as in the
"SGMLS" class.
- "notation(NNAME)": Look up a notation, as in the
"SGMLS" class.
- "ext()": Return a reference to an associative
array for user-defined extensions.
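The "type()" and "data()" methods above can also drive a dispatch table instead of the SWITCH block shown in the SYNOPSIS. The following is a minimal sketch under that assumption; "%handlers" and "render_event" are hypothetical names, and only four of the ten event types are handled.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one handler per event type of interest.
# Each handler receives the SGMLS_Event object and returns a string.
my %handlers = (
    start_element => sub { '<'  . $_[0]->data->name . '>' },
    end_element   => sub { '</' . $_[0]->data->name . '>' },
    cdata         => sub { $_[0]->data },
    re            => sub { "\n" },
);

# Render one event. Types without a handler in the table above
# (sdata, pi, entity, and so on) yield an empty string in this sketch.
sub render_event {
    my ($event) = @_;
    my $handler = $handlers{ $event->type } or return '';
    return $handler->($event);
}
```

In the main loop this replaces the SWITCH block with a single call such as print render_event($event); adding support for another event type then means adding one entry to the table.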
The "SGMLS_Element" class
This class is used for elements, and contains all associated information (such
as the element's attributes). It recognises the following methods:
- "name()": Return a string containing the name, or
Generic Identifier, of the element, in upper case.
- "parent()": Return the "SGMLS_Element"
object for the element's parent (if any).
- "parse()": Return the "SGMLS" object
for the current parse.
- "attributes()": Return a reference to an
associative array of attribute names and "SGMLS_Attribute"
structures. Attribute names will be all in upper case.
- "attribute_names()": Return an array of strings
containing the names of all attributes defined for the current element, in
upper case.
- "attribute(ANAME)": Return the
"SGMLS_Attribute" structure for the attribute
"ANAME".
- "set_attribute(ATTRIB)": Add the
"SGMLS_Attribute" object "ATTRIB" to the current
element, replacing any other attribute structure with the same name.
- "in(GI)": Return "true" (i.e. 1) if the
string "GI" is the name of the current element's parent, or
"false" (i.e. 0) if it is not.
- "within(GI)": Return "true" (i.e. 1) if
the string "GI" is the name of any of the ancestors of the current
element, or "false" (i.e. 0) if it is not.
- "ext()": Return a reference to an associative
array for user-defined extensions.
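The "name()" and "parent()" methods above are enough to reconstruct an element's position in the document tree. Here is a minimal sketch; "element_path" is a hypothetical helper, and it assumes that "parent()" returns a false value (undefined or an empty string) for the document element.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): build a slash-separated
# path of element names from the document element down to $element.
sub element_path {
    my ($element) = @_;
    my @names;

    # Walk up the tree; assumes parent() is false at the root.
    for (my $el = $element; $el; $el = $el->parent) {
        unshift @names, $el->name;
    }
    return join '/', @names;
}
```

Because "name()" returns upper-case names, a paragraph inside a section of a document would give a path of the form DOC/SECTION/PARA.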
The "SGMLS_Attribute" class
Each instance of an attribute for each "SGMLS_Element" is an object
belonging to this class, which recognises the following methods:
- "name()": Return a string containing the name of
the current attribute, all in upper case.
- "type()": Return a string containing the type of
the current attribute, all in upper case. Available types are
"IMPLIED", "CDATA", "NOTATION",
"ENTITY", and "TOKEN".
- "value()": Return the value of the current
attribute, if any. This will be an empty string if the type is
"IMPLIED", a string of some sort if the type is "CDATA"
or "TOKEN" (if it is "TOKEN", you may want to split the
string into a series of separate tokens), an "SGMLS_Notation"
object if the type is "NOTATION", or an "SGMLS_Entity"
object if the type is "ENTITY". Note that if the type is
"CDATA", the value will not have escape sequences for 8-bit
characters, record ends, or SDATA processed -- that will be your
responsibility.
- "is_implied()": Return "true" (i.e. 1)
if the value of the attribute is implied, or "false" (i.e. 0) if it
is specified in the document.
- "set_type(TYPE)": Change the type of the
attribute to the string "TYPE" (which should be all in upper
case). Available types are "IMPLIED", "CDATA",
"NOTATION", "ENTITY", and "TOKEN".
- "set_value(VALUE)": Change the value of the
attribute to "VALUE", which may be a string, an
"SGMLS_Entity" object, or an "SGMLS_Notation" object,
depending on the attribute's type.
- "ext()": Return a reference to an associative
array available for user-defined extensions.
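Since "value()" returns different kinds of data depending on "type()", code that prints attributes usually has to branch on the type. The following is a minimal sketch of that branching; "attribute_summary" is a hypothetical helper, and splitting "TOKEN" values on whitespace is an assumption about how the tokens are separated.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): describe one
# SGMLS_Attribute object as a short "NAME=value" string.
sub attribute_summary {
    my ($attr) = @_;
    my $name = $attr->name;

    # Implied attributes have no specified value to show.
    return "$name (implied)" if $attr->is_implied;

    my $type  = $attr->type;
    my $value = $attr->value;

    if ($type eq 'TOKEN') {
        # Assumption: tokens are separated by whitespace.
        $value = join ', ', split ' ', $value;
    }
    elsif ($type eq 'NOTATION' || $type eq 'ENTITY') {
        # The value is an object for these types; show its name.
        $value = $value->name;
    }
    # CDATA values are used as they are, with escapes unprocessed.
    return "$name=$value";
}
```

Combined with the "attribute_names()" and "attribute(ANAME)" methods of "SGMLS_Element", this can be applied to every attribute of the current element.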
The "SGMLS_Notation" class
All declared notations appear as objects belonging to this class, which
recognises the following methods:
- "name()": Return a string containing the name of
the notation.
- "sysid()": Return a string containing the system
identifier of the notation, if any.
- "pubid()": Return a string containing the public
identifier of the notation, if any.
- "ext()": Return a reference to an associative
array available for user-defined extensions.
The "SGMLS_Entity" class
All declared entities appear as objects belonging to this class, which
recognises the following methods:
- "name()": Return a string containing the name of
the entity, in mixed case.
- "type()": Return a string containing the type of
the entity, in upper case. Available types are "CDATA",
"SDATA", "NDATA" (external entities only),
"SUBDOC", "PI" (newer versions of nsgmls only),
or "TEXT" (newer versions of nsgmls only).
- "value()": Return a string containing the value
of the entity, if it is internal.
- "sysid()": Return a string containing the system
identifier of the entity (if any), if it is external.
- "pubid()": Return a string containing the public
identifier of the entity (if any), if it is external.
- "filenames()": Return an array of strings
containing any file names generated from the identifiers, if the entity is
external.
- "notation()": Return the
"SGMLS_Notation" object associated with the entity, if it is
external.
- "data_attributes()": Return a reference to an
associative array of data attribute names (in upper case) and the associated
"SGMLS_Attribute" objects for the current entity.
- "data_attribute_names()": Return an array of data
attribute names (in upper case) for the current entity.
- "data_attribute(ANAME)": Return the
"SGMLS_Attribute" object for the data attribute named
"ANAME" for the current entity.
- "set_data_attribute(ATTRIB)": Add the
"SGMLS_Attribute" object "ATTRIB" to the current entity,
replacing any other data attribute with the same name.
- "ext()": Return a reference to an associative
array for user-defined extensions.
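The methods above split into those meaningful for internal entities ("value()") and those meaningful for external ones ("sysid()", "pubid()", "filenames()"). Here is a minimal sketch that reports whichever applies; "entity_summary" is a hypothetical helper, and it assumes that the methods which do not apply return nothing, an undefined value, or an empty string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): one-line description of
# an SGMLS_Entity object.
sub entity_summary {
    my ($entity) = @_;
    my $summary = $entity->name . ' [' . $entity->type . ']';

    # filenames() returns a list; drop undefined or empty entries so
    # that an internal entity ends up with no files (an assumption).
    my @files = grep { defined && length } $entity->filenames;

    if (@files) {
        $summary .= ' files: ' . join(', ', @files);
    }
    else {
        my $value = $entity->value;
        $summary .= ' value: ' . $value if defined $value && length $value;
    }
    return $summary;
}
```

For an "entity" event, the object returned by "data()" can be passed directly to this helper.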
AUTHOR AND COPYRIGHT
Copyright 1994 and 1995 by David Megginson,
"dmeggins@aix1.uottawa.ca". Distributed under the terms of the GNU
General Public License (version 2, 1991) -- see the file "COPYING"
which is included in the SGMLS.pm distribution.
SEE ALSO
SGMLS::Output and SGMLS::Refs.