NAME¶
Tree::DAG_Node - An N-ary tree
SYNOPSIS¶
Using as a base class:
package Game::Tree::Node;
use parent 'Tree::DAG_Node';
# Now add your own methods overriding/extending the methods in Tree::DAG_Node...
Using as a class of its own:
use Tree::DAG_Node;
my $root = Tree::DAG_Node->new();
$root->name("I'm the tops");
my $new_daughter = $root->new_daughter;
$new_daughter->name("More");
...
Using with utf-8 data:
read_tree($file_name) works with utf-8 data. See t/read.tree.t and t/tree.utf8.attributes.txt.
Such a file can be created by redirecting the output of tree2string() to a file of type utf-8.
See the docs for Encode for the difference between utf8 and utf-8. In brief, use
utf-8.
DESCRIPTION¶
This class encapsulates/makes/manipulates objects that represent nodes in a tree
structure. The tree structure is not an object itself, but is emergent from
the linkages you create between nodes. This class provides the methods for
making linkages that can be used to build up a tree, while preventing you from
ever making any kinds of linkages which are not allowed in a tree (such as
having a node be its own mother or ancestor, or having a node have two
mothers).
This is what I mean by a "tree structure", a bit redundantly stated:
- o A tree is a special case of an acyclic directed graph
- o A tree is a network of nodes where there's exactly one root node
- Also, the only primary relationship between nodes is the mother-daughter
relationship.
- o No node can be its own mother, or its mother's mother, etc
- o Each node in the tree has exactly one parent
- Except for the root of course, which is parentless.
- o Each node can have any number (0 .. N) daughter nodes
- A given node's daughter nodes constitute an ordered list.
However, you are free to consider this ordering irrelevant. Some
applications do need daughters to be ordered, so I chose to consider this
the general case.
- o A node can appear in only one tree, and only once in that tree
- Notably (notable because it doesn't follow from the two above points), a
node cannot appear twice in its mother's daughter list.
- o There's an idea of up versus down
- Up means towards the root, and down means away from the root (and
towards the leaves).
- o There's an idea of left versus right
- Left is toward the start (index 0) of a given node's daughter list, and
right is toward the end of a given node's daughter list.
Trees as described above have various applications, among them: representing
syntactic constituency, in formal linguistics; representing contingencies in a
game tree; representing abstract syntax in the parsing of any computer
language -- whether in expression trees for programming languages, or
constituency in the parse of a markup language document. (Some of these might
not use the fact that daughters are ordered.)
(Note: B-Trees are a very special case of the above kinds of trees, and are best
treated with their own class. Check CPAN for modules encapsulating B-Trees; or
if you actually want a database, and for some reason ended up looking here, go
look at AnyDBM_File.)
Many base classes are not usable except as such -- but
"Tree::DAG_Node" can be used as a normal class. You can go ahead and
say:
use Tree::DAG_Node;
my $root = Tree::DAG_Node->new();
$root->name("I'm the tops");
$new_daughter = Tree::DAG_Node->new();
$new_daughter->name("More");
$root->add_daughter($new_daughter);
and so on, constructing and linking objects from "Tree::DAG_Node" and
making useful tree structures out of them.
A NOTE TO THE READER¶
This class is big and provides lots of methods. If your problem is simple (say,
just representing a simple parse tree), this class might seem like using an
atomic sledgehammer to swat a fly. But the complexity of this module's bells
and whistles shouldn't detract from the efficiency of using this class for a
simple purpose. In fact, I'd be very surprised if any one user ever had use
for more than even a third of the methods in this class. And remember: an
atomic sledgehammer will kill that fly.
OBJECT CONTENTS¶
Implementationally, each node in a tree is an object, in the sense of being an
arbitrarily complex data structure that belongs to a class (presumably
"Tree::DAG_Node", or ones derived from it) that provides methods.
The attributes of a node-object are:
- o mother -- this node's mother. undef if this is a root
- o daughters -- the (possibly empty) list of daughters of this node
- o name -- the name for this node
- Need not be unique, or even printable. This is printed in some of the
various dumper methods, but it's up to you if you don't put anything
meaningful or printable here.
- o attributes -- whatever the user wants to use it for
- Presumably a hashref to whatever other attributes the user wants to store
without risk of colliding with the object's real attributes. (Example
usage: attributes to an SGML tag -- you definitely wouldn't want the
existence of a "mother=foo" pair in such a tag to collide with a
node object's 'mother' attribute.)
Aside from (by default) initializing it to {}, and having the access method
called "attributes" (described a ways below), I don't do
anything with the "attributes" in this module. I basically
intended this so that users who don't want/need to bother deriving a class
from "Tree::DAG_Node", could still attach whatever data they
wanted in a node.
"mother" and "daughters" are attributes that relate to
linkage -- they are never written to directly, but are changed as appropriate
by the "linkage methods", discussed below.
The other two (and whatever others you may add in derived classes) are simply
accessed thru the same-named methods, discussed further below.
About The Documented Interface¶
Stick to the documented interface (and comments in the source -- especially ones
saying "undocumented!" and/or "disfavored!" -- do not
count as documentation!), and don't rely on any behavior that's not in the
documented interface.
Specifically, unless the documentation for a particular method says "this
method returns thus-and-such a value", then you should not rely on it
returning anything meaningful.
A passing acquaintance with at least the broader details of the source
code for this class is assumed for anyone using this class as a base class --
especially if you're overriding existing methods, and definitely if
you're overriding linkage methods.
MAIN CONSTRUCTOR, AND INITIALIZER¶
- the constructor CLASS->new() or CLASS->new($options)
- This creates a new node object, calls $object->_init($options) to
provide it sane defaults (like: undef name, undef mother, no daughters,
'attributes' setting of a new empty hashref), and returns the object
created. (If you just said "CLASS->new()" or
"CLASS->new", then it pretends you called
"CLASS->new({})".)
Currently no options for putting in hashref $options are part of the
documented interface, but the option is here in case you want to add such
behavior in a derived class.
Read on if you plan on using Tree::DAG_Node as a base class. (Otherwise feel
free to skip to the description of _init.)
There are, in my mind, two ways to do object construction:
Way 1: create an object, knowing that it'll have certain uninteresting sane
default values, and then call methods to change those values to what you
want. Example:
$node = Tree::DAG_Node->new;
$node->name('Supahnode!');
$root->add_daughter($node);
$node->add_daughters(@some_others);
Way 2: be able to specify some/most/all the object's attributes in the call
to the constructor. Something like:
$node = Tree::DAG_Node->new({
name => 'Supahnode!',
mother => $root,
daughters => \@some_others
});
After some deliberation, I've decided that the second way is a Bad Thing.
First off, it is not markedly more concise than the first way.
Second off, it often requires subtly different syntax (e.g., \@some_others
vs @some_others). It just complicates things for the programmer and the
user, without making either appreciably happier.
See however the comments under "new($hashref)" for options newly
supported in the call to new().
(This is not to say that options in general for a constructor are bad --
"random_network($options)", discussed far below, necessarily
takes options. But note that those are not options for the default values
of attributes.)
Anyway, if you use "Tree::DAG_Node" as a superclass, and you add
attributes that need to be initialized, what you need to do is provide an
_init method that calls $this->SUPER::_init($options) to use its
superclass's _init method, and then initializes the new attributes:
sub _init {
  my($this, $options) = @_[0,1];
  $this->SUPER::_init($options); # call my superclass's _init to
                                 # init all the attributes I'm inheriting
  # Now init /my/ new attributes:
  $this->{'amigos'} = []; # for example
}
...or, as I prefer when I'm being a neat freak:
sub _init {
  my($this, $options) = @_[0,1];
  $this->SUPER::_init($options);
  $this->_init_amigos($options);
}
sub _init_amigos {
  my $this = $_[0];
  # Or my($this,$options) = @_[0,1]; if I'm using $options
  $this->{'amigos'} = [];
}
In other words, I like to have each attribute initialized thru a method
named _init_[attribute], which should expect the object as $_[0] and the
options hashref (or {} if none was given) as $_[1]. If you insist on
having your _init recognize options for setting attributes, you might as
well have them dealt with by the appropriate _init_[attribute] method,
like this:
sub _init {
  my($this, $options) = @_[0,1];
  $this->SUPER::_init($options);
  $this->_init_amigos($options);
}
sub _init_amigos {
  my($this,$options) = @_[0,1]; # I need options this time
  $this->{'amigos'} = [];
  $this->amigos(@{$options->{'amigos'}}) if $options->{'amigos'};
}
All this bookkeeping looks silly with just one new attribute in a class
derived straight from "Tree::DAG_Node", but if there's lots of
new attributes running around, and if you're deriving from a class derived
from a class derived from "Tree::DAG_Node", then tidy
stratification/modularization like this can keep you sane.
- the constructor $obj->new() or $obj->new($options)
- Just another way to get at the "new($hashref)" method. This
does not copy $obj, but merely constructs a new object of the same
class as it. Saves you the bother of going $class = ref $obj; $obj2 =
$class->new;
- the method $node->_init($options)
- Initialize the object's attribute values. See the discussion above.
Presumably this should be called only by the guts of the
"new($hashref)" constructor -- never by the end user.
Currently there are no documented options for putting in the $options
hashref, but (in case you want to disregard the above rant) the option
exists for you to use $options for something useful in a derived class.
Please see the source for more information.
- see also (below) the constructors "new_daughter" and
"new_daughter_left"
METHODS¶
add_daughter(LIST)¶
An exact synonym for "add_daughters(LIST)".
add_daughters(LIST)¶
This method adds the node objects in LIST to the (right) end of $mother's
daughter list. Making a node N1 the daughter of another node N2 also
means that N1's mother attribute is "automatically" set to
N2; it also means that N1 stops being anything else's daughter as it becomes
N2's daughter.
If you try to make a node its own mother, a fatal error results. If you try to
take one of a node N1's ancestors and make it also a daughter of N1, a fatal
error results. A fatal error results if anything in LIST isn't a node object.
If you try to make N1 a daughter of N2, but it's already a daughter of
N2, then this is a no-operation -- it won't move such nodes to the end of the
list or anything; it just skips doing anything with them.
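The three rules above (re-mothering, the no-op for an existing daughter, and the fatal error for an illegal link) can be sketched as follows. This is only an illustration: the node names are made up, and it assumes the name option listed under "new($hashref)" is available in your installed version.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical four-node tree: root -> (left -> kid, right).
my $root  = Tree::DAG_Node->new({name => 'root'});
my $left  = $root->new_daughter({name => 'left'});
my $right = $root->new_daughter({name => 'right'});
my $kid   = $left->new_daughter({name => 'kid'});

# Re-mothering: $kid is unlinked from $left and appended to $right's daughters.
$right->add_daughter($kid);
my @left_names  = map { $_->name } $left->daughters;
my @right_names = map { $_->name } $right->daughters;

# Adding a node that is already a daughter is a no-op; the order is unchanged.
$root->add_daughter($left);
my @root_names = map { $_->name } $root->daughters;

# Making an ancestor a daughter of its own descendant is fatal, so trap it.
my $linked = eval { $kid->add_daughter($root); 1 };

print "left: (@left_names), right: (@right_names), root: (@root_names)\n";
print "illegal link refused: $@" unless $linked;
```

Wrapping the call in eval is just one way to observe the fatal error without ending the program.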
add_daughter_left(LIST)¶
An exact synonym for "add_daughters_left(LIST)".
add_daughters_left(LIST)¶
This method is just like "add_daughters(LIST)", except that it adds
the node objects in LIST to the (left) beginning of $mother's daughter list,
instead of the (right) end of it.
add_left_sister(LIST)¶
An exact synonym for "add_left_sisters(LIST)".
add_left_sisters(LIST)¶
This adds the elements in LIST (in that order) as immediate left sisters of
$node. In other words, given that B's mother's daughter-list is (A,B,C,D),
calling B->add_left_sisters(X,Y) makes B's mother's daughter-list
(A,X,Y,B,C,D).
If LIST is empty, this is a no-op, and returns empty-list.
This is basically implemented as a call to $node->replace_with(LIST, $node),
and so all replace_with's limitations and caveats apply.
The return value of $node->add_left_sisters(LIST) is the elements of LIST
that got added, as returned by replace_with -- minus the copies of $node you'd
get from a straight call to $node->replace_with(LIST, $node).
add_right_sister(LIST)¶
An exact synonym for "add_right_sisters(LIST)".
add_right_sisters(LIST)¶
Just like add_left_sisters (which see), except that it adds the elements in
LIST (in that order) as immediate right sisters of $node.
In other words, given that B's mother's daughter-list is (A,B,C,D), calling
B->add_right_sisters(X,Y) makes B's mother's daughter-list (A,B,X,Y,C,D).
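The (A,B,C,D) examples above can be reproduced with a short script. It is a sketch under assumptions: the helper daughter_names() and all node names are invented for the illustration, and nodes are named through the name option listed under "new($hashref)".

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $mother = Tree::DAG_Node->new({name => 'mother'});
my %node   = map { $_ => $mother->new_daughter({name => $_}) } qw(A B C D);

# Hypothetical helper for this example only: the daughters' names, in order.
sub daughter_names { join ',', map { $_->name } $_[0]->daughters }

# X and Y start out as unattached nodes and become left sisters of B.
my ($x, $y) = map { Tree::DAG_Node->new({name => $_}) } qw(X Y);
$node{B}->add_left_sisters($x, $y);
my $after_left = daughter_names($mother);

# P and Q become right sisters of B in the same daughter list.
my ($p, $q) = map { Tree::DAG_Node->new({name => $_}) } qw(P Q);
$node{B}->add_right_sisters($p, $q);
my $after_right = daughter_names($mother);

print "$after_left\n$after_right\n";
```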
address()¶
address(ADDRESS)¶
With the first syntax, returns the address of $node within its tree, based on
its position within the tree. An address is formed by noting the path between
the root and $node, and concatenating the daughter-indices of the nodes this
passes thru (starting with 0 for the root, and ending with $node).
For example, if to get from node ROOT to node $node, you pass thru ROOT, A, B,
and $node, then the address is determined as:
- o ROOT's my_daughter_index is 0
- o A's my_daughter_index is, suppose, 2
- A is index 2 in ROOT's daughter list.
- o B's my_daughter_index is, suppose, 0
- B is index 0 in A's daughter list.
- o $node's my_daughter_index is, suppose, 4
- $node is index 4 in B's daughter list.
The address of the above-described $node is, therefore, "0:2:0:4".
(As a somewhat special case, the address of the root is always "0";
and since addresses start from the root, all addresses start with a
"0".)
The second syntax, where you provide an address, starts from the root of the
tree $anynode belongs to, and returns the node corresponding to that address.
Returns undef if no node corresponds to that address. Note that this routine
may be somewhat liberal in its interpretation of what can constitute an
address; i.e., it accepts "0.2.0.4", besides "0:2:0:4".
Also note that the address of a node in a tree is meaningful only in that tree
as currently structured.
(Consider how ($address1 cmp $address2) may be magically meaningful to you, if
you meant to figure out what nodes are to the right of what other nodes.)
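Both syntaxes can be tried on a small tree. The sketch below uses made-up node names and assumes the name option listed under "new($hashref)"; the third daughter of the root has index 2, so its only daughter sits at "0:2:0".

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $root     = Tree::DAG_Node->new({name => 'ROOT'});
my @kids     = map { $root->new_daughter({name => "kid$_"}) } 0 .. 2;
my $grandkid = $kids[2]->new_daughter({name => 'grandkid'});

# First syntax: ask a node for its address.
my $addr = $grandkid->address;

# Second syntax: look a node up by address, starting from any node in the tree.
my $same = $root->address($addr);
my $dots = $kids[0]->address('0.2.0');   # the liberal "." form
my $none = $root->address('0:9');        # no such daughter

print "address: $addr\n";
print "found:   ", $same->name, "\n";
print "missing: ", (defined $none ? 'found' : 'undef'), "\n";
```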
ancestors()¶
Returns the list of this node's ancestors, starting with its mother, then
grandmother, and ending at the root. It does this by simply following the
'mother' attributes up as far as it can. So if $item IS the root, this returns
an empty list.
Consider that scalar($node->ancestors) returns the ply of this node within
the tree -- 2 for a granddaughter of the root, etc., and 0 for root itself.
attribute()¶
attribute(SCALAR)¶
Exact synonyms for "attributes()" and
"attributes(SCALAR)".
attributes()¶
attributes(SCALAR)¶
In the first form, returns the value of the node object's "attributes"
attribute. In the second form, sets it to the value of SCALAR. I intend this
to be used to store a reference to a (presumably anonymous) hash the user can
use to store whatever attributes he doesn't want to have to store as object
attributes. In this case, you needn't ever set the value of this. (_init has
already initialized it to {}.) Instead you can just do...
$node->attributes->{'foo'} = 'bar';
...to write foo => bar.
clear_daughters()¶
This unlinks all $mother's daughters. Returns the list of what used to be
$mother's daughters.
Not to be confused with "remove_daughters(LIST)".
common(LIST)¶
Returns the lowest node in the tree that is ancestor-or-self to the nodes $node
and LIST.
If the nodes are far enough apart in the tree, the answer is just the root.
If the nodes aren't all in the same tree, the answer is undef.
As a degenerate case, if LIST is empty, returns $node.
common_ancestor(LIST)¶
Returns the lowest node that is ancestor to all the nodes given (in nodes $node
and LIST). In other words, it answers the question: "What node in the
tree, as low as possible, is ancestor to the nodes given ($node and
LIST)?"
If the nodes are far enough apart, the answer is just the root -- except if any
of the nodes are the root itself, in which case the answer is undef (since the
root has no ancestor).
If the nodes aren't all in the same tree, the answer is undef.
As a degenerate case, if LIST is empty, returns $node's mother; that'll be undef
if $node is root.
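A short comparison of common(LIST) and common_ancestor(LIST) on sisters, a cousin, and a node from another tree. Treat it as a sketch: the node names are invented, and the name option listed under "new($hashref)" is assumed.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical family: root -> (m1 -> (s1, s2), m2 -> cousin).
my $root   = Tree::DAG_Node->new({name => 'root'});
my $m1     = $root->new_daughter({name => 'm1'});
my $m2     = $root->new_daughter({name => 'm2'});
my $s1     = $m1->new_daughter({name => 's1'});
my $s2     = $m1->new_daughter({name => 's2'});
my $cousin = $m2->new_daughter({name => 'cousin'});
my $other  = Tree::DAG_Node->new({name => 'other'});   # root of a separate tree

my $sisters  = $s1->common($s2);            # lowest node covering both sisters
my $cousins  = $s1->common($s2, $cousin);   # nodes far apart: the root
my $itself   = $s1->common;                 # empty LIST: $s1 itself
my $stranger = $s1->common($other);         # different trees: undef
my $mum      = $s1->common_ancestor;        # empty LIST: $s1's mother
my $nobody   = $root->common_ancestor;      # the root has no ancestor

print join(' ', map { $_->name } $sisters, $cousins, $itself, $mum), "\n";
```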
copy($option)¶
Returns a copy of the calling node (the invocant). E.g.: my($copy) = $node->copy;
$option is a hashref of options, with these (key => value) pairs:
- o no_attribute_copy => $Boolean
- If set to 1, do not copy the node's attributes.
If not specified, defaults to 0, which copies attributes.
copy_at_and_under()¶
copy_at_and_under($options)¶
This returns a copy of the subtree consisting of $node and everything under it.
If you pass no options, copy_at_and_under pretends you've passed {}.
This works by recursively building up the new tree from the leaves, duplicating
nodes using $orig_node->copy($options_ref) and then linking them up into a
new tree of the same shape.
Options you specify are passed down to calls to $node->copy.
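To illustrate, the sketch below copies a two-node subtree and then changes an attribute on the copy. The node names and the colour attribute are made up, and the name option listed under "new($hashref)" is assumed.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $root   = Tree::DAG_Node->new({name => 'root'});
my $branch = $root->new_daughter({name => 'branch'});
$branch->new_daughter({name => 'leaf'});
$branch->attributes->{colour} = 'green';

# The copy has the same shape and names, but is the root of its own tree.
my $copy = $branch->copy_at_and_under;
my @copied_names = map { $_->name } $copy, $copy->daughters;

# Changing the copy's attributes leaves the original node untouched.
$copy->attributes->{colour} = 'red';

print "copied: @copied_names\n";
print "original colour: ", $branch->attributes->{colour}, "\n";
print "copy has a mother: ", (defined $copy->mother ? 'yes' : 'no'), "\n";
```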
copy_tree()¶
copy_tree($options)¶
This returns the root of a copy of the tree that $node is a member of. If you
pass no options, copy_tree pretends you've passed {}.
This method is currently implemented as just a call to
$this->root->copy_at_and_under($options), but magic may be added in the
future.
Options you specify are passed down to calls to $node->copy.
daughters()¶
This returns the (possibly empty) list of daughters for $node.
delete_tree()¶
Destroys the entire tree that $node is a member of (starting at the root), by
nulling out each node-object's attributes (including, most importantly, its
linkage attributes -- hopefully this is more than sufficient to eliminate all
circularity in the data structure), and then moving it into the class
DEADNODE.
Use this when you're finished with the tree in question, and want to free up its
memory. (If you don't do this, it'll get freed up anyway when your program
ends.)
If you try calling any methods on any of the node objects in the tree you've
destroyed, you'll get an error like:
Can't locate object method "leaves_under"
via package "DEADNODE".
So if you see that, that's what you've done wrong. (Actually, the class DEADNODE
does provide one method: a no-op method "delete_tree". So if you
want to delete a tree, but think you may have deleted it already, it's safe to
call $node->delete_tree on it (again).)
The "delete_tree()" method is needed because Perl's garbage
collector would never (as currently implemented) see that it was time to
de-allocate the memory the tree uses -- until either you call
$node->delete_tree, or until the program stops (at "global
destruction" time, when everything is unallocated).
Incidentally, there are better ways to do garbage-collecting on a tree, ways
which don't require the user to explicitly call a method like
"delete_tree()" -- they involve dummy classes, as explained at
<http://mox.perl.com/misc/circle-destroy.pod>
However, introducing a dummy class concept into "Tree::DAG_Node" would
be rather a distraction. If you want to do this with your derived classes, via
a DESTROY in a dummy class (or in a tree-metainformation class, maybe), then
feel free to.
The only case where I can imagine "delete_tree()" failing to
totally void the tree, is if you use the hashref in the "attributes"
attribute to store (presumably among other things) references to other nodes'
"attributes" hashrefs -- which 1) is maybe a bit odd, and 2) is your
problem, because it's your hash structure that's circular, not the tree's.
Anyway, consider:
# null out all my "attributes" hashes
$anywhere->root->walk_down({
  'callback' => sub {
    my $hr = $_[0]->attributes; %$hr = (); return 1;
  }
});
# And then:
$anywhere->delete_tree;
(I suppose "delete_tree()" is a "destructor", or as
close as you can meaningfully come for a circularity-rich data structure in
Perl.)
See also "WHEN AND HOW TO DESTROY THE TREE".
depth_under()¶
Returns an integer representing the number of branches between this $node and
the most distant leaf under it. (In other words, this returns the ply of the
subtree starting at $node. Consider scalar($it->ancestors) if you want the
ply of a node within the whole tree.)
descendants()¶
Returns a list consisting of all the descendants of $node. Returns empty-list if
$node is a terminal_node.
(Note that it's spelled "descendants", not "descendents".)
draw_ascii_tree([$options])¶
Here, the [] refer to an optional parameter.
Returns an arrayref of lines suitable for printing.
Draws a nice ASCII-art representation of the tree structure.
The tree looks like:
              |
           <Root>
   /-------+-----+---+---\
   |       |     |   |   |
  <I>     <H>   <D> <E> <B>
 /---\   /---\   |   |   |
 |   |   |   |  <F> <F> <C>
<J> <J> <J> <J>  |   |
 |   |   |   |  <G> <G>
<K> <L> <K> <L>
 |       |
<M>     <M>
 |       |
<N>     <N>
 |       |
<O>     <O>
See scripts/cut.and.paste.subtrees.pl.
Example usage:
print map("$_\n", @{$tree->draw_ascii_tree});
draw_ascii_tree() takes parameters you set in the $options
hashref:
- o h_compact
- Takes 0 or 1. Sets the extent to which
draw_ascii_tree() tries to save horizontal space.
If I think of a better scrunching algorithm, there'll be a "2"
setting for this.
Default: 1.
- o h_spacing
- Takes a number 0 or greater. Sets the number of spaces inserted
horizontally between nodes (and groups of nodes) in a tree.
Default: 1.
- o no_name
- If true, draw_ascii_tree() doesn't print the name of
the node; it simply prints a "*".
Default: 0 (i.e., print the node name.)
- o v_compact
- Takes a number 0, 1, or 2. Sets the degree to which
draw_ascii_tree() tries to save vertical space.
Defaults to 1.
The code occasionally returns trees that are a bit cock-eyed in parts; if anyone
can suggest a better drawing algorithm, I'd be appreciative.
See also "tree2string([$options], [$some_tree])".
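The options can be compared side by side on a tiny tree. This is a sketch only: the node names are invented, the name option listed under "new($hashref)" is assumed, and the exact spacing of the drawings is left to the module.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $root = Tree::DAG_Node->new({name => 'root'});
my $kid  = $root->new_daughter({name => 'kid'});
$root->new_daughter({name => 'sister'});
$kid->new_daughter({name => 'grandkid'});

# Defaults, then a roomier drawing, then one without node names.
# A fresh options hashref is passed each time.
my $default   = $root->draw_ascii_tree;
my $roomy     = $root->draw_ascii_tree({h_spacing => 3, v_compact => 0});
my $anonymous = $root->draw_ascii_tree({no_name => 1});

for my $lines ($default, $roomy, $anonymous) {
  print map("$_\n", @$lines), "\n";
}
```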
dump_names($options)¶
Returns an array.
Dumps, as an indented list, the names of the nodes starting at $node, and
continuing under it. Options are:
- o _depth -- A nonnegative number
- Indicating the depth to consider $node as being at (and so the generation
under that is that plus one, etc.). You may choose to set _depth =>
scalar($node->ancestors).
Default: 0.
- o tick -- a string to preface each entry with
- This string goes between the indenting-spacing and the node's name. You
may prefer "*" or "-> " or something.
Default: ''.
- o indent -- the string used to indent with
- Another sane value might be '. ' (period, space). Setting it to
empty-string suppresses indenting.
Default: ' ' x 2.
The output is not printed, but is returned as a list, where each item is a line,
with a "\n" at the end.
Note: Names are converted to a printable form using the undocumented function
_dump_quote().
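A small sketch of the tick and indent options, with made-up node names and assuming the name option listed under "new($hashref)". The exact quoting of each name is left to the module, so the script simply prints what it gets back.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $root = Tree::DAG_Node->new({name => 'root'});
my $kid  = $root->new_daughter({name => 'kid'});
$kid->new_daughter({name => 'grandkid'});

# One line per node; each line already ends in "\n".
my @lines = $root->dump_names({ tick => '* ', indent => '. ' });
print @lines;
```

Since the result is a plain list of lines, it can be filtered or written to a file like any other list.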
format_node([$options], [$node])¶
Here, [] represent optional parameters.
Returns a string consisting of the node's name and, optionally, its attributes.
Possible keys in the $options hashref:
- o no_attributes => $Boolean
- If 1, the node's attributes are not included in the string returned.
Default: 0 (include attributes).
Calls "hashref2string($hashref)".
Called by "node2string([$options], [$node])".
You would not normally call this method.
If you don't wish to supply options, use format_node({}, $node).
generation()¶
Returns a list of all nodes (going left-to-right) that are in $node's generation
-- i.e., that are the same number of nodes down from the root.
$root->generation() is just $root.
Of course, $node is always in its own generation.
generation_under($node)¶
Like "generation()", but takes a second node as its argument (called NODE2
below) and returns only the nodes in $node's generation that are also
descendants of NODE2 -- in other words,
@us = $node->generation_under( $node->mother->mother );
is all $node's first cousins (to borrow yet more kinship terminology) --
assuming $node does indeed have a grandmother. Actually "cousins"
isn't quite an apt word, because @us ends up including $node's siblings and
$node.
Actually, "generation_under($node)" is just an alias to
"generation()", but I figure that this:
@us = $node->generation_under($way_upline);
is a bit more readable than this:
@us = $node->generation($way_upline);
But it's up to you.
$node->generation_under($node) returns just $node.
If you call $node->generation_under(NODE2) but NODE2 is not $node or an
ancestor of $node, it behaves as if you called just $node->generation().
hashref2string($hashref)¶
Returns the given hashref as a string.
Called by "format_node([$options], [$node])".
is_daughter_of($node2)¶
Returns true iff $node is a daughter of $node2. Currently implemented as just a
test of ($it->mother eq $node2).
is_node()¶
This always returns true. More pertinently, $object->can('is_node') is true
(regardless of what "is_node()" would do if called) for
objects belonging to this class or for any class derived from it.
is_root()¶
Returns 1 if the caller is the root, and 0 if it is not.
leaves_under()¶
Returns a list (going left-to-right) of all the leaf nodes under $node.
("Leaf nodes" are also called "terminal nodes" -- i.e.,
nodes that have no daughters.) Returns $node in the degenerate case of $node
being a leaf itself.
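The sketch below shows leaves_under() next to depth_under() and the ply computed from ancestors(). The tree and its node names are invented for the example, and the name option listed under "new($hashref)" is assumed.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical tree: root -> (a -> (a1, a2), b).
my $root   = Tree::DAG_Node->new({name => 'root'});
my $a_node = $root->new_daughter({name => 'a'});
my $b_node = $root->new_daughter({name => 'b'});
my $a1     = $a_node->new_daughter({name => 'a1'});
my $a2     = $a_node->new_daughter({name => 'a2'});

my @leaves  = map { $_->name } $root->leaves_under;    # left-to-right
my @itself  = map { $_->name } $b_node->leaves_under;  # a leaf returns itself
my $depth   = $root->depth_under;                      # branches down to the farthest leaf
my @upwards = map { $_->name } $a1->ancestors;         # mother first, root last
my $ply     = scalar @upwards;                         # branches up to the root

print "leaves: @leaves\n";
print "depth under root: $depth, ply of a1: $ply\n";
```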
left_sister()¶
Returns the node that's the immediate left sister of $node. If $node is the
leftmost (or only) daughter of its mother (or has no mother), then this
returns undef.
See also "add_left_sisters(LIST)" and
"add_right_sisters(LIST)".
left_sisters()¶
Returns a list of nodes that're sisters to the left of $node. If $node is the
leftmost (or only) daughter of its mother (or has no mother), then this
returns an empty list.
See also "add_left_sisters(LIST)" and
"add_right_sisters(LIST)".
lol_to_tree($lol)¶
This must be called as a class method.
Converts something like bracket-notation for "Chomsky trees" (or
rather, the closest you can come with Perl
list-of-lists(-of-lists(-of-lists))) into a tree structure. Returns the root
of the tree converted.
The conversion rules are that: 1) if the last (possibly the only) item in a
given list is a scalar, then that is used as the "name" attribute
for the node based on this list. 2) All other items in the list represent
daughter nodes of the current node -- recursively so, if they are list
references; otherwise, (non-terminal) scalars are considered to denote nodes
with that name. So ['Foo', 'Bar', 'N'] is an alternate way to represent
[['Foo'], ['Bar'], 'N'].
An example will illustrate:
use Tree::DAG_Node;
$lol =
[
[
[ [ 'Det:The' ],
[ [ 'dog' ], 'N'], 'NP'],
[ '/with rabies\\', 'PP'],
'NP'
],
[ 'died', 'VP'],
'S'
];
$tree = Tree::DAG_Node->lol_to_tree($lol);
$diagram = $tree->draw_ascii_tree;
print map "$_\n", @$diagram;
...returns this tree:
                         |
                        <S>
                         |
                /------------------\
                |                  |
              <NP>               <VP>
                |                  |
        /---------------\       <died>
        |               |
      <NP>            <PP>
        |               |
    /-------\    </with rabies\>
    |       |
<Det:The>  <N>
            |
          <dog>
By the way (and this rather follows from the above rules), when denoting a LoL
tree consisting of just one node, this:
$tree = Tree::DAG_Node->lol_to_tree( 'Lonely' );
is okay, although it'd probably occur to you to denote it only as:
$tree = Tree::DAG_Node->lol_to_tree( ['Lonely'] );
which is of course fine, too.
mother()¶
This returns what node is $node's mother. This is undef if $node has no mother
-- i.e., if it is a root.
See also "is_root()" and "root()".
my_daughter_index()¶
Returns what index this daughter is, in its mother's "daughter" list.
In other words, if $node is ($node->mother->daughters)[3], then
$node->my_daughter_index returns 3.
As a special case, returns 0 if $node has no mother.
name()¶
name(SCALAR)¶
In the first form, returns the value of the node object's "name"
attribute. In the second form, sets it to the value of SCALAR.
new($hashref)¶
These options are supported in $hashref:
- o attributes => A hashref of attributes
- o daughters => An arrayref of nodes
- o mother => A node
- o name => A string
See also "MAIN CONSTRUCTOR, AND INITIALIZER" for a long discussion on
object creation.
new_daughter()¶
new_daughter($options)¶
This constructs a new node (of the same class as $mother), and
adds it to the (right) end of the daughter list of $mother. This is
essentially the same as going
$daughter = $mother->new;
$mother->add_daughter($daughter);
but is rather more efficient because (since $daughter is guaranteed new and
isn't linked to/from anything), it doesn't have to check that $daughter isn't
an ancestor of $mother, isn't already daughter to a mother it needs to be
unlinked from, isn't already in $mother's daughter list, etc.
As you'd expect for a constructor, it returns the node-object created.
Note that if you radically change 'mother'/'daughters' bookkeeping, you may
have to change this routine, since it's one of the places that directly
writes to 'daughters' and 'mother'.
new_daughter_left()¶
new_daughter_left($options)¶
This is just like $mother->new_daughter, but adds the new daughter to the
left (start) of $mother's daughter list.
Note that if you radically change 'mother'/'daughters' bookkeeping, you may
have to change this routine, since it's one of the places that directly
writes to 'daughters' and 'mother'.
node2string($options, $t, $vert_dashes)¶
Returns a string of the node's name and attributes, with a leading indent,
suitable for printing.
Possible keys in the $options hashref:
- o no_attributes => $Boolean
- If 1, the node's attributes are not included in the string returned.
Default: 0 (include attributes).
Calls "format_node([$options], [$node])".
Called by "tree2string([$options], [$some_tree])".
random_network($options)¶
This method can be called as a class method or as an object method.
In the first case, constructs a randomly arranged network under a new node, and
returns the root node of that tree. In the latter case, constructs the network
under $node.
Currently, this is implemented a bit half-heartedly, and half-wittedly. I
basically needed to make up random-looking networks to stress-test the various
tree-dumper methods, and so wrote this. If you actually want to rely on this
for any application more serious than that, I suggest examining the source
code and seeing if this does really what you need (say, in reliability of
randomness); and feel totally free to suggest changes to me (especially in the
form of "I rewrote "random_network($options)", here's the
code...")
It takes four options:
- o max_node_count -- maximum number of nodes this tree will be allowed to
have (counting the root)
- Default: 25.
- o min_depth -- minimum depth for the tree
- Leaves can be generated only after this depth is reached, so the tree will
be at least this deep -- unless max_node_count is hit first.
Default: 2.
- o max_depth -- maximum depth for the tree
- The tree will not be deeper than this.
Default: 3 plus min_depth.
- o max_children -- maximum number of children any mother in the tree can
have.
- Default: 4.
read_attributes($s)¶
Parses the string $s and extracts the name and attributes, assuming the format
is as generated by "tree2string([$options], [$some_tree])".
This basically means the string was generated by
"hashref2string($hashref)".
Attributes may be absent, in which case they default to {}.
Returns a new node with this name and these attributes.
This method is for use by "read_tree($file_name)".
See t/tree.without.attributes.txt and t/tree.with.attributes.txt for sample
data.
read_tree($file_name)¶
Returns the root of the tree read from $file_name.
The file must have been written by re-directing the output of
"tree2string([$options], [$some_tree])" to a file, since it makes
assumptions about the format of the stringified attributes.
read_tree() works with utf-8 data. See t/read.tree.t and
t/tree.utf8.attributes.txt.
Note: To call this method you need a caller. It'll be a tree of 1 node. The
reason is that inside this method it calls various other methods, and for
these calls it needs $self. That way, those methods can be called from
anywhere, and not just from within read_tree().
For reading and writing trees to databases, see Tree::DAG_Node::Persist.
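The round trip described above can be sketched as follows, assuming a temporary file from the core File::Temp module and a tiny hand-built tree. The node names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tree::DAG_Node;

# Build a small tree to save (names are illustrative).
my $root = Tree::DAG_Node->new;
$root->name('Root');
for my $name (qw(A B)) {
    my $daughter = $root->new_daughter;
    $daughter->name($name);
}

# Redirect the output of tree2string() to a utf-8 file.
my ($fh, $file_name) = tempfile(UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} "$_\n" for @{ $root->tree2string };
close $fh or die "Cannot close $file_name: $!";

# read_tree() needs an invocant; a throwaway tree of 1 node will do.
my $copy      = Tree::DAG_Node->new->read_tree($file_name);
my @daughters = $copy->daughters;
print "$_\n" for @{ $copy->tree2string };
```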
remove_daughter(LIST)¶
An exact synonym for "remove_daughters(LIST)".
remove_daughters(LIST)¶
This removes the nodes listed in LIST from $mother's daughter list. This is a
no-operation if LIST is empty. If there are things in LIST that aren't a
current daughter of $mother, they are ignored.
Not to be confused with "clear_daughters()".
replace_with(LIST)¶
This replaces $node in its mother's daughter list, by unlinking $node and
replacing it with the items in LIST. This returns a list consisting of $node
followed by LIST, i.e., the nodes that replaced it.
LIST can include $node itself (presumably at most once). LIST can also be
empty-list. However, if any items in LIST are sisters to $node, they are
ignored, and are not in the copy of LIST passed as the return value.
As you might expect for any linking operation, the items in LIST cannot be
$node's mother, or any ancestor to it; and items in LIST are, of course,
unlinked from their mothers (if they have any) as they're linked to $node's
mother.
(In the special (and bizarre) case where $node is root, this simply calls
$this->unlink_from_mother on all the items in LIST, making them roots of
their own trees.)
Note that the daughter-list of $node is not necessarily affected; nor are the
daughter-lists of the items in LIST. I mention this in case you think
replace_with switches one node for another, with respect to its mother list
and its daughter list, leaving the rest of the tree unchanged. If
that's what you want, replacing $Old with $New, then you want:
$New->set_daughters($Old->clear_daughters);
$Old->replace_with($New);
I can't say that the daughter lists of $node and of the items in LIST are
never affected by replace_with. They can be affected in this case:
$N1 = ($node->daughters)[0]; # first daughter of $node
$N2 = ($N1->daughters)[0]; # first daughter of $N1;
$N3 = Tree::DAG_Node->random_network; # or whatever
$node->replace_with($N1, $N2, $N3);
As a side effect of attaching $N1 and $N2 to $node's mother, they're unlinked
from their parents ($node and $N1, respectively). But $N3's daughter list is
unaffected.
In other words, this method does what it has to, as you'd expect it to.
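Here is a small sketch of the "replace $Old with $New, keeping the daughters" idiom shown earlier in this entry. The helper named() and all node names are hypothetical, introduced only for this example.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical helper: create a node with the given name.
sub named {
    my ($name) = @_;
    my $node = Tree::DAG_Node->new;
    $node->name($name);
    return $node;
}

my $mother = named('mother');
my ($left, $old, $right) = map { named($_) } qw(left old right);
$mother->add_daughters($left, $old, $right);
$old->add_daughters(named('kid1'), named('kid2'));

# Move $old's daughters across to $new, then swap $old for $new.
my $new = named('new');
$new->set_daughters($old->clear_daughters);
$old->replace_with($new);

my $sisters = join ' ', map { $_->name } $mother->daughters;
my $kids    = join ' ', map { $_->name } $new->daughters;
print "$sisters\n";    # left new right
print "$kids\n";       # kid1 kid2
```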
replace_with_daughters()¶
This replaces $node in its mother's daughter list, by unlinking $node and
replacing it with its daughters. In other words, $node becomes motherless and
daughterless as its daughters move up and take its place. This returns a list
consisting of $node followed by the nodes that were its daughters.
In the special (and bizarre) case where $node is root, this simply unlinks its
daughters from it, making them roots of their own trees.
Effectively the same as $node->replace_with($node->daughters), but more
efficient, since less checking has to be done. (And I also think
$node->replace_with_daughters is a more common operation in tree-wrangling
than $node->replace_with(LIST), so deserves a named method of its own, but
that's just me.)
Note that if you radically change the 'mother'/'daughters' bookkeeping, you may
have to change this routine, since it's one of the places that directly
writes to 'daughters' and 'mother'.
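A minimal sketch of a node being replaced by its own daughters. The helper named() and the node names are hypothetical, used only for illustration.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical helper: create a node with the given name.
sub named {
    my ($name) = @_;
    my $node = Tree::DAG_Node->new;
    $node->name($name);
    return $node;
}

my $root = named('root');
my $mid  = named('mid');
$root->add_daughters(named('a'), $mid, named('z'));
$mid->add_daughters(named('m1'), named('m2'));

# The daughters of $mid move up and take its place under $root.
my @returned = $mid->replace_with_daughters;

my $under_root = join ' ', map { $_->name } $root->daughters;
my $got_back   = join ' ', map { $_->name } @returned;
print "$under_root\n";    # a m1 m2 z
print "$got_back\n";      # mid m1 m2
```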
right_sister()¶
Returns the node that's the immediate right sister of $node. If $node is the
rightmost (or only) daughter of its mother (or has no mother), then this
returns undef.
See also "add_left_sisters(LIST)" and
"add_right_sisters(LIST)".
right_sisters()¶
Returns a list of nodes that're sisters to the right of $node. If $node is the
rightmost (or only) daughter of its mother (or has no mother), then this
returns an empty list.
See also "add_left_sisters(LIST)" and
"add_right_sisters(LIST)".
root()¶
Returns the root of whatever tree $node is a member of. If $node is the root,
then the result is $node itself.
Not to be confused with "is_root()".
self_and_descendants()¶
Returns a list consisting of itself (as element 0) and all the descendants of
$node. Returns just itself if $node is a terminal_node.
(Note that it's spelled "descendants", not "descendents".)
self_and_sisters()¶
Returns a list of all nodes (going left-to-right) that have the same mother as
$node -- including $node itself. This is just like
$node->mother->daughters, except that that fails where $node is root,
whereas $root->self_and_sisters, as a special case, returns $root.
(Contrary to how you may interpret how this method is named, "self" is
not (necessarily) the first element of what's returned.)
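The sister-related accessors can be compared side by side in a short sketch. The helper named() and the node names are hypothetical.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical helper: create a node with the given name.
sub named {
    my ($name) = @_;
    my $node = Tree::DAG_Node->new;
    $node->name($name);
    return $node;
}

my $mother = named('mother');
my ($first, $middle, $last) = map { named($_) } qw(first middle last);
$mother->add_daughters($first, $middle, $last);

# Called on the middle daughter, so "self" is not the first element.
my $all   = join ' ', map { $_->name } $middle->self_and_sisters;
my $other = join ' ', map { $_->name } $middle->sisters;
my $right = join ' ', map { $_->name } $middle->right_sisters;
my $alone = join ' ', map { $_->name } $mother->self_and_sisters;

print "$all\n";      # first middle last
print "$other\n";    # first last
print "$right\n";    # last
print "$alone\n";    # mother (the root is its own only "sister")
```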
set_daughters(LIST)¶
This unlinks all $mother's daughters, and replaces them with the daughters in
LIST.
Currently implemented as just $mother->clear_daughters followed by
$mother->add_daughters(LIST).
simple_lol_to_tree($simple_lol)¶
This must be called as a class method.
This is like lol_to_tree, except that rule 1 doesn't apply -- i.e., all scalars
(or really, anything not a listref) in the LoL-structure end up as named
terminal nodes, and only terminal nodes get names (and, of course, that name
comes from that scalar value). This method is useful for making things like
expression trees, or at least starting them off. Consider that this:
$tree = Tree::DAG_Node->simple_lol_to_tree(
[ 'foo', ['bar', ['baz'], 'quux'], 'zaz', 'pati' ]
);
converts from something like a Lispish or Iconish tree, if you pretend the
brackets are parentheses.
Note that there is a (possibly surprising) degenerate case of what I'm calling a
"simple-LoL", and it's like this:
$tree = Tree::DAG_Node->simple_lol_to_tree('Lonely');
This is the (only) way you can specify a tree consisting of only a single node,
which here gets the name 'Lonely'.
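To see which nodes end up named, here is a small sketch that builds the simple-LoL from the example above and collects the names of the terminal nodes. It assumes self_and_descendants() returns nodes in left-to-right, depth-first order.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

my $tree = Tree::DAG_Node->simple_lol_to_tree(
    [ 'foo', [ 'bar', ['baz'], 'quux' ], 'zaz', 'pati' ]
);

# The outer listref becomes the root; it has four daughters.
my @top = $tree->daughters;

# Only terminal nodes (those without daughters) carry names.
my @leaves = grep { !scalar($_->daughters) } $tree->self_and_descendants;
my $names  = join ' ', map { $_->name } @leaves;

print scalar(@top), " top-level daughters\n";    # 4 top-level daughters
print "$names\n";                                # foo bar baz quux zaz pati
```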
sisters()¶
Returns a list of all nodes (going left-to-right) that have the same mother as
$node -- not including $node itself. If $node is root, this returns
empty-list.
string2hashref($s)¶
Returns a hashref built from the string.
The string is expected to be something like '{AutoCommit => '1', PrintError
=> "0", ReportError => 1}'.
The empty string is returned as {}.
tree_to_lol()¶
Returns that tree (starting at $node) represented as a LoL, like what $lol,
above, holds. (This is as opposed to
"tree_to_lol_notation($options)", which returns the viewable code
like what gets evaluated and stored in $lol, above.)
Lord only knows what you use this for -- maybe for feeding to Data::Dumper, in
case "tree_to_lol_notation($options)" doesn't do just what you want?
tree_to_lol_notation($options)¶
Dumps a tree (starting at $node) as the sort of LoL-like bracket notation you
see in the above example code. Returns just one big block of text. The only
option is "multiline" -- if true, it dumps the text as the sort of
indented structure as seen above; if false (and it defaults to false), dumps
it all on one line (with no indenting, of course).
For example, starting with the tree from the above example, this:
print $tree->tree_to_lol_notation, "\n";
prints the following (which I've broken over two lines for sake of printability
of documentation):
[[[['Det:The'], [['dog'], 'N'], 'NP'], [["/with rabies\x5c"],
'PP'], 'NP'], [['died'], 'VP'], 'S'],
Doing this:
print $tree->tree_to_lol_notation({ multiline => 1 });
prints the same content, just spread over many lines, and prettily indented.
Note: Names are converted to a printable form using the undocumented function
_dump_quote().
tree_to_simple_lol()¶
Returns that tree (starting at $node) represented as a simple-LoL -- i.e., one
where non-terminal nodes are represented as listrefs, and terminal nodes are
gotten from the contents of those nodes' "name" attributes.
Note that in the case of $node being terminal, what you get back is the same as
$node->name.
Compare to tree_to_simple_lol_notation.
tree_to_simple_lol_notation($options)¶
A simple-LoL version of tree_to_lol_notation (which see); takes the same
options.
Note: Names are converted to a printable form using the undocumented function
_dump_quote().
tree2string([$options], [$some_tree])¶
Here, the [] represent optional parameters.
Returns an arrayref of lines, suitable for printing.
Draws a nice ASCII-art representation of the tree structure.
The tree looks like:
Root. Attributes: {# => "0"}
|---I. Attributes: {# => "1"}
| |---J. Attributes: {# => "3"}
| | |---K. Attributes: {# => "3"}
| |---J. Attributes: {# => "4"}
| |---L. Attributes: {# => "5"}
| |---M. Attributes: {# => "5"}
| |---N. Attributes: {# => "5"}
| |---O. Attributes: {# => "5"}
|---H. Attributes: {# => "2"}
| |---J. Attributes: {# => "3"}
| | |---K. Attributes: {# => "3"}
| |---J. Attributes: {# => "4"}
| |---L. Attributes: {# => "5"}
| |---M. Attributes: {# => "5"}
| |---N. Attributes: {# => "5"}
| |---O. Attributes: {# => "5"}
|---D. Attributes: {# => "6"}
| |---F. Attributes: {# => "8"}
| |---G. Attributes: {# => "8"}
|---E. Attributes: {# => "7"}
| |---F. Attributes: {# => "8"}
| |---G. Attributes: {# => "8"}
|---B. Attributes: {# => "9"}
|---C. Attributes: {# => "9"}
Or, without attributes:
Root
|---I
| |---J
| | |---K
| |---J
| |---L
| |---M
| |---N
| |---O
|---H
| |---J
| | |---K
| |---J
| |---L
| |---M
| |---N
| |---O
|---D
| |---F
| |---G
|---E
| |---F
| |---G
|---B
|---C
See scripts/cut.and.paste.subtrees.pl.
Example usage:
print map("$_\n", @{$tree->tree2string});
Can be called with $some_tree set to any $node, and will print the tree assuming
$node is the root.
If you don't wish to supply options, use tree2string({}, $node).
Possible keys in the $options hashref (which defaults to {}):
- o no_attributes => $Boolean
- If 1, the node's attributes are not included in the string returned.
Default: 0 (include attributes).
Calls "node2string($options, $t, $vert_dashes)".
See also "draw_ascii_tree([$options])".
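A brief sketch of both calling styles, with and without attributes. The node names and the '#' attribute values are illustrative only.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Build Root -> I -> J, giving each node a '#' attribute.
my $root = Tree::DAG_Node->new;
$root->name('Root');
$root->attributes({ '#' => 0 });

my $i = $root->new_daughter;
$i->name('I');
$i->attributes({ '#' => 1 });

my $j = $i->new_daughter;
$j->name('J');
$j->attributes({ '#' => 2 });

# Default: each line carries the node's attributes.
print map("$_\n", @{ $root->tree2string });

# With no_attributes set, only the names are drawn.
my $plain = $root->tree2string({ no_attributes => 1 });
print map("$_\n", @$plain);
```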
unlink_from_mother()¶
This removes node from the daughter list of its mother. If it has no mother,
this is a no-operation.
Returns the mother unlinked from (if any).
walk_down($options)¶
Performs a depth-first traversal of the structure at and under $node. What it
does at each node depends on the value of the options hashref, which you must
provide. There are three options, "callback" and
"callbackback" (at least one of which must be defined, as a sub
reference), and "_depth".
This is what walk_down() does, in pseudocode form:
- o Starting point
- Start at the $node given.
- o Callback
- If there's a callback, call it with $node as the first argument,
and the options hashref as the second argument (which contains the
potentially useful _depth, remember). This function must return
true or false -- if false, it will block the next step:
- o Daughters
- If $node has any daughter nodes, increment _depth, and call
$daughter->walk_down($options) for each daughter (in order, of course),
where $options is the same hashref it was called with. When this
returns, decrement _depth.
- o Callbackback
- If there's a callbackback, call it just as with callback
(but tossing out the return value). Note that callback returning
false blocks traversal below $node, but doesn't block calling callbackback
for $node. (Incidentally, in the unlikely case that $node has stopped
being a node object, callbackback won't get called.)
- o Return
$node->walk_down($options) is the way to recursively do things to a tree (if
you start at the root) or part of a tree; if what you're doing is best done
via pre-order traversal, use callback; if what you're doing is best
done with post-order traversal, use callbackback.
walk_down() is even the basis for plenty of the methods
in this class. See the source code for examples both simple and horrific.
Note that if you don't specify _depth, it effectively defaults to 0. You
should set it to scalar($node->ancestors) if you want _depth to
reflect the true depth-in-the-tree for the nodes called, instead of just the
depth below $node. (If $node is the root, there's no difference, of course.)
And by the way, it's a bad idea to modify the tree from the callback.
Unpredictable things may happen. I instead suggest having your callback add to
a stack of things that need changing, and then, once walk_down() is all
finished, changing those nodes from that stack.
Note that the existence of walk_down() doesn't mean you
can't write your own special-use traversers.
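The difference between callback (pre-order) and callbackback (post-order) can be sketched on a four-node tree. The helper named() and the node names are hypothetical; both subs return true so that nothing is pruned.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical helper: create a node with the given name.
sub named {
    my ($name) = @_;
    my $node = Tree::DAG_Node->new;
    $node->name($name);
    return $node;
}

# root has daughters a and b; a has one daughter, a1.
my $root = named('root');
my $a1   = named('a1');
my $n_a  = named('a');
$n_a->add_daughter($a1);
$root->add_daughters($n_a, named('b'));

my (@pre, @post, %depth_of);
$root->walk_down({
    callback => sub {
        my ($node, $options) = @_;
        push @pre, $node->name;
        $depth_of{ $node->name } = $options->{_depth};
        return 1;    # true: keep walking below this node
    },
    callbackback => sub {
        my ($node, $options) = @_;
        push @post, $node->name;
        return 1;
    },
    _depth => 0,
});

print "pre:  @pre\n";     # pre:  root a a1 b
print "post: @post\n";    # post: a1 a b root
print "depth of a1: $depth_of{a1}\n";    # depth of a1: 2
```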
WHEN AND HOW TO DESTROY THE TREE¶
It should be clear to you that if you've built a big parse tree or something,
and then you're finished with it, you should call $some_node->delete_tree
on it if you want the memory back.
But consider this case: you've got this tree:
A
/ | \
B C D
| | \
E X Y
Let's say you decide you don't want D or any of its descendants in the tree, so
you call D->unlink_from_mother. This does NOT automagically destroy the
tree D-X-Y. Instead it merely splits the tree into two:
A D
/ \ / \
B C X Y
|
E
To destroy D and its little tree, you have to explicitly call delete_tree on it.
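The split described above can be reproduced in a short sketch. The helper named() is hypothetical; the node names follow the diagram.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Hypothetical helper: create a node with the given name.
sub named {
    my ($name) = @_;
    my $node = Tree::DAG_Node->new;
    $node->name($name);
    return $node;
}

# Build the tree from the diagram: A -> B(E), C, D(X, Y).
my $A = named('A');
my ($B, $C, $D) = map { named($_) } qw(B C D);
$A->add_daughters($B, $C, $D);
$B->add_daughter(named('E'));
$D->add_daughters(named('X'), named('Y'));

# Unlinking D splits the tree in two; D becomes the root of its own tree.
my $former_mother = $D->unlink_from_mother;
my $d_root_name   = $D->root->name;
my $d_kids        = join ' ', map { $_->name } $D->daughters;
my $a_kids        = join ' ', map { $_->name } $A->daughters;

print "D's root is now $d_root_name, with daughters $d_kids\n";
print "A's daughters are now $a_kids\n";

# D-X-Y is still alive at this point; destroy it explicitly.
$D->delete_tree;
```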
Note, however, that if you call C->unlink_from_mother, and if you don't have
a link to C anywhere, then it does magically go away. This is because
nothing links to C -- whereas with the D-X-Y tree, D links to X and Y, and X
and Y each link back to D. Note that calling C->delete_tree is harmless --
after all, a tree of only one node is still a tree.
So, this is a surefire way of getting rid of all $node's children and freeing up
the memory associated with them and their descendants:
foreach my $it ($node->clear_daughters) { $it->delete_tree }
Just be sure not to do this:
foreach my $it ($node->daughters) { $it->delete_tree }
$node->clear_daughters;
That's bad; the first call to $it->delete_tree will climb to the root of
$node's tree, and nuke the whole tree, not just the bits under $node. You
might as well have just called $node->delete_tree. (Moreover, once $node is
dead, you can't call clear_daughters on it, so you'll get an error there.)
BUG REPORTS¶
If you find a bug in this library, report it to me as soon as possible, at the
address listed in the MAINTAINER section, below. Please try to be as specific
as possible about how you got the bug to occur.
HELP!¶
If you develop a given routine for dealing with trees in some way, and use it a
lot, then if you think it'd be of use to anyone else, do email me about it; it
might be helpful to others to include that routine, or something based on it,
in a later version of this module.
It's occurred to me that you might like to (and might yourself develop routines
to) draw trees in something other than ASCII art. If you do so -- say, for
PostScript output, or for output interpretable by some external plotting
program -- I'd be most interested in the results.
RAMBLINGS¶
This module uses "strict", but I never wrote it with -w warnings in
mind -- so if you use -w, do not be surprised if you see complaints from the
guts of DAG_Node. As long as there is no way to turn off -w for a given module
(instead of having to do it in every single subroutine with a "local
$^W"), I'm not going to change this. However, I do, at points, get bursts
of ambition, and I try to fix code in DAG_Node that generates warnings, as
I come across them -- which is only occasionally. Feel free to email me
any patches for any such fixes you come up with, tho.
Currently I don't assume (or enforce) anything about the class membership of
nodes being manipulated, other than by testing whether each one provides a
method "is_node()", a la:
die "Not a node!!!" unless UNIVERSAL::can($node, "is_node");
So, as far as I'm concerned, a given tree's nodes are free to belong to
different classes, just so long as they provide/inherit "is_node()",
the few methods that this class relies on to navigate
the tree, and have the same internal object structure, or a superset of it.
Presumably this would be the case for any object belonging to a class derived
from "Tree::DAG_Node", or belonging to "Tree::DAG_Node"
itself.
When routines in this class access a node's "mother" attribute, or its
"daughters" attribute, they (generally) do so directly (via
$node->{'mother'}, etc.), for sake of efficiency. But classes derived from
this class should probably do this instead thru a method (via
$node->mother, etc.), for sake of portability, abstraction, and general
goodness.
However, no routines in this class (aside from, necessarily, _init(),
_init_name(), and "name()") access the "name" attribute directly;
routines (like the various tree draw/dump methods) get the "name"
value thru a call to $obj->name(). So if you want the object's name
to not be a real attribute, but instead have it derived dynamically from some
feature of the object (say, based on some of its other attributes, or based on
its address), you can override the "name()" method,
without causing problems. (Be sure to consider the case of $obj->name as a
write method, as it's used in "lol_to_tree($lol)" and
"random_network($options)".)
FAQ¶
Which is the best tree processing module?¶
"Tree::DAG_Node", as it happens. More details: "SEE ALSO".
How to process every node in tree?¶
See "walk_down($options)". $options normally looks like this, assuming
we wish to pass in an arrayref as a stack:
my(@stack);
$tree -> walk_down
({
callback =>
sub
{
my($node, $options) = @_;
# Process $node, using $options...
push @{$$options{stack} }, $node -> name;
return 1; # Keep walking.
},
_depth => 0,
stack => \@stack,
});
# Process @stack...
How do I switch from Tree to Tree::DAG_Node?¶
- o The node's name
- In "Tree" you use $node -> value and in
"Tree::DAG_Node" it's $node -> name.
- o The node's attributes
- In "Tree" you use $node -> meta and in
"Tree::DAG_Node" it's $node -> attributes.
Are there techniques for processing lists of nodes?¶
- o Copy the daughter list, and change it
-
@them = $mother->daughters;
@removed = splice(@them, 0, 2, @new_nodes);
$mother->set_daughters(@them);
- o Select a sub-set of nodes
-
$mother->set_daughters
(
grep($_->name =~ /wanted/, $mother->daughters)
);
Why did you break up the sections of methods in the POD?¶
Because I want to list the methods in alphabetical order.
Why did you move the POD to the end?¶
Because the apostrophes in the text confused the syntax highlighter in my
editor UltraEdit.
SEE ALSO¶
- o HTML::Element, HTML::Tree and HTML::TreeBuilder
- Sean is also the author of these modules.
- o Tree
- Lightweight.
- o Tree::Binary
- Lightweight.
- o Tree::DAG_Node::Persist
- Lightweight.
- o Tree::Persist
- Lightweight.
- o Forest
- Uses Moose.
"Tree::DAG_Node" itself is also lightweight.
REFERENCES¶
Wirth, Niklaus. 1976. Algorithms + Data Structures = Programs.
Prentice-Hall, Englewood Cliffs, NJ.
Knuth, Donald Ervin. 1997. Art of Computer Programming, Volume 1,
Third Edition: Fundamental Algorithms. Addison-Wesley, Reading, MA.
Wirth's classic, currently and lamentably out of print, has a good section on
trees. I find it clearer than Knuth's (if not quite as encyclopedic), probably
because Wirth's example code is in a block-structured high-level language
(basically Pascal), instead of in assembler (MIX).
Until some kind publisher brings out a new printing of Wirth's book, try poking
around used bookstores (or "www.abebooks.com") for a copy. I think
it was also republished in the 1980s under the title Algorithms and Data
Structures, and in a German edition called Algorithmen und
Datenstrukturen. (That is, I'm sure books by Wirth were published under
those titles, but I'm assuming that they're just later
printings/editions of Algorithms + Data Structures = Programs.)
MACHINE-READABLE CHANGE LOG¶
The file Changes was converted into Changelog.ini by Module::Metadata::Changes.
SUPPORT¶
Email the author, or log a bug on RT:
<https://rt.cpan.org/Public/Dist/Display.html?Name=Tree::DAG_Node>.
ACKNOWLEDGEMENTS¶
The code to print the tree, in tree2string(), was adapted from
Forest::Tree::Writer::ASCIIWithBranches by the dread Stevan Little.
MAINTAINER¶
David Hand, "<cogent@cpan.org>" up to V 1.06.
Ron Savage "<rsavage@cpan.org>" from V 1.07.
In this POD, usage of 'I' refers to Sean, up until V 1.07.
AUTHOR¶
Sean M. Burke, "<sburke@cpan.org>"
COPYRIGHT, LICENSE, AND DISCLAIMER¶
Copyright 1998-2001, 2004, 2007 by Sean M. Burke and David Hand.
This program is free software. It is released under the Artistic License 2.0.
See <http://opensource.org/licenses/Artistic-2.0>.
This program is distributed in the hope that it will be useful, but without any
warranty; without even the implied warranty of merchantability or fitness for
a particular purpose.