NAME¶
Set::IntSpan - Manages sets of integers
SYNOPSIS¶
use Set::IntSpan qw(grep_set map_set grep_spans map_spans);
$Set::IntSpan::Empty_String = $string;
$set = new Set::IntSpan $set_spec;
$set = new Set::IntSpan @set_specs;
$valid = valid Set::IntSpan $run_list;
$set = copy $set $set_spec;
$run_list = run_list $set;
@elements = elements $set;
@sets = sets $set;
@spans = spans $set;
$u_set = union $set $set_spec;
$i_set = intersect $set $set_spec;
$x_set = xor $set $set_spec;
$d_set = diff $set $set_spec;
$c_set = complement $set;
$set->U($set_spec); # Union
$set->I($set_spec); # Intersect
$set->X($set_spec); # Xor
$set->D($set_spec); # Diff
$set->C; # Complement
equal $set $set_spec
equivalent $set $set_spec
superset $set $set_spec
subset $set $set_spec
$n = cardinality $set;
$n = size $set;
empty $set
finite $set
neg_inf $set
pos_inf $set
infinite $set
universal $set
member $set $n;
insert $set $n;
remove $set $n;
$min = min $set;
$max = max $set;
$holes = holes $set;
$cover = cover $set;
$inset = inset $set $n;
$smaller = trim $set $n;
$bigger = pad $set $n;
$subset = grep_set { ... } $set;
$mapset = map_set { ... } $set;
$subset = grep_spans { ... } $set;
$mapset = map_spans { ... } $set;
for ($element=$set->first; defined $element; $element=$set->next) { ... }
for ($element=$set->last ; defined $element; $element=$set->prev) { ... }
$element = $set->start($n);
$element = $set->current;
$n = $set->at($i);
$slice = $set->slice($from, $to);
$i = $set->ord($n);
$i = $set->span_ord($n);
Operator overloads¶
$u_set = $set + $set_spec; # union
$i_set = $set * $set_spec; # intersect
$x_set = $set ^ $set_spec; # xor
$d_set = $set - $set_spec; # diff
$c_set = ~$set; # complement
$set += $set_spec; # union
$set *= $set_spec; # intersect
$set ^= $set_spec; # xor
$set -= $set_spec; # diff
$set eq $set_spec # equal
$set ne $set_spec # not equal
$set le $set_spec # subset
$set lt $set_spec # proper subset
$set ge $set_spec # superset
$set gt $set_spec # proper superset
# compare sets by cardinality
$set1 == $set2
$set1 != $set2
$set1 <= $set2
$set1 < $set2
$set1 >= $set2
$set1 > $set2
$set1 <=> $set2
# compare cardinality of set to an integer
$set1 == $n
$set1 != $n
$set1 <= $n
$set1 < $n
$set1 >= $n
$set1 > $n
$set1 <=> $n
@sorted = sort @sets; # sort sets by cardinality
if ($set) { ... } # true if $set is not empty
print "$set\n"; # stringizes to the run list
EXPORTS¶
@EXPORT¶
Nothing
@EXPORT_OK¶
"grep_set", "map_set", "grep_spans",
"map_spans"
DESCRIPTION¶
"Set::IntSpan" manages sets of integers. It is optimized for sets that
have long runs of consecutive integers. These arise, for example, in .newsrc
files, which maintain lists of articles:
alt.foo: 1-21,28,31
alt.bar: 1-14192,14194,14196-14221
A run of consecutive integers is sometimes called a
span.
Sets are stored internally in a run-length coded form. This provides for both
compact storage and efficient computation. In particular, set operations can
be performed directly on the encoded representation.
"Set::IntSpan" is designed to manage finite sets. However, it can also
represent some simple infinite sets, such as { x | x>n }. This allows
operations involving complements to be carried out consistently, without
having to worry about the actual value of INT_MAX on your machine.
SPANS¶
A
span is a run of consecutive integers. A span may be represented by an
array reference, in any of 5 forms:
Span Set
[ $n, $n ] { n }
[ $a, $b ] { x | a<=x && x<=b}
Span Set
[ undef, $b ] { x | x<=b }
[ $a , undef ] { x | x>=a }
[ undef, undef ] The set of all integers
Some methods operate directly on spans.
SET SPECIFICATIONS¶
Many of the methods take a
set specification. There are four kinds of set
specifications.
Empty¶
If a set specification is omitted, then the empty set is assumed. Thus,
$set = new Set::IntSpan;
creates a new, empty set. Similarly,
copy $set;
removes all elements from $set.
Object reference¶
If an object reference is given, it is taken to be a "Set::IntSpan"
object.
Run list¶
If a string is given, it is taken to be a
run list. A run list specifies
a set using a syntax similar to that in newsrc files.
A run list is a comma-separated list of
runs. Each run specifies a set of
consecutive integers. The set is the union of all the runs.
Runs may be written in any of 5 forms.
- n
- { n }
- a-b
- { x | a<=x && x<=b }
- (-n
- { x | x<=n }
- n-)
- { x | x>=n }
- (-)
- The set of all integers
The empty set is consistently written as '' (the null string). It is also
denoted by the special form '-' (a single dash).
Restrictions¶
The runs in a run list must be disjoint, and must be listed in increasing order.
Valid characters in a run list are 0-9, '(', ')', '-' and ','. White space and
underscore (_) are ignored. Other characters are not allowed.
Examples¶
Run list Set
"-" { }
"1" { 1 }
"1-2" { 1, 2 }
"-5--1" { -5, -4, -3, -2, -1 }
"(-)" the integers
"(--1" the negative integers
"1-3, 4, 18-21" { 1, 2, 3, 4, 18, 19, 20, 21 }
Array reference¶
If an array reference is given, then the elements of the array specify the
elements of the set. The array may contain
- •
- integers
- •
- spans
The set is the union of all the integers and spans in the array. The integers
and spans need not be disjoint. The integers and spans may be in any order.
Examples¶
Array ref Set
[ ] { }
[ 1, 1 ] { 1 }
[ 1, 3, 2 ] { 1, 2, 3 }
[ 1, [ 5, 8 ], 5, [ 7, 9 ], 2 ] { 1, 2, 5, 6, 7, 8, 9 }
[ undef, undef ] the integers
[ undef, -1 ] the negative integers
ITERATORS¶
Each set has a single
iterator, which is shared by all calls to
"first", "last", "start", "next",
"prev", and "current". At all times, the iterator is
either an element of the set, or "undef".
"first", "last", and "start" set the iterator;
"next", and "prev" move it; and "current"
returns it. Calls to these methods may be freely intermixed.
Using "next" and "prev", a single loop can move both
forwards and backwards through a set. Using "start", a loop can
iterate over portions of an infinite set.
METHODS¶
Creation¶
- $set = "new"
"Set::IntSpan" $set_spec
- $set = "new"
"Set::IntSpan" @set_specs
- Creates and returns a "Set::IntSpan" object.
The initial contents of the set are given by $set_spec,
or by the union of all the @set_specs.
- $ok = "valid"
"Set::IntSpan" $run_list
- Returns true if $run_list is a valid
run list. Otherwise, returns false and leaves an error message in $@.
- $set = "copy"
$set $set_spec
- Copies $set_spec into
$set . The previous contents of
$set are lost. For convenience, "copy"
returns $set.
- $run_list = "run_list"
$set
- Returns a run list that represents
$set. The run list will not contain white space.
$set is not affected.
By default, the empty set is formatted as '-'; a different string may be
specified in $Set::IntSpan::Empty_String.
- @elements = "elements"
$set
- Returns an array containing the elements of
$set . The elements will be sorted in numerical
order. In scalar context, returns an array reference.
$set is not affected.
- @sets = "sets"
$set
- Returns the runs in $set, as a list of
"Set::IntSpan" objects. The sets in the list are in order.
- @spans = "spans"
$set
- Returns the runs in $set, as a list of
the form
([$a1, $b1],
[$a2, $b2],
...
[$aN, $bN])
If a run contains only a single integer, then the upper and lower bounds of
the corresponding span will be equal.
If the set has no lower bound, then $a1 will be "undef".
Similarly, if the set has no upper bound, then $bN will be
"undef".
The runs in the list are in order.
Set operations¶
For these operations, a new "Set::IntSpan" object is created and
returned. The operands are not affected.
- $u_set = "union"
$set $set_spec
- Returns the set of integers in either
$set or $set_spec.
- $i_set = "intersect"
$set $set_spec
- Returns the set of integers in both
$set and $set_spec.
- $x_set = "xor"
$set $set_spec
- Returns the set of integers in $set or
$set_spec, but not both.
- $d_set = "diff"
$set $set_spec
- Returns the set of integers in $set
but not in $set_spec.
- $c_set = "complement"
$set
- Returns the set of integers that are not in
$set .
Mutators¶
By popular demand, "Set::IntSpan" now has mutating forms of the binary
set operations. These methods alter the object on which they are called.
- $set->"U"($set_spec)
- Makes $set the union of
$set and $set_spec. Returns
$set.
- $set->"I"($set_spec)
- Makes $set the intersection of
$set and $set_spec. Returns
$set.
- $set->"X"($set_spec)
- Makes $set the symmetric difference of
$set and $set_spec. Returns
$set.
- $set->"D"($set_spec)
- Makes $set the difference of
$set and $set_spec. Returns
$set.
- $set->"C"
- Converts $set to its own complement.
Returns $set.
Comparison¶
- "equal" $set
$set_spec
- Returns true iff $set and
$set_spec contain the same elements.
- "equivalent" $set
$set_spec
- Returns true iff $set and
$set_spec contain the same number of elements. All
infinite sets are equivalent.
- "superset" $set
$set_spec
- Returns true iff $set is a superset of
$set_spec.
- "subset" $set
$set_spec
- Returns true iff $set is a subset of
$set_spec.
Cardinality¶
- $n = "cardinality"
$set
- $n = "size"
$set
- Returns the number of elements in
$set. Returns -1 for infinite sets. "size"
is provided as an alias for "cardinality".
- "empty" $set
- Returns true iff $set is empty.
- "finite" $set
- Returns true iff $set is finite.
- "neg_inf" $set
- Returns true iff $set contains {x |
x<n} for some n.
- "pos_inf" $set
- Returns true iff $set contains {x |
x>n} for some n.
- "infinite" $set
- Returns true iff $set is
infinite.
- "universal" $set
- Returns true iff $set contains all
integers.
Membership¶
- "member" $set
$n
- Returns true iff the integer $n is a
member of $set.
- "insert" $set
$n
- Inserts the integer $n into
$set . Does nothing if $n is
already a member of $set.
- "remove" $set
$n
- Removes the integer $n from
$set . Does nothing if $n is not
a member of $set.
Extrema¶
- "min" $set
- Returns the smallest element of $set,
or "undef" if there is none.
- "max" $set
- Returns the largest element of $set,
or "undef" if there is none.
Spans¶
- $holes = "holes"
$set
- Returns a set containing all the holes in
$set , that is, all the integers that are in-between
spans of $set.
"holes" is always a finite set.
- $cover = "cover"
$set
- Returns a set consisting of a single span from
$set ->"min" to
$set->"max". This is the same as
union $set $set->holes
- $inset = "inset"
$set $n
- $smaller = "trim"
$set $n
- $bigger = "pad"
$set $n
- "inset" returns a set constructed by removing
$n integers from each end of each span of
$set . If $n is negative, then
-$n integers are added to each end of each span.
In the first case, spans may vanish from the set; in the second case, holes
may vanish.
"trim" is provided as a synonym for "inset".
"pad" $set $n is the
same as "inset" $set
-$n.
Iterators¶
- $set->"first"
- Sets the iterator for $set to the
smallest element of $set. If there is no smallest
element, sets the iterator to "undef". Returns the
iterator.
- $set->"last"
- Sets the iterator for $set to the
largest element of $set. If there is no largest
element, sets the iterator to "undef". Returns the
iterator.
- $set->"start"($n)
- Sets the iterator for $set to
$n. If $n is not an element of
$set, sets the iterator to "undef". Returns
the iterator.
- $set->"next"
- Sets the iterator for $set to the next
element of $set. If there is no next element, sets
the iterator to "undef". Returns the iterator.
"next" will return "undef" only once; the next call to
"next" will reset the iterator to the smallest element of
$set.
- $set->"prev"
- Sets the iterator for $set to the
previous element of $set. If there is no previous
element, sets the iterator to "undef". Returns the iterator.
"prev" will return "undef" only once; the next call to
"prev" will reset the iterator to the largest element of
$set.
- $set->"current"
- Returns the iterator for $set.
Indexing¶
The elements of a set are kept in numerical order. These methods index into the
set based on this ordering.
- $n =
$set->"at"($i)
- Returns the $ith element of
$set , or "undef" if there is no
$i th element. Negative indices count backwards from
the end of the set.
Dies if
- •
- $i is non-negative and
$set is "neg_inf"
- •
- $i is negative and
$set is "pos_inf"
- $slice =
$set->"slice"( $from,
$to)
- Returns a "Set::IntSpan" object containing the
elements of $set at indices
$from.. $to. Negative indices
count backwards from the end of the set.
Dies if
- •
- $from is non-negative and
$set is "neg_inf"
- •
- $from is negative and
$set is "pos_inf"
- $i =
$set->"ord"($n)
- The inverse of "at".
Returns the index $i of the integer
$n in $set, or "undef"
if $n if not an element of $set.
Dies if $set is "neg_inf".
- $i =
$set->"span_ord"($n)
- Returns the index $i of the span
containing the integer $n, or "undef" if
$n if not an element of $set.
To recover the span containing $n, write
($set->spans)[$i]
OPERATOR OVERLOADS¶
For convenience, some operators are overloaded on "Set::IntSpan"
objects.
set operations¶
One operand must be a "Set::IntSpan" object. The other operand may be
a "Set::IntSpan" object or a set specification.
$u_set = $set + $set_spec; # union
$i_set = $set * $set_spec; # intersect
$x_set = $set ^ $set_spec; # xor
$d_set = $set - $set_spec; # diff
$c_set = ~$set; # complement
$set += $set_spec; # union
$set *= $set_spec; # intersect
$set ^= $set_spec; # xor
$set -= $set_spec; # diff
equality¶
The string comparison operations are overloaded to compare sets for equality and
containment. One operand must be a "Set::IntSpan" object. The other
operand may be a "Set::IntSpan" object or a set specification.
$set eq $set_spec # equal
$set ne $set_spec # not equal
$set le $set_spec # subset
$set lt $set_spec # proper subset
$set ge $set_spec # superset
$set gt $set_spec # proper superset
equivalence¶
The numerical comparison operations are overloaded to compare sets by
cardinality. One operand must be a "Set::IntSpan" object. The other
operand may be a "Set::IntSpan" object or an integer.
$set1 == $set2
$set1 != $set2
$set1 <= $set2
$set1 < $set2
$set1 >= $set2
$set1 > $set2
$set1 <=> $set2
$set1 cmp $set2
$set1 == $n
$set1 != $n
$set1 <= $n
$set1 < $n
$set1 >= $n
$set1 > $n
$set1 <=> $n
$set1 cmp $n
N.B. The "cmp" operator is overloaded to compare sets by cardinality,
not containment. This is done so that
sort @sets
will sort a list of sets by cardinality.
conversion¶
In boolean context, a "Set::IntSpan" object evaluates to true if it is
not empty.
A "Set::IntSpan" object stringizes to its run list.
FUNCTIONS¶
- $sub_set = "grep_set" { ...
} $set
- Evaluates the BLOCK for each integer in
$set (locally setting $_ to each integer) and returns
a "Set::IntSpan" object containing those integers for which the
BLOCK returns TRUE.
Returns "undef" if $set is infinite.
- $map_set = "map_set" { ... }
$set
- Evaluates the BLOCK for each integer in
$set (locally setting $_ to each integer) and returns
a "Set::IntSpan" object containing all the integers returned as
results of all those evaluations.
Evaluates the BLOCK in list context, so each element of
$set may produce zero, one, or more elements in the
returned set. The elements may be returned in any order, and need not be
disjoint.
Returns "undef" if $set is infinite.
- $sub_set = "grep_spans" {
... } $set
- Evaluates the BLOCK for each span in
$set and returns a "Set::IntSpan" object
containing those spans for which the BLOCK returns TRUE.
Within BLOCK, $_ is locally set to an array ref of the form
[ $lower, $upper ]
where $lower and $upper are the
bounds of the span. If the span contains only one integer, then
$lower and $upper will be
equal. If the span is unbounded, then the corresponding element(s) of the
array will be "undef".
- $map_set = "map_spans" { ...
} $set
- Evaluates the BLOCK for each span in
$set, and returns a "Set::IntSpan" object
consisting of the union of all the spans returned as results of all those
evaluations.
Within BLOCK, $_ is locally set to an array ref of the form
[ $lower, $upper ]
as described above for "grep_spans". Each evaluation of BLOCK must
return a list of spans. Each returned list may contain zero, one, or more
spans. Spans may be returned in any order, and need not be disjoint.
However, for each bounded span, the constraint
$lower <= $upper
must hold.
CLASS VARIABLES¶
- $Set::IntSpan::Empty_String
- $Set::IntSpan::Empty_String contains the string that is
returned when "run_list" is called on the empty set.
$Empty_String is initially '-'; alternatively, it may be set to ''. Other
values should be avoided, to ensure that "run_list" always
returns a valid run list.
"run_list" accesses $Empty_String through a reference stored in
$set->{"empty_string"}. Subclasses that
wish to override the value of $Empty_String can reassign this
reference.
DIAGNOSTICS¶
Any method (except "valid") will "die" if it is passed an
invalid run list.
- "Set::IntSpan::_copy_run_list: Bad syntax:"
$runList
- (F) $run_list has bad syntax
- "Set::IntSpan::_copy_run_list: Bad order:"
$runList
- (F) $run_list has overlapping runs or
runs that are out of order.
- "Set::IntSpan::elements: infinite set"
- (F) An infinite set was passed to
"elements".
- "Set::IntSpan::at: negative infinite set"
- (F) "at" was called with a non-negative index on
a negative infinite set.
- "Set::IntSpan::at: positive infinite set"
- (F) "at" was called with a negative index on a
positive infinite set.
- "Set::IntSpan::slice: negative infinite set"
- (F) "slice" was called with
$from non-negative on a negative infinite set.
- "Set::IntSpan::slice: positive infinite set"
- (F) "slice" was called with
$from negative on a positive infinite set.
- "Set::IntSpan::ord: negative infinite set"
- (F) "ord" was called on a negative infinite
set.
- Out of memory!
- (X) "elements" $set can
generate an "Out of memory!" message on sufficiently large
finite sets.
NOTES¶
Traps¶
Beware of forms like
union $set [1..5];
This passes an element of @set to union, which is probably not what you want. To
force interpretation of $set and [1..5] as separate arguments, use forms like
union $set +[1..5];
or
$set->union([1..5]);
grep_set and map_set¶
"grep_set" and "map_set" make it easy to construct sets for
which the internal representation used by "Set::IntSpan" is
not small. Consider:
$billion = new Set::IntSpan '0-1_000_000_000'; # OK
$odd = grep_set { $_ & 1 } $billion; # trouble
$even = map_set { $_ * 2 } $billion; # double trouble
Error handling¶
There are two common approaches to error handling: exceptions and return codes.
There seems to be some religion on the topic, so "Set::IntSpan"
provides support for both.
To catch exceptions, protect method calls with an eval:
$run_list = <STDIN>;
eval { $set = new Set::IntSpan $run_list };
$@ and print "$@: try again\n";
To check return codes, use an appropriate method call to validate arguments:
$run_list = <STDIN>;
if (valid Set::IntSpan $run_list)
{ $set = new Set::IntSpan $run_list }
else
{ print "$@ try again\n" }
Similarly, use "finite" to protect calls to "elements":
finite $set and @elements = elements $set;
Calling "elements" on a large, finite set can generate an "Out of
memory!" message, which cannot (easily) be trapped. Applications that
must retain control after an error can use "intersect" to protect
calls to "elements":
@elements = elements { intersect $set "-1_000_000 - 1_000_000" };
or check the size of $set first:
finite $set and cardinality $set < 2_000_000 and @elements = elements $set;
Limitations¶
Although "Set::IntSpan" can represent some infinite sets, it does
not perform infinite-precision arithmetic. Therefore, finite elements
are restricted to the range of integers on your machine.
Extensions¶
Users report that you can construct Set::IntSpan objects on anything that
behaves like an integer. For example:
$x = new Math::BigInt ...;
$set = new Set::Intspan [ [ $x, $x+5 ] ];
I'm not documenting this as supported behavior, because I don't have the
resources to test it, but I'll try not to break it. If anyone finds problems
with it, let me know.
Roots¶
The sets implemented here are based on a Macintosh data structure called a
region. See Inside Macintosh for more information.
"Set::IntSpan" was originally written to manage run lists for the
"News::Newsrc" module.
AUTHOR¶
Steven McDougall <swmcd@world.std.com>
ACKNOWLEDGMENTS¶
- •
- Malcolm Cook <mec@stowers-institute.org>
- •
- David Hawthorne <dsrthorne@hotmail.com>
- •
- Martin Krzywinski <martink@bcgsc.ca>
- •
- Marc Lehmann <schmorp@schmorp.de>
- •
- Andrew Olson <aolson@me.com>
COPYRIGHT¶
Copyright (c) 1996-2010 by Steven McDougall. This module is free software; you
can redistribute it and/or modify it under the same terms as Perl
itself.