NAME¶
struct::list - Procedures for manipulating lists
SYNOPSIS¶
package require
Tcl 8.4
package require
struct::list ?1.8?
::struct::list longestCommonSubsequence sequence1
sequence2 ?
maxOccurs?
::struct::list longestCommonSubsequence2 sequence1
sequence2 ?
maxOccurs?
::struct::list lcsInvert lcsData len1 len2
::struct::list lcsInvert2 lcs1 lcs2 len1
len2
::struct::list lcsInvertMerge lcsData len1
len2
::struct::list lcsInvertMerge2 lcs1 lcs2 len1
len2
::struct::list reverse sequence
::struct::list shuffle list
::struct::list assign sequence varname
?
varname?...
::struct::list flatten ?
-full? ?
--?
sequence
::struct::list map sequence cmdprefix
::struct::list mapfor var sequence script
::struct::list filter sequence cmdprefix
::struct::list filterfor var sequence expr
::struct::list split sequence cmdprefix
?
passVar failVar?
::struct::list fold sequence initialvalue
cmdprefix
::struct::list shift listvar
::struct::list iota n
::struct::list equal a b
::struct::list repeat size element1 ?
element2
element3...?
::struct::list repeatn value size...
::struct::list dbJoin
?
-inner|
-left|
-right|
-full? ?
-keys
varname? {
keycol table}...
::struct::list dbJoinKeyed
?
-inner|
-left|
-right|
-full? ?
-keys
varname?
table...
::struct::list swap listvar i j
::struct::list firstperm list
::struct::list nextperm perm
::struct::list permutations list
::struct::list foreachperm var list body
DESCRIPTION¶
The
::struct::list namespace contains several useful commands for
processing Tcl lists. Generally speaking, they implement algorithms more
complex or specialized than the ones provided by Tcl itself.
It exports only a single command,
struct::list. All functionality
provided here can be reached through a subcommand of this command.
COMMANDS¶
- ::struct::list longestCommonSubsequence
sequence1 sequence2 ?maxOccurs?
- Returns the longest common subsequence of elements in the
two lists sequence1 and sequence2. If the maxOccurs
parameter is provided, the common subsequence is restricted to elements
that occur no more than maxOccurs times in sequence2.
The return value is a list of two lists of equal length. The first sublist
is of indices into sequence1, and the second sublist is of indices
into sequence2. Each corresponding pair of indices corresponds to
equal elements in the sequences; the sequence returned is the longest
possible.
- ::struct::list longestCommonSubsequence2
sequence1 sequence2 ? maxOccurs?
- Returns an approximation to the longest common sequence of
elements in the two lists sequence1 and sequence2. If the
maxOccurs parameter is omitted, the subsequence computed is exactly
the longest common subsequence; otherwise, the longest common subsequence
is approximated by first determining the longest common sequence of only
those elements that occur no more than maxOccurs times in
sequence2, and then using that result to align the two lists,
determining the longest common subsequences of the sublists between the
two elements.
As with longestCommonSubsequence, the return value is a list of two
lists of equal length. The first sublist is of indices into
sequence1, and the second sublist is of indices into
sequence2. Each corresponding pair of indices corresponds to equal
elements in the sequences. The sequence approximates the longest common
subsequence.
- ::struct::list lcsInvert lcsData
len1 len2
- This command takes a description of a longest common
subsequence ( lcsData), inverts it, and returns the result.
Inversion means here that as the input describes which parts of the two
sequences are identical the output describes the differences instead.
To be fully defined the lengths of the two sequences have to be known and
are specified through len1 and len2.
The result is a list where each element describes one chunk of the
differences between the two sequences. This description is a list
containing three elements, a type and two pairs of indices into
sequence1 and sequence2 respectively, in this order. The
type can be one of three values:
- added
- Describes an addition. I.e. items which are missing in
sequence1 can be found in sequence2. The pair of indices
into sequence1 describes where the added range had been expected to
be in sequence1. The first index refers to the item just before the
added range, and the second index refers to the item just after the added
range. The pair of indices into sequence2 describes the range of
items which has been added to it. The first index refers to the first item
in the range, and the second index refers to the last item in the
range.
- deleted
- Describes a deletion. I.e. items which are in
sequence1 are missing from sequence2. The pair of indices
into sequence1 describes the range of items which has been deleted.
The first index refers to the first item in the range, and the second
index refers to the last item in the range. The pair of indices into
sequence2 describes where the deleted range had been expected to be
in sequence2. The first index refers to the item just before the
deleted range, and the second index refers to the item just after the
deleted range.
- changed
- Describes a general change. I.e a range of items in
sequence1 has been replaced by a different range of items in
sequence2. The pair of indices into sequence1 describes the
range of items which has been replaced. The first index refers to the
first item in the range, and the second index refers to the last item in
the range. The pair of indices into sequence2 describes the range
of items replacing the original range. Again the first index refers to the
first item in the range, and the second index refers to the last item in
the range.
sequence 1 = {a b r a c a d a b r a}
lcs 1 = {1 2 4 5 8 9 10}
lcs 2 = {0 1 3 4 5 6 7}
sequence 2 = {b r i c a b r a c}
Inversion = {{deleted {0 0} {-1 0}}
{changed {3 3} {2 2}}
{deleted {6 7} {4 5}}
{added {10 11} {8 8}}}
- Notes:
- •
- An index of -1 in a deleted chunk refers to
just before the first element of the second sequence.
- •
- Also an index equal to the length of the first sequence in
an added chunk refers to just behind the end of the sequence.
- ::struct::list lcsInvert2 lcs1
lcs2 len1 len2
- Similar to lcsInvert. Instead of directly taking the
result of a call to longestCommonSubsequence this subcommand
expects the indices for the two sequences in two separate lists.
- ::struct::list lcsInvertMerge lcsData
len1 len2
- Similar to lcsInvert. It returns essentially the
same structure as that command, except that it may contain chunks of type
unchanged too.
These new chunks describe the parts which are unchanged between the two
sequences. This means that the result of this command describes both the
changed and unchanged parts of the two sequences in one structure.
sequence 1 = {a b r a c a d a b r a}
lcs 1 = {1 2 4 5 8 9 10}
lcs 2 = {0 1 3 4 5 6 7}
sequence 2 = {b r i c a b r a c}
Inversion/Merge = {{deleted {0 0} {-1 0}}
{unchanged {1 2} {0 1}}
{changed {3 3} {2 2}}
{unchanged {4 5} {3 4}}
{deleted {6 7} {4 5}}
{unchanged {8 10} {5 7}}
{added {10 11} {8 8}}}
- ::struct::list lcsInvertMerge2 lcs1
lcs2 len1 len2
- Similar to lcsInvertMerge. Instead of directly
taking the result of a call to longestCommonSubsequence this
subcommand expects the indices for the two sequences in two separate
lists.
- ::struct::list reverse sequence
- The subcommand takes a single sequence as argument
and returns a new sequence containing the elements of the input sequence
in reverse order.
- ::struct::list shuffle list
- The subcommand takes a list and returns a copy of
that list with the elements it contains in random order. Every possible
ordering of elements is equally likely to be generated. The Fisher-Yates
shuffling algorithm is used internally.
- ::struct::list assign sequence
varname ? varname?...
- The subcommand assigns the first n elements of the
input sequence to the one or more variables whose names were listed
after the sequence, where n is the number of specified variables.
If there are more variables specified than there are elements in the
sequence the empty string will be assigned to the superfluous
variables.
If there are more elements in the sequence than variable names
specified the subcommand returns a list containing the unassigned
elements. Else an empty list is returned.
tclsh> ::struct::list assign {a b c d e} foo bar
c d e
tclsh> set foo
a
tclsh> set bar
b
- ::struct::list flatten ?-full?
?--? sequence
- The subcommand takes a single sequence and returns a
new sequence where one level of nesting was removed from the input
sequence. In other words, the sublists in the input sequence are replaced
by their elements.
The subcommand will remove any nesting it finds if the option -full
is specified.
tclsh> ::struct::list flatten {1 2 3 {4 5} {6 7} {{8 9}} 10}
1 2 3 4 5 6 7 {8 9} 10
tclsh> ::struct::list flatten -full {1 2 3 {4 5} {6 7} {{8 9}} 10}
1 2 3 4 5 6 7 8 9 10
- ::struct::list map sequence
cmdprefix
- The subcommand takes a sequence to operate on and a
command prefix ( cmdprefix) specifying an operation, applies the
command prefix to each element of the sequence and returns a sequence
consisting of the results of that application.
The command prefix will be evaluated with a single word appended to it. The
evaluation takes place in the context of the caller of the subcommand.
tclsh> # squaring all elements in a list
tclsh> proc sqr {x} {expr {$x*$x}}
tclsh> ::struct::list map {1 2 3 4 5} sqr
1 4 9 16 25
tclsh> # Retrieving the second column from a matrix
tclsh> # given as list of lists.
tclsh> proc projection {n list} {::lindex $list $n}
tclsh> ::struct::list map {{a b c} {1 2 3} {d f g}} {projection 1}
b 2 f
- ::struct::list mapfor var
sequence script
- The subcommand takes a sequence to operate on and a
tcl script, applies the script to each element of the sequence and
returns a sequence consisting of the results of that application.
The script will be evaluated as is, and has access to the current list
element through the specified iteration variable var. The
evaluation takes place in the context of the caller of the subcommand.
tclsh> # squaring all elements in a list
tclsh> ::struct::list mapfor x {1 2 3 4 5} {
expr {$x * $x}
}
1 4 9 16 25
tclsh> # Retrieving the second column from a matrix
tclsh> # given as list of lists.
tclsh> ::struct::list mapfor x {{a b c} {1 2 3} {d f g}} {
lindex $x 1
}
b 2 f
- ::struct::list filter sequence
cmdprefix
- The subcommand takes a sequence to operate on and a
command prefix ( cmdprefix) specifying an operation, applies the
command prefix to each element of the sequence and returns a sequence
consisting of all elements of the sequence for which the command
prefix returned true. In other words, this command filters out all
elements of the input sequence which fail the test the
cmdprefix represents, and returns the remaining elements.
The command prefix will be evaluated with a single word appended to it. The
evaluation takes place in the context of the caller of the subcommand.
tclsh> # removing all odd numbers from the input
tclsh> proc even {x} {expr {($x % 2) == 0}}
tclsh> ::struct::list filter {1 2 3 4 5} even
2 4
Note: The
filter is a specialized application of
fold where
the result is extended with the current item or not, depending o nthe result
of the test.
- ::struct::list filterfor var
sequence expr
- The subcommand takes a sequence to operate on and a
tcl expression ( expr) specifying a condition, applies the
conditionto each element of the sequence and returns a sequence consisting
of all elements of the sequence for which the expression returned
true. In other words, this command filters out all elements of the
input sequence which fail the test the condition expr
represents, and returns the remaining elements.
The expression will be evaluated as is, and has access to the current list
element through the specified iteration variable var. The
evaluation takes place in the context of the caller of the subcommand.
tclsh> # removing all odd numbers from the input
tclsh> ::struct::list filterfor x {1 2 3 4 5} {($x % 2) == 0}
2 4
- ::struct::list split sequence
cmdprefix ? passVar failVar?
- This is a variant of method filter, see above.
Instead of returning just the elements passing the test we get lists of
both passing and failing elements.
If no variable names are specified then the result of the command will be a
list containing the list of passing elements, and the list of failing
elements, in this order. Otherwise the lists of passing and failing
elements are stored into the two specified variables, and the result will
be a list containing two numbers, the number of elements passing the test,
and the number of elements failing, in this order.
The interface to the test is the same as used by filter.
- ::struct::list fold sequence
initialvalue cmdprefix
- The subcommand takes a sequence to operate on, an
arbitrary string initial value and a command prefix
(cmdprefix) specifying an operation.
The command prefix will be evaluated with two words appended to it. The
second of these words will always be an element of the sequence. The
evaluation takes place in the context of the caller of the subcommand.
It then reduces the sequence into a single value through repeated
application of the command prefix and returns that value. This reduction
is done by
- 1
- Application of the command to the initial value and the
first element of the list.
- 2
- Application of the command to the result of the last call
and the second element of the list.
- i
- Application of the command to the result of the last call
and the i'th element of the list.
- end
- Application of the command to the result of the last call
and the last element of the list. The result of this call is returned as
the result of the subcommand.
tclsh> # summing the elements in a list.
tclsh> proc + {a b} {expr {$a + $b}}
tclsh> ::struct::list fold {1 2 3 4 5} 0 +
15
- ::struct::list shift listvar
- The subcommand takes the list contained in the variable
named by listvar and shifts it down one element. After the call
listvar will contain a list containing the second to last elements
of the input list. The first element of the ist is returned as the result
of the command. Shifting the empty list does nothing.
- ::struct::list iota n
- The subcommand returns a list containing the integer
numbers in the range [0,n). The element at index i of the
list contain the number i.
For " n == 0" an empty list will be returned.
- ::struct::list equal a b
- The subcommand compares the two lists a and b
for equality. In other words, they have to be of the same length and have
to contain the same elements in the same order. If an element is a list
the same definition of equality applies recursively.
A boolean value will be returned as the result of the command. This value
will be true if the two lists are equal, and false
else.
- ::struct::list repeat size
element1 ? element2 element3...?
- The subcommand creates a list of length " size
* number of elements" by repeating size times the
sequence of elements element1 element2 ....
size must be a positive integer, elementn can be any
Tcl value. Note that repeat 1 arg ... is identical to list arg
..., though the arg is required with repeat.
Examples:
tclsh> ::struct::list repeat 3 a
a a a
tclsh> ::struct::list repeat 3 [::struct::list repeat 3 0]
{0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeat 3 a b c
a b c a b c a b c
tclsh> ::struct::list repeat 3 [::struct::list repeat 2 a] b c
{a a} b c {a a} b c {a a} b c
- ::struct::list repeatn value
size...
- The subcommand creates a (nested) list containing the
value in all positions. The exact size and degree of nesting is
determined by the size arguments, all of which have to be integer
numbers greater than or equal to zero.
A single argument size which is a list of more than one element will
be treated as if more than argument size was specified.
If only one argument size is present the returned list will not be
nested, of length size and contain value in all positions.
If more than one size argument is present the returned list will be
nested, and of the length specified by the last size argument given
to it. The elements of that list are defined as the result of
Repeat for the same arguments, but with the last size value
removed.
An empty list will be returned if no size arguments are present.
tclsh> ::struct::list repeatn 0 3 4
{0 0 0} {0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeatn 0 {3 4}
{0 0 0} {0 0 0} {0 0 0} {0 0 0}
tclsh> ::struct::list repeatn 0 {3 4 5}
{{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}} {{0 0 0} {0 0 0} {0 0 0} {0 0 0}}
- ::struct::list dbJoin
?-inner|-left| -right|-full? ?-keys
varname? { keycol table}...
- The method performs a table join according to relational
algebra. The execution of any of the possible outer join operation is
triggered by the presence of either option -left, -right, or
-full. If none of these options is present a regular inner join
will be performed. This can also be triggered by specifying -inner.
The various possible join operations are explained in detail in section
TABLE JOIN.
If the -keys is present its argument is the name of a variable to
store the full list of found keys into. Depending on the exact nature of
the input table and the join mode the output table may not contain all the
keys by default. In such a case the caller can declare a variable for this
information and then insert it into the output table on its own, as she
will have more information about the placement than this command.
What is left to explain is the format of the arguments.
The keycol arguments are the indices of the columns in the tables
which contain the key values to use for the joining. Each argument applies
to the table following immediately after it. The columns are counted from
0, which references the first column. The table associated with the
column index has to have at least keycol+1 columns. An error will
be thrown if there are less.
The table arguments represent a table or matrix of rows and columns
of values. We use the same representation as generated and consumed by the
methods get rect and set rect of matrix objects. In
other words, each argument is a list, representing the whole matrix. Its
elements are lists too, each representing a single rows of the matrix. The
elements of the row-lists are the column values.
The table resulting from the join operation is returned as the result of the
command. We use the same representation as described above for the input
tables.
- ::struct::list dbJoinKeyed
?-inner|-left| -right|-full? ?-keys
varname? table...
- The operations performed by this method are the same as
described above for dbJoin. The only difference is in the
specification of the keys to use. Instead of using column indices separate
from the table here the keys are provided within the table itself. The row
elements in each table are not the lists of column values, but a
two-element list where the second element is the regular list of column
values and the first element is the key to use.
- ::struct::list swap listvar i
j
- The subcommand exchanges the elements at the indices
i and j in the list stored in the variable named by
listvar. The list is modified in place, and also returned as the
result of the subcommand.
- ::struct::list firstperm list
- This subcommand returns the lexicographically first
permutation of the input list.
- ::struct::list nextperm perm
- This subcommand accepts a permutation of a set of elements
(provided by perm) and returns the next permutatation in
lexicographic sequence.
The algorithm used here is by Donal E. Knuth, see section REFERENCES
for details.
- ::struct::list permutations list
- This subcommand returns a list containing all permutations
of the input list in lexicographic order.
- ::struct::list foreachperm var
list body
- This subcommand executes the script body once for
each permutation of the specified list. The permutations are
visited in lexicographic order, and the variable var is set to the
permutation for which body is currently executed. The result of the
loop command is the empty string.
LONGEST COMMON SUBSEQUENCE AND FILE COMPARISON¶
The
longestCommonSubsequence subcommand forms the core of a flexible
system for doing differential comparisons of files, similar to the capability
offered by the Unix command
diff. While this procedure is quite rapid
for many tasks of file comparison, its performance degrades severely if
sequence2 contains many equal elements (as, for instance, when using
this procedure to compare two files, a quarter of whose lines are blank. This
drawback is intrinsic to the algorithm used (see the Reference for details).
One approach to dealing with the performance problem that is sometimes effective
in practice is arbitrarily to exclude elements that appear more than a certain
number of times. This number is provided as the
maxOccurs parameter. If
frequent lines are excluded in this manner, they will not appear in the common
subsequence that is computed; the result will be the longest common
subsequence of infrequent elements. The procedure
longestCommonSubsequence2 implements this heuristic. It functions as a
wrapper around
longestCommonSubsequence; it computes the longest common
subsequence of infrequent elements, and then subdivides the subsequences that
lie between the matches to approximate the true longest common subsequence.
TABLE JOIN¶
This is an operation from relational algebra for relational databases.
The easiest way to understand the regular inner join is that it creates the
cartesian product of all the tables involved first and then keeps only all
those rows in the resulting table for which the values in the specified key
columns are equal to each other.
Implementing this description naively, i.e. as described above will generate a
huge intermediate result. To avoid this the cartesian product and the
filtering of row are done at the same time. What is required is a fast way to
determine if a key is present in a table. In a true database this is done
through indices. Here we use arrays internally.
An
outer join is an extension of the inner join for two tables. There are
three variants of outerjoins, called
left,
right, and
full outer joins. Their result always contains all rows from an inner
join and then some additional rows.
- [1]
- For the left outer join the additional rows are all rows
from the left table for which there is no key in the right table. They are
joined to an empty row of the right table to fit them into the
result.
- [2]
- For the right outer join the additional rows are all rows
from the right table for which there is no key in the left table. They are
joined to an empty row of the left table to fit them into the result.
- [3]
- The full outer join combines both left and right outer
join. In other words, the additional rows are as defined for left outer
join, and right outer join, combined.
We extend all the joins from two to
n tables (
n > 2) by
executing
(...((table1 join table2) join table3) ...) join tableN
Examples for all the joins:
Inner Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} inner join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver}
Left Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} left outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {2 blue {} {}}
Right Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} right outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {{} {} 3 driver}
Full Outer Join
{0 foo} {0 bagel} {0 foo 0 bagel}
{1 snarf} full outer join {1 snatz} = {1 snarf 1 snatz}
{2 blue} {3 driver} {2 blue {} {}}
{{} {} 3 driver}
REFERENCES¶
- [1]
- J. W. Hunt and M. D. McIlroy, "An algorithm for
differential file comparison," Comp. Sci. Tech. Rep. #41, Bell
Telephone Laboratories (1976). Available on the Web at the second author's
personal site: http://www.cs.dartmouth.edu/~doug/
- [2]
- Donald E. Knuth, "Fascicle 2b of 'The Art of Computer
Programming' volume 4". Available on the Web at the author's personal
site: http://www-cs-faculty.stanford.edu/~knuth/fasc2b.ps.gz.
BUGS, IDEAS, FEEDBACK¶
This document, and the package it describes, will undoubtedly contain bugs and
other problems. Please report such in the category
struct :: list of
the
Tcllib SF Trackers
[
http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas
for enhancements you may have for either package and/or documentation.
KEYWORDS¶
Fisher-Yates, assign, common, comparison, diff, differential, equal, equality,
filter, first permutation, flatten, folding, full outer join, generate
permutations, inner join, join, left outer join, list, longest common
subsequence, map, next permutation, outer join, permutation, reduce,
repeating, repetition, reshuffle, reverse, right outer join, shuffle,
subsequence, swapping
CATEGORY¶
Data structures
COPYRIGHT¶
Copyright (c) 2003-2005 by Kevin B. Kenny. All rights reserved
Copyright (c) 2003-2010 Andreas Kupries <andreas_kupries@users.sourceforge.net>