NAME¶
Data::Sorting - Multi-key sort using function results
SYNOPSIS¶
use Data::Sorting qw( :basics :arrays :extras );
# Sorting functions default to simple string comparisons
@names = qw( Bob Alice Ellen Charlie David );
@ordered = sorted_by( undef, @names );
# Various options can be passed before the list values
@ordered = sorted_by( [ -order=>'reverse' ], @names );
# You can also generate a sorting function and then apply it
$function = sort_function();
@ordered = $function->( @names ); # or &{$function}(@names)
@ordered = sort_function( -order=>'reverse' )->( @names );
# The :array functions are prototyped to take the array first
@ordered = sorted_array( @names );
@ordered = sorted_arrayref( \@names );
# You can also sort an array in place, changing its internal order
sort_array( @names );
sort_arrayref( \@names );
# There are several sorting options, such as -compare => 'natural'
@movies = ( 'The Matrix', 'Plan 9', '2001', 'Terminator 2' );
@ordered = sort_function( -compare => 'natural' )->( @movies );
# @ ordered now contains '2001', 'The Matrix', 'Plan 9', 'Terminator 2'
# To sort numbers, pass the -compare => 'numeric' option
@numbers = ( 18, 5, 23, 42, 156, 91, 64 );
@ordered = sorted_by( [ -compare => 'numeric' ], @numbers );
@ordered = sort_function( -compare => 'numeric' )->( @numbers );
@ordered = sorted_array( @numbers, -compare => 'numeric' );
sort_array( @numbers, -compare => 'numeric' );
# You can sort by the results of a function to be called on each item
sort_array( @numbers, -compare => 'numeric', sub { $_[0] % 16 } );
# @numbers now contains 64, 18, 5, 23, 42, 91, 156
# For arrays of datastructures, pass in keys to extract for sorting
@records = (
{ 'rec_id'=>3, 'name'=>{'first'=>'Bob', 'last'=>'Macy'} },
{ 'rec_id'=>1, 'name'=>{'first'=>'Sue', 'last'=>'Jones'} },
{ 'rec_id'=>2, 'name'=>{'first'=>'Al', 'last'=>'Jones' } },
);
@ordered = sorted_array( @records, 'rec_id' );
# For nested data structures, pass an array of keys to fetch
@ordered = sorted_array( @records, ['name','first'] );
# Pass multiple sort keys for multiple-level sorts
@ordered = sorted_array( @records, ['name','last'], ['name','first'] );
# Any selected sort options are applied to all subsequent sort keys
@ordered = sorted_array( @records,
-order => 'reverse', ['name','last'], ['name','first'] );
# Options specified within a hash-ref apply only to that key
@ordered = sorted_array( @records,
{ order=>'reverse', sortkey=>['name','last'] },
['name','first'] );
# Locale support is available if you have Perl 5.004 or later and POSIX
POSIX::setlocale( POSIX::LC_COLLATE(), 'en_US' );
POSIX::setlocale( POSIX::LC_CTYPE(), 'en_US' );
@ordered = sorted_array( @records,
-compare=>'locale', ['name','last'], ['name','first'] );
ABSTRACT¶
Data::Sorting provides functions to sort the contents of arrays based on a
collection of extraction and comparison rules. Extraction rules are used to
identify the attributes of array elements on which the ordering is based;
comparison rules specify how those values should be ordered.
Index strings may be used to retrieve values from array elements, or function
references may be passed in to call on each element. Comparison rules are
provided for numeric, bytewise, and case-insensitive orders, as well as a
'natural' comparison that places numbers first, in numeric order, followed by
the remaining items in case-insensitive textual order.
DESCRIPTION¶
This module provides several public functions with different calling interfaces
that all use the same underlying sorting mechanisms.
These functions may be imported individually or in groups using the following
tags:
- :basics
- sorted_by(), sort_function(): General-purpose sorting
functions.
- :array
- sorted_array(), sorted_arrayref(), sort_array(),
sort_arrayref(): Prototyped functions for arrays.
- :extras
- sort_key_values(), sort_description(): Two accessory
functions that explain how sorting is being carried out.
All of these functions take a list of sorting rules as arguments. See "Sort
Rule Syntax" for a discussion of the contents of the $sort_rule or
@sort_rules parameters shown below.
sorted_by¶
@ordered = sorted_by( $sort_rule, @value_array );
@ordered = sorted_by( $sort_rule, @$value_arrayref );
@ordered = sorted_by( $sort_rule, $value1, $value2, $value3 );
@ordered = sorted_by( \@sort_rules, @value_array );
@ordered = sorted_by( \@sort_rules, @$value_arrayref );
@ordered = sorted_by( \@sort_rules, $value1, $value2, $value3 );
This is a general-purpose sorting function which accepts one or more sort order
rules and a list of input values, then returns the values in the order
specified by the rules.
sort_function¶
@ordered = sort_function( @sort_rules )->( @value_array );
@ordered = sort_function( @sort_rules )->( @$value_arrayref );
@ordered = sort_function( @sort_rules )->( $value1, $value2, $value3 );
Creates an anonymous function which applies the provided sort rules. The
function may be cached and used multiple times to apply the same rules again.
sorted_array¶
@ordered = sorted_array( @value_array, @sort_rules );
@ordered = sorted_array( @$value_arrayref, @sort_rules );
Returns a sorted list of the items without altering the order of the original
list.
sorted_arrayref¶
@ordered = sorted_arrayref( \@value_array, @sort_rules );
@ordered = sorted_arrayref( $value_arrayref, @sort_rules );
Returns a sorted list of the items without altering the order of the original
list.
sort_array¶
sort_array( @value_array, @sort_rules );
sort_array( @$value_arrayref, @sort_rules );
Sorts the contents of the specified array using a list of sorting rules.
sort_arrayref¶
sort_arrayref( \@value_array, @sort_rules );
sort_arrayref( $value_arrayref, @sort_rules );
Equivalent to sort_array, but takes an explicit array reference as its first
argument, rather than an array variable.
sort_key_values¶
@key_values = sort_key_values( \@value_array, @sort_rules );
@key_values = sort_key_values( $value_arrayref, @sort_rules );
Doesn't actually perform any sorting. Extracts and returns the values which
would be used as sort keys from each item in the array, in their original
order.
sort_description¶
@description = sort_description( $descriptor, @sort_rules );
Doesn't actually perform any sorting. Provides descriptive information about the
sort rules for diagnostic purposes.
Sort Rule Syntax¶
The sort rule argument list may contain several different types of parameters,
which are parsed identically by all of the public functions described above.
A sort rule definition list may contain any combination of the following
argument structures:
- nothing
- If no sort keys are specified, a default sort key is created using the
"extract => "self"" option.
@ordered = sorted_array( @names );
- sortkey
- Specifies a sort key. Each sortkey may be either a scalar value, or
an array reference. Appropriate values for a sortkey vary depending
on which "extract" option is being used, and are discussed
further below.
@ordered = sorted_array( @numbers, sub { $_[0] % 8 } );
@ordered = sorted_array( @records, 'rec_id' );
@ordered = sorted_array( @records, ['name','first'] );
Any number of sortkeys may be provided:
@ordered = sorted_array( @records, ['name','last'],
['name','first'] );
- -sortkey => sortkey
- Another way of specifying a sort key is by preceding it with the
"-sortkey" flag.
@ordered = sorted_array( @numbers, -sortkey => sub { $_[0] % 8 } );
@ordered = sorted_array( @records, -sortkey => ['name','last'],
-sortkey => ['name','first'] );
- { sortkey => sortkey, option => option_value,
... }
- Additional options can be specified by passing a reference to a hash
containing a sortkey and values for any number of options described in the
list below.
@ordered = sorted_array( @numbers, { sortkey => sub { abs(shift) },
compare => 'numeric', } );
- -option => option_value
- Sets a default option for any subsequent sortkeys in the argument list.
@ordered = sorted_array( @records, -compare => 'numeric',
-sortkey => sub { abs(shift) });
@ordered = sorted_array( @records, -compare => 'textual',
-sortkey => ['name','last'],
-sortkey => ['name','first'] );
The possible
option values are:
- extract
- Determines the function which will be used to retrieve the sort key value
from each item in the input list.
- compare
- Determines the function which will be used to order the extracted
values.
- order
- Can be set to "reverse" or "descending" to invert the
sort order. Defaults to "ascending".
- engine
- Determines the underlying sorting algorithm which will be used to
implement the sort. Generally left blank, enabling the module to select
the best one available.
Each of these options is discussed at further length below.
For the extract option, you may specify one of the following
option_values:
- any
- The default. Based on the sortkey may behave as the 'self', 'key',
or 'method' options described below.
- self
- Uses the input value as the sort key, unaltered. Typically used when
sorting strings or other scalar values.
- key
- Allows for indexing in to hash or array references, allowing you to sort a
list of arrayrefs based on the nth value in each, or to sort a list
of hashrefs based on a given key.
If the sortkey is an array reference, then the keys are looked up
sequentially, allowing you to sort on the contents of a nested hash or
array structure.
- method
- Uses the sortkey as a method name to be called on each list value,
enabling you to sort objects by some calculated value.
If the sortkey is an array reference, then the first value is used as the
method name and the remaining values as arguments to that method.
- CODEREF
- You may pass in a reference to a custom extraction function that will be
used to retrieve the sort key values for this rule. The function will be
called separately for each value in the input list, receiving that current
value as an argument.
If the sortkey is an array reference, then the first value is used as the
function reference and the remaining values as arguments to be passed
after the item value.
extract => self | method | key | code | CODEREF | ...
sortkey => - | m.name | key/idx | CODEREF | args
Comparison Functions¶
For the compare option, you may specify one of the following
option_values:
- cmp
- The default comparison, using Perl's default cmp operator.
- numeric
- A numeric comparison using Perl's <=> operator.
- textual
- A text-oriented comparison that ignores whitespace and
capitalization.
- natural
- A multi-type comparison that places empty values first, then numeric
values in numeric order, then non-textual values like punctuation,
followed by textual values in text order. The natural ordering also
includes moving subsidiary words to the end, eg "The Book of
Verse" is sorted as "Book of Verse, The"
- locale : $three_way_cmp
- Comparator functions which use the POSIX strcoll function for
ordering.
- lc_locale : $three_way_cmp
- A case-insensitive version of the POSIX strcoll ordering.
- num_lc_locale
- Like the 'natural' style, this comparison distinguishes between empty and
numeric values, but uses the lc_locale function to sort the textual
values.
- CODEREF
- You may pass in a reference to a custom comparison function that will be
used to order the sort key values for this rule.
Each of these functions may return a postive, zero, or negative value based on
the relationship of the values in the $a and $b positions of the current
@ValueSet array. An undefined return indicates that the comparator is unable
to provide an ordering for this pair, in which case the choice will fall
through to the next comparator in the list; if no comparator specifies an
order, they are left in their original order.
Ascending or Descending Order¶
For the order option, you may specify one of the following
option_values:
- forward or ascending
- The default order, from lower values to higher ones.
- reverse or descending
- Reverses the ordering dictated by a sort rule.
Sorting Engines¶
Depending on the specific sorting rules used in a given call, this module
automatically selects an internal function that provides an appropriate
approach to implementing the sort, called the sort "engine".
You can override this selection by setting an "engine" option on the
first sort key, which can either contain either the name of one of the
engines, described below, or a CODEREF with equivalent behavior.
- trivial
- In the common case of sorting raw values with a cmp comparison, the
fast-but-simple "trivial" engine is used, which simply applies
Perl's default sorting.
- orcish
- For a complex multi-key sort the "orcish" engine is typically
selected.
- precalc
- Used when there's only one sorting key.
You may also set the $PreCalculate package variable to true to force this
engine to be selected. Because the sort key values for the list are
calculated before entering Perl's sort operation, there's less of a chance
of possible re-entry problems due to nested uses of the sort operator,
which causes a fatal error in at least some versions of Perl.
- packed
- Some sorts are handled with the Guttman-Rosler technique, extracting
packed keys and using Perl's default sort function, which is substantially
faster, but currently only a limited set of simple comparisons can be
handled this way. (For more information on packed-default sorting, see
http://www.sysarch.com/perl/sort_paper.html or search for
"Guttman-Rosler".)
STATUS AND SUPPORT¶
This release of Data::Sorting is intended for public review and feedback.
Name DSLIP Description
-------------- ----- ---------------------------------------------
Data::
::Sorting bdpfp Multi-key sort using function results
Further information and support for this module is available at
www.evoscript.org.
Please report bugs or other problems to <bugs@evoscript.com>.
BUGS AND TO DO¶
The following issues have been noted for future improvements:
Convert more types of comparisons to packed-default sorts for speed.
Further investigate the current status of the Sort::Records module.
Add a comparator function for an alpha-numeric-spans sorting model like
Sort::Naturally.
Interface to Sort::PolySort for alternate comparator styles, like
"name" and "usdate".
For non-scalar values, compare referents along the lines of
Ref::cmpref().
Provide better handling for nested sorts; perhaps throw an exception from the
inner instance to the outer, catch and set $PreCalculate, then go back into
the loop?
Replace dynamic scoping with object instances for thread safety. May not be
necessary given changes in threading models.
CREDITS AND COPYRIGHT¶
Developed By¶
M. Simon Cavalletto, simonm@cavalletto.org
Evolution Softworks, www.evoscript.org
Copyright¶
Copyright 2003 Matthew Cavalletto.
Portions copyright 1996, 1997, 1998, 1999 Evolution Online Systems, Inc.
License¶
You may use, modify, and distribute this software under the same terms as
Perl.