NAME
PDL::Compression - compression utilities
DESCRIPTION
These routines generally accept some data as a PDL and compress it into a
smaller PDL. Algorithms typically work on a single dimension and thread over
other dimensions, producing a threaded table of compressed values if more than
one dimension is fed in.
The Rice algorithm, in particular, is designed to be identical to the RICE_1
algorithm used in internal FITS-file compression (see PDL::IO::FITS).
SYNOPSIS
use PDL::Compression;
($b,$asize) = $a->rice_compress();
$c = $b->rice_expand($asize);
FUNCTIONS
METHODS
rice_compress
Signature: (in(n); [o]out(m); int[o]len(); lbuf(n); int blocksize)
Squishes an input PDL along the 0 dimension by Rice compression. In scalar
context, you get back only the compressed PDL; in list context, you also get
back ancillary information that is required to uncompress the data with
rice_expand.
Multidimensional data are threaded over - each row is compressed separately, and
the returned PDL is squished to the maximum compressed size of any row. If any
of the streams could not be compressed (the algorithm produced longer output),
the corresponding length is set to -1 and the row is treated as if it had
length 0.
Rice compression only works on integer data types -- if you have floating point
data you must first quantize them.
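As a minimal sketch of the quantization step mentioned above, assuming a uniform step size, the arithmetic can be written in plain Perl. The helper names quantize and dequantize and the step value are hypothetical and are not part of PDL::Compression.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Map floating-point samples to integers: q = round(x / step).
sub quantize {
    my ($step, @x) = @_;
    return map { floor($_ / $step + 0.5) } @x;
}

# Approximate inverse: x' = q * step, so |x' - x| <= step / 2.
sub dequantize {
    my ($step, @q) = @_;
    return map { $_ * $step } @q;
}

my $step = 0.25;                               # hypothetical step size
my @q    = quantize($step, 10.1, 10.3, 9.9);
print "@q\n";                                  # prints "40 41 40"
```

The resulting integers would still need to be stored in an integer-typed PDL before being handed to rice_compress; the step size sets the precision that is lost.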
The underlying algorithm is identical to the Rice compressor used in CFITSIO
(and is used by PDL::IO::FITS to load and save compressed FITS images).
The optional blocksize indicates how many samples are to be compressed as a
unit; it defaults to 32.
How it works:
Rice compression is a subset of Golomb compression, and works on data sets where
variation between adjacent samples is typically small compared to the dynamic
range of each sample. In this implementation (originally written by Richard
White and contributed to CFITSIO in 1999), the data are divided into blocks of
samples (by default 32 samples per block). Each block has a running difference
applied, and the difference is bit-folded to make it non-negative. High
order bits of the difference stream are discarded, and replaced with a unary
representation; low order bits are preserved. Unary representation is very
efficient for small numbers, but large jumps could give rise to ludicrously
large bins in a plain Golomb code; such large jumps ("high entropy"
samples) are simply recorded directly in the output stream.
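The differencing, bit-folding and unary/low-bit split described above can be sketched in plain Perl. This is an illustration under stated assumptions, not the ricecomp.c implementation: bits are kept as a string of '0' and '1' characters, the number of preserved low-order bits $k is passed in rather than estimated per block, the unary part is written as zeros terminated by a one, and the direct-copy path for high-entropy blocks is omitted. All function names are hypothetical.

```perl
use strict;
use warnings;

# Fold a signed difference onto the non-negative integers:
# 0, -1, 1, -2, 2, ...  ->  0, 1, 2, 3, 4, ...
sub fold {
    my ($d) = @_;
    return $d < 0 ? -2 * $d - 1 : 2 * $d;
}

# Inverse of fold().
sub unfold {
    my ($u) = @_;
    return $u % 2 ? -($u + 1) / 2 : $u / 2;
}

# Rice-code one non-negative value, keeping $k low-order bits verbatim.
# The high-order part is written in unary: that many '0's, then a '1'.
sub rice_encode_value {
    my ($u, $k) = @_;
    my $high = $u >> $k;
    my $low  = $k ? sprintf("%0${k}b", $u & ((1 << $k) - 1)) : '';
    return ('0' x $high) . '1' . $low;
}

# Encode one block of samples; $prev is the sample preceding the block.
sub rice_encode_block {
    my ($k, $prev, @samples) = @_;
    my $bits = '';
    for my $s (@samples) {
        $bits .= rice_encode_value(fold($s - $prev), $k);
        $prev = $s;
    }
    return $bits;
}

# Decode $n samples from a bit string made by rice_encode_block().
sub rice_decode_block {
    my ($k, $prev, $n, $bits) = @_;
    my @out;
    my $pos = 0;
    for (1 .. $n) {
        my $high = 0;
        # Count the unary zeros; the loop also consumes the closing '1'.
        $high++ while substr($bits, $pos++, 1) eq '0';
        my $low = $k ? oct('0b' . substr($bits, $pos, $k)) : 0;
        $pos += $k;
        $prev += unfold(($high << $k) | $low);
        push @out, $prev;
    }
    return @out;
}

my @block = (100, 101, 99, 99, 104);
my $bits  = rice_encode_block(2, 100, @block);
print length($bits), " bits\n";                      # prints "17 bits"
print join(' ', rice_decode_block(2, 100, 5, $bits)), "\n";
```

With $k = 2, the four small differences cost three bits each, and the single jump of +5 costs five bits. That is the trade-off the text describes: small differences are cheap, while large ones lengthen the unary prefix.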
On astronomical or solar image data, typical compression ratios are between 2
and 3.
$out = $pdl->rice_compress($blocksize);
($out, $len, $blocksize, $dim0) = $pdl->rice_compress;
$new = $out->rice_expand;
rice_compress ignores the bad-value flag of the input piddles. It will set the
bad-value flag of all output piddles if the flag is set for any of the input
piddles.
rice_expand
Signature: (in(n); [o]out(m); lbuf(n); int blocksize)
Unsquishes a PDL that has been squished by rice_compress.
($out, $len, $blocksize, $dim0) = $pdl->rice_compress;
$copy = $out->rice_expand($dim0, $blocksize);
rice_expand ignores the bad-value flag of the input piddles. It will set the
bad-value flag of all output piddles if the flag is set for any of the input
piddles.
AUTHORS
Copyright (C) 2010 Craig DeForest. All rights reserved. There is no warranty.
You are allowed to redistribute this software / documentation under certain
conditions. For details, see the file COPYING in the PDL distribution. If this
file is separated from the PDL distribution, the copyright notice should be
included in the file.
The Rice compression library is derived from the similar library in the CFITSIO
3.24 release, and is licensed under yet more lenient terms than PDL
itself; that notice is present in the file "ricecomp.c".
BUGS
- Currently headers are ignored.
- Currently there is only one compression algorithm.
TODO
- Add object encapsulation
- Add test suite