.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Utils 3pm"
.TH Utils 3pm "2023-04-08" "perl v5.36.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.
.\" Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
PDL::VectorValued::Utils \- Low\-level utilities for vector\-valued PDLs
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 2
\& use PDL;
\& use PDL::VectorValued::Utils;
\&
\& ##\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\& ## ... stuff happens
.Ve
.SH "FUNCTIONS"
.IX Header "FUNCTIONS"
.SH "Vector-Based Run-Length Encoding and Decoding"
.IX Header "Vector-Based Run-Length Encoding and Decoding"
.SS "vv_rlevec"
.IX Subsection "vv_rlevec"
.Vb 1
\& Signature: (c(M,N); indx [o]a(N); [o]b(M,N))
.Ve
.PP
Run-length encode a set of vectors.
.PP
Higher-order \fBrle()\fR, for use with \fBqsortvec()\fR.
.PP
Given a set of vectors \f(CW$c\fR, generate a vector \f(CW$a\fR with the number of occurrences of each element
(where an \*(L"element\*(R" is a vector of length \f(CW$M\fR occurring in \f(CW$c\fR),
and a set of vectors \f(CW$b\fR containing the unique values.
As for \fBrle()\fR, only the elements up to the first instance of 0 in \f(CW$a\fR should be considered.
.PP
Can be used together with \fBclump()\fR to run-length encode \*(L"values\*(R" of arbitrary dimensions.
Can be used together with \fBrotate()\fR, \fBcat()\fR, \fBappend()\fR, and \fBqsortvec()\fR to count N\-grams
over a 1d \s-1PDL.\s0
.PP
See also: PDL::Slices::rle, PDL::Ufunc::qsortvec, PDL::Primitive::uniqvec
.PP
vv_rlevec does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "vv_rldvec"
.IX Subsection "vv_rldvec"
.Vb 1
\& Signature: (int a(N); b(M,N); [o]c(M,N))
.Ve
.PP
Run-length decode a set of vectors, akin to a higher-order \fBrld()\fR.
.PP
Given a vector $a() of the number of occurrences of each row, and a set $b()
of row-vectors each of length \f(CW$M\fR, run-length decode to $c().
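.PP
As a rough illustration of what the encode/decode pair computes, the following
plain-Perl sketch run-length encodes consecutive identical rows and expands them again.
It is not the module's implementation and does not use \s-1PDL\s0 at all:
the helpers \fBsame_row()\fR, \fBrle_rows()\fR, and \fBrld_rows()\fR are hypothetical names
introduced only for this example, and the sketch returns lists of exactly the needed
length rather than fixed-size outputs in which only the entries before the first 0 count.
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use feature "say";
\&
\& # Hypothetical helper: true if two rows hold the same numbers.
\& sub same_row {
\&     my ($x, $y) = @_;
\&     return 0 unless @$x == @$y;
\&     $x\->[$_] == $y\->[$_] or return 0 for 0 .. $#$x;
\&     return 1;
\& }
\&
\& # Hypothetical helper: run\-length encode consecutive identical rows.
\& # Returns the run lengths and the unique rows (the roles of $a and $b).
\& sub rle_rows {
\&     my @rows = @_;
\&     my ($counts, $values) = ([], []);
\&     for my $row (@rows) {
\&         if (@$values && same_row($values\->[\-1], $row)) {
\&             $counts\->[\-1]++;          # same row again: extend the current run
\&         } else {
\&             push @$values, [@$row];   # new row: start a run of length 1
\&             push @$counts, 1;
\&         }
\&     }
\&     return ($counts, $values);
\& }
\&
\& # Hypothetical helper: repeat row $i of the unique rows $counts\->[$i] times.
\& sub rld_rows {
\&     my ($counts, $values) = @_;
\&     my @rows;
\&     for my $i (0 .. $#$counts) {
\&         push @rows, [ @{ $values\->[$i] } ] for 1 .. $counts\->[$i];
\&     }
\&     return @rows;
\& }
\&
\& my @c = ([1,2], [1,2], [3,4], [3,4], [3,4], [5,6]);
\& my ($counts, $values) = rle_rows(@c);
\& my @back = rld_rows($counts, $values);
\& say "counts: @$counts";                                      # counts: 2 3 1
\& say "values: ", join " | ", map { join ",", @$_ } @$values;  # values: 1,2 | 3,4 | 5,6
\& say "rows after decoding: ", scalar @back;                   # rows after decoding: 6
.Ve
.PP
Decoding the encoded pair reproduces the original six rows, which is the round-trip
property the two functions are meant to provide.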
.PP
Can be used together with \fBclump()\fR to run-length decode \*(L"values\*(R" of
arbitrary dimensions.
.PP
See also: PDL::Slices::rld.
.PP
vv_rldvec does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "vv_enumvec"
.IX Subsection "vv_enumvec"
.Vb 1
\& Signature: (v(M,N); int [o]k(N))
.Ve
.PP
Enumerate a list of vectors with locally unique keys.
.PP
Given a sorted list of vectors \f(CW$v\fR, generate a vector \f(CW$k\fR containing locally unique keys for the elements of \f(CW$v\fR
(where an \*(L"element\*(R" is a vector of length \f(CW$M\fR occurring in \f(CW$v\fR).
.PP
Note that the keys returned in \f(CW$k\fR are only unique over a run of a single vector in \f(CW$v\fR,
so that each unique vector in \f(CW$v\fR has at least one 0 (zero) index in \f(CW$k\fR associated with it.
If you need global keys, see \fBenumvecg()\fR.
.PP
vv_enumvec does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "vv_enumvecg"
.IX Subsection "vv_enumvecg"
.Vb 1
\& Signature: (v(M,N); int [o]k(N))
.Ve
.PP
Enumerate a list of vectors with globally unique keys.
.PP
Given a sorted list of vectors \f(CW$v\fR, generate a vector \f(CW$k\fR containing globally unique keys for the elements of \f(CW$v\fR
(where an \*(L"element\*(R" is a vector of length \f(CW$M\fR occurring in \f(CW$v\fR).
Basically does the same thing as:
.PP
.Vb 1
\& $k = $v\->vsearchvec($v\->uniqvec);
.Ve
.PP
\&... but somewhat more efficiently.
.PP
vv_enumvecg does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "vv_rleseq"
.IX Subsection "vv_rleseq"
.Vb 1
\& Signature: (c(N); indx [o]a(N); [o]b(N))
.Ve
.PP
Run-length encode a vector of subsequences.
.PP
Given a vector $c() of concatenated variable-length, variable-offset subsequences,
generate a vector \f(CW$a\fR containing the length of each subsequence
and a vector \f(CW$b\fR containing the subsequence offsets.
As for \fBrle()\fR, only the elements up to the first instance of 0 in \f(CW$a\fR should be considered.
.PP
See also PDL::Slices::rle.
.PP
vv_rleseq does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "vv_rldseq"
.IX Subsection "vv_rldseq"
.Vb 1
\& Signature: (int a(N); b(N); [o]c(M))
.Ve
.PP
Run-length decode a subsequence vector.
.PP
Given a vector $a() of sequence lengths
and a vector $b() of corresponding offsets,
decode the concatenation of subsequences to $c(),
as for:
.PP
.Vb 2
\& $c = null;
\& $c = $c\->append($b($_)+sequence($a\->type,$a($_))) foreach (0..($N\-1));
.Ve
.PP
See also: PDL::Slices::rld.
.PP
vv_rldseq does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "vv_vsearchvec"
.IX Subsection "vv_vsearchvec"
.Vb 1
\& Signature: (find(M); which(M,N); int [o]found())
.Ve
.PP
Routine for searching N\-dimensional values \- akin to \fBvsearch()\fR for vectors.
.PP
.Vb 2
\& $found = vsearchvec($find, $which);
\& $nearest = $which\->dice_axis(1,$found);
.Ve
.PP
Returns for each row-vector in \f(CW$find\fR the index along dimension N
of the least row vector of \f(CW$which\fR
greater than or equal to it.
\&\f(CW$which\fR should be sorted in increasing order.
If the value of \f(CW$find\fR is larger
than any member of \f(CW$which\fR, the index to the last element of \f(CW$which\fR is
returned.
.PP
See also: \fBPDL::Primitive::vsearch()\fR.
.PP
vv_vsearchvec does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
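.PP
The lookup rule described for \fBvv_vsearchvec()\fR can be sketched in plain Perl as a
binary search driven by a lexicographic row comparison.
This is only an illustration of the documented result, not the module's code:
\fBcmp_rows()\fR and \fBsearch_row()\fR are hypothetical helper names, the rows are
ordinary array references rather than PDLs, and the list of rows is assumed to be
non-empty and sorted in increasing order.
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use feature "say";
\&
\& # Hypothetical helper: lexicographic comparison of two equal\-length rows;
\& # returns \-1, 0, or 1.
\& sub cmp_rows {
\&     my ($x, $y) = @_;
\&     for my $i (0 .. $#$x) {
\&         my $c = $x\->[$i] <=> $y\->[$i];
\&         return $c if $c;
\&     }
\&     return 0;
\& }
\&
\& # Hypothetical helper: index of the least row in @$which that is greater
\& # than or equal to $find; if there is no such row, the last index.
\& sub search_row {
\&     my ($find, $which) = @_;
\&     my ($lo, $hi) = (0, $#$which);
\&     while ($lo < $hi) {
\&         my $mid = int(($lo + $hi) / 2);
\&         if (cmp_rows($which\->[$mid], $find) < 0) { $lo = $mid + 1 }
\&         else                                     { $hi = $mid }
\&     }
\&     return $lo;
\& }
\&
\& my $which = [[1,1], [1,3], [2,0], [4,4]];   # sorted in increasing order
\& say search_row([1,2], $which);              # 1
\& say search_row([2,0], $which);              # 2
\& say search_row([9,9], $which);              # 3
.Ve
.PP
The last call shows the fallback case: a row larger than every member of the list
yields the index of the last element.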
.SH "Vector-Valued Sorting and Comparison"
.IX Header "Vector-Valued Sorting and Comparison"
The following functions are provided for lexicographic comparison of vectors
and for lexicographic sorting of vectors or of their axis indices.
As of PDL::VectorValued v1.0.12, \fBvv_qsortvec()\fR and
\&\fBvv_qsortveci()\fR are just deprecated aliases for the builtin \s-1PDL\s0 functions of the same names.
Older versions of this module used a dedicated implementation as a workaround
for a bug in \s-1PDL\-2.4.3,\s0 which has long since been fixed.
.SS "vv_cmpvec"
.IX Subsection "vv_cmpvec"
.Vb 1
\& Signature: (a(N); b(N); int [o]cmp())
.Ve
.PP
Lexicographically compare a pair of vectors.
.PP
vv_cmpvec does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "vv_qsortvec"
.IX Subsection "vv_qsortvec"
.Vb 1
\& Signature: (a(n,m); [o]b(n,m))
.Ve
.PP
Deprecated alias for \fBPDL::Ufunc::qsortvec()\fR, which see for details.
.SS "vv_qsortveci"
.IX Subsection "vv_qsortveci"
.Vb 1
\& Signature: (a(n,m); indx [o]ix(m))
.Ve
.PP
Deprecated alias for \fBPDL::Ufunc::qsortveci()\fR, which see for details.
.SH "Vector-Valued Set Operations"
.IX Header "Vector-Valued Set Operations"
The following functions are provided for set operations on
sorted vector-valued PDLs.
.SS "vv_union"
.IX Subsection "vv_union"
.Vb 1
\& Signature: (a(M,NA); b(M,NB); [o]c(M,NC); int [o]nc())
.Ve
.PP
Union of two vector-valued PDLs. Input PDLs $a() and $b() \fB\s-1MUST\s0\fR be
sorted in lexicographic order.
On return, $nc() holds the actual number of vector-values in the union.
.PP
In scalar context, slices $c() to the actual number of elements in the union
and returns the sliced \s-1PDL.\s0
.PP
vv_union does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
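.PP
Because both inputs are required to be sorted, a union of this kind can be formed in a
single merge pass.
The plain-Perl sketch below illustrates that idea on lists of row vectors.
Whether \fBvv_union()\fR works this way internally is an assumption;
\fBcmp_rows()\fR and \fBunion_rows()\fR are hypothetical helper names, and the sketch
additionally assumes that neither input contains duplicate rows.
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use feature "say";
\&
\& # Hypothetical helper: lexicographic comparison of two equal\-length rows.
\& sub cmp_rows {
\&     my ($x, $y) = @_;
\&     for my $i (0 .. $#$x) {
\&         my $c = $x\->[$i] <=> $y\->[$i];
\&         return $c if $c;
\&     }
\&     return 0;
\& }
\&
\& # Hypothetical helper: merge two sorted lists of rows, emitting rows that
\& # occur in both lists only once.
\& sub union_rows {
\&     my ($x, $y) = @_;
\&     my ($i, $j, @out) = (0, 0);
\&     while ($i < @$x && $j < @$y) {
\&         my $c = cmp_rows($x\->[$i], $y\->[$j]);
\&         if    ($c < 0) { push @out, $x\->[$i++] }
\&         elsif ($c > 0) { push @out, $y\->[$j++] }
\&         else           { push @out, $x\->[$i++]; $j++ }
\&     }
\&     # One list is exhausted; append whatever remains of the other.
\&     push @out, @{$x}[$i .. $#$x], @{$y}[$j .. $#$y];
\&     return @out;
\& }
\&
\& my $x = [[1,1], [2,0], [4,4]];
\& my $y = [[1,3], [2,0], [5,0]];
\& my @u = union_rows($x, $y);
\& say scalar @u;                                # 5
\& say join " | ", map { join(",", @$_) } @u;    # 1,1 | 1,3 | 2,0 | 4,4 | 5,0
.Ve
.PP
The length of the returned list plays the role of $nc(): it is the actual number
of vectors in the union, which here is smaller than the six input rows because
the row (2,0) occurs in both inputs.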
.SS "vv_intersect"
.IX Subsection "vv_intersect"
.Vb 1
\& Signature: (a(M,NA); b(M,NB); [o]c(M,NC); int [o]nc())
.Ve
.PP
Intersection of two vector-valued PDLs.
Input PDLs $a() and $b() \fB\s-1MUST\s0\fR be sorted in lexicographic order.
On return, $nc() holds the actual number of vector-values in the intersection.
.PP
In scalar context, slices $c() to the actual number of elements in the intersection
and returns the sliced \s-1PDL.\s0
.PP
vv_intersect does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "vv_setdiff"
.IX Subsection "vv_setdiff"
.Vb 1
\& Signature: (a(M,NA); b(M,NB); [o]c(M,NC); int [o]nc())
.Ve
.PP
Set-difference ($a() \e $b()) of two vector-valued PDLs.
Input PDLs $a() and $b() \fB\s-1MUST\s0\fR be sorted in lexicographic order.
On return, $nc() holds the actual number of vector-values in the computed vector set.
.PP
In scalar context, slices $c() to the actual number of elements in the output vector set
and returns the sliced \s-1PDL.\s0
.PP
vv_setdiff does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SH "Sorted Vector Set Operations"
.IX Header "Sorted Vector Set Operations"
The following functions are provided for set operations on
flat sorted PDLs with unique values. They may be more efficient to compute
than the corresponding implementations via \fBPDL::Primitive::setops()\fR.
.SS "v_union"
.IX Subsection "v_union"
.Vb 1
\& Signature: (a(NA); b(NB); [o]c(NC); int [o]nc())
.Ve
.PP
Union of two flat sorted unique-valued PDLs.
Input PDLs $a() and $b() \fB\s-1MUST\s0\fR be sorted in lexicographic order and contain no duplicates.
On return, $nc() holds the actual number of values in the union.
.PP
In scalar context, reshapes $c() to the actual number of elements in the union and returns it.
.PP
v_union does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "v_intersect"
.IX Subsection "v_intersect"
.Vb 1
\& Signature: (a(NA); b(NB); [o]c(NC); int [o]nc())
.Ve
.PP
Intersection of two flat sorted unique-valued PDLs.
Input PDLs $a() and $b() \fB\s-1MUST\s0\fR be sorted in lexicographic order and contain no duplicates.
On return, $nc() holds the actual number of values in the intersection.
.PP
In scalar context, reshapes $c() to the actual number of elements in the intersection and returns it.
.PP
v_intersect does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SS "v_setdiff"
.IX Subsection "v_setdiff"
.Vb 1
\& Signature: (a(NA); b(NB); [o]c(NC); int [o]nc())
.Ve
.PP
Set-difference ($a() \e $b()) of two flat sorted unique-valued PDLs.
Input PDLs $a() and $b() \fB\s-1MUST\s0\fR be sorted in lexicographic order and contain no duplicate values.
On return, $nc() holds the actual number of values in the computed vector set.
.PP
In scalar context, reshapes $c() to the actual number of elements in the difference set and returns it.
.PP
v_setdiff does not process bad values.
It will set the bad-value flag of all output ndarrays if the flag is set for any of the input ndarrays.
.SH "Miscellaneous Vector-Valued Operations"
.IX Header "Miscellaneous Vector-Valued Operations"
.SS "vv_vcos"
.IX Subsection "vv_vcos"
.Vb 1
\& Signature: (a(M,N);b(M);float+ [o]vcos(N))
.Ve
.PP
Computes the vector cosine similarity of a dense vector $b() with respect to each row $a(*,i)
of a dense \s-1PDL\s0 $a(). This is basically the same thing as:
.PP
.Vb 1
\& ($a * $b)\->sumover / ($a\->pow(2)\->sumover\->sqrt * $b\->pow(2)\->sumover\->sqrt)
.Ve
.PP
\&... but should be much faster to compute, and avoids allocating potentially large temporaries for the vector magnitudes.
Output values in $vcos() are cosine similarities in the range [\-1,1],
except for zero-magnitude vectors which will result in NaN values in $vcos().
.PP
You can use \s-1PDL\s0 threading to batch-compute similarities for multiple $b() vectors simultaneously:
.PP
.Vb 2
\& $bx = random($M, $NB); ##\-\- get $NB random vectors of size $M
\& $vcos = vv_vcos($a,$bx); ##\-\- $vcos(i,j) ~ sim($a(,i),$bx(,j))
.Ve
.PP
\&\fBvv_vcos()\fR will set the bad status flag on the output piddle $vcos() if it is set on either of the input
piddles $a() or $b(), but \s-1BAD\s0 values will otherwise be ignored for computing the cosine similarity.
.SH "ACKNOWLEDGEMENTS"
.IX Header "ACKNOWLEDGEMENTS"
.IP "\(bu" 4
Perl by Larry Wall
.IP "\(bu" 4
\&\s-1PDL\s0 by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.
.IP "\(bu" 4
Code for \fBrlevec()\fR and \fBrldvec()\fR derived from the \s-1PDL\s0 builtin functions
\&\fBrle()\fR and \fBrld()\fR in \f(CW$PDL_SRC_ROOT\fR/Basic/Slices/slices.pd
.SH "KNOWN BUGS"
.IX Header "KNOWN BUGS"
Probably many.
.SH "AUTHOR"
.IX Header "AUTHOR"
Bryan Jurish
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
.IP "\(bu" 4
Code for \fBqsortvec()\fR copyright (C) Tuomas J. Lukka 1997.
Contributions by Christian Soeller (c.soeller@auckland.ac.nz)
and Karl Glazebrook (kgb@aaoepp.aao.gov.au). All rights
reserved. There is no warranty. You are allowed to redistribute this
software / documentation under certain conditions. For details, see
the file \s-1COPYING\s0 in the \s-1PDL\s0 distribution. If this file is
separated from the \s-1PDL\s0 distribution, the copyright notice should be
included in the file.
.IP "\(bu" 4
All other parts copyright (c) 2007\-2022, Bryan Jurish. All rights reserved.
.Sp
This package is free software, and entirely without warranty.
You may redistribute it and/or modify it under the same terms
as Perl itself.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBperl\fR\|(1), \s-1\fBPDL\s0\fR\|(3perl)