.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . \} .\} .rr rF .\" ======================================================================== .\" .IX Title "Statistics::TopK 3pm" .TH Statistics::TopK 3pm "2023-02-03" "perl v5.36.0" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" Statistics::TopK \- Implementation of the top\-k streaming algorithm .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 1 \& use Statistics::TopK; \& \& my $counter = Statistics::TopK\->new(10); \& while (my $val = ) { \& chomp $val; \& $counter\->add($val); \& } \& my @top = $counter\->top; \& my %counts = $counter\->counts; .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" The \f(CW\*(C`Statistics::TopK\*(C'\fR module implements the top-k streaming algorithm, also know as the \*(L"heavy hitters\*(R" algorithm. It is designed to process data streams and probabilistally calculate the \f(CW\*(C`k\*(C'\fR most frequent items while using limited memory. .PP A typical example would be to determine the top 10 \s-1IP\s0 addresses listed in an access log. A simple solution would be to hash each \s-1IP\s0 address to a counter and then sort the resulting hash by the counter size. But the hash could theoretically require over 4 billion keys. .PP The top-k algorithm only requires storage space proportional to the number of items of interest. It accomplishes this by sacrificing precision, as it is only a probabilistic counter. .SH "METHODS" .IX Header "METHODS" .SS "new" .IX Subsection "new" .Vb 1 \& $counter = Statistics::TopK\->new($k) .Ve .PP Creates a new \f(CW\*(C`Statistics::TopK\*(C'\fR object which is prepared to count the top \&\f(CW$k\fR elements. .SS "add" .IX Subsection "add" .Vb 1 \& $count = $counter\->add($element) .Ve .PP Count the given \f(CW$element\fR and return its approximate count (if any) in the \&\f(CW\*(C`Statistics::TopK\*(C'\fR object. .PP Note that adding an element does not guarantee it will be counted yet, as the algorithm is probabilistic, and the occurrence of the current element might only be used decrease the count of one of the current top elements. .SS "top" .IX Subsection "top" .Vb 1 \& @top = $counter\->top() .Ve .PP Returns a list of the top-k counted elements. .SS "counts" .IX Subsection "counts" .Vb 1 \& %counts = $counter\->counts() .Ve .PP Returns a hash of the top-k counted elements and their counts. .SH "SEE ALSO" .IX Header "SEE ALSO" http://en.wikipedia.org/wiki/Streaming_algorithm#Heavy_hitters .SH "REQUESTS AND BUGS" .IX Header "REQUESTS AND BUGS" Please report any bugs or feature requests to . I will be notified, and then you'll automatically be notified of progress on your bug as I make changes. .SH "SUPPORT" .IX Header "SUPPORT" You can find documentation for this module with the perldoc command. .PP .Vb 1 \& perldoc Statistics::TopK .Ve .PP You can also look for information at: .IP "\(bu" 4 GitHub Source Repository .Sp .IP "\(bu" 4 AnnoCPAN: Annotated \s-1CPAN\s0 documentation .Sp .IP "\(bu" 4 \&\s-1CPAN\s0 Ratings .Sp .IP "\(bu" 4 \&\s-1RT: CPAN\s0's request tracker .Sp .IP "\(bu" 4 Search \s-1CPAN\s0 .Sp .SH "COPYRIGHT AND LICENSE" .IX Header "COPYRIGHT AND LICENSE" Copyright (C) 2009\-2015 gray , all rights reserved. .PP This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. .SH "AUTHOR" .IX Header "AUTHOR" gray,