NAME¶
JudyHS - C library for creating and accessing a dynamic array, using an
array-of-bytes of
Length as an
Index and a word as a
Value.
SYNOPSIS¶
cc [flags] sourcefiles -lJudy
#include <Judy.h>
Word_t * PValue; // JudyHS array element
int Rc_int; // return flag
Word_t Rc_word; // full word return value
Pvoid_t PJHSArray = (Pvoid_t) NULL; // initialize JudyHS array
uint8_t * Index; // array-of-bytes pointer
Word_t Length; // number of bytes in Index
JHSI( PValue, PJHSArray, Index, Length); // JudyHSIns()
JHSD( Rc_int, PJHSArray, Index, Length); // JudyHSDel()
JHSG( PValue, PJHSArray, Index, Length); // JudyHSGet()
JHSFA(Rc_word, PJHSArray); // JudyHSFreeArray()
DESCRIPTION¶
A JudyHS array is the equivalent of an array of word-sized value/pointers. An
Index is a pointer to an array-of-bytes of specified length:
Length. Rather than using a null terminated string, this difference
from
JudySL(3) allows strings to contain all bits (specifically the
null character). This new addition (May 2004) to Judy arrays is a hybird using
the best capabilities of hashing and Judy methods.
JudyHS does not have
a poor performance case where knowledge of the hash algorithm can be used to
degrade the performance.
Since JudyHS is based on a hash method,
Indexes are not stored in any
particular order. Therefore the JudyHSFirst(), JudyHSNext(), JudyHSPrev() and
JudyHSLast() neighbor search functions are not practical. The
Length of
each array-of-bytes can be from 0 to the limits of
malloc() (about
2GB).
The hallmark of
JudyHS is speed with scalability, but memory efficiency
is excellent. The speed is very competitive with the best hashing methods. The
memory efficiency is similar to a linked list of the same
Indexes and
Values.
JudyHS is designed to scale from 0 to billions of
Indexes.
A JudyHS array is allocated with a
NULL pointer
Pvoid_t PJHSArray = (Pvoid_t) NULL;
Because the macro forms of the API have a simpler error handling interface than
the equivalent
functions, they are the preferred way to use JudyHS.
JHSI(PValue, PJHSArray, Index, Length) // JudyHSIns()¶
Given a pointer to a JudyHS array (
PJHSArray), insert an
Index
string of length:
Length and a
Value into the JudyHS array:
PJHSArray. If the
Index is successfully inserted, the
Value is initialized to 0. If the
Index was already present, the
Value is not modified.
Return
PValue pointing to
Value. Your program should use this
pointer to read or modify the
Value, for example:
Value = *PValue;
*PValue = 1234;
Note:
JHSI() and
JHSD can reorganize the JudyHS array.
Therefore, pointers returned from previous
JudyHS calls become invalid
and must be re-acquired (using
JHSG()).
JHSD(Rc_int, PJHSArray, Index, Length) // JudyHSDel()¶
Given a pointer to a JudyHS array (
PJHSArray), delete the specified
Index along with the
Value from the JudyHS array.
Return
Rc_int set to 1 if successfully removed from the array. Return
Rc_int set to 0 if
Index was not present.
JHSG(PValue, PJHSArray, Index, Length) // JudyHSGet()¶
Given a pointer to a JudyHS array (
PJHSArray), find
Value
associated with
Index.
Return
PValue pointing to
Index's
Value. Return
PValue set to
NULL if the
Index was not present.
JHSFA(Rc_word, PJHSArray) // JudyHSFreeArray()¶
Given a pointer to a JudyHS array (
PJHSArray), free the entire array.
Return
Rc_word set to the number of bytes freed and
PJHSArray set
to NULL.
ERRORS: See: Judy_3.htm#ERRORS¶
EXAMPLES¶
Show how to program with the JudyHS macros. This program will print duplicate
lines and their line number from
stdin.
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <Judy.h>
// Compiled:
// cc -O PrintDupLines.c -lJudy -o PrintDupLines
#define MAXLINE 1000000 /* max fgets length of line */
uint8_t Index[MAXLINE]; // string to check
int // Usage: PrintDupLines < file
main()
{
Pvoid_t PJArray = (PWord_t)NULL; // Judy array.
PWord_t PValue; // Judy array element pointer.
Word_t Bytes; // size of JudyHS array.
Word_t LineNumb = 0; // current line number
Word_t Dups = 0; // number of duplicate lines
while (fgets(Index, MAXLINE, stdin) != (char *)NULL)
{
LineNumb++; // line number
// store string into array
JHSI(PValue, PJArray, Index, strlen(Index));
if (PValue == PJERR) // See ERRORS section
{
fprintf(stderr, "Out of memory -- exit\n");
exit(1);
}
if (*PValue == 0) // check if duplicate
{
Dups++;
printf("Duplicate lines %lu:%lu:%s", *PValue, LineNumb, Index);
}
else
{
*PValue = LineNumb; // store Line number
}
}
printf("%lu Duplicates, free JudyHS array of %lu Lines\n",
Dups, LineNumb - Dups);
JHSFA(Bytes, PJArray); // free JudyHS array
printf("JudyHSFreeArray() free'ed %lu bytes of memory\n", Bytes);
return (0);
}
AUTHOR¶
JudyHS was invented and implemented by Doug Baskins after retiring -Packard.
SEE ALSO¶
Judy(3),
Judy1(3),
JudyL(3),
JudySL(3),
malloc(),
the Judy website,
http://judy.sourceforge.net, for further information
and Application Notes.