.\" generated with Ronn/v0.7.3
.\" http://github.com/rtomayko/ronn/tree/0.7.3
.
.TH "ELEKTRA\-DATA\-STRUCTURES" "7" "2015-11-19" "" ""
.
.SH "NAME"
\fBelektra\-data\-structures\fR \- data structures
.
.P
For an introduction, please read first about elektra classes \fIelektra\-classes\.md\fR\. You might want to read about architecture first \fIelektra\-architecture\.md\fR\.
.
.SH "Introduction"
Data structures define the common layer in Elektra\. They are used to transfer configuration between Elektra and applications, but also between plugins\.
.
.SS "ADT"
Both the \fBKeySet\fR and the interface to metadata within a \fBKey\fR are actually \fBADT\fR (abstract data types)\. The API is designed so that different implementations of the data structures can be used internally\.
.
.P
A \fBhash\fR data structure presents a good candidate as alternative data structure, especially for the metadata interface\. It is believed to be much faster on lookup, but considerably slower on sorted enumeration\.
.
.P
A \fBAVL tree\fR also serve as a competitor\. AVL trees are expected to be faster for inserting keys at any place, but may be slower for appending because of the needed reorganisations\. Their disadvantage is that they need to allocate a large number of small pieces of memory\. Further investigations, namely implementation and benchmarks, are required to decide\.
.
.P
A \fBtrie\fR could also be used for lookup of key names\. It performs well for lookup, but needs more memory allocations\.
.
.P
Currently the \fBKeySet\fR is implemented as a sorted array\. It is fast on appending and iterating, and has nearly no size\-overhead\. To improve the lookup\-time, an additional \fBhash\fR will be used\.
.
.SS "ABI compatibility"
Application binary interface, or ABI, is the interface to all data structures of an application or library directly allocated or accessed by the user\.
.
.P
Special care has been taken in Elektra to support all changes within the data structures without any ABI changes\. ABI changes would entail the recompilation of applications and plugins using Elektra\. The functions \fBkeyNew()\fR, \fBksNew()\fR and \fBkdbOpen()\fR allocate the data structures for the applications\. The user only gets pointers to them\. It is not possible for the user to allocate or access these data structures directly when only using the public header file \fB<kdb\.h>\fR\. The functions \fBkeyDel()\fR, \fBksDel()\fR and \fBkdbClose()\fR free the resources after use\. Using the C++ binding deallocation is done automatically\.
.
.SH "Meta data"
Read here \fIelektra\-meta\-data\.md\fR\.
.
.SH "KeySet"
This subsection describes what has changed between 0\.7 and 0\.8 and deals with some general implementation issues\.
.
.SS "Operations"
\fBKeySet\fR resembles the classical mathematical set\. Operations like union, intersection or difference are well defined\. In mathematics typically every operation yields a new set\. Instead, we try to reuse sets in the following ways:
.
.IP "\(bu" 4
A completely new and independent \fBKeySet\fR as return value would resemble the mathematical ideal closely\. This operation would be expensive\. Every \fBKey\fR needs to be duplicated and inserted into a new \fBKeySet\fR\.
.
.IP
Such a \fBdeep duplication\fR was only needed in \fBkdbSet()\fR\.
.
.IP "\(bu" 4
The resulting \fBKeySet\fR is created during the operation, but only a flat copy is made\. This means that the keys in it are actually not duplicated, but only their reference counter is increased\. This method is similar to the mathematical model\. Compared with a deep copy it can achieve good performance\. But all changes to the values of keys in the resulting \fBKeySet\fR affect the original \fBKeySet\fR, too\.
.
.IP
\fBksDup(const KeySet *source)\fR produces a new \fBKeySet\fR that way\. The \fBsource\fR is not changed as shown by the \fBconst\fR modifier\.
.
.IP "\(bu" 4
The result of the operation is applied to the \fBKeySet\fR passed as argument directly\. This is actually quite common, but for this situation other names of the operations are more suitable\.
.
.IP
For example, a union which changes the \fBKeySet\fR is called \fBksAppend()\fR\.
.
.IP "\(bu" 4
A new \fBKeySet\fR is created, but the \fBKeySet\fR passed as parameter is reduced by the keys needed for the new \fBKeySet\fR\. This is useful in situations where many operations have to be applied in a sequence reducing the given \fBKeySet\fR until no more keys are left\. None of the reference pointers changes in this situation\.
.
.IP
\fBksCut(KeySet *ks, const Key *cutpoint)\fR works that way\. All keys below the \fBcutpoint\fR are moved from \fBks\fR to the returned key set\.
.
.IP "" 0
.
.SS "Consistency"
There are several ways to define consistency relations on key sets\. For \fBstrict consistency\fR every parent key must exist before the user can append a key to a key set\. For example, the key set with the keys
.
.IP "" 4
.
.nf

system
system/elektra
system/elektra/mountpoints
.
.fi
.
.IP "" 0
.
.P
would allow the key \fBsystem/elektra/mountpoints/tcl\fR to be added, but not the key \fBsystem/apps/abc\fR because \fBsystem/apps\fR is missing\. File systems enforce this kind of consistency\.
.
.P
These semantics are however not useful for configurations\. Especially for user configurations often only some keys need to be overwritten\. It is not a good idea to copy all parent keys to the users configuration\. For this reason we use a less strict definition of consistency supporting such holes\.
.
.P
We also evaluated a form of \fBweak consistency\fR\. It avoids adding some unnecessary keys\. A constraint is that a key can be added only if it has a parent key\. But the constraint does not apply if no other key exists above the key about to be inserted\. From that moment it will serve as parent key for other keys\. With the current implementation of \fBKeySet\fR, however, it is not possible to decide this constraint in constant time\. Instead its worst\-case complexity would be $log(n) * x$ where $n$ is the number of keys currently in the key set and $x$ is the \fIdepth\fR of the key\. The depth is the number of \fB/\fR in the key name\. The worst\-case of the complexity applies when the inserting works without a parent key\. For example, with the keys
.
.IP "" 4
.
.nf

user/sw/apps/abc/current/bindings
user/sw/apps/abc/current/bindings/key1
user/sw/apps/abc/current/bindings/key2
.
.fi
.
.IP "" 0
.
.P
the weak consistency would allow inserting \fBuser/sw/apps/abc/current/bindings/key3\fR because it is directly below an existing key\. It would also allow adding \fBuser/sw/apps/xyz/current\fR because it does not have any parent key\. But it would not allow \fBuser/sw/apps/abc/current/bindings/dir/key1\fR to add\. The worst\-case complexity was found to be too expensive, and hence \fBKeySet\fR has \fBno consistency\fR check at all\.
.
.P
This means any key with a valid key name can be inserted into \fBKeySet\fR\. The \fBKeySet\fR is changed so that it is now impossible to append keys without a name\. \fBksAppendKey(ks, Key *toAppend)\fR takes ownership of the key \fBtoAppend\fR and will delete the key in that case\. The caller does not have to free \fBtoAppend\fR: either it is in the key set or it is deleted\.
.
.P
Binary search determines the position where to insert a key\. The C version of binary search \fBbsearch()\fR cannot tell where to insert a key when it is not found\. So the algorithm has to be reimplemented\. Java\'s binary search \fBbinarySearch()\fR uses a trick to both indicate where a key is found and where to insert it with the same return code by returning the negative value \fB((\-insertion point) \- 1)\fR indicating where the new value should be inserted when the key is not found\. Elektra now also uses this trick internally\.
.
.SS "Internal Cursor"
\fBKeySet\fR supports an \fBexternal iterator\fR with the two functions \fBksRewind()\fR to go to the beginning and \fBksNext()\fR to advance the \fIinternal cursor\fR to the next key\. This side effect is used to indicate a position for operations on a \fBKeySet\fR without any additional parameter\. This technique is comfortable to see which key has caused an error after an unsuccessful key database operation\.
.
.P
Elektra only has some functions to change the cursor of a key set\. But these allow the user to compose powerful functions\. Plugins do that extensively as we will see later in \fBksLookupRE()\fR\. The user can additionally write more such functions for his or her own purposes\. To change the internal cursor, it is sufficient to iterate over the \fBKeySet\fR and stop at the wanted key\. With this technique, we can, for example, realise lookup by value, by specific metadata and by parts of the name\. Without an additional index, it is not possible that such operations perform more efficiently than by a linear iteration key by key\. For that reason, Elektra\'s core does not provide such functions\. The function \fBksLookupByName()\fR, however, uses the more efficient binary search because the array inside the \fBKeySet\fR is ordered by name\.
.
.SS "External Cursor"
External cursor is an alternative to the approach explained above\. Elektra provides a limited \fIexternal cursor\fR through the interface \fBksGetCursor()\fR and \fBksSetCursor()\fR\. It is not possible to advance this cursor\. The difference to the internal cursor is that the position is not stored within \fBKeySet\fR\.
.
.P
We considered providing an external cursor for performance reasons\. But we found out that the speed of iteration is mainly limited because of safety checks\. The investigated methods become slower proportional to the ease of use and safety\. When using null pointers and range checks, the function is noticeably slower than without\. With the same amount of checks, using an external cursor is not much faster than the \fBksNext()\fR\. External cursor with checks is in a benchmark about 10\e% faster\.
.
.P
But an external cursor directly accessing the array can be much faster\. Using an unchecked external cursor can be about 50% faster than using the internal cursor with ksNext()\. For this endeavour, Elektra\'s private header files need to be included\. Including private header files, however, should not be done with levity because ABI compatibility will be gone on any changes of the data structures\. This fact means the application or plugin needs to be recompiled when any of the internal structures of Elektra are changed\. We strongly discourage including these header files\.
.
.P
Nevertheless, the overall performance impact for iteration is minimal and should not matter too much\. Even if only a single \fBkeySetMeta()\fR is used inside the iteration loop, the iteration costs are insignificant\. Only for trivial actions such as just changing a variable, counter or marker for every key the iteration costs become the lion\'s share\. In such situations an \fIinternal iterator\fR yields better results\. For example, \fBksForEach()\fR applies a user defined function for every key in a \fBKeySet\fR without having null pointer or out of range problems\.
.
.SH "Trie vs\. Split"
Up to now, we have discussed external data structures visible to the user of the library\. The application and plugin programmer needs them to access configuration\. Last, but not least, we will show two internal data structures\. The user will not see them\. To understand the algorithm, however, the user needs to understand them as well\.
.
.SS "Trie"
\fITrie\fR or prefix tree is an ordered tree data structure\. In Elektra, it provides the information to decide in which backend a key resides\. The algorithm, presented in algorithm \fIelektra\-algorithm\.md\fR, also needs a list of all backends\. The initial approach was to iterate over the \fBTrie\fR to get a list of all backends\. But the transformation of a \fBTrie\fR to a list of backends, contained many bugs caused by corner cases in connection with the default backend and cascading mount points\.
.
.SS "Split"
So, instead of transforming the trie to a list of backends, we introduced a new data structure \eintro[Split@\elstinline{Split}]{Split}\. The name \fBSplit\fR comes from the fact that an initial key set is split into many key sets\. These key sets are stored in the \fBSplit\fR object\. \fBSplit\fR advanced to the central data structure for the algorithm:
.
.IP "" 4
.
.nf

typedef struct _Split   Split;

struct _Split {
    size_t size;
    size_t alloc;
    KeySet **keysets;
    Backend **handles;
    Key **parents;
    int *syncbits;
};
.
.fi
.
.IP "" 0
.
.P
The data structure \fBSplit\fR contains the following fields:
.
.IP "\(bu" 4
\fBsize\fR: contains the number of key sets currently in \fBSplit\fR\.
.
.IP "\(bu" 4
\fBalloc\fR: allows us to allocate more items than currently in use\.
.
.IP "\(bu" 4
\fBkeysets\fR represents a list of key sets\. The keys in one of the key sets are known to belong to a specific backend\.
.
.IP "\(bu" 4
\fBhandles\fR: contains a list of handles to backends\.
.
.IP "\(bu" 4
\fBparents\fR: represents a list of keys\. Each \fBparentKey\fR contains the root key of a backend\. No key of the respective key set is above the \fBparentKey\fR\. The key name of \fBparentKey\fR contains the mount point of a backend\. The resolver writes the file name into the value of the \fBparentKey\fR\.
.
.IP "\(bu" 4
\fBsyncbits\fR: are some bits that can be set for every backend\. The algorithm uses the \fBsyncbits\fR to decide if the key set needs to be synchronised\.
.
.IP "" 0
.
.P
Continue reading with the error handling \fIelektra\-error\-handling\.md\fR\.