NAME
KiokuDB::Tutorial - Getting started with KiokuDB
INSTALLATION
The easiest way to install KiokuDB along with a number of backends is to
install Task::KiokuDB.
KiokuDB depends on Moose and a few other modules out of the box, but no specific
storage module.
KiokuDB is a frontend to several backends, much like DBI uses DBDs to connect to
actual databases.
For development and testing you can use the KiokuDB::Backend::Hash backend,
which is an in-memory store, but for production use KiokuDB::Backend::DBI or
KiokuDB::Backend::BDB is recommended.
See below for instructions on getting KiokuDB::Backend::BDB installed.
CREATING A DIRECTORY HANDLE
A KiokuDB directory is the main object through which all work is done.
The simplest directory that is ready for use can be created like this:
my $dir = KiokuDB->new(
backend => KiokuDB::Backend::Hash->new
);
We will revisit other more interesting backend configuration later in this
document, but for now this will do.
You can also use DSN strings to connect to the various backends:
KiokuDB->connect("hash");
KiokuDB->connect("dbi:SQLite:dbname=foo", create => 1);
KiokuDB->connect("bdb:dir=foo", create => 1);
You can also use a configuration file:
KiokuDB->connect("/path/to/my_db.yml");
Which is just a YAML file:
---
# these are basically the arguments for 'new'
backend:
  class: KiokuDB::Backend::DBI
  dsn: dbi:SQLite:dbname=/tmp/test.db
  create: 1
USING THE DBI BACKEND
During this tutorial we will be using the DBI backend for two reasons. The first
is DBI's ubiquity. The second is the possibility of easily looking behind the
scenes, to more clearly demonstrate what KiokuDB is doing.
That said, the examples will work with all backends exactly the same.
First we create $dir:
my $dir = KiokuDB->connect(
"dbi:SQLite:dbname=kiokudb_tutorial.db",
create => 1, # this causes the tables to be created
);
Note that if you are connecting with a username and password you need to specify
these as named arguments:
my $dir = KiokuDB->connect(
$dsn,
user => $user,
password => $password,
);
INSERTING OBJECTS
Let's start by defining a simple class using Moose:
package Person;
use Moose;
has name => (
isa => "Str",
is => "rw",
);
We can instantiate it:
my $obj = Person->new( name => "Homer Simpson" );
and insert the object into the database as follows:
my $scope = $dir->new_scope;
my $homer_id = $dir->store($obj);
This is a very trivial use of KiokuDB, but it illustrates a few important things.
First, no schema is necessary. KiokuDB uses Moose to introspect your object
without needing to predefine anything like tables.
Second, every object in the database has an ID. If you don't choose an ID for an
object, KiokuDB will assign a UUID instead.
This ID is like a primary key in a relational database.
You can also specify an ID instead of letting one be generated:
$dir->store( homer => $obj );
Third, all KiokuDB operations need to be performed within a scope. The
scope is not really doing anything important in this simple example, but
becomes necessary when cycles and weak references are in use. We will look
into that in more detail later.
LOADING OBJECTS
So now that Homer has been inserted into the database, we can fetch him out of
there using the ID we got from "store".
my $homer = $dir->lookup($homer_id);
Assuming that $scope and $obj are still in scope, $homer and $obj will actually
be the same object:
# refaddr is exported by Scalar::Util
# this is true:
refaddr($homer) == refaddr($obj)
This is because KiokuDB tracks which objects are "live" in the live
object set (KiokuDB::LiveObjects).
If the object wasn't already in memory then KiokuDB would have fetched it from
the backend instead.
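The two cases above (an object that is still live versus one that has to be fetched again) can be tried out with the in-memory backend. This is only a sketch: it assumes KiokuDB and Moose are installed, uses KiokuDB::Backend::Hash instead of the DBI backend so that it needs no database file, and the variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

package Person {
    use Moose;
    has name => ( isa => "Str", is => "rw" );
}

use KiokuDB;
use KiokuDB::Backend::Hash;

# in-memory store, chosen here only to keep the sketch self-contained
my $dir = KiokuDB->new( backend => KiokuDB::Backend::Hash->new );

my ( $homer_id, $same_instance, $reloaded_name );

{
    my $scope = $dir->new_scope;
    my $obj   = Person->new( name => "Homer Simpson" );
    $homer_id = $dir->store($obj);

    # $obj is still live, so lookup hands back the very same instance
    my $homer = $dir->lookup($homer_id);
    $same_instance = refaddr($homer) == refaddr($obj);
}

{
    # the first scope and $obj are gone, so this lookup goes to the backend
    my $scope = $dir->new_scope;
    my $homer = $dir->lookup($homer_id);
    $reloaded_name = $homer->name;
}

print "same instance while live: ", ( $same_instance ? "yes" : "no" ), "\n";
print "name after reloading: $reloaded_name\n";
```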
WHAT WAS STORED
Let's peek into the database:
% sqlite3 kiokudb_tutorial.db
SQLite version 3.4.0
Enter ".help" for instructions
sqlite>
The database schema has two tables, "entries" and "gin_index":
sqlite> .tables
entries gin_index
"gin_index" is used for more complex queries, and we'll get back to it
at the end of the tutorial.
For now let's just have a closer look at "entries":
sqlite> .schema entries
CREATE TABLE entries (
id varchar NOT NULL,
data blob NOT NULL,
class varchar,
root boolean NOT NULL,
tied char(1),
PRIMARY KEY (id)
);
The main columns are "id" and "data". In KiokuDB every
object has an ID which serves as a primary key and a BLOB of data associated
with it.
Since the default serializer for the DBI backend is KiokuDB::Serializer::JSON,
we can examine the data directly.
First let's set "sqlite"'s output mode to "line". This is
easier to read with large columns:
sqlite> .mode line
And select the data from the table:
sqlite> select id, data from entries;
id = 201C5B55-E759-492F-8F20-A529C7C02C8B
data = {"__CLASS__":"Person","data":{"name":"Homer Simpson"},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}
As you can see the "name" attribute is stored under the
"data" key inside the blob, as is the object's class.
The "data" column contains all of the data necessary to recreate the
object.
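Because the blob is plain JSON, it can also be inspected from Perl. The following sketch decodes the row shown above with the core JSON::PP module; the blob is pasted in as a literal string rather than read from the database, purely to keep the example self-contained.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# the "data" column of the row shown above, copied verbatim
my $blob = '{"__CLASS__":"Person","data":{"name":"Homer Simpson"},'
         . '"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}';

my $entry = decode_json($blob);

printf "class: %s\n", $entry->{__CLASS__};
printf "id:    %s\n", $entry->{id};
printf "name:  %s\n", $entry->{data}{name};
printf "root:  %s\n", $entry->{root} ? "yes" : "no";
```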
All the other columns are only for searches. Later on you'll also see how to
create user defined columns.
When using KiokuDB::Backend::BDB the on-disk format is just a hash of
"id" to "data" with no additional columns.
OBJECT RELATIONSHIPS
Let's extend the "Person" class to hold some more interesting data
than just a "name":
package Person;
has spouse => (
isa => "Person",
is => "rw",
weak_ref => 1,
);
This new "spouse" attribute will hold a reference to another person
object.
Let's first create and insert another object:
my $marge_id = $dir->store(
Person->new( name => "Marge Simpson" ),
);
Now that we have both objects in the database, let's link them together:
{
my $scope = $dir->new_scope;
my ( $marge, $homer ) = $dir->lookup( $marge_id, $homer_id );
$marge->spouse($homer);
$homer->spouse($marge);
$dir->store( $marge, $homer );
}
Now we have created a persistent object graph, that is several objects
which point to each other.
The reason "spouse" had the "weak_ref" option was so that
this circular structure will not leak.
When the objects are updated in the database, KiokuDB sees that their
"spouse" attribute contains references, and this relationship will
be encoded using their unique ID in storage.
To load the graph, we can do something like this:
{
my $scope = $dir->new_scope;
my $homer = $dir->lookup($homer_id);
print $homer->spouse->name; # Marge Simpson
}
{
my $scope = $dir->new_scope;
my $marge = $dir->lookup($marge_id);
print $marge->spouse->name; # Homer Simpson
refaddr($marge) == refaddr($marge->spouse->spouse); # true
}
When KiokuDB is loading the initial object, all the objects the object depends
on will also be loaded. The "spouse" attribute contains a reference
to another object (by ID), and this link is resolved at inflation time.
The purpose of "new_scope"
This is where "new_scope" becomes important. As objects are inflated
from the database, they are pushed onto the live object scope, in order to
increase their reference count.
If this was not done, by the time $homer was returned from "lookup"
his "spouse" attribute would have been cleared because there is no
other reference to Marge.
This demonstrates why:
sub get_homer {
my $homer = Person->new( name => "Homer Simpson" );
my $marge = Person->new( name => "Marge Simpson" );
$homer->spouse($marge);
$marge->spouse($homer);
return $homer;
# at this point $homer and $marge go out of scope
# $homer has a refcount of 1 because it's the return value
# $marge has a refcount of 0, and gets destroyed
# the weak reference in $homer->spouse is cleared
}
my $homer = get_homer();
$homer->spouse; # this returns undef
By using this idiom:
{
my $scope = $dir->new_scope;
# do all KiokuDB work in here
}
You are ensuring that the objects live at least as long as is necessary.
In a web application context you usually create one new scope per request. In
fact, Catalyst::Model::KiokuDB does this automatically.
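The effect of holding extra references can be reproduced without KiokuDB at all. In this sketch plain hashes and Scalar::Util's weaken stand in for the Moose objects, and an ordinary array (the hypothetical @scope) plays the part of the scope object. It is an analogy for what the scope does, not KiokuDB's implementation.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Builds two hashes that point at each other through weak references.
# If $keep (an array ref) is given, strong references are pushed onto it,
# much like objects being pushed onto a scope.
sub make_couple {
    my ($keep) = @_;

    my $homer = { name => "Homer Simpson" };
    my $marge = { name => "Marge Simpson" };

    $homer->{spouse} = $marge;
    weaken( $homer->{spouse} );
    $marge->{spouse} = $homer;
    weaken( $marge->{spouse} );

    push @$keep, $homer, $marge if $keep;
    return $homer;
}

# without anything holding on to Marge, the weak reference is cleared
my $lonely = make_couple();
my $cleared_without_scope = !defined $lonely->{spouse};

# with a "scope" holding strong references, the link survives
my @scope;
my $homer = make_couple( \@scope );
my $name_with_scope = $homer->{spouse}{name};

# dropping the "scope" releases Marge, and the link is cleared again
@scope = ();
my $cleared_after_scope = !defined $homer->{spouse};

print "without scope: ", ( $cleared_without_scope ? "spouse cleared" : "spouse kept" ), "\n";
print "with scope:    $name_with_scope\n";
print "scope dropped: ", ( $cleared_after_scope ? "spouse cleared" : "spouse kept" ), "\n";
```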
REFERENCES IN THE DATABASE
Now that we have an object graph in the database let's have another look at
what's inside.
sqlite> select id, data from entries;
id = 201C5B55-E759-492F-8F20-A529C7C02C8B
data = {"__CLASS__":"Person","data":{"name":"Homer Simpson","spouse":{"$ref":"05A8D61C-6139-4F51-A748-101010CC8B02.data"}},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}
id = 05A8D61C-6139-4F51-A748-101010CC8B02
data = {"__CLASS__":"Person","data":{"name":"Marge Simpson","spouse":{"$ref":"201C5B55-E759-492F-8F20-A529C7C02C8B.data"}},"id":"05A8D61C-6139-4F51-A748-101010CC8B02","root":true}
You'll notice the "spouse" field has a JSON object with a $ref field
inside it holding the UUID of the target object.
When data is loaded KiokuDB queues up references to unloaded objects and then
loads them in order to materialize the memory resident object graph.
If you're curious about why the data is represented this way, this format is
called "JSPON", or JavaScript Persistent Object Notation
(<http://www.jspon.org/>). When using KiokuDB::Backend::Storable the
KiokuDB::Entry and KiokuDB::Reference objects are serialized with their
storable hooks instead.
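To make the reference format concrete, here is a small sketch that follows a $ref by hand using only the core JSON::PP module. The two blobs are the rows shown above, pasted in as literals. The resolve helper is hypothetical, and stripping the trailing ".data" to obtain the target ID is an assumption based on these two rows, not a description of KiokuDB's linker.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

my $homer_id = "201C5B55-E759-492F-8F20-A529C7C02C8B";
my $marge_id = "05A8D61C-6139-4F51-A748-101010CC8B02";

# the "data" column of each row, copied from the output above
my %rows = (
    $homer_id => '{"__CLASS__":"Person","data":{"name":"Homer Simpson","spouse":{"$ref":"05A8D61C-6139-4F51-A748-101010CC8B02.data"}},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}',
    $marge_id => '{"__CLASS__":"Person","data":{"name":"Marge Simpson","spouse":{"$ref":"201C5B55-E759-492F-8F20-A529C7C02C8B.data"}},"id":"05A8D61C-6139-4F51-A748-101010CC8B02","root":true}',
);

my %entries = map { $_ => decode_json( $rows{$_} ) } keys %rows;

# hypothetical helper: turn a {"$ref": "ID.data"} structure into an entry
sub resolve {
    my ($reference) = @_;
    ( my $id = $reference->{'$ref'} ) =~ s/\.data\z//;    # assumed suffix
    return $entries{$id};
}

my $homer  = $entries{$homer_id};
my $spouse = resolve( $homer->{data}{spouse} );
my $back   = resolve( $spouse->{data}{spouse} );

print "Homer's spouse: $spouse->{data}{name}\n";
print "and back again: $back->{data}{name}\n";
```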
OBJECT SETS
More complex relationships (not necessarily 1 to 1) are usually easy to model
with Set::Object.
Let's extend the "Person" class to add such a relationship:
package Person;
has children => (
does => "KiokuDB::Set",
is => "rw",
);
KiokuDB::Set objects are KiokuDB specific wrappers for Set::Object.
my @kids = map { Person->new( name => $_ ) } qw(maggie lisa bart);
use KiokuDB::Util qw(set);
my $set = set(@kids);
$homer->children($set);
$dir->store($homer);
The "set" convenience function creates a new KiokuDB::Set::Transient
object. A transient set is one which started its life in memory space (as
opposed to a set that was loaded from the database).
The "weak_set" convenience function also exists, creating a transient
set with Set::Object::Weak used internally to help avoid circular structures
(for instance if setting a "parent" attribute in our example).
The set object behaves pretty much like a normal Set::Object:
my @kids = $dir->lookup($homer_id)->children->members;
The main difference is that sets coming from the database are deferred by
default, that is the objects in @kids are not loaded until they are actually
needed.
This allows large object graphs to exist in the database, while only being
partially loaded, without breaking the encapsulation of user objects. This
behavior is implemented in KiokuDB::Set::Deferred and KiokuDB::Set::Loaded.
This set object is optimized to make most operations defer loading. For
instance, if you intersect two deferred sets, only the members of the
intersection set will need to be loaded.
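Putting the pieces of this section together, the following sketch stores a parent with a set of children and reads the set back in a later scope. It assumes KiokuDB and Moose are installed and uses the in-memory KiokuDB::Backend::Hash only to stay self-contained; the @names variable is introduced just for the demonstration.

```perl
use strict;
use warnings;

package Person {
    use Moose;
    has name     => ( isa  => "Str",          is => "rw" );
    has children => ( does => "KiokuDB::Set", is => "rw" );
}

use KiokuDB;
use KiokuDB::Backend::Hash;
use KiokuDB::Util qw(set);

my $dir = KiokuDB->new( backend => KiokuDB::Backend::Hash->new );

my $homer_id;
{
    my $scope = $dir->new_scope;
    my @kids  = map { Person->new( name => $_ ) } qw(maggie lisa bart);
    my $homer = Person->new(
        name     => "Homer Simpson",
        children => set(@kids),    # a transient set, created in memory
    );
    $homer_id = $dir->store($homer);
}

my @names;
{
    my $scope = $dir->new_scope;
    my $homer = $dir->lookup($homer_id);

    # the set comes back from the backend; asking for its members
    # is what causes the children themselves to be loaded
    @names = sort map { $_->name } $homer->children->members;
}

print join( ", ", @names ), "\n";
```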
THE TYPEMAP
Storing an object with KiokuDB involves passing it to KiokuDB::Collapser, the
object that "flattens" objects into KiokuDB::Entry before the
entries are inserted into the backend.
The collapser uses a KiokuDB::TypeMap object that tells it how objects of each
type should be collapsed.
During retrieval of objects the same typemap is used to reinflate objects back
into working objects.
Trying to store an object that is not in the typemap is an error. The reason
behind this is that it doesn't make sense to store every type of object (for
instance "DBI" handles need a socket, objects based on XS modules
have an internal pointer as an integer, whose address won't be valid the next
time it's loaded), and even though the majority of objects are safe to
serialize, even a small bit of unreported fragility is usually enough to
create large, hard to debug problems.
An exception to this rule is Moose based objects, because they have sufficient
meta information available through Moose's powerful reflection support in
order to be safely serialized.
Additionally, the standard backends provide a default typemap for common objects
(DateTime, Path::Class, etc), which by default is merged with any custom
typemap you pass to KiokuDB.
So, in order to actually get KiokuDB to store things like Class::Accessor based
objects, you can do something like this:
KiokuDB->new(
backend => $backend,
allow_classes => [qw(My::Object)],
);
Which is shorthand for:
my $dir = KiokuDB->new(
backend => $backend,
typemap => KiokuDB::TypeMap->new(
entries => {
"My::Object" => KiokuDB::TypeMap::Entry::Naive->new,
},
),
);
KiokuDB::TypeMap::Entry::Naive is a type map entry that performs naive
collapsing of the object, by simply walking it recursively.
When the collapser encounters an object it will ask KiokuDB::TypeMap::Resolver
for a collapsing routine based on the class of the object.
This lookup is typically performed by "ref $object", not using
inheritance, because a typemap entry that is safe to use with a superclass
isn't necessarily safe to use with a subclass. If you do want inherited
entries, specify "isa_entries":
KiokuDB::TypeMap->new(
isa_entries => {
"My::Object" => KiokuDB::TypeMap::Entry::Naive->new,
},
);
If no normal ("ref" keyed) entry is found for an object, the isa
entries are searched for a superclass of that object. Subclass entries are
tried before superclass entries. The result of this lookup is cached, so it
only happens once per class.
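The lookup order just described can be modelled in a few lines of plain Perl. This is an illustration of the rules in the text, not KiokuDB::TypeMap::Resolver's code: the class names, the two hashes and the resolve_entry function are all hypothetical, and strings stand in for real typemap entry objects.

```perl
use strict;
use warnings;
use mro;

# hypothetical hierarchy: My::Leaf -> My::Derived -> My::Base
package My::Base    { }
package My::Derived { our @ISA = ("My::Base") }
package My::Leaf    { our @ISA = ("My::Derived") }

# strings stand in for typemap entry objects
my %entries     = ( "My::Base" => "exact entry for My::Base" );
my %isa_entries = (
    "My::Base"    => "inherited entry from My::Base",
    "My::Derived" => "inherited entry from My::Derived",
);
my %cache;

sub resolve_entry {
    my ($object) = @_;
    my $class = ref $object;

    # the result is cached, so the search runs once per class
    return $cache{$class} if exists $cache{$class};

    # 1. exact match on "ref $object", ignoring inheritance
    my $found = $entries{$class};

    # 2. otherwise walk the class and its ancestors, most specific first,
    #    so subclass entries win over superclass entries
    if ( !defined $found ) {
        for my $ancestor ( @{ mro::get_linear_isa($class) } ) {
            next unless exists $isa_entries{$ancestor};
            $found = $isa_entries{$ancestor};
            last;
        }
    }

    return $cache{$class} = $found;
}

for my $class (qw(My::Base My::Derived My::Leaf Unrelated)) {
    my $entry = resolve_entry( bless {}, $class );
    printf "%-12s => %s\n", $class, $entry // "(no entry, storing would be an error)";
}
```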
Typemap Entries
If you want to do custom serialization hooks, you can specify hooks to collapse
your object:
KiokuDB::TypeMap::Entry::Callback->new(
collapse => sub {
my $object = shift;
...
return @some_args;
},
expand => sub {
my ( $class, @some_args ) = @_;
...
return $object;
},
);
These hooks are called as methods on the object to be collapsed.
For instance the Path::Class related typemap ISA entry is:
'Path::Class::Entity' => KiokuDB::TypeMap::Entry::Callback->new(
intrinsic => 1,
collapse => "stringify",
expand => "new",
);
The "intrinsic" flag is discussed in the next section.
Another option for typemap entries is KiokuDB::TypeMap::Entry::Passthrough,
which is appropriate when you know the backend's serialization can handle that
data type natively.
For example, this is appropriate if your object has a Storable hook which you
know is suitable (e.g. it contains no sub objects that need to be collapsible)
and your backend uses KiokuDB::Backend::Serialize::Storable. DateTime is an
example of a class with such Storable hooks:
'DateTime' => KiokuDB::TypeMap::Entry::Passthrough->new( intrinsic => 1 )
Intrinsic vs. First Class
In KiokuDB every object is normally assigned an ID, and if the object is shared
by several objects this relationship will be preserved.
However, for some objects this is not the desired behavior. These are objects
that represent values, like DateTime, Path::Class entries, URI objects, etc.
KiokuDB can be asked to collapse such objects intrinsically, that is
instead of creating a new KiokuDB::Entry with its own ID for the object, the
object gets collapsed directly into its parent's structures.
This means that shared references that are collapsed intrinsically will be
loaded back from the database as two distinct copies, so updates to one will
not affect the other.
For instance, when we run the following code:
use Path::Class;
my $path = file(qw(path to foo));
$obj_1->file($path);
$obj_2->file($path);
$dir->store( $obj_1, $obj_2 );
While the following is true when the data is being inserted, it will no longer
be true when $obj_1 and $obj_2 are loaded from the database:
refaddr($obj_1->file) == refaddr($obj_2->file)
This is because $obj_1 and $obj_2 each got their own copy of $path.
This behavior is usually more appropriate for objects that aren't mutated, but
are instead cloned and replaced, and for which creating a first class entry in
the backend with its own ID is undesired.
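The loss of sharing can be imitated with core modules only. In this sketch plain hashes stand in for the objects, and Storable's dclone stands in for a store followed by a load in which the value is embedded in each parent. It is an analogy for intrinsic collapsing, not what KiokuDB does internally.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use Storable qw(dclone);

# one value shared by two parents
my $path  = { parts => [qw(path to foo)] };
my $obj_1 = { file => $path };
my $obj_2 = { file => $path };

my $shared_before = refaddr( $obj_1->{file} ) == refaddr( $obj_2->{file} );

# each parent is serialized with the value embedded inside it,
# so each round trip produces its own copy of the value
my $copy_1 = dclone($obj_1);
my $copy_2 = dclone($obj_2);

my $shared_after = refaddr( $copy_1->{file} ) == refaddr( $copy_2->{file} );

# updating one copy no longer affects the other
push @{ $copy_1->{file}{parts} }, "bar";
my $parts_1 = scalar @{ $copy_1->{file}{parts} };
my $parts_2 = scalar @{ $copy_2->{file}{parts} };

print "shared before: ", ( $shared_before ? "yes" : "no" ), "\n";
print "shared after:  ", ( $shared_after  ? "yes" : "no" ), "\n";
print "parts in copies: $parts_1 and $parts_2\n";
```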
The Default Typemap
Each backend comes with a default typemap, with some built in entries for common
CPAN modules' objects. KiokuDB::TypeMap::Default contains more details.
SIMPLE SEARCHES
Most backends support an inefficient but convenient simple search, which scans
the entries and matches fields.
If you want to make use of this API we suggest using KiokuDB::Backend::DBI,
since there simple searching is implemented using an SQL WHERE clause, which
is much more efficient (you do have to set up the columns manually though).
Calling the "search" method with a hash reference as the only argument
invokes the simple search functionality, returning a Data::Stream::Bulk with
the results:
my $stream = $dir->search({ name => "Homer Simpson" });
while ( my $block = $stream->next ) {
foreach my $object ( @$block ) {
# $object->name eq "Homer Simpson"
}
}
This exact API is intentionally still underdefined. In the future it will be
compatible with DBIx::Class 0.09's syntax.
DBI SEARCH COLUMNS
In order to make use of the simple search API we need to configure columns for
our DBI backend.
Let's create a 'name' column to search by:
my $dir = KiokuDB->connect(
"dbi:SQLite:dbname=foo",
columns => [
# specify extra columns for the 'entries' table
# in the same format you pass to DBIC's add_columns
name => {
data_type => "varchar",
is_nullable => 1, # probably important
},
],
);
You can either alter the schema manually, or use "kioku dump" to back
up your data, delete the database, connect with "create => 1" and
then use "kioku load".
To populate this column we'll need to load Homer and update him:
{
my $s = $dir->new_scope;
$dir->update( $dir->lookup( $homer_id ) );
}
And this is what it looks like in the database:
id = 201C5B55-E759-492F-8F20-A529C7C02C8B
name = Homer Simpson
GETTING STARTED WITH BDB
The most mature backend for KiokuDB is KiokuDB::Backend::BDB. It performs very
well, and supports many features, like Search::GIN integration to provide
customized indexing of your objects and transactions.
KiokuDB::Backend::DBI is newer and not as tested, but also supports transactions
and Search::GIN based queries. It performs quite well too, but isn't as fast
as KiokuDB::Backend::BDB.
Installing KiokuDB::Backend::BDB
KiokuDB::Backend::BDB needs the BerkeleyDB module, and a recent version of
Berkeley DB itself, which can be found here:
<http://www.oracle.com/technology/software/products/berkeley-db/db/index.html>.
BerkeleyDB (the library) normally installs into
"/usr/local/BerkeleyDB.4.7", while BerkeleyDB (the module) looks for
it in "/usr/local/BerkeleyDB", so adding a symbolic link should make
installation easy.
Once you have BerkeleyDB installed, KiokuDB::Backend::BDB should install without
problem and you can use it with KiokuDB.
Using KiokuDB::Backend::BDB
To use the BDB backend we must first create the storage. To do this the
"create" flag must be passed:
my $backend = KiokuDB::Backend::BDB->new(
manager => {
home => Path::Class::Dir->new(qw(path to storage)),
create => 1,
},
);
The BDB backend uses BerkeleyDB::Manager to do a lot of the BerkeleyDB
gruntwork. The BerkeleyDB::Manager object will be instantiated using the
arguments provided in the "manager" attribute.
Now that the storage is created we can make use of this backend, much like
before:
my $dir = KiokuDB->new( backend => $backend );
Subsequent opens will not require the "create" argument to be true,
but it doesn't hurt.
This "connect" call is equivalent to the above:
my $dir = KiokuDB->connect( "bdb:dir=path/to/storage", create => 1 );
TRANSACTIONS
Some backends (ones which do the KiokuDB::Backend::Role::TXN role) can be used
with transactions.
If you are familiar with DBIx::Class this should be very familiar:
$dir->txn_do(sub {
$dir->store($obj);
});
This will create a BerkeleyDB level transaction, and all changes to the database
are committed if the block was executed cleanly.
If any error occurred the transaction will be rolled back, and the changes will
not be visible to subsequent reads.
Note that KiokuDB does not touch live instances, so if you do something
like
$dir->txn_do(sub {
my $scope = $dir->new_scope;
$obj->name("Dancing Hippy");
$dir->store($obj);
die "an error";
});
the "name" attribute is not rolled back, it is simply the
"store" operation that gets reverted.
Transactions will nest properly, and with most backends they generally increase
write performance as well.
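Since a rollback leaves live instances alone, code that wants the in-memory object to match the database again has to restore it by hand. One way to do that is sketched below. It assumes KiokuDB::Backend::DBI and DBD::SQLite are installed, uses an in-memory SQLite database purely for the demonstration, and the $old_name bookkeeping is an illustration rather than anything KiokuDB provides.

```perl
use strict;
use warnings;

package Person {
    use Moose;
    has name => ( isa => "Str", is => "rw" );
}

use KiokuDB;

# assumption: an in-memory SQLite database is enough for this sketch
my $dir = KiokuDB->connect( "dbi:SQLite:dbname=:memory:", create => 1 );

my $scope = $dir->new_scope;
my $obj   = Person->new( name => "Homer Simpson" );
$dir->txn_do( sub { $dir->store($obj) } );

# remember the in-memory state before attempting the change
my $old_name = $obj->name;

my $ok = eval {
    $dir->txn_do( sub {
        $obj->name("Dancing Hippy");
        $dir->store($obj);
        die "an error\n";
    } );
    1;
};

unless ($ok) {
    # the store was rolled back in the database, but $obj still carries
    # the new name, so put the old value back ourselves
    $obj->name($old_name);
}

print "transaction ", ( $ok ? "committed" : "rolled back" ), "\n";
print "name in memory: ", $obj->name, "\n";
```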
QUERIES
KiokuDB::Backend::BDB::GIN is a subclass of KiokuDB::Backend::BDB that provides
Search::GIN integration.
Search::GIN is a framework to index and query objects, inspired by Postgres'
internal GIN api. GIN stands for Generalized Inverted Indexes.
Using Search::GIN arbitrary search keys can be indexed for your objects, and
these objects can then be looked up using queries.
For instance, one of the pre canned searches Search::GIN supports out of the box
is class indexing. Let's use Search::GIN::Extract::Callback to do custom
indexing of our objects:
my $dir = KiokuDB->new(
    backend => KiokuDB::Backend::BDB::GIN->new(
        extract => Search::GIN::Extract::Callback->new(
            extract => sub {
                my ( $obj, $extractor, @args ) = @_;
                if ( $obj->isa("Person") ) {
                    return {
                        type => "person", # must match the value used in the query
                        name => $obj->name,
                    };
                }
                return;
            },
        ),
    ),
);
$dir->store( @random_objects );
To look up the objects, we use a manual key lookup query:
my $query = Search::GIN::Query::Manual->new(
values => {
type => "person",
},
);
my $stream = $dir->search($query);
The result is a Data::Stream::Bulk object that represents the search results. It
can be iterated as follows:
while ( my $block = $stream->next ) {
foreach my $person ( @$block ) {
print "found a person: ", $person->name;
}
}
Or even more simply, if you don't mind loading the whole resultset into memory:
my @people = $stream->all;
Search::GIN is very much in its infancy, and is poorly documented. However
it does work for simple searches such as this and contains pre-canned
solutions like Search::GIN::Extract::Class.
In short, it works today, but watch this space for new developments.