NAME
DBIx::Class::Storage::DBI::Replicated::Introduction - Minimum Need to Know
SYNOPSIS
This is an introductory document for DBIx::Class::Storage::DBI::Replicated.
This document is not an overview of what replication is or why you should be
using it, nor does it explain how to set up MySQL native replication. Copious
external resources are available for both. This document presumes you have the
basics down.
DESCRIPTION
DBIx::Class supports a framework for using database replication. This system is
integrated completely, which means that once it is set up you should be able to
start using a replication cluster without additional work or changes to your
code. Some caveats apply, primarily related to the proper use of transactions
(you are wrapping all your database-modifying statements inside a transaction,
right ;) ). In our experience, however, properly written DBIC code works
transparently with Replicated storage.
Currently we support MySQL native replication, which is relatively easy to
install and configure, in a single-master topology with one or more replicants
(also called 'slaves' in some documentation). However, the framework is not
specifically tied to MySQL, and supporting other replication systems or
topologies should be possible. Please bring your patches and ideas to the
#dbix-class IRC channel or the mailing list.
For an easy way to start playing with MySQL native replication, see:
MySQL::Sandbox.
If you are using this with a Catalyst-based application, you may also want to
look at recent versions of Catalyst::Model::DBIC::Schema, which supports
replication configuration options as well.
REPLICATED STORAGE
By default, when you start DBIx::Class, your Schema (DBIx::Class::Schema) is
assigned a storage_type, which when fully connected will reflect your
underlying storage engine as defined by your chosen database driver. For
example, if you connect to a MySQL database, your storage_type will be
DBIx::Class::Storage::DBI::mysql. Your storage type class contains
database-specific code to help smooth over the differences between databases and let
DBIx::Class do its thing.
If you want to use replication, you will override this setting so that the
replicated storage engine will 'wrap' your underlying storages and present a
unified interface to the end programmer. This wrapper storage class delegates
each method call to either the master database or one of the replicant
databases, depending on whether the call is read-only (by default sent to the
replicants) or a write (reserved for the master). Additionally, the Replicated
storage monitors the health of your replicants and automatically drops any
replicant that exceeds configurable limits, such as the maximum allowed lag.
Later, it can automatically restore that replicant once its health recovers.
This gives you a very robust system, since you can add or drop replicants and
DBIC will automatically adjust itself accordingly.
Additionally, if you need high data integrity, such as when you are executing a
transaction, replicated storage will automatically delegate all database
traffic to the master storage. There are several ways to enable this high
integrity mode, but wrapping your statements inside a transaction is the easy
and canonical option.
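As a minimal sketch of the transaction approach, the helper below wraps a
read-then-write sequence in "txn_do" in DBIx::Class::Schema, so that under
replicated storage both the read and the write should go to the master. The
debit_account helper, the 'Account' result source and its 'balance' column are
hypothetical names chosen for illustration; they are not part of DBIx::Class.

```perl
use strict;
use warnings;

# Hypothetical helper: 'Account' and 'balance' are illustrative names only.
# Everything inside the txn_do block runs in one transaction, so the read
# (find) is not served by a possibly lagging replicant.
sub debit_account {
    my ($schema, $account_id, $amount) = @_;

    return $schema->txn_do(sub {
        my $account = $schema->resultset('Account')->find($account_id)
            or die "No such account: $account_id\n";

        die "Insufficient funds\n" if $account->balance < $amount;

        $account->update({ balance => $account->balance - $amount });
        return $account->balance;
    });
}
```

Because the balance check and the update share one transaction, the check is
made against the master's current data rather than a replicant's copy.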
PARTS OF REPLICATED STORAGE
A replicated storage contains several parts. First, there is the replicated
storage itself (DBIx::Class::Storage::DBI::Replicated). A replicated storage
takes a pool of replicants (DBIx::Class::Storage::DBI::Replicated::Pool) and a
software balancer (DBIx::Class::Storage::DBI::Replicated::Balancer). The
balancer does the job of splitting up all the read traffic amongst the
replicants in the Pool. Currently there are two types of balancers: a Random
one, which chooses a replicant in the Pool using a naive randomizer algorithm,
and a First one, which always uses the first replicant in the Pool (and so is
only of value when you have a single replicant).
REPLICATED STORAGE CONFIGURATION
All the parts of replication can be altered dynamically at runtime, which makes
it possible to create a system that automatically scales under load by
creating more replicants as needed, perhaps using a cloud system such as
Amazon EC2. However, for common use you can set up your replicated storage to
be enabled at the time you connect the databases. The following is a breakdown
of how you may wish to do this. Again, if you are using Catalyst, I strongly
recommend you use (or upgrade to) the latest Catalyst::Model::DBIC::Schema,
which makes this job even easier.
First, you need to get a $schema object and set the storage_type:
    my $schema = MyApp::Schema->clone;
    $schema->storage_type([
        '::DBI::Replicated' => {
            balancer_type => '::Random',
            balancer_args => {
                auto_validate_every => 5,
                master_read_weight  => 1,
            },
            pool_args => {
                maximum_lag => 2,
            },
        },
    ]);
Then, you need to connect your DBIx::Class::Schema.
    $schema->connection($dsn, $user, $pass);
Let's break down the settings. The method "storage_type" in
DBIx::Class::Schema takes one mandatory value, a scalar naming the storage
class, and an optional second value, a hash reference of configuration options
for that storage. In the example above the two are passed together in a single
array reference. Here we select the Replicated storage type by using
'::DBI::Replicated' as the first value. You will only use a different value if
you are subclassing the replicated storage, so for now just copy that first
parameter.
The second parameter is a hash reference of options that are passed to the
replicated storage. "balancer_type" in
DBIx::Class::Storage::DBI::Replicated is the type of software load balancer
you will use to split up traffic among all your replicants. Right now we have
two options, "::Random" and "::First". You can review the
documentation for both at:
DBIx::Class::Storage::DBI::Replicated::Balancer::First,
DBIx::Class::Storage::DBI::Replicated::Balancer::Random.
In this case we will have three replicants, so the ::Random option is the only
one that makes sense.
The 'balancer_args' are passed to the balancer when it is instantiated. All
balancers have the 'auto_validate_every' option. This is the number of seconds
we allow to pass between validation checks on a load-balanced replicant. The
higher the number, the greater the chance that your reads from a replicant are
inconsistent with what is on the master. Setting this number too low will
result in increased database load, so choose a number with care. Our
experience is that setting the number to around 5 seconds results in a good
performance / integrity balance.
'master_read_weight' is an option associated with the ::Random balancer. It
lets the master take a share of the read traffic. I usually leave this off (the
default is 0, which sends no balanced reads to the master).
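One way to picture this option is as the master's share in a weighted random
draw: a weight of 0 means the master is never picked for balanced reads, and a
weight of 1 makes it as likely to be picked as any single replicant. The
following plain-Perl sketch illustrates that reading. It is not the balancer's
actual code; pick_read_backend and its injectable $rng argument are
hypothetical, as is the fallback to the master when the pool is empty.

```perl
use strict;
use warnings;

# Illustrative only: a weighted random choice between replicants and master.
# $rng is injectable so that the behaviour can be tested deterministically.
sub pick_read_backend {
    my ($replicants, $master_read_weight, $rng) = @_;
    $rng ||= sub { rand($_[0]) };    # uniform number in [0, $_[0])

    my $count = @$replicants;
    return 'master' unless $count;   # assumption: empty pool falls back to master

    # Each replicant owns one unit of the range; the master owns
    # $master_read_weight units at the top end of it.
    my $roll = $rng->($count + $master_read_weight);
    return $roll >= $count ? 'master' : $replicants->[ int $roll ];
}
```

With three replicants and a weight of 1, the master would receive roughly a
quarter of the balanced reads under this model.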
The 'pool_args' are configuration options associated with the replicant pool.
This object (DBIx::Class::Storage::DBI::Replicated::Pool) manages all the
declared replicants. 'maximum_lag' is the number of seconds a replicant is
allowed to lag behind the master before being temporarily removed from the
pool. Keep in mind that the Balancer option 'auto_validate_every' determines
how often a replicant is tested against this condition, so the true possible
lag can be higher than the number you set. The default is zero.
No matter how low you set maximum_lag or auto_validate_every, there is always a
chance that your replicants will lag a bit behind the master with the
replication system built into MySQL. You can ensure reliable reads by using a
transaction, which forces both read and write activity to the master; however,
this will increase the load on your master database.
After you've configured the replicated storage, you need to add the connection
information for the replicants:
    $schema->storage->connect_replicants(
        [$dsn1, $user, $pass, \%opts],
        [$dsn2, $user, $pass, \%opts],
        [$dsn3, $user, $pass, \%opts],
    );
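If your replicants differ only by host name, the list of connect-info array
references can be generated rather than written out by hand. The sketch below
assumes DBD::mysql-style DSNs; replicant_connect_info, the database name and
the host names are hypothetical placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one [$dsn, $user, $pass, \%opts] array
# reference per replicant host.
sub replicant_connect_info {
    my ($database, $user, $pass, @hosts) = @_;
    return map {
        [ "dbi:mysql:database=$database;host=$_", $user, $pass, { AutoCommit => 1 } ]
    } @hosts;
}

# Usage (host names are placeholders):
# $schema->storage->connect_replicants(
#     replicant_connect_info('myapp', $user, $pass, qw(replica1 replica2 replica3))
# );
```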
These replicants should be configured as slaves to the master using the
instructions for MySQL native replication, or if you are just learning, you
will find MySQL::Sandbox an easy way to set up a replication cluster.
And now your $schema object is properly configured! Enjoy!
AUTHOR
John Napiorkowski <jjnapiork@cpan.org>
LICENSE
You may distribute this code under the same terms as Perl itself.