NAME¶
DBI::Profile - Performance profiling and benchmarking for the DBI
SYNOPSIS¶
The easiest way to enable DBI profiling is to set the DBI_PROFILE environment
variable to 2 and then run your code as usual:
DBI_PROFILE=2 prog.pl
This will profile your program and then output a textual summary grouped by
query when the program exits. You can also enable profiling by setting the
Profile attribute of any DBI handle:
$dbh->{Profile} = 2;
Then the summary will be printed when the handle is destroyed.
Many other values apart from are possible - see "ENABLING A PROFILE"
below.
DESCRIPTION¶
The DBI::Profile module provides a simple interface to collect and report
performance and benchmarking data from the DBI.
For a more elaborate interface, suitable for larger programs, see
DBI::ProfileDumper and dbiprof. For Apache/mod_perl applications see
DBI::ProfileDumper::Apache.
OVERVIEW¶
Performance data collection for the DBI is built around several concepts which
are important to understand clearly.
- Method Dispatch
- Every method call on a DBI handle passes through a single 'dispatch'
function which manages all the common aspects of DBI method calls, such as
handling the RaiseError attribute.
- Data Collection
- If profiling is enabled for a handle then the dispatch code takes a
high-resolution timestamp soon after it is entered. Then, after calling
the appropriate method and just before returning, it takes another
high-resolution timestamp and calls a function to record the information.
That function is passed the two timestamps plus the DBI handle and the
name of the method that was called. That data about a single DBI method
call is called a profile sample.
- Data Filtering
- If the method call was invoked by the DBI or by a driver then the call is
ignored for profiling because the time spent will be accounted for by the
original 'outermost' call for your code.
For example, the calls that the selectrow_arrayref() method makes to
prepare() and execute() etc. are not counted individually
because the time spent in those methods is going to be allocated to the
selectrow_arrayref() method when it returns. If this was not done
then it would be very easy to double count time spent inside the DBI.
- Data Storage Tree
- The profile data is accumulated as 'leaves on a tree'. The 'path' through
the branches of the tree to a particular leaf is determined dynamically
for each sample. This is a key feature of DBI profiling.
For each profiled method call the DBI walks along the Path and uses each
value in the Path to step into and grow the Data tree.
For example, if the Path is
[ 'foo', 'bar', 'baz' ]
then the new profile sample data will be merged into the tree at
$h->{Profile}->{Data}->{foo}->{bar}->{baz}
But it's not very useful to merge all the call data into one leaf node
(except to get an overall 'time spent inside the DBI' total). It's more
common to want the Path to include dynamic values such as the current
statement text and/or the name of the method called to show what the time
spent inside the DBI was for.
The Path can contain some 'magic cookie' values that are automatically
replaced by corresponding dynamic values when they're used. These magic
cookies always start with a punctuation character.
For example a value of '"!MethodName"' in the Path causes the
corresponding entry in the Data to be the name of the method that was
called. For example, if the Path was:
[ 'foo', '!MethodName', 'bar' ]
and the selectall_arrayref() method was called, then the profile
sample data for that call will be merged into the tree at:
$h->{Profile}->{Data}->{foo}->{selectall_arrayref}->{bar}
- Profile Data
- Profile data is stored at the 'leaves' of the tree as references to an
array of numeric values. For example:
[
106, # 0: count of samples at this node
0.0312958955764771, # 1: total duration
0.000490069389343262, # 2: first duration
0.000176072120666504, # 3: shortest duration
0.00140702724456787, # 4: longest duration
1023115819.83019, # 5: time of first sample
1023115819.86576, # 6: time of last sample
]
After the first sample, later samples always update elements 0, 1, and 6,
and may update 3 or 4 depending on the duration of the sampled call.
ENABLING A PROFILE¶
Profiling is enabled for a handle by assigning to the Profile attribute. For
example:
$h->{Profile} = DBI::Profile->new();
The Profile attribute holds a blessed reference to a hash object that contains
the profile data and attributes relating to it.
The class the Profile object is blessed into is expected to provide at least a
DESTROY method which will dump the profile data to the DBI trace file handle
(STDERR by default).
All these examples have the same effect as each other:
$h->{Profile} = 0;
$h->{Profile} = "/DBI::Profile";
$h->{Profile} = DBI::Profile->new();
$h->{Profile} = {};
$h->{Profile} = { Path => [] };
Similarly, these examples have the same effect as each other:
$h->{Profile} = 6;
$h->{Profile} = "6/DBI::Profile";
$h->{Profile} = "!Statement:!MethodName/DBI::Profile";
$h->{Profile} = { Path => [ '!Statement', '!MethodName' ] };
If a non-blessed hash reference is given then the DBI::Profile module is
automatically "require"'d and the reference is blessed into that
class.
If a string is given then it is processed like this:
($path, $module, $args) = split /\//, $string, 3
@path = split /:/, $path
@args = split /:/, $args
eval "require $module" if $module
$module ||= "DBI::Profile"
$module->new( Path => \@Path, @args )
So the first value is used to select the Path to be used (see below). The second
value, if present, is used as the name of a module which will be loaded and
it's "new" method called. If not present it defaults to
DBI::Profile. Any other values are passed as arguments to the "new"
method. For example: ""2/DBIx::OtherProfile/Foo:42"".
Numbers can be used as a shorthand way to enable common Path values. The
simplest way to explain how the values are interpreted is to show the code:
push @Path, "DBI" if $path_elem & 0x01;
push @Path, "!Statement" if $path_elem & 0x02;
push @Path, "!MethodName" if $path_elem & 0x04;
push @Path, "!MethodClass" if $path_elem & 0x08;
push @Path, "!Caller2" if $path_elem & 0x10;
So "2" is the same as "!Statement" and "6" (2+4)
is the same as "!Statement:!Method". Those are the two most commonly
used values. Using a negative number will reverse the path. Thus
"-6" will group by method name then statement.
The splitting and parsing of string values assigned to the Profile attribute may
seem a little odd, but there's a good reason for it. Remember that attributes
can be embedded in the Data Source Name string which can be passed in to a
script as a parameter. For example:
dbi:DriverName(Profile=>2):dbname
dbi:DriverName(Profile=>{Username}:!Statement/MyProfiler/Foo:42):dbname
And also, if the "DBI_PROFILE" environment variable is set then The
DBI arranges for every driver handle to share the same profile object. When
perl exits a single profile summary will be generated that reflects (as nearly
as practical) the total use of the DBI by the application.
THE PROFILE OBJECT¶
The DBI core expects the Profile attribute value to be a hash reference and if
the following values don't exist it will create them as needed:
Data¶
A reference to a hash containing the collected profile data.
Path¶
The Path value is a reference to an array. Each element controls the value to
use at the corresponding level of the profile Data tree.
If the value of Path is anything other than an array reference, it is treated as
if it was:
[ '!Statement' ]
The elements of Path array can be one of the following types:
Special Constant
!Statement
Use the current Statement text. Typically that's the value of the Statement
attribute for the handle the method was called with. Some methods, like
commit() and
rollback(), are unrelated to a particular
statement. For those methods !Statement records an empty string.
For statement handles this is always simply the string that was given to
prepare() when the handle was created. For database handles this is the
statement that was last prepared or executed on that database handle. That can
lead to a little 'fuzzyness' because, for example, calls to the
quote()
method to build a new statement will typically be associated with the previous
statement. In practice this isn't a significant issue and the dynamic Path
mechanism can be used to setup your own rules.
!MethodName
Use the name of the DBI method that the profile sample relates to.
!MethodClass
Use the fully qualified name of the DBI method, including the package, that the
profile sample relates to. This shows you where the method was implemented.
For example:
'DBD::_::db::selectrow_arrayref' =>
0.022902s
'DBD::mysql::db::selectrow_arrayref' =>
2.244521s / 99 = 0.022445s avg (first 0.022813s, min 0.022051s, max 0.028932s)
The "DBD::_::db::selectrow_arrayref" shows that the driver has
inherited the selectrow_arrayref method provided by the DBI.
But you'll note that there is only one call to DBD::_::db::selectrow_arrayref
but another 99 to DBD::mysql::db::selectrow_arrayref. Currently the first call
doesn't record the true location. That may change.
!Caller
Use a string showing the filename and line number of the code calling the
method.
!Caller2
Use a string showing the filename and line number of the code calling the
method, as for !Caller, but also include filename and line number of the code
that called that. Calls from DBI:: and DBD:: packages are skipped.
!File
Same as !Caller above except that only the filename is included, not the line
number.
!File2
Same as !Caller2 above except that only the filenames are included, not the line
number.
!Time
Use the current value of
time(). Rarely used. See the more useful
"!Time~N" below.
!Time~N
Where "N" is an integer. Use the current value of
time() but
with reduced precision. The value used is determined in this way:
int( time() / N ) * N
This is a useful way to segregate a profile into time slots. For example:
[ '!Time~60', '!Statement' ]
Code Reference
The subroutine is passed the handle it was called on and the DBI method name.
The current Statement is in $_. The statement string should not be modified,
so most subs start with "local $_ = $_;".
The list of values it returns is used at that point in the Profile Path.
The sub can 'veto' (reject) a profile sample by including a reference to undef
in the returned list. That can be useful when you want to only profile
statements that match a certain pattern, or only profile certain methods.
Subroutine Specifier
A Path element that begins with '"&"' is treated as the name of a
subroutine in the DBI::ProfileSubs namespace and replaced with the
corresponding code reference.
Currently this only works when the Path is specified by the
"DBI_PROFILE" environment variable.
Also, currently, the only subroutine in the DBI::ProfileSubs namespace is
'&norm_std_n3'. That's a very handy subroutine when profiling code that
doesn't use placeholders. See DBI::ProfileSubs for more information.
Attribute Specifier
A string enclosed in braces, such as '"{Username}"', specifies that
the current value of the corresponding database handle attribute should be
used at that point in the Path.
Reference to a Scalar
Specifies that the current value of the referenced scalar be used at that point
in the Path. This provides an efficient way to get 'contextual' values into
your profile.
Other Values
Any other values are stringified and used literally.
(References, and values that begin with punctuation characters are reserved.)
REPORTING¶
The current accumulated profile data can be formatted and output using
print $h->{Profile}->format;
To discard the profile data and start collecting fresh data you can do:
$h->{Profile}->{Data} = undef;
The default results format looks like this:
DBI::Profile: 0.001015s 42.7% (5 calls) programname @ YYYY-MM-DD HH:MM:SS
'' =>
0.000024s / 2 = 0.000012s avg (first 0.000015s, min 0.000009s, max 0.000015s)
'SELECT mode,size,name FROM table' =>
0.000991s / 3 = 0.000330s avg (first 0.000678s, min 0.000009s, max 0.000678s)
Which shows the total time spent inside the DBI, with a count of the total
number of method calls and the name of the script being run, then a formatted
version of the profile data tree.
If the results are being formatted when the perl process is exiting (which is
usually the case when the DBI_PROFILE environment variable is used) then the
percentage of time the process spent inside the DBI is also shown. If the
process is not exiting then the percentage is calculated using the time
between the first and last call to the DBI.
In the example above the paths in the tree are only one level deep and use the
Statement text as the value (that's the default behaviour).
The merged profile data at the 'leaves' of the tree are presented as total time
spent, count, average time spent (which is simply total time divided by the
count), then the time spent on the first call, the time spent on the fastest
call, and finally the time spent on the slowest call.
The 'avg', 'first', 'min' and 'max' times are not particularly useful when the
profile data path only contains the statement text. Here's an extract of a
more detailed example using both statement text and method name in the path:
'SELECT mode,size,name FROM table' =>
'FETCH' =>
0.000076s
'fetchrow_hashref' =>
0.036203s / 108 = 0.000335s avg (first 0.000490s, min 0.000152s, max 0.002786s)
Here you can see the 'avg', 'first', 'min' and 'max' for the 108 calls to
fetchrow_hashref() become rather more interesting. Also the data for
FETCH just shows a time value because it was only called once.
Currently the profile data is output sorted by branch names. That may change in
a later version so the leaf nodes are sorted by total time per leaf node.
Report Destination¶
The default method of reporting is for the DESTROY method of the Profile object
to format the results and write them using:
DBI->trace_msg($results, 0); # see $ON_DESTROY_DUMP below
to write them to the DBI
trace() filehandle (which defaults to STDERR).
To direct the DBI trace filehandle to write to a file without enabling tracing
the
trace() method can be called with a trace level of 0. For example:
DBI->trace(0, $filename);
The same effect can be achieved without changing the code by setting the
"DBI_TRACE" environment variable to "0=filename".
The $DBI::Profile::ON_DESTROY_DUMP variable holds a code ref that's called to
perform the output of the formatted results. The default value is:
$ON_DESTROY_DUMP = sub { DBI->trace_msg($results, 0) };
Apart from making it easy to send the dump elsewhere, it can also be useful as a
simple way to disable dumping results.
CHILD HANDLES¶
Child handles inherit a reference to the Profile attribute value of their
parent. So if profiling is enabled for a database handle then by default the
statement handles created from it all contribute to the same merged profile
data tree.
PROFILE OBJECT METHODS¶
See "REPORTING".
as_node_path_list¶
@ary = $dbh->{Profile}->as_node_path_list();
@ary = $dbh->{Profile}->as_node_path_list($node, $path);
Returns the collected data ($dbh->{Profile}{Data}) restructured into a list
of array refs, one for each leaf node in the Data tree. This 'flat' structure
is often much simpler for applications to work with.
The first element of each array ref is a reference to the leaf node. The
remaining elements are the 'path' through the data tree to that node.
For example, given a data tree like this:
{key1a}{key2a}[node1]
{key1a}{key2b}[node2]
{key1b}{key2a}{key3a}[node3]
The
as_node_path_list() method will return this list:
[ [node1], 'key1a', 'key2a' ]
[ [node2], 'key1a', 'key2b' ]
[ [node3], 'key1b', 'key2a', 'key3a' ]
The nodes are ordered by key, depth-first.
The $node argument can be used to focus on a sub-tree. If not specified it
defaults to $dbh->{Profile}{Data}.
The $path argument can be used to specify a list of path elements that will be
added to each element of the returned list. If not specified it defaults to a
ref to an empty array.
as_text¶
@txt = $dbh->{Profile}->as_text();
$txt = $dbh->{Profile}->as_text({
node => undef,
path => [],
separator => " > ",
format => '%1$s: %11$fs / %10$d = %2$fs avg (first %12$fs, min %13$fs, max %14$fs)'."\n";
sortsub => sub { ... },
);
Returns the collected data ($dbh->{Profile}{Data}) reformatted into a list of
formatted strings. In scalar context the list is returned as a single
concatenated string.
A hashref can be used to pass in arguments, the default values are shown in the
example above.
The "node" and <path> arguments are passed to
as_node_path_list().
The "separator" argument is used to join the elements of the path for
each leaf node.
The "sortsub" argument is used to pass in a ref to a sub that will
order the list. The subroutine will be passed a reference to the array
returned by
as_node_path_list() and should sort the contents of the
array in place. The return value from the sub is ignored. For example, to sort
the nodes by the second level key you could use:
sortsub => sub { my $ary=shift; @$ary = sort { $a->[2] cmp $b->[2] } @$ary }
The "format" argument is a "sprintf" format string that
specifies the format to use for each leaf node. It uses the explicit format
parameter index mechanism to specify which of the arguments should appear
where in the string. The arguments to sprintf are:
1: path to node, joined with the separator
2: average duration (total duration/count)
(3 thru 9 are currently unused)
10: count
11: total duration
12: first duration
13: smallest duration
14: largest duration
15: time of first call
16: time of first call
CUSTOM DATA MANIPULATION¶
Recall that "$h->{Profile}->{Data}" is a reference to the
collected data. Either to a 'leaf' array (when the Path is empty, i.e.,
DBI_PROFILE env var is 1), or a reference to hash containing values that are
either further hash references or leaf array references.
Sometimes it's useful to be able to summarise some or all of the collected data.
The
dbi_profile_merge_nodes() function can be used to merge leaf node
values.
dbi_profile_merge_nodes¶
use DBI qw(dbi_profile_merge_nodes);
$time_in_dbi = dbi_profile_merge_nodes(my $totals=[], @$leaves);
Merges profile data node. Given a reference to a destination array, and zero or
more references to profile data, merges the profile data into the destination
array. For example:
$time_in_dbi = dbi_profile_merge_nodes(
my $totals=[],
[ 10, 0.51, 0.11, 0.01, 0.22, 1023110000, 1023110010 ],
[ 15, 0.42, 0.12, 0.02, 0.23, 1023110005, 1023110009 ],
);
$totals will then contain
[ 25, 0.93, 0.11, 0.01, 0.23, 1023110000, 1023110010 ]
and $time_in_dbi will be 0.93;
The second argument need not be just leaf nodes. If given a reference to a hash
then the hash is recursively searched for leaf nodes and all those found are
merged.
For example, to get the time spent 'inside' the DBI during an http request, your
logging code run at the end of the request (i.e. mod_perl LogHandler) could
use:
my $time_in_dbi = 0;
if (my $Profile = $dbh->{Profile}) { # if DBI profiling is enabled
$time_in_dbi = dbi_profile_merge_nodes(my $total=[], $Profile->{Data});
$Profile->{Data} = {}; # reset the profile data
}
If profiling has been enabled then $time_in_dbi will hold the time spent inside
the DBI for that handle (and any other handles that share the same profile
data) since the last request.
Prior to DBI 1.56 the
dbi_profile_merge_nodes() function was called
dbi_profile_merge(). That name still exists as an alias.
CUSTOM DATA COLLECTION¶
Using The Path Attribute¶
XXX example to be added later using a selectall_arrayref call
XXX nested inside a fetch loop where the first column of the
XXX outer loop is bound to the profile Path using
XXX bind_column(1, \${ $dbh->{Profile}->{Path}->[0] })
XXX so you end up with separate profiles for each loop
XXX (patches welcome to add this to the docs :)
Adding Your Own Samples¶
The
dbi_profile() function can be used to add extra sample data into the
profile data tree. For example:
use DBI;
use DBI::Profile (dbi_profile dbi_time);
my $t1 = dbi_time(); # floating point high-resolution time
... execute code you want to profile here ...
my $t2 = dbi_time();
dbi_profile($h, $statement, $method, $t1, $t2);
The $h parameter is the handle the extra profile sample should be associated
with. The $statement parameter is the string to use where the Path specifies
!Statement. If $statement is undef then $h->{Statement} will be used.
Similarly $method is the string to use if the Path specifies !MethodName.
There is no default value for $method.
The $h->{Profile}{Path} attribute is processed by
dbi_profile() in the
usual way.
The $h parameter is usually a DBI handle but it can also be a reference to a
hash, in which case the
dbi_profile() acts on each defined value in the
hash. This is an efficient way to update multiple profiles with a single
sample, and is used by the DashProfiler module.
SUBCLASSING¶
Alternate profile modules must subclass DBI::Profile to help ensure they work
with future versions of the DBI.
CAVEATS¶
Applications which generate many different statement strings (typically because
they don't use placeholders) and profile with !Statement in the Path (the
default) will consume memory in the Profile Data structure for each statement.
Use a code ref in the Path to return an edited (simplified) form of the
statement.
If a method throws an exception itself (not via RaiseError) then it won't be
counted in the profile.
If a HandleError subroutine throws an exception (rather than returning 0 and
letting RaiseError do it) then the method call won't be counted in the
profile.
Time spent in DESTROY is added to the profile of the parent handle.
Time spent in DBI->*() methods is not counted. The time spent in the driver
connect method, $drh->
connect(), when it's called by
DBI->connect is counted if the DBI_PROFILE environment variable is set.
Time spent fetching tied variables, $DBI::errstr, is counted.
Time spent in FETCH for $h->{Profile} is not counted, so getting the profile
data doesn't alter it.
DBI::PurePerl does not support profiling (though it could in theory).
For asynchronous queries, time spent while the query is running on the backend
is not counted.
A few platforms don't support the
gettimeofday() high resolution time
function used by the DBI (and available via the
dbi_time() function).
In which case you'll get integer resolution time which is mostly useless.
On Windows platforms the
dbi_time() function is limited to millisecond
resolution. Which isn't sufficiently fine for our needs, but still much better
than integer resolution. This limited resolution means that fast method calls
will often register as taking 0 time. And timings in general will have much
more 'jitter' depending on where within the 'current millisecond' the start
and end timing was taken.
This documentation could be more clear. Probably needs to be reordered to start
with several examples and build from there. Trying to explain the concepts
first seems painful and to lead to just as many forward references. (Patches
welcome!)