.\" Automatically generated by Pod::Man 4.10 (Pod::Simple 3.35)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "RRD_PDPCALC 1"
.TH RRD_PDPCALC 1 "2019-05-30" "1.7.1" "rrdtool"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
PDP calculation explanation \- the inner logic of PDP calculation, with an example, by Tianpeng Xia
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This article explains in detail, with a worked example, how a \s-1PDP\s0 (primary data point) is calculated.
.SH "Refreshing some basics about PDP"
.IX Header "Refreshing some basics about PDP"
.SS "Fundamental knowledge"
.IX Subsection "Fundamental knowledge"
If you have not yet read the tutorials and man pages on the official site or elsewhere, I strongly encourage you to do so.
As stated in the description, this article explains only how a \s-1PDP\s0 is calculated, not what a \s-1PDP\s0 is.
So please read the following materials first to get a basic understanding of PDPs:
.PP
<http://rrdtool.vandenbogaerdt.nl/process.php> \- By Alex van den Bogaerdt. This article explains \s-1PDP\s0 clearly and in great detail. However, the \*(L"normalization process\*(R" described in its \*(L"Normalize interval\*(R" section differs from the official behaviour, which I confirmed with \f(CW@oetiker\fR himself. The flaw is easy to see in the bar charts and is discussed in the \*(L"Calculation logics\*(R" section.
.PP
<https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html> \- This one is on the official site. Actually it's the manual page for \*(L"rrdcreate\*(R", and it reveals what's under the hood with regard to \s-1PDP\s0 calculation in its \*(L"The \s-1HEARTBEAT\s0 and the \s-1STEP\*(R"\s0 section.
.PP
The text graph by Don Baarda vividly explains how \fB\s-1UNKNOWN\s0\fR data are produced and how the heartbeat value influences sampling. Unfortunately, it does not give a clear method by which PDPs are calculated.
.PP
<https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html> \- Another detailed official tutorial by Alex van den Bogaerdt. Similarly, it only provides examples in which the data are evenly spaced and arrive exactly on step boundaries.
.PP
If you don't enjoy experiments or don't care much about the inner mechanics, you can stop here and turn to more practical topics such as graph exports or the command manuals. But if, like me, you care about the calculation logic itself, please read on.
.SH "Calculation logics"
.IX Header "Calculation logics"
Here begins the core of this article. This section presents two versions of the calculation method, one by Alex van den Bogaerdt and the other by \f(CW@oetiker\fR.
.PP
To keep the explanation ASCII-friendly, I will explain both versions with the chart below instead of a real image.
.PP
.Vb 11
\&  |
\&  |    (v1)
\&  | _\|_\|_\|_\|_\|_\|_                        (v4)  (v5)
\&  | |     |           (v3)        _\|_\|_\|_\|_\|_\|_\|_\|_\|_\|_\|_
\&  | |     |        _\|_\|_\|_\|_\|_\|_\|_\|_\|_\|_\|_\|_\|_|     ||   |
\&  | |     |        |            ||     ||   |
\&  | |     |        |            ||     ||   |
\&  | |     |   (v2) |            ||     ||   |
\&  | |     |_\|_\|_\|_\|_\|_\|_\|_|            ||     ||   |
\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\->
\&  0 1     3        7            17     20   21
.Ve
.PP
The X axis shows time slots (one slot per second) and the Y axis shows the value.
.PP
Let's make everything a little clearer:
.PP
\&\- The step is 5
.PP
\&\- Each \s-1PDP\s0 gets updated only when a value arrives at or after the last slot of that \s-1PDP.\s0 For instance, the last slot of the \s-1PDP\s0 from 16 to 20 is 20
.PP
\&\- The heartbeat is 20, so the samples covering the entire 7\-17 period are not discarded
.PP
\&\- At second 3, the first value comes in as v1, and so on
.PP
\&\- Second 0 is the origin, and it does not count as a sample
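.PP
The heartbeat rule in the list above can be sketched in Python. This is only an illustration, assuming (as the rrdcreate manual describes) that a sample is usable when no more than the heartbeat has passed since the previous update. The function name \f(CWgaps_within_heartbeat\fR is made up for this example and is not part of rrdtool.
.PP
.Vb 15
```python
HEARTBEAT = 20
arrivals = [3, 7, 17, 20, 21]   # sample times taken from the chart

def gaps_within_heartbeat(arrivals, heartbeat=HEARTBEAT, origin=0):
    """Pair each arrival with whether its gap from the previous one is usable."""
    checks, prev = [], origin
    for t in arrivals:
        # Assumption: a gap longer than the heartbeat makes the sample unknown.
        checks.append((prev, t, t - prev <= heartbeat))
        prev = t
    return checks

print(gaps_within_heartbeat(arrivals))
```
.Ve
.PP
Every gap in the chart, including the ten seconds between 7 and 17, is within the heartbeat of 20, so none of the samples is treated as unknown.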
.SS "Bogaerdt version"
.IX Subsection "Bogaerdt version"
As can be seen on this page: <http://rrdtool.vandenbogaerdt.nl/process.php>, after all the primary data are transformed to rates (except for \s-1GAUGE,\s0 of course), they have to go through a \fBnormalization process\fR if they do not arrive exactly on step boundaries, or, in the words of the author, \*(L"on well-defined boundaries in time\*(R".
.PP
What does that mean? Basically, if all the \fBknown\fR (as opposed to an \fBunknown\fR value) data make up at least 50% of all slots during a period, then a \s-1PDP\s0 is calculated from them.
.PP
This version seems to go well until we reach the bar chart part.
.PP
According to the \s-1ASCII\s0 bar chart, we have the following results:
.PP
From second 1 on, the \s-1PDP\s0 of each period (1\-5, 6\-10, ...) is computed by averaging all the values within it.
.PP
So:
.PP
\&\- the \s-1PDP\s0 from 1 to 5 is (v1*3+v2*2)/5
.PP
\&\- the \s-1PDP\s0 from 6 to 10 is (v2*2+v3*3)/5
.PP
\&\- the \s-1PDP\s0 from 11 to 15 is (v3*5)/5, since all the values in slots 11, 12, 13, 14 and 15 are the same, which is v3
.PP
\&\- ...
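.PP
The per-period averaging listed above can be sketched in Python. This is a minimal illustration, not code from either author: the arrival times come from the chart, the numbers standing in for v1 to v5 are hypothetical, and the function names are invented for this example.
.PP
.Vb 25
```python
STEP = 5
# Arrival second -> value. The times come from the chart; the values
# standing in for v1..v5 are hypothetical.
samples = {3: 10, 7: 5, 17: 8, 20: 9, 21: 9}

def slot_values(samples):
    """Give each one-second slot the value of the next sample at or after it."""
    values, prev = {}, 0
    for t in sorted(samples):
        for slot in range(prev + 1, t + 1):
            values[slot] = samples[t]
        prev = t
    return values

def pdps_individually(samples, step=STEP):
    """Average every complete period on its own, one period at a time."""
    values = slot_values(samples)
    pdps = {}
    for end in range(step, max(values) + 1, step):
        period = [values[s] for s in range(end - step + 1, end + 1)]
        pdps[end] = sum(period) / step
    return pdps

print(pdps_individually(samples))
```
.Ve
.PP
With these stand-in values, \s-1PDP\s0(6\-10) is (5*2+8*3)/5 = 6.8 and \s-1PDP\s0(11\-15) is (8*5)/5 = 8.0, so the two periods get different results.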
.ie n .SS "The official version (also the @oetiker version)"
.el .SS "The official version (also the \f(CW@oetiker\fP version)"
.IX Subsection "The official version (also the @oetiker version)"
Using the same chart, this version suggests the following:
.PP
\&\- the \s-1PDP\s0 from 1 to 5 is (v1*3+v2*2)/5
.PP
\&\- the PDPs from 6 to 10 and 11 to 15 are the \fB\s-1SAME\s0\fR, namely (v2*2+v3*8)/10
.PP
\&\- ...
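.PP
The behaviour listed above can be sketched in Python as averaging, in one block, all the periods that a single update completes. This is a simplified illustration and not rrdtool's actual source: the values standing in for v1 to v5 are hypothetical, the function name is invented, and every gap is assumed to be within the heartbeat.
.PP
.Vb 26
```python
STEP = 5
# Arrival second -> value. The times come from the chart; the values
# standing in for v1..v5 are hypothetical.
samples = {3: 10, 7: 5, 17: 8, 20: 9, 21: 9}

def pdps_together(samples, step=STEP):
    """Average all periods completed by one update as a single block."""
    pdps, values = {}, {}
    prev, done = 0, 0   # previous update time, end of last finished period
    for t in sorted(samples):
        # Each slot since the previous update takes the newly arrived value.
        for slot in range(prev + 1, t + 1):
            values[slot] = samples[t]
        prev = t
        last_end = t - t % step   # end of the last period this update completes
        if last_end > done:
            block = [values[s] for s in range(done + 1, last_end + 1)]
            average = sum(block) / len(block)
            for end in range(done + step, last_end + 1, step):
                pdps[end] = average
            done = last_end
    return pdps

print(pdps_together(samples))
```
.Ve
.PP
With these stand-in values, the update at slot 17 completes two periods at once, so \s-1PDP\s0(6\-10) and \s-1PDP\s0(11\-15) both come out as (5*2+8*8)/10 = 7.4.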
.SS "A Comparison and some explanation"
.IX Subsection "A Comparison and some explanation"
The two versions above disagree on the PDPs from 6 to 10 and from 11 to 15.
.PP
Why is that?
.PP
The difference lies in whether \s-1PDP\s0(6\-10) and \s-1PDP\s0(11\-15) are calculated separately or together.
.PP
Let's discuss this in more detail using the above bar chart.
.PP
\fIBogaerdt's version\fR
.IX Subsection "Bogaerdt's version"
.PP
PDPs are \fBalways computed individually\fR no matter how values arrive.
.PP
For example, the value at slot 17 comes after the last slot of \s-1PDP\s0(11\-15), and the value immediately before it arrived at slot 7, so all the slots after 7 up to 17 are assigned v3. Since each \s-1PDP\s0 is computed individually, \s-1PDP\s0(6\-10) is (v2*2+v3*3)/5 while \s-1PDP\s0(11\-15) is (v3*5)/5.
.PP
\fIThe official version\fR
.IX Subsection "The official version"
.PP
PDPs are \fBalways computed in terms of the steps which the next update spans\fR, be it 1 step, 2 steps or n steps; in other words, PDPs may be computed \fBtogether\fR.
.PP
For example, the update at slot 17 spans \s-1PDP\s0(6\-10) and \s-1PDP\s0(11\-15) because the \fBimmediately\fR preceding value is at 7, which lies between 6 and 10, and 17 is after 15. \s-1PDP\s0(1\-5) and \s-1PDP\s0(16\-20) are not included: the update at slot 7 has already triggered the calculation for \s-1PDP\s0(1\-5), and the update at slot 17 comes before slot 20, the last slot of \s-1PDP\s0(16\-20).
.PP
That's the reason why \s-1PDP\s0(6\-10) and \s-1PDP\s0(11\-15) have the same value, (v2*2+v3*8)/10: the ten slots of both periods are averaged together.
.SH "An example"
.IX Header "An example"
If you are still confused, don't worry: an example is here to help.
.PP
Let's get our hands dirty with some commands:
.PP
.Vb 6
\& rrdtool create target.rrd \-\-start 1000000000  \-\-step 5 DS:mem:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:10
\& rrdtool update target.rrd 1000000003:8 1000000006:1 1000000017:6 \e
\& 1000000020:7 1000000021:7 1000000022:4 \e
\& 1000000023:3 1000000036:1 1000000037:2 \e
\& 1000000038:3 1000000039:3 1000000042:5
\& rrdtool fetch target.rrd AVERAGE \-\-start 1000000000 \-\-end 1000000045
.Ve
.PP
The code above contains three commands: create, update and fetch. First we create a new rrd file, then we feed in some data, and finally we fetch all the PDPs from the rrd.
.SS "Focus on single steps"
.IX Subsection "Focus on single steps"
To make the explanation concrete, the calculation of each \s-1PDP\s0 is worked through below.
.PP
Below is the output of the commands above:
.PP
.Vb 10
\& 1000000005: 5.2000000000e+00
\& 1000000010: 5.5000000000e+00
\& 1000000015: 5.5000000000e+00
\& 1000000020: 6.6000000000e+00
\& 1000000025: 1.7333333333e+00
\& 1000000030: 1.7333333333e+00
\& 1000000035: 1.7333333333e+00
\& 1000000040: 2.8000000000e+00
\& 1000000045: nan
\& 1000000050: nan
.Ve
.PP
\&\s-1NOTE: 1000000005\s0 means the \s-1PDP\s0 from 1000000001 to 1000000005, and so on. For brevity and readability, we use only the last two digits, so 05 denotes 1000000005. We choose \s-1GAUGE\s0 as the data source type because its original values are already treated as rates, so no additional transformation is needed; see this article <http://rrdtool.vandenbogaerdt.nl/process.php> for details.
.PP
05: 5.2 = (8*3+1*2)/5
.PP
10: 5.5 = (1*1+6*9)/10
.PP
15: the same as the previous one
.PP
20: 6.6 = (6*2+7*3)/5
.PP
25: 1.73333 = (7+4+3+1*12)/15
.PP
\&...
.PP
45: nan, as the last value is at 42, which does not trigger the calculation for \s-1PDP\s0(41\-45)
.PP
50: nan; why this unknown \s-1PDP\s0 is shown is explained in this article <https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html>
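.PP
The arithmetic above can be checked with a short Python sketch that keeps a running total of value-seconds and averages it whenever an update crosses one or more step boundaries. It is a simplified model for this example only, not rrdtool's implementation: the function name \f(CWfetch_like\fR is invented, times are written as offsets from 1000000000, and every gap is assumed to be within the heartbeat.
.PP
.Vb 29
```python
STEP = 5
# (second, value) pairs from the update command, as offsets from 1000000000.
updates = [(3, 8), (6, 1), (17, 6), (20, 7), (21, 7), (22, 4),
           (23, 3), (36, 1), (37, 2), (38, 3), (39, 3), (42, 5)]

def fetch_like(updates, step=STEP):
    """Return {period end: PDP} for every period the updates complete."""
    pdps = {}
    prev = 0      # time of the previous update
    done = 0      # end of the last completed period
    total = 0.0   # value-seconds accumulated since `done`
    for t, value in updates:
        last_end = t - t % step   # last step boundary at or before t
        if last_end > done:
            # Seconds of this interval that fall before the boundary.
            total += value * (last_end - prev)
            average = total / (last_end - done)
            for end in range(done + step, last_end + 1, step):
                pdps[end] = average
            done = last_end
            total = value * (t - last_end)   # remainder opens the next period
        else:
            total += value * (t - prev)
        prev = t
    return pdps

for end, pdp in fetch_like(updates).items():
    print(f"{end:02d}: {pdp:.10e}")
```
.Ve
.PP
The sketch yields values for 05 through 40 that agree with the fetch output shown earlier. It produces nothing for 45, because the last update at 42 does not reach the end of that period.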
.SH "SUMMARY"
.IX Header "SUMMARY"
All that said, I hope you now have a clear understanding of the inner calculation \*(L"magic\*(R" for PDPs.
.SS "Other References"
.IX Subsection "Other References"
.IP "\(bu" 4
A great PowerShell script for generating \s-1ASCII\s0 bar charts: <https://gallery.technet.microsoft.com/scriptcenter/Sample\-Script\-to\-Generate\-59c80d4c>
.IP "\(bu" 4
<https://stackoverflow.com/questions/18924450/rrd\-wrong\-values>