.\" Man page generated from reStructuredText.
.
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.TH "PGCOPYDB FOLLOW" "1" "Nov 03, 2022" "0.10" "pgcopydb"
.SH NAME
pgcopydb follow \- pgcopydb follow
.sp
The command \fBpgcopydb follow\fP replays the database changes registered at
the source database with the logical decoding pluing \fI\%wal2json\fP into the
target database.
.sp
\fBIMPORTANT:\fP
.INDENT 0.0
.INDENT 3.5
While the \fBpgcopydb follow\fP is a full client for the logical decoding
plugin wal2json, the general use case involves using \fBpgcopydb clone
\-\-follow\fP as documented in \fI\%Change Data Capture using Postgres Logical Decoding\fP\&.
.UNINDENT
.UNINDENT
.sp
When using Logical Decoding with pgcopydb or another tool, consider making
sure you\(aqre familiar with the \fI\%Logical Replication Restrictions\fP that
apply. In particular:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
DDL are not replicated.
.sp
When using DDL for partition scheme maintenance, such as when using the
\fI\%pg_partman\fP extension, then consider creating a week or a month of
partitions in advance, so that creating new partitions does not happen
during the migration window.
.IP \(bu 2
Sequence data is not replicated.
.sp
When using \fBpgcopydb clone \-\-follow\fP (starting with pgcopydb version
0.9) then the sequence data is synced at the end of the operation, after
the cutover point implemented via the
\fI\%pgcopydb stream sentinel set endpos\fP\&.
.sp
Updating the sequences manually is also possible by running the command
\fI\%pgcopydb copy sequences\fP\&.
.IP \(bu 2
Large Objects are not replicated.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
See the Postgres documentation page for \fI\%Logical Replication Restrictions\fP
to read the exhaustive list of restrictions.
.SH PGCOPYDB FOLLOW
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
pgcopydb follow: Replay changes from the source database to the target database
usage: pgcopydb follow  \-\-source ... \-\-target ...

  \-\-source              Postgres URI to the source database
  \-\-target              Postgres URI to the target database
  \-\-dir                 Work directory to use
  \-\-filters <filename>  Use the filters defined in <filename>
  \-\-restart             Allow restarting when temp files exist already
  \-\-resume              Allow resuming operations after a failure
  \-\-not\-consistent      Allow taking a new snapshot on the source database
  \-\-snapshot            Use snapshot obtained with pg_export_snapshot
  \-\-slot\-name           Use this Postgres replication slot name
  \-\-create\-slot         Create the replication slot
  \-\-origin              Use this Postgres replication origin node name
  \-\-endpos              Stop replaying changes when reaching this LSN
.ft P
.fi
.UNINDENT
.UNINDENT
.SH DESCRIPTION
.sp
This command runs two concurrent subprocesses.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP 1. 3
The first one pre\-fetches the changes from the source database using
the Postgres Logical Decoding protocol and save the JSON messages in
local JSON files.
.sp
The logical decoding plugin \fI\%wal2json\fP must be available on the
source database system.
.sp
Each time a JSON file is closed, an auxilliary process is started to
transform the JSON file into a matching SQL file. This processing is
done in the background, and the main receiver process only waits for
the transformation process to be finished when there is a new JSON file
to transform.
.sp
In other words, only one such transform process can be started in the
background, and the process is blocking when a second one could get
started.
.sp
The design model here is based on the assumption that receiving the
next set of JSON messages that fills\-up a whole JSON file is going to
take more time than transforming the JSON file into an SQL file. When
that assumption proves wrong, consider opening an issue on the github
project for pgcopydb.
.IP 2. 3
The second process catches\-up with changes happening on the source
database by applying the SQL files to the target database system.
.sp
The Postgres API for \fI\%Replication Progress Tracking\fP is used in that
process so that we can skip already applied transactions at restart or
resume.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
It is possible to start the \fBpgcopydb follow\fP command and then later,
while it\(aqs still running, set the LSN for the end position with the same
effect as using the command line option \fB\-\-endpos\fP, or switch from
prefetch mode only to prefetch and catchup mode. For that, see the commands
\fI\%pgcopydb stream sentinel set endpos\fP,
\fI\%pgcopydb stream sentinel set apply\fP, and
\fI\%pgcopydb stream sentinel set prefetch\fP\&.
.sp
Note that in many case the \fB\-\-endpos\fP LSN position is not known at the
start of this command. Also before entering the \fIprefetch and apply\fP mode it
is important to make sure that the initial base copy is finished.
.sp
Finally, it is also possible to setup the streaming replication options
before using the \fBpgcopydb follow\fP command: see the
\fI\%pgcopydb stream setup\fP and \fI\%pgcopydb stream cleanup\fP commands.
.SH REPLICA IDENTITY AND LACK OF PRIMARY KEYS
.sp
Postgres Logical Decoding works with replaying changes using SQL statements,
and for that exposes the concept of \fIReplica Identity\fP as described in the
documentation for the \fI\%ALTER TABLE ... REPLICA IDENTITY\fP command.
.sp
To quote Postgres docs:
.INDENT 0.0
.INDENT 3.5
\fIThis form changes the information which is written to the write\-ahead
log to identify rows which are updated or deleted. In most cases, the old
value of each column is only logged if it differs from the new value;
however, if the old value is stored externally, it is always logged
regardless of whether it changed. This option has no effect except when
logical replication is in use.\fP
.UNINDENT
.UNINDENT
.sp
To support Change Data Capture with Postgres Logical Decoding for tables
that do not have a Primary Key, then it is necessary to use the \fBALTER
TABLE ... REPLICA IDENTITY\fP command for those tables.
.sp
In practice the two following options are to be considered:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
REPLICA IDENTITY USING INDEX index_name
.INDENT 2.0
.INDENT 3.5
This form is prefered when a UNIQUE index exists for the table without a
primary key. The index must be unique, not partial, not deferrable, and
include only columns marked NOT NULL.
.UNINDENT
.UNINDENT
.IP \(bu 2
REPLICA IDENTITY FULL
.INDENT 2.0
.INDENT 3.5
When this is used on a table, then the WAL records contain the old
values of all columns in the row.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SH LOGICAL DECODING PRE-FETCHING
.sp
When using \fBpgcopydb clone \-\-follow\fP a logical replication slot is created
on the source database before the initial COPY, using the same Postgres
snapshot. This ensure data consistency.
.sp
Within the \fBpgcopydb clone \-\-follow\fP approach, it is only possible to
start applying the changes from the source database after the initial COPY
has finished on the target database.
.sp
Also, from the Postgres documentation we read that \fI\%Postgres replication
slots\fP provide an automated way to ensure that the primary does not remove
WAL segments until they have been received by all standbys.
.sp
Accumulating WAL segments on the primary during the whole duration of the
initial COPY involves capacity hazards, which translate into potential \fIFile
System is Full\fP errors on the WAL disk of the source database. It is crucial
to avoid such a situation.
.sp
This is why pgcopydb implements CDC pre\-fetching. In parallel to the initial
COPY the command \fBpgcopydb clone \-\-follow\fP pre\-fetches the changes in
local JSON and SQL files. Those files are placed in the XDG_DATA_HOME
location, which could be a mount point for an infinite Blob Storage area.
.sp
The \fBpgcopydb follow\fP command is a convenience command that\(aqs available as
a logical decoding client for the wal2json plugin, and it shares the same
implementation as the \fBpgcopydb clone \-\-follow\fP command. As a result, the
pre\-fetching strategy is also relevant to the \fBpgcopydb follow\fP command.
.SH THE SENTINEL TABLE, OR THE REMOTE CONTROL
.sp
To track progress and allow resuming of operations, pgcopydb uses a sentinel
table on the source database. The sentinel table consists of a single row
with the following fields:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ pgcopydb stream sentinel get
startpos   1/8D173AF8
endpos     0/0
apply      disabled
write_lsn  0/0
flush_lsn  0/0
replay_lsn 0/0
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note that you can use the command \fBpgcopydb stream sentinel get \-\-json\fP to
fetch a JSON formatted output, such as the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{
  "startpos": "1/8D173AF8",
  "endpos": "1/8D173AF8",
  "apply": false,
  "write_lsn": "0/0",
  "flush_lsn": "0/0",
  "replay_lsn": "0/0"
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The first three fields (startpos, endpos, apply) are specific to pgcopydb,
then the following three fields (write_lsn, flush_lsn, replay_lsn) follow
the Postgres replication protocol as visible in the docs for the
\fI\%pg_stat_replication\fP function.
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
\fBstartpos\fP
.sp
The startpos field is the current LSN on the source database at the time
when the Change Data Capture is setup in pgcopydb, such as when using the
\fI\%pgcopydb stream setup\fP command.
.sp
Note that both the \fBpgcopydb follow\fP and the \fBpgcopydb clone \-\-follow\fP
command implement the setup parts if the \fBpgcopydb stream setup\fP has not
been used already.
.IP \(bu 2
\fBendpos\fP
.sp
The endpos field is last LSN position from the source database that
pgcopydb replays. The command \fBpgcopydb follow\fP (or \fBpgcopydb clone
\-\-follow\fP) stops when reaching beyond this LSN position.
.sp
The \fBendpos\fP can be set at the start of the process, which is useful
for unit testing, or while the command is running, which is useful in
production to define a cutover point.
.sp
To define the \fBendpos\fP while the command is running, use
\fI\%pgcopydb stream sentinel set endpos\fP\&.
.IP \(bu 2
\fBapply\fP
.sp
The apply field is a boolean (enabled/disabled) that control the catchup
process. The pgcopydb catchup process replays the changes only when the
apply boolean is set to true.
.sp
The \fBpgcopydb clone \-\-follow\fP command automatically enables the apply
field of the sentinel table as soon as the initial COPY is done.
.sp
To manually control the apply field, use the
\fI\%pgcopydb stream sentinel set apply\fP command.
.IP \(bu 2
\fBwrite_lsn\fP
.sp
The Postgres documentation for \fBpg_stat_replication.write_lsn\fP is:
Last write\-ahead log location written to disk by this standby server.
.sp
In the pgcopydb case, the sentinel field write_lsn is the position that
has been written to disk (as JSON) by the streaming process.
.IP \(bu 2
\fBflush_lsn\fP
.sp
The Postgres documentation for \fBpg_stat_replication.flush_lsn\fP is:
Last write\-ahead log location flushed to disk by this standby server
.sp
In the pgcopydb case, the sentinel field flush_lsn is the position that
has been written and then fsync\(aqed to disk (as JSON) by the streaming
process.
.IP \(bu 2
\fBreplay_lsn\fP
.sp
The Postgres documentation for \fBpg_stat_replication.replay_lsn\fP is:
Last write\-ahead log location replayed into the database on this standby server
.sp
In the pgcopydb case, the sentinel field replay_lsn is the position that
has been applied to the target database, as kept track from the WAL.json
and then the WAL.sql files, and using the Postgres API for \fI\%Replication
Progress Tracking\fP\&.
.sp
The replay_lsn is also shared by the pgcopydb streaming process that
uses the Postgres logical replication protocol, so the
\fI\%pg_stat_replication\fP entry associated with the replication slot used
by pgcopydb can be used to monitor replication lag.
.UNINDENT
.UNINDENT
.UNINDENT
.sp
As the pgcopydb streaming processes maintain the sentinel table on the
source database, it is also possible to use it to keep track of the logical
replication progress.
.SH OPTIONS
.sp
The following options are available to \fBpgcopydb follow\fP:
.INDENT 0.0
.TP
.B  \-\-source
Connection string to the source Postgres instance. See the Postgres
documentation for \fI\%connection strings\fP for the details. In short both
the quoted form \fB"host=... dbname=..."\fP and the URI form
\fBpostgres://user@host:5432/dbname\fP are supported.
.TP
.B  \-\-target
Connection string to the target Postgres instance.
.TP
.B  \-\-dir
During its normal operations pgcopydb creates a lot of temporary files to
track sub\-processes progress. Temporary files are created in the directory
location given by this option, or defaults to
\fB${TMPDIR}/pgcopydb\fP when the environment variable is set, or
then to \fB/tmp/pgcopydb\fP\&.
.TP
.B  \-\-restart
When running the pgcopydb command again, if the work directory already
contains information from a previous run, then the command refuses to
proceed and delete information that might be used for diagnostics and
forensics.
.sp
In that case, the \fB\-\-restart\fP option can be used to allow pgcopydb to
delete traces from a previous run.
.TP
.B  \-\-resume
When the pgcopydb command was terminated before completion, either by an
interrupt signal (such as C\-c or SIGTERM) or because it crashed, it is
possible to resume the database migration.
.sp
When resuming activity from a previous run, table data that was fully
copied over to the target server is not sent again. Table data that was
interrupted during the COPY has to be started from scratch even when using
\fB\-\-resume\fP: the COPY command in Postgres is transactional and was rolled
back.
.sp
Same reasonning applies to the CREATE INDEX commands and ALTER TABLE
commands that pgcopydb issues, those commands are skipped on a
\fB\-\-resume\fP run only if known to have run through to completion on the
previous one.
.sp
Finally, using \fB\-\-resume\fP requires the use of \fB\-\-not\-consistent\fP\&.
.TP
.B  \-\-not\-consistent
In order to be consistent, pgcopydb exports a Postgres snapshot by calling
the \fI\%pg_export_snapshot()\fP function on the source database server. The
snapshot is then re\-used in all the connections to the source database
server by using the \fBSET TRANSACTION SNAPSHOT\fP command.
.sp
Per the Postgres documentation about \fBpg_export_snapshot\fP:
.INDENT 7.0
.INDENT 3.5
Saves the transaction\(aqs current snapshot and returns a text string
identifying the snapshot. This string must be passed (outside the
database) to clients that want to import the snapshot. The snapshot is
available for import only until the end of the transaction that exported
it.
.UNINDENT
.UNINDENT
.sp
Now, when the pgcopydb process was interrupted (or crashed) on a previous
run, it is possible to resume operations, but the snapshot that was
exported does not exists anymore. The pgcopydb command can only resume
operations with a new snapshot, and thus can not ensure consistency of the
whole data set, because each run is now using their own snapshot.
.TP
.B  \-\-snapshot
Instead of exporting its own snapshot by calling the PostgreSQL function
\fBpg_export_snapshot()\fP it is possible for pgcopydb to re\-use an already
exported snapshot.
.TP
.B  \-\-slot\-name
Logical replication slot to use. At the moment pgcopydb doesn\(aqt know how
to create the logical replication slot itself. The slot should be created
within the same transaction snapshot as the initial data copy.
.sp
Must be using the \fI\%wal2json\fP output plugin, available with
format\-version 2.
.TP
.B  \-\-create\-slot
Instruct pgcopydb to create the logical replication slot to use.
.TP
.B  \-\-endpos
Logical replication target LSN to use. Automatically stop replication and
exit with normal exit status 0 when receiving reaches the specified LSN.
If there\(aqs a record with LSN exactly equal to lsn, the record will be
output.
.sp
The \fB\-\-endpos\fP option is not aware of transaction boundaries and may
truncate output partway through a transaction. Any partially output
transaction will not be consumed and will be replayed again when the slot
is next read from. Individual messages are never truncated.
.sp
See also documentation for \fI\%pg_recvlogical\fP\&.
.TP
.B  \-\-origin
Logical replication target system needs to track the transactions that
have been applied already, so that in case we get disconnected or need to
resume operations we can skip already replayed transaction.
.sp
Postgres uses a notion of an origin node name as documented in
\fI\%Replication Progress Tracking\fP\&. This option allows to pick your own
node name and defaults to "pgcopydb". Picking a different name is useful
in some advanced scenarios like migrating several sources in the same
target, where each source should have their own unique origin node name.
.TP
.B  \-\-verbose
Increase current verbosity. The default level of verbosity is INFO. In
ascending order pgcopydb knows about the following verbosity levels:
FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.
.TP
.B  \-\-debug
Set current verbosity to DEBUG level.
.TP
.B  \-\-trace
Set current verbosity to TRACE level.
.TP
.B  \-\-quiet
Set current verbosity to ERROR level.
.UNINDENT
.SH ENVIRONMENT
.sp
PGCOPYDB_SOURCE_PGURI
.INDENT 0.0
.INDENT 3.5
Connection string to the source Postgres instance. When \fB\-\-source\fP is
ommitted from the command line, then this environment variable is used.
.UNINDENT
.UNINDENT
.sp
PGCOPYDB_TARGET_PGURI
.INDENT 0.0
.INDENT 3.5
Connection string to the target Postgres instance. When \fB\-\-target\fP is
ommitted from the command line, then this environment variable is used.
.UNINDENT
.UNINDENT
.sp
PGCOPYDB_SNAPSHOT
.INDENT 0.0
.INDENT 3.5
Postgres snapshot identifier to re\-use, see also \fB\-\-snapshot\fP\&.
.UNINDENT
.UNINDENT
.sp
TMPDIR
.INDENT 0.0
.INDENT 3.5
The pgcopydb command creates all its work files and directories in
\fB${TMPDIR}/pgcopydb\fP, and defaults to \fB/tmp/pgcopydb\fP\&.
.UNINDENT
.UNINDENT
.sp
XDG_DATA_HOME
.INDENT 0.0
.INDENT 3.5
The standard \fI\%XDG Base Directory Specification\fP defines several
environment variables that allow controling where programs should store
their files.
.INDENT 0.0
.INDENT 3.5
\fIXDG_DATA_HOME defines the base directory relative to which user\-specific
data files should be stored. If $XDG_DATA_HOME is either not set or empty,
a default equal to $HOME/.local/share should be used.\fP
.UNINDENT
.UNINDENT
.sp
When using Change Data Capture (through \fB\-\-follow\fP option and Postgres
logical decoding with \fI\%wal2json\fP) then pgcopydb pre\-fetches changes in
JSON files and transform them into SQL files to apply to the target
database.
.sp
These files are stored at the following location, tried in this order:
.INDENT 0.0
.INDENT 3.5
.INDENT 0.0
.IP 1. 3
when \fB\-\-dir\fP is used, then pgcopydb uses the \fBcdc\fP subdirectory
of the \fB\-\-dir\fP location,
.IP 2. 3
when \fBXDG_DATA_HOME\fP is set in the environment, then pgcopydb uses
that location,
.IP 3. 3
when neither of the previous settings have been used then pgcopydb
defaults to using \fB${HOME}/.local/share\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.SH AUTHOR
Dimitri Fontaine
.SH COPYRIGHT
2022, Dimitri Fontaine
.\" Generated by docutils manpage writer.
.