Commit 6ba581c

Fix more race conditions in the newly-added pg_rewind test.
pg_rewind looks at the control file to check what timeline a server is on. But promotion doesn't immediately write a checkpoint, it merely writes an end-of-recovery WAL record. If pg_rewind runs immediately after promotion, before the checkpoint has completed, it will think that the server is still on the earlier timeline. We ran into this issue a long time ago already, see commit 484a848.

It's a bit bogus that pg_rewind doesn't determine the timeline correctly until the end-of-recovery checkpoint has completed. We probably should fix that. But for now, work around it by waiting for the checkpoint to complete before running pg_rewind, like we did in commit 484a848.

In passing, tidy up the new test a little bit. Reorder the INSERTs so that the comments make more sense, remove a spurious CHECKPOINT call after pg_rewind has already run, and add the --debug option, so that if this fails again, we'll have more data.

Per buildfarm failure at https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=rorqual&dt=2020-12-06%2018%3A32%3A19&stg=pg_rewind-check.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/1713707e-e318-761c-d287-5b6a4aa807e8@iki.fi
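To make the race easier to see, here is a minimal Perl sketch of reading the timeline that the control file reports. It assumes the text comes from pg_controldata and that the relevant field is "Latest checkpoint's TimeLineID"; the helper name checkpoint_timeline and the abbreviated sample output are hypothetical and not part of this commit.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the timeline recorded by the latest checkpoint
# from text in the format printed by pg_controldata. Returns undef if the
# field is not present.
sub checkpoint_timeline
{
	my ($controldata_output) = @_;
	if ($controldata_output =~ /^Latest checkpoint's TimeLineID:\s*(\d+)\s*$/m)
	{
		return $1 + 0;
	}
	return undef;
}

# Illustrative, abbreviated sample of pg_controldata output (not a real log).
my $sample = <<'EOT';
Database cluster state:               in production
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
EOT

my $tli = checkpoint_timeline($sample);
print "control file timeline: $tli\n";
```

In the scenario the commit message describes, a server that has just been promoted could still report the earlier timeline here until a checkpoint completes, which is why the test forces one.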
1 parent 0473296 commit 6ba581c
1 file changed: +15 -7 lines changed
src/bin/pg_rewind/t/008_min_recovery_point.pl
@@ -75,6 +75,13 @@
 #
 $node_1->stop('fast');
 $node_3->promote;
+# Force a checkpoint after the promotion. pg_rewind looks at the control
+# file to determine what timeline the server is on, and that isn't updated
+# immediately at promotion, but only at the next checkpoint. When running
+# pg_rewind in remote mode, it's possible that we complete the test steps
+# after promotion so quickly that when pg_rewind runs, the standby has not
+# performed a checkpoint after promotion yet.
+$node_3->safe_psql('postgres', "checkpoint");
 
 # reconfigure node_1 as a standby following node_3
 my $node_3_connstr = $node_3->connstr;
@@ -99,13 +106,18 @@
 $node_3->wait_for_catchup('node_1', 'replay', $lsn);
 
 $node_1->promote;
+# Force a checkpoint after promotion, like earlier.
+$node_1->safe_psql('postgres', "checkpoint");
 
 #
 # We now have a split-brain with two primaries. Insert a row on both to
 # demonstratively create a split brain. After the rewind, we should only
 # see the insert on 1, as the insert on node 3 is rewound away.
 #
 $node_1->safe_psql('postgres', "INSERT INTO public.foo (t) VALUES ('keep this')");
+# 'bar' is unmodified in node 1, so it won't be overwritten by replaying the
+# WAL from node 1.
+$node_3->safe_psql('postgres', "INSERT INTO public.bar (t) VALUES ('rewind this')");
 
 # Insert more rows in node 1, to bump up the XID counter. Otherwise, if
 # rewind doesn't correctly rewind the changes made on the other node,
@@ -114,10 +126,6 @@
 $node_1->safe_psql('postgres', "INSERT INTO public.foo (t) VALUES ('and this')");
 $node_1->safe_psql('postgres', "INSERT INTO public.foo (t) VALUES ('and this too')");
 
-# Also insert a row in 'bar' on node 3. It is unmodified in node 1, so it won't get
-# overwritten by replaying the WAL from node 1.
-$node_3->safe_psql('postgres', "INSERT INTO public.bar (t) VALUES ('rewind this')");
-
 # Wait for node 2 to catch up
 $node_2->poll_query_until('postgres',
 q|SELECT COUNT(*) > 1 FROM public.bar|, 't');
@@ -139,9 +147,10 @@
 [
 'pg_rewind',
 "--source-server=$node_1_connstr",
-"--target-pgdata=$node_2_pgdata"
+"--target-pgdata=$node_2_pgdata",
+"--debug"
 ],
-'pg_rewind detects rewind needed');
+'run pg_rewind');
 
 # Now move back postgresql.conf with old settings
 move(
@@ -153,7 +162,6 @@
 # Check contents of the test tables after rewind. The rows inserted in node 3
 # before rewind should've been overwritten with the data from node 1.
 my $result;
-$result = $node_2->safe_psql('postgres', 'checkpoint');
 $result = $node_2->safe_psql('postgres', 'SELECT * FROM public.foo');
 is($result, qq(keep this
 and this
