Add checkout_failure_limit config/feature #911

drdrsh · Feb 27, 2025

In a high-availability deployment of PgCat, a client may land on a PgCat container that is already very busy with other clients. The new client can then get stuck in a perpetual checkout-failure loop, because all server connections are in use by other clients. This is especially true for session-mode pools with long-lived client connections (e.g. FDW connections).

One way to fix this issue is to close a client connection after it encounters a configurable number of checkout failures. This forces the client to go through the network load balancer again and possibly land on a different process/container, where it tries to check out a connection. If that fails too, the client is disconnected again and tries another one.
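A minimal Rust sketch of the per-client bookkeeping this implies is shown here. The names `CheckoutFailureTracker`, `Action`, `on_checkout_failure` and `on_checkout_success` are hypothetical and not PgCat's actual code. It assumes that a limit of `None` disables the feature and, also as an assumption, that a successful checkout resets the counter (the description does not say whether the failures must be consecutive).

```rust
/// What the pooler should do after a failed checkout (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Send the checkout error to the client and keep the connection open.
    ReportError,
    /// The limit was reached: close the client connection so it reconnects
    /// through the load balancer and may land on another container.
    Disconnect,
}

/// Hypothetical per-client counter of checkout failures.
#[derive(Debug)]
pub struct CheckoutFailureTracker {
    /// `None` disables the feature (assumption).
    limit: Option<u64>,
    failures: u64,
}

impl CheckoutFailureTracker {
    pub fn new(limit: Option<u64>) -> Self {
        Self { limit, failures: 0 }
    }

    /// Record one failed checkout and decide whether to drop the client.
    pub fn on_checkout_failure(&mut self) -> Action {
        self.failures += 1;
        match self.limit {
            Some(limit) if self.failures >= limit => Action::Disconnect,
            _ => Action::ReportError,
        }
    }

    /// Assumption: a successful checkout resets the count, so only
    /// consecutive failures trigger a disconnect.
    pub fn on_checkout_success(&mut self) {
        self.failures = 0;
    }
}
```

Returning an `Action` instead of closing the socket directly keeps the policy separate from the networking code, which makes it easy to unit-test.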

This mechanism is guaranteed to eventually converge to a balanced state in which every client holds a connection, provided that the total number of connections across all containers is at least the number of clients.
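The convergence argument can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of PgCat internals: the load balancer routes each (re)connecting client to a uniformly random container, each container has a fixed number of pool slots, and in session mode a connected client never gives its connection back. `simulate` and its parameters are hypothetical, and a small xorshift PRNG keeps the example free of external crates.

```rust
/// Minimal xorshift64 PRNG (Marsaglia's 13/7/17 triple) so the sketch needs no crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in 0..n (modulo bias is irrelevant for this toy model).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical toy model: returns how many clients are still waiting for a
/// connection after `rounds` checkout attempts. `limit == None` models the
/// behavior without the feature (never disconnect on checkout failure).
pub fn simulate(
    containers: usize,
    pool_size: usize,
    clients: usize,
    limit: Option<u32>,
    rounds: usize,
    seed: u64,
) -> usize {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut free = vec![pool_size; containers];
    // Some((container, consecutive_failures)) while waiting; None once connected.
    let mut waiting: Vec<Option<(usize, u32)>> = (0..clients)
        .map(|_| Some((rng.below(containers), 0)))
        .collect();

    for _ in 0..rounds {
        for slot in waiting.iter_mut() {
            if let Some((c, failures)) = *slot {
                if free[c] > 0 {
                    // Session mode: the client keeps this connection for good.
                    free[c] -= 1;
                    *slot = None;
                } else if limit.is_some_and(|l| failures + 1 >= l) {
                    // Limit hit: disconnect and go back through the load balancer.
                    *slot = Some((rng.below(containers), 0));
                } else {
                    *slot = Some((c, failures + 1));
                }
            }
        }
    }
    waiting.iter().filter(|s| s.is_some()).count()
}
```

With `limit = None`, a client that fails its first checkout stays on a full container forever, so the count after one round equals the count after any number of rounds. With a limit and total capacity equal to the number of clients, the waiting count drops to zero with overwhelming probability, because every reroute has a fixed chance of hitting a container with a free slot.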

I was able to reproduce this issue in a controlled environment and confirm that this PR fixes it.

Screen captures

Perpetual state of checkout failures without the fix

Self-healing

Notice that we initially saw checkout failures, but after a few disconnections we landed in a good state.

drdrsh added 3 commits February 27, 2025 07:17

Add checkout_failure_limit config/feature

147eba5

clippy

00ac444

fmt

e299a2e

drdrsh marked this pull request as ready for review February 27, 2025 13:46

drdrsh merged commit 3349cec into main Feb 27, 2025
1 check passed
