Asked · Modified 10 days ago · Viewed 93 times

I have a very simple table for a task queue and keep taking records out one by one. If I add the filter_selector index after filling/resetting the table, PostgreSQL will use it and run super fast. But if I reset the table by calling UPDATE "public"."testqueue" T SET "status"=0, it will start using the primary key, which is 10 times slower.

How can I make sure PostgreSQL keeps using the filter_selector?

Table:

CREATE TABLE PUBLIC."testqueue" (
    "id" BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
    "guid" "uuid" NOT NULL DEFAULT GEN_RANDOM_UUID (),
    "created" TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
    "status" SMALLINT NOT NULL DEFAULT 0,
    "notbefore" TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
    "task" CHARACTER VARYING(255) NOT NULL,
    "data" "text" NULL,
    "completed" TIMESTAMP WITHOUT TIME ZONE NULL,
    PRIMARY KEY (ID)
);


INSERT INTO "public"."testqueue" ("task") SELECT 'DoSomething' FROM GENERATE_SERIES(1, 1000000);

Filter index:

CREATE INDEX "filter_selector" ON "public"."testqueue" ("id") INCLUDE ("status","notbefore") WHERE "status"=0;

EXPLAIN before updating every status to 0, after ~100,000 runs:

Update on public.testqueue t  (cost=0.98..9.02 rows=1 width=40) (actual time=0.063..0.065 rows=1 loops=1)
  Output: t.id
  Buffers: shared hit=16
  CTE cte
    ->  Limit  (cost=0.42..0.55 rows=1 width=14) (actual time=0.032..0.032 rows=1 loops=1)
          Output: testqueue.id, testqueue.ctid
          Buffers: shared hit=6
          ->  LockRows  (cost=0.42..129018.39 rows=999967 width=14) (actual time=0.031..0.032 rows=1 loops=1)
                Output: testqueue.id, testqueue.ctid
                Buffers: shared hit=6
                ->  Index Scan using filter_selector on public.testqueue  (cost=0.42..119018.72 rows=999967 width=14) (actual time=0.027..0.028 rows=1 loops=1)
                      Output: testqueue.id, testqueue.ctid
                      Filter: ((testqueue.status = 0) AND (testqueue.notbefore <= now()))
                      Buffers: shared hit=5
  ->  Nested Loop  (cost=0.42..8.46 rows=1 width=40) (actual time=0.047..0.047 rows=1 loops=1)
        Output: '1'::smallint, t.ctid, cte.*
        Inner Unique: true
        Buffers: shared hit=10
        ->  CTE Scan on cte  (cost=0.00..0.02 rows=1 width=40) (actual time=0.037..0.037 rows=1 loops=1)
              Output: cte.*, cte.id
              Buffers: shared hit=6
        ->  Index Scan using testqueue_pkey on public.testqueue t  (cost=0.42..8.44 rows=1 width=14) (actual time=0.008..0.008 rows=1 loops=1)
              Output: t.ctid, t.id
              Index Cond: (t.id = cte.id)
              Buffers: shared hit=4
Planning Time: 0.121 ms
Execution Time: 0.088 ms

EXPLAIN after updating every status to 0, after ~100,000 runs:

Update on public.testqueue t  (cost=0.98..9.02 rows=1 width=40) (actual time=25.514..25.517 rows=1 loops=1)
  Output: t.id
  Buffers: shared hit=97078 dirtied=9
  CTE cte
    ->  Limit  (cost=0.42..0.56 rows=1 width=14) (actual time=25.478..25.479 rows=1 loops=1)
          Output: testqueue.id, testqueue.ctid
          Buffers: shared hit=97072 dirtied=9
          ->  LockRows  (cost=0.42..133659.01 rows=999967 width=14) (actual time=25.477..25.478 rows=1 loops=1)
                Output: testqueue.id, testqueue.ctid
                Buffers: shared hit=97072 dirtied=9
                ->  Index Scan using testqueue_pkey on public.testqueue  (cost=0.42..123659.34 rows=999967 width=14) (actual time=25.466..25.467 rows=1 loops=1)
                      Output: testqueue.id, testqueue.ctid
                      Filter: ((testqueue.status = 0) AND (testqueue.notbefore <= now()))
                      Rows Removed by Filter: 100038
                      Buffers: shared hit=97071 dirtied=9
  ->  Nested Loop  (cost=0.42..8.46 rows=1 width=40) (actual time=25.498..25.499 rows=1 loops=1)
        Output: '1'::smallint, t.ctid, cte.*
        Inner Unique: true
        Buffers: shared hit=97076 dirtied=9
        ->  CTE Scan on cte  (cost=0.00..0.02 rows=1 width=40) (actual time=25.487..25.488 rows=1 loops=1)
              Output: cte.*, cte.id
              Buffers: shared hit=97072 dirtied=9
        ->  Index Scan using testqueue_pkey on public.testqueue t  (cost=0.42..8.44 rows=1 width=14) (actual time=0.007..0.007 rows=1 loops=1)
              Output: t.ctid, t.id
              Index Cond: (t.id = cte.id)
              Buffers: shared hit=4
Planning:
  Buffers: shared hit=6 dirtied=1
Planning Time: 0.147 ms
Execution Time: 25.547 ms

The dequeue statement:

  WITH CTE AS (
    SELECT "id" 
    FROM "public"."testqueue"
    WHERE "status"=0 
    AND "notbefore"<=NOW()
    ORDER BY "id"
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )

  UPDATE "public"."testqueue" T 
  SET "status"=1
  FROM CTE
  WHERE CTE."id"=T."id"
  RETURNING T."id";

Update based on the comments:

I have included the entire EXPLAIN with the verbose, buffers and settings options checked. I have changed the guid back to id; that was an attempt to nudge PostgreSQL into not using the pkey, but it didn't work. The testqueue6 was a leftover name I forgot to remove, because I have been running multiple tests.

Changing SELECT * to SELECT "id" in the CTE doesn't change anything.

The whole thing is a follow-up to the same question on SQL Server, which included a tip about the filtered index. The index does work very well and does not seem to be the problem on its own. It's just that PostgreSQL doesn't use it all the time.

When using the filtered index it can handle roughly 2,500 tasks per second, no matter how many have been handled. With the primary key it slows down over time, starting at 800 per second and dropping to about 10 per second as more of the queue gets processed.

This is reflected in the EXPLAIN as well. When using the filter_selector it always says Rows Removed by Filter: 0, but when using the primary key it changes to Rows Removed by Filter: 100009 after running it over 100,000 times.

  • Could you please share the complete SELECT statement and the complete result from explain(analyze, verbose, buffers, settings) for this statement? All in plain text, as an update to your question.
    – Frank Heikens, 2025-10-01 19:19
  • To include the status, where every status must be 0, doesn't make sense to me. Your example creates a 39 MB index; without the "status" column, it's 30 MB. When moving "notbefore" out of the INCLUDE and into the index key, the size remains the same, but you now have an indexed column. It boils down to your index strategy.
    – Frank Heikens, 2025-10-01 19:28
  • Sorry, forgot the query. I'm on my phone now and can't share the explain. I can do that first thing tomorrow.
    – Hugo Delsing, 2025-10-01 20:47
  • 1) In Index Scan using testqueue6_pkey on testqueue as testqueue6, where is testqueue6 coming from? Are you using some partitioning scheme? 2) In questions relating to software, always include the software version, the OS and the OS version, and where you got the software from (or where you are running it, for SaaS versions). Add that information as a text update to the question.
    – Adrian Klaver, 2025-10-01 21:13
  • Why SELECT * when you only need the guid? And why is there no index on the guid? That would speed up the UPDATE.
    – Frank Heikens, 2025-10-01 23:21

4 Answers


There are two deficiencies with your index definition:

  • Since notbefore is not an index key, but only in the INCLUDE clause, PostgreSQL doesn't use the index to filter on that column. You can tell, because the condition testqueue.notbefore <= now() occurs in the Filter, not in the Index Cond in the execution plan. Actually, there is no Index Cond in the execution plan, because PostgreSQL reads the whole index.

  • You added status to the INCLUDE clause, which is just useless ballast, since that will be 0 for all indexed rows.

Reading the whole partial index makes sense as long as only a few rows are indexed, so that reading the partial index is cheaper than reading the primary key index. After the UPDATE, all rows have a status of 0, and the primary key index is smaller and therefore looks better to the planner.

You can make PostgreSQL use your index if you define it so that it is actually useful for the query:

CREATE INDEX ON public.testqueue (id, notbefore) WHERE status = 0;
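To confirm which index the planner actually picks after this change, you can re-run the dequeue statement from the question under EXPLAIN (a sketch; the statement is the one shown in the question):

```sql
-- Re-run the dequeue statement under EXPLAIN; the Index Scan node
-- should now name the new partial index instead of testqueue_pkey.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
WITH CTE AS (
    SELECT "id"
    FROM "public"."testqueue"
    WHERE "status" = 0
      AND "notbefore" <= NOW()
    ORDER BY "id"
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE "public"."testqueue" T
SET "status" = 1
FROM CTE
WHERE CTE."id" = T."id"
RETURNING T."id";
```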

8 Comments

This index shows the same speed as my index but has another quirk: it does start using it after resetting all status to 0, but after about 70,000 runs it switches back to the primary key. So basically the same problem: it will not use the best index for the job consistently.
That's fine. By that time, your table is probably so bloated that it makes no difference.
Hate to disagree, but inconsistency is not fine. The table is exactly as it should be, and 70k actions is nothing. Dropping from 2,500 to 50 per second without changing anything makes a lot of difference and is not something I can just ignore.
I cannot tell what exactly you mean by "70000 runs". You didn't show an execution plan from after that either. I don't know if you made sure that the table is vacuumed often enough or if you took care to get HOT updates. So I cannot comment on that. Inconsistency would not be good, but choosing a different plan does not make your results inconsistent. As long as the execution time is good, it doesn't matter which plan is chosen.
It's a task queue, and a process is taking out the next action to take with this query. It will basically run in a loop forever, because actions are added constantly. For testing I added a million, and "70000 runs" was taking the first 70,000 actions from the queue with this query. Nothing happened in between, other than running the loop to take actions. The problem was that the execution time isn't good with the primary key.
Just to make sure it can't fall back to the primary key I have added an extra column to the table called idsort and also set it as an identity. When adding records both id and idsort have the exact same value. Then I used your INDEX on (idsort, notbefore) and changed the CTE to ORDER BY "idsort". I now have 100% use of the index which was my goal. Do you see any downside to using a second identity column? Besides the obvious that I can change the values manually.
The downside is that you a) have an extra unnecessary column and b) need to access another sequence. But if plan stability makes you happy, go for it.
I don't really care which plan it uses, but I do care about execution time. A drop from ~0.09 ms to ~25.5 ms is huge in a loop. Thank you for all the valuable information. Your index makes more sense and you explained it well.
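For reference, the idsort workaround described in this comment thread could be sketched roughly like this. The column and index names are only illustrative, and note that when the identity column is added to an existing table, its generated values are assigned in physical table order, so they need not match id on pre-existing rows:

```sql
-- Sketch of the extra-identity-column workaround (names illustrative).
ALTER TABLE "public"."testqueue"
    ADD COLUMN "idsort" BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY;

-- Only this partial index covers "idsort", so the planner cannot
-- fall back to the primary key when ordering by it.
CREATE INDEX "idsort_selector"
    ON "public"."testqueue" ("idsort", "notbefore")
    WHERE "status" = 0;

-- The dequeue CTE then uses ORDER BY "idsort" instead of ORDER BY "id".
```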

Lower your table and index fillfactor and/or vacuum more often.

In MVCC an update actually deletes the old row version and inserts a new one, so at the default table fillfactor of 100% the database may have to write the new row version far away from the old one. The default index fillfactor is 90%, so there is 10% spare room to accommodate same-page updates, but since your workload keeps updating everything, you are probably quickly running out of that, too. Declustered, bloated objects become harder to work with.

CREATE INDEX ON public.testqueue (id, notbefore)
WITH (fillfactor = 50)
WHERE status = 0;

That might be too aggressive or not aggressive enough; I'm not sure how your other conditions affect the operation (id vs. notbefore correlation, other things you may plan to do), so you need to test it yourself. If you can squeeze in a VACUUM ANALYZE and a REINDEX every N runs, you can keep things more compacted. Note that a fillfactor of 50 roughly doubles the storage taken up by the object, so it takes longer to scan, but it becomes more resistant to bloating: scan times start off slower but remain more stable over time.
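A periodic maintenance pass along those lines could look like this (a sketch; the fillfactor value and the interval are assumptions to be tuned against your workload):

```sql
-- Hypothetical maintenance pass, run every N dequeues.
ALTER TABLE "public"."testqueue" SET (fillfactor = 70);  -- affects newly written pages only
VACUUM (ANALYZE) "public"."testqueue";

-- Rebuild the partial index without blocking writers (PostgreSQL 12+):
REINDEX INDEX CONCURRENTLY "filter_selector";
```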

Another thing to consider is clustering the table on this index and re-clustering every N jobs. For single-row updates it won't speed up lookups or jumps to the heap, but keeping it clustered means pages get consumed and subsequently freed in sequential chunks, correlated with your access pattern.
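If you try clustering, note that PostgreSQL cannot cluster on a partial index, so you would cluster on the primary key instead, which shares the same leading column ("id") here:

```sql
-- CLUSTER takes an ACCESS EXCLUSIVE lock, so schedule it in a quiet window.
CLUSTER "public"."testqueue" USING "testqueue_pkey";
ANALYZE "public"."testqueue";
```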

If you can split the table into partitions, you can keep refreshing individual partitions without affecting your workflow, because your SKIP LOCKED will skip over the rows of a partition that's temporarily under maintenance.



The difference in performance is most likely caused by PostgreSQL’s default cost settings, especially random_page_cost, which is set to 4.0 by default. This value assumes spinning disks and causes the planner to favor smaller indexes like the primary key, even when they result in significantly worse performance.

In your case, PostgreSQL is choosing the primary key index (testqueue_pkey) because it’s smaller, but it’s scanning and filtering over 100,000 rows to find just one — which is much slower than necessary.

Here’s the key insight: Even though your custom index (e.g. filter_selector) is significantly faster in practice, PostgreSQL estimates its cost as only slightly lower than the primary key scan:

  • Partial index plan: (cost=0.42..119018.72 rows=999967)
  • Primary key plan: (cost=0.42..123659.34 rows=999967)

As mentioned before, you need a different index.

CREATE INDEX CONCURRENTLY idx_testqueue_ready
ON testqueue (id, notbefore) -- different column order
WHERE status = 0;

After this, you have to tune the cost parameters. I would start with this:

SET random_page_cost = 1.1;
SET cpu_tuple_cost = 0.003;
SET cpu_index_tuple_cost = 0.0025;

Check the query plan after each change to see how this affects the plan and the costs. This helps the planner prefer indexes that avoid unnecessary filtering (CPU) and I/O. It will choose the better matching index, not just the smaller one.

When you hit the right configuration, make the change in the configuration file (or use ALTER SYSTEM), and don't forget to call pg_reload_conf() afterwards.
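To persist the settings cluster-wide, the ALTER SYSTEM route could look like this (the values are the starting points suggested above; tune them for your hardware):

```sql
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET cpu_tuple_cost = 0.003;
ALTER SYSTEM SET cpu_index_tuple_cost = 0.0025;

SELECT pg_reload_conf();  -- reload the configuration without a restart
SHOW random_page_cost;    -- confirm the new value is active
```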



If it is not a strong requirement that you dequeue in order of "id", and you can instead dequeue on "notbefore", then change to ORDER BY "notbefore" and index on ("notbefore") INCLUDE ("status","id") WHERE "status"=0.
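Sketched out, that alternative could look like this (the index name is illustrative):

```sql
CREATE INDEX "notbefore_selector"
    ON "public"."testqueue" ("notbefore")
    INCLUDE ("status", "id")
    WHERE "status" = 0;

-- Dequeue by due time instead of by id:
WITH CTE AS (
    SELECT "id"
    FROM "public"."testqueue"
    WHERE "status" = 0
      AND "notbefore" <= NOW()
    ORDER BY "notbefore"
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE "public"."testqueue" T
SET "status" = 1
FROM CTE
WHERE CTE."id" = T."id"
RETURNING T."id";
```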

If you need to dequeue in order of "id", then having "notbefore" in the key after "id", or in INCLUDE, changes nothing: it cannot help find a range of "notbefore"<=NOW(), because there is one value per "id".

If there are not a lot of rows with "notbefore"<=NOW(), it is better to index on ("notbefore") INCLUDE ("status","id") WHERE "status"=0 to find the range; it will read them all, sort by "id" and pick one. Your index on ("id") INCLUDE ("status","notbefore") WHERE "status"=0 is the correct one if you want to avoid a sort.

But in both cases, the query planner doesn't use the LIMIT 1 to lower the estimation: you see rows=999967, as if it had to read all rows before finding one. There's no way to force the estimations in PostgreSQL. And because you have two indexes on "id", it has to pick one. When all status=0, since it estimates it will read everything, it picks the primary key because it is smaller, even though the other could be better because it covers notbefore.

Do you need to declare the PRIMARY KEY at all? It's GENERATED ALWAYS AS IDENTITY, so there is no risk of duplicates, and I guess you don't have foreign keys pointing to this queuing table. Then just create your partial index (unique), and there's no risk that the query planner picks the bad index, as there's only one:

CREATE UNIQUE INDEX "filter_selector" ON "public"."testqueue" ("id") INCLUDE ("status","notbefore") WHERE "status"=0;

If you feel uncomfortable without a primary key (which I understand), you can declare it, but make sure that its index never looks cheaper than the other one, by adding INCLUDE columns to it:

  • CREATE TABLE without PRIMARY KEY
  • Create a unique index: CREATE UNIQUE INDEX testqueue_pkey ON testqueue ("id") INCLUDE ("status","notbefore"). Add more columns to INCLUDE if you need to make it more expensive.
  • Use it for the primary key: ALTER TABLE testqueue ADD CONSTRAINT testqueue_pkey PRIMARY KEY USING INDEX testqueue_pkey;
  • Create your partial index: CREATE UNIQUE INDEX "filter_selector" ON "public"."testqueue" ("id") INCLUDE ("notbefore") WHERE "status"=0. I don't think you need status in INCLUDE, as the filter is already served by the WHERE clause.
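Put together, those steps could be sketched as follows (assuming the table is created without the inline PRIMARY KEY clause from the question):

```sql
-- 1. Back the primary key with a deliberately "expensive" covering index.
CREATE UNIQUE INDEX "testqueue_pkey"
    ON "public"."testqueue" ("id") INCLUDE ("status", "notbefore");

ALTER TABLE "public"."testqueue"
    ADD CONSTRAINT "testqueue_pkey" PRIMARY KEY USING INDEX "testqueue_pkey";

-- 2. The partial index the planner should now always prefer for the dequeue.
CREATE UNIQUE INDEX "filter_selector"
    ON "public"."testqueue" ("id") INCLUDE ("notbefore")
    WHERE "status" = 0;
```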

But your trick with an extra column is OK. Not nice, but since PostgreSQL has no optimizer hints, we sometimes need these kinds of workarounds.

