Skip to main content
  1. About
  2. For Teams
Asked
Modified 4 months ago
Viewed 107 times
Part of Google Cloud Collective
0

I need to calculate few metrics for a period of days. The requirement is - For a given day, calculate the metrics based on the past 7 days. I need to output the metrics for every day for the past 2 months.

I am trying to implement a nested loop, where I am trying to calculate all days for the last 2 months, and for each day in that loop, I am trying to calculate the sum of recs loaded for the past 7 days (from that day).

I wrote the following query, but CAN'T get it to work as it is going into an infinite loop.

Can anyone please help me fix the query below ?

WITH -- not working !!!
  RECURSIVE gen_dates AS (
  SELECT
    DATE'2025-05-27' AS days
  UNION ALL
  SELECT
    days - 1
  FROM
    gen_dates
  WHERE
    days >= days - 60),
  iter_days AS (
    SELECT days AS iter 
    FROM gen_dates
    UNION ALL
    SELECT iter - 1
    FROM iter_days
    WHERE iter >= iter - 7
  ),
  get_dates AS (
  SELECT
    a.tgt_tbl,
    DATE(a.job_dtm) AS job_dtm,
    a.num_recs
  FROM
    `my_project.my_dataset.audit_tbl` a
  JOIN
    iter_days b
  ON
    DATE(a.job_dtm) = b.iter
  AND a.tgt_tbl = 'CUSTOMER_SALES'
),
calc_sum AS (
  SELECT
    tgt_tbl,
    job_dtm,
    num_recs,
    SUM(num_recs) OVER (PARTITION BY tgt_tbl ORDER BY DATE(job_dtm) DESC ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS sum_cnt
  FROM
    get_dates
)
SELECT 
    tgt_tbl,
    job_dtm,
    FORMAT_DATE('%a', job_dtm) AS dayofweek,
    num_recs,
    sum_cnt,
    ROUND(sum_cnt/6) AS avg_cnt,
    ROUND(((ROUND(sum_cnt/6) - num_recs) / ROUND(sum_cnt/6)) * 100, 2) AS pct
FROM calc_sum
ORDER BY 2 DESC;

Thanks.

4
  • I did not look at all the details of your query, but wouldn't this be a lot easier if you joined job_ctrl_tbl to itself on the ids and b.job_run_dtm Between Date_Add(a.job_run_dtm,Interval -6 DAY) And a.job_run_dtm Then you can just sum up the table b values
    Chris Maurer
    –  Chris Maurer
    2025-05-28 15:16:31 +00:00
    Commented May 28 at 15:16
  • You can use format_date(%a, <date) to get the short name for a day, no need to hard code it like you have.
    Andrew
    –  Andrew
    2025-05-28 15:31:18 +00:00
    Commented May 28 at 15:31
  • @Chris Maurer - thanks for your reply.. i will definitely try that out.. Meanwhile, do u have any suggestions on how to implement the same logic (going back 6 days) for every day in the past 2 months ? I want my output to have all 60 days with the metrics calculated along with them.
    marie20
    –  marie20
    2025-05-28 15:32:42 +00:00
    Commented May 28 at 15:32
  • This question is similar to: Recursive Query to generate dates not working. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. (although this question could be considered the original one to keep, the other one more concisely focuses on the infinite loop problem).
    Guillaume Outters
    –  Guillaume Outters
    2025-05-28 21:59:50 +00:00
    Commented May 28 at 21:59

2 Answers 2

1
What you meant and what ran effectively

When you wrote WHERE days >= days - 60 (and later WHERE iter >= iter - 7), you meant:
WHERE previous_iteration.days >= initial_iteration.days - 60

However, in recursive CTEs, the recursive part can never access the initial iteration, only the results of the previous iteration, or external tables or CTEs.

Thus in reality what you wrote ran as:
WHERE previous_iteration.days >= previous_iteration.days - 60
in essence, the changing column of the recursive CTE was compared to… itself! (and thus this condition is always true)

What can be done

If you want minimal changes to your query you'll just have your recursive part stop condition compare an iteratively changing column, to a non-changing value.

For this value that should not change from one iteration to the next one,
but that you cannot get from the init part of the recursive CTE, you can:

  • get it from elsewhere:
    • directly code it in the stop condition:
      WHERE days >= DATE'2025-03-27'
    • or get it by SQL from anything that is not the recursive CTE itself:
      WHERE days >= (SELECT MIN(DATE(job_dtm)) FROM `my_project.my_dataset.audit_tbl`)
  • or get it from the recursive part of the CTE, by computing it in the init part of the CTE, but then propagating it to the recursive part thanks to a new column:
gen_dates AS
(
  SELECT
    DATE'2025-05-27' AS days, -- The column that will change
    DATE'2025-03-27' AS first_day -- The non-changing column
  UNION ALL
  SELECT
    days - 1, -- Changes
    first_day -- Propagated unchanged to the next iteration
  FROM
    gen_dates
  WHERE
    days >= first_day -- Now we compare something that changes on each iteration, to something that has not changed since the init part of the CTE.
)

Here I used an hardcoded date as the static column, but you can use a (changing) counter instead, in addition to the (changing) days, as long as at the end you compare it to a static value:

gen_dates AS
(
  SELECT
    DATE'2025-05-27' AS days, -- The column that will change
    60 AS remaining_iterations -- This one will change too
  UNION ALL
  SELECT
    days - 1,
    remaining_iterations - 1
  FROM
    gen_dates
  WHERE
    remaining_iterations >= 0 -- Here remaining_iterations is the changing part, while 0 is the stable "stop" value.
)

To demonstrate two of those methods, I've adapted your query to a PostgreSQL fiddle, where you will see:

  • days using the simple harcoded date technique
  • iter using the sidekick counter technique (but this time I used an incrementing counter instead of a decrementing one)
  • … and that there is no more infinite loop. Enjoy!
    (note that I focused on the infinite loop, without trying to understand the functional meaning of your query, so you'll perhaps need to adjust some bits. But at least you're not stuck anymore)
Sign up to request clarification or add additional context in comments.

Comments

0

Here is one strategy you can follow:

First define a calendar table containing the 60 days you need for example in PostreSQL you can do the following:

SELECT * FROM (SELECT CURRENT_DATE::DATE - SEQUENCE.DAY AS datum
      FROM GENERATE_SERIES(0, 60) AS SEQUENCE (DAY)
      GROUP BY SEQUENCE.DAY) DATE_CALENDAR
ORDER BY 1;

You can now CROSS JOIN this DATE_CALENDAR table with your calculation table (or try Chris Maurer method), then group by DATE_CALENDAR.datum and reproduce your calculation based on DATE_CALENDAR.datum (replacing CURRENT_DATE) that represents any day on the past 60 days.

Warning: This will not show good performance but if your tables are small it can be an easy solution.

Comments

Your Answer

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Morty Proxy This is a proxified and sanitized view of the page, visit original site.