Since AWS Redshift doesn't support native table partitioning, I implemented what in some places is called "manual partitioning" (eg.: here); in short:
Instead of creating a big time-series table ifdata_detail
, i created weekly partitions ifdata_detail_2022_01_31
,ifdata_detail_2022_02_07
,...,ifdata_detail_2025_04_21
, and then I create an ifdata_detail
view that is the union of them all:
CREATE OR REPLACE VIEW prod.ifdata_detail AS
SELECT *, '2022-01-31'::DATE AS week_partition FROM prod.ifdata_detail_2022_01_31
UNION ALL
SELECT *, '2022-02-07'::DATE AS week_partition FROM prod.ifdata_detail_2022_02_07
UNION ALL
...
UNION ALL
SELECT *, '2025-04-21'::DATE AS week_partition FROM prod.ifdata_detail_2025_04_21
This way I thought that the Redshift query planner would skip most of the weekly partition in a query like:
select max(day)
from ifdata_detail
where week_partition = date_trunc('week', current_date)
but instead the query is taking ages (while select max(day) from ifdata_detail_2025_04_21
is a 2s query) and when throwing an EXPLAIN
I see a huge cost in each subquery scan:
XN Aggregate (cost=784674611.96..784674611.96 rows=1 width=4)
-> XN Subquery Scan ifdata_detail (cost=3562830.80..784674611.54 rows=170 width=4)
-> XN Append (cost=3562830.80..784674609.84 rows=170 width=4)
-> XN Subquery Scan "*SELECT* 1" (cost=3562830.80..3562830.81 rows=1 width=4)
-> XN Aggregate (cost=3562830.80..3562830.80 rows=1 width=4)
-> XN Result (cost=0.00..2850264.64 rows=285026464 width=4)
One-Time Filter: false
-> XN Seq Scan on ifdata_detail_2022_01_31 (cost=0.00..2850264.64 rows=285026464 width=4)
-> XN Subquery Scan "*SELECT* 2" (cost=4094825.60..4094825.61 rows=1 width=4)
-> XN Aggregate (cost=4094825.60..4094825.60 rows=1 width=4)
-> XN Result (cost=0.00..3275860.48 rows=327586048 width=4)
One-Time Filter: false
-> XN Seq Scan on ifdata_detail_2022_02_07 (cost=0.00..3275860.48 rows=327586048 width=4)
...
-> XN Subquery Scan "*SELECT* 169" (cost=4405877.06..4405877.07 rows=1 width=4)
-> XN Aggregate (cost=4405877.06..4405877.06 rows=1 width=4)
-> XN Seq Scan on ifdata_detail_2025_04_21 (cost=0.00..3524701.65 rows=352470165 width=4)
So clearly Redhisft isn't pruning the useless partitions, and I'm wondering:
max(day)
?PS: I know I could avoid a lot of pain by materializing the view into a table and exploiting a SORTKEY (eg. the "day" field) to get efficient queries, but before switching to that approach I would like to understand whether this could work.