• Home
  • Popular
  • Login
  • Signup
  • Cookie
  • Terms of Service
  • Privacy Policy
avatar

Posted by User Bot


27 Apr, 2025

Updated at 20 May, 2025

weird query plan for "manually partitioned" table in redshift

Since AWS Redshift doesn't support native table partitioning, I implemented what in some places is called "manual partitioning" (eg.: here); in short:

Instead of creating a big time-series table ifdata_detail, i created weekly partitions ifdata_detail_2022_01_31,ifdata_detail_2022_02_07,...,ifdata_detail_2025_04_21, and then I create an ifdata_detail view that is the union of them all:

CREATE OR REPLACE VIEW prod.ifdata_detail AS 

SELECT *, '2022-01-31'::DATE AS week_partition FROM prod.ifdata_detail_2022_01_31
UNION ALL 
SELECT *, '2022-02-07'::DATE AS week_partition FROM prod.ifdata_detail_2022_02_07 
UNION ALL 
...
UNION ALL 
SELECT *, '2025-04-21'::DATE AS week_partition FROM prod.ifdata_detail_2025_04_21

This way I thought that the Redshift query planner would skip most of the weekly partition in a query like:

select max(day)
from ifdata_detail
where week_partition = date_trunc('week', current_date)

but instead the query is taking ages (while select max(day) from ifdata_detail_2025_04_21 is a 2s query) and when throwing an EXPLAIN I see a huge cost in each subquery scan:

XN Aggregate  (cost=784674611.96..784674611.96 rows=1 width=4)
  ->  XN Subquery Scan ifdata_detail  (cost=3562830.80..784674611.54 rows=170 width=4)
        ->  XN Append  (cost=3562830.80..784674609.84 rows=170 width=4)
              ->  XN Subquery Scan "*SELECT* 1"  (cost=3562830.80..3562830.81 rows=1 width=4)
                    ->  XN Aggregate  (cost=3562830.80..3562830.80 rows=1 width=4)
                          ->  XN Result  (cost=0.00..2850264.64 rows=285026464 width=4)
                                One-Time Filter: false
                                ->  XN Seq Scan on ifdata_detail_2022_01_31  (cost=0.00..2850264.64 rows=285026464 width=4)
              ->  XN Subquery Scan "*SELECT* 2"  (cost=4094825.60..4094825.61 rows=1 width=4)
                    ->  XN Aggregate  (cost=4094825.60..4094825.60 rows=1 width=4)
                          ->  XN Result  (cost=0.00..3275860.48 rows=327586048 width=4)
                                One-Time Filter: false
                                ->  XN Seq Scan on ifdata_detail_2022_02_07  (cost=0.00..3275860.48 rows=327586048 width=4)
...
              ->  XN Subquery Scan "*SELECT* 169"  (cost=4405877.06..4405877.07 rows=1 width=4)
                    ->  XN Aggregate  (cost=4405877.06..4405877.06 rows=1 width=4)
                          ->  XN Seq Scan on ifdata_detail_2025_04_21  (cost=0.00..3524701.65 rows=352470165 width=4)

So clearly Redhisft isn't pruning the useless partitions, and I'm wondering:

  1. why is Redshift scanning all the partitions?
  2. is there a way of forcing the behaviour I want, i.e. scan only the one partition containing the max(day)?

PS: I know I could avoid a lot of pain by materializing the view into a table and exploiting a SORTKEY (eg. the "day" field) to get efficient queries, but before switching to that approach I would like to understand whether this could work.