Posted by User Bot on 26 Mar, 2025
Updated 20 May, 2025

Parquet data comparison and overwriting where the source is S3 and the destinations are ADX and a Blob container

I have data arriving from an S3 bucket in Parquet files every hour, which I land in Blob Storage using ADF. The partition path is built with this expression:

@concat('signal/year=',formatDateTime(addHours(utcNow(),-1),'yyyy'),'/month=',formatDateTime(addHours(utcNow(),-1),'MM'),'/day=',formatDateTime(addHours(utcNow(),-1),'dd'),'/hour=',formatDateTime(addHours(utcNow(),-1),'HH'))

This writes a new partition every hour, but most of each hour's data is the same as the last, and since we store it in both Azure Data Explorer and Blob, a lot of redundant information accumulates. What I want is a strategy that stores the data only when something in the incoming Parquet files has actually changed, overwriting the existing copy instead of adding a duplicate.
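For clarity, here is the same previous-hour path logic as a short Python sketch. Only the signal/ prefix and the year/month/day/hour layout come from the expression above; the rest is standard library:

```python
from datetime import datetime, timedelta, timezone

# Previous hour in UTC, mirroring addHours(utcNow(), -1) in the ADF expression.
prev_hour = datetime.now(timezone.utc) - timedelta(hours=1)

# Same layout the @concat() expression produces:
# signal/year=yyyy/month=MM/day=dd/hour=HH
partition_path = (
    f"signal/year={prev_hour:%Y}/month={prev_hour:%m}"
    f"/day={prev_hour:%d}/hour={prev_hour:%H}"
)
print(partition_path)  # e.g. signal/year=2025/month=03/day=26/hour=09
```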

I have tried an Azure Function that compares the last hour's data coming from S3 against what is already in Blob, but that got too complicated and didn't work well with ADF (roughly what it did is sketched below). I also have a solution using a staging table, but then I'm just storing a lot of data in the staging table instead, so that's no use either. I want an automated solution.
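For reference, the comparison the function attempted was essentially this: hash the new hour's Parquet bytes and skip the upload when they match the previous hour's blob. This is only a minimal sketch assuming the azure-storage-blob SDK; the connection string, container name, and blob paths are placeholders:

```python
import hashlib

from azure.storage.blob import BlobServiceClient

# Placeholders; the real values come from the storage account and the pipeline.
CONN_STR = "<storage-connection-string>"
CONTAINER = "<landing-container>"


def sha256_of(data: bytes) -> str:
    # Fingerprint the Parquet payload by content, not by timestamp.
    return hashlib.sha256(data).hexdigest()


def upload_if_changed(new_data: bytes, prev_blob_path: str, new_blob_path: str) -> bool:
    """Upload the new hour's file only if its bytes differ from the previous hour's."""
    service = BlobServiceClient.from_connection_string(CONN_STR)
    container = service.get_container_client(CONTAINER)

    prev_blob = container.get_blob_client(prev_blob_path)
    if prev_blob.exists():
        prev_bytes = prev_blob.download_blob().readall()
        if sha256_of(prev_bytes) == sha256_of(new_data):
            return False  # identical content: skip the redundant write

    container.get_blob_client(new_blob_path).upload_blob(new_data, overwrite=True)
    return True
```

Downloading the previous hour's file on every run just to hash it is part of what made this heavy; storing the hash in the blob's metadata would avoid the download, but wiring the function call into ADF is where it fell apart for me.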

Please suggest the best tool and strategy.