I have data arriving from an S3 bucket as Parquet files every hour, which I copy into Azure Blob Storage with Azure Data Factory (ADF). The sink path is partitioned like this:

@concat('signal/year=',formatDateTime(addHours(utcNow(),-1),'yyyy'),'/month=',formatDateTime(addHours(utcNow(),-1),'MM'),'/day=',formatDateTime(addHours(utcNow(),-1),'dd'),'/hour=',formatDateTime(addHours(utcNow(),-1),'HH'))

The problem is that each hourly load is mostly identical to the previous one, and the same data is also ingested into Azure Data Explorer as well as Blob Storage, so a lot of redundant data is piling up. What I want is a strategy where we only store (overwrite) the data when the incoming Parquet files actually contain changes.
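For reference, the same previous-hour partition path can be reproduced outside ADF, which is useful if a function or script needs to locate the blob the pipeline just wrote. This is only an illustrative sketch in Python; the signal/ prefix comes from the expression above, everything else is assumed.

```python
from datetime import datetime, timedelta, timezone

# Previous UTC hour, mirroring addHours(utcNow(), -1) in the ADF expression.
prev_hour = datetime.now(timezone.utc) - timedelta(hours=1)

# Partition path matching the pipeline's sink layout:
# signal/year=yyyy/month=MM/day=dd/hour=HH
blob_path = (
    f"signal/year={prev_hour:%Y}/month={prev_hour:%m}"
    f"/day={prev_hour:%d}/hour={prev_hour:%H}"
)
print(blob_path)
```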
I have tried an Azure Function that compares the previous hour's data coming from S3 with what is already in Blob Storage, but that turned out to be too complicated and I couldn't get it working with ADF. I also have a solution using a staging table, but then I would just be storing the same volume of data in the staging table, so that doesn't help either. I want an automated solution.
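One variant of the Azure Function comparison that avoids re-reading the previous hour's full Parquet file is to keep only a content hash of the last stored file and compare against it before writing. Below is a minimal Python sketch of that idea; the container name, the signal/latest.sha256 marker blob, and the BLOB_CONNECTION_STRING setting are assumptions for illustration, not part of the existing pipeline.

```python
import hashlib
import os

from azure.storage.blob import BlobServiceClient

CONTAINER = "signal-data"              # assumed container name
HASH_MARKER = "signal/latest.sha256"   # assumed marker blob holding the last stored hash


def store_if_changed(parquet_bytes: bytes, target_path: str) -> bool:
    """Upload the new Parquet file only if its content differs from the last stored one.

    Returns True when the blob was (over)written, False when the data was unchanged.
    """
    service = BlobServiceClient.from_connection_string(
        os.environ["BLOB_CONNECTION_STRING"]
    )

    new_hash = hashlib.sha256(parquet_bytes).hexdigest()

    # Read the hash of the previously stored file, if one exists.
    marker = service.get_blob_client(container=CONTAINER, blob=HASH_MARKER)
    old_hash = marker.download_blob().readall().decode() if marker.exists() else None

    if new_hash == old_hash:
        return False  # same content as last hour: skip the write

    # Content changed: overwrite the data blob and update the hash marker.
    data_blob = service.get_blob_client(container=CONTAINER, blob=target_path)
    data_blob.upload_blob(parquet_bytes, overwrite=True)
    marker.upload_blob(new_hash, overwrite=True)
    return True
```

One caveat with a byte-level hash: Parquet writers can embed timestamps or other varying metadata, so identical rows may still hash differently; hashing the decoded rows (for example via pyarrow or pandas) is more robust but heavier.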
Please suggest the best tool and strategy for this.