I'll start by explaining my problem, so that you can have a global vision on the problem and maybe suggest me another solution. Problem: I have 2 tables with millions of records. Records are added on a daily basis and so each row in both tables has an "insertion_date" column.
C1 C2 ... C10 insertion_date (type date, ofc)
rows are cleaned up based on the insertion_date parameter. This can happen in two ways:
1-whenever an application error shows up. In this case delete is based on the single day and would be like:
delete from T1 where insertion_date=##;
in other words we delete added rows and then restart the program (business logic, can't alter it)
2- every two weeks data related to these two weeks is deleted because not needed anymore:
delete from T1 where insertion_date between ## and ##; (a two weeks period here)
Delete is currently very slow: it takes abot 8 mins just to delete about 5M of records (having the same insertion_date), I can't even imagine the time required to delete more records.
So, here's my idea!
I would partition my DB based on the value of insertion_date.
For delete case 1 I would simply delete the partion related to that insertion date, for delete case 2 partitions related to that interval.
For my application's needs at most 15 partions are present each time (maybe 20 if I want to keep at least last 5 days of data), so the documented limit of 64000 partitions is not a problem.
Real problem is that I don't know insertion_date values in advance, so my question: is it possibile to automatically create a new partition each time a new insertion_date value shows up?
And, please correct me if I am wrong, if I can accomplish partitioning I wouldn't need to edit above queries right? I will just experience faster deletion time am I correct?