Avoid duplicates processing
235699Aug 26 2003 — edited Aug 28 2003Hi All
I have a requirement in which I get a flat from source system everyday
I load this flat file into a staging table. This flat file can contain upto 6 million records
Next day when the flat file comes it will come with changed and unchanged records.
ie even if a record has not changed it will again come up in the file, the next day.
From the staging table, this data is processed and goes to other tables.
Since 75% of data will be duplicate, I do not want to process all the data everyday.
How do I do this? Getting only the unchanged records from source system is ruled out and also
I cant do a "diff" command on Unix between new and old files to load only changed records
I am thinking of 2 ways to do it
1) Create a duplicate set of table similar to staging table and load the previous days data in
this. While selecting the data use the MINUS operator on the table and do the processing
2) Retain the previous days data in the same table and while processing SELECT DISTINCT
from the table .
Which is the best way to do this, or any other alternate solutions to this
Performance is a major factor here and am using Oracle 8i.
Thanks
Ashwin N.