Hi all,
in ETL process I need to check if some cols have unique values or not using sql or plsql.
Suppose we need to load a big file data with external table initially and then test if values
on one or more columns have unique values in order to proceed with ETL process.
What is the faster test I can execute to verify that a column have unique values or not?
It's better for the ETL performance, use:
- a. techniques regard constraints like described on Ask tom forum
"ENABLE NOVALIDATE validating existing data"
(Ask Tom "ENABLE NOVALIDATE validating existing da...")
- b. "simply" query on the data?
like this:
select count(count(*)) distinct_count,
sum(count(*)) total_count,
sum(case when count(*) = 1 then 1 else null end) non_distinct_groups,
sum(case when count(*) > 1 then 1 else null end) distinct_groups
from hr.employees a
group by A.JOB_ID
- c. use analytics function?
- d. use some feature directly on external table?
Bye in advance