SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

HOWTO: Compare two tables

John SpencerJun 20 2002

TO FIND RECORDS IN TABLE1 WHICH DO NOT EXIST IN TABLE2

SELECT column_list
FROM table1
WHERE column_list NOT IN (SELECT column_list
                          FROM table2)

SELECT column_list
FROM table1
MINUS
SELECT column_list
FROM table2; 

SELECT column_list
FROM table1
WHERE NOT EXISTS (SELECT 1
                  FROM table2
                  WHERE table1.column1 = table2.column1 and
                        table1.column2 = table2.column2 and
                        ...);

SELECT table1.column_list 
FROM table1, table2
WHERE table1.column1 = table2.column1(+) and
      table1.column2 = table2.column2(+) and
      ...  and
      (table2.column1 IS NULL or
       table2.column2 IS NULL or
       ...)
-- Use and instead of or if you are sure that there is no case where
-- the two column1's will match, but the two column2's will not.

TO FIND RECORDS IN TABLE1 WHICH EXIST IN TABLE2.

SELECT column_list
FROM table1
WHERE column_list IN (SELECT column_list
                      FROM table2)

SELECT column_list
FROM table1
INTERSECT
SELECT column_list
FROM table2; 

SELECT column_list
FROM table1
WHERE EXISTS (SELECT 1
              FROM table2
              WHERE table1.column1 = table2.column1 and
                    table1.column2 = table2.column2 and
                    ...);

SELECT table1.column_list 
FROM table1, table2
WHERE table1.column1 = table2.column1 and
      table1.column2 = table2.column2 and
      ...

Which is better, depends on a number of factors. Some of the most important factors are:

1. Absloute sizes of the two tables
2. The relative sizes of the two tables
3. Cost Based Optimizer versus Rule Based Optimizer
4. Version of Oracle
5. Number and type columns that are being compared.

Number 5 may need a little clarification. If the question is find all Primary Key's in table1 which exist or do not exist in table2, then using EXISTS or a JOIN is probably best. However, if the question is find entire rows in table1 that do or do not have a corresponding entire row in table2 then INTERSECT or MINUS are likely to be best.

In all of the queries above, column_list represents the columns that you want to find the matches or mismatches for. This can be anything from a single column to * (all columns). column1, column2 ... represent each of the columns in your column list.

If we assume that there are no other selection criteria used against either table (which might affect the choice of indexes), then all methods will require at least one full table scan (at best a fast full index scan), and sometimes two. So, the chief factor affecting performance is the method Oracle uses to get the rows in table1 that do or do not exist in table2. On a conceptual basis, this is how I understand the various approaches operate. This is almost certainly not the exact algorithm that Oracle uses.

IN / NOT IN will do a full scan of table2 to get a list of values, then for each row of table1 scan this LOV to see if it finds a match (IN) or does not (NOT IN). If table2 is absolutely small (e.g. state_code table with 50 records) then this may well be the fastest approach since scanning the LOV should be fast in memory.

INTERSECT/ MINUS does a full scan of both tables, sorts each, then scans the two lists together to identify matches or mismatches. If the two tables are relatively similar in size, then this is likely to be one of the faster methods. The optimizer may choose to use index scans if the columns being compared are all indexed. If you are trying to compare all columns, or a large (unindexed) subset, then this is likely to be the fastest method. It may also be faster if you expect a relatively large number of matched or mis-matched records.

In a NOT EXISTS (or EXISTS), for each row in table 1, the correlated sub-query is executed against table2. If the all of the indentity columns in the sub-query are indexed, then this is pretty quick. The relative sizes of the tables do not matter much, but if both are large, this may be a little slower. Also, it is not as efficient when you need to look at values in several columns. If table2 is not appropriately indexed, and it is large, then this could be incredibly slow since it will require a full scan of table2 for each row in table1.

The [OUTER] JOIN approach uses SORT JOIN - MERGE JOIN or SORT JOIN - NESTED LOOP (depending on the Oracle version and optimizer mode). This is pretty efficient, particularly if the two tables are large, and relativley equally sized, and you expect a relativley small number of mismatches, or a relatively large number of matches (INNER JOIN).

There is also a fifth approach to finding records in table1 that do not exist in table2.

In sqlplus run $ORACLE_HOME/rdbms/admin/utlexcpt.sql then

ALTER TABLE table1
ADD constraint t1_t2_fk
FOREIGN KEY (column_list) REFERENCES table2 (column list)
EXCEPTIONS INTO exceptions

The rowid of any rows that do not pass validation will be stored in a table called exception (created by the script). You can then use the rowids to identify and deal with the rows. NOTE: This requires a unique index on column_list in table2.

Probably not useful for a one-off excercise, but it will prevent it happening on an on-going basis.

Caveat emptor

This posting is the work of the Rogue Moderators. It is posted with the best of intentions but is not guaranteed in any way, shape or form. In particular, the code is presented "as is" and you assume full responsibility for running it on your system.

Locked Post

New comments cannot be posted to this locked post.

Locked on Jul 18 2002

Added on Jun 20 2002

0 comments

6,738 views