I am an assisting a co-worker on trying to improve an ugly query and I can't come up with good suggestions, based on all the business rules that have been applied.
The main rule is to try and convert everything from the production system into the data warehouse and if it cannot be converted, create a log for it.
Here is a short test case.
CREATE TABLE ATTY (atty_id number, nme varchar2(20));
INSERT INTO atty VALUES (1, 'Bob');
INSERT INTO atty VALUES (2, 'Perry');
INSERT INTO atty VALUES (3, 'Ben');
CREATE TABLE PHN (phn_id number, atty_id number, phn_no varchar2(15));
INSERT INTO phn VALUES (10,1,'800-555-1234');
INSERT INTO phn VALUES (11,2,'800-555-7890');
INSERT INTO phn VALUES (12,3,'800-555-6541');
CREATE TABLE ATTY_CASE_XREF (atty_id number, pers_id number, case_id number);
-- The attorney representing a person on a case. Could be multiple people and/or attornies per case.
INSERT INTO atty_case_xref VALUES (2, 30, 40);
INSERT INTO atty_case_xref VALUES (2, 31, 40);
INSERT INTO atty_case_xref VALUES (3, 32, 41);
INSERT INTO atty_case_xref VALUES (2, 33, 41);
INSERT INTO atty_case_xref VALUES (1, 34, 42); -- yes this points to nothing (aka bad data to deal with)
CREATE TABLE case (case_id number, docket varchar2(20));
-- Due to bad data, may not always have case for each reference above.
INSERT INTO case VALUES (40, '2013-abcdefg');
INSERT INTO case VALUES (41, '2013-pouyfbkjo');
COMMIT;
The test case query (a simplified version of the real one but captures the concepts)
SELECT ap.atty_id,
ap.nme,
p.phn_no,
c.docket
FROM atty ap
INNER JOIN phn p
ON ap.atty_id = p.atty_id
INNER JOIN atty_case_xref acx
ON ap.atty_id = acx.atty_id
LEFT OUTER JOIN case c
ON acx.case_id = c.case_id
UNION
SELECT ap.atty_id,
ap.nme,
p.phn_no,
NULL
FROM atty ap
INNER JOIN phn p
ON ap.atty_id = p.atty_id
INNER JOIN atty_case_xref acx
ON ap.atty_id = acx.atty_id;
The desired output (and what is output by the above SQL statement is) (order does not matter)
Atty_ID | NME | PHN_NO | DOCKET |
---|
1 | Bob | 800-555-1234 | |
2 | Perry | 800-555-7890 | 2013-abcdefg |
2 | Perry | 800-555-7890 | 2013-pouyfbkjo |
2 | Perry | 800-555-7890 | |
3 | Ben | 800-555-6541 | 2013-pouyfbkjo |
3 | Ben | 800-555-6541 | |
Basically, the first part (above the union) creates a record for each attorney on a case, even if a case record is not loaded.
The second part converts just a generic non case record for each unique attorney.
If attorney 1 is listed on 10 cases, need to create 11 records. One for each case and one with no case information
The data warehouse table the data is loaded into serves two purposes. It is a normal data table, but also acts as a code table. That way if someone wants a list of all attorneys, they don't have to do a distinct, but can just query for a certain type of record (don't ask why).
The cost for the real version of this query is 248 Billion and was killed after several days of running as it was still building the result set. The explain plan image I have shows the following row counts, ATTY = 96,255, PHN = 3,284,444, ATTY_CASE_XREF = 11,553,888, CASE = 14,421772. We are looking for options that turn this into a one-pass run (or just reduce the cost) so it would complete in a reasonable time.
Any suggestions need to work on 10.2. This seems like a good place for a MODEL clause, but I still struggle with that syntax.
And the cleanup
DROP TABLE atty purge;
DROP TABLE phn purge;
DROP TABLE atty_case_xref purge;
DROP TABLE case purge;