Hello
I working on the doing an application to customer match , where an application can have multiple customers (corporate) and customer can have multiple applications.
so i am doing a cartesian join and using UTL_MATCH .JARO_WINKLER_SMILIARTY for string matching.
Table A (applications) Approximatly 20,000 per day
Table B (CusomterPfile) approximatly 400,000 per month.
I have to match apps for month, so intermeidate volume is 20,000*400,000 = 8,000,000,000 and final data set with JW_SCORE>970
Its time consuming to create the table of 8billion and fillter for200K Approx having jw_score>970.
And also to filter in sub querry. please suggest a better approach .
CREATE TABLE tablea
(
APP_ID NUMBER(14),
APP_LOAD_DATE DATE,
SSN VARCHAR2(20 BYTE),
DOB DATE,
FIRST_NAME VARCHAR2(50 BYTE),
LAST_NAME VARCHAR2(50 BYTE),
ADDR VARCHAR2(92 BYTE),
CITY VARCHAR2(50 BYTE),
STATE VARCHAR2(20 BYTE),
ZIP VARCHAR2(20 BYTE)
);
CREATE TABLE tableb
(
ACCOUNT_NUMBER NUMBER(9) NOT NULL,
DOB DATE,
SSN NUMBER(9),
FIRST_NAME VARCHAR2(32 BYTE),
LAST_NAME VARCHAR2(60 BYTE),
ADDR VARCHAR2(100 BYTE),
CITY VARCHAR2(26 BYTE),
STATE CHAR(2 BYTE),
ZIP CHAR(5 BYTE),
ZIP_4 CHAR(4 BYTE)
);
SELECT A.* , JW_SSN+JW_DOB+JW_FNAME+JW_LNAME+JW_ADDR AS JW_SCORE FROM (
SELECT /*+ PARALLEL(16) */
a.appid
, a.ssn a_ssn
, b.ssn b_ssn
, a.dob AS a_dob
, b.dob AS b_dob
, a.first_name AS a_first_name
, b.first_name AS b_first_name
, a.last_name AS a_last_name
, b.last_name AS b_last_name
, A.FIRST_NAME ||' '|| A.LAST_NAME AS A_FULLNAME
, B.FIRST_NAME ||' ' ||B.LAST_NAME AS B_FULLNAME
, a.addr AS a_addr
, b.addr b_addr
, a.city AS a_city
, b.city b_city
, a.state AS a_state
, b.state b_state
, a.zip AS a_zip
, b.zip b_zip
, ( CASE WHEN ( A.ssn IS NULL or B.SSN IS NULL ) THEN 100
WHEN ( ( A.SSN = 111111111 OR B.SSN = 111111111)) THEN 100
ELSE UTL_MATCH.jaro_winkler_similarity ( LPAD( A.ssn,9,0) , LPAD( b.ssn ,9,0))
END)* 3 as JW_SSN
,( CASE WHEN ( A.dob IS NULL OR B.dob IS NULL ) THEN 100
ELSE UTL_MATCH.jaro_winkler_similarity ( A.dob ,B.dob)
END )*2 as JW_DOB
,( CASE WHEN ( A.first_name IS NULL OR B.first_name IS NULL ) THEN 100
ELSE UTL_MATCH.jaro_winkler_similarity ( A.first_name ,B.first_name)
END ) *2as JW_FNAME
, (CASE WHEN ( A.last_name IS NULL OR B.last_name IS NULL ) THEN 100
ELSE UTL_MATCH.jaro_winkler_similarity ( A.last_name ,B.last_name)
END) *2 as JW_LNAME
, CASE WHEN ( A.FIRST_NAME || A.LAST_NAME IS OR AND B.FIRST_NAME||B.LAST_NAME IS NULL ) THEN 100
ELSE UTL_MATCH.jaro_winkler_similarity ( A.FIRST_NAME ||' '|| A.LAST_NAME , B.FIRST_NAME ||' ' ||B.LAST_NAME)
END as JW_FULLNAME
, ( CASE WHEN ( A.addr IS NULL AND B.addr IS NULL ) THEN 100
ELSE UTL_MATCH.jaro_winkler_similarity ( A.addr ,B.addr)
END) * 1 as JW_ADDR
FROM tablea a, tableb b
WHERE 1 = 1
AND b.start_service_date >= a.app_load_date
) A
WHERE JW_SSN+JW_DOB+JW_FNAME+JW_LNAME+JW_ADDR > 970;
Please suggest.
Thanks in Advance.