Skip to Main Content

Using Oracle Text for dynamic comparison of splitted data

675725Dec 15 2008 — edited Dec 17 2008
Hi,

I have two tables with similar columns. There is a free text field (e.g. address) . Address fields contain 1 to 10 words in different order.
Address splited into 4 columns: address1, address2, address3, zip .

***************************************************************************************************************************

I want to be able to retrieve all the related rows in both tables using CONTAINS operator and/or fuzzy search. Any other solution is more then welcome.

***************************************************************************************************************************

TABLE oldaddress Data :

12385 OAK LN APT 9898 LAYTON UT 84040

TABLE newaddress Data :

APT 9898 12385 OAK LN LAYTON UT 84040
12385 OAK LN APT 9898 LAYTON UT 84040
12385 OAK LINE LAYTON UT 84040
APARTMENT 9898 12385 OAK LN LAYTON UT
12385 ,OAK LINE LAYTON UT 84040

Please, notice that basically {color:#008000}*all the addresses* are the same{color}. My goal is to retrieve all of new addresses by comparing data to oldaddress table.

My steps are the following :

--1.
create table OLDADDRESS
(
ID NUMBER(20) not null,
ADDRESS1 VARCHAR2(30),
ADDRESS2 VARCHAR2(30),
ADDRESS3 VARCHAR2(30),
ZIPCODE VARCHAR2(10),
ALL_COLS VARCHAR2(1)
);

--2.
create table NEWADDRESS
(
ID NUMBER(20) not null,
ADDRESS1 VARCHAR2(30),
ADDRESS2 VARCHAR2(30),
ADDRESS3 VARCHAR2(30),
ZIPCODE VARCHAR2(10),
ALL_COLS VARCHAR2(1)
);
--3. Insert data
begin
insert into OLDADDRESS (ID,ADDRESS1, ADDRESS2, ADDRESS3, ZIPCODE) values (11, '12385 OAK LN', 'APT 9898', 'LAYTON UT', '84040');
insert into NEWADDRESS (ID,ADDRESS1, ADDRESS2, ADDRESS3, ZIPCODE) values (22, 'APT 9898', '12385 OAK LN', 'LAYTON UT', '84040');
insert into NEWADDRESS (ID,ADDRESS1, ADDRESS2, ADDRESS3, ZIPCODE) values (33, '12385 OAK LN', 'APT 9898', 'LAYTON UT', '84040');
insert into NEWADDRESS (ID,ADDRESS1, ADDRESS2, ADDRESS3, ZIPCODE) values (44, '12385 ,OAK LN', 'APARTMENT 9898', 'LAYTON UT', '84040');
insert into NEWADDRESS (ID,ADDRESS1, ADDRESS2, ADDRESS3, ZIPCODE) values (55, '12385 OAK LINE', 'APT 9898', 'LAYTON UT', '84040');
insert into NEWADDRESS (ID,ADDRESS1, ADDRESS2, ADDRESS3, ZIPCODE) values (66, '12385 OAK LN.', 'APT 9898', 'LAYTON UT', '84040');
insert into NEWADDRESS (ID,ADDRESS1, ADDRESS2, ADDRESS3, ZIPCODE) values (77, 'APARTMENT 9898','12385 .OAK LN', 'LAYTON UT', '84040');
commit;
end;

--4. Created a field section:
BEGIN
CTX_DDL.CREATE_PREFERENCE ('my_datastore', 'MULTI_COLUMN_DATASTORE');
CTX_DDL.SET_ATTRIBUTE
('my_datastore',
'COLUMNS',
'ADDRESS1, ADDRESS2, ADDRESS3');
CTX_DDL.CREATE_SECTION_GROUP ('MYGRP','BASIC_SECTION_GROUP');
CTX_DDL.ADD_FIELD_SECTION ('MYGRP','FIELD1','ADDRESS1',TRUE);
CTX_DDL.ADD_FIELD_SECTION ('MYGRP','FIELD2','ADDRESS2',TRUE);
CTX_DDL.ADD_FIELD_SECTION ('MYGRP','FIELD3','ADDRESS3',TRUE);
END;
/

-- 5. CREATE INDEX
CREATE INDEX my_idx ON newaddress (all_cols)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS
('DATASTORE my_datastore
SECTION GROUP mygrp') ;

--6. Select from all fields :

SQL> select address1, address2, address3, zipcode
2 from newaddress
3 where CONTAINS(all_cols, ( '12385 & Oak & 9898'), 1) > 0;

ADDRESS1 ADDRESS2 ADDRESS3 ZIP
-----
APT 9898 12385 OAK LN LAYTON UT 84040
12385 OAK LN APT 9898 LAYTON UT 84040
12385 ,OAK LN APARTMENT 9898 LAYTON UT 84040
12385 OAK LINE APT 9898 LAYTON UT 84040
12385 OAK LN. APT 9898 LAYTON UT 84040
APARTMENT 9898 12385 .OAK LN LAYTON UT 84040

6 rows selected.

--7. *Here, I am not able to figure out how to make it massively for multiple rows ? How to use CONTAINS operator with external data instead of text ( '12385 & Oak & 9898')?
Should I use ref cursor/ bulk collection ? Maybe REGULAR EXPRESSIONS should be usefull (and how) ?*

select address1, address2, address3, zipcode
from newaddress
where CONTAINS(all_cols, ( 'Param 1 & Param 2 & Param 3... &...Param n'), 1) > 0;

Param 1 & Param 2 & Param 3... &...Param n refer to 'select address1||' '||address2||''||address3 from oldaddress'.

I am using oracle 10g.

Thanks ahead ;)


Regards,

Julia
Comments
Post Details
Added on Dec 15 2008
4 comments
1,327 views