I have a proposed table with unnormalized data like the following:
ID COLA COLB REFLIST
-- ---- ---- -------
21 xxx zzz 24,25,78,412
22 xxx xxx 21
24 yyy xxx 912,22
25 zzz fff 433,555,22
.. ... ... ...
.. ... ... ...
There are 200 million rows. There is maximum of about 10 IDs in the REFLIST, though typically two or three. How could I efficiently query this data on the REFLIST column? e.g. something like:
SELECT id FROM mytable WHERE :myval in reflist
Logically there is a many to many relationship between rows in this table. The REFLIST column contains pointers to ID values elsewhere in the table. The data could be normalized so that the relationship keys are in a separate table (in fact this is the current solution that we want to change).
ID REF
--- ---
21 24
21 25
21 78
21 412
22 21
24 912
... ...
The comma separated list seems instinctively like a bad idea, however there are various reasons for proposing it. The main reason is because the source for this data has it structured like the REFLIST example. It is an OLTP-like system rather than a data warehouse. The source code (and edit performance) would benefit greatly from not having to maintain the relationship table as the data changes.
Going back to querying the REFLIST column, the problem seems to be building an approriate index for the data. The ideas proposed so far are:
<li>Make a materialized view that presents the relationships as normalized (e.g. as in the example with ID, REF columns above), then index the plain column - the various methods of writing the view SQL have been widely posted.
<li>Use a Oracle Text Index (not something I have ever had call to use before).
Any other ideas? Its Oracle 10.2, though 11g could be possible.
Thanks
Jim