Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Finding the bad data in the DB

RanagalFeb 19 2019 — edited Feb 20 2019

Hello experts,

I have the structure like below:

Mutation_Container

     Mutation_1

          Mutation_Library

               Mut_Reason_1

               Mut_Reason_2

               Mut_Reason_3

               Mut_Reason_4

          Mutation

               Mut_Reason_1

               Mut_Reason_2

          Mutation_Subset_1

               Mut_Reason_1

               Mut_Reason_4

          Mutation_Subset_2

               Mut_Reason_1

               Mut_Reason_2

          Mutation_Subset_3

               Mut_Reason_4

And the dataset up is:

create table nodes (node_id number, parent_node_id number, object_id number, object_type_id number);

insert into nodes values (1000,  10,  200, 1); -- Mutation_1 under the parent Mutation_Container

insert into nodes values (1001, 1000, 201, 2); -- Mutation_Library under the parent Mutation_1

insert into nodes values (1002, 1000, 202, 3); -- Mutation under the parent Mutation_1

insert into nodes values (1003, 1000, 203, 4); -- Mutation_Subset_1 under the parent Mutation_1

insert into nodes values (1004, 1000, 204, 4); -- Mutation_Subset_2 under the parent Mutation_1

insert into nodes values (1005, 1000, 205, 4); -- Mutation_Subset_3 under the parent Mutation_1

insert into nodes values (1006, 1000, 206, 4); -- Mutation_Subset_4 under the parent Mutation_1

insert into nodes values (1007, 1000, 207, 4); -- Mutation_Subset_5 under the parent Mutation_1

insert into nodes values (1008, 1001, 208, 5); -- Mut_Reason_1 under the parent Mutation_Library

insert into nodes values (1009, 1001, 209, 5); -- Mut_Reason_2 under the parent Mutation_Library

insert into nodes values (1010, 1001, 210, 5); -- Mut_Reason_3 under the parent Mutation_Library

insert into nodes values (1011, 1001, 211, 5); -- Mut_Reason_4 under the parent Mutation_Library

insert into nodes values (1008, 1002, 208, 5); -- Mut_Reason_1 under the parent Mutation

insert into nodes values (1009, 1002, 209, 5); -- Mut_Reason_2 under the parent Mutation

insert into nodes values (1008, 1003, 208, 5); -- Mut_Reason_1 under the parent Mutation_Subset_1

insert into nodes values (1011, 1003, 211, 5); -- Mut_Reason_4 under the parent Mutation_Subset_1

insert into nodes values (1008, 1004, 208, 5); -- Mut_Reason_1 under the parent Mutation_Subset_2

insert into nodes values (1009, 1004, 209, 5); -- Mut_Reason_2 under the parent Mutation_Subset_2

insert into nodes values (1011, 1005, 211, 5); -- Mut_Reason_4 under the parent Mutation_Subset_3

/* I am not adding the Mut_Reason s and the Mutation_Subset s to the below stuctures as they do not add to the problem I am discussing */

insert into nodes values (2000,  10,  2200, 1); -- Mutation_2 under the parent Mutation_Container

insert into nodes values (2001, 2000, 2201, 2); -- Mutation_Library under the parent Mutation_2

insert into nodes values (2002, 2000, 2202, 3); -- Mutation under the parent Mutation_2

insert into nodes values (2003, 2000, 2203, 3); -- Mutation under the parent Mutation_2

insert into nodes values (2004, 2000, 2204, 3);  -- Mutation under the parent Mutation_2

insert into nodes values (3000,  10,  300, 1);  -- Mutation_3 under the parent Mutation_Container

insert into nodes values (3001, 3000, 301, 2);  -- Mutation_Library under the parent Mutation_3

insert into nodes values (3002, 3000, 302, 3);  -- Mutation under the parent Mutation_3

insert into nodes values (3003, 3000, 303, 3);  -- Mutation under the parent Mutation_3

insert into nodes values (4000,  10,  400, 1);  -- Mutation_4 under the parent Mutation_Container

insert into nodes values (4001, 4000, 401, 2);  -- Mutation_Library under the parent Mutation_4

insert into nodes values (4002, 4000, 402, 3);  -- Mutation under the parent Mutation_4

Basically I have to come up with the query that finds the bad data in the DB. I call the data in the DB bad when I find two or more Mutation (object_type_id = 3) under the Mutation_1(object_type_id = 1).  So, whenever there are 2 or more records belonging to object_type_id = 3 under object_type_id = 1 then we have to display it.

I have quite a lot of records in the table in the above said hierarchy.

Input: parent_node_id = 10. 

And I have come up with the below query. Can anyone please suggest me whether I am doing it correctly or not ? (I am using match_recognize here as I have learnt it recently and just wanted to give it a try). Or the query can be improved using the same match_recognize ?

 

with got_hierarchy as

(select n.*,level as lvl from nodes n

where object_type_id in (1,3)

start with parent_node_id = 10

connect by prior node_id = parent_node_id

--order by lvl

)

select * from got_hierarchy

match_recognize

(

   measures match_number() as mn, --Added for debugging

             classifier() as what  --Added for debugging

    all rows per match

    pattern ( mutations_numbers mutations{2, } )

    define mutations_numbers as object_type_id = 1,

           mutations as object_type_id = 3 and parent_node_id = mutations_numbers.node_id

);

Thanks and Regards,

Ranagal

This post has been answered by Frank Kulash on Feb 19 2019
Jump to Answer
Comments
Post Details
Added on Feb 19 2019
5 comments
265 views