Hello experts,
I have the structure like below:
Mutation_Container
Mutation_1
Mutation_Library
Mut_Reason_1
Mut_Reason_2
Mut_Reason_3
Mut_Reason_4
Mutation
Mut_Reason_1
Mut_Reason_2
Mutation_Subset_1
Mut_Reason_1
Mut_Reason_4
Mutation_Subset_2
Mut_Reason_1
Mut_Reason_2
Mutation_Subset_3
Mut_Reason_4
And the dataset up is:
create table nodes (node_id number, parent_node_id number, object_id number, object_type_id number);
insert into nodes values (1000, 10, 200, 1); -- Mutation_1 under the parent Mutation_Container
insert into nodes values (1001, 1000, 201, 2); -- Mutation_Library under the parent Mutation_1
insert into nodes values (1002, 1000, 202, 3); -- Mutation under the parent Mutation_1
insert into nodes values (1003, 1000, 203, 4); -- Mutation_Subset_1 under the parent Mutation_1
insert into nodes values (1004, 1000, 204, 4); -- Mutation_Subset_2 under the parent Mutation_1
insert into nodes values (1005, 1000, 205, 4); -- Mutation_Subset_3 under the parent Mutation_1
insert into nodes values (1006, 1000, 206, 4); -- Mutation_Subset_4 under the parent Mutation_1
insert into nodes values (1007, 1000, 207, 4); -- Mutation_Subset_5 under the parent Mutation_1
insert into nodes values (1008, 1001, 208, 5); -- Mut_Reason_1 under the parent Mutation_Library
insert into nodes values (1009, 1001, 209, 5); -- Mut_Reason_2 under the parent Mutation_Library
insert into nodes values (1010, 1001, 210, 5); -- Mut_Reason_3 under the parent Mutation_Library
insert into nodes values (1011, 1001, 211, 5); -- Mut_Reason_4 under the parent Mutation_Library
insert into nodes values (1008, 1002, 208, 5); -- Mut_Reason_1 under the parent Mutation
insert into nodes values (1009, 1002, 209, 5); -- Mut_Reason_2 under the parent Mutation
insert into nodes values (1008, 1003, 208, 5); -- Mut_Reason_1 under the parent Mutation_Subset_1
insert into nodes values (1011, 1003, 211, 5); -- Mut_Reason_4 under the parent Mutation_Subset_1
insert into nodes values (1008, 1004, 208, 5); -- Mut_Reason_1 under the parent Mutation_Subset_2
insert into nodes values (1009, 1004, 209, 5); -- Mut_Reason_2 under the parent Mutation_Subset_2
insert into nodes values (1011, 1005, 211, 5); -- Mut_Reason_4 under the parent Mutation_Subset_3
/* I am not adding the Mut_Reason s and the Mutation_Subset s to the below stuctures as they do not add to the problem I am discussing */
insert into nodes values (2000, 10, 2200, 1); -- Mutation_2 under the parent Mutation_Container
insert into nodes values (2001, 2000, 2201, 2); -- Mutation_Library under the parent Mutation_2
insert into nodes values (2002, 2000, 2202, 3); -- Mutation under the parent Mutation_2
insert into nodes values (2003, 2000, 2203, 3); -- Mutation under the parent Mutation_2
insert into nodes values (2004, 2000, 2204, 3); -- Mutation under the parent Mutation_2
insert into nodes values (3000, 10, 300, 1); -- Mutation_3 under the parent Mutation_Container
insert into nodes values (3001, 3000, 301, 2); -- Mutation_Library under the parent Mutation_3
insert into nodes values (3002, 3000, 302, 3); -- Mutation under the parent Mutation_3
insert into nodes values (3003, 3000, 303, 3); -- Mutation under the parent Mutation_3
insert into nodes values (4000, 10, 400, 1); -- Mutation_4 under the parent Mutation_Container
insert into nodes values (4001, 4000, 401, 2); -- Mutation_Library under the parent Mutation_4
insert into nodes values (4002, 4000, 402, 3); -- Mutation under the parent Mutation_4
Basically I have to come up with the query that finds the bad data in the DB. I call the data in the DB bad when I find two or more Mutation (object_type_id = 3) under the Mutation_1(object_type_id = 1). So, whenever there are 2 or more records belonging to object_type_id = 3 under object_type_id = 1 then we have to display it.
I have quite a lot of records in the table in the above said hierarchy.
Input: parent_node_id = 10.
And I have come up with the below query. Can anyone please suggest me whether I am doing it correctly or not ? (I am using match_recognize here as I have learnt it recently and just wanted to give it a try). Or the query can be improved using the same match_recognize ?
with got_hierarchy as
(select n.*,level as lvl from nodes n
where object_type_id in (1,3)
start with parent_node_id = 10
connect by prior node_id = parent_node_id
--order by lvl
)
select * from got_hierarchy
match_recognize
(
measures match_number() as mn, --Added for debugging
classifier() as what --Added for debugging
all rows per match
pattern ( mutations_numbers mutations{2, } )
define mutations_numbers as object_type_id = 1,
mutations as object_type_id = 3 and parent_node_id = mutations_numbers.node_id
);
Thanks and Regards,
Ranagal