Hello again all of you,
Yesterday, I received very good answers from Frank Kulash and mathguy for a question.
I'm so impressed about the possibilties of SQL language. I try to apply those suggestions to
my issue and I am facing another one. And as suggested by Frank Kulash yesterdat, I create then another topic.
Imagine the same structure as yesterday with a couple of additional fields. It's a tree structure with nodes. Root nodes are identified by parent_id is NULL.
Let's take the following sample data:
drop table nodes;
CREATE TABLE nodes
(
id number(10) NOT NULL, -- PK but we don't care
parent_id number(10), -- NULLs are roots
from_ DATE, -- a date that can never be null
derived_from_ DATE, -- min(from_) based on all descending nodes
code varchar2(10) -- just a code
);
delete nodes;
alter session set nls_date_format = 'YYYY';
-- insert some roots
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(1, NULL, TO_DATE('1999', 'YYYY'), NULL, 'AAA');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(2, NULL, TO_DATE('2000', 'YYYY'), TO_DATE('2000', 'YYYY'), 'B002');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(3, NULL, TO_DATE('2010', 'YYYY'), TO_DATE('2000', 'YYYY'), 'B003');
-- insert a hierachy for each root. ids are random in reality. here is only to make it easier to understand
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(10, 1, TO_DATE('2000', 'YYYY'), TO_DATE('2000', 'YYYY'), 'C010');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(11, 1, TO_DATE('2002', 'YYYY'), TO_DATE('2000', 'YYYY'), 'C011');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(12, 1, TO_DATE('2004', 'YYYY'), TO_DATE('2010', 'YYYY'), 'C012');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(121, 12, TO_DATE('2006', 'YYYY'), TO_DATE('2010', 'YYYY'), 'C0121');
WITH get_tree(id, parent_id, from_, derived_from_, code, lvl, full_from_) AS
(
SELECT r.id, r.parent_id, r.from_, r.derived_from_, r.code, 1,
TO_CHAR(r.from_, 'YYYY')
FROM nodes r
WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, n.from_, n.derived_from_, n.code, t.lvl + 1,
t.full_from_ || ' > ' || TO_CHAR(n.from_, 'YYYY')
FROM get_tree t
INNER JOIN nodes n ON t.id = n.parent_id
)
SEARCH DEPTH FIRST BY id SET rnk
SELECT *
FROM get_tree;
ID PARENT_ID FROM_ DERIVED CODE LVL FULL_FROM_ RNK
---------- ---------- ------- ------- ---------- ---------- ------------------------------ ----------
1 <NULL> 1999 <NULL> AAA 1 1999 1
10 1 2000 2000 C010 2 1999 > 2000 2
11 1 2002 2000 C011 2 1999 > 2002 3
12 1 2004 2010 C012 2 1999 > 2004 4
121 12 2006 2010 C0121 3 1999 > 2004 > 2006 5
2 <NULL> 2000 2000 B002 1 2000 6
3 <NULL> 2010 2000 B003 1 2010 7
7 rows selected.
In this case, when we have a look at rows with rnk from 1 to 5 they all belong to the root with id = 1.
The from_ date in the root is 1999. And all from_ dates from descending nodes have a bigger start date.
So, the derived from, should be set to 1999. Because it's the MIN(from_) we have in the whole hierarchy
under that root. At the same time, since it's that node, then the code, should remain the same.
The derived from below should not be taken in consideration. Whatever the value it has, it can be erased.
The only value that is important is from_. Imagine for ID 121 I had specify a derived from value of 1900,
it shouldn't impact the result.
Now, if I take another scenario:
-- insert a hierachy for each root. ids are random in reality. here is only to make it easier to understand
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(20, 2, TO_DATE('2000', 'YYYY'), TO_DATE('1050', 'YYYY'), 'R020');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(21, 2, TO_DATE('1999', 'YYYY'), TO_DATE('1900', 'YYYY'), 'R021');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(22, 2, TO_DATE('1998', 'YYYY'), TO_DATE('2050', 'YYYY'), 'R022');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(221, 22, TO_DATE('1997', 'YYYY'), TO_DATE('2210', 'YYYY'), 'R221');
And if I rexecute the same query and have a look at the from_ dates for root id = 2, I have this:
ID PARENT_ID FROM_ DERIVED CODE LVL FULL_FROM_ RNK
---------- ---------- ------- ------- ---------- ---------- ------------------------------ ----------
...
2 <NULL> 2000 2000 B002 1 2000 6
20 2 2000 1050 R020 2 2000 > 2000 7
21 2 1999 1900 R021 2 2000 > 1999 8
22 2 1998 2050 R022 2 2000 > 1998 9
221 22 1997 2210 R221 3 2000 > 1998 > 1997 10
...
Here, again, we should forget the value that are already in derived. Here, we see that the oldest from_ is 1997 and it's the node with id = 221.
So, that value should be bring back to the top along with its code. Ideally, an output like this would be nice:
ID PARENT_ID FROM_ EXISTING_DERIVED EXISTING_CODE NEW_DERIVED NEW_CODE LVL FULL_FROM_ RNK
---------- ---------- ------- ---------------- ------------- ----------- -------- ---------- ------------------------------ ----------
...
2 <NULL> 2000 2000 B002 1997 R221 1 2000 6
20 2 2000 1050 R020 1997 R221 2 2000 > 2000 7
21 2 1999 1900 R021 1997 R221 2 2000 > 1999 8
22 2 1998 2050 R022 1997 R221 2 2000 > 1998 9
221 22 1997 2210 R221 1997 R221 3 2000 > 1998 > 1997 10
...
Now, if I take another scenario:
-- insert a hierachy for each root. ids are random in reality. here is only to make it easier to understand
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(30, 3, TO_DATE('2000', 'YYYY'), TO_DATE('1050', 'YYYY'), 'Q030');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(31, 3, TO_DATE('1999', 'YYYY'), TO_DATE('1900', 'YYYY'), 'Q031');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(311, 31, TO_DATE('1995', 'YYYY'), TO_DATE('2210', 'YYYY'), 'Q331');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(32, 3, TO_DATE('1998', 'YYYY'), TO_DATE('2050', 'YYYY'), 'Q032');
INSERT INTO nodes(id, parent_id, from_, derived_from_, code) VALUES(321, 32, TO_DATE('1997', 'YYYY'), TO_DATE('2210', 'YYYY'), 'Q321');
And if I describe the result that I would like from that.
ID PARENT_ID FROM_ EXISTING_DERIVED EXISTING_CODE NEW_DERIVED NEW_CODE LVL FULL_FROM_ RNK
---------- ---------- ------- ---------------- ------------- ----------- -------- ---------- ------------------------------ ----------
...
3 <NULL> 2010 2000 B003 1995 Q331 1 2010 11
30 3 2000 1050 Q030 1995 Q331 2 2010 > 2000 12
31 3 1999 1900 Q031 1995 Q331 2 2010 > 1999 13
311 31 1995 2210 Q331 1995 Q331 3 2010 > 1999 > 1995 14
32 3 1998 2050 Q032 1997 Q321 2 2010 > 1998 15
321 32 1997 2210 Q321 1997 Q321 3 2010 > 1998 > 1997 16
...
Here we have two branches. The branch that is 32 has a child with a from date in 1997. So, it's has to be brought back to node with id = 32.
But, since node 311 has a lower date, we have to bring that one to the root.
I hope my wish is now clear? This is quite easy to make it in PL/SQL. A couple of loops and few and few selects. But, I really like to see how SQL helps to resolve this and it allows me to improve my knowledge. But, if you tell me that is no feasible, it's no issue.
Regards,