Pretty much any person who works with data faces a task to trace attributes from source to target (also referred as data lineage).
In this post I will describe how that can be done for a given query.
It’s been known for quite a while that it is possible to parse a query and get result in XML format.
Philipp wrote about that 15 years ago.
However, if we also use dbms_utility.expand_sql_text then approach becomes much more flexible and generic.
Below is a quick demo.
create table tt1(a1, b1, c1) as select 1, 1, 1 from dual;
create table tt2(a2, b2, c2) as select 1, 1, 1 from dual;
create table tt3(a3, b3, c3) as select 1, 1, 1 from dual;
create table tt4(a4, b4, c4) as select 1, 1, 1 from dual;
create or replace view vvv as
select (x + c3) * tt3.a3 x1, t0.l - sign(b3) + (select max(c4+a4) from tt4) x2
from (select tt1.b1 + (tt1.a1 + 1) * tt2.c2 x,
length(substr('dummy', tt1.b1, tt2.b2)) l
from tt1
join tt2
on tt1.a1 = tt2.a2) t0
cross join tt3;
Now we can get all source columns and relevant tables which contribute to target columns by traversing parsed XML.
SQL> with t as
2 (
3 select col.*, dst_col, is_top
4 from xmltable('//SELECT_LIST_ITEM'
5 passing parse_query(expand_query('select * from vvv'))
6 columns
7 dst_col varchar2(30) path 'COLUMN_ALIAS',
8 is_top varchar2(30) path '1-count(ancestor::*[4])',
9 orig xmltype path '.'
10 ) sel,
11 xmltable('//COLUMN_REF'
12 passing sel.orig
13 columns
14 schema varchar2(30) path 'SCHEMA',
15 tbl varchar2(30) path 'TABLE',
16 alias varchar2(30) path 'TABLE_ALIAS',
17 src_col varchar2(30) path 'COLUMN'
18 ) col
19 ),
20 r(lvl, schema, tbl, alias, src_col, dst_col, root_col) as
21 (
22 select 1, t.schema, t.tbl, t.alias, t.src_col, t.dst_col, t.dst_col root_col
23 from t where is_top = 1
24 union all
25 select lvl+1, t.schema, t.tbl, t.alias, t.src_col, t.dst_col, r.root_col root_col
26 from t, r
27 where r.src_col = t.dst_col and r.alias <> t.alias
28 )
29 select tbl, src_col, root_col dst_col
30 from r
31 where tbl is not null
32 order by 1, 2, 3;
TBL SRC_COL DST_COL
-------------------- -------------------- --------------------
TT1 A1 X1
TT1 B1 X1
TT1 B1 X2
TT2 B2 X2
TT2 C2 X1
TT3 A3 X1
TT3 B3 X2
TT3 C3 X1
TT4 A4 X2
TT4 C4 X2
10 rows selected.
You can get all the source here.
PS. You do not want to mix connect by and xmltable in the same query because of this.