Hi!
I'm currently struggling to build a xmltable query which does the right thing and performs well.
The xml I'm trying to process is saved in a xmltype column and looks like this:
<customer>
<name>Mustermann</name>
<contract>
<contract_id>C1</contract_id>
<products>
<product>
<name>Product C1.P1</name>
<price>23.12</price>
<properties>
<property>Property C1.P1.A</property>
<property>Property C1.P1.B</property>
</properties>
</product>
<product>
<name>Product C1.P2</name>
<price>2.32</price>
</product>
</products>
</contract>
<contract>
<contract_id>C2</contract_id>
<products>
<product>
<name>Product C2.P1</name>
<price>143.33</price>
</product>
<product>
<name>Product C2.P2</name>
<price>231.76</price>
<properties>
<property>Property C2.P2.A</property>
<property>Property C2.P2.B</property>
</properties>
</product>
</products>
</contract>
<contract>
<contract_id>C3</contract_id>
</contract>
</customer>
The xml structure contains various repeating nodes most of which are optional. E.g. a contract does not need to have a product, a product does not need to have a property.
The desired output looks as follows:
customer_name | contract_id | product_name | product_price | product_property |
---|
Mustermann | C1 | Product C1.P1 | 23.12 | Property C1.P1.A |
Mustermann | C1 | Product C1.P1 | 23.12 | Property C1.P1.A |
Mustermann | C1 | Product C1.P2 | 2.32 | (null) |
Mustermann | C2 | Product C2.P1 | 143.33 | (null) |
Mustermann | C2 | Product C2.P2 | 231.76 | Property C2.P2.A |
Mustermann | C2 | Product C2.P2 | 231.76 | Property C2.P2.B |
Mustermann | C3 | (null) | (null) | (null) |
So basically the nested nodes are split up into separate rows as if I performed multiple left joins.
The Query I tried looks as follows:
create table t_xml_test_1 (exception_id number, xml_data xmltype);
insert into t_xml_test_1 values( 123456, xmltype(
'<customer>
<name>Mustermann</name>
<contract>
<contract_id>C1</contract_id>
<products>
<product>
<name>Product C1.P1</name>
<price>23.12</price>
<properties>
<property>Property C1.P1.A</property>
<property>Property C1.P1.B</property>
</properties>
</product>
<product>
<name>Product C1.P2</name>
<price>2.32</price>
</product>
</products>
</contract>
<contract>
<contract_id>C2</contract_id>
<products>
<product>
<name>Product C2.P1</name>
<price>143.33</price>
</product>
<product>
<name>Product C2.P2</name>
<price>231.76</price>
<properties>
<property>Property C2.P2.A</property>
<property>Property C2.P2.B</property>
</properties>
</product>
</products>
</contract>
<contract>
<contract_id>C3</contract_id>
</contract>
</customer>' )
);
SELECT
customer_name, contract_id, product_name, product_price, product_property
FROM
t_xml_test_1,
XMLTable(
'
for $g1 in /customer/contract
for $g2 in $g1/products/product
for $g3 in $g2/properties/property
return
<root>
<g0>{/customer/name}</g0>
<g1>{$g1/contract_id}</g1>
<g2>{$g2/name}{$g2/price}</g2>
<g3>{$g3}</g3>
</root>
'
PASSING t_xml_test_1.xml_data
columns
customer_name CLOB path '/root/g0/name',
contract_id CLOB path '/root/g1/contract_id',
product_name CLOB path '/root/g2/name',
product_price CLOB path '/root/g2/price',
product_property CLOB path '/root/g3'
) xmltab;
exception_id, customer_name, contract_id, product_name, product_price, product_property
This query performs quite well. The problem is, that the nested loop does not pick up the rows where a sub-node (e.g. $g2, $g3) does not exist.
To catch these rows as well, I added more loops:
SELECT
exception_id, customer_name, contract_id, product_name, product_price, product_property
FROM
t_xml_test_1,
XMLTable(
'
for $g1 in /customer/contract
for $g2 in $g1/products/product
for $g3 in $g2/properties/property
return
<root>
<g0>{/customer/name}</g0>
<g1>{$g1/contract_id}</g1>
<g2>{$g2/name}{$g2/price}</g2>
<g3>{$g3}</g3>
</root>,
for $g1 in /customer/contract
for $g2 in $g1/products/product
where empty($g2/properties/property)
return
<root>
<g0>{/customer/name}</g0>
<g1>{$g1/contract_id}</g1>
<g2>{$g2/name}{$g2/price}</g2>
</root>,
for
$g1 in /customer/contract
where empty($g1/products/product)
return
<root>
<g0>{/customer/name}</g0>
<g1>{$g1/contract_id}</g1>
</root>
'
PASSING t_xml_test_1.xml_data
columns
customer_name CLOB path '/root/g0/name',
contract_id CLOB path '/root/g1/contract_id',
product_name CLOB path '/root/g2/name',
product_price CLOB path '/root/g2/price',
product_property CLOB path '/root/g3'
) xmltab;
This query works as expected, picking up all desired result rows.
Unfortunately this query performs very badly. The execution plan shows a "COLLECTION ITERATOR (PICKLER FETCH)" - I've learned that this should be avoided whenever possible.
My question: Is there any way to speed up this query or to use another technique to pick up those optional nodes? It seems quite ugly to concatenate multiple for loops.
We are running Oracle 11gR2 on RedHat 6.4 (Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production)
Any help is appreciated!