Hello there,
I am trying to use the REGEXP_REPLACE SQL function on a piece of XML, to remove the actual XML data from the elements. My end goal is to end up with a comma-delimited list of XML element names. As part of the first step, I just want to rip out all the actual data.
I tested the following query, and it initially appeared to work:
SELECT REGEXP_REPLACE('>THIS IS A TEST<',
'>([[:alnum:]]\s|\S)*<',
'><' ) AS test_result
FROM dual;
Unfortunately, when I applied this to a full XML string, it didn't work. Here is the test query I used:
SELECT REGEXP_REPLACE('<ROW><TEST_ELEMENT1>123</TEST_ELEMENT1><TEST_ELEMENT2>THIS IS A TEST!</TEST_ELEMENT2></ROW>',
'>([[:alnum:]]\s|\S)*<',
'><') AS test_result
FROM dual;
I ended up with the following output:
*<ROW></ROW>*
What I was trying for was:
*<ROW><TEST_ELEMENT1></TEST_ELEMENT1><TEST_ELEMENT2></TEST_ELEMENT2></ROW>*
If you're reading this and you're a Posix RegExp Guru, please could you let me know exactly where I'm going wrong? RegExps are not my strong point, but I'd like to get better at them.