handling regexp_substr and consecutive delimiters
mieslepJul 21 2010 — edited Jul 21 2010I've been beating my head against the wall as the usual searches have turned up nothing. As I've come up with a solution, I thought I'd post this to the broader community in case there are others out there with this problem.
Imagine you have a string: ',2,3,,5,,,8,'
and you want to split this into different values. This can be accomplished with a PL/SQL function but it's pretty hard to use that as a SQL query source. Initial attempts would suggest using the '[^,]+' expression, along the lines of this:
with strings as (select ',2,3,,5,,,8,' vals from dual)
select
to_number(regexp_substr(vals,'[^,]+',1,1)) as a
,to_number(regexp_substr(vals,'[^,]+',1,2)) as b
,to_number(regexp_substr(vals,'[^,]+',1,3)) as c
,to_number(regexp_substr(vals,'[^,]+',1,4)) as d
,to_number(regexp_substr(vals,'[^,]+',1,5)) as e
,to_number(regexp_substr(vals,'[^,]+',1,6)) as f
,to_number(regexp_substr(vals,'[^,]+',1,7)) as g
,to_number(regexp_substr(vals,'[^,]+',1,8)) as h
,to_number(regexp_substr(vals,'[^,]+',1,9)) as i
,to_number(regexp_substr(vals,'[^,]+',1,10)) as j
from strings;
But what you'll find is that it doesn't deal with a number of test conditions. A=2,B=3,C=5,D=8, and E to J = NULL. Not what we want!
The trick we stumbled on was to replace the comma with a (space)(comma)(space):
with strings as (select ',2,3,,5,,,8,' vals from dual)
select
to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,1),' +',null)) as a
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,2),' +',null)) as b
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,3),' +',null)) as c
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,4),' +',null)) as d
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,5),' +',null)) as e
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,6),' +',null)) as f
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,7),' +',null)) as g
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,8),' +',null)) as h
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,9),' +',null)) as i
,to_number(regexp_replace(regexp_substr(regexp_replace(vals,',',' , '),'[^,]+',1,10),' +',null)) as j
from strings;
Because we have numeric values, the TO_NUMBER implicitly strips the whitespace. If you had character values instead of numerics, you could chop off the leading and trailing spaces (being aware that the first and last columns are special cases) using TRIM or another REGEXP_REPLACE.
There is perhaps a better way to handle this...if so I'd welcome the input. Otherwise, this certainly seems functional enough for our purposes!
Thanks,
:-Phil