Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Bug or feature? Mixing non-greedy and greedy quantifiers in regexp

mathguyMay 11 2018 — edited May 13 2018

I noticed this some time ago, and I thought I posted about it either here or on stackoverflow... but I just searched for it and I wasn't able to find anything. Perhaps my recollection is wrong (about posting about it), or my searching skills are getting poorer.

It seems to me - see a few examples below - that Oracle has an implementation bug when it comes to mixing non-greedy and greedy quantifiers.

Questions:

1. Am I interpreting the regular expressions incorrectly? (Meaning, is the observed behavior correct?)

2. If the behavior I describe below is incorrect, is this a known bug or issue? Does it depend on version? (I get the same results on 12.1.0.2 and 12.2.0.1)

3. If this is a bug: Does Oracle know about it? Are they planning to fix it?

with

  inputs ( str ) as (

    select 'axaybzb' from dual

  )

select str,

       regexp_substr(str, 'a.*?a.*b') as res_1,

       regexp_substr(str, 'ax*?a.*b') as res_2

from   inputs;

STR     RES_1   RES_2 

------- ------- -------

axaybzb axayb   axaybzb

In this example, RES_1 is incorrect; it should be the same as RES_2. In the computation of the first result, Oracle behaves as if the second quantifier is non-greedy, like the first. WORSE, Oracle is inconsistent about this: if the first quantifier is still non-greedy, but it quantifies a literal letter x rather than the dot, the behavior changes: the greedy quantifier following the dot behaves correctly.

I won't post the other tests (you can easily replicate them yourselves, or you can take my word for it): If you change the non-greedy quantifier to  +?  instead of  *?   in both calculations, the result will be correct in both cases. HOWEVER, if you do that and you also change the greedy quantifier from  *  to  +  in both calculations, then RES_1 will again be incorrect.

Similarly: If in the original test (shown in full above), in RES_1, we replace the dot with [^s] in one place but we keep the dot in the other, we get the correct answer. But if we replace the dot with [^s] in both places, the result is wrong again.

I thought in some tests in the past I was able to find cases when Oracle calculates incorrectly an expression where .*  is followed by  .*?  , greedy followed by non-greedy, but I wasn't able to replicate that right now.

MATCH_RECOGNIZE

Oracle introduced the MATCH_RECOGNIZE clause in 12.1, which has the PATTERN clause that uses some of the concepts and notation from regular expressions. The following test translates essentially the same case from regular expressions to MATCH_RECOGNIZE - testing only the RES_1 expression which gives the wrong answer for regular expression string functions. The result for MATCH_RECOGNIZE is the right one.

with

  inputs ( id, str ) as (

    select 1, 'a' from dual union all

    select 2, 'x' from dual union all

    select 3, 'a' from dual union all

    select 4, 'y' from dual union all

    select 5, 'b' from dual union all

    select 6, 'z' from dual union all

    select 7, 'b' from dual

  )

select id, str, cls

from   inputs

match_recognize(

  order by id

  measures classifier() as cls

  all rows per match

  pattern ( a f*? a f* b )

  define  a as str = 'a'

);

ID STR CLS

-- --- ---

1 a   A

2 x   F

3 a   A

4 y   F

5 b   F

6 z   F

7 b   B

This post has been answered by Paulzip on May 11 2018
Jump to Answer
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Jun 10 2018
Added on May 11 2018
6 comments
1,160 views