SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Regexp Query

Ben SPAug 17 2020 — edited Aug 19 2020

HI All,

Db version : oracle 11g R2

i am working on a query which will identify certain type of data(unicode) in a column. Please find below search criteria .

Numericals
Any of |\/<>" characters
Dingbats
Tags
Enclosed Alphanumerics
Halfwidth and Fullwidth Forms (numbers only)
Mathematical Alphanumeric (numbers only)
CJK Compatibility (2 sets of numbers) china,japan,Korea
Superscripts and Subscripts (2 sets of numbers)

My attempt ( this is not completely correct and not sure how to fix it)

with rws as (
         select 'abc' s from dual
          union all
          select 'abc_' s from dual
          union all
          select 'abc-' s from dual
          union all
          select 'abc)' s from dual
          union all
          select 'abc12' s from dual
          union all
          select 'abc d' s from dual
          union all
          select 'test]' s from dual
          union all
          select '[test' s from dual
          union all
          select 'test|' s from dual
          union all
          select 'test\' s from dual
          union all
          select 'test/' s from dual
          union all
          select 'test<' s from dual
          union all
          select 'test>' s from dual
          union all
          select 'test"' s from dual
          union all
          select '1234' s from dual
          union all
          SELECT to_char(chr(ascii(UNISTR('\2700')))) FROM dual 
        union all
        SELECT to_char(chr(ascii(UNISTR('\E0000')))) FROM dual 
        union all
        SELECT to_char(chr(ascii(UNISTR('\2460')))) FROM dual 
        union all
        SELECT to_char(chr(ascii(UNISTR('\FF10')))) FROM dual 
        union all
        SELECT to_char(chr(ascii(UNISTR('\1D7CE')))) FROM dual 
        union all
        SELECT to_char(chr(ascii(UNISTR('\3358')))) FROM dual --
        union all
        --SELECT to_char(chr(ascii(UNISTR('\33FE')))) FROM dual 
        --union all
        SELECT to_char(chr(ascii(UNISTR('\2070')))) FROM dual 
        union all
        SELECT to_char(chr(ascii(UNISTR('\2080')))) FROM dual 
    )
     select * from rws
where  regexp_like(s, '[[:digit:][:punct:]]|[^[:alnum:]
/
/

correct query will not return first 4 rows ( "abc","abc"_,"abc-","abc)" ,"abc12" ).

Also, below is the unicode range which need to be identified.

([\d\|\\\/<>"

\x{2700}-\x{27BF}\

x{E0000}-\x{E007F}\

x{2460}-\x{24FF}\

x{FF10}-\x{FF19}\

x{1D7CE}-\x{1D7FF}\

x{3358}-\x{3370}\

x{33E0}-\x{33FE}\

x{2070}-\x{207E}\

x{2080}-\x{208E}]+)

Thanks a lot,

Sree

This post has been answered by Paulzip on Aug 17 2020

Jump to Answer

Added on Aug 17 2020

27 comments

1,532 views