Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Regexp Query

Ben SPAug 17 2020 — edited Aug 19 2020

HI All,

Db version : oracle 11g R2

i am working on a query which will identify certain type of data(unicode) in a column. Please find below search criteria .

  • Numericals
  • Any of |\/<>" characters
  • Dingbats
  • Tags
  • Enclosed Alphanumerics
  • Halfwidth and Fullwidth Forms (numbers only)
  • Mathematical Alphanumeric (numbers only)
  • CJK Compatibility (2 sets of numbers) china,japan,Korea
  • Superscripts and Subscripts (2 sets of numbers)

My attempt ( this is not completely correct and not sure how to fix it)

with rws as (

         select 'abc' s from dual

          union all

          select 'abc_' s from dual

          union all

          select 'abc-' s from dual

          union all

          select 'abc)' s from dual

          union all

          select 'abc12' s from dual

          union all

          select 'abc d' s from dual

          union all

          select 'test]' s from dual

          union all

          select '[test' s from dual

          union all

          select 'test|' s from dual

          union all

          select 'test\' s from dual

          union all

          select 'test/' s from dual

          union all

          select 'test<' s from dual

          union all

          select 'test>' s from dual

          union all

          select 'test"' s from dual

          union all

          select '1234' s from dual

          union all

          SELECT to_char(chr(ascii(UNISTR('\2700')))) FROM dual

        union all

        SELECT to_char(chr(ascii(UNISTR('\E0000')))) FROM dual

        union all

        SELECT to_char(chr(ascii(UNISTR('\2460')))) FROM dual

        union all

        SELECT to_char(chr(ascii(UNISTR('\FF10')))) FROM dual

        union all

        SELECT to_char(chr(ascii(UNISTR('\1D7CE')))) FROM dual

        union all

        SELECT to_char(chr(ascii(UNISTR('\3358')))) FROM dual --

        union all

        --SELECT to_char(chr(ascii(UNISTR('\33FE')))) FROM dual

        --union all

        SELECT to_char(chr(ascii(UNISTR('\2070')))) FROM dual

        union all

        SELECT to_char(chr(ascii(UNISTR('\2080')))) FROM dual

    )

     select * from rws

where  regexp_like(s, '[[:digit:][:punct:]]|[^[:alnum:]

/

/

correct query will not return first 4 rows ( "abc","abc"_,"abc-","abc)" ,"abc12" ).

Also, below is the unicode range which need to be identified.

([\d\|\\\/<>"

\x{2700}-\x{27BF}\

x{E0000}-\x{E007F}\

x{2460}-\x{24FF}\

x{FF10}-\x{FF19}\

x{1D7CE}-\x{1D7FF}\

x{3358}-\x{3370}\

x{33E0}-\x{33FE}\

x{2070}-\x{207E}\

x{2080}-\x{208E}]+)

Thanks a lot,

Sree

This post has been answered by Paulzip on Aug 17 2020
Jump to Answer
Comments
Post Details
Added on Aug 17 2020
27 comments
1,525 views