Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Regular expressions, multibyte characters, and NLS_SORT

Eric Olson 1Aug 21 2025

I have a strange edge case where a regular expression fails if I change languages. This works fine:

alter session set nls_language = 'ENGLISH';
select regexp_count( 'foo', '[Ø-ö]' ) as x from dual;

This raises ORA-12728, "Invalid range in regular expression:

alter session set nls_language = 'FRENCH';
select regexp_count( 'foo', '[Ø-ö]' ) as x from dual;

It looks like regular expression ranges can be influenced by the language setting, specifically NLS_SORT:

In the POSIX standard, a range includes all collation elements between the start and end of the range in the linguistic definition of the current locale. Thus, ranges are linguistic rather than byte values ranges; the semantics of the range expression are independent of character set. In Oracle Database, the linguistic range is determined by the NLS_SORT initialization parameter.

I can expand the range to all the characters involved, but what does the range **[Ø-ö]** actually refer to? Is it just U+00D8 to U+00F6, but French orders those differently?

This post has been answered by mathguy on Aug 21 2025
Jump to Answer
Comments
Post Details
Added on Aug 21 2025
3 comments
83 views