
DateFormat uses unexpected unicode characters

Mike Douglass, Mar 7 2025

On upgrading from Java 17 to 21, some tests started to fail with a confusing message: expected "2:00 PM" - found "2:00 PM".

Debugging this, I discovered that the date string I was testing contained nulls and other unexpected characters. I wrote a small piece of code to try to reproduce the problem. While I couldn't reproduce it exactly, I did get something unexpected. I was formatting with DateFormat.SHORT.

Using the JDK routines (java.text.DateFormat), I got the following bytes and characters for the above string:

value is "2:00 PM"
Bytes: 50 58 48 48 -30 -128 -81 80 77
Characters (c-value, c-intvalue):
"2" 50
":" 58
"0" 48
"0" 48
" " 8239
"P" 80
"M" 77

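8239 (hex 202F) is the Unicode code point U+202F NARROW NO-BREAK SPACE. The bytes -30 -128 -81 are its UTF-8 encoding (0xE2 0x80 0xAF) printed as signed Java bytes.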
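A minimal sketch of a program that produces a dump like the one above. The class and method names (`DumpShortTime`, `describeChars`, `describeUtf8Bytes`) are illustrative and not taken from the original test code. It formats 14:00 UTC with `DateFormat.getTimeInstance(DateFormat.SHORT, Locale.US)` and prints the UTF-8 bytes and each char's int value; which separator appears depends on the locale data shipped with the JDK you run it on.

```java
import java.nio.charset.StandardCharsets;
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class DumpShortTime {

    // One line per UTF-16 char: the char in quotes, then its int value,
    // mirroring the c-value / c-intvalue dump above.
    static String describeChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append('"').append(c).append("\" ").append((int) c).append('\n');
        }
        return sb.toString();
    }

    // The UTF-8 bytes as signed Java bytes, space-separated.
    static String describeUtf8Bytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build 14:00 UTC so the result does not depend on the default time zone.
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar cal = Calendar.getInstance(utc, Locale.US);
        cal.clear();
        cal.set(2025, Calendar.MARCH, 7, 14, 0);

        DateFormat df = DateFormat.getTimeInstance(DateFormat.SHORT, Locale.US);
        df.setTimeZone(utc);
        String value = df.format(cal.getTime());

        System.out.println("value is \"" + value + "\"");
        System.out.println("Bytes: " + describeUtf8Bytes(value));
        System.out.print(describeChars(value));
    }
}
```

Feeding the helpers a string that contains U+202F reproduces the byte sequence in the post, since `describeUtf8Bytes` emits -30 -128 -81 for that character.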

I switched to the ICU routines to see what would happen, and they return what I would expect: the normal ASCII space (32).

I am unable to reproduce what I had from my code before updating it, but the bytes were something like:

50 0 58 0 48 0 48 0 47 32 0 80 0 77
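The list above is only a recollection, so this is speculative: if the pair `47 32` is read as one 16-bit unit with the low byte first (UTF-16LE), it is 0x202F, the same narrow no-break space. The leading `50 0 58 0 48 0 48 0` decodes to "2:00" the same way, although the trailing `0 80 0 77` is in the opposite byte order, so the recalled list does not decode cleanly as a whole. `DecodeRecalledBytes` and `decodeUtf16le` are hypothetical names used only for this check.

```java
import java.nio.charset.StandardCharsets;

public class DecodeRecalledBytes {

    // Interprets the bytes as UTF-16 little-endian (low byte first).
    static String decodeUtf16le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // The "47 32" pair from the recalled list.
        String pair = decodeUtf16le(new byte[] {47, 32});
        System.out.println(Integer.toHexString(pair.charAt(0))); // 202f

        // The leading "50 0 58 0 48 0 48 0" run.
        System.out.println(decodeUtf16le(new byte[] {50, 0, 58, 0, 48, 0, 48, 0})); // 2:00
    }
}
```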

I tried the above with Java 17 and 21 and got the same results. Why are we using unusual characters like Narrow No-Break Space? Shouldn't we use basic ASCII whenever possible?

This showed up while running Selenium tests on a UI, so the value went through many layers: DateFormat → JSP → XSLT → HTML → XPath find. Testing for specific values in the UI is difficult in these circumstances; I wouldn't even get a match on a character-by-character basis.
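One possible test-side workaround, assuming you control the comparison step in the Selenium test: map U+00A0 (no-break space) and U+202F (narrow no-break space) to an ASCII space on both sides before comparing. This only helps at the final comparison; it does not address the nulls seen earlier in the pipeline. `UiTextNormalizer`, `normalizeSpaces` and `sameUiText` are hypothetical helpers, not part of Selenium or the JDK.

```java
import java.util.regex.Pattern;

public class UiTextNormalizer {

    // U+00A0 NO-BREAK SPACE and U+202F NARROW NO-BREAK SPACE.
    private static final Pattern SPACE_LIKE = Pattern.compile("[\\u00A0\\u202F]");

    // Replaces each no-break space variant with a plain ASCII space.
    static String normalizeSpaces(String s) {
        return SPACE_LIKE.matcher(s).replaceAll(" ");
    }

    // Compares two UI strings after normalizing both sides.
    static boolean sameUiText(String expected, String actual) {
        return normalizeSpaces(expected).equals(normalizeSpaces(actual));
    }

    public static void main(String[] args) {
        String expected = "2:00 PM";
        String found = "2:00\u202FPM"; // what a newer JDK formatter may produce
        System.out.println(expected.equals(found));      // false
        System.out.println(sameUiText(expected, found)); // true
    }
}
```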

I don't see any suggestion anywhere that this is expected behavior. My assumption is that I'd get "hh:mm a" as the pattern, not "hh:mm" + Narrow No-Break Space + "a".
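To check which pattern the SHORT style actually resolves to, you can ask the formatter for it. This sketch assumes the returned `DateFormat` is a `SimpleDateFormat`, which holds for the JDK's built-in locale providers as far as I know (hence the `instanceof` guard). As far as I'm aware, JDK 20 moved to CLDR 42, whose English time formats place U+202F before the AM/PM marker, so on newer JDKs the escaped pattern should show `\u202F` between `mm` and `a`. An explicit pattern such as `"h:mm a"` keeps the ASCII space you type into it. `ShortTimePattern`, `shortTimePattern` and `escapeNonAscii` are illustrative names.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ShortTimePattern {

    // The pattern behind the locale's SHORT time style, with non-ASCII
    // characters escaped so invisible separators become visible.
    static String shortTimePattern(Locale locale) {
        DateFormat df = DateFormat.getTimeInstance(DateFormat.SHORT, locale);
        if (!(df instanceof SimpleDateFormat)) {
            return "(not a SimpleDateFormat: " + df.getClass().getName() + ")";
        }
        return escapeNonAscii(((SimpleDateFormat) df).toPattern());
    }

    // Replaces every char above 127 with a backslash-u escape in uppercase hex.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Output depends on the JDK's locale data (see the note above).
        System.out.println(shortTimePattern(Locale.US));

        // An explicit pattern uses exactly the characters written in it.
        SimpleDateFormat explicit = new SimpleDateFormat("h:mm a", Locale.US);
        explicit.setTimeZone(TimeZone.getTimeZone("UTC"));
        // 14 hours after the epoch is 14:00 UTC.
        System.out.println(explicit.format(new Date(14L * 60 * 60 * 1000))); // 2:00 PM
    }
}
```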

What I get is, of course, locale-specific, as intended. But this looks more like applying presentation to the formatted result, which I'd argue is wrong.
