On upgrading from Java 17 to 21, some tests started to fail with a confusing message: expected "2:00 PM" - found "2:00 PM".
While debugging this, I discovered that the date string I was testing contained nulls and other unexpected characters. I wrote a small piece of code to try to reproduce it. I couldn't reproduce it exactly, but I did get something unexpected. I was formatting with DateFormat.SHORT.
Using the JDK's own routines (java.text.DateFormat), I got the following bytes and characters for the above string:
value is "2:00 PM"
Bytes
50
58
48
48
-30
-128
-81
80
77
Characters
c-value is "2"
c-intvalue is 50
c-value is ":"
c-intvalue is 58
c-value is "0"
c-intvalue is 48
c-value is "0"
c-intvalue is 48
c-value is " "
c-intvalue is 8239
c-value is "P"
c-intvalue is 80
c-value is "M"
c-intvalue is 77
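8239 (U+202F) is the Unicode code point for Narrow No-Break Space. Its UTF-8 encoding is E2 80 AF, which is the -30 -128 -81 sequence in the byte list above.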
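For anyone who wants to try the same check, here is a minimal sketch of the kind of reproduction described above. It is not the original test code: the class name ShortTimeDump, the fixed date, and the choice of Locale.US are assumptions made for illustration. What it prints depends on the JDK version and the locale data in use, so no expected output is given.

```java
import java.nio.charset.StandardCharsets;
import java.text.DateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Locale;

// Hypothetical reproduction helper; names are illustrative only.
class ShortTimeDump {

    /** Formats 2:00 PM (default time zone) with the JDK's SHORT time style. */
    static String formatShortTime(Locale locale) {
        Date twoPm = new GregorianCalendar(2024, Calendar.JANUARY, 1, 14, 0).getTime();
        return DateFormat.getTimeInstance(DateFormat.SHORT, locale).format(twoPm);
    }

    /** Lists the signed UTF-8 bytes and the UTF-16 chars of a string. */
    static List<String> dump(String value) {
        List<String> lines = new ArrayList<>();
        lines.add("value is \"" + value + "\"");
        lines.add("Bytes");
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            lines.add(Byte.toString(b));
        }
        lines.add("Characters");
        for (char c : value.toCharArray()) {
            lines.add("c-value is \"" + c + "\"");
            lines.add("c-intvalue is " + (int) c);
        }
        return lines;
    }

    public static void main(String[] args) {
        dump(formatShortTime(Locale.US)).forEach(System.out::println);
    }
}
```

Printing the int value of each char is what makes the difference visible, since U+202F and an ordinary space look identical in most consoles and test reports.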
I switched to the ICU routines to see what would happen, and they return what I would expect: the normal ASCII space.
I am unable to reproduce what my code produced before I updated it, but the bytes were something like
“50 0 58 0 48 0 48 0 47 32 0 80 0 77”
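One possible reading of those bytes, offered as a guess rather than something confirmed in this post, is that the string was encoded as UTF-16 somewhere along the way. In UTF-16LE every ASCII character is followed by a zero byte, and U+202F becomes the pair 47 32 (0x2F 0x20). The sketch below prints the signed bytes of the same string in that encoding; the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Hypothetical helper for comparing encodings of the same string.
class Utf16Bytes {

    /** Returns the signed byte values of value in the given charset, space separated. */
    static String signedBytes(String value, Charset charset) {
        StringJoiner joiner = new StringJoiner(" ");
        for (byte b : value.getBytes(charset)) {
            joiner.add(Byte.toString(b));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String value = "2:00\u202FPM"; // 2:00, Narrow No-Break Space, PM
        // prints: 50 0 58 0 48 0 48 0 47 32 80 0 77 0
        System.out.println(signedBytes(value, StandardCharsets.UTF_16LE));
        // prints: 50 58 48 48 -30 -128 -81 80 77
        System.out.println(signedBytes(value, StandardCharsets.UTF_8));
    }
}
```

The UTF-16LE output is close to, but not identical to, the remembered sequence, which is consistent with it being quoted from memory.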
I tried the above with Java 17 and 21 and got the same results. Why are we using odd values like Narrow No-Break Space? Shouldn't we be using basic ASCII whenever possible? This showed up while running Selenium tests on a UI, so the value went through many layers: DateFormat → JSP → XSLT → HTML → XPath find. Testing for specific values in the UI is a little difficult in these circumstances. I wouldn't even get a match on a character-by-character basis.
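One way to make such UI assertions less brittle is to normalize no-break spaces on both sides before comparing. This is only a sketch of a workaround, not a fix for the underlying behavior; the helper name normalizeSpaces and the set of characters it replaces (U+00A0, U+2007, U+202F) are choices made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical test helper; not part of Selenium or the JDK.
class SpaceNormalizer {

    // No-Break Space, Figure Space, Narrow No-Break Space.
    private static final Pattern NO_BREAK_SPACES =
            Pattern.compile("[\\u00A0\\u2007\\u202F]");

    /** Replaces each no-break space variant with an ordinary ASCII space. */
    static String normalizeSpaces(String text) {
        return NO_BREAK_SPACES.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String fromUi = "2:00\u202FPM"; // as formatted with a Narrow No-Break Space
        System.out.println(normalizeSpaces(fromUi).equals("2:00 PM")); // prints: true
    }
}
```

Applying the same normalization to both the expected and the actual value keeps the comparison independent of which kind of space the formatter emits.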
I don't see any suggestion anywhere that this is expected behavior. My assumption is that "hh:mm a" is the pattern I'd get, not "hh:mm" + Narrow No-Break Space + "a".
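To check which pattern the SHORT style actually resolves to, and to compare it with an explicitly supplied pattern, something like the following can be used. It is a sketch under two assumptions: that the formatter returned by DateFormat.getTimeInstance is a SimpleDateFormat (common, but not guaranteed by the API), and that "h:mm a" with Locale.US is the pattern wanted. The class and method names are invented for the example.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for inspecting and pinning the time pattern.
class TimePatterns {

    /** Returns the pattern behind the SHORT time style, if it can be inspected. */
    static String shortTimePattern(Locale locale) {
        DateFormat format = DateFormat.getTimeInstance(DateFormat.SHORT, locale);
        if (format instanceof SimpleDateFormat) {
            return ((SimpleDateFormat) format).toPattern();
        }
        return "(not a SimpleDateFormat)";
    }

    /** Formats with an explicit pattern, so the separator is the literal space typed here. */
    static String formatWithExplicitPattern(LocalTime time) {
        return DateTimeFormatter.ofPattern("h:mm a", Locale.US).format(time);
    }

    public static void main(String[] args) {
        // Print each char of the pattern as an int so an invisible U+202F shows up as 8239.
        shortTimePattern(Locale.US).chars().forEach(c -> System.out.print(c + " "));
        System.out.println();
        System.out.println(formatWithExplicitPattern(LocalTime.of(14, 0)));
    }
}
```

With an explicit pattern the separator no longer comes from the locale data, at the cost of losing the locale-specific layout that the SHORT style provides.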
Also, what I get is obviously locale-specific, as intended. But this looks more like applying presentation to the result, which I'd assert is wrong.