
Java Programming


UTF-8 encoding/decoding problem

807606 · Mar 31 2007, edited Apr 1 2007
Please point me to where I'm wrong.
When I encode a string using the UTF-8 charset, the resulting byte array contains one extra byte (0x00), and when I decode that array I get a string with an unexpected character at the end.
        Charset charset = Charset.forName("UTF-8");
        CharsetEncoder encoder = charset.newEncoder();
        CharsetDecoder decoder = charset.newDecoder();
        
        byte[] digits;
        
        CharBuffer cb = CharBuffer.wrap("0123456789");
        print("CharBuffer: " + cb.toString());
        ByteBuffer bb = encoder.encode(cb);
        digits = bb.array();
        print("Encoded array " + digits.length + " bytes: " + Arrays.toString(digits));
        
        bb = ByteBuffer.wrap(digits);
        cb = decoder.decode(bb);
        print("Decoded digits: " + cb.toString());
Debug output:
CharBuffer: 0123456789
Encoded array 11 bytes: [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 0]
Decoded digits: 0123456789<0x00 here>
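The extra byte comes from `ByteBuffer.array()`, which returns the entire backing array, not just the encoded bytes. `CharsetEncoder.encode(CharBuffer)` sizes its output buffer from the charset's average bytes-per-char (1.1 for UTF-8, so 10 chars allocate 11 bytes), and the unused capacity shows up as trailing 0x00. A minimal sketch of the fix, copying only the valid region between `position()` and `limit()`:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

public class Utf8EncodeDemo {
    public static void main(String[] args) throws Exception {
        Charset charset = Charset.forName("UTF-8");
        CharsetEncoder encoder = charset.newEncoder();

        ByteBuffer bb = encoder.encode(CharBuffer.wrap("0123456789"));

        // bb.array() would return the whole backing array (capacity 11 here);
        // instead, copy only the bb.remaining() bytes between position and limit.
        byte[] digits = new byte[bb.remaining()];
        bb.get(digits);

        System.out.println("Encoded " + digits.length + " bytes: "
                + Arrays.toString(digits));
        // Encoded 10 bytes: [48, 49, 50, 51, 52, 53, 54, 55, 56, 57]
    }
}
```

Alternatively, for simple cases `"0123456789".getBytes(StandardCharsets.UTF_8)` already returns an exactly sized array.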
Thanks in advance.
Comments
This post is locked; new comments cannot be posted.
Post Details
Locked on Apr 29 2007
Added on Mar 31 2007
4 comments
348 views