Converting from CP1252 (Windows) to ISO 8859-1 doesn't work with java.nio?

807569, Jun 12 2006 (edited Jun 12 2006)
Hi

I'm trying to write some code that checks whether the bytes in an InputStream are valid in a given encoding. I'm using java.nio for that. For testing, I downloaded some character set samples from http://www.columbia.edu/kermit/csettables.html

When creating the CharsetDecoder, I configure it to report all errors:
    Charset charset = Charset.forName( encoding );
    CharsetDecoder decoder = charset.newDecoder();
    decoder.onMalformedInput( CodingErrorAction.REPORT );
    decoder.onUnmappableCharacter( CodingErrorAction.REPORT );
I then read the InputStream and try to decode it. If decoding fails, the stream can't be in the desired encoding:
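    // needs java.nio.charset.CoderResult in addition to the other imports
    boolean isWellEncoded = true;
    ByteBuffer inBuffer = ByteBuffer.allocate( 1024 );
    CharBuffer outBuffer = CharBuffer.allocate( 1024 );
    ReadableByteChannel channel = Channels.newChannel( inputStream );

    try
    {
      boolean endOfInput = false;
      while ( isWellEncoded && !endOfInput )
      {
        // read() returns -1 once the end of the stream is reached
        endOfInput = ( channel.read( inBuffer ) == -1 );
        inBuffer.flip();

        // The three-argument decode() keeps state between calls, so a
        // multi-byte sequence split across two reads is not reported as
        // malformed. The one-argument decode() treats every buffer as a
        // complete input.
        CoderResult result;
        do
        {
          result = decoder.decode( inBuffer, outBuffer, endOfInput );
          if ( result.isError() )
          {
            // malformed input or unmappable character
            isWellEncoded = false;
          }
          outBuffer.flip();
          if ( outBuffer.hasRemaining() )
          {
            LOG.debug( outBuffer.toString() );
          }
          outBuffer.clear();
        }
        while ( isWellEncoded && result.isOverflow() );

        // keep any incomplete trailing bytes for the next read
        inBuffer.compact();
      }
    }
    finally
    {
      channel.close();
    }

    return isWellEncoded;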
Now I want to check whether a file containing Windows-1252 characters is valid ISO-8859-1. As I understand it, the code above should fail when it reaches the Euro symbol (decimal 128), since that character is not defined in ISO-8859-1.
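
To narrow this down, here is a minimal standalone sketch that feeds the single byte 0x80 to strict decoders for three charsets. The class and method names (StrictDecodeDemo, decodeStrict) are made up for this illustration and are not part of my real code. As far as I can tell, Java's ISO-8859-1 decoder maps every byte value 0x00..0xFF straight to U+0000..U+00FF, so 0x80 comes out as the control character U+0080 and no error is reported. US-ASCII is included only to show that REPORT does fire for a charset with undefined bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class StrictDecodeDemo {

    /**
     * Decodes the bytes with a decoder that reports every error.
     * Returns the decoded text, or null if the decoder rejected the input.
     * (Hypothetical helper, for illustration only.)
     */
    static String decodeStrict(byte[] bytes, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // One-shot decode: the whole array is treated as a complete input.
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException ex) {
            return null;
        }
    }

    /** Formats the first char of the text as U+XXXX, or "rejected" for null. */
    static String describe(String text) {
        return text == null ? "rejected" : String.format("U+%04X", (int) text.charAt(0));
    }

    public static void main(String[] args) {
        byte[] input = { (byte) 0x80 }; // the Euro sign in Windows-1252

        System.out.println("ISO-8859-1:   " + describe(decodeStrict(input, "ISO-8859-1")));   // U+0080
        System.out.println("windows-1252: " + describe(decodeStrict(input, "windows-1252"))); // U+20AC
        System.out.println("US-ASCII:     " + describe(decodeStrict(input, "US-ASCII")));     // rejected
    }
}
```

If that is right, a strict ISO-8859-1 decoder can never fail on its own, and the check would need some additional rule on top of decoding.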

But all I get is a ? character instead:
(})  125  07/13  175  7D                 RIGHT CURLY BRACKET, RIGHT BRACE
(~)  126  07/14  176  7E                 TILDE
[?]  128  08/00  200  80  EURO SYMBOL
[?]  130  08/02  202  82  LOW 9 SINGLE QUOTE
I also tried to replace the faulty character, using
    decoder.onUnmappableCharacter( CodingErrorAction.REPLACE );
    decoder.replaceWith("!");
but I still get the question marks.
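
One possible explanation for the question marks, sketched below under assumptions: the decoder never hits an error, so the replacement string is never used, and the "?" only appears later when the decoded character U+0080 is written to a log or console whose encoding cannot represent it. The sketch imitates such a sink by encoding to US-ASCII; my real log encoding may differ. It also shows one heuristic for the original goal, which is to treat any C1 control character (U+0080..U+009F) in the decoded text as a sign that the input was not really ISO-8859-1. The names QuestionMarkDemo, roundTripThroughAscii and containsC1Control are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {

    /**
     * Encodes the text to US-ASCII and back, imitating an output sink
     * that cannot represent non-ASCII characters.
     */
    static String roundTripThroughAscii(String text) {
        // String.getBytes substitutes the charset's replacement byte ('?')
        // for every character it cannot encode.
        byte[] encoded = text.getBytes(StandardCharsets.US_ASCII);
        return new String(encoded, StandardCharsets.US_ASCII);
    }

    /** Returns true if the text contains a C1 control character (U+0080..U+009F). */
    static boolean containsC1Control(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Byte 0x80 decoded as ISO-8859-1 becomes the single char U+0080.
        String decoded = new String(new byte[] { (byte) 0x80 }, StandardCharsets.ISO_8859_1);

        System.out.println(roundTripThroughAscii(decoded)); // prints "?"
        System.out.println(containsC1Control(decoded));     // prints "true"
        System.out.println(containsC1Control("plain ~ }")); // prints "false"
    }
}
```

The C1 check is only a heuristic: a genuine ISO-8859-1 file may legally contain those control characters, although that is rare in ordinary text.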

I'm probably doing something fundamentally wrong, but I don't get it :-)

Any help is greatly appreciated!

Eric
Locked on Jul 10 2006
Added on Jun 12 2006