Java Programming

please explain an interesting i/o issue regarding utf16 encoding

807580, Jun 26 2010 (edited Jul 3 2010)

hello,

input file contains 3 characters: 123
calling the method below with this input file produces an output file containing 12ÿý
the loop iterates twice.

input file contains 4 characters: 1234
calling the method below with this input file produces an output file containing 1234
the loop iterates twice.

in general, if the number of non-carriage-return (ncr), non-line-feed (nlf) bytes in the input file is odd, then the input and output files differ in the last character.
ÿý == U+FFFD ???
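
a quick way to check that guess is to encode U+FFFD as UTF-16BE and look at the resulting bytes as if they were ISO-8859-1 text. this is only a minimal sketch; the class and method names (ReplacementCharBytes, asLatin1) are made up for illustration and are not part of the method in question, and it assumes the viewer showing ÿý interprets the file as a single-byte Latin-1 style encoding.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharBytes {
    // Encodes the text as UTF-16BE, then reinterprets those raw bytes as ISO-8859-1,
    // which is roughly what a single-byte text viewer would display.
    public static String asLatin1(String text) {
        byte[] utf16be = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16be, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String shown = asLatin1("\uFFFD");
        // Print code points instead of raw characters to avoid console encoding surprises.
        for (char c : shown.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        // prints U+00FF then U+00FD, i.e. the characters ÿ and ý
    }
}
```

so the two bytes FF FD at the end of the output file would be consistent with a single U+FFFD written in UTF-16BE.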

it must be that with encoding UTF-16BE, the ncr, nlf bytes are read 2 at a time (each pair forming a single unicode char), and any leftover byte is replaced by U+FFFD.
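
to test this outside of the file handling, the same read loop can be run over an in-memory byte array. a minimal sketch, assuming the input files hold plain single-byte ASCII digits (so "123" is 3 bytes); the class and method names (OddByteDecode, decode) are hypothetical, and "UTF-16BE" is used as the charset in place of the UnicodeBigUnmarked alias.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class OddByteDecode {
    // Reads the bytes through an InputStreamReader, one char per read() call,
    // and collects every char the reader hands back.
    public static String decode(byte[] raw) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_16BE)) {
            int chr;
            while ((chr = reader.read()) != -1) {
                out.append((char) chr);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String text : new String[] {"123", "1234"}) {
            String decoded = decode(text.getBytes(StandardCharsets.US_ASCII));
            StringBuilder units = new StringBuilder();
            for (char c : decoded.toCharArray()) {
                units.append(String.format("U+%04X ", (int) c));
            }
            System.out.println(text + " -> " + units.toString().trim());
        }
    }
}
```

with 3 input bytes this should report two chars, the second being U+FFFD, which matches the loop iterating twice; with 4 input bytes both chars come from real byte pairs.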

how do i remedy this and still use UTF-16BE?

thank you!

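import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

// echo(String) is a logging helper defined elsewhere in the class (not shown here)
public void test(String input, String output)
{
   try
   {
      // "UnicodeBigUnmarked" is the historical java.io name for UTF-16BE (big-endian, no byte-order mark)
      FileInputStream   fistream      = new FileInputStream(input);
      InputStreamReader istreamReader = new InputStreamReader(fistream, "UnicodeBigUnmarked");
      BufferedReader    reader        = new BufferedReader(istreamReader);

      FileOutputStream     fostream = new FileOutputStream(output, false);
      BufferedOutputStream bostream = new BufferedOutputStream(fostream);
      BufferedWriter       writer   = new BufferedWriter(new OutputStreamWriter(bostream, "UnicodeBigUnmarked"));

      try
      {
         int chr;

         // read() returns one char (built from two input bytes) per call, or -1 at end of stream
         while((chr = reader.read()) != -1)
         {
            // show the char, its decimal value, and its hex value
            echo((char) chr + " == " + chr + " == 0x" + Integer.toHexString(chr));
            writer.write(chr);
         }
      }
      finally
      {
         // close the writer even if closing the reader throws
         try
         {
            reader.close();
         }
         finally
         {
            writer.close();
         }
      }
   }
   catch(IOException iox)
   {
      echo("test(): exceptions:" + iox.getMessage());
   }
}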


Locked on Jul 31 2010
