Java Programming

Reading text file in ASCII or UTF-8 or UTF-16 or UTF-32?

807589Dec 18 2008 — edited Dec 31 2008
The following code will include the UTF-8 byte-order-mark (EF BB BF) in the first line from the source file:

BufferedReader reader = new BufferedReader(new FileReader(sourceFile));
String firstLine = reader.readLine();

This isn't desirable. I don't want to get the UTF-8 BOM in the text contents that I get from the IO API.

Now, I can do something like this:
InputStreamReader reader = new InputStreamReader(new FileInputStream(sourceFile), "UTF8");

However, that assumes the program knows the encoding of the input file at design time. Unfortunately, my app takes files from users, who may supply them in UTF-8, UTF-16, ASCII, or some other text encoding. Doesn't Java have a simple file-reading API that auto-detects the text encoding, strips any byte-order mark, and returns a plain Java string containing just the actual file contents?
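
For illustration, here is a minimal sketch of BOM sniffing done by hand, assuming the file actually starts with a byte-order mark when it is UTF-8/16/32 and that a caller-supplied fallback charset is acceptable otherwise. The names BomSniffer, open and readAll are hypothetical helpers, not part of the JDK. A BOM check is not general encoding detection: files without a BOM are simply decoded with the fallback, and a UTF-16LE file whose first character is NUL would be mistaken for UTF-32LE.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: picks a charset from a leading BOM and skips the BOM bytes.
final class BomSniffer {

    static Reader open(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 4);
        byte[] head = new byte[4];
        int n = 0;
        // Read up to 4 bytes; a single read() call may return fewer than requested.
        while (n < 4) {
            int r = pin.read(head, n, 4 - n);
            if (r < 0) break;
            n += r;
        }

        Charset cs = fallback;
        int bomLen = 0;
        // UTF-32 is tested before UTF-16 because FF FE 00 00 also starts with FF FE.
        if (n >= 4 && at(head, 0) == 0x00 && at(head, 1) == 0x00
                && at(head, 2) == 0xFE && at(head, 3) == 0xFF) {
            cs = Charset.forName("UTF-32BE"); bomLen = 4;
        } else if (n >= 4 && at(head, 0) == 0xFF && at(head, 1) == 0xFE
                && at(head, 2) == 0x00 && at(head, 3) == 0x00) {
            cs = Charset.forName("UTF-32LE"); bomLen = 4;
        } else if (n >= 3 && at(head, 0) == 0xEF && at(head, 1) == 0xBB
                && at(head, 2) == 0xBF) {
            cs = StandardCharsets.UTF_8; bomLen = 3;
        } else if (n >= 2 && at(head, 0) == 0xFE && at(head, 1) == 0xFF) {
            cs = StandardCharsets.UTF_16BE; bomLen = 2;
        } else if (n >= 2 && at(head, 0) == 0xFF && at(head, 1) == 0xFE) {
            cs = StandardCharsets.UTF_16LE; bomLen = 2;
        }

        // Push back everything that was read past the BOM so the decoder sees it.
        if (n > bomLen) {
            pin.unread(head, bomLen, n - bomLen);
        }
        return new InputStreamReader(pin, cs);
    }

    // Unsigned value of the byte at index i.
    private static int at(byte[] b, int i) {
        return b[i] & 0xFF;
    }

    // Drains a Reader into a String and closes it.
    static String readAll(Reader r) throws IOException {
        try (Reader reader = r) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int k;
            while ((k = reader.read(buf)) != -1) {
                sb.append(buf, 0, k);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomSniffer.java <file>; falls back to the platform default charset.
        String text = readAll(open(new FileInputStream(args[0]), Charset.defaultCharset()));
        System.out.print(text);
    }
}
```

The reader returned by open can be wrapped in a BufferedReader, so the readLine() call from the first snippet should then return the first line without the mark. Detecting the encoding of BOM-less files would need a heuristic or a third-party detector, which this sketch does not attempt.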
Post Details
Locked on Jan 28 2009
Added on Dec 18 2008