Reading text file in ASCII or UTF-8 or UTF-16 or UTF-32?
807589 Dec 18 2008 (edited Dec 31 2008)
The following code will include the UTF-8 byte-order-mark (EF BB BF) in the first line from the source file:
BufferedReader reader = new BufferedReader(new FileReader(sourceFile));
String firstLine = reader.readLine();
This isn't desirable. I don't want to get the UTF-8 BOM in the text contents that I get from the IO API.
Now, I can do something like this:
InputStreamReader reader = new InputStreamReader(new FileInputStream(sourceFile), "UTF8");
However, that assumes the program knows the encoding of the input file at design time. Unfortunately, my app takes files from users, who may supply them in UTF-8, UTF-16, ASCII, or some other text encoding. Doesn't Java have a simple file-reading API that auto-detects the text encoding, strips out any BOM, and returns me a plain Java String containing just the actual file contents?
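For illustration, here is a minimal sketch of what such a helper might look like if it relied only on the BOM. The class and method names (BomDecoder, decode, readFile) are hypothetical, not part of the JDK. It assumes that a file without a BOM is UTF-8 (which also covers plain ASCII), and that the runtime provides the UTF-32BE and UTF-32LE charsets, which common JDKs do but the specification does not require. A BOM check cannot identify legacy encodings such as ISO-8859-1, so this is not true auto-detection.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: picks a charset from the leading BOM and drops the BOM.
public class BomDecoder {

    /** Decodes the bytes using the charset implied by a leading BOM, if any. */
    public static String decode(byte[] data) {
        Charset charset = StandardCharsets.UTF_8; // assumption: no BOM means UTF-8
        int skip = 0;                             // number of BOM bytes to drop

        // The UTF-32 checks come first because the UTF-32LE BOM (FF FE 00 00)
        // starts with the UTF-16LE BOM (FF FE).
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) {
            charset = Charset.forName("UTF-32BE"); // assumed available in this JDK
            skip = 4;
        } else if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) {
            charset = Charset.forName("UTF-32LE"); // assumed available in this JDK
            skip = 4;
        } else if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            skip = 3;                              // UTF-8 with BOM
        } else if (startsWith(data, 0xFE, 0xFF)) {
            charset = StandardCharsets.UTF_16BE;
            skip = 2;
        } else if (startsWith(data, 0xFF, 0xFE)) {
            charset = StandardCharsets.UTF_16LE;
            skip = 2;
        }
        return new String(data, skip, data.length - skip, charset);
    }

    /** Returns true if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the whole file into memory and decodes it; fine for small files only. */
    public static String readFile(String path) throws IOException {
        return decode(Files.readAllBytes(Paths.get(path)));
    }
}
```

With this sketch, String contents = BomDecoder.readFile(sourceFile.getPath()); would return the text without the BOM. For large files, the same BOM check could be done on the first four bytes of a PushbackInputStream before wrapping it in an InputStreamReader.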