How to write Unicode characters to a file
932405, Apr 22 2012 (edited Apr 23 2012)
I have PDF files that I need to copy some strings out of and put into various fields in a Postgres database. The goal is a Java front end to the database: I mark and copy text in the PDF, paste it into a field in a Swing window, and save it from there to the database.
I have been unable to read the PDF files directly, so I have opted to cut and paste their contents into an MS Word file. This corrupts certain Unicode characters. I am trying to repair them with a simple program, the start of which is below, that replaces each erroneous char with the proper Unicode character. As the output comments in the code show, I cannot figure out how to write a Unicode character to a file. Do I need to wrap the writer in something (which I don't know much about yet)? Or do I have a file problem? (I have a Vista machine.) It should not be impossible to write Unicode to a file, since I can write phonological symbols, Russian, and Sanskrit into MS Word files. So the problem must be in the Java.
P.S.: I am reading Schildt's Java: A Beginner's Guide and am through chapter 10, but the remaining chapters are on threads, enumerations, autoboxing, static import, and annotations; generics; applets, events, and miscellaneous topics; and, finally, Swing. Maybe it's in the autoboxing?
Any help would be most appreciated.
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;
import java.io.IOException;

public class CopyCharacters {
    public static void main(String[] args) throws IOException {
        FileReader inputStream = null;
        //FileWriter outputStream = null;
        PrintWriter outputStream = null;
        char longa = 0x0101;
        int longc = 0x0101;
        char capA = 0x0041;
        char longb = 0x0111;
        // Unicode escape for the uppercase Greek letter kappa
        // (uppercase omega would be '\u03A9')
        char uniChar = '\u039A';
        // Character ca = new Character('0x0101'); // illegal
        Character cb = new Character('\u0101');
        Character cc = '\u0101';
        int c;
        try {
            inputStream = new FileReader("Cardona1.txt");
            outputStream = new PrintWriter(new FileWriter("characteroutput.txt"));
            outputStream.println("character1 " + capA);   // yields A
            outputStream.println("character2 " + longa);  // yields ?
            outputStream.println("character3 " + '\u0101'); // yields ?
            outputStream.println("character4 " + longc);  // yields 257
            outputStream.println("character5 " + "S\u00ED Se\u00F1or");   // yields Sí Señor
            outputStream.println("character6 " + "S'\u00ED' Se\u00F1or"); // yields S'í' Señor
            outputStream.println("character7 " + "S\u0121 Se\u00F1or");   // yields S? Señor
            outputStream.println("character8 " + "S'\u0121' Se\u00F1or"); // yields S'?' Señor
            outputStream.println("character9 " + uniChar);  // yields ?
            outputStream.println("character10 " + '\u00FF'); // yields ÿ but fails on \u0100.
            // only 0-255!!
            outputStream.println("character11 " + cc); // yields ?
            outputStream.println("character12 " + cb); // yields ?
            outputStream.println("character13 ?");     // yields ?
            while ((c = inputStream.read()) != -1) {
                // outputStream.writeln(c);- error
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}
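
For comparison, here is a minimal sketch of the "wrapping" idea asked about above. It assumes the cause is that FileWriter encodes with the platform default character encoding, which on a Western-locale Vista machine is probably windows-1252 and has no code for characters such as \u0101. That is a guess about this machine, not something the output above proves. Wrapping a FileOutputStream in an OutputStreamWriter lets the program name the encoding itself. The class name WriteUtf8Sketch, the helper methods, and the output file name are made up for illustration, and StandardCharsets needs Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the original program.
public class WriteUtf8Sketch {

    // Writes each string on its own line, encoding the file as UTF-8
    // instead of relying on the platform default encoding.
    static void writeUtf8(String fileName, String... lines) throws IOException {
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(fileName), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    // Reads the first line back, decoding the bytes as UTF-8.
    static String readFirstLineUtf8(String fileName) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        char longa = '\u0101';   // a with macron
        char uniChar = '\u039A'; // Greek capital kappa
        // Assumed output file name, chosen for this sketch only.
        String fileName = "characteroutput-utf8.txt";
        writeUtf8(fileName, "character2 " + longa, "character9 " + uniChar);

        // Round-trip check: the last char of the first line should be longa.
        String first = readFirstLineUtf8(fileName);
        System.out.println(first.charAt(first.length() - 1) == longa); // prints true
    }
}
```

The resulting file only looks right in a viewer that decodes it as UTF-8. The input side would likely need the same treatment, an InputStreamReader with an explicit encoding in place of FileReader, with the encoding chosen to match however Cardona1.txt was actually saved.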