Hi,
On MontaVista Linux, I'm observing that java.io.File cannot manipulate files (read them, write them, or list them under the correct name) if the filename is encoded in ISO 8859-1 (Latin-1) and contains special characters (such as 'é', Unicode U+00E9, Latin-1 value 0xE9). I have tip-toed into the forbidden domain of Sun's proprietary properties and tried to force the encoding used for filenames with the property "sun.jnu.encoding" (going by the native code included with JDK 1.5), but it seems to be totally ignored by my JDK, which is 1.6.
What I observe when creating a file is that most special characters get replaced by the character '?' (value 0x3f). So the output of the enclosed application is:
ls
default-d??b default-d?b iso8859_1-d??b iso8859_1-d?b utf_8-d??b utf_8-d?b
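The '?' (0x3f) substitution is what one would expect if the filename were being encoded with an ASCII-only charset before reaching the file system. The following is a minimal sketch of that effect only, not of the JVM's internal filename handling; the class name EncodingBytes and the helper hex are made up for illustration. It encodes the same string with three charsets and prints the resulting bytes.

```java
import java.nio.charset.Charset;

public class EncodingBytes {
    // Encode a string with the given charset and render the bytes as hex.
    // Characters the charset cannot represent are replaced by the charset's
    // replacement byte, which is '?' (0x3f) for US-ASCII.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "d\u00E9b"; // 'd', LATIN SMALL LETTER E WITH ACUTE, 'b'
        for (String cs : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            System.out.println(cs + ": " + hex(name, Charset.forName(cs)));
        }
        // Prints:
        // US-ASCII: 64 3f 62
        // ISO-8859-1: 64 e9 62
        // UTF-8: 64 c3 a9 62
    }
}
```

The US-ASCII line reproduces the "d?b" pattern seen in the listing, while only the ISO-8859-1 line produces the single byte 0xe9 that a Latin-1 filename would contain.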
I have also included the output of `locale` below, but the locale is not something I can really change.
I'd appreciate any pointers to make Java read ISO 8859-1 encoded filenames.
import java.lang.*;
import java.io.*;
import java.util.*;
import java.nio.charset.*;

class FilenameEncoding
{
    protected static final String FILE_ENCODING = "file.encoding";
    protected static final String SUN_JNU_ENCODING = "sun.jnu.encoding";
    protected static final String ISO88591 = "ISO8859_1";

    protected static void makeFile(String fileName)
    {
        System.out.println("Creating file " + fileName);
        File aFile = null;
        try {
            aFile = new File(fileName);
            aFile.createNewFile();
        } catch (IOException exp) {
            exp.printStackTrace();
        }
    }

    public static void createFile(String prefix)
    {
        // Create files with:
        // Unicode Character 'LATIN SMALL LETTER E WITH GRAVE' (U+00E8)
        // Unicode Character 'LATIN SMALL LETTER E WITH ACUTE' (U+00E9)
        makeFile(prefix + "d\u00E9\u00E8b");
        makeFile(prefix + "d\u00E9b");
        makeFile(prefix + "d\u00E8b");
    }

    public static void main(String[] args)
    {
        // Display some current defaults
        System.out.println("Default charset is " + Charset.defaultCharset().displayName());
        System.out.println("The current " + SUN_JNU_ENCODING + " is " + System.getProperty(SUN_JNU_ENCODING));
        System.out.println("The current " + FILE_ENCODING + " is " + System.getProperty(FILE_ENCODING));

        // Uses the default
        FilenameEncoding.createFile("default-");

        // Uses ISO 8859-1
        System.setProperty(SUN_JNU_ENCODING, ISO88591);
        System.setProperty(FILE_ENCODING, ISO88591);
        FilenameEncoding.createFile("iso8859_1-");

        // Uses UTF-8
        System.setProperty(SUN_JNU_ENCODING, "UTF_8");
        System.setProperty(FILE_ENCODING, "UTF_8");
        FilenameEncoding.createFile("utf_8-");
        return;
    }
}
For me, calling the program with "-Dfile.encoding=ISO8859-1" removed the ? from the screen output of the program and correctly displayed the é (e acute) and è (e grave), and changed the default Charset used, but, as expected, no changes occurred on the file system.
Typical output for me when called without any properties set:
Default charset is US-ASCII
The current sun.jnu.encoding is ANSI_X3.4-1968
The current file.encoding is ANSI_X3.4-1968
Creating file default-d??b
Creating file default-d?b
Creating file default-d?b
Creating file iso8859_1-d??b
Creating file iso8859_1-d?b
Creating file iso8859_1-d?b
Creating file utf_8-d??b
Creating file utf_8-d?b
Creating file utf_8-d?b
---Misc information---
java -version
java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
Java HotSpot(TM) Client VM (build 1.6.0_01-b06, mixed mode)
MontaVista(R) Linux(R) Professional Edition 3.1
locale
LANG=POSIX
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
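Since the reported sun.jnu.encoding (ANSI_X3.4-1968) matches the POSIX locale shown above, one experiment that might be worth trying is to start a child JVM with LC_ALL overridden for that process only, leaving the system locale untouched. The sketch below rests on several assumptions: that the JVM derives sun.jnu.encoding from the process locale at startup, that a locale named en_US.ISO-8859-1 is actually installed (check `locale -a`; the name is a guess), and that FilenameEncoding.class is in the current directory. The class name LocaleLauncher and the helper build are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;

public class LocaleLauncher {
    // Build (but do not start) a child JVM command whose environment
    // overrides LC_ALL. Only the child process sees the override.
    static ProcessBuilder build(String locale, String mainClass) {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList("java", "-cp", ".", mainClass));
        pb.environment().put("LC_ALL", locale);
        pb.redirectErrorStream(true); // merge stderr into stdout
        return pb;
    }

    public static void main(String[] args) throws Exception {
        // Assumed locale name; it must exist on the target system.
        ProcessBuilder pb = build("en_US.ISO-8859-1", "FilenameEncoding");
        System.out.println("LC_ALL=" + pb.environment().get("LC_ALL") + " " + pb.command());

        // Dry run unless "run" is passed as the first argument.
        if (args.length == 0 || !args[0].equals("run")) {
            return;
        }
        Process p = pb.start();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = r.readLine()) != null) {
            System.out.println(line); // relay the child's output
        }
        System.out.println("Child exit code: " + p.waitFor());
    }
}
```

If the assumptions hold, the child's "The current sun.jnu.encoding is ..." line should change; if it still reports ANSI_X3.4-1968, the locale is probably not installed or not honoured on this platform. Running `LC_ALL=en_US.ISO-8859-1 java FilenameEncoding` from a shell would be the equivalent manual test.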
Thanks
/Philippe