I've got a problem with Chinese text in Java (I'm using Java 5). The problem is very simple, but the solution is eluding me. I've read up on Unicode and how Java uses it, but I can't resolve the issue. If anyone has run into this and has a solution, I would be really grateful.
The problem is best illustrated by the following code:
public class JinTian
{
    public static void main(String[] args)
    {
        String jinTian = "今天";
        System.out.println(jinTian.length());
    }
}
When I compile and run this code, the output is 6. As can be seen, this String contains only two characters, so I expect a length of 2.
The source file for the class is saved as UTF-8, which I need in order to save files containing Chinese characters. I also thought that Java supported UTF-8.
My educated guess is that each character is encoded as three bytes in UTF-8, and that String.length() is counting those bytes rather than the characters.
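One way to test this guess is a small sketch like the following (the class name ByteCountCheck and its methods are hypothetical). It writes the two characters of 今天 as Unicode escapes so the file compiles the same way whatever encoding javac assumes, counts the UTF-8 bytes, and then decodes those bytes with ISO-8859-1, which stands in for any single-byte encoding that turns every byte into its own char.

```java
import java.io.UnsupportedEncodingException;

public class ByteCountCheck
{
    // U+4ECA and U+5929, the two characters of the original literal,
    // written as Unicode escapes so this source file is plain ASCII.
    static final String JIN_TIAN = "\u4eca\u5929";

    // Number of bytes needed to store the string as UTF-8.
    static int utf8ByteCount(String s)
    {
        try
        {
            return s.getBytes("UTF-8").length;
        }
        catch (UnsupportedEncodingException e)
        {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Simulates reading UTF-8 bytes with a single-byte encoding:
    // ISO-8859-1 maps each byte to exactly one char.
    static int lengthWhenMisread(String s)
    {
        try
        {
            return new String(s.getBytes("UTF-8"), "ISO-8859-1").length();
        }
        catch (UnsupportedEncodingException e)
        {
            // ISO-8859-1 is also a required charset.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(JIN_TIAN.length());            // 2 chars
        System.out.println(utf8ByteCount(JIN_TIAN));      // 6 bytes
        System.out.println(lengthWhenMisread(JIN_TIAN));  // 6 chars
    }
}
```

If the last two numbers match the 6 printed by JinTian, that is consistent with the bytes being turned into separate chars somewhere before length() is called, rather than length() itself counting bytes.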
So it seems I would need a class for UTF-8 strings that can tell characters apart from the bytes that encode them. I thought that was exactly what class String did, so I'm confused.
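For comparison with JinTian, here is a minimal sketch that assumes the 6 comes from javac reading the UTF-8 source file with a different platform default encoding (for example windows-1252), not from String itself. Written with Unicode escapes, the same two characters no longer depend on the file's encoding, and String reports two characters. The class name JinTianEscaped is hypothetical.

```java
public class JinTianEscaped
{
    // U+4ECA and U+5929, the two characters of the original literal,
    // written as Unicode escapes so this source file is plain ASCII.
    static final String JIN_TIAN = "\u4eca\u5929";

    public static void main(String[] args)
    {
        // length() counts UTF-16 chars; each of these characters fits in one char.
        System.out.println(JIN_TIAN.length());                              // 2
        // codePointCount() counts Unicode code points (available since Java 5).
        System.out.println(JIN_TIAN.codePointCount(0, JIN_TIAN.length()));  // 2
    }
}
```

Under the same assumption, the original JinTian.java should also print 2 if it is compiled with javac's -encoding option, e.g. javac -encoding UTF-8 JinTian.java.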