Chinese text, Unicode and class String.

807598, Aug 17 2006 (edited Oct 18 2006)

I've got a problem with Chinese text in Java (I'm using Java 5). The problem is very simple, but the solution is eluding me. I've read up on Unicode and Java's use of it, but I can't resolve the issue. If anyone has encountered it and has a solution, I would be really grateful.

The problem is best illustrated by the following code:

public class JinTian
{
	public static void main(String[] args)
	{
		String jinTian = "今天" ;
		System.out.println(jinTian.length() ) ;
	}
}

When I compile and run this code, the output is 6, but as can be seen the String contains two characters, so I expected a length of 2.
The source file for the class is saved as UTF-8, which I need in order to save Chinese characters, and I thought that Java supported UTF-8.
My guess is that each character is encoded as three bytes and that String.length() is counting those bytes.
If so, I would need a class for a UTF-8 string that can distinguish between different character encodings. I thought class String already did that, so I am confused.
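
A minimal sketch for testing that guess, assuming (not confirmed for this setup) that javac read the UTF-8 source file using a single-byte platform default encoding. String.length() counts UTF-16 char values rather than bytes, so a correctly decoded two-character literal should report 2. The class name JinTianCheck and the helpers utf8ByteCount and misread are hypothetical names used only for illustration, and ISO-8859-1 merely stands in for whatever default encoding the compiler actually used.

```java
import java.io.UnsupportedEncodingException;

public class JinTianCheck
{
	// Number of bytes the text occupies once encoded as UTF-8.
	static int utf8ByteCount(String s) throws UnsupportedEncodingException
	{
		return s.getBytes("UTF-8").length ;
	}

	// Hypothetical helper: decodes the UTF-8 bytes as ISO-8859-1 to imitate
	// a compiler that assumes a single-byte encoding (an assumption, not a
	// confirmed diagnosis of the original problem).
	static String misread(String s) throws UnsupportedEncodingException
	{
		return new String(s.getBytes("UTF-8"), "ISO-8859-1") ;
	}

	public static void main(String[] args) throws UnsupportedEncodingException
	{
		// Unicode escapes keep the source pure ASCII, so the result does
		// not depend on the encoding javac assumes for the file.
		String jinTian = "\u4eca\u5929" ;
		System.out.println(jinTian.length() ) ;          // prints 2
		System.out.println(utf8ByteCount(jinTian) ) ;    // prints 6
		System.out.println(misread(jinTian).length() ) ; // prints 6
	}
}
```

If the misread length matches the 6 seen above, compiling the original file with javac -encoding UTF-8 JinTian.java may be worth trying, since that option tells the compiler which encoding the source file uses.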

Message was edited by:
stanton_ian


Locked on Nov 15 2006
