Reading Unicode data using HttpURLConnection

843790, Jun 29 2009 (edited Jun 29 2009)
Hi All,

I am trying to fetch some text from a web page that contains Unicode data.
This is what I am doing:

1. Opening an HttpURLConnection on the target URL.
2. Setting some properties on the HttpURLConnection object, such as setDoOutput, setRequestProperty, setRequestMethod, etc.
3. Getting the InputStream, wrapping it in a BufferedReader, and reading the text line by line (using BufferedReader.readLine) and even int by int (using BufferedReader.read).
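
For reference, the three steps above can be sketched roughly as follows. This is only a minimal sketch, not the original code: the class name PageFetcher, the method names, and the Accept-Charset header are assumptions made for illustration, and it assumes the server really sends the page in the charset passed in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;

// Hypothetical helper class; the names are illustrative only.
class PageFetcher {

    // Step 3: decode the byte stream with an explicit charset, line by line.
    // readLine() strips the line terminator, so a '\n' is appended after each line.
    static String readAll(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, cs))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Steps 1 and 2: open the connection, set a few request properties,
    // then hand the response stream to readAll.
    static String fetch(String address, Charset cs) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Charset", cs.name());
        try {
            return readAll(conn.getInputStream(), cs);
        } finally {
            conn.disconnect();
        }
    }
}
```

The charset given to InputStreamReader has to match the encoding the server actually used for the bytes; if it does not, ASCII characters usually still come through while everything else is decoded wrongly.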

I also tried passing various charsets when creating the InputStreamReader:
BufferedReader br = new BufferedReader(new InputStreamReader(urlConn.getInputStream(), cs));
where cs is a charset name string (UTF, UTF-8, UTF8, UTF-16, UTF16, etc.).
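
Rather than trying charset names one by one, one option is to take the charset from the response's Content-Type header. The sketch below shows that idea under some assumptions: the class CharsetSniffer and its method are hypothetical names, and it only looks at the HTTP header, not at a charset declared inside the HTML itself.

```java
import java.nio.charset.Charset;

// Hypothetical helper; not part of the JDK or of the original code.
class CharsetSniffer {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=UTF-8". Returns the fallback when the header is
    // missing, has no charset parameter, or names an unknown charset.
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive check for the "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

It could be used as `Charset cs = CharsetSniffer.charsetFrom(urlConn.getContentType(), StandardCharsets.UTF_8);` before creating the InputStreamReader. The UTF-8 fallback there is a guess, not something the server guarantees.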

The code then uses the data to build an RSS feed XML for any RSS reader.
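
The output side may matter as much as the input side: if the feed is written with the platform default encoding, text that was read correctly can still come out garbled. The following is a minimal sketch of writing the XML explicitly as UTF-8. The class RssFeedSketch, its method names, and the single-channel feed layout are assumptions for illustration, not the original code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical writer; names and feed layout are illustrative only.
class RssFeedSketch {

    // Writes a bare RSS 2.0 channel as UTF-8 and declares the same
    // encoding in the XML prolog so that readers decode it consistently.
    static void writeFeed(OutputStream out, String title, String link, String description)
            throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        w.write("<rss version=\"2.0\"><channel>");
        w.write("<title>" + escape(title) + "</title>");
        w.write("<link>" + escape(link) + "</link>");
        w.write("<description>" + escape(description) + "</description>");
        w.write("</channel></rss>\n");
        w.flush();
    }

    // Escapes the characters that are special in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The point of the sketch is that the encoding given to OutputStreamWriter and the encoding named in the XML declaration agree.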

What I observe is that only the English characters are read correctly; all the other Unicode characters come out garbled.

Any ideas, or is there another way to read Unicode characters from the stream?

By the way, I am trying to read Devanagari (Hindi) text, i.e. Indian-language Unicode text.
Locked on Jul 27 2009
Added on Jun 29 2009