Decoding HTML Characters

843789, Nov 3 2009 (edited Nov 5 2009)
A while ago I wrote a Java application that sits on my college server and, every 15 minutes, parses an RSS feed and creates a duplicate of that feed with the full body content. I did this so that I can read the feed offline.

Everything works well except for one issue, which I'm sure lies in decoding the stream. Most characters come out fine, but certain special characters come out as junk. For example:
So again, I'll listen to some Utada Hikaru to begin with and instantly I know something's
gets decoded to
So again, Iâ??ll listen to some Utada Hikaru to begin with and instantly I know somethingâ??s
So the apostrophe is obviously throwing off the decoding. (In the feed itself the character is not the straight ' shown above but a curly version of it; the forum's formatting appears to be cleaning it up.)
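
The junk is consistent with a curly apostrophe (U+2019) that was encoded as UTF-8 and then decoded with a single-byte charset. Here is a minimal sketch that reproduces the effect in isolation. The choice of UTF-8 and ISO-8859-1 is an assumption, since the feed's actual encoding has not been confirmed, and `MojibakeDemo` and `misdecode` are names made up for illustration (it uses `StandardCharsets`, so Java 7 or later).

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    // Assumption: this mirrors what happens to the feed; not confirmed.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "I\u2019ll"; // U+2019 is the curly apostrophe
        String garbled = misdecode(original);

        // The single apostrophe becomes three characters after the bad decode.
        System.out.println(original.length() + " chars in, " + garbled.length() + " chars out");

        // Print each resulting code point so the undisplayable ones are visible.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The apostrophe's three UTF-8 bytes turn into â followed by two control characters that most fonts cannot display, which would plausibly account for the "â??" in the output above.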

So I would like to know how to decode the stream correctly. I tried using InputStreamReader and setting the CharsetDecoder, but to no avail, though of course I may not have done this correctly.
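
For reference, this is roughly what reading with an explicit charset looks like. It is only a sketch, assuming the feed is UTF-8: `FeedDecoder` and `readAll` are hypothetical names, and the in-memory byte array stands in for the real feed stream (Java 8 or later, because of `UncheckedIOException`).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FeedDecoder {
    // Read the whole stream as text using the given charset instead of the
    // platform default, which is what InputStreamReader falls back to
    // when no charset is passed.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the feed: UTF-8 bytes containing a curly apostrophe.
        byte[] feedBytes = "So again, I\u2019ll listen".getBytes(StandardCharsets.UTF_8);

        String text = readAll(new ByteArrayInputStream(feedBytes), StandardCharsets.UTF_8);
        System.out.println("Round trip intact: " + text.equals("So again, I\u2019ll listen"));
    }
}
```

If the feed declares a different encoding in its XML prolog or HTTP Content-Type header, that charset would need to be passed instead. The duplicate feed also has to be written out with a matching charset (for example through an `OutputStreamWriter` constructed with the same `Charset`), otherwise the same corruption can reappear on output.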

Thanks, Ger.

Edited by: Ger@newToProgramming on Nov 3, 2009 4:04 AM
Locked on Dec 3 2009
Added on Nov 3 2009