Java Programming

PDFBox Help!

807589, Jan 9 2009 (edited Jan 10 2009)
Hi,

My project uses PDFBox to extract and index text from PDF files. I have one PDF file whose text is not being extracted correctly. The text in the PDF is:
Pære pære

and is being parsed out as:
P æ r e p æ r e

I have notified the open source PDFBox project, but have had no response yet. Has anyone else run into an issue like this (it appears to be related to locales)? If so, do you have any recommendations on how to approach the problem from here? BTW, I have the sources for PDFBox and have been trying to debug it.
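
To help pin down what the extra separators actually are, here is a minimal sketch that lists the code points of an extracted string. It uses only the JDK and does not call PDFBox. The class name `CodePointDump` and the method `dump` are hypothetical names made up for this illustration, and the sample string is typed in by hand to mirror the start of the output above, not read from the real file.

```java
import java.util.StringJoiner;

// Hypothetical debugging helper; not part of PDFBox.
class CodePointDump {

    // Formats every code point in the text as U+XXXX, separated by single spaces.
    static String dump(String text) {
        StringJoiner out = new StringJoiner(" ");
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample: the first four letters of the bad output, with the
        // non-ASCII letter written as an escape so source encoding does not matter.
        String extracted = "P \u00E6 r e";
        System.out.println(dump(extracted));
    }
}
```

If the separators come out as U+0020, the extractor is probably inserting ordinary spaces between glyphs. Any other value would point at the font or encoding data in the file rather than at word-spacing logic.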

Also, I can attach the PDF file or make it available somewhere if that's possible.
Locked on Feb 7 2009
Added on Jan 9 2009