PDFBox Help!

807589, Jan 9 2009 (edited Jan 10 2009)

Hi,

My project uses PDFBox to extract and index text from PDF files. I have one PDF file whose text is not being extracted properly. The text in the PDF is:
Pære pære

and it is being extracted as:
P æ r e p æ r e

I have reported this to the open source PDFBox project but have not received a response yet. Has anyone else run into an issue like this? It appears to be related to locales. If so, do you have any recommendations on how to approach the problem from here? BTW, I have the PDFBox sources and have been trying to debug it.

Also, I can attach the PDF file or make it available somewhere if that's possible.
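
For the debugging side, one thing that may help narrow it down is dumping the code points of the extracted string, to see whether the gaps are ordinary spaces (U+0020) or some other separator such as a no-break space. This is only a minimal sketch using the standard library: the class and method names are made up for illustration, the string literals stand in for the text in the PDF and for whatever the extractor returns, and nothing here calls the PDFBox API.

```java
import java.util.StringJoiner;

// Hypothetical debugging helper, not part of PDFBox.
final class CodePointDump {

    // Render each code point of the text as U+XXXX, separated by single spaces.
    static String describe(String text) {
        StringJoiner out = new StringJoiner(" ");
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    // Count code points that Java treats as whitespace or as Unicode space
    // separators (the latter also covers the no-break space, U+00A0).
    static long countSeparators(String text) {
        return text.codePoints()
                .filter(cp -> Character.isWhitespace(cp) || Character.isSpaceChar(cp))
                .count();
    }

    public static void main(String[] args) {
        // Stand-ins for the text in the PDF and for the extracted text.
        // \u00E6 is the letter ae, escaped to avoid source-encoding issues.
        String expected = "P\u00E6re p\u00E6re";
        String extracted = "P \u00E6 r e p \u00E6 r e";

        System.out.println("expected:  " + describe(expected));
        System.out.println("extracted: " + describe(extracted));
        System.out.println("extra separators: "
                + (countSeparators(extracted) - countSeparators(expected)));
    }
}
```

If the dump shows plain U+0020 between every letter, the extra characters are probably being added during extraction, not carried in as unusual characters from the PDF itself, which would point at the spacing logic and away from character decoding.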