PDFBox Help!
807589, Jan 9 2009 (edited Jan 10 2009)
Hi,
My project uses PDFBox to extract and index text from PDF files. I have one PDF file whose text is not being extracted properly. The text within the PDF is:
Pære pære
and is being parsed out as:
P æ r e p æ r e
I have notified the open source PDFBox project, but have had no response yet. I'm wondering whether anyone else has run into an issue like this (it appears to be related to locales) and, if so, whether they have any recommendations on how to approach the problem from here. BTW, I have the PDFBox sources and have been trying to debug it.
Also, I can attach the PDF file or make it available somewhere if that's possible.
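For anyone debugging something similar, one low-risk first check (a sketch, not part of PDFBox) is to dump the code points of the string the extractor returns. That shows whether the inserted separators are ordinary spaces (U+0020) or some other character such as a no-break space, and whether æ arrives as the single code point U+00E6. The class name CodePointDump and the sample input are hypothetical; paste in the text your extraction actually returned.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointDump {
    // Returns one entry per code point, formatted as "U+XXXX UNICODE NAME".
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> {
            // Character.getName returns null for unassigned code points.
            String name = Character.getName(cp);
            out.add(String.format("U+%04X %s", cp, name == null ? "<unassigned>" : name));
        });
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample: replace with the string returned by the text extractor.
        String extracted = "P \u00e6 r e";
        for (String line : describe(extracted)) {
            System.out.println(line);
        }
    }
}
```

If every separator turns out to be a plain U+0020, the spaces are most likely being inserted by the extractor's word-spacing logic rather than coming from an encoding or locale mix-up in the characters themselves.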