I've set up a web crawl over PDFs, but all of the text ends up bundled in the endeca_document_text field. Has anyone been able to extract metrics and text attributes separately from this field? For example, endeca_document_text contains:
Example for OTN
Date 13/06/13
Comments This is my comment. blah blah.
Product Code GSF0120
What I am aiming for is first for Endeca to recognise 'Date', 'Comments' and 'Product Code' as separate attributes, and then to pivot them by the document title, 'Example for OTN'. Is this at all possible?
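To make the goal concrete, here is a rough sketch in plain Python of the split I have in mind. It is not anything Endeca provides. It assumes the crawled text keeps one "label value" pair per line with the title on the first line, and the label list and the function name split_document_text are made up for illustration.

```python
import re

# Hypothetical label list, taken from the example document above.
LABELS = ["Date", "Comments", "Product Code"]

def split_document_text(text):
    """Split a text blob into a title plus one value per known label."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    # Assumption: the first non-empty line is the document title.
    record = {"Title": lines[0]}
    # Try longer labels first so one label cannot shadow another it prefixes.
    ordered = sorted(LABELS, key=len, reverse=True)
    pattern = re.compile(
        r"^(%s)\s+(.*)$" % "|".join(re.escape(label) for label in ordered)
    )
    for line in lines[1:]:
        match = pattern.match(line)
        if match:
            record[match.group(1)] = match.group(2)
    return record

sample = """Example for OTN
Date 13/06/13
Comments This is my comment. blah blah.
Product Code GSF0120"""

record = split_document_text(sample)
for name, value in record.items():
    print(name, "=", value)
```

Each key in the resulting record would ideally become its own attribute on the Endeca record, keyed by the title. If the PDF extraction does not preserve line breaks, the line-based split above would not work as written.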
Any ideas on where to start would be great.
Thank you!