How to slice web crawl data from PDFs?

997309, Jun 13 2013 (edited Jul 3 2013)

I've set up a web crawl on PDFs, but all the text is bundled into the endeca_document_text field. Has anyone been able to extract metrics and text separately from it? For example, the endeca_document_text field contains:

Example for OTN

Date 13/06/13

Comments This is my comment. blah blah.

Product Code GSF0120

What I am aiming for is for Endeca first to recognise 'Date', 'Comments' and 'Product Code' as separate fields, and then to pivot them by the document title, 'Example for OTN'. Is this possible at all?

Any ideas on where to start would be great.

Thank you!
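
To make the goal concrete, here is a minimal sketch in plain Python of the kind of split being asked for. It is not an Endeca feature or API. It assumes the bundled text can be processed outside Endeca (for example, before the records are loaded), that each label starts its own line, and that the set of labels is known in advance. The names `LABELS` and `split_fields` are hypothetical, introduced only for illustration.

```python
import re

# Hypothetical label list, taken from the example text in the question.
LABELS = ["Date", "Comments", "Product Code"]


def split_fields(text, labels=LABELS):
    """Split bundled document text into a title plus one value per label.

    Assumes every label appears at the start of a line and that the text
    before the first label is the document title.
    """
    # Try longer labels first so a label that is a prefix of another
    # label cannot shadow it.
    ordered = sorted(labels, key=len, reverse=True)
    alternation = "|".join(re.escape(label) for label in ordered)
    pattern = re.compile(rf"^({alternation})\b[ \t]*", re.MULTILINE)

    matches = list(pattern.finditer(text))
    title_end = matches[0].start() if matches else len(text)
    record = {"Title": text[:title_end].strip()}

    # Each value runs from the end of its label to the start of the next one.
    for i, match in enumerate(matches):
        value_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        record[match.group(1)] = text[match.end():value_end].strip()
    return record


sample = """Example for OTN

Date 13/06/13

Comments This is my comment. blah blah.

Product Code GSF0120
"""

record = split_fields(sample)
for name, value in record.items():
    print(f"{name}: {value}")
```

Each PDF then becomes one record keyed by its title, which is the shape needed for pivoting. A caveat: a comment that happens to contain a line beginning with one of the labels would be split incorrectly, so real documents may need stricter rules.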

This post has been answered by Branchbird - Pat on Jun 14 2013
Locked on Jul 31 2013
Added on Jun 13 2013