Loading MS Word Documents for Conversion to DB Records?
443706, Jan 28 2009 (edited Feb 11 2009)
Here's what I'm working with:
Oracle 10.2.0.3.x (with the Companion CD installed)
Fedora Core 7
4 dual-core processors
16G RAM
For the longest time, our intranet has accepted uploaded Word files, converted them to PDFs, and associated each file with a database record that stores the file's location on disk, the document name, the owner, and a few other details. The files are typically well formatted, each having a line for the title, effective date, author, owner, etc.
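Because each file carries its title, effective date, author and owner on their own lines, those fields can likely be picked out with simple pattern matching once a document's plain text is available. That text could come from Oracle Text's filtering (for example CTX_DOC.FILTER). This sketch assumes the text has already been extracted. The labels in FIELD_LABELS, the "Label: value" layout and the MM/DD/YYYY date format are all assumptions about how the files look, and strptime raises ValueError for any other date format.

```python
import re
from datetime import datetime

# Hypothetical header labels; the real documents may word these differently.
FIELD_LABELS = {
    "title": "Title",
    "effective_date": "Effective Date",
    "author": "Author",
    "owner": "Owner",
}

# Matches lines such as "Effective Date: 01/15/2009" (label, colon, value).
HEADER_LINE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$")


def parse_header(text: str) -> dict:
    """Extract labeled header fields from a document's plain text."""
    # Map lowercased label text back to our field keys (case-insensitive match).
    wanted = {label.lower(): key for key, label in FIELD_LABELS.items()}
    fields = {}
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if not match:
            continue
        key = wanted.get(match.group(1).lower())
        if key and key not in fields:  # keep the first occurrence only
            fields[key] = match.group(2)
    if "effective_date" in fields:
        # Assumes US-style MM/DD/YYYY dates; adjust the format as needed.
        fields["effective_date"] = datetime.strptime(
            fields["effective_date"], "%m/%d/%Y"
        ).date()
    return fields
```

Keeping only the first occurrence of each label avoids picking up a later "Owner:" line that happens to appear in the body text.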
We've reached a point where the PDF conversion isn't working out too well. Plus, our search via a Google Mini returns outdated files or files that haven't been published yet. Basically, we need a better solution.
What I would like to do is...
1. use Oracle Text to scan/load/import the Word files;
2. use Oracle Text to parse the Word files, identifying elements such as the effective date, owner, etc.;
3. load the results of step 2 into a table that Oracle can search. I can use my intranet app to convert the data to Word, PDF, or whatever format the user needs.
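Step 3 comes down to storing one row per document, with the parsed fields as columns next to the extracted body text. Below is a minimal sketch that uses Python's built-in sqlite3 module purely as a runnable stand-in for the Oracle table. The table and column names are hypothetical, and fields is assumed to be a dict holding a title, a datetime.date under effective_date, and an optional author and owner. Giving the effective date its own column is what lets a search leave out documents that aren't in effect yet, which is one of the problems described above. In Oracle itself, the LIKE filter would presumably be replaced by an Oracle Text CONTEXT index on the text column, queried with CONTAINS. That part is not shown here.

```python
import sqlite3
from datetime import date

# sqlite3 is only a stand-in so the sketch runs anywhere; the real target
# would be an Oracle table. Table and column names here are hypothetical.
SCHEMA = """
CREATE TABLE documents (
    id             INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    effective_date TEXT NOT NULL,  -- ISO 8601 (YYYY-MM-DD) so text order matches date order
    author         TEXT,
    owner          TEXT,
    body           TEXT NOT NULL   -- plain text extracted from the Word file
)
"""


def load_document(conn, fields: dict, body: str) -> int:
    """Insert one parsed document and return its row id."""
    cur = conn.execute(
        "INSERT INTO documents (title, effective_date, author, owner, body)"
        " VALUES (?, ?, ?, ?, ?)",
        (
            fields["title"],
            fields["effective_date"].isoformat(),
            fields.get("author"),
            fields.get("owner"),
            body,
        ),
    )
    return cur.lastrowid


def search(conn, term: str, as_of: date) -> list:
    """Return titles whose body mentions term and that are in effect on as_of."""
    rows = conn.execute(
        "SELECT title FROM documents"
        " WHERE body LIKE ? AND effective_date <= ?"
        " ORDER BY effective_date DESC",
        (f"%{term}%", as_of.isoformat()),
    )
    return [title for (title,) in rows]
```

For example, after loading a "Travel Policy" effective 2009-02-01 and a draft effective 2009-06-01, a search for "travel" as of 2009-03-01 returns only the policy that is already in effect.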
I'm here because I don't know a) whether this is possible, or b) how exactly to go about accomplishing the task.
I would truly appreciate any tips, pointers or links to helpful documentation.
Thank you.