Hello all. I am new to ML and am trying to figure out the best way to predict healthcare diagnosis codes from clinical notes. When a patient visits a doctor, the doctor or nurse types notes into the system as free text. Someone later translates those notes into diagnosis codes.
For example, a nurse might enter ‘John Smith has diabetes’. The ICD_CODE for diabetes is E11.9, so the coder will enter that into the patient's chart.
My goal is to help automate some of this…
My data set is very large and is in Autonomous Database. The table is: FREE_TEXT (CLOB), CLAIMID (NUMBER), ICD_CODE (VARCHAR2), SEQUENCE_ID (NUMBER).
For each CLAIMID and FREE_TEXT, there are multiple ICD_CODE values in a sequential order (SEQUENCE_ID).
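To make the data shape concrete, here is a rough Python sketch of how the one-row-per-code layout could be collapsed into one record per claim, with the codes ordered by SEQUENCE_ID. The function name `group_by_claim` and the in-memory list of tuples are assumptions for illustration only; the real data would come from a query against the table.

```python
from collections import defaultdict

# A few rows in the shape (FREE_TEXT, CLAIMID, ICD_CODE, SEQUENCE_ID),
# deliberately out of order to show the sort by SEQUENCE_ID.
rows = [
    ("TEXT1", 20000024, "K521", 2),
    ("TEXT1", 20000024, "D500", 1),
    ("TEXT2", 20000034, "K831", 1),
    ("TEXT1", 20000024, "I10", 3),
    ("TEXT2", 20000034, "K8689", 2),
]


def group_by_claim(rows):
    """Collapse one-row-per-code data into one record per claim (hypothetical helper)."""
    grouped = defaultdict(lambda: {"text": None, "codes": []})
    for free_text, claim_id, icd_code, seq in rows:
        rec = grouped[claim_id]
        rec["text"] = free_text              # same note repeats on every row of a claim
        rec["codes"].append((seq, icd_code))
    result = {}
    for claim_id, rec in grouped.items():
        # Sort by SEQUENCE_ID, then keep only the code.
        ordered = [code for _, code in sorted(rec["codes"])]
        result[claim_id] = {"text": rec["text"], "codes": ordered}
    return result


claims = group_by_claim(rows)
```

Each claim then carries one note and a list of labels, which is the usual input shape for a multi-label text classifier.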
My goal is to enter FREE_TEXT and get back possible diagnosis codes based on the text. It would be even better if there were a confidence index or probability score. For example, ‘This Patient has Diabetes’ has 95% confidence of code E11.9.
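As an illustration of the output I have in mind (text in, ranked codes with a score out), here is a toy Python sketch. It is not a real model: it only counts how often a code accompanies a word in a handful of made-up training notes, and the score is an uncalibrated share rather than a true probability. The names `train` and `predict` and the three example notes are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train(examples):
    """Count, per word, how many notes contain it and which codes accompany it."""
    token_total = Counter()
    token_code = defaultdict(Counter)
    for text, codes in examples:
        for tok in tokenize(text):
            token_total[tok] += 1
            for code in set(codes):
                token_code[tok][code] += 1
    return token_total, token_code


def predict(text, model, top_k=3):
    """Score each code by the best share of notes in which a word co-occurred with it."""
    token_total, token_code = model
    scores = {}
    for tok in tokenize(text):
        for code, n in token_code.get(tok, {}).items():
            share = n / token_total[tok]
            scores[code] = max(scores.get(code, 0.0), share)
    # Highest score first; ties broken alphabetically by code.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Made-up training notes, for illustration only.
examples = [
    ("John Smith has diabetes", ["E11.9"]),
    ("Diabetes and hypertension noted", ["E11.9", "I10"]),
    ("Hypertension follow-up visit", ["I10"]),
]
model = train(examples)
ranked = predict("This Patient has Diabetes", model)
```

Here `ranked` is a list of `(code, score)` pairs with E11.9 first. A real solution would replace the counting with a trained classifier, but the interface would look similar.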
It would be even more amazing if there were a way to trace each suggested code back to the snippet of text in the CLOB that produced it. This way the person coding could double-check the results faster.
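A minimal sketch of the snippet lookup, assuming the model (or a lookup table) can supply the terms that triggered a code. It returns character offsets into the note plus a little surrounding context, so a reviewer could jump straight to the passage. The function `find_snippets` and its `window` parameter are hypothetical.

```python
import re


def find_snippets(note, terms, window=30):
    """Return (start, end, context) for each whole-word, case-insensitive match."""
    hits = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(note):
            lo = max(0, m.start() - window)
            hi = min(len(note), m.end() + window)
            hits.append((m.start(), m.end(), note[lo:hi]))
    return sorted(hits)  # ordered by position in the note


note = "John Smith has diabetes. Follow up in 3 months."
hits = find_snippets(note, ["diabetes"])
```

The offsets point into the original string, so the same idea would work on a 32,000-character note without copying it.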
I am wondering if I should use Language AI or OML4SQL (Oracle Machine Learning for SQL) in Autonomous Database. Below is a quick summary of the data table. The FREE_TEXT CLOB is large: many are 32,000 characters long.
<table><tbody>
<tr><td>FREE_TEXT</td><td>CLAIMID</td><td>ICD_CODE</td><td>SEQUENCE_ID</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>D500</td><td>1</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>K521</td><td>2</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>I10</td><td>3</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>E538</td><td>4</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>M810</td><td>5</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>R270</td><td>6</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>Z9181</td><td>7</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>H548</td><td>8</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>T474X5A</td><td>9</td></tr>
<tr><td>TEXT1</td><td>20000024</td><td>Y92099</td><td>10</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>K831</td><td>1</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>K8689</td><td>2</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>K861</td><td>3</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>K869</td><td>4</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>R1032</td><td>5</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>R10819</td><td>6</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>R8279</td><td>7</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>J439</td><td>8</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>E1122</td><td>9</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>I129</td><td>10</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>N183</td><td>11</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>K7030</td><td>12</td></tr>
<tr><td>TEXT2</td><td>20000034</td><td>Z794</td><td>13</td></tr>
</tbody></table>