I have a large Text field - varchar2(4000) which has data stored in multiple lines (With Carriage Return) in the following formats for each Student in table STUDENT_TB. I need to extract each line from each record where when I encounter this particular pattern (Date followed by Text) I insert it into another table. If it does not meet the criteria, I would just ignore it.
The pattern is in 3 distinct formats where #1 is most widely used and #3 is least widely used. This is what I landed up identifying.
#1. DD MON YYYY Text
#2. DD MON YY Text
#3. D MON YY Text
I would like to extract these dates and then extract the text and insert into another table (STUDENT_TB_DTL) where the Date goes into the Date field and the Text goes into the text field for the same Student. I am guessing I would do it as a PL/SQL procedure.
I am currently able to identify the DD MON YYYY pattern using Regular expression but I am only able to do it for the first occurrence. After that, it just skipped to the next record. Any help in this regard is greatly appreciated.
Scripts for table creation and Insert statements:
CREATE TABLE STUDENT_TB (STUDENT_ID VARCHAR2(4),
MESSAGES VARCHAR2(4000)
);
CREATE TABLE STUDENT_TB_DTL (STUDENT_ID VARCHAR2(4),
CONTACT_DATE DATE,
MESSAGES VARCHAR2(4000)
);
Insert into STUDENT_TB (STUDENT_ID,MESSAGES) values ('0001','06 Aug 20: Test Line 1
05 Dec 2019: Test Line 2
26 Sep 2019: Test Line 3
12 Nov 2018: Test Line 4');
Insert into STUDENT_TB (STUDENT_ID,MESSAGES) values ('0002','23 Jul 2021: Test Line 1
22 Jul 2021: Test Line 2
12 July 2021: Test Line 3
04 Feb 2021: Test Line 4
03 Feb 2021: Test Line 5
9 Jan 20: Test Line 6
25 Nov 19: Test Line 7');
My SQL Query for the Pattern Matching is so far as follows. I have explained what my final result should be but my main hurdle is to get the "multiple instances" of the pattern I so desire to get.
select student_id, REGEXP_SUBSTR(messages, '\d{2} [a-zA-Z]{3} \d{4}') pattern_match,
messages
from student_tb
order by student_id;
My desired Output from STUDENT_TB_DTL should be as follows:
0001 06-AUG-2020 00:00:00 Test Line 1
0001 05-DEC-2019 00:00:00 Test Line 2
0001 26-SEP-2019 00:00:00 Test Line 3
0001 12-NOV-2018 00:00:00 Test Line 4
0002 23-JUL-2021 00:00:00 Test Line 1
0002 22-JUL-2021 00:00:00 Test Line 2
0002 12-JUL-2021 00:00:00 Test Line 3
0002 04-FEB-2021 00:00:00 Test Line 4
0002 03-FEB-2021 00:00:00 Test Line 5
0002 09-JAN-2000 00:00:00 Test Line 6
0002 25-NOV-2019 00:00:00 Test Line 7