Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Regular Expression Pattern Matching for certain Date Formats (Multiple instances in the same column)

Roxy rollersJul 26 2021

I have a large Text field - varchar2(4000) which has data stored in multiple lines (With Carriage Return) in the following formats for each Student in table STUDENT_TB. I need to extract each line from each record where when I encounter this particular pattern (Date followed by Text) I insert it into another table. If it does not meet the criteria, I would just ignore it.
The pattern is in 3 distinct formats where #1 is most widely used and #3 is least widely used. This is what I landed up identifying.

#1. DD MON YYYY Text
#2.  DD MON YY Text
#3. D MON YY Text

I would like to extract these dates and then extract the text and insert into another table (STUDENT_TB_DTL) where the Date goes into the Date field and the Text goes into the text field for the same Student. I am guessing I would do it as a PL/SQL procedure.
I am currently able to identify the DD MON YYYY pattern using Regular expression but I am only able to do it for the first occurrence. After that, it just skipped to the next record. Any help in this regard is greatly appreciated.
Scripts for table creation and Insert statements:

CREATE TABLE STUDENT_TB (STUDENT_ID VARCHAR2(4), 
	                 MESSAGES  VARCHAR2(4000)
                        );
CREATE TABLE STUDENT_TB_DTL (STUDENT_ID   VARCHAR2(4), 
                             CONTACT_DATE DATE,                             
	                     MESSAGES     VARCHAR2(4000)
                            );
Insert into STUDENT_TB (STUDENT_ID,MESSAGES) values ('0001','06 Aug 20: Test Line 1 
05 Dec 2019: Test Line 2
26 Sep 2019: Test Line 3
12 Nov 2018: Test Line 4');
Insert into STUDENT_TB (STUDENT_ID,MESSAGES) values ('0002','23 Jul 2021: Test Line 1
22 Jul 2021: Test Line 2
12 July 2021: Test Line 3
04 Feb 2021: Test Line 4
03 Feb 2021: Test Line 5
9 Jan 20: Test Line 6
25 Nov 19: Test Line 7');

My SQL Query for the Pattern Matching is so far as follows. I have explained what my final result should be but my main hurdle is to get the "multiple instances" of the pattern I so desire to get.

select student_id, REGEXP_SUBSTR(messages, '\d{2} [a-zA-Z]{3} \d{4}') pattern_match,
       messages
from   student_tb
order by student_id;

My desired Output from STUDENT_TB_DTL should be as follows:

0001	06-AUG-2020 00:00:00	Test Line 1
0001	05-DEC-2019 00:00:00	Test Line 2
0001	26-SEP-2019 00:00:00	Test Line 3
0001	12-NOV-2018 00:00:00	Test Line 4
0002	23-JUL-2021 00:00:00	Test Line 1
0002	22-JUL-2021 00:00:00	Test Line 2
0002	12-JUL-2021 00:00:00	Test Line 3
0002	04-FEB-2021 00:00:00	Test Line 4
0002	03-FEB-2021 00:00:00	Test Line 5
0002	09-JAN-2000 00:00:00	Test Line 6
0002	25-NOV-2019 00:00:00	Test Line 7
This post has been answered by Roxy rollers on Jul 27 2021
Jump to Answer
Comments
Post Details
Added on Jul 26 2021
13 comments
2,376 views