I need some help developing a regex that will be used to filter some ill-written legacy SQL prior to execution on SQL Server. I've demonstrated that I can intercept and modify the SQL but the scope of this regex is beyond my experience.
What I need to do is insert the string "with(nolock)" following each table name (and optional alias).
Here are two sample queries using both ANSII and "old school" join syntax:
select *
from table1
join table2 on table1.pk = table2.fk
join table3 on table2.pk = table3.fk
select *
from table1, table2, table3
where table1.pk = table2.fk
and table2.pk = table3.fk
It's possible that table names are aliased:
select *
from table1 a, table2 b, table3 c
where a.pk = b.fk
and b.pk = c.fk
The SQL being processed may already contain a "(nolock)" or "with(nolock)" for one or more tables so the regex has to include check for same (don't want to repeat for a table). There may be white space separating "with" and "(" and also between nolock and the enclosing parentheses.
I plan to begin by removing line terminations so you're dealing with a single String.
Objective:
select *
from table1 with(nolock)
join table2 with(nolock) on table1.pk = table2.fk
join table3 with(nolock) on table2.pk = table3.fk
or
select *
from table1 with(nolock), table2 with(nolock), table3 with(nolock)
where table1.pk = table2.fk
and table2.pk = table3.fk
I've taken several stabs at it - all to no avail. I use regex's daily but this one is beyond my current ability. My assumption was that parsing (capturing groups) would begin following "from" and be terminated by any of the following: where, order by, group by, close parentheses, or end of string.
Any takers? I'd be forever grateful and most importantly, this old dog will learn a new trick from someone far more knowledgeable. :-)