Good day all,
I am attempting to parse the City and State from a string.
These are example of the variations i am seeing in the data:
GLOVERSVILLE, NY 12078
NORTH CREEK NY 12853
BLUE MOUNTAIN LAKE NY 12812
INDIAN LAKE NY
SNOWFLAKE AZ
I have attempted to use REGEXP_SUBSTR:
2 Versions shown.
SELECT REGEXP_SUBSTR('GLOVERSVILLE, NY 12078', '[^ ]+', 1, 1) col_one,
REGEXP_SUBSTR('GLOVERSVILLE, NY 12078', '[^ ]+', 1, 2) col_two,
REGEXP_SUBSTR('GLOVERSVILLE, NY 12078', '[^ ]+', 1, 3) col_three,
REGEXP_SUBSTR('GLOVERSVILLE, NY 12078', '[^- ]+', 1, 4) col_four,
REGEXP_SUBSTR('GLOVERSVILLE, NY 12078', '[^- ]+', 1, 5) col_five,
REGEXP_SUBSTR('GLOVERSVILLE, NY 12078', '[^ ]+', 1, 1) || ' ' ||REGEXP_SUBSTR('GLOVERSVILLE, NY 12078', '[^ ]+1', 1, 2)
AS CITY_NAME
FROM dual
SELECT REGEXP_SUBSTR('NORTH CREEK NY 12853', '[^ ]+', 1, 1) col_one,
REGEXP_SUBSTR('NORTH CREEK NY 12853', '[^ ]+', 1, 2) col_two,
REGEXP_SUBSTR('NORTH CREEK NY 12853', '[^ ]+', 1, 3) col_three,
REGEXP_SUBSTR('NORTH CREEK NY 12853', '[^- ]+', 1, 4) col_four,
REGEXP_SUBSTR('NORTH CREEK NY 12853', '[^- ]+', 1, 5) col_five,
--case when LENGTH(REGEXP_SUBSTR('NORTH CREEK NY 12853', '[^ ]+', 1, 2)= '2')
--THEN
REGEXP_SUBSTR('NORTH CREEK NY 12853', '[^ ]+', 1, 1) || ' ' ||REGEXP_SUBSTR('NORTH CREEK NY 12853', '[^ ]+', 1, 2)
AS CITY_NAME
FROM dual
COL_ONE | COL_TWO | COL_THREE | COL_FOUR | COL_FIVE | CITY_NAME |
GLOVERSVILLE, | NY | 12078 | | | GLOVERSVILLE, |
NORTH | CREEK | NY | 12853 | | NORTH CREEK |
What I was attempting with the REGEXP_SUBSTR was to parse out the various segments and then CONCAT them back. I believe what i really need to be able to do is add logic that would determine if LENGTH(COL_TWO) > 2 THEN CONCAT COL_ONE & COL_TWO Else COL_ONE AS CITY_NAME.
I just do not know how to correctly about that to CASE statement. I also need to remove any punctuation that may be in the string as well.
I believe I am on the right track, but limited by experience. I would appreciate any suggestions/direction offered.
regards,
G