Parsing Street Addresses with Oracle Text.
I've been playing around with garage sale ads, trying to find a system that can read an RSS feed and parse the address and date out of each ad. My current method is to scan the ads for known patterns and then geocode the candidate patterns with Google until I get a match. However, this falls apart as soon as I reach the daily geocoding limit.
So my next idea was to grab the US TIGER and Canadian NRN files and build my own geocoder. But now that I have the names and locations of every road in North America, I thought I could turn this around: search the ads for actual street names, narrowed down by proximity or by the area the RSS feed covers.
I thought I'd see if Oracle could handle it. I've tried a few approaches, including data mining. Right now I'm playing with 'NEAR(("'||x.street_name||'", $"'||x.street_type||'"), 2, TRUE)', which seems to work rather well.
The questions:
1. OK, now I've found a matching street for the ad. How do I tell where in the ad text the pattern is? INSTR doesn't really work when the match is on "Main N ST" instead of "Main ST".
2. Usually there is a number in front of the street_name. If I know the start position of the pattern, I can scan backwards for "144" or "144 & 143" or "144 and 143" or ... you get the picture. Is there a way to have NEAR search for a number near the pattern?
3. How do I do a NEAR with the street direction? $"N" is pretty useless.
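For reference, the number scan described in question 2 could look something like the Python sketch below. It is only an illustration under assumptions: the function name, the regular expression, and the sample ad are made up, and it presumes the start offset of the street match is already known (which is exactly what question 1 is asking for).

```python
import re

# Matches a run of house numbers at the very end of a string, for example
# "144", "144 & 143", "144 and 143" or "144, 143 and 145".
# The set of separators is an assumption; extend it as the ads require.
HOUSE_NUMBERS = re.compile(
    r"(\d+(?:\s*(?:,|&|and)\s*\d+)*)\s*$",
    re.IGNORECASE,
)


def house_numbers_before(ad_text, street_start):
    """Return the house numbers sitting directly before offset street_start.

    street_start is the (hypothetical) character offset at which the
    matched street name begins in ad_text.
    """
    # Only look at the text that precedes the street name.
    match = HOUSE_NUMBERS.search(ad_text[:street_start])
    if match is None:
        return []
    # Split the matched run into its individual numbers.
    return re.findall(r"\d+", match.group(1))


ad = "Huge sale at 144 & 143 Main St, Saturday 9am"
print(house_numbers_before(ad, ad.index("Main")))
```

This only handles numbers that immediately precede the street name; anything else between the number and the street (a unit, a direction) would need its own pattern.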
Any ideas anyone?