Hi Friends,
We have a BOT that will read a PDF and send us an address. We then have to find out the match for that address in our system.
Below is the logic that we are currently using.
- Using util_match.edit_distance, calculate the distance between the address received from BOT and the addresses in our system.
- Filter all the addresses with distance is < 10
- Sort by the ascending order of the distance and take the address with the least distance.
Note: The reason for setting taking distances up to < 10 is to handle below scenarios where the address sent by BOT has RTE and the address in our system as ROUTE and vice versa. Below are other examples.
RTE vs ROUTE
HWY vs HIGHWAY
BLVD vs Boulevard
AVE vs Avenue
PKWY vs Parkway
DR vs Drive
RD vs Road
ST vs Street and so on...
Below tables shows address examples where this approach works and where it doesn't work.
Positive cases:
Address Sent by BOT
| Address derived by utl_match.edit_distance
|
|---|
| 820 EASTERN BYP | 820 EASTERN BYPASS |
| 3905 CONCORD PARKWAY S | 3905 CONCORD PRWY S |
| 100 NO. LHS DRIVE | 100 N LHS DR |
Negative cases:
Address Sent by BOT
| Address derived by utl_match.edit_distance
|
|---|
| 5800 US HIGHWAY 98 N | 4600 US HWY 98 N |
| 1955 E MONTGOMERY XRD | 1975 E MONTGOMERY XRD |
| SCARBOROUGH NE | 502 BOOTH RD |
| 4550 KESTER MILL RD | 930 HANES MALL BLVD |
Now, our expected result is that in case of negative cases mentioned above, we should flag as an error saying that the address is not defined in our system.
Could you please let me know how can I achieve my requirement using utl_match.edit_distance? If not, is there any other way to achieve this requirement.
Regards,
Sreekanth
Sreekanth