Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Trouble coding a phrase generator

Catfive LanderFeb 8 2009 — edited Feb 9 2009
This isn't an especially SQL or PL/SQL question, more a general programming one, but I'd be grateful for any help.

I have a table containing lots of words, called SEARCH_DICTIONARY. Consider it to be just a list of every English word that exists.
I have users submitting search phrases using free-form text
I need to write something that will suggest alternative search phrases to those submitted when those submitted contain a spelling error.
I have a 'spellcheck' function which lists word alternatives based on utl_match.edit_distance_similarity

I now write the following function:
create or replace function search_suggest (searchphrase varchar2)
return varchar2 is

       loopcount  number;
       loopmax    number;
       v_count    number;
       goodword   varchar2(2000);
       badword    varchar2(2000);
       searchterm varchar2(2000);
       goodphrase varchar2(32000);

begin

  loopcount := 1;

  select listlen(searchphrase,' ') into loopmax from dual;

  loop
    select listgetat(searchphrase,loopcount,' ') into searchterm from dual;
    select count(1) into v_count from search_dictionary where mainterm=searchterm;
    if v_count > 0 then
      goodword := searchterm;
      goodphrase := goodphrase||' '||goodword;
    else
      badword := spellcheck(searchterm,75);   
      goodphrase := goodphrase||' '||badword;  
    end if;

    loopcount := loopcount+1;
    exit when loopcount>loopmax;
        
  end loop;

  RETURN goodphrase;

end; 
This sort-of works:
select search_suggest('man with riod tomaghto') from dual;

ALT_SEARCH_SUGGEST('MANWITHRIO
-------------------------------
 man with rind,riot,rod,rood tomato
That's correctly spotting that 'tomaghto' should be 'tomato', and although it's not getting "red" for 'riod', I'm not asking for miracles. It will do, since at least it's suggesting valid English words as an alternative.

The key thing is that my SPELLCHECK function spits out a comma-separated list of near-matches for any word it's fed. In this case, when I say that the 'edit-distance-similarity' should be greater than 75, the function tells me that RIND, RIOT, ROD and ROOD are alternatives for the 'riod' originally supplied. It also correctly spots that 'tomato' is the only near-equivalent for 'tomaghto'. The code also correctly spots that 'Man' and 'with' are valid dictionary words, and therefore doesn't pass them to the SPELLCHECK function at all. As I say, it's not perfect, but it will do.

Or rather, it would do if only I could output my results in the form:

man with rind tomato
man with riot tomato
man with rod tomato
man with rood tomato

-i.e, as four phrases containing all words and all combinations of words, rather than as one phrase containing a chunk of bad word alternatives buried in the middle of the good words as at present.

I am simply getting very confused keeping track of what words are good (and therefore don't need a lookup with the SPELLCHECK function) and which are bad, and therefore am having great difficulty being able to repeat all the good words (in the right order!) for each of the bad ones found by my SPELLCHECK function.

Can anyone suggest a re-working of my code to achieve the desired output, please?
This post has been answered by Frank Kulash on Feb 9 2009
Jump to Answer
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Mar 9 2009
Added on Feb 8 2009
4 comments
432 views