Hi Community,
My team is exploring the potential use of Oracle Text in our search functionality.
While evaluating how Oracle Text fits our exact use case, I have run into a few questions.
When indexing with the lexer config below, I can retain the entire word, but the smaller blocks separated by special characters aren't tokenized. Without the basic lexer, I get only the smaller blocks but not the whole string.
Example:
Without Basic Lexer    aab_cdf/e → aab  cdf  e
With Basic Lexer       aab_cdf/e → aab_cdf/e
What I want            aab_cdf/e → aab  cdf  e  aab_cdf/e
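To make the target behaviour concrete, here is a minimal Python sketch of the token stream I am after. This is not Oracle Text code; the function name `tokenize_with_whole` and the default separator set are made up for illustration (the separators are copied from the printjoins value in the config below), and it handles a single whitespace-free term only.

```python
import re

def tokenize_with_whole(term, separators="./_:"):
    """Split a term on separator characters and also keep the unsplit term."""
    # Build a character class from the separators and split on runs of them.
    pattern = "[" + re.escape(separators) + "]+"
    parts = [p for p in re.split(pattern, term) if p]
    # Add the whole term only when splitting actually broke it into pieces.
    if len(parts) > 1:
        parts.append(term)
    return parts

print(tokenize_with_whole("aab_cdf/e"))
```

A search for `cdf` or for `aab_cdf/e` should then both match the same row.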
Lexer config used:
exec ctx_ddl.create_preference('quote_lexer', 'BASIC_LEXER');
exec ctx_ddl.set_attribute('quote_lexer', 'printjoins', './_:');
exec ctx_ddl.set_attribute('quote_lexer', 'whitespace', './_: ');
exec ctx_ddl.set_attribute('quote_lexer', 'index_themes', 'NO');
exec ctx_ddl.set_attribute('quote_lexer', 'index_text', 'YES');
Can we create n-grams during tokenization (for English text)?
For context, n-grams are defined as follows:
An ngram is a contiguous sequence of _**n**_ characters from a given sequence of text. The ngram parser tokenizes a sequence of text into a contiguous sequence of _**n**_ characters. For example, you can tokenize “abcd” for different values of _**n**_ using the ngram full-text parser.
n=1: 'a', 'b', 'c', 'd'
n=2: 'ab', 'bc', 'cd'
n=3: 'abc', 'bcd'
n=4: 'abcd'
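The same definition as a short Python sketch, only to pin down what I mean by character n-grams. The helper name `char_ngrams` is hypothetical and has nothing to do with any Oracle Text API.

```python
def char_ngrams(text, n):
    """Return every contiguous run of n characters in text, left to right."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # When n is longer than the text, the range is empty and so is the result.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

for n in range(1, 5):
    print(n, char_ngrams("abcd", n))
```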
With a MULTI_COLUMN_DATASTORE preference, what is the relevance of naming one of the columns when creating the index, even though the data from all the columns is joined?
Is a search query against an indexed column tokenized the same way the data was at index time, or is the query text matched as-is? Can we control this?
What scoring mechanism is used to rank results, and can we control it?
Any help or a pointer in the right direction would be much appreciated.