Boolean order of precedence - a counter-example needed!
I am trying to explain to someone why two queries give slightly different results.
As an example, go to http://www.gettyimages.com/ and do a search for (example 1):
man or woman not (girl or boy)
Then do a search for (example 2):
man or woman not girl not boy
You will get very slightly different results: 1,054,945 for example 1 versus 1,054,853 for example 2.
Now, I know that parentheses are evaluated before anything else, then NOTs, then ANDs and finally ORs, so I've explained the first example (with the bracketed OR) as:
1. Find all images keyworded as 'girl' or 'boy' (because brackets are evaluated first)
2. Find all images keyworded 'man' or 'woman' which are not in the 'girl/boy' set
And I've explained the second example (with the multiple NOTs) as:
1. Find all images which are not keyworded 'girl' (because that's the left-most NOT expression)
2. Find all the images in that 'not girl' set which are also not keyworded 'boy' (because that's the next available NOT expression)
3. Find all the images in that second set of images which are keyworded 'man' or 'woman'
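To make the two step-by-step explanations concrete, here is a minimal sketch of both as set operations in Python. The catalogue, the image ids and the helper name `tagged` are all made up for illustration; nothing here is claimed to reflect how the Getty search engine actually works.

```python
# Hypothetical catalogue: image id -> set of keywords (invented for illustration).
catalogue = {
    "img1": {"man"},
    "img2": {"woman", "girl"},
    "img3": {"man", "boy"},
    "img4": {"woman"},
    "img5": {"girl", "boy"},
    "img6": {"man", "woman", "dog"},
}


def tagged(keyword):
    """Return the ids of all images carrying the given keyword."""
    return {img for img, words in catalogue.items() if keyword in words}


def example_1():
    """man or woman not (girl or boy), following the two steps above."""
    girl_or_boy = tagged("girl") | tagged("boy")              # step 1
    return (tagged("man") | tagged("woman")) - girl_or_boy    # step 2


def example_2():
    """man or woman not girl not boy, following the three steps above."""
    everything = set(catalogue)
    not_girl = everything - tagged("girl")                    # step 1
    not_girl_not_boy = not_girl - tagged("boy")               # step 2
    return not_girl_not_boy & (tagged("man") | tagged("woman"))  # step 3


print(sorted(example_1()))  # ['img1', 'img4', 'img6']
print(sorted(example_2()))  # ['img1', 'img4', 'img6']
```

On this small made-up catalogue the two procedures return the same images, even though the intermediate sets are built in a different order.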
I've explained, too, that in the second example the girl and boy tests are applied one at a time, sequentially, before a set-wise comparison with the man/woman set, whereas in the first example girl and boy are evaluated as a single set before a single set-wise comparison with the man/woman set. Because those two processes are different, you might/can/will get different results.
But the question I keep being asked is, when would an image not appear in the one example where it would in the other, or vice versa? And I can't think of an example of a record marked with various keywords which would be an example of the difference.
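One way to hunt for such a record is to try every possible combination of the four keywords. The sketch below assumes a record is nothing more than a yes/no flag per keyword, and that the two queries behave exactly as in the step-by-step explanations; the function names are hypothetical.

```python
from itertools import product

KEYWORDS = ("man", "woman", "girl", "boy")


def example_1(man, woman, girl, boy):
    # (man or woman), excluding anything in the combined girl/boy set
    return (man or woman) and not (girl or boy)


def example_2(man, woman, girl, boy):
    # drop 'girl', then drop 'boy', then keep man/woman
    return (not girl) and (not boy) and (man or woman)


def differing_records():
    """Return every keyword combination on which the two queries disagree."""
    diffs = []
    for flags in product([False, True], repeat=len(KEYWORDS)):
        if example_1(*flags) != example_2(*flags):
            diffs.append({k for k, on in zip(KEYWORDS, flags) if on})
    return diffs


print(differing_records())  # []
```

Under these assumptions none of the 16 combinations separates the two queries, since "not (girl or boy)" and "not girl and not boy" are the same condition by De Morgan's law. That would explain why I can't come up with a counter-example from keywords alone, and it suggests the difference in counts comes from something outside this simple model, though that is only an inference.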
Could someone give such an example, working from first principles?
I'm also interested to know whether example 1 or example 2 is the 'better' way of doing things, or whether there is an efficiency difference between the two approaches. Seems to me the multiple NOT example is likely to require more database processing to evaluate, but I don't know if that's true or not.
Any thoughts appreciated.