There has been an interesting discussion, in a recent thread in this community, of the logical differences between placing predicates in the ON clause of a join vs. placing the same predicates in the WHERE clause of a query with an outer join. DISCLAIMER: If you are looking for a question, there is none; I am writing this more as a "note to self" (and I am looking for people to point out any flaws or misunderstandings, and perhaps others may find this useful as well).
I guess I've been lucky so far; whenever I needed outer joins and additional (non-join) predicates, I somehow - perhaps by trial and error - was able to "get it right". But I've never paid attention to this interesting topic. In the recent thread, we saw what happens when a predicate based on the "table on the right" is added to a left outer join. In this "elementary" post I would like to investigate that a little further and consider other situations as well. To keep it simple, I am not considering predicate pushing, which rows are read from disk to memory, the effect of indexes, or any optimizer shortcuts; everything is strictly at the logical level, having to do only with the end result of the query.
Suppose we are doing an outer join of two tables, call them a and b. I speculated in the other thread* that the order of evaluation must be: (1) all rows from a are evaluated against all rows from b; all the conditions in the ON clause are evaluated (both single-table conditions and cross-table conditions) Only the pairs (one row from each table) that meet all the conditions are kept, the rest are discarded. (2) To these resulting pairs, additional pairs may be added pursuant to the OUTER directive (namely, for all rows in a that don't have any matching rows in b; the b columns are filled with NULL for these added pairs, as they are in an outer join). (3) The predicates in the WHERE clause are applied to the result of the second step. And then whatever else "happens" in the query, if anything, does happen and we get the end result.
*EDIT: Here is a link to that other thread. My "speculation" is in Reply 15, at the top of page 2 of that thread. https://community.oracle.com/thread/4018202
Perhaps this logical order of evaluation is obvious from definitions (or from the documentation), I just never stopped to think about it.
I created a very small example to experiment with this. I didn't include NULLs anywhere; I don't think they are "special" for this discussion. Below I show the results of various tests, and there is an exercise at the very end. Using just the "order of (logical) evaluation" I described above, you may want to "close your eyes" and think about what you would expect the result of the query to be in each case. Imagine what the temporary result set should look like after each of the steps (1), (2) and (3) above.
What I am testing for is straightforward: add a single-table condition (like a.x = 1) to the "standard" join condition, either in the ON or in the WHERE clause, and vary what values you use. And do this for the table on the left and the one on the right.
SQL> create table a ( x ) as select 1 from dual union all select 2 from dual;
SQL> create table b ( y ) as select 2 from dual union all select 3 from dual;
SQL> select x from a;
X
----------
1
2
SQL> select y from b;
Y
----------
2
3
SQL> select a.x, b.y from a left join b on a.x = b.y and a.x = 1;
X Y
---------- ----------
2
1
SQL> select a.x, b.y from a left join b on a.x = b.y where a.x = 1;
X Y
---------- ----------
1
SQL> select a.x, b.y from a left join b on a.x = b.y and a.x = 2;
X Y
---------- ----------
2 2
1
SQL> select a.x, b.y from a left join b on a.x = b.y where a.x = 2;
X Y
---------- ----------
2 2
SQL> select a.x, b.y from a left join b on a.x = b.y and a.x = 3;
X Y
---------- ----------
2
1
SQL> select a.x, b.y from a left join b on a.x = b.y where a.x = 3;
-- no rows selected
SQL> select a.x, b.y from a left join b on a.x = b.y and b.y = 1;
X Y
---------- ----------
1
2
SQL> select a.x, b.y from a left join b on a.x = b.y where b.y = 1;
-- no rows selected
SQL> select a.x, b.y from a left join b on a.x = b.y and b.y = 2;
X Y
---------- ----------
2 2
1
SQL> select a.x, b.y from a left join b on a.x=b.y where b.y = 2;
X Y
---------- ----------
2 2
SQL> select a.x, b.y from a left join b on a.x = b.y and b.y = 3;
X Y
---------- ----------
1
2
SQL> select a.x, b.y from a left join b on a.x = b.y where b.y = 3;
-- no rows selected
Exercise: What should be the result of the following query? And why? What is the intermediate result set after each of the steps (1), (2) and (3)?
SQL> select a.x, b.y from a left join b on a.x = b.y and b.y = 2 where a.x = 1;