SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Efficiency test: adjacency list vs nested sets

mathguySep 27 2019 — edited Jul 11 2023

Oracle version used for tests (not all that relevant for this particular problem): 12.2.0.1 All tests run on SQL*Plus.

Hierarchies can be modeled in (at least) two equivalent ways. One is the adjacency list model, with each node being assigned a parent node - the familiar model on which CONNECT BY is normally used. The other is the nested sets model: each node is represented by a pair of numbers: a left bound and a right bound (usually non-negative integers), and a node is a descendant (immediate or otherwise - separated by any number of intervening generations) of another node if the descendant's number interval is contained in the ancestor's number interval. For example, the descendant could be represented by (483, 492) and the ancestor by (320, 830).

Suppose a table contains BOTH sets of information: both an adjacency list AND a nested sets representation (left bound and right bound for each node), and the data is correct, in the sense that it does, indeed, represent the same tree (the same hierarchy). Given a problem of hierarchical nature, is it more efficient to use the adjacency list / CONNECT BY approach, or a join on the "contains / is contained" condition for the corresponding nested sets?

In a recent thread, https://forums.oracle.com/ords/apexds/post/formatting-and-summary-rollup-5504, where the setup was similar to what I just described, I presented a solution using the adjacency list and ignoring the "left bound / right bound" business. At the time I didn't know about "nested sets" - that thread prompted me to read about it. Then I remembered that I answered a very old question some time ago having to do exactly with computing the nested set representation of a tree, given its adjacency list; but at that time I had no clue what the problem was really solving for. I use that algorithm below, where I populate a table with a hierarchy using both the adjacency list and the nested sets data.

After I posted my CONNECT BY solution, Mr. Kulash, one of the very frequent posters here, offered a solution using the nested sets data and ignoring the adjacency list. And he made the following claim in Reply #4:

Speaking of efficiency, it's much more efficient to use prentleft and parentright in this problem, rather than parentid. You can get the desired results ignoring parentleft and parentright (like Mathguy did in reply #3), but that requires a CONNECT BY query, which is inherently inefficient (and two CONNECT BYs, as in reply #3, are even slower).

That didn't seem right to me, so I asked Mr. Kulash to support that claim (see Reply #5 in that thread). Sadly, as he often does, Mr. Kulash chose to ignore my request.

This is a testable claim, so I set out to test it myself. I am posting my methodology and my findings here. I didn't set this thread up as a question, since I truly don't have one, but all comments are obviously welcome.

BOTTOM LINE: Mr. Kulash's claim turns out to be totally false. For the type of question presented in that thread, at least, the CONNECT BY approach is much, much faster, whether or not the base table has the proper indexes on the right columns. Methodology and test results presented below.

For anyone who disagrees with the findings, please tell us what you would suggest, and let me run the tests (or, alternatively, run them yourselves - all the needed code is posted below).

Added on Sep 27 2019

14 comments

1,663 views