Hierarchical query with many-to-many relationship
ChrisS.May 7 2010 — edited May 8 2010I have read with interest the creative solutions to complex hierarchical queries posted previously; they have been instructive but have not quite addressed this scenario.
We have a hierarchy table H, with columns for ID, name, parentID, and other attributes.
Within this table are a number of independent hierarchies, each existing for a different purpose.
We have a master list of hierarchies in table T which describes the purpose of each hierarchy, provides some default attributes which the nodes can inherit, and stores a unique id for each hierarchy and a pointer to the root node of the corresponding hierarchy in table H.
We have a master list of items M, with identically named columns to those in H, along with many other attributes.
The members of table M ALL belong to EACH of the Hierarchies. So we have a link table I to define the intersection of H and M.
So the leaf nodes of H are really containers for the list of elements from M which may be attached to them.
The universe of M is very volatile, with new members being added, old ones deleted, and existing ones being reclassified frequently from node to node within each hierarchy. Since the hierarchies have to be built to handle every possible scenario, so that the members of M can always find a suitable node to reside in, quite often, in fact more often than not, the majority of leaf nodes for each hierarchy are empty at any given moment.
Therefore, although we always know the root sector of a given hierarchy and can traverse downwards from there, if we worked our way up from the intersection table, we could eliminate up to 70% of the nodes of any given hierarchy from further consideration, as they don't need to be (in fact, must not be) included in reports.
As implied by the above, rows in M are structurally similar (in terms of columns, but not in any real world sense) and are a superset of rows in H. But combining them into the one table doesn't seem to help the reporting process due to the many-to-many relationship which prevents the ID/parentID relationship from being carried through to this level.
There are a number of other considerations of which the most pertinent is that the people using this database generally have an interest in only a subset of the master list of items in M. This relationship is also dynamic but important enough and rigid enough that another link table P exists to combine the Users in table U with the subset of M in which they are interested. (The users are also grouped into hierarchies of a totally different nature, but this aspect is secondary for now.)
The reporting is reasonably straightforward for any single combination of User and Hierarchy; they want to see all the items they are interested in, listed in hierarchical sequence, totalled on change of level with the individual items M listed beneath the nodes of H. This is unfortunately required in real time, so retrieval performance is paramount.
Some statistics might help to determine the optimum approach:
The largest hierarchy has 10,000 nodes. The smallest about 100.
The largest would have 70% or more of its nodes unused at any point in time, and even the smallest would have 25% unused.
The hierarchies tend to be broad rather than deep, the maximum number of levels being about 5; but the larger ones should be twice as deep as this if performance was not compromised.
There are dozens of hierarchies, but it may be possible to sharply reduce this number by exploiting the Order Siblings By clause.
The number of rows in M varies between 500,000 and 50,000; depending upon how long historical data is retained on-line (and performance permitting, it would be retained indefinitely).
The number of users varies between 1000 and 2000 but the range of M in which they are interested varies greatly; from as few as 100 to as many as 10,000+. So it is almost always worth beginning by eliminating the items in which they are not interested, implying once again that the hierarchy should be traversed upwards rather than down from the root.
The current system is very old and survives by a tactic of building what are essentially materialised views of the database structure for each user overnight using, ahem, non-relational technology. This is inefficient and not easily scaled (but it works) and hence this redevelopment project needs to (a) work, and (b) work better and faster.
I am happy to provide some DDL scripts if that helps explain the problem better than this narrative.
I can't help feeling that the solution lies in somehow extending the hierarchical query past the many-to-many link table so that the Master list can be merged directly into the hierarchy such that the M items become the leaf nodes rather than the design outlined above - but I don't know how to do that. But I am sure everyone reading this does! :)
All advice appreciated. Database version is not an issue; we are currently using version 10XE for experimentation, but production usage could be on 11 if that contains helpful features.
Thank you
CS