Y/N flags. Junk dimension or attributes on the fact table?
Hi folks. I know this is not a OWB question. Its more of a general DW design question but I believe that a key design decision is performance using your tool and our tool is Oracle DW and Cognos for reporting...So here it goes.
I have about 40 Y/N flags attributes that I need in my current DW subject area. These flags come in three groups - client type flags, case type flags and outcome type flags.
Solution 1:
Based on my modelling background I default to putting these type of flags into a junk dimension which is really a dimension with all distinct combinations of Y/N for a set of indicators. If you had 10 Y/N flags there would be 2**10 or 1024 combinations/rows in you dimension table. That's a nice small dimension to link to my fact table. The advantages of a junk dim is that its easy to extend and you can group flags together. Its also a small table to query for BI tools lookups (ie much smaller than the fact table!) that are populated by querying the domain of an attribute ( though this is only an advantage if it "really" improves performance).
Solution 2:
Leave these Y/N flags on the fact table and bitmap index them. The advantage with this design is that you dont have to join to a dimension table to get at your flags.
Test:
I tested out solution 1 versus solution 2 on a 10 million row fact table. I had two tables for the test. MyFact_F which has an FK to MyDim_D called junk_sid. MyFact_F has 10 Y/N attributes -> attribute_1 through attribute_10. MyDim_D also has 10 attributes, attribute_1 to attribute_10.
For solution 1, I got better performance with a bitmap index on the junk_sid dimension primary key and also bitmap indexes on the Y/N flags in the dimension. For solution 2 (attributes on fact), i just bitmap indexed the Y/F flags right there on the fact. On a 10 million row fact table, solution 2 is alot faster. Again, my training and past experience suggest that a better design is using a junk dimension but when performance is alot better using attributes on the fact table, i really makes me question which solution is "better". Any thoughts would be appreciated!