Hi,
I am working on a Data Warehousing project and one of my task is to provide business users with a google like search box for the data.
The below eg search terms should give an idea of what we are trying to accomplish.
| Search Term | Expected Result |
|---|
| Client Revenue | A list of all clients and their revenues. Basically a join between CLIENT and REVENUE tables. |
| European Client Revenue | List of all european clients and their revenues - filter on location. |
| Platinum Client Revenue | List of all platinum clients and their revenues - filter on client type. |
I am thinking of having three types of meta-data tables:
1. Entity level - store the list of all information/tags about the entity.
| ID | ENTITY | TAG |
|---|
| 1 | DIM_CLIENT | client |
| 2 | DIM_CLIENT | customer |
| 3 | DIM_CLIENT | account |
| 4 | FACT_EXPENSE | expense |
| 5 | FACT_EXPENSE | cost |
| 6 | FACT_FORECAST_EXPENSE | expense |
| 7 | FACT_FORECAST_EXPENSE | forecast |
| 8 | FACT_FORECAST_EXPENSE | future |
| 9 | FACT_FORECAST_EXPENSE | cost |
2. Category level - Store at the column level, the distinct categories (for columns with low cardinality like location or category), along with their aliases.
| ID | ENTITY | COLUMN | VALUE_TAG |
|---|
| 1 | DIM_CLIENT | Region | EMEA |
| 2 | DIM_CLIENT | Region | LATAM |
| 3 | DIM_CLIENT | Region | NAM |
| 4 | DIM_CLIENT | Region | ASPAC |
Have an alias table for each of these records - NAM can represent North America, USA etc.
3. Data level - An indexed table of all the text based columns in a table with its primary key/business key.
Something like this.
| ID | ENTITY | BUSINESS_KEY | TEXT |
|---|
| 1 | DIM_CLIENT | 1000022035 | societe generale |
| 2 | DIM_CLIENT | 1000022035 | 1000022035 |
| 3 | DIM_CLIENT | 1000022035 | societe generale sa-paris head office |
| 4 | DIM_CLIENT | 1000022035 | societe generale sa-paris head office |
| 5 | DIM_CLIENT | 1000022035 | barep asset management |
| 6 | DIM_CLIENT | 1000022035 | 522110 |
| 7 | DIM_CLIENT | 1000022035 | 6029 |
| 8 | DIM_CLIENT | 1000022035 | societe generale sa |
| 9 | DIM_CLIENT | 1000022035 | amc |
| 10 | DIM_CLIENT | 1000022035 | 58410 |
Based on the search string, if it matches any of the tables, the related entity can be found and the query built (I know wishful thinking!!).
I just want a feedback on this approach - am I missing anything obvious. Is there a better way of solving this problem.
Or are there any opensource tools that you might be aware of.
Feel free to suggest your ideas.
Thanks for your help.
-Anand