EDQ Functionality Observations (hints for future enhancements)

robdumoulin, Nov 23 2015 (edited Nov 24 2015)

Given the great technical advice offered here, I assume the product team monitors this forum. I have been exposed to OEDQ for several years, but only recently have I been immersed in it deeply enough to notice a few things that would make the tool easier to use, more versatile, and easier for somebody maintaining another person's code. Some of my observations may simply reflect limited experience with the tool, I admit, and I realize that a skilled OEDQ and Java guru could write custom code to do some of this. That is not my point, though. Sure, EDQ is very customizable and configurable, but what a user has to go through is clunky and unpolished (no offense to you developers).

For the record, I have been using Oracle since Version 4, have created several home-grown DQ tools that run against Oracle, SQL, and Teradata, and have used competing industry tools against SQL Server and DB2. I prefer the Oracle back end over all others. Given what the tool costs and what your competitors offer, it does not seem too much to ask to package a few things up better. I limited this to the top 9 things on my mind, but this community could provide, and has provided, more. My top 9 list:

1) Attributes within processes all have a unique name. This name appears to also be used as a unique identifier. If you wish to change a name to be more descriptive or to align to some standard format, all downstream uses of this attribute are invalidated. Would it not be a more flexible and user-friendly approach to just treat the column name as an alias and let the unique id be "under the covers"? This way, you would be able to change a name and let it propagate downstream.
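To make the alias idea concrete, here is a minimal Python sketch. It is not EDQ code and says nothing about how EDQ stores attributes internally; the `Attribute` and `Process` classes and all of their methods are hypothetical names invented for illustration. Downstream mappings hold a hidden identifier, so renaming the visible alias does not invalidate them.

```python
from dataclasses import dataclass, field
from itertools import count

_next_uid = count(1)  # hypothetical source of hidden identifiers


@dataclass
class Attribute:
    alias: str  # the display name a user is free to change
    uid: int = field(default_factory=lambda: next(_next_uid))


class Process:
    """Hypothetical stand-in for a process; not an EDQ class."""

    def __init__(self):
        self.attributes = {}   # uid -> Attribute
        self.downstream = []   # downstream mappings reference the uid only

    def add_attribute(self, alias):
        attr = Attribute(alias)
        self.attributes[attr.uid] = attr
        return attr.uid

    def map_downstream(self, uid):
        self.downstream.append(uid)

    def rename(self, uid, new_alias):
        # Only the alias changes; every stored uid reference stays valid.
        self.attributes[uid].alias = new_alias

    def downstream_names(self):
        return [self.attributes[uid].alias for uid in self.downstream]


process = Process()
uid = process.add_attribute("col1")
process.map_downstream(uid)
process.rename(uid, "customer_name")
print(process.downstream_names())
```

With this arrangement the rename shows up downstream automatically, because nothing downstream ever stored the old name.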

2) 'Business Rules' are rather cumbersome to create and debug. The labels within them are case-sensitive and if you choose to implement several and store them in Reference Data, you have 3 reference data sets per business rule set. Would it not be cleaner to have one interface to create Business Rules components (Rules, Checks, Conditions, and Reference Data), validate they all fit together, assign each a unique identifier, and allow for them to be assembled into custom Business Rule Sets as needed to handle record-level validations?

3) Audit process configurations out of the box provide output paths for All, Valid, Unknown, and Invalid. Valid and Invalid are self-explanatory. Unknown appears to be ambiguous. NULL and spaces are not the same to me. When I validate a character field, I would want to distinguish between the two. In my limited experience, both NULL and Spaces get lumped into the Unknown path, making it necessary for me to use a NO_DATA or LENGTH_CHECK process before or after to filter out the NULL values. I would guess there might be some way to use the VALID and INVALID options to make this work, but I wasted too much time trying to get it to consistently act the way I wanted it to.
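The distinction I am after can be sketched in a few lines of Python. This is only an illustration of the desired outcome, not a description of how any EDQ processor behaves; the function name `classify_text` and the category labels are assumptions of mine.

```python
from typing import Optional


def classify_text(value: Optional[str]) -> str:
    """Separate the cases that currently all land in one 'Unknown' path."""
    if value is None:
        return "NULL"        # no value at all
    if value == "":
        return "EMPTY"       # zero-length string
    if value.strip() == "":
        return "BLANK"       # spaces or other whitespace only
    return "HAS_DATA"        # something to actually validate


for sample in (None, "", "   ", " abc "):
    print(repr(sample), "->", classify_text(sample))
```

Each category could then feed its own output path, so a later check only ever sees values that actually contain data.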

4) Can EDQ dashboards really not provide drill-down?

5) Would it be possible to expose some global constants to processes? For example, it would be nice to be able to configure global constants for a process (fixed, or based on configuration table settings) and get to them whenever you like. Currently, you have to manually add each one into the data stream. Exposing simple real-time constants like #SYSDATE, #USERNAME, #JOBNAME, #PROCESSNAME, #USER_ROLES would make custom logging and security much easier.
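As a rough picture of what I mean, here is a Python sketch of a read-only context that any step could consult without the values being added to every record. Nothing here is an EDQ API; `build_context`, `log_line`, and the dictionary keys are hypothetical and simply mirror the constant names suggested above.

```python
from datetime import datetime, timezone


def build_context(job_name, process_name, user_name, user_roles, now=None):
    """Collect run-level constants once, instead of per record."""
    return {
        "SYSDATE": now or datetime.now(timezone.utc),
        "USERNAME": user_name,
        "JOBNAME": job_name,
        "PROCESSNAME": process_name,
        "USER_ROLES": tuple(user_roles),
    }


def log_line(context, message):
    """Format a custom log entry from the shared context."""
    return (
        f"{context['SYSDATE'].isoformat()} "
        f"[{context['JOBNAME']}/{context['PROCESSNAME']}] "
        f"{context['USERNAME']}: {message}"
    )


ctx = build_context("nightly_load", "validate_customers", "rob", ["dq_admin"])
print(log_line(ctx, "domain check started"))
```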

6) Publishing Alerts and Cases is probably the most primitive and difficult feature I have run into. Having to configure it through the Advanced Options of Match Merge, when the alerts really have nothing to do with a match merge process, is a band-aid atop a cast placed around an artificial limb. Sorry for the extended metaphor... I got passionate.

7) Why cannot one publish an Issue from within a process execution? You seem to only be able to do it from the Results Browser manually.

8) There are only certain types of processes I can combine into a processor. I can then publish that combined processor, but what is the point? I cannot map anything but the last process in the published processor. It sure would be nice if one could treat a combined, published set of processes as a single unit and see the outputs of every processor instead of just the last one. The next observation is related.

9) Setting the scope of variables within a group of processors would be nice and would encourage code reuse. For example, I create a sequence of 6 processes with a merge at the end to validate domain rules in different ways and group all findings into separate messages at the end. I wish to replicate this test for several different domains. Basically, the only things that differ are the column I act against and the domain I choose. By the way, the domain I choose determines what type of messages get produced, but that is handled by some of the processes in the group. All I care about for this group are the outputs I create. All of the internal mappings and intermediate columns are superfluous. This repeatable process would ideally be documented with consistently named variables, but each one has to be unique per EDQ because everything is a global variable.
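In ordinary programming terms, what I am asking for in observation 9 is a function with parameters and local variables. The Python sketch below is only an analogy under that assumption, not EDQ configuration; `validate_domain`, its parameters, and the sample records are all hypothetical. Only the column and the domain vary between uses, and the intermediate values never leak out.

```python
def validate_domain(records, column, domain):
    """Reusable check group: only `column` and `domain` change per use."""
    findings = []
    for row_number, record in enumerate(records, start=1):
        value = record.get(column)  # intermediate value, local to this group
        if value is None:
            findings.append(f"row {row_number}: {column} is missing")
        elif value not in domain:
            findings.append(f"row {row_number}: {column}={value!r} not in domain")
    return findings  # the only output the caller sees


rows = [
    {"state": "TX", "color": "red"},
    {"state": "ZZ", "color": None},
]
print(validate_domain(rows, "state", {"TX", "CA"}))
print(validate_domain(rows, "color", {"red", "blue"}))
```

The same internal names (`value`, `findings`) are reused for every domain without clashing, which is exactly what global attribute names prevent today.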

Locked on Dec 22 2015

Added on Nov 23 2015

#data-quality, #master-data-management-mdm
