Popular Posts

Monday, April 11, 2011

MDM with Microsoft Master Data Services ( MDS )

- By David Nolting
Recently, we've been pulled in to look at how MDM ( Master Data Management ) can be achieved with Microsoft Technology. The obvious starting point is to look at the SQL 2008 R2 MDS ( Master Data Services ) component that is built into native SQL Enterprise Architecture.

In a nutshell, the MDS offering looks quite promising. It is lightweight in its scope which is probably an advantage to reduce scope and complexity for first time MDM implementations.

MDM is often described as an SOA Enabler. We're finding that MDM thinking is seeping into the mindset of companies, even in France, for various business reasons: managing too many disparate data sources for customers, products, and other business entities is too costly and involves too much financial risk when errors are propagated across business domains.

We have just completed a POC using MDS where the goal was to use MDS to replicate an ERP based MDM model.

In case you're not familiar with MDM: most enterprise data has grown with the enterprise through silos like this:
Here we see arrows going all over the place. An MDM solution lets you get rid of lots of these arrows as follows:
MDM becomes the "source of truth" for entity management.

What an MDM solution should be: To really review the MDS offering and see how "complete" it is, we looked to a great source for MDM : Enterprise Master Data Management: An SOA Approach to Managing Core Information by Allen Dreibelbis . This is an exhaustive, agnostic look at MDM. So here, I'm going to look at their MDM solution feature list and see how MDS stacks up. Here are the essential features:

Master Data Life Cycle Management: manages all the aspects of creating, versioning, and security access to data authoring. MDS does provide strong life cycle management capabilities such as versioning, hierarchies ( derived, explicit and derived with explicit caps ). However some of the features we did not find in MDS were:

  • Flexible mapping capability across business domains ( models in MDS ). Entity mapping across MDS models does not seem to be natively supported. For example, if you define a CDI ( Customer Model ) and a separate PIM ( Product Model ), entities across model domains can't be mapped in hierarchies. However, there are ways of getting around this through modified subscription views.
  • Versioning is supported at the model level and provides Model Locking and Version Ancestry Analysis. Versioning at the entity level seems to be a strong requirement. However, MDS provides notifications for data governance issues for data stewards. It seems that the best way to handle entity versioning is at the Model level with sub-versions. When using something like ODATA services over REST, URLs could point to different model versions that expose entities according to the macro model version number. So this doesn't seem like a big issue to me.
Aside from these exceptions ( which can be handled through minor workarounds ), MDS seems to provide the base Life Cycle Management functionality including Audit, Authoring Security, Data lineage, Referential MDM, multiple taxonomies, etc.

Pictured below is a Derived Hierarchy based on Geographical Sales. Derived Hierarchies are inherent, attribute based taxonomies.
Data Quality Management Capability: ability to provide data quality management as data is loaded into the MDM, and real time data governance to notify stewards of data issues. MDS seems to cover this quite well through a Business Rules Processor with extensibility through WF workflows if more heavy lifting is required or if human review is necessary.

MDS provides an intuitive Business Rules Layer as pictured below:
MDS provides Business Rules that are versioned and published and provide IF / ELSE / ACTION behavior:
These business rules allow for basic attribute value checking for data consistency and data logic. Business Rules can be neatly tied to Windows Workflows that are published as services in SharePoint or an SOA Server:
A couple of features such as Data Reconciliation and  Data Cleansing are shoved to the ETL ( SSIS ) products and are expected to be performed before staging data in MDS. Data Reconciliation or the ability to identify duplicate entities ( a customer name or address is slightly different ) can be a complex procedure. The most sensible, low cost way to identify data duplication could be through checking for entity likeness and reporting suspected duplicates to the data steward. So you could image the ETL or EAI procedure that is loading your data could query the SOA layer for a Customer-LIKE operation based on name, address, telephone and if > 0 entries are found, a business rule could be triggered to notify the data steward and reconcile the data.

So it seems that Data Reconciliation and Cleansing need to be handled outside of the MDS product.

Master Data Harmonization: is a fancy way of saying you need to get data from disparate LOB applications and possible unstructured data sources into your MDM solution and vice versa. MDS exposes staging tables and it's your responsibility to load them via ETL or other means. Microsoft provides lots of answers to Data Harmonization such as BizTalk for high end, messaging integration and SSIS for basic, good old fashioned  ETL loading of staging tables.  We will look into how an AppFabric WCF service could be exposed to help with data duplication issues. Such a service could be consumed by BizTalk in a messaging scenario or perhaps in the Business Rules layer as a default work flow check on all incoming entities. Any entities suspected of duplication could be flagged and sent to another workflow for review. 
The Microsoft Middleware stack provides lots of options to enrich the MDS layer. It's obvious that SharePoint and WCF will be the candidates of choice to extend MDS. But BizTalk, which naturally plays a role in data synchronization between an ERP and disparate apps, could be used as well for asynchronous message transfers and pub/sub bus like interaction with MDS. For lightweight enterprise application integration, AppFabric Connect, with the help of WCF Adapters and the BizTalk Mapper, could also be used, especially for synchronous situations. To be honest, we need to dig a little deeper into the MDS notifications to see how BizTalk or other systems could subscribe to MDS artifact modifications so as to update a subscribing system. AppFabric Cache is a strong candidate for performance in the event that thousands of online users could, for instance, request product PIM information from a web application. In this case, an MDM model could be published to the Cache and made available for online queries.

Analysis and Insight Capabilities: MDM solutions should provide inline analytical insight capability to help establish cross relational data and potential business opportunities. This seems to be the realm of BI and perhaps the StreamInsight offering which is built into SQL Server. Good modeling should help with insight capabilities as well. Take for example a banking account model that should tie together an individual with all his accounts ( mortgage, savings, investment cds ) and would be able to identify relationships such as spouses, siblings and remote family members. Business Insight coupled with good modeling would identify, for example, a mortgage loan opportunity as a result of a change of marital status for one of the siblings, or college loan opportunity as a sibling nears high school graduation date ( i.e. 18th birthday ), etc. StreamInsight could subscribe to events triggered by the MDS model and process them for business opportunities.

Building an SOA Layer on MDS

MDS data tables cannot be reliably queried directly and even if you reverse engineer the tables, developing such as strategy is risky because future product versions may change the structure and therefore break your query model. What MDS does provide are subscription views. Currently, subscription views allow for queries against single entity types, or derived hierarchies, and render a readable version of it with columns that match the attribute names. A view generated by MDS resembles the following:
As you can see, the TSQL code has to do a lot to render a technical, meta data driven mds entity into readable output. But once the entity view is created, composite views can be developed to join entities together and to filter data:
Subscription views make it easy to get to the entities and from there you can build upon the queries through LINQ, Entity Framework, or simple ADO.net to publish the data out through WCF Services. MDS could be a great candidate for ODATA WCF services to publish entities as urls. There exists a WCF api with MDS that I'm sure could be leveraged. However, we couldn't find much documentation on it so I guess it would require some trial and error. The API seems to provide full authoring and lifecycle services as well as business rules.

The business rules WCF services would be ideally used to prescreen data from an external system to see if it will pass the MDS business rules prior to shipping it to MDS or another application.


Our POC delivered a satisfactory model that was able to replicate the model defined in the ERP system, and we did this within a couple of days including taking the time to work through MDS examples to fully understand its Modeling API. A second phase of the POC is required to dig deeper into MDS to define the most suitable Data Integration and Harmonization techniques ( BizTalk, SSIS, etc. ) and to roadmap SOA extensibility by digging into the MDS WCF API or by building business domain services that tap into subscription views.

We are getting more and more requests to look into MDM for obvious reasons, and now that Microsoft SQL 2008 R2 provides MDS as a core MDM component, we are more than happy to add MDS to our integration stack. We've just started to scratch the surface and have concentrated on the real business value of MDS which is Modeling. The extended potential of MDS as an SOA backbone will require help from other components in the Microsoft stack such as WCF, WF, SharePoint, and potentially BizTalk. And, of course, an architect will be needed to glue it all together.

Great Resources for MDS and MDM



MDM Book: Enterprise Master Data Management: An SOA Approach to Managing Core Information by Allen Dreibelbis