
Monday, April 11, 2011

High Performance MDM with AppFabric Cache, BizTalk, and Dynamics AX


- By David Nolting


The Context

Recently, ReachSOA architects were engaged to help model a referential architecture for a renowned international sports federation that manages both affiliates and public customers for public sporting events. The combined affiliate and public customer base ran into the millions. When we were called in, they were at the tail end of a revamp of their CRM and ERP applications. As with most organizations, the federation relied on several disparate applications to manage affiliate membership, online ticket sales, front office invoicing, and back office order fulfillment. Microsoft Dynamics AX 2009 was chosen as the CRM product and would eventually morph into a full-blown ERP solution. The ERP would become the MDM "source of truth" referential data source for customers ( CDI ) and products ( PIM ), as well as a historic base for product sales from the online and front office applications.

The solution was a classic data synchronization model using BizTalk and the Dynamics AX adapter:

[Diagram: BizTalk / Dynamics AX data synchronization via the AX adapter]

The SLA Challenge
The integration between the affiliate database, BizTalk, and Dynamics AX proved to be very reliable. However, integration time in AX was costly: database staging and the AX business integration rules required to validate and integrate customers took time. BizTalk would deliver customer data to the AIF layer, where it was subjected to a lengthy staging process. Yet customers created from the Front Office and Online Sales applications needed to be made available for product fulfillment within minutes, and the customer master record could be solicited by potentially thousands of connected users: an SLA of 15 000 simultaneous connections was established.



At first, the architects looked at exposing Dynamics AX web services for consumption from the Front Office and Online applications. However, this raised technical concerns around performance, AX Business Connector licensing, and the "real-time" reliability of the MDM database.



ReachSOA Analysis
We suggested a POC, limited to 5 days, to define and test an architecture using AppFabric caching of the MDM. The cache would expose Pre-MDM services allowing retrieval of master customer and product data, plus the ability to create the essential MDM customer record and place it into the cache so that it is available to the product fulfillment process within the stated SLA of minutes ( in fact, with the cache it would be near-instantaneous ).

To cut to the chase, we modeled the following technical architecture, which combines synchronous and asynchronous flows using AppFabric and BizTalk:

[Diagram: the target architecture combining synchronous AppFabric flows with asynchronous BizTalk flows]
We identified a series of synchronous and asynchronous flows. AppFabric Cache would be used for synchronous customer creation: a web service would allow a subscribing app to create a customer. The customer is written to the cache and pushed to BizTalk for an asynchronous update of the MDM system. Ideally, MDM business rules would be called by AppFabric to pre-validate the customer information before it is written to the cache; business rules integration with MDM was roadmapped for a production version.
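
A rough sketch of that synchronous creation path ( the type, cache name, and BizTalk hand-off below are our illustrative inventions, not the POC code ):

```csharp
using System;
using Microsoft.ApplicationServer.Caching; // AppFabric Cache client API

[Serializable] // objects placed in the cache must be serializable
public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public class CustomerPreMdmService
{
    // DataCacheFactory is expensive to construct; create it once and reuse it.
    private static readonly DataCacheFactory Factory = new DataCacheFactory();
    private readonly DataCache _cache = Factory.GetCache("MdmCustomers"); // hypothetical cache name

    public Guid CreateCustomer(Customer customer)
    {
        customer.Id = Guid.NewGuid();

        // Synchronous leg: the customer is immediately visible to fulfillment.
        _cache.Put(customer.Id.ToString(), customer);

        // Asynchronous leg: hand the customer to BizTalk for staging into AX.
        EnqueueForBizTalk(customer);

        return customer.Id;
    }

    public Customer GetCustomer(Guid id)
    {
        return (Customer)_cache.Get(id.ToString()); // null if not yet created
    }

    private void EnqueueForBizTalk(Customer customer)
    {
        // INSERT into the integration table that BizTalk polls (omitted here).
    }
}
```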

Rebooting the cache meant reloading it from the AX database. Fortunately, a full reload executed in about 30 minutes for millions of customers on a very modest VM platform.

We also structured the cache so that it remained online at all times: a reload merely updated entries in place rather than dumping and rebuilding the cache from scratch.



BizTalk Integration
We were able to update the cache from BizTalk transactions originating in the affiliate database. This meant that affiliate information was published to the MDM cache well before it was staged into the MDM database, thereby meeting the SLA requirements. BizTalk called a .NET assembly that we built to wrap the AppFabric Cache API. Below is the component architecture of the cache solution:

[Diagram: component architecture of the cache solution]

Once the cache API was wrapped in our .NET service facade, it was easy to hook up BizTalk and the cache:

[Diagram: BizTalk orchestration calling the cache facade]

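The facade itself needs very little code. A minimal sketch, assuming the standard AppFabric client API and names of our own choosing, of the kind of static wrapper an orchestration Expression shape can call in one line:

```csharp
using Microsoft.ApplicationServer.Caching;

// Callable from a BizTalk Expression shape, e.g.:
//   CacheFacade.Publish(customerKey, customerXml);
public static class CacheFacade
{
    private static readonly DataCacheFactory Factory = new DataCacheFactory();
    private static readonly DataCache Cache = Factory.GetCache("MdmCustomers"); // hypothetical name

    public static void Publish(string key, string entityXml)
    {
        Cache.Put(key, entityXml); // Put is an upsert: it overwrites existing entries
    }

    public static string Lookup(string key)
    {
        return (string)Cache.Get(key); // returns null when the key is absent
    }
}
```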
WF Integration
We promoted the use of WF ( Workflow Foundation ) to model the creation/update process. This workflow manages a transaction between the cache and the asynchronous integration tables that BizTalk uses to pick up and pass modifications to AX. If an update fails at the cache level or the SQL level, the transaction is rolled back and a fault is returned to the consuming app.
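
WF modeling details aside, here is a minimal sketch of what that transaction boundary can look like with System.Transactions ( our reconstruction, not the POC workflow; AppFabric cache writes do not enlist in the transaction, so ordering and compensation matter ):

```csharp
using System.Transactions;
using Microsoft.ApplicationServer.Caching;

public class CreateOrUpdateCustomerStep
{
    private readonly DataCache _cache;

    public CreateOrUpdateCustomerStep(DataCache cache) { _cache = cache; }

    public void Execute(string key, string customerXml)
    {
        using (var scope = new TransactionScope())
        {
            // SQL write: the row BizTalk will pick up for the async AX update.
            InsertIntoIntegrationTable(key, customerXml);

            // Cache write: not transactional. If it throws, the scope is never
            // completed, the SQL insert rolls back, and the fault propagates
            // back to the consuming application.
            _cache.Put(key, customerXml);

            scope.Complete(); // commit SQL only once the cache write succeeded
        }
        // If the commit itself failed, a compensating _cache.Remove(key)
        // would be required; omitted in this sketch.
    }

    private void InsertIntoIntegrationTable(string key, string xml)
    {
        // Parameterized INSERT into the integration table (omitted here).
    }
}
```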







Functional Testing
The functional results of our POC were clearly positive. We managed to demonstrate flexible and rapid queries against the AX referential store thanks to the AppFabric cache. The POC also demonstrated strategic possibilities such as full-blown search and the creation of SOA business domains for the federation.



Performance Testing
Performance testing was limited to Visual Studio load tests hammering the cache platform. Using this approach, we expressed the 15 000-user connection SLA in terms of the number of transactions we could handle per second, a metric commonly used in EAI/ESB performance projects. This seemed to fit because the cache would sit well behind the online website, and the transactions that query customer data would represent less than 1 % of the entire online experience. Arriving at a transactions-per-second figure would also help us project the number of servers required for a full-blown implementation. Our cache was distributed across 3 machines, as pictured below:

[Diagram: the cache distributed across three VMs]
We used 3 virtual machines ( VMware ESX ) to distribute the cache across multiple machines. Our initial goal was to show the high availability/failover capabilities of the cache; however, we did not have Windows Server 2008 R2 Enterprise installed on these machines, and upgrading from Standard to Enterprise was beyond the scope of the POC.

We achieved anywhere from 85 to 95 transactions per second, which seemed to largely satisfy the review committee given that we were not working in an optimal environment but had merely grabbed a few available machines.
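
As a rough illustration of the translation ( our own back-of-envelope assumption, not a committee figure ): if each of the 15 000 connected users touched customer data once every three minutes on average, the offered load would be about 15 000 / 180 ≈ 83 transactions per second, squarely in the band the three-VM cache sustained.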





We performed tests such as the following:

  • Test duration: 20 minutes
  • 104 819 tests completed
  • 87.3 tests/sec
  • 210 threads ( a VS Load Test / CPU limitation )
  • 0.35 s average test time

A few Technical Gotchas
As our cache theoretically needs to remain available at all times, we configured high availability for the cache. One important characteristic of our caching model was to eliminate eviction: regardless of hit ratios, we needed all data to stay in the cache. We stumbled upon this during the POC. In addition, we will need to activate notifications in the event the cache drops entries due to eviction. So some time needs to be spent understanding and configuring the cache settings precisely. Perhaps in another post we will drill down into some of these technical issues; this post is more high-level and shows the business value of the technical architecture.
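
Eviction itself is a named-cache setting on the server side; on the client, the notification safety net we have in mind looks like the sketch below ( it assumes notifications are enabled on the cache; the callback names are the standard AppFabric ones, the rest is our invention ):

```csharp
using System;
using Microsoft.ApplicationServer.Caching;

public static class EvictionWatchdog
{
    public static void Attach(DataCache cache)
    {
        // Register a cache-level callback for item removals. If an entry
        // disappears that we never deleted ourselves, an eviction (or an
        // unexpected delete) has occurred and operations should be alerted.
        cache.AddCacheLevelCallback(
            DataCacheOperations.RemoveItem,
            (cacheName, regionName, key, version, operation, descriptor) =>
                Console.WriteLine(
                    "Item removed from cache '{0}': key '{1}'", cacheName, key));
    }
}
```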


Conclusions/Opportunities

The POC not only met but surpassed expectations. The POC debrief helped underline the future potential of the solution. Here are the key takeaway points from the POC:

Practical Points

  • Inexpensive Solution
  • Rapid Development
  • High Performance


Opportunities

  • Potential for Full Blown Search
  • First step towards MDM + SOA Enterprise Architecture
  • Opportunity to Provide Service Inventory for Affiliate databases ( without creating another Silo )
  • Best Practices and Governance in SOA
  • Cloud ( public or private ) elasticity


MDM with Microsoft Master Data Services ( MDS )

- By David Nolting
Recently, we've been pulled in to look at how MDM ( Master Data Management ) can be achieved with Microsoft technology. The obvious starting point is the MDS ( Master Data Services ) component built into SQL Server 2008 R2 Enterprise.

In a nutshell, the MDS offering looks quite promising. It is deliberately lightweight, which is probably an advantage: it reduces scope and complexity for first-time MDM implementations.

MDM is often described as an SOA Enabler. We're finding that MDM thinking is seeping into the mindset of companies, even in France, for various business reasons: managing too many disparate data sources for customers, products, and other business entities is too costly and involves too much financial risk when errors are propagated across business domains.

We have just completed a POC using MDS where the goal was to use MDS to replicate an ERP based MDM model.

In case you're not familiar with MDM: most enterprise data has grown with the enterprise through silos like this:

[Diagram: silo applications wired together point-to-point]

Here we see arrows going all over the place. An MDM solution lets you get rid of lots of these arrows as follows:

[Diagram: the same applications synchronizing through a central MDM hub]

MDM becomes the "source of truth" for entity management.

What an MDM solution should be: To really review the MDS offering and see how "complete" it is, we turned to a great source for MDM: Enterprise Master Data Management: An SOA Approach to Managing Core Information by Allen Dreibelbis, an exhaustive, vendor-agnostic look at MDM. So here, I'm going to walk through their MDM solution feature list and see how MDS stacks up. Here are the essential features:

Master Data Life Cycle Management: manages all aspects of creating and versioning data and securing access to data authoring. MDS provides strong life cycle management capabilities such as versioning and hierarchies ( derived, explicit, and derived with explicit caps ). However, some of the features we did not find in MDS were:


  • Flexible mapping capability across business domains ( models in MDS ). Entity mapping across MDS models does not seem to be natively supported. For example, if you define a CDI ( Customer Model ) and a separate PIM ( Product Model ), entities across model domains can't be mapped in hierarchies. However, there are ways of getting around this through modified subscription views.
  • Versioning is supported at the model level and provides model locking and version ancestry analysis. Entity-level versioning seems to be a strong requirement that is missing; however, MDS provides notifications to data stewards for data governance issues. The best way to handle entity versioning seems to be at the model level with sub-versions. When using something like OData services over REST, URLs could point to different model versions that expose entities according to the macro model version number. So this doesn't seem like a big issue to me.
Aside from these exceptions ( which can be handled through minor workarounds ), MDS seems to provide the base Life Cycle Management functionality including Audit, Authoring Security, Data lineage, Referential MDM, multiple taxonomies, etc.

Pictured below is a Derived Hierarchy based on geographical sales. Derived Hierarchies are inherent, attribute-based taxonomies.

[Screenshot: a Derived Hierarchy based on geographical sales]
Data Quality Management Capability: the ability to provide data quality management as data is loaded into the MDM, plus real-time data governance to notify stewards of data issues. MDS seems to cover this quite well through a Business Rules processor, with extensibility through WF workflows when heavier lifting or human review is required.

MDS provides an intuitive Business Rules layer, as pictured below:

[Screenshot: the MDS Business Rules editor]

Business Rules are versioned, published, and provide IF / ELSE / ACTION behavior:

[Screenshot: an MDS business rule with IF / ELSE / ACTION conditions]

These business rules allow for basic attribute value checking for data consistency and data logic. Business Rules can be neatly tied to Windows Workflows that are published as services in SharePoint or an SOA server:

[Screenshot: wiring a business rule to a published workflow]
A couple of features, such as Data Reconciliation and Data Cleansing, are pushed out to the ETL ( SSIS ) products and are expected to be performed before staging data in MDS. Data Reconciliation, the ability to identify duplicate entities ( a customer name or address that is slightly different ), can be a complex procedure. The most sensible, low-cost way to identify data duplication could be to check for entity likeness and report suspected duplicates to the data steward. So you could imagine the ETL or EAI procedure that loads your data querying the SOA layer with a Customer-LIKE operation based on name, address, and telephone; if more than zero entries are found, a business rule could be triggered to notify the data steward and reconcile the data.

So it seems that Data Reconciliation and Cleansing need to be handled outside of the MDS product.
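
To make the Customer-LIKE idea concrete, here is a minimal sketch of such a screen against a subscription view ( the view and column names are hypothetical, and SOUNDEX is only a coarse phonetic first pass, not a matching engine ):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public static class DuplicateScreen
{
    // Returns the Codes of existing customers that "look like" the candidate.
    public static List<string> FindLikely(string connStr, string name, string phone)
    {
        const string sql = @"
            SELECT Code FROM mdm.CustomerView    -- hypothetical subscription view
            WHERE SOUNDEX(Name) = SOUNDEX(@name) -- phonetic likeness on the name
               OR Telephone = @phone";           // exact match on the phone number

        var matches = new List<string>();
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@name", name);
            cmd.Parameters.AddWithValue("@phone", phone);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    matches.Add(reader.GetString(0));
        }
        return matches; // > 0 entries => notify the data steward
    }
}
```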

Master Data Harmonization: a fancy way of saying you need to get data from disparate LOB applications, and possibly unstructured data sources, into your MDM solution and vice versa. MDS exposes staging tables, and it's your responsibility to load them via ETL or other means. Microsoft provides plenty of answers for data harmonization, such as BizTalk for high-end messaging integration and SSIS for basic, good old-fashioned ETL loading of the staging tables. We will look into how an AppFabric WCF service could be exposed to help with data duplication issues. Such a service could be consumed by BizTalk in a messaging scenario, or perhaps in the Business Rules layer as a default workflow check on all incoming entities. Any entities suspected of duplication could be flagged and sent to another workflow for review.
The Microsoft middleware stack provides lots of options to enrich the MDS layer. It's obvious that SharePoint and WCF will be the candidates of choice to extend MDS. But BizTalk, which naturally plays a role in data synchronization between an ERP and disparate apps, could be used as well for asynchronous message transfers and pub/sub, bus-style interaction with MDS. For lightweight enterprise application integration, AppFabric Connect, with the help of WCF adapters and the BizTalk Mapper, could also be used, especially in synchronous situations. To be honest, we need to dig a little deeper into the MDS notifications to see how BizTalk or other systems could subscribe to MDS artifact modifications so as to update a subscribing system. AppFabric Cache is a strong candidate for performance in the event that thousands of online users need, for instance, to request product PIM information from a web application. In that case, an MDM model could be published to the cache and made available for online queries.

Analysis and Insight Capabilities: MDM solutions should provide inline analytical insight to help establish cross-relational data and potential business opportunities. This seems to be the realm of BI and perhaps the StreamInsight offering that is built into SQL Server. Good modeling should help with insight capabilities as well. Take, for example, a banking account model that ties together an individual with all his accounts ( mortgage, savings, investment CDs ) and can identify relationships such as spouses, siblings, and remote family members. Business insight coupled with good modeling would identify, for example, a mortgage loan opportunity as a result of a change of marital status for one of the family members, or a college loan opportunity as a sibling nears high school graduation ( i.e. an 18th birthday ), etc. StreamInsight could subscribe to events triggered by the MDS model and process them for business opportunities.

Building an SOA Layer on MDS

MDS data tables cannot be reliably queried directly, and even if you reverse engineer the tables, such a strategy is risky because future product versions may change the structure and break your query model. What MDS does provide are subscription views. Currently, subscription views allow queries against single entity types or derived hierarchies and render a readable version of them, with columns that match the attribute names. A view generated by MDS resembles the following:

[Screenshot: the T-SQL behind an MDS-generated subscription view]

As you can see, the T-SQL has to do a lot of work to render a technical, metadata-driven MDS entity into readable output. But once the entity view is created, composite views can be developed to join entities together and to filter data:

[Screenshot: a composite view joining entity views]
Subscription views make it easy to get at the entities, and from there you can build on the queries through LINQ, Entity Framework, or plain ADO.NET to publish the data out through WCF services. MDS could be a great candidate for OData WCF services that publish entities as URLs; see the sketch below. There is also a WCF API in MDS that I'm sure could be leveraged; however, we couldn't find much documentation on it, so I suspect it would take some trial and error. The API seems to provide full authoring and lifecycle services as well as business rules.
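
To illustrate the OData idea, here is a minimal WCF Data Services sketch. It assumes an Entity Framework context ( MdsViewsContext, our hypothetical name ) generated over the subscription views, not the MDS API:

```csharp
using System.Data.Services;
using System.Data.Services.Common;

// Read-only OData endpoint over an EF context mapped to MDS subscription views.
// Entities become URLs such as /MdsData.svc/Customers('C001').
public class MdsDataService : DataService<MdsViewsContext> // hypothetical EF context
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("Customers", EntitySetRights.AllRead);
        config.SetEntitySetAccessRule("Products", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }
}
```

Hosting one such endpoint per model version would be one simple way to expose entities according to the macro model version number, as discussed above.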

The business rules WCF services would ideally be used to prescreen data from an external system, checking whether it will pass the MDS business rules before shipping it to MDS or another application.

Conclusion

Our POC delivered a satisfactory model that replicated the one defined in the ERP system, and we did so within a couple of days, including the time taken to work through MDS examples to fully understand its modeling API. A second phase of the POC is required to dig deeper into MDS, to define the most suitable data integration and harmonization techniques ( BizTalk, SSIS, etc. ), and to roadmap SOA extensibility by digging into the MDS WCF API or by building business domain services that tap into subscription views.

We are getting more and more requests to look into MDM for obvious reasons, and now that Microsoft SQL Server 2008 R2 provides MDS as a core MDM component, we are more than happy to add MDS to our integration stack. We've just started to scratch the surface, concentrating on the real business value of MDS, which is modeling. The extended potential of MDS as an SOA backbone will require help from other components in the Microsoft stack such as WCF, WF, SharePoint, and potentially BizTalk. And, of course, an architect will be needed to glue it all together.

Great Resources for MDS and MDM

http://sqlblog.com/blogs/mds_team/archive/2011/01/11/new-master-data-services-training-available.aspx

http://www.packtpub.com/microsoft-sql-server-2008-r2-master-data-services/book

MDM Book: Enterprise Master Data Management: An SOA Approach to Managing Core Information by Allen Dreibelbis