In Part 1 of this blog series, I presented the benefits that Talend Data Preparation 2.1 (Summer ’17) delivers: in particular, an unmatched level of industrialization for IT and the integration of data preparations into any type of Big Data scenario, batch or streaming.
In Part 2, I’ll review the major new features we’ve added to Talend Data Stewardship. Since January 2017, Talend has offered a range of self-service data management applications that accelerate and facilitate digital transformation at all levels of your organization. Indeed, one of the main keys to a successful digital transformation is putting data in the hands of all your users, so that you can draw on their experience and their knowledge of the data.
Our newest self-service data application, Talend Data Stewardship, enables you to mobilize the right business experts and organize data optimization campaigns across the enterprise as a catalyst for data knowledge: you get data corrected, merged, and certified within a controlled timeframe. So without further delay, let’s see what’s new with Talend Data Stewardship in our Summer ’17 release.
Talend Data Stewardship Ensures High Availability and Productivity
As a data-driven culture spreads across the organization, data stewardship becomes a critical element of a company’s data strategy. Data stewards need a robust, scalable, industrial-grade application to succeed. This is why Talend Data Stewardship Summer ’17 is based on a high-availability architecture.
To optimize productivity, Talend Data Stewardship Summer ’17 enables users to perform mass actions on data, such as arbitration, assignment, acceptance, and rejection. For the same purpose, users can pace their work according to the maximum resolution time displayed for each task in their task list. They can thus guarantee a Service Level Agreement to the campaign sponsor.
Talend Data Stewardship Uses Machine Learning to Manage Your Duplicates
The application already orchestrated merging campaigns from multiple data sources, “arbitration” campaigns between multiple data sources, and “error” resolution campaigns.
Because duplicate management is one of the most time-consuming and critical tasks in improving data quality, Talend Data Stewardship now also manages grouping campaigns.
Data sets containing duplicates often have thousands or millions of rows, far too many for a user to review by hand. Integrated with a Talend Big Data platform, Talend Data Stewardship provides a solution. It presents the user with a sample of the duplicate-polluted data set. For each sampled item, the user tells the application whether it is a duplicate, is not a duplicate, or that they do not know. The results of this human evaluation are fed into a Machine Learning engine running on Apache Spark, which learns duplicate-detection rules from them. These rules are then applied to the entire data set. This saves time and improves the quality of the data.
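To make the idea concrete, here is a minimal sketch of the "label a sample, learn a rule, apply it everywhere" loop. It is not how Talend Data Stewardship or its Spark engine is implemented: the similarity measure (Python's standard `difflib`), the single learned threshold, and all function names are assumptions chosen only to illustrate the principle on a toy list of names.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def learn_threshold(labeled_pairs):
    """Learn a similarity cutoff from a steward's answers.

    labeled_pairs holds ((a, b), answer) items where answer is
    "yes", "no", or "unknown". "unknown" answers are ignored.
    """
    scored = [
        (similarity(a, b), answer == "yes")
        for (a, b), answer in labeled_pairs
        if answer != "unknown"
    ]
    best_threshold, best_correct = 1.0, -1
    # Try each observed score as a cutoff and keep the one that
    # agrees with the most human answers.
    for threshold in sorted({score for score, _ in scored}):
        correct = sum((score >= threshold) == is_dup for score, is_dup in scored)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def find_duplicates(records, threshold):
    """Apply the learned rule to every pair in the full data set."""
    return [
        (a, b)
        for a, b in combinations(records, 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical steward answers on a small sample.
sample = [
    (("John Smith", "Jon Smith"), "yes"),
    (("Acme Corp", "ACME Corp"), "yes"),
    (("John Smith", "Mary Jones"), "no"),
    (("Acme Corp", "Apex Ltd"), "unknown"),
]
full_dataset = ["John Smith", "Jon Smith", "Mary Jones",
                "Acme Corp", "ACME Corp", "Apex Ltd"]

threshold = learn_threshold(sample)
duplicates = find_duplicates(full_dataset, threshold)
```

A real engine would learn far richer rules over many columns and would avoid comparing every pair on millions of rows; the sketch only shows why a few human answers can drive decisions across the whole data set.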
Outside of a Big Data context, a grouping campaign groups the rows of a data set according to their nature: true duplicates, false duplicates, and intentional duplicates. Each category is then treated differently: true duplicates go into a deduplication campaign managed in Talend Studio; false duplicates are separated and reintegrated into the destination application; intentional duplicates remain as they are.
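The routing described above (true duplicates, false duplicates, and duplicates kept on purpose each getting their own follow-up) can be pictured as a simple lookup. The sketch below is purely illustrative: the enum, the step labels, and the group identifiers are hypothetical and do not correspond to any Talend API.

```python
from enum import Enum


class Category(Enum):
    TRUE_DUPLICATE = "true duplicate"
    FALSE_DUPLICATE = "false duplicate"
    INTENTIONAL_DUPLICATE = "intentional duplicate"


# Hypothetical labels for the follow-up each category receives.
NEXT_STEP = {
    Category.TRUE_DUPLICATE: "deduplication campaign",
    Category.FALSE_DUPLICATE: "separate and reintegrate",
    Category.INTENTIONAL_DUPLICATE: "leave as is",
}


def route(groups):
    """Map each follow-up step to the group ids a steward assigned to it."""
    plan = {}
    for group_id, category in groups.items():
        plan.setdefault(NEXT_STEP[category], []).append(group_id)
    return plan


# Hypothetical outcome of a grouping campaign.
steward_decisions = {
    "G1": Category.TRUE_DUPLICATE,
    "G2": Category.FALSE_DUPLICATE,
    "G3": Category.TRUE_DUPLICATE,
    "G4": Category.INTENTIONAL_DUPLICATE,
}
plan = route(steward_decisions)
```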
Talend Data Stewardship Integrates Naturally with a Master Data Management Project
The aim of Master Data Management is to provide “a single version of the truth” for data throughout the company: it consolidates all customer, product, and supplier data into a single repository. New data sources are constantly being fed in, of course, and each one risks degrading the quality of the reference data. And because MDM is by nature a centralized tool, there is a real risk of failing to involve the end users of the data.
Talend Data Stewardship provides a unique solution to these challenges of maintaining data quality and user adoption: an intuitive and fully accessible tool that enables the people who know the company’s data to improve or certify the quality of their own data. It is a vehicle for data governance.
Talend Data Stewardship Summer ’17 integrates natively into Talend Master Data Management to manage the matching of master data with the new data streams that feed MDM continuously. As soon as a new record matches an existing one, it is automatically sent to Talend Data Stewardship. Matching rules can be configured to handle very complex business rules.
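As a rough illustration of this flow, the sketch below checks incoming records against master records with a pluggable matching rule and sets aside any match as a stewardship task. The field names, the rule itself, and the `triage` function are assumptions made up for the example; they are not Talend MDM matching rules or APIs.

```python
def normalize(value):
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return " ".join(value.lower().split())


def same_customer(new, master):
    """Hypothetical rule: same email, or same name at the same postcode."""
    if normalize(new["email"]) == normalize(master["email"]):
        return True
    return (
        normalize(new["name"]) == normalize(master["name"])
        and new["postcode"] == master["postcode"]
    )


def triage(incoming, masters, rule=same_customer):
    """Split incoming records into stewardship tasks and unmatched records."""
    tasks, unmatched = [], []
    for record in incoming:
        candidates = [m["id"] for m in masters if rule(record, m)]
        if candidates:
            # A match needs a human decision, so it becomes a task.
            tasks.append({"record": record, "candidates": candidates})
        else:
            unmatched.append(record)
    return tasks, unmatched


masters = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "postcode": "75001"},
]
incoming = [
    {"name": "Ann  LEE", "email": "a.lee@example.com", "postcode": "75001"},
    {"name": "Bob Ray", "email": "bob@example.com", "postcode": "10001"},
]
tasks, unmatched = triage(incoming, masters)
```

Passing the rule as a parameter mirrors the idea that matching logic is configurable: a more complex business rule is just a different function with the same two arguments.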
The MDM Integrated Matching
Similarly, Talend Data Stewardship Summer ’17 works on the same data model as Talend Master Data Management: the application automatically inherits any changes to the MDM data model. Changes are propagated directly and used for data controls in the Talend Data Stewardship interface.
To learn more about Talend Data Stewardship, watch this 4-minute video.