In this article, I want to highlight some of the new features we’ve packed into Talend Data Preparation version 2.0.
In a nutshell, Talend Data Preparation 2.0 lets users change scale in two ways. In terms of usage, it democratizes the exploitation of big data and takes into account the data types specific to each customer. In terms of performance scalability, it is the first data-processing tool that supports Apache Beam, which lets it remain at the forefront of data processing environments (first Spark, then MapR, Flink, Apex, etc.).
Talend Data Preparation 2.0 retains all the functional power and user experience that allow users to get data cleaned, enhanced and standardized in minutes rather than days. It also allows IT to guarantee secure user access, trusted exchange of data and preparations, compliance with data governance rules and connectivity to all applications in the enterprise. Here are three big updates to Talend Data Preparation that you should know about:
Data Preparation to Democratize your Big Data and Data Lake
How do you allow non-technical business users to fully utilize the intelligence of big data stored in bulk in data lakes? Think of marketing teams who want to analyze click streams from the website or sales receipts sent back from the store network, or of finance, accounting or purchasing users who want to use their vendors' billing details or historical data on their customers' financial health.
Talend Data Preparation helps you unleash the full power of your data lake! These business users can confidently access all the data sources to which they are entitled – regardless of access mode – so that they can be viewed, discovered, cleansed, standardized and presented according to their own management rules in minutes rather than days.
Functionally, IT provides users, on demand or in live mode, with self-service "sanctioned big datasets" from the data lake through the HDFS connector. Depending on their rights, users can enjoy even greater autonomy and access the data lake themselves. Users then prepare the data intuitively in their web browser, at the pace at which they discover the data file. They are guided by functions for self-discovery of the data, self-diagnosis of its quality and auto-suggestion of cleansing functions. So that they can handle big data files of thousands or millions of lines, Talend Data Preparation lets users work on a smart, selective sample of the data, and their work is then automatically applied to the full data set. IT then puts the user preparations into production to inject the results back into the data lake or into any business application, on-premises or in the cloud. Here again, users can be given greater autonomy according to your management rules: they can generate their own export files.
Talend Data Preparation provides self-service access to, and export to, the Hadoop Distributed File System (HDFS) for CSV, Parquet, Avro and JSON files, with native support for Kerberos authentication.
Note that Talend Data Preparation also allows any user to prepare and integrate data from any type of database (JDBC connector), any application, and any Excel or CSV file received by email or stored locally. Talend Data Preparation's broad connectivity serves all data exploitation scenarios.
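The sample-then-apply workflow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Talend's implementation: the function names, the "recipe" list and the sample data are all hypothetical. The user designs cleansing steps while looking at a small sample, and the same ordered steps are then replayed over the complete data set.

```python
import random

def trim_and_title(row):
    """Hypothetical cleansing step: strip whitespace and normalize the city's case."""
    return {**row, "city": row["city"].strip().title()}

def drop_missing_amount(row):
    """Hypothetical cleansing step: discard rows whose amount is empty."""
    return row if row["amount"] not in ("", None) else None

def apply_recipe(rows, recipe):
    """Apply each step in order; a step that returns None drops the row."""
    for row in rows:
        for step in recipe:
            row = step(row)
            if row is None:
                break
        else:
            yield row

def sample(rows, k, seed=0):
    """Draw a small random sample so the preparation can be designed interactively."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))

# Made-up data standing in for a large file in the data lake.
full_dataset = [
    {"city": "  paris ", "amount": "12.5"},
    {"city": "LYON", "amount": ""},
    {"city": "nantes", "amount": "7"},
    {"city": " lille", "amount": "3.2"},
]

# The recipe is defined once, while the user looks only at the sample...
recipe = [trim_and_title, drop_missing_amount]
preview = list(apply_recipe(sample(full_dataset, 2), recipe))

# ...and is then replayed unchanged on the full data set.
prepared = list(apply_recipe(full_dataset, recipe))
print([row["city"] for row in prepared])
```

Because the recipe is just an ordered list of steps, nothing in it depends on how many rows it is applied to, which is what makes it possible to design on a sample and run at full scale.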
Talend Data Preparation Automatically Learns Data Language
Every company works with both standard data (last name, first name, telephone number, VAT number, cities, countries, etc.) and company-specific data (customer numbers, product codes, analytical accounting codes, etc.).
If your data preparation application cannot recognize the semantic type of this specific data, how can it guarantee reliable self-discovery of data, effective self-diagnosis and auto-suggestion of relevant cleansing functions? If your data preparation application cannot recognize and then learn your specific data types, you will lose productivity because your users will be forced to do more of the preparation work manually.
Talend Data Preparation speaks your business language by taking into account your specific semantic types of data. Its Data Dictionary Service analyzes and defines them once and for all. You benefit from automation and fine-grained analysis whatever data you have to prepare.
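To make the idea of semantic types concrete, here is a minimal Python sketch of a dictionary that mixes standard and company-specific types. It is not the Data Dictionary Service's API: the type names, the patterns (such as the `CUST-` customer number format) and the 75% threshold are assumptions chosen purely for illustration.

```python
import re

# Hypothetical dictionary: one standard type and two company-specific ones.
SEMANTIC_TYPES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "customer_number": re.compile(r"CUST-\d{6}"),  # assumed in-house format
    "product_code": re.compile(r"[A-Z]{3}\d{4}"),  # assumed in-house format
}

def detect_semantic_type(values, threshold=0.75):
    """Return the type matched by the largest share of non-empty values, or None."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    best_name, best_ratio = None, 0.0
    for name, pattern in SEMANTIC_TYPES.items():
        ratio = sum(1 for v in non_empty if pattern.fullmatch(v)) / len(non_empty)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio >= threshold else None

def invalid_values(values, type_name):
    """List the non-empty values that do not conform to the detected type."""
    pattern = SEMANTIC_TYPES[type_name]
    return [v for v in values if v and not pattern.fullmatch(v)]

column = ["CUST-000123", "CUST-004567", "cust-42", "CUST-999999", "CUST-000001"]
detected = detect_semantic_type(column)
print(detected, invalid_values(column, detected))
```

Once a column's type is known, the values that fail to match become candidates for cleansing suggestions. Without the company-specific entry, the column above would simply look like free text and nothing would be flagged.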
Talend Data Preparation Now Supports Apache Beam
Talend Data Preparation opens up new possibilities for all users by democratizing the use of big data and the data lake in a few minutes. But exploiting these enormous volumes of data, which are both extremely varied and generated in real time, requires advanced data processing performance. Given the pace of innovation in the field of big data, investments quickly become obsolete and cost-prohibitive for companies. The race for innovation (on average there is a new version of Spark every six months) becomes a brake on adoption.
To help companies escape this vicious circle, Talend has decided to adopt Apache Beam: Talend announced the first data preparation solution for big data on Beam. This technology enables companies to deliver a sustainable data preparation service to their users, regardless of the platform used.
Functionally, Apache Beam avoids having to rewrite applications each time a new innovation appears, a system migrates to the cloud or an integration scenario changes (batch, real-time). Users create their data preparation models once and run them anywhere on unlimited data volumes. Talend Data Preparation 2.0 delivers unprecedented agility, scalability and state-of-the-art performance.
Technically, Apache Beam adds an abstraction layer between the data preparation application and the various data processing execution environments. Beam hides this complexity from customers by allowing Talend Data Preparation to remain agnostic about the underlying execution technology.
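The following Python sketch illustrates what such an abstraction layer means in practice. It deliberately does not use the Apache Beam SDK: the `Pipeline`, `LocalRunner` and `ChunkedRunner` classes are hypothetical stand-ins, and the thread pool only simulates a distributed engine. In Beam itself, the comparable roles are played by a pipeline definition and interchangeable runners such as the DirectRunner, SparkRunner or FlinkRunner.

```python
from concurrent.futures import ThreadPoolExecutor

class Pipeline:
    """Engine-agnostic description of a preparation: an ordered list of steps."""
    def __init__(self):
        self.steps = []

    def map(self, fn):
        self.steps.append(("map", fn))
        return self

    def filter(self, predicate):
        self.steps.append(("filter", predicate))
        return self

def _run_steps(steps, records):
    """Execute the steps on one batch of records."""
    for kind, fn in steps:
        if kind == "map":
            records = [fn(r) for r in records]
        else:
            records = [r for r in records if fn(r)]
    return records

class LocalRunner:
    """Runs the whole pipeline in-process, in a single pass."""
    def run(self, pipeline, records):
        return _run_steps(pipeline.steps, list(records))

class ChunkedRunner:
    """Stand-in for a distributed engine: splits the input and processes chunks in parallel."""
    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size

    def run(self, pipeline, records):
        records = list(records)
        chunks = [records[i:i + self.chunk_size]
                  for i in range(0, len(records), self.chunk_size)]
        with ThreadPoolExecutor() as pool:
            # pool.map preserves the order of the input chunks.
            results = list(pool.map(lambda c: _run_steps(pipeline.steps, c), chunks))
        return [r for chunk in results for r in chunk]

# The preparation is written once...
preparation = Pipeline().map(str.strip).filter(bool).map(str.upper)
data = [" fr", "de ", "", "  es  ", "it"]

# ...and executed unchanged by either runner.
local_result = LocalRunner().run(preparation, data)
chunked_result = ChunkedRunner(chunk_size=2).run(preparation, data)
print(local_result, chunked_result)
```

The preparation never references a runner, so switching execution environments means choosing a different runner rather than rewriting the preparation. This only works here because every step operates on one record at a time; steps that need to see the whole data set, such as aggregations, require more from the abstraction.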