Hu Yoshida

Pentaho Streamlines the ETL Process with Metadata Injection

Blog Post created by Hu Yoshida on Apr 13, 2016


Data analytics is critical for business success. The ability to quickly gather data from different sources, analyze it, and deliver actionable outcomes gives you a competitive edge. We have a lot of data that can help us make better decisions. The problem is that the data is generated by multiple applications, developed and supported by different vendors, managed by different departments and individuals, and run on different computer platforms. As a result, the data formats differ, which makes it difficult to run an analytics program across a combination of disparate data sources. In addition to reconciling different headers and table formats, we need to determine whether “H. Yoshida” in one data source is the same person as “Yoshida, H.” in another.


ETL systems are commonly used to integrate data from different repositories. ETL stands for Extract, Transform, and Load, and involves three steps. Extract pulls data, in part or in whole, from the different data sources. Transform reformats the data for use in the target system; during this step the data is also cleansed, for example to resolve the difference between “H. Yoshida” and “Yoshida, H.” Load writes the transformed data into the target repository, which may be a data warehouse or data mart for analysis, or another repository used to migrate or consolidate data from different applications.
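
As a rough illustration of the three steps, here is a minimal Python sketch. It is not Pentaho code: the two sample sources, their column names, the department value, and the helper names (`normalize_name`, `extract`, `transform`, `load`, `run_etl`) are all invented for this example, and an in-memory list stands in for the target repository.

```python
import csv
import io

# Two hypothetical sources describing the same person with different
# headers, delimiters, and name formats (sample data invented here).
SOURCE_A = "employee,dept\nH. Yoshida,Sales\n"
SOURCE_B = "Name;Department\nYoshida, H.;Sales\n"


def normalize_name(raw):
    """Cleanse a name to 'Initial. Surname' from either input format."""
    raw = raw.strip()
    if "," in raw:
        # 'Surname, Given' form
        surname, _, given = raw.partition(",")
    else:
        # 'Given Surname' form
        given, _, surname = raw.rpartition(" ")
    given, surname = given.strip(), surname.strip()
    return f"{given[0].upper()}. {surname}" if given else surname


def extract(text, delimiter):
    """Extract: read rows from one source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows, name_field, dept_field):
    """Transform: map source columns to the target layout and cleanse names."""
    return [
        {"name": normalize_name(r[name_field]), "department": r[dept_field].strip()}
        for r in rows
    ]


def load(rows, target):
    """Load: append rows to the target (a list standing in for a warehouse)."""
    target.extend(rows)


def run_etl():
    warehouse = []
    load(transform(extract(SOURCE_A, ","), "employee", "dept"), warehouse)
    load(transform(extract(SOURCE_B, ";"), "Name", "Department"), warehouse)
    return warehouse
```

After `run_etl()`, both rows carry the same cleansed name, so a downstream query can treat them as one person. Note that each source still needs its own hand-written `extract` and `transform` call, which is the repetition discussed next.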


Traditional static ETL or hand-coding approaches require repetitive, time-consuming manual design that increases the risk of human error. This becomes even more time-consuming and risky when you need to run ETL on hundreds of different files and tables, and the maintenance workload grows as the formats change. Streamlining data onboarding and ETL maintenance shortens the analytics task and leads to faster decisions and business advantage.


Pentaho helps solve this problem with metadata injection, which is being introduced in the 6.1 release announced today. Metadata is data about the data, such as field names and field types (string, number, etc.). Metadata injection refers to the dynamic passing of metadata to PDI (Pentaho Data Integration) transformations at run time in order to control complex data integration logic. The metadata (from the data source, a user-defined file, or an end-user request) can be injected on the fly into a transformation template, providing the “instructions” to generate actual transformations. This enables teams to drive hundreds of data ingestion and preparation processes through just a few actual transformations. In data onboarding use cases, metadata injection reduces the development time and resources required, accelerating time to value.
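
The idea of a template driven by injected metadata can be sketched in a few lines of Python. This is a conceptual illustration only and does not use or represent the PDI API: the metadata layout (`delimiter`, `fields`, `source`, `target`, `type`), the function names, and the sample files are assumptions made for this example.

```python
import csv
import io

# Converters keyed by the field-type names used in the metadata.
CONVERTERS = {"string": str.strip, "number": float}


def build_transformation(metadata):
    """Generate a concrete row transformation from the generic template
    by injecting field names and types at run time."""
    def transformation(row):
        return {
            field["target"]: CONVERTERS[field["type"]](row[field["source"]])
            for field in metadata["fields"]
        }
    return transformation


def run(text, metadata):
    """Run the single template against one source, as described by its metadata."""
    reader = csv.DictReader(io.StringIO(text), delimiter=metadata["delimiter"])
    transform = build_transformation(metadata)
    return [transform(row) for row in reader]


# Two hypothetical files with different layouts (sample data invented here).
ORDERS_CSV = "cust,amt\nAcme,12.50\n"
INVOICES_CSV = "Client;Total\n Acme ;7\n"

# The per-source differences live in metadata, not in code.
ORDERS_META = {
    "delimiter": ",",
    "fields": [
        {"source": "cust", "target": "customer", "type": "string"},
        {"source": "amt", "target": "amount", "type": "number"},
    ],
}
INVOICES_META = {
    "delimiter": ";",
    "fields": [
        {"source": "Client", "target": "customer", "type": "string"},
        {"source": "Total", "target": "amount", "type": "number"},
    ],
}

orders = run(ORDERS_CSV, ORDERS_META)
invoices = run(INVOICES_CSV, INVOICES_META)
```

Onboarding another file in this sketch means writing another metadata record rather than another transformation, which is the effect the paragraph above describes.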

[Image: MetaData PDI.png]

The PDI steps that support metadata injection are listed below:

[Image: PDI steps image.png]


For more information about metadata injection and the other new features in release 6.1, please see the press release. A short video that explains metadata injection is also available on YouTube.



Learn more about Pentaho 6.1 – watch the video, access the download and free trial

Register for the Pentaho 6.1 webinar on Thursday, May 5, 2016 
