Last week Hitachi Vantara Labs announced Machine Learning Model Management to accelerate model deployment and reduce business risk. This innovation provides machine learning orchestration to help data scientists monitor, test, retrain and redeploy supervised models in production. These new tools can be used in a data pipeline built in Pentaho to help improve business outcomes and reduce risk by making it easier to update models in response to continual change. Improved transparency gives people inside organizations better insight into, and more confidence in, their algorithms. Hitachi Vantara Labs is making Machine Learning Model Management available as a plug-in through the Pentaho Marketplace.
Machine learning is the study and construction of algorithms that can “learn” from data and make predictions by building a model from sample inputs, without being explicitly programmed. These algorithms and models can become a key competitive advantage, and potentially a risk. Once a model is in production, it must be monitored, tested and retrained continually in response to changing conditions, then redeployed. Today this work involves considerable manual effort and is often done infrequently. When models are not refreshed, prediction accuracy deteriorates, which can hurt the profitability of data-driven businesses.
David Menninger, SVP & Research Director, Ventana Research, said, “According to our research, two-thirds of organizations do not have an automated process to seamlessly update their predictive analytics models. As a result, less than one-quarter of machine learning models are updated daily, approximately one-third are updated weekly and just over half are updated monthly. Out-of-date models can create significant risk to organizations.”
So, what is Machine Learning Model Management and where does it fit in the analytic process?
Machine Learning Model Management recognizes that machine learning models need to be updated periodically, because the underlying distribution of the data changes and model predictions become less accurate over time. The four steps of Machine Learning Model Management are Monitor, Evaluate, Compare, and Rebuild, as shown in the diagram above. Each step implements a concept called “Champion/Challenger”. The idea is to compare two or more models against each other and promote the one that performs best. Each model may be trained differently or use a different algorithm, but all of them run against the same data. These four steps form a continuous process that can be run on a schedule to reduce the manual effort of rebuilding these models.
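To make the Champion/Challenger idea concrete, here is a minimal sketch in plain Python. It is not the Pentaho plug-in's implementation: the names `accuracy`, `compare_and_promote` and `min_gain`, the use of accuracy as the comparison metric, and the toy threshold models are all assumptions made for illustration. It only shows the core idea of scoring every model on the same data and promoting the best one.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model type: any callable mapping a feature row to a class label.
Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, rows: List[Sequence[float]], labels: List[int]) -> float:
    """Fraction of rows for which the model predicts the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)


def compare_and_promote(
    models: Dict[str, Model],
    champion: str,
    rows: List[Sequence[float]],
    labels: List[int],
    min_gain: float = 0.0,
) -> Tuple[str, Dict[str, float]]:
    """Score every model on the SAME data and return the name to keep in production.

    A challenger replaces the current champion only if it beats the champion's
    score by more than min_gain (an assumed safeguard against noisy swaps).
    """
    scores = {name: accuracy(model, rows, labels) for name, model in models.items()}
    best = max(scores, key=lambda name: scores[name])
    if best != champion and scores[best] - scores[champion] > min_gain:
        return best, scores
    return champion, scores


# Toy evaluation data: one feature per row, binary labels.
rows = [[0.2], [0.4], [0.6], [0.8], [0.9], [0.1]]
labels = [0, 0, 1, 1, 1, 0]

# Two stand-in "models": simple thresholds on the single feature.
models = {
    "champion": lambda row: int(row[0] > 0.7),
    "challenger": lambda row: int(row[0] > 0.5),
}

winner, scores = compare_and_promote(models, "champion", rows, labels)
print(winner, scores)
```

In this toy data the challenger classifies every row correctly while the champion misses one, so the challenger is promoted. Running the same comparison on a schedule against fresh labeled data is one way to picture the Compare step of the cycle described above.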
Hitachi Vantara’s implementation of Machine Learning Model Management is part of the Pentaho data flow, which makes machine learning easier by combining it with Pentaho’s data integration tool. In the diagram above, data preparation may take 80% of the time needed to implement a model when the preparation processes rely on coding or scripting by a developer. Pentaho Data Integration (PDI) empowers data analysts to prepare the data they need in a self-service fashion without waiting on IT. An easy-to-use graphical interface simplifies the transformation, blending, and cleansing of any data for data analysts, business analysts, data scientists, and other users. PDI also has a new capability that provides direct access to various supervised machine learning algorithms as full PDI steps that can be designed directly into your PDI data flow transformations.
For more information on PDI and how it integrates with Machine Learning Model Management see the following blog posts by Ken Wood.