Design Principles for Packaged Machine Learning Solutions

One of the biggest advantages IT leaders gain by working with an AI-powered IT Business Analytics solution provider, such as Numerify, is being able to tap into the predictive power of machine learning (ML) without getting into the complex code and algorithms behind it. There are several use cases where IT teams benefit from the application of ML and advanced analytic techniques, including Change Risk Prediction.

While ML models are specialized in nature and cannot be applied using a one-size-fits-all approach, there are shared business problems among customers that can be solved with pre-defined ML pipelines and common algorithms. Doing so allows us to cost-effectively train and deploy these solutions so that customers can benefit from state-of-the-art analytic techniques without having to acquire expensive and hard-to-hire data scientists.

How does it work? Here’s a peek behind the curtain at some of the core concepts behind Numerify’s pre-packaged ML pipelines:

1) Business Problem Driven: The first step in building a successful productized ML model is identifying the right business problem. The key is to define business problems in a way that applies to a broad set of customers while still delivering high value.

The next step is to package all possible ML approaches to the same objective. For instance, Incident Volume Reduction is a common objective of any large-scale IT organization. However, it cannot be addressed by a single ML solution.

Our approach is to package multiple ML techniques, such as root cause analysis, topic models that identify the key themes behind clusters of incidents, and the discovery of attribute sets associated with high incident volume. A packaged offering, therefore, must be an end-to-end solution comprising multiple ML pipelines. It is the actual data that verifies the different hypotheses and turns the verified ones into actionable insights.
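
To make this concrete, here is a minimal sketch of one such packaged technique: a topic model over incident descriptions built with scikit-learn. The field values and sample records are hypothetical, and this illustrates the general approach rather than Numerify's actual pipeline.

```python
# Illustrative sketch: surface recurring themes behind incident volume with a
# simple LDA topic model. Sample descriptions are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

incident_descriptions = [
    "email server not responding after patch",
    "vpn connection drops intermittently for remote users",
    "password reset request for new employee",
    # ... in practice, thousands of incident short descriptions
]

vectorizer = CountVectorizer(stop_words="english", max_features=5000)
doc_term = vectorizer.fit_transform(incident_descriptions)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(doc_term)

# Print the top terms per topic as candidate incident themes
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top_terms)}")
```

A parallel pipeline in the same package might mine attribute combinations associated with high incident volume; the data then determines which of these hypotheses yields actionable insight.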

2) Data Cleansing: A domain-specific ML pipeline has knowledge of common data problems such as imbalanced data, high cardinality, short text, common terms in textual data, and standard outlier detection. Pre-built rules and algorithms handle these issues to make the ML pipeline robust and production-ready.
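
As an illustration of what such pre-built rules might look like, here is a minimal sketch of two common cleansing steps: collapsing high-cardinality categories and capping numeric outliers. The thresholds and column names are hypothetical, not Numerify's actual defaults.

```python
# Illustrative sketch of pre-built cleansing rules; thresholds and column
# names are hypothetical.
import pandas as pd

def collapse_high_cardinality(series: pd.Series, top_n: int = 20) -> pd.Series:
    """Keep only the most frequent categories; map the rest to 'OTHER'."""
    top = series.value_counts().nlargest(top_n).index
    return series.where(series.isin(top), "OTHER")

def clip_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap numeric values that fall outside the interquartile fence."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

df = pd.DataFrame({
    "assignment_group": ["NETWORK", "EMAIL", "NETWORK", "HR_APPS", "DB"],
    "resolution_hours": [2.0, 3.5, 1.0, 400.0, 2.5],
})
df["assignment_group"] = collapse_high_cardinality(df["assignment_group"], top_n=3)
df["resolution_hours"] = clip_outliers_iqr(df["resolution_hours"])
```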

3) Functional User Control over Feature Engineering: Feature engineering is the most crucial component of any applied data science project. Robust feature engineering requires strong domain knowledge as well as data science skills. Domain knowledge helps in drafting potential relationships and interactions within the data; data scientists apply this knowledge to create stronger predictors from raw data. While many tools aim to automate feature engineering, functional knowledge of data sources and business processes is a must to start feature engineering on the right data-sets. Numerify's ML workbench is a core component of the Numerify Platform where functional users can customize or extend the data-set being fed into the pre-defined machine learning solutions. This enables the seamless application of domain knowledge into data science solutions.
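
As a rough example of how domain knowledge translates into features, here is a minimal sketch of engineering change-risk predictors from raw change records. The column names and the specific features are hypothetical, chosen only to show the pattern.

```python
# Illustrative sketch of domain-informed feature engineering for change risk;
# column names and features are hypothetical.
import pandas as pd

changes = pd.DataFrame({
    "change_id": [1, 2, 3],
    "assignment_group": ["NETWORK", "NETWORK", "DB"],
    "planned_start": pd.to_datetime(
        ["2019-09-06 02:00", "2019-09-09 14:00", "2019-09-10 01:00"]),
    "prior_failures_90d": [3, 3, 0],
    "affected_ci_count": [12, 1, 4],
})

# Domain hypotheses: changes scheduled during business hours, touching many
# configuration items, or owned by groups with recent failures carry more risk.
changes["during_business_hours"] = changes["planned_start"].dt.hour.between(8, 18).astype(int)
changes["high_blast_radius"] = (changes["affected_ci_count"] > 10).astype(int)
changes["recent_failure_flag"] = (changes["prior_failures_90d"] > 2).astype(int)
```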

Data and metadata profiling capabilities of the platform enable functional users to select the right data-set and seamlessly feed it into the downstream ML pipeline. They can also segment and branch off the ML pipeline when different distributions are observed in different business segments, fitting a separate model for each segment with just a few clicks.
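
A minimal sketch of that segment-and-branch idea, assuming a hypothetical business_unit column and a simple per-segment classifier (the workbench exposes this as clicks rather than code):

```python
# Illustrative sketch of branching a pipeline by business segment and fitting
# a separate model per segment; columns and data are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.DataFrame({
    "business_unit": ["RETAIL", "RETAIL", "BANKING", "BANKING", "BANKING", "RETAIL"],
    "affected_ci_count": [3, 15, 2, 8, 20, 1],
    "during_business_hours": [1, 1, 0, 0, 1, 0],
    "change_failed": [0, 1, 0, 0, 1, 0],
})

segment_models = {}
for segment, rows in data.groupby("business_unit"):
    X = rows[["affected_ci_count", "during_business_hours"]]
    y = rows["change_failed"]
    segment_models[segment] = LogisticRegression().fit(X, y)
```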

4) Segregation of data-driven and data-agnostic phases: Segregating data-driven phases such as feature selection and feature engineering from model training, evaluation, and deployment enables complete abstraction and automation of the data-agnostic steps. Customization can then be limited to the data-driven phases, significantly bringing down the time to customize and deliver an ML solution for a specific customer.
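
The split might look roughly like the sketch below: a customer-specific, data-driven function that prepares features, and a generic, data-agnostic function that trains and evaluates a model on whatever features it receives. Function and column names are hypothetical.

```python
# Illustrative sketch separating the customizable data-driven phase from the
# reusable data-agnostic phase; names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def customer_specific_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Data-driven phase: the only part customized per customer."""
    out = raw.copy()
    out["high_blast_radius"] = (out["affected_ci_count"] > 10).astype(int)
    return out[["affected_ci_count", "high_blast_radius", "during_business_hours"]]

def train_and_evaluate(features: pd.DataFrame, target: pd.Series):
    """Data-agnostic phase: identical for every customer, fully automated."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    score = cross_val_score(model, features, target, cv=3).mean()
    return model.fit(features, target), score
```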

5) Automated Inference Delivery: Numerify’s ML workbench allows users to train multiple models simultaneously and compare and evaluate them against pre-defined evaluation criteria. Users can then promote the approved model directly from development to production. A regular batch ETL job calls the deployed model via the scoring API to score incremental data and loads the inferences into a high-performance datastore. Interpretable insight from the ML model is generated with custom and open-source libraries such as LIME and presented as actionable insights in users’ dashboards.
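
Here is a minimal sketch of the comparison and explanation steps, assuming synthetic change data and hypothetical feature names; the promotion workflow, scoring API, and dashboard delivery are simplified away.

```python
# Illustrative sketch: compare candidate models on a pre-defined criterion
# (ROC AUC) and explain one prediction with LIME. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from lime.lime_tabular import LimeTabularExplainer

X = np.random.rand(200, 3)                        # stand-in change features
y = (X[:, 0] + X[:, 1] > 1).astype(int)           # stand-in "change failed" label
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {name: roc_auc_score(y_test, m.fit(X_train, y_train).predict_proba(X_test)[:, 1])
          for name, m in candidates.items()}
best = max(scores, key=scores.get)                # model approved for promotion

explainer = LimeTabularExplainer(
    X_train,
    feature_names=["ci_count", "business_hours", "prior_failures"],
    class_names=["success", "failed"],
    mode="classification",
)
explanation = explainer.explain_instance(
    X_test[0], candidates[best].predict_proba, num_features=3)
print(explanation.as_list())                      # human-readable insight for the dashboard
```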

Our mission is to expose the power of machine learning and advanced analytics to as many users as possible while handling the complex modeling involved on their behalf. Our ML pipeline and ML workbench are designed to do exactly this, so that IT leaders can easily and quickly harness the powerful abilities of the latest AI technology to improve their organizations.
