CIOs too often watch AI/ML projects stall and eventually get cancelled. Here, we discuss the ten most common causes of this unfortunate situation and what to do to correct it. CIOs can ask their teams to evaluate their AI/ML project work in light of these ten causes and formulate a plan to accelerate project progress. The ten most common causes are:
- Deteriorating business case
- Underestimating model training
- Lacking data quality
- Addressing data integration
- Managing data volumes
- Incorporating iterative development
- Responding to data shift
- Under-specifying the model
- Validating results
- Complicating the algorithm
Deteriorating business case
Organizations approve an AI/ML project based on an appealing business case. As the project proceeds, some events such as the following can undermine the business case:
- Discovery of additional complexity in the solution leading to a significant increase in the cost-to-complete forecast and the annual operating cost estimate.
- Recognition that the solution requires data the organization does not own.
- Changes in customer expectations or preferences.
- Actions by competitors.
The best practice response is to update the business case and determine if continuing the project is still appealing. Allowing projects that are unlikely to produce a net benefit to drag on wastes the organization’s resources.
Underestimating model training
Organizations underestimate the work required to train AI/ML models adequately. AI/ML project teams tend to underestimate the:
- Data scientists’ efforts required to train models.
- Business expertise and effort needed to collaborate with the data scientists.
- Data volume and variety required.
The best practice response is to double the model training estimate during project planning. If this increase materially undermines the project’s business case, it’s best not to approve the project and avoid a likely failure.
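As a rough illustration, the doubled-estimate go/no-go check is a few lines of arithmetic. The `go_no_go` helper and all figures below are hypothetical, not a real costing model:

```python
def go_no_go(benefit, base_cost, training_cost, margin=1.0):
    """Re-run the business case with the training estimate doubled;
    approve only if the expected benefit still covers the cost.
    All inputs are illustrative assumptions."""
    adjusted_cost = base_cost + 2 * training_cost  # double the training estimate
    return benefit / adjusted_cost >= margin

# Hypothetical project: $1.2M benefit, $500K base cost, $300K training estimate.
approve = go_no_go(benefit=1_200_000, base_cost=500_000, training_cost=300_000)
```

If doubling the training estimate flips the decision, the original business case was too fragile to fund.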
Lacking data quality
Too frequently, data quality issues in internal datastores stall AI/ML project progress. The common issues are:
- Inaccuracy – The available data is not accurate enough to produce reliable results from the model.
- Insufficient richness – The available data often lacks values and rows required to train the model effectively.
Improving internal data quality requires the organization to recognize the value of its data and improve its data stewardship processes. Improving data richness to achieve high-quality training data requires the project team to create synthetic data. These actions add schedule and cost to the project.
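One common way to improve richness is to augment a sparse dataset with synthetic rows. A minimal sketch, assuming simple numeric fields sampled independently (real generators must also preserve the correlations between fields); the sensor data is hypothetical:

```python
import random
import statistics

def synthesize_rows(rows, n_new, seed=0):
    """Create synthetic rows by sampling each numeric field from a
    normal distribution fitted to the observed values. Deliberately
    simple: fields are treated as independent, which real synthetic
    data generators must not assume."""
    rng = random.Random(seed)
    fields = rows[0].keys()
    fits = {}
    for f in fields:
        values = [r[f] for r in rows]
        mu = statistics.mean(values)
        sigma = statistics.stdev(values) if len(values) > 1 else 0.0
        fits[f] = (mu, sigma)
    return [{f: rng.gauss(*fits[f]) for f in fields} for _ in range(n_new)]

# Hypothetical sensor readings: far too few rows to train on.
observed = [{"temp": 21.0, "load": 0.4},
            {"temp": 23.5, "load": 0.6},
            {"temp": 22.1, "load": 0.5}]
augmented = observed + synthesize_rows(observed, n_new=100)
```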
Addressing data integration
Too frequently, these data integration issues stall AI/ML project progress:
- Data accessibility complexity – The required data is scattered across too many internal and external datastores housed in multiple data centers and clouds to support reliable real-time access.
- Inability to integrate data – Differences in data values and formats across datastores are too complex to integrate the data cost-effectively.
Addressing complex data integration issues requires the project to abandon the preferred approach of accessing data in its original location. Instead, the project team will be forced to transform and load the data into a data warehouse. Adding the creation and operation of a data warehouse to the project scope adds schedule and cost to the project.
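The transform-and-load step can be sketched with Python’s built-in sqlite3 standing in for the warehouse. The two source systems, their schemas, and the cents-versus-dollars mismatch below are illustrative assumptions:

```python
import sqlite3

# Two hypothetical source systems with inconsistent formats:
# the CRM stores amounts in cents, the billing system in dollars.
crm_rows = [("C001", 1250), ("C002", 4999)]        # amounts in cents
billing_rows = [("C003", 12.50), ("C004", 49.99)]  # amounts in dollars

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer_id TEXT, amount_usd REAL)")

# Transform each source into the warehouse's single common schema.
for cid, cents in crm_rows:
    warehouse.execute("INSERT INTO orders VALUES (?, ?)", (cid, cents / 100.0))
for cid, dollars in billing_rows:
    warehouse.execute("INSERT INTO orders VALUES (?, ?)", (cid, dollars))
warehouse.commit()

total = warehouse.execute(
    "SELECT ROUND(SUM(amount_usd), 2) FROM orders").fetchone()[0]
```

The transformation code is trivial here; in practice, reconciling value and format differences across dozens of datastores is where the schedule and cost grow.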
Managing data volumes
Feeding AI/ML models more data and increasing their parameter counts generally makes them better. However, increasing the data volume will stall AI/ML project progress by increasing the:
- Cost and elapsed time for each model test run.
- Effort to evaluate the results of each test run.
Managing growing data volumes requires:
- Consumption of more compute resources.
- Implementation of a graph DBMS.
- More attention to optimizing the algorithm.
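A crude cost model makes the first two effects concrete: compute time per test run grows linearly with data volume, on top of a fixed human effort to review each run’s results. All constants below are illustrative assumptions, not benchmarks:

```python
def run_cost_hours(rows, cost_per_row_s=0.002, review_s=1800):
    """Per-test-run cost in hours: compute time grows linearly with
    data volume, plus a fixed results-review effort. The per-row
    cost and review time are illustrative assumptions."""
    return (rows * cost_per_row_s + review_s) / 3600

costs = {rows: round(run_cost_hours(rows), 2)
         for rows in (100_000, 1_000_000, 10_000_000)}
```

Multiply the per-run cost by the number of tuning iterations, and the impact of growing data volumes on schedule and budget becomes obvious.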
Incorporating iterative development
Developing AI/ML applications is an iterative or agile development process, even though that’s not widely recognized. It is difficult to predict precisely:
- How many algorithm-tuning iterations will be required to achieve sufficiently reliable functionality.
- Which data or parameters the model will need; this only becomes clear through the training iterations.
Incorporate iterative development into the project plan by explicitly including at least five iterations. Plan for more if your team is new to the problem space of the project.
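A project plan can encode this directly as a tuning loop with an explicit iteration budget and an early-exit threshold. The `train_and_score` stand-in below is a hypothetical response surface, not a real training run:

```python
import random

def train_and_score(learning_rate, seed):
    """Stand-in for one training iteration; a real project would fit
    a model here and return a validation metric. This toy surface
    (an illustrative assumption) peaks near learning_rate 0.1."""
    rng = random.Random(seed)
    return 0.9 - abs(learning_rate - 0.1) + rng.uniform(-0.02, 0.02)

# Planned tuning schedule: at least five candidate settings budgeted.
best_score, best_lr, iterations = 0.0, None, 0
for i, lr in enumerate([0.5, 0.3, 0.2, 0.1, 0.05, 0.15]):
    score = train_and_score(lr, seed=i)
    iterations += 1
    if score > best_score:
        best_score, best_lr = score, lr
    if best_score >= 0.85:  # stop early once the target metric is met
        break
```

Budgeting the iterations up front, rather than discovering them mid-project, is what keeps the schedule honest.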
Responding to data shift
Data shift occurs when real-world data diverges too far from the data the AI/ML model was trained on. The result is unreliable model recommendations and increased false positives and negatives.
Respond to data shift issues by creating a richer training dataset and tweaking the algorithm.
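Responding to data shift presumes you can detect it. A deliberately crude sketch that flags drift when the live mean moves more than one training standard deviation; production monitoring would apply a proper per-feature test such as PSI or Kolmogorov-Smirnov, and all values below are hypothetical:

```python
import statistics

def drift_score(train, live):
    """Standardized difference between the training and live means.
    A crude single-feature check; real monitoring uses a proper
    statistical test per feature."""
    mu_t, sd_t = statistics.mean(train), statistics.stdev(train)
    return abs(statistics.mean(live) - mu_t) / sd_t

train_values = [10.0, 11.0, 9.5, 10.5, 10.2]   # feature at training time
stable_live = [10.1, 10.4, 9.9]                # production data, no shift
shifted_live = [14.0, 15.2, 13.8]              # production data after shift
```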
Under-specifying the model
Under-specification occurs when the training process can produce multiple models that all score well in testing yet differ in small, arbitrary ways, such as:
- The random values assigned to the nodes in a neural network before training starts.
- The way training data is selected or represented.
- The number of training runs conducted until the model is deemed ready for production use.
These minor, often random, differences are typically overlooked because they don’t affect model results during training. However, these subtle differences can lead to enormous variations in model performance in the real world.
Under-specification is addressed by conducting additional stress tests using datasets that focus on specific features. This testing can materially increase the cost of AI/ML model development, but it reduces the risk of nasty surprises during production use of the model.
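The effect can be reproduced in miniature. In the toy regression below (an illustrative assumption, not a real training pipeline), two features are perfectly correlated in the training data, so only their combined weight is pinned down and the random initialization silently chooses the split. Differently seeded models agree in-distribution but disagree once the features decorrelate:

```python
import random

def fit(seed, epochs=3000, lr=0.05):
    """Fit y = w1*x1 + w2*x2 by SGD on data where x1 == x2 always.
    Only w1 + w2 is determined by the data; the random init decides
    how the weight is split between the two features."""
    rng = random.Random(seed)
    w1, w2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    for _ in range(epochs):
        x = rng.uniform(0.0, 1.0)
        x1, x2, y = x, x, x              # training data: features identical
        err = (w1 * x1 + w2 * x2) - y
        w1 -= lr * err * x1
        w2 -= lr * err * x2
    return w1, w2

models = [fit(seed) for seed in range(5)]
# All seeds agree while x1 == x2, as in training ...
in_dist = [w1 * 0.5 + w2 * 0.5 for w1, w2 in models]
# ... but diverge when the features decorrelate in production.
out_dist = [w1 * 1.0 + w2 * 0.0 for w1, w2 in models]
```

Stress tests that deliberately decorrelate features are exactly what surfaces this before production.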
Validating results
If your AI/ML model is not delivering the intended results, it’s most likely because:
- The training data was incomplete or skewed the model results in the wrong direction.
- The algorithm selected is not appropriate for the problem space.
These issues are best addressed by carefully validating results after every test run and not deferring validation until later in the project. In most cases, the action will be to curate additional training data.
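Validation after every run can be enforced with a simple gate that fails fast the moment a metric drops below the bar, rather than letting bad runs accumulate. The threshold and holdout data below are hypothetical:

```python
def validate_run(y_true, y_pred, min_accuracy=0.8):
    """Gate each test run: compute accuracy on the holdout set and
    raise immediately if it falls below the bar, instead of
    deferring validation until later in the project."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    if accuracy < min_accuracy:
        raise ValueError(f"run failed validation: accuracy={accuracy:.2f}")
    return accuracy

# A passing run and a failing run against a hypothetical holdout set.
holdout = [1, 0, 1, 1, 0]
ok = validate_run(holdout, [1, 0, 1, 1, 1])   # 4 of 5 correct: passes
try:
    validate_run(holdout, [0, 1, 0, 1, 0])    # 2 of 5 correct: fails fast
    failed = False
except ValueError:
    failed = True
```

A failed gate is the trigger to go curate additional training data before running the next test.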
Complicating the algorithm
Recent advances in AI/ML algorithms have come chiefly from minor tweaks and significant additional compute power rather than breakthrough innovations. Examples of limited algorithm advances, where the last breakthrough is about a decade old, include:
- Information retrieval algorithms – are used heavily in search engines.
- Pruning algorithms – make neural networks more efficient by trimming unneeded connections.
- Loss functions – specify the algorithm’s objective mathematically.
- Generative adversarial networks (GANs) – pair neural networks in a create-and-critique cycle.
- Recommendation algorithms – apply neural networks to product and entertainment recommendations.
Addressing AI/ML algorithm issues requires the AI/ML project team to review whether the complexity that’s been added to the algorithm during the project has, in fact, produced accuracy and performance benefits. Where it has not, simplifying the algorithm used to generate the AI/ML model will improve performance with little or no reduction in accuracy.
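A lightweight ablation test makes this review concrete: score the full and the simplified versions on the same holdout data, and keep the extra complexity only if it measurably improves accuracy. Both toy models and the holdout data below are hypothetical:

```python
def predict_simple(x):
    """Baseline: a single threshold decision."""
    return 1 if x > 0.5 else 0

def predict_complex(x):
    """The same decision wrapped in needless extra computation,
    standing in for complexity accumulated during the project.
    The loop converges straight back to x."""
    score = x
    for _ in range(1000):
        score = 0.999 * score + 0.001 * x
    return 1 if score > 0.5 else 0

holdout = [(0.9, 1), (0.2, 0), (0.7, 1), (0.4, 0), (0.8, 1)]

def accuracy(model, data):
    """Fraction of holdout examples the model classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Ablation verdict: if the simple model matches the complex one, keep it.
keep_simple = accuracy(predict_simple, holdout) >= accuracy(predict_complex, holdout)
```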