This is part 3 of “101 Ways AI Can Go Wrong” - a series exploring the interaction of AI and human endeavor through the lens of the Crossfactors framework.
View all posts in this series or explore the Crossfactors Framework
Model inference accuracy can be a matter of life and death - but is it so simple to assess?
Accuracy is the degree to which a measurement or prediction conforms to the known, correct value. Machine learning models are used in countless ways for both classification and prediction. In these functions, model inference accuracy refers to how well the models perform when making classifications or predictions on new, unseen data. It is a measure of the model’s reliability and effectiveness in real-world applications.
Why It Matters
Inaccurate inferences can cost an organisation money, break downstream processes and even harm people. The models are so complex that the reasoning behind any given inference is usually opaque. And while models may report a confidence score alongside their predictions, wrong answers delivered with high confidence are especially dangerous. Accuracy metrics should be tracked at every stage of the development and deployment pipeline.
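One practical safeguard implied above is to surface the dangerous case directly: predictions that are both wrong and highly confident. A minimal sketch, using hypothetical (true label, predicted label, confidence) records and an assumed 0.90 threshold:

```python
# Hypothetical batch of model outputs: (true label, predicted label, confidence).
records = [
    ("cat", "cat", 0.97),
    ("dog", "cat", 0.95),   # wrong, yet the model is very confident
    ("dog", "dog", 0.62),
    ("cat", "dog", 0.55),
]

CONFIDENCE_THRESHOLD = 0.90  # assumed cutoff for "high confidence"

def confident_errors(records, threshold=CONFIDENCE_THRESHOLD):
    """Return predictions that are both wrong and highly confident."""
    return [
        (truth, pred, conf)
        for truth, pred, conf in records
        if truth != pred and conf >= threshold
    ]

print(confident_errors(records))  # [('dog', 'cat', 0.95)]
```

Monitoring of this kind can run alongside aggregate accuracy metrics, since a model's average accuracy can look healthy while it still makes confidently wrong calls on individual cases.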
Real-World Example
Google Flu Trends promised to turn Google’s unequalled access to search-query data into formidably accurate flu predictions. In the end, it was outperformed by simple models that used only current and past flu rates. This example shows that more data does not necessarily lead to better predictions.
Key Dimensions
Expected value: accuracy as a single number doesn’t tell the whole story. The cost of a bad prediction vs. the expected value of a good prediction matters.
Context matters: the type of classification or prediction task being performed and the real-world characteristics of the data (e.g. rare events) dictate which metrics should be used to measure accuracy.
Garbage in - garbage out: this old idiom holds true across the entire machine learning pipeline, from the training data and labels used, to the real-world data on which inferences are made.
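The first two dimensions can be illustrated with the classic rare-event case. A minimal sketch, using hypothetical labels and cost figures: a fraud detector that never flags fraud scores 99% accuracy yet catches nothing, and weighting errors by cost makes the failure visible.

```python
# Hypothetical data: 1% fraud rate, and a model that never flags fraud.
labels = [0] * 990 + [1] * 10    # 1 = fraud
predictions = [0] * 1000         # always predicts "not fraud"

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / sum(labels)

# Expected-value view: assumed costs where a missed fraud (false negative)
# is far more expensive than a false alarm (false positive).
COST_MISSED_FRAUD = 500
COST_FALSE_ALARM = 5
total_cost = sum(
    COST_MISSED_FRAUD if y == 1 and p == 0 else
    COST_FALSE_ALARM if y == 0 and p == 1 else 0
    for p, y in zip(predictions, labels)
)

print(f"accuracy={accuracy:.1%} recall={recall:.0%} cost={total_cost}")
# accuracy=99.0% recall=0% cost=5000
```

A metric chosen for the task (here, recall on the rare class) and a cost-weighted view both expose what headline accuracy hides.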
Given the importance of accuracy in every implementation of machine learning, how should you measure it? Are you tracking it appropriately? What safeguards are in place to guard against failure? If these safeguards involve humans, are you aware of the new layer of complexity you’re introducing?