When British firm Darktrace needed to ward off potential ransomware attacks, machine learning came to the rescue, aiding clients in filtering and prioritizing threats. When health care organizations need to identify markers for aortic stenosis or cancer, they turn to machine learning solutions from IBM Watson and Microsoft. And when enterprises need to analyze volumes of data to retain clients and prevent fraud, they turn to machine learning.

Machine learning is a subset of artificial intelligence (AI) in which systems learn from large volumes of data, discover patterns, and make decisions with reduced human involvement.

The stakes are high – sometimes life-or-death – for many AI users, so trusting the output is essential.

Naturally, people have more confidence in a model when they have access to it and can understand the essentials of its logic.

But not all products grant access to their algorithms.

To make AI accessible to organizations or departments that don’t have data scientists, software companies offer applications with built-in AI powered by a “black box”: the algorithms are accessible only to the software manufacturer.

Further, many of the black-box solutions are cloud based and use the same algorithm across all of their clients’ data sets. So the application is being trained — but perhaps not in the way domain experts might want.

In retail, for example, black-box merchandising applications boost certain types of products based on behavior across all of the retailers’ customers. That may work well if you sell commodity products, but it is unacceptable if you cater to niche customers.

The Problem of the Black Box in Deep Learning

Deep learning often serves as the foundation for powerful applications that make mind-boggling tasks seem effortless to the user. Beneath that ease of use, however, deep learning is complicated. That complexity makes it highly useful, but also muddies the ability of a deep-learning system to explain each success. 

When a human fails, that person can be questioned and the decision retraced in order to fix or improve the outcome. Many popular AI processes can likewise be monitored and queried. But neural networks get in the way of that.

In a deep learning system, innumerable calculations are made within the many hidden interior layers of nodes. The unseen calculations are so variable, so elaborate, and so voluminous that even if they are made visible, they may defy human analysis. In industries such as health care, some argue, life-and-death decisions cannot be entrusted to an algorithm that no one understands.

But others contend that deep learning simply reflects the complexity of the real world in which human analysts don’t fully understand their observations, either. Deep learning and transparency, they say, can be complementary.

Beyond having confidence in an initial AI insight, enterprises must also have confidence that the insight will hold true for future data input. Was the initial input comprehensive enough? Is the conclusion robust? Might the input be maliciously manipulated? It can be, and data scientists must strive to anticipate and defend against adversarial machine learning.
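The adversarial risk can be demonstrated even on a simple model. The sketch below is a hypothetical illustration using scikit-learn (the data, model, and perturbation budget are all assumptions, not a real attack): an FGSM-style perturbation uses a logistic model’s own weights as the gradient direction to push an input toward the decision boundary.

```python
# Sketch: adversarially manipulating an input, assuming scikit-learn.
# An FGSM-style perturbation uses the model's own weights as the
# gradient direction to push an input toward the decision boundary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, random_state=2)
model = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0].copy()
w = model.coef_[0]
eps = 0.5  # hypothetical perturbation budget per feature

# Subtract along sign(w) when the model predicts class 1, add otherwise,
# so every feature nudges the decision score toward the boundary.
direction = -np.sign(w) if model.predict([x])[0] == 1 else np.sign(w)
x_adv = x + eps * direction

print("score before:", model.decision_function([x])[0])
print("score after: ", model.decision_function([x_adv])[0])
```

Each feature moves by only a small amount, yet the effects accumulate across all twenty features; defending against this kind of probing is part of the robustness testing the questions above demand.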

Without adequate insight into a deep-learning model, and without adequate testing, enterprises have faced costly setbacks and high-profile embarrassments. 

Deep Learning Remains Indispensable 

Alarmists and Luddites cite such risks and argue against deep-learning methodologies. But if we forgo deep learning, we cede the realms of virtual assistants, automatic translation software, image recognition and analysis, advanced business analytics, and question-and-answer systems to our more innovative competitors.

The benefits of deep learning are too great to pass up. The downsides, meanwhile, can be mitigated with robust testing and with measures to make black-box activity interpretable. To reassure enterprise managers, AI developers are pushing the industry to focus on machine learning that is explainable.

The What and Why of Explainable AI

Explainable or interpretable AI describes any machine learning algorithm whose output can be explained. Explaining an output is importantly distinct from understanding every internal step by which the AI reached it.

Explainable AI tells us the data source; it identifies the algorithms or models and the reasons for their use; it records how learning has altered parameters over time; it summarizes how the system arrived at a specific outcome; and it traces cause-and-effect relationships.

These explanations must be intelligible to developers and, eventually, to users. The trade-offs of such explanation include reduced speed and the exposure of proprietary intellectual property.

As software engineer Ben Dickson observes, explainable AI includes both local and global explanations. Local explanations interpret individual decisions; global explanations interpret the general logic behind a model’s behavior. Development must include comprehensive stress testing — where users throw poorly classified data at the system to see how it reacts — alongside algorithms that communicate interim internal states and decisions. 
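The local/global distinction can be made concrete with a small sketch (a hypothetical illustration assuming scikit-learn; the model and data are stand-ins): a global explanation might rank features by permutation importance across the whole data set, while a crude local explanation perturbs one instance’s features to see how that single prediction shifts.

```python
# Sketch of global vs. local explanation, assuming scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=400, n_features=6,
                           n_informative=3, random_state=1)
model = GradientBoostingClassifier(random_state=1).fit(X, y)

# Global: shuffle each feature and measure how much accuracy drops --
# a ranking of what the model relies on overall.
result = permutation_importance(model, X, y, n_repeats=5, random_state=1)
print("global ranking:", np.argsort(result.importances_mean)[::-1])

# Local: for a single instance, nudge each feature by one standard
# deviation and watch the predicted probability move.
x0 = X[:1]
base = model.predict_proba(x0)[0, 1]
shifts = []
for j in range(X.shape[1]):
    xp = x0.copy()
    xp[0, j] += X[:, j].std()
    shifts.append(model.predict_proba(xp)[0, 1] - base)
print("local sensitivities:", np.round(shifts, 3))
```

Neither view opens the black box itself; both are model-agnostic probes of the kind Dickson describes, and both belong in the stress-testing regimen.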

Extending Explanation Into the Black Box

To some extent, AI has always been explainable, particularly outside the niche of neural networks. “In reality, the machine learning field started with many simple models, all of which can be fairly easily expressed and explained using charts, graphs, or human language,” says Lucidworks Chief Algorithms Officer Trey Grainger. 

The problem with neural networks is that they effectively build their own low-level model internally to represent concepts and relationships, and while that model can output the correct results, it’s not human interpretable. 

“You would have the same problem if you asked a person to answer a question and then tried to explain their answer by doing a brain scan. You might be able to identify some coarse-grained phenomena and patterns going on, but you wouldn’t have a reliable human-interpretable understanding of the factors leading to the answer.”

Grainger foresees a future in which machine-based neural networks have a meta-level understanding of their own thought processes, or a supervisory understanding of sibling networks. This is similar to how people try to explain their own thought processes and gut feelings, and change their actions based on the conclusions they reach.

Explanation Step-By-Step or After the Fact

Businesses that need the benefits of deep learning sooner rather than later have tools available that approximate explainable AI.

Post hoc explanation has the benefit of preserving a complex deep-learning model’s predictive capacity — but a meticulous step-by-step explanation of a deep-learning outcome remains out of reach due to the depth, variability, complexity, and inaccessibility of calculations. 

Global and Local Explanation

Surrogate models represent one method of interpreting deep-learning output. Microsoft data scientist Lars Hulstaert explains that a simplified surrogate version of a model is created to represent the decision-making process of the more complex model. The surrogate is trained on the inputs and the complex model’s predictions, rather than on the inputs and the true targets. Interpretable surrogates include linear models and decision trees.
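Hulstaert’s idea can be sketched in a few lines (a hypothetical illustration assuming scikit-learn; the random forest simply stands in for the black box): fit a readable model to the black box’s predictions, then measure fidelity, i.e., how often the two agree.

```python
# Sketch: a decision-tree surrogate for a black-box model,
# assuming scikit-learn. The random forest stands in for the black box.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Train the surrogate on the black box's predictions, not the true labels.
bb_preds = black_box.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, bb_preds)

# Fidelity: how often the readable surrogate agrees with the black box.
fidelity = accuracy_score(bb_preds, surrogate.predict(X))
print(f"surrogate fidelity: {fidelity:.2f}")

# The surrogate's if/then split rules are human-readable.
print(export_text(surrogate))
```

The shallow tree will never match the forest’s accuracy, but its split rules give a global, approximate account of what the black box is doing, which is often all an enterprise needs.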

Despite the added difficulty of explaining them, black-box models remain indispensable because of their capacity for superior analysis and prediction — and because in many contexts, a broad or approximate explanation is more than sufficient, provided that rigorous testing has found the output of a model to be reliable.

Accessible Model 

Even within a black box application, it is possible to design a model that can be accessed, audited, and amended by users. Lucidworks Fusion, for example, includes user-configurable interfaces for automated learning of synonyms, query rewrites, misspellings, personalization profiles, and natural language understanding.

How to Choose and Implement a More Explainable AI 

Deep learning is as explainable as a medical scan for cancer, or an audit of missing pennies in a corporate financial report. Doctors cannot explain every pixel in an MRI scan, and accountants may not trace every dollar spent by an enterprise. But such details are inconsequential: they understand, globally and locally, how the image or report was generated, and they trust it because the processes that created it are well tested and because the results can be confirmed by other means.

Likewise, AI is trustworthy, and satisfactorily explainable, when:

  • we are knowledgeable about the data sources 
  • we have access to the underlying model
  • we have robustly tested the model with different data sets, including malicious data
  • we understand the relationships between tested input and output
  • we employ trusted algorithms for intrinsic or post hoc explanation at local and global levels of the model

At Lucidworks, Grainger emphasizes that explainable AI is a well-established reality in areas of machine learning that aren’t dependent upon a black box. Yet much work remains to be done to make sophisticated deep learning models – which can be much better than humans at finding the right answers under ideal conditions – so explainable and correctable that they are no longer considered black box. 

In the meantime, Grainger says, enterprises will combine deep learning models with other AI algorithms to make results more robust, compensating for the limits of any single approach.