Gartner Warns: Explainable AI Will Drive LLM Monitoring

The race to deploy generative artificial intelligence in business environments is entering a new phase. It’s no longer enough to simply implement copilots, assistants, or automated workflows: the focus now is on proving that these systems are reliable, that they behave as expected, and that their responses can be audited. In this context, Gartner predicts that the growing importance of explainable AI (XAI) will push investment in observability for large language models (LLMs) to 50% of GenAI deployments by 2028, up from the current 15%.

This forecast reflects a profound change in how companies view AI. During the initial adoption wave, interest centered on testing capabilities, launching pilots, and measuring productivity gains. But as these systems begin to take on more sensitive tasks, the emphasis shifts toward trust: why a model responds a certain way, what data it relies on, what biases it carries, how its behavior evolves, and to what extent it can be safely used in production.

Gartner defines explainable AI as a set of capabilities that make it possible to describe a model, highlight its strengths and weaknesses, anticipate its behavior, and identify potential biases. LLM observability, in turn, goes beyond measuring response times or resource consumption: it covers metrics specific to these systems, such as hallucinations, drift, token usage, bias, and the factual accuracy of responses.
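As an illustration of what tracking those signals can look like in practice, here is a minimal sketch of a per-request trace and a simple drift check. Everything in it is a hypothetical structure of our own (the field names, the 0-to-1 factuality score, the drift rule), not the schema of any real observability product:

```python
# Minimal sketch of LLM-specific observability: per-request traces that
# capture more than latency and cost. All names are illustrative.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class LLMTrace:
    prompt: str
    response: str
    latency_s: float          # classic performance signal, still useful
    prompt_tokens: int        # token usage drives cost
    completion_tokens: int
    factuality: float         # 0..1, e.g. from a grader model or a reviewer
    bias_flagged: bool        # set by a downstream bias evaluator

@dataclass
class LLMMonitor:
    traces: list[LLMTrace] = field(default_factory=list)

    def record(self, trace: LLMTrace) -> None:
        self.traces.append(trace)

    def drift_alert(self, window: int = 100, threshold: float = 0.8) -> bool:
        """Flag drift when mean factuality over the last `window`
        requests drops below `threshold`."""
        recent = self.traces[-window:]
        return bool(recent) and mean(t.factuality for t in recent) < threshold
```

The point of the sketch is the shift in what gets measured: latency and tokens sit next to quality signals such as factuality and bias, so degradation can be detected over time rather than one incident at a time.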

From experimentation to real-world control in production

The consulting firm’s warning points to an increasingly visible problem in businesses: AI is reaching operational environments faster than supervision mechanisms are evolving. When a model is used to summarize internal reports, assist employees, or automate low-impact tasks, errors may be tolerable. But when it enters critical processes such as customer service, document analysis, internal advisory work, or the generation of sensitive content, the lack of traceability stops being a technical detail and becomes an operational, reputational, and regulatory risk.

Therefore, Gartner argues that without a solid foundation in XAI and observability, many GenAI initiatives will be limited to low-risk, internal, or easily verifiable tasks, which would significantly restrict actual return on investment. Economically, the firm also predicts that the global market for GenAI models will exceed $25 billion by 2026 and reach $75 billion by 2029. If spending on models and applications continues growing at this pace, so will the pressure to control how they operate.

This trend is already emerging in the market. Gartner has even created a specific category for AI evaluation and observability platforms (AEOPs): tools designed to manage the nondeterministic nature of these systems and turn metrics, traces, and assessments into a continuous improvement cycle. It’s a sign of maturity: the business conversation is shifting from “which model to use” to “how to monitor, evaluate, and govern it.”
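To make that “continuous improvement cycle” concrete, the sketch below shows the kind of evaluation gate such a platform might automate: replay a fixed evaluation set against the deployed model, score the outputs, and block promotion when quality slips. Here `call_model` and `score_output` are hypothetical stand-ins, not the API of any real AEOP:

```python
# Illustrative sketch of an evaluate-and-gate loop: run an evaluation
# set against the current deployment and gate promotion on the results.
from typing import Callable

def evaluation_cycle(
    eval_set: list[dict],                       # [{"input": ..., "expected": ...}]
    call_model: Callable[[str], str],           # wraps the deployed LLM
    score_output: Callable[[str, str], float],  # 0..1, e.g. grader or exact match
    pass_threshold: float = 0.9,
) -> bool:
    """Return True if the current deployment passes the evaluation gate."""
    if not eval_set:
        raise ValueError("evaluation set must not be empty")
    scores = [score_output(call_model(case["input"]), case["expected"])
              for case in eval_set]
    pass_rate = sum(scores) / len(scores)
    print(f"pass rate: {pass_rate:.2%} over {len(eval_set)} cases")
    return pass_rate >= pass_threshold
```

Run on every model, prompt, or configuration change, a gate like this turns one-off assessments into the repeatable cycle the AEOP category describes.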

Regulation also drives this shift

The need to explain and oversee AI isn’t driven solely by technical concerns. It’s also a response to the new regulatory and governance framework taking shape, particularly in Europe. The European Commission states on its official AI Act page that the regulation introduces transparency obligations for certain AI systems and models, with a significant portion of these rules becoming applicable from August 2026. Even earlier, obligations for general-purpose AI models have applied since August 2025.

This regulatory evolution reinforces Gartner’s thesis: enterprise AI needs more than just performance or efficiency. It must be defensible. Practically, this means organizations will need to justify why a system produced a particular response, what controls it has passed, what limits have been set, and how it has been validated to continue performing appropriately over time.
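For instance, a deployment might attach an audit record like the following to each response. The structure and field names are purely illustrative assumptions about what “defensible” could mean in practice, not a template drawn from any regulation:

```python
# Sketch of an audit record that makes a GenAI response defensible:
# what produced it, which checks it passed, and under what limits.
# Field names are illustrative assumptions, not a regulatory schema.
import json
from datetime import datetime, timezone

def audit_record(request_id: str, model_id: str, model_version: str,
                 grounding_sources: list[str], checks_passed: list[str],
                 guardrail_limits: dict) -> str:
    return json.dumps({
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": {"id": model_id, "version": model_version},
        "grounding_sources": grounding_sources,  # data the answer relied on
        "checks_passed": checks_passed,          # e.g. ["pii_filter", "toxicity"]
        "guardrail_limits": guardrail_limits,    # e.g. {"max_output_tokens": 1024}
    })
```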

It’s not only about satisfying regulators or auditors. It’s also about internal management. Legal teams, compliance officers, operations, SREs, and security departments will need to speak a common language with data and AI teams. This requires new tools, processes, and metrics.

A shift in corporate priorities

What’s interesting about Gartner’s forecast is that it shifts the debate from speed to response quality. The firm emphasizes that traditional observability focused mainly on speed and cost, but the priority is now moving toward factual accuracy, logical correctness, and even the tendency of some models to give sycophantic (overly agreeable) or biased answers. In other words, AI is no longer evaluated solely by how quickly it responds but by whether its outputs can be trusted.

This approach also aligns with the work of the U.S. National Institute of Standards and Technology (NIST). The institute explains in its AI risk management framework that one of its core goals is to improve the ability to incorporate trustworthiness criteria into the design, development, use, and evaluation of AI systems. In 2024, it also published a specific profile for generative AI to help organizations identify the risks unique to these technologies and suggest mitigation actions.

All of this leads to the same conclusion: the second phase of enterprise AI won’t rely solely on more powerful models but on more robust mechanisms to understand and monitor them. Gartner’s forecast does not guarantee that all companies will reach this level by 2028 but indicates where the market is headed. This movement aligns with what is already observed in regulation, security, governance, and operations.

Frequently Asked Questions

What does LLM observability in a company mean?
It’s the ability to monitor and analyze how a language model performs in production, not only in terms of technical performance but also regarding hallucinations, biases, token usage, drift, or response quality.

What is explainable AI or XAI, and why does it matter?
Explainable AI encompasses techniques and capabilities that help understand why a model responds a certain way, what its limits are, and what risks it presents. It’s key for auditing, compliance, security, and business decision-making.

Why does Gartner believe investment in this area will grow so much?
Because companies are moving from testing GenAI in controlled environments to deploying it in real processes, where it’s no longer enough for the system to just function; it must also be traceable, governable, and defensible against risks, errors, or regulatory demands.

What’s the connection with the European AI Act?
The AI Act introduces transparency and governance obligations for certain AI systems and models. This drives many organizations to strengthen monitoring, explanation, and continuous control tools for their AI deployments.

via: Gartner
