If Not Transformers, Then What?

Julius Hollmann

May 18, 2026


Executive Summary

Most AI agents today rely on Transformer-based models because they perform well across many tasks and allow for scaling.
Alternatives such as state-space models, hybrid architectures, and diffusion-style generation mainly address efficiency issues like memory use, latency, and long-context handling.
The hardest enterprise problems are still not solved by alternative model architectures. Factual accuracy, hallucination, and prompt injection require a strong agent harness with retrieval, validation, permissions, and monitoring.

The Transformer stays at the center; what changes is everything around it.


1. Transformers Are Still the Default

Almost every model used for modern AI agents is built on the Transformer architecture. Originally designed for language, the Transformer turned out to be a much more general pattern. It converts inputs into token sequences, uses self-attention to compare tokens, and builds context-aware representations that support prediction, generation, and action selection.
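The compare-and-mix step described above can be illustrated with a minimal sketch of single-head self-attention. It is a simplification under stated assumptions: the token vectors are used directly as queries, keys, and values, whereas real Transformers apply learned projections, multiple heads, and many stacked layers. The function names and the example vectors are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the token vectors themselves
    (no learned projections), which keeps the sketch short.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Compare this token with every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Context-aware representation: weighted average of all token vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs

# Three made-up 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens)
```

Each output vector is a blend of all input vectors, weighted by similarity. That blend is what "context-aware" means here.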

That same pattern now extends well beyond text. Transformer-based models are used not only for language modelling, but also for images, audio, video, and action-related tasks. This is one of the main reasons they became the foundation of many agent systems: the same core design can be adapted across very different modalities and use cases.

They also became dominant for practical reasons. Transformer training can be parallelized efficiently on GPUs, the models scale well with more data and parameters, and they have delivered strong results in text generation, coding, classification, and multimodal reasoning. In practice, they are not only versatile but also proven in production across a wide range of applications.

The Transformer pattern now appears in several model families:

Large Language Models (LLMs): These power chatbots, coding assistants, research agents, and workflow automation. They generate text, write code, produce structured outputs, and can call tools, APIs, and other agents.
Vision-Language Models (VLMs): These models combine images and text. They can interpret screenshots, extract data from scans, and answer questions about visual content.
Speech and Audio-Language Models: These models support transcription and real-time voice agents. In many production systems, the architecture is still modular: audio is first converted to text by a speech-to-text model, then processed by an LLM for reasoning, and finally rendered back into speech by a text-to-speech model.
Vision-Language-Action Models (VLAs): These add action prediction to image and language understanding. In robotics, for example, they can translate camera input and an instruction such as “pick up the red cup” into movement commands.
World Models: These predict how an environment will evolve. For example, they may estimate how traffic will move, how an object will fall, or what a room may look like after a robot takes an action.

So, Transformers are no longer just the architecture behind chatbots. They are the foundation for a much broader class of AI systems, including multimodal applications, voice systems, and physical AI.

2. What Alternative Models Actually Change

The main strength of Transformers, attention over many tokens, also creates bottlenecks as agents move from short chats to long documents, video, real-time voice, and physical control.

Cost and latency at long context

In a standard Transformer, self-attention compares every token with every other token, so computation grows quadratically with input length. Memory usage also grows with length, because the model must keep the keys and values of past tokens available during generation. This becomes expensive for agents working with long documents, large codebases, databases, or video streams.
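A rough back-of-the-envelope calculation makes the growth concrete. The sketch below counts token-to-token comparisons for full attention and estimates the size of the key-value cache kept during generation. The model shape (32 layers, 8 key-value heads, head dimension 128, 2 bytes per value) is a hypothetical example, not a description of any specific model, and real systems use optimizations that change the constants.

```python
def attention_pairs(seq_len):
    """Token-to-token comparisons per head per layer for full attention."""
    return seq_len * seq_len

def kv_cache_bytes(seq_len, layers, kv_heads, head_dim, bytes_per_value=2):
    """Memory for cached keys and values: two tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for seq_len in (4_000, 32_000, 128_000):
    pairs = attention_pairs(seq_len)
    gib = kv_cache_bytes(seq_len, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{seq_len:>7} tokens: {pairs:.2e} comparisons, {gib:.2f} GiB cache")
```

With this assumed shape, every additional token adds 128 KiB of cache, while doubling the input length quadruples the number of comparisons.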

State-space models address this by carrying forward a compressed memory of the past instead of comparing every token with every other token. This can make them more efficient for long inputs. The best-known recent example is Mamba. These models are not yet dominant because the Transformer ecosystem is much more mature, and pure state-space models can struggle with exact recall from context. For now, they are more often used in hybrids or specialized settings than as full replacements.
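The idea of a compressed memory can be sketched with the simplest possible linear state-space recurrence: one scalar state that is updated once per input. The coefficients `a`, `b`, and `c` are arbitrary illustrative values. This is not Mamba itself; Mamba-style models use larger states and make the parameters depend on the input, which this sketch does not attempt.

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Run a scalar linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    state = 0.0  # fixed-size memory, regardless of sequence length
    outputs = []
    for x in inputs:
        state = a * state + b * x
        outputs.append(c * state)
    return outputs

# A single input at the start, followed by silence.
impulse = [1.0] + [0.0] * 9
response = ssm_scan(impulse)
```

The work per step is constant and the state never grows, which is where the efficiency comes from. The same example also hints at the recall problem: the trace of the first input fades with every step, so the model cannot simply look it up again later.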

Sequential generation

Many Transformer-based language models generate outputs one token at a time. This creates a latency bottleneck for long answers, code files, and structured outputs.

Other alternatives focus less on the architecture and more on how outputs are generated. Diffusion and masked-generation methods refine multiple parts of the output in parallel instead of committing to one token at a time. In some cases, this can improve both speed and global consistency.
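The control flow of masked parallel decoding can be shown with a toy example. Everything model-related here is a stand-in: `toy_predict` is a hypothetical function that simply reads the answer from `target` and invents a confidence score, because the point is the decoding loop, not the predictions. A real system would call a trained model at that step.

```python
MASK = None

def toy_predict(sequence, target):
    """Hypothetical stand-in for a model: guess every masked position at once.

    Confidence is higher when more neighbours are already filled in.
    """
    guesses = {}
    for i, tok in enumerate(sequence):
        if tok is MASK:
            neighbours = [
                sequence[j] for j in (i - 1, i + 1) if 0 <= j < len(sequence)
            ]
            filled = sum(n is not MASK for n in neighbours)
            guesses[i] = (target[i], 0.5 + 0.25 * filled)
    return guesses

def parallel_decode(target, per_step=4):
    """Start fully masked; each pass commits the most confident guesses."""
    sequence = [MASK] * len(target)
    steps = 0
    while any(tok is MASK for tok in sequence):
        guesses = toy_predict(sequence, target)
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)
        for i in best[:per_step]:
            sequence[i] = guesses[i][0]
        steps += 1
    return sequence, steps

target = "the agent wrote a short structured report for the finance team today".split()
decoded, steps = parallel_decode(target, per_step=4)
```

In this toy setup, a 12-token output needs 3 passes instead of the 12 passes that one-token-at-a-time generation would take. Whether such savings hold in practice depends on the model and on how many refinement passes are needed for acceptable quality.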

Still, these alternatives mostly solve operational issues. They target lower memory use, faster inference, better long-context handling, or better performance for continuous signals such as audio, sensor data, and video. Those are important gains, but they do not automatically make an AI agent more accurate or more secure.

3. The Biggest Challenges Are Not Solved by Alternative Model Architectures

Current architectural improvements can make agents faster, cheaper, and easier to deploy, but they do not automatically make them trustworthy. Many of the remaining problems are not solved inside the model architecture itself. They are handled by the agent harness: the surrounding system that controls data access, retrieval, tool use, validation, permissions, monitoring, and human oversight.

Factual accuracy

Enterprise agents are expected to answer business-critical questions and automate internal processes reliably. The risk is that a model may answer from incomplete training data, outdated information, or unsupported assumptions.

Enterprises mitigate this by grounding agents in approved data sources rather than relying only on the model’s internal knowledge. The most common approach is retrieval-augmented generation, where the agent first retrieves relevant documents, database records, or knowledge-base entries and then generates an answer based on that evidence. Schema-RAG agents go further: they use knowledge graphs to generate queries and ground answers in large enterprise data.
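The retrieve-then-generate pattern can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: retrieval is plain word overlap instead of a real search index or vector store, the documents are made-up examples, and the final call to a language model is left out. The function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Rank documents by word overlap with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(doc["text"])), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_prompt(question, evidence):
    """Put the retrieved evidence in front of the question."""
    sources = "\n".join(f"[{doc['id']}] {doc['text']}" for doc in evidence)
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

# Made-up documents standing in for an approved knowledge base.
documents = [
    {"id": "policy-7", "text": "Refunds are issued within 14 days of a return."},
    {"id": "policy-9", "text": "Standard shipping takes 3 to 5 business days."},
    {"id": "hr-2", "text": "Employees accrue vacation monthly."},
]
question = "How many days do refunds take?"
evidence = retrieve(question, documents)
prompt = build_prompt(question, evidence)
```

Keeping source identifiers in the prompt also makes it possible to ask the model for citations and to check them afterwards.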

Hallucination

Hallucination happens when a model produces an answer that is unsupported, fabricated, or subtly wrong. This is especially dangerous in agent systems because the output may trigger an action: sending an email, updating a CRM record, writing SQL, approving a transaction, or changing a configuration.

Hallucination can be reduced by limiting what the model is allowed to do and by adding verification steps around important actions. Common controls include deterministic workflows, query validation, calculation checks, and human approval for high-risk operations. In mature deployments, the LLM is not treated as the system of record. It is treated as a reasoning interface wrapped with validation and guardrails.
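Two of these controls, query validation and human approval, can be sketched as plain checks that run before anything generated by the model is executed. The table allowlist, action names, and amount limit are hypothetical. The regular expressions are a deliberately crude stand-in: a production system would more likely use a real SQL parser together with read-only database credentials.

```python
import re

ALLOWED_TABLES = {"orders", "customers"}  # hypothetical allowlist
HIGH_RISK_ACTIONS = {"approve_transaction", "change_configuration"}

def validate_sql(query):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    stripped = query.strip().rstrip(";")
    if ";" in stripped:
        problems.append("multiple statements")
    if not re.match(r"(?i)^select\b", stripped):
        problems.append("only SELECT is allowed")
    tables = re.findall(r"(?i)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped)
    for table in tables:
        if table.lower() not in ALLOWED_TABLES:
            problems.append(f"table not allowed: {table}")
    return problems

def needs_human_approval(action, amount=0.0, limit=1000.0):
    """High-risk operations go to a person instead of running automatically."""
    return action in HIGH_RISK_ACTIONS or amount > limit
```

For example, `validate_sql("SELECT * FROM payroll")` reports that the table is not allowed, and `needs_human_approval("approve_transaction")` returns `True`. Both checks are deterministic, so they behave the same way regardless of what the model produced.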

Prompt injection

Prompt injection is a security risk where malicious or untrusted content causes the agent to violate its instructions. This can happen when an agent reads a message, web page, email, or document that contains hidden instructions. The risk grows as agents gain access to more tools and internal systems.

The core issue is that language models do not naturally separate instructions from data with the same rigor as secure operating systems. To the model, system instructions, user prompts, retrieved documents, and malicious text all appear as tokens in context. No alternative model architecture solves this vulnerability.

Enterprises mitigate this risk by restricting tool permissions. Sensitive actions should require confirmation or human review. Other defenses include guardrails, sandboxed tool execution, output monitoring, and in some cases fine-tuning. The safest enterprise agents are designed as controlled systems, not autonomous models with unlimited authority.
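A permission gate for tool calls could look roughly like the sketch below. The tool names and the policy table are hypothetical; the design point is that the decision is made by ordinary code outside the model, with unknown tools denied by default.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolPolicy:
    allowed: bool = False             # deny by default
    needs_confirmation: bool = False  # sensitive actions need a human

# Hypothetical policy table; tool names are illustrative.
POLICIES = {
    "search_docs": ToolPolicy(allowed=True),
    "send_email": ToolPolicy(allowed=True, needs_confirmation=True),
}

def authorize(tool_name, confirmed_by_human=False):
    """Decide whether a tool call requested by the model may run."""
    policy = POLICIES.get(tool_name, ToolPolicy())
    if not policy.allowed:
        return "deny"
    if policy.needs_confirmation and not confirmed_by_human:
        return "ask_human"
    return "allow"
```

With this table, `authorize("search_docs")` returns `"allow"`, `authorize("send_email")` returns `"ask_human"`, and a tool that is not listed, such as `authorize("delete_database")`, returns `"deny"`. Injected text can still ask the model to call a tool, but it cannot grant itself permission.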

Conclusion

The Transformer architecture remains the foundation of modern AI agents because it is flexible, scalable, and mature across many tasks. Large language models, vision-language models, audio-language systems, vision-language-action models, and world models all show how far the Transformer pattern has spread.

Alternative architectures such as state-space models, hybrids, and diffusion-style generation are important because they improve efficiency, especially for long context, streaming inputs, and constrained environments. However, they do not solve the most critical enterprise challenges on their own. Reliable AI agents still require grounding, validation, security controls, and careful system design around the model.

So, if the question is “If not Transformers, then what?”, the answer is not a simple replacement. It is a broader design choice: use the right model architecture for the workload, but build the real trust layer outside the model.

