
The Zero-Copy Illusion: Why Your Multi-Platform Iceberg Strategy is Doomed to Fail

Julius Hollmann
March 12, 2026

Enterprise architects believe a shared table format enables seamless cross-platform analytics. The physical reality of networks and compute proves them wrong.

Executive Summary

  • The False Promise: A popular myth suggests that if all your data platforms (Databricks, Snowflake, SAP, etc.) use Apache Iceberg, you can perform zero-copy queries across them. This is technically and physically impossible.
  • Iceberg is a Format, Not a Network: Iceberg is a powerful open table format that solves for storage and schema evolution. It does not solve for network latency, egress costs, or the physical requirement to co-locate data for computation (like JOINs).
  • Zero-Copy Has One Rule: Proximity. True zero-copy analytics works only when the compute engine is brought directly to the data ("compute-to-data"). This is only possible within a single, physically co-located system (e.g., within one Databricks environment in one cloud region).
  • The Inescapable Truth of JOINs: Any query that joins large, high-cardinality tables from separate platforms (e.g., customers in Databricks and orders in SAP) must move data over a network into one engine’s memory. At that moment, zero-copy ends.
  • The Real Solution is a Knowledge Layer: Instead of chasing the zero-copy illusion, the correct architecture accepts the need for selective data movement. A knowledge layer unifies the meaning (semantics, identities, relationships) of your data, not the raw data itself, providing a stable foundation for analytics and AI that survives the chaos of a multi-platform reality.

Introduction: The Architectural Fallacy Sweeping the Enterprise

There is a seductive story being told in enterprise architecture circles. It goes like this: “Because all our major platforms (Databricks, Azure Fabric, SAP Datasphere, Snowflake) can now speak a common language called Apache Iceberg, we can finally achieve a true zero-copy data landscape. No more replication, no more ETL, just one unified query space.”

This vision is powerful. It is also a dangerous misinterpretation of what Iceberg and zero-copy can actually do.
The hard truth is that a shared table format does not magically erase physical boundaries. Your multi-platform Iceberg strategy will not deliver a zero-copy utopia. In fact, for the most important enterprise queries, those that cross domains, it will fail. This isn't a matter of opinion; it's a matter of physics.

Let's dissect the technical reality, piece by piece.

What Zero-Copy Really Is (And When It Works)

Zero-copy is not about having zero data movement. It is about eliminating the need for persistent, redundant copies of data. The only way this works in practice is through a pattern called compute-to-data:

Instead of moving petabytes of data to a central engine, the query engine pushes its logic down to where the data physically resides.


This works beautifully under one non-negotiable condition: proximity. The compute cluster and the storage layer must live within the same logical and physical boundary: the same cloud region, the same virtual network, and ideally, the same platform.


Within a single Databricks or Snowflake environment, this is highly effective. The engine can scan Parquet files directly in an S3 bucket in the same region, apply filters next to the data, and move only the small result set back. This is true zero-copy analytics.
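
To make the compute-to-data pattern concrete, here is a minimal sketch in plain Python. The `StorageNode` class, its methods, and the sample rows are hypothetical; they stand in for an engine scanning files that sit next to it and are not any platform's API. The sketch only counts how many rows cross the boundary in each approach.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    """Hypothetical stand-in for storage with co-located compute."""
    rows: list
    rows_sent: int = 0  # number of rows that left the node

    def pull_all(self):
        # Data-to-compute: every row crosses the boundary before filtering.
        self.rows_sent += len(self.rows)
        return list(self.rows)

    def scan(self, predicate):
        # Compute-to-data: the filter runs next to the data,
        # so only matching rows cross the boundary.
        result = [row for row in self.rows if predicate(row)]
        self.rows_sent += len(result)
        return result


def is_german(row):
    return row["country"] == "DE"


# Illustrative data: 10,000 orders, one in a hundred from "DE".
orders = [
    {"order_id": i, "country": "DE" if i % 100 == 0 else "US"}
    for i in range(10_000)
]

naive = StorageNode(orders)
naive_result = [row for row in naive.pull_all() if is_german(row)]

pushed = StorageNode(orders)
pushed_result = pushed.scan(is_german)

print("rows moved without pushdown:", naive.rows_sent)
print("rows moved with pushdown:   ", pushed.rows_sent)
```

Both approaches return the same rows. The difference is how much data had to move to get them, which is why the pattern depends on the engine sitting close to the storage.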


The problem arises when architects assume this benefit extends across platforms.

Where the Illusion Shatters: The Physics of Data JOINs

A JOIN operation is the workhorse of analytics, combining data from multiple tables. And it is precisely here that the cross-platform zero-copy dream dies. Any JOIN requires that the rows from both tables be brought together in the same memory space to be compared.
When tables are on different platforms, in different storage accounts, or in different clouds, the engine has no choice but to pull data over the network. This is especially painful for high-cardinality joins.
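
A small sketch of a classic hash join shows why co-location is unavoidable. This is a simplified, assumed illustration in plain Python, not how any particular engine is implemented; the table contents and the `rows_shipped` counter are made up for the example. A real engine would usually build on the smaller input, but one side still has to arrive in full.

```python
from collections import defaultdict


def hash_join(local_rows, remote_rows, key):
    """Hash join: build a lookup table on one input, probe it with the other."""
    # Build phase: every remote row must be present in this process's memory.
    build = defaultdict(list)
    for row in remote_rows:
        build[row[key]].append(row)

    # Probe phase: each local row is compared against the build table.
    joined = []
    for row in local_rows:
        for match in build.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Illustrative data: customers live "here", orders live on another platform.
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(5)]
orders = [{"customer_id": i % 5, "order_id": 100 + i} for i in range(12)]

result = hash_join(local_rows=customers, remote_rows=orders, key="customer_id")

# Without a filter to shrink it, the whole remote side had to be shipped.
rows_shipped = len(orders)
print(len(result), "joined rows,", rows_shipped, "rows shipped")
```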


The Breaking Point: High Cardinality "For Dummies"

Let’s use an analogy: the two phone books.

Imagine you have two massive phone books, one for Berlin and one for Hamburg.

  • A Low-Cardinality Join: Your task is to find every person named "Anna Schmidt" who appears in both phone books. This is easy. You can have one person in Berlin look up "Anna Schmidt" and read the handful of results over the phone to a colleague in Hamburg, who does the same. You compare the short lists and find the matches. This is like a join in which a selective filter reduces each side to a handful of rows before anything crosses the network. It’s manageable via federation.
  • A High-Cardinality Join: Now, your task is to find every single person who is listed in both phone books. This is a join on a field with millions of unique values (the person's name), so neither side can be reduced to a short list first. Trying to do this over the phone is impossible. You can't read millions of names back and forth. The only sane way is to physically ship one phone book to the other city (replication/materialization). With both books in the same room, you can efficiently compare them side-by-side.

Enterprise JOINs, like matching millions of customer IDs to billions of transaction events, are the second phone book problem. You must bring the data together. There is no magical zero-copy JOIN in the sky.
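
A back-of-the-envelope calculation makes "bringing the data together" tangible. The sketch below is only arithmetic. The function name and every input (row count, row width, bandwidth, egress price) are illustrative assumptions, not measurements or any provider's actual pricing, so substitute your own figures.

```python
def estimate_transfer(rows, bytes_per_row, bandwidth_gbit_s, egress_usd_per_gb):
    """Rough cost of shipping one side of a join across a network."""
    gigabytes = rows * bytes_per_row / 1e9        # decimal gigabytes
    seconds = gigabytes * 8 / bandwidth_gbit_s    # GB -> Gbit, divided by link rate
    cost_usd = gigabytes * egress_usd_per_gb
    return gigabytes, seconds, cost_usd


# Assumed inputs: 2 billion events at 100 bytes each over a 1 Gbit/s link,
# with a hypothetical egress price of 0.05 USD per GB.
gb, seconds, usd = estimate_transfer(
    rows=2_000_000_000,
    bytes_per_row=100,
    bandwidth_gbit_s=1.0,
    egress_usd_per_gb=0.05,
)
print(f"{gb:.0f} GB, {seconds / 60:.1f} min per run, {usd:.2f} USD egress per run")
```

The estimate is per query run. A federated join that repeats this transfer on every execution pays it every time, while a materialized copy pays it once per refresh.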

The Modern Reality: Iceberg Data Islands

"But," the architect objects, "our data isn't in the ERP or CRM anymore. We've already replicated it into data products on our various lakehouse platforms!"

This is true, and it's a good first step. But it doesn't solve the core problem. It just moves the islands. Your landscape now looks like this:

  • SAP data is landed as Iceberg tables in an SAP Datasphere environment.
  • Salesforce and weblog data is landed as Iceberg tables in a Databricks on AWS environment.
  • Finance data is landed as Iceberg tables in an Azure Fabric environment.

You still have three physically separate data islands. A query that needs to join customer activity from Databricks with finance data from Azure Fabric still has to ship data across a network. The format is the same, but the physical locations are different. The zero-copy promise is still broken.

The Real Architectural Answer: A Knowledge Layer


If cross-platform zero-copy is a myth, what is the solution?


You stop fighting physics and start managing reality. The goal should not be to eliminate data movement but to make it intelligent, minimal, and meaningful. This is the role of an Enterprise Knowledge Layer.


A knowledge layer, built on a formal ontology, does not aim to unify raw data. It sits above your data product islands and unifies the knowledge about the data:

  • It copies semantics, not data. It materializes the critical identities, relationships, and business rules that define your enterprise. This is typically less than 1% of your total data volume.
  • It provides a stable point of reference. Systems and platforms can change, but the semantic definition of "Customer" remains stable in the knowledge graph.
  • It enables intelligent queries. Instead of guessing at joins, an AI assistant or BI tool queries the knowledge graph first to understand the semantic context, then generates an efficient plan to fetch only the necessary data, using federation for small tables and leveraging selective materialization for large, high-cardinality joins.
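
The planning step in the last bullet can be sketched as a simple rule. Everything here is hypothetical: the `TableInfo` metadata, the `plan_join` function, the row estimates, and the `FEDERATION_ROW_LIMIT` threshold are assumptions chosen for illustration, not part of any product. A real planner would weigh far more than row counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """Metadata a knowledge layer might hold about a data product (hypothetical)."""
    name: str
    platform: str
    estimated_rows: int


# Assumed cutoff: at or below it, fetch rows at query time; above it, co-locate first.
FEDERATION_ROW_LIMIT = 1_000_000


def plan_join(left: TableInfo, right: TableInfo) -> str:
    """Pick a movement strategy for a two-table join from metadata alone."""
    if left.platform == right.platform:
        return "local join (no cross-platform movement)"
    smaller = min(left, right, key=lambda t: t.estimated_rows)
    if smaller.estimated_rows <= FEDERATION_ROW_LIMIT:
        return f"federate: fetch {smaller.name} at query time"
    return f"materialize: replicate {smaller.name} next to the other table"


customers = TableInfo("customers", "databricks", 5_000_000)
weblogs = TableInfo("weblogs", "databricks", 3_000_000_000)
postings = TableInfo("finance_postings", "azure_fabric", 2_000_000_000)
cost_centers = TableInfo("cost_centers", "sap_datasphere", 2_000)

print(plan_join(customers, weblogs))
print(plan_join(customers, cost_centers))
print(plan_join(customers, postings))
```

The decision uses only metadata (platform and estimated size), which is the kind of small, stable information a knowledge layer is meant to hold.
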

Conclusion: Stop Chasing the Illusion

Zero-copy analytics is a real and valuable pattern, but only within the walls of a single, homogeneous platform. The idea that standardizing on Iceberg will extend this benefit across your entire, heterogeneous enterprise is a fallacy.


Even when your data lives in modern Iceberg data products, those products exist on physically separate platforms. A shared format does not create a shared architecture. The winning strategy is to embrace this physical reality and build a stable, semantic layer of knowledge to govern it. That is how you turn a collection of data islands into a truly connected enterprise.
