Google Just Made Iceberg the Foundation of Compute. Make Sure Your Semantics Are Ready to Travel.

Estimated Reading Time: 5 minutes

What Google announced at Cloud Next wasn’t an upgrade. It was a starting gun.

Google announced a cross-cloud lakehouse standardized on Apache Iceberg. Not BigQuery’s tables. Not a Google format. Iceberg. Open. Sitting in object storage you own. Readable by any engine that wants to show up. Google paired it with an open-source Spark engine that reads Iceberg directly, with no BigQuery in the path, and ran the same wiring into SaaS apps. The message under the message was simple: the compute format wars are over.

Iceberg makes compute a commodity. Now make sure your semantics travel with you.

Compute runs on Iceberg now. Your semantics have to travel.

What just got settled, technically (Iceberg)

The lakehouse format war is over. The traces are everywhere if you look. Databricks paid more than a billion dollars for Tabular, the company founded by Iceberg’s creators, then put them to work unifying Delta and Iceberg. Snowflake open-sourced Polaris as a vendor-neutral Iceberg catalog, with AWS, Google, Microsoft, Salesforce, and dbt Labs signed on as collaborators. AWS shipped S3 Tables, the first cloud object store with Iceberg built in. Microsoft built Iceberg into OneLake and Fabric. And now Google has made Iceberg the front door of its cross-cloud lakehouse. Three hyperscalers. One format. No winner, because the winner is the standard.

The proof is on the field. Netflix, the company that invented Iceberg, now runs an exabyte-scale data warehouse with three million Iceberg tables, ten petabytes ingested per day, and 99.5% Iceberg adoption across the warehouse (Netflix Trino Summit 2024 slides). They run Snowflake, Trino, and Spark on top of the same open data, with no migration between them. Netflix turned Iceberg from a thesis into a working architecture. Everyone else is following.

The trail goes through the application layer too. Salesforce Data Cloud rebuilt its lakehouse around Iceberg and now runs four million Iceberg tables holding fifty petabytes of customer data. Their head of software engineering called Iceberg “an open storage standard that simplifies zero copy data access for organizations across their ecosystem.” It’s the foundation under Agentforce.

Five years ago your data sat inside whichever warehouse you signed with, in a format only that warehouse could read, behind switching costs big enough to keep you there. Today your data sits in your bucket, in your account, in a format every engine can read, and switching costs have plummeted.

What just got urgent, strategically (semantics)

If you run a data platform at a bank, an insurer, or a global manufacturer, semantics just got more urgent. 

For years your compute strategy was a procurement strategy. You picked a warehouse, you negotiated a contract, you rebuilt your pipelines to match its dialect, and you hoped the workload mix you signed up for matched the workload mix you actually got three years later. It rarely did. You ended up overpaying for general-purpose compute on workloads that needed something specialized, or fighting your vendor to add a capability they were never going to prioritize for you.

Iceberg ends that bargain. When your data sits in your own object storage in an open format, you stop choosing a warehouse and start choosing an engine for the job in front of you, this quarter, this workload. Snowflake for the SQL your finance team already knows. Spark for the heavy transforms. A specialized engine for the AI workload that didn’t exist when you signed your last EDW contract. Same data. Different tools. No migration.

That’s a procurement win and an architecture win at the same time. You stop paying a premium for switching costs that no longer exist. You stop being held hostage by a roadmap you don’t control. And the next time a new class of workload shows up, you can pick the right engine for it without rebuilding the warehouse underneath.

The catch: this only works if your metric definitions travel with you. If “customer,” “exposure,” and “net revenue” only render correctly inside one engine’s SQL dialect or one BI tool’s modeling layer, you didn’t actually decouple. You moved the lock-in upstairs.
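What “travels” looks like in practice: the definition lives as engine-neutral data, and only trivial dialect differences get handled at render time. A minimal sketch in plain Python; every name here (net_revenue, the quoting table, the render function) is illustrative, not any particular product’s API:

```python
# Sketch: one metric defined once as data, rendered to SQL for
# whichever Iceberg engine runs the query. All names are hypothetical.

NET_REVENUE = {
    "name": "net_revenue",
    "table": "sales.orders",  # the same Iceberg table, whatever the engine
    "expression": "SUM(gross_amount - discounts - returns)",
    "grain": ["order_date", "region"],
}

# Per-engine differences are confined to small dialect rules
# (identifier quoting here); the definition itself never changes.
QUOTING = {"snowflake": '"', "spark": "`"}

def render(metric: dict, engine: str) -> str:
    q = QUOTING[engine]
    dims = ", ".join(f"{q}{d}{q}" for d in metric["grain"])
    return (
        f"SELECT {dims}, {metric['expression']} AS {metric['name']}\n"
        f"FROM {metric['table']}\n"
        f"GROUP BY {dims}"
    )

print(render(NET_REVENUE, "snowflake"))
print(render(NET_REVENUE, "spark"))
```

The point of the sketch: switching engines changes one lookup, not the definition. If switching engines means rewriting the expression itself, the definition was never portable.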

That’s the part the vendors are quietly counting on. Snowflake shipped Semantic Views. Databricks shipped Metric Views. Microsoft’s pushing Fabric IQ. They’re racing to be the place your definitions live, because they know that’s the new switching cost. The compute moat is gone. The meaning moat is being dug right now.

Iceberg made compute a commodity. Now make sure your semantics travel with you.

Three questions to ask about Iceberg and semantics on Monday

The data finally stays put. The meaning has to travel. Before your next vendor pitch, your next architecture review, your next AI agent pilot, ask these out loud.

  1. If I switch engines next quarter, do my metric definitions come with me? If the honest answer involves rebuilding, you don’t have a semantic layer. You have a dependency on a SQL dialect.
  2. Do my BI users, my Python notebooks, and my AI agents return the same number for the same metric? If not, your governance problem is wearing a tooling costume. Iceberg will rip the costume off.
  3. Is my semantic logic portable to any Iceberg engine, or does it live inside one? Portable is the right answer. Locked is the common one.
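Question 2 can be made mechanical: pull the same metric through each access path and compare the numbers. A hedged sketch; the three query functions are stand-ins for whatever your BI API, notebook connection, and agent tooling actually return:

```python
# Illustrative consistency check for one metric across access paths.
# The three functions below are stubs with made-up values; in practice
# each would call a real interface (BI tool API, warehouse driver,
# agent tool call) and return that path's answer.

def bi_tool_value(metric: str) -> float:
    return 1_204_311.00   # stubbed result from the BI layer

def notebook_value(metric: str) -> float:
    return 1_204_311.00   # stubbed result from a Python notebook

def agent_value(metric: str) -> float:
    return 1_198_450.00   # stubbed: the agent used its own definition

def check_metric(metric: str, tolerance: float = 0.0) -> bool:
    """True only if every access path agrees within tolerance."""
    values = [bi_tool_value(metric), notebook_value(metric),
              agent_value(metric)]
    return max(values) - min(values) <= tolerance

print(check_metric("net_revenue"))  # False: one path disagrees
```

A drift of a few thousand dollars on one metric is exactly the kind of disagreement that hides inside tooling until an agent surfaces it at scale.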

If you can’t answer all three with conviction, you have work to do this quarter. Not next year. This quarter.

Open storage demands open semantics

The data is going open. The engines are going commodity. Whatever is left that locks you in is load-bearing for the vendor and dead weight for you. Metrics are the last hiding place, and every vendor at Google Cloud Next this year was racing to claim them.

We’ve argued this case at AtScale for thirteen years: the semantic layer belongs in your infrastructure, not inside a BI tool, not inside a warehouse, not inside someone’s AI stack. We put that argument on the record this year by joining the Open Semantic Interchange alongside Snowflake, Salesforce, dbt Labs, and others. Open metric definitions. Vendor-neutral. Built to travel.

Iceberg just turned that bet from a principle into a deadline.

If your metrics live inside a warehouse or a BI tool today, you’re the lock-in now. The vendors changed the game on you. Change with it.

Iceberg made compute a commodity. Now make sure your semantics travel with you.
