Cloudera Accelerates AI and Analytics with an Open Lakehouse: Iceberg REST Catalog for Zero-Copy and Interoperability, and an Optimizer Promising Up to 13× Faster Queries

At EVOLVE25 NYC, Cloudera announced two initiatives that reinforce its commitment to an open lakehouse built on Apache Iceberg. The first is the integration of Cloudera Iceberg REST Catalog across its platform, to share data without copies and with unified metadata. The second is Cloudera Lakehouse Optimizer, an automated optimization and intelligent maintenance service for Iceberg tables that, according to Cloudera's internal tests, accelerates queries by up to 13× and reduces storage by 36%. Both are available starting today; the on-premises version of the Optimizer will arrive in a future release.

The message is clear: as companies try to bring AI to their data, where it already resides, architectural complexity, silos, and inconsistent governance get in the way. Cloudera proposes a single fabric of security and governance, its Shared Data Experience (SDX), combined with a REST catalog that speaks Iceberg and opens content to multiple engines without migrations or duplication, plus an optimization engine that keeps tables healthy and efficient for any compatible engine.


Why it matters: AI “on any cloud” without moving data

In daily operations, moving data for training, inference, or analysis increases costs, widens the attack surface, and delays decision-making. The Iceberg REST Catalog integrated by Cloudera tackles this bottleneck by exposing tables and metadata via REST, with unified policies, lineage, and auditing extended to every authorized consumer. The promise is zero-copy interoperability with third-party engines such as Snowflake, Databricks, AWS Athena, AWS EMR, and Salesforce, while preserving ACID guarantees and consistent access policies through SDX.
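To make "exposing tables and metadata via REST" concrete: the open Iceberg REST catalog specification defines plain HTTP endpoints, and loading a table is a GET on `/v1/{prefix}/namespaces/{namespace}/tables/{table}`. The response carries the table's `metadata-location` and current metadata, which each engine then uses to read the data files directly from object storage, hence no copy. The minimal Python sketch below only builds such a request; the base URI, prefix, namespace, table name, and bearer token are hypothetical placeholders, and the actual prefix and authentication scheme of a Cloudera deployment should be taken from its own documentation (clients normally discover the prefix via `GET /v1/config`).

```python
from typing import List, Optional
from urllib.parse import quote
from urllib.request import Request


def load_table_url(base_uri: str, namespace: List[str], table: str,
                   prefix: Optional[str] = None) -> str:
    """Build the loadTable URL defined by the open Iceberg REST catalog spec."""
    # Multi-level namespaces are joined with the 0x1F unit separator, then percent-encoded.
    ns = quote("\x1f".join(namespace), safe="")
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        # Optional deployment-specific segment (normally discovered via GET /v1/config).
        parts.append(prefix.strip("/"))
    parts += ["namespaces", ns, "tables", quote(table, safe="")]
    return "/".join(parts)


def load_table_request(url: str, token: str) -> Request:
    """Prepare (but do not send) a GET request; the bearer-token scheme is an assumption."""
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical endpoint and names, for illustration only.
url = load_table_url("https://catalog.example.com/api/catalog",
                     ["sales", "eu"], "orders", prefix="prod")
req = load_table_request(url, "hypothetical-token")
print(req.get_method(), req.full_url)
```

Because the endpoint shape comes from the open specification rather than from a vendor SDK, any engine that implements an Iceberg REST client can issue the same call.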

At the same time, the Lakehouse Optimizer reduces the “invisible friction” of Iceberg tables: intelligent rewriting of manifests and position deletes, compaction, clustering, proactive maintenance, and declarative policies at the level of a single table or an entire catalog. Where teams once relied on ad hoc jobs, maintenance windows, playbooks, and accumulated operational debt, automation now promises enterprise-grade observability.


Cloudera Iceberg REST Catalog: open interoperability, unified governance, and lower TCO

Cloudera claims to be the first provider to integrate the Iceberg REST Catalog into an end-to-end data and AI platform, from real-time ingestion and large-scale processing to consumption in BI and AI. Key capabilities include:

  • Zero-copy data sharing: third-party engines access data managed by Cloudera directly, without copying or moving it, whether in public cloud, data center, or edge environments.
  • Unified governance and security: with SDX, access policies, lineage, and auditing extend to external tools, avoiding “gray zones” on the perimeter.
  • Open metadata: instant asset discovery without lock-in to proprietary catalogs; the REST Catalog becomes a single source of truth to accelerate AI development and analytics.
  • Lower TCO and shorter time-to-value: Cloudera reports that customers have achieved up to 79% savings in storage costs while improving visibility across business lines. For example, a multinational in the satellite sector reportedly achieved these savings while strengthening its AI data pipelines.

The core takeaway: by standardizing access via REST and Iceberg, Cloudera aims to let each company future-proof its data strategy without being trapped in a proprietary catalog, while maintaining control, visibility, and compliance.


Lakehouse Optimizer: “hands-free” maintenance for Iceberg (and any engine)

The Optimizer arrives as an intelligent service, open to any engine compatible with Iceberg, with a granular policy interface:

  • Advanced optimization: beyond basic maintenance, it rewrites manifest and position delete files and manages compaction and data layout for better performance at lower cost.
  • Declarative policies: applied per table or per catalog; the engine executes and monitors.
  • Observability: metrics, dashboards, and traceability to understand what is being optimized, when, and how much improvement.
  • Measured benefits (internal tests): up to 13× faster queries and a 36% reduction in storage.
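Cloudera has not published the Optimizer's algorithms. As a generic illustration of what "compaction" means for an Iceberg table, the sketch below plans a bin-pack rewrite: data files smaller than a target size (512 MB here, matching Iceberg's default `write.target-file-size-bytes`) are grouped first-fit-decreasing so that each group can be rewritten as one larger file. The function name, the sizes, and the rule that single-file groups are skipped are assumptions for illustration, not the product's behavior.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 512,
                    min_input_files: int = 2) -> List[List[int]]:
    """Group small files (first-fit decreasing) into rewrite groups no larger than target_mb."""
    # Files already at or above the target size are left untouched.
    small = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    groups: List[List[int]] = []
    totals: List[int] = []
    for size in small:
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                groups[i].append(size)
                totals[i] += size
                break
        else:
            # No existing group has room: open a new one.
            groups.append([size])
            totals.append(size)
    # Rewriting a group of one file would not reduce the file count.
    return [g for g in groups if len(g) >= min_input_files]


plan = plan_compaction([600, 100, 200, 300, 50, 10])
print(plan)
```

Here six input files become three (the 600 MB file stays as it is, and the other five collapse into two rewritten files), so a query has fewer files to open and fewer manifest entries to plan over.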

Furthermore, Cloudera emphasizes that, once the on-premises version ships in a future release, it will be the only service of its kind available on-premises: a distinctive offering for regulated sectors or organizations with sovereignty requirements that cannot, or prefer not to, delegate control-plane operations to the cloud.


“A truly open lakehouse”: position and promise

Cloudera, once a pioneer of “Big Data”, now centers its narrative on Apache Iceberg as an open table format and the de facto standard for lakehouses. The REST Catalog provides interoperability and shared metadata; SDX applies security and governance over 100% of the data; and the Optimizer automates table hygiene and efficiency regardless of which engine is used for querying.

According to Leo Brunnick, Chief Product Officer of Cloudera, the company continues investing to make Iceberg “enterprise-ready”, with the trio of flexibility, scalability, and uncompromising insights, “when and where needed”. The declared ambition: to be “the only platform capable of bringing AI to the data — across all clouds, data centers, and edges — while maintaining unified governance and multi-engine analytics without copies or lock-in”.


What does it mean for a data team… and for the CFO?

Less ETL for sharing

The REST Catalog removes the need for copy pipelines built solely to “serve” a subset of data to third-party engines. Less duplication, and less latency between source and consumer.

Unified governance across the perimeter

Policies, lineage, and auditing are inherited, so the risk of “shadow zones” decreases. For compliance, internal audits, and security, that means fewer surprises.

Costs

The up-to-79% storage-cost savings reported by customers and the 36% storage reduction measured for the Optimizer in internal tests are headline figures that a CFO will want to validate. The levers behind them are fewer copies, more compact files, continuously optimized tables, and cheaper queries because they read less data.

Multi-engine as a default

If one team uses Athena, another Databricks, and another Snowflake, the catalog doesn’t “trap” anyone: each engine points at the same catalog, and the same policies apply. Political friction between teams drops, and the return on data investments rises.
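As one example of what "pointing" an engine at the catalog looks like, Apache Spark (the engine AWS EMR commonly runs) attaches to an Iceberg REST catalog through a handful of catalog properties provided by the Iceberg Spark runtime. The Python sketch below only assembles those properties as `--conf` flags. The catalog name `lake`, the URI, and the optional warehouse value are hypothetical; authentication properties, which vary by deployment, are deliberately omitted; and other engines such as Snowflake or Athena have their own configuration mechanisms.

```python
from typing import Dict, List, Optional


def iceberg_rest_spark_conf(catalog: str, uri: str,
                            warehouse: Optional[str] = None) -> Dict[str, str]:
    """Spark properties that register an Iceberg REST catalog under the name `catalog`."""
    key = f"spark.sql.catalog.{catalog}"
    conf = {
        # Enables Iceberg SQL extensions (e.g. CALL procedures for table maintenance).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        key: "org.apache.iceberg.spark.SparkCatalog",
        f"{key}.type": "rest",
        f"{key}.uri": uri,
    }
    if warehouse:
        conf[f"{key}.warehouse"] = warehouse
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten properties into spark-submit / pyspark style --conf arguments."""
    args: List[str] = []
    for name, value in conf.items():
        args += ["--conf", f"{name}={value}"]
    return args


# Hypothetical catalog name and endpoint.
conf = iceberg_rest_spark_conf("lake", "https://catalog.example.com/api/catalog")
print(" ".join(to_submit_args(conf)))
```

With the catalog registered this way, Spark SQL addresses tables as `lake.<namespace>.<table>`, while access control remains on the catalog side rather than in each engine's configuration.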


Market outlook: Iceberg as lingua franca

Cloudera’s push aligns with the consolidation of Iceberg as an open format enabling ACID tables over object storage, with schema evolution and rich metadata. In this landscape, differentiation hinges on:

  • Governance model (SDX vs. closed stacks).
  • Sharing mechanism (REST catalog vs. proprietary catalogs).
  • Maintenance approach (Optimizer vs. manual jobs and housekeeping).
  • Deployment environment (cloud, on-prem, edge).

Cloudera aims to tie these aspects together. Its “data anywhere → AI everywhere” narrative and open architecture address a recurring demand: interoperability and sovereignty without starting from scratch.


Availability and next steps

Cloudera confirms that Data Sharing with Iceberg REST Catalog and Lakehouse Optimizer are available now. The on-premises version of the Optimizer will arrive in a future release, with no specific date announced yet. More commercial and technical information is already on Cloudera.com.


Frequently Asked Questions

What exactly is the “Iceberg REST Catalog” and how does it differ from a proprietary catalog?
The Iceberg REST Catalog specification is an open, standardized HTTP API through which an Apache Iceberg catalog exposes tables and metadata; Cloudera Iceberg REST Catalog is Cloudera's implementation of it. Unlike proprietary catalogs, it lets third-party engines (Snowflake, Databricks, Athena, EMR, Salesforce…) connect directly to data managed by Cloudera without copying it, inheriting policies, lineage, and auditing via SDX and avoiding catalog lock-in.

How does the Lakehouse Optimizer achieve up to 13× query performance improvements?
Cloudera states that the service applies advanced Iceberg table optimization: rewriting manifests and position deletes, managing compaction and data layout, and automating housekeeping that would otherwise require manual jobs. By reducing the number of files, fragmentation, and unnecessary reads, queries process fewer bytes and run faster. The 13× figure comes from Cloudera's internal tests.
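In the Iceberg format, a position delete file lists `(file_path, pos)` pairs that mark deleted rows, and every reader must merge those pairs with the data files at query time. The toy Python sketch below (in-memory lists stand in for Parquet files; all names are hypothetical, and this is not Cloudera's implementation) shows what rewriting position deletes into the data achieves: afterwards, readers scan only the surviving rows and no delete files at all.

```python
from typing import Dict, List, Set, Tuple

Row = str


def apply_position_deletes(data_files: Dict[str, List[Row]],
                           deletes: List[Tuple[str, int]]) -> Dict[str, List[Row]]:
    """Rewrite data files with position deletes applied, so readers no longer merge them."""
    # Index deleted row positions per data file, as a merge-on-read query would.
    deleted: Dict[str, Set[int]] = {}
    for path, pos in deletes:
        deleted.setdefault(path, set()).add(pos)
    return {
        path: [row for pos, row in enumerate(rows) if pos not in deleted.get(path, set())]
        for path, rows in data_files.items()
    }


files = {"a.parquet": ["r0", "r1", "r2"], "b.parquet": ["s0", "s1"]}
deletes = [("a.parquet", 1), ("b.parquet", 0)]
rewritten = apply_position_deletes(files, deletes)
print(rewritten)
```

Before the rewrite, a full scan reads five data rows plus two delete records; afterwards it reads three rows and nothing else.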

Can these capabilities be used on any cloud and also on-premises?
The REST Catalog and zero-copy interoperability are available across public cloud, data center, and edge environments managed by Cloudera. The Lakehouse Optimizer is currently offered as a cloud service; Cloudera has announced that an on-premises version, which it says will be the only service of its kind, will follow in a future release.

Where does the “up to 79%” storage savings come from?
Cloudera attributes this figure to existing customers who, by eliminating redundant copies and unifying access via REST Catalog and SDX, reduced the number of replicated datasets and optimized their object storage footprint. It is not a universal guarantee; actual savings depend on copy patterns, historical volume, and the level of zero-copy sharing and Optimizer adoption in each case.

via: cloudera
