Architecture

One platform, eight integrated layers

OpenLakes organizes 20+ open-source tools into a coherent architecture. Each layer is documented, versioned, and wired for ingress, storage, and observability so you can inspect every component before deploying.

Layer 01

Infrastructure

PostgreSQL, MinIO, Kafka, Nessie, Redis, Traefik

Layer 02

Compute

Trino 478, Spark 4.1 with Iceberg + Nessie

Layer 03

Streaming

Kafka Streams, Spark Structured Streaming

Layer 04

Orchestration

Airflow 3.x with KubernetesExecutor

Layer 05

Analytics

Superset 4.1, JupyterHub 5.4

Layer 06

Ingestion

Meltano 4.0, Debezium 3.0 CDC

Layer 07

Catalog

OpenMetadata 1.10 with OpenSearch

Layer 08

Monitoring

Prometheus, Grafana, Loki, Alertmanager

Run it your way

The same platform architecture, two deployment options.

Managed

OpenLakes Harbor

We deploy, configure, and operate all eight layers for you. Sign up and start using the platform immediately.

  • All components pre-integrated
  • Single sign-on included
  • Automatic updates and patches
  • 30-day free trial available

Self-hosted

OpenLakes Core

Deploy on your own Kubernetes cluster. Full control over every layer, configuration, and resource.

  • 100% open-source (Apache 2.0)
  • Works on any Kubernetes
  • Customize every component
  • Community-supported

Component versions

OpenLakes pins every upstream version for predictable deployments.

Trino 478

Interactive SQL with Nessie + MinIO catalogs configured out of the box.

Spark 4.1

Custom image with Iceberg 1.8, Nessie 0.77, OpenLineage, and S3A support.
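To make the Iceberg, Nessie, and S3A wiring concrete, here is a minimal sketch of the Spark properties such a setup typically needs. The property keys are the standard Iceberg, Nessie, and Hadoop S3A ones; the function name, catalog name, and every URL are assumptions for illustration and are not taken from the OpenLakes image.

```python
def iceberg_nessie_conf(nessie_uri, warehouse, s3_endpoint,
                        ref="main", catalog="nessie"):
    """Build Spark properties for an Iceberg catalog backed by Nessie.

    All argument values are supplied by the caller; nothing here is
    read from an OpenLakes deployment.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # SQL extensions for Iceberg DDL/DML and Nessie branch commands.
        "spark.sql.extensions": ",".join([
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
        ]),
        # Register the catalog and point it at the Nessie server.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": nessie_uri,
        f"{prefix}.ref": ref,
        f"{prefix}.warehouse": warehouse,
        # S3A settings for an S3-compatible store such as MinIO.
        "spark.hadoop.fs.s3a.endpoint": s3_endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


# Hypothetical in-cluster addresses, for illustration only.
conf = iceberg_nessie_conf(
    nessie_uri="http://nessie:19120/api/v1",
    warehouse="s3a://lakehouse/warehouse",
    s3_endpoint="http://minio:9000",
)
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed lines can be passed to spark-submit as-is, or the dictionary can be fed to a SparkSession builder one key at a time.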

Airflow 3.x

KubernetesExecutor with Spark, Trino, dbt, and Papermill providers.

Superset 4.1

Self-service dashboards with OAuth and Trino as the default engine.

JupyterHub 5.4

Multi-user notebooks with Spark kernels matching the cluster stack.

Meltano 4.0

ELT platform with 500+ Singer taps and S3-compatible targets.

Debezium 3.0

CDC from PostgreSQL, MySQL, and MongoDB to Kafka topics.

OpenMetadata 1.10

Data catalog with lineage, quality checks, and glossary features.
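One way to picture version pinning is as a manifest that a deployment can be checked against. The sketch below uses the versions listed above; the manifest layout, the find_drift helper, and the prefix-matching rule are assumptions for illustration, not the format OpenLakes actually uses.

```python
# Pins copied from the component list above. Airflow is listed as "3.x",
# so only the major version is pinned here.
PINNED = {
    "trino": "478",
    "spark": "4.1",
    "airflow": "3",
    "superset": "4.1",
    "jupyterhub": "5.4",
    "meltano": "4.0",
    "debezium": "3.0",
    "openmetadata": "1.10",
}


def matches(deployed, pinned):
    """True when the deployed version starts with the pinned components.

    Comparison is per dotted component, so "4.1.2" matches "4.1"
    but "4.10.0" does not.
    """
    want = pinned.split(".")
    return deployed.split(".")[:len(want)] == want


def find_drift(deployed, pinned=PINNED):
    """Return {component: (pinned, deployed)} for missing or mismatched versions."""
    drift = {}
    for name, want in pinned.items():
        have = deployed.get(name)
        if have is None or not matches(have, want):
            drift[name] = (want, have)
    return drift


# Hypothetical versions reported by a running cluster.
reported = {
    "trino": "478",
    "spark": "4.1.0",
    "airflow": "3.0.6",
    "superset": "4.10.0",
    "jupyterhub": "5.4.2",
    "meltano": "4.0.1",
    "debezium": "3.0.8",
}
print(find_drift(reported))
```

In this example Superset is flagged because 4.10.0 is not a 4.1 release, and OpenMetadata is flagged because the cluster did not report it at all.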

Resource profiles

OpenLakes Core auto-selects a profile based on your cluster size.

compact-vm

For single-node VMs (Rancher Desktop, Lima). Minimum footprint while keeping all services running.

compact

Single-node bare metal. Reduced Spark, Trino, and Loki footprints for workstation deployments.

small

Default for lab clusters. Balanced resources for up to ~24 cores and 128 GB of memory across the cluster.

full

Production clusters with dedicated hardware. Longer retention, multiple executors, larger workers.
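The profile descriptions above suggest a simple decision rule, sketched here. This is not the selection logic OpenLakes Core ships: the Cluster type, the virtualized flag, and the exact cutoffs are assumptions, with the 24-core and 128 GB limits taken from the description of the small profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical summary of a cluster's size."""
    nodes: int
    total_cores: int
    total_memory_gb: int
    virtualized: bool = False  # assumed flag: single node runs inside a VM


def select_profile(cluster):
    """Pick a resource profile name from the cluster summary."""
    if cluster.nodes <= 1:
        # Single node: VM hosts get the smallest footprint.
        return "compact-vm" if cluster.virtualized else "compact"
    # Multi-node: "small" covers clusters up to roughly 24 cores / 128 GB.
    if cluster.total_cores <= 24 and cluster.total_memory_gb <= 128:
        return "small"
    return "full"


print(select_profile(Cluster(nodes=1, total_cores=8, total_memory_gb=16, virtualized=True)))
print(select_profile(Cluster(nodes=3, total_cores=24, total_memory_gb=96)))
print(select_profile(Cluster(nodes=6, total_cores=96, total_memory_gb=512)))
```

A real implementation would likely read node counts and allocatable resources from the Kubernetes API rather than take them as arguments.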

Ready to explore?

Try OpenLakes Harbor free for 30 days, or deploy Core on your own cluster.
