Architecture
OpenLakes organizes 20+ open-source tools into a coherent architecture. Each layer is documented, versioned, and wired for ingress, storage, and observability so you can inspect every component before deploying.
Layer 01
Infrastructure
PostgreSQL, MinIO, Kafka, Nessie, Redis, Traefik
Layer 02
Compute
Trino 478, Spark 4.1 with Iceberg + Nessie
Layer 03
Streaming
Kafka Streams, Spark Structured Streaming
Layer 04
Orchestration
Airflow 3.x with KubernetesExecutor
Layer 05
Analytics
Superset 4.1, JupyterHub 5.4
Layer 06
Ingestion
Meltano 4.0, Debezium 3.0 CDC
Layer 07
Catalog
OpenMetadata 1.10 with OpenSearch
Layer 08
Monitoring
Prometheus, Grafana, Loki, Alertmanager
The same platform architecture, two deployment options.
Managed
We deploy, configure, and operate all 8 layers for you. Sign up and start using the platform immediately.
Self-hosted
Deploy on your own Kubernetes cluster. Full control over every layer, configuration, and resource.
OpenLakes pins every upstream version for predictable deployments.
Interactive SQL with Nessie + MinIO catalogs configured out of the box.
Custom image with Iceberg 1.8, Nessie 0.77, OpenLineage, and S3A support.
KubernetesExecutor with Spark, Trino, dbt, and Papermill providers.
Self-service dashboards with OAuth and Trino as the default engine.
Multi-user notebooks with Spark kernels matching the cluster stack.
ELT platform with 500+ Singer taps and S3-compatible targets.
CDC from PostgreSQL, MySQL, and MongoDB to Kafka topics.
Data catalog with lineage, quality checks, and glossary features.
OpenLakes Core auto-selects a profile based on your cluster size.
For single-node VMs (Rancher Desktop, Lima). Minimum footprint while keeping all services running.
Single-node bare metal. Reduced Spark, Trino, and Loki footprints for workstation deployments.
Default for lab clusters. Balanced resources up to ~24 cores / 128 GB across the cluster.
Production clusters with dedicated hardware. Higher retention, multiple executors, beefier workers.
Try OpenLakes Harbor free for 30 days, or deploy Core on your own cluster.