|
| 1 | +# ICE REST Catalog Architecture |
| 2 | + |
| 3 | + |
| 4 | + |
| 5 | +## Components |
| 6 | + |
| 7 | +- **ice-rest-catalog**: Stateless REST API service (Kubernetes Deployment) |
| 8 | +- **etcd**: Distributed key-value store for catalog state (Kubernetes StatefulSet) |
| 9 | +- **Object Storage**: S3-compatible storage for data files |
| 10 | +- **Clients**: ClickHouse or other Iceberg-compatible engines |
| 11 | + |
| 12 | +## Design Principles |
| 13 | + |
| 14 | +### Stateless Catalog |
| 15 | + |
| 16 | +The `ice-rest-catalog` is completely stateless and deployed as a Kubernetes Deployment with multiple replicas. |
| 17 | +It can be scaled horizontally without coordination between replicas. The catalog stores no state locally; all metadata is persisted in etcd. |
| 18 | + |
| 19 | +### State Management |
| 20 | + |
| 21 | +All catalog state (namespaces, tables, schemas, snapshots, etc.) is maintained in **etcd**, a distributed, consistent key-value store. |
| 22 | +Each etcd instance runs as a StatefulSet pod with persistent storage, ensuring data durability across restarts. |
| 23 | + |
| 24 | +### Service Discovery |
| 25 | + |
| 26 | +`ice-rest-catalog` reaches the etcd cluster through a Kubernetes Service. |
| 27 | +The catalog talks to etcd using the [jetcd](https://github.com/etcd-io/jetcd) client library. |
| 28 | +Within the etcd cluster, data is replicated to every member. |
| 29 | +The Service distributes client connections across the etcd members in round-robin fashion. |
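As a rough illustration of the access path described above, the sketch below builds the in-cluster DNS name of an etcd client Service and checks member health through it. The Service name, the namespace, the `cluster.local` cluster domain, and the plain-HTTP scheme are assumptions made for illustration, not values taken from this repository's manifests; 2379 is etcd's default client port.

```shell
# Sketch only: Service name, namespace, cluster domain, and scheme are hypothetical.
etcd_service_endpoint() {
  # $1 = Service name, $2 = namespace
  printf 'http://%s.%s.svc.cluster.local:2379\n' "$1" "$2"
}

check_etcd_via_service() {
  # Connect through the Service, then report health for every cluster member.
  etcdctl --endpoints="$(etcd_service_endpoint "$1" "$2")" endpoint health --cluster
}
```

Run from a pod that has `etcdctl` installed, `check_etcd_via_service etcd default` would print one health line per etcd member if the assumed names match the deployment.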
| 30 | + |
| 31 | +### High Availability |
| 32 | + |
| 33 | +- Multiple `ice-rest-catalog` replicas behind a load balancer |
| 34 | +- etcd cluster with data replicated to every member |
| 35 | +- Persistent volumes for etcd data |
| 36 | +- S3 for durable object storage |
| 37 | + |
| 38 | +## Backup/Recovery |
| 39 | +All state information for the catalog is maintained in etcd. To back up the ICE REST Catalog state, you can use standard etcd snapshot tools. The official etcd documentation provides guidance on [snapshotting and recovery](https://etcd.io/docs/v3.5/op-guide/recovery/). |
| 40 | + |
| 41 | +**Backup etcd Example**: |
| 42 | +```shell |
| 43 | +etcdctl --endpoints=<etcd-endpoint> \ |
| 44 | + --cacert=<trusted-ca-file> \ |
| 45 | + --cert=<cert-file> \ |
| 46 | + --key=<key-file> \ |
| 47 | + snapshot save /path/to/backup.db |
| 48 | +``` |
| 49 | + |
| 50 | +Replace the arguments as appropriate for your deployment (for example, endpoints, authentication, and TLS options). |
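For recurring backups, it can help to give each snapshot a timestamped name and to inspect the file right after it is written. The following is a minimal sketch of that idea, not part of the ICE tooling: the function names and the file naming scheme are hypothetical, and TLS options are left out on the assumption that they are passed through the `ETCDCTL_CACERT`, `ETCDCTL_CERT`, and `ETCDCTL_KEY` environment variables. `etcdutl` ships with etcd v3.5 and later.

```shell
# Sketch only: function names and the snapshot naming scheme are hypothetical.
snapshot_path() {
  # $1 = backup directory, $2 = timestamp
  printf '%s/etcd-snapshot-%s.db\n' "$1" "$2"
}

backup_etcd() {
  # $1 = etcd endpoint, $2 = backup directory
  file="$(snapshot_path "$2" "$(date -u +%Y%m%dT%H%M%SZ)")"
  mkdir -p "$2" &&
    etcdctl --endpoints="$1" snapshot save "$file" &&
    # Print the snapshot's hash, revision, key count, and size.
    etcdutl snapshot status "$file" --write-out=table
}
```

A call such as `backup_etcd <etcd-endpoint> /path/to/backups` stops at the first failing step, so a failed save never reaches the status check.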
| 51 | + |
| 52 | +**Restore etcd Example**: |
| 53 | +```shell |
| 54 | +etcdctl snapshot restore /path/to/backup.db \ |
| 55 | + --data-dir /var/lib/etcd |
| 56 | +``` |
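For a multi-member cluster, the etcd recovery guide restores every member from the same snapshot file, giving each member its own name and peer URL. A minimal sketch follows, assuming a StatefulSet named `etcd` whose pods (`etcd-0`, `etcd-1`, and so on) are reachable under a headless Service domain. Those names, the plain-HTTP peer URLs, and the cluster token are hypothetical; 2380 is etcd's default peer port. The sketch uses `etcdutl`, which the etcd v3.5 documentation uses for restores.

```shell
# Sketch only: member names, peer domain, and cluster token are hypothetical.
initial_cluster() {
  # $1 = member count, $2 = peer domain (for example <service>.<namespace>.svc.cluster.local)
  i=0
  out=""
  while [ "$i" -lt "$1" ]; do
    out="${out}${out:+,}etcd-$i=http://etcd-$i.$2:2380"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

restore_member() {
  # $1 = member name, $2 = snapshot file, $3 = member count, $4 = peer domain
  etcdutl snapshot restore "$2" \
    --name "$1" \
    --initial-cluster "$(initial_cluster "$3" "$4")" \
    --initial-cluster-token etcd-cluster-restored \
    --initial-advertise-peer-urls "http://$1.$4:2380" \
    --data-dir /var/lib/etcd
}
```

Each member would run `restore_member` once with its own name against the same snapshot file before etcd is started on the restored data directory.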
| 57 | + |
| 58 | +After you restore etcd and point the catalog services at the restored cluster, all catalog state (namespaces, tables, schemas, snapshots) is available again; the stateless catalog needs no separate recovery step. |
| 59 | + |
| 60 | +**Note:** Data files themselves (table/parquet data) are stored in Object Storage (e.g., S3, MinIO), and should be backed up or protected in accordance with your object storage vendor's recommendations. |
| 61 | + |
| 62 | +## Kubernetes Manifest Files |
| 63 | + |
| 64 | +Kubernetes deployment manifests and configuration files are available in the [`examples/eks`](../examples/eks/) folder: |
| 65 | + |
| 66 | +- [`etcd.eks.yaml`](../examples/eks/etcd.eks.yaml) - etcd StatefulSet deployment |
| 67 | +- [`ice-rest-catalog.eks.envsubst.yaml`](../examples/eks/ice-rest-catalog.eks.envsubst.yaml) - ice-rest-catalog Deployment (requires envsubst) |
| 68 | +- [`eks.envsubst.yaml`](../examples/eks/eks.envsubst.yaml) - Combined EKS deployment template |
| 69 | + |
| 70 | +See the [EKS README](../examples/eks/README.md) for detailed setup instructions. |