
Commit f538843

Add an admin client for dataset management (#17)
* *: Add dependencies and model generation for admin client
  - Add pydantic>=2.0,<2.12 for data models (constrained for PyIceberg)
  - Add datamodel-code-generator for model generation from OpenAPI
  - Add respx for HTTP mocking in tests
  - Add Makefile target for regenerating models
  - Generate 838 lines of Pydantic v2 models from OpenAPI spec
  - Add test manifest files for dataset registration testing

* admin: Implement admin client infrastructure
  - Create AdminClient base class with HTTP request handling and error mapping
  - Implement DatasetsClient with register/deploy/list/delete operations
  - Implement JobsClient with get/list/wait/stop/delete operations
  - Implement SchemaClient for SQL validation and schema inference
  - Create DeploymentContext for chainable deployment workflows
  - Add exception hierarchy with 30+ typed error classes mapped from API codes
  - Support automatic job polling with configurable timeout

* client: Integrate admin client with unified Client class
  - Add query_url and admin_url parameters to Client (backward compatible with url)
  - Add datasets, jobs, schema properties for admin operations
  - Extend QueryBuilder with with_dependency() for manifest dependencies
  - Add to_manifest() for generating dataset manifests from SQL queries
  - Add register_as() for one-line registration returning DeploymentContext
  - Support fluent API: query → with_dependency → register_as → deploy
  - Maintain backward compatibility (existing Client(url=...) still works)

* tests: Add admin client tests
  - Add 10 unit tests for error mapping and exception hierarchy
  - Add 10 unit tests for Pydantic model validation
  - Add 10 integration tests for AdminClient HTTP operations
  - Add 10 integration tests for DatasetsClient operations
  - Add 18 integration tests for JobsClient operations including polling
  - All 48 tests use respx for HTTP mocking (no real server required)
  - 0.65s execution time on dev machine

* docs: Update README with admin client features
  - Add admin client to feature list
  - Add quick start examples for admin operations
  - Add links to admin client guide and API reference
  - Update overview to highlight dataset management capabilities

* docs: Add admin client documentation and examples
  - Add comprehensive admin_client_guide.md with usage patterns and best practices
  - Add complete API reference in docs/api/admin_api.md
1 parent 7c66375 commit f538843

34 files changed: 10,819 additions & 10 deletions
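The commit message notes that all of the new admin-client tests use respx to mock HTTP traffic, so no running Amp server is required. The sketch below illustrates that testing pattern under stated assumptions: the `/datasets` path, the response payload shape, the `client.datasets.list()` method name, and lazy connection on `Client` construction are illustrative guesses, not details taken from this diff.

```python
import httpx
import respx

from amp import Client


@respx.mock
def test_list_datasets_without_a_server():
    # Hypothetical admin endpoint and payload; the real route and response
    # schema come from the OpenAPI spec and the generated Pydantic models.
    route = respx.get("http://localhost:8080/datasets").mock(
        return_value=httpx.Response(200, json={"datasets": [{"name": "my_dataset"}]})
    )

    # Assumes Client construction does not eagerly open the Flight connection.
    client = Client(query_url="grpc://localhost:8815", admin_url="http://localhost:8080")
    datasets = client.datasets.list()  # assumed method name and return type

    assert route.called
    assert datasets
```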

Makefile

Lines changed: 11 additions & 1 deletion
@@ -1,5 +1,5 @@
 SHELL := /bin/bash
-.PHONY: test test-unit test-integration test-all clean setup lint format
+.PHONY: test test-unit test-integration test-all clean setup lint format generate-models
 
 # Use UV for all commands
 PYTHON = uv run --env-file .test.env
@@ -59,10 +59,19 @@ lint:
 	@echo "🔍 Linting code..."
 	$(PYTHON) ruff check .
 
+lint-fix:
+	@echo "🔍 Linting code..."
+	$(PYTHON) ruff check . --fix
+
 format:
 	@echo "✨ Formatting code..."
 	$(PYTHON) ruff format .
 
+# Generate Pydantic models from OpenAPI spec
+generate-models:
+	@echo "🏗️ Generating Pydantic models from OpenAPI spec..."
+	$(PYTHON) python scripts/generate_models.py
+
 # Setup development environment
 setup:
 	@echo "🚀 Setting up development environment..."
@@ -115,6 +124,7 @@ clean:
 help:
 	@echo "Available commands:"
 	@echo " make setup - Setup development environment"
+	@echo " make generate-models - Generate Pydantic models from OpenAPI spec"
 	@echo " make test-unit - Run unit tests (fast)"
 	@echo " make test-integration - Run integration tests"
 	@echo " make test-parallel-streaming - Run parallel streaming integration tests"

README.md

Lines changed: 66 additions & 5 deletions
@@ -5,9 +5,16 @@
 [![Formatting status](https://github.com/edgeandnode/amp-python/actions/workflows/ruff.yml/badge.svg?event=push)](https://github.com/edgeandnode/amp-python/actions/workflows/ruff.yml)
 
 
-## Overview 
+## Overview
 
-Client for issuing queries to an Amp server and working with the returned data.
+Python client for Amp - a high-performance data infrastructure for blockchain data.
+
+**Features:**
+- **Query Client**: Issue Flight SQL queries to Amp servers
+- **Admin Client**: Manage datasets, deployments, and jobs programmatically
+- **Data Loaders**: Zero-copy loading into PostgreSQL, Redis, Snowflake, Delta Lake, Iceberg, and more
+- **Parallel Streaming**: High-throughput parallel data ingestion with automatic resume
+- **Manifest Generation**: Fluent API for creating and deploying datasets from SQL queries
 
 ## Installation
 
@@ -21,7 +28,57 @@ Client for issuing queries to an Amp server and working with the returned data.
 uv venv
 ```
 
-## Useage
+## Quick Start
+
+### Querying Data
+
+```python
+from amp import Client
+
+# Connect to Amp server
+client = Client(url="grpc://localhost:8815")
+
+# Execute query and convert to pandas
+df = client.query("SELECT * FROM eth.blocks LIMIT 10").to_pandas()
+print(df)
+```
+
+### Admin Operations
+
+```python
+from amp import Client
+
+# Connect with admin capabilities
+client = Client(
+    query_url="grpc://localhost:8815",
+    admin_url="http://localhost:8080",
+    auth_token="your-token"
+)
+
+# Register and deploy a dataset
+job = (
+    client.query("SELECT block_num, hash FROM eth.blocks")
+    .with_dependency('eth', '_/[email protected]')
+    .register_as('_', 'my_dataset', '1.0.0', 'blocks', 'mainnet')
+    .deploy(parallelism=4, end_block='latest', wait=True)
+)
+
+print(f"Deployment completed: {job.status}")
+```
+
+### Loading Data
+
+```python
+# Load query results into PostgreSQL
+loader = client.query("SELECT * FROM eth.blocks").load(
+    loader_type='postgresql',
+    connection='my_pg_connection',
+    table_name='eth_blocks'
+)
+print(f"Loaded {loader.rows_written} rows")
+```
+
+## Usage
 
 ### Marimo
 
@@ -30,19 +87,23 @@ Start up a marimo workspace editor
 uv run marimo edit
 ```
 
-The Marimo app will open a new browser tab where you can create a new notebook, view helpful resources, and 
+The Marimo app will open a new browser tab where you can create a new notebook, view helpful resources, and
 browse existing notebooks in the workspace.
 
 ### Apps
 
-You can execute python apps and scripts using `uv run <path>` which will give them access to the dependencies 
+You can execute python apps and scripts using `uv run <path>` which will give them access to the dependencies
 and the `amp` package. For example, you can run the `execute_query` app with the following command.
 ```bash
 uv run apps/execute_query.py
 ```
 
 ## Documentation
 
+### Getting Started
+- **[Admin Client Guide](docs/admin_client_guide.md)** - Complete guide for dataset management and deployment
+- **[Admin API Reference](docs/api/admin_api.md)** - Full API documentation for admin operations
+
 ### Features
 - **[Parallel Streaming Usage Guide](docs/parallel_streaming_usage.md)** - User guide for high-throughput parallel data loading
 - **[Parallel Streaming Design](docs/parallel_streaming.md)** - Technical design documentation for parallel streaming architecture
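The README's Quick Start deploys synchronously with `wait=True`. The commit also adds `to_manifest()` on the query builder and wait/stop operations on the jobs client, which suggests a non-blocking path as well. The sketch below is only an illustration of that flow: the `to_manifest()` return shape, the `wait=False` behavior, the `job.id` attribute, and the `client.jobs.wait()` signature (including timeout units) are assumptions not shown in this diff.

```python
from amp import Client

client = Client(
    query_url="grpc://localhost:8815",
    admin_url="http://localhost:8080",
    auth_token="your-token",
)

builder = (
    client.query("SELECT block_num, hash FROM eth.blocks")
    .with_dependency('eth', '_/[email protected]')  # same dependency string as the Quick Start example
)

# Inspect the generated manifest before registering (return shape is illustrative).
manifest = builder.to_manifest()
print(manifest)

# Register, then deploy without blocking and poll the job explicitly.
deployment = builder.register_as('_', 'my_dataset', '1.0.0', 'blocks', 'mainnet')
job = deployment.deploy(parallelism=4, end_block='latest', wait=False)

# Assumed polling call; the commit describes automatic job polling with a
# configurable timeout, but the exact method signature is not shown here.
finished = client.jobs.wait(job.id, timeout=600)
print(f"Deployment completed: {finished.status}")
```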
