Commit 4ada458

jmhsieh and claude authored
Add Geneva job lifecycle documentation (#113)
* Add Geneva job lifecycle documentation

Document job states, monitoring via JobStateManager and fault tolerance with checkpoint-based recovery.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent ddbcd96 commit 4ada458

2 files changed: +120 -0 lines changed

docs/docs.json

Lines changed: 1 addition & 0 deletions
@@ -163,6 +163,7 @@
 {
 "group": "Job execution",
 "pages": [
+"geneva/jobs/lifecycle",
 "geneva/jobs/backfilling",
 "geneva/jobs/conflicts",
 "geneva/jobs/startup",

docs/geneva/jobs/lifecycle.mdx

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
---
title: Job Lifecycle
sidebarTitle: Job Lifecycle
description: Understanding how Geneva jobs work, their lifecycle states, and how to monitor and manage them.
icon: arrows-spin
---

Geneva uses background jobs to execute long-running operations like backfills and materialized view refreshes. This guide explains how jobs work, their lifecycle states, and how to monitor and manage them.

## Overview

Jobs in Geneva are asynchronous operations that process data in the background. There are two primary job types:

| Job Type | Purpose | Created By |
|----------|---------|------------|
| **Backfill** | Compute column values using UDFs | `table.backfill()` |
| **Materialized View Refresh** | Update precomputed query results | `view.refresh()` |

Both job types share the same lifecycle states and monitoring capabilities.

## Job States

Every job progresses through a well-defined state machine:
```mermaid
stateDiagram-v2
    [*] --> PENDING
    PENDING --> RUNNING
    RUNNING --> DONE
    RUNNING --> FAILED
    DONE --> [*]
    FAILED --> [*]
```

| State | Description |
|-------|-------------|
| **PENDING** | Job has been created and is queued for execution |
| **RUNNING** | Job is actively processing data |
| **DONE** | Job completed successfully |
| **FAILED** | Job encountered an error during execution |
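
The transitions in the diagram can be written down as a small lookup table. The following is a minimal sketch for illustration only: `JobState`, `TRANSITIONS`, `can_transition`, and `is_terminal` are hypothetical names, not part of the Geneva API, and Geneva's internal representation may differ.

```python
from enum import Enum


class JobState(Enum):
    """The four lifecycle states listed in the table above."""

    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"


# Allowed transitions, copied from the state diagram.
# DONE and FAILED have no outgoing transitions, so they are terminal.
TRANSITIONS = {
    JobState.PENDING: {JobState.RUNNING},
    JobState.RUNNING: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


def can_transition(current: JobState, target: JobState) -> bool:
    """Return True if the diagram allows moving from current to target."""
    return target in TRANSITIONS[current]


def is_terminal(state: JobState) -> bool:
    """Return True if no further transitions are possible from state."""
    return not TRANSITIONS[state]
```

A table like this makes it easy to see that a job never moves from `PENDING` straight to `DONE`, and that a `FAILED` job does not return to `RUNNING`.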

## Monitoring Jobs

The [Geneva Console](/geneva/jobs/console) provides a web-based interface for monitoring job status, progress, and history across your database. This is the recommended way to track jobs in collaborative environments.

For programmatic access, you can query job status directly via the API.

### Querying Job Status
```python
import geneva

db = geneva.connect("/path/to/db")

# Get the job state manager (exposed here through the private `_history` attribute)
jsm = db._history

# Get a specific job; the result is indexed, so take the first entry.
# Replace the placeholder with the ID of a job you launched.
job_id = "<your-job-id>"
job = jsm.get(job_id)[0]
print(f"Status: {job.status}")
print(f"Started: {job.launched_at}")
print(f"Completed: {job.completed_at}")

# List jobs for a table, filtered by state
pending_jobs = jsm.list_jobs(table_name="my_table", status="PENDING")
running_jobs = jsm.list_jobs(table_name="my_table", status="RUNNING")
```
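
To block until a job finishes, one option is to poll its status until it reaches `DONE` or `FAILED`. The helper below is a sketch, not a Geneva API: `wait_for_job`, `poll_interval`, and `timeout` are hypothetical names, and it assumes the status compares equal to the plain strings used in the state table. It takes a callable so that it does not depend on how the status is fetched.

```python
import time
from typing import Callable

# Terminal states from the lifecycle table; assumed to be plain strings here.
TERMINAL_STATES = {"DONE", "FAILED"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a terminal state, then return it.

    Raises TimeoutError if the job is still not finished after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        time.sleep(poll_interval)
```

With the objects from the example above, a call might look like `wait_for_job(lambda: jsm.get(job_id)[0].status)`. If `job.status` turns out to be an enum rather than a string, convert it inside the lambda first.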

### Progress Metrics

Jobs report progress through metrics:

```python
# Access job metrics
for metric in job.metrics:
    print(f"{metric['name']}: {metric['count']}/{metric['total']}")
```

Common metrics include:

| Metric | Description |
|--------|-------------|
| `fragments` | [Fragments](https://lance.org/format/table/?h=fragment#fragments) scheduled for processing |
| `writer_fragments` | Fragments written to storage |
| `udf_values_computed` | Rows processed by UDFs |
| `rows_checkpointed` | Rows saved to checkpoint store |
| `rows_committed` | Rows committed to the table |
| `workers` | Workers started for parallel execution |
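
Since each metric carries a `count` and a `total`, a completion percentage can be derived from them. The sketch below assumes metrics are dictionaries with the `name`, `count`, and `total` keys used in the loop above. `progress_fraction` and `format_progress` are hypothetical helpers, and the sample values are made up for illustration.

```python
from typing import Dict, List, Optional


def progress_fraction(metric: Dict) -> Optional[float]:
    """Return count/total capped at 1.0, or None when total is missing or zero."""
    total = metric.get("total")
    if not total:
        return None
    return min(metric.get("count", 0) / total, 1.0)


def format_progress(metrics: List[Dict]) -> List[str]:
    """Render one 'name: count/total (percent)' line per metric."""
    lines = []
    for metric in metrics:
        fraction = progress_fraction(metric)
        pct = "n/a" if fraction is None else f"{fraction:.0%}"
        lines.append(
            f"{metric['name']}: {metric.get('count', 0)}/{metric.get('total')} ({pct})"
        )
    return lines


# Made-up sample values, shaped like the entries of job.metrics.
sample = [
    {"name": "fragments", "count": 16, "total": 64},
    {"name": "workers", "count": 4, "total": 0},
]
for line in format_progress(sample):
    print(line)
```

Guarding against a zero or missing `total` avoids a division error for metrics that have no known upper bound.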

### Job Events

Jobs log significant events during execution:

```python
for event in job.events:
    print(f"{event['timestamp']}: {event['message']}")
```

Example events:
- "Job started"
- "Checkpointing complete for fragment 42"
- "Partial commit: 64 fragments"
- "Job completed successfully"

## Fault Tolerance

Geneva jobs are designed to be resilient to failures.

### Checkpoint-Based Recovery

Jobs save intermediate results to a checkpoint store. If a job fails:

1. **Completed work is preserved** - Checkpointed batches are not lost
2. **Resume from checkpoint** - Restarted jobs skip already-processed data
3. **No duplicate processing** - Each batch is processed exactly once
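
The recovery behavior above can be illustrated with a toy model. This is a sketch of the general checkpoint-and-skip pattern, not Geneva's implementation: the in-memory dictionary stands in for the checkpoint store, the unit of work is a fragment ID, and `run_with_checkpoints` and `compute` are hypothetical names.

```python
from typing import Callable, Dict, Iterable


def run_with_checkpoints(
    fragment_ids: Iterable[int],
    compute: Callable[[int], str],
    checkpoints: Dict[int, str],
) -> int:
    """Process fragments, skipping any already in the checkpoint store.

    Returns the number of fragments computed during this run.
    """
    computed = 0
    for fragment_id in fragment_ids:
        if fragment_id in checkpoints:
            continue  # result already saved by an earlier run
        checkpoints[fragment_id] = compute(fragment_id)
        computed += 1
    return computed


# Simulate a job that fails once on fragment 2, then is re-run.
calls = []
fail_once = {2}


def compute(fragment_id: int) -> str:
    calls.append(fragment_id)
    if fragment_id in fail_once:
        fail_once.discard(fragment_id)
        raise RuntimeError("simulated worker failure")
    return f"result-{fragment_id}"


store: Dict[int, str] = {}
try:
    run_with_checkpoints(range(4), compute, store)
except RuntimeError:
    pass  # fragments 0 and 1 were checkpointed before the failure

recomputed = run_with_checkpoints(range(4), compute, store)
print(f"second run computed {recomputed} fragments")
```

In this model the second run touches only fragments 2 and 3. Fragment 2 is attempted again because its first attempt never reached the checkpoint store, while the checkpointed fragments 0 and 1 are not recomputed.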

### Resuming Failed Jobs

To resume a failed job, re-run the same `table.backfill()` or `view.refresh()` call. The new job automatically detects existing checkpoints, skips already-processed fragments, and continues from where the failed job left off.
