Skip to content

Commit 6d542e5

Browse files
committed
docs(blog): translate temporal post to english
1 parent 110a2a6 commit 6d542e5

File tree

1 file changed

+117
-0
lines changed

1 file changed

+117
-0
lines changed

src/posts/company/2026-01-08-16.md

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
---
2+
layout: post
3+
title: "From Celery/Redis to Temporal: A Journey Toward Idempotency and Reliable Workflows"
4+
date: 2026-01-08
5+
category: Company Work
6+
tags: Temporal Celery Redis Idempotency Architecture Python Backend
7+
description: "Sharing our experience of migrating complex asynchronous KYC processing from Celery to Temporal to ensure idempotency and enhance system reliability."
8+
---
9+
10+
When handling asynchronous tasks in distributed systems, the combination of Celery and Redis is often the go-to choice. I also chose Celery for the initial design of our KYC (Know Your Customer) orchestrator due to its familiarity. However, as the service grew in complexity, we hit a massive wall: guaranteeing **idempotency** and managing **complex states**.
11+
12+
In this post, I want to share our technical journey of why we moved away from Celery to Temporal and how we ensured idempotency during that process.
13+
14+
---
15+
16+
## 1. Limitations of Celery/Redis: Why Change Was Necessary?
17+
18+
### Difficulties in Idempotency Management
19+
While Celery is excellent for "Fire and Forget" tasks, there's a high risk of duplicate execution during retries caused by network failures or worker downs. Especially for face recognition tasks that consume significant GPU resources, duplicate execution was critical in terms of both cost and performance.
20+
21+
### Fragmentation of State
22+
The KYC process follows this sequence:
23+
1. User uploads an ID card image.
24+
2. User uploads a selfie video.
25+
3. Compare face similarity once both files exist.
26+
27+
In a Celery environment, since we don't know when images and videos will be uploaded, we needed complex logic to query the DB every time or store intermediate states in Redis. The logic to check "Are all files collected?" was scattered across multiple places, making maintenance difficult.
28+
29+
---
30+
31+
## 2. Introducing Temporal: A Paradigm Shift in Orchestration
32+
33+
Temporal is not just a message queue; it's a **Stateful Workflows** engine.
34+
35+
### Workflow Logic Must Be "Deterministic"
36+
Since Temporal workflow code is based on the premise of "Replay," it must always produce the same sequence of workflow API calls for the same input and history. Therefore, you should not directly perform "external-world-dependent operations" like network I/O, file I/O, system time (e.g., `DateTime.now`), randomness, or threading within a workflow. These side effects should be pushed to activities, while the workflow focuses solely on orchestration.
37+
38+
Official Docs: https://docs.temporal.io/develop/python/core-application#workflow-logic-requirements
39+
40+
### Workflow-Centric Design
41+
The first thing that changed after introducing Temporal was the visibility of business logic. `FaceSimilarityWorkflow` now gracefully waits until files are ready.
42+
43+
```python
44+
# Core logic of FaceSimilarityWorkflow
45+
@workflow.run
46+
async def run(self, data: SimilarityData) -> SimilarityResult:
47+
# Wait up to 1 hour until both image and video are collected
48+
await workflow.wait_condition(
49+
lambda: any(f["type"] == "image" for f in self._files)
50+
and any(f["type"] == "video" for f in self._files),
51+
timeout=timedelta(hours=1),
52+
)
53+
54+
# Execute GPU activity once all files are ready
55+
result = await workflow.execute_activity(
56+
check_face_similarity_activity,
57+
activity_data,
58+
retry_policy=RetryPolicy(maximum_attempts=3)
59+
)
60+
return result
61+
```
62+
63+
This code uses `workflow.wait_condition` to suspend the workflow until the condition is met without blocking the event loop. In Celery, this would have required complex polling or webhook logic.
64+
65+
---
66+
67+
## 3. Idempotency Strategy: Building a Double Defense
68+
69+
Even with Temporal, idempotency at the activity level remains crucial. We established a double defense strategy as follows.
70+
71+
### Step 1: Temporal's Basic Guarantee
72+
Temporal records the progress of a workflow as event history. Therefore, even if a worker restarts, it resumes exactly from the last successful point.
73+
74+
### Step 2: Explicit Checks within Activities
75+
Since Temporal activities follow an "at-least-once" execution model, an activity might be retried if a worker crashes after successfully performing it but before notifying the server. Thus, official documentation strongly recommends making activities idempotent.
76+
77+
Official Docs: https://docs.temporal.io/develop/python/error-handling#make-activities-idempotent
78+
79+
In practice, we use the following two together:
80+
- For external system calls, pass an **idempotency key** combined from the workflow execution and activity identifiers.
81+
- Internally, use unique keys (or check for existing results) in the DB to prevent **duplicate storage/processing**.
82+
83+
```python
84+
@activity.defn
85+
async def check_face_similarity_activity(data: SimilarityData) -> SimilarityResult:
86+
info = activity.info()
87+
idempotency_key = f"{info.workflow_run_id}-{info.activity_id}"
88+
session_id = data["session_id"]
89+
90+
with get_db_context() as db:
91+
existing = (
92+
db.query(FaceSimilarity)
93+
.filter(FaceSimilarity.idempotency_key == idempotency_key)
94+
.first()
95+
)
96+
if existing:
97+
return SimilarityResult(success=True, message="Already processed.")
98+
99+
# Perform actual GPU-intensive work...
100+
```
101+
102+
---
103+
104+
## 4. Results: What Has Changed?
105+
106+
| Comparison Item | Celery/Redis Based | Temporal Based |
107+
| :--- | :--- | :--- |
108+
| **State Management** | Manual storage in DB/Redis | Automatically managed by engine |
109+
| **Retry Strategy** | Manual exponential backoff | Declarative Retry Policy |
110+
| **Visibility** | Must dig through logs | Check history in Temporal UI |
111+
| **Idempotency** | Very difficult to guarantee | Structurally achievable |
112+
113+
## Conclusion
114+
115+
The transition from Celery to Temporal was not just about changing tools; it was about changing **how we define business processes in code**. Especially in financial/authentication systems where idempotency is paramount, Temporal provided irreplaceable stability.
116+
117+
If you are losing sleep over complex asynchronous logic and idempotency issues, I strongly recommend migrating to Temporal.

0 commit comments

Comments
 (0)