Skip to content

Commit a7b5070

Browse files
authored
Merge pull request #10 from rtCamp/fix/coupon-code-race-condition
Optimize coupon code update logic for race conditions
2 parents ddf6b85 + 963f6ef commit a7b5070

File tree

3 files changed

+211
-0
lines changed

3 files changed

+211
-0
lines changed

frappe_optimizations/__init__.py

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1 +1,15 @@
11
__version__ = "0.0.1"
2+
3+
4+
def monkey_patch():
5+
from .monkey_patches.update_coupon_code_count import update_coupon_code_count_monkey_patch
6+
7+
update_coupon_code_count_monkey_patch()
8+
9+
10+
try:
11+
monkey_patch()
12+
except Exception:
13+
import frappe
14+
15+
frappe.log_error("Frappe Optimizations: Monkey Patch Failed", frappe.get_traceback())
Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
# Coupon Code Update Monkey Patch
2+
3+
## Problem Statement
4+
5+
We are facing deadlocks during the subscription purchase to payment process when updating coupon code usage counts. This monkey patch addresses race conditions that occur when multiple concurrent requests attempt to update the same coupon code's `used` count.
6+
7+
## Original Issue
8+
9+
The original implementation in ERPNext's `erpnext.accounts.doctype.pricing_rule.utils.update_coupon_code_count` uses Frappe's ORM runtime operations:
10+
11+
```python
12+
def update_coupon_code_count(coupon_name, transaction_type):
13+
coupon = frappe.get_doc("Coupon Code", coupon_name)
14+
if coupon:
15+
if transaction_type == "used":
16+
if not coupon.maximum_use:
17+
coupon.used = coupon.used + 1
18+
coupon.save(ignore_permissions=True)
19+
elif coupon.used < coupon.maximum_use:
20+
coupon.used = coupon.used + 1
21+
coupon.save(ignore_permissions=True)
22+
```
23+
24+
**Problems with this approach:**
25+
1. **Race Condition**: Multiple concurrent requests load the same count value, increment it, and save - resulting in lost updates
26+
2. **Deadlocks**: Under high load, concurrent transactions compete for locks on the same row, causing deadlocks
27+
3. **No Atomic Operations**: The read-modify-write cycle is not atomic at the database level
28+
29+
## Solution
30+
31+
This monkey patch replaces the original function with an optimized version that:
32+
33+
### 1. **Direct SQL Updates (Atomic Operations)**
34+
```python
35+
frappe.db.sql("""
36+
UPDATE `tabCoupon Code`
37+
SET used = used + 1
38+
WHERE name = %s
39+
AND (maximum_use IS NULL OR maximum_use = 0 OR used < maximum_use)
40+
""", (coupon_name,))
41+
```
42+
43+
**Benefits:**
44+
- Single atomic database operation
45+
- No race conditions between read and write
46+
- Database handles the increment directly
47+
48+
### 2. **Row Count Validation**
49+
```python
50+
affected_rows = frappe.db.sql("""SELECT ROW_COUNT();""")[0][0]
51+
if not affected_rows:
52+
frappe.throw("Coupon code is no longer available or has reached its usage limit")
53+
```
54+
55+
- Ensures the update actually happened (coupon is still valid)
56+
- Prevents silent failures
57+
58+
### 3. **Automatic Retry with Exponential Backoff**
59+
```python
60+
max_retries = 3
61+
for attempt in range(max_retries):
62+
try:
63+
# ... perform update ...
64+
break
65+
except frappe.QueryDeadlockError:
66+
if attempt == max_retries - 1:
67+
raise
68+
69+
# Exponential backoff with jitter
70+
base_delay = 0.05 # 50ms
71+
jitter = random.uniform(0.01, 0.5) # 10-500ms random
72+
exponential_backoff = base_delay * (2**attempt) # 50ms, 100ms, 200ms
73+
total_delay = exponential_backoff + jitter + random.random() * 0.1
74+
time.sleep(total_delay)
75+
```
76+
77+
**Why this works:**
78+
- **Exponential backoff**: Each retry waits longer (50ms → 100ms → 200ms)
79+
- **Random jitter**: Prevents multiple concurrent requests from retrying in lockstep
80+
- **Limited retries**: Fails fast after 3 attempts instead of indefinite loops
81+
82+
### 4. **Safe Decrement for Cancellations**
83+
```python
84+
frappe.db.sql("""
85+
UPDATE `tabCoupon Code`
86+
SET used = GREATEST(used - 1, 0), modified = %s
87+
WHERE name = %s AND used > 0
88+
""", (frappe.utils.now(), coupon_name))
89+
```
90+
91+
- Uses `GREATEST()` to prevent negative counts
92+
- Only updates if `used > 0` to avoid unnecessary writes
93+
94+
## Implementation
95+
96+
The monkey patch is applied in the [`frappe_optimizations`](../hooks.py) app hooks:
97+
98+
```python
99+
def update_coupon_code_count_monkey_patch():
100+
from erpnext.accounts.doctype.pricing_rule import utils
101+
utils.update_coupon_code_count = update_coupon_code_count
102+
```
103+
104+
This replaces the function at runtime before any coupon code operations occur.
105+
106+
## Impact
107+
108+
- **Reduced deadlocks**: Direct SQL operations and retries significantly reduce deadlock occurrences
109+
- **Better performance**: Atomic operations are faster than ORM save operations
110+
- **Data integrity**: Row count validation ensures coupon limits are respected
111+
- **High concurrency support**: Handles multiple simultaneous purchase attempts gracefully
112+
113+
## Related Questions
114+
115+
### Does MariaDB revert the whole transaction on a deadlock error?
116+
117+
**Yes.** When MariaDB detects a deadlock:
118+
1. It chooses one transaction as the "victim"
119+
2. Rolls back that entire transaction
120+
3. Returns a deadlock error to the application
121+
4. The other transaction(s) can proceed
122+
123+
This is why our retry mechanism works - we can safely retry the entire operation after a deadlock because no partial state remains.
124+
125+
## Testing
126+
127+
This patch has been tested under load and has shown:
128+
- Significant reduction in deadlock errors
129+
- No lost coupon code updates
130+
- Proper enforcement of usage limits even under concurrent load
131+
132+
## Future Improvements
133+
134+
Consider exploring:
135+
- Database-level advisory locks for more complex multi-row operations
136+
- Optimistic locking patterns for the subscription document itself
137+
- Connection pool tuning to reduce lock contention
Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,60 @@
1+
import frappe
2+
3+
4+
def update_coupon_code_count(coupon_name, transaction_type):
5+
max_retries = 3
6+
for attempt in range(max_retries):
7+
try:
8+
if transaction_type == "used":
9+
frappe.db.sql(
10+
"""
11+
UPDATE `tabCoupon Code`
12+
SET used = used + 1
13+
WHERE name = %s
14+
AND (maximum_use IS NULL OR maximum_use = 0 OR used < maximum_use)
15+
""",
16+
(coupon_name,),
17+
)
18+
affected_rows = frappe.db.sql("""SELECT ROW_COUNT();""")[0][0]
19+
20+
if not affected_rows:
21+
frappe.throw(
22+
frappe._("Coupon code is no longer available or has reached its usage limit")
23+
)
24+
25+
elif transaction_type == "cancelled":
26+
frappe.db.sql(
27+
"""
28+
UPDATE `tabCoupon Code`
29+
SET used = GREATEST(used - 1, 0), modified = %s
30+
WHERE name = %s AND used > 0
31+
""",
32+
(frappe.utils.now(), coupon_name),
33+
)
34+
35+
# Success - break out of retry loop
36+
break
37+
38+
except frappe.QueryDeadlockError:
39+
if attempt == max_retries - 1:
40+
# Last attempt failed, re-raise the error
41+
raise
42+
43+
import random
44+
import time
45+
46+
# Random backoff to avoid concurrent retries colliding
47+
base_delay = 0.05 # 50ms base
48+
jitter = random.uniform(0.01, 0.5) # 10ms to 500ms random jitter
49+
exponential_backoff = base_delay * (2**attempt) # Exponential: 50ms, 100ms, 200ms
50+
total_delay = exponential_backoff + jitter + random.random() * 0.1
51+
52+
time.sleep(total_delay)
53+
continue
54+
55+
56+
def update_coupon_code_count_monkey_patch():
57+
# nosemgrep
58+
from erpnext.accounts.doctype.pricing_rule import utils
59+
60+
utils.update_coupon_code_count = update_coupon_code_count # nosemgrep

0 commit comments

Comments
 (0)