|
| 1 | +# Coupon Code Update Monkey Patch |
| 2 | + |
| 3 | +## Problem Statement |
| 4 | + |
| 5 | +We are facing deadlocks during the subscription purchase to payment process when updating coupon code usage counts. This monkey patch addresses race conditions that occur when multiple concurrent requests attempt to update the same coupon code's `used` count. |
| 6 | + |
| 7 | +## Original Issue |
| 8 | + |
| 9 | +The original implementation in ERPNext's `erpnext.accounts.doctype.pricing_rule.utils.update_coupon_code_count` uses Frappe's ORM runtime operations: |
| 10 | + |
| 11 | +```python |
| 12 | +def update_coupon_code_count(coupon_name, transaction_type): |
| 13 | + coupon = frappe.get_doc("Coupon Code", coupon_name) |
| 14 | + if coupon: |
| 15 | + if transaction_type == "used": |
| 16 | + if not coupon.maximum_use: |
| 17 | + coupon.used = coupon.used + 1 |
| 18 | + coupon.save(ignore_permissions=True) |
| 19 | + elif coupon.used < coupon.maximum_use: |
| 20 | + coupon.used = coupon.used + 1 |
| 21 | + coupon.save(ignore_permissions=True) |
| 22 | +``` |
| 23 | + |
| 24 | +**Problems with this approach:** |
| 25 | +1. **Race Condition**: Multiple concurrent requests load the same count value, increment it, and save - resulting in lost updates |
| 26 | +2. **Deadlocks**: Under high load, concurrent transactions compete for locks on the same row, causing deadlocks |
| 27 | +3. **No Atomic Operations**: The read-modify-write cycle is not atomic at the database level |
| 28 | + |
| 29 | +## Solution |
| 30 | + |
| 31 | +This monkey patch replaces the original function with an optimized version that: |
| 32 | + |
| 33 | +### 1. **Direct SQL Updates (Atomic Operations)** |
| 34 | +```python |
| 35 | +frappe.db.sql(""" |
| 36 | + UPDATE `tabCoupon Code` |
| 37 | + SET used = used + 1 |
| 38 | + WHERE name = %s |
| 39 | + AND (maximum_use IS NULL OR maximum_use = 0 OR used < maximum_use) |
| 40 | +""", (coupon_name,)) |
| 41 | +``` |
| 42 | + |
| 43 | +**Benefits:** |
| 44 | +- Single atomic database operation |
| 45 | +- No race conditions between read and write |
| 46 | +- Database handles the increment directly |
| 47 | + |
| 48 | +### 2. **Row Count Validation** |
| 49 | +```python |
| 50 | +affected_rows = frappe.db.sql("""SELECT ROW_COUNT();""")[0][0] |
| 51 | +if not affected_rows: |
| 52 | + frappe.throw("Coupon code is no longer available or has reached its usage limit") |
| 53 | +``` |
| 54 | + |
| 55 | +- Ensures the update actually happened (coupon is still valid) |
| 56 | +- Prevents silent failures |
| 57 | + |
| 58 | +### 3. **Automatic Retry with Exponential Backoff** |
| 59 | +```python |
| 60 | +max_retries = 3 |
| 61 | +for attempt in range(max_retries): |
| 62 | + try: |
| 63 | + # ... perform update ... |
| 64 | + break |
| 65 | + except frappe.QueryDeadlockError: |
| 66 | + if attempt == max_retries - 1: |
| 67 | + raise |
| 68 | + |
| 69 | + # Exponential backoff with jitter |
| 70 | + base_delay = 0.05 # 50ms |
| 71 | + jitter = random.uniform(0.01, 0.5) # 10-500ms random |
| 72 | + exponential_backoff = base_delay * (2**attempt) # 50ms, 100ms, 200ms |
| 73 | + total_delay = exponential_backoff + jitter + random.random() * 0.1 |
| 74 | + time.sleep(total_delay) |
| 75 | +``` |
| 76 | + |
| 77 | +**Why this works:** |
| 78 | +- **Exponential backoff**: Each retry waits longer (50ms → 100ms → 200ms) |
| 79 | +- **Random jitter**: Prevents multiple concurrent requests from retrying in lockstep |
| 80 | +- **Limited retries**: Fails fast after 3 attempts instead of indefinite loops |
| 81 | + |
| 82 | +### 4. **Safe Decrement for Cancellations** |
| 83 | +```python |
| 84 | +frappe.db.sql(""" |
| 85 | + UPDATE `tabCoupon Code` |
| 86 | + SET used = GREATEST(used - 1, 0), modified = %s |
| 87 | + WHERE name = %s AND used > 0 |
| 88 | +""", (frappe.utils.now(), coupon_name)) |
| 89 | +``` |
| 90 | + |
| 91 | +- Uses `GREATEST()` to prevent negative counts |
| 92 | +- Only updates if `used > 0` to avoid unnecessary writes |
| 93 | + |
| 94 | +## Implementation |
| 95 | + |
| 96 | +The monkey patch is applied in the [`frappe_optimizations`](../hooks.py) app hooks: |
| 97 | + |
| 98 | +```python |
| 99 | +def update_coupon_code_count_monkey_patch(): |
| 100 | + from erpnext.accounts.doctype.pricing_rule import utils |
| 101 | + utils.update_coupon_code_count = update_coupon_code_count |
| 102 | +``` |
| 103 | + |
| 104 | +This replaces the function at runtime before any coupon code operations occur. |
| 105 | + |
| 106 | +## Impact |
| 107 | + |
| 108 | +- **Reduced deadlocks**: Direct SQL operations and retries significantly reduce deadlock occurrences |
| 109 | +- **Better performance**: Atomic operations are faster than ORM save operations |
| 110 | +- **Data integrity**: Row count validation ensures coupon limits are respected |
| 111 | +- **High concurrency support**: Handles multiple simultaneous purchase attempts gracefully |
| 112 | + |
| 113 | +## Related Questions |
| 114 | + |
| 115 | +### Does MariaDB revert the whole transaction on a deadlock error? |
| 116 | + |
| 117 | +**Yes.** When MariaDB detects a deadlock: |
| 118 | +1. It chooses one transaction as the "victim" |
| 119 | +2. Rolls back that entire transaction |
| 120 | +3. Returns a deadlock error to the application |
| 121 | +4. The other transaction(s) can proceed |
| 122 | + |
| 123 | +This is why our retry mechanism works - we can safely retry the entire operation after a deadlock because no partial state remains. |
| 124 | + |
| 125 | +## Testing |
| 126 | + |
| 127 | +This patch has been tested under load and has shown: |
| 128 | +- Significant reduction in deadlock errors |
| 129 | +- No lost coupon code updates |
| 130 | +- Proper enforcement of usage limits even under concurrent load |
| 131 | + |
| 132 | +## Future Improvements |
| 133 | + |
| 134 | +Consider exploring: |
| 135 | +- Database-level advisory locks for more complex multi-row operations |
| 136 | +- Optimistic locking patterns for the subscription document itself |
| 137 | +- Connection pool tuning to reduce lock contention |
0 commit comments