
Commit d28a9d9

raonitimo and claude committed
Add PostgreSQL AWS backend with failover support
This commit adds a new PostgreSQL database backend that integrates the AWS Advanced Python Wrapper to provide automatic failover for Amazon RDS clusters while keeping the existing Prometheus metrics collection.

Key features:
- Automatic failover handling for AWS RDS clusters
- Prometheus metrics for AWS-specific failover events
- Query retry after a successful failover
- Configurable AWS wrapper plugins and timeouts
- Integration with existing django-prometheus functionality

The backend adds three new metrics:
- django_db_aws_failover_success_total
- django_db_aws_failover_failed_total
- django_db_aws_transaction_resolution_unknown_total

Usage:

```python
DATABASES = {
    'default': {
        'ENGINE': 'django_prometheus.db.backends.postgresql_aws',
        'OPTIONS': {
            'aws_plugins': 'failover,host_monitoring',
            'connect_timeout': 30,
            'socket_timeout': 30,
        },
    }
}
```

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
1 parent 8795c8b commit d28a9d9


6 files changed (+574 -0 lines)


django_prometheus/db/__init__.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -7,6 +7,9 @@
     execute_many_total,
     execute_total,
     query_duration_seconds,
+    aws_failover_success_total,
+    aws_failover_failed_total,
+    aws_transaction_resolution_unknown_total,
 )
 
 __all__ = [
@@ -17,4 +20,7 @@
     "execute_many_total",
     "execute_total",
     "query_duration_seconds",
+    "aws_failover_success_total",
+    "aws_failover_failed_total",
+    "aws_transaction_resolution_unknown_total",
 ]
```

django_prometheus/db/backends/README.md

Lines changed: 106 additions & 0 deletions
```diff
@@ -1,3 +1,109 @@
+# Database Backends
+
+This directory contains Django database backends with Prometheus metrics integration.
+
+## Available Backends
+
+### Standard Backends
+
+- **postgresql/** - PostgreSQL backend with Prometheus metrics
+- **mysql/** - MySQL backend with Prometheus metrics
+- **sqlite3/** - SQLite3 backend with Prometheus metrics
+- **postgis/** - PostGIS (PostgreSQL + GIS) backend with Prometheus metrics
+- **spatialite/** - SpatiaLite (SQLite + GIS) backend with Prometheus metrics
+
+### Enhanced Backends
+
+- **postgresql_aws/** - PostgreSQL backend with AWS Advanced Python Wrapper integration
+
+## PostgreSQL AWS Backend
+
+The `postgresql_aws` backend extends the standard PostgreSQL backend with the AWS Advanced Python Wrapper, adding automatic failover for Amazon RDS clusters while still collecting the standard Prometheus database metrics.
+
+### Features
+
+- **Automatic Failover**: Handles RDS cluster failovers through the AWS Advanced Python Wrapper's `failover` plugin
+- **Prometheus Metrics**: Collects all standard database metrics plus AWS-specific failover counters
+- **Connection Monitoring**: Enables the wrapper's `host_monitoring` plugin by default and checks that a connection is usable before reusing it
+- **Query Retry**: Retries a query once after a successful failover
+- **Error Handling**: Counts, logs, and re-raises failed failovers and transactions with an unknown outcome
+
+### AWS-Specific Metrics
+
+The backend adds these Prometheus counters, each labeled by `alias` and `vendor`:
+
+- `django_db_aws_failover_success_total` - Counter of successful database failovers
+- `django_db_aws_failover_failed_total` - Counter of failed database failovers
+- `django_db_aws_transaction_resolution_unknown_total` - Counter of transactions whose outcome (committed or not) is unknown after a failover
+
+### Usage
+
+```python
+DATABASES = {
+    'default': {
+        'ENGINE': 'django_prometheus.db.backends.postgresql_aws',
+        'HOST': 'database.cluster-xyz.us-east-1.rds.amazonaws.com',
+        'NAME': 'mydb',
+        'USER': 'myuser',
+        'PASSWORD': 'mypassword',
+        'PORT': '5432',
+        'OPTIONS': {
+            'aws_plugins': 'failover,host_monitoring',  # AWS wrapper plugins
+            'connect_timeout': 30,  # Connection timeout in seconds
+            'socket_timeout': 30,  # Socket timeout in seconds
+            # Extra psycopg connection keyword arguments go in a nested 'options' dict
+        },
+    }
+}
+```
+
+### Prerequisites
+
+1. Install the AWS Advanced Python Wrapper:
+   ```bash
+   pip install aws-advanced-python-wrapper
+   ```
+
+2. Configure your RDS cluster for failover (writer and reader instances) and set `HOST` to the cluster endpoint
+
+3. If you use IAM database authentication, ensure the IAM permissions for cluster access are in place
+
+### Configuration Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `aws_plugins` | `'failover,host_monitoring'` | Comma-separated list of AWS wrapper plugins |
+| `connect_timeout` | `30` | Connection timeout in seconds |
+| `socket_timeout` | `30` | Socket timeout in seconds |
+
+### Monitoring
+
+The backend logs failover events and exports them as metrics. Monitor these key indicators:
+
+- Connection success/failure rates
+- Failover frequency and success rates
+- Query execution times during normal operation vs. failover
+- Transaction resolution status
+
+### Best Practices
+
+1. **Connection Reuse**: Consider persistent connections (`CONN_MAX_AGE`) so a new wrapper connection is not opened for every request
+2. **Health Checks**: Monitor the failover metrics to detect cluster issues
+3. **Timeout Configuration**: Tune timeout values to your application's latency requirements
+4. **Testing**: Test failover scenarios in a staging environment
+5. **Monitoring**: Set up alerts for failover events and failures
+
+### Troubleshooting
+
+- **ImproperlyConfigured at startup**: Ensure `aws-advanced-python-wrapper` is installed; the backend raises this error when the wrapper cannot be imported
+- **Connection Issues**: Verify the RDS cluster configuration and, if you use IAM authentication, the IAM permissions
+- **Slow Queries**: Compare query duration metrics during failover events with normal operation
+- **Transaction Issues**: A rising `django_db_aws_transaction_resolution_unknown_total` means some commits may or may not have been applied; the application must verify them before retrying
+
+For more information, see the [AWS Advanced Python Wrapper documentation](https://github.com/aws/aws-advanced-python-wrapper).
+
+---
+
 # Adding new database wrapper types
 
 Unfortunately, I don't have the resources to create wrappers for all
```
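The defaults in the Configuration Options table are applied once, when the backend's `DatabaseWrapper` is constructed. A minimal, framework-free sketch of that lookup follows; `resolve_aws_options` and the two `DEFAULT_*` constants are hypothetical names used only for illustration, mirroring the `options.get(...)` calls in `DatabaseWrapper.__init__`.

```python
from typing import Any, Dict

# Hypothetical constants mirroring the README's Configuration Options table.
DEFAULT_AWS_PLUGINS = "failover,host_monitoring"
DEFAULT_TIMEOUT_SECONDS = 30


def resolve_aws_options(settings_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Return the AWS wrapper options, falling back to the documented defaults.

    Hypothetical helper: it repeats the lookups done in DatabaseWrapper.__init__.
    """
    options = settings_dict.get("OPTIONS", {})
    return {
        "aws_plugins": options.get("aws_plugins", DEFAULT_AWS_PLUGINS),
        "connect_timeout": options.get("connect_timeout", DEFAULT_TIMEOUT_SECONDS),
        "socket_timeout": options.get("socket_timeout", DEFAULT_TIMEOUT_SECONDS),
    }
```

For example, a settings entry whose `OPTIONS` sets only `socket_timeout` keeps the default plugin list and the 30-second connect timeout, while its own socket timeout wins.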
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@

```python
"""PostgreSQL database backend with AWS Advanced Python Wrapper integration.

This backend provides automatic failover capabilities for Amazon RDS clusters
while maintaining comprehensive Prometheus metrics collection.

Usage in Django settings:

    DATABASES = {
        'default': {
            'ENGINE': 'django_prometheus.db.backends.postgresql_aws',
            'HOST': 'database.cluster-xyz.us-east-1.rds.amazonaws.com',
            'NAME': 'mydb',
            'USER': 'myuser',
            'PASSWORD': 'mypassword',
            'PORT': '5432',
            'OPTIONS': {
                'aws_plugins': 'failover,host_monitoring',
                'connect_timeout': 30,
                'socket_timeout': 30,
            },
        }
    }

The backend automatically handles:
- Database failover for RDS clusters
- Connection monitoring and health checks
- Prometheus metrics for all database operations
- Query retry on successful failover
- Proper error handling for failed failovers
"""
```
Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@

```python
import logging

from django.core.exceptions import ImproperlyConfigured
from django.db import DEFAULT_DB_ALIAS
from django.db.backends.postgresql import base

from django_prometheus.db import (
    aws_failover_failed_total,
    aws_failover_success_total,
    aws_transaction_resolution_unknown_total,
    connection_errors_total,
    connections_total,
    errors_total,
    execute_many_total,
    execute_total,
    query_duration_seconds,
)
from django_prometheus.db.common import DatabaseWrapperMixin, ExceptionCounterByType

try:
    import psycopg
    from aws_advanced_python_wrapper import AwsWrapperConnection
    from aws_advanced_python_wrapper.errors import (
        FailoverFailedError,
        FailoverSuccessError,
        TransactionResolutionUnknownError,
    )
except ImportError as e:
    raise ImproperlyConfigured(
        "AWS Advanced Python Wrapper is required for this backend. "
        "Install it with: pip install aws-advanced-python-wrapper"
    ) from e

logger = logging.getLogger(__name__)


class AwsPrometheusCursor:
    """Delegates to the cursor returned by AwsWrapperConnection.cursor().

    Failover errors are raised by the AWS wrapper's cursor, so this class wraps
    that cursor instead of subclassing a raw psycopg cursor, which would bypass
    the wrapper entirely.
    """

    def __init__(self, cursor, alias, vendor):
        self._cursor = cursor
        self.alias = alias
        self.vendor = vendor
        self._labels = {"alias": alias, "vendor": vendor}

    def __getattr__(self, name):
        # fetchone(), fetchall(), close(), description, rowcount, ...
        return getattr(self._cursor, name)

    def __iter__(self):
        return iter(self._cursor)

    def execute(self, sql, params=None):
        execute_total.labels(self.alias, self.vendor).inc()
        with (
            query_duration_seconds.labels(**self._labels).time(),
            ExceptionCounterByType(errors_total, extra_labels=self._labels),
        ):
            return self._execute_with_failover_handling(sql, params)

    def executemany(self, sql, param_list):
        # Materialize iterators so they can be counted and replayed on retry
        param_list = list(param_list or [])
        param_count = len(param_list)
        execute_total.labels(self.alias, self.vendor).inc(param_count)
        execute_many_total.labels(self.alias, self.vendor).inc(param_count)
        with (
            query_duration_seconds.labels(**self._labels).time(),
            ExceptionCounterByType(errors_total, extra_labels=self._labels),
        ):
            return self._executemany_with_failover_handling(sql, param_list)

    def _execute_with_failover_handling(self, sql, params=None):
        try:
            return self._cursor.execute(sql, params)
        except FailoverSuccessError:
            logger.info("Database failover completed successfully, retrying query")
            aws_failover_success_total.labels(self.alias, self.vendor).inc()
            self._configure_session_state()
            return self._cursor.execute(sql, params)
        except FailoverFailedError as e:
            logger.error("Database failover failed: %s", e)
            aws_failover_failed_total.labels(self.alias, self.vendor).inc()
            raise
        except TransactionResolutionUnknownError as e:
            logger.error("Transaction resolution unknown after failover: %s", e)
            aws_transaction_resolution_unknown_total.labels(self.alias, self.vendor).inc()
            raise

    def _executemany_with_failover_handling(self, sql, param_list):
        try:
            return self._cursor.executemany(sql, param_list)
        except FailoverSuccessError:
            logger.info("Database failover completed successfully, retrying executemany")
            aws_failover_success_total.labels(self.alias, self.vendor).inc()
            self._configure_session_state()
            return self._cursor.executemany(sql, param_list)
        except FailoverFailedError as e:
            logger.error("Database failover failed during executemany: %s", e)
            aws_failover_failed_total.labels(self.alias, self.vendor).inc()
            raise
        except TransactionResolutionUnknownError as e:
            logger.error("Transaction resolution unknown during executemany: %s", e)
            aws_transaction_resolution_unknown_total.labels(self.alias, self.vendor).inc()
            raise

    def _configure_session_state(self):
        # Hook for restoring session settings on the new connection; currently a no-op
        pass


class DatabaseWrapper(DatabaseWrapperMixin, base.DatabaseWrapper):
    def __init__(self, settings_dict, alias=DEFAULT_DB_ALIAS):
        super().__init__(settings_dict, alias)
        options = self.settings_dict.get("OPTIONS", {})
        self.aws_plugins = options.get("aws_plugins", "failover,host_monitoring")
        self.connect_timeout = options.get("connect_timeout", 30)
        self.socket_timeout = options.get("socket_timeout", 30)

    def get_new_connection(self, conn_params):
        connections_total.labels(self.alias, self.vendor).inc()
        try:
            host = conn_params.get("host", "localhost")
            port = conn_params.get("port", 5432)
            # Django's PostgreSQL backend (4.2+) passes NAME under the "dbname" key
            database = conn_params.get("dbname", "")
            user = conn_params.get("user", "")
            password = conn_params.get("password", "")

            options = conn_params.get("options")
            if isinstance(options, dict):
                # Nested dict of extra psycopg connection keyword arguments
                extra_params = dict(options)
            elif options:
                # libpq "options" string, e.g. "-c search_path=app"
                extra_params = {"options": options}
            else:
                extra_params = {}

            connection = AwsWrapperConnection.connect(
                psycopg.Connection.connect,
                host=host,
                port=port,
                dbname=database,
                user=user,
                password=password,
                plugins=self.aws_plugins,
                connect_timeout=self.connect_timeout,
                socket_timeout=self.socket_timeout,
                autocommit=False,
                **extra_params,
            )

            logger.info("Successfully created AWS wrapper connection to %s:%s", host, port)
            return connection

        except Exception as e:
            connection_errors_total.labels(self.alias, self.vendor).inc()
            logger.error("Failed to create AWS wrapper connection: %s", e)
            raise

    def create_cursor(self, name=None):
        # Wrap the AWS wrapper's own cursor so its failover errors reach AwsPrometheusCursor
        if name:
            cursor = self.connection.cursor(name=name)
        else:
            cursor = self.connection.cursor()
        return AwsPrometheusCursor(cursor, self.alias, self.vendor)

    def _close(self):
        if self.connection is not None:
            try:
                self.connection.close()
            except Exception as e:
                logger.warning("Error closing AWS wrapper connection: %s", e)

    def is_usable(self):
        try:
            with self.connection.cursor() as cursor:
                cursor.execute("SELECT 1")
            return True
        except Exception as e:
            logger.warning("Connection is not usable: %s", e)
            return False

    def ensure_connection(self):
        # Note: is_usable() issues a SELECT 1 every time an existing connection is reused
        if self.connection is None:
            self.connect()
        elif not self.is_usable():
            logger.info("Connection is not usable, reconnecting...")
            self.close()
            self.connect()
```
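`ensure_connection()` and `is_usable()` above implement a check-before-use policy: connect lazily, and when a connection already exists, probe it with `SELECT 1` and reconnect if the probe fails. The framework-free sketch below isolates that policy. `FakeConnection`, `ConnectionHolder`, and the `ping()` method are hypothetical stand-ins for the AWS wrapper connection and the `SELECT 1` query.

```python
from typing import Callable, Optional


class FakeConnection:
    """Hypothetical stand-in for a DB connection; `healthy` controls the probe."""

    def __init__(self) -> None:
        self.healthy = True
        self.closed = False

    def ping(self) -> None:
        # Plays the role of cursor.execute("SELECT 1")
        if not self.healthy:
            raise ConnectionError("server closed the connection")

    def close(self) -> None:
        self.closed = True


class ConnectionHolder:
    """Mirrors ensure_connection(): connect lazily, reconnect when the probe fails."""

    def __init__(self, factory: Callable[[], FakeConnection]) -> None:
        self._factory = factory
        self.connection: Optional[FakeConnection] = None
        self.connects = 0

    def is_usable(self) -> bool:
        try:
            self.connection.ping()
            return True
        except Exception:
            return False

    def ensure_connection(self) -> None:
        if self.connection is None:
            self._connect()
        elif not self.is_usable():
            self.connection.close()
            self._connect()

    def _connect(self) -> None:
        self.connection = self._factory()
        self.connects += 1
```

Because Django calls `ensure_connection()` whenever it creates a cursor, this policy costs one extra round trip per cursor on a reused connection. Django's `CONN_HEALTH_CHECKS` setting (Django 4.1+) performs a similar check only at the start of each request, which may be a cheaper trade-off.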

django_prometheus/db/metrics.py

Lines changed: 22 additions & 0 deletions
```diff
@@ -46,3 +46,25 @@
     buckets=PROMETHEUS_LATENCY_BUCKETS,
     namespace=NAMESPACE,
 )
+
+# AWS Advanced Wrapper specific metrics
+aws_failover_success_total = Counter(
+    "django_db_aws_failover_success_total",
+    "Counter of successful AWS database failovers by database and vendor.",
+    ["alias", "vendor"],
+    namespace=NAMESPACE,
+)
+
+aws_failover_failed_total = Counter(
+    "django_db_aws_failover_failed_total",
+    "Counter of failed AWS database failovers by database and vendor.",
+    ["alias", "vendor"],
+    namespace=NAMESPACE,
+)
+
+aws_transaction_resolution_unknown_total = Counter(
+    "django_db_aws_transaction_resolution_unknown_total",
+    "Counter of AWS database transactions with unknown resolution status.",
+    ["alias", "vendor"],
+    namespace=NAMESPACE,
+)
```
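Each of the three counters above is exported per `alias`/`vendor` pair, so a failover success rate can be read straight from a metrics scrape. The sketch below parses Prometheus text-format output with the standard library, assuming no metric namespace prefix is configured and label values without escaped quotes; `aws_counter_values` and `failover_success_ratio` are hypothetical helpers, and the scrape text in the usage block is made up for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# The three counters added in this commit (assumes no namespace prefix).
AWS_COUNTERS = (
    "django_db_aws_failover_success_total",
    "django_db_aws_failover_failed_total",
    "django_db_aws_transaction_resolution_unknown_total",
)

# One sample line of the text format: name{labels} value [timestamp]
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def aws_counter_values(exposition: str) -> Dict[Tuple[str, str, str], float]:
    """Map (metric, alias, vendor) to the current value of each AWS counter."""
    values: Dict[Tuple[str, str, str], float] = {}
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match or match.group("name") not in AWS_COUNTERS:
            continue  # other metrics, including the *_created samples
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        key = (match.group("name"), labels.get("alias", ""), labels.get("vendor", ""))
        values[key] = float(match.group("value"))
    return values


def failover_success_ratio(
    values: Dict[Tuple[str, str, str], float], alias: str, vendor: str
) -> Optional[float]:
    """Share of failovers that succeeded, or None if none were recorded."""
    ok = values.get(("django_db_aws_failover_success_total", alias, vendor), 0.0)
    failed = values.get(("django_db_aws_failover_failed_total", alias, vendor), 0.0)
    total = ok + failed
    return ok / total if total else None


if __name__ == "__main__":
    # Made-up scrape output, for illustration only.
    scrape = (
        "# TYPE django_db_aws_failover_success_total counter\n"
        'django_db_aws_failover_success_total{alias="default",vendor="postgresql"} 3.0\n'
        'django_db_aws_failover_failed_total{alias="default",vendor="postgresql"} 1.0\n'
    )
    print(failover_success_ratio(aws_counter_values(scrape), "default", "postgresql"))
```

Counter values are cumulative since the process started, so an alert would normally compare increases over a time window rather than the lifetime ratio shown here.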
