Commit 2260f22

fix
1 parent c6cf15d commit 2260f22

File tree: 14 files changed (+1153, -631 lines)


README.md

Lines changed: 110 additions & 73 deletions
@@ -1,112 +1,149 @@
-# Superstream Client for Python (`superclient`)
+<div align="center">
 
-Superclient is a zero-code optimisation agent for Python applications that use Apache Kafka.
-It transparently intercepts producer creation in the popular client libraries and tunes
-configuration parameters (compression, batching, etc.) based on recommendations
-provided by the Superstream platform.
+<img src="https://github.com/user-attachments/assets/35899c78-24eb-4507-97ed-e87e84c49fea#gh-dark-mode-only" width="300">
+<img src="https://github.com/user-attachments/assets/8a7bca49-c362-4a8c-945e-a331fb26d8eb#gh-light-mode-only" width="300">
 
----
+</div>
 
-## Why use Superclient?
+# Superstream Client For Python
 
-**No code changes** – simply install the package and run your program.
-**Dynamic configuration** – adapts to cluster-specific and topic-specific insights
-coming from `superstream.metadata_v1`.
-**Safe by design** – any internal failure falls back to your original
-configuration; the application never crashes because of the agent.
-**Minimal overhead** – uses a single lightweight background thread (or an async
-coroutine when running with `aiokafka`).
+A Python library for automatically optimizing Kafka producer configurations based on topic-specific recommendations.
 
----
+## Overview
 
-## Supported Kafka libraries
+Superstream Clients works as a Python import hook that intercepts Kafka producer creation and applies optimized configurations without requiring any code changes in your application. It dynamically retrieves optimization recommendations from Superstream and applies them based on impact analysis.
 
-| Library | Producer class | Status |
-|---------|----------------|--------|
-| kafka-python | `kafka.KafkaProducer` | ✓ implemented |
-| aiokafka | `aiokafka.AIOKafkaProducer` | ✓ implemented |
-| confluent-kafka | `confluent_kafka.Producer` | ✓ implemented |
+## Supported Libraries
 
-Other libraries/frameworks that wrap these producers (e.g. Faust, FastAPI event
-publishers, Celery Kafka back-ends) inherit the benefits automatically.
+Works with any Python library that implements Kafka producers, including:
 
----
+- kafka-python
+- aiokafka
+- confluent-kafka
+- Faust
+- FastAPI event publishers
+- Celery Kafka backends
+- Any custom wrapper around these Kafka clients
+
+## Features
+
+- **Zero-code integration**: No code changes required in your application
+- **Dynamic configuration**: Applies optimized settings based on topic-specific recommendations
+- **Intelligent optimization**: Identifies the most impactful topics to optimize
+- **Graceful fallback**: Falls back to default settings if optimization fails
+- **Minimal overhead**: Uses a single lightweight background thread (or an async coroutine for aiokafka)
+
+## Important: Producer Configuration Requirements
+
+When initializing your Kafka producers, ensure you pass the configuration as a mutable object. The Superstream library needs to modify the producer configuration to apply optimizations. The following initialization patterns are supported:
+
+**Supported (Recommended)**:
+```python
+# Using kafka-python
+from kafka import KafkaProducer
+producer = KafkaProducer(
+    bootstrap_servers=['localhost:9092'],
+    compression_type='snappy',
+    batch_size=16384
+)
+
+# Using aiokafka
+from aiokafka import AIOKafkaProducer
+producer = AIOKafkaProducer(
+    bootstrap_servers='localhost:9092',
+    compression_type='snappy',
+    batch_size=16384
+)
+
+# Using confluent-kafka
+from confluent_kafka import Producer
+producer = Producer({
+    'bootstrap.servers': 'localhost:9092',
+    'compression.type': 'snappy',
+    'batch.size': 16384
+})
+```
+
+**Not Supported**:
+```python
+# Using frozen dictionaries or immutable configurations
+from types import MappingProxyType
+config = MappingProxyType({
+    'bootstrap.servers': 'localhost:9092'
+})
+producer = KafkaProducer(**config)
+```
+
+### Why This Matters
+The Superstream library needs to modify your producer's configuration to apply optimizations based on your cluster's characteristics. This includes adjusting settings such as compression, batch size, and other performance parameters. When the configuration is immutable, these optimizations cannot be applied.
 
 ## Installation
 
+### Step 1: Install Superclient
+
 ```bash
 pip install superclient
 ```
 
-The package ships with a `sitecustomize.py` entry-point, therefore Python
-imports the agent automatically **before your application's code starts**.
-If `sitecustomize` is disabled in your environment you can initialise manually:
+### Step 2: Run
+
+The package ships with a `sitecustomize.py` entry point, so Python imports the agent automatically before your application's code starts. If `sitecustomize` is disabled in your environment, you can initialize manually:
 
 ```python
 import superclient  # side-effects automatically enable the agent
 ```
 
----
+### Docker Integration
 
-## Environment variables
+When using Superstream Clients with containerized applications, include the package in your Dockerfile:
 
-| Variable | Description |
-|----------|-------------|
-| `SUPERSTREAM_DISABLED` | `true` disables all functionality |
-| `SUPERSTREAM_DEBUG` | `true` prints verbose debug logs |
-| `SUPERSTREAM_TOPICS_LIST` | Comma-separated list of topics your application *may* write to |
-| `SUPERSTREAM_LATENCY_SENSITIVE` | `true` prevents the agent from lowering `linger.ms` |
+```dockerfile
+FROM python:3.8-slim
 
-At start-up the agent logs the set of variables detected:
+# Install superclient
+RUN pip install superclient
 
+# Your application code
+COPY . /app
+WORKDIR /app
+
+# Run your application
+CMD ["python", "your_app.py"]
 ```
-[superstream] INFO agent: Superstream Agent initialized with environment variables: {'SUPERSTREAM_DEBUG': 'true', ...}
-```
 
----
+### Required Environment Variables
+
+- `SUPERSTREAM_TOPICS_LIST`: Comma-separated list of topics your application produces to
 
-## How it works
+### Optional Environment Variables
 
-1. An **import hook** patches the producer classes once their modules are
-   imported.
-2. When your code creates a producer the agent:
-   a. Skips internal Superstream clients (their `client_id` starts with
-      `superstreamlib-`).
-   b. Fetches the latest optimisation metadata from `superstream.metadata_v1`.
-   c. Computes an optimal configuration for the most impactful topic (or falls
-      back to sensible defaults) while respecting the latency-sensitive flag.
-   d. Overrides producer kwargs/in-dict values before the original constructor
-      executes.
-   e. Sends a *client_info* message to `superstream.clients` that contains both
-      original and optimised configurations.
-3. A single background heartbeat thread (or async task for `aiokafka`) emits
-   *client_stats* messages every `report_interval_ms` (default 5 minutes).
-4. When the application closes the producer the agent stops tracking it and
-   ceases heart-beats.
+- `SUPERSTREAM_LATENCY_SENSITIVE`: Set to "true" to prevent any modification to `linger.ms` values
+- `SUPERSTREAM_DISABLED`: Set to "true" to disable optimization
+- `SUPERSTREAM_DEBUG`: Set to "true" to enable debug logs
 
----
+Example:
+```bash
+export SUPERSTREAM_TOPICS_LIST=orders,payments,user-events
+export SUPERSTREAM_LATENCY_SENSITIVE=true
+```
 
-## Logging
+### SUPERSTREAM_LATENCY_SENSITIVE Explained
 
-Log lines are printed to `stdout`/`stderr` and start with the `[superstream]`
-prefix so they integrate with existing log pipelines. Set
-`SUPERSTREAM_DEBUG=true` for additional diagnostic messages.
+The `linger.ms` parameter follows these rules:
 
----
+1. If `SUPERSTREAM_LATENCY_SENSITIVE` is set to `true`:
+   - The linger value is never modified, regardless of other settings
 
-## Security & compatibility
+2. If `SUPERSTREAM_LATENCY_SENSITIVE` is set to `false` or not set:
+   - If no explicit linger exists in the original configuration: use Superstream's optimized value
+   - If an explicit linger exists: use the maximum of the original value and Superstream's optimized value
 
-• Authentication/SSL/SASL/DNS settings are **copied from your original
-  configuration** to every short-lived internal client.
-• The agent only relies on the Kafka library already present in your
-  environment, therefore **no dependency conflicts** are introduced.
-• All exceptions are caught internally; your application will **never crash or
-  hang** because of Superclient.
+## Prerequisites
 
----
+- Python 3.8 or higher
+- Kafka cluster connected to the Superstream console
+- Read and write permissions on the `superstream.*` topics
 
 ## License
 
-Apache 2.0
+This project is licensed under the Apache License 2.0.
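The rules in the new "SUPERSTREAM_LATENCY_SENSITIVE Explained" section reduce to a short passthrough/maximum decision. A minimal sketch of that logic, assuming a hypothetical helper name (`resolve_linger` is not part of superclient's API):

```python
# Illustrative sketch of the linger.ms rules described in the README.
# The function name and signature are assumptions, not superclient's API.
def resolve_linger(original_config, optimized_linger_ms, latency_sensitive):
    original = original_config.get("linger.ms")
    if latency_sensitive:
        # Rule 1: when latency sensitive, the linger value is never modified.
        return original
    if original is None:
        # Rule 2a: no explicit linger in the original config -> use the
        # optimized value.
        return optimized_linger_ms
    # Rule 2b: explicit linger -> use the maximum of original and optimized.
    return max(original, optimized_linger_ms)
```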

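The new "Overview" section describes an import hook that intercepts Kafka producer creation and rewrites the configuration before the constructor runs. A stripped-down sketch of that interception pattern, using hypothetical names and a dummy producer class standing in for the real client libraries:

```python
# Hypothetical sketch of zero-code producer interception, in the spirit of
# the README's import-hook description. Not superclient's actual code.
import functools

def patch_producer(producer_cls, optimize):
    """Wrap a producer class's __init__ so kwargs are tuned before construction."""
    original_init = producer_cls.__init__

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        client_id = kwargs.get("client_id") or ""
        # The original README notes that internal Superstream clients
        # (client_id starting with "superstreamlib-") are skipped.
        if not client_id.startswith("superstreamlib-"):
            kwargs = optimize(dict(kwargs))
        original_init(self, *args, **kwargs)

    producer_cls.__init__ = patched_init
    return producer_cls

class DummyProducer:
    """Stand-in for KafkaProducer/AIOKafkaProducer in this sketch."""
    def __init__(self, **config):
        self.config = config

# Apply a toy "optimization" that enables snappy compression.
patch_producer(DummyProducer, lambda cfg: {**cfg, "compression_type": "snappy"})
```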
sitecustomize.py

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@
     import superclient
 except Exception as e:
     import sys
-    sys.stderr.write(f"[superstream] ERROR [ERR-001] Failed to import superclient: {str(e)}\n")
+    sys.stderr.write(f"[superstream] ERROR [ERR-000] Failed to import superclient: {str(e)}\n")
     sys.stderr.flush()
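The README's "Not Supported" example can be demonstrated concretely: a read-only mapping such as `MappingProxyType` rejects the in-place writes the agent needs in order to apply its tuning, while a plain dict accepts them. Illustration only:

```python
# Why immutable configurations are "Not Supported": the agent must write
# tuned values into the config before the producer is constructed, and a
# read-only mapping rejects those writes. (Illustration only.)
from types import MappingProxyType

mutable_cfg = {"bootstrap.servers": "localhost:9092"}
mutable_cfg["compression.type"] = "snappy"  # tuning a plain dict works

frozen_cfg = MappingProxyType({"bootstrap.servers": "localhost:9092"})
try:
    frozen_cfg["compression.type"] = "snappy"
    applied = True
except TypeError:
    applied = False  # mappingproxy does not support item assignment
```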

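The "How it works" section removed by this commit mentions a single background heartbeat thread that emits *client_stats* at a fixed interval (`report_interval_ms`, default 5 minutes). A hedged sketch of that pattern, with hypothetical names and a local callback in place of the real Kafka reporting:

```python
# Illustrative sketch of a single lightweight heartbeat thread, matching the
# README's "Minimal overhead" bullet. Names are hypothetical; the real agent
# reports client_stats to Superstream rather than invoking a local callback.
import threading
import time

class Heartbeat:
    def __init__(self, emit, interval_s=300.0):  # README: default 5 minutes
        self._emit = emit
        self._stop = threading.Event()
        self._interval = interval_s
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # wait() returns False on timeout, True once close() sets the event.
        while not self._stop.wait(self._interval):
            self._emit()  # e.g. send stats for each tracked producer

    def close(self):
        # When the application closes the producer, heartbeats cease.
        self._stop.set()
        self._thread.join(timeout=1.0)
```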