|
1 | | -# Superstream Client for Python (`superclient`) |
| 1 | +<div align="center"> |
2 | 2 |
|
3 | | -Superclient is a zero-code optimisation agent for Python applications that use Apache Kafka. |
4 | | -It transparently intercepts producer creation in the popular client libraries and tunes |
5 | | -configuration parameters (compression, batching, etc.) based on recommendations |
6 | | -provided by the Superstream platform. |
| 3 | +<img src="https://github.com/user-attachments/assets/35899c78-24eb-4507-97ed-e87e84c49fea#gh-dark-mode-only" width="300"> |
| 4 | +<img src="https://github.com/user-attachments/assets/8a7bca49-c362-4a8c-945e-a331fb26d8eb#gh-light-mode-only" width="300"> |
7 | 5 |
|
8 | | ---- |
| 6 | +</div> |
9 | 7 |
|
10 | | -## Why use Superclient? |
| 8 | +# Superstream Client For Python |
11 | 9 |
|
12 | | -• **No code changes** – simply install the package and run your program. |
13 | | -• **Dynamic configuration** – adapts to cluster-specific and topic-specific insights |
14 | | - coming from `superstream.metadata_v1`. |
15 | | -• **Safe by design** – any internal failure falls back to your original |
16 | | - configuration; the application never crashes because of the agent. |
17 | | -• **Minimal overhead** – uses a single lightweight background thread (or an async |
18 | | - coroutine when running with `aiokafka`). |
| 10 | +A Python library for automatically optimizing Kafka producer configurations based on topic-specific recommendations. |
19 | 11 |
|
20 | | ---- |
| 12 | +## Overview |
21 | 13 |
|
22 | | -## Supported Kafka libraries |
| 14 | +Superstream Clients works as a Python import hook that intercepts Kafka producer creation and applies optimized configurations without requiring any code changes in your application. It dynamically retrieves optimization recommendations from Superstream and applies them based on impact analysis. |
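| | + |
| | +Conceptually, the hook works roughly like the sketch below. This is a simplified illustration, not superclient's actual internals, and `fetch_recommended_settings` is a hypothetical stand-in for the step that retrieves Superstream's recommendations: |
| | + |
| | +```python |
| | +# Simplified sketch of constructor interception (illustrative only). |
| | +from kafka import KafkaProducer |
| | + |
| | +def fetch_recommended_settings(configs): |
| | +    # Hypothetical placeholder: superclient derives these values from |
| | +    # Superstream's topic-specific recommendations, not hard-coded defaults. |
| | +    return {"compression_type": "snappy", "batch_size": 32768} |
| | + |
| | +_original_init = KafkaProducer.__init__ |
| | + |
| | +def _patched_init(self, **configs): |
| | +    configs.update(fetch_recommended_settings(configs))  # merge optimized settings |
| | +    _original_init(self, **configs) |
| | + |
| | +KafkaProducer.__init__ = _patched_init |
| | +``` |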
23 | 15 |
|
24 | | -| Library | Producer class | Status | |
25 | | -|---------|----------------|--------| |
26 | | -| kafka-python | `kafka.KafkaProducer` | ✓ implemented | |
27 | | -| aiokafka | `aiokafka.AIOKafkaProducer` | ✓ implemented | |
28 | | -| confluent-kafka | `confluent_kafka.Producer` | ✓ implemented | |
| 16 | +## Supported Libraries |
29 | 17 |
|
30 | | -Other libraries/frameworks that wrap these producers (e.g. Faust, FastAPI event |
31 | | -publishers, Celery Kafka back-ends) inherit the benefits automatically. |
| 18 | +Works with any Python library that implements Kafka producers, including: |
32 | 19 |
|
33 | | ---- |
| 20 | +- kafka-python |
| 21 | +- aiokafka |
| 22 | +- confluent-kafka |
| 23 | +- Faust |
| 24 | +- FastAPI event publishers |
| 25 | +- Celery Kafka backends |
| 26 | +- Any custom wrapper around these Kafka clients |
| 27 | + |
| 28 | +## Features |
| 29 | + |
| 30 | +- **Zero-code integration**: No code changes required in your application |
| 31 | +- **Dynamic configuration**: Applies optimized settings based on topic-specific recommendations |
| 32 | +- **Intelligent optimization**: Identifies the most impactful topics to optimize |
| 33 | +- **Graceful fallback**: Falls back to default settings if optimization fails |
| 34 | +- **Minimal overhead**: Uses a single lightweight background thread (or async coroutine for aiokafka) |
| 35 | + |
| 36 | +## Important: Producer Configuration Requirements |
| 37 | + |
| 38 | +When initializing your Kafka producers, please ensure you pass the configuration as a mutable object. The Superstream library needs to modify the producer configuration to apply optimizations. The following initialization patterns are supported: |
| 39 | + |
| 40 | +✅ **Supported (Recommended)**: |
| 41 | +```python |
| 42 | +# Using kafka-python |
| 43 | +from kafka import KafkaProducer |
| 44 | +producer = KafkaProducer( |
| 45 | + bootstrap_servers=['localhost:9092'], |
| 46 | + compression_type='snappy', |
| 47 | + batch_size=16384 |
| 48 | +) |
| 49 | + |
| 50 | +# Using aiokafka |
| 51 | +from aiokafka import AIOKafkaProducer |
| 52 | +producer = AIOKafkaProducer( |
| 53 | + bootstrap_servers='localhost:9092', |
| 54 | + compression_type='snappy', |
| 55 | +    max_batch_size=16384 |
| 56 | +) |
| 57 | + |
| 58 | +# Using confluent-kafka |
| 59 | +from confluent_kafka import Producer |
| 60 | +producer = Producer({ |
| 61 | + 'bootstrap.servers': 'localhost:9092', |
| 62 | + 'compression.type': 'snappy', |
| 63 | + 'batch.size': 16384 |
| 64 | +}) |
| 65 | +``` |
| 66 | + |
| 67 | +❌ **Not Supported**: |
| 68 | +```python |
| 69 | +# Using frozen dictionaries or immutable configurations |
| 70 | +from types import MappingProxyType |
| | +from kafka import KafkaProducer |
| 71 | +config = MappingProxyType({ |
| 72 | +    'bootstrap_servers': 'localhost:9092' |
| 73 | +}) |
| 74 | +producer = KafkaProducer(**config) |
| 75 | +``` |
| 76 | + |
| 77 | +### Why This Matters |
| 78 | +The Superstream library needs to modify your producer's configuration to apply optimizations based on your cluster's characteristics. This includes adjusting settings like compression, batch size, and other performance parameters. When the configuration is immutable, these optimizations cannot be applied. |
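| | + |
| | +If your settings come from an immutable source, one possible workaround (a sketch, using kafka-python) is to copy them into a plain `dict` before constructing the producer: |
| | + |
| | +```python |
| | +# Copy immutable settings into a plain dict so they can be adjusted. |
| | +from types import MappingProxyType |
| | +from kafka import KafkaProducer |
| | + |
| | +frozen_defaults = MappingProxyType({'bootstrap_servers': 'localhost:9092'}) |
| | +mutable_config = dict(frozen_defaults)  # plain dict copies are mutable |
| | +producer = KafkaProducer(**mutable_config) |
| | +``` |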
34 | 79 |
|
35 | 80 | ## Installation |
36 | 81 |
|
| 82 | +### Step 1: Install Superclient |
| 83 | + |
37 | 84 | ```bash |
38 | 85 | pip install superclient |
39 | 86 | ``` |
40 | 87 |
|
41 | | -The package ships with a `sitecustomize.py` entry-point, therefore Python |
42 | | -imports the agent automatically **before your application's code starts**. |
43 | | -If `sitecustomize` is disabled in your environment you can initialise manually: |
| 88 | +### Step 2: Run |
| 89 | + |
| 90 | +The package ships with a `sitecustomize.py` entry point, so Python imports the agent automatically before your application's code starts. If `sitecustomize` is disabled in your environment, you can initialize the agent manually: |
44 | 91 |
|
45 | 92 | ```python |
46 | 93 | import superclient # side-effects automatically enable the agent |
47 | 94 | ``` |
48 | 95 |
|
49 | | ---- |
| 96 | +### Docker Integration |
50 | 97 |
|
51 | | -## Environment variables |
| 98 | +When using Superstream Clients with containerized applications, include the package in your Dockerfile: |
52 | 99 |
|
53 | | -| Variable | Description | |
54 | | -|----------|-------------| |
55 | | -| `SUPERSTREAM_DISABLED` | `true` disables all functionality | |
56 | | -| `SUPERSTREAM_DEBUG` | `true` prints verbose debug logs | |
57 | | -| `SUPERSTREAM_TOPICS_LIST` | Comma-separated list of topics your application *may* write to | |
58 | | -| `SUPERSTREAM_LATENCY_SENSITIVE` | `true` prevents the agent from lowering `linger.ms` | |
| 100 | +```dockerfile |
| 101 | +FROM python:3.8-slim |
59 | 102 |
|
60 | | -At start-up the agent logs the set of variables detected: |
| 103 | +# Install superclient |
| 104 | +RUN pip install superclient |
61 | 105 |
|
| 106 | +# Your application code |
| 107 | +COPY . /app |
| 108 | +WORKDIR /app |
| 109 | + |
| 110 | +# Run your application |
| 111 | +CMD ["python", "your_app.py"] |
62 | 112 | ``` |
63 | | -[superstream] INFO agent: Superstream Agent initialized with environment variables: {'SUPERSTREAM_DEBUG': 'true', ...} |
64 | | -``` |
65 | 113 |
|
66 | | ---- |
| 114 | +### Required Environment Variables |
| 115 | + |
| 116 | +- `SUPERSTREAM_TOPICS_LIST`: Comma-separated list of topics your application produces to |
67 | 117 |
|
68 | | -## How it works |
| 118 | +### Optional Environment Variables |
69 | 119 |
|
70 | | -1. An **import hook** patches the producer classes once their modules are |
71 | | - imported. |
72 | | -2. When your code creates a producer the agent: |
73 | | - a. Skips internal Superstream clients (their `client_id` starts with |
74 | | - `superstreamlib-`). |
75 | | - b. Fetches the latest optimisation metadata from |
76 | | - `superstream.metadata_v1`. |
77 | | - c. Computes an optimal configuration for the most impactful topic (or falls |
78 | | - back to sensible defaults) while respecting the |
79 | | - latency-sensitive flag. |
80 | | - d. Overrides producer kwargs/in-dict values before the original constructor |
81 | | - executes. |
82 | | - e. Sends a *client_info* message to `superstream.clients` that contains both |
83 | | - original and optimised configurations. |
84 | | -3. A single background heartbeat thread (or async task for `aiokafka`) emits |
85 | | - *client_stats* messages every `report_interval_ms` (default 5 minutes). |
86 | | -4. When the application closes the producer the agent stops tracking it and |
87 | | - ceases heart-beats. |
| 120 | +- `SUPERSTREAM_LATENCY_SENSITIVE`: Set to "true" to prevent any modification to linger.ms values |
| 121 | +- `SUPERSTREAM_DISABLED`: Set to "true" to disable optimization |
| 122 | +- `SUPERSTREAM_DEBUG`: Set to "true" to enable debug logs |
88 | 123 |
|
89 | | ---- |
| 124 | +Example: |
| 125 | +```bash |
| 126 | +export SUPERSTREAM_TOPICS_LIST=orders,payments,user-events |
| 127 | +export SUPERSTREAM_LATENCY_SENSITIVE=true |
| 128 | +``` |
90 | 129 |
|
91 | | -## Logging |
| 130 | +### SUPERSTREAM_LATENCY_SENSITIVE Explained |
92 | 131 |
|
93 | | -Log lines are printed to `stdout`/`stderr` and start with the `[superstream]` |
94 | | -prefix so they integrate with existing log pipelines. Set |
95 | | -`SUPERSTREAM_DEBUG=true` for additional diagnostic messages. |
| 132 | +The `linger.ms` parameter follows these rules (a sketch of the logic appears after the list): |
96 | 133 |
|
97 | | ---- |
| 134 | +1. If `SUPERSTREAM_LATENCY_SENSITIVE` is set to `true`: |
| 135 | +   - The linger value is never modified, regardless of other settings |
98 | 136 |
|
99 | | -## Security & compatibility |
| 137 | +2. If `SUPERSTREAM_LATENCY_SENSITIVE` is set to `false` or not set: |
| 138 | +   - If no explicit linger exists in the original configuration: use Superstream's optimized value |
| 139 | +   - If an explicit linger exists: use the maximum of the original value and Superstream's optimized value |
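| | + |
| | +A rough sketch of this resolution rule (assuming kafka-python style `linger_ms` keys; the actual superclient logic may differ in detail): |
| | + |
| | +```python |
| | +import os |
| | + |
| | +def resolve_linger_ms(original_config, recommended_linger_ms): |
| | +    """Illustrative sketch of the linger.ms rules described above.""" |
| | +    original = original_config.get("linger_ms") |
| | +    if os.getenv("SUPERSTREAM_LATENCY_SENSITIVE", "").lower() == "true": |
| | +        return original                       # latency-sensitive: never modified |
| | +    if original is None: |
| | +        return recommended_linger_ms          # no explicit linger: use optimized value |
| | +    return max(original, recommended_linger_ms)  # keep the larger of the two |
| | +``` |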
100 | 140 |
|
101 | | -• Authentication/SSL/SASL/DNS settings are **copied from your original |
102 | | - configuration** to every short-lived internal client. |
103 | | -• The agent only relies on the Kafka library already present in your |
104 | | - environment, therefore **no dependency conflicts** are introduced. |
105 | | -• All exceptions are caught internally; your application will **never crash or |
106 | | - hang** because of Superclient. |
| 141 | +## Prerequisites |
107 | 142 |
|
108 | | ---- |
| 143 | +- Python 3.8 or higher |
| 144 | +- Kafka cluster that is connected to the Superstream console |
| 145 | +- Read and write permissions to the `superstream.*` topics |
109 | 146 |
|
110 | 147 | ## License |
111 | 148 |
|
112 | | -Apache 2.0 |
| 149 | +This project is licensed under the Apache License 2.0. |