
Commit 6283736

initial commit
0 parents  commit 6283736

File tree

16 files changed: +1130 −0 lines

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
venv/
__pycache__/
*.py[cod]
*$py.class

.vscode/launch.json

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Main",
            "type": "debugpy",
            "request": "launch",
            "program": "main.py",
            "console": "integratedTerminal",
            "justMyCode": true
        }
    ]
}

.vscode/settings.json

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
{
    "python.defaultInterpreterPath": "${workspaceFolder}/venv/bin/python",
    "python.terminal.activateEnvironment": true
}

LICENSE

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

MANIFEST.in

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
include README.md
include LICENSE

README.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
# Superstream Client for Python (`superclient`)

Superclient is a zero-code optimisation agent for Python applications that use Apache Kafka.
It transparently intercepts producer creation in the popular client libraries and tunes
configuration parameters (compression, batching, etc.) based on recommendations
provided by the Superstream platform.

---

## Why use Superclient?

- **No code changes** – simply install the package and run your program.
- **Dynamic configuration** – adapts to cluster-specific and topic-specific insights coming from `superstream.metadata_v1`.
- **Safe by design** – any internal failure falls back to your original configuration; the application never crashes because of the agent.
- **Minimal overhead** – uses a single lightweight background thread (or an async coroutine when running with `aiokafka`).

---

## Supported Kafka libraries

| Library | Producer class | Status |
|---------|----------------|--------|
| kafka-python | `kafka.KafkaProducer` | ✓ implemented |
| aiokafka | `aiokafka.AIOKafkaProducer` | ✓ implemented |
| confluent-kafka | `confluent_kafka.Producer` | ✓ implemented |

Other libraries/frameworks that wrap these producers (e.g. Faust, FastAPI event publishers, Celery Kafka back-ends) inherit the benefits automatically.
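
For illustration only (broker address and topic below are placeholders, not part of this commit): a plain kafka-python producer like the following needs no Superstream-specific calls – with `superclient` installed, its creation is intercepted and its configuration tuned before the constructor runs.

```python
from kafka import KafkaProducer  # kafka-python

# Ordinary producer code, unchanged by the application author.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",            # placeholder broker
    value_serializer=lambda v: v.encode("utf-8"),
)
producer.send("test-topic", "hello from an unmodified producer")
producer.flush()
producer.close()
```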

---

## Installation

```bash
pip install superclient
```

The package ships with a `sitecustomize.py` entry point, so Python imports the agent automatically **before your application's code starts**.
If `sitecustomize` is disabled in your environment, you can initialise manually:

```python
import superclient  # importing has the side effect of enabling the agent
```

---

## Environment variables

| Variable | Description |
|----------|-------------|
| `SUPERSTREAM_DISABLED` | `true` disables all functionality |
| `SUPERSTREAM_DEBUG` | `true` prints verbose debug logs |
| `SUPERSTREAM_TOPICS_LIST` | Comma-separated list of topics your application *may* write to |
| `SUPERSTREAM_LATENCY_SENSITIVE` | `true` prevents the agent from lowering `linger.ms` |

At start-up the agent logs the set of environment variables it detected:

```
[superstream] INFO agent: Superstream Agent initialized with environment variables: {'SUPERSTREAM_DEBUG': 'true', ...}
```
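
If exporting shell variables is inconvenient, the same settings can also be applied from Python before the agent loads – a sketch that assumes `sitecustomize` is disabled and `superclient` is imported manually:

```python
import os

# These must be set before superclient is imported, since the agent
# reads them at initialisation time.
os.environ["SUPERSTREAM_DEBUG"] = "true"
os.environ["SUPERSTREAM_TOPICS_LIST"] = "test-topic,test-topic-1"

import superclient  # noqa: E402 – importing enables the agent
```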

---

## How it works

1. An **import hook** patches the producer classes once their modules are imported.
2. When your code creates a producer the agent (a minimal sketch follows this list):
   a. Skips internal Superstream clients (their `client_id` starts with `superstreamlib-`).
   b. Fetches the latest optimisation metadata from `superstream.metadata_v1`.
   c. Computes an optimal configuration for the most impactful topic (or falls back to sensible defaults) while respecting the latency-sensitive flag.
   d. Overrides producer kwargs/in-dict values before the original constructor executes.
   e. Sends a *client_info* message to `superstream.clients` that contains both original and optimised configurations.
3. A single background heartbeat thread (or async task for `aiokafka`) emits *client_stats* messages every `report_interval_ms` (default 5 minutes).
4. When the application closes the producer the agent stops tracking it and ceases heart-beats.
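
For illustration only, a conceptual sketch of step 2 – not the agent's actual code: a producer class's `__init__` is wrapped so optimised settings are merged into the caller's kwargs before the original constructor runs. The `fetch_optimised_config` helper is hypothetical.

```python
import functools

def patch_producer(cls, fetch_optimised_config):
    """Wrap cls.__init__ so optimised kwargs are merged in before it runs."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, *args, **kwargs):
        client_id = str(kwargs.get("client_id", ""))
        # Skip Superstream's own internal clients (step 2a).
        if not client_id.startswith("superstreamlib-"):
            try:
                # Hypothetical helper returning e.g. {'compression_type': 'zstd', 'linger_ms': 50}.
                optimised = fetch_optimised_config(kwargs)
                # Recommendations override the caller-supplied values (step 2d).
                kwargs.update(optimised)
            except Exception:
                # Safe by design: on any internal failure, keep the original configuration.
                pass
        original_init(self, *args, **kwargs)

    cls.__init__ = wrapped_init
```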

---

## Logging

Log lines are printed to `stdout`/`stderr` and start with the `[superstream]` prefix, so they integrate with existing log pipelines. Set `SUPERSTREAM_DEBUG=true` for additional diagnostic messages.

---

## Security & compatibility

• Authentication/SSL/SASL/DNS settings are **copied from your original configuration** to every short-lived internal client.
• The agent only relies on the Kafka library already present in your environment, therefore **no dependency conflicts** are introduced.
• All exceptions are caught internally; your application will **never crash or hang** because of Superclient.

---

## License

Apache 2.0

examples/asynciokafka.py

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
"""
Kafka Producer using aiokafka library
This is an asyncio-based client, great for async Python applications
"""
import json
import asyncio
from aiokafka import AIOKafkaProducer
from aiokafka.errors import KafkaError
from json_generator import generate_random_json

async def create_producer(client_id):
    """Create, configure and start a Kafka producer"""
    producer = AIOKafkaProducer(
        bootstrap_servers='localhost:9092',
        client_id=client_id,
        compression_type=None,   # No compression
        max_batch_size=150,      # Batch size in bytes
        linger_ms=10,            # Linger time in milliseconds
        value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    )
    await producer.start()
    return producer

async def send_messages_to_topics(producer, topics, producer_name, num_messages=50):
    """Send random JSON messages to the specified Kafka topics"""
    successful = 0
    failed = 0

    for i in range(num_messages):
        try:
            # Generate random JSON of at least 1 KB
            message = generate_random_json(min_size_kb=1)
            message['message_number'] = i + 1
            message['producer'] = producer_name

            # Send the message to each topic
            for topic in topics:
                await producer.send_and_wait(topic, message)

            successful += 1

        except KafkaError as e:
            failed += 1
            print(f"Failed to send message {i+1}: {e}")

        # Small delay between messages (optional)
        await asyncio.sleep(0.01)

    print(f"\n{producer_name} Summary: {successful} successful, {failed} failed")

async def main():
    producer1 = None
    producer2 = None
    try:
        # Create two separate producers
        producer1 = await create_producer('aiokafka-producer-1')
        producer2 = await create_producer('aiokafka-producer-2')

        # First producer sends to test-topic and test-topic-1
        topics1 = ['test-topic', 'test-topic-1']
        await send_messages_to_topics(producer1, topics1, 'aiokafka-producer-1')

        # Second producer sends to test-topic-2 and test-topic-3
        topics2 = ['test-topic-2', 'test-topic-3']
        await send_messages_to_topics(producer2, topics2, 'aiokafka-producer-2')

    except Exception as e:
        print(f"Error: {e}")
    finally:
        if producer1:
            await producer1.stop()
            print("Producer 1 closed")
        if producer2:
            await producer2.stop()
            print("Producer 2 closed")

    # Sleep for 10 minutes at the end
    print("Sleeping for 10 minutes...")
    await asyncio.sleep(600)
    print("Sleep completed")

if __name__ == "__main__":
    # Run the async main function
    asyncio.run(main())

examples/confluent.py

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
"""
Kafka Producer using confluent-kafka library
This is the Python wrapper around librdkafka (C library), offering high performance
"""
import json
import time
from confluent_kafka import Producer
from json_generator import generate_random_json

def delivery_report(err, msg):
    """Callback for message delivery reports"""
    if err is not None:
        print(f'Message delivery failed: {err}')

def create_producer(client_id):
    """Create and configure a Kafka producer"""
    config = {
        'bootstrap.servers': 'localhost:9092',
        'client.id': client_id,
        'compression.type': 'none',
        'batch.size': 150,   # Batch size in bytes
        'linger.ms': 10,     # Linger time in milliseconds
    }
    return Producer(config)

def send_messages_to_topics(producer, topics, producer_name, num_messages=50):
    """Send random JSON messages to the specified Kafka topics"""
    successful = 0
    failed = 0

    for i in range(num_messages):
        try:
            # Generate random JSON of at least 1 KB
            message = generate_random_json(min_size_kb=1)
            message['message_number'] = i + 1
            message['producer'] = producer_name

            # Serialize to JSON
            message_json = json.dumps(message)

            # Send the message to each topic
            for topic in topics:
                producer.produce(
                    topic=topic,
                    value=message_json.encode('utf-8'),
                    callback=delivery_report
                )
                # Trigger any available delivery report callbacks
                producer.poll(0)

            successful += 1

        except Exception as e:
            failed += 1
            print(f"Failed to send message {i+1}: {e}")

        # Small delay between messages (optional)
        time.sleep(0.01)

    producer.flush(timeout=30)
    print(f"\n{producer_name} Summary: {successful} successful, {failed} failed")

def main():
    producer1 = None
    producer2 = None
    try:
        # Create two separate producers
        producer1 = create_producer('confluent-kafka-producer-1')
        producer2 = create_producer('confluent-kafka-producer-2')

        # First producer sends to test-topic and test-topic-1
        topics1 = ['test-topic', 'test-topic-1']
        send_messages_to_topics(producer1, topics1, 'confluent-kafka-producer-1')

        # Second producer sends to test-topic-2 and test-topic-3
        topics2 = ['test-topic-2', 'test-topic-3']
        send_messages_to_topics(producer2, topics2, 'confluent-kafka-producer-2')

    except Exception as e:
        print(f"Error: {e}")

    # Sleep for 10 minutes at the end
    print("Sleeping for 10 minutes...")
    time.sleep(600)
    print("Sleep completed")

if __name__ == "__main__":
    main()

examples/json_generator.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
import random
import string
import json
from datetime import datetime

def generate_random_json(min_size_kb=1):
    """Generate a random JSON-serialisable dict of at least min_size_kb kilobytes"""
    base_data = {
        "timestamp": datetime.now().isoformat(),
        "event_id": f"evt_{random.randint(100000, 999999)}",
        "user_id": f"user_{random.randint(1000, 9999)}",
        "session_id": f"session_{random.randint(10000, 99999)}",
        "event_type": random.choice(["click", "view", "purchase", "login", "logout"]),
        "device_type": random.choice(["mobile", "desktop", "tablet"]),
        "os": random.choice(["Windows", "macOS", "Linux", "iOS", "Android"]),
        "browser": random.choice(["Chrome", "Firefox", "Safari", "Edge"]),
        "country": random.choice(["US", "UK", "DE", "FR", "JP", "BR", "IN"]),
        "metrics": {
            "load_time": round(random.uniform(0.1, 5.0), 3),
            "response_time": round(random.uniform(0.01, 1.0), 3),
            "cpu_usage": round(random.uniform(0, 100), 2),
            "memory_usage": round(random.uniform(0, 100), 2)
        }
    }

    # Calculate the current serialized size
    current_json = json.dumps(base_data)
    current_size = len(current_json.encode('utf-8'))
    target_size = min_size_kb * 1024

    # Add padding data if needed to reach the target size
    if current_size < target_size:
        padding_size = target_size - current_size
        # Generate random string data for padding
        padding_data = {
            "additional_data": {
                f"field_{i}": ''.join(random.choices(string.ascii_letters + string.digits, k=50))
                for i in range(padding_size // 50)
            }
        }
        base_data.update(padding_data)

    return base_data
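
As a quick, purely illustrative check of the generator above (not part of this commit), the padding loop targets roughly `min_size_kb` kilobytes of serialised JSON:

```python
import json
from json_generator import generate_random_json

payload = generate_random_json(min_size_kb=2)
size_bytes = len(json.dumps(payload).encode('utf-8'))
# Expect a value near 2 * 1024 bytes: padding is added in 50-character chunks,
# and the JSON key overhead ("field_N") typically pushes it slightly past the target.
print(f"{size_bytes} bytes across {len(payload)} top-level keys")
```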
