Project Coconut is a professional, modular AI ecosystem designed for high-performance production deployments. It bridges the gap between raw models and scalable services with its unique 5-layer architecture.
- Hardware Auto-Sensing: Automatically detects NVIDIA GPUs and applies FP16 precision. Falls back to optimized FP32 on CPUs.
- Semantic Inference Cache: Up to 99% latency reduction on repeat queries. Responds in <1s by serving cached responses to semantically similar prompts instead of re-running the model.
- Dynamic Feature Toggles: Enable/Disable RAG, Semantic Caching, and Session Memory instantly via environment variables.
- Auto-MLOps (CI/CD): Integrated GitHub Actions with Trivy security scanning and automated Docker Hub deployment.
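The semantic cache works by comparing an embedding of the incoming query against embeddings of previously answered queries. The sketch below is illustrative only, with toy 3-d vectors standing in for a real embedding model; the function names and the sample cached answer are assumptions, not the project's code.

```python
# Hedged sketch of a semantic cache lookup: embed the query, compare it to
# cached query embeddings, and reuse the stored answer above a similarity
# threshold. Toy 3-d "embeddings" stand in for a real embedding model.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

CACHE_THRESHOLD = 0.85
cache = [([0.9, 0.1, 0.0], "Coconut is a modular AI ecosystem.")]

def lookup(query_vec):
    for vec, answer in cache:
        if cosine(query_vec, vec) >= CACHE_THRESHOLD:
            return answer  # cache hit: skip model inference entirely
    return None            # miss: fall through to the model

print(lookup([0.88, 0.12, 0.01]))  # near-duplicate query -> cached answer
print(lookup([0.0, 0.0, 1.0]))     # unrelated query -> None
```

A cache hit skips the model entirely, which is where the sub-second repeat-query latency comes from.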
Follow this exact sequence to go from a clean server to a production-ready AI backend.
Option A: Local Build (Developer Mode)
```bash
git clone https://github.com/Omdeepb69/Cococnut-container.git
cd Cococnut-container
docker compose up --build -d
```

Option B: Docker Cloud (Production Mode)

```bash
# Pull the pre-built S-Tier image
docker pull omdeep22/coconut_can:latest
```

The API is locked by default. Generate your first production key:
```bash
# This creates a 'pro' tier key with a 100 req/min limit
curl -X POST "http://localhost:8000/generate-key?tier=pro"
```

Important
Save the returned api_key immediately. It is hashed for security and cannot be shown again.
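The reason the key cannot be shown again: a common pattern (sketched below under that assumption; the function names and `coco_` prefix are illustrative, not the project's actual code) is to store only a digest of the key and compare digests on each request.

```python
# Hedged sketch: store only a hash of the API key; verify by re-hashing.
# Names and the key prefix are illustrative assumptions.
import hashlib
import secrets

def issue_key():
    raw = "coco_" + secrets.token_urlsafe(24)          # shown to the user once
    digest = hashlib.sha256(raw.encode()).hexdigest()  # what the server stores
    return raw, digest

def verify(raw, stored_digest):
    return hashlib.sha256(raw.encode()).hexdigest() == stored_digest

raw, digest = issue_key()
print(verify(raw, digest))   # the real key verifies
print(verify("bad", digest)) # anything else does not
```

Because only the digest is persisted, a lost key can only be replaced, never recovered.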
```bash
# Replace YOUR_KEY with the key from Step 2
curl -X POST "http://localhost:8000/chat" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is Project Coconut?", "session_id": "init_test"}'
```

Project Coconut's S-Tier Engine is built on the industry-standard AutoModelForCausalLM and AutoTokenizer classes from Hugging Face Transformers. This means you can swap the "Brain" for almost any model on Hugging Face simply by changing MODEL_ID.
- Mistral / Mixtral: `mistralai/Mistral-7B-Instruct-v0.2`
- Llama 3 / 2: `meta-llama/Meta-Llama-3-8B-Instruct`
- Gemma: `google/gemma-7b-it`
- Falcon: `tiiuae/falcon-7b-instruct`
- GPT-2 / Neo: `gpt2` (great for low-resource testing)
The harness uses apply_chat_template, which automatically detects and applies the correct prompt format (System/User/Assistant) for whichever model you choose. No manual prompt engineering required!
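Conceptually, a chat template renders a role-tagged message list into the exact prompt string a given model was trained on. The stand-in below illustrates the idea with generic tags; it is NOT any model's real template (use the tokenizer's own `apply_chat_template` in practice).

```python
# Illustrative stand-in for what apply_chat_template does: render a
# message list into a role-tagged prompt. The <|role|> tags are generic
# placeholders, not the template of any specific model.
def render_chat(messages, add_generation_prompt=True):
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    if add_generation_prompt:
        parts.append("<|assistant|>\n")  # cue the model to respond
    return "\n".join(parts)

prompt = render_chat([
    {"role": "system", "content": "You are Project Coconut."},
    {"role": "user", "content": "What can you do?"},
])
print(prompt)
```

The real method does the same job per model: Mistral gets `[INST]` markers, Llama 3 gets its header tokens, and so on, without you writing any of it by hand.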
Tip
Resource Planning: Large models (7B+ parameters) require significant VRAM/RAM. Ensure your server has 16GB+ RAM for 7B models on CPU, or 24GB+ VRAM for GPU deployment.
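The guidance above can be sanity-checked with a back-of-envelope estimate: weight memory is roughly parameter count times bytes per parameter. Treat the results as lower bounds, since the KV cache, activations, and runtime overhead come on top.

```python
# Back-of-envelope weight memory: parameter count x bytes per parameter.
# Real usage is higher (KV cache, activations, runtime overhead), so treat
# these figures as lower bounds when sizing a server.
def weight_gib(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1024**3

SEVEN_B = 7_000_000_000
print(f"FP32 (CPU fallback): {weight_gib(SEVEN_B, 4):.1f} GiB")
print(f"FP16 (GPU):          {weight_gib(SEVEN_B, 2):.1f} GiB")
print(f"4-bit (quantized):   {weight_gib(SEVEN_B, 0.5):.1f} GiB")
```

This is why a 7B model wants 16GB+ RAM in FP32 on CPU, fits a 24GB GPU comfortably in FP16, and squeezes onto consumer cards once 4-bit quantization is enabled.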
If you are using the image omdeep22/coconut_can without the full repository, use these commands to control the engine.
```bash
# Start a Redis dependency first
docker run -d --name redis-brain -p 6380:6379 redis/redis-stack:latest

# Launch the Coconut Can
docker run -d \
  --name coconut-engine \
  -p 8000:8000 \
  -e REDIS_HOST=host.docker.internal \
  -e REDIS_PORT=6380 \
  omdeep22/coconut_can:latest

# Run ingestion inside the active container
docker exec -it coconut-engine python3 ingest.py "The secret verification code is COCO-99."
```

For GPU acceleration:

```bash
docker run -d \
  --name coconut-gpu \
  --gpus all \
  -p 8000:8000 \
  -e DEVICE=cuda \
  omdeep22/coconut_can:latest
```

| Task | Command |
|---|---|
| Follow Live AI Logic | `docker logs -f coconut-engine` |
| Check Resource Usage | `docker stats coconut-engine` |
| Enter Shell (Debug) | `docker exec -it coconut-engine bash` |
| Inspect Env Config | `docker inspect coconut-engine` |
| Reset Knowledge | `docker exec redis-brain redis-cli FT.DROPINDEX coconut_idx` |
| Clear Logic Cache | `docker exec redis-brain redis-cli FT.DROPINDEX coconut_cache_idx` |
| Debug Auth Keys | `docker exec redis-brain redis-cli KEYS "api_key:*"` |
| Clean Deep Exit | `docker system prune -a --volumes` |
Project Coconut is designed to be modified without changing code. You can inject these as environment variables (-e).
To swap the "Brain", set the MODEL_ID:
```bash
docker run -e MODEL_ID=gpt2 omdeep22/coconut_can:latest
```

- `ENABLE_RAG=False`: Disables knowledge lookup.
- `ENABLE_CACHE=False`: Disables semantic reuse (forces fresh AI generation every time).
- `ENABLE_MEMORY=False`: Disables session history.
- `CACHE_THRESHOLD=0.90`: Makes the semantic cache harder to hit (for more precise matches).
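How such env-driven toggles are typically consumed inside the service can be sketched as follows; the variable names match this README, but the parsing helper is an assumption, not the project's actual code.

```python
# Hedged sketch: reading boolean toggles and a float threshold from env
# vars. Variable names match the README; the helper is an assumption.
import os

def env_flag(name, default=True):
    # Treat "False"/"0"/"no" (any case) as off; anything else as on.
    return os.environ.get(name, str(default)).strip().lower() not in ("false", "0", "no")

os.environ["ENABLE_CACHE"] = "False"  # simulate `docker run -e ENABLE_CACHE=False`

ENABLE_RAG = env_flag("ENABLE_RAG")       # unset -> defaults to True
ENABLE_CACHE = env_flag("ENABLE_CACHE")   # "False" -> off
CACHE_THRESHOLD = float(os.environ.get("CACHE_THRESHOLD", "0.85"))

print(ENABLE_RAG, ENABLE_CACHE, CACHE_THRESHOLD)
```

The point of this pattern is that behavior changes at `docker run` time, with no code edits or rebuilds.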
Run 7B+ models on consumer GPUs (T4, 3060) by enabling quantization:
```bash
docker run -e LOAD_IN_4BIT=True omdeep22/coconut_can
```

Use the new streaming endpoint for a high-end, typing-effect UI:
```bash
curl -X POST "http://localhost:8000/chat/stream" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello Gonyai!"}'
```

For hyper-scale deployments, use the provided Kubernetes manifests to orchestrate a resilient cluster.
Apply all manifests in the k8s/ directory to launch the API, Redis cluster, and Auto-scaler:
```bash
kubectl apply -f k8s/
```

If traffic spikes beyond the HPA's reaction time, scale manually:

```bash
kubectl scale deployment coconut-api --replicas=50
```

- Pods Status: `kubectl get pods -l app=coconut`
- Auto-scaling events: `kubectl get hpa coconut-hpa`
- Service URL: `kubectl get svc coconut-service`
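For intuition on when the HPA will scale you before a manual override is needed: Kubernetes sizes replicas with a documented ratio formula, sketched here (the example metric values are hypothetical).

```python
# The Kubernetes HPA computes, per its documentation:
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
import math

def hpa_desired(current_replicas, current_metric, target_metric):
    return math.ceil(current_replicas * current_metric / target_metric)

# Hypothetical: 10 pods averaging 850m CPU against a 500m target
print(hpa_desired(10, 850, 500))  # scales up to 17 replicas
```

If traffic outruns the controller's sync period, `kubectl scale` (above) jumps straight to the replica count you need.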
Use these essential commands to manage and secure your Gonyai Engine deployment.
Run the AI engine with a secure Admin Root Key and 4-bit Quantization.
```bash
docker run -d \
  -p 8000:8000 \
  -e ADMIN_ROOT_KEY="your_master_key_2026" \
  -e LOAD_IN_4BIT=True \
  -e DEVICE=cuda \
  omdeep22/coconut_can:latest
```

```bash
# 1. Apply Secrets (Edit k8s/secrets.yaml first)
kubectl apply -f k8s/secrets.yaml

# 2. Launch the Stack
kubectl apply -f k8s/
```

Since the API is secured, use your ADMIN_ROOT_KEY to generate keys for your users.
```bash
curl -X POST "http://localhost:8000/generate-key?tier=pro" \
  -H "X-API-KEY: your_master_key_2026"
```

Monitor the "Brain" and track real-time cache hits/latency.
- Readiness: `curl http://localhost:8000/health/ready`
- Metrics: `curl http://localhost:8000/metrics`
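If the `/metrics` endpoint emits Prometheus-style text (an assumption; the metric names below are invented for illustration), it can be consumed with a few lines of parsing:

```python
# Hedged sketch: parsing a Prometheus-style /metrics payload.
# The metric names here are invented for illustration.
sample = """\
# HELP coconut_cache_hits_total Semantic cache hits
coconut_cache_hits_total 42
coconut_request_latency_seconds 0.8
"""

def parse_metrics(text):
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics

print(parse_metrics(sample))
```

In production you would point a real Prometheus scraper at the endpoint instead, but a quick parser like this is handy for smoke tests.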
```bash
curl -N -X POST "http://localhost:8000/chat/stream" \
  -H "X-API-KEY: user_api_key" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Give me a 5-step plan for global scale."}'
```