A FastAPI-based service for running inference with DeepSeek language models. This API provides a simple interface for text generation using DeepSeek's 7B model.
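The sketch below shows roughly what the generation endpoint could look like. It is illustrative only: the route name, request schema, and model ID are assumptions, and the actual implementation lives in src/main.py.

```python
# Illustrative sketch - endpoint name, schema, and model ID are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/deepseek-llm-7b-chat"  # assumed model repo; adjust as needed

app = FastAPI(title="DeepSeek Inference API")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    # Tokenize the prompt, run generation, and return the decoded completion
    inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"text": tokenizer.decode(output[0], skip_special_tokens=True)}
```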
Why Cloud Run?
Cloud Run offers several advantages for deploying AI inference APIs, especially for early-stage projects and startups:
- Pay-per-use pricing: Only pay for actual compute time used, ideal for sporadic workloads
- Auto-scaling: Scales to zero when not in use, perfect for development and testing
- Cost efficiency: No need to maintain constantly running instances
- Serverless: Focus on code, not infrastructure
- GPU support: Access to serverless GPUs (NVIDIA L4) without long-term commitments
- Quick deployment: From code to production in minutes
Cost Optimization
- Zero cost when the service is idle
- Perfect for development and testing phases
- No minimum monthly commitments
Development Flexibility
- Easy A/B testing of different models
- Quick iteration and deployment
- Simple rollback capabilities
Security & Control
- Self-hosted solution reduces dependency on third-party services
- Protection against service disruptions
- Full control over model versions and updates
Scalability
- Handles traffic spikes automatically
- Scales down to zero during quiet periods
- No infrastructure management overhead
Features
- FastAPI-based REST API
- Support for DeepSeek models
- Environment-based configuration (see the sketch after this list)
- Token-based authentication with Hugging Face
- Docker support for containerization
- GPU acceleration support
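The environment-based configuration and Hugging Face token handling listed above might look roughly like this sketch (the variable names HF_TOKEN, MODEL_ID, and MAX_NEW_TOKENS are illustrative assumptions, not a fixed contract):

```python
# config.py - illustrative sketch; environment variable names are assumptions
import os
from dataclasses import dataclass

@dataclass
class Settings:
    hf_token: str | None = os.environ.get("HF_TOKEN")  # Hugging Face auth token
    model_id: str = os.environ.get("MODEL_ID", "deepseek-ai/deepseek-llm-7b-chat")
    max_new_tokens: int = int(os.environ.get("MAX_NEW_TOKENS", "256"))

settings = Settings()
```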
Prerequisites
- Python 3.11+
- Hugging Face account and API token
- GPU support (recommended)
- pip or another Python package manager
Set Concurrency
- Adjust request concurrency to optimize resource usage
- Example: --concurrency 80
Memory/CPU Allocation
- Start with minimal resources
- Scale up based on actual usage patterns
Monitoring
- Use Cloud Monitoring to track usage
- Set up alerts for unusual patterns
Build-Time Model Download
This approach (sketched below):
- Downloads model during build time
- Caches the model in the image
- Uses local files at runtime
- No need for token at runtime
- Faster container startup
Benefits:
- Faster cold starts
- No runtime downloads
- More reliable
- Works in airgapped environments
- Better for production
The tradeoff is a larger container image, but the runtime benefits usually outweigh this.
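One way to implement the build-time download is a small script run from the Dockerfile; the sketch below uses huggingface_hub and assumes the token arrives as an HF_TOKEN build argument. The script name, model ID, and target directory are assumptions.

```python
# download_model.py - hypothetical build-time script, e.g. invoked by
# `RUN python download_model.py` in the Dockerfile so the weights are baked into the image.
import os

from huggingface_hub import snapshot_download

MODEL_ID = os.environ.get("MODEL_ID", "deepseek-ai/deepseek-llm-7b-chat")  # assumed repo
LOCAL_DIR = os.environ.get("MODEL_DIR", "/app/models/deepseek-7b")         # assumed path

if __name__ == "__main__":
    # The token is only needed here, at build time; the runtime container loads local files.
    snapshot_download(
        repo_id=MODEL_ID,
        local_dir=LOCAL_DIR,
        token=os.environ.get("HF_TOKEN"),
    )
    print(f"Downloaded {MODEL_ID} to {LOCAL_DIR}")
```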
Getting Started
- Clone or fork the repository
```bash
# Set up a virtual environment
python3 -m venv venv

# Activate the environment
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run the FastAPI application (development)
uvicorn src.main:app --reload --port 8000
```
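With the dev server running, you can exercise it with a quick client call like the one below (the /generate path and payload shape are assumptions based on the sketch above; match them to the actual routes in src/main.py):

```python
# Quick local smoke test; endpoint path and payload shape are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Explain serverless GPUs in one sentence.", "max_new_tokens": 64},
    timeout=120,  # generation on a 7B model can take a while
)
resp.raise_for_status()
print(resp.json())
```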
```bash
# Build the image (assumes the Dockerfile accepts the Hugging Face token as a build argument)
docker build --build-arg HF_TOKEN=$HF_TOKEN -t deepseek-inference-api .

# Run the container (map the port your Dockerfile exposes)
docker run -p 8000:8000 deepseek-inference-api
```
Deploy to Cloud Run using one of these two methods:
```bash
# Option 1: build the image using Cloud Build
gcloud builds submit --config cloudbuild.yaml

# Then deploy the built image to Cloud Run
gcloud run deploy deepseek-service \
  --image gcr.io/$PROJECT_ID/deepseek-inference-api \
  --region us-central1 \
  --platform managed \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --memory 16Gi \
  --cpu 4 \
  --allow-unauthenticated
```
```bash
# Option 2: deploy directly from source (Cloud Build builds the image for you)
gcloud run deploy deepseek-service \
  --source . \
  --region us-central1 \
  --platform managed \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --memory 16Gi \
  --cpu 4 \
  --allow-unauthenticated
```
Alternatively, deploy from the Google Cloud console:
- Go to Cloud Run in the console
- Create a new Cloud Run service
- Connect it to Cloud Build and the GitHub repository containing this code
- Deploy with the Dockerfile
- Ensure the service uses the recommended serverless GPU configuration shown above (GPU, 16Gi memory, 4 CPUs)
Testing
Run pytest to execute the unit and integration tests:
```bash
pytest
```
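A test in this suite might look roughly like the following, assuming a /generate endpoint and FastAPI's TestClient; the real tests live in the repository and may mock the model so they run without GPU access or the full 7B weights.

```python
# test_api.py - illustrative sketch; endpoint name and response shape are assumptions
from fastapi.testclient import TestClient

from src.main import app

client = TestClient(app)

def test_generate_returns_text():
    # Hypothetical request/response shape; a real test would likely mock generation
    response = client.post("/generate", json={"prompt": "Hello", "max_new_tokens": 8})
    assert response.status_code == 200
    assert "text" in response.json()
```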