Skip to content

mnfst/awesome-free-llm-apis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

17 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation


Awesome Free LLM APIs

Awesome

LLM APIs with permanent free tiers for text inference.



Contents

Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

  • Cohere πŸ‡ΊπŸ‡Έ - Command A, Command R+, Aya Expanse 32B +9 more. 20 RPM, 1K/mo.
  • Google Gemini πŸ‡ΊπŸ‡Έ - Gemini 2.5 Pro, Flash, Flash-Lite +4 more. 5-15 RPM, 100-1K RPD. 1
  • Mistral AI πŸ‡ͺπŸ‡Ί - Mistral Large 3, Small 3.1, Ministral 8B +3 more. 1 req/s, 1B tok/mo.
  • Zhipu AI πŸ‡¨πŸ‡³ - GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash. Limits undocumented.

Inference providers

Third-party platforms that host open-weight models from various sources.

  • Cerebras πŸ‡ΊπŸ‡Έ - Llama 3.3 70B, Qwen3 235B, GPT-OSS-120B +3 more. 30 RPM, 14,400 RPD.
  • Cloudflare Workers AI πŸ‡ΊπŸ‡Έ - Llama 3.3 70B, Qwen QwQ 32B +47 more. 10K neurons/day.
  • GitHub Models πŸ‡ΊπŸ‡Έ - GPT-4o, Llama 3.3 70B, DeepSeek-R1 +more. 10-15 RPM, 50-150 RPD.
  • Groq πŸ‡ΊπŸ‡Έ - Llama 3.3 70B, Llama 4 Scout, Kimi K2 +17 more. 30 RPM, 1K RPD (14,400 for Llama 3.1 8B). 2
  • Hugging Face πŸ‡ΊπŸ‡Έ - Llama 3.3 70B, Qwen2.5 72B, Mistral 7B +many more. $0.10/mo in free credits.
  • Kluster AI πŸ‡ΊπŸ‡Έ - DeepSeek-R1, Llama 4 Maverick, Qwen3-235B +2 more. Limits undocumented.
  • LLM7.io πŸ‡¬πŸ‡§ - DeepSeek R1, Flash-Lite, Qwen2.5 Coder +27 more. 30 RPM (120 with token).
  • NVIDIA NIM πŸ‡ΊπŸ‡Έ - Llama 3.3 70B, Mistral Large, Qwen3 235B +more. 40 RPM.
  • Ollama Cloud πŸ‡ΊπŸ‡Έ - DeepSeek-V3.2, Qwen3.5, Kimi-K2.5 +17 more. 1 concurrent model, light usage. 3
  • OpenRouter πŸ‡ΊπŸ‡Έ - DeepSeek R1, Llama 3.3 70B, GPT-OSS-120B +29 more. 20 RPM, 50 RPD (1K with $10+ in purchased credits). 4
  • SiliconFlow πŸ‡¨πŸ‡³ - Qwen3-8B, DeepSeek-R1-Distill-Qwen-7B, GLM-4.1V-9B-Thinking +10 more. 1K RPM, 50K TPM.

Contributing

Know a free tier that's missing? Open a PR. Include the provider, endpoint, rate limits (link to their docs), and a few notable models. Trial credits and time-limited promos don't count.

Footnotes

  • RPM -- requests per minute. RPD -- requests per day.
  • "Limits undocumented" means the provider doesn't publish their rate limits.
  • All endpoints are OpenAI SDK-compatible unless noted.
  • Each link points to the provider's API key page.

Footnotes

  1. Free tier not available in the EU, UK, or Switzerland (available regions). ↩

  2. 14,400 RPD only applies to Llama 3.1 8B Instant. Most other models (Llama 3.3 70B, Llama 4 Scout, Kimi K2, etc.) are limited to 1,000 RPD (rate limits). ↩

  3. Ollama Cloud measures usage by GPU time, not tokens or requests. Free tier described as "light usage" with session limits resetting every 5 hours and weekly limits every 7 days. Pro (50x more) and Max (250x more) plans available. Not OpenAI SDK-compatible; uses Ollama API. ↩

  4. Free models default to 50 RPD. A one-time purchase of $10+ in credits unlocks 1,000 RPD for free models. OpenRouter also offers a Free Models Router (openrouter/free) and model fallbacks for chaining models in priority order. ↩

Releases

No releases published

Packages

 
 
 

Contributors