Skip to content

Add Ministral/Mistral model implementation#5

Merged
dfalbel merged 1 commit intomainfrom
add-ministral
Feb 9, 2026
Merged

Add Ministral/Mistral model implementation#5
dfalbel merged 1 commit intomainfrom
add-ministral

Conversation

@dfalbel
Copy link
Member

@dfalbel dfalbel commented Feb 9, 2026

Summary

  • Implements Ministral-style models with YaRN RoPE and GQA (Grouped Query Attention)
  • Supports both standard Mistral models (e.g., mistralai/Mistral-7B-v0.1) and multimodal Ministral models
  • Verified against HuggingFace transformers with max diff ~6e-7

Features

  • YaRN RoPE: Extended context support with factor, beta_fast, beta_slow, mscale parameters
  • GQA: Configurable n_head vs n_kv_head for grouped query attention
  • SwiGLU MLP: Gate/up/down projections with SiLU activation
  • RMSNorm: Pre-normalization

API

# Load pretrained model
model <- ministral_from_pretrained("mistralai/Mistral-7B-v0.1")

# Or create with custom config
model <- ministral(vocab_size = 32000, n_embd = 4096, ...)

Test plan

  • Verified logits match Python transformers output (position 0: exact, position 2: max diff 0.005)
  • Test loading pretrained Mistral-7B
  • Test custom config model creation
  • Test text generation with streaming

🤖 Generated with Claude Code

Implements Ministral-style models with:
- YaRN RoPE (Yet another RoPE extension) for extended context
- GQA (Grouped Query Attention) with configurable num_key_value_heads
- SwiGLU MLP with SiLU activation
- RMSNorm

Verified against HuggingFace transformers MistralForCausalLM with
max diff ~6e-7 (floating point precision).

Includes tests for:
- Loading pretrained Mistral-7B and comparing logits
- Creating models with custom config
- Text generation with streaming output

Co-Authored-By: Claude <noreply@anthropic.com>
@dfalbel dfalbel merged commit 655d881 into main Feb 9, 2026
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant