Releases: Player9753193/CrystalLM
CrystalLM
🚀 Vocabulary-Governed GRU Release
This release marks a structural stabilization milestone for CrystalLM.
✨ Highlights
- **GRU-based word-level language model**
  - Replaced earlier experimental RNN variants with a stable multi-layer GRU
  - Improved training stability and long-context behavior
- **Vocabulary governance refined**
  - Explicit low-frequency pruning
  - More controlled `<UNK>` mapping
  - Reduced vocabulary noise and improved generation coherence
- **Generation behavior improved**
  - Fixed premature `<END>` termination
  - Temperature sampling behavior is now consistent and predictable
- **Training workflow hardened**
  - More stable loss convergence
  - CPU-friendly training path (verified on an Intel Mac)
  - Checkpoint reuse without retraining
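The low-frequency pruning and `<UNK>` mapping above can be sketched with Python's standard library. This is a minimal illustration, not CrystalLM's actual code; the threshold value and function names are assumptions:

```python
from collections import Counter

UNK = "<UNK>"  # assumed placeholder token name

def build_vocab(tokens, min_freq=2):
    """Keep only tokens seen at least min_freq times; everything else maps to <UNK>."""
    counts = Counter(tokens)
    vocab = [UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {t: i for i, t in enumerate(vocab)}

def encode(tokens, token_to_id):
    """Map each token to its id, falling back to <UNK> for pruned or unseen tokens."""
    unk_id = token_to_id[UNK]
    return [token_to_id.get(t, unk_id) for t in tokens]

corpus = "the cat sat on the mat the cat".split()
vocab = build_vocab(corpus, min_freq=2)   # "sat", "on", "mat" are pruned
ids = encode("the dog sat".split(), vocab)
```

Pruning before id assignment is what keeps rare noise out of the embedding table while still giving every input a well-defined id.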
🎯 Scope & Intent
This release prioritizes clarity, correctness, and observability over scale.
It represents a solid educational baseline for understanding word-level language modeling before moving to Transformers.
CrystalLM
Overview
This release marks a major milestone in the training stability and learning capacity of the model. With extended training (10 epochs) on a larger Chinese corpus, the model now demonstrates significantly improved convergence and basic text generation ability.
Loss Progression
The model shows a clear and healthy downward trend in loss:
| Epoch | Loss |
|---|---|
| 0 | 5.8614 |
| 1 | 4.9849 |
| 2 | 4.3866 |
| 3 | 3.8325 |
| 4 | 3.3136 |
| 5 | 2.8286 |
| 6 | 2.3916 |
| 7 | 1.9992 |
| 8 | 1.6587 |
| 9 | 1.3666 |
This indicates effective learning with no signs of divergence or early overfitting.
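Those losses are easier to interpret as perplexity (e^loss, assuming the reported loss is mean per-token cross-entropy in nats), which is sketched below:

```python
import math

# Epoch losses from the table above.
losses = [5.8614, 4.9849, 4.3866, 3.8325, 3.3136,
          2.8286, 2.3916, 1.9992, 1.6587, 1.3666]

# Perplexity = exp(cross-entropy); lower means the model is less "surprised"
# by the next token. A monotonically falling perplexity mirrors healthy training.
perplexities = [math.exp(l) for l in losses]
```

By epoch 9 the model is choosing among roughly four effective candidates per token instead of several hundred at epoch 0.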
Improvements
- Substantial reduction in training loss across all epochs
- More stable token prediction behavior
- The model begins to reproduce sentence-like structures instead of random noise
- Improved handling of common vocabulary and punctuation
Known Limitations
- Generated text may still contain semantic incoherence
- Long-range context is not reliably preserved
- Occasional token repetition or malformed phrases
- Output quality varies depending on prompt length and structure
Model Status
- ✅ Training completed successfully
- ✅ Model checkpoint saved
- 🧪 Still in experimental / research phase
Next Steps
- Introduce sequence-aware architectures (e.g., RNN/LSTM or Transformer-style attention)
- Improve context window handling
- Expand and clean training data
- Add validation metrics beyond training loss
- Experiment with sampling strategies (temperature, top-k, top-p)
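The sampling strategies in the last bullet can be prototyped without any ML framework. This sketch applies temperature scaling and top-k filtering to a toy logit vector; names and values are illustrative, not CrystalLM's API:

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None):
    """Sample an index from logits after temperature scaling and optional top-k filtering."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Keep only the k largest logits; mask the rest out.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax (subtract max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.1, -3.0]
idx = sample(logits, temperature=0.8, top_k=2)  # only indices 0 and 1 survive top-k
```

Lower temperatures sharpen the distribution toward the argmax; top-k (and its cousin top-p) cap how far into the tail sampling can reach.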
📊 Training Snapshot
```text
Prefix dict has been built successfully.
Word count: 147719
Total training text length (characters): 236696
Training text line count: 979
Unique characters in training text (vocab size): 3463
Vocabulary size: 5602
epoch 0, loss 5.8614, [10:58:50]
epoch 1, loss 4.9849, [11:03:26]
epoch 2, loss 4.3866, [11:07:30]
epoch 3, loss 3.8325, [11:10:58]
epoch 4, loss 3.3136, [11:14:39]
epoch 5, loss 2.8286, [11:18:40]
epoch 6, loss 2.3916, [11:22:22]
epoch 7, loss 1.9992, [11:25:47]
epoch 8, loss 1.6587, [11:29:19]
epoch 9, loss 1.3666, [11:32:43]
✅ Model saved
```

Sample generation (still largely incoherent, as expected at this stage):

> 你收拾出生黑曜石…一马时间……挖是备份真实的墙,比如你区域体会与一百二十八系统
CrystalLM
This release marks the final iteration of CrystalLM using a single-layer GRU architecture.
The focus of this version is stability, memory efficiency, and clarity of the training pipeline, before moving on to deeper or more advanced models.
✨ Highlights
- **Single-layer GRU architecture**
  - Simple, interpretable, and stable
  - Serves as a clean baseline for future multi-layer or Transformer-based experiments
- **Significantly reduced memory usage**
  - Training memory footprint reduced from ~16 GB to ~350 MB
  - Enables training on consumer-grade hardware
- **Expanded training corpus**
  - ~125k tokens
  - ~200k characters
  - Vocabulary size: ~14k
  - Improved output diversity compared to earlier versions
- **Stable convergence**
  - Loss consistently drops from ~6.6 to ~2.6 within a few epochs
  - No exploding or vanishing gradients observed
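The "no exploding gradients" observation is often enforced rather than hoped for. A minimal sketch of global-norm gradient clipping, the standard recipe for this (an illustration, not necessarily what CrystalLM's trainer does):

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """Scale gradients down if their global L2 norm exceeds max_norm,
    the usual guard against exploding gradients in RNN training."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total

clipped, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)
```

In PyTorch the equivalent one-liner is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` called between `backward()` and `step()`.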
📊 Training Snapshot
```text
Prefix dict has been built successfully.
Word count: 125425
Total training text length (characters): 200945
Training text line count: 805
Unique characters in training text (vocab size): 3119
Vocabulary size: 14202
epoch 0, loss 6.6937, [23:18:20]
epoch 1, loss 5.4224, [23:23:00]
epoch 2, loss 4.3564, [23:27:31]
epoch 3, loss 3.3981, [23:30:59]
epoch 4, loss 2.6177, [23:33:42]
✅ Model saved
```

Sample generation (still largely incoherent, as expected at this stage):

> 你适应力很小安眠,从最高点垂直向下喜欢它,从强大的身份到一些自然松弛配额,便会面临、精确303才是自省的能力,和对现有的责任是影响开阔的命题。
CrystalLM
🚀 Performance & Training Improvements
This release focuses on dramatically reducing memory usage during training and improving dataset scale, making CrystalLM usable on low-memory machines.
✨ What’s new
- **💾 Massive memory optimization**
  - Replaced full in-memory training tensor construction with `Dataset` + `DataLoader`
  - Training RAM usage reduced from ~16 GB → ~250 MB
  - Enables smooth training on older devices (e.g. an 8 GB RAM MacBook Pro with an i5-7360U)
- **📚 Larger training corpus**
  - Increased total training text size and token count
  - Improved language continuity and semantic richness in generation
- **⚙️ Training stability**
  - Mini-batch training instead of a full-batch forward pass
  - Faster iteration speed and more predictable loss curves
🧠 Model
- Word-level language model
- Embedding + LSTM architecture (unchanged)
- Context window: 20 tokens
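The memory saving comes from materializing each 20-token window on demand instead of building one giant tensor up front. A framework-agnostic sketch of that idea (a PyTorch `Dataset` would wrap the same indexing logic in `__getitem__`; names here are illustrative):

```python
import random

CONTEXT = 20  # context window in tokens, as listed above

def iter_batches(token_ids, batch_size, rng=random):
    """Yield (inputs, targets) mini-batches of CONTEXT-token windows,
    building each small batch on the fly instead of one huge tensor."""
    starts = list(range(len(token_ids) - CONTEXT))
    rng.shuffle(starts)  # shuffle window positions, not the corpus itself
    for i in range(0, len(starts), batch_size):
        chunk = starts[i:i + batch_size]
        inputs = [token_ids[s:s + CONTEXT] for s in chunk]
        targets = [token_ids[s + CONTEXT] for s in chunk]  # next-token target
        yield inputs, targets

token_ids = list(range(100))  # stand-in for an encoded corpus
batches = list(iter_batches(token_ids, batch_size=8))
```

Peak memory is now one mini-batch of windows rather than every window at once, which is why RAM drops from gigabytes to megabytes.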
📦 Notes
- Model quality improves noticeably with additional text
- This version lays the foundation for future architecture upgrades (e.g. GRU / Transformer)
CrystalLM
CrystalLM is a micro language model implemented from scratch using PyTorch.
It is designed as a learning-oriented project for exploring tokenization,
vocabulary construction, and text generation on Chinese / mixed-language data.
Current Status
- Word-level text generation
- Custom tokenizer and data pipeline
- The model is still at an early stage and may produce `<UNK>` tokens
Purpose
This project focuses on understanding:
- Basic language model architectures
- Vocabulary, OOV handling, and context window design
- End-to-end training and inference workflows
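As a taste of that end-to-end workflow, this minimal pipeline builds a vocabulary, encodes text, and decodes it back, routing OOV words through `<UNK>`. It is illustrative only: it splits on whitespace, whereas Chinese text needs a real word segmenter, and the function names are assumptions:

```python
UNK = "<UNK>"  # assumed OOV placeholder name

def fit_vocab(text):
    """Build token<->id maps from whitespace-split text."""
    id_to_token = [UNK] + sorted(set(text.split()))
    token_to_id = {t: i for i, t in enumerate(id_to_token)}
    return token_to_id, id_to_token

def encode(text, token_to_id):
    return [token_to_id.get(t, token_to_id[UNK]) for t in text.split()]

def decode(ids, id_to_token):
    return " ".join(id_to_token[i] for i in ids)

token_to_id, id_to_token = fit_vocab("hello tiny language model")
ids = encode("hello big model", token_to_id)   # "big" is out-of-vocabulary
text = decode(ids, id_to_token)
```

The round trip makes the OOV behavior visible: unseen words survive encoding as `<UNK>` rather than crashing the pipeline.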
CrystalLM is a learning-oriented project and is not intended for production use.