Releases: Player9753193/CrystalLM
CrystalLM
🚀 Vocabulary-Governed GRU Release
This release marks a structural stabilization milestone for CrystalLM.
✨ Highlights
- **GRU-based word-level language model**
  - Replaced earlier experimental RNN variants with a stable multi-layer GRU
  - Improved training stability and long-context behavior
- **Vocabulary governance refined**
  - Explicit low-frequency pruning
  - More controlled `<UNK>` mapping
  - Reduced vocabulary noise and improved generation coherence
- **Generation behavior improved**
  - Fixed premature `<END>` termination
  - Temperature sampling behavior is now consistent and predictable
- **Training workflow hardened**
  - More stable loss convergence
  - CPU-friendly training path (verified on an Intel Mac)
  - Checkpoint reuse without retraining
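The low-frequency pruning and `<UNK>` mapping above can be sketched with Python's standard library. This is a minimal illustration, not CrystalLM's actual code; the threshold value and function names are assumptions:

```python
from collections import Counter

UNK = "<UNK>"  # assumed placeholder token name

def build_vocab(tokens, min_freq=2):
    """Keep only tokens seen at least min_freq times; everything else maps to <UNK>."""
    counts = Counter(tokens)
    vocab = [UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {t: i for i, t in enumerate(vocab)}

def encode(tokens, token_to_id):
    """Map each token to its id, falling back to <UNK> for pruned or unseen tokens."""
    unk_id = token_to_id[UNK]
    return [token_to_id.get(t, unk_id) for t in tokens]

corpus = "the cat sat on the mat the cat".split()
vocab = build_vocab(corpus, min_freq=2)   # "sat", "on", "mat" are pruned
ids = encode("the dog sat".split(), vocab)
```

Pruning before id assignment is what keeps rare noise out of the embedding table while still giving every input a well-defined id.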
🎯 Scope & Intent
This release prioritizes clarity, correctness, and observability over scale.
It represents a solid educational baseline for understanding word-level language modeling before moving to Transformers.
CrystalLM
Overview
This release marks a major milestone in the training stability and learning capacity of the model. With extended training (10 epochs) on a larger Chinese corpus, the model now demonstrates significantly improved convergence and basic text generation ability.
Loss Progression
The model shows a clear and healthy downward trend in loss:
| Epoch | Loss |
|---|---|
| 0 | 5.8614 |
| 1 | 4.9849 |
| 2 | 4.3866 |
| 3 | 3.8325 |
| 4 | 3.3136 |
| 5 | 2.8286 |
| 6 | 2.3916 |
| 7 | 1.9992 |
| 8 | 1.6587 |
| 9 | 1.3666 |
This indicates effective learning with no signs of divergence or early overfitting.
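Those losses are easier to interpret as perplexity (e^loss, assuming the reported loss is mean per-token cross-entropy in nats), which is sketched below:

```python
import math

# Epoch losses from the table above.
losses = [5.8614, 4.9849, 4.3866, 3.8325, 3.3136,
          2.8286, 2.3916, 1.9992, 1.6587, 1.3666]

# Perplexity = exp(cross-entropy); lower means the model is less "surprised"
# by the next token. A monotonically falling perplexity mirrors healthy training.
perplexities = [math.exp(l) for l in losses]
```

By epoch 9 the model is choosing among roughly four effective candidates per token instead of several hundred at epoch 0.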
Improvements
- Substantial reduction in training loss across all epochs
- More stable token prediction behavior
- The model begins to reproduce sentence-like structures instead of random noise
- Improved handling of common vocabulary and punctuation
Known Limitations
- Generated text may still contain semantic incoherence
- Long-range context is not reliably preserved
- Occasional token repetition or malformed phrases
- Output quality varies depending on prompt length and structure
Model Status
- ✅ Training completed successfully
- ✅ Model checkpoint saved
- 🧪 Still in experimental / research phase
Next Steps
- Introduce sequence-aware architectures (e.g., RNN/LSTM or Transformer-style attention)
- Improve context window handling
- Expand and clean training data
- Add validation metrics beyond training loss
- Experiment with sampling strategies (temperature, top-k, top-p)
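The sampling strategies in the last bullet can be prototyped without any ML framework. This sketch applies temperature scaling and top-k filtering to a toy logit vector; names and values are illustrative, not CrystalLM's API:

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None):
    """Sample an index from logits after temperature scaling and optional top-k filtering."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Keep only the k largest logits; mask the rest out.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax (subtract max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.1, -3.0]
idx = sample(logits, temperature=0.8, top_k=2)  # only indices 0 and 1 survive top-k
```

Lower temperatures sharpen the distribution toward the argmax; top-k (and its cousin top-p) cap how far into the tail sampling can reach.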
📊 Training Snapshot
```text
Prefix dict has been built successfully.
Word count: 147719
Total training text length (characters): 236696
Training text line count: 979
Unique characters in training text (vocab size): 3463
Vocabulary size: 5602
epoch 0, loss 5.8614, [10:58:50]
epoch 1, loss 4.9849, [11:03:26]
epoch 2, loss 4.3866, [11:07:30]
epoch 3, loss 3.8325, [11:10:58]
epoch 4, loss 3.3136, [11:14:39]
epoch 5, loss 2.8286, [11:18:40]
epoch 6, loss 2.3916, [11:22:22]
epoch 7, loss 1.9992, [11:25:47]
epoch 8, loss 1.6587, [11:29:19]
epoch 9, loss 1.3666, [11:32:43]
✅ Model saved
```

Sample generation (still largely incoherent, as expected at this stage):

> 你收拾出生黑曜石…一马时间……挖是备份真实的墙,比如你区域体会与一百二十八系统
CrystalLM
This release marks the final iteration of CrystalLM using a single-layer GRU architecture.
The focus of this version is stability, memory efficiency, and clarity of the training pipeline, before moving on to deeper or more advanced models.
✨ Highlights
- **Single-layer GRU architecture**
  - Simple, interpretable, and stable
  - Serves as a clean baseline for future multi-layer or Transformer-based experiments
- **Significantly reduced memory usage**
  - Training memory footprint reduced from ~16 GB to ~350 MB
  - Enables training on consumer-grade hardware
- **Expanded training corpus**
  - ~125k tokens
  - ~200k characters
  - Vocabulary size: ~14k
  - Improved output diversity compared to earlier versions
- **Stable convergence**
  - Loss consistently drops from ~6.6 to ~2.6 within a few epochs
  - No exploding or vanishing gradients observed
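The "no exploding gradients" observation is often enforced rather than hoped for. A minimal sketch of global-norm gradient clipping, the standard recipe for this (an illustration, not necessarily what CrystalLM's trainer does):

```python
import math

def clip_grad_norm(grads, max_norm=1.0):
    """Scale gradients down if their global L2 norm exceeds max_norm,
    the usual guard against exploding gradients in RNN training."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total

clipped, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)
```

In PyTorch the equivalent one-liner is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` called between `backward()` and `step()`.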
📊 Training Snapshot
```text
Prefix dict has been built successfully.
Word count: 125425
Total training text length (characters): 200945
Training text line count: 805
Unique characters in training text (vocab size): 3119
Vocabulary size: 14202
epoch 0, loss 6.6937, [23:18:20]
epoch 1, loss 5.4224, [23:23:00]
epoch 2, loss 4.3564, [23:27:31]
epoch 3, loss 3.3981, [23:30:59]
epoch 4, loss 2.6177, [23:33:42]
✅ Model saved
```

Sample generation (still largely incoherent, as expected at this stage):

> 你适应力很小安眠,从最高点垂直向下喜欢它,从强大的身份到一些自然松弛配额,便会面临、精确303才是自省的能力,和对现有的责任是影响开阔的命题。
CrystalLM
🚀 Performance & Training Improvements
This release focuses on dramatically reducing memory usage during training and improving dataset scale, making CrystalLM usable on low-memory machines.
✨ What’s new
- **💾 Massive memory optimization**
  - Replaced full in-memory training tensor construction with `Dataset` + `DataLoader`
  - Training RAM usage reduced from ~16 GB → ~250 MB
  - Enables smooth training on older devices (e.g. an 8 GB RAM MacBook Pro with an i5-7360U)
- **📚 Larger training corpus**
  - Increased total training text size and token count
  - Improved language continuity and semantic richness in generation
- **⚙️ Training stability**
  - Mini-batch training instead of a full-batch forward pass
  - Faster iteration speed and more predictable loss curves
🧠 Model
- Word-level language model
- Embedding + LSTM architecture (unchanged)
- Context window: 20 tokens
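The memory saving comes from materializing each 20-token window on demand instead of building one giant tensor up front. A framework-agnostic sketch of that idea (a PyTorch `Dataset` would wrap the same indexing logic in `__getitem__`; names here are illustrative):

```python
import random

CONTEXT = 20  # context window in tokens, as listed above

def iter_batches(token_ids, batch_size, rng=random):
    """Yield (inputs, targets) mini-batches of CONTEXT-token windows,
    building each small batch on the fly instead of one huge tensor."""
    starts = list(range(len(token_ids) - CONTEXT))
    rng.shuffle(starts)  # shuffle window positions, not the corpus itself
    for i in range(0, len(starts), batch_size):
        chunk = starts[i:i + batch_size]
        inputs = [token_ids[s:s + CONTEXT] for s in chunk]
        targets = [token_ids[s + CONTEXT] for s in chunk]  # next-token target
        yield inputs, targets

token_ids = list(range(100))  # stand-in for an encoded corpus
batches = list(iter_batches(token_ids, batch_size=8))
```

Peak memory is now one mini-batch of windows rather than every window at once, which is why RAM drops from gigabytes to megabytes.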
📦 Notes
- Model quality improves noticeably with additional text
- This version lays the foundation for future architecture upgrades (e.g. GRU / Transformer)
CrystalLM
CrystalLM is a micro language model implemented from scratch using PyTorch.
It is designed as a learning-oriented project for exploring tokenization,
vocabulary construction, and text generation on Chinese / mixed-language data.
Current Status
- Word-level text generation
- Custom tokenizer and data pipeline
- The model is still at an early stage and may produce `<UNK>` tokens
Purpose
This project focuses on understanding:
- Basic language model architectures
- Vocabulary, OOV handling, and context window design
- End-to-end training and inference workflows
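As a taste of that end-to-end workflow, this minimal pipeline builds a vocabulary, encodes text, and decodes it back, routing OOV words through `<UNK>`. It is illustrative only: it splits on whitespace, whereas Chinese text needs a real word segmenter, and the function names are assumptions:

```python
UNK = "<UNK>"  # assumed OOV placeholder name

def fit_vocab(text):
    """Build token<->id maps from whitespace-split text."""
    id_to_token = [UNK] + sorted(set(text.split()))
    token_to_id = {t: i for i, t in enumerate(id_to_token)}
    return token_to_id, id_to_token

def encode(text, token_to_id):
    return [token_to_id.get(t, token_to_id[UNK]) for t in text.split()]

def decode(ids, id_to_token):
    return " ".join(id_to_token[i] for i in ids)

token_to_id, id_to_token = fit_vocab("hello tiny language model")
ids = encode("hello big model", token_to_id)   # "big" is out-of-vocabulary
text = decode(ids, id_to_token)
```

The round trip makes the OOV behavior visible: unseen words survive encoding as `<UNK>` rather than crashing the pipeline.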
CrystalLM is a learning-oriented project and is not intended for production use.