
Releases: Player9753193/CrystalLM

CrystalLM

25 Jan 08:50
3c8185b


It has been a long time since my last update. It suddenly came to mind, so here is an update and a report on progress.

CrystalLM: A New Architecture

We have updated the architecture from GRU to GPT. An intermediate Transformer architecture was skipped for release because its results were unsatisfactory. Training and generation are now separated, and parameters are separated from the model, which makes them easier to control.

CrystalLM

15 Jan 10:39
43c5491


🚀 Vocabulary-Governed GRU Release

This release marks a structural stabilization milestone for CrystalLM.

✨ Highlights

  • GRU-based word-level language model

    • Replaced earlier experimental RNN variants with a stable multi-layer GRU
    • Improved training stability and long-context behavior
  • Vocabulary governance refined

    • Explicit low-frequency pruning
    • More controlled <UNK> mapping
    • Reduced vocabulary noise and improved generation coherence
  • Generation behavior improved

    • Fixed premature <END> termination
    • Temperature sampling behavior is now consistent and predictable
  • Training workflow hardened

    • More stable loss convergence
    • CPU-friendly training path (Intel Mac verified)
    • Checkpoint reuse without retraining
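The notes describe the multi-layer GRU word-level model but do not show its code. A minimal PyTorch sketch of that kind of model follows; the class name, embedding and hidden dimensions, and layer count are illustrative assumptions, not CrystalLM's actual implementation:

```python
import torch
import torch.nn as nn

class GRULanguageModel(nn.Module):
    """Word-level language model: embedding -> multi-layer GRU -> vocab logits."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        x = self.embed(tokens)              # (batch, seq, embed_dim)
        out, hidden = self.gru(x, hidden)   # (batch, seq, hidden_dim)
        logits = self.fc(out)               # (batch, seq, vocab_size)
        return logits, hidden

# Vocab size taken from the training snapshot below; batch and context are arbitrary.
model = GRULanguageModel(vocab_size=5602)
logits, _ = model(torch.randint(0, 5602, (4, 20)))
print(logits.shape)  # torch.Size([4, 20, 5602])
```

Keeping the GRU stateless between calls (passing `hidden=None`) matches plain windowed training; the returned hidden state can be threaded through for stateful generation.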

🎯 Scope & Intent

This release prioritizes clarity, correctness, and observability over scale.
It represents a solid educational baseline for understanding word-level language modeling before moving to Transformers.
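The vocabulary governance highlighted in this release (explicit low-frequency pruning, controlled <UNK> mapping) can be sketched in plain Python. The `min_freq` cutoff, function names, and special-token set below are illustrative assumptions, not the project's actual settings:

```python
from collections import Counter

def build_vocab(tokens, min_freq=2, specials=("<UNK>", "<END>")):
    """Keep only words seen at least min_freq times; prune the rest."""
    counts = Counter(tokens)
    kept = [w for w, c in counts.most_common() if c >= min_freq]
    return {w: i for i, w in enumerate(list(specials) + kept)}

def encode(tokens, vocab):
    """Map every out-of-vocabulary word to the single <UNK> id."""
    unk = vocab["<UNK>"]
    return [vocab.get(t, unk) for t in tokens]

corpus = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(corpus, min_freq=2)      # keeps only "the" and "cat"
print(encode("the dog sat".split(), vocab))  # [2, 0, 0]
```

Pruning before assigning ids is what reduces vocabulary noise: rare words never get their own embedding rows, so the model's capacity is spent on tokens it actually sees often.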

CrystalLM

10 Jan 03:59
a5ab1a7


Overview

This release marks a major milestone in the training stability and learning capacity of the model. With extended training (10 epochs) on a larger Chinese corpus, the model now demonstrates significantly improved convergence and basic text generation ability.

Loss Progression

The model shows a clear and healthy downward trend in loss:

Epoch Loss
0 5.8614
1 4.9849
2 4.3866
3 3.8325
4 3.3136
5 2.8286
6 2.3916
7 1.9992
8 1.6587
9 1.3666

This indicates effective learning with no signs of divergence or early overfitting.

Improvements

  • Substantial reduction in training loss across all epochs
  • More stable token prediction behavior
  • The model begins to reproduce sentence-like structures instead of random noise
  • Improved handling of common vocabulary and punctuation

Known Limitations

  • Generated text may still contain semantic incoherence
  • Long-range context is not reliably preserved
  • Occasional token repetition or malformed phrases
  • Output quality varies depending on prompt length and structure

Model Status

  • ✅ Training completed successfully
  • ✅ Model checkpoint saved
  • 🧪 Still in experimental / research phase

Next Steps

  • Introduce sequence-aware architectures (e.g., RNN/LSTM or Transformer-style attention)
  • Improve context window handling
  • Expand and clean training data
  • Add validation metrics beyond training loss
  • Experiment with sampling strategies (temperature, top-k, top-p)
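The sampling strategies listed as next steps can be sketched without any framework. `temperature` rescales logits before the softmax and `top_k` truncates the candidate set; the function below is an illustrative sketch, not CrystalLM's code:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index from raw logits with temperature and optional top-k."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Keep the k largest logits; mask the rest out of the candidate set.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]   # numerically stable softmax
    total = sum(exps)
    r = rng.random() * total
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r < acc:
            return i
    return len(exps) - 1

# With top_k=1 sampling degenerates to greedy decoding: always the argmax.
print(sample_next([0.1, 2.5, -1.0], top_k=1))  # 1
```

Top-p (nucleus) sampling follows the same shape, except the cutoff is chosen so the kept probabilities sum to at least `p` instead of keeping a fixed count.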

📊 Training Snapshot

Prefix dict has been built successfully.
Word count: 147719
Total training text length (characters): 236696
Total training text lines: 979
Unique characters in training text (vocab size): 3463
Vocabulary size: 5602
epoch 0, loss 5.8614, [10:58:50]
epoch 1, loss 4.9849, [11:03:26]
epoch 2, loss 4.3866, [11:07:30]
epoch 3, loss 3.8325, [11:10:58]
epoch 4, loss 3.3136, [11:14:39]
epoch 5, loss 2.8286, [11:18:40]
epoch 6, loss 2.3916, [11:22:22]
epoch 7, loss 1.9992, [11:25:47]
epoch 8, loss 1.6587, [11:29:19]
epoch 9, loss 1.3666, [11:32:43]
✅ Model saved
(sample generation) "You tidy up birth obsidian… one horse of time…… dig is a backup of the real wall, for example your region experience with the one-hundred-twenty-eight system"

CrystalLM

08 Jan 16:03
8156204


This release marks the final iteration of CrystalLM using a single-layer GRU architecture.
The focus of this version is stability, memory efficiency, and clarity of the training pipeline, before moving on to deeper or more advanced models.


✨ Highlights

  • Single-layer GRU architecture

    • Simple, interpretable, and stable
    • Serves as a clean baseline for future multi-layer or Transformer-based experiments
  • Significantly reduced memory usage

    • Training memory footprint reduced from ~16 GB to ~350 MB
    • Enables training on consumer-grade hardware
  • Expanded training corpus

    • ~125k tokens
    • ~200k characters
    • Vocabulary size: ~14k
    • Improved output diversity compared to earlier versions
  • Stable convergence

    • Loss consistently drops from ~6.6 to ~2.6 within a few epochs
    • No observed exploding or vanishing gradients

📊 Training Snapshot

Prefix dict has been built successfully.
Word count: 125425
Total training text length (characters): 200945
Total training text lines: 805
Unique characters in training text (vocab size): 3119
Vocabulary size: 14202
epoch 0, loss 6.6937, [23:18:20]
epoch 1, loss 5.4224, [23:23:00]
epoch 2, loss 4.3564, [23:27:31]
epoch 3, loss 3.3981, [23:30:59]
epoch 4, loss 2.6177, [23:33:42]
✅ Model saved
(sample generation) "Your adaptability is very small sleep, from the highest point vertically downward like it, from a powerful identity to some naturally relaxed quotas, you will then face, precision 303 is the capacity for introspection, and responsibility toward the existing is a proposition that affects openness."

CrystalLM

07 Jan 13:14
ba5046a


🚀 Performance & Training Improvements

This release focuses on dramatically reducing memory usage during training and improving dataset scale, making CrystalLM usable on low-memory machines.

✨ What’s new

  • 💾 Massive memory optimization

    • Replaced full in-memory training tensor construction with Dataset + DataLoader
    • Training RAM usage reduced from ~16 GB → ~250 MB
    • Enables smooth training on older devices (e.g. 8GB RAM MacBook Pro with i5-7360U)
  • 📚 Larger training corpus

    • Increased total training text size and token count
    • Improved language continuity and semantic richness in generation
  • ⚙️ Training stability

    • Mini-batch training instead of full-batch forward pass
    • Faster iteration speed and more predictable loss curves
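The Dataset + DataLoader change described above can be sketched as follows: instead of materializing every (context, target) pair up front as one giant tensor, the dataset holds the token-id sequence once and slices a window on demand. The class name and the 20-token window are illustrative (the window size mirrors the context window stated in these notes):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class WindowDataset(Dataset):
    """Holds one token-id tensor; each item is a (context, target) slice."""
    def __init__(self, token_ids, context=20):
        self.ids = torch.tensor(token_ids, dtype=torch.long)
        self.context = context

    def __len__(self):
        return len(self.ids) - self.context

    def __getitem__(self, i):
        # Slicing on demand avoids ever building the full training tensor,
        # which is where the multi-GB memory footprint came from.
        return self.ids[i : i + self.context], self.ids[i + self.context]

ds = WindowDataset(list(range(1000)), context=20)
loader = DataLoader(ds, batch_size=32, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([32, 20]) torch.Size([32])
```

Memory now scales with the corpus itself plus one mini-batch, rather than with corpus length times context width, which is consistent with the ~16 GB to ~250 MB reduction reported above.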

🧠 Model

  • Word-level language model
  • Embedding + LSTM architecture (unchanged)
  • Context window: 20 tokens

📦 Notes

  • Model quality improves noticeably with additional text
  • This version lays the foundation for future architecture upgrades (e.g. GRU / Transformer)

CrystalLM

05 Jan 15:03
ee99a5c


CrystalLM is a micro language model implemented from scratch using PyTorch.
It is designed as a learning-oriented project for exploring tokenization,
vocabulary construction, and text generation on Chinese / mixed-language data.

Current Status

  • Word-level text generation
  • Custom tokenizer and data pipeline
  • The model is still in an early stage and may produce <UNK> tokens

Purpose

This project focuses on understanding:

  • Basic language model architectures
  • Vocabulary, OOV handling, and context window design
  • End-to-end training and inference workflows

CrystalLM is a learning-oriented project and is not intended for production use.

CrystalLM

05 Jan 08:07


Publish New Model