
Commit c7b7ed4

Merge branch 'main' into akshay/data_prep_docs
2 parents: 2f38e4a + 53286a3


48 files changed: +1,111 additions, −851 deletions

.markdownlint.yaml

Lines changed: 0 additions & 35 deletions
This file was deleted.

.pre-commit-config.yaml

Lines changed: 0 additions & 4 deletions
```diff
@@ -48,7 +48,3 @@ repos:
         args:
           - "--config"
           - "./pyproject.toml"
-  - repo: https://github.com/markdownlint/markdownlint
-    rev: v0.11.0
-    hooks:
-      - id: markdownlint
```
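For reference, the removed stanza viewed as a standalone snippet of `.pre-commit-config.yaml` (indentation reconstructed from the diff above) is the markdownlint hook, presumably dropped because `.markdownlint.yaml` is deleted in this same commit:

```yaml
repos:
  # ... other hooks unchanged ...
  # Hook removed by this commit:
  - repo: https://github.com/markdownlint/markdownlint
    rev: v0.11.0
    hooks:
      - id: markdownlint
```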

docs/help.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -44,7 +44,7 @@ If you're the type who loves configurations and tweaking every detail, the [**Co
 
 We've got some excellent tutorials to help you get the most out of Fast-LLM:
 
-- [**Quick-Start Guide**](quick-start.md): Perfect for launching Fast-LLM on a single GPU machine. We walk you through setting up Docker, running your first training job, and handling common issues.
+- [**Quick-Start Guide**](quick-start.md): Perfect for launching Fast-LLM on a single GPU machine. We walk you through running your first training job (either locally or on a cluster), and handling common issues.
 
 - [**Cookbook**](recipes/train-llama-8b.md): Ready to go big? These recipes cover real-world scenarios like training big models from scratch, continuing training from a checkpoint, and more. This is where Fast-LLM really shows its power.
```
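The Quick-Start Guide referenced in this diff drives training through a YAML configuration and the `fast-llm` command line. As a rough, hypothetical sketch only — every key name below is an illustrative assumption, not Fast-LLM's documented schema; see the Quick-Start Guide for real examples:

```yaml
# Hypothetical single-GPU training config -- key names are illustrative
# assumptions, not Fast-LLM's documented schema.
model:
  base_model: gpt            # GPT-like architecture family
training:
  train_iters: 1000          # number of optimizer steps
batch:
  micro_batch_size: 4        # sequences per GPU per forward pass
  sequence_length: 8192      # tokens per sequence
data:
  path: /data/tokenized      # pre-tokenized dataset location
```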

docs/index.md

Lines changed: 8 additions & 10 deletions
```diff
@@ -10,24 +10,22 @@ Introducing **Fast-LLM**, the cutting-edge open-source library built for trainin
 
 ## Why Fast-LLM?
 
-Fast-LLM is designed for professionals who demand exceptional performance for efficient large-scale (FLOPS) language model training on GPUs. Fast-LLM integrates effortlessly into existing ML pipelines and goes beyond off-the-shelf commercial frameworks, like NVIDIA NeMo Megatron, to deliver a **robust, flexible, and high-performance open-source alternative**. Whether you're optimizing for speed, cost, or scalability, Fast-LLM helps you get the most out of your training infrastructure.
+Fast-LLM is designed for professionals who demand exceptional performance for efficient, large-scale language model training on GPUs, where maximizing FLOPS is key. Fast-LLM integrates effortlessly into existing ML pipelines and goes beyond off-the-shelf commercial frameworks to deliver a **robust, flexible, and high-performance open-source alternative**. Whether you're optimizing for speed, cost, or scalability, Fast-LLM helps you get the most out of your training infrastructure.
 
 ### The Fast-LLM Advantage
 
 Fast-LLM isn't just another library, **it's a platform for powering the next generation of AI breakthroughs**. Here's what sets it apart:
 
-- **🚀 Purpose-Built for Small- and Large-Scale AI:** Optimized specifically for training language models of all sizes, Fast-LLM excels from **small models around 1B parameters to massive clusters running 70B+ parameter models**, with kernels that are fine-tuned for maximum throughput across this entire range. At 10B-parameter scale, Fast-LLM avoids costly 3D-paralelism through memory optimization techniques such as ZeRO and activation recomputation, whereas at 100B-parameter scale, Fast-LLM optimally supports 3D-parallelism; making Fast-LLM the go-to choice for diverse training needs.
+- **🚀 Purpose-Built for Small- and Large-Scale AI:** Optimized specifically for training language models of all sizes, Fast-LLM excels from **small models around 1B parameters to massive clusters running 70B+ parameter models**, with kernels that are fine-tuned for maximum throughput across this entire range. At 10B-parameter scale, Fast-LLM avoids costly 3D-parallelism through memory optimization techniques such as ZeRO and activation recomputation, whereas at 100B-parameter scale, Fast-LLM optimally supports 3D-parallelism; making Fast-LLM the go-to choice for diverse training needs.
 
-- **🧠 Unified Support for GPT-Like Architectures:** Fast-LLM **unifies all GPT-like model implementations** in a [single Python file](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/models/gpt/model.py), and unlike HuggingFace transformers where every model has it's own, mostly independent, implementation, Fast-LLM reduces coding and adapts effortlessly, even with custom architectures.
+- **🧠 Unified Support for GPT-Like Architectures:** Fast-LLM streamlines the implementation of GPT-like models into a [single, unified module](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/models/gpt/model.py), significantly reducing redundancy and simplifying adaptation to custom architectures. This approach ensures consistency and flexibility while minimizing development overhead.
 
 - **💰 Cost Efficiency That Sets Fast-LLM Apart:**
 
-    - **Lower Training Costs:** With higher throughput per GPU, Fast-LLM reduces the training time required. For instance, training models can cheaper compared to other frameworks due to faster processing and better memory efficiency.
+    - **Lower Training Costs:** With higher throughput per GPU, Fast-LLM reduces the training time required. Training models can be cheaper compared to other frameworks due to faster processing and better memory efficiency.
 
     - **More Tokens for Your Budget:** Train on more tokens for the same budget, leading to better-trained models without breaking your financial constraints.
 
-<!-- [Learn more about Fast-LLM's cost efficiency and see detailed comparisons](cost-efficiency.md). -->
-
 - **🔓 Openness Without Compromise:** Fast-LLM's open-source approach ensures that you can **fully customize and extend the library** to fit your exact needs, without the restrictions of proprietary software. Developed transparently by a community of experts on GitHub, every change is **publicly discussed and vetted**, fostering **trust and collaboration** so you can innovate with confidence, knowing the entire development process and decision making is out in the open.
 
 - **🌍 Community-Driven Development:** Built by professionals for professionals, Fast-LLM's development is transparent, with an open invitation to the community to contribute. [**Join the Fast-LLM community**](join-us.md) to help shape the future of large-scale AI training.
```
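As a back-of-the-envelope check on the ZeRO claim in the hunk above: with mixed-precision Adam, model states (fp16 weights and gradients plus fp32 master weights, momentum, and variance) take roughly 16 bytes per parameter, per the ZeRO paper. Sharding those states across $N$ GPUs, a 10B-parameter model on a single 8-GPU node needs about

$$
M_{\text{states}} \approx \frac{16P}{N} = \frac{16 \times 10^{10}\,\text{bytes}}{8} = 20\,\text{GB per GPU},
$$

which fits on an 80 GB H100 alongside activations (themselves reduced by recomputation), so tensor and pipeline parallelism can indeed be avoided at that scale.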
```diff
@@ -36,15 +34,15 @@ Fast-LLM isn't just another library, **it's a platform for powering the next gen
 
 Fast-LLM offers all the capabilities you need to accelerate your LLM training and **push the boundaries of what's possible**:
 
-- **🚀 Speed Like No Other:** Achieve record-breaking training throughput with Fast-LLM. For instance, train Mistral-7B at **9,800 tokens/s/GPU** on a 4-node cluster with 32 H100 GPUs (batch size 32, sequence length 8k). Our optimized kernels, advanced parallelism, and memory-efficient techniques drastically reduce training time and cost.
+- **🚀 Speed Like No Other:** Achieve record-breaking training throughput with Fast-LLM. For instance, train Mistral-7B at **10,350 tokens/s/GPU** on a 4-node cluster with 32 H100 GPUs (batch size 64, sequence length 8k). Our optimized kernels, advanced parallelism, and memory-efficient techniques drastically reduce training time and cost.
 
 - **📡 Unmatched Scalability:** Seamlessly scale from a single GPU to large compute clusters. Fast-LLM supports 3D parallelism (data, tensor, and pipeline), sequence length parallelism, and ZeRO-1,2,3 techniques for maximum memory efficiency. Scale to the size you need without sacrificing performance.
 
 - **🎛️ Total Flexibility:** Compatible with all major language model architectures, including but not limited to Llama, Mistral, StarCoder, and Mixtral. Fast-LLM's modular design gives you full control over your training workflows.
 
-- **📦 Seamless Integration:** Integrate smoothly with popular libraries such as [Hugging Face Transformers](https://huggingface.co/transformers). Benefit from Fast-LLM's optimizations without disrupting your existing pipelines.
+- **📦 Seamless Integration:** Integrate smoothly with popular libraries such as [HuggingFace Transformers](https://huggingface.co/transformers). Benefit from Fast-LLM's optimizations without disrupting your existing pipelines.
 
-- **🛠️ Professional-Grade Tools:** Enjoy mixed precision training, large batch training, and gradient accumulation. Fast-LLM ensures reproducibility through deterministic behavior and provides pre-built Docker images, YAML configurations, and a simple, intuitive command-line interface.
+- **🛠️ Professional-Grade Tools:** Enjoy mixed precision training, large batch training, and gradient accumulation. Fast-LLM ensures reproducibility through deterministic behavior and provides [pre-built Docker images](https://github.com/ServiceNow/Fast-LLM/pkgs/container/fast-llm), YAML configurations, and a simple, intuitive command-line interface.
 
 [Get Fast-LLM](https://github.com/ServiceNow/Fast-LLM/releases) and start training your large language models in record time. [Join the Fast-LLM community](join-us.md) and collaborate with like-minded professionals to advance the state-of-the-art in AI research and development.
```
```diff
@@ -64,7 +62,7 @@ Fast-LLM is designed to be the **go-to solution** for those training the most so
 
 - **Accelerating Training Workflows:** Deliver the fastest LLM training experience with optimized kernel efficiency, parallelism, and memory management.
 - **Supporting a Broad Range of Architectures:** Offer built-in support for all major language model architectures, with an architecture-agnostic approach that allows users to easily adapt the framework to emerging models.
-- **Enabling Seamless Integration and Deployment:** Integrate effortlessly into existing ML pipelines, including [Hugging Face Transformers](https://huggingface.co/transformers) and [Kubernetes](https://kubernetes.io)-based clusters.
+- **Enabling Seamless Integration and Deployment:** Integrate effortlessly into existing ML pipelines, including [HuggingFace Transformers](https://huggingface.co/transformers) and [Kubernetes](https://kubernetes.io)-based clusters.
 - **Advancing LLM Research and Production-Readiness:** Be suitable for both cutting-edge research and mission-critical production workloads.
 
 ## Collaboration and Contribution
```
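Since the mission statement above calls out Kubernetes-based deployment, and the feature list links pre-built Docker images, here is a minimal sketch of running the published image on a cluster. Everything beyond the image path (which is inferred from the linked container package) is an assumption for illustration: the tag, the command, and the config layout are not taken from this commit.

```yaml
# Hypothetical Kubernetes Pod for a single-GPU Fast-LLM trial run.
# Image path inferred from the linked container package; tag, command,
# and config layout are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: fast-llm-trial
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: ghcr.io/servicenow/fast-llm:latest
      command: ["fast-llm", "train", "gpt", "--config", "/config/train.yaml"]
      resources:
        limits:
          nvidia.com/gpu: 1          # request one GPU from the node
      volumeMounts:
        - name: config
          mountPath: /config
  volumes:
    - name: config
      configMap:
        name: fast-llm-train-config  # holds the hypothetical train.yaml
```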
