Commit 01e7214

Updates to FAQ.md
Scrubbed the FAQ content and updated based on current status of the project and tooling.

Signed-off-by: Máirín Duffy <[email protected]>
1 parent 0555c6c commit 01e7214

1 file changed: +27 -22 lines changed

docs/community/FAQ.md

Lines changed: 27 additions & 22 deletions
@@ -1,6 +1,9 @@
 # InstructLab FAQ
 
-Last updated: April 2024
+Last updated: October 2024
+
+> [!TIP]
+> AI is a rapidly developing field with a lot of specialized terminology. You may wish to read through [the glossary](https://docs.instructlab.ai/community/FAQ/#glossary) before getting started with the documentation.
 
 ## Table of Contents
 
@@ -39,16 +42,16 @@ Last updated: April 2024
 
 This page serves as a comprehensive FAQ for the InstructLab project, detailing how it works, how to begin contributing, and the goals behind the project. Key information includes:
 
-- **InstructLab Overview**: This open source project allows users to interact with and train the Merlinite-7b (default) or Granite-7b AI Large Language Models (LLMs) by contributing skills and knowledge.
-- **LAB Method**: A synthetic data-based tuning method for LLMs consisting of a taxonomy-driven data curation process, a synthetic data generator, and two-phased training with replay buffers.
-- **Contribution Process**: Contributors can add skills or knowledge to the LLM by creating YAML files and testing changes locally before submitting a pull request to InstructLab’s GitHub repository.
-- **Project Goals**: To democratize contributions to AI and LLMs, allowing rapid model development through community collaboration facilitated by weekly builds that integrate community contributions.
+- **InstructLab Overview**: This open source project allows users to interact with and train the Granite-7b community AI Large Language Model (LLM) by contributing skills and knowledge.
+- **LAB Method**: A synthetic data-based tuning method for LLMs consisting of a taxonomy-driven data curation process, a synthetic data generator, and two-phased training with replay buffers. Learn more in the [Large-Scale Alignment for ChatBots](https://arxiv.org/abs/2403.01081) paper outlining the methodology.
+- **Contribution Process**: [Contributors can add skills or knowledge to the LLM](https://docs.instructlab.ai/taxonomy/) by creating YAML files and testing changes locally before submitting a pull request to InstructLab’s GitHub taxonomy repository. Contributors may also [contribute to the InstructLab tooling](https://github.com/instructlab/instructlab/blob/main/CONTRIBUTING/CONTRIBUTING.md) and library codebases.
+- **Project Goal**: To democratize contributions to AI and LLMs.
 
 ## Documentation disclaimer
 
 There are currently three repositories that contain documentation crucial to getting users started with the project:
 
-- [Community](https://github.com/instructlab/community) This repository shares InstructLab's activity and collaboration details across the community and include the most current information about the project. It should be approached as the primary repository for getting started, and contains procedures and links to relevant information to make the process as simple as possible.
+- [Community](https://github.com/instructlab/community). This repository shares InstructLab's activity and collaboration details across the community and includes the most current information about the project, communication channels, and people processes.
 - [`ilab` command-line interface (CLI) tool](https://github.com/instructlab/instructlab). This repository is responsible for the `ilab` CLI tool. It provides information about how to download the `ilab` CLI and how to contribute to it, among other topics.
 - [Taxonomy Tree](https://github.com/instructlab/taxonomy). This repository is responsible for the taxonomy tree that allows you to create models tuned with your data. It provides information about what skills and knowledge are, how to create a pull request to contribute to the AI model, and expectations for pull request review.
 
@@ -60,7 +63,10 @@ Unless otherwise noted, all documentation for the InstructLab project is license
 
 ### What is InstructLab?
 
-InstructLab (**L**arge-scale **A**lignment for chat**B**ots) is an open source initiative that provides a platform for easy engagement with AI Large Language Models (LLM) by using the `ilab` command-line interface (CLI) tool. You can use the CLI to work with Merlinite-7b or Granite-7b to test new skills and knowledge, for example, asking it to write a poem or answer a question about a particular subject. Users can then augment the LLM’s capabilities by submitting the skills and knowledge they have tested to the project’s taxonomy repository on GitHub by creating a pull request. This approach encourages community-driven enhancements without the need for complex model forking or fine-tuning of the model, promoting rapid development through collaborative contributions.
+InstructLab (**L**arge-scale **A**lignment for chat**B**ots) is an open source initiative that provides a platform for easy engagement with AI Large Language Models (LLMs) by using the `ilab` command-line interface (CLI) tool. You can use the CLI to work with Granite-7b to test new skills and knowledge, for example, asking it to write a meeting notes summary or answer a question about a particular subject. Users can then augment the LLM’s capabilities by submitting the skills and knowledge they have tested to the project’s taxonomy repository on GitHub by creating a pull request. This approach encourages community-driven enhancements without the need for complex model forking or fine-tuning of the model, promoting rapid development through collaborative contributions.
+
+> [!IMPORTANT]
+> Building models locally on consumer-grade hardware using quantized models with the `ilab` CLI is not meant for production-grade model creation. The `ilab` desktop configuration is meant for testing single knowledge or skill contributions on top of an already trained and quantized model. It is not for building a complete, production-grade model. For the full InstructLab production-grade model build process, multi-GPU hardware configurations are required, and the student model must be an untrained, unquantized base model.
 
 ### What is LAB?
 
@@ -80,9 +86,11 @@ InstructLab is driven by taxonomies and works by empowering users to add new [_s
 
 ### What are the goals of the InstructLab project?
 
-In its current state, openly contributing to a large language model (LLM) has been difficult because of the large compute infrastructure needed to run one.
+The goal of the InstructLab project is to democratize contributions to AI and LLMs. There are two approaches to achieving this goal in our community:
 
-The InstructLab project seeks to democratize the contribution to AI and LLMs through its _taxonomy_ repository. When users contribute to the InstructLab project, the taxonomy repository resynthesizes the open source training data for InstructLab-trained LLMs. The model is then re-trained regularly (weekly builds), ensuring that community contributions are integrated while enriching the model’s capabilities over time.
+* Enabling collaborative contribution to a large language model (LLM) through [the project's _taxonomy_ repository](https://github.com/instructlab/taxonomy). When users contribute to this repository, the project resynthesizes its open source training data. Our community Granite-based model is then retrained, ensuring that community contributions are integrated while enriching the model’s capabilities over time.
+
+* Providing open source tooling to enable the InstructLab methodology and enabling community contributions to this toolset in accordance with open source project principles. This tooling includes [the InstructLab core engine & CLI](https://github.com/instructlab/instructlab) as well as libraries such as the [sdg](https://github.com/instructlab/sdg), [training](https://github.com/instructlab/training), and [evaluation](https://github.com/instructlab/eval) libraries.
 
 ### How can I contribute?
 
@@ -102,19 +110,13 @@ A list of common problems associated with downloading the `ilab` CLI tool can be
 
 ### Why should I contribute?
 
-InstructLab is designed to enable collaboration around Merlinite-7b and Granite-7b, an open source licensed LLM that contributors can access through [Hugging Face](https://huggingface.co/instructlab). Participating is an opportunity to contribute to open source AI regardless of technical background.
+InstructLab is designed to enable collaboration around the InstructLab Granite models, open source licensed LLMs that contributors can access through [Hugging Face](https://huggingface.co/instructlab). Participating is an opportunity to contribute to open source AI regardless of technical background.
 
 When contributors write an addition to the existing taxonomy, make a pull request, and get it reviewed and merged, their changes are rolled out in the next build. This update strategy speeds up improvements to the model’s capabilities and lets contributors see their impact on the model much sooner than with other LLMs.
 
 ### What large language models (LLMs) am I contributing to through the InstructLab project?
 
-Contributions to the InstructLab project include fine-tuning Merlinite-7b or Granite-7b, an open-source licensed LLM. Contributors have direct access to the model they are improving through [Hugging Face](https://huggingface.co/instructlab).
-
-### What is Merlinite-7b?
-
-Merlinite-7b is a Mistral-7b derivative model fine-tuned with the LAB (**L**arge-scale **A**lignment for chat**B**ots) method using Mixtral-8x7b-Instruct as a teacher model.
-
-More information about the Merlinite-7b can be found on the [Hugging Face project page](https://huggingface.co/instructlab/merlinite-7b-lab).
+Contributions to the InstructLab project include fine-tuning Granite-7b, an open-source licensed LLM. Contributors have direct access to the model they are improving through [Hugging Face](https://huggingface.co/instructlab).
 
 ### What is Granite-7b-lab?
 
@@ -126,9 +128,9 @@ More information about the Granite-7b can be found on the [Hugging Face project
 
 In the context of InstructLab, a [_skill_](https://github.com/instructlab/taxonomy/blob/main/README.md#getting-started-with-skill-contributions) is a capability domain submitted by a contributor intending to train the AI model on the submitted information. In other words, when you submit a skill, you teach the AI model _how to do something_.
 
-InstructLab skills are broken down into two main categories:
+InstructLab skills are broken down into two main categories, compositional and foundational:
 
-- [**Composition skills.**](https://github.com/instructlab/taxonomy/blob/main/docs/SKILLS_GUIDE.md#compositional-skills) Composition or _performative_ skills allow AI models to perform specific tasks or functions. With InstructLab, there are two types of composition skills:
+- [**Compositional skills.**](https://github.com/instructlab/taxonomy/blob/main/docs/SKILLS_GUIDE.md#compositional-skills) Compositional or _performative_ skills allow AI models to perform specific tasks or functions. With InstructLab, there are two types of compositional skills:
   - [**Freeform compositional skills**](https://github.com/instructlab/taxonomy/blob/main/docs/SKILLS_GUIDE.md#freeform-compositional-skills) are performative skills that do not require additional context. For example, to train an AI model to write a poem, you would provide examples of poems.
   - [**Grounded compositional skills**](https://github.com/instructlab/taxonomy/blob/main/docs/SKILLS_GUIDE.md#grounded-compositional-skills) are performative skills that require additional context. One example is how an AI model reads the value of a cell in a table layout. To create the grounded skill to read a table formatted in Markdown, the additional context might be an example table layout.
 - **Foundational skills.** Foundational skills are skills like math, reasoning, and coding.
@@ -195,7 +197,7 @@ After a pull request is accepted, the changes are regularly incorporated into In
 
 ### What is the software license for InstructLab?
 
-The InstructLab project as well as the Merlinite-7b and Granite-7b models are distributed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
+The InstructLab project as well as the Granite-7b models are distributed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
 
 ### What is the content license for InstructLab documentation?
 
@@ -224,13 +226,13 @@ You can find more information about useful tools for managing DCO sign-off in ou
 
 ### Where can I download updated models of InstructLab?
 
-The latest version of InstructLab can be downloaded using the `ilab download` CLI command.
+The latest version of InstructLab can be downloaded using the `ilab download` CLI command, as well as from [InstructLab on Hugging Face](https://huggingface.co/instructlab).
 
 ### I have a question about the project. Where should I go?
 
 Currently, the best method for communicating with peers and project maintainers is in the Community Slack Channel. Visit our [InstructLab Slack Workspace Guide](https://github.com/instructlab/community/blob/main/InstructLabSlackGuide.md) for information on how to join.
 
-TODO: Update with mailing list details once these are created. Related issue <https://github.com/instructlab/community/issues/89>
+See our [community collaboration page](https://github.com/instructlab/community/blob/main/Collaboration.md) for information on our mailing list, meetings, and other ways of interacting with the community.
 
 ### What are the software and hardware requirements for using InstructLab?
 
@@ -247,6 +249,9 @@ To run and train InstructLab locally, you must meet the following requirements:
 - Approximately 60 GB of free disk space is needed to run the entire process locally on Apple hardware
 - About 32 GB RAM
 
+> [!IMPORTANT]
+> Some of our community members have reported challenges in working with Windows and WSL for InstructLab support. If possible, you may want to work with Linux or Mac for the smoothest experience. We are continuing to work on improvements across our supported operating systems for the local desktop InstructLab tooling experience.
+
 ## Glossary
 
 | Term | Explanation | Additional Reference |
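
The hardware figures visible in the final hunk (approximately 60 GB of free disk space on Apple hardware and about 32 GB of RAM) can be checked before attempting a local run. The following is a minimal sketch, not part of the commit or of the `ilab` tooling. The function names and the use of decimal gigabytes are assumptions made for illustration, and the thresholds are treated as rough minimums because the FAQ states them only approximately.

```python
import os
import shutil
from typing import Optional

# Approximate figures quoted in the FAQ; treated here as rough minimums (assumption).
MIN_FREE_DISK_GB = 60
MIN_RAM_GB = 32
GB = 10 ** 9  # decimal gigabytes, an assumption; the FAQ does not specify GB vs GiB


def free_disk_bytes(path: str = ".") -> int:
    """Return the free bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def total_ram_bytes() -> Optional[int]:
    """Return total physical memory in bytes, or None where it cannot be queried."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are unavailable on some platforms (e.g. Windows).
        return None


def meets_requirements(free_disk: int, total_ram: int) -> dict:
    """Compare measured resources against the FAQ's approximate figures."""
    return {
        "disk": free_disk >= MIN_FREE_DISK_GB * GB,
        "ram": total_ram >= MIN_RAM_GB * GB,
    }


if __name__ == "__main__":
    ram = total_ram_bytes()
    if ram is None:
        print("Could not determine installed RAM on this platform.")
    else:
        print(meets_requirements(free_disk_bytes(), ram))
```

A result such as `{"disk": False, "ram": True}` would point to insufficient free disk space while the memory figure is met. The check says nothing about GPU or operating system support, which the FAQ covers separately.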
