⚠️ Deprecated Example Notice
These example files are no longer actively maintained and may be outdated.
👉 For the latest, fully supported examples, please visit the official repository:
Red Hat AI Innovation Team – SDG Hub Skills Tuning Examples
The provided notebooks demonstrate how to customize language models by generating training data for specific skills, following the methodology outlined in the LAB (Large-Scale Alignment for ChatBots) framework [paper link].
The LAB framework enables us to shape how a model responds to various tasks by training it on carefully crafted examples. Want your model to write emails in your company's tone? Need it to follow specific formatting guidelines? This customization is achieved through what the paper defines as compositional skills.
Compositional skills are tasks that combine different abilities to handle complex queries. For example, if you want your model to write company emails about quarterly performance, it needs to:
- Understand financial concepts
- Perform basic arithmetic
- Write in your preferred communication style
- Follow your organization's email format
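To make this concrete, here is a minimal sketch of what a seed example for that quarterly-performance email skill might look like. The dict structure and field names (`task_description`, `question`, `answer`) are illustrative assumptions, not a prescribed schema.

```python
# Illustrative only: a seed example for the "quarterly performance email"
# compositional skill, expressed as a plain dict. Field names are
# hypothetical, not a required format.
seed_example = {
    "task_description": "Write a company email summarizing quarterly performance.",
    "question": (
        "Q3 revenue was $4.2M, up from $3.5M in Q2. "
        "Draft the quarterly update email."
    ),
    "answer": (
        "Subject: Q3 Performance Update\n\n"
        "Team,\n\n"
        "Revenue grew 20% quarter-over-quarter, from $3.5M to $4.2M. "
        "Details to follow at the all-hands.\n\n"
        "Best,\nFinance"
    ),
}

# The single pair exercises all four abilities: a financial concept
# (revenue), basic arithmetic ((4.2 - 3.5) / 3.5 = 20% growth),
# the preferred tone, and the organization's email format.
growth = (4.2 - 3.5) / 3.5
print(f"{growth:.0%}")  # 20%
```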
The example notebooks will show you how to:
- Set up a teacher model for generating training data
- Create examples that reflect your preferred style and approach
- Generate synthetic data
- Validate that the generated data matches your requirements
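The validation step in that list can be as simple as rule-based checks on each generated sample. The sketch below assumes samples are dicts with `question` and `answer` keys; the actual notebooks may use model-based evaluation instead, so treat this as illustrative.

```python
# Minimal rule-based validation sketch for generated samples.
# Assumed structure: dicts with "question" and "answer" keys.
def is_valid_sample(sample: dict) -> bool:
    """Accept a generated sample only if it meets basic requirements."""
    question = sample.get("question", "").strip()
    answer = sample.get("answer", "").strip()
    if not question or not answer:
        return False                     # both sides of the pair must exist
    if not answer.startswith("Subject:"):
        return False                     # enforce the email-format requirement
    return len(answer.split()) >= 10     # reject trivially short outputs

sample = {
    "question": "Draft the quarterly update email for Q2.",
    "answer": (
        "Subject: Q2 Update\n\nTeam,\n\n"
        "Revenue held steady at $3.5M this quarter.\n\nBest,\nFinance"
    ),
}
print(is_valid_sample(sample))  # True
```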
The end goal is to create training data that will help align the model with your specific needs, whether that's matching your company's communication style, following particular protocols, or handling specialized tasks in your preferred way.
InstructLab generates synthetic data through a multi-step process of generation and evaluation: for grounded skills, candidate samples are produced from seed examples and then evaluated before they are kept.
When teaching a language model a new skill, carefully crafted seed examples are the foundation. Seed examples show the model what good behavior looks like by pairing inputs with ideal outputs, allowing the model to learn patterns, structure, reasoning, and formatting that generalize beyond the examples themselves.
A strong seed example, regardless of domain, should:
✅ Clearly define the task context and expected behavior
✅ Provide a realistic, natural input that mimics what users or systems would actually produce
✅ Include a high-quality output that fully satisfies the task requirements: accurate, complete, and correctly formatted
✅ Minimize ambiguity: avoid examples where multiple interpretations are possible without explanation
✅ Reflect diverse edge cases: cover a variety of structures, phrasings, or difficulty levels to help the model generalize
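The checklist above can also be sketched as a lint function that flags violations before a seed example is used. The field names (`context`, `question`, `answer`) are assumptions about how a grounded seed example might be stored.

```python
# Lint sketch for the seed-example checklist. Field names are assumed,
# not a prescribed schema.
def lint_seed_example(example: dict) -> list[str]:
    """Return a list of checklist violations (empty list = passes)."""
    problems = []
    if not example.get("context", "").strip():
        problems.append("missing task context")
    if not example.get("question", "").strip():
        problems.append("missing realistic input")
    answer = example.get("answer", "").strip()
    if not answer:
        problems.append("missing high-quality output")
    elif len(answer.split()) < 5:
        problems.append("output too short to demonstrate formatting")
    return problems

print(lint_seed_example({"context": "", "question": "Q?", "answer": "A"}))
# ['missing task context', 'output too short to demonstrate formatting']
```

Checks like these catch only structural gaps; judging ambiguity and edge-case diversity still requires human (or model-based) review.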

