This project demonstrates how to use a large, powerful "Teacher" language model (like GPT-4) to generate a high-quality synthetic dataset, which is then used to fine-tune a much smaller, more efficient "Student" language model (like Microsoft's Phi-3-mini).
The goal is to create a specialized, cost-effective classifier for a real-world task—in this case, classifying customer support tickets—without needing a large, hand-labeled dataset. This approach showcases modern MLOps techniques like synthetic data generation and model distillation.
Live Demo App: https://llm-data-factory.vercel.app
In many real-world machine learning projects, a major bottleneck is the lack of large, high-quality labeled datasets. This project tackles that problem head-on.
- Problem: We need an accurate model to classify customer support tickets, but we only have a small number of labeled examples.
- Hypothesis: We can use a powerful, general-purpose LLM (a "Teacher") to understand the task from just a few examples and generate thousands of new, realistic examples.
- Solution: We then use this rich, synthetic dataset to fine-tune a small, open-weight LLM (a "Student").
- Result: The final "Student" model is highly specialized, fast, cheap to run, and can be deployed anywhere, all while achieving performance comparable to models many times its size.
This project covers the full AI lifecycle: Data Scarcity → Data Generation → Efficient Fine-Tuning → Evaluation → Deployment.

- Teacher Model (Data Generation): OpenAI GPT-4
- Student Model (Fine-Tuning):
microsoft/phi-3-mini-4k-instruct - Frameworks: PyTorch, Hugging Face
transformers,datasets - Fine-Tuning:
peft(for QLoRA),trl(SFTTrainer),bitsandbytes - Data Handling: Pandas, JSON
- Demo App: React + FastAPI
- Evaluation: Scikit-learn
llm-data-factory/ ├── .gitignore ├── README.md # You are here! ├── requirements.txt # Project dependencies | ├── data/ │ ├── seed_examples.json # ~15-20 high-quality examples to guide the Teacher model │ ├── synthetic_data.json # The final 1000+ example dataset (Generated) │ └── test_data.json # A held-out test set from a real dataset for evaluation | ├── scripts/ │ ├── 01_generate_synthetic_data.py # Script to call the Teacher API and generate data │ └── 02_finetune_student_model.py # Main training script for the Student model | ├── app/ │ ├── app.py # FastAPI backend server │ ├── api_server.py # FastAPI REST API │ └── inference.py # Model inference logic │ └── inference.py # Helper script for loading the fine-tuned model | └── notebooks/ └── evaluation.ipynb # Jupyter Notebook for model evaluation and creating reports
For the fastest setup experience, we provide interactive setup scripts:
git clone https://github.com/MuhammadMaazA/llm-data-factory.git
cd llm-data-factory
# Interactive setup with menu
chmod +x setup.sh
./setup.sh
# OR start everything at once
chmod +x start.sh
./start.shSee QUICKSTART.md for detailed instructions and troubleshooting.
Follow these steps to manually set up the project:
Clone the repository and install dependencies:
git clone https://github.com/MuhammadMaazA/llm-data-factory.git
cd llm-data-factory
pip install -r requirements.txtThe data generation script requires an API key from a powerful LLM provider:
export OPENAI_API_KEY='your-openai-api-key'Run the generation script to create training data:
python scripts/01_generate_synthetic_data.pyThis might take some time and incur API costs, depending on the number of samples you generate.
Once the synthetic data is ready, run the fine-tuning script:
python scripts/02_finetune_student_model.pyThe final model artifacts will be saved to the ./final_student_model directory.
Open the notebooks/evaluation.ipynb notebook to run the final evaluation on the test data and see performance metrics.
Launch the interactive demo:
# Start the FastAPI backend
cd app && python api_server.py
# In another terminal, start the React frontend
cd frontend && npm run devFor automated end-to-end training, use our pipeline script:
python run_complete_pipeline.py
## Results & Evaluation
The fine-tuned Student model (phi-3-mini-finetuned) was evaluated on a held-out test set of 200 real customer support tickets.
**Classification Report** *(Results from evaluation.ipynb)*
Precision Recall F1-Score Support
Urgent Bug 0.92 0.90 0.91 50
Feature Request 0.95 0.96 0.95 70
How-To Question 0.94 0.95 0.94 80
Accuracy 0.94 200
Export to Sheets
**Model Performance Comparison** *(Run evaluation notebook for actual results)*
Model Accuracy Cost per 1M Tokens Size
gpt-4o (Teacher) 97.5% $5.00 ~1.7T
phi-3-mini-base (Untrained Student) 62.0% ~$0.25 3.8B
phi-3-mini-finetuned (Our Model) 94.0% ~$0.25 3.8B
Export to Sheets
As shown, our fine-tuned student model achieves performance remarkably close to the powerful Teacher model but at a fraction of the computational cost, proving the effectiveness of this approach.
## Live Demo
Check out the interactive demo at: [https://llm-data-factory.vercel.app](https://llm-data-factory.vercel.app)
The demo showcases our fine-tuned model classifying customer support tickets in real-time.
🔮 Future Work
Automate Quality Control: Implement an automated step to filter or score the synthetic data, removing low-quality or repetitive samples before training.
Experiment with Student Models: Swap out Phi-3-mini for other small models like Gemma 2B or Qwen 1.5B to compare performance.
Expand Label Taxonomy: Increase the number of classification labels to handle more nuanced support ticket types.