This repository contains the code and documentation for our study on efficient tuning of Natural Language Inference (NLI) models using Context Distillation and Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA and QLoRA. The focus is on improving NLI performance with the OPT-125M model, using datasets from the GLUE benchmark along with the HANS diagnostic set.
Large Language Models (LLMs) have shown remarkable capabilities across NLP tasks, but often at the cost of high computational demands. Our project investigates ways to make these models less resource-intensive, focusing on Context Distillation integrated with Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA and QLoRA.
- To reduce the computational and memory demands of LLMs.
- To enhance the performance of LLMs on NLI tasks without extensive data.
- To explore the efficacy of Context Distillation combined with PEFT methods.
- RTE (Recognizing Textual Entailment): Binary classification to determine if the meaning of one sentence (the hypothesis) can be inferred from another (the premise).
- QQP (Quora Question Pairs): Determines if two questions asked on Quora are semantically equivalent.
- HANS (Heuristic Analysis for NLI Systems): Tests the reliance of NLI models on invalid heuristics.
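To make the task format concrete, here is a minimal sketch of how an RTE-style premise/hypothesis pair might be posed to a causal language model. The prompt template and the example pair are illustrative assumptions, not the repository's actual preprocessing:

```python
def format_nli_prompt(premise: str, hypothesis: str) -> str:
    """Render a premise/hypothesis pair as a zero-shot entailment prompt."""
    return (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Question: Does the premise entail the hypothesis? Answer yes or no.\n"
        "Answer:"
    )

# Hypothetical example pair (illustrative only, not drawn from the RTE data).
prompt = format_nli_prompt(
    "A dog is sleeping on the couch.",
    "An animal is resting indoors.",
)
print(prompt)
```

A correct model would continue this prompt with "yes", since the hypothesis is a generalization of the premise.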
We employed the following methods in our study:
- Vanilla Fine-Tuning: Updating all of the model's parameters during standard supervised training.
- LoRA (Low-Rank Adaptation): A PEFT method that freezes the pretrained weights and injects small trainable low-rank matrices into selected layers (typically the attention projections).
- QLoRA (Quantized Low-Rank Adaptation): Extends LoRA by quantizing the frozen base model weights to low precision (e.g., 4-bit), further reducing memory use while the adapters train in higher precision.
- Context Distillation: A teacher-student approach in which a larger teacher model (GPT-4) supplies contextual knowledge that a smaller student model (OPT-125M) is trained to internalize.
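To make the two PEFT methods concrete, here is a minimal NumPy sketch of the core ideas. The shapes, rank, and int4-style quantization scheme are illustrative assumptions; the actual implementations live in libraries such as `peft` and `bitsandbytes`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight matrix (d_out x d_in).
W = rng.normal(size=(8, 8)).astype(np.float32)

# --- LoRA: trainable low-rank update, W_eff = W + (alpha / r) * B @ A ---
r, alpha = 2, 4
A = rng.normal(scale=0.01, size=(r, 8)).astype(np.float32)  # trainable
B = np.zeros((8, r), dtype=np.float32)  # zero-init, so W_eff == W at start
W_eff = W + (alpha / r) * (B @ A)

# --- QLoRA idea: store the frozen base weights in low precision (here a
# symmetric int4-style scheme) and dequantize on the fly; only A, B train ---
scale = np.abs(W).max() / 7.0                       # map weights into [-7, 7]
W_q = np.clip(np.round(W / scale), -7, 7).astype(np.int8)
W_dq = W_q.astype(np.float32) * scale               # dequantized approximation
W_eff_q = W_dq + (alpha / r) * (B @ A)

print(np.allclose(W_eff, W))  # True: zero-initialized B leaves W unchanged
```

Only A and B (here 2 x 8 and 8 x 2) receive gradients, so the trainable parameter count is a small fraction of the full 8 x 8 weight matrix; at realistic transformer dimensions the savings are far larger.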
Experiments were conducted on Google Colab using an NVIDIA L4 GPU, with Hugging Face's transformers library for model management and training.
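As a concrete illustration of the context-distillation objective described under Methods, here is a minimal sketch of a temperature-softened KL divergence between teacher and student outputs. This is a standard distillation formulation; the repository's exact loss, temperature, and logit shapes are assumptions:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax with a max-shift for numerical stability."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, averaged over the batch."""
    p = softmax(teacher_logits, T)                  # teacher soft targets
    log_q = np.log(softmax(student_logits, T) + 1e-12)
    log_p = np.log(p + 1e-12)
    return float((p * (log_p - log_q)).sum(axis=-1).mean()) * T * T

# Toy batch of 2 examples over a 3-way output space (illustrative numbers).
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.5, 0.1]])
student = np.array([[2.0, 1.5, 1.0], [0.5, 2.0, 0.4]])
print(distillation_loss(student, teacher))  # positive scalar; 0 iff outputs match
```

Minimizing this loss pushes the student's output distribution toward the teacher's, which is how the teacher's contextual knowledge is transferred without enlarging the student.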