This project investigates the efficacy of integrating context distillation techniques with parameter-efficient tuning methods such as LoRA, QLoRA, and traditional fine-tuning approaches, utilizing Facebook’s pre-trained OPT 125M model.

anandketan/Efficient_LLM_Fine_Tuning

Efficient Tuning of Natural Language Inference Models

This repository contains all the necessary code and documentation for our study on the efficient tuning of Natural Language Inference (NLI) models using Context Distillation and Parameter-Efficient Fine-Tuning methods like LoRA and QLoRA. The focus is on enhancing performance on NLI tasks using the OPT 125M model, employing datasets from the GLUE benchmark.

Project Overview

Large Language Models (LLMs) have shown remarkable capabilities in various NLP tasks but often at the cost of high computational demands. Our project investigates methods to optimize these models to be less resource-intensive, particularly focusing on Context Distillation integrated with Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA and QLoRA.

Objectives

  • To reduce the computational and memory demands of LLMs.
  • To enhance the performance of LLMs on NLI tasks without extensive data.
  • To explore the efficacy of Context Distillation combined with PEFT methods.

Datasets Used

  • RTE (Recognizing Textual Entailment): Binary classification to determine if the meaning of one sentence (the hypothesis) can be inferred from another (the premise).
  • QQP (Quora Question Pairs): Determines if two questions asked on Quora are semantically equivalent.
  • HANS (Heuristic Analysis for NLI Systems): Tests the reliance of NLI models on invalid heuristics.
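
All three datasets are available on the Hugging Face Hub via the `datasets` library. The sketch below is illustrative, not the study's actual preprocessing code: the column names come from the public dataset cards, and the helper names are our own.

```python
def load_nli_splits():
    """Fetch the three datasets from the Hugging Face Hub.

    Imported lazily so the pure-Python helper below also works
    without the `datasets` library installed.
    """
    from datasets import load_dataset

    rte = load_dataset("glue", "rte")    # premise/hypothesis, labels {0, 1}
    qqp = load_dataset("glue", "qqp")    # question1/question2, labels {0, 1}
    hans = load_dataset("hans")          # premise/hypothesis, labels {0, 1}
    return rte, qqp, hans


def to_text_pair(example):
    """Map the datasets' differing column names onto one (text_a, text_b) schema."""
    if "premise" in example:  # RTE and HANS
        return {"text_a": example["premise"], "text_b": example["hypothesis"]}
    return {"text_a": example["question1"], "text_b": example["question2"]}  # QQP
```

Normalizing the column names up front lets the same tokenization and training code run unchanged across all three tasks.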

Methods

We employed the following methods in our study:

  1. Vanilla Fine-Tuning: Updating all parameters of the model during training.
  2. LoRA (Low-Rank Adaptation): A parameter-efficient method that freezes the pre-trained weights and injects trainable low-rank update matrices into selected layers.
  3. QLoRA (Quantized Low-Rank Adaptation): Extends LoRA by quantizing the frozen base weights to lower precision, further reducing memory use.
  4. Context Distillation: A teacher-student approach in which the teacher model (GPT-4) provides contextual knowledge to a simpler student model (OPT 125M).

Experimental Setup

Experiments were conducted on Google Colab using NVIDIA’s L4 GPU. We used Hugging Face's transformers library for model management and training.
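
As a rough sketch of how training can be assembled with the `transformers` Trainer API (batch size, learning rate, and epoch count here are placeholders, not the study's actual settings):

```python
import numpy as np
from transformers import Trainer, TrainingArguments


def compute_accuracy(eval_pred):
    """Simple accuracy metric for the binary NLI labels."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}


def make_trainer(model, train_ds, eval_ds, tokenizer):
    """Assemble a Trainer; hyperparameters below are illustrative."""
    args = TrainingArguments(
        output_dir="opt125m-nli",          # checkpoint directory (illustrative)
        per_device_train_batch_size=16,
        learning_rate=2e-5,
        num_train_epochs=3,
        fp16=True,                         # mixed precision on the Colab L4 GPU
        logging_steps=50,
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        tokenizer=tokenizer,
        compute_metrics=compute_accuracy,
    )
```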
