
FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution


Overview

FrugalPrompt is a novel prompt compression framework that retains only the most semantically significant tokens in an input prompt before sending it to a Large Language Model (LLM). By leveraging token attribution methods — GlobEnc and DecompX — FrugalPrompt assigns saliency scores to every token in the input, ranks them, and retains only the top-k% most salient tokens to form a sparse, frugalized prompt. This reduces monetary costs, carbon footprint, and inference-time latency while maintaining competitive task performance.

The framework is evaluated across four NLP tasks:

  • Text Classification (CLS) — Sentiment Analysis
  • Text Summarization (SUM) — News Article Summarization
  • Question Answering (QA) — Commonsense QA
  • Text Reasoning (RSN) — Mathematical Reasoning

Pipeline

The FrugalPrompt pipeline works as follows:

  1. Token Attribution: The input prompt tokens ⟨t₁, t₂, …, tₙ⟩ are passed through a task-specific token attribution module (GlobEnc or DecompX built on a 110M BERT encoder) to generate saliency scores ⟨s₁, s₂, …, sₙ⟩ for each token.
  2. Token Ranking & Filtering: Tokens are ranked by their saliency scores. The top p = k% × n tokens are selected while preserving the original token order to form the reduced (frugalized) prompt.
  3. LLM Inference: The frugalized prompt is passed to a frozen LLM for task inference.
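The ranking-and-filtering step (step 2 above) can be sketched in a few lines of Python. This is a minimal illustration, not the notebooks' actual code; `frugalize` is a hypothetical helper name, and the saliency scores are assumed to come from an attribution module such as GlobEnc or DecompX:

```python
def frugalize(tokens, scores, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of tokens by saliency,
    preserving the original token order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must align one-to-one")
    p = max(1, int(len(tokens) * keep_ratio))
    # Indices of the p most salient tokens
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:p]
    # Restore the original order before re-joining into a prompt
    return [tokens[i] for i in sorted(top)]

tokens = ["the", "movie", "was", "absolutely", "wonderful", "overall"]
scores = [0.01, 0.30, 0.02, 0.25, 0.40, 0.02]
print(frugalize(tokens, scores, keep_ratio=0.5))
# → ['movie', 'absolutely', 'wonderful']
```

Note that sorting the selected indices before gathering is what preserves the original word order, so the frugalized prompt stays readable to the downstream LLM.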

Repository Structure

The codebase is primarily organized into three core Jupyter Notebooks:

  • frugal_ICL_clean.ipynb: The main experimental pipeline. This notebook is responsible for:
    • Loading and sampling datasets (IMDB, Argilla News Summary, GSM8K, Cosmos QA).
    • Calculating token attribution scores utilizing DecompX and GlobEnc.
    • Compressing the prompts by filtering out low-attribution tokens to generate "frugal texts" (at varying reduction rates like 80%, 60%, and 50%).
    • Querying various LLM APIs using the compressed prompts.
    • Saving intermediate token scores and LLM responses to local JSON/CSV files.
  • frugal_results.ipynb: Contains the final evaluation scripts. It processes the predicted outputs generated by frugal_ICL_clean.ipynb and compiles the performance metrics reported in the paper.
  • frugal_token_counts.ipynb: Analyzes the efficiency gains. It calculates and visualizes the exact token count reductions achieved by FrugalPrompt across different datasets and compression percentages.

Supported Datasets & Models

Datasets Evaluated:

  • Sentiment Analysis: stanfordnlp/imdb
  • Summarization: argilla/news-summary
  • Mathematical Reasoning: openai/gsm8k
  • Commonsense QA: allenai/cosmos_qa

Attribution Algorithms:

  • DecompX
  • GlobEnc

LLMs Evaluated:

  • Llama-3 8B
  • Llama-3 70B
  • GPT-3.5
  • Gemini 2.0 Flash Thinking
  • o3-mini

Setup & Installation

1. Clone the repository and install dependencies:

git clone https://github.com/Starscream-11813/Frugal-ICL.git
cd Frugal-ICL
pip install sentence-transformers evaluate peft accelerate datasets trl bitsandbytes galore-torch lexrank replicate openai backoff humanfriendly alive_progress google-generativeai

(Note: you will also need PyTorch and torchvision builds matching your CUDA version.)

2. Configure API Keys: Create a file named config.json inside an api/ directory at the root of the project:

{
  "REPLICATE_API_TOKEN": "your_replicate_token",
  "OPENAI_API_KEY": "your_openai_key",
  "KIMI_API_KEY": "your_kimi_key",
  "GEMINI_API_KEY": "your_gemini_key",
  "HUGGING_FACE_TOKEN": "your_hf_token"
}
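The notebooks presumably read these keys at runtime; here is a minimal sketch of that loading step (the exact code inside the notebooks may differ):

```python
import json
from pathlib import Path

def load_api_keys(path="api/config.json"):
    """Return the key/value pairs from the config file shown above."""
    return json.loads(Path(path).read_text())

# Example usage:
# keys = load_api_keys()
# keys["OPENAI_API_KEY"], keys["GEMINI_API_KEY"], etc.
```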

Reproducing the Results

To reproduce the findings presented in the paper, follow these steps in order:

Generate Attribution Scores & LLM Predictions: Open and run frugal_ICL_clean.ipynb. This will download the datasets, compute the DecompX/GlobEnc attribution scores for the input texts, generate the FrugalPrompts, and fetch predictions from the configured LLM APIs. (Outputs are saved in the scores/ and responses/ directories).

Analyze Token Reduction: Open and run frugal_token_counts.ipynb to process the generated FrugalPrompts and reproduce the context length reduction statistics.
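The reduction statistic itself is simple; the following is a sketch of the kind of calculation frugal_token_counts.ipynb performs (function name hypothetical):

```python
def reduction_rate(original_tokens: int, frugal_tokens: int) -> float:
    """Fraction of input tokens removed by frugalization, as a percentage."""
    return 100.0 * (original_tokens - frugal_tokens) / original_tokens

# A prompt compressed from 200 tokens to 80 tokens:
print(reduction_rate(200, 80))  # → 60.0
```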

Evaluate Performance: Open and run frugal_results.ipynb. This notebook ingests the JSON files from the responses/ directory, computes the evaluation metrics, and reproduces the performance tables reported in the paper.

Citation

If you find this work useful, please cite our paper:

@article{raiyan2025frugalprompt,
  title={FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution},
  author={Raiyan, Syed Rifat and Ishmam, Md Farhan and Imran, Abdullah Al and Moni, Mohammad Ali},
  journal={arXiv preprint arXiv:2510.16439},
  year={2025}
}

About

This repository contains the code and data of the paper titled "FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution."
