FrugalPrompt is a novel prompt compression framework that retains only the most semantically significant tokens in an input prompt before sending it to a Large Language Model (LLM). By leveraging token attribution methods — GlobEnc and DecompX — FrugalPrompt assigns saliency scores to every token in the input, ranks them, and retains only the top-k% most salient tokens to form a sparse, frugalized prompt. This reduces monetary costs, carbon footprint, and inference-time latency while maintaining competitive task performance.
The framework is evaluated across four NLP tasks:
- Text Classification (CLS) — Sentiment Analysis
- Text Summarization (SUM) — News Article Summarization
- Question Answering (QA) — Commonsense QA
- Text Reasoning (RSN) — Mathematical Reasoning
The FrugalPrompt pipeline works as follows:
- Token Attribution: The input prompt tokens ⟨t₁, t₂, …, tₙ⟩ are passed through a task-specific token attribution module (GlobEnc or DecompX built on a 110M BERT encoder) to generate saliency scores ⟨s₁, s₂, …, sₙ⟩ for each token.
- Token Ranking & Filtering: Tokens are ranked by their saliency scores. The top p = k% × n tokens are selected while preserving the original token order to form the reduced (frugalized) prompt.
- LLM Inference: The frugalized prompt is passed to a frozen LLM for task inference.
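The ranking-and-filtering step above can be sketched as follows. This is a hypothetical toy implementation: the `saliency` scores here stand in for the output of GlobEnc/DecompX, which are not reimplemented in this snippet.

```python
def frugalize(tokens, saliency, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of tokens by saliency score,
    preserving the original token order."""
    assert len(tokens) == len(saliency)
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k most salient tokens
    top = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)[:k]
    top.sort()  # restore the original token order
    return [tokens[i] for i in top]

tokens = ["the", "movie", "was", "absolutely", "wonderful", "overall"]
scores = [0.01, 0.30, 0.02, 0.25, 0.40, 0.05]  # illustrative saliency values
print(frugalize(tokens, scores, keep_ratio=0.5))
# → ['movie', 'absolutely', 'wonderful']
```

Note that the saliency ranking decides *which* tokens survive, but the final prompt is always emitted in the original reading order.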
The codebase is primarily organized into three core Jupyter Notebooks:
frugal_ICL_clean.ipynb: The main experimental pipeline. This notebook is responsible for:
- Loading and sampling datasets (IMDB, Argilla News Summary, GSM8K, Cosmos QA).
- Calculating token attribution scores using DecompX and GlobEnc.
- Compressing the prompts by filtering out low-attribution tokens to generate "frugal texts" at varying reduction rates (e.g., 80%, 60%, and 50%).
- Querying various LLM APIs with the compressed prompts.
- Saving intermediate token scores and LLM responses to local JSON/CSV files.
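The intermediate artifacts written by the pipeline could look like the following. This is a hypothetical sketch of one record; the field names are illustrative, not the notebooks' actual schema.

```python
import json

# One illustrative record pairing tokens with their attribution scores
# and the resulting frugalized text (field names are hypothetical).
record = {
    "id": 0,
    "tokens": ["the", "movie", "was", "wonderful"],
    "scores": [0.01, 0.35, 0.02, 0.62],
    "frugal_text": "movie wonderful",
    "reduction": 0.5,
}

with open("scores_sample.json", "w") as f:
    json.dump([record], f, indent=2)
```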
frugal_results.ipynb: Contains the final evaluation scripts. It processes the predicted outputs generated by frugal_ICL_clean.ipynb and compiles the performance metrics reported in the paper.
frugal_token_counts.ipynb: Analyzes the efficiency gains. It calculates and visualizes the exact token count reductions achieved by FrugalPrompt across different datasets and compression percentages.
Datasets Evaluated:
- Sentiment Analysis: stanfordnlp/imdb
- Summarization: argilla/news-summary
- Mathematical Reasoning: openai/gsm8k
- Commonsense QA: allenai/cosmos_qa
Attribution Algorithms:
- DecompX
- GlobEnc
LLMs Evaluated:
- Llama-3 8B
- Llama-3 70B
- GPT-3.5
- Gemini 2.0 Flash Thinking
- o3-mini
1. Clone the repository and install dependencies:
git clone https://github.com/Starscream-11813/Frugal-ICL.git
cd Frugal-ICL
pip install sentence-transformers evaluate peft accelerate datasets trl bitsandbytes galore-torch lexrank replicate openai backoff humanfriendly alive_progress google-generativeai
(Note: You will also need PyTorch and torchvision installed according to your CUDA version.)
2. Configure API Keys:
Create a file named config.json inside an api/ directory at the root of the project:
{
"REPLICATE_API_TOKEN": "your_replicate_token",
"OPENAI_API_KEY": "your_openai_key",
"KIMI_API_KEY": "your_kimi_key",
"GEMINI_API_KEY": "your_gemini_key",
"HUGGING_FACE_TOKEN": "your_hf_token"
}
To reproduce the findings presented in the paper, follow these steps in order:
1. Generate Attribution Scores & LLM Predictions: Open and run frugal_ICL_clean.ipynb. This will download the datasets, compute the DecompX/GlobEnc attribution scores for the input texts, generate the FrugalPrompts, and fetch predictions from the configured LLM APIs. (Outputs are saved in the scores/ and responses/ directories.)
2. Analyze Token Reduction: Open and run frugal_token_counts.ipynb to process the generated FrugalPrompts and reproduce the context length reduction statistics.
3. Evaluate Performance: Open and run frugal_results.ipynb. This notebook ingests the JSON files from the responses/ directory, computes the evaluation reports, and reproduces the exact performance tables shown in the paper.
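The context-length statistics produced by frugal_token_counts.ipynb boil down to comparing token counts before and after frugalization. Below is a toy sketch, using whitespace tokenization as a stand-in for the notebook's actual tokenizer:

```python
def reduction_rate(original: str, frugal: str) -> float:
    """Fraction of tokens removed, using whitespace tokenization
    as a stand-in for the real tokenizer."""
    n_orig = len(original.split())
    n_frugal = len(frugal.split())
    return 1 - n_frugal / n_orig

print(reduction_rate("the movie was absolutely wonderful overall",
                     "movie absolutely wonderful"))
# → 0.5
```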
If you find this work useful, please cite our paper:
@article{raiyan2025frugalprompt,
title={FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution},
author={Raiyan, Syed Rifat and Ishmam, Md Farhan and Imran, Abdullah Al and Moni, Mohammad Ali},
journal={arXiv preprint arXiv:2510.16439},
year={2025}
}