FrugalPrompt is a novel prompt compression framework that retains only the most semantically significant tokens in an input prompt before sending it to a Large Language Model (LLM). By leveraging token attribution methods — GlobEnc and DecompX — FrugalPrompt assigns saliency scores to every token in the input, ranks them, and retains only the top-k% most salient tokens to form a sparse, frugalized prompt. This reduces monetary costs, carbon footprint, and inference-time latency while maintaining competitive task performance.
The framework is evaluated across four NLP tasks:
- Text Classification (CLS) — Sentiment Analysis
- Text Summarization (SUM) — News Article Summarization
- Question Answering (QA) — Commonsense QA
- Text Reasoning (RSN) — Mathematical Reasoning
The FrugalPrompt pipeline works as follows:
- Token Attribution: The input prompt tokens ⟨t₁, t₂, …, tₙ⟩ are passed through a task-specific token attribution module (GlobEnc or DecompX built on a 110M BERT encoder) to generate saliency scores ⟨s₁, s₂, …, sₙ⟩ for each token.
- Token Ranking & Filtering: Tokens are ranked by their saliency scores. The top p = k% × n tokens are selected while preserving the original token order to form the reduced (frugalized) prompt.
- LLM Inference: The frugalized prompt is passed to a frozen LLM for task inference.
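The ranking-and-filtering step above can be sketched as follows. This is a hypothetical toy implementation: the `saliency` scores here stand in for the output of GlobEnc/DecompX, which are not reimplemented in this snippet.

```python
def frugalize(tokens, saliency, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of tokens by saliency score,
    preserving the original token order."""
    assert len(tokens) == len(saliency)
    k = max(1, int(len(tokens) * keep_ratio))
    # Indices of the k most salient tokens
    top = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)[:k]
    top.sort()  # restore the original token order
    return [tokens[i] for i in top]

tokens = ["the", "movie", "was", "absolutely", "wonderful", "overall"]
scores = [0.01, 0.30, 0.02, 0.25, 0.40, 0.05]  # illustrative saliency values
print(frugalize(tokens, scores, keep_ratio=0.5))
# → ['movie', 'absolutely', 'wonderful']
```

Note that the saliency ranking decides *which* tokens survive, but the final prompt is always emitted in the original reading order.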
The codebase is primarily organized into three core Jupyter Notebooks:
frugal_ICL_clean.ipynb: The main experimental pipeline. This notebook is responsible for:
- Loading and sampling datasets (IMDB, Argilla News Summary, GSM8K, Cosmos QA).
- Calculating token attribution scores using DecompX and GlobEnc.
- Compressing the prompts by filtering out low-attribution tokens to generate "frugal texts" at varying reduction rates (e.g., 80%, 60%, and 50%).
- Querying various LLM APIs with the compressed prompts.
- Saving intermediate token scores and LLM responses to local JSON/CSV files.
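The intermediate artifacts written by the pipeline could look like the following. This is a hypothetical sketch of one record; the field names are illustrative, not the notebooks' actual schema.

```python
import json

# One illustrative record pairing tokens with their attribution scores
# and the resulting frugalized text (field names are hypothetical).
record = {
    "id": 0,
    "tokens": ["the", "movie", "was", "wonderful"],
    "scores": [0.01, 0.35, 0.02, 0.62],
    "frugal_text": "movie wonderful",
    "reduction": 0.5,
}

with open("scores_sample.json", "w") as f:
    json.dump([record], f, indent=2)
```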
frugal_results.ipynb: Contains the final evaluation scripts. It processes the predicted outputs generated by frugal_ICL_clean.ipynb and compiles the performance metrics reported in the paper.
frugal_token_counts.ipynb: Analyzes the efficiency gains. It calculates and visualizes the exact token count reductions achieved by FrugalPrompt across different datasets and compression percentages.
Datasets Evaluated:
- Sentiment Analysis: stanfordnlp/imdb
- Summarization: argilla/news-summary
- Mathematical Reasoning: openai/gsm8k
- Commonsense QA: allenai/cosmos_qa
Attribution Algorithms:
- DecompX
- GlobEnc
LLMs Evaluated:
- Llama-3 8B
- Llama-3 70B
- GPT-3.5
- Gemini 2.0 Flash Thinking
- o3-mini
1. Clone the repository and install dependencies:
git clone https://github.com/Starscream-11813/Frugal-ICL.git
cd Frugal-ICL
pip install sentence-transformers evaluate peft accelerate datasets trl bitsandbytes galore-torch lexrank replicate openai backoff humanfriendly alive_progress google-generativeai
(Note: You will also need PyTorch and torchvision installed according to your CUDA version.)
2. Configure API Keys:
Create a file named config.json inside an api/ directory at the root of the project:
{
"REPLICATE_API_TOKEN": "your_replicate_token",
"OPENAI_API_KEY": "your_openai_key",
"KIMI_API_KEY": "your_kimi_key",
"GEMINI_API_KEY": "your_gemini_key",
"HUGGING_FACE_TOKEN": "your_hf_token"
}
To reproduce the findings presented in the paper, follow these steps in order:
1. Generate Attribution Scores & LLM Predictions: Open and run frugal_ICL_clean.ipynb. This will download the datasets, compute the DecompX/GlobEnc attribution scores for the input texts, generate the FrugalPrompts, and fetch predictions from the configured LLM APIs. (Outputs are saved in the scores/ and responses/ directories.)
2. Analyze Token Reduction: Open and run frugal_token_counts.ipynb to process the generated FrugalPrompts and reproduce the context length reduction statistics.
3. Evaluate Performance: Open and run frugal_results.ipynb. This notebook ingests the JSON files from the responses/ directory, computes the evaluation reports, and reproduces the exact performance tables shown in the paper.
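The context-length statistics produced by frugal_token_counts.ipynb boil down to comparing token counts before and after frugalization. Below is a toy sketch, using whitespace tokenization as a stand-in for the notebook's actual tokenizer:

```python
def reduction_rate(original: str, frugal: str) -> float:
    """Fraction of tokens removed, using whitespace tokenization
    as a stand-in for the real tokenizer."""
    n_orig = len(original.split())
    n_frugal = len(frugal.split())
    return 1 - n_frugal / n_orig

print(reduction_rate("the movie was absolutely wonderful overall",
                     "movie absolutely wonderful"))
# → 0.5
```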
If you find this work useful, please cite our paper:
@article{raiyan2025frugalprompt,
title={FrugalPrompt: Reducing Contextual Overhead in Large Language Models via Token Attribution},
author={Raiyan, Syed Rifat and Ishmam, Md Farhan and Imran, Abdullah Al and Moni, Mohammad Ali},
journal={arXiv preprint arXiv:2510.16439},
year={2025}
}