A Python package to process CSV text data in batches using OpenAI's GPT API.
✅ Processes large CSV files (50K+ rows) in batches (default: 10 rows per batch)
✅ Uses the OpenAI GPT-4 API (for now) to generate a response for each row
✅ Saves the results in a new CSV column
✅ CLI support (`llm-process`) for easy execution
✅ Modular and scalable package design
```
llm_batch_processor/
├── llm_batch_processor/      # The package
│   ├── __init__.py
│   ├── config.py             # Configuration settings
│   ├── api_client.py         # Handles OpenAI API requests
│   └── processor.py          # Batch processing logic
├── scripts/
│   └── main.py               # CLI entry point
├── data/                     # Place CSV files here
├── setup.py                  # Package setup
├── setup.cfg
├── pyproject.toml
├── requirements.txt
├── README.md
├── .gitignore
└── LICENSE
```
```bash
git clone https://github.com/karthikRavichandran/llm_batch_processor.git
cd llm_batch_processor
```

or install from PyPI:

```bash
pip install llm-batch-process
```
Create a `.env` file in the root directory and add:

```
OPENAI_API_KEY=your_api_key_here
Project_Id=your_project_id   # optional
```

or export the variables in your shell:

```bash
export OPENAI_API_KEY="your_api_key_here"
export Project_Id="your_project_id"   # optional
```
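The CLI reads these variables as its defaults. If you go the `.env` route, the file has to be loaded into the environment; the snippet below is a minimal illustration of that check, assuming `python-dotenv` is available (the package's own loading logic may differ):

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

# Load variables from a .env file in the current directory, if one exists.
load_dotenv()

api_key = os.environ.get("OPENAI_API_KEY")
project_id = os.environ.get("Project_Id")  # optional

if not api_key:
    raise SystemExit("OPENAI_API_KEY is not set")
print("API key found; Project_Id is", "set" if project_id else "not set")
```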
- Place your input CSV file in the `data/` folder, or pass its location with `--input_csv`
- Ensure it has a column containing the text (default: `"text"`), or pass the column name with `--text_column`
- Put your prompt in `prompt.txt`, or pass its path with `--prompt_file` (a quick way to create sample files is shown below)
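For a quick smoke test, you can generate a tiny input CSV and prompt file matching the defaults above. This is only an illustration; any CSV with a text column and any prompt file will work, and the example review texts and prompt wording are made up:

```python
import os

import pandas as pd

os.makedirs("data", exist_ok=True)

# A minimal input file with the default "text" column, saved to the default path.
df = pd.DataFrame({"text": [
    "The product arrived late but works fine.",
    "Great customer service, will buy again.",
]})
df.to_csv("data/input.csv", index=False)

# The prompt applied to each row (exact prompting behaviour is up to the package).
with open("prompt.txt", "w") as f:
    f.write("Classify the sentiment of the following review as positive, negative, or neutral.")
```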
Then run the CLI:

```bash
llm-process
```

CLI arguments:
```
--api_key API_KEY                   OpenAI API key (default: from environment variable)
--project_id PROJECT_ID             Project ID (default: from environment variable)
--batch_size BATCH_SIZE             Batch size for processing (default: 10)
--input_csv INPUT_CSV               Path to input CSV file (default: data/input.csv)
--output_csv OUTPUT_CSV             Path to output CSV file (default: data/output.csv)
--text_column TEXT_COLUMN           Column name containing input text (default: text)
--response_column RESPONSE_COLUMN   Column name for storing responses (default: response)
--prompt_file PROMPT_FILE           Path to the text file containing the prompt
```
This will:
- Read data from `data/input.csv`
- Process the text in batches of 10 rows
- Send each batch to OpenAI GPT-4
- Save the responses in `data/output.csv` under a new `"response"` column
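Conceptually, the processing loop looks like the sketch below. This is a simplified illustration of the flow just described, not the package's actual implementation; the model name, prompt handling, and lack of retry/error handling are assumptions:

```python
import pandas as pd
from openai import OpenAI  # openai>=1.0 SDK

BATCH_SIZE = 10

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
df = pd.read_csv("data/input.csv")
prompt = open("prompt.txt").read()

responses = []
# Walk the dataframe in fixed-size batches of rows.
for start in range(0, len(df), BATCH_SIZE):
    batch = df["text"].iloc[start:start + BATCH_SIZE]
    for text in batch:
        completion = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": str(text)},
            ],
        )
        responses.append(completion.choices[0].message.content)

# Store the generated answers in a new column and write the output file.
df["response"] = responses
df.to_csv("data/output.csv", index=False)
```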
To change the defaults, modify `config.py`:

```python
BATCH_SIZE = 10                  # Number of rows per batch
INPUT_CSV = "data/input.csv"     # Default input file
OUTPUT_CSV = "data/output.csv"   # Default output file
TEXT_COLUMN = "text"             # Column containing the input text
RESPONSE_COLUMN = "response"     # Column where responses are written
```
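The CLI flags listed above fall back to these constants (and to the environment variables for the key and project ID). Below is a hypothetical sketch of how that wiring can look with `argparse`; the package's actual `scripts/main.py` may differ, and the `prompt.txt` default is an assumption:

```python
import argparse
import os

# Defaults mirroring config.py; the package would import these from config.
BATCH_SIZE = 10
INPUT_CSV = "data/input.csv"
OUTPUT_CSV = "data/output.csv"
TEXT_COLUMN = "text"
RESPONSE_COLUMN = "response"

parser = argparse.ArgumentParser(prog="llm-process")
parser.add_argument("--api_key", default=os.environ.get("OPENAI_API_KEY"))
parser.add_argument("--project_id", default=os.environ.get("Project_Id"))
parser.add_argument("--batch_size", type=int, default=BATCH_SIZE)
parser.add_argument("--input_csv", default=INPUT_CSV)
parser.add_argument("--output_csv", default=OUTPUT_CSV)
parser.add_argument("--text_column", default=TEXT_COLUMN)
parser.add_argument("--response_column", default=RESPONSE_COLUMN)
parser.add_argument("--prompt_file", default="prompt.txt")  # assumed default
args = parser.parse_args()
```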
Install the dependencies:

```bash
pip install -r requirements.txt
```

Run the tests (TODO: add unit tests):

```bash
pytest
```
To build and publish the package, then install it from PyPI:

```bash
python setup.py sdist bdist_wheel
twine upload dist/*
pip install llm_batch_processor
```
This project is licensed under the MIT License.
- Fork the repo
- Create a feature branch (`git checkout -b feature-branch`)
- Commit your changes (`git commit -m "Added new feature"`)
- Push the branch (`git push origin feature-branch`)
- Create a Pull Request 🚀
For support or collaboration, reach out to Karthik Ravichandran.