Agentic AI to edit large JSON(L), CSV, TSV, YAML item by item
An agentic AI batch-processing tool built with Google ADK (Agent Development Kit). It processes and transforms large JSON, JSONL, CSV, TSV, and YAML files using a multi-agent system powered by Google's Gemini models.
Luis J Camargo
Most modern LLMs have very large context windows. Even so, processing a very large structured file such as a JSON or CSV file in a single pass is error-prone and limited to simple edits. This agentic tool iterates through every item in the file and applies changes one item at a time. That gives the model its full attention on each item and enables complex behaviors such as a web search for each entry.
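The item-by-item idea can be illustrated with a minimal Python sketch. It assumes the file has already been parsed into a list of dicts. The function iter_item_prompts and the prompt wording are hypothetical illustrations, not Batchy's actual code. Each prompt carries the instruction plus exactly one item, so no single model call has to hold the whole file in context:

```python
import json
from typing import Any, Iterator

def iter_item_prompts(instruction: str, items: list[dict[str, Any]]) -> Iterator[str]:
    """Yield one prompt per item; each prompt contains only that single item."""
    total = len(items)
    for index, item in enumerate(items, start=1):
        # Only this item is serialized; the rest of the file stays out of the prompt.
        payload = json.dumps(item, ensure_ascii=False)
        yield (
            f"{instruction}\n\n"
            f"Item {index} of {total}:\n{payload}\n\n"
            "Return only the transformed item as a JSON object."
        )

cities = [
    {"city": "Lyon", "lat": 45.76, "lon": 4.84},
    {"city": "Porto", "lat": None, "lon": None},
]
prompts = list(iter_item_prompts("Fill in lat and lon if they are missing.", cities))
print(len(prompts))  # 2
print(prompts[1])
```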
- You have a JSON file containing a large list of cities, but some of them have no geolocation. An LLM won't be able to fill these in across the entire JSON file at once, even if it is capable of web search. Solution: Ask Batchy to perform a web lookup for each city and fill in the missing coordinates.
- You have a CSV dataset and need to add a new column with a French translation of an English caption. The CSV is large. Even if it fits in the context window, its size makes the output likely to be incomplete or to contain unwanted changes. Solution: Ask Batchy to perform these changes on your CSV.
- You have a YAML file with user data where the "name" field is not split into first name and last name, and naively splitting on spaces rarely works. The address is not split into fields either. An LLM could parse this data reliably, but your file has thousands of entries, which makes it impractical to split into chunks and process with a regular LLM. Solution: Ask Batchy to perform these changes on each item.
Once you have started the WebUI or opened the sample tool online, write a prompt describing the changes you want and include your file. CAVEAT: In this preliminary version, the WebUI doesn't support file uploads, so you have to paste the file contents directly into the prompt. Example:
In the following YAML, please split the "name" field of each record into first_name and last_name. Also split the "address" field into street (with number), city, state, and zip, and set a field to null if it is not detected.
File:
users:
  - name: "John Michael Smith"
    address: "123 Main Street Apt 4B, New York, NY 10001"
    email: "john.smith@email.com"
    phone: "+1-555-123-4567"
# The above YAML will be transformed into:
users:
  - first_name: "John"
    last_name: "Smith"
    street: "123 Main Street Apt 4B"
    city: "New York"
    state: "NY"
    zip: "10001"
    email: "john.smith@email.com"
    phone: "+1-555-123-4567"
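The output comes from a model, so it is worth spot-checking. Below is a minimal Python sketch of such a check for the example above. It assumes records are already loaded as dicts, for instance with a YAML parser such as PyYAML. The function check_split_record and the field lists are hypothetical names taken from the example prompt, not part of Batchy:

```python
from typing import Any

# Field names taken from the example prompt above.
NEW_FIELDS = ["first_name", "last_name", "street", "city", "state", "zip"]
REPLACED_FIELDS = ["name", "address"]

def check_split_record(original: dict[str, Any], transformed: dict[str, Any]) -> list[str]:
    """Return the problems found in one transformed record (empty list if none)."""
    problems: list[str] = []
    for field in NEW_FIELDS:
        if field not in transformed:
            problems.append(f"missing field: {field}")
        elif transformed[field] is not None and not isinstance(transformed[field], str):
            # The prompt asks for null when a part is not detected.
            problems.append(f"not a string or null: {field}")
    for field in REPLACED_FIELDS:
        if field in transformed:
            problems.append(f"field was not replaced: {field}")
    # Fields the prompt did not mention (email, phone) must pass through unchanged.
    for field, value in original.items():
        if field not in REPLACED_FIELDS and transformed.get(field) != value:
            problems.append(f"unexpected change to: {field}")
    return problems

original = {
    "name": "John Michael Smith",
    "address": "123 Main Street Apt 4B, New York, NY 10001",
    "email": "john.smith@email.com",
    "phone": "+1-555-123-4567",
}
transformed = {
    "first_name": "John",
    "last_name": "Smith",
    "street": "123 Main Street Apt 4B",
    "city": "New York",
    "state": "NY",
    "zip": "10001",
    "email": "john.smith@email.com",
    "phone": "+1-555-123-4567",
}
print(check_split_record(original, transformed))  # []
```

Checking one record at a time mirrors the tool's item-by-item approach. A failing record can be flagged and rerun without touching the rest of the file.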
Features:
- Batch processing of JSON, CSV, JSONL, YAML, and TSV files
- Multi-agent architecture for efficient processing
- Supports complex transformations on JSON data
- Built on Google ADK and Gemini AI models
- Memory-efficient processing of large files
- Flexible input/output format handling
Installation:
- Ensure you have Python installed on your system.
- Clone this repository:
  git clone https://github.com/username/batchy-json-agent.git
  cd batchy-json-agent
- Install the required dependencies:
  pip install google-adk google-genai
- Set up environment variables:
  cd batchy
  cp sample.env .env
  Then edit the .env file and add your Gemini API key:
  GOOGLE_API_KEY=your_api_key_here
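You may want to confirm the key is visible before starting the agent. Here is a minimal standard-library sketch for that. It assumes .env uses plain KEY=VALUE lines like the one above; the full contents of sample.env are not shown here. The helpers load_env_file and require_api_key are hypothetical, not part of the project:

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    values: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    for key, value in values.items():
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
    return values

def require_api_key(path: str = ".env") -> str:
    """Load the .env file and fail loudly if GOOGLE_API_KEY is missing or a placeholder."""
    load_env_file(path)
    key = os.environ.get("GOOGLE_API_KEY", "").strip()
    if not key or key == "your_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is not set; edit your .env file.")
    return key
```

Call require_api_key() from the batchy directory, where the .env file was created in the step above.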
The project uses Google ADK's agent system to process files. Here's how to use it:
To run the project locally in development mode:
adk web
This will start a local development server where you can:
- Provide files for processing (in this preliminary version, by pasting them into the prompt)
- Define transformation instructions
- Monitor the processing progress
- Download the transformed results
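To deploy the project:
adk deploy cloud_run
This deploys your agent to Google Cloud (here, Cloud Run) and makes it available for production use. The adk deploy command expects a target such as cloud_run or agent_engine. It also takes the path to your agent directory and your Google Cloud project and region options; run adk deploy --help for details.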
The system uses three main components:
- Root Agent (main_agent): Orchestrates the overall process, handles file preparation, and validates formats.
- Item Agent: Processes individual items in the batch according to the transformation instructions.
- Reviewer Agent: Ensures the transformed output maintains correct formatting and structure.
The process flow:
- File is uploaded or provided as text
- Root agent prepares and validates the input
- Item agent processes each element according to instructions
- Reviewer agent validates the transformations
- Results are aggregated and saved in the specified format
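The flow above can be sketched in plain Python, with stand-in functions in place of the LLM agents. This is a minimal illustration under stated assumptions, not the project's ADK code. The names run_batch and prepare_items and the stubs are hypothetical. The input is assumed to be a JSON array. The policy of retrying a failed review and then keeping the original item is also an assumed design choice:

```python
import json
from typing import Any, Callable

Item = dict[str, Any]

def prepare_items(text: str) -> list[Item]:
    """Root Agent stand-in: parse the input and check it is a list of objects."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(x, dict) for x in data):
        raise ValueError("expected a JSON array of objects")
    return data

def run_batch(
    text: str,
    process_item: Callable[[Item], Item],
    review_item: Callable[[Item, Item], bool],
    max_attempts: int = 2,
) -> str:
    """Process each item on its own, review it, then aggregate the results."""
    results: list[Item] = []
    for item in prepare_items(text):
        candidate = item
        for _ in range(max_attempts):
            attempt = process_item(item)
            if review_item(item, attempt):
                candidate = attempt
                break
        # If no attempt passes review, the original item is kept unchanged.
        results.append(candidate)
    return json.dumps(results, ensure_ascii=False)

# Stand-ins for the LLM-backed agents, for illustration only.
def uppercase_city(item: Item) -> Item:
    """Item Agent stand-in: apply the 'instruction' to a single item."""
    return {**item, "city": item["city"].upper()}

def same_fields(before: Item, after: Item) -> bool:
    """Reviewer Agent stand-in: accept only if the set of fields is unchanged."""
    return set(before) == set(after)

print(run_batch('[{"city": "Lyon"}, {"city": "Porto"}]', uppercase_city, same_fields))
# [{"city": "LYON"}, {"city": "PORTO"}]
```

Each item is processed independently, so one bad item can be retried or left unchanged without rerunning the rest of the file.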
Supported file formats:
- JSON (.json)
- CSV (.csv)
- JSONL (.jsonl)
- YAML (.yaml)
- TSV (.tsv)
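Below is a minimal standard-library sketch of turning these formats into a list of items and back. The helpers load_items and dump_items are hypothetical, not Batchy's API. YAML is left out because Python's standard library has no YAML parser; a third-party package such as PyYAML would be needed. The sketch also assumes that a JSON file's items sit in a top-level array:

```python
import csv
import io
import json
from typing import Any

Item = dict[str, Any]

def load_items(text: str, fmt: str) -> list[Item]:
    """Split file contents into one item per array element, line, or row."""
    if fmt == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("this sketch expects a top-level JSON array")
        return data
    if fmt == "jsonl":
        # One JSON object per non-blank line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    raise ValueError(f"format not handled in this sketch: {fmt}")

def dump_items(items: list[Item], fmt: str) -> str:
    """Serialize processed items back into the same format."""
    if fmt == "json":
        return json.dumps(items, ensure_ascii=False, indent=2)
    if fmt == "jsonl":
        return "".join(json.dumps(item, ensure_ascii=False) + "\n" for item in items)
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        # Union of keys in first-seen order, so columns added per item survive.
        fieldnames: list[str] = []
        for item in items:
            for key in item:
                if key not in fieldnames:
                    fieldnames.append(key)
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(items)
        return buffer.getvalue()
    raise ValueError(f"format not handled in this sketch: {fmt}")

# Example: the CSV translation use case, with the new column filled in by hand
# where the Item Agent's output would normally go.
rows = load_items("id,caption\n1,Good morning\n", "csv")
rows[0]["caption_fr"] = "Bonjour"
print(dump_items(rows, "csv"))
```

The CSV writer collects column names from all items rather than only the first row. This keeps a column that the per-item step adds, such as caption_fr, in the output header.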
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.