ljcamargo/batchy

Batchy

Agentic AI to edit large JSON(L), CSV, TSV, YAML item by item

An agentic AI batch-processing tool built with Google ADK. It processes and transforms large JSON, JSONL, CSV, TSV, and YAML files using a multi-agent system powered by Google's Gemini models.

Author

Luis J Camargo

Rationale

While most modern LLMs have a very large context window, processing very large structured files such as JSON or CSV in a single pass is error-prone and limited to simple edits. This agentic tool iterates through the items in a file and applies the requested changes one item at a time, which gives each item full attention and enables complex behaviors such as running a web search for each entry.

Examples of use

  • You have a JSON file containing a large list of cities, but the geolocation is missing for some of them. An LLM is unlikely to fill these in reliably across the entire JSON file at once, even if it is capable of web search. Solution: Ask Batchy to perform a web lookup for each city that is missing its geolocation.

  • You have a CSV dataset and you need to add a new column containing the French translation of an English caption. The CSV is large, and even if it fits in the context window, the output is often incomplete or contains unwanted changes because of its size. Solution: Ask Batchy to perform these changes on your CSV.

  • You have a YAML file with user data where the "name" field is not divided into first name and last name, and simply splitting on whitespace rarely works. Additionally, the address is not divided into fields. An LLM can parse this kind of data reliably, but your file has thousands of entries, making it impractical to split into chunks and process by hand with a common LLM. Solution: Ask Batchy to perform these changes on each item.
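The item-by-item idea behind these examples can be sketched in plain Python. This is only an illustration, not Batchy's implementation: `process_csv_rows` and `add_caption_length` are hypothetical names, and the per-row function stands in for the agent call (for example a translation request) that would handle each row.

```python
import csv
import io
from typing import Callable

Row = dict[str, str]


def process_csv_rows(csv_text: str, transform: Callable[[Row], Row]) -> str:
    """Apply `transform` to every row independently and return new CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [transform(dict(row)) for row in reader]
    if not rows:
        return csv_text
    # Collect column names in first-seen order so added columns are kept.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def add_caption_length(row: Row) -> Row:
    # Hypothetical stand-in for the per-item agent call.
    return {**row, "caption_length": str(len(row["caption"]))}


if __name__ == "__main__":
    sample = "id,caption\n1,hello\n2,good morning\n"
    print(process_csv_rows(sample, add_caption_length))
```

Because each row is handled in isolation, a failure or an unwanted change affects only that row rather than the whole file.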

Prompting

Once you have started the WebUI or opened the sample tool online, write a prompt describing the changes you want and include your file. Caveat: in this preliminary version, the WebUI doesn't support file uploading, so you have to paste the file contents directly into the prompt. Example:

In the following YAML, please split the "name" field of each record into first_name and last_name. Also split the "address" field into street (with number), city, state, and zip; set a field to null if it is not detected.

File:
users:
  - name: "John Michael Smith"
    address: "123 Main Street Apt 4B, New York, NY 10001"
    email: "john.smith@email.com"
    phone: "+1-555-123-4567"

# The above YAML will be transformed into:
users:
  - first_name: "John"
    last_name: "Smith"
    street: "123 Main Street Apt 4B"
    city: "New York"
    state: "NY"
    zip: "10001"
    email: "john.smith@email.com"
    phone: "+1-555-123-4567"

Features

  • Batch processing of JSON, CSV, JSONL, YAML, and TSV files
  • Multi-agent architecture for efficient processing
  • Supports complex transformations on structured data
  • Built on Google ADK and Gemini models
  • Memory-efficient processing of large files
  • Flexible input/output format handling

Installation

  1. Ensure you have Python installed on your system

  2. Clone this repository:

    git clone https://github.com/username/batchy-json-agent.git
    cd batchy-json-agent

  3. Install the required dependencies:

    pip install google-adk google-genai
  4. Set up environment variables:

    cd batchy
    cp sample.env .env

    Then edit the .env file and add your Gemini API key:

    GOOGLE_API_KEY=your_api_key_here
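Before starting the agent, it can help to confirm that the key was actually filled in. The following is a minimal sketch, assuming the .env file uses plain KEY=value lines; `parse_env` and `has_real_key` are hypothetical helpers and are not part of Batchy or Google ADK.

```python
PLACEHOLDER = "your_api_key_here"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_real_key(env: dict[str, str]) -> bool:
    """True when GOOGLE_API_KEY is set and is not the sample placeholder."""
    key = env.get("GOOGLE_API_KEY", "")
    return bool(key) and key != PLACEHOLDER


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        if not has_real_key(parse_env(handle.read())):
            raise SystemExit("GOOGLE_API_KEY is missing or still the placeholder")
```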

Usage

The project uses Google ADK's agent system to process files. Here's how to use it:

Local Development

To run the project locally in development mode:

adk web

This will start a local development server where you can:

  • Provide file contents for processing (pasted into the prompt in this preliminary version)
  • Define transformation instructions
  • Monitor the processing progress
  • Download the transformed results

Deployment

To deploy the project:

adk deploy

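This deploys your agent to Google's infrastructure, making it available for production use. Depending on your ADK version, adk deploy may expect a target subcommand (for example cloud_run); run adk deploy --help to see the options available in your installation.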

How It Works

The system uses three main components:

  1. Root Agent (main_agent): Orchestrates the overall process, handles file preparation and validates formats.

  2. Item Agent: Processes individual items in the batch according to the transformation instructions.

  3. Reviewer Agent: Ensures the transformed output maintains correct formatting and structure.

The process flow:

  1. The file is provided as text in the prompt
  2. Root agent prepares and validates the input
  3. Item agent processes each element according to instructions
  4. Reviewer agent validates the transformations
  5. Results are aggregated and saved in the specified format
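The flow above can be sketched as an ordinary loop. This is a simplified illustration rather than the project's code: in Batchy the steps are carried out by ADK agents, whereas here `process_item` and `review_item` are plain callables, and every name in the block is hypothetical. Falling back to the original item when review fails is an assumption made for the sketch.

```python
from typing import Any, Callable

Item = dict[str, Any]


def run_batch(
    items: list[Item],
    process_item: Callable[[Item], Item],
    review_item: Callable[[Item, Item], bool],
) -> tuple[list[Item], list[int]]:
    """Process items one at a time; keep the original when review fails."""
    results: list[Item] = []
    rejected: list[int] = []
    for index, original in enumerate(items):
        candidate = process_item(dict(original))  # work on a copy
        if review_item(original, candidate):
            results.append(candidate)
        else:
            # Assumption: a rejected item is passed through unchanged.
            results.append(original)
            rejected.append(index)
    return results, rejected


def split_name(item: Item) -> Item:
    # Naive stand-in for the item agent: first and last whitespace token.
    parts = item["name"].split()
    rest = {k: v for k, v in item.items() if k != "name"}
    return {"first_name": parts[0], "last_name": parts[-1], **rest}


def keeps_other_fields(original: Item, candidate: Item) -> bool:
    # Stand-in for the reviewer: untouched fields must survive as-is.
    return all(candidate.get(k) == v for k, v in original.items() if k != "name")


if __name__ == "__main__":
    users = [{"name": "John Michael Smith", "email": "john.smith@email.com"}]
    print(run_batch(users, split_name, keeps_other_fields))
```

Returning the indices of rejected items alongside the results makes it easy to retry or inspect only the entries that failed review.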

Supported File Formats

  • JSON (.json)
  • CSV (.csv)
  • JSONL (.jsonl)
  • YAML (.yaml)
  • TSV (.tsv)
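As an illustration of how these formats can all be reduced to a list of items, here is a sketch that dispatches on the file extension using only the standard library. `load_items` is a hypothetical helper, not Batchy's loader. YAML is left unimplemented because parsing it needs a third-party package such as PyYAML, and wrapping a non-list JSON document in a single-element list is an assumption of this sketch.

```python
import csv
import io
import json
from pathlib import PurePath


def load_items(filename: str, text: str) -> list[dict]:
    """Turn file contents into a list of items based on the extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".json":
        data = json.loads(text)
        # Assumption: a non-list document is treated as a single item.
        return data if isinstance(data, list) else [data]
    if suffix == ".jsonl":
        # One JSON document per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        return [dict(row) for row in reader]
    if suffix == ".yaml":
        raise NotImplementedError("YAML needs a third-party parser such as PyYAML")
    raise ValueError(f"Unsupported file format: {suffix or filename}")


if __name__ == "__main__":
    print(load_items("cities.tsv", "id\tname\n1\tAna\n"))
```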

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
