ParquetConv

A command-line tool for converting between Parquet and CSV file formats using pandas.

Features

  • Automatic format detection: Detects whether the input file is Parquet or CSV
  • Bidirectional conversion: Convert Parquet to CSV or CSV to Parquet
  • Flexible output naming: Auto-generates output filenames or allows custom naming
  • Error handling: Informative messages for missing files, unsupported formats, read/write errors, and invalid content
  • Force conversion: A --force option to convert even when the file format cannot be detected with certainty

Installation

Option 1: Install from PyPI (Recommended)

pip install parquetconv

After installation, you can use the parquetconv command directly from anywhere in your terminal.

Option 2: Install from source

Clone the repository and install:

git clone https://github.com/ToyokoLabs/parquetconv.git
cd parquetconv
pip install -e .

Option 3: Development setup with uv

The project uses uv for dependency management. Install dependencies with:

uv sync

Usage

After pip installation

Convert a Parquet file to CSV:

parquetconv input.parquet

Convert a CSV file to Parquet:

parquetconv input.csv

From source or development

python -m parquetconv.cli input.parquet
python -m parquetconv.cli input.csv

Advanced Usage

Specify a custom output filename:

parquetconv input.parquet -o custom_output.csv
parquetconv input.csv -o custom_output.parquet

Force conversion (useful when file format detection is uncertain):

parquetconv input_file --force

Command Line Options

  • input_file: Path to the input file (required)
  • -o, --output: Custom output file path (optional)
  • --force: Force conversion even if file format detection is uncertain
  • -h, --help: Show help message
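
For readers who want to see how options like these are typically declared, here is a minimal sketch using the standard-library argparse module. It is an illustration only and is not taken from the project's cli module; the function name build_parser, the description, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(
        prog="parquetconv",
        description="Convert between Parquet and CSV files.",
    )
    # Positional argument: required by default.
    parser.add_argument("input_file", help="Path to the input file")
    # Optional output path; None means "derive a name from the input".
    parser.add_argument("-o", "--output", default=None,
                        help="Custom output file path")
    # Boolean switch: False unless the flag is present.
    parser.add_argument("--force", action="store_true",
                        help="Convert even if format detection is uncertain")
    # -h/--help is added automatically by argparse.
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["data.csv", "-o", "out.parquet"])
print(args.input_file, args.output, args.force)  # data.csv out.parquet False
```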

Examples

# Convert Parquet to CSV with auto-generated filename
parquetconv data.parquet
# Output: data.csv

# Convert CSV to Parquet with custom filename
parquetconv data.csv -o processed_data.parquet

# Convert with force flag
parquetconv unknown_file --force

# Get help
parquetconv --help

Requirements

  • Python 3.9+
  • pandas >= 2.3.2
  • pyarrow >= 21.0.0

How It Works

  1. File Detection: The tool first checks the file extension, then attempts to read the file to determine its format
  2. Format Conversion: Uses pandas to read the input file and convert it to the opposite format
  3. Output Generation: Creates the output file with an appropriate extension if not specified
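
The three steps above can be sketched in Python roughly as follows. This is not the tool's actual implementation: the function names are hypothetical, and the content check used here (looking for the 4-byte PAR1 marker that Parquet files start with) is a stand-in for whatever read attempt the tool performs. The pandas calls are read_parquet, read_csv, to_csv, and to_parquet.

```python
from pathlib import Path
from typing import Optional

PARQUET_MAGIC = b"PAR1"  # Parquet files begin (and end) with these four bytes


def detect_format(path: Path) -> Optional[str]:
    """Return "parquet", "csv", or None when the format is uncertain."""
    suffix = path.suffix.lower()
    if suffix == ".parquet":
        return "parquet"
    if suffix == ".csv":
        return "csv"
    # Extension was inconclusive: peek at the leading bytes (assumed check).
    with path.open("rb") as fh:
        if fh.read(4) == PARQUET_MAGIC:
            return "parquet"
    return None


def default_output_path(path: Path, fmt: str) -> Path:
    """Swap the extension: data.parquet -> data.csv, data.csv -> data.parquet."""
    return path.with_suffix(".csv" if fmt == "parquet" else ".parquet")


def convert(path: Path, output: Optional[Path] = None) -> Path:
    """Convert the file to the opposite format and return the output path."""
    import pandas as pd  # imported lazily so detection works without pandas

    fmt = detect_format(path)
    if fmt is None:
        raise ValueError(f"Cannot determine format of {path}")
    out = output or default_output_path(path, fmt)
    if fmt == "parquet":
        pd.read_parquet(path).to_csv(out, index=False)
    else:
        pd.read_csv(path).to_parquet(out, index=False)
    return out
```

In this sketch a file with an unrecognized extension and no PAR1 marker yields None, which is the kind of uncertain case the --force option is described as covering.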

Error Handling

The tool provides clear error messages for:

  • Missing input files
  • Unsupported file formats
  • Read/write errors during conversion
  • Invalid file content
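
One way to map these failure cases onto messages is shown below. It is a sketch under stated assumptions, not the project's code: run_conversion, the message wording, and the exit codes are all hypothetical, and the converter is passed in as a plain callable so the example stays self-contained.

```python
import sys
from pathlib import Path
from typing import Callable


def run_conversion(convert: Callable[[Path], Path], input_file: str) -> int:
    """Call convert and translate failures into messages and exit codes."""
    path = Path(input_file)
    # Missing input file: report it before attempting any read.
    if not path.is_file():
        print(f"Error: input file not found: {path}", file=sys.stderr)
        return 1
    try:
        out = convert(path)
    except ValueError as exc:
        # Unsupported format or invalid content; pandas parser errors such
        # as pandas.errors.ParserError are subclasses of ValueError.
        print(f"Error: could not convert {path}: {exc}", file=sys.stderr)
        return 2
    except OSError as exc:
        # Read/write failure (permissions, full disk, and similar).
        print(f"Error: I/O failure: {exc}", file=sys.stderr)
        return 3
    print(f"Wrote {out}")
    return 0
```

A command-line entry point could then finish with sys.exit(run_conversion(some_converter, args.input_file)), where some_converter stands for whichever conversion function is in use.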

Development

To contribute to the project:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests (if available)
  5. Submit a pull request

License

This project is open source and available under the GNU General Public License v3.0.

Author

Sebastian Bassi - sebastian@toyoko.io
