Releases: dgtlss/parqbridge

v1.2.0

21 Aug 12:34

  • Add export configuration options for include/exclude tables in parqbridge.php
  • Refactor ExportAllTablesCommand to merge command-line options with config settings

v1.1.8

14 Aug 10:07

Minor Update

  • Added FTP upload support

v1.1.5

13 Aug 12:17

Add transport format configuration to parqbridge.php, and enhance ExportTableCommand and ExternalParquetConverter for improved handling of JSONL and TSV formats.

v1.1.2

13 Aug 11:51

Enhance ParqBridge configuration by adding a 'pyarrow_block_size' setting, and update ExternalParquetConverter to use the block size for improved handling of large TSV rows.

v1.1.1

13 Aug 11:47

Enhance ExternalParquetConverter for better compatibility with PyArrow by adjusting quoting options.

v1.1.0

13 Aug 11:43

Update README with badges and modify ExportTableCommand to use TSV for CSV output; enhance ExternalParquetConverter to read TSV for improved compatibility with JSON/text fields.

v1.0.0

13 Aug 10:23

ParqBridge v1.0.0

ParqBridge is a lightweight Laravel package that exports your database tables to real Apache Parquet files on any Laravel Storage disk (local, S3, etc.) via a simple artisan command. It aims for zero PHP dependency bloat while delegating the final Parquet write to a tiny, embedded Python script using PyArrow—or any custom CLI you provide.

Highlights

• First stable release focused on reliability and minimalism
• Works with Laravel 11 and 12
• True Apache Parquet output with optional compression
• Export single tables or all tables in bulk
• Zero PHP bloat; external writer backend (default: PyArrow) or custom command
• Schema inference across MySQL, PostgreSQL, SQLite, SQL Server

New commands

• parqbridge:export {table}: Export one table (supports filtering, limiting, disk/output selection)
  • Options: --where=, --limit=, --output=, --disk=
• parqbridge:export-all: Export all tables into a single timestamped folder
  • Options: --disk=, --output=, --include=, --exclude=
• parqbridge:tables: List tables available for export
• parqbridge:setup: Bootstrap the PyArrow backend (optionally in a venv)
  • Options: --python=, --venv=, --no-venv, --write-env, --upgrade, --dry-run

How it works

• Streams rows from your DB in chunks (configurable) to a temporary CSV
• Infers a Parquet schema (per driver) to preserve types
• Converts the CSV to Parquet using either:
  • PyArrow via python3 (the default), or
  • a custom shell command you provide (e.g., the DuckDB CLI)
• Writes the Parquet file to your chosen Storage disk

File naming: {table}-{YYYYMMDD_HHMMSS}.parquet.
Bulk export writes to a timestamped subfolder under your output directory.
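The naming scheme can be sketched in a few lines of Python (parquet_filename is a hypothetical helper for illustration, not part of the package):

```python
from datetime import datetime

def parquet_filename(table: str, now: datetime) -> str:
    # Follows the {table}-{YYYYMMDD_HHMMSS}.parquet naming scheme
    return f"{table}-{now.strftime('%Y%m%d_%H%M%S')}.parquet"

print(parquet_filename("users", datetime(2025, 8, 21, 12, 34, 56)))
# users-20250821_123456.parquet
```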

Schema inference (per driver)

• MySQL/MariaDB: tinyint(1) → BOOLEAN; date/time/datetime/timestamp; decimal(precision,scale) → Parquet DECIMAL
• PostgreSQL: integer/float/real/double precision; numeric as DECIMAL; timestamps and times with microsecond precision; json/jsonb/text/uuid/bytea handled as UTF8/binary where applicable
• SQLite: Best-effort mapping for int, text/char, real/float, blob
• SQL Server: int/smallint/tinyint/bigint; decimal/numeric as DECIMAL; datetime* as timestamp; time as microsecond time; common textual/binary types
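In spirit, each driver's inference is a lookup from native column types to Parquet logical types. A toy sketch of the MySQL case (the mapping table and infer_parquet_type helper here are illustrative, not the package's actual code):

```python
# Toy per-driver schema inference (illustrative only)
MYSQL_TO_PARQUET = {
    "tinyint(1)": "BOOLEAN",          # MySQL's conventional boolean
    "datetime":   "TIMESTAMP_MICROS",
    "timestamp":  "TIMESTAMP_MICROS",
    "date":       "DATE",
    "varchar":    "UTF8",
}

def infer_parquet_type(mysql_type: str) -> str:
    base = mysql_type.lower()
    if base.startswith("decimal("):
        # decimal(precision,scale) carries through as Parquet DECIMAL
        return "DECIMAL" + base[len("decimal"):]
    return MYSQL_TO_PARQUET.get(base, "UTF8")  # fall back to text

print(infer_parquet_type("tinyint(1)"))     # BOOLEAN
print(infer_parquet_type("decimal(10,2)"))  # DECIMAL(10,2)
```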

Binary safety: non-UTF8 BYTE_ARRAY values are base64-encoded in the CSV and properly decoded by the writer.
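The base64 round trip is simple to illustrate with the standard library (a sketch of the idea, not ParqBridge's exact code path):

```python
import base64

raw = bytes([0x00, 0xFF, 0x89, 0x50])         # bytes that are not valid UTF-8
cell = base64.b64encode(raw).decode("ascii")  # safe to place in a CSV cell
restored = base64.b64decode(cell)             # the writer decodes it back

assert restored == raw
print(cell)  # AP+JUA==
```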

Writer backends and compression

• Default backend: pyarrow (requires Python + PyArrow)
• Alternative: custom command via PARQBRIDGE_CUSTOM_CMD (must read {input} CSV and write {output} Parquet)
• Compression (PyArrow): UNCOMPRESSED/NONE, SNAPPY, GZIP, ZSTD, BROTLI, LZ4_RAW
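A custom backend is just a shell template with {input} and {output} placeholders. For example, a DuckDB-based PARQBRIDGE_CUSTOM_CMD value might look like the template below; the render_custom_cmd helper is hypothetical (ParqBridge performs the substitution itself):

```python
def render_custom_cmd(template: str, input_path: str, output_path: str) -> str:
    # Fill the {input}/{output} placeholders of a PARQBRIDGE_CUSTOM_CMD template
    return template.replace("{input}", input_path).replace("{output}", output_path)

# Example template using the DuckDB CLI as the Parquet writer
template = (
    "duckdb -c \"COPY (SELECT * FROM read_csv_auto('{input}')) "
    "TO '{output}' (FORMAT PARQUET)\""
)
print(render_custom_cmd(template, "/tmp/users.csv", "/tmp/users.parquet"))
```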

Laravel integration

• Auto-discovers ParqBridge\ParqBridgeServiceProvider
• Publishes config with --tag="parqbridge-config"
• Uses Storage disks configured in config/filesystems.php

Configuration

Set in .env or config/parqbridge.php:
• PARQUET_DISK: target disk (e.g., s3, local)
• PARQUET_OUTPUT_DIR: output directory prefix (default parquet-exports)
• PARQUET_CHUNK_SIZE: DB chunk size (default 1000)
• PARQUET_INFERENCE: database|sample|hybrid (default hybrid)
• PARQUET_COMPRESSION: Parquet compression codec
• PARQBRIDGE_WRITER: pyarrow (default) or custom
• PARQBRIDGE_PYTHON: Python executable, e.g., python3
• PARQBRIDGE_CUSTOM_CMD: command template using {input} and {output}
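For example, a minimal .env setup targeting S3 with ZSTD compression (the disk and codec choices here are illustrative; the other values are the documented defaults):

```ini
PARQUET_DISK=s3
PARQUET_OUTPUT_DIR=parquet-exports
PARQUET_CHUNK_SIZE=1000
PARQUET_COMPRESSION=ZSTD
PARQBRIDGE_WRITER=pyarrow
PARQBRIDGE_PYTHON=python3
```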

Usage examples

• List tables:

php artisan parqbridge:tables

• Export one table:

php artisan parqbridge:export users --where="active = 1" --limit=1000 --disk=s3 --output="parquet-exports"

• Export all tables:

php artisan parqbridge:export-all --disk=s3 --output="parquet-exports" --exclude=migrations,password_resets

• Setup PyArrow backend:

php artisan parqbridge:setup --write-env

Requirements

• PHP 8.3+
• Laravel 11 or 12
• For default writer: Python (PARQBRIDGE_PYTHON, defaults to python3) with pyarrow installed
• php artisan parqbridge:setup can automate virtualenv creation, the pip upgrade, and the pyarrow install

Notes and limitations

• Time/timestamp parsing accepts a set of common formats for TIMESTAMP_MICROS and TIMESTAMP_MILLIS; ensure your data uses an ISO-like Y-m-d H:i:s[.fraction] format when using the external writer
• Custom writer mode requires you to provide a CLI that reads CSV from {input} and writes Parquet to {output}