# ts-spark-connector

TypeScript client for Apache Spark Connect.
Construct Spark logical plans entirely in TypeScript and run them against a Spark Connect server.

## 🚀 Features

- Build Spark logical plans using a fluent, PySpark-style API in TypeScript
- Evaluate transformations locally or stream results via Arrow
- Tagless Final DSL design with support for multiple backends
- Composable, immutable, and strongly typed DataFrame operations
- Column expressions (`col`, `.gt`, `.alias`, `.and`, etc.)
- Speaks the Spark Connect Protobuf protocol; works against a server started with `spark-submit --class org.apache.spark.sql.connect.service.SparkConnectServer`
- Set operations (UNION, INTERSECT, EXCEPT) with `by_name`, `is_all`, and `allow_missing_columns`
- Spark-compatible joins with configurable join types
- Session-aware execution (no global singletons)
- Plan viz / AST dump: export the client AST to JSON and Mermaid
- Ready-to-run examples in `examples/`
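The Tagless Final design mentioned above can be illustrated with a self-contained sketch. The `ExprAlg` interface and `printer` backend below are illustrative names, not the library's actual internals: a program is written once against an abstract algebra of operations and can then be interpreted by any backend (a pretty-printer, a Protobuf plan builder, and so on).

```typescript
// Minimal Tagless Final sketch (illustrative, not the library's real API).
// The algebra describes column expressions abstractly over a result type R.
interface ExprAlg<R> {
  col(name: string): R;
  lit(n: number): R;
  gt(left: R, right: R): R;
}

// One possible backend: render the expression as a SQL-like string.
const printer: ExprAlg<string> = {
  col: (name) => name,
  lit: (n) => String(n),
  gt: (left, right) => `(${left} > ${right})`,
};

// A program written against the algebra is backend-agnostic.
const amountOver100 = <R>(A: ExprAlg<R>): R =>
  A.gt(A.col("amount"), A.lit(100));

console.log(amountOver100(printer)); // → "(amount > 100)"
```

Swapping `printer` for a backend that emits Spark Connect Protobuf nodes reuses the same `amountOver100` program unchanged, which is the point of the pattern.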

## 📦 Installation

```sh
npm i ts-spark-connector
# or: yarn add ts-spark-connector / pnpm add ts-spark-connector
```

### Start a Spark Connect server (Docker)

```sh
cd spark-server
docker compose up -d
```

## 🧪 Quick Start

You need a running Spark Connect server. See spark-server/README.md for a ready-to-use Docker setup, or run your own server.

```ts
import { SparkSession, col } from "ts-spark-connector";

const session = SparkSession.builder()
  // optional: auth / TLS
  .getOrCreate();

const people = session.read
  .option("delimiter", "\t")
  .option("header", "true")
  .csv("/data/people.tsv");

const purchases = session.read
  .option("delimiter", "\t")
  .option("header", "true")
  .csv("/data/purchases.tsv");

await people
  .join(purchases, col("id").eq(col("user_id")), "left")
  .select("name", "product", "amount")
  .filter(col("amount").gt(100))
  .show();
```

### Example: UNION (Set Operation)

```ts
const p2024 = purchases.filter(col("year").eq(2024));
const p2025 = purchases.filter(col("year").eq(2025));

await p2024.union(p2025, { is_all: true, by_name: false })
  .limit(5)
  .show();
```

## 🗺️ Plan viz / AST dump

Inspect the client-side plan (before server optimization):

```ts
const df = purchases
  .select("user_id", "product", "amount")
  .filter(col("amount").gt(100))
  .orderBy(col("user_id").descNullsLast());

console.log(df.toClientASTJSON());        // JSON AST
console.log(df.toClientASTMermaid());     // Mermaid diagram
console.log(df.toSparkLogicalPlanJSON()); // Client logical plan
console.log(df.toProtoJSON());            // Spark Connect proto
```

Tip: write these strings to disk (.mmd, .json) and publish them as CI artifacts.
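A minimal sketch of that tip using only Node's built-in `fs`/`path` modules. The `dumpPlan` helper and the `plan-dumps` directory are illustrative names, not part of the library; pass it the strings returned by `toClientASTMermaid()` and `toClientASTJSON()`.

```typescript
import { writeFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: persist plan dumps so CI can upload them as artifacts.
function dumpPlan(name: string, mermaid: string, json: string): string[] {
  const dir = "plan-dumps";
  mkdirSync(dir, { recursive: true });
  const mmdPath = join(dir, `${name}.mmd`);
  const jsonPath = join(dir, `${name}.json`);
  writeFileSync(mmdPath, mermaid);   // Mermaid diagram source
  writeFileSync(jsonPath, json);     // JSON AST
  return [mmdPath, jsonPath];
}
```

Usage: `dumpPlan("purchases", df.toClientASTMermaid(), df.toClientASTJSON());` then point your CI artifact upload step at `plan-dumps/`.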

## 🔐 TLS

```ts
const session = SparkSession.builder()
  .enableTLS({
    keyStorePath: "./certs/keystore.p12",
    keyStorePassword: "password",
    trustStorePath: "./certs/cert.crt",
    trustStorePassword: "password",
  })
  .getOrCreate();
```

## ✅ Compatibility Matrix

| Component | Supported / Tested |
| --- | --- |
| Spark Connect | 3.5.x |
| Scala ABI (JAR) | 2.12 (`spark-connect_2.12`) |
| Node.js | 18, 20, 22 |
| OS | Linux (CI); macOS (local) |

Planned: add CI jobs for macOS/Windows; update the table as coverage expands.

## ✅ Feature Matrix

| Feature | Notes |
| --- | --- |
| CSV Reading | |
| Filtering | |
| Projection / Alias | |
| Arrow decoding (`.show()`) | |
| Column expressions (`col`, `.gt`, `.and`, `.alias`, etc.) | |
| DSL abstraction (Tagless Final) | |
| Joins (configurable types) | |
| Aggregation (`groupBy().agg({...})`) | |
| Distinct (`distinct()`, `dropDuplicates(...)`) | |
| Sorting (`orderBy(...)`, `sort(...)`) | |
| Limit (`limit(n)`) | |
| Set operations (UNION, INTERSECT, EXCEPT) | |
| Column renaming (`withColumnRenamed(...)`) | |
| Type declarations (`.d.ts`) | |
| Modular compiler core (backend-agnostic) | |
| Tests (Unit + Integration + E2E) | |
| `withColumn(...)` | |
| `when(...).otherwise(...)` | |
| Window functions | |
| Null handling (`isNull`, `na.drop/fill/replace`) | |
| Parquet Reading | |
| JSON Reading | |
| DataFrameWriter (CSV/JSON/Parquet/ORC/Avro) | |
| Write `partitionBy`, `bucketBy`, `sortBy` | |
| `describe()`, `summary()` | |
| `unionByName(...)` | |
| Complex types + `explode`/`posexplode` | |
| JSON helpers (`from_json`, `to_json`) | |
| `repartition(...)` / `coalesce(...)` | |
| `explain(...)` (simple/extended/formatted) | |
| `SparkSession.builder.config(...)` | |
| Auth/TLS for Spark Connect | |
| `spark.sql(...)` | |
| Temp views (`createOrReplaceTempView`) | |
| Catalog (`read.table`, `saveAsTable`) | |
| Plan viz / AST dump | |
| `cache()` / `persist()` / `unpersist()` | ⚠️ Limited by Spark Connect |
| Join hints (broadcast, etc.) | |
| `sample(...)`, `randomSplit(...)` | |
| UDF (scalar) | ⚠️ Limited by Spark Connect |
| UDAF / Vectorized UDF (Arrow) | ⚠️ Limited by Spark Connect |
| Structured Streaming | |
| Watermark / triggers / output modes | |
| Lakehouse: Delta/Iceberg/Hudi | |
| JDBC read/write | |
| MLlib | |
## 📄 License

Apache-2.0
