Commit 9f8ca0d

feat: initial implementation of merge-streams library
- Added package.json with metadata, dependencies, and scripts.
- Implemented core merging functions for Apache Arrow, CSV, and JSON_ARRAY formats.
- Created a unified API for merging multiple data streams.
- Developed utility functions for handling streams and HTTP requests.
- Added integration tests for merging streams from Databricks external links.
- Implemented unit tests for merging Arrow, CSV, and JSON streams.
- Configured TypeScript with strict settings and output directory.

19 files changed, +3447 −0 lines changed

.github/release-drafter.yml

Lines changed: 40 additions & 0 deletions

```yaml
name-template: 'v$RESOLVED_VERSION'
tag-template: 'v$RESOLVED_VERSION'
categories:
  - title: 'Features'
    labels:
      - 'feature'
      - 'enhancement'
  - title: 'Bug Fixes'
    labels:
      - 'bug'
      - 'fix'
  - title: 'Documentation'
    labels:
      - 'documentation'
      - 'docs'
  - title: 'Dependencies'
    labels:
      - 'dependencies'
change-template: '- $TITLE @$AUTHOR (#$NUMBER)'
change-title-escapes: '\<*_&'
version-resolver:
  major:
    labels:
      - 'major'
  minor:
    labels:
      - 'minor'
      - 'feature'
  patch:
    labels:
      - 'patch'
      - 'bug'
      - 'fix'
  default: patch
template: |
  ## Changes

  $CHANGES

  **Full Changelog**: https://github.com/$OWNER/$REPOSITORY/compare/$PREVIOUS_TAG...v$RESOLVED_VERSION
```

.github/workflows/publish.yml

Lines changed: 28 additions & 0 deletions

```yaml
name: Publish to npm

on:
  release:
    types:
      - published

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          registry-url: 'https://registry.npmjs.org'

      - run: npm ci

      - run: npm test

      - run: npm run build

      - run: npm publish --provenance --access public
```
Lines changed: 25 additions & 0 deletions

```yaml
name: Release Drafter

on:
  push:
    branches:
      - main
  pull_request:
    types:
      - opened
      - reopened
      - synchronize

permissions:
  contents: read

jobs:
  update_release_draft:
    permissions:
      contents: write
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - uses: release-drafter/release-drafter@v6
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

.gitignore

Lines changed: 7 additions & 0 deletions

```
node_modules/
dist/
*.log
.DS_Store
coverage/
.env
.env.*
```

LICENSE

Lines changed: 21 additions & 0 deletions

```
MIT License

Copyright (c) 2024

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

README.md

Lines changed: 149 additions & 0 deletions

# merge-streams

**When Databricks gives you 90+ presigned URLs, merge them into one.**

> *Because nobody wants to explain to their MCP client why it needs to juggle dozens of chunk URLs.*

---

## Why I Made This

I was building an MCP Server that queries Databricks SQL for large datasets. I chose the External Links format because INLINE would blow up memory.

But then Databricks handed me back something like this:

```
chunk_0.arrow (presigned URL)
chunk_1.arrow (presigned URL)
chunk_2.arrow (presigned URL)
...
chunk_89.arrow (presigned URL)
```

My client would have to:

1. Fetch each chunk sequentially
2. Parse and merge them correctly (CSV headers? JSON array brackets? Arrow EOS markers?)
3. Handle errors across 90 HTTP requests
4. Pray nothing times out

That was unacceptable. So I built this.

---

## The Solution

`merge-streams` takes those chunked External Links and merges them into a single, unified stream.

```
90+ presigned URLs → merge-streams → 1 clean stream → S3 → 1 presigned URL
```

Now my MCP client gets one URL. Done.

### What Makes It Fast

- **Pre-connected**: the next chunk's connection opens while the current chunk streams. No idle time.
- **Zero accumulation**: pure stream piping. Memory stays flat regardless of data size.
- **Format-aware**: not byte concatenation — actual format understanding.

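The pre-connection idea can be sketched with a small async generator — an illustrative model only, not the library's actual internals, and `prefetched` is a name assumed here:

```ts
// Sketch: while the consumer is draining chunk i, the factory for
// chunk i+1 has already been invoked, so its connection is opening
// in the background instead of waiting its turn.
async function* prefetched<T>(sources: Array<() => Promise<T>>): AsyncGenerator<T> {
  let next: Promise<T> | undefined = sources.length > 0 ? sources[0]() : undefined
  for (let i = 0; i < sources.length; i++) {
    const current = next as Promise<T>
    // Kick off the following chunk before consuming the current one.
    next = i + 1 < sources.length ? sources[i + 1]() : undefined
    yield await current
  }
}
```

The same pattern applies whether `T` is a `Readable`, a `Response`, or a parsed batch: the factories stay lazy, so at most one chunk is "in flight ahead" at a time and memory stays bounded.
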
---
## Features

- **CSV**: automatically deduplicates headers across chunks
- **JSON_ARRAY**: properly concatenates JSON arrays (handles brackets and commas)
- **ARROW_STREAM**: merges Arrow IPC streams batch by batch (doesn't just byte-concat)
- **Memory-efficient**: streaming-based, never loads entire files into memory
- **AbortSignal support**: cancel mid-stream when needed

---
## Installation

```bash
npm install merge-streams
```

Requires Node.js 18+ (uses native `fetch()` and `Readable.fromWeb()`).

---
## Quick Start: The Databricks Use Case

See [test/databricks.spec.ts](test/databricks.spec.ts) for a complete working example.

```bash
# Run the integration test
DATABRICKS_TOKEN=dapi... \
DATABRICKS_HOST=xxx.cloud.databricks.com \
DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/xxx \
npm test -- test/databricks.spec.ts
```

---
## API

### URL-based (for Databricks External Links)

```ts
import { mergeStreamsFromUrls } from 'merge-streams'

await mergeStreamsFromUrls('CSV', urls, outputStream)
await mergeStreamsFromUrls('JSON_ARRAY', urls, outputStream)
await mergeStreamsFromUrls('ARROW_STREAM', urls, outputStream)
```

### Options

```ts
const controller = new AbortController()

await mergeCsvFromUrls(urls, output, { signal: controller.signal })

// Cancel anytime
controller.abort()
```

---
## Format Details

| Format | Databricks name | Behavior |
|--------|-----------------|----------|
| CSV | `CSV` | Writes header once, skips duplicate headers from subsequent chunks |
| JSON_ARRAY | `JSON_ARRAY` | Wraps output in `[]`, strips brackets from chunks, inserts commas |
| ARROW_STREAM | `ARROW_STREAM` | Re-encodes RecordBatches into a single IPC stream (not byte-concat) |

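The CSV rule is the simplest to picture. As a non-streaming illustration (the helper name `mergeCsvChunks` is assumed here, not part of the library's API — the real merger applies the same rule incrementally over streams):

```ts
// Keep the header line from the first chunk; for every later chunk,
// drop its first line (the duplicate header) and append the rest.
function mergeCsvChunks(chunks: string[]): string {
  return chunks
    .map((chunk, i) => (i === 0 ? chunk : chunk.slice(chunk.indexOf('\n') + 1)))
    .join('')
}
```
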
---
## Types

```ts
type InputSource = Readable | (() => Readable) | (() => Promise<Readable>)
type MergeFormat = 'ARROW_STREAM' | 'CSV' | 'JSON_ARRAY'
```

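A union like `InputSource` is typically normalized to a plain `Readable` before merging. A sketch under the assumption of a helper named `toReadable` (not necessarily how the library does it internally):

```ts
import { Readable } from 'node:stream'

type InputSource = Readable | (() => Readable) | (() => Promise<Readable>)

// Accept a ready stream, a sync factory, or an async factory,
// and hand back a Readable either way.
async function toReadable(source: InputSource): Promise<Readable> {
  if (source instanceof Readable) return source
  return await source() // awaiting a non-Promise is a no-op, so both factories work
}
```

Factories matter for the presigned-URL case: they let the merger defer each `fetch()` until that chunk's turn, instead of opening all 90 connections up front.
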
---
## Why Not Just Byte-Concatenate?

- **CSV**: you'd get duplicate headers scattered throughout
- **JSON_ARRAY**: `[1,2][3,4]` is not valid JSON
- **Arrow**: most Arrow readers stop at the first EOS marker

Each format needs format-aware merging. That's what this library does.

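The JSON_ARRAY case makes the point concretely. A whole-string sketch of the repair (the real merger does this incrementally; `mergeJsonArrayChunks` is a hypothetical helper, not the library's API):

```ts
// '[1,2]' + '[3,4]' byte-concatenated is '[1,2][3,4]' — invalid JSON.
// A format-aware merge strips each chunk's outer brackets, drops empty
// chunks, and rejoins the elements inside a single pair of brackets.
function mergeJsonArrayChunks(chunks: string[]): string {
  const inner = chunks
    .map((c) => c.trim().replace(/^\[/, '').replace(/\]$/, ''))
    .filter((c) => c.length > 0)
  return `[${inner.join(',')}]`
}
```
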
---
## Scope

This library was born from a specific pain point: making Databricks External Links usable in MCP Server development. It does that one thing well.

If you have other use cases in mind, PRs are welcome.

---

## License

MIT
