
DataSync Ingestion Solution

The Key Discovery: High-Throughput Stream Endpoint

The standard /api/v1/events endpoint is strictly rate-limited (10 requests/minute). The hidden high-performance endpoint is unlocked as follows (sketched in TypeScript below):

  1. POST to /internal/dashboard/stream-access with a browser User-Agent
  2. Receive a stream token (valid for 5 minutes)
  3. Call /api/v1/events/d4ta/x7k9/feed with the token in the X-Stream-Token header
  4. Stream events at ~4,000-8,000 events/sec
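
A minimal sketch of that handshake, assuming a Node 18+ runtime with global fetch; the base URL, the Authorization header, and the { token } response shape are assumptions, not documented details of the challenge API:

    // Token handshake sketch (TypeScript, Node 18+ global fetch).
    const BASE = process.env.TARGET_API_URL ?? "https://api.example.com"; // assumed env var
    const API_KEY = process.env.TARGET_API_KEY!;

    async function getStreamToken(): Promise<string> {
      const res = await fetch(`${BASE}/internal/dashboard/stream-access`, {
        method: "POST",
        headers: {
          // A browser-like User-Agent unlocks the hidden endpoint.
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
          Authorization: `Bearer ${API_KEY}`, // assumed auth scheme
        },
      });
      if (!res.ok) throw new Error(`stream-access failed: ${res.status}`);
      const body = (await res.json()) as { token: string }; // assumed shape
      return body.token;
    }

    async function openFeed(token: string) {
      const res = await fetch(`${BASE}/api/v1/events/d4ta/x7k9/feed`, {
        headers: { "X-Stream-Token": token },
      });
      if (!res.ok || !res.body) throw new Error(`feed failed: ${res.status}`);
      // Token expires after ~5 minutes; call getStreamToken() again and reconnect.
      return res.body;
    }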

Architecture

This solution uses a Fetch-then-Load architecture for maximum speed (a condensed sketch of both phases follows the list):

  1. Fast Fetch:

    • Node.js streams events from the API to a local file (events.csv).
    • Uses fs.createWriteStream for high-performance sequential writes.
    • Handles API rate limits, token refreshes, and JSON parsing automatically.
  2. Native Load:

    • Uses PostgreSQL's native COPY command via psql to load the CSV file.
    • This bypasses Node.js database driver overhead and inserts 3M rows in seconds.
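
A condensed sketch of the two phases, assuming the feed delivers newline-delimited CSV rows, that an events table already exists, and that psql reads its connection settings from the standard PG* environment variables; the production logic lives in src/ingest-file.ts and src/start.sh:

    // Fetch-then-Load sketch. Assumptions: the feed yields CSV rows, an
    // "events" table exists, and psql is on PATH (connection via PG* env vars).
    import { createWriteStream } from "node:fs";
    import { pipeline } from "node:stream/promises";
    import { Readable } from "node:stream";
    import { spawnSync } from "node:child_process";

    // Phase 1: stream the feed body straight to disk with sequential writes.
    async function fetchToCsv(feed: ReadableStream<Uint8Array>): Promise<void> {
      await pipeline(Readable.fromWeb(feed as any), createWriteStream("events.csv"));
    }

    // Phase 2: bulk-load the file with client-side COPY, skipping the Node driver.
    function loadCsv(): void {
      const result = spawnSync(
        "psql",
        ["-c", "\\copy events FROM 'events.csv' WITH (FORMAT csv)"],
        { stdio: "inherit" },
      );
      if (result.status !== 0) throw new Error("COPY failed");
    }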

How to Run (Docker)

  1. Set your API key:

    echo "TARGET_API_KEY=your_api_key_here" > .env
  2. Run the ingestion:

    sh run-ingestion.sh
  3. Export and Submit (a hypothetical sketch of the export step follows):

    # Export IDs
    docker exec assignment-ingestion npm run export-ids
    docker cp assignment-ingestion:/app/event_ids.txt .
    
    # Submit
    ./submit.sh https://github.com/yourusername/your-repo
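
For reference, npm run export-ids can be as small as one client-side COPY; this hypothetical sketch assumes the table is named events with an id column, neither of which is confirmed above:

    // Hypothetical export-ids sketch: one id per line into event_ids.txt.
    // Assumes an "events" table with an "id" column and psql on PATH.
    import { spawnSync } from "node:child_process";

    const result = spawnSync(
      "psql",
      ["-c", "\\copy (SELECT id FROM events) TO 'event_ids.txt'"],
      { stdio: "inherit" },
    );
    if (result.status !== 0) throw new Error("export failed");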

Performance

  • Fetching: ~6-12 minutes (depending on API load)
  • Loading: < 1 minute
  • Total Time: Well under the 30-minute target.

File Structure

  • src/ingest-file.ts: Main fetcher logic.
  • src/stream-client.ts: Handles API authentication, streaming, and error recovery.
  • src/start.sh: Orchestrates the Fetch + Load process inside Docker.

No packages published