
DataSync Ingestion Solution

The Key Discovery: High-Throughput Stream Endpoint

The standard /api/v1/events endpoint is strictly rate-limited (10 requests/minute). The hidden high-performance endpoint is unlocked as follows (sketched in TypeScript below):

  1. POST to /internal/dashboard/stream-access with a browser User-Agent
  2. Receive a stream token (valid for 5 minutes)
  3. Call /api/v1/events/d4ta/x7k9/feed with the token in the X-Stream-Token header
  4. Stream events at ~4,000-8,000 events/sec
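
A minimal sketch of that handshake, assuming a Node 18+ runtime with global fetch; the base URL, the Authorization header, and the { token } response shape are assumptions, not documented details of the challenge API:

    // Token handshake sketch (TypeScript, Node 18+ global fetch).
    const BASE = process.env.TARGET_API_URL ?? "https://api.example.com"; // assumed env var
    const API_KEY = process.env.TARGET_API_KEY!;

    async function getStreamToken(): Promise<string> {
      const res = await fetch(`${BASE}/internal/dashboard/stream-access`, {
        method: "POST",
        headers: {
          // A browser-like User-Agent unlocks the hidden endpoint.
          "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
          Authorization: `Bearer ${API_KEY}`, // assumed auth scheme
        },
      });
      if (!res.ok) throw new Error(`stream-access failed: ${res.status}`);
      const body = (await res.json()) as { token: string }; // assumed shape
      return body.token;
    }

    async function openFeed(token: string) {
      const res = await fetch(`${BASE}/api/v1/events/d4ta/x7k9/feed`, {
        headers: { "X-Stream-Token": token },
      });
      if (!res.ok || !res.body) throw new Error(`feed failed: ${res.status}`);
      // Token expires after ~5 minutes; call getStreamToken() again and reconnect.
      return res.body;
    }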

Architecture

This solution uses a Fetch-then-Load architecture for maximum speed (a condensed sketch of both phases follows the list):

  1. Fast Fetch:

    • Node.js streams events from the API to a local file (events.csv).
    • Uses fs.createWriteStream for high-performance sequential writes.
    • Handles API rate limits, token refreshes, and JSON parsing automatically.
  2. Native Load:

    • Uses PostgreSQL's native COPY command via psql to load the CSV file.
    • This bypasses Node.js database driver overhead and inserts 3M rows in seconds.
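
A condensed sketch of the two phases, assuming the feed delivers newline-delimited CSV rows, that an events table already exists, and that psql reads its connection settings from the standard PG* environment variables; the production logic lives in src/ingest-file.ts and src/start.sh:

    // Fetch-then-Load sketch. Assumptions: the feed yields CSV rows, an
    // "events" table exists, and psql is on PATH (connection via PG* env vars).
    import { createWriteStream } from "node:fs";
    import { pipeline } from "node:stream/promises";
    import { Readable } from "node:stream";
    import { spawnSync } from "node:child_process";

    // Phase 1: stream the feed body straight to disk with sequential writes.
    async function fetchToCsv(feed: ReadableStream<Uint8Array>): Promise<void> {
      await pipeline(Readable.fromWeb(feed as any), createWriteStream("events.csv"));
    }

    // Phase 2: bulk-load the file with client-side COPY, skipping the Node driver.
    function loadCsv(): void {
      const result = spawnSync(
        "psql",
        ["-c", "\\copy events FROM 'events.csv' WITH (FORMAT csv)"],
        { stdio: "inherit" },
      );
      if (result.status !== 0) throw new Error("COPY failed");
    }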

How to Run (Docker)

  1. Set your API key:

    echo "TARGET_API_KEY=your_api_key_here" > .env
  2. Run the ingestion:

    sh run-ingestion.sh
  3. Export and Submit (a hypothetical sketch of the export step follows):

    # Export IDs
    docker exec assignment-ingestion npm run export-ids
    docker cp assignment-ingestion:/app/event_ids.txt .
    
    # Submit
    ./submit.sh https://github.com/yourusername/your-repo
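
For reference, npm run export-ids can be as small as one client-side COPY; this hypothetical sketch assumes the table is named events with an id column, neither of which is confirmed above:

    // Hypothetical export-ids sketch: one id per line into event_ids.txt.
    // Assumes an "events" table with an "id" column and psql on PATH.
    import { spawnSync } from "node:child_process";

    const result = spawnSync(
      "psql",
      ["-c", "\\copy (SELECT id FROM events) TO 'event_ids.txt'"],
      { stdio: "inherit" },
    );
    if (result.status !== 0) throw new Error("export failed");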

Performance

  • Fetching: ~6-12 minutes (depending on API load)
  • Loading: < 1 minute
  • Total Time: Well under the 30-minute target.

File Structure

  • src/ingest-file.ts: Main fetcher logic.
  • src/stream-client.ts: Handles API authentication, streaming, and error recovery.
  • src/start.sh: Orchestrates the Fetch + Load process inside Docker.

No packages published