Conversation

@vitrvvivs

- Staggers write requests in order to reduce the number of unprocessed items.
- Combines unprocessed items into new batches (no more batches of only a few items); see the sketch after this list.
- Allows restoring from a local file, because S3 likes to close long-running connections.
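A minimal sketch of the re-batching idea (not the PR's exact code; the sendBatch name is illustrative and aws-sdk v2 is assumed): any UnprocessedItems returned by BatchWriteItem go back onto the shared queue, so they ship as part of the next full batch instead of being retried as a tiny batch of their own.

```js
// Illustrative sketch, not the PR's actual implementation.
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

const requestItems = []; // shared queue of { PutRequest: { Item: ... } } objects

function sendBatch(tableName, callback) {
    const batch = requestItems.splice(0, 25); // BatchWriteItem accepts at most 25 items
    if (batch.length === 0) return callback();

    dynamodb.batchWriteItem({ RequestItems: { [tableName]: batch } }, (err, data) => {
        if (err) return callback(err);
        // Push anything DynamoDB didn't process back onto the queue so the
        // next batch is still full, rather than retrying a near-empty batch.
        const leftover = (data.UnprocessedItems || {})[tableName] || [];
        requestItems.unshift(...leftover);
        callback();
    });
}
```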

Implementation
It now has two completely separate loops:

  1. a readline interface (created in _startDownload) that parses each line and pushes it into an array (requestItems)
  2. _sendBatch (started in _checkTableReady), which pulls items from that array and sends them as batches

This separation lets _sendBatch schedule itself to run again after a fixed delay (every 1000 / concurrency milliseconds), as sketched below. The previous implementation allowed a fixed number of concurrent requests regardless of speed; on a fast network (a large EC2 instance), even one concurrent request was equivalent to 2500 writes per second.
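
A sketch of that timer-driven sender (hypothetical standalone version of _sendBatch; it reuses the sendBatch helper from the sketch above): rather than keeping N requests in flight, it issues one batch and then re-schedules itself after 1000 / concurrency milliseconds, so the write rate is bounded by the timer rather than by how quickly individual requests return.

```js
// Illustrative sketch of the self-scheduling loop; reuses sendBatch from the
// sketch above. The real _sendBatch lives inside the restore class.
function sendBatchLoop(tableName, concurrency) {
    sendBatch(tableName, (err) => {
        if (err) console.error('batch failed:', err);
        // The real code stops once the download has finished and requestItems
        // is drained; this sketch just keeps polling the queue.
        setTimeout(() => sendBatchLoop(tableName, concurrency), 1000 / concurrency);
    });
}

// e.g. concurrency = 4 -> one 25-item batch every 250 ms, ~100 writes/second
sendBatchLoop('my-table', 4);
```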

Matt Geskey added 17 commits September 26, 2017 10:09
S3 has a chance of randomly closing the connection before the download
is finished. This makes restoring from large files impossible. This is a
hack: download the file quickly, then do the much slower restore.
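
A sketch of that pattern (hypothetical names, aws-sdk v2 assumed): stream the backup object to a local file up front, then run the slow restore against the local copy so a dropped S3 connection cannot kill it partway through.

```js
// Illustrative sketch: buffer the S3 object to disk before restoring from it.
const fs = require('fs');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

function downloadToLocalFile(bucket, key, localPath, callback) {
    const file = fs.createWriteStream(localPath);
    s3.getObject({ Bucket: bucket, Key: key })
        .createReadStream()
        .on('error', callback)                          // dropped connection surfaces here
        .pipe(file)
        .on('finish', () => callback(null, localPath)); // restore then reads localPath
}
```
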
Most of the time was spent in node (CPU bound). Timing only how long the
request took failed to account for overhead, and thus throttled down to
20% of the target.
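
A sketch of the corrected pacing (hypothetical helper, not the actual commit): time the whole iteration, including node-side CPU work, and sleep only for whatever remains of the target interval.

```js
// Illustrative sketch: subtract the full iteration time from the interval,
// not just the HTTP request time, so CPU-bound overhead doesn't drag the
// effective write rate below the target.
function pacedLoop(doWork, intervalMs) {
    const started = Date.now();
    doWork(() => {
        const elapsed = Date.now() - started; // request time + local overhead
        setTimeout(() => pacedLoop(doWork, intervalMs), Math.max(0, intervalMs - elapsed));
    });
}
```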