High-performance concurrent SFTP file transfer pipeline for Go.
Built for large-scale ingestion with:
- Parallel SFTP downloads
- Worker-pool based processing
- SFTP connection pooling
- Retry with exponential backoff + jitter
- Context-aware cancellation
- Configurable read strategy (ReadAll vs Streaming)
- Benchmark suite for performance analysis
```
+---------------------+
|    Job Producer     |
+----------+----------+
           |
           v
+---------------------+   (N goroutines)
|    Download Pool    |
+----------+----------+
           |
    Buffered Channel
           |
           v
+---------------------+   (M goroutines)
|   Processing Pool   |
+----------+----------+
           |
           v
     ProcessFunc
```
Job Producer: feeds file download jobs into the pipeline.

Download Pool:
- Uses a configurable number of goroutines
- Acquires SFTP clients from a connection pool
- Retries transient errors using exponential backoff + jitter
- Streams or fully reads files based on configuration

Processing Pool:
- Runs as a separate worker pool
- Executes the user-defined ProcessFunc
- Tracks success / failure metrics
- Ensures proper resource cleanup
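The producer → download pool → processing pool flow above can be sketched with plain channels and goroutines (a minimal standalone sketch; the multipliers and channel types stand in for real downloads and ProcessFunc):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := make(chan int)
	results := make(chan int, 8) // buffered channel: backpressure between stages

	// N download goroutines drain jobs and push results downstream.
	var downloaders sync.WaitGroup
	for i := 0; i < 3; i++ {
		downloaders.Add(1)
		go func() {
			defer downloaders.Done()
			for j := range jobs {
				results <- j * 10 // stand-in for an SFTP download
			}
		}()
	}

	// M processing goroutines consume results (stand-in for ProcessFunc).
	var processors sync.WaitGroup
	var mu sync.Mutex
	sum := 0
	for i := 0; i < 2; i++ {
		processors.Add(1)
		go func() {
			defer processors.Done()
			for r := range results {
				mu.Lock()
				sum += r
				mu.Unlock()
			}
		}()
	}

	for j := 1; j <= 5; j++ {
		jobs <- j
	}
	close(jobs)
	downloaders.Wait()
	close(results) // safe: all senders have exited
	processors.Wait()
	fmt.Println(sum) // → 150
}
```

Closing `results` only after the download pool finishes is what lets the processing pool terminate cleanly via `range`.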
```go
type Config struct {
	SFTPReaders   int // download goroutines (N)
	Workers       int // processing goroutines (M)
	BufferSize    int // size of the buffered channel between stages
	MaxRetries    int // retry attempts for transient errors
	ReadMode      ReadMode
	StreamBufSize int // buffer size used in StreamingMode
}
```

```go
// ReadMode selects how files are read from the SFTP server.
type ReadMode int

const (
	ReadAllMode ReadMode = iota
	StreamingMode
)
```

| Mode | Memory Usage | Speed | Best For |
|---|---|---|---|
| ReadAll | O(file size) | Faster | Small–medium files |
| Streaming | O(buffer size) | Slightly slower | Large files (50MB+) |
- Exponential backoff
- Random jitter to avoid thundering herd
- Retries only transient errors
- Permanent errors (e.g., file not found) fail immediately
- Pre-creates SFTP clients
- Reuses connections safely
- Prevents repeated SSH handshakes
- Avoids connection exhaustion
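A channel-based pool is the usual Go idiom for this; a minimal sketch with a stub client type (the real pool would hold SFTP clients over pooled SSH connections):

```go
package main

import "fmt"

type client struct{ id int }

// clientPool hands out pre-created clients and takes them back,
// so each job reuses an existing connection instead of dialing.
type clientPool struct {
	clients chan *client
}

// newClientPool pre-creates n clients; the channel capacity
// bounds the number of concurrent connections.
func newClientPool(n int) *clientPool {
	p := &clientPool{clients: make(chan *client, n)}
	for i := 0; i < n; i++ {
		p.clients <- &client{id: i}
	}
	return p
}

// Get blocks until a client is free; Put returns it for reuse.
func (p *clientPool) Get() *client  { return <-p.clients }
func (p *clientPool) Put(c *client) { p.clients <- c }

func main() {
	pool := newClientPool(2)
	c := pool.Get()
	fmt.Println("got client", c.id) // → got client 0
	pool.Put(c)
}
```

Blocking in `Get` when the pool is empty is what prevents connection exhaustion under load.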
The pipeline fully respects context.Context. On cancellation it:
- Stops job production
- Cancels retry backoff
- Stops workers gracefully
- Prevents goroutine leaks
Benchmark setup:
- Local Docker SFTP server
- Go 1.22
- Intel i7-1355U
- Single 50MB file

| Mode | Time | Memory allocated | Allocations |
|---|---|---|---|
| ReadAll | ~651ms/op | ~754MB | ~22k allocs/op |
| Streaming | ~2.11s/op | ~123MB | ~55k allocs/op |
ReadAll mode:
- Fastest throughput
- Loads the entire file into memory
- High memory usage (O(file size))
- Suitable for small/medium files

Streaming mode:
- ~6× lower memory usage
- Memory usage bounded by the buffer size
- Slower due to SFTP packet round trips
- Suitable for large files or memory-constrained systems
This benchmark makes the performance tradeoffs explicit:
- Throughput vs memory
- CPU-bound vs I/O-bound workloads
- Allocation pressure
- Real-world system constraints
Understanding and measuring these tradeoffs is more important than raw speed.
```go
cfg := pipeline.DefaultConfig()
cfg.ReadMode = pipeline.StreamingMode

p := pipeline.New(cfg)

metrics, err := p.Transfer(
	ctx,
	sshCfg,
	jobs,
	func(ctx context.Context, r pipeline.FileResult) error {
		_, err := io.Copy(io.Discard, r.Reader)
		return err
	},
)
fmt.Println(metrics, err)
```

```go
type Metrics struct {
	Success   int32
	Failed    int32
	Cancelled int32
}
```

Metrics are updated atomically for thread safety.
This project demonstrates:
- Advanced Go concurrency patterns
- Worker pool architecture
- Resource pooling
- Backpressure handling
- Retry classification
- Memory vs throughput tradeoffs
- Clean modular package design
- Benchmark-driven performance analysis
ssh.InsecureIgnoreHostKey() is used for development.
For production, implement strict host key verification.
- High-volume SFTP ingestion
- ETL pipelines
- Background file processors
- Memory-sensitive ingestion systems
- Concurrent file transformation workflows
MIT