A high-performance parallel file downloader with caching capabilities for HTTP/HTTPS and S3-compatible storage.
- Parallel Downloads - Configurable concurrent downloads with worker pool pattern
- Download Resumption - Automatically resume interrupted downloads from partial files
- SHA256 Verification - Built-in checksum validation for integrity assurance
- S3 Caching Layer - Content-addressable cache to deduplicate downloads
- Retry Mechanism - Exponential backoff with configurable retry attempts
- Multi-Source Support - Download from HTTP/HTTPS and S3/MinIO endpoints
- Progress Tracking - Real-time progress bars for visual feedback
- Graceful Shutdown - Signal handling (SIGINT/SIGTERM) for clean interruption
- Environment Variables - Support for credential management via environment variables
Download the latest release for your platform from the releases page:
- Linux: `xget-linux-amd64.tar.gz`, `xget-linux-arm64.tar.gz`, `xget-linux-arm.tar.gz`
- macOS: `xget-darwin-amd64.tar.gz`, `xget-darwin-arm64.tar.gz`
- Windows: `xget-windows-amd64.exe.zip`, `xget-windows-arm64.exe.zip`
Extract and run:
```bash
# Linux/macOS
tar xzf xget-linux-amd64.tar.gz
chmod +x xget-linux-amd64
./xget-linux-amd64 config.yaml
# Windows (PowerShell)
Expand-Archive xget-windows-amd64.exe.zip
.\xget-windows-amd64.exe config.yaml
```

```bash
# Clone the repository
git clone <repository-url>
cd xget
# Build the binary
make build
# Binary will be available at ./bin/xget
./bin/xget config.yaml
```

```bash
# Build the image
docker build -t xget .
# Run with config file
docker run -v $(pwd)/config.yaml:/config.yaml xget /config.yaml
```

- Go 1.26.0 or higher
```bash
xget <config.yaml>
```

The tool takes a single argument: the path to a YAML configuration file that defines:
- Storage endpoints (aliases)
- Cache configuration
- Download settings
- List of files to download
The `generate` command helps create configuration files by scanning an existing directory and computing SHA256 hashes for all files:
```bash
# Output to stdout
xget generate <directory>
# Write to file
xget generate <directory> -o output.yaml
```

Example usage:

```bash
# Scan downloads directory and generate config
xget generate ./downloads -o generated.yaml
# Preview generated config
xget generate ./myfiles
```

Generated output format:

```yaml
files:
  - url: ""
    dest: file1.tar.gz
    sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
  - url: ""
    dest: subdir/file2.bin
    sha256: a1b2c3d4e5f67890123456789012345678901234567890123456789012345678
```

The `generate` command:
- Recursively walks the directory tree
- Computes SHA256 hash for each regular file
- Uses relative paths from the base directory
- Outputs YAML with empty `url` fields (to be filled in manually)
- Preserves directory structure in file paths
- Skips directories, symlinks, and special files
- Prints warnings to stderr for inaccessible files
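For illustration, the heart of that walk is a `filepath.WalkDir` over the tree, hashing each regular file. The sketch below mirrors the documented behavior; it is not the tool's actual code, and the function name is invented.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// hashFiles walks baseDir and emits one files: entry per regular file,
// mirroring the documented behavior of `xget generate`. Names are invented.
func hashFiles(baseDir string) error {
	fmt.Println("files:")
	return filepath.WalkDir(baseDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			fmt.Fprintf(os.Stderr, "warning: %s: %v\n", path, err) // warn, keep walking
			return nil
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks, and special files
		}
		f, err := os.Open(path)
		if err != nil {
			fmt.Fprintf(os.Stderr, "warning: %s: %v\n", path, err)
			return nil
		}
		defer f.Close()
		h := sha256.New()
		if _, err := io.Copy(h, f); err != nil {
			return err
		}
		rel, err := filepath.Rel(baseDir, path) // paths relative to the base dir
		if err != nil {
			return err
		}
		fmt.Printf("  - url: %q\n    dest: %s\n    sha256: %s\n", "", rel, hex.EncodeToString(h.Sum(nil)))
		return nil
	})
}

func main() {
	if err := hashFiles(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```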
Use cases:
- Verify existing downloads - Generate checksums to verify files downloaded outside xget
- Create download manifests - Document file checksums before distribution
- Populate config templates - Generate file entries and add URLs later
After generation, edit the config to add:
- URLs for each file
- Storage aliases (if using S3)
- Cache configuration
- Download settings (parallel, retries, etc.)
```bash
xget -version
# or
xget --version
```

Displays version, commit hash, and build timestamp.
Create a configuration file (see `config.yaml.template` for a complete example):
```yaml
# Storage aliases - define S3/MinIO endpoints
aliases:
  # AWS S3 example
  mycloud:
    endpoint: https://s3.amazonaws.com
    region: us-east-1
    bucket: my-bucket
    access_key: ""  # optional, falls back to AWS_ACCESS_KEY_ID env var
    secret_key: ""  # optional, falls back to AWS_SECRET_ACCESS_KEY env var

  # MinIO example with environment variable substitution
  minio:
    endpoint: https://minio.company.com
    bucket: artifacts
    access_key: ${MINIO_ACCESS_KEY}
    secret_key: ${MINIO_SECRET_KEY}

  # Cache storage
  cache:
    endpoint: https://s3.amazonaws.com
    region: us-east-1
    bucket: download-cache
    prefix: files/  # optional key prefix

# Cache configuration
cache:
  alias: cache  # reference to alias defined above
  enabled: true

# Download settings
settings:
  parallel: 4      # max concurrent downloads (default: 4)
  retries: 3       # retry attempts on failure (default: 3)
  retry_delay: 5s  # delay between retries (default: 5s)

# Files to download
files:
  # Download from S3 using alias
  - url: s3://mycloud/path/to/file1.tar.gz
    dest: ./downloads/file1.tar.gz
    sha256: abc123def456...

  # Download from MinIO
  - url: s3://minio/tools/file2.zip
    dest: /opt/tools/file2.zip
    sha256: def456ghi789...

  # Download from HTTP
  - url: https://example.com/file3.bin
    dest: ./downloads/file3.bin
    sha256: ghi789jkl012...
```

The configuration supports environment variable expansion using `${VAR_NAME}` syntax in:
- Alias credentials - Access keys, secret keys, and configuration
- Cache enabled flag - Enable/disable cache dynamically
- File destination paths - Customize download locations
Example:
```yaml
aliases:
  minio:
    access_key: ${MINIO_ACCESS_KEY}
    secret_key: ${MINIO_SECRET_KEY}

cache:
  alias: minio
  enabled: ${CACHE_ENABLED}  # "true", "1", "yes" for enabled

files:
  - url: https://example.com/file.tar.gz
    dest: ${DOWNLOAD_DIR}/file.tar.gz
    sha256: abc123...
```

Cache enabled values:
The `cache.enabled` field accepts the following truthy values (case-insensitive): `"true"`, `"TRUE"`, `"yes"`, `"YES"`, `"1"`.
All other values (including empty string) are treated as false.
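Both the `${VAR_NAME}` expansion and the truthy check are small amounts of Go. A minimal sketch, assuming the standard library's `os.Expand` does the substitution (the helper names here are invented, not the project's API):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expand substitutes environment variables referenced as ${VAR_NAME}.
// Note: os.Expand also accepts the bare $VAR form; the config documents ${...}.
func expand(s string) string {
	return os.Expand(s, os.Getenv)
}

// isTruthy mirrors the documented semantics: "true", "yes", and "1"
// (case-insensitive) enable the cache; anything else, including the
// empty string, is false.
func isTruthy(s string) bool {
	switch strings.ToLower(s) {
	case "true", "yes", "1":
		return true
	}
	return false
}

func main() {
	fmt.Println(isTruthy(expand("${CACHE_ENABLED}")))
}
```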
You can also omit the `access_key` and `secret_key` fields to use the standard AWS environment variables:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
HTTP/HTTPS URLs:

```yaml
url: https://example.com/path/to/file.tar.gz
```

S3 URLs:

```yaml
url: s3://alias/path/to/file.tar.gz
```

where `alias` references a storage endpoint defined in the `aliases` section.
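For illustration, extracting the alias and object key from an `s3://` URL takes only standard-library parsing. This sketch is hypothetical, not the code in `storage/s3.go`:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitS3URL extracts the alias and object key from an s3://alias/path
// URL. The helper name is invented for illustration.
func splitS3URL(raw string) (alias, key string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("parse url: %w", err)
	}
	if u.Scheme != "s3" {
		return "", "", fmt.Errorf("not an s3 url: %s", raw)
	}
	// Host carries the alias; the path (minus its leading slash) is the key.
	return u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	fmt.Println(splitS3URL("s3://mycloud/path/to/file1.tar.gz")) // mycloud path/to/file1.tar.gz <nil>
}
```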
1. Check Existing File - Verify if the destination file exists with the correct SHA256 hash (skip if valid)
2. Try Cache - Attempt to retrieve from cache by content hash (if cache enabled)
3. Download from Source - Download with retry logic and exponential backoff
4. Verify Checksum - Validate the SHA256 hash against the expected value
5. Update Cache - Upload to cache on successful download (if cache enabled)
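Condensed into Go, the sequence reads roughly as below. `cacheGet`, `download`, and `cachePut` are hypothetical stand-ins for the real helpers; only the shape of the flow is the point.

```go
package sketch

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"time"
)

// Hypothetical hooks standing in for the real cache and transfer code.
func cacheGet(ctx context.Context, sum, dest string) bool  { return false }
func cachePut(ctx context.Context, sum, dest string)       {}
func download(ctx context.Context, url, dest string) error { return nil }

// verifySHA256 reports whether path exists and hashes to want (hex).
func verifySHA256(path, want string) bool {
	f, err := os.Open(path)
	if err != nil {
		return false
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return false
	}
	return hex.EncodeToString(h.Sum(nil)) == want
}

// fetchOne walks the five steps above for a single file.
func fetchOne(ctx context.Context, url, dest, sum string, retries int, delay time.Duration) error {
	if verifySHA256(dest, sum) {
		return nil // 1. destination already valid: skip
	}
	if cacheGet(ctx, sum, dest) {
		return nil // 2. cache hit by content hash
	}
	var err error
	for attempt := 0; attempt <= retries; attempt++ {
		if attempt > 0 {
			select { // 3. exponential backoff between attempts
			case <-time.After(delay << (attempt - 1)):
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		if err = download(ctx, url, dest); err == nil {
			break
		}
	}
	if err != nil {
		return err
	}
	if !verifySHA256(dest, sum) {
		return fmt.Errorf("checksum mismatch for %s", dest) // 4. verify
	}
	cachePut(ctx, sum, dest) // 5. best-effort cache update
	return nil
}
```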
Downloads are saved with a `.partial` suffix during transfer:
- Existing partial files are automatically resumed using HTTP Range requests
- Only renamed to final destination after successful checksum verification
- Failed downloads leave partial file intact for next retry attempt
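A minimal sketch of that resume logic, assuming a plain HTTP source (the `.partial` naming follows the text above; everything else is illustrative, not the project's actual code):

```go
package download

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"
)

// resumeDownload appends to dest+".partial", asking the server to start
// where the partial file left off. The caller renames the partial file
// to dest only after checksum verification succeeds.
func resumeDownload(ctx context.Context, url, dest string) error {
	partial := dest + ".partial"
	f, err := os.OpenFile(partial, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	if info.Size() > 0 {
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", info.Size()))
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusPartialContent:
		// 206: server honored the Range header; keep appending.
	case http.StatusOK:
		// 200: server ignored the Range header; start over.
		if err := f.Truncate(0); err != nil {
			return err
		}
	default:
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	_, err = io.Copy(f, resp.Body)
	return err
}
```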
The S3-based cache uses the file's SHA256 hash as the object key for content-addressable storage:
- Prevents redundant downloads across different configurations
- Deduplicates files with identical content
- Transparently handles cache misses by falling back to source
```yaml
settings:
  parallel: 2

files:
  - url: https://releases.example.com/tool-v1.2.3.tar.gz
    dest: ./downloads/tool.tar.gz
    sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

```yaml
aliases:
  artifacts:
    endpoint: https://s3.amazonaws.com
    region: us-west-2
    bucket: build-artifacts
  cache:
    endpoint: https://s3.amazonaws.com
    region: us-west-2
    bucket: download-cache

cache:
  alias: cache
  enabled: true

settings:
  parallel: 4
  retries: 5
  retry_delay: 10s

files:
  - url: s3://artifacts/releases/app-v2.0.0.tar.gz
    dest: ./releases/app.tar.gz
    sha256: d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2
```

```bash
# Set environment variables
export MINIO_ACCESS_KEY=myaccesskey
export MINIO_SECRET_KEY=mysecretkey
# Write the config; quoting the heredoc keeps ${...} literal so xget expands it at runtime
cat > config.yaml <<'EOF'
aliases:
  minio:
    endpoint: https://minio.company.com
    bucket: artifacts
    access_key: ${MINIO_ACCESS_KEY}
    secret_key: ${MINIO_SECRET_KEY}

files:
  - url: s3://minio/binaries/tool.bin
    dest: ./tool.bin
    sha256: a1b2c3d4e5f6...
EOF
# Run xget
./bin/xget config.yaml
```

Workflow for creating a download manifest from existing files:

```bash
# Step 1: Generate checksums from existing directory
xget generate ./my-downloads -o manifest.yaml
# Step 2: Edit generated file to add URLs and other config
cat manifest.yaml
```

Output:

```yaml
files:
  - url: ""
    dest: app-v1.0.0.tar.gz
    sha256: a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890
  - url: ""
    dest: tools/helper.bin
    sha256: fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210
```

```bash
# Step 3: Add URLs and complete the configuration
cat > config.yaml <<EOF
settings:
  parallel: 3

files:
  - url: https://releases.example.com/app-v1.0.0.tar.gz
    dest: app-v1.0.0.tar.gz
    sha256: a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890
  - url: https://cdn.example.com/tools/helper.bin
    dest: tools/helper.bin
    sha256: fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210
EOF
# Step 4: Use the config to download (or verify existing files)
xget config.yaml
```

```bash
# Build binary
make build
# Run linter
make lint
# Update dependencies
make go-update
make go-tidy
```

```
xget/
├── src/
│   ├── main.go           # Application entry point
│   ├── downloader.go     # Core download orchestration
│   ├── cache.go          # S3-based caching layer
│   ├── checksum.go       # SHA256 verification
│   ├── progress.go       # Progress bar wrapper
│   ├── config/           # Configuration management
│   │   ├── config.go     # YAML loading and validation
│   │   ├── types.go      # Config structures
│   │   └── env.go        # Environment variable expansion
│   └── storage/          # Download source abstractions
│       ├── storage.go    # Source interface
│       ├── http.go       # HTTP/HTTPS implementation
│       └── s3.go         # S3/MinIO implementation
├── Makefile              # Build commands
├── Dockerfile            # Docker build
├── config.yaml.template  # Configuration example
└── .golangci.yml         # Linter configuration
```

```bash
# Run tests with race detector
go test -race ./...
# Run tests with coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
```

The project follows strict Go coding standards. See `.claude/rules/go-codestyle.md` for detailed guidelines:
- Use `errors.Is()` and `errors.As()` for error comparison
- Wrap errors with context using `fmt.Errorf("...: %w", err)`
- Accept `context.Context` as the first parameter where applicable
- Use lowercase in log messages
- Prefer singular package names
- Always use `any` instead of `interface{}`
Follow the conventional commit format. See `.claude/rules/commit-messages.md` for guidelines:
```
<type>: <subject>
```

Types: `feat`, `fix`, `refactor`, `perf`, `test`, `docs`, `build`, `ci`, `chore`

Examples:

```
feat: add retry mechanism for failed downloads
fix: handle nil pointer in download manager
refactor: simplify error handling in S3 source
```
The `Source` interface abstracts download sources:

```go
type Source interface {
	Download(ctx context.Context, offset int64) (io.ReadCloser, int64, error)
	GetSize(ctx context.Context) (int64, error)
}
```

Implementations:
- `HTTPSource` - Downloads via HTTP/HTTPS with Range request support
- `S3Source` - Downloads from S3/MinIO using AWS SDK v2
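For a feel of the contract, a bare-bones HTTP implementation could look like the following. It is a sketch against the interface above, not the code in `storage/http.go`:

```go
package storage

import (
	"context"
	"fmt"
	"io"
	"net/http"
)

// httpSource is an illustrative implementation of Source for HTTP/HTTPS.
type httpSource struct {
	url    string
	client *http.Client
}

// Download opens the body starting at offset via a Range request and
// returns it along with the number of bytes the server will send.
func (s *httpSource) Download(ctx context.Context, offset int64) (io.ReadCloser, int64, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, s.url, nil)
	if err != nil {
		return nil, 0, err
	}
	if offset > 0 {
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", offset))
	}
	resp, err := s.client.Do(req)
	if err != nil {
		return nil, 0, err
	}
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
		resp.Body.Close()
		return nil, 0, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return resp.Body, resp.ContentLength, nil
}

// GetSize asks for the total size with a HEAD request.
func (s *httpSource) GetSize(ctx context.Context) (int64, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodHead, s.url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := s.client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.ContentLength < 0 {
		return 0, fmt.Errorf("size unknown for %s", s.url)
	}
	return resp.ContentLength, nil
}
```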
The downloader uses a worker pool pattern with a semaphore channel to limit concurrency:
- Processes files in parallel up to the configured limit
- Handles partial-file resume with the `.partial` suffix
- Implements retry logic with exponential backoff
- Propagates context cancellation for graceful shutdown
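A self-contained sketch of that pattern, including signal-driven cancellation (simplified; the real orchestration lives in `downloader.go`):

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"sync"
	"syscall"
)

func main() {
	// Cancel the context on SIGINT/SIGTERM so workers can wind down cleanly.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	files := []string{"a.tar.gz", "b.bin", "c.zip"}
	parallel := 2
	sem := make(chan struct{}, parallel) // buffered channel as semaphore

	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		go func(f string) {
			defer wg.Done()
			select {
			case sem <- struct{}{}: // acquire one of `parallel` slots
			case <-ctx.Done():
				return // shutdown requested before this file started
			}
			defer func() { <-sem }()      // release the slot
			fmt.Println("downloading", f) // stand-in for the real transfer
		}(f)
	}
	wg.Wait()
}
```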
S3-based content-addressable cache:
- `Get(hash)` - Retrieves a file from the cache by its SHA256 hash
- `Put(hash, file)` - Uploads a successfully downloaded file to the cache
- Transparent fallback to the source on cache miss
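Reduced to AWS SDK for Go v2 calls, the two operations look roughly like this; the struct and its wiring are illustrative, not the contents of `cache.go`:

```go
package cache

import (
	"context"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// Cache is an illustrative content-addressable cache: the object key
// is the file's SHA256 hex digest, optionally under a key prefix.
type Cache struct {
	client *s3.Client
	bucket string
	prefix string
}

// Get fetches the cached object for hash; an error signals a cache miss.
func (c *Cache) Get(ctx context.Context, hash string) (io.ReadCloser, error) {
	out, err := c.client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(c.bucket),
		Key:    aws.String(c.prefix + hash),
	})
	if err != nil {
		return nil, err // caller falls back to the source on miss
	}
	return out.Body, nil
}

// Put uploads a verified download under its content hash.
func (c *Cache) Put(ctx context.Context, hash string, body io.Reader) error {
	_, err := c.client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(c.bucket),
		Key:    aws.String(c.prefix + hash),
		Body:   body,
	})
	return err
}
```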
- AWS SDK for Go v2 - S3/MinIO operations
- progressbar - Terminal progress visualization
- yaml.v3 - Configuration parsing
Full dependency list in `go.mod`.
- `0` - All downloads completed successfully
- `1` - One or more downloads failed, or a configuration error occurred
Contributions are welcome. Please:
- Follow the code style guidelines in `.claude/rules/go-codestyle.md`
- Use conventional commit messages per `.claude/rules/commit-messages.md`
- Ensure all tests pass and the linter is clean
- Add tests for new functionality
Creating a new release is automated through GitHub Actions.
Releases follow Semantic Versioning:
- Major (`v2.0.0`) - Breaking changes
- Minor (`v1.1.0`) - New features, backward compatible
- Patch (`v1.0.1`) - Bug fixes, backward compatible
```bash
# Create annotated tag
git tag -a v0.2.0 -m "v0.2.0"
# Push tag to trigger release workflow
git push origin v0.2.0
```

[Add license information here]