Multi-protocol remote data access library for .NET with HTTP/HTTPS, SFTP, and local filesystem support. Features smart caching, progress reporting, and integrity validation.

aardvark-platform/aardvark.data.remote

Aardvark.Data.Remote

A unified API for resolving data from local files, HTTP URLs, and SFTP servers. Designed for idiomatic C# and F# use.

✅ Multi-protocol support: Local files, HTTP/HTTPS, SFTP
✅ Automatic caching: Remote files cached locally
✅ No initialization required: Works with default configuration
✅ Type safe: Strongly typed results with clear error handling
✅ Thread safe: Immutable by design, no shared state

Is This For You?

YES if you need to:

  • Load datasets from mixed sources (local + remote)
  • Handle scientific data with SFTP/HTTP access
  • Cache downloaded files automatically
  • Write robust code that handles network failures gracefully

NO if you only need basic file I/O or HTTP client functionality.

Quick Start

using Aardvark.Data.Remote;

// Works with any source
var result1 = Fetch.Resolve("/local/dataset");
var result2 = Fetch.Resolve("http://example.com/data.zip");
var result3 = Fetch.Resolve("sftp://server.com/data.zip");

// All return strongly typed results
if (result1.IsResolved)
    Console.WriteLine($"Data ready at: {result1.Path}");
else if (result1.IsInvalidPath)
    Console.WriteLine($"Error: {result1.Reason}");

open Aardvark.Data.Remote

// Same API in F#
let result = Fetch.resolve "http://example.com/data.zip"
match result with
| Resolved path -> printfn "Data ready at: %s" path
| InvalidPath reason -> printfn "Error: %s" reason

Local paths and HTTP/HTTPS URLs work with the default configuration; only SFTP needs extra setup.

The Problem This Solves

You're working with datasets that might be:

  • On your local drive: /data/experiment1/
  • On a web server: http://lab.university.edu/datasets/mars_rover_2024.zip
  • On an SFTP server: sftp://secure-server.gov/classified/mission_data.zip

Without this library: You write separate code for each case, handle extraction, manage caching, deal with partial downloads, retry failures...

With this library: A single call to Fetch.Resolve (C#) or Fetch.resolve (F#) turns any of these sources into a local path.

Core Concepts

Everything is a string: Pass any path/URL to Fetch.resolve

Four outcomes:

  • Resolved path → Success, data is at this local path
  • InvalidPath reason → Bad input, fix your string
  • DownloadError (uri, exception) → Network/server problem
  • SftpConfigMissing uri → SFTP URL given, but no SFTP config was supplied

Automatic behavior:

  • ZIP files get extracted automatically
  • Downloads are cached locally; a cached file is fetched again only when forceDownload is set
  • SFTP requires config, everything else works immediately

Common Use Cases

Scientific Dataset Processing

var datasets = new[]
{
    "/local/calibration_data",
    "http://nasa.gov/mars/rover_images_2024.zip",
    "sftp://secure-server.edu/climate_model_v3.zip"
};

// Process all datasets identically
foreach (var source in datasets)
{
    var result = Fetch.Resolve(source);
    
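    if (result.IsResolved)
    {
        Console.WriteLine($"Processing {result.Path}...");
        ProcessDataset(result.Path); // your own processing routine
    }
    else if (result.IsInvalidPath)
        Console.WriteLine($"Skipping invalid source: {result.Reason}");
    else if (result.IsDownloadError)
        Console.WriteLine($"Failed to download {result.Uri}: {result.Exception.Message}");
    else
        // Remaining case: an sftp:// source without SFTP configuration
        Console.WriteLine($"Could not resolve {source}: SFTP configuration missing");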
}

Batch Processing with Progress

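// Sources to resolve (same examples as in the C# snippet above)
let datasets = [
    "/local/calibration_data"
    "http://nasa.gov/mars/rover_images_2024.zip"
    "sftp://secure-server.edu/climate_model_v3.zip"
]

let config = {
    Fetch.defaultConfig with
        progress = Some (printfn "Download progress: %.1f%%")
}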

async {
    let! results = Fetch.resolveManyWith config datasets
    
    let successful = results |> List.choose (function
        | Resolved path -> Some path
        | _ -> None
    )
    
    printfn "Successfully resolved %d/%d datasets" 
        successful.Length datasets.Length
} |> Async.RunSynchronously

SFTP with Credentials

let sftpConfig = {
    Host = "secure-data-server.gov"
    Port = 22
    User = "researcher"
    Pass = "secret123"
}

let config = { 
    Fetch.defaultConfig with 
        sftpConfig = Some sftpConfig 
}

let result = Fetch.resolveWith config "sftp://secure-data-server.gov/classified/data.zip"

API Reference

C# Methods

using System.Collections.Generic;
using Aardvark.Data.Remote;

// Simple usage
var result = Fetch.Resolve("http://example.com/data.zip");

// With configuration
var config = new FetchConfiguration
{
    BaseDirectory = "/cache",
    MaxRetries = 3
};
var configuredResult = Fetch.ResolveWith("http://example.com/data.zip", config);

// Async (call from an async method or top-level statements)
var asyncResult = await Fetch.ResolveAsync("http://example.com/data.zip");

// Batch processing
var urls = new List<string> { "url1", "url2", "url3" };
var results = await Fetch.ResolveMany(urls);

F# Functions

| Function | Description |
|---|---|
| resolve : string -> ResolveResult | Resolve with default settings |
| resolveWith : FetchConfig -> string -> ResolveResult | Resolve with custom config |
| resolveAsync : string -> Async<ResolveResult> | Async resolve with defaults |
| resolveAsyncWith : FetchConfig -> string -> Async<ResolveResult> | Async resolve with custom config |
| resolveMany : string list -> Async<ResolveResult list> | Batch resolve (parallel) |
| resolveManyWith : FetchConfig -> string list -> Async<ResolveResult list> | Batch resolve with config |

Result Types

type ResolveResult =
    | Resolved of localPath: string
    | InvalidPath of reason: string
    | DownloadError of uri: System.Uri * exn: System.Exception
    | SftpConfigMissing of uri: System.Uri

Configuration Options

F# Configuration

// Start from the defaults and override only what you need.
// sftpConfig and logCallback are values you define yourself.
let config = {
    Fetch.defaultConfig with
        baseDirectory = "/data/cache"           // Where to store downloads
        sftpConfig = Some sftpConfig            // SFTP credentials
        maxRetries = 3                          // Retry failed downloads
        timeout = TimeSpan.FromMinutes(10.0)    // Per-operation timeout
        progress = Some (printfn "%.1f%%")      // Progress callback
        forceDownload = false                   // Force re-download cached files
        logger = Some logCallback               // Logging function
}

C# Configuration

var config = new FetchConfiguration
{
    BaseDirectory = "/data/cache",
    SftpConfig = sftpConfig,
    MaxRetries = 3,
    Timeout = TimeSpan.FromMinutes(10),
    Progress = percent => Console.WriteLine($"{percent:F1}%"),
    ForceDownload = false,
    Logger = msg => Console.WriteLine(msg)
};

SFTP Configuration

// Direct credentials
let sftpConfig = {
    Host = "sftp.example.com"
    Port = 22
    User = "username"
    Pass = "password"
}

// Or use FileZilla config file
let config = { 
    Fetch.defaultConfig with 
        sftpConfigFile = Some "/path/to/filezilla.xml"
}

Supported Data Sources

| Type | Example | Notes |
|---|---|---|
| Local Directory | /path/to/data | Direct access |
| Local Zip | /path/to/data.zip | Auto-extracted |
| HTTP/HTTPS | http://example.com/data.zip | Downloaded and cached |
| SFTP | sftp://server.com/data.zip | Requires SFTP config |
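
To make the table concrete, here is a minimal sketch of how a source string could be mapped to one of the four rows. It is not the library's own code: SourceKind and classifySource are hypothetical names introduced for illustration, and the real resolver may well decide differently (for example by inspecting the file system rather than the file extension).

```fsharp
open System

// Hypothetical classification type; not part of Aardvark.Data.Remote.
type SourceKind =
    | LocalDirectory
    | LocalZip
    | Http
    | Sftp
    | Unsupported of scheme: string

// Hypothetical helper: decides the source kind from the string alone.
let classifySource (source: string) : SourceKind =
    match Uri.TryCreate(source, UriKind.Absolute) with
    | true, uri when uri.Scheme = Uri.UriSchemeHttp || uri.Scheme = Uri.UriSchemeHttps -> Http
    | true, uri when uri.Scheme = "sftp" -> Sftp
    | true, uri when not uri.IsFile -> Unsupported uri.Scheme
    | _ ->
        // Anything that is not a remote URL is treated as a local path.
        if source.EndsWith(".zip", StringComparison.OrdinalIgnoreCase) then LocalZip
        else LocalDirectory
```

With this sketch, classifySource "sftp://server.com/data.zip" yields Sftp and classifySource "/path/to/data" yields LocalDirectory.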

Error Handling Best Practices

// Comprehensive error handling (log stands for a structured logger already in scope)
let handleDataSource source =
    match Fetch.resolve source with
    | Resolved path -> 
        Ok path
        
    | InvalidPath reason ->
        // Bad input - log and skip
        log.Warning("Invalid data source {source}: {reason}", source, reason)
        Error $"Invalid: {reason}"
        
    | DownloadError (uri, ex) ->
        // Transient error - could retry
        log.Error("Download failed for {uri}: {error}", uri, ex.Message)  
        Error $"Download failed: {ex.Message}"
        
    | SftpConfigMissing uri ->
        // Configuration error - needs admin attention
        log.Error("SFTP config missing for {uri}", uri)
        Error "SFTP credentials required"
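
The DownloadError branch above notes that a transient error could be retried. The library already has a maxRetries setting for downloads; the sketch below shows what an additional application-level retry with a pause might look like. resolveWithRetry is a hypothetical helper, not part of the library, and the ResolveResult type is copied from the Result Types section only so that the snippet compiles on its own.

```fsharp
open System
open System.Threading

// Local copy of the result type from "Result Types"; with the library
// referenced you would open Aardvark.Data.Remote instead.
type ResolveResult =
    | Resolved of localPath: string
    | InvalidPath of reason: string
    | DownloadError of uri: Uri * exn: Exception
    | SftpConfigMissing of uri: Uri

/// Hypothetical helper: calls `resolve` up to `maxAttempts` times and
/// retries only on DownloadError, sleeping `delay` between attempts.
let resolveWithRetry
        (maxAttempts: int)
        (delay: TimeSpan)
        (resolve: string -> ResolveResult)
        (source: string) : ResolveResult =
    let rec attempt n =
        match resolve source with
        | DownloadError _ when n < maxAttempts ->
            Thread.Sleep(delay)
            attempt (n + 1)
        // Success, bad input, missing config, or the last failed attempt:
        // hand the result back unchanged.
        | result -> result
    attempt 1
```

Passing the resolver in as a function keeps the helper independent of configuration: something like resolveWithRetry 3 (TimeSpan.FromSeconds 5.0) Fetch.resolve source would wrap the default resolver, assuming Fetch.resolve has the signature listed under F# Functions.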

Architecture

This library follows functional programming principles:

  • Immutable Everything: All configuration is immutable records/classes
  • Pure Functions: No hidden state, no side effects in core logic
  • No Initialization: No registration, no setup, no global state
  • Thread Safe: Immutable design means inherent thread safety
  • Provider Pattern: Extensible via pure functions, not interfaces
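
The README does not show what a provider looks like, so the following is only one possible reading of "extensible via pure functions, not interfaces". Provider, demoProvider, and resolveVia are hypothetical names, the demo:// scheme is made up, and ResolveResult is copied from the Result Types section so the sketch is self-contained.

```fsharp
open System

// Local copy of the result type from "Result Types".
type ResolveResult =
    | Resolved of localPath: string
    | InvalidPath of reason: string
    | DownloadError of uri: Uri * exn: Exception
    | SftpConfigMissing of uri: Uri

// Assumed provider shape: a plain function that returns None when the
// source is not its concern, and Some result otherwise.
type Provider = string -> ResolveResult option

// Example provider for a made-up "demo://" scheme.
let demoProvider : Provider =
    fun source ->
        if source.StartsWith("demo://", StringComparison.Ordinal) then
            Some (Resolved ("/tmp/demo/" + source.Substring(7)))
        else
            None

// Tries each provider in order; the first one that answers wins.
let resolveVia (providers: Provider list) (source: string) : ResolveResult =
    providers
    |> List.tryPick (fun provider -> provider source)
    |> Option.defaultValue (InvalidPath (sprintf "No provider accepts '%s'" source))
```

In this reading, adding a protocol means adding one function to a list, with no registration step or shared state involved.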

Requirements

  • .NET 8.0
  • F# 8.0+ (for F# projects)
  • Dependencies: SSH.NET (SFTP), System.Text.Json

License

MIT License - Part of the Aardvark platform ecosystem.
