
Data flow graph public preview #136

Merged: jlian merged 11 commits into main from wasm on Jul 31, 2025

jlian commented Jul 29, 2025

Azure IoT Operations Data Flow Graphs: WASM Development (Public Preview)

Overview

This PR introduces Azure IoT Operations Data Flow Graphs, a new capability for real-time data processing using WebAssembly (WASM) modules. This public preview lets developers build custom data processing pipelines in Rust and Python, with support for complex workflows including branching, filtering, aggregation, and machine learning inference.

🚀 Key Features

Multi-Language WASM Development

  • Rust SDK: Full-featured SDK with procedural macros, logging, metrics, and state store APIs
  • Python Support: componentize-py integration with generated bindings
  • Docker Builders: Streamlined containerized build environments for both languages
  • Local Development: Complete toolchain support for native development workflows

Rich Operator Types

  • Map: Transform data (unit conversion, format changes)
  • Filter: Conditional data filtering based on predicates
  • Branch: Route data to different paths based on conditions
  • Accumulate: Time-windowed aggregation and statistics
  • Delay: Control timing and batch processing
  • Source/Sink: Integration with external systems
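
To make the operator roles above concrete, here is a minimal sketch in plain Python. It does not use the SDK or the generated bindings: the reading tuples, the function names, the 80 °C alert threshold, and the 10-second tumbling window are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (event_time_seconds, sensor_id, celsius)
readings = [
    (0, "a", 21.0),
    (4, "b", 95.0),
    (7, "a", 23.0),
    (12, "a", 25.0),
    (15, "b", -300.0),  # below absolute zero, dropped by the filter
]

def map_op(reading):
    """Map: change the format from a tuple to a dict."""
    ts, sensor, celsius = reading
    return {"ts": ts, "sensor": sensor, "celsius": celsius}

def filter_op(msg):
    """Filter: keep only physically plausible temperatures."""
    return msg["celsius"] >= -273.15

def branch_op(msg):
    """Branch: route hot readings to 'alert', everything else to 'normal'."""
    return "alert" if msg["celsius"] > 80.0 else "normal"

def accumulate_op(msgs, window_seconds=10):
    """Accumulate: average per tumbling window, keyed by window start time."""
    windows = defaultdict(list)
    for msg in msgs:
        windows[msg["ts"] // window_seconds].append(msg["celsius"])
    return {w * window_seconds: mean(vals) for w, vals in sorted(windows.items())}

# Wire the operators together: map -> filter -> branch -> accumulate.
paths = {"alert": [], "normal": []}
for msg in filter(filter_op, map(map_op, readings)):
    paths[branch_op(msg)].append(msg)

normal_averages = accumulate_op(paths["normal"])
print(normal_averages)  # {0: 22.0, 10: 25.0}
```

In a real graph each function would live in its own WASM module and the wiring would come from the graph YAML rather than a Python loop.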

Production-Ready Architecture

  • Timely Dataflow: Built on Microsoft Research's proven computational model
  • Event-Time Semantics: Process data based on when events occurred
  • Hybrid Logical Clock: Ensures causal ordering and progress guarantees
  • Fault Tolerance: Built-in support for handling failures and ensuring data consistency
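
For readers unfamiliar with hybrid logical clocks, the following is a minimal sketch of the general algorithm in plain Python. It is not taken from this PR and may differ from how Azure IoT Operations implements it; the `Timestamp` and `HybridLogicalClock` names and the injected `physical_clock` callable are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class Timestamp:
    wall: int     # largest physical time seen so far (e.g. milliseconds)
    counter: int  # tie-breaker for events that share the same wall value

class HybridLogicalClock:
    def __init__(self, physical_clock):
        self._now = physical_clock  # injected so the sketch is deterministic
        self._last = Timestamp(0, 0)

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        wall = max(self._last.wall, self._now())
        counter = self._last.counter + 1 if wall == self._last.wall else 0
        self._last = Timestamp(wall, counter)
        return self._last

    def update(self, remote):
        """Merge the timestamp carried by a received message."""
        last = self._last
        wall = max(last.wall, remote.wall, self._now())
        if wall == last.wall and wall == remote.wall:
            counter = max(last.counter, remote.counter) + 1
        elif wall == last.wall:
            counter = last.counter + 1
        elif wall == remote.wall:
            counter = remote.counter + 1
        else:
            counter = 0
        self._last = Timestamp(wall, counter)
        return self._last

clock = HybridLogicalClock(lambda: 100)    # physical clock stuck at 100
first = clock.tick()                       # Timestamp(wall=100, counter=0)
second = clock.tick()                      # Timestamp(wall=100, counter=1)
merged = clock.update(Timestamp(250, 3))   # Timestamp(wall=250, counter=4)
```

Because timestamps compare by `(wall, counter)`, every event stamped after a message is received orders after that message, even when the local physical clock lags behind the sender's.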

📁 What's Included

Sample Applications

  • graph-simple.yaml - Basic temperature conversion pipeline
  • graph-complex.yaml - Multi-sensor processing with ML inference
  • 10+ Rust Examples: Temperature, humidity, image processing, data enrichment
  • Python Examples: Map, filter, branch operators with complete implementations

Development Tools

  • Docker Builders:
    • ghcr.io/azure-samples/explore-iot-operations/rust-wasm-builder
    • ghcr.io/azure-samples/explore-iot-operations/python-wasm-builder
  • GitHub Actions: Automated builder image publishing
  • WIT Schema: Complete WebAssembly Interface Type definitions

Sample Data & Assets

  • Temperature and humidity sensor payloads
  • Sample images for computer vision workflows
  • Pre-trained ONNX models (MobileNet, SqueezeNet)

🔧 Technical Highlights

Rust Development

#[map_operator(init = "temperature_converter_init")]
fn temperature_converter(input: DataModel) -> DataModel {
    // Transform Fahrenheit to Celsius with full type safety
    // (conversion elided here; this placeholder returns the message unchanged)
    input
}

Python Development

class Map(exports.Map):
    def process(self, message: types.DataModel) -> types.DataModel:
        # Process data with generated type bindings
        # (processing elided here; this placeholder returns the message unchanged)
        return message
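
As an illustration of what the body of a map operator might do, here is a minimal sketch of the Fahrenheit-to-Celsius conversion on a JSON payload. It is plain Python and does not touch the generated `types.DataModel` bindings; the `fahrenheit_to_celsius` name and the `{"temperature": {"value": ..., "unit": ...}}` payload shape are assumptions, not the sample's actual schema.

```python
import json

def fahrenheit_to_celsius(payload: bytes) -> bytes:
    """Parse a JSON payload, convert the reading, and re-serialize it."""
    doc = json.loads(payload)
    celsius = (doc["temperature"]["value"] - 32) * 5 / 9
    doc["temperature"] = {"value": round(celsius, 2), "unit": "C"}
    return json.dumps(doc).encode("utf-8")

# Hypothetical input payload: 212 °F is the boiling point of water.
raw = b'{"temperature": {"value": 212, "unit": "F"}}'
converted = json.loads(fahrenheit_to_celsius(raw))
print(converted)  # {'temperature': {'value': 100.0, 'unit': 'C'}}
```

Inside a real module, the bytes would come from the incoming message and the result would be written back to it; see the Python examples in the repository for how that is done with the generated bindings.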

Graph Configuration

operations:
  - operationType: "map"
    name: "temperature/map"
    module: "temperature:1.0.0"

🛠️ Build & Deploy

Docker Builds (Recommended):

# Rust
docker run --rm -v "$(pwd):/workspace" ghcr.io/azure-samples/explore-iot-operations/rust-wasm-builder --app-name my-module

# Python  
docker run --rm -v "$(pwd):/workspace" ghcr.io/azure-samples/explore-iot-operations/python-wasm-builder --app-name my_module --app-type map

Local Development: Full support for native Rust/Python toolchains with registry configuration

📊 Impact

  • 81 files changed, 10,835 insertions: Comprehensive feature implementation
  • Production-ready: Based on proven academic research and production systems
  • Developer-friendly: Multiple language support with excellent tooling
  • Scalable: Distributed processing with automatic coordination

🔗 Integration

This feature integrates with:

  • Azure IoT Operations: Native deployment and management
  • Azure Container Registry: WASM module and graph storage
  • Kubernetes: Arc-enabled cluster deployment
  • ORAS: OCI artifact distribution

📚 Documentation

  • Comprehensive README with quick start guide
  • Complete Rust and Python development guides
  • Docker builder documentation
  • Sample workflows and deployment instructions

✅ Testing

  • All Docker builders tested and validated
  • GitHub Actions workflows for automated publishing
  • Sample modules verified with both build methods
  • End-to-end validation completed

jlian added 11 commits July 29, 2025 19:11
- Remove 'is_default_branch' restriction from Python builder
- Make repository references consistent between workflows
- Add workflow_dispatch trigger to Python workflow
- Update README to use default 'latest' tag instead of 'wasm' tag

This ensures both rust-wasm-builder and python-wasm-builder get the
'latest' tag when building from main or wasm branches.
- Fixed Python Docker builder to use flexible filename matching local method
- Changed from hardcoded ${APP_TYPE}.py to ${APP_NAME}.py pattern
- Both Docker and local builds now expect same naming convention
- Reorganized README to present local builds first, then Docker builds
- Added clear guidance on when to use each build approach
- Removed 'Recommended' labels for neutral presentation
- Made both Rust and Python sections consistent in structure
- Updated directory structure to match actual repository layout
- Fixed all GitHub links to use relative paths instead of absolute URLs
- Corrected Rust examples structure (temperature, humidity, format, etc.)
- Fixed Python examples structure (map, filter, branch)
- Updated Python Docker builder for flexible filename support
- README now serves as accurate quick reference companion to MS Learn docs
jlian requested a review from Copilot on July 30, 2025 20:00

Copilot AI left a comment

Pull Request Overview

This PR introduces Azure IoT Operations Data Flow Graphs, a comprehensive WebAssembly (WASM) development framework for real-time data processing. The feature enables developers to build custom data processing pipelines using Rust and Python with support for complex workflows including branching, filtering, aggregation, and machine learning inference. Built on Microsoft Research's Timely Dataflow computational model, it provides production-ready architecture with event-time semantics and fault tolerance.

Key Changes

  • Multi-language WASM development framework with full Rust SDK featuring procedural macros, logging, metrics, and state store APIs
  • Rich operator ecosystem supporting Map, Filter, Branch, Accumulate, and Delay operations for comprehensive data processing workflows
  • Streamlined development toolchain including Docker builders, GitHub Actions workflows, and local development support with native toolchain integration

Reviewed Changes

Copilot reviewed 54 out of 81 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| samples/wasm/rust/examples//.rs | Rust operator implementations demonstrating temperature conversion, humidity processing, image analysis, and data enrichment patterns |
| samples/wasm/rust/examples/*/Cargo.toml | Dependency configurations for WASM modules with tinykube_wasm_sdk integration |
| samples/wasm/rust/Dockerfile | Docker builder environment for streamlined Rust-to-WASM compilation with cargo registry configuration |
| samples/wasm/python/schema/*.wit | WebAssembly Interface Type definitions for operators, state management, logging, and metrics |
| samples/wasm/rust/README.md | Comprehensive development guide covering Rust-specific patterns, build processes, and best practices |

jlian merged commit 51c5aa1 into main on Jul 31, 2025
3 checks passed
jlian deleted the wasm branch on July 31, 2025 19:34
jlian restored the wasm branch on July 31, 2025 20:11