Andrei G edited this page Aug 12, 2025 · 1 revision

Data Flow

This diagram shows how JSON data flows through the PJS system from input to streaming output.

graph TD
    %% Input Stage
    Input[Large JSON Input<br/>10KB - 100MB] --> Validator{JSON Validator<br/>sonic-rs}
    Validator -->|Valid| Parser[SIMD JSON Parser<br/>sonic-rs]
    Validator -->|Invalid| Error1[Validation Error]

    %% Parsing Stage  
    Parser --> AST[JSON AST<br/>Memory Mapped]
    AST --> Analyzer[Semantic Analyzer<br/>Type Detection]
    
    %% Analysis Stage
    Analyzer --> SchemaDetector[Schema Detector<br/>Pattern Recognition]
    SchemaDetector --> PriorityAssigner[Priority Assigner<br/>Business Rules]
    
    %% Priority Assignment
    PriorityAssigner --> PriorityMap{Priority Mapping}
    PriorityMap -->|Critical| CriticalQueue[Critical Priority<br/>IDs, Status, Core]
    PriorityMap -->|High| HighQueue[High Priority<br/>Names, Titles]
    PriorityMap -->|Medium| MediumQueue[Medium Priority<br/>Regular Content]
    PriorityMap -->|Low| LowQueue[Low Priority<br/>Metadata, Stats]
    PriorityMap -->|Background| BackgroundQueue[Background Priority<br/>Analytics, Logs]

    %% Frame Generation
    CriticalQueue --> FrameGen1[Skeleton Generator]
    HighQueue --> FrameGen2[Data Frame Generator]
    MediumQueue --> FrameGen3[Array Chunker]
    LowQueue --> FrameGen4[Object Fragmenter]
    BackgroundQueue --> FrameGen5[Stream Finalizer]

    %% Frame Processing
    FrameGen1 --> SkeletonFrame[Skeleton Frame<br/>JSON Structure Only]
    FrameGen2 --> DataFrames[Data Frames<br/>Core Values]
    FrameGen3 --> ArrayFrames[Array Chunks<br/>Paginated Arrays]
    FrameGen4 --> ObjectFrames[Object Fragments<br/>Nested Objects]
    FrameGen5 --> FinalFrames[Final Frames<br/>Completion Signal]

    %% Compression Stage
    SkeletonFrame --> Compressor1[Schema-aware<br/>Compressor]
    DataFrames --> Compressor2[Dictionary<br/>Encoder]
    ArrayFrames --> Compressor3[Delta<br/>Encoder]
    ObjectFrames --> Compressor4[Structural<br/>Compressor]
    FinalFrames --> Compressor5[Checksum<br/>Generator]

    %% Output Preparation
    Compressor1 --> PriorityScheduler{Priority Scheduler<br/>Adaptive Ordering}
    Compressor2 --> PriorityScheduler
    Compressor3 --> PriorityScheduler
    Compressor4 --> PriorityScheduler
    Compressor5 --> PriorityScheduler

    %% Transport Layer
    PriorityScheduler -->|Highest Priority First| TransportRouter{Transport Router}
    TransportRouter -->|HTTP/2| HTTP2Stream[HTTP/2 Stream<br/>Server-Sent Events]
    TransportRouter -->|WebSocket| WSStream[WebSocket Stream<br/>Binary/Text Frames]
    TransportRouter -->|TCP| TCPStream[Raw TCP Stream<br/>Custom Protocol]

    %% Client Processing
    HTTP2Stream --> ClientHTTP[HTTP Client<br/>Progressive Loading]
    WSStream --> ClientWS[WebSocket Client<br/>Real-time Updates]
    TCPStream --> ClientTCP[TCP Client<br/>High Performance]

    %% Client Reconstruction
    ClientHTTP --> Reconstructor1[JSON Reconstructor<br/>Frame Assembly]
    ClientWS --> Reconstructor2[JSON Reconstructor<br/>Frame Assembly]
    ClientTCP --> Reconstructor3[JSON Reconstructor<br/>Frame Assembly]

    %% Client Output
    Reconstructor1 --> ProgressiveJSON1[Progressive JSON<br/>Immediate UI Updates]
    Reconstructor2 --> ProgressiveJSON2[Progressive JSON<br/>Real-time Rendering]
    Reconstructor3 --> ProgressiveJSON3[Progressive JSON<br/>High-throughput Apps]

    %% Final Assembly
    ProgressiveJSON1 --> CompleteJSON[Complete JSON<br/>Final Assembly]
    ProgressiveJSON2 --> CompleteJSON
    ProgressiveJSON3 --> CompleteJSON

    %% Error Handling
    Error1 --> ErrorHandler[Error Handler<br/>Graceful Degradation]
    Parser -->|Parse Error| ErrorHandler
    Analyzer -->|Analysis Error| ErrorHandler
    ErrorHandler --> FallbackJSON[Fallback Response<br/>Original JSON]

    %% Performance Monitoring
    PriorityScheduler -.-> Metrics[Performance Metrics<br/>Latency, Throughput]
    TransportRouter -.-> NetworkMetrics[Network Metrics<br/>Bandwidth, RTT]
    Reconstructor1 -.-> ClientMetrics[Client Metrics<br/>Render Times]
    Reconstructor2 -.-> ClientMetrics
    Reconstructor3 -.-> ClientMetrics

    %% SIMD Acceleration Points
    Parser -.-> SIMD1[SIMD Acceleration<br/>vectorized parsing]
    Compressor1 -.-> SIMD2[SIMD Compression<br/>parallel processing]
    Compressor2 -.-> SIMD2
    Compressor3 -.-> SIMD2

    %% Memory Optimization
    AST -.-> MemPool[Memory Pool<br/>Zero-copy Operations]
    FrameGen1 -.-> MemPool
    FrameGen2 -.-> MemPool
    FrameGen3 -.-> MemPool

    %% Styling
    classDef inputStage fill:#e3f2fd
    classDef processingStage fill:#f3e5f5
    classDef priorityStage fill:#e8f5e8
    classDef transportStage fill:#fff3e0
    classDef clientStage fill:#fce4ec
    classDef errorStage fill:#ffebee
    classDef optimizationStage fill:#f1f8e9

    class Input,Validator,Parser,AST,Analyzer inputStage
    class SchemaDetector,PriorityAssigner,PriorityMap processingStage
    class CriticalQueue,HighQueue,MediumQueue,LowQueue,BackgroundQueue priorityStage
    class TransportRouter,HTTP2Stream,WSStream,TCPStream transportStage
    class ClientHTTP,ClientWS,ClientTCP,Reconstructor1,Reconstructor2,Reconstructor3 clientStage
    class Error1,ErrorHandler,FallbackJSON errorStage
    class SIMD1,SIMD2,MemPool,Metrics,NetworkMetrics,ClientMetrics optimizationStage

Data Flow Stages

1. Input Processing

  • JSON Validation: Fast validation using sonic-rs SIMD capabilities
  • SIMD Parsing: Vectorized JSON parsing for 2-5x performance improvement
  • AST Generation: Memory-mapped abstract syntax tree for zero-copy operations

2. Semantic Analysis

  • Type Detection: Automatic identification of data types and patterns
  • Schema Inference: Dynamic schema detection for compression optimization
  • Priority Assignment: Business rule-based priority calculation

3. Priority Queuing

Five priority levels determine the order in which data reaches the client, so the most important fields arrive first:

  • Critical (100): IDs, status indicators, core metadata
  • High (80): Names, titles, primary content
  • Medium (50): Regular data fields
  • Low (25): Supplementary information, statistics
  • Background (10): Analytics, logs, debug information
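
The five levels can be sketched as an enum whose discriminants match the numeric values listed above. The key-matching rules in `assign_priority` are hypothetical placeholders for the business rules; only the level names and numbers come from this page.

```rust
use std::cmp::Reverse;

/// The five priority levels, using the numeric values from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(u8)]
pub enum Priority {
    Background = 10,
    Low = 25,
    Medium = 50,
    High = 80,
    Critical = 100,
}

/// Hypothetical rule set: maps a field name to a priority level.
/// A real deployment would plug in its own business rules here.
pub fn assign_priority(key: &str) -> Priority {
    match key {
        "id" | "status" => Priority::Critical,
        "name" | "title" => Priority::High,
        "stats" | "metadata" => Priority::Low,
        "analytics" | "logs" | "debug" => Priority::Background,
        _ => Priority::Medium,
    }
}

/// Returns the field names ordered from highest to lowest priority.
/// The sort is stable, so fields on the same level keep their input order.
pub fn order_by_priority<'a>(keys: &[&'a str]) -> Vec<&'a str> {
    let mut sorted: Vec<&'a str> = keys.to_vec();
    sorted.sort_by_key(|k| Reverse(assign_priority(k)));
    sorted
}
```

Deriving `Ord` on the enum lets the scheduler compare levels directly, and `Priority::Critical as u8` recovers the numeric value when it has to be written into a frame.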

4. Frame Generation

  • Skeleton Frame: JSON structure with null/empty values
  • Data Frames: Actual content ordered by priority
  • Array Chunks: Large arrays split into manageable pieces
  • Object Fragments: Nested objects decomposed by importance
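
A minimal sketch of two of these generators, assuming a small stand-in `Json` type (a real pipeline would use the value type of its parser). `skeleton`, `ArrayChunk`, and `chunk_array` are illustrative names, not the PJS API.

```rust
/// Minimal JSON value type used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Builds a skeleton: object keys are kept, scalars become null,
/// and arrays are emptied so they can be filled in by later chunks.
pub fn skeleton(value: &Json) -> Json {
    match value {
        Json::Obj(fields) => Json::Obj(
            fields
                .iter()
                .map(|(k, v)| (k.clone(), skeleton(v)))
                .collect(),
        ),
        Json::Arr(_) => Json::Arr(Vec::new()),
        _ => Json::Null,
    }
}

/// One piece of a large array, tagged with the index of its first element.
#[derive(Debug, Clone, PartialEq)]
pub struct ArrayChunk {
    pub offset: usize,
    pub items: Vec<Json>,
}

/// Splits an array into chunks of at most `chunk_size` elements.
pub fn chunk_array(items: &[Json], chunk_size: usize) -> Vec<ArrayChunk> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    items
        .chunks(chunk_size)
        .enumerate()
        .map(|(i, c)| ArrayChunk {
            offset: i * chunk_size,
            items: c.to_vec(),
        })
        .collect()
}
```

Carrying the `offset` in each chunk is one way to let the client place elements correctly even when chunks arrive out of order.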

5. Compression Pipeline

  • Schema-aware: Uses the patterns found by the schema detector to choose a compression strategy
  • Dictionary Encoding: Replaces repeated strings and structures with short references
  • Delta Encoding: Stores numeric sequences as differences between consecutive values
  • Structural Compression: Minimizes JSON syntax overhead
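
Delta and dictionary encoding are standard techniques, so they can be illustrated without reference to PJS internals. The function names below are illustrative, and the sketch makes no claim about the wire format PJS actually uses.

```rust
use std::collections::HashMap;

/// Delta encoding: the first value is stored as-is (relative to 0),
/// every later value as the difference from its predecessor.
pub fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0i64;
    values
        .iter()
        .map(|&v| {
            let delta = v.wrapping_sub(prev);
            prev = v;
            delta
        })
        .collect()
}

/// Inverse of `delta_encode`: a running sum restores the original values.
pub fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}

/// Dictionary encoding: each distinct string is stored once, and the
/// sequence is replaced by indices into that dictionary.
pub fn dictionary_encode(strings: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dict: Vec<String> = Vec::new();
    let mut index: HashMap<String, u32> = HashMap::new();
    let mut codes = Vec::with_capacity(strings.len());
    for &s in strings {
        let existing = index.get(s).copied();
        let code = match existing {
            Some(c) => c,
            None => {
                let c = dict.len() as u32;
                dict.push(s.to_string());
                index.insert(s.to_string(), c);
                c
            }
        };
        codes.push(code);
    }
    (dict, codes)
}
```

Delta encoding pays off on slowly changing sequences such as timestamps or sequential IDs, where the differences are much smaller than the values themselves. Dictionary encoding pays off when a few strings (status values, for example) repeat many times.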

6. Transport Adaptation

Multiple transport protocols are supported:

  • HTTP/2: Server-sent events for web applications
  • WebSocket: Real-time bidirectional communication
  • Raw TCP: Maximum performance for specialized clients
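
For the HTTP/2 path, a frame has to be serialized in the Server-Sent Events text format: optional `event:` and `id:` fields, one `data:` line per payload line, and a blank line to end the event. The sketch below shows that framing. The function name and the idea of using the frame type as the event name are assumptions.

```rust
/// Serializes one frame as a Server-Sent Events message.
/// `event` is a hypothetical frame-type label (e.g. "skeleton"),
/// `id` a hypothetical sequence number, `payload` the frame body.
pub fn to_sse_event(event: &str, id: u64, payload: &str) -> String {
    let mut out = String::new();
    out.push_str(&format!("event: {}\n", event));
    out.push_str(&format!("id: {}\n", id));
    // A payload containing newlines must be sent as several data lines;
    // the client joins them back together with '\n'.
    for line in payload.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    // An empty line terminates the event.
    out.push('\n');
    out
}
```

Sending a sequence number in the `id:` field lets a reconnecting browser client report the last event it received through the `Last-Event-ID` request header.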

7. Client Reconstruction

  • Progressive Assembly: JSON built incrementally as frames arrive
  • Priority Rendering: UI updates triggered by high-priority data
  • Adaptive Buffering: Optimizes for different network conditions
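
Progressive assembly can be sketched as a small state machine that applies frames as they arrive. For brevity, the sketch assumes a flat document keyed by field path with string values; the `Frame` variants and the `Reconstructor` methods are illustrative, not the PJS client API.

```rust
use std::collections::BTreeMap;

/// Simplified frame types for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Frame {
    /// Announces the field paths the document will contain.
    Skeleton(Vec<String>),
    /// Delivers the value for one field path.
    Data { path: String, value: String },
    /// Signals that the server has sent its last frame.
    Complete,
}

/// Builds the document incrementally; fields stay `None` until filled.
#[derive(Debug, Default)]
pub struct Reconstructor {
    fields: BTreeMap<String, Option<String>>,
    complete: bool,
}

impl Reconstructor {
    pub fn new() -> Self {
        Self::default()
    }

    /// Applies one frame to the partially built document.
    pub fn apply(&mut self, frame: Frame) {
        match frame {
            Frame::Skeleton(paths) => {
                for p in paths {
                    self.fields.entry(p).or_insert(None);
                }
            }
            Frame::Data { path, value } => {
                self.fields.insert(path, Some(value));
            }
            Frame::Complete => self.complete = true,
        }
    }

    /// Returns the value for `path` if it has already arrived.
    pub fn get(&self, path: &str) -> Option<&str> {
        self.fields.get(path).and_then(|v| v.as_deref())
    }

    /// Lists the announced fields that are still waiting for data.
    pub fn pending(&self) -> Vec<&str> {
        self.fields
            .iter()
            .filter(|(_, v)| v.is_none())
            .map(|(k, _)| k.as_str())
            .collect()
    }

    /// True once the completion signal arrived and no field is pending.
    pub fn is_complete(&self) -> bool {
        self.complete && self.pending().is_empty()
    }
}
```

A UI layer could call `get` after every `apply` and render whatever is available, which is what allows high-priority fields to appear before the rest of the document has arrived.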

Performance Characteristics

Latency Improvements

  • Time to First Byte: 5-10 ms, versus 50-200 ms for traditional JSON delivery
  • Time to First Meaningful Data: 50-100 ms, versus 500 ms-2 s for traditional JSON delivery
  • Progressive Rendering: Immediate UI updates for critical data

Memory Efficiency

  • Zero-copy Operations: Minimize allocations during processing
  • Memory Pooling: Reuse buffers for repeated operations
  • Streaming Processing: O(1) memory usage regardless of input size
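
Memory pooling, as a general technique, can be sketched as a free list of byte buffers that are cleared and handed out again instead of being reallocated. `BufferPool` and its methods are hypothetical names; the actual pool behind the diagram's Memory Pool node may look quite different.

```rust
/// A minimal single-threaded buffer pool.
pub struct BufferPool {
    free: Vec<Vec<u8>>,
    capacity: usize,
}

impl BufferPool {
    /// `capacity` is the initial capacity of each newly allocated buffer.
    pub fn new(capacity: usize) -> Self {
        Self { free: Vec::new(), capacity }
    }

    /// Hands out an idle buffer if one exists, otherwise allocates one.
    pub fn acquire(&mut self) -> Vec<u8> {
        self.free
            .pop()
            .unwrap_or_else(|| Vec::with_capacity(self.capacity))
    }

    /// Returns a buffer to the pool. `clear` drops the contents but
    /// keeps the allocation, so the next user does not reallocate.
    pub fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        self.free.push(buf);
    }

    /// Number of buffers currently waiting for reuse.
    pub fn idle(&self) -> usize {
        self.free.len()
    }
}
```

A multi-threaded frame generator would need to wrap such a pool in a lock or give each worker thread its own instance.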

CPU Optimization

  • SIMD Acceleration: Vectorized operations where possible
  • Parallel Processing: Multi-threaded frame generation
  • Efficient Algorithms: Optimized data structures and algorithms

Error Handling Strategy

  1. Graceful Degradation: Input that fails validation, parsing, or analysis is returned as the original, unprocessed JSON
  2. Partial Recovery: Process valid portions of corrupted data
  3. Client Resilience: Reconstruction continues despite missing frames
  4. Monitoring Integration: All errors reported to metrics system
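
The fallback path can be sketched as a wrapper that runs the pipeline, records any failure for monitoring, and returns the untouched input when something goes wrong. All names here (`Response`, `PipelineError`, `respond`) are hypothetical, and the `Vec` of errors stands in for a real metrics sink.

```rust
/// What the server ends up sending.
#[derive(Debug, PartialEq)]
pub enum Response {
    /// Prioritized frames produced by the pipeline.
    Streamed(Vec<String>),
    /// The original JSON, sent unchanged after a failure.
    Fallback(String),
}

/// The three failure points shown in the diagram.
#[derive(Debug, PartialEq)]
pub enum PipelineError {
    Validation(String),
    Parse(String),
    Analysis(String),
}

/// Runs `pipeline` on `input`. On error, the failure is recorded in
/// `errors_seen` and the original input is returned as the fallback.
pub fn respond<F>(
    input: &str,
    pipeline: F,
    errors_seen: &mut Vec<PipelineError>,
) -> Response
where
    F: FnOnce(&str) -> Result<Vec<String>, PipelineError>,
{
    match pipeline(input) {
        Ok(frames) => Response::Streamed(frames),
        Err(e) => {
            errors_seen.push(e);
            Response::Fallback(input.to_string())
        }
    }
}
```

Because the fallback returns the input as received, a client that can already handle a plain JSON response needs no special error path.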

Adaptive Behavior

  • Network Conditions: Frame size adapts to bandwidth and latency
  • Client Capabilities: Compression level adjusted for device performance
  • Load Balancing: Priority scheduling considers server load
  • Feedback Loops: Client metrics influence server optimization
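
One plausible way to adapt frame size to network conditions is to size frames near the bandwidth-delay product and clamp the result to fixed bounds. This heuristic and the 1 KiB / 64 KiB limits are assumptions made for illustration; this page does not specify the policy PJS uses.

```rust
/// Picks a frame size in bytes from measured bandwidth and round-trip time.
/// Heuristic (an assumption, not the documented PJS policy): use the
/// bandwidth-delay product, clamped to a hypothetical [1 KiB, 64 KiB] range.
pub fn adaptive_frame_size(bandwidth_bytes_per_sec: u64, rtt_ms: u64) -> usize {
    const MIN_FRAME: u64 = 1024; // hypothetical floor
    const MAX_FRAME: u64 = 64 * 1024; // hypothetical ceiling
    // Bytes that fit "in flight" during one round trip.
    let bdp = bandwidth_bytes_per_sec.saturating_mul(rtt_ms) / 1000;
    bdp.clamp(MIN_FRAME, MAX_FRAME) as usize
}
```

Under this heuristic, a 1 MB/s link with a 20 ms round trip gets 20,000-byte frames, while very slow or very fast links are held to the floor and ceiling.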