High-Performance JSON Transformations

This guide describes the PerformanceOptimizedTransformer, a component that speeds up JSON transformation in high-throughput scenarios.

Overview

The PerformanceOptimizedTransformer is a decorator that wraps any existing transformer implementation and adds several optimization techniques. It's particularly useful in event processing pipelines that handle thousands or millions of events per second.

Key Features

1. Result Caching

Stores previously computed transformations to avoid redundant processing for identical inputs.

  • Cache Hit: Return the cached result without re-running the transformation (a lookup is typically far cheaper than a full transform)
  • Cache Miss: Transform normally and store result
  • Configurable Cache Size: Limit memory usage
  • Thread-Safe: Concurrent cache access
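
The hit/miss flow and the size limit can be illustrated with a small decorator. This is a minimal sketch, not the library's implementation: `CachingTransformer` is a hypothetical name, a `String` stands in for a JSON document, and least-recently-used eviction is an assumed policy that this guide does not specify.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical caching decorator; String stands in for a JSON document. */
class CachingTransformer implements Function<String, String> {
    private final Function<String, String> delegate;
    private final Map<String, String> cache;

    CachingTransformer(Function<String, String> delegate, int maxEntries) {
        this.delegate = delegate;
        // An access-ordered LinkedHashMap drops the least recently used entry
        // once maxEntries is exceeded; synchronizedMap guards concurrent access.
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > maxEntries;
                }
            });
    }

    @Override
    public String apply(String input) {
        String cached = cache.get(input);
        if (cached != null) {
            return cached;                     // cache hit: skip the delegate
        }
        // Cache miss: transform normally and store the result. Two threads that
        // miss on the same input may both compute it; the results are identical.
        String result = delegate.apply(input);
        cache.put(input, result);
        return result;
    }

    int cachedEntries() {
        return cache.size();
    }
}
```

With `maxEntries` set to 2, a third distinct input evicts the least recently used entry, so memory stays bounded at the cost of recomputing evicted results.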

2. Structural Fingerprinting

Identifies documents with the same structure (field names and types) but different values.

  • Structure Recognition: Creates a fingerprint based on JSON structure, not values
  • Field Names: Records all property names in the document
  • Type Information: Preserves type information without actual values
  • Sorting: Ensures consistent fingerprints regardless of field order
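
One way to compute such a fingerprint is sketched below. It is an illustration under stated assumptions: `StructuralFingerprint` is a hypothetical helper, and JSON is modeled with plain `Map`, `List`, `String`, `Number`, and `Boolean` values rather than the library's own JSON types.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: JSON is modeled as Map / List / String / Number / Boolean / null. */
final class StructuralFingerprint {
    private StructuralFingerprint() {}

    static String of(Object node) {
        if (node instanceof Map) {
            // A TreeMap sorts the keys, so field order does not affect the result.
            Map<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                sorted.put(String.valueOf(e.getKey()), e.getValue());
            }
            return sorted.entrySet().stream()
                .map(e -> e.getKey() + ":" + of(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        }
        if (node instanceof List) {
            return ((List<?>) node).stream()
                .map(StructuralFingerprint::of)
                .collect(Collectors.joining(",", "[", "]"));
        }
        // Leaves contribute only their type, never their value.
        if (node instanceof String) return "string";
        if (node instanceof Number) return "number";
        if (node instanceof Boolean) return "boolean";
        if (node == null) return "null";
        throw new IllegalArgumentException("Unsupported node type: " + node.getClass());
    }
}
```

Two documents with the fields `name` (string) and `age` (number) yield the same fingerprint, `{age:number,name:string}`, whatever their values or field order.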

3. Value-Aware Caching

Ensures different input values produce different results while maintaining structural optimizations.

  • Composite Cache Key: Combines structural fingerprint with value hash
  • Value Extraction: Extracts critical values that affect transformation
  • Hash Computation: Creates unique identifier for each value combination
  • Collision Avoidance: Makes it highly unlikely that two different inputs map to the same cache entry
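
A composite key of this kind could be assembled as follows. This is a sketch only: `CompositeCacheKey` is a hypothetical name, SHA-256 is an assumed hash function, and the caller is assumed to have already extracted the fingerprint and the values that affect the transformation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical helper that joins a structural fingerprint with a value hash. */
final class CompositeCacheKey {
    private CompositeCacheKey() {}

    static String of(String structuralFingerprint, List<?> criticalValues) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Object value : criticalValues) {
                // Values are hashed by their string form; their types are
                // assumed to be captured by the structural fingerprint.
                byte[] bytes = String.valueOf(value).getBytes(StandardCharsets.UTF_8);
                // The length prefix keeps ["ab", "c"] and ["a", "bc"] distinct.
                digest.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
                digest.update((byte) ':');
                digest.update(bytes);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return structuralFingerprint + "#" + hex;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every conforming Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Documents with the same structure share the key prefix, while any change in a critical value changes the suffix, so a cached result is reused only when both parts match.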

4. Path Pre-computation

Prepares frequently accessed paths for faster processing.

  • Path Resolution: Pre-resolves common paths for direct access
  • Resolver Caching: Stores optimized access patterns
  • Query Optimization: Avoids repeated path parsing
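
The idea of parsing a path once and reusing the result can be sketched like this. `PathResolverCache` is a hypothetical class, JSON objects are modeled as nested `Map`s, and only dotted paths such as `user.profile.name` are handled; array syntax such as `items[].price` is outside this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache of parsed paths; JSON objects are modeled as nested Maps. */
final class PathResolverCache {
    private final Map<String, List<String>> parsed = new ConcurrentHashMap<>();

    /** Parses each path once, up front, so later lookups skip the parsing step. */
    void precompute(String... paths) {
        for (String path : paths) {
            parsed.computeIfAbsent(path, PathResolverCache::parse);
        }
    }

    /** Walks the document along the cached segments; returns null if a step is missing. */
    Object resolve(Object root, String path) {
        List<String> segments = parsed.computeIfAbsent(path, PathResolverCache::parse);
        Object current = root;
        for (String segment : segments) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    boolean isPrecomputed(String path) {
        return parsed.containsKey(path);
    }

    private static List<String> parse(String path) {
        return Arrays.asList(path.split("\\."));
    }
}
```

Paths that were not pre-computed are parsed on first use and cached afterwards, so pre-computation only moves the parsing cost to startup.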

When to Use

The PerformanceOptimizedTransformer is ideal for:

  • High-Volume Event Processing: Systems processing thousands or millions of events per second
  • Similar Document Structures: When many documents share the same structure
  • CPU-Bound Applications: When transformation CPU usage is a bottleneck
  • Latency-Sensitive Systems: When minimizing transformation time is critical

Basic Usage

// Create your normal transformer
TransformationEngine baseTransformer = new FieldRenameTransformer("user.name", "user.fullName");

// Wrap it with the performance optimizer
PerformanceOptimizedTransformer optimizedTransformer = 
    new PerformanceOptimizedTransformer(baseTransformer);

// Use it like any other transformer
JsonElement result = optimizedTransformer.transform(input);

Advanced Configuration

Constructor Options

// Create with custom settings
PerformanceOptimizedTransformer optimizedTransformer = new PerformanceOptimizedTransformer(
    baseTransformer,      // The underlying transformer to optimize
    5000,                 // Cache size (number of results to store)
    true                  // Enable structural fingerprinting
);

Path Pre-computation

// Pre-compute paths that will be accessed frequently
optimizedTransformer.precomputePaths(
    "user.profile.name", 
    "user.profile.email",
    "metadata.timestamp",
    "items[].price"
);

Cache Management

// Clear the cache when needed (e.g., after configuration changes)
optimizedTransformer.clearCache();

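// Schedule periodic cache clearing for long-running applications
// (shut the scheduler down when the application stops)
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    optimizedTransformer.clearCache();
}, 1, 1, TimeUnit.HOURS);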

Real-World Example

Here's a comprehensive example showing how to integrate the PerformanceOptimizedTransformer in a Spring Boot application:

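import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.google.gson.JsonPrimitive; // assumes the Gson JSON model
import org.slf4j.Logger;              // assumes SLF4J for logging
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
// Also import this library's transformer classes (package not shown here)

@Configuration
public class TransformerConfig {

    private static final Logger log = LoggerFactory.getLogger(TransformerConfig.class);
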
    @Bean
    public TransformationEngine userProfileTransformer() {
        // Create the base transformers
        List<TransformationEngine> transformers = new ArrayList<>();
        transformers.add(new FieldRenameTransformer("user.firstName", "user.givenName"));
        transformers.add(new FieldRenameTransformer("user.lastName", "user.familyName"));
        transformers.add(new DateFormatTransformer("user.createdAt", "unix", "ISO-8601"));
        transformers.add(new DefaultValueTransformer("user.status", new JsonPrimitive("active")));
        
        // Create a composite transformer
        TransformationEngine compositeTransformer = new CompositeTransformationEngine(transformers);
        
        // Wrap with the performance optimizer
        PerformanceOptimizedTransformer optimizedTransformer = 
            new PerformanceOptimizedTransformer(compositeTransformer, 10000, true);
        
        // Pre-compute frequently accessed paths
        optimizedTransformer.precomputePaths(
            "user.firstName", "user.lastName", "user.createdAt", "user.status"
        );
        
        return optimizedTransformer;
    }
    
    @Bean
    public ScheduledExecutorService cacheMaintenanceScheduler(
            @Qualifier("userProfileTransformer") TransformationEngine transformer) {
        
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        
        // Schedule cache clearing every 6 hours
        scheduler.scheduleAtFixedRate(() -> {
            if (transformer instanceof PerformanceOptimizedTransformer) {
                ((PerformanceOptimizedTransformer) transformer).clearCache();
                log.info("Cleared transformation cache");
            }
        }, 6, 6, TimeUnit.HOURS);
        
        return scheduler;
    }
}

Performance Benchmarks

Benchmark results have not been published yet. Until they are, measure with your own workload.

Implementation Details

Cache Key Generation

The transformer builds a cache key using a strong 128-bit hash over the input's structure and values in a single pass. This avoids extra traversals and minimizes collisions without a separate structural fingerprint.
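
A single-pass key of this kind might look like the sketch below. The hash function is not named in this guide, so SHA-256 truncated to its first 128 bits is an assumption, as are the hypothetical `SinglePassKey` class and the `Map`/`List` model of JSON.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-pass key: hashes structure and values in one traversal. */
final class SinglePassKey {
    private SinglePassKey() {}

    static String of(Object root) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            feed(digest, root);
            // Assumption: keep the first 16 bytes (128 bits) of the SHA-256 digest.
            byte[] first128Bits = Arrays.copyOf(digest.digest(), 16);
            StringBuilder hex = new StringBuilder(32);
            for (byte b : first128Bits) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void feed(MessageDigest digest, Object node) {
        if (node instanceof Map) {
            digest.update((byte) '{');
            Map<String, Object> sorted = new TreeMap<>(); // order-independent keys
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                sorted.put(String.valueOf(e.getKey()), e.getValue());
            }
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                text(digest, 'k', e.getKey());
                feed(digest, e.getValue());
            }
            digest.update((byte) '}');
        } else if (node instanceof List) {
            digest.update((byte) '[');
            for (Object item : (List<?>) node) {
                feed(digest, item);
            }
            digest.update((byte) ']');
        } else if (node instanceof String) {
            text(digest, 's', (String) node);
        } else if (node instanceof Number) {
            text(digest, 'n', node.toString());
        } else if (node instanceof Boolean) {
            text(digest, 'b', node.toString());
        } else if (node == null) {
            digest.update((byte) 'z');
        } else {
            throw new IllegalArgumentException("Unsupported node type: " + node.getClass());
        }
    }

    /** Feeds a type tag, a length prefix, and the UTF-8 bytes of the text. */
    private static void text(MessageDigest digest, char tag, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        digest.update((byte) tag);
        digest.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
        digest.update((byte) ':');
        digest.update(bytes);
    }
}
```

Type tags and length prefixes keep the string `"1"` and the number `1` apart, and sorting the keys makes the result independent of field order.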

Thread Safety Considerations

The transformer is fully thread-safe:

  • Uses ConcurrentHashMap for cache storage
  • Performs atomic read and write operations
  • Handles race conditions gracefully
  • Works correctly in multi-threaded environments
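
The role of ConcurrentHashMap here can be illustrated with its `computeIfAbsent` method, which runs the mapping function at most once per key even under contention. `ConcurrentResultCache` below is a hypothetical stand-in, not the transformer's actual code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical result cache showing atomic compute-and-store per key. */
final class ConcurrentResultCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> transform;

    ConcurrentResultCache(Function<K, V> transform) {
        this.transform = transform;
    }

    V get(K key) {
        // computeIfAbsent is atomic on ConcurrentHashMap: concurrent callers
        // asking for the same key trigger a single call to transform.
        // The transform must not modify this cache while it runs.
        return cache.computeIfAbsent(key, transform);
    }

    void clear() {
        cache.clear();
    }
}
```

The trade-off is that callers waiting on the same key block until the first computation finishes, which is usually acceptable when the transformation is much slower than a lookup.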

Best Practices

  1. Initialize During Startup: Create optimized transformers at application startup
  2. Monitor Memory Usage: Adjust cache size based on available memory
  3. Clear Periodically: For long-running applications, clear the cache occasionally
  4. Pre-compute Common Paths: Identify and pre-compute frequently accessed paths
  5. Combine with CompositeTransformationEngine: Wrap composite transformers for maximum benefit
  6. Measure Actual Performance: Benchmark with your specific workload

Limitations

  • Memory Usage: Caching requires memory proportional to cache size and document complexity
  • Complex Value Dependencies: May not fully optimize transformations with complex value interdependencies
  • Very Large Documents: May have diminishing returns for extremely large documents

Conclusion

The PerformanceOptimizedTransformer targets high-throughput JSON transformation scenarios. By caching results and optimizing path access, it can substantially reduce transformation time for repeated or similar documents; the actual gain depends on your cache hit rate and workload.

For systems processing thousands or millions of events, this can translate to substantial resource savings and throughput improvements without changing the transformation logic itself.