Conversation

xinlian12
Member

@xinlian12 xinlian12 commented Aug 8, 2025

Issue

The current change feed endLSN calculation is not optimized for cases where the LSN increases without producing any change feed changes, for example when updates are caused purely by binary encoding.

This PR improves the endLSN calculation by using per-task metrics.

High-level Design

Three custom metrics and two coordinator components work together to track and analyze change feed performance. The results are used to tune the batch size during the partition planning stage.
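To make the tuning idea concrete, here is a minimal Python sketch of how an observed changes-per-LSN ratio could be turned into an end LSN for the next micro-batch. It is an illustration only, not the logic in CosmosPartitionPlanner: the function name, its parameters, and the fallback of one change per LSN when no metrics exist are all assumptions.

```python
import math
from typing import Optional


def estimate_end_lsn(start_lsn: int,
                     latest_lsn: int,
                     target_item_count: int,
                     avg_changes_per_lsn: Optional[float]) -> int:
    """Pick an end LSN expected to yield roughly target_item_count changes."""
    if avg_changes_per_lsn is None or avg_changes_per_lsn <= 0:
        # Assumption: without usable metrics, treat every LSN as one change.
        avg_changes_per_lsn = 1.0
    # Fewer changes per LSN means a wider LSN span is needed per batch.
    lsn_span = math.ceil(target_item_count / avg_changes_per_lsn)
    # Never plan past the latest LSN the partition has reached.
    return min(latest_lsn, start_lsn + lsn_span)
```

With a target of 1000 items and an observed ratio of 0.5 changes per LSN, the sketch plans a span of 2000 LSNs instead of 1000, which is the kind of adjustment the metrics are meant to enable.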

Core Metrics

  • ChangeFeedItemsCntMetric - A CustomSumMetric that tracks the total number of items fetched within a change feed micro-batch for each partition. Note that not all fetched changes are necessarily returned to Spark.

  • ChangeFeedLsnRangeMetric - A CustomSumMetric tracking the LSN (Log Sequence Number) range for partitions within each micro-batch. LSN represents the sequence of changes in the partition.

  • ChangeFeedPartitionIndexMetric - A CustomMetric that captures and maintains consistent partition indices, essential for correlating metrics across the system.

Coordinator Components

ChangeFeedMetricsTracker

Manages metrics calculation using an exponential weighting approach

  • Maintains a history of the last 5 measurements by default
  • Uses a decay factor (default 0.85) to give more weight to recent values
  • Calculates changes per LSN as changesFetchedCnt / lsnGap
  • Implements thread-safe tracking with synchronized updates
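The bullets above can be sketched in Python as follows. This is a hedged illustration rather than the ChangeFeedMetricsTracker code from the PR: the class and method names are hypothetical, and the exact weighting scheme (newest sample has weight 1, each older sample is multiplied by the decay factor once more) is an assumption about what "exponential weighting" means here.

```python
import threading
from collections import deque
from typing import Optional


class ChangesPerLsnTracker:
    """Keeps a short history of changes-per-LSN samples (hypothetical name)."""

    def __init__(self, max_history: int = 5, decay_factor: float = 0.85):
        # deque(maxlen=...) drops the oldest sample automatically.
        self._history = deque(maxlen=max_history)
        self._decay = decay_factor
        self._lock = threading.Lock()

    def track(self, changes_fetched_cnt: int, lsn_gap: int) -> None:
        if lsn_gap <= 0:
            # Assumption: a non-positive LSN gap carries no usable signal.
            return
        with self._lock:
            self._history.append(changes_fetched_cnt / lsn_gap)

    def weighted_changes_per_lsn(self) -> Optional[float]:
        with self._lock:
            values = list(self._history)  # oldest first
        if not values:
            return None
        newest = len(values) - 1
        # Newest sample gets weight 1; older samples decay geometrically.
        weights = [self._decay ** (newest - i) for i in range(len(values))]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For example, after samples of 2.0 and then 4.0 changes per LSN, the weighted value is (0.85 * 2.0 + 1.0 * 4.0) / 1.85, roughly 3.08, so the newer sample dominates.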

ChangeFeedMetricsListener

A Spark listener that integrates with Spark's metrics system

  • Takes two key parameters:
    partitionIndexMap: BiMap for mapping between partition ranges and numeric indices
    partitionMetricsMap: ConcurrentHashMap storing metrics trackers per partition
  • Processes metrics on task completion through the onTaskEnd handler
  • Suppresses errors during metric collection, since the metrics are only used as an optimization
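The handler logic described above might look roughly like the following Python sketch. It does not use Spark's listener API; the metric keys, the RatioHistory stand-in for a per-partition tracker, and the plain dictionaries replacing the BiMap and ConcurrentHashMap are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical metric keys; the real metric names are not given here.
PARTITION_INDEX = "partitionIndex"
LSN_RANGE = "lsnRange"
ITEMS_COUNT = "itemsCount"


class RatioHistory:
    """Stand-in for a per-partition tracker: records changes per LSN."""

    def __init__(self) -> None:
        self.ratios = []

    def track(self, items: int, lsn_gap: int) -> None:
        if lsn_gap > 0:
            self.ratios.append(items / lsn_gap)


def on_task_end(task_metrics: Dict[str, int],
                index_to_range: Dict[int, str],
                trackers: Dict[str, RatioHistory]) -> None:
    """Route one finished task's metrics to the tracker of its partition."""
    try:
        # The partition index metric identifies which partition the task read.
        partition_range = index_to_range[task_metrics[PARTITION_INDEX]]
        tracker = trackers.setdefault(partition_range, RatioHistory())
        tracker.track(task_metrics[ITEMS_COUNT], task_metrics[LSN_RANGE])
    except Exception:
        # Metrics only feed an optimization, so a failure here is swallowed
        # rather than allowed to affect the streaming job.
        pass
```

Only the index-to-range direction of the bidirectional map is needed in this sketch; a task with missing metrics or an unknown partition index is simply skipped.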

Configs

Dynamic batch tuning is enabled by default. Customers can opt out by setting spark.cosmos.changeFeed.performance.monitoring.enabled to false.
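As a rough illustration of opting out, the setting could be supplied alongside the other connector options. The endpoint, key, database, and container values below are placeholders, and passing the flag as a reader option (rather than through the Spark session config) is an assumption.

```python
# Options for a change feed read; all angle-bracket values are placeholders.
change_feed_options = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<key>",
    "spark.cosmos.database": "<database>",
    "spark.cosmos.container": "<container>",
    # Opt out of the metrics-based batch tuning.
    "spark.cosmos.changeFeed.performance.monitoring.enabled": "false",
}

# With a live SparkSession this would be used along the lines of:
# df = (spark.readStream
#           .format("cosmos.oltp.changeFeed")
#           .options(**change_feed_options)
#           .load())
```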

Test

  • Ingested documents into GreenTaxiRecords using 01_Batch with a mix of bulk and point writes.
  • Read the items through change feed using 02_StructuredStream.
  • Cluster config: (screenshot)
  • ChangeFeed spark job configs: (screenshot)

Dev branch

(7 screenshots)

Master branch

(7 screenshots)

Notes:

  • If a partition split happens, the child partitions do not inherit or use the parent partition's metrics.
  • Because the metrics are collected by the listener inside Spark, there is no guarantee that the listener logic runs before the next micro-batch, so the tuning can take effect with a delay.

@xinlian12 xinlian12 changed the title SparkChangeFeedEndLSNImprovement SparkChangeFeedEndLSNImprovement - [NO REVIEW] Aug 8, 2025
@github-actions github-actions bot added the Cosmos label Aug 8, 2025
@xinlian12 xinlian12 force-pushed the sparkEndLSNImprovement branch from 1623250 to d790eb1 Compare August 8, 2025 17:09
@xinlian12 xinlian12 marked this pull request as ready for review August 26, 2025 00:06
@Copilot Copilot AI review requested due to automatic review settings August 26, 2025 00:06
@xinlian12 xinlian12 requested review from kirankumarkolli and a team as code owners August 26, 2025 00:06
@xinlian12 xinlian12 changed the title SparkChangeFeedEndLSNImprovement - [NO REVIEW] SparkChangeFeedEndLSNImprovement Aug 26, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds an improvement for end LSN calculation in change feed processing by introducing a metrics-based approach to optimize micro-batch sizing. The changes implement a system that tracks per-partition metrics to better estimate how many changes per LSN to expect, allowing for more accurate end LSN calculations when LSN can increase without producing actual change feed changes.

Key changes:

  • Added ChangeFeedMetricsTracker to calculate weighted average changes per LSN using exponential decay
  • Introduced Spark metrics listener infrastructure to capture and track change feed metrics across micro-batches
  • Enhanced CosmosPartitionPlanner to utilize metrics data when available for improved end LSN calculations

Reviewed Changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| ChangeFeedMetricsTracker.scala | New metrics tracker with exponential weighting for changes per LSN calculation |
| ChangeFeedMetricsListener.scala | Spark listener to capture and process change feed metrics |
| CosmosPartitionPlanner.scala | Enhanced end LSN calculation to use metrics data when available |
| ChangeFeedPartitionReader.scala | Added custom metrics collection for LSN gaps and fetched changes |
| CosmosInputPartition.scala | Added optional partition index field for metrics tracking |
| Multiple test files | Updated method signatures and added comprehensive test coverage |

@xinlian12 xinlian12 force-pushed the sparkEndLSNImprovement branch from 183bcb6 to afd96dd Compare August 26, 2025 06:21
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - thanks @xinlian12 - this looks really great - the design explanation in the PR description helped a lot in reviewing it and the implementation looks very clean! The only small ask would be to move the configurability to Spark config. Thanks again!

@xinlian12 xinlian12 force-pushed the sparkEndLSNImprovement branch from ed77d5d to 592661a Compare August 27, 2025 04:27
@xinlian12
Member Author

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - Thanks

@FabianMeiswinkel FabianMeiswinkel merged commit 4eabe02 into Azure:main Aug 27, 2025
34 checks passed