@raghuvanshraj
Contributor

@raghuvanshraj raghuvanshraj commented Nov 28, 2025

Description

This change simplifies the ManagedVSR lifecycle, makes close handling and state transitions more robust, and fixes a memory leak in the VSR rotation framework.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.
|                                                         Metric |         Task |       Value |   Unit |
|---------------------------------------------------------------:|-------------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |              |     236.124 |    min |
|             Min cumulative indexing time across primary shards |              |     236.124 |    min |
|          Median cumulative indexing time across primary shards |              |     236.124 |    min |
|             Max cumulative indexing time across primary shards |              |     236.124 |    min |
|            Cumulative indexing throttle time of primary shards |              |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |              |           0 |    min |
| Median cumulative indexing throttle time across primary shards |              |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |              |           0 |    min |
|                        Cumulative merge time of primary shards |              |           0 |    min |
|                       Cumulative merge count of primary shards |              |           0 |        |
|                Min cumulative merge time across primary shards |              |           0 |    min |
|             Median cumulative merge time across primary shards |              |           0 |    min |
|                Max cumulative merge time across primary shards |              |           0 |    min |
|               Cumulative merge throttle time of primary shards |              |           0 |    min |
|       Min cumulative merge throttle time across primary shards |              |           0 |    min |
|    Median cumulative merge throttle time across primary shards |              |           0 |    min |
|       Max cumulative merge throttle time across primary shards |              |           0 |    min |
|                      Cumulative refresh time of primary shards |              |           0 |    min |
|                     Cumulative refresh count of primary shards |              |           0 |        |
|              Min cumulative refresh time across primary shards |              |           0 |    min |
|           Median cumulative refresh time across primary shards |              |           0 |    min |
|              Max cumulative refresh time across primary shards |              |           0 |    min |
|                        Cumulative flush time of primary shards |              |      0.5131 |    min |
|                       Cumulative flush count of primary shards |              |        2080 |        |
|                Min cumulative flush time across primary shards |              |      0.5131 |    min |
|             Median cumulative flush time across primary shards |              |      0.5131 |    min |
|                Max cumulative flush time across primary shards |              |      0.5131 |    min |
|                                        Total Young Gen GC time |              |       3.769 |      s |
|                                       Total Young Gen GC count |              |         172 |        |
|                                          Total Old Gen GC time |              |           0 |      s |
|                                         Total Old Gen GC count |              |           0 |        |
|                                                     Store size |              | 1.85166e-05 |     GB |
|                                                  Translog size |              |    0.931702 |     GB |
|                                         Heap used for segments |              |           0 |     MB |
|                                       Heap used for doc values |              |           0 |     MB |
|                                            Heap used for terms |              |           0 |     MB |
|                                            Heap used for norms |              |           0 |     MB |
|                                           Heap used for points |              |           0 |     MB |
|                                    Heap used for stored fields |              |           0 |     MB |
|                                                  Segment count |              |           0 |        |
|                                                 Min Throughput | index-append |     50017.1 | docs/s |
|                                                Mean Throughput | index-append |     50775.1 | docs/s |
|                                              Median Throughput | index-append |     50475.1 | docs/s |
|                                                 Max Throughput | index-append |       52447 | docs/s |
|                                        50th percentile latency | index-append |     2251.43 |     ms |
|                                        90th percentile latency | index-append |     2733.09 |     ms |
|                                        99th percentile latency | index-append |     3205.55 |     ms |
|                                      99.9th percentile latency | index-append |      3785.1 |     ms |
|                                     99.99th percentile latency | index-append |     4078.56 |     ms |
|                                       100th percentile latency | index-append |     4407.94 |     ms |
|                                   50th percentile service time | index-append |     2251.43 |     ms |
|                                   90th percentile service time | index-append |     2733.09 |     ms |
|                                   99th percentile service time | index-append |     3205.55 |     ms |
|                                 99.9th percentile service time | index-append |      3785.1 |     ms |
|                                99.99th percentile service time | index-append |     4078.56 |     ms |
|                                  100th percentile service time | index-append |     4407.94 |     ms |
|                                                     error rate | index-append |           0 |      % |

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai

coderabbitai bot commented Nov 28, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

❌ Gradle check result for 8c44afb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

❌ Gradle check result for 4c13cf5: FAILURE

@github-actions
Contributor

❌ Gradle check result for 3a2b211: FAILURE

@github-actions
Contributor

github-actions bot commented Dec 1, 2025

❌ Gradle check result for f813cbc: FAILURE

@raghuvanshraj raghuvanshraj marked this pull request as ready for review December 1, 2025 06:09
@raghuvanshraj raghuvanshraj requested a review from a team as a code owner December 1, 2025 06:09
@github-actions
Contributor

github-actions bot commented Dec 1, 2025

❌ Gradle check result for 5acf690: FAILURE

Comment on lines +201 to +218
    // If already CLOSED, do nothing (idempotent)
    if (state == VSRState.CLOSED) {
        return;
    }

    // If ACTIVE, must freeze first
    if (state == VSRState.ACTIVE) {
        throw new IllegalStateException(String.format(
            "Cannot close VSR %s: VSR is still ACTIVE. Must freeze VSR before closing.", id));
    }

    // If FROZEN, transition to CLOSED
    if (state == VSRState.FROZEN) {
        moveToClosed();
    } else {
        // This should never happen with current states, but defensive programming
        throw new IllegalStateException(String.format(
            "Cannot close VSR %s: unexpected state %s", id, state));
Contributor

@Bukhtawar Bukhtawar Dec 1, 2025

Looks like we need a dedicated, centralised VSRState class for state machine validation:

    private static final Set<VSRState> ACTIVE_VALID_TRANSITIONS = EnumSet.of(FROZEN, CLOSED);
    private static final Set<VSRState> FROZEN_VALID_TRANSITIONS = EnumSet.of(FLUSHING, CLOSED);
    private static final Set<VSRState> FLUSHING_VALID_TRANSITIONS = EnumSet.of(CLOSED);
    private static final Set<VSRState> CLOSED_VALID_TRANSITIONS = EnumSet.noneOf(VSRState.class);


    private Set<VSRState> getValidTransitions() {
        switch (this) {
            case ACTIVE:
                return ACTIVE_VALID_TRANSITIONS;
            case FROZEN:
                return FROZEN_VALID_TRANSITIONS;
            case FLUSHING:
                return FLUSHING_VALID_TRANSITIONS;
            case CLOSED:
                return CLOSED_VALID_TRANSITIONS;
            default:
                throw new IllegalStateException("Unknown state: " + this);
        }
    }
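
One way the suggestion above could be completed, as a minimal sketch and not the PR's actual code: the transition sets live inside the enum, and a single validation call guards every state change. The method names `canTransitionTo` and `validateTransition` are hypothetical, and the `FLUSHING` state is taken from the snippet above. Note that this table, as suggested, lets ACTIVE go straight to CLOSED, whereas the close() excerpt in this thread rejects that.

```java
import java.util.EnumSet;
import java.util.Set;

enum VSRState {
    ACTIVE, FROZEN, FLUSHING, CLOSED;

    // Transition table copied from the review suggestion above.
    private static final Set<VSRState> ACTIVE_VALID_TRANSITIONS = EnumSet.of(FROZEN, CLOSED);
    private static final Set<VSRState> FROZEN_VALID_TRANSITIONS = EnumSet.of(FLUSHING, CLOSED);
    private static final Set<VSRState> FLUSHING_VALID_TRANSITIONS = EnumSet.of(CLOSED);
    private static final Set<VSRState> CLOSED_VALID_TRANSITIONS = EnumSet.noneOf(VSRState.class);

    private Set<VSRState> getValidTransitions() {
        switch (this) {
            case ACTIVE:
                return ACTIVE_VALID_TRANSITIONS;
            case FROZEN:
                return FROZEN_VALID_TRANSITIONS;
            case FLUSHING:
                return FLUSHING_VALID_TRANSITIONS;
            case CLOSED:
                return CLOSED_VALID_TRANSITIONS;
            default:
                throw new IllegalStateException("Unknown state: " + this);
        }
    }

    // Hypothetical helper: true if moving from this state to target is allowed.
    boolean canTransitionTo(VSRState target) {
        return getValidTransitions().contains(target);
    }

    // Hypothetical helper: throws if the transition is not in the table.
    void validateTransition(VSRState target) {
        if (!canTransitionTo(target)) {
            throw new IllegalStateException(
                String.format("Invalid VSR state transition: %s -> %s", this, target));
        }
    }
}
```

A caller would then run `state.validateTransition(next)` before assigning `state = next`, so the if/else chains in individual methods collapse into one table lookup.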

Contributor Author

I didn't see a use case for the FLUSHING state as of now, so I removed it. Between ACTIVE, FROZEN and CLOSED we have only a small set of valid transitions. Do you think we need the FLUSHING state?

Contributor Author

Also, if we do introduce a FLUSHING state, should we allow FROZEN to go directly to CLOSED? Ideally every FROZEN VSR should first go to the FLUSHING state and then to CLOSED.
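
To make the question concrete, here is a hedged sketch of the three-state variant, consistent with the close() logic in the diff excerpt: ACTIVE can only freeze, FROZEN can only close, and closing an already CLOSED VSR is a no-op. `SketchVsr`, `freeze()`, `getState()` and the `TRANSITIONS` map are hypothetical names for illustration; the real ManagedVSR is not shown in this conversation. If FLUSHING came back with a strict chain, only the table would change (FROZEN -> FLUSHING, FLUSHING -> CLOSED).

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in for ManagedVSR; not thread-safe, no Arrow resources involved.
class SketchVsr {
    enum State { ACTIVE, FROZEN, CLOSED }

    // Strict table: each state has exactly one legal successor, CLOSED has none.
    private static final Map<State, Set<State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        TRANSITIONS.put(State.ACTIVE, EnumSet.of(State.FROZEN));
        TRANSITIONS.put(State.FROZEN, EnumSet.of(State.CLOSED));
        TRANSITIONS.put(State.CLOSED, EnumSet.noneOf(State.class));
    }

    private final String id;
    private State state = State.ACTIVE;

    SketchVsr(String id) {
        this.id = id;
    }

    State getState() {
        return state;
    }

    void freeze() {
        moveTo(State.FROZEN);
    }

    void close() {
        // Closing twice is a no-op, mirroring the idempotent check in the diff.
        if (state == State.CLOSED) {
            return;
        }
        moveTo(State.CLOSED);
    }

    private void moveTo(State target) {
        if (!TRANSITIONS.get(state).contains(target)) {
            throw new IllegalStateException(
                String.format("Cannot move VSR %s from %s to %s", id, state, target));
        }
        state = target;
    }
}
```

With this table, calling `close()` on an ACTIVE instance throws, while `freeze()` followed by any number of `close()` calls ends in CLOSED.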
