Skip to content

Conversation

@hui-cha
Copy link
Collaborator

@hui-cha hui-cha commented Dec 18, 2025

这个 PR 主要是在自适应攒批的能力中,针对大推送单独计算一个攒批时长;

主要的改动点

  • Data 上 PublisherGroup 的 pubMap 增加了一个计数功能;原因是因为 Publisher 在 unpub 的时候并不会直接从 pubMap 中移除,只是做一个标记,这样在获取 Publisher 数量的时候会遇到困难,因此这里增加了一个计数的功能;
  • Session 上 ChangeProcessor 的 Worker 抽了一个接口出来,增加了两个实现,DefaultChangeWorker 和 LargeChangeAdaptiveDelayWorker,DefaultChangeWorker 是修改前的逻辑,LargeChangeAdaptiveDelayWorker 增加了针对大推送单独计算攒批时长的能力,对于 ChangeTask 的 Merge 逻辑没有修改,保持了一致
  • LargeChangeAdaptiveDelayWorker 是使用了一个跳表 + Map 的数据结构,以解决获取超时任务以及任务合并的需求

LargeChangeAdaptiveDelayWorker 和 DefaultChangeWorker 本地有跑过 benchmark,性能基本是一致的

入队列
Benchmark (largeDataInfoSize) (totalDataInfoIdSize) (useLargeAdapterDelayChangeWorker) Mode Cnt Score Error Units
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 1000 false avgt 5 191.331 ± 12.758 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 1000 true avgt 5 195.061 ± 7.491 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 10000 false avgt 5 265.263 ± 72.057 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 10000 true avgt 5 249.443 ± 22.616 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 100000 false avgt 5 704.086 ± 66.189 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 100000 true avgt 5 687.472 ± 176.612 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 200000 false avgt 5 871.310 ± 290.528 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 200000 true avgt 5 752.635 ± 73.866 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 250000 false avgt 5 784.099 ± 54.253 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 250000 true avgt 5 791.765 ± 42.835 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 500000 false avgt 5 852.751 ± 11.220 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.01 500000 true avgt 5 915.662 ± 64.407 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 1000 false avgt 5 189.765 ± 4.145 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 1000 true avgt 5 195.470 ± 3.933 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 10000 false avgt 5 265.327 ± 101.800 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 10000 true avgt 5 259.384 ± 26.957 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 100000 false avgt 5 649.136 ± 46.702 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 100000 true avgt 5 633.807 ± 30.597 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 200000 false avgt 5 766.823 ± 66.441 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 200000 true avgt 5 743.806 ± 46.972 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 250000 false avgt 5 804.194 ± 46.566 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 250000 true avgt 5 884.713 ± 281.478 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 500000 false avgt 5 879.113 ± 61.840 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.05 500000 true avgt 5 968.567 ± 177.444 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 1000 false avgt 5 187.225 ± 3.528 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 1000 true avgt 5 195.986 ± 4.750 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 10000 false avgt 5 268.896 ± 87.588 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 10000 true avgt 5 272.835 ± 63.835 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 100000 false avgt 5 664.881 ± 41.483 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 100000 true avgt 5 655.475 ± 58.791 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 200000 false avgt 5 794.427 ± 179.723 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 200000 true avgt 5 752.679 ± 68.519 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 250000 false avgt 5 783.607 ± 78.078 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 250000 true avgt 5 812.330 ± 57.286 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 500000 false avgt 5 857.210 ± 41.503 ns/op
ChangeWorkerCommitBenchmark.benchmarkCommitChange 0.1 500000 true avgt 5 914.708 ± 39.681 ns/op

出队列
Benchmark (largeDataInfoSize) (totalDataInfoIdSize) (useLargeAdapterDelayChangeWorker) Mode Cnt Score Error Units
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 1000 false avgt 5 161.843 ± 17.578 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 1000 true avgt 5 68.299 ± 45.535 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 10000 false avgt 5 113.356 ± 16.498 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 10000 true avgt 5 62.625 ± 24.329 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 100000 false avgt 5 75.414 ± 2.889 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 100000 true avgt 5 70.686 ± 42.457 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 200000 false avgt 5 67.549 ± 19.593 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 200000 true avgt 5 93.186 ± 103.557 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 250000 false avgt 5 64.635 ± 22.405 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 250000 true avgt 5 75.272 ± 10.789 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 500000 false avgt 5 74.460 ± 10.797 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.01 500000 true avgt 5 93.739 ± 132.017 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 1000 false avgt 5 184.003 ± 42.605 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 1000 true avgt 5 61.981 ± 14.944 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 10000 false avgt 5 167.285 ± 13.378 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 10000 true avgt 5 64.807 ± 37.087 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 100000 false avgt 5 75.101 ± 1.882 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 100000 true avgt 5 72.169 ± 16.240 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 200000 false avgt 5 62.082 ± 15.950 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 200000 true avgt 5 61.289 ± 7.426 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 250000 false avgt 5 67.578 ± 26.685 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 250000 true avgt 5 67.245 ± 5.499 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 500000 false avgt 5 65.919 ± 13.829 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.05 500000 true avgt 5 61.851 ± 27.870 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 1000 false avgt 5 155.596 ± 31.248 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 1000 true avgt 5 61.079 ± 12.209 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 10000 false avgt 5 114.082 ± 16.761 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 10000 true avgt 5 62.499 ± 24.849 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 100000 false avgt 5 70.286 ± 6.839 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 100000 true avgt 5 74.672 ± 32.325 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 200000 false avgt 5 71.464 ± 23.325 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 200000 true avgt 5 61.834 ± 36.383 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 250000 false avgt 5 67.191 ± 17.826 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 250000 true avgt 5 61.909 ± 8.740 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 500000 false avgt 5 68.769 ± 9.965 ns/op
ChangeWorkerGetExpireBenchmark.benchmarkGetExpireTask 0.1 500000 true avgt 5 66.178 ± 45.447 ns/op

Summary by CodeRabbit

Release Notes

  • Version Bump

    • Updated to version 6.6.2
  • New Features

    • Added adaptive delay mechanism for handling large-scale publisher changes to improve system efficiency
    • Enhanced publisher count tracking in data change notifications
    • Improved memory configuration for optimized performance (increased heap and permgen settings)
  • Dependencies

    • Updated Hessian library to version 3.5.6
    • Added test dependencies for performance benchmarking
  • Chores

    • Updated project dependencies across modules
    • Enhanced test cleanup and resource management patterns

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 18, 2025

📝 Walkthrough

Walkthrough

Project version bumped from 6.6.1-auto-regulator to 6.6.2. Hessian dependency upgraded and Surefire memory settings increased. Publisher count tracking added to change requests and storage layers. Change processing workflow refactored with new abstractions (ChangeKey, ChangeHandler, ChangeTask, ChangeWorker) and dual-worker strategy supporting adaptive delays.

Changes

Cohort / File(s) Summary
Project Version & Dependencies
pom.xml, client/pom.xml, client/all/pom.xml, client/api/pom.xml, client/impl/pom.xml, client/log/pom.xml, core/pom.xml, server/pom.xml, server/common/pom.xml, server/common/util/pom.xml, server/distribution/pom.xml, server/distribution/all/pom.xml, server/remoting/pom.xml, server/remoting/api/pom.xml, server/remoting/bolt/pom.xml, server/remoting/http/pom.xml, server/server/pom.xml, server/server/data/pom.xml, server/server/integration/pom.xml, server/server/meta/pom.xml, server/server/session/pom.xml, server/server/shared/pom.xml, server/store/pom.xml, server/store/api/pom.xml, server/store/jdbc/pom.xml, server/store/jraft/pom.xml, test/pom.xml
Version 6.6.1-auto-regulator → 6.6.2 across all modules. Root pom.xml: Hessian 3.3.12 → 3.5.6, Surefire argLine memory settings increased (4096m → 14g heap, 512m MaxPermSize). Added mockito-core (server/common/model, server/server/session, server/server/shared) and JMH benchmarking dependencies.
Publisher Count Tracking - Model
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/sessionserver/DataChangeRequest.java
Added publisherCounts: Map<String, Integer> field, constructor parameter, and getter with null-safe empty-map default.
Publisher Count Tracking - Storage Layer
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/BaseDatumStorage.java, DatumStorage.java, DatumStorageDelegate.java, LocalDatumStorage.java, MultiClusterDatumStorage.java
Added getPubCount(String dataCenter, String dataInfoId) method overload to retrieve per-dataInfoId publisher counts, delegating through storage hierarchy.
Publisher Storage Implementation
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java, PublisherGroup.java, PublisherGroups.java
Introduced NotThreadSafePublisherMap class with concurrent map and realPubNum counter for tracking real publishers. Updated PublisherGroup to use this instead of raw ConcurrentHashMap and expose getRealPubNum(). Added getRealPubNum(String dataInfoId) to PublisherGroups.
Change Request Data Flow
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/change/DataChangeEventCenter.java
Updated ChangeNotifier to accept and propagate publisherCounts: Map<String, Integer> through constructor and DataChangeRequest construction.
Change Processing Abstractions
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeKey.java, ChangeHandler.java, ChangeTask.java, ChangeWorker.java
Introduced new public interfaces/classes: ChangeKey (dataInfoId + sorted datacenters), ChangeHandler (onChange contract), ChangeTask<Key> (generic task interface with key/deadline/compareTo), ChangeWorker<Key, Task> (task manager interface).
Change Task Implementation
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskImpl.java, ChangeTaskQueue.java
ChangeTaskImpl implements ChangeTask<ChangeKey> wrapping handler/context/expiry. ChangeTaskQueue<Key, Task> provides thread-safe priority queue using ConcurrentSkipListSet ordered by deadline with map-based O(1) lookups.
Change Worker Base & Strategy
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/AbstractChangeWorker.java, DefaultChangeWorker.java, LargeChangeAdaptiveDelayWorker.java
AbstractChangeWorker provides synchronized skeleton with debounce config and task lifecycle hooks. DefaultChangeWorker uses insertion-order LinkedHashMap. LargeChangeAdaptiveDelayWorker computes adaptive delay based on publisher count thresholds and manages deadline via ChangeTaskQueue.
Adaptive Delay Configuration
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayConfig.java, PushEfficiencyImproveConfig.java
LargeChangeAdaptiveDelayConfig holds publisher thresholds and delay parameters (base, per-unit, max). Extended PushEfficiencyImproveConfig with largeChangeTaskWaitingMillis and largeChangeAdaptiveDelayConfig accessors.
Change Processor Integration
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeProcessor.java, PushEfficiencyConfigUpdater.java
Refactored ChangeProcessor.Worker to support dual-worker orchestration (DefaultChangeWorker/LargeChangeAdaptiveDelayWorker). Added setLargeChangeAdaptiveDelayConfig, isUseLargeAdapterDelayChangeWorker(), and delegation logic for commitChange/getExpires. Updated PushEfficiencyConfigUpdater to propagate large-delay config.
Push Service & Request Handler
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/FirePushService.java, DataChangeRequestHandler.java
FirePushService renamed inner ChangeHandler to ChangeHandlerImpl. DataChangeRequestHandler extracts publisherCounts from DataChangeRequest and passes per-item count to TriggerPushContext constructor.
Session Push Context
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/TriggerPushContext.java
Added optional publisherCount: Integer field, constructor overload, and accessors hasPublisherCount()/getPublisherCount().
REST Resource
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/resource/PushEfficiencyConfigResource.java
Added GET endpoint /isUseLargeAdapterDelayChangeWorker to expose dual-worker mode status per datacenter.
Meta Server Import Cleanup
server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/bootstrap/MetaServerBootstrap.java, DataCenterResource.java
Import reorganization (no behavioral change). Removed trailing blank line.
Test Helper Lifecycle
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/providedata/FetchClientOffAddressService.java, FetchDataInfoIDBlackListService.java, server/server/session/src/main/java/com/alipay/sofa/registry/server/session/store/AbstractDataManager.java, DataIndexer.java
Added shutdownWatchDog() method (annotated @VisibleForTesting) to enable test cleanup of internal watchdog resources.
Base Server Shared Utilities
server/server/shared/src/main/java/com/alipay/sofa/registry/server/shared/providedata/AbstractFetchPersistenceSystemProperty.java, AbstractFetchSystemPropertyService.java
Added test-only shutdownWatchDog() method for resource cleanup in unit tests.
Unit Test GC Monitoring
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/utils/UnitTestGCMonitor.java
Introduced singleton UnitTestGCMonitor for monitoring full GC events during tests; generates heap dumps when GC count thresholds exceeded; thread-safe with reentrant lock.
Change Processing Tests
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeProcessorTest.java, ChangeTaskQueueTest.java, ChangeTaskTest.java, ChangeWorkerCommitBenchmark.java, ChangeWorkerGetExpireBenchmark.java, LargeChangeAdaptiveDelayWorkerTest.java, MockChangeHandler.java
Refactored ChangeProcessorTest to use extracted public classes (ChangeKey, ChangeHandler, ChangeTaskImpl). Added new tests for ChangeTaskQueue (push/find/pop operations), ChangeTask/ChangeKey equality/ordering. Added JMH benchmarks for worker commit and expire-task retrieval. Added mock handler. LargeChangeAdaptiveDelayWorkerTest present with commented-out test scaffolding.
Test Lifecycle Refactoring
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/connections/ConnectionsServiceTest.java, converter/pb/BoltEncodeTest.java, providedata/FetchClientOffServiceTest.java, providedata/FetchDataInfoIDBlackListServiceTest.java, store/DataCacheTest.java, store/SessionDataStoreTest.java, store/SessionInterestsTest.java, remoting/handler/DataChangeRequestHandlerTest.java
Updated tests to use try-finally blocks ensuring shutdownWatchDog() cleanup; refactored resource initialization into guarded blocks. Fixed import statements.
Data Model Tests
server/common/model/src/test/java/com/alipay/sofa/registry/server/data/multi/cluster/client/handler/RemoteDataChangeNotifyHandlerTest.java, server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java, server/server/data/src/test/java/com/alipay/sofa/registry/server/data/multi/cluster/client/handler/RemoteDataChangeNotifyHandlerTest.java
Updated DataChangeRequest constructor calls to include new publisherCounts map parameter. Commented out test methods in PublisherGroupTest.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant DataChangeEventCenter as DataChangeEventCenter
    participant ChangeProcessor
    participant DefaultWorker as DefaultChangeWorker
    participant LargeWorker as LargeChangeAdaptiveDelayWorker
    participant ChangeTaskQueue
    participant FirePushService

    Note over DataChangeEventCenter,FirePushService: Change Processing Flow (v6.6.2)

    Client->>DataChangeEventCenter: handleChanges(dataInfoIds, pubCounts)
    activate DataChangeEventCenter
    DataChangeEventCenter->>DataChangeEventCenter: buildPubCountsMap()
    DataChangeEventCenter->>ChangeProcessor: fireChange(ChangeNotifier)
    activate ChangeProcessor
    
    ChangeProcessor->>ChangeProcessor: selectWorker(useLargeAdapter)
    
    alt Large Change (high publisher count)
        ChangeProcessor->>LargeWorker: commitChange(ChangeKey, handler, ctx)
        activate LargeWorker
        LargeWorker->>LargeWorker: computeDelay(pubCount)
        Note over LargeWorker: delay = base + (pubCount × perUnit)<br/>capped at maxDelay
        LargeWorker->>ChangeTaskQueue: pushTask(ChangeTaskImpl)
        activate ChangeTaskQueue
        ChangeTaskQueue->>ChangeTaskQueue: updateDeadline(now + delay)
        ChangeTaskQueue-->>LargeWorker: ✓ queued
        deactivate ChangeTaskQueue
        LargeWorker-->>ChangeProcessor: true
        deactivate LargeWorker
    else Normal Change
        ChangeProcessor->>DefaultWorker: commitChange(ChangeKey, handler, ctx)
        activate DefaultWorker
        DefaultWorker->>DefaultWorker: applyDebouncing()
        DefaultWorker-->>ChangeProcessor: true
        deactivate DefaultWorker
    end
    
    ChangeProcessor->>ChangeProcessor: getExpireTasks()
    ChangeProcessor->>FirePushService: fireOnChange(tasks)
    activate FirePushService
    loop for each task
        FirePushService->>FirePushService: safeProcessTask(task)
        FirePushService->>FirePushService: task.doChange()
    end
    FirePushService-->>ChangeProcessor: ✓ completed
    deactivate FirePushService
    
    ChangeProcessor-->>DataChangeEventCenter: ✓ processed
    deactivate ChangeProcessor
    DataChangeEventCenter-->>Client: ✓ handled
    deactivate DataChangeEventCenter

    rect rgb(200, 240, 200)
        Note over DefaultWorker,LargeWorker: New dual-strategy architecture<br/>selects worker based on publisher count
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Feature/do push optimization #372: Directly related—both modify the push-efficiency and change-processing subsystem, adding PushEfficiencyConfigUpdater/Resource, ChangeDebouncingTime/ChangeProcessor refactoring, and PushProcessor integration.

Suggested reviewers

  • NickNYU

Poem

🐰 Hops forth with version six-point-two so new,
Publishers counted, their queues sorted true!
From DefaultWorker to LargeChangeWise,
Adaptive delays bloom—oh what a prize!
Task streams flow smooth through ChangeTaskQueue,
Registry pushes like carrots we chew! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.08% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title '增加针对大推送的自适应攒批的能力' (Add adaptive batching capability for large-scale push) accurately describes the main objective: introducing adaptive batching for large-scale push scenarios by implementing LargeChangeAdaptiveDelayWorker in the Session layer and adding publisher counting in the Data layer.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/bootstrap/MetaServerBootstrap.java (1)

385-385: Broken server cleanup: remoteMetaServer is never closed.

Line 385 calls isOpen() instead of close(), leaving the server running during shutdown and leaking resources. This breaks the pattern established in lines 372, 375, 378, and 381.

🔎 Apply this diff to fix the cleanup:
  if (remoteMetaServer != null && remoteMetaServer.isOpen()) {
-   remoteMetaServer.isOpen();
+   remoteMetaServer.close();
  }
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeProcessor.java (3)

63-73: Same bug: return should be continue.

This method has the same early-return issue, which will skip processing remaining data centers when one has null workers.

🔎 Apply this diff:
   public void setWorkDelayTime(PushEfficiencyImproveConfig pushEfficiencyImproveConfig) {
     for (Map.Entry<String, Worker[]> entry : dataCenterWorkers.entrySet()) {
       Worker[] workers = entry.getValue();
       if (workers == null) {
-        return;
+        continue;
       }
       for (Worker work : workers) {
         work.setChangeTaskWorkDelay(pushEfficiencyImproveConfig);
       }
     }
   }

110-120: Same bug: return should be continue.

The setChangeDebouncingMillis method has the identical early-return issue on line 114.

🔎 Apply this diff:
   public void setChangeDebouncingMillis(int changeDebouncingMillis, int changeDebouncingMaxMillis) {
     for (Map.Entry<String, Worker[]> entry : dataCenterWorkers.entrySet()) {
       Worker[] workers = entry.getValue();
       if (workers == null) {
-        return;
+        continue;
       }
       for (Worker work : workers) {
         work.setChangeDebouncingMillis(changeDebouncingMillis, changeDebouncingMaxMillis);
       }
     }
   }

270-275: Potential NoSuchElementException if dataCenters is empty.

The stream().findFirst().get() call will throw if key.dataCenters is empty. Consider using orElseThrow() with a meaningful exception message for better debuggability.

🔎 Suggested improvement:
   Worker workerOf(ChangeKey key) {
-    String dataCenter = key.dataCenters.stream().findFirst().get();
+    String dataCenter = key.dataCenters.stream().findFirst()
+        .orElseThrow(() -> new IllegalArgumentException("ChangeKey has empty dataCenters"));
     Worker[] workers = dataCenterWorkers.computeIfAbsent(dataCenter, k -> initWorkers());
     int n = (key.hashCode() & 0x7fffffff) % workers.length;
     return workers[n];
   }
🧹 Nitpick comments (26)
server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/bootstrap/MetaServerBootstrap.java (1)

44-50: Import organization looks good; verify project style for wildcard imports.

The consolidated imports (lines 44–50) are all actively used in the code. However, line 45 uses import java.util.*; (wildcard). If your project style guide prefers explicit imports (common in stricter settings), consider replacing it with specific imports like import java.util.ArrayList;, import java.util.Collection;, import java.util.List;, etc.

server/common/model/src/main/java/com/alipay/sofa/registry/common/model/store/SubPublisherList.java (1)

36-39: Add validation and documentation for the new constructor.

The new constructor bypasses calcSize() and directly accepts a byteSize parameter, which creates a risk of size mismatches if callers provide incorrect values. Consider adding:

  1. Defensive validation to ensure byteSize is reasonable (e.g., non-negative, or in debug mode, matches calcSize())
  2. JavaDoc explaining when to use this constructor and the caller's responsibility to provide accurate byteSize
🔎 View suggested improvements:

Add validation:

 public SubPublisherList(List<SubPublisher> pubs, int byteSize) {
+  if (byteSize < 0) {
+    throw new IllegalArgumentException("byteSize must be non-negative");
+  }
   this.pubs = pubs;
   this.byteSize = byteSize;
 }

Add JavaDoc:

+/**
+ * Constructs a SubPublisherList with a precomputed byteSize.
+ * Use this constructor when the size has already been calculated to avoid recomputation.
+ * The caller is responsible for ensuring byteSize accurately reflects the actual size.
+ *
+ * @param pubs the list of SubPublishers
+ * @param byteSize the precomputed size in bytes
+ */
 public SubPublisherList(List<SubPublisher> pubs, int byteSize) {
   this.pubs = pubs;
   this.byteSize = byteSize;
 }
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeHandler.java (1)

19-22: Add javadoc to document the contract.

The interface and method lack documentation. Please add javadoc to clarify:

  • The purpose of the ChangeHandler interface
  • Parameter constraints (e.g., can dataInfoId or changeCtx be null?)
  • Return value semantics (what does true/false signify?)
  • Thread-safety expectations for implementations
🔎 Example javadoc:
+/**
+ * Handler for processing data change events.
+ */
 public interface ChangeHandler {
 
+  /**
+   * Processes a change event for the specified data.
+   *
+   * @param dataInfoId the identifier of the data that changed (non-null)
+   * @param changeCtx the context of the change event (non-null)
+   * @return true if the change was processed successfully, false otherwise
+   */
   boolean onChange(String dataInfoId, TriggerPushContext changeCtx);
 }
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/sessionserver/DataChangeRequest.java (1)

38-38: Add javadoc for the new publisherCounts field and getter.

The new publisherCounts field and its getter lack documentation. Please add javadoc to clarify the purpose and semantics of publisher counts per dataInfoId.

🔎 Example javadoc:
+  /** Map of dataInfoId to publisher count for each data item. */
   private final Map<String, Integer> publisherCounts;
   
   ...
   
+  /**
+   * Returns the publisher counts for each dataInfoId.
+   *
+   * @return unmodifiable map of dataInfoId to publisher count
+   */
   public Map<String, Integer> getPublisherCounts() {
     return Collections.unmodifiableMap(publisherCounts);
   }

Also applies to: 59-61

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeKey.java (1)

26-44: Add class and field javadoc.

The class and its fields lack documentation. Please add javadoc to explain:

  • The purpose of ChangeKey and its role in the change processing system
  • The semantics of dataInfoId, dataCenters, and sortedDCMark
  • Why sortedDCMark is computed the way it is
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayConfig.java (1)

25-50: Add English javadoc for public API.

While the Chinese field comments are helpful, public classes should include English javadoc for broader accessibility. Please document the purpose of this configuration class and each field's role in the adaptive delay algorithm.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (1)

48-117: Consider test flakiness from Thread.sleep.

The test relies on Thread.sleep() to simulate timeouts. While this works, it makes tests slower and potentially flaky if the system is under load. Consider whether timeout simulation can be achieved through mocking or dependency injection of the time source.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/FirePushService.java (1)

46-46: Consider explicit imports over wildcard.

The change from explicit imports to java.util.* reduces clarity about which classes are actually used. Explicit imports are generally preferred for better readability and to avoid potential naming conflicts.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeWorker.java (1)

47-47: Inconsistent use of concrete ChangeKey instead of generic Key type parameter.

The interface declares <Key extends Comparable<Key>> as a generic type parameter, but commitChange uses the concrete ChangeKey type directly. This breaks the generic contract and limits flexibility.

Consider making the signature consistent:

-  boolean commitChange(ChangeKey key, ChangeHandler handler, TriggerPushContext changeCtx);
+  boolean commitChange(Key key, ChangeHandler handler, TriggerPushContext changeCtx);

Alternatively, if all implementations will always use ChangeKey, consider removing the generic Key parameter and using ChangeKey throughout.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (2)

34-35: Inconsistent comment: mentions ConcurrentHashMap but uses HashMap.

The class comment states "使用 ConcurrentHashMap 保存键到任务的映射" but line 59 uses Maps.newHashMap() (a regular HashMap). Since all access is protected by ReadWriteLock, using a plain HashMap is correct, but the comment should be updated.


52-52: Consider using a plain TreeSet instead of ConcurrentSkipListSet.

Since all access to taskLinkList is protected by the write lock, the concurrent overhead of ConcurrentSkipListSet is unnecessary. A TreeSet would provide the same ordering guarantees with less overhead.

Also applies to: 60-60

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/DefaultChangeWorker.java (1)

117-133: Inconsistent return value: null vs empty list.

doGetExpireTasks returns null when tasks are empty (line 119), but if there are tasks and none are expired, it returns an empty ArrayList (line 121 initialized, nothing added). This inconsistency may confuse callers.

Consider always returning an empty list for consistency:

🔎 Suggested fix
   @Override
   protected List<ChangeTaskImpl> doGetExpireTasks(long now) {
     if (tasks.isEmpty()) {
-      return null;
+      return Collections.emptyList();
     }
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/PushEfficiencyImproveConfig.java (2)

61-63: Missing setter for largeChangeTaskWaitingMillis field.

The field largeChangeTaskWaitingMillis has a getter but no setter, unlike other similar fields in this class (e.g., changeTaskWaitingMillis). If this field is intended to be configurable, add a setter:

public void setLargeChangeTaskWaitingMillis(int largeChangeTaskWaitingMillis) {
    this.largeChangeTaskWaitingMillis = largeChangeTaskWaitingMillis;
}

300-305: Consider validating largeChangeTaskWaitingMillis in validate() method.

The validate() method checks pushTaskWaitingMillis and changeTaskWaitingMillis but not the new largeChangeTaskWaitingMillis. For consistency:

   public boolean validate() {
-    if (pushTaskWaitingMillis <= 0 || changeTaskWaitingMillis <= 0) {
+    if (pushTaskWaitingMillis <= 0 || changeTaskWaitingMillis <= 0 || largeChangeTaskWaitingMillis <= 0) {
       return false;
     }
     return true;
   }
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorkerTest.java (2)

137-190: Test method exercises ChangeTaskQueue directly rather than the worker.

testLessChange_getExpireTasks_notEmptyQueue_sameDeadline creates a ChangeTaskQueue directly and tests task ordering, which is useful but may be better placed in a dedicated ChangeTaskQueueTest file (which exists per the PR). Consider whether this test belongs here or should be moved.


589-592: Consider using TestUtils.assertBetween if available.

The assertBetween helper duplicates similar functionality. If TestUtils.assertBetween (seen used in ChangeProcessorTest.java line 66) has the same signature, consider reusing it for consistency.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/AbstractChangeWorker.java (2)

62-77: Non-atomic configuration updates may cause inconsistent reads.

Both setChangeTaskWorkDelay and setChangeDebouncingMillis update multiple volatile fields without synchronization. A concurrent createTask call could observe partial updates.

Consider synchronizing these methods for consistency:

🔎 Suggested fix:
   public void setChangeTaskWorkDelay(PushEfficiencyImproveConfig pushEfficiencyImproveConfig) {
+    synchronized (this.lock) {
       this.changeDebouncingMillis = pushEfficiencyImproveConfig.getChangeDebouncingMillis();
       this.changeDebouncingMaxMillis = pushEfficiencyImproveConfig.getChangeDebouncingMaxMillis();
       this.changeTaskWaitingMillis = pushEfficiencyImproveConfig.getChangeTaskWaitingMillis();
+    }
   }

   public void setChangeDebouncingMillis(int changeDebouncingMillis, int changeDebouncingMaxMillis) {
+    synchronized (this.lock) {
       this.changeDebouncingMillis = changeDebouncingMillis;
       this.changeDebouncingMaxMillis = changeDebouncingMaxMillis;
+    }
   }

28-28: Consider adding clear() as an abstract method.

The ChangeProcessor.Worker class calls clear() on both worker implementations (lines 193-195 in ChangeProcessor.java), but clear() is not declared in this abstract class. This could cause issues if a new subclass is created without implementing clear().

🔎 Add abstract clear method:
+  /**
+   * 清空所有任务
+   */
+  public abstract void clear();
+
   @Override
   public int getWaitingMillis() {
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/change/DataChangeEventCenter.java (1)

403-416: Verify downstream handling of inconsistent map keys.

The changes and pubCounts maps are built independently and may contain different keys:

  • A dataInfoId may exist in changes but not in pubCounts (if publisherCount is null)
  • This is expected during transition periods per the PR objectives

Ensure that downstream code (session server's ChangeProcessor) handles missing publisherCounts entries gracefully, likely treating them as unknown or using a default value.

Consider adding debug logging when publisherCount is null to aid troubleshooting:

Optional: Add debug logging for missing publisher counts
         Integer publisherCount =
             datumStorageDelegate.getPubCount(event.getDataCenter(), dataInfoId);
         if (publisherCount != null) {
           pubCounts.put(dataInfoId, publisherCount);
+        } else {
+          LOGGER.debug("Publisher count unavailable for dataInfoId: {}", dataInfoId);
         }
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/PublisherGroups.java (1)

100-103: Consider adding JavaDoc to clarify "real pub num" semantics.

The method delegates to PublisherGroup.getRealPubNum() but lacks documentation explaining what "real pub num" means versus pubMap.size(). Based on the PR objectives, this appears to count only active publishers (excluding tombstones), but this should be documented for API clarity.

Suggested documentation
+  /**
+   * Get the real publisher count for a dataInfoId.
+   * This counts only active publishers, excluding tombstoned entries.
+   *
+   * @param dataInfoId the data info ID
+   * @return the count of active publishers, or null if the group doesn't exist
+   */
   Integer getRealPubNum(String dataInfoId) {
     PublisherGroup group = publisherGroupMap.get(dataInfoId);
     return group == null ? null : group.getRealPubNum();
   }
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/DatumStorage.java (1)

70-70: Add JavaDoc to document the new interface method.

The new getPubCount(String dataCenter, String dataInfoId) overload lacks documentation explaining its purpose and return semantics, particularly regarding null handling when the dataInfoId doesn't exist. Interface methods should be well-documented to ensure consistent implementations.

Suggested documentation
+  /**
+   * Get the publisher count for a specific dataInfoId in a data center.
+   *
+   * @param dataCenter the data center
+   * @param dataInfoId the data info ID
+   * @return the publisher count, or null if not found
+   */
   Integer getPubCount(String dataCenter, String dataInfoId);
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/BaseDatumStorage.java (1)

31-31: Wildcard import introduced.

The import statement was changed from specific imports to import java.util.*;. While functional, wildcard imports can reduce code clarity and potentially cause issues if new classes are added to the java.util package in future Java versions.

server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java (1)

37-37: Wildcard import introduced.

Similar to other files, the import statement was changed to import java.util.*;. This is a minor style consideration.

server/server/data/src/main/java/com/alipay/sofa/registry/server/data/multi/cluster/storage/MultiClusterDatumStorage.java (1)

36-36: Wildcard import introduced.

The import statement was changed to import java.util.*;, similar to other files in this PR.

server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/PublisherGroup.java (1)

37-37: Wildcard import introduced.

The import statement was changed to import java.util.*;, consistent with other files in this PR.

server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (1)

77-97: Consider converting the inline commented code to method-level documentation.

The commented code block (lines 77-92) explains the state transition logic, which is helpful. However, it could be more maintainable as a JavaDoc comment or moved to the method documentation.

🔎 Alternative approach:
+  /**
+   * Updates the publisher map and adjusts realPubNum based on state transitions:
+   * - Pub -> Pub: no change
+   * - Pub -> Unpub: decrement realPubNum
+   * - Unpub -> Pub: increment realPubNum  
+   * - Unpub -> Unpub: no change
+   */
   public PublisherEnvelope put(String registerId, PublisherEnvelope publisher) {
     PublisherEnvelope existPublisher = this.publishers.put(registerId, publisher);
     if (null == existPublisher) {
       if (publisher.isPub()) {
         this.realPubNum++;
       }
       return null;
     }

     boolean isExistPub = existPublisher.isPub();
     boolean isPub = publisher.isPub();

-    // 下面的逻辑是这段代码的简化
-    //    if (isExistPub) {
-    //      if (isPub) {
-    //        // Pub -> Pub: 没变化
-    //      } else {
-    //        // Pub -> Unpub: Publisher 数量减少
-    //        this.realPubNum--;
-    //      }
-    //    } else {
-    //      if (isPub) {
-    //        // Unpub -> Pub: Publisher 数量增加
-    //        this.realPubNum++;
-    //      } else {
-    //        // Unpub -> Unpub: 没变化
-    //      }
-    //    }
     if (isExistPub && !isPub) {
       this.realPubNum--;
     } else if (!isExistPub && isPub) {
       this.realPubNum++;
     }

     return existPublisher;
   }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e936b91 and 0630e0b.

📒 Files selected for processing (68)
  • client/all/pom.xml (1 hunks)
  • client/api/pom.xml (1 hunks)
  • client/impl/pom.xml (1 hunks)
  • client/log/pom.xml (1 hunks)
  • client/pom.xml (1 hunks)
  • core/pom.xml (1 hunks)
  • pom.xml (2 hunks)
  • server/common/model/pom.xml (2 hunks)
  • server/common/model/src/main/java/com/alipay/sofa/registry/common/model/sessionserver/DataChangeRequest.java (1 hunks)
  • server/common/model/src/main/java/com/alipay/sofa/registry/common/model/store/SubPublisherList.java (1 hunks)
  • server/common/pom.xml (1 hunks)
  • server/common/util/pom.xml (1 hunks)
  • server/distribution/all/pom.xml (1 hunks)
  • server/distribution/pom.xml (1 hunks)
  • server/pom.xml (1 hunks)
  • server/remoting/api/pom.xml (1 hunks)
  • server/remoting/bolt/pom.xml (1 hunks)
  • server/remoting/http/pom.xml (1 hunks)
  • server/remoting/pom.xml (1 hunks)
  • server/server/data/pom.xml (1 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/BaseDatumStorage.java (2 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/DatumStorage.java (1 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/DatumStorageDelegate.java (1 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/LocalDatumStorage.java (1 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (1 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/PublisherGroup.java (3 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/PublisherGroups.java (1 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/change/DataChangeEventCenter.java (6 hunks)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/multi/cluster/storage/MultiClusterDatumStorage.java (2 hunks)
  • server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java (2 hunks)
  • server/server/data/src/test/java/com/alipay/sofa/registry/server/data/multi/cluster/client/handler/RemoteDataChangeNotifyHandlerTest.java (2 hunks)
  • server/server/integration/pom.xml (1 hunks)
  • server/server/meta/pom.xml (1 hunks)
  • server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/bootstrap/MetaServerBootstrap.java (1 hunks)
  • server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/resource/DataCenterResource.java (0 hunks)
  • server/server/pom.xml (1 hunks)
  • server/server/session/pom.xml (2 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/AbstractChangeWorker.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeHandler.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeKey.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeProcessor.java (5 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTask.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskImpl.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeWorker.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/DefaultChangeWorker.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/FirePushService.java (4 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayConfig.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorker.java (1 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/PushEfficiencyConfigUpdater.java (2 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/PushEfficiencyImproveConfig.java (5 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/TriggerPushContext.java (3 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/remoting/handler/DataChangeRequestHandler.java (2 hunks)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/resource/PushEfficiencyConfigResource.java (1 hunks)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeProcessorTest.java (7 hunks)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (1 hunks)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskTest.java (1 hunks)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeWorkerCommitBenchmark.java (1 hunks)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeWorkerGetExpireBenchmark.java (1 hunks)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorkerTest.java (1 hunks)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/MockChangeHandler.java (1 hunks)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/remoting/handler/DataChangeRequestHandlerTest.java (2 hunks)
  • server/server/shared/pom.xml (2 hunks)
  • server/store/api/pom.xml (1 hunks)
  • server/store/jdbc/pom.xml (1 hunks)
  • server/store/jraft/pom.xml (1 hunks)
  • server/store/pom.xml (1 hunks)
  • test/pom.xml (1 hunks)
💤 Files with no reviewable changes (1)
  • server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/resource/DataCenterResource.java
🧰 Additional context used
🧬 Code graph analysis (10)
server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/bootstrap/MetaServerBootstrap.java (1)
server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/resource/DataCenterResource.java (1)
  • Path (34-56)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/resource/PushEfficiencyConfigResource.java (1)
server/server/shared/src/main/java/com/alipay/sofa/registry/server/shared/resource/AuthChecker.java (1)
  • AuthChecker (25-32)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/DefaultChangeWorker.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/collections/Maps.java (1)
  • Maps (21-44)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskImpl.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/util/StringFormatter.java (1)
  • StringFormatter (22-29)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/collections/Maps.java (1)
  • Maps (21-44)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorkerTest.java (2)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/TraceTimes.java (1)
  • TraceTimes (22-124)
server/common/util/src/main/java/com/alipay/sofa/registry/util/StringFormatter.java (1)
  • StringFormatter (22-29)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeWorkerGetExpireBenchmark.java (1)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/TraceTimes.java (1)
  • TraceTimes (22-124)
server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java (3)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/store/UnPublisher.java (1)
  • UnPublisher (26-70)
server/server/data/src/test/java/com/alipay/sofa/registry/server/data/TestBaseUtils.java (1)
  • TestBaseUtils (68-413)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/RegisterVersion.java (1)
  • RegisterVersion (26-105)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeWorkerCommitBenchmark.java (1)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/TraceTimes.java (1)
  • TraceTimes (22-124)
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/collections/Maps.java (1)
  • Maps (21-44)
🪛 GitHub Actions: integration test
pom.xml

[error] 1-1: Could not resolve dependencies for project registry-client-impl: com.alipay.sofa:hessian:jar:3.5.6-buffersize-SNAPSHOT not found.

🪛 GitHub Actions: pmd
client/impl/pom.xml

[error] 1-1: Dependency resolution failed for registry-client-impl: Could not find artifact com.alipay.sofa:hessian:jar:3.5.6-buffersize-SNAPSHOT required by com.alipay.sofa:registry-client-impl:6.6.2-SNAPSHOT. This prevented the build from completing.

pom.xml

[error] 1-1: Dependency resolution failed for registry-client-impl: Could not find artifact com.alipay.sofa:hessian:jar:3.5.6-buffersize-SNAPSHOT required by com.alipay.sofa:registry-client-impl:6.6.2-SNAPSHOT. This prevented the build from completing.

🪛 GitHub Actions: unit test
pom.xml

[error] 1-1: Dependency resolution failed for project registry-client-impl: Could not find artifact com.alipay.sofa:hessian:jar:3.5.6-buffersize-SNAPSHOT

🔇 Additional comments (56)
server/server/meta/src/main/java/com/alipay/sofa/registry/server/meta/bootstrap/MetaServerBootstrap.java (1)

358-368: Resource registration methods are well-structured.

The new bindResourceConfig() and registerInstances() methods correctly enumerate Jersey/JAX-RS annotated beans (Path, Provider) and register them with the ResourceConfig. The pattern aligns with the external code snippets showing @Path and @Provider usage in DataCenterResource and LeaderAwareFilter.

client/impl/pom.xml (1)

8-8: Verify hessian SNAPSHOT artifact availability before merge.

The pom.xml configuration correctly references com.alipay.sofa:hessian:3.5.6-buffersize-SNAPSHOT and the Sonatype snapshot repository is properly configured. However, this version does not appear in public Maven repositories. Confirm that this SNAPSHOT artifact has been published to your internal/Sonatype repository before proceeding with the build.

server/remoting/http/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update to 6.6.2-SNAPSHOT aligns with the new development cycle for adaptive batching features.

server/store/api/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update to 6.6.2-SNAPSHOT is consistent with the project-wide version alignment.

server/store/jdbc/pom.xml (1)

9-9: LGTM! Version bump is consistent.

The parent version update to 6.6.2-SNAPSHOT aligns with the repository-wide version bump.

client/log/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update to 6.6.2-SNAPSHOT is consistent with other client modules.

server/remoting/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update to 6.6.2-SNAPSHOT is consistent with other remoting modules.

client/all/pom.xml (1)

9-9: LGTM! Version bump is consistent.

The artifact version update to 6.6.2-SNAPSHOT aligns with the project-wide version bump for the new development cycle.

server/common/util/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update to 6.6.2-SNAPSHOT is consistent with other server common modules.

server/store/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update to 6.6.2-SNAPSHOT completes the consistent version alignment across all store modules.

server/pom.xml (1)

10-10: LGTM! Version bump is consistent with the PR-wide upgrade.

The parent version update to 6.6.2-SNAPSHOT aligns with the coordinated version bump across all modules in this PR.

server/remoting/bolt/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update aligns with the coordinated upgrade across the project.

server/server/integration/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update aligns with the PR-wide upgrade to 6.6.2-SNAPSHOT.

server/remoting/api/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update aligns with the coordinated upgrade across all modules.

server/server/shared/pom.xml (2)

8-8: LGTM! Version bump is consistent.

The parent version update aligns with the PR-wide upgrade to 6.6.2-SNAPSHOT.


106-115: LGTM! JMH dependencies appropriately added for benchmark testing.

The addition of JMH (Java Microbenchmark Harness) dependencies with test scope aligns with the PR objectives, which mention benchmark comparisons between DefaultChangeWorker and LargeChangeAdaptiveDelayWorker for commit/dequeue operations.

client/pom.xml (1)

10-10: LGTM! Version bump is consistent.

The parent version update aligns with the coordinated upgrade to 6.6.2-SNAPSHOT across all modules.

server/common/model/pom.xml (2)

8-8: LGTM! Version bump is consistent.

The parent version update aligns with the PR-wide upgrade to 6.6.2-SNAPSHOT.


54-58: LGTM! Mockito dependency appropriately added for testing.

The addition of mockito-core with test scope is standard practice and aligns with the test infrastructure improvements across the PR.

server/common/pom.xml (1)

8-8: LGTM! Version bump is consistent.

The parent version update aligns with the coordinated upgrade to 6.6.2-SNAPSHOT across all modules.

pom.xml (1)

9-9: LGTM on version bump.

The parent version update to 6.6.2-SNAPSHOT is standard for development.

server/server/session/pom.xml (2)

8-8: LGTM on version bump.

The parent version update to 6.6.2-SNAPSHOT aligns with the project-wide version change.


113-122: Excellent addition for benchmark support.

The JMH dependencies enable the microbenchmarking described in the PR objectives (comparing DefaultChangeWorker and LargeChangeAdaptiveDelayWorker). The test scope is appropriate.

server/server/data/src/test/java/com/alipay/sofa/registry/server/data/multi/cluster/client/handler/RemoteDataChangeNotifyHandlerTest.java (1)

65-65: LGTM!

The test correctly accommodates the new publisherCounts parameter in the DataChangeRequest constructor by passing Collections.emptyMap(), which is appropriate for tests that don't exercise publisher counting logic.

Also applies to: 79-79

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/MockChangeHandler.java (1)

19-24: LGTM!

This is a simple test mock implementation that unconditionally returns true, which is appropriate for unit testing scenarios where the actual change handling logic is not under test.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/PushEfficiencyConfigUpdater.java (1)

120-132: LGTM!

The code correctly retrieves the LargeChangeAdaptiveDelayConfig from the push efficiency configuration, provides a sensible default if null, and propagates it to the change processor. The logic fits well within the existing configuration update flow.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/remoting/handler/DataChangeRequestHandlerTest.java (1)

93-94: LGTM!

The test correctly updates the DataChangeRequest constructor call to include the new publisherCounts parameter with Collections.emptyMap(), which is appropriate for this test scenario.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/resource/PushEfficiencyConfigResource.java (1)

68-89: LGTM! Consistent REST endpoint implementation.

The new endpoint follows the same patterns as the existing getChangeDebouncingMillis endpoint with proper auth checking, error handling, and logging.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeWorkerGetExpireBenchmark.java (2)

84-107: LGTM! Well-structured benchmark setup.

The benchmark properly initializes the worker based on configuration and starts a background thread to continuously commit changes, providing realistic conditions for measuring getExpireTasks performance.


147-157: Correct use of Thread.interrupted() for clean shutdown.

The runnable properly checks Thread.interrupted() to break the loop, which clears the interrupt flag and allows clean thread termination.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/remoting/handler/DataChangeRequestHandler.java (1)

86-115: LGTM! Proper null-safe publisher count tracking.

The implementation correctly retrieves publisherCounts from the request with a null-safe fallback to an empty map, then passes the per-item count to TriggerPushContext. This enables publisher-aware push triggering as intended.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/FirePushService.java (1)

76-76: Good refactor: ChangeHandler extraction and error handling.

Extracting the concrete ChangeHandlerImpl class and adding error logging when fireChange returns false improves maintainability and observability. The implementation correctly handles failures without silently swallowing them.

Also applies to: 103-106, 339-351

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTask.java (1)

31-46: LGTM! Consistent compareTo implementation.

The compareTo implementation correctly treats tasks with equal keys as equal (returning 0), then orders by deadline, with key as a tiebreaker. This is consistent with ChangeTaskImpl.equals() which also compares by key only, maintaining the equals/compareTo contract for proper behavior in sorted collections.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeWorkerCommitBenchmark.java (1)

105-120: Redundant map update.

Line 110 updates this.versions.put(dataInfoId, version) but getVersion() already updates the map at line 129. This redundant put operation adds unnecessary overhead to the benchmark.

🔎 Apply this diff to remove the redundant put:
   @Benchmark
   public void benchmarkCommitChange() {
     int dataInfoNumber = this.random.nextInt(this.totalDataInfoIdSize);
     String dataInfoId = DATA_INFO_ID_PREFIX + dataInfoNumber;
     long version = this.getVersion(dataInfoId);
-    this.versions.put(dataInfoId, version);
     ChangeKey changeKey = new ChangeKey(DATA_CENTERS, dataInfoId);
     TraceTimes traceTimes = new TraceTimes();
     TriggerPushContext changeCtx;
     if (dataInfoNumber >= this.largeDataInfoIdThreshold) {
       changeCtx = new TriggerPushContext(DATA_CENTER, version, DATA_NODE, 0, traceTimes, 9000);
     } else {
       changeCtx = new TriggerPushContext(DATA_CENTER, version, DATA_NODE, 0, traceTimes, 100);
     }
     this.changeWorker.commitChange(changeKey, this.mockChangeHandler, changeCtx);
   }

Likely an incorrect or invalid review comment.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/TriggerPushContext.java (1)

30-30: LGTM! Clean addition of publisherCount field.

The use of Integer (boxed) rather than primitive int is appropriate here as it allows distinguishing between "count not available" (null) and "count is 0". The hasPublisherCount() method provides a clean null-check API.

Consider adding publisherCount to the toString() method (line 132) for debugging visibility:

return StringFormatter.format("TriggerPushCtx{{},ver={},pubCount={}}", dataNode, expectDatumVersion, publisherCount);

Also applies to: 122-128

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskTest.java (1)

31-222: Good test coverage for ChangeKey and ChangeTaskImpl equality/comparison semantics.

The tests comprehensively verify:

  • ChangeKey equality based on datacenter set and dataInfoId
  • ChangeTaskImpl equality based solely on key (ignoring timestamps)
  • Comparison ordering behavior
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeProcessorTest.java (1)

26-26: LGTM! Test updates correctly reflect the new public API surface.

The test has been properly updated to:

  • Use top-level ChangeKey, ChangeHandler, and ChangeTaskImpl classes
  • Access debouncing configuration via getChangeDebouncingTime() accessor
  • Use getExpires() instead of the previous getExpire() method

Also applies to: 45-50, 53-54, 63-63, 73-74, 80-80, 88-88, 110-114, 116-117, 128-129, 133-133, 151-154, 162-162

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/DefaultChangeWorker.java (1)

33-144: LGTM! Clean FIFO task queue implementation.

The use of LinkedHashMap with remove-then-put in updateTask correctly maintains insertion order semantics. The expiration logic properly leverages the ordered iteration.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/PushEfficiencyImproveConfig.java (1)

34-34: LGTM! Well-structured configuration additions for large-change adaptive delay.

The new configuration fields follow the existing patterns in the class and provide clean accessors for the large-change delay feature.

Also applies to: 90-92, 158-163, 279-286

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorkerTest.java (2)

60-92: Comprehensive test coverage for adaptive delay worker.

The tests thoroughly exercise:

  • Small vs large change handling
  • Task merging and deadline preservation
  • Expiration ordering based on publisher count
  • Mixed small/large change sequences

Also applies to: 94-135, 279-319, 321-365


78-78: Timing-based tests may be flaky in CI environments.

Multiple tests rely on Thread.sleep() with tight timing margins (e.g., Thread.sleep(100) followed by assertions about task expiration). These can be flaky under CPU pressure or slow CI runners.

Consider:

  1. Using larger sleep margins with wider assertion bounds
  2. Mocking time/clock for deterministic testing
  3. Adding @Ignore or @Category(SlowTest.class) annotations if flakiness is observed

Also applies to: 112-112, 117-117, 123-123, 248-248, 306-306, 339-339, 344-344, 352-352, 407-407, 482-482, 564-564

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorker.java (2)

191-195: Both timestamps set to deadline eliminates debouncing window for large pushes.

Setting both expireTimestamp (via constructor) and expireDeadlineTimestamp to the same deadline value means the task will expire immediately at the deadline with no opportunity for further merging/debouncing. This differs from the parent class behavior where expireTimestamp < expireDeadlineTimestamp allows a debouncing window.

If this is intentional for large pushes (to prevent indefinite batching), consider adding a brief comment explaining this design choice.


60-76: LGTM!

The constructor properly initializes all fields and computes maxDelay based on the threshold configuration. The use of ChangeTaskQueue for skip-list based ordering is appropriate for the adaptive delay use case.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeProcessor.java (2)

187-197: Race condition when switching between workers.

When setLargeChangeAdaptiveDelayConfig toggles the flag and clears the inactive worker, concurrent operations could:

  1. Commit a task to the old worker just before the switch
  2. Have that task lost when clear() is called

If live switching is expected, consider using synchronization or accepting this as a known limitation during config changes. A brief comment documenting this tradeoff would help future maintainers.


223-246: LGTM!

The runUnthrowable implementation correctly:

  1. Retrieves expired tasks under synchronization (via getExpires())
  2. Processes each task outside the lock with per-task exception handling
  3. Has an outer catch-all for unexpected failures
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/AbstractChangeWorker.java (2)

124-150: LGTM!

The doCommitChange merge logic correctly handles:

  • Skipping updates when the new context is older
  • Merging versions and trace times when requeuing
  • Preserving existing tasks when deadline is reached while updating version expectations

206-213: Task creation outside synchronized block is acceptable.

Creating the task before acquiring the lock minimizes lock contention. While config could theoretically change between createTask and doCommitChange, the practical impact is negligible since config changes are infrequent.

server/server/data/src/main/java/com/alipay/sofa/registry/server/data/change/DataChangeEventCenter.java (5)

175-175: LGTM: Publisher counts field added correctly.

The publisherCounts field follows the same pattern as other ChangeNotifier fields and is appropriately marked final for thread safety.


180-194: LGTM: Constructor properly wired for publisher counts.

The constructor signature and field assignment correctly handle the new publisherCounts parameter, maintaining consistency with other constructor parameters.


428-429: LGTM: ChangeNotifier construction correctly passes publisher counts.

The ChangeNotifier is properly instantiated with both the changes and pubCounts maps, enabling the adaptive batching feature for large pushes.


545-549: LGTM: Test helper appropriately provides empty publisher counts.

Using Collections.emptyMap() for the publisherCounts parameter in the test helper is appropriate and confirms that downstream code correctly handles empty or missing publisher count data.


204-205: LGTM: DataChangeRequest construction includes publisher counts.

The DataChangeRequest is correctly constructed with the new publisherCounts parameter at lines 204-205, and downstream consumers in the session server already gracefully handle cases where specific dataInfoId keys may not have corresponding entries in publisherCounts by allowing null values to be passed to TriggerPushContext.

server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/LocalDatumStorage.java (1)

85-88: Implementation looks correct.

The per-dataInfoId publisher count retrieval correctly delegates to the underlying storage. The dataCenter parameter is appropriately ignored since this is a single-datacenter storage implementation, consistent with other methods in this class.

server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/DatumStorageDelegate.java (1)

155-158: Delegation pattern correctly implemented.

The method properly delegates to the appropriate storage (local or multi-cluster) based on the datacenter, consistent with other methods in this class.

server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/BaseDatumStorage.java (1)

105-108: Implementation is correct and consistent.

The per-dataInfoId publisher count method properly retrieves the PublisherGroups and delegates to getRealPubNum, with appropriate null handling that matches the pattern used by getVersion(String dataInfoId).

server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java (1)

390-457: Comprehensive test coverage for publisher counting.

The test thoroughly validates the real publisher count behavior across multiple scenarios:

  • Initial state and additions
  • Version updates
  • Batch puts
  • Removals and tombstones
  • Clean operations
  • UnPublisher handling

This provides strong coverage for the new publisher counting feature.

Comment on lines 44 to 53
public DataChangeRequest(
String dataCenter, Map<String, DatumVersion> dataInfoIds, TraceTimes times) {
String dataCenter,
Map<String, DatumVersion> dataInfoIds,
Map<String, Integer> publisherCounts,
TraceTimes times) {
this.dataCenter = dataCenter;
this.dataInfoIds = dataInfoIds;
this.publisherCounts = publisherCounts;
this.times = times;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add null safety check for publisherCounts parameter.

The constructor accepts publisherCounts but doesn't validate it for null. If null is passed, getPublisherCounts() will throw NPE when calling Collections.unmodifiableMap(publisherCounts).

🔎 Apply this diff to add null safety:
   public DataChangeRequest(
       String dataCenter,
       Map<String, DatumVersion> dataInfoIds,
       Map<String, Integer> publisherCounts,
       TraceTimes times) {
     this.dataCenter = dataCenter;
     this.dataInfoIds = dataInfoIds;
-    this.publisherCounts = publisherCounts;
+    this.publisherCounts = publisherCounts != null ? publisherCounts : Collections.emptyMap();
     this.times = times;
   }
🤖 Prompt for AI Agents
In
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/sessionserver/DataChangeRequest.java
around lines 44 to 53, the constructor accepts publisherCounts but does not
guard against null which will cause an NPE when callers wrap it with
Collections.unmodifiableMap; add a null-safety assignment in the constructor
such that if publisherCounts is null you assign Collections.emptyMap() (or a new
HashMap<>() if mutability is needed elsewhere) before storing it to
this.publisherCounts so subsequent calls to getPublisherCounts() are safe.

Comment on lines 116 to 126
public boolean remove(String registerId, PublisherEnvelope expectDeletePublisher) {
boolean deleted = this.publishers.remove(registerId, expectDeletePublisher);
if (deleted) {
// 成功删除了
if (expectDeletePublisher.isPub()) {
// 删除了一个 Publisher,计数需要减少
this.realPubNum--;
}
}
return deleted;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add null check for the expectDeletePublisher parameter.

If expectDeletePublisher is null and successfully removed from the map (line 117), the call to expectDeletePublisher.isPub() at line 120 will throw a NullPointerException.

While passing null seems unlikely, defensive coding would prevent this edge case.

🔎 Apply this diff to add null validation:
 public boolean remove(String registerId, PublisherEnvelope expectDeletePublisher) {
+  if (expectDeletePublisher == null) {
+    throw new IllegalArgumentException("expectDeletePublisher cannot be null");
+  }
   boolean deleted = this.publishers.remove(registerId, expectDeletePublisher);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public boolean remove(String registerId, PublisherEnvelope expectDeletePublisher) {
boolean deleted = this.publishers.remove(registerId, expectDeletePublisher);
if (deleted) {
// 成功删除了
if (expectDeletePublisher.isPub()) {
// 删除了一个 Publisher,计数需要减少
this.realPubNum--;
}
}
return deleted;
}
public boolean remove(String registerId, PublisherEnvelope expectDeletePublisher) {
if (expectDeletePublisher == null) {
throw new IllegalArgumentException("expectDeletePublisher cannot be null");
}
boolean deleted = this.publishers.remove(registerId, expectDeletePublisher);
if (deleted) {
// 成功删除了
if (expectDeletePublisher.isPub()) {
// 删除了一个 Publisher,计数需要减少
this.realPubNum--;
}
}
return deleted;
}
🤖 Prompt for AI Agents
In
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java
around lines 116 to 126, add a null check for the expectDeletePublisher
parameter before calling expectDeletePublisher.isPub(); specifically, after
verifying deletion (deleted == true) guard the isPub() call with if
(expectDeletePublisher != null && expectDeletePublisher.isPub()) and only
decrement realPubNum in that branch so a null expectDeletePublisher does not
cause a NullPointerException.

Comment on lines +75 to +87
public void setLargeChangeAdaptiveDelayConfig(
LargeChangeAdaptiveDelayConfig largeChangeAdaptiveDelayConfig) {
this.largeChangeAdaptiveDelayConfig = largeChangeAdaptiveDelayConfig;
for (Map.Entry<String, Worker[]> entry : dataCenterWorkers.entrySet()) {
Worker[] workers = entry.getValue();
if (workers == null) {
return;
}
for (Worker work : workers) {
work.setLargeChangeAdaptiveDelayConfig(largeChangeAdaptiveDelayConfig);
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: return should be continue to process all data centers.

When workers == null, the method returns immediately, skipping configuration updates for remaining data centers. This is inconsistent with other similar methods in this class (e.g., setWorkDelayTime) and prevents proper configuration propagation.

🔎 Apply this diff to fix the early return:
   public void setLargeChangeAdaptiveDelayConfig(
       LargeChangeAdaptiveDelayConfig largeChangeAdaptiveDelayConfig) {
     this.largeChangeAdaptiveDelayConfig = largeChangeAdaptiveDelayConfig;
     for (Map.Entry<String, Worker[]> entry : dataCenterWorkers.entrySet()) {
       Worker[] workers = entry.getValue();
       if (workers == null) {
-        return;
+        continue;
       }
       for (Worker work : workers) {
         work.setLargeChangeAdaptiveDelayConfig(largeChangeAdaptiveDelayConfig);
       }
     }
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public void setLargeChangeAdaptiveDelayConfig(
LargeChangeAdaptiveDelayConfig largeChangeAdaptiveDelayConfig) {
this.largeChangeAdaptiveDelayConfig = largeChangeAdaptiveDelayConfig;
for (Map.Entry<String, Worker[]> entry : dataCenterWorkers.entrySet()) {
Worker[] workers = entry.getValue();
if (workers == null) {
return;
}
for (Worker work : workers) {
work.setLargeChangeAdaptiveDelayConfig(largeChangeAdaptiveDelayConfig);
}
}
}
public void setLargeChangeAdaptiveDelayConfig(
LargeChangeAdaptiveDelayConfig largeChangeAdaptiveDelayConfig) {
this.largeChangeAdaptiveDelayConfig = largeChangeAdaptiveDelayConfig;
for (Map.Entry<String, Worker[]> entry : dataCenterWorkers.entrySet()) {
Worker[] workers = entry.getValue();
if (workers == null) {
continue;
}
for (Worker work : workers) {
work.setLargeChangeAdaptiveDelayConfig(largeChangeAdaptiveDelayConfig);
}
}
}
🤖 Prompt for AI Agents
In
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeProcessor.java
around lines 75 to 87, the method uses return when workers == null which aborts
processing remaining data centers; change the early exit to continue so the loop
skips the null entry but continues iterating and applies
largeChangeAdaptiveDelayConfig to all other data centers' workers, keeping the
rest of the logic unchanged.

Comment on lines 97 to 105
} else {
// 更新任务,先移除旧任务再添加新任务
if (!this.taskLinkList.remove(existTask)) {
return false;
}
if (!this.taskLinkList.add(task)) {
return false;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential data inconsistency on failed update.

If taskLinkList.remove(existTask) succeeds but taskLinkList.add(task) fails (line 102-103), the method returns false but leaves the queue in an inconsistent state: the old task is removed from the skip list but the map still contains the old mapping.

Consider atomic rollback or documenting failure semantics:

🔎 Suggested fix
       } else {
         // 更新任务,先移除旧任务再添加新任务
         if (!this.taskLinkList.remove(existTask)) {
           return false;
         }
         if (!this.taskLinkList.add(task)) {
+          // Rollback: re-add the old task to maintain consistency
+          this.taskLinkList.add(existTask);
           return false;
         }
       }
🤖 Prompt for AI Agents
In
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java
around lines 97 to 105, the code removes the existing task from taskLinkList
then adds the new task, but if add(task) fails the list and the accompanying map
become inconsistent; change the flow to perform an atomic update or implement a
rollback: either (A) try to add the new task first and only remove the old one
after add succeeds, or (B) wrap remove/add in a try/catch and if add fails
re-add the removed existTask and restore the map entry, and ensure both
operations are executed under the same lock/synchronization to prevent races;
also update comments to document the transactional semantics and log errors on
rollback failure.

Comment on lines +56 to +90
public void setUseLargeAdapterDelayChangeWorker(boolean useLargeAdapterDelayChangeWorker) {
this.useLargeAdapterDelayChangeWorker = useLargeAdapterDelayChangeWorker;
}

public long getPublisherThreshold() {
return publisherThreshold;
}

public void setPublisherThreshold(long publisherThreshold) {
this.publisherThreshold = publisherThreshold;
}

public long getMaxPublisherCount() {
return maxPublisherCount;
}

public void setMaxPublisherCount(long maxPublisherCount) {
this.maxPublisherCount = maxPublisherCount;
}

public long getBaseDelay() {
return baseDelay;
}

public void setBaseDelay(long baseDelay) {
this.baseDelay = baseDelay;
}

public long getDelayPerUnit() {
return delayPerUnit;
}

public void setDelayPerUnit(long delayPerUnit) {
this.delayPerUnit = delayPerUnit;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add validation to setters.

The setters don't validate parameter values. Invalid values (e.g., negative delays, publisherThreshold > maxPublisherCount) could cause unexpected behavior in downstream code.

🔎 Example validation:
   public void setPublisherThreshold(long publisherThreshold) {
+    if (publisherThreshold < 0) {
+      throw new IllegalArgumentException("publisherThreshold must be non-negative");
+    }
     this.publisherThreshold = publisherThreshold;
   }
 
   public void setMaxPublisherCount(long maxPublisherCount) {
+    if (maxPublisherCount < 0) {
+      throw new IllegalArgumentException("maxPublisherCount must be non-negative");
+    }
     this.maxPublisherCount = maxPublisherCount;
   }
 
   public void setBaseDelay(long baseDelay) {
+    if (baseDelay < 0) {
+      throw new IllegalArgumentException("baseDelay must be non-negative");
+    }
     this.baseDelay = baseDelay;
   }
 
   public void setDelayPerUnit(long delayPerUnit) {
+    if (delayPerUnit < 0) {
+      throw new IllegalArgumentException("delayPerUnit must be non-negative");
+    }
     this.delayPerUnit = delayPerUnit;
   }
🤖 Prompt for AI Agents
In
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayConfig.java
around lines 56 to 90, the setter methods lack validation allowing negative
values and inconsistent thresholds; add input checks to each setter: ensure
numeric fields that represent counts/delays (publisherThreshold,
maxPublisherCount, baseDelay, delayPerUnit) are non-negative (>=0), and enforce
publisherThreshold <= maxPublisherCount when both are set (validate in either
setter or both); on invalid input throw IllegalArgumentException with a clear
message and only assign the field after validation so state remains consistent.

Comment on lines +85 to +92
public void reset(LargeChangeAdaptiveDelayConfig largeChangeAdaptiveDelayConfig) {
this.baseDelay = largeChangeAdaptiveDelayConfig.getBaseDelay();
this.delayPerUnit = largeChangeAdaptiveDelayConfig.getDelayPerUnit();
this.publisherThreshold = largeChangeAdaptiveDelayConfig.getPublisherThreshold();
this.maxPublisherCount = largeChangeAdaptiveDelayConfig.getMaxPublisherCount();
this.maxDelay =
this.computeDelay(baseDelay, delayPerUnit, publisherThreshold, maxPublisherCount);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Non-atomic configuration update may cause inconsistent state.

The reset() method writes to multiple volatile fields sequentially. A concurrent call to createTask() could observe a partial update (e.g., new baseDelay with old publisherThreshold), leading to incorrect delay calculations.

Consider synchronizing this method or performing updates atomically:

🔎 Suggested fix using synchronization:
   public void reset(LargeChangeAdaptiveDelayConfig largeChangeAdaptiveDelayConfig) {
+    synchronized (this.lock) {
       this.baseDelay = largeChangeAdaptiveDelayConfig.getBaseDelay();
       this.delayPerUnit = largeChangeAdaptiveDelayConfig.getDelayPerUnit();
       this.publisherThreshold = largeChangeAdaptiveDelayConfig.getPublisherThreshold();
       this.maxPublisherCount = largeChangeAdaptiveDelayConfig.getMaxPublisherCount();
       this.maxDelay =
           this.computeDelay(baseDelay, delayPerUnit, publisherThreshold, maxPublisherCount);
+    }
   }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorker.java
around lines 85 to 92, reset(...) updates multiple volatile fields one-by-one
which can be observed partially by concurrent createTask() calls; make the
configuration update atomic either by synchronizing reset (and any readers like
createTask) so the set of fields is updated/read under the same lock, or by
replacing the multiple volatile fields with a single volatile immutable config
holder object (containing baseDelay, delayPerUnit, publisherThreshold,
maxPublisherCount, maxDelay) and update that single reference in reset so
createTask can read one snapshot atomically; ensure maxDelay is computed from
the same snapshot before publishing it.

*/
public class NotThreadSafePublisherMap {

private int realPubNum;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volatile or AtomicLonger

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,我来修改下、

return existPublisher;
}

public void clear() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write operations like clear or put have to be mutually exclusive
use sync or writeLock to avoid concurrency

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotThreadSafePublisherMap 这个结构是提供给 PublisherGroup 使用的,我看过 PublisherGroup 中所有的写操作都是有上锁的,所以这个数据结构里我就没有加锁了;realPubNum 这个主要是用来提供给 Session 判断推送是大推送还是小推送的,我觉得是允许和实际的 publisher 数量不一致的;所以这个 NotThreadSafePublisherMap 就没有加锁实现成线程安全的了;你觉得这样设计行不行呀?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (1)

123-133: Consider adding a null check for defensive coding.

While expectDeletePublisher should never be null in practice (the put method rejects nulls), calling isPub() on line 127 without an explicit null check leaves a theoretical NPE risk. A previous review raised this concern.

🔎 Suggested defensive null check
   public boolean remove(String registerId, PublisherEnvelope expectDeletePublisher) {
     boolean deleted = this.publishers.remove(registerId, expectDeletePublisher);
     if (deleted) {
       // 成功删除了
-      if (expectDeletePublisher.isPub()) {
+      if (expectDeletePublisher != null && expectDeletePublisher.isPub()) {
         // 删除了一个 Publisher,计数需要减少
         this.realPubNum.decrementAndGet();
       }
     }
     return deleted;
   }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0630e0b and 2c730e4.

📒 Files selected for processing (2)
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/registry/SessionRegistry.java
✅ Files skipped from review due to trivial changes (1)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/registry/SessionRegistry.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/collections/Maps.java (1)
  • Maps (21-44)
🔇 Additional comments (5)
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (5)

31-40: Good use of AtomicInteger to ensure visibility and atomicity.

The switch from a plain int with volatile to AtomicInteger for realPubNum properly addresses the visibility concerns raised in previous reviews. This ensures that reads from getRealPubNum() (which may occur without holding PublisherGroup's lock) will always see the latest value, while atomic increment/decrement operations remain correct.


42-65: LGTM!

All read methods are straightforward and safe. getRealPubNum() correctly uses AtomicInteger.get() which provides visibility guarantees for reads.


68-107: LGTM! Null check and transition logic are correct.

The null check at lines 69-71 addresses the previous review concern. The state transition logic for tracking real publisher count is correct and well-optimized:

  • Lines 100-104 efficiently handle all four state transitions (Pub↔Pub, Pub→Unpub, Unpub→Pub, Unpub↔Unpub)
  • The detailed Chinese comments helpfully explain the full logic before the optimization

109-112: LGTM!

The clear() method correctly resets both the map and the counter.


114-121: LGTM!

The null check on existPublisher before calling isPub() is correct, ensuring no NPE when the registerId doesn't exist in the map.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)

98-110: Critical: Task data loss on failed update.

The past review correctly identified a problem here, but the actual issue is complete data loss, not inconsistency. If taskLinkList.add(task) fails at line 107, the method returns false after having removed the old task from both structures (lines 100, 104). The key ends up with no entry in either taskMap or taskLinkList.

🔎 Suggested rollback fix
       } else {
         // 更新任务,先移除旧任务再添加新任务
         if (!this.taskLinkList.remove(existTask)) {
           return false;
-        } else {
-          // 队列中的任务删除掉后,同步删除掉 Map 中的任务映射
-          this.taskMap.remove(key);
         }
         // 尝试添加新任务
         if (!this.taskLinkList.add(task)) {
+          // Rollback: re-add the removed existTask to prevent data loss
+          this.taskLinkList.add(existTask);
           return false;
         }
+        // Only remove from map after successful add to skiplist
+        this.taskMap.remove(key);
       }

Alternative approach: attempt add before remove (may require handling duplicates in skiplist).

🧹 Nitpick comments (1)
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (1)

27-40: Enhance class documentation to clarify the threading model.

The class name NotThreadSafePublisherMap suggests no thread-safety guarantees, but it uses AtomicInteger for realPubNum and ConcurrentMap for publishers. Based on past discussions, this design assumes external synchronization (via PublisherGroup's ReadWriteLock) for write operations while allowing lock-free reads of realPubNum.

Consider expanding the class javadoc to document:

  • External synchronization requirements for write operations
  • Why AtomicInteger is used despite the "NotThreadSafe" naming (visibility guarantees for lock-free reads)
  • The intended usage pattern with PublisherGroup

This will prevent confusion for future maintainers.

📝 Example documentation improvement
 /**
+ * A publisher map that tracks the real number of publishers (where isPub() == true).
+ * 
+ * <p>This class is named "NotThreadSafe" because write operations (put, remove, clear)
+ * require external synchronization via PublisherGroup's ReadWriteLock. However, the
+ * realPubNum field uses AtomicInteger to provide visibility guarantees for lock-free
+ * reads via getRealPubNum().
+ * 
+ * <p>The underlying ConcurrentMap allows safe iteration without holding locks, but
+ * all write operations must be externally synchronized.
+ * 
  * @author huicha
  * @date 2025/10/30
  */
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c730e4 and fd714a7.

📒 Files selected for processing (4)
  • server/common/model/src/main/java/com/alipay/sofa/registry/common/model/sessionserver/DataChangeRequest.java
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskImpl.java
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/common/model/src/main/java/com/alipay/sofa/registry/common/model/sessionserver/DataChangeRequest.java
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskImpl.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/collections/Maps.java (1)
  • Maps (21-44)
🔇 Additional comments (11)
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (5)

42-65: LGTM!

The read methods correctly delegate to the underlying ConcurrentMap and AtomicInteger, providing appropriate visibility guarantees for concurrent access.


67-107: LGTM!

The put() method correctly handles all publisher state transitions:

  • New entry: increments if isPub()
  • Pub → Unpub: decrements
  • Unpub → Pub: increments
  • Pub → Pub / Unpub → Unpub: no change

The null check and transition logic properly address previous review feedback.


109-112: LGTM!

The clear() method correctly resets both the publisher map and the counter.


114-121: LGTM!

The remove(String registerId) method correctly decrements realPubNum only when a publisher entry is actually removed.


123-136: LGTM!

The conditional remove() method correctly uses ConcurrentMap.remove(key, value) for atomic comparison-and-delete, and only decrements realPubNum when a publisher is successfully removed. The null check properly addresses previous review feedback.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (6)

41-62: LGTM! Clean dual-structure design with appropriate locking.

The combination of HashMap for O(1) key lookup and ConcurrentSkipListSet for deadline-ordered iteration is well-suited for this task queue. The ReadWriteLock correctly protects concurrent access to both structures.


70-77: LGTM! Thread-safe lookup with read lock.

The method correctly uses read lock for concurrent-safe access to the map.


124-144: LGTM! Correct expiry logic with proper synchronization.

The method correctly removes expired tasks from both structures under write lock, maintaining consistency. The break at line 131 is valid because tasks are deadline-ordered.


152-173: LGTM! Efficient batch expiry processing.

The batch removal correctly maintains synchronization between both data structures while collecting all expired tasks in a single pass.


180-188: LGTM! Clean atomic clear operation.

The method correctly clears both data structures under write lock.


196-205: LGTM! Well-designed test helper with proper locking.

The visitor pattern with read lock correctly supports safe task iteration for testing purposes.

public SubPublisherList(List<SubPublisher> pubs, int byteSize) {
this.pubs = pubs;
this.byteSize = byteSize;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted

LargeChangeAdaptiveDelayConfig largeChangeAdaptiveDelayConfig) {
this.largeAdaptiveDelayChangeWorker.reset(largeChangeAdaptiveDelayConfig);
this.useLargeAdapterDelayChangeWorker =
largeChangeAdaptiveDelayConfig.isUseLargeAdapterDelayChangeWorker();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check whether push notifications are still timely after clearing the queue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个地方会 clear 队列的改动点,专门拿出来和测试说过的,他这边会测试到

@hui-cha hui-cha changed the title 【WIP】增加针对大推送的自适应攒批的能力 增加针对大推送的自适应攒批的能力 Dec 30, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)

98-110: Acknowledge the existing review concern about partial failure.

The past review comment about data inconsistency when add(task) fails after remove(existTask) succeeds is still applicable. The current implementation leaves both structures empty for the key on add failure, which is more consistent but still loses the original task.

Consider documenting this behavior explicitly or implementing a rollback as suggested in the past review.

🧹 Nitpick comments (3)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (2)

35-36: Documentation inconsistency: Javadoc says "ConcurrentHashMap" but code uses HashMap.

The class-level Javadoc at line 35 mentions "使用 ConcurrentHashMap 保存键到任务的映射" but line 60 uses Maps.newHashMap() which creates a regular HashMap. While this is functionally correct since all access is protected by locks, the documentation is misleading.

Consider updating the Javadoc to accurately reflect the implementation:

🔎 Suggested documentation fix
- * <p>基于跳表数据结构实现的线程安全任务队列,支持按键查找和按截止时间排序: 1. 使用 ConcurrentHashMap 保存键到任务的映射,实现O(1)查找 2. 使用
+ * <p>基于跳表数据结构实现的线程安全任务队列,支持按键查找和按截止时间排序: 1. 使用 HashMap 保存键到任务的映射,实现O(1)查找 2. 使用

Also applies to: 60-60


216-218: Minor: Unnecessary use of ConcurrentMap for a copy.

getTaskMap() creates a copy using Maps.newConcurrentMap(), but since this is a defensive copy for testing, thread-safety of the returned map is unnecessary. A regular HashMap would be more efficient.

🔎 Suggested fix
   public Map<Key, Task> getTaskMap() {
     this.readLock.lock();
     try {
-      Map<Key, Task> cloneMap = Maps.newConcurrentMap();
+      Map<Key, Task> cloneMap = Maps.newHashMap();
       cloneMap.putAll(this.taskMap);
       return cloneMap;
     } finally {
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (1)

174-176: Consider removing System.out.println from test.

System.out.println at line 175 produces output during test runs. Consider using a logger or removing it, as test output should ideally be silent when passing.

🔎 Suggested fix
         while (true) {
           List<MockChangeTask> timeoutTasks = queue.popTimeoutTasks(System.currentTimeMillis());
-          int size = CollectionUtils.size(timeoutTasks);
-          System.out.println("pop timeout task size: " + size);
           Thread.sleep(TimeUnit.MILLISECONDS.toMillis(50L));
         }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eaf76be and ba4dc93.

📒 Files selected for processing (2)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/collections/Maps.java (1)
  • Maps (21-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (5)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)

124-144: LGTM!

The popTimeoutTask and popTimeoutTasks methods correctly use write locks and leverage the sorted order of the skip list to efficiently find and remove expired tasks. The synchronization between taskLinkList and taskMap is properly maintained.

Also applies to: 152-173

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (4)

36-52: LGTM!

Clean and straightforward test validating push and lookup functionality with reference equality assertions.


54-123: LGTM with note on test duration.

Comprehensive test for timeout-based pop functionality. The sequential verification of task expiration and queue state is thorough. Note that this test takes ~3 seconds due to Thread.sleep calls, which is acceptable for time-based testing but contributes to overall test suite duration.


125-211: Good concurrent stress test design.

The test properly uses CountDownLatch for thread coordination and validates that the queue drains completely after interrupting producers. The final assertions checking both taskMap and taskLinkList are empty ensure internal consistency.


214-234: This concern is not valid. ChangeTask<String> extends Comparable<ChangeTask<String>> and provides a default compareTo() implementation. When MockChangeTask implements ChangeTask<String>, it automatically inherits the Comparable interface through the type hierarchy. No explicit Comparable implementation is needed, and ConcurrentSkipListSet<Task> will work correctly.

Likely an incorrect or invalid review comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)

97-109: Data inconsistency on failed task update remains unaddressed.

The previous review comment about potential inconsistency between taskLinkList and taskMap when taskLinkList.add(task) fails (line 106) has not been addressed. If the add operation fails after removing the old task (lines 99-103), both data structures are left without any task for that key, even though the method returns false indicating failure.

Consider implementing rollback logic or at least documenting this failure mode.

🔎 Rollback approach
       } else {
         // 更新任务,先移除旧任务再添加新任务
         if (!this.taskLinkList.remove(existTask)) {
           return false;
-        } else {
-          // 队列中的任务删除掉后,同步删除掉 Map 中的任务映射
-          this.taskMap.remove(key);
         }
         // 尝试添加新任务
         if (!this.taskLinkList.add(task)) {
+          // Rollback: re-add the old task to maintain consistency
+          this.taskLinkList.add(existTask);
           return false;
         }
+        // Both skiplist operations succeeded, now update the map
+        this.taskMap.put(key, task);
-      }
-      this.taskMap.put(key, task);
+      } else {
+        // New task path
+        this.taskMap.put(key, task);
+      }
       return true;
🧹 Nitpick comments (4)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (3)

154-154: Simplify redundant TimeUnit conversion.

TimeUnit.MILLISECONDS.toMillis(50L) simply returns 50L. Consider using Thread.sleep(50L) directly or TimeUnit.MILLISECONDS.sleep(50L) for clarity.

🔎 Proposed fix
-                  Thread.sleep(TimeUnit.MILLISECONDS.toMillis(50L));
+                  Thread.sleep(50L);

178-178: Simplify redundant TimeUnit conversion.

Same issue as line 154 - TimeUnit.MILLISECONDS.toMillis(50L) is redundant.

🔎 Proposed fix
-                Thread.sleep(TimeUnit.MILLISECONDS.toMillis(50L));
+                Thread.sleep(50L);

196-196: Simplify redundant TimeUnit conversion.

Same issue - TimeUnit.MILLISECONDS.toMillis(50L) is redundant.

🔎 Proposed fix
-    Thread.sleep(timeout + TimeUnit.MILLISECONDS.toMillis(50L));
+    Thread.sleep(timeout + 50L);
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)

206-221: Consider using HashMap for test snapshot.

Since getTaskMap is for testing and returns a snapshot, using Maps.newConcurrentMap() (line 215) is unnecessary. A regular HashMap would be more appropriate for a read-only copy.

🔎 Proposed change
   @VisibleForTesting
   public Map<Key, Task> getTaskMap() {
     this.readLock.lock();
     try {
-      Map<Key, Task> cloneMap = Maps.newConcurrentMap();
+      Map<Key, Task> cloneMap = Maps.newHashMap();
       cloneMap.putAll(this.taskMap);
       return cloneMap;
     } finally {
       this.readLock.unlock();
     }
   }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba4dc93 and b384043.

📒 Files selected for processing (33)
  • client/all/pom.xml
  • client/api/pom.xml
  • client/impl/pom.xml
  • client/log/pom.xml
  • client/pom.xml
  • core/pom.xml
  • pom.xml
  • server/common/model/pom.xml
  • server/common/model/src/main/java/com/alipay/sofa/registry/common/model/sessionserver/DataChangeRequest.java
  • server/common/pom.xml
  • server/common/util/pom.xml
  • server/distribution/all/pom.xml
  • server/distribution/pom.xml
  • server/pom.xml
  • server/remoting/api/pom.xml
  • server/remoting/bolt/pom.xml
  • server/remoting/http/pom.xml
  • server/remoting/pom.xml
  • server/server/data/pom.xml
  • server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java
  • server/server/integration/pom.xml
  • server/server/meta/pom.xml
  • server/server/pom.xml
  • server/server/session/pom.xml
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskImpl.java
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java
  • server/server/shared/pom.xml
  • server/store/api/pom.xml
  • server/store/jdbc/pom.xml
  • server/store/jraft/pom.xml
  • server/store/pom.xml
  • test/pom.xml
🚧 Files skipped from review as they are similar to previous changes (21)
  • server/store/pom.xml
  • server/distribution/all/pom.xml
  • client/log/pom.xml
  • server/common/model/pom.xml
  • server/common/model/src/main/java/com/alipay/sofa/registry/common/model/sessionserver/DataChangeRequest.java
  • server/server/session/pom.xml
  • server/server/data/pom.xml
  • server/distribution/pom.xml
  • server/server/shared/pom.xml
  • server/store/jdbc/pom.xml
  • server/server/meta/pom.xml
  • server/common/util/pom.xml
  • test/pom.xml
  • client/all/pom.xml
  • server/server/integration/pom.xml
  • server/remoting/api/pom.xml
  • core/pom.xml
  • server/store/api/pom.xml
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskImpl.java
  • client/api/pom.xml
  • client/impl/pom.xml
🧰 Additional context used
🧬 Code graph analysis (1)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/collections/Maps.java (1)
  • Maps (21-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (26)
server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java (6)

30-39: LGTM! Solid initialization with appropriate concurrent data structures.

The use of AtomicInteger for realPubNum ensures visibility guarantees for lock-free reads (e.g., getRealPubNum()), while ConcurrentMap supports safe iteration in PublisherGroup. The initialization is clean and correct.


41-64: LGTM! Read operations are correctly implemented.

All read methods properly delegate to the underlying concurrent map or atomic counter. The implementations are straightforward with no side effects.


67-106: LGTM! Publisher counting logic is correct.

The null check prevents NPE, and the state-transition logic correctly maintains realPubNum:

  • New publisher: increment only if isPub()
  • Replacement: handle all four pub/unpub transition cases
  • The simplified implementation (lines 99-103) matches the expanded logic documented in the comments

108-111: LGTM! Clear operation is correct.

Both the map and counter are properly reset. The method relies on external synchronization (via PublisherGroup's write lock) to ensure atomicity of the compound operation, which aligns with the class's design.


113-120: LGTM! Remove logic is correct.

The method properly handles the null case (no-op if key doesn't exist) and only decrements the counter when an actual publisher (isPub()) is removed.


122-135: LGTM! Conditional remove logic is correct.

The null check prevents NPE, and the method correctly uses conditional removal (only removes if the value matches). The counter is decremented only when both conditions are met: successful deletion AND the deleted publisher had isPub() == true.

server/remoting/pom.xml (1)

8-8: LGTM! Coordinated version bump.

The parent version update to 6.6.2 aligns with the project-wide version upgrade across all modules.

server/store/jraft/pom.xml (1)

9-9: LGTM! Version synchronized.

Parent version correctly updated to 6.6.2.

server/server/pom.xml (1)

8-8: LGTM! Version synchronized.

server/remoting/http/pom.xml (1)

8-8: LGTM! Version synchronized.

client/pom.xml (1)

10-10: LGTM! Version synchronized.

server/remoting/bolt/pom.xml (1)

8-8: LGTM! Version synchronized.

server/common/pom.xml (1)

8-8: LGTM! Version synchronized.

pom.xml (2)

9-9: LGTM! Project version updated.

The project version bump to 6.6.2 is consistent with all child module updates.


76-76: Hessian 3.5.6 is publicly available in Maven Central.

The version upgrade is confirmed to use an official release. Maven Central contains the complete distribution with JAR, POM, source, and Javadoc artifacts, all properly signed with cryptographic checksums. GitHub confirms v3.5.6 as a published (non-prerelease) release.

server/pom.xml (1)

10-10: LGTM—routine version bump to 6.6.2.

The parent version update from 6.6.1-auto-regulator to 6.6.2 is correct and aligns with the broader version coordination across modules mentioned in the PR context.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (3)

35-51: LGTM!

The test correctly validates basic push and find operations with proper identity assertions.


53-122: Timing-sensitive test is appropriate for timeout verification.

The test correctly validates sequential timeout behavior. While timing-based tests can be flaky in heavily loaded CI environments, the generous timeouts (1-second intervals) should provide adequate stability.


124-213: Well-structured concurrency test.

The test effectively validates thread-safe concurrent operations with proper coordination (CountDownLatch), shutdown (ThreadGroup.interrupt), and final state verification (empty queue assertion).

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueue.java (7)

31-61: Well-documented class with proper initialization.

The class documentation clearly explains the dual data structure approach (Map for O(1) lookup, SkipList for deadline ordering) and synchronization strategy using ReadWriteLock.


63-76: LGTM!

The findTask method correctly uses read lock for thread-safe lookups without blocking other readers.


117-143: LGTM!

The popTimeoutTask method correctly implements single-task expiry with proper write lock protection and atomic removal from both data structures.


145-172: LGTM!

The popTimeoutTasks method correctly implements batch expiry with efficient early-break optimization (line 158-160) leveraging the deadline-sorted skip list.


174-187: LGTM!

The clear method correctly clears both data structures atomically under write lock protection.


189-204: LGTM!

The visitTasks method correctly provides read-locked iteration for testing purposes.


223-236: LGTM!

The getTaskLinkList method correctly provides a cloned snapshot for testing.

Comment on lines +216 to +236
class MockChangeTask implements ChangeTask<String> {

private String key;

private long deadline;

public MockChangeTask(String key, long deadline) {
this.key = key;
this.deadline = deadline;
}

@Override
public String key() {
return this.key;
}

@Override
public long deadline() {
return this.deadline;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

MockChangeTask must implement Comparable for ConcurrentSkipListSet ordering.

The MockChangeTask is inserted into ConcurrentSkipListSet<Task> in ChangeTaskQueue, which requires Task to implement Comparable<Task> for ordering by deadline. Without this, the skip list cannot maintain proper deadline-based ordering.

🔎 Proposed fix
-class MockChangeTask implements ChangeTask<String> {
+class MockChangeTask implements ChangeTask<String>, Comparable<MockChangeTask> {
 
   private String key;
 
   private long deadline;
 
   public MockChangeTask(String key, long deadline) {
     this.key = key;
     this.deadline = deadline;
   }
 
   @Override
   public String key() {
     return this.key;
   }
 
   @Override
   public long deadline() {
     return this.deadline;
   }
+
+  @Override
+  public int compareTo(MockChangeTask other) {
+    int deadlineCompare = Long.compare(this.deadline, other.deadline);
+    if (deadlineCompare != 0) {
+      return deadlineCompare;
+    }
+    // Break ties by key to ensure consistency
+    return this.key.compareTo(other.key);
+  }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class MockChangeTask implements ChangeTask<String> {
private String key;
private long deadline;
public MockChangeTask(String key, long deadline) {
this.key = key;
this.deadline = deadline;
}
@Override
public String key() {
return this.key;
}
@Override
public long deadline() {
return this.deadline;
}
}
class MockChangeTask implements ChangeTask<String>, Comparable<MockChangeTask> {
private String key;
private long deadline;
public MockChangeTask(String key, long deadline) {
this.key = key;
this.deadline = deadline;
}
@Override
public String key() {
return this.key;
}
@Override
public long deadline() {
return this.deadline;
}
@Override
public int compareTo(MockChangeTask other) {
int deadlineCompare = Long.compare(this.deadline, other.deadline);
if (deadlineCompare != 0) {
return deadlineCompare;
}
// Break ties by key to ensure consistency
return this.key.compareTo(other.key);
}
}
🤖 Prompt for AI Agents
In
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java
around lines 216 to 236, MockChangeTask is missing Comparable implementation
required by ConcurrentSkipListSet ordering; change the class to implement
Comparable<MockChangeTask> (or Comparable<Task> to match the Task interface) and
add an @Override compareTo method that orders by deadline (ascending) and uses
the key as a deterministic tiebreaker when deadlines are equal; keep existing
key() and deadline() methods and ensure compareTo returns negative/zero/positive
per standard contract so the ConcurrentSkipListSet can maintain correct
deadline-based ordering.

@hui-cha hui-cha changed the title 增加针对大推送的自适应攒批的能力 【WIP】增加针对大推送的自适应攒批的能力 Dec 30, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (1)

216-236: MockChangeTask still missing Comparable implementation.

As previously noted, MockChangeTask must implement Comparable<MockChangeTask> (or Comparable<Task>) for ConcurrentSkipListSet to maintain deadline-based ordering. Without this, a ClassCastException will occur at runtime when the skip list attempts to order tasks.

🔎 Proposed fix
-class MockChangeTask implements ChangeTask<String> {
+class MockChangeTask implements ChangeTask<String>, Comparable<MockChangeTask> {
 
   private String key;
 
   private long deadline;
 
   public MockChangeTask(String key, long deadline) {
     this.key = key;
     this.deadline = deadline;
   }
 
   @Override
   public String key() {
     return this.key;
   }
 
   @Override
   public long deadline() {
     return this.deadline;
   }
+
+  @Override
+  public int compareTo(MockChangeTask other) {
+    int deadlineCompare = Long.compare(this.deadline, other.deadline);
+    if (deadlineCompare != 0) {
+      return deadlineCompare;
+    }
+    // Break ties by key to ensure consistency
+    return this.key.compareTo(other.key);
+  }
 }
🧹 Nitpick comments (1)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (1)

19-27: Remove unused imports.

Several imports (Random, ConcurrentSkipListSet, CountDownLatch, CollectionUtils) are only referenced in the commented-out test. Clean up these unused imports.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b384043 and 78b28f5.

📒 Files selected for processing (1)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (1)

35-51: LGTM!

The test correctly verifies basic push and find operations with appropriate assertions.

Comment on lines +53 to +122
@Test
public void testPopTimeoutTasks() throws InterruptedException {
long now = System.currentTimeMillis();
long timeoutOne = now + 1000L;
long timeoutTwo = now + 2000L;
long timeoutThree = now + 3000L;

MockChangeTask taskOne = new MockChangeTask("Key1", timeoutOne);
MockChangeTask taskTwo = new MockChangeTask("Key2", timeoutTwo);
MockChangeTask taskThree = new MockChangeTask("Key3", timeoutThree);

ChangeTaskQueue<String, MockChangeTask> queue = new ChangeTaskQueue<>();
queue.pushTask(taskOne);
queue.pushTask(taskTwo);
queue.pushTask(taskThree);

Thread.sleep(TimeUnit.SECONDS.toMillis(1L));

// 第一次获取超时任务
List<MockChangeTask> timeoutTasks = queue.popTimeoutTasks(System.currentTimeMillis());
Assert.assertEquals(1, timeoutTasks.size());
Assert.assertEquals(taskOne, timeoutTasks.get(0));

MockChangeTask taskOneFromQueue = queue.findTask("Key1");
Assert.assertNull(taskOneFromQueue);

MockChangeTask taskTwoFromQueue = queue.findTask("Key2");
Assert.assertNotNull(taskTwoFromQueue);
Assert.assertSame(taskTwo, taskTwoFromQueue);

MockChangeTask taskThreeFromQueue = queue.findTask("Key3");
Assert.assertNotNull(taskThreeFromQueue);
Assert.assertSame(taskThree, taskThreeFromQueue);

Thread.sleep(TimeUnit.SECONDS.toMillis(1L));

// 第二次获取超时任务
timeoutTasks = queue.popTimeoutTasks(System.currentTimeMillis());
Assert.assertEquals(1, timeoutTasks.size());
Assert.assertEquals(taskTwo, timeoutTasks.get(0));

taskOneFromQueue = queue.findTask("Key1");
Assert.assertNull(taskOneFromQueue);

taskTwoFromQueue = queue.findTask("Key2");
Assert.assertNull(taskTwoFromQueue);

taskThreeFromQueue = queue.findTask("Key3");
Assert.assertNotNull(taskThreeFromQueue);
Assert.assertSame(taskThree, taskThreeFromQueue);

Thread.sleep(TimeUnit.SECONDS.toMillis(1L));

// 第三次获取超时任务
timeoutTasks = queue.popTimeoutTasks(System.currentTimeMillis());
Assert.assertEquals(1, timeoutTasks.size());
Assert.assertEquals(taskThree, timeoutTasks.get(0));

timeoutTasks = queue.popTimeoutTasks(System.currentTimeMillis());
Assert.assertEquals(0, timeoutTasks.size());

taskOneFromQueue = queue.findTask("Key1");
Assert.assertNull(taskOneFromQueue);

taskTwoFromQueue = queue.findTask("Key2");
Assert.assertNull(taskTwoFromQueue);

taskThreeFromQueue = queue.findTask("Key3");
Assert.assertNull(taskThreeFromQueue);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Test relies on Thread.sleep() timing which can be flaky.

This test uses Thread.sleep() and System.currentTimeMillis() to verify timeout behavior. Thread scheduling, GC pauses, and system load can cause timing drift, leading to flaky test failures in CI environments.

Consider using a virtual time approach (e.g., injecting a clock/time provider into ChangeTaskQueue) or adding timing tolerance to make the test more robust.

🤖 Prompt for AI Agents
In
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java
around lines 53-122, the test is flaky because it relies on Thread.sleep() and
System.currentTimeMillis(); modify ChangeTaskQueue to accept an injectable time
provider (e.g., a java.util.function.LongSupplier or java.time.Clock) and use
that provider wherever current time is required, then update this test to
construct the queue with a controllable fake clock (e.g., an AtomicLong-backed
LongSupplier), replace Thread.sleep() calls by advancing the fake time by the
required milliseconds between pops, and call popTimeoutTasks using the queue’s
time provider (or without passing System.currentTimeMillis()) so the test
deterministically simulates time progression without real sleeps.

Comment on lines 124 to 213
// @Test
// public void testConcurrentPushAndPop() throws InterruptedException {
// ChangeTaskQueue<String, MockChangeTask> queue = new ChangeTaskQueue<>();
//
// long timeout = TimeUnit.SECONDS.toMillis(1L);
// int keyRange = 10000;
//
// // 32 个 push 线程 + 1 个 pop 线程 = 33
// CountDownLatch latch = new CountDownLatch(33);
//
// // 这里计划单线程 pop,多线程 push
// // 创建 push 的线程
// ThreadGroup pushTaskGroup = new ThreadGroup("unit-test-push-task");
// for (int index = 0; index < 32; index++) {
// String threadName = "push-task-thread-" + index;
// Thread pushTaskThread =
// new Thread(pushTaskGroup, threadName) {
// @Override
// public void run() {
// try {
// Random random = new Random();
// while (true) {
// int next = random.nextInt(keyRange);
//
// String key = "Key" + next;
// long deadline = System.currentTimeMillis() + timeout;
//
// MockChangeTask task = new MockChangeTask(key, deadline);
// queue.pushTask(task);
//
// Thread.sleep(TimeUnit.MILLISECONDS.toMillis(50L));
// }
// } catch (InterruptedException interruptedException) {
// // 响应中断退出
// } finally {
// latch.countDown();
// }
// }
// };
// pushTaskThread.start();
// }
//
// // 开一个 pop 的线程
// String threadName = "pop-task-thread";
// Thread popTaskThread =
// new Thread(threadName) {
// @Override
// public void run() {
// try {
// while (true) {
// List<MockChangeTask> timeoutTasks =
// queue.popTimeoutTasks(System.currentTimeMillis());
// int size = CollectionUtils.size(timeoutTasks);
// System.out.println("pop timeout task size: " + size);
// Thread.sleep(TimeUnit.MILLISECONDS.toMillis(50L));
// }
// } catch (InterruptedException interruptedException) {
// // 响应中断退出
// } finally {
// latch.countDown();
// }
// }
// };
// popTaskThread.start();
//
// // 首先保持运行一段时间
// Thread.sleep(TimeUnit.SECONDS.toMillis(5L));
//
// // 然后中断 push 线程
// pushTaskGroup.interrupt();
//
// // 等一个 timeout + 50ms 的时间,让 pop 线程处理完所有的任务
// Thread.sleep(timeout + TimeUnit.MILLISECONDS.toMillis(50L));
//
// // 中断 pop 线程
// popTaskThread.interrupt();
//
// // 等待任务都停止了
// boolean result = latch.await(1, TimeUnit.SECONDS);
// Assert.assertTrue(result);
//
// // 检查队列状态,这里预期所有的任务都能正常被取出
// Map<String, MockChangeTask> taskMap = queue.getTaskMap();
// Assert.assertNotNull(taskMap);
// Assert.assertTrue(taskMap.isEmpty());
//
// ConcurrentSkipListSet<MockChangeTask> taskList = queue.getTaskLinkList();
// Assert.assertNotNull(taskList);
// Assert.assertTrue(taskList.isEmpty());
// }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Commented-out test should be resolved.

The commit message indicates this concurrent test was temporarily commented out ("注释掉这个单测试一下"). Commented test code should not be merged.

Please either:

  • Enable the test if it's valuable and stable
  • Remove it entirely if it's not needed
  • Create a follow-up issue to address any blocking problems and remove the commented code

Would you like me to help analyze whether this concurrent test can be stabilized and enabled, or should it be tracked separately?

🤖 Prompt for AI Agents
In
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java
around lines 124-213, there is a large commented-out concurrent test; remove or
resolve it before merging: either (A) re-enable and stabilize the test by
replacing Random with a deterministic seed or fixed sequence, remove indefinite
loops (use a bounded iteration or latch), replace Thread.sleep-based timing with
await/notify or CountDownLatch timeouts, and assert expected state
deterministically; or (B) if the test is flaky/unneeded, delete the commented
block and add a JIRA/issue reference in the commit message describing why it was
removed and what needs to be done to reintroduce a reliable concurrent test.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (3)

48-117: Test relies on Thread.sleep() timing which can be flaky.

This test uses Thread.sleep() and System.currentTimeMillis() to verify timeout behavior. Thread scheduling, GC pauses, and system load can cause timing drift, leading to flaky test failures in CI environments.

Consider using a virtual time approach (e.g., injecting a clock/time provider into ChangeTaskQueue) or adding timing tolerance to make the test more robust.


119-208: Commented-out test should be resolved before merging.

The commit message indicates this concurrent test was temporarily commented out ("注释掉这个单测试一下"). Commented test code should not be merged.

Please either:

  • Enable the test if it's valuable and stable
  • Remove it entirely if it's not needed
  • Create a follow-up issue to address any blocking problems and remove the commented code

211-231: MockChangeTask must implement Comparable for ConcurrentSkipListSet ordering.

The MockChangeTask is inserted into ConcurrentSkipListSet<Task> in ChangeTaskQueue, which requires Task to implement Comparable<Task> for ordering by deadline. Without this, the skip list cannot maintain proper deadline-based ordering and will throw ClassCastException at runtime.

🔎 Proposed fix
-class MockChangeTask implements ChangeTask<String> {
+class MockChangeTask implements ChangeTask<String>, Comparable<MockChangeTask> {
 
   private String key;
 
   private long deadline;
 
   public MockChangeTask(String key, long deadline) {
     this.key = key;
     this.deadline = deadline;
   }
 
   @Override
   public String key() {
     return this.key;
   }
 
   @Override
   public long deadline() {
     return this.deadline;
   }
+
+  @Override
+  public int compareTo(MockChangeTask other) {
+    int deadlineCompare = Long.compare(this.deadline, other.deadline);
+    if (deadlineCompare != 0) {
+      return deadlineCompare;
+    }
+    // Break ties by key to ensure consistency
+    return this.key.compareTo(other.key);
+  }
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78b28f5 and 7f00bc7.

📒 Files selected for processing (2)
  • server/server/data/src/test/java/com/alipay/sofa/registry/server/data/timer/CacheCountTaskTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/ChangeTaskQueueTest.java (1)

30-46: LGTM! Basic push and find test is well-structured.

The test correctly verifies that tasks can be pushed to the queue and retrieved by key using assertSame for identity checks.

Comment on lines 30 to 50
// CacheCountTask task = new CacheCountTask();
// DataServerConfig cfg = TestBaseUtils.newDataConfig("testDc");
// task.setDataServerConfig(cfg);
//
// // npe
// Assert.assertFalse(task.count());
//
// DatumStorageDelegate datumStorageDelegate = TestBaseUtils.newLocalDatumDelegate("testDc", true);
// task.setDatumCache(datumStorageDelegate);
//
// cfg.setCacheCountIntervalSecs(0);
// Assert.assertFalse(task.init());
// // empty
// Assert.assertTrue(task.count());
//
// cfg.setCacheCountIntervalSecs(1);
// Publisher pub = TestBaseUtils.createTestPublisher("testDataId");
// datumStorageDelegate.getLocalDatumStorage().putPublisher("testDc", pub);
// // has item
// Assert.assertTrue(task.count());
// Assert.assertTrue(task.init());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "CacheCountTaskTest.java" | head -20

Repository: sofastack/sofa-registry

Length of output: 168


🏁 Script executed:

sed -n '1,60p' ./server/server/data/src/test/java/com/alipay/sofa/registry/server/data/timer/CacheCountTaskTest.java | cat -n

Repository: sofastack/sofa-registry

Length of output: 2477


Re-enable the test before merging.

The entire test body is commented out, disabling test coverage for CacheCountTask. The test validates critical functionality including NPE handling, configuration validation, cache behavior, and task initialization. Commented-out tests should not be merged.

🤖 Prompt for AI Agents
In
server/server/data/src/test/java/com/alipay/sofa/registry/server/data/timer/CacheCountTaskTest.java
around lines 30-50 the entire test body is commented out which disables
validation for NPE handling, configuration checks, cache behavior and task init;
restore the test by uncommenting the block, ensure any required test helpers
(TestBaseUtils, CacheCountTask, DataServerConfig, DatumStorageDelegate,
Publisher) are imported and available, fix any compile errors caused by API
changes, run the unit test locally to verify it passes, and commit the
uncommented, passing test.

@hui-cha hui-cha force-pushed the perf/cpu-mem-optimization branch from 7f00bc7 to 1626f27 Compare December 30, 2025 08:34
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/unit-test.yml (1)

27-30: Consider consolidating Maven commands to avoid redundant compilation.

The current sequence mvn compile -B && mvn clean test ... compiles the code, then immediately cleans (removing the compiled classes), causing Maven to recompile during the test phase. This doubles compilation time.

Consider using a single Maven invocation to avoid the redundancy.

🔎 Proposed fix
       run: |
         free -h
         cat /proc/meminfo | grep MemTotal
-        mvn compile -B && mvn clean test -DisSkipIntegrationTest=true "-Dtest.logging.level=ERROR" --fail-at-end --batch-mode
+        mvn clean compile test -DisSkipIntegrationTest=true "-Dtest.logging.level=ERROR" --fail-at-end --batch-mode
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f00bc7 and 1626f27.

📒 Files selected for processing (1)
  • .github/workflows/unit-test.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
.github/workflows/unit-test.yml (1)

25-26: Good addition for OOM debugging.

The MAVEN_OPTS configuration with heap dump on OOM and explicit memory settings will help diagnose memory issues in CI. The heap dump path correctly aligns with the artifact upload configuration on line 36.

@hui-cha hui-cha force-pushed the perf/cpu-mem-optimization branch from 157eb6e to 8769fd6 Compare December 30, 2025 09:21
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8769fd6 and 4d8b2a7.

📒 Files selected for processing (2)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/cache/SessionCacheServiceTest.java
  • server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build

Comment on lines 30 to 89
// private final String dataInfoId = TestUtils.newDataInfoId("testDataInfoId");
// private final String dataCenter = "testDc";
//
// @Test
// public void testGet() {
// SessionDatumCacheService cacheService = new SessionDatumCacheService();
// cacheService.sessionServerConfig = TestUtils.newSessionConfig(dataCenter);
//
// DatumCacheGenerator generator = new DatumCacheGenerator();
// cacheService.setCacheGenerators(Collections.singletonMap(DatumKey.class.getName(), generator));
// generator.dataNodeService = Mockito.mock(DataNodeService.class);
//
// cacheService.init();
// DatumKey datumKey = new DatumKey(dataInfoId, Collections.singleton(dataCenter));
// Key key = new Key(DatumKey.class.getName(), datumKey);
// Assert.assertEquals(key, new Key(DatumKey.class.getName(), datumKey));
// Assert.assertTrue(key.toString(), key.toString().contains(dataInfoId));
// Assert.assertEquals(key.getEntityName(), DatumKey.class.getName());
// Assert.assertEquals(key.getEntityType(), datumKey);
// Assert.assertEquals(key.getEntityType().hashCode(), datumKey.hashCode());
//
// Value value = cacheService.getValueIfPresent(key);
// Assert.assertNull(value);
//
// value = cacheService.getValue(key);
// Assert.assertNull(value.getPayload());
//
// MultiSubDatum multiSubDatum =
// MultiSubDatum.of(
// SubDatum.normalOf(
// dataInfoId,
// dataCenter,
// 100,
// Collections.emptyList(),
// "testDataId",
// "testInstanceId",
// "testGroup",
// Lists.newArrayList(System.currentTimeMillis())));
//
// Mockito.when(
// generator.dataNodeService.fetch(Mockito.anyString(), Mockito.anySetOf(String.class)))
// .thenReturn(multiSubDatum);
// // invalidate the null value
// cacheService.invalidate(key);
// value = cacheService.getValue(key);
// Assert.assertEquals(value.getPayload(), multiSubDatum);
// value = cacheService.getValueIfPresent(key);
// Assert.assertEquals(value.getPayload(), multiSubDatum);
// cacheService.invalidate(key);
//
// value = cacheService.getValueIfPresent(key);
// Assert.assertNull(value);
//
// // touch remove listener
// for (int i = 0; i < 1000; i++) {
// datumKey = new DatumKey(dataInfoId + ":" + i, Collections.singleton(dataCenter));
// key = new Key(DatumKey.class.getName(), datumKey);
// cacheService.getValue(key);
// }
// }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n server/server/session/src/test/java/com/alipay/sofa/registry/server/session/cache/SessionCacheServiceTest.java | head -120

Repository: sofastack/sofa-registry

Length of output: 4533


🏁 Script executed:

wc -l server/server/session/src/test/java/com/alipay/sofa/registry/server/session/cache/SessionCacheServiceTest.java

Repository: sofastack/sofa-registry

Length of output: 180


Commented-out test must be addressed before merge.

The entire testGet method and its supporting fields (lines 30-89) are commented out, and the @Test annotation is also commented. This leaves the test class with zero runnable tests.

Commented-out code should not remain in the repository. Choose one of the following:

  • Fix the test if the PR changes broke it
  • Use @Ignore("reason") if temporary disablement is needed
  • Remove entirely if no longer relevant

Since the PR modifies session layer components, explain whether this test failure is related to your changes and how you plan to resolve it.

🤖 Prompt for AI Agents
In
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/cache/SessionCacheServiceTest.java
around lines 30-89 the entire testGet method and its supporting fields are
commented out leaving the class with no runnable tests; either restore and fix
the test (uncomment the dataInfoId/dataCenter fields and the @Test method,
update mocks/expectations to match any API changes introduced by this PR—e.g.,
adjust DatumCacheGenerator/DataNodeService interactions, constructors, or return
types—and run the unit tests to confirm it passes), or if the test is
temporarily invalid due to unrelated refactors annotate the method with
@Ignore("reason and tracking issue/PR") and remove the commented block, or
delete the test entirely if it is obsolete; in all cases add a brief commit
message explaining which option you chose and link to an issue/PR if deferring.

Comment on lines 49 to 212
// @Autowired private AppRevisionRepository appRevisionJdbcRepository;
//
// @Autowired private InterfaceAppsRepository interfaceAppsJdbcRepository;
//
// @Autowired private AppRevisionMapper appRevisionMapper;
//
// @Autowired private DefaultCommonConfig defaultCommonConfig;
//
// private List<AppRevision> appRevisionList;
//
// private static final int APP_REVISION_SIZE = 100;
// private Set<String> dataCenters = Sets.newHashSet();
//
// @Before
// public void buildAppRevision() {
// ((AppRevisionJdbcRepository) appRevisionJdbcRepository).init();
// appRevisionList = buildAppRevisions(APP_REVISION_SIZE);
//
// dataCenters.add(defaultCommonConfig.getDefaultClusterId());
// appRevisionJdbcRepository.setDataCenters(dataCenters);
// interfaceAppsJdbcRepository.setDataCenters(dataCenters);
//
// appRevisionJdbcRepository.startSynced();
// interfaceAppsJdbcRepository.startSynced();
// }
//
// private void register() throws Exception {
// // register
// for (AppRevision appRevisionRegister : appRevisionList) {
// appRevisionJdbcRepository.register(appRevisionRegister);
// }
// }
//
// private void queryAndCheck() {
// // query app_revision
// for (AppRevision appRevisionRegister : appRevisionList) {
// AppRevision revision =
// appRevisionJdbcRepository.queryRevision(appRevisionRegister.getRevision());
// Assert.assertEquals(appRevisionRegister.getAppName(), revision.getAppName());
// }
// interfaceAppsJdbcRepository.waitSynced();
// // query by interface
// for (AppRevision appRevisionRegister : appRevisionList) {
// for (Map.Entry<String, AppRevisionInterface> entry :
// appRevisionRegister.getInterfaceMap().entrySet()) {
// String dataInfoId = entry.getKey();
// InterfaceMapping appNames = interfaceAppsJdbcRepository.getAppNames(dataInfoId);
// Assert.assertTrue(appNames.getNanosVersion() > 0);
// Assert.assertTrue(appNames.getApps().size() == 1);
// Assert.assertTrue(appNames.getApps().contains(appRevisionRegister.getAppName()));
// }
// }
// }
//
// @Test
// public void registerAndQuery() throws Exception {
// register();
// queryAndCheck();
// }
//
// @Test
// public void revisionLoad() throws Exception {
// AppRevisionJdbcRepository repository = (AppRevisionJdbcRepository) appRevisionJdbcRepository;
//
// register();
// queryAndCheck();
//
// LoadingCache<String, AppRevision> cache = repository.getRevisions();
// Assert.assertEquals(cache.asMap().size(), APP_REVISION_SIZE);
// AppRevision notExistAppRevision = BuildNotExistRevision();
//
// appRevisionList.add(notExistAppRevision);
// appRevisionList.add(notExistAppRevision);
//
// for (AppRevision appRevisionRegister : appRevisionList) {
// cache.invalidate(appRevisionRegister.getRevision());
// }
// Assert.assertEquals(cache.asMap().size(), 0);
//
// // query app_revision
// for (AppRevision appRevisionRegister : appRevisionList) {
// AppRevision revision =
// appRevisionJdbcRepository.queryRevision(appRevisionRegister.getRevision());
// if (notExistAppRevision.getRevision().equals(appRevisionRegister.getRevision())) {
// Assert.assertNull(revision);
// } else {
// Assert.assertEquals(appRevisionRegister.getAppName(), revision.getAppName());
// }
// }
//
// Assert.assertEquals(cache.asMap().size(), APP_REVISION_SIZE);
// }
//
// class HeartbeatRunner extends LoopRunnable {
//
// @Override
// public void runUnthrowable() {
// for (AppRevision appRevision : appRevisionList) {
// boolean success = appRevisionJdbcRepository.heartbeat(appRevision.getRevision());
// if (!success) {
// try {
// appRevisionJdbcRepository.register(appRevision);
// } catch (Exception e) {
// e.printStackTrace();
// }
// }
// }
// }
//
// @Override
// public void waitingUnthrowable() {
// ConcurrentUtils.sleepUninterruptibly(1, TimeUnit.SECONDS);
// }
// }
//
// @Test
// public void heartbeatClean() throws Exception {
//
// registerAndQuery();
// appRevisionJdbcRepository.waitSynced();
// ((AppRevisionJdbcRepository) appRevisionJdbcRepository).cleanCache();
//
// for (AppRevision appRevision : appRevisionList) {
// boolean before = appRevisionJdbcRepository.heartbeat(appRevision.getRevision());
// Assert.assertTrue(before);
// List<AppRevisionDomain> querys =
// appRevisionMapper.queryRevision(dataCenters, appRevision.getRevision());
// Assert.assertTrue(!CollectionUtils.isEmpty(querys));
// }
//
// for (AppRevision appRevision : appRevisionList) {
// AppRevisionDomain domain =
// AppRevisionDomainConvertor.convert2Domain(
// defaultCommonConfig.getDefaultClusterId(), appRevision);
// domain.setDeleted(true);
// appRevisionMapper.replace(domain);
// }
// appRevisionJdbcRepository.waitSynced();
// for (AppRevision appRevision : appRevisionList) {
// boolean after = appRevisionJdbcRepository.heartbeat(appRevision.getRevision());
// Assert.assertFalse(after);
// Assert.assertNull(appRevisionJdbcRepository.queryRevision(appRevision.getRevision()));
// }
// ConcurrentUtils.createDaemonThread("heartbeatClean-test", new HeartbeatRunner()).start();
// ((AppRevisionJdbcRepository) appRevisionJdbcRepository).cleanCache();
// Thread.sleep(3000);
// for (AppRevision appRevision : appRevisionList) {
// boolean success = appRevisionJdbcRepository.heartbeat(appRevision.getRevision());
// Assert.assertTrue(success);
// AppRevisionDomain domain =
// AppRevisionDomainConvertor.convert2Domain(
// defaultCommonConfig.getDefaultClusterId(), appRevision);
// domain.setDeleted(true);
// appRevisionMapper.replace(domain);
// }
// }
//
// @Test
// public void testCountByApp() throws Exception {
// register();
// appRevisionJdbcRepository.waitSynced();
// Map<String, Integer> counts = appRevisionJdbcRepository.countByApp();
// Assert.assertEquals(APP_REVISION_SIZE, counts.size());
// }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if lines 49-212 in AppRevisionRepositoryTest.java are indeed commented out
wc -l server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java

Repository: sofastack/sofa-registry

Length of output: 179


🏁 Script executed:

#!/bin/bash
# Read the specific lines 49-212 to verify the commenting claim
sed -n '49,212p' server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java | head -50

Repository: sofastack/sofa-registry

Length of output: 2163


🏁 Script executed:

#!/bin/bash
# Search for other test files that might test AppRevisionRepository functionality
find . -name '*Test.java' -type f | xargs rg -l 'AppRevisionRepository' 2>/dev/null | grep -v AppRevisionRepositoryTest.java

Repository: sofastack/sofa-registry

Length of output: 427


🏁 Script executed:

#!/bin/bash
# Check if there are any uncommented @Test methods in AppRevisionRepositoryTest.java
rg -n '^[^/]*@Test' server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java

Repository: sofastack/sofa-registry

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Check what AppRevisionRepository functionality is tested in other test files
rg -A 5 'AppRevisionRepository' ./server/server/session/src/test/java/com/alipay/sofa/registry/server/session/remoting/handler/AppRevisionSliceHandlerTest.java | head -30

Repository: sofastack/sofa-registry

Length of output: 677


🏁 Script executed:

#!/bin/bash
# Check InterfaceAppsIndexCleanerTest
rg -A 5 'AppRevisionRepository\|InterfaceAppsRepository' ./server/server/meta/src/test/java/com/alipay/sofa/registry/server/meta/cleaner/InterfaceAppsIndexCleanerTest.java | head -30

Repository: sofastack/sofa-registry

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Check MetadataCacheRegistryTest
rg -A 5 'AppRevisionRepository' ./server/server/session/src/test/java/com/alipay/sofa/registry/server/session/metadata/MetadataCacheRegistryTest.java | head -30

Repository: sofastack/sofa-registry

Length of output: 1191


🏁 Script executed:

#!/bin/bash
# Check if the entire test class is still runnable - look for the class definition and any test methods
head -50 server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java

Repository: sofastack/sofa-registry

Length of output: 2366


🏁 Script executed:

#!/bin/bash
# Check if there are ANY @Test methods in the file (even below line 212)
rg -n '@Test' server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java

Repository: sofastack/sofa-registry

Length of output: 122


🏁 Script executed:

#!/bin/bash
# Check the full structure of the test class - see what comes after line 212
tail -5 server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java

Repository: sofastack/sofa-registry

Length of output: 257


🏁 Script executed:

#!/bin/bash
# Check git log to see when these lines were commented out and why
cd server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl && git log --oneline -20 AppRevisionRepositoryTest.java

Repository: sofastack/sofa-registry

Length of output: 86


🏁 Script executed:

#!/bin/bash
# Check for any helper methods or non-test code in the class
rg -n 'buildAppRevisions|BuildNotExistRevision' server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java

Repository: sofastack/sofa-registry

Length of output: 200


🏁 Script executed:

#!/bin/bash
# Check if buildAppRevisions and BuildNotExistRevision methods exist elsewhere in the file
cat -n server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/repository/impl/AppRevisionRepositoryTest.java | grep -E 'buildAppRevisions|BuildNotExistRevision' | head -20

Repository: sofastack/sofa-registry

Length of output: 207


🏁 Script executed:

#!/bin/bash
# Get the full context of the commit that disabled these tests
git show 4d8b2a7e --stat | head -30

Repository: sofastack/sofa-registry

Length of output: 1682


🏁 Script executed:

#!/bin/bash
# Check the parent class AbstractH2DbTestBase for helper methods
rg -n 'buildAppRevisions|BuildNotExistRevision' server/store/jdbc/src/test/java/com/alipay/sofa/registry/jdbc/

Repository: sofastack/sofa-registry

Length of output: 706


All test coverage has been removed from this integration test class.

Every @test method (4 tests total: registerAndQuery, revisionLoad, heartbeatClean, testCountByApp), setup method, and inner class have been commented out. This eliminates the only integration test validating actual JDBC layer behavior including AppRevision registration, querying, caching, and heartbeat logic against a real H2 database.

While other test files reference AppRevisionRepository, they use mocks instead of testing actual database interactions. This removes critical coverage for:

  • JDBC connection handling and transaction management
  • Cache loading, invalidation, and concurrency
  • SQL query correctness and data consistency
  • Heartbeat and cleanup mechanisms with real database state

The commit message ("判断下是否是这个用例" / "Check if this is the test case") suggests exploratory work. Clarify whether this was intentional or accidental, and restore these tests before merge. If they fail, fix the issues rather than disabling coverage.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/unit-test.yml (1)

24-24: Fix the typo in step name.

The step name still contains "Unit Testt" instead of "Unit Test". This typo was previously flagged but remains unfixed.

🔎 Proposed fix
-    - name: Unit Testt
+    - name: Unit Test
🧹 Nitpick comments (1)
server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java (1)

37-37: Consider reverting to explicit imports.

Wildcard imports (import java.util.*;) can reduce code clarity and may lead to naming conflicts. Consider using explicit imports for the specific classes needed (Collections, List, Map, Arrays).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d8b2a7 and 5d3a805.

📒 Files selected for processing (4)
  • .github/workflows/unit-test.yml
  • server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java
  • server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupsTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorkerTest.java
✅ Files skipped from review due to trivial changes (1)
  • server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupsTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/push/LargeChangeAdaptiveDelayWorkerTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (3)
.github/workflows/unit-test.yml (2)

25-26: Good addition for debugging OOM issues.

The MAVEN_OPTS configuration correctly enables heap dumps on OutOfMemoryError, which will help diagnose memory issues during test execution.

Consider ensuring the artifact upload step (lines 29-34) handles cases where the heap dump file doesn't exist gracefully, since OOM may not always occur. The if: always() condition helps, but verify that upload-artifact@v4 won't fail the workflow if the file is missing.


34-34: Verify artifact upload handles missing heap dumps gracefully.

The path changed from a wildcard pattern (/tmp/*.hprof) to a specific file (/tmp/heap-dump.hprof), making it more precise but also more restrictive. If the JVM cannot write to the exact path specified (e.g., file conflicts, permissions), it may create a variant filename, and the artifact won't be captured.

Confirm that upload-artifact@v4 with the if: always() condition will not fail the workflow when the heap dump file is missing (which is the normal case when no OOM occurs). You may also consider adding if-no-files-found: ignore to the artifact upload step to make this behavior explicit:

🔎 Suggested enhancement
     - name: Upload heap dump
       if: always()
       uses: actions/upload-artifact@v4
       with:
         name: heap-dump
         path: /tmp/heap-dump.hprof
+        if-no-files-found: ignore
server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java (1)

390-457: LGTM! Test correctly validates publisher counting with UnPublisher support.

The test comprehensively validates the new getRealPubNum() functionality across all core operations (add, put, remove, clean). The UnPublisher test cases (lines 447-456) correctly verify that unpublished entries are excluded from the real count only after version advancement, which aligns with the PR's design that publishers are marked rather than immediately removed.

Comment on lines 44 to 388
// private static final SyncSlotAcceptorManager ACCEPT_ALL = request -> true;
//
// @Test
// public void testBaseOp() {
// final String dataId = "testDataInfoId";
// long start = DatumVersionUtil.nextId();
// Publisher publisher = TestBaseUtils.createTestPublisher(dataId);
// PublisherGroup group = new PublisherGroup(publisher.getDataInfoId(), "dc");
//
// Assert.assertEquals(group.dataInfoId, publisher.getDataInfoId());
// Assert.assertEquals(group.dataId, publisher.getDataId());
// Assert.assertEquals(group.instanceId, publisher.getInstanceId());
// Assert.assertEquals(group.group, publisher.getGroup());
// Assert.assertEquals(group.dataCenter, "dc");
// Assert.assertTrue(group.getVersion().getValue() > start);
//
// Datum datum = group.toDatum();
//
// Assert.assertEquals(group.dataInfoId, datum.getDataInfoId());
// Assert.assertEquals(group.dataId, datum.getDataId());
// Assert.assertEquals(group.instanceId, datum.getInstanceId());
// Assert.assertEquals(group.group, datum.getGroup());
// Assert.assertEquals(group.dataCenter, datum.getDataCenter());
// Assert.assertEquals(group.getPublishers().size(), datum.getPubMap().size());
// Assert.assertEquals(group.getVersion().getValue(), datum.getVersion());
//
// DatumVersion version = group.getVersion();
// DatumVersion v = group.addPublisher(publisher);
// Assert.assertTrue(v.getValue() > version.getValue());
// Assert.assertEquals(group.getPublishers().size(), 1);
// datum = group.toDatum();
// Assert.assertEquals(group.getPublishers().size(), datum.getPubMap().size());
// Assert.assertTrue(group.getPublishers().get(0) == datum.getPubMap().values().iterator().next());
//
// // add same pub, not change
// v = group.addPublisher(publisher);
// Assert.assertNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
//
// // add version older pub
// Publisher older = TestBaseUtils.cloneBase(publisher);
// older.setVersion(publisher.getVersion() - 1);
// v = group.addPublisher(older);
// Assert.assertNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
//
// // add registerTimestamp older pub
// older = TestBaseUtils.cloneBase(publisher);
// older.setRegisterTimestamp(publisher.getRegisterTimestamp() - 1);
// v = group.addPublisher(older);
// Assert.assertNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
//
// // add newer version
// Publisher newer = TestBaseUtils.createTestPublisher(dataId);
// newer.setRegisterId(publisher.getRegisterId());
// v = group.addPublisher(newer);
// Assert.assertNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
// Assert.assertTrue(group.getPublishers().get(0) == newer);
//
// v = group.addPublisher(newer);
// Assert.assertNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
// Assert.assertTrue(group.getPublishers().get(0) == newer);
//
// newer = TestBaseUtils.cloneBase(newer);
// List<ServerDataBox> dataList = Lists.newArrayList();
// dataList.add(new ServerDataBox("testDataBox1"));
// newer.setDataList(dataList);
// newer.setVersion(newer.getVersion() + 1);
//
// v = group.addPublisher(newer);
// Assert.assertNotNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
// Assert.assertTrue(group.getPublishers().get(0) == newer);
//
// final ProcessId mockProcessId = new ProcessId("xxx", System.currentTimeMillis(), 1, 1);
// v = group.clean(mockProcessId, CleanContinues.ALWAYS);
// Assert.assertNull(v);
//
// v = group.clean(null, CleanContinues.ALWAYS);
// Assert.assertNotNull(v);
// Assert.assertTrue(group.getPublishers().isEmpty());
//
// group.addPublisher(newer);
// v = group.clean(ServerEnv.PROCESS_ID, CleanContinues.ALWAYS);
// Assert.assertNotNull(v);
// Assert.assertTrue(group.getPublishers().isEmpty());
//
// group.addPublisher(newer);
// v = group.remove(ServerEnv.PROCESS_ID, Collections.EMPTY_MAP);
// Assert.assertNull(v);
//
// v = group.remove(mockProcessId, Collections.EMPTY_MAP);
// Assert.assertNull(v);
//
// v =
// group.remove(
// ServerEnv.PROCESS_ID,
// Collections.singletonMap(
// newer.getRegisterId(),
// new RegisterVersion(newer.getVersion() - 1, newer.getRegisterTimestamp())));
// Assert.assertNull(v);
// Assert.assertFalse(group.getPublishers().isEmpty());
//
// v =
// group.remove(
// ServerEnv.PROCESS_ID,
// Collections.singletonMap(
// newer.getRegisterId(),
// new RegisterVersion(newer.getVersion(), newer.getRegisterTimestamp() - 1)));
// Assert.assertNull(v);
// Assert.assertFalse(group.getPublishers().isEmpty());
//
// v =
// group.remove(
// ServerEnv.PROCESS_ID,
// Collections.singletonMap(
// newer.getRegisterId(),
// new RegisterVersion(newer.getVersion() - 1, newer.getRegisterTimestamp() - 1)));
// Assert.assertNull(v);
// Assert.assertFalse(group.getPublishers().isEmpty());
//
// v =
// group.remove(
// ServerEnv.PROCESS_ID,
// Collections.singletonMap(
// newer.getRegisterId(),
// new RegisterVersion(newer.getVersion() + 1, newer.getRegisterTimestamp())));
// Assert.assertNotNull(v);
// Assert.assertTrue(group.getPublishers().isEmpty());
//
// group.compact(Long.MAX_VALUE);
// group.addPublisher(newer);
//
// v =
// group.remove(
// ServerEnv.PROCESS_ID,
// Collections.singletonMap(
// newer.getRegisterId(),
// new RegisterVersion(newer.getVersion(), newer.getRegisterTimestamp() + 1)));
// Assert.assertNotNull(v);
// Assert.assertTrue(group.getPublishers().isEmpty());
//
// group.compact(Long.MAX_VALUE);
// group.addPublisher(newer);
//
// v =
// group.remove(
// ServerEnv.PROCESS_ID,
// Collections.singletonMap(
// newer.getRegisterId() + "aa",
// new RegisterVersion(newer.getVersion(), newer.getRegisterTimestamp())));
// Assert.assertNull(v);
// Assert.assertFalse(group.getPublishers().isEmpty());
//
// v =
// group.remove(
// ServerEnv.PROCESS_ID,
// Collections.singletonMap(
// newer.getRegisterId(),
// new RegisterVersion(newer.getVersion(), newer.getRegisterTimestamp())));
// Assert.assertNotNull(v);
// Assert.assertTrue(group.getPublishers().isEmpty());
//
// // has tombstone
// v = group.addPublisher(newer);
// Assert.assertNull(v);
// Assert.assertTrue(group.getPublishers().isEmpty());
//
// int compactCount = group.compact(Long.MIN_VALUE);
// Assert.assertEquals(0, compactCount);
// compactCount = group.compact(Long.MAX_VALUE);
// Assert.assertEquals(1, compactCount);
// // add again
//
// v = group.addPublisher(newer);
// Assert.assertNotNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
//
// // remove processId=null, no tombstone
// v =
// group.remove(
// null,
// Collections.singletonMap(
// newer.getRegisterId(),
// new RegisterVersion(newer.getVersion(), newer.getRegisterTimestamp())));
// Assert.assertNotNull(v);
// Assert.assertTrue(group.getPublishers().isEmpty());
//
// v = group.addPublisher(newer);
// Assert.assertNotNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
//
// Assert.assertTrue(group.toString().length() != 0);
// }
//
// @Test
// public void testUpdate() {
// final String dataId = "testDataInfoId";
// Publisher publisher = TestBaseUtils.createTestPublisher(dataId);
// PublisherGroup group = new PublisherGroup(publisher.getDataInfoId(), "dc");
// DatumVersion startV = group.getVersion();
// DatumVersion v = group.put(Lists.newArrayList(publisher, publisher));
// Assert.assertNotNull(v);
// Assert.assertTrue(v.getValue() > startV.getValue());
// Assert.assertEquals(group.getPublishers().size(), 1);
// Assert.assertEquals(group.getPublishers().get(0), publisher);
//
// Publisher older = TestBaseUtils.cloneBase(publisher);
// older.setVersion(older.getVersion() - 1);
// v = group.put(Lists.newArrayList(older));
// Assert.assertNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
// Assert.assertEquals(group.getPublishers().get(0), publisher);
//
// Publisher newer = TestBaseUtils.cloneBase(publisher);
// newer.setVersion(publisher.getVersion() + 1);
// v = group.put(Lists.newArrayList(older, newer));
//
// Assert.assertNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
// Assert.assertEquals(group.getPublishers().get(0), newer);
//
// newer = TestBaseUtils.cloneBase(newer);
// List<ServerDataBox> dataList = Lists.newArrayList();
// dataList.add(new ServerDataBox("testDataBox"));
// newer.setDataList(dataList);
//
// newer.setVersion(newer.getVersion() + 1);
// v = group.put(Lists.newArrayList(older, newer));
//
// Assert.assertNotNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
// Assert.assertEquals(group.getPublishers().get(0), newer);
//
// Assert.assertEquals(group.getSessionProcessIds().size(), 1);
// Assert.assertTrue(group.getSessionProcessIds().contains(ServerEnv.PROCESS_ID));
//
// String sessionIp = "notFound";
// Map<String, DatumSummary> sessionSummary = Maps.newHashMap();
// Map<String, Map<String, RegisterVersion>> sessionPubVersions = Maps.newHashMap();
//
// group.foreach(
// DatumBiConsumer.publisherGroupBiConsumer(
// publisher.getDataInfoId(), sessionPubVersions, Sets.newHashSet(sessionIp), ACCEPT_ALL));
// for (Entry<String, Map<String, RegisterVersion>> entry : sessionPubVersions.entrySet()) {
// sessionSummary.put(entry.getKey(), new DatumSummary(group.dataInfoId, entry.getValue()));
// }
//
// DatumSummary summary = sessionSummary.get(sessionIp);
// Assert.assertEquals(sessionSummary.size(), 1);
// Assert.assertEquals(summary.getDataInfoId(), group.dataInfoId);
// Assert.assertEquals(summary.size(), 0);
// Assert.assertEquals(summary.getPublisherVersions().size(), 0);
//
// sessionIp = publisher.getTargetAddress().getIpAddress();
//
// sessionSummary = Maps.newHashMap();
// sessionPubVersions = Maps.newHashMap();
// group.foreach(
// DatumBiConsumer.publisherGroupBiConsumer(
// publisher.getDataInfoId(), sessionPubVersions, Sets.newHashSet(sessionIp), ACCEPT_ALL));
// for (Entry<String, Map<String, RegisterVersion>> entry : sessionPubVersions.entrySet()) {
// sessionSummary.put(entry.getKey(), new DatumSummary(group.dataInfoId, entry.getValue()));
// }
//
// summary = sessionSummary.get(sessionIp);
// Assert.assertEquals(summary.getPublisherVersions().size(), 1);
//
// final ProcessId mockProcessId =
// new ProcessId(ServerEnv.PROCESS_ID.getHostAddress(), System.currentTimeMillis(), 1, 1);
//
// Publisher add = TestBaseUtils.createTestPublisher(dataId);
// add.setTargetAddress(URL.valueOf("192.168.1.1:1000"));
// add.setSessionProcessId(mockProcessId);
// v = group.put(Lists.newArrayList(add));
// Assert.assertNotNull(v);
// Assert.assertEquals(group.getPublishers().size(), 2);
// Assert.assertTrue(group.getPublishers().contains(newer));
// Assert.assertTrue(group.getPublishers().contains(add));
//
// sessionIp = publisher.getTargetAddress().getIpAddress();
//
// sessionSummary = Maps.newHashMap();
// sessionPubVersions = Maps.newHashMap();
// group.foreach(
// DatumBiConsumer.publisherGroupBiConsumer(
// publisher.getDataInfoId(), sessionPubVersions, Sets.newHashSet(sessionIp), ACCEPT_ALL));
// for (Entry<String, Map<String, RegisterVersion>> entry : sessionPubVersions.entrySet()) {
// sessionSummary.put(entry.getKey(), new DatumSummary(group.dataInfoId, entry.getValue()));
// }
// summary = sessionSummary.get(sessionIp);
// Assert.assertEquals(summary.getPublisherVersions().size(), 2);
//
// Map<String, RegisterVersion> publisherVersions = Maps.newHashMap();
// group.foreach(
// DatumBiConsumer.publisherGroupBiConsumer(
// publisher.getDataInfoId(), publisherVersions, ACCEPT_ALL));
// summary = new DatumSummary(group.dataInfoId, publisherVersions);
//
// Assert.assertEquals(summary.getPublisherVersions().size(), 2);
// Assert.assertEquals(group.getSessionProcessIds().size(), 2);
// Assert.assertTrue(group.getSessionProcessIds().contains(ServerEnv.PROCESS_ID));
// Assert.assertTrue(group.getSessionProcessIds().contains(mockProcessId));
//
// Map<String, Publisher> conns = group.getByConnectId(ConnectId.of("unknown:999", "unknown:999"));
// Assert.assertTrue(conns.isEmpty());
//
// conns = group.getByConnectId(add.connectId());
// Assert.assertEquals(conns.size(), 1);
// Assert.assertEquals(conns.get(add.getRegisterId()), add);
//
// conns = group.getByConnectId(newer.connectId());
// Assert.assertEquals(conns.size(), 1);
// Assert.assertEquals(conns.get(newer.getRegisterId()), newer);
//
// v =
// group.remove(
// mockProcessId, Collections.singletonMap(add.getRegisterId(), add.registerVersion()));
// Assert.assertNotNull(v);
// Assert.assertEquals(group.getPublishers().size(), 1);
//
// Assert.assertEquals(group.getSessionProcessIds().size(), 1);
// Assert.assertTrue(group.getSessionProcessIds().contains(ServerEnv.PROCESS_ID));
//
// sessionSummary = Maps.newHashMap();
// sessionPubVersions = Maps.newHashMap();
// group.foreach(
// DatumBiConsumer.publisherGroupBiConsumer(
// publisher.getDataInfoId(), sessionPubVersions, Sets.newHashSet(sessionIp), ACCEPT_ALL));
// for (Entry<String, Map<String, RegisterVersion>> entry : sessionPubVersions.entrySet()) {
// sessionSummary.put(entry.getKey(), new DatumSummary(dataId, entry.getValue()));
// }
// summary = sessionSummary.get(sessionIp);
// Assert.assertEquals(summary.getPublisherVersions().size(), 1);
//
// conns = group.getByConnectId(add.connectId());
// Assert.assertEquals(conns.size(), 0);
//
// conns = group.getByConnectId(newer.connectId());
// Assert.assertEquals(conns.size(), 1);
// Assert.assertEquals(conns.get(newer.getRegisterId()), newer);
// }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check if the file exists and confirm the commented-out tests
git ls-files | grep -i PublisherGroupTest.java

Repository: sofastack/sofa-registry

Length of output: 166


🏁 Script executed:

#!/bin/bash
# Read the file to confirm the commented-out code at lines 44-388
wc -l server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java

Repository: sofastack/sofa-registry

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Get the beginning of the file to see what's at lines 40-60
sed -n '40,60p' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java | cat -n

Repository: sofastack/sofa-registry

Length of output: 1068


🏁 Script executed:

#!/bin/bash
# Check for all @Test annotations in the file to understand test structure
rg -n '@Test' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java

Repository: sofastack/sofa-registry

Length of output: 107


🏁 Script executed:

#!/bin/bash
# Search for commented-out test methods in this specific file
rg -n '//\s*@Test' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java

Repository: sofastack/sofa-registry

Length of output: 107


🏁 Script executed:

#!/bin/bash
# Check the actual state of lines 44-388 to see commented code
sed -n '44,100p' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java | cat -n

Repository: sofastack/sofa-registry

Length of output: 3106


🏁 Script executed:

#!/bin/bash
# Check what's at line 390 (the third commented @Test)
sed -n '388,410p' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java | cat -n

Repository: sofastack/sofa-registry

Length of output: 1159


🏁 Script executed:

#!/bin/bash
# Get a count of total commented lines in the file
grep -c '^//' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java

Repository: sofastack/sofa-registry

Length of output: 70


🏁 Script executed:

#!/bin/bash
# Check the full structure - how many test methods exist and which are commented
sed -n '1,458p' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java | grep -n -E '(public void test|@Test)'

Repository: sofastack/sofa-registry

Length of output: 219


🏁 Script executed:

#!/antml:#!/bin/bash
# Check for NotThreadSafePublisherMap in the PR changes
fd PublisherGroup.java | head -5

Repository: sofastack/sofa-registry

Length of output: 162


🏁 Script executed:

#!/bin/bash
# Search for NotThreadSafePublisherMap in the codebase
rg -l 'NotThreadSafePublisherMap' --type java

Repository: sofastack/sofa-registry

Length of output: 269


🏁 Script executed:

#!/bin/bash
# Look at the actual PublisherGroup implementation to understand storage changes
find . -name 'PublisherGroup.java' -not -path '*/test/*' -type f

Repository: sofastack/sofa-registry

Length of output: 164


🏁 Script executed:

#!/bin/bash
# Read PublisherGroup.java to understand the storage mechanism
wc -l server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/PublisherGroup.java

Repository: sofastack/sofa-registry

Length of output: 166


🏁 Script executed:

#!/bin/bash
# Check what fields PublisherGroup uses for storage
rg -n 'NotThreadSafePublisherMap|publishers|publisherMap' server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/PublisherGroup.java | head -20

Repository: sofastack/sofa-registry

Length of output: 145


🏁 Script executed:

#!/bin/bash
# Look at the class definition and key fields
sed -n '1,100p' server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/PublisherGroup.java | cat -n

Repository: sofastack/sofa-registry

Length of output: 4661


🏁 Script executed:

#!/bin/bash
# Check the git history/status to see if this is a WIP PR
git log --oneline -20 | head -20

Repository: sofastack/sofa-registry

Length of output: 87


🏁 Script executed:

#!/antml:#!/bin/bash
# Check the commit messages related to commented tests
git log --all --oneline --grep="comment" -10

Repository: sofastack/sofa-registry

Length of output: 420


🏁 Script executed:

#!/bin/bash
# Check the current status to see if this is a WIP branch
git status | head -20

Repository: sofastack/sofa-registry

Length of output: 132


🏁 Script executed:

#!/bin/bash
# Look at the current branch
git branch -v

Repository: sofastack/sofa-registry

Length of output: 291


🏁 Script executed:

#!/antml:#!/bin/bash
# Check the diff to see what changed in the test file
git diff HEAD server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java | head -100

Repository: sofastack/sofa-registry

Length of output: 49


🏁 Script executed:

#!/an/bash
# Count how many lines are commented in the test file
awk '/^\/\//{count++} END{print "Commented lines:", count}' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java

Repository: sofastack/sofa-registry

Length of output: 87


🏁 Script executed:

#!/bin/bash
# Check if there are any active (non-commented) test methods in the file
sed -n '1,458p' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java | grep -E '^[^/].*@Test|^[^/].*public void test'

Repository: sofastack/sofa-registry

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Look at the end of the test file to see if there are any active tests
tail -50 server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java | cat -n

Repository: sofastack/sofa-registry

Length of output: 2503


🏁 Script executed:

#!/bin/bash
# Verify if there are ANY uncommented test methods or assertions in the file
sed -n '1,458p' server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java | grep -v '^//' | grep -E '@Test|public void|Assert\.' | head -20

Repository: sofastack/sofa-registry

Length of output: 49


🏁 Script executed:

#!/bin/bash
# Check the complete file structure to confirm everything is commented
head -3 server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java && tail -3 server/server/data/src/test/java/com/alipay/sofa/registry/server/data/cache/PublisherGroupTest.java

Repository: sofastack/sofa-registry

Length of output: 262


🏁 Script executed:

#!/bin/bash
# Look at NotThreadSafePublisherMap to understand if it's new
wc -l server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java
head -50 server/server/data/src/main/java/com/alipay/sofa/registry/server/data/cache/NotThreadSafePublisherMap.java | tail -30

Repository: sofastack/sofa-registry

Length of output: 859


Uncomment and fix all disabled tests before merge.

All test methods in this file have been commented out (414 of 458 lines). This includes three complete test methods:

  • testBaseOp() covering add, remove, clean, version handling, and tombstone operations
  • testUpdate() covering put operations, session tracking, and connection management
  • testPublisherCount() covering publisher counting logic

With the entire test suite disabled, there is no verification that your changes to PublisherGroup and the new NotThreadSafePublisherMap implementation function correctly. The commit message ("尝试找一下有问题的用例") indicates you're debugging which tests fail, but disabling all tests hides any regressions introduced by the storage mechanism change.

Action required: Fix the test failures, uncomment all test methods, and ensure they pass before this PR can be merged.

@hui-cha hui-cha force-pushed the perf/cpu-mem-optimization branch from 3df8a4a to 30f1bcb Compare December 30, 2025 10:49
@hui-cha hui-cha force-pushed the perf/cpu-mem-optimization branch from c8b2daf to b384043 Compare December 30, 2025 11:43
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6aef58f and 26b773b.

📒 Files selected for processing (2)
  • pom.xml
  • test/pom.xml
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/pom.xml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (1)
pom.xml (1)

9-9: LGTM!

Version bump to 6.6.2 aligns with the PR objectives.

<junit.vintage.version>${junit.version}.2</junit.vintage.version>
<junit.platform.version>1.0.2</junit.platform.version>
<sofa.hessian.version>3.3.12</sofa.hessian.version>
<sofa.hessian.version>3.5.6</sofa.hessian.version>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if Hessian 3.5.6 is available in Maven Central
echo "=== Checking Maven Central for Hessian 3.5.6 ==="
curl -s "https://search.maven.org/solrsearch/select?q=g:com.alipay.sofa+AND+a:hessian+AND+v:3.5.6&rows=5&wt=json" | jq '.response.docs[] | {id: .id, version: .v, timestamp: .timestamp}' 2>/dev/null || echo "Query failed or jq not available"

echo -e "\n=== Direct artifact check via Maven Central API ==="
curl -s "https://search.maven.org/artifact/com.alipay.sofa/hessian/3.5.6/jar" -I 2>&1 | head -10

echo -e "\n=== Checking Alipay's public nexus (alternative) ==="
curl -s "https://oss.sonatype.org/service/local/lucene/search?q=hessian&g=com.alipay.sofa&v=3.5.6" 2>&1 | head -20 || echo "Alternative repo not accessible"

Repository: sofastack/sofa-registry

Length of output: 742


🏁 Script executed:

#!/bin/bash
# Check for repository declarations in pom.xml
echo "=== Checking pom.xml for repository configurations ==="
grep -n "<repositories>" pom.xml -A 50 | head -60

echo -e "\n=== Checking for Alipay/Sofastack repository URLs ==="
grep -i "alipay\|sofastack\|repo" pom.xml | head -20

Repository: sofastack/sofa-registry

Length of output: 4028


Hessian 3.5.6 is not available in public Maven repositories.

Verification confirms that version 3.5.6 does not exist in Maven Central or OSS Sonatype. The build depends on internal Alipay repositories (mvn.test.alipay.net and mvn.dev.alipay.net) that are not publicly accessible. This contradicts the earlier indication that the official version would be released—it has not been published to public repositories. This creates a build dependency on proprietary Alipay infrastructure, which will prevent external developers from building this project.

🤖 Prompt for AI Agents
In pom.xml at line 76, the property <sofa.hessian.version> is set to 3.5.6 which
is not published to public Maven repositories; update the project to use a
Hessian version that exists on Maven Central (or another public repo) or remove
reliance on internal Alipay repos. Replace the property value with a publicly
available version (verify the exact latest available version on Maven Central
before changing), remove any internal-only repository entries that enable 3.5.6,
and then run a build to confirm the artifact resolves; if you must keep 3.5.6
temporarily, add a clear comment and a ticket to publish or vendor the artifact
so external contributors can build.

pom.xml Outdated
Comment on lines 636 to 638
<!-- <argLine>@{argLine} -Xmx4096m -XX:PermSize=256m -XX:MaxPermSize=512m</argLine> -->
<!-- modify at 25.12.31: 这里调大堆大小是因为发现 master 分支也突然跑不过 ut 了,先调大,后续分析下为啥需要这么多内存 -->
<argLine>@{argLine} -Xmx12g -XX:PermSize=512m -XX:MaxPermSize=1g</argLine>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use correct JDK 8 memory options and investigate excessive memory requirement.

Two major concerns with the test memory configuration:

  1. Incorrect JVM options for JDK 8: The configuration uses -XX:PermSize and -XX:MaxPermSize, which are JDK 7 options that were removed in JDK 8 (the project targets JDK 8 per lines 65-66). Line 635's comment correctly notes JDK 8 should use -XX:MetaspaceSize and -XX:MaxMetaspaceSize instead.

  2. Excessive memory allocation without root cause analysis: Increasing heap from 4GB to 12GB (3x) is a concerning workaround. The comment acknowledges this is temporary pending investigation, but such excessive memory requirements often indicate:

    • Memory leaks in tests
    • Improper test isolation
    • Inefficient test data setup
    • Tests that should be integration tests rather than unit tests
🔎 Recommended configuration using correct JDK 8 options
-    <!-- 根据JDK版本配置,jdk8: -Xmx1024m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -->
-    <!-- <argLine>@{argLine} -Xmx4096m -XX:PermSize=256m -XX:MaxPermSize=512m</argLine> -->
-    <!-- modify at 25.12.31: 这里调大堆大小是因为发现 master 分支也突然跑不过 ut 了,先调大,后续分析下为啥需要这么多内存 -->
-    <argLine>@{argLine} -Xmx12g -XX:PermSize=512m -XX:MaxPermSize=1g</argLine>
+    <!-- 根据JDK版本配置,jdk8: -Xmx1024m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -->
+    <!-- TODO: Investigate why tests require 12GB heap - potential memory leak or inefficient test setup -->
+    <argLine>@{argLine} -Xmx12g -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=1g</argLine>

Would you like help identifying which tests are consuming excessive memory? I can generate scripts to profile test memory usage by class or module.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In pom.xml around lines 636-638, the argLine uses JDK7-only options
(-XX:PermSize/-XX:MaxPermSize) and an overly large heap; replace those with JDK8
options (-XX:MetaspaceSize and -XX:MaxMetaspaceSize), revert the heap to a sane
default (e.g., -Xmx4g) and run the test suite; if tests still require >4g, add
targeted memory profiling (heap dumps or per-test memory measurement) to find
leaks/inefficient tests and only then increase Xmx after fixing root causes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pom.xml (2)

76-76: Verify Hessian 3.5.6 availability before merge.

Per the previous discussion, confirm that com.alipay.sofa:hessian:3.5.6 has been released to Maven Central before merging this PR. The Hessian PR was linked (sofastack/sofa-hessian#124), but public CI will fail if the artifact is not yet published.

#!/bin/bash
# Check if Hessian 3.5.6 is available in Maven Central
curl -s "https://search.maven.org/solrsearch/select?q=g:com.alipay.sofa+AND+a:hessian+AND+v:3.5.6&rows=1&wt=json" | jq '.response.numFound'

636-638: Use correct JDK 8 memory options (same issue as flagged previously).

This has the same issue as previously noted: -XX:PermSize and -XX:MaxPermSize are JDK 7 options that don't exist in JDK 8. Replace with -XX:MetaspaceSize and -XX:MaxMetaspaceSize.

🔎 Recommended fix
-    <!-- <argLine>@{argLine} -Xmx4096m -XX:PermSize=256m -XX:MaxPermSize=512m</argLine> -->
-    <!-- modify at 25.12.31: 这里调大堆大小是因为发现 master 分支也突然跑不过 ut 了,先调大,后续分析下为啥需要这么多内存 -->
-    <argLine>@{argLine} -Xmx14g -XX:PermSize=512m -XX:MaxPermSize=1g</argLine>
+    <!-- TODO: Investigate why tests require 14GB heap - potential memory leak or inefficient test setup -->
+    <argLine>@{argLine} -Xmx14g -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=1g</argLine>
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d37cbf and 5181f93.

📒 Files selected for processing (2)
  • pom.xml
  • test/pom.xml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (2)
test/pom.xml (1)

8-8: LGTM!

Version update to 6.6.2 is consistent with the root pom.

pom.xml (1)

9-9: LGTM!

Version bump to 6.6.2 is appropriate for the new release.

test/pom.xml Outdated
Comment on lines 125 to 127
<!-- modify at 25.12.31: 这里调大堆大小是因为发现 master 分支也突然跑不过 ut 了,先调大,后续分析下为啥需要这么多内存 -->
<!-- <argLine>@{argLine} -Xmx6144m -XX:PermSize=256m -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dfile.encoding=UTF8</argLine> -->
<argLine>@{argLine} -Xmx14g -XX:PermSize=512m -XX:MaxPermSize=1g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dfile.encoding=UTF8</argLine>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use correct JDK 8 memory options.

The project targets JDK 8 (per maven.compiler.source=1.8 in root pom), but -XX:PermSize and -XX:MaxPermSize are JDK 7 options that were removed in JDK 8. These options will be silently ignored. Use -XX:MetaspaceSize and -XX:MaxMetaspaceSize instead.

Also, 14GB heap is a significant increase. The comment acknowledges this needs investigation—consider creating a tracking issue.

🔎 Recommended fix using correct JDK 8 options
-                            <!-- modify at 25.12.31: 这里调大堆大小是因为发现 master 分支也突然跑不过 ut 了,先调大,后续分析下为啥需要这么多内存 -->
-                            <!-- <argLine>@{argLine} -Xmx6144m -XX:PermSize=256m -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dfile.encoding=UTF8</argLine> -->
-                            <argLine>@{argLine} -Xmx14g -XX:PermSize=512m -XX:MaxPermSize=1g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dfile.encoding=UTF8</argLine>
+                            <!-- TODO: Investigate why tests require 14GB heap - potential memory leak or inefficient test setup -->
+                            <argLine>@{argLine} -Xmx14g -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=1g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dfile.encoding=UTF8</argLine>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<!-- modify at 25.12.31: 这里调大堆大小是因为发现 master 分支也突然跑不过 ut 了,先调大,后续分析下为啥需要这么多内存 -->
<!-- <argLine>@{argLine} -Xmx6144m -XX:PermSize=256m -XX:MaxPermSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dfile.encoding=UTF8</argLine> -->
<argLine>@{argLine} -Xmx14g -XX:PermSize=512m -XX:MaxPermSize=1g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dfile.encoding=UTF8</argLine>
<!-- TODO: Investigate why tests require 14GB heap - potential memory leak or inefficient test setup -->
<argLine>@{argLine} -Xmx14g -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=1g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dfile.encoding=UTF8</argLine>
🤖 Prompt for AI Agents
In test/pom.xml around lines 125 to 127, the argLine uses JDK7 PermSize options
and a very large 14g heap; replace -XX:PermSize and -XX:MaxPermSize with the
JDK8 equivalents -XX:MetaspaceSize and -XX:MaxMetaspaceSize (adjust values as
appropriate) and consider reducing the -Xmx value to a safer size while opening
a tracking issue to investigate why tests require such large heap memory; ensure
the new argLine uses -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
-Dfile.encoding=UTF8 along with the Metaspace options.

@hui-cha hui-cha force-pushed the perf/cpu-mem-optimization branch from 5181f93 to 6ee7aed Compare January 5, 2026 08:25
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Fix all issues with AI Agents 🤖
In
@server/server/session/src/test/java/com/alipay/sofa/registry/server/session/utils/UnitTestGCMonitor.java:
- Around line 101-115: stopMonitoring() fails to set the running flag to false,
preventing proper restart; after interrupting and nulling monitoryThread in
stopMonitoring(), assign this.running = false (while still holding the lock) so
the state reflects the monitor has stopped; update the method that references
running (stopMonitoring and any restart logic) to rely on this flag accordingly.
- Around line 182-186: In UnitTestGCMonitor (inside the while(true) loop where
Thread.interrupted() is checked), the code currently logs an interruption but
keeps looping; update the interruption handling to actually exit the loop by
adding a break (or return if this is inside the run() method) immediately after
the LOGGER.error call so the thread stops instead of continuing.
- Around line 82-99: The startMonitoring() method never sets the running flag,
so calling it multiple times spawns multiple threads and stopMonitoring()’s
early-exit check never works; inside startMonitoring() (in the same locked
section after verifying !this.running and before creating/starting the Thread),
set this.running = true, then create and start the Thread and assign
this.monitoryThread so stopMonitoring() can observe the flag and stop the single
monitor thread correctly.
🧹 Nitpick comments (5)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/connections/ConnectionsServiceTest.java (1)

37-86: Consider splitting into focused test methods with descriptive names.

The test method test() covers multiple scenarios: empty state checks, mocked connectivity testing, and filtering logic. Splitting this into separate, focused test methods with descriptive names would improve maintainability and make test failures easier to diagnose.

Example structure:

  • testGetConnectionsReturnsEmptyWhenNoConnections()
  • testGetConnectionsWithMockedServerReturnsClientAddresses()
  • testGetIpConnectsFiltersConnectionsByRemoteIp()
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/utils/UnitTestGCMonitor.java (2)

54-54: Typo in field name.

monitoryThread should be monitorThread.

🔎 Proposed fix
-  private Thread monitoryThread;
+  private Thread monitorThread;

126-127: Using LOGGER.error() for informational messages.

These log statements are routine operational info, not errors. Consider using LOGGER.info() or LOGGER.debug() for better log hygiene and to avoid noise in error monitoring.

Also applies to: 133-133, 141-145

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/providedata/FetchClientOffAddressService.java (2)

390-398: Consider adding error handling for guaranteed cleanup.

The shutdown method calls super.shutdownWatchDog() before closing internal components. If the superclass method throws an exception, clientManagerProcessor and clientOpenFailWatchDog won't be closed, potentially leaving test resources uncleaned.

🔎 Proposed fix with try-finally
 @VisibleForTesting
 @Override
 public void shutdownWatchDog() {
-  super.shutdownWatchDog();
-
-  this.clientManagerProcessor.close();
-  this.clientOpenFailWatchDog.close();
+  try {
+    super.shutdownWatchDog();
+  } finally {
+    this.clientManagerProcessor.close();
+    this.clientOpenFailWatchDog.close();
+  }
 }

44-44: Consider upgrading to Apache Commons Collections 4.x.

The project uses Apache Commons Collections 3.2.2 (via the import org.apache.commons.collections.CollectionUtils), which is from the deprecated 3.x version. The modern version 4.x (org.apache.commons.collections4) includes bug fixes and improvements. If feasible, update the dependency to 4.x and change the import accordingly.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5181f93 and 6ee7aed.

📒 Files selected for processing (14)
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/providedata/FetchClientOffAddressService.java
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/providedata/FetchDataInfoIDBlackListService.java
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/store/AbstractDataManager.java
  • server/server/session/src/main/java/com/alipay/sofa/registry/server/session/store/DataIndexer.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/connections/ConnectionsServiceTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/converter/pb/BoltEncodeTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/providedata/FetchClientOffServiceTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/providedata/FetchDataInfoIDBlackListServiceTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/store/DataCacheTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/store/SessionDataStoreTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/store/SessionInterestsTest.java
  • server/server/session/src/test/java/com/alipay/sofa/registry/server/session/utils/UnitTestGCMonitor.java
  • server/server/shared/src/main/java/com/alipay/sofa/registry/server/shared/providedata/AbstractFetchPersistenceSystemProperty.java
  • server/server/shared/src/main/java/com/alipay/sofa/registry/server/shared/providedata/AbstractFetchSystemPropertyService.java
🧰 Additional context used
🧬 Code graph analysis (4)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/store/SessionDataStoreTest.java (4)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/ConnectId.java (1)
  • ConnectId (31-151)
server/common/util/src/main/java/com/alipay/sofa/registry/util/StringFormatter.java (1)
  • StringFormatter (22-29)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/PublisherUtils.java (1)
  • PublisherUtils (36-119)
server/common/util/src/main/java/com/alipay/sofa/registry/util/ConcurrentUtils.java (1)
  • ConcurrentUtils (34-122)
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/providedata/FetchDataInfoIDBlackListService.java (1)
server/common/util/src/main/java/com/alipay/sofa/registry/util/CollectionUtils.java (1)
  • CollectionUtils (31-76)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/connections/ConnectionsServiceTest.java (1)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/ConnectId.java (1)
  • ConnectId (31-151)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/store/DataCacheTest.java (5)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/store/DataInfo.java (1)
  • DataInfo (29-180)
server/common/util/src/main/java/com/alipay/sofa/registry/net/NetUtil.java (1)
  • NetUtil (42-228)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/store/URL.java (1)
  • URL (32-244)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/constants/ValueConstants.java (1)
  • ValueConstants (26-200)
server/common/model/src/main/java/com/alipay/sofa/registry/common/model/store/BaseInfo.java (1)
  • BaseInfo (33-448)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (18)
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/converter/pb/BoltEncodeTest.java (2)

34-36: LGTM! Import organization follows Java conventions.

The JUnit imports are correctly placed and required for the test annotations and assertions used in the file.


47-47: Minor formatting adjustment - acceptable.

The additional blank line provides visual separation before the test method.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/connections/ConnectionsServiceTest.java (2)

39-52: Good resource management pattern.

The try-finally structure with null-safe initialization ensures that sessionInterests and sessionWatchers are properly cleaned up even if exceptions occur during test setup. This prevents resource leaks in the test execution.


77-85: Proper resource cleanup with defensive null checks.

The finally block correctly checks for null before calling shutdownWatchDog() on both resources, preventing potential NPEs while ensuring cleanup always occurs.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/utils/UnitTestGCMonitor.java (3)

1-34: LGTM!

License header and imports are appropriate for the GC monitoring functionality.


192-199: LGTM for JDK 8 focus.

The FGC detection covers common JDK 8 collectors. If future JDK versions (ZGC, Shenandoah) need support, additional patterns would be required.


201-232: LGTM!

The heap dump generation logic is well-implemented with proper directory creation and error handling.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/providedata/FetchClientOffServiceTest.java (1)

93-96: LGTM! Proper test resource cleanup.

The addition of shutdownWatchDog() ensures the watchdog thread is terminated after the test suite completes, preventing resource leaks.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/store/DataIndexer.java (2)

26-30: LGTM! Improved import clarity.

Replacing the wildcard import with explicit imports improves readability and avoids potential naming conflicts.


149-152: LGTM! Test-visible shutdown hook added.

The shutdownWatchDog() method provides a clean way for tests to terminate the indexer refresher thread, preventing resource leaks during test execution.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/store/SessionInterestsTest.java (1)

40-50: LGTM! Enhanced test isolation and cleanup.

The refactored lifecycle methods ensure:

  • Each test gets a fresh SessionInterests instance via @Before
  • Watchdog resources are properly released via @After
  • Nullifying interests prevents accidental reuse between tests
server/server/session/src/main/java/com/alipay/sofa/registry/server/session/providedata/FetchDataInfoIDBlackListService.java (1)

233-239: LGTM! Test-visible shutdown hook with proper cleanup order.

The shutdownWatchDog() override correctly:

  • Calls super.shutdownWatchDog() first to ensure parent cleanup
  • Then closes the internal watchDog component
  • Is properly scoped for test-only usage via @VisibleForTesting
server/server/session/src/test/java/com/alipay/sofa/registry/server/session/providedata/FetchDataInfoIDBlackListServiceTest.java (1)

77-220: LGTM! Excellent refactor for guaranteed resource cleanup.

The refactored test now uses a try-finally pattern that ensures:

  • Resources are properly shut down even if the test fails or throws an exception
  • Null checks prevent NPEs if initialization fails early
  • Test behavior remains unchanged while improving reliability

This is a best practice for test resource management.

server/server/session/src/main/java/com/alipay/sofa/registry/server/session/store/AbstractDataManager.java (1)

259-262: LGTM: Clean test hook for watchdog shutdown.

The @VisibleForTesting annotation appropriately signals that this method is for test cleanup. The delegation to connectDataIndexer.shutdownWatchDog() provides a clean way for tests to terminate background threads gracefully.

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/store/DataCacheTest.java (1)

54-95: LGTM: Robust test cleanup pattern applied consistently.

The try-finally refactoring ensures that shutdownWatchDog() is invoked even if tests encounter exceptions. The null-safe cleanup pattern (initialize to null, check before shutdown) prevents potential NPEs and guarantees proper resource cleanup across all test methods.

Also applies to: 97-147, 149-172, 174-208, 210-236, 308-385, 387-470, 472-542, 544-602

server/server/session/src/test/java/com/alipay/sofa/registry/server/session/store/SessionDataStoreTest.java (1)

38-135: LGTM: Consistent cleanup pattern applied to both test methods.

The try-finally blocks ensure proper watchdog shutdown in both test() and testConcurrent(). The null-safe cleanup guards prevent NPEs while guaranteeing resource cleanup even when tests fail during setup or execution.

Also applies to: 142-260

server/server/shared/src/main/java/com/alipay/sofa/registry/server/shared/providedata/AbstractFetchSystemPropertyService.java (1)

169-172: LGTM: Test hook for watchdog shutdown.

The @VisibleForTesting shutdownWatchDog() method provides a clean way for tests to terminate the background WatchDog thread via watchDog.close(), ensuring proper resource cleanup in test scenarios.

server/server/shared/src/main/java/com/alipay/sofa/registry/server/shared/providedata/AbstractFetchPersistenceSystemProperty.java (1)

130-133: LGTM: Test hook for watchdog shutdown.

The @VisibleForTesting shutdownWatchDog() method mirrors the implementation in AbstractFetchSystemPropertyService, providing consistent test cleanup semantics across the codebase by delegating to watchDog.close().

Comment on lines +82 to +99
public void startMonitoring() {
this.lock.lock();
try {
if (this.running) {
LOGGER.error("[UnitTestGCMonitor] GC 监控已经开启了");
return;
}

LOGGER.error("[UnitTestGCMonitor] 开始监控 GC 情况");

Thread monitoryThread = new Thread(this, "UnitTestGCMonitorThread");
monitoryThread.start();

this.monitoryThread = monitoryThread;
} finally {
this.lock.unlock();
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing running = true assignment causes multiple bugs.

The running flag is never set to true in startMonitoring(). This causes:

  1. Multiple calls to startMonitoring() will spawn multiple monitor threads
  2. stopMonitoring() will always exit early (line 104 check always passes)
🔎 Proposed fix
   public void startMonitoring() {
     this.lock.lock();
     try {
       if (this.running) {
         LOGGER.error("[UnitTestGCMonitor] GC 监控已经开启了");
         return;
       }

       LOGGER.error("[UnitTestGCMonitor] 开始监控 GC 情况");

       Thread monitoryThread = new Thread(this, "UnitTestGCMonitorThread");
       monitoryThread.start();

       this.monitoryThread = monitoryThread;
+      this.running = true;
     } finally {
       this.lock.unlock();
     }
   }
🤖 Prompt for AI Agents
In
@server/server/session/src/test/java/com/alipay/sofa/registry/server/session/utils/UnitTestGCMonitor.java
around lines 82-99, The startMonitoring() method never sets the running flag, so
calling it multiple times spawns multiple threads and stopMonitoring()’s
early-exit check never works; inside startMonitoring() (in the same locked
section after verifying !this.running and before creating/starting the Thread),
set this.running = true, then create and start the Thread and assign
this.monitoryThread so stopMonitoring() can observe the flag and stop the single
monitor thread correctly.

Comment on lines +101 to +115
public void stopMonitoring() {
this.lock.lock();
try {
if (!this.running) {
LOGGER.error("[UnitTestGCMonitor] GC 监控已经停止");
return;
}

// 中断线程,然后退出
this.monitoryThread.interrupt();
this.monitoryThread = null;
} finally {
this.lock.unlock();
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing running = false assignment.

After interrupting the thread, running should be set to false to allow restarting monitoring if needed.

🔎 Proposed fix
   public void stopMonitoring() {
     this.lock.lock();
     try {
       if (!this.running) {
         LOGGER.error("[UnitTestGCMonitor] GC 监控已经停止");
         return;
       }

       // 中断线程,然后退出
       this.monitoryThread.interrupt();
       this.monitoryThread = null;
+      this.running = false;
     } finally {
       this.lock.unlock();
     }
   }
🤖 Prompt for AI Agents
In
@server/server/session/src/test/java/com/alipay/sofa/registry/server/session/utils/UnitTestGCMonitor.java
around lines 101-115, stopMonitoring() fails to set the running flag to false,
preventing proper restart; after interrupting and nulling monitoryThread in
stopMonitoring(), assign this.running = false (while still holding the lock) so
the state reflects the monitor has stopped; update the method that references
running (stopMonitoring and any restart logic) to rely on this flag accordingly.

Comment on lines +182 to +186
if (Thread.interrupted()) {
// 响应中断
LOGGER.error("[UnitTestGCMonitor] 监控 GC 线程被中断,退出!");
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Thread interrupt check doesn't exit the loop.

When Thread.interrupted() returns true, the code logs but continues in the while(true) loop. Add a break or return statement to properly exit.

🔎 Proposed fix
         if (Thread.interrupted()) {
           // 响应中断
           LOGGER.error("[UnitTestGCMonitor] 监控 GC 线程被中断,退出!");
+          return;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (Thread.interrupted()) {
// 响应中断
LOGGER.error("[UnitTestGCMonitor] 监控 GC 线程被中断,退出!");
}
}
if (Thread.interrupted()) {
// 响应中断
LOGGER.error("[UnitTestGCMonitor] 监控 GC 线程被中断,退出!");
return;
}
}
🤖 Prompt for AI Agents
In
@server/server/session/src/test/java/com/alipay/sofa/registry/server/session/utils/UnitTestGCMonitor.java
around lines 182-186, In UnitTestGCMonitor (inside the while(true) loop where
Thread.interrupted() is checked), the code currently logs an interruption but
keeps looping; update the interruption handling to actually exit the loop by
adding a break (or return if this is inside the run() method) immediately after
the LOGGER.error call so the thread stops instead of continuing.

@hui-cha hui-cha changed the title 【WIP】增加针对大推送的自适应攒批的能力 增加针对大推送的自适应攒批的能力 Jan 5, 2026
@huanglongchao huanglongchao merged commit db2e321 into sofastack:master Jan 5, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants