Conversation

@nixel2007
Member

@nixel2007 commented Dec 28, 2025

Description

Related issues

Closes

Checklist

General

  • The PR branch has been updated from develop
  • Debug code, commented-out code, and other leftover fragments have been removed
  • The changes are covered by tests
  • The mandatory pre-commit steps have been performed (ran the gradlew precommit command)

For diagnostics

  • The diagnostic description is filled in for both languages (files for both languages are present; the Russian one is filled in in detail, the English translation may be omitted)

Дополнительно

Summary by CodeRabbit

  • Refactor

    • Reworked semantic tokens pipeline to use compact array-based token storage and delta encoding for improved performance while preserving API compatibility.
    • Enhanced edit computation to better handle same-line text insertions and line-offset scenarios; caching updated to match new representation.
  • Tests

    • Added a unit test validating optimal delta generation for same-line insertions.


Copilot AI review requested due to automatic review settings December 28, 2025 10:56
@coderabbitai
Contributor

coderabbitai bot commented Dec 28, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

This change refactors SemanticTokensProvider to store delta-encoded token data as int[] instead of List<Integer>, adds toDeltaEncodedArray/toList(int[]) helpers, and rewrites computeEdits to operate on int[] token groups (size 5) with line-offset-aware suffix matching and updated caching logic.

Changes

  • Token Data Storage & Delta Encoding Optimization (src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java): Replaced List<Integer> with int[] for delta-encoded token storage in CachedTokenData and the pipeline. Added toDeltaEncodedArray and toList(int[]). Reworked computeEdits(int[] prev, int[] curr) to work on arrays, compute edits by token groups (size 5), handle line offsets via findSuffixMatchWithOffset, and update the caching/serialization paths.
  • Tests (src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java): Added deltaWithTextInsertedOnSameLine_shouldReturnOptimalDelta to validate that a same-line insert produces an optimal single edit and that the delta is smaller than the full token payload.
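The compact storage described in the walkthrough can be illustrated with a standalone sketch. Everything here is a simplification: SemanticTokenEntry is a hypothetical stand-in for the provider's real class, and toDeltaEncodedArray/toList only mirror the helpers' described behavior, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaEncodingSketch {

  // Hypothetical stand-in for the provider's token entry (line/start are absolute).
  record SemanticTokenEntry(int line, int start, int length, int type, int modifiers) {}

  static final int TOKEN_SIZE = 5;

  // Encode absolute positions into the LSP delta format: each token stores
  // deltaLine relative to the previous token, and deltaStart relative to the
  // previous token only when both are on the same line.
  static int[] toDeltaEncodedArray(List<SemanticTokenEntry> sorted) {
    int[] data = new int[sorted.size() * TOKEN_SIZE];
    int prevLine = 0;
    int prevChar = 0;
    int index = 0;
    for (SemanticTokenEntry entry : sorted) {
      int deltaLine = entry.line() - prevLine;
      int deltaStart = entry.start() - (deltaLine == 0 ? prevChar : 0);
      data[index++] = deltaLine;
      data[index++] = deltaStart;
      data[index++] = entry.length();
      data[index++] = entry.type();
      data[index++] = entry.modifiers();
      prevLine = entry.line();
      prevChar = entry.start();
    }
    return data;
  }

  // Box only at the API boundary, when handing data to LSP4J.
  static List<Integer> toList(int[] data) {
    List<Integer> result = new ArrayList<>(data.length);
    for (int value : data) {
      result.add(value);
    }
    return result;
  }

  public static void main(String[] args) {
    var tokens = List.of(
        new SemanticTokenEntry(0, 0, 5, 1, 0),   // line 0, col 0
        new SemanticTokenEntry(0, 10, 3, 2, 0),  // same line, col 10
        new SemanticTokenEntry(2, 4, 7, 1, 0));  // two lines down, col 4
    int[] encoded = toDeltaEncodedArray(tokens);
    // deltas: [0,0,...], [0,10,...] (same line), [2,4,...] (new line resets start)
    System.out.println(toList(encoded));
  }
}
```

Working with primitives internally and boxing once at the response boundary is the main point of the refactor described above.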

Sequence Diagram(s)

(omitted — changes are internal to a single provider and do not introduce new multi-component control flow)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • semanticTokens #3510 — Introduces the SemanticTokensProvider and related semantic-tokens plumbing; closely related to the provider changes here.

Suggested labels

hacktoberfest-accepted

Suggested reviewers

  • theshadowco

Poem

🐰 Arrays hop in, lists take flight,
Five-token beats through day and night,
Offsets matched with clever art,
Deltas shrink and play their part,
A rabbit cheers — small code, big delight! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 60.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The pull request title clearly describes the two main changes, optimization of semantic token storage and a fix to delta calculation, which directly align with the core modifications in the changeset.


if (prevDeltaLine == currDeltaLine) {
  // Exact match
  suffixMatch++;
} else if (!foundBoundary && currDeltaLine - prevDeltaLine == lineOffset) {
Member Author

@copilot does this take text insertion without a line break into account?

Member Author

@coderabbitai or you can answer instead

Contributor

@nixel2007, yes, the code takes text insertion without a line break into account, but it handles it differently from inserting/deleting whole lines.

When text is inserted on the same line (without \n):

  • Tokens on that line get changed deltaStart values (their positions shift)
  • Tokens on subsequent lines keep the same deltaLine values

The algorithm detects such changes by checking all token fields in findSuffixMatchWithOffset:

// First check all fields except deltaLine
for (int j = 1; j < tokenSize; j++) {  // j=1 is deltaStart
  if (prev[prevIdx + j] != curr[currIdx + j]) {
    otherFieldsMatch = false;
    break;
  }
}

If deltaStart differs, the tokens are not considered matching and fall into the edited region.

The lineOffset optimization is specifically for vertical shifts (line insertion/deletion), where deltaLine changes but the tokens themselves stay identical. There is no such optimization for horizontal shifts (insertion on the same line), but that is correct behavior: the token positions really did change and need updating.
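The distinction discussed above can be seen directly in the encoded arrays. The sketch below is illustrative only (token fields reduced to line/start, values invented, not taken from the PR): it encodes the same three tokens before and after inserting four characters on the first line.

```java
import java.util.Arrays;
import java.util.List;

public class InlineInsertSketch {

  // Hypothetical minimal token: length/type/modifiers omitted for brevity.
  record Token(int line, int start) {}

  // Minimal delta encoder over (deltaLine, deltaStart) pairs.
  static int[] encode(List<Token> tokens) {
    int[] out = new int[tokens.size() * 2];
    int prevLine = 0;
    int prevChar = 0;
    int i = 0;
    for (Token t : tokens) {
      int deltaLine = t.line() - prevLine;
      out[i++] = deltaLine;
      out[i++] = t.start() - (deltaLine == 0 ? prevChar : 0);
      prevLine = t.line();
      prevChar = t.start();
    }
    return out;
  }

  public static void main(String[] args) {
    // Before: two tokens on line 0 (cols 0 and 10), one token on line 1 (col 2).
    int[] before = encode(List.of(new Token(0, 0), new Token(0, 10), new Token(1, 2)));
    // After inserting 4 characters at column 5 of line 0: only the second
    // token's column moves; the token on line 1 is untouched.
    int[] after = encode(List.of(new Token(0, 0), new Token(0, 14), new Token(1, 2)));
    System.out.println(Arrays.toString(before)); // [0, 0, 0, 10, 1, 2]
    System.out.println(Arrays.toString(after));  // [0, 0, 0, 14, 1, 2]
    // All deltaLine values are identical; exactly one deltaStart differs.
    // This is the boundary case a suffix matcher must allow when lineOffset == 0.
  }
}
```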

Contributor

Copilot AI commented Dec 28, 2025

@nixel2007 I've opened a new pull request, #3707, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (1)

376-389: Minor: first flag is redundant.

The first flag handling is correct but slightly redundant. Since prevLine and prevChar are initialized to 0, the first iteration would compute the same values without the conditional:

  • deltaLine = tokenEntry.line() - 0 = tokenEntry.line()
  • deltaStart = tokenEntry.start() - 0 = tokenEntry.start()

This is a minor readability observation; the current code is clear in intent.

🔎 Optional simplification
     var prevLine = 0;
     var prevChar = 0;
     var index = 0;
-    var first = true;

     for (SemanticTokenEntry tokenEntry : sorted) {
-      int deltaLine = first ? tokenEntry.line() : (tokenEntry.line() - prevLine);
+      int deltaLine = tokenEntry.line() - prevLine;
       int prevCharOrZero = (deltaLine == 0) ? prevChar : 0;
-      int deltaStart = first ? tokenEntry.start() : (tokenEntry.start() - prevCharOrZero);
+      int deltaStart = tokenEntry.start() - prevCharOrZero;

       data[index++] = deltaLine;
       data[index++] = deltaStart;
       data[index++] = tokenEntry.length();
       data[index++] = tokenEntry.type();
       data[index++] = tokenEntry.modifiers();

       prevLine = tokenEntry.line();
       prevChar = tokenEntry.start();
-      first = false;
     }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e09504 and 65674c2.

📒 Files selected for processing (1)
  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
🧰 Additional context used
📓 Path-based instructions (2)
src/main/java/**/*.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

src/main/java/**/*.java: Follow the Style Guide for code formatting and conventions
Use Lombok annotations to reduce boilerplate code and enable annotation processing in IDE
Write JavaDoc for public APIs and include comments for complex logic
Use meaningful, descriptive names for classes and methods following Java naming conventions
Optimize imports before committing; DO NOT optimize imports across the entire project unless specifically working on that task

Files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
src/**/*.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use Java 17 for language and follow Java naming conventions

Files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
🧠 Learnings (1)
📚 Learning: 2025-11-19T09:02:03.154Z
Learnt from: theshadowco
Repo: 1c-syntax/bsl-language-server PR: 3610
File: src/main/java/com/github/_1c_syntax/bsl/languageserver/folding/QueryPackageFoldingRangeSupplier.java:26-26
Timestamp: 2025-11-19T09:02:03.154Z
Learning: The bsl-language-server project uses a custom ANTLR4 build from 1c-syntax (io.github.1c-syntax:antlr4), which includes the base class org.antlr.v4.runtime.Tokenizer with getAst() and getTokens() methods. SDBLTokenizer and BSLTokenizer inherit from this base class.

Applied to files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: build
  • GitHub Check: Benchmark
  • GitHub Check: build (25, windows-latest)
  • GitHub Check: build (25, ubuntu-latest)
  • GitHub Check: build (17, windows-latest)
  • GitHub Check: build (17, ubuntu-latest)
  • GitHub Check: build (21, ubuntu-latest)
  • GitHub Check: build (21, windows-latest)
  • GitHub Check: build (21, macOS-latest)
  • GitHub Check: CodeQL analysis (java)
  • GitHub Check: Agent
  • GitHub Check: build (25, windows-latest)
  • GitHub Check: build (21, windows-latest)
  • GitHub Check: build (21, ubuntu-latest)
  • GitHub Check: build (25, macOS-latest)
  • GitHub Check: build (25, ubuntu-latest)
  • GitHub Check: build (17, macOS-latest)
  • GitHub Check: build (17, ubuntu-latest)
  • GitHub Check: build (17, windows-latest)
  • GitHub Check: build (21, macOS-latest)
🔇 Additional comments (6)
src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (6)

94-97: Good optimization using primitive arrays.

Switching from List<Integer> to int[] reduces memory overhead and avoids boxing costs. Since CachedTokenData is keyed by resultId and equality on the record itself isn't used for lookups, the reference-based array equality is not a concern here.


113-120: Clean separation: primitives internally, boxing only at API boundary.

The approach of computing with int[] and converting via toList() only when constructing the LSP response is efficient and correct.


220-290: Well-structured delta computation with token grouping.

The algorithm correctly:

  1. Finds the first differing token while tracking the accumulated line position.
  2. Computes the line offset to account for inserted/deleted lines.
  3. Finds matching suffix tokens with line offset tolerance.
  4. Generates a single edit covering the changed range.

The logic handles the semantic token structure (groups of 5) properly.


292-344: Correct handling of delta-encoded line offset at boundary.

The suffix matching logic correctly implements the insight that in delta encoding, only the first token after an edit point will have its deltaLine shifted by lineOffset. The foundBoundary flag ensures we only accept one such boundary token, and all subsequent suffix tokens must match exactly. This is the correct behavior for semantic token delta encoding.
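The insight above can be demonstrated on deltaLine values alone. This is an illustrative sketch with a hypothetical helper, not the provider's code: inserting one line after the first token shifts only the first subsequent deltaLine by the offset, and every later deltaLine is unchanged.

```java
import java.util.Arrays;

public class LineOffsetSketch {

  // Delta-encode a sequence of absolute line numbers (other token fields omitted).
  static int[] deltaLines(int[] absoluteLines) {
    int[] out = new int[absoluteLines.length];
    int prev = 0;
    for (int i = 0; i < absoluteLines.length; i++) {
      out[i] = absoluteLines[i] - prev;
      prev = absoluteLines[i];
    }
    return out;
  }

  public static void main(String[] args) {
    // Tokens on absolute lines 0, 2, 5.
    int[] before = deltaLines(new int[]{0, 2, 5});
    // One line inserted after line 0 pushes every later token down by 1.
    int[] after = deltaLines(new int[]{0, 3, 6});
    System.out.println(Arrays.toString(before)); // [0, 2, 3]
    System.out.println(Arrays.toString(after));  // [0, 3, 3]
    // Only the first token after the edit point absorbs the lineOffset (+1);
    // every later deltaLine is identical, which is why a suffix matcher can
    // accept exactly one offset boundary and require exact matches afterwards.
  }
}
```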


395-397: LGTM!

Clean and idiomatic conversion. Using toList() (which returns an unmodifiable list) is appropriate for the LSP4J response.


134-159: Thread safety consideration for cache operations.

The sequence of get, put, and remove operations on tokenCache is not atomic. However, since each resultId is unique (UUID-based) and clients maintain their own resultId sequence, there's no practical race condition. Different clients won't share or interfere with each other's cached data.

If you anticipate scenarios where the same previousResultId could be used concurrently (unlikely with LSP's request-response model), consider verifying that LSP clients serialize delta requests per document.
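If that ever became a concern, one option is to consume the previous cache entry atomically. The sketch below is illustrative only; CachedTokenData, takePrevious, and putCurrent are hypothetical stand-ins, not the provider's actual API.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class TokenCacheSketch {

  // Hypothetical stand-in for the provider's cached record.
  record CachedTokenData(int[] data) {}

  private final ConcurrentHashMap<String, CachedTokenData> tokenCache = new ConcurrentHashMap<>();

  // Atomically consume the previous entry: ConcurrentHashMap.remove returns the
  // value and deletes it in one step, so two concurrent delta requests carrying
  // the same previousResultId cannot both observe the cached data.
  CachedTokenData takePrevious(String previousResultId) {
    return tokenCache.remove(previousResultId);
  }

  // Store the current encoding under a fresh UUID-based resultId.
  String putCurrent(int[] data) {
    String resultId = UUID.randomUUID().toString();
    tokenCache.put(resultId, new CachedTokenData(data));
    return resultId;
  }

  public static void main(String[] args) {
    var cache = new TokenCacheSketch();
    String id = cache.putCurrent(new int[]{0, 0, 5, 1, 0});
    System.out.println(cache.takePrevious(id) != null);  // first taker wins
    System.out.println(cache.takePrevious(id) == null);  // already consumed
  }
}
```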

Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes the storage and manipulation of semantic tokens by replacing List<Integer> with primitive int[] arrays, reducing memory overhead and boxing costs. Additionally, it fixes the delta calculation algorithm to correctly handle line offsets when tokens are inserted or deleted in the middle of a document.

Key Changes:

  • Changed internal token data storage from List<Integer> to int[] for better memory efficiency
  • Rewrote the computeEdits method with a more sophisticated algorithm that accounts for delta-encoded line positions
  • Added findSuffixMatchWithOffset helper method to correctly match suffix tokens when lines are inserted/deleted

Comment on lines +235 to +241
outer:
for (int i = 0; i < minTokens; i++) {
int base = i * TOKEN_SIZE;
for (int j = 0; j < TOKEN_SIZE; j++) {
if (prev[base + j] != curr[base + j]) {
firstDiffToken = i;
break outer;
Copilot AI Dec 28, 2025

The label name 'outer' is too generic and doesn't convey meaningful information about the loop's purpose. Consider using a more descriptive name like 'findFirstDiff' or 'searchPrefixMatch' to make the control flow clearer.

Suggested change

-    outer:
+    findFirstDiff:
     for (int i = 0; i < minTokens; i++) {
       int base = i * TOKEN_SIZE;
       for (int j = 0; j < TOKEN_SIZE; j++) {
         if (prev[base + j] != curr[base + j]) {
           firstDiffToken = i;
-          break outer;
+          break findFirstDiff;

-    while (prefixMatch < minSize && previousData.get(prefixMatch).equals(currentData.get(prefixMatch))) {
-      prefixMatch++;
+  private static List<SemanticTokensEdit> computeEdits(int[] prev, int[] curr) {
+    final int TOKEN_SIZE = 5;
Copilot AI Dec 28, 2025

The constant TOKEN_SIZE is defined locally in multiple methods (computeEdits and findSuffixMatchWithOffset). Consider extracting this as a class-level private static final constant to avoid duplication and ensure consistency across methods. This would make the code more maintainable if the token structure ever changes.

Comment on lines +220 to 290
  private static List<SemanticTokensEdit> computeEdits(int[] prev, int[] curr) {
    final int TOKEN_SIZE = 5;

    int prevTokenCount = prev.length / TOKEN_SIZE;
    int currTokenCount = curr.length / TOKEN_SIZE;

    if (prevTokenCount == 0 && currTokenCount == 0) {
      return List.of();
    }

-    // If both are identical, return empty edits
-    if (prefixMatch == previousData.size() && prefixMatch == currentData.size()) {
+    // Find the first differing token and accumulate the prefix deltaLine sum
+    int firstDiffToken = 0;
+    int prefixAbsLine = 0;
+    int minTokens = Math.min(prevTokenCount, currTokenCount);
+
+    outer:
+    for (int i = 0; i < minTokens; i++) {
+      int base = i * TOKEN_SIZE;
+      for (int j = 0; j < TOKEN_SIZE; j++) {
+        if (prev[base + j] != curr[base + j]) {
+          firstDiffToken = i;
+          break outer;
+        }
+      }
+      prefixAbsLine += prev[base]; // accumulate deltaLine
+      firstDiffToken = i + 1;
+    }
+
+    // If all tokens are identical
+    if (firstDiffToken == minTokens && prevTokenCount == currTokenCount) {
      return List.of();
    }

-    // Find the last differing index (from the end)
-    int suffixMatch = 0;
-    while (suffixMatch < minSize - prefixMatch
-      && previousData.get(previousData.size() - 1 - suffixMatch)
-        .equals(currentData.get(currentData.size() - 1 - suffixMatch))) {
-      suffixMatch++;
-    }
+    // Compute the line offset incrementally from prefixAbsLine
+    int prevSuffixAbsLine = prefixAbsLine;
+    for (int i = firstDiffToken; i < prevTokenCount; i++) {
+      prevSuffixAbsLine += prev[i * TOKEN_SIZE];
+    }
+    int currSuffixAbsLine = prefixAbsLine;
+    for (int i = firstDiffToken; i < currTokenCount; i++) {
+      currSuffixAbsLine += curr[i * TOKEN_SIZE];
+    }
+    int lineOffset = currSuffixAbsLine - prevSuffixAbsLine;
+
+    // Find the last differing token, allowing for the line offset
+    int suffixMatchTokens = findSuffixMatchWithOffset(prev, curr, firstDiffToken, lineOffset, TOKEN_SIZE);
+
+    // Compute the edit boundaries
+    int deleteEndToken = prevTokenCount - suffixMatchTokens;
+    int insertEndToken = currTokenCount - suffixMatchTokens;

-    // Calculate the range to replace
-    int deleteStart = prefixMatch;
-    int deleteCount = previousData.size() - prefixMatch - suffixMatch;
-    int insertEnd = currentData.size() - suffixMatch;
+    int deleteStart = firstDiffToken * TOKEN_SIZE;
+    int deleteCount = (deleteEndToken - firstDiffToken) * TOKEN_SIZE;
+    int insertEnd = insertEndToken * TOKEN_SIZE;

    if (deleteCount == 0 && deleteStart == insertEnd) {
      return List.of();
    }

-    // Extract the data to insert
-    List<Integer> insertData = currentData.subList(prefixMatch, insertEnd);
+    // Build the insert list from an array slice
+    List<Integer> insertData = toList(Arrays.copyOfRange(curr, deleteStart, insertEnd));

    var edit = new SemanticTokensEdit();
    edit.setStart(deleteStart);
    edit.setDeleteCount(deleteCount);
    if (!insertData.isEmpty()) {
-      edit.setData(new ArrayList<>(insertData));
+      edit.setData(insertData);
    }

    return List.of(edit);
  }
Copilot AI Dec 28, 2025

The test coverage for the new delta calculation algorithm should be expanded. The existing tests (deltaWithLineInsertedAtBeginning and deltaWithLineInsertedInMiddle) verify that the delta is smaller than the full data, but they don't verify the correctness of the actual edit content. Consider adding assertions to verify that applying the edit to the previous data actually produces the current data, to ensure the delta calculation is correct.
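The verification suggested here can be written mechanically. In the sketch below, applyEdit is a hypothetical helper (not part of the PR) that splices a single SemanticTokensEdit-style edit (start, deleteCount, insert data) into the previous int array and checks that the result equals the current array.

```java
import java.util.Arrays;

public class ApplyEditSketch {

  // Apply one splice edit to prev: delete deleteCount ints at start,
  // then insert the given data in their place.
  static int[] applyEdit(int[] prev, int start, int deleteCount, int[] insert) {
    int[] result = new int[prev.length - deleteCount + insert.length];
    System.arraycopy(prev, 0, result, 0, start);
    System.arraycopy(insert, 0, result, start, insert.length);
    System.arraycopy(prev, start + deleteCount, result, start + insert.length,
        prev.length - start - deleteCount);
    return result;
  }

  public static void main(String[] args) {
    // Two tokens before; three tokens after (one token of 5 ints inserted at position 5).
    int[] prev = {0, 0, 5, 1, 0, 1, 2, 3, 1, 0};
    int[] curr = {0, 0, 5, 1, 0, 1, 0, 4, 2, 0, 1, 2, 3, 1, 0};
    int[] applied = applyEdit(prev, 5, 0, new int[]{1, 0, 4, 2, 0});
    System.out.println(Arrays.equals(applied, curr)); // true
  }
}
```

Asserting `applied` equals `curr` in tests would catch any delta that is small but wrong, which is the gap the comment points out.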

Comment on lines +275 to 278
if (deleteCount == 0 && deleteStart == insertEnd) {
return List.of();
}

Copilot AI Dec 28, 2025

The condition check at line 275 may be incorrect. When deleteCount is 0 and deleteStart equals insertEnd, it means there's nothing to delete and nothing to insert. However, this check uses deleteStart (which is firstDiffToken * TOKEN_SIZE) rather than checking if there's actually any difference. A more correct check would be to verify if deleteCount is 0 and the insert range (from deleteStart to insertEnd) is empty, but the current logic already handles this correctly. Consider whether this early return is truly necessary, as the edit creation below would naturally produce an edit with deleteCount=0 and empty insertData, which might be semantically valid for LSP.

Suggested change
if (deleteCount == 0 && deleteStart == insertEnd) {
return List.of();
}

@nixel2007
Member Author

@copilot address the review comments

Contributor

Copilot AI commented Dec 28, 2025

@nixel2007 I've opened a new pull request, #3708, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits December 28, 2025 11:06
When text is inserted on the same line without line breaks (lineOffset=0),
the boundary token may have a different deltaStart value while keeping
the same deltaLine. The algorithm now accounts for this case by allowing
deltaStart to differ for the boundary token when lineOffset=0.

Addresses comment from @nixel2007 about text insertion without line breaks.

Co-authored-by: nixel2007 <[email protected]>
@github-actions
Contributor

github-actions bot commented Dec 28, 2025

Test Results

2 781 files  ±0  2 781 suites  ±0   51m 4s ⏱️ + 3m 2s
1 109 tests +1  1 109 ✅ +1  0 💤 ±0  0 ❌ ±0 
9 981 runs  +9  9 981 ✅ +9  0 💤 ±0  0 ❌ ±0 

Results for commit 9ee333e. ± Comparison against base commit 6e09504.

This pull request removes 8 and adds 5 tests. Note that renamed tests count towards both.

Removed:
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [2] content=#, count=1
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [6] content=, count=0
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [7] content=#, count=1
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [8] content=A, count=1

Added:
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [2] content=, count=0
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [6] content=A, count=1
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [7] content=#, count=1
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [8] content=#, count=1
  • com.github._1c_syntax.bsl.languageserver.providers.SemanticTokensProviderTest ‑ deltaWithTextInsertedOnSameLine_shouldReturnOptimalDelta()

♻️ This comment has been updated with latest results.

Copilot AI and others added 5 commits December 28, 2025 11:16
- Added DELTA_LINE_INDEX and DELTA_START_INDEX constants for clarity
- Simplified redundant condition check
- Improved test assertion to compare against new token size

Co-authored-by: nixel2007 <[email protected]>
…test

Added detailed assertions on decoded token fields (line, start, length, type, modifiers)
for both original and modified token arrays to verify the exact token structure.

Co-authored-by: nixel2007 <[email protected]>
Added specific assertions on delta edit fields:
- start position (5 - after prefix match)
- deleteCount (0 - no deletion needed due to suffix match)
- data size (10 integers = 2 tokens to insert)

The test now verifies that with lineOffset=0, the suffix matching correctly
identifies "А" and ";" tokens despite different deltaStart positions.

Co-authored-by: nixel2007 <[email protected]>
Fix inline text insertion handling in semantic token delta calculation
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (2)

221-221: Extract TOKEN_SIZE as class-level constant.

The TOKEN_SIZE constant is defined locally here and also used in findSuffixMatchWithOffset. Consider extracting it as a private static final int class-level constant to avoid duplication and ensure consistency.


235-241: Consider more descriptive label name.

The label outer is too generic. Consider a more descriptive name like findFirstDiff or prefixScan to clarify the loop's purpose.

🧹 Nitpick comments (1)
src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (1)

300-361: Excellent fix for inline text insertion handling.

This method correctly implements the fix for inline text insertions (lineOffset==0) by allowing deltaStart to differ for the boundary token (lines 318-320, 343-348). This addresses the bug mentioned in the retrieved learnings where tokens after an inline insertion have the same deltaLine but different deltaStart.

Optional refactor: Consider extracting DELTA_LINE_INDEX and DELTA_START_INDEX as class-level constants alongside the suggested TOKEN_SIZE constant for consistency.

Based on learnings, this fix correctly handles the case where the old algorithm failed to match tokens when lineOffset=0 because it checked deltaStart equality.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65674c2 and 9ee333e.

📒 Files selected for processing (2)
  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
  • src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java
🧰 Additional context used
📓 Path-based instructions (3)
src/test/java/**/*Test.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Maintain or improve test coverage and use appropriate test frameworks (JUnit, AssertJ, Mockito)

Files:

  • src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java
src/**/*.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use Java 17 for language and follow Java naming conventions

Files:

  • src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java
  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
src/main/java/**/*.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

src/main/java/**/*.java: Follow the Style Guide for code formatting and conventions
Use Lombok annotations to reduce boilerplate code and enable annotation processing in IDE
Write JavaDoc for public APIs and include comments for complex logic
Use meaningful, descriptive names for classes and methods following Java naming conventions
Optimize imports before committing; DO NOT optimize imports across the entire project unless specifically working on that task

Files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: nixel2007
Repo: 1c-syntax/bsl-language-server PR: 0
File: :0-0
Timestamp: 2025-12-28T11:15:32.317Z
Learning: In PR #3706 about semantic token optimization, I incorrectly stated that the algorithm in findSuffixMatchWithOffset correctly handles inline text insertions (text insertion without line breaks). The actual bug is that when lineOffset=0, tokens after the insertion point have the same deltaLine but different deltaStart, and the old algorithm failed to match them because it checked deltaStart equality. PR #3707 fixes this by allowing deltaStart to differ for the boundary token when lineOffset==0.
📚 Learning: 2025-12-28T11:15:32.317Z
Learnt from: nixel2007
Repo: 1c-syntax/bsl-language-server PR: 0
File: :0-0
Timestamp: 2025-12-28T11:15:32.317Z
Learning: In PR #3706 about semantic token optimization, I incorrectly stated that the algorithm in findSuffixMatchWithOffset correctly handles inline text insertions (text insertion without line breaks). The actual bug is that when lineOffset=0, tokens after the insertion point have the same deltaLine but different deltaStart, and the old algorithm failed to match them because it checked deltaStart equality. PR #3707 fixes this by allowing deltaStart to differ for the boundary token when lineOffset==0.

Applied to files:

  • src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java
  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
📚 Learning: 2025-11-19T09:02:03.154Z
Learnt from: theshadowco
Repo: 1c-syntax/bsl-language-server PR: 3610
File: src/main/java/com/github/_1c_syntax/bsl/languageserver/folding/QueryPackageFoldingRangeSupplier.java:26-26
Timestamp: 2025-11-19T09:02:03.154Z
Learning: The bsl-language-server project uses a custom ANTLR4 build from 1c-syntax (io.github.1c-syntax:antlr4), which includes the base class org.antlr.v4.runtime.Tokenizer with getAst() and getTokens() methods. SDBLTokenizer and BSLTokenizer inherit from this base class.

Applied to files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Analyse
  • GitHub Check: build (25, windows-latest)
  • GitHub Check: build (21, macOS-latest)
  • GitHub Check: build (25, ubuntu-latest)
  • GitHub Check: build (21, windows-latest)
  • GitHub Check: build (21, ubuntu-latest)
  • GitHub Check: build (17, ubuntu-latest)
  • GitHub Check: build (25, macOS-latest)
  • GitHub Check: build (17, macOS-latest)
  • GitHub Check: build (17, windows-latest)
  • GitHub Check: build
  • GitHub Check: Benchmark
🔇 Additional comments (4)
src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java (1)

1344-1420: LGTM! Comprehensive test for inline text insertion.

This test effectively validates the fix for inline text insertions (lineOffset=0) mentioned in the retrieved learnings. The test covers:

  • Token structure validation before and after the edit
  • Delta response verification
  • Detailed assertions on edit properties (start, deleteCount, data size)
  • Optimality check to ensure the delta is smaller than full tokens

The detailed comments (lines 1396-1404) clearly explain the expected behavior, making the test easy to understand and maintain.

Based on learnings, this test addresses the specific bug where tokens after an inline insertion have the same deltaLine but different deltaStart.

src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (3)

94-96: Good optimization using primitive array.

Changing from List<Integer> to int[] reduces boxing/unboxing overhead and improves memory efficiency for cached token data.


378-410: LGTM! Efficient array-based encoding.

The refactoring to use int[] instead of List<Integer> reduces boxing overhead during delta encoding. The logic remains correct while improving performance.


412-414: LGTM! Clean conversion helper.

Simple and effective utility method for converting int[] to List<Integer> when interfacing with the LSP4J API.

@sonarqubecloud

sonarqubecloud bot commented Dec 28, 2025

@nixel2007 nixel2007 merged commit 116269d into develop Dec 28, 2025
42 checks passed
@nixel2007 nixel2007 deleted the feature/optimize-delta-semantic-tokens branch December 28, 2025 14:33