Conversation

@nixel2007
Member

@nixel2007 commented Dec 28, 2025

Description

Related issues

Closes

Checklist

General

  • The PR branch has been updated from develop
  • Debug code, commented-out code, and other leftover fragments have been removed
  • The changes are covered by tests
  • The mandatory pre-commit steps have been performed (ran the gradlew precommit command)

For diagnostics

  • The diagnostic description is filled in for both languages (files for both languages are present; the Russian one is filled in in detail, the English translation may be omitted)

Дополнительно

Summary by CodeRabbit

  • Refactor

    • Reworked semantic tokens pipeline to use compact array-based token storage and delta encoding for improved performance while preserving API compatibility.
    • Enhanced edit computation to better handle same-line text insertions and line-offset scenarios; caching updated to match new representation.
  • Tests

    • Added a unit test validating optimal delta generation for same-line insertions.


Copilot AI review requested due to automatic review settings December 28, 2025 10:56
@coderabbitai
Contributor

coderabbitai bot commented Dec 28, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

This change refactors SemanticTokensProvider to store delta-encoded token data as int[] instead of List<Integer>, adds toDeltaEncodedArray/toList(int[]) helpers, and rewrites computeEdits to operate on int[] token groups (size 5) with line-offset-aware suffix matching and updated caching logic.

Changes

  • Token Data Storage & Delta Encoding Optimization (src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java): Replaced List<Integer> with int[] for delta-encoded token storage in CachedTokenData and the pipeline. Added toDeltaEncodedArray and toList(int[]). Reworked computeEdits(int[] prev, int[] curr) to work on arrays, compute edits by token groups (size 5), handle line offsets via findSuffixMatchWithOffset, and update the caching/serialization paths.
  • Tests (src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java): Added deltaWithTextInsertedOnSameLine_shouldReturnOptimalDelta to validate that a same-line insert produces an optimal single edit and that the delta is smaller than the full token payload.
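The compact storage described in the walkthrough can be illustrated with a standalone sketch. Everything here is a simplification: SemanticTokenEntry is a hypothetical stand-in for the provider's real class, and toDeltaEncodedArray/toList only mirror the helpers' described behavior, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaEncodingSketch {

  // Hypothetical stand-in for the provider's token entry (line/start are absolute).
  record SemanticTokenEntry(int line, int start, int length, int type, int modifiers) {}

  static final int TOKEN_SIZE = 5;

  // Encode absolute positions into the LSP delta format: each token stores
  // deltaLine relative to the previous token, and deltaStart relative to the
  // previous token only when both are on the same line.
  static int[] toDeltaEncodedArray(List<SemanticTokenEntry> sorted) {
    int[] data = new int[sorted.size() * TOKEN_SIZE];
    int prevLine = 0;
    int prevChar = 0;
    int index = 0;
    for (SemanticTokenEntry entry : sorted) {
      int deltaLine = entry.line() - prevLine;
      int deltaStart = entry.start() - (deltaLine == 0 ? prevChar : 0);
      data[index++] = deltaLine;
      data[index++] = deltaStart;
      data[index++] = entry.length();
      data[index++] = entry.type();
      data[index++] = entry.modifiers();
      prevLine = entry.line();
      prevChar = entry.start();
    }
    return data;
  }

  // Box only at the API boundary, when handing data to LSP4J.
  static List<Integer> toList(int[] data) {
    List<Integer> result = new ArrayList<>(data.length);
    for (int value : data) {
      result.add(value);
    }
    return result;
  }

  public static void main(String[] args) {
    var tokens = List.of(
        new SemanticTokenEntry(0, 0, 5, 1, 0),   // line 0, col 0
        new SemanticTokenEntry(0, 10, 3, 2, 0),  // same line, col 10
        new SemanticTokenEntry(2, 4, 7, 1, 0));  // two lines down, col 4
    int[] encoded = toDeltaEncodedArray(tokens);
    // deltas: [0,0,...], [0,10,...] (same line), [2,4,...] (new line resets start)
    System.out.println(toList(encoded));
  }
}
```

Working with primitives internally and boxing once at the response boundary is the main point of the refactor described above.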

Sequence Diagram(s)

(omitted — changes are internal to a single provider and do not introduce new multi-component control flow)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • semanticTokens #3510 — Introduces the SemanticTokensProvider and related semantic-tokens plumbing; closely related to the provider changes here.

Suggested labels

hacktoberfest-accepted

Suggested reviewers

  • theshadowco

Poem

🐰 Arrays hop in, lists take flight,
Five-token beats through day and night,
Offsets matched with clever art,
Deltas shrink and play their part,
A rabbit cheers — small code, big delight! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 60.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The pull request title clearly describes the two main changes, optimization of semantic token storage and a fix to delta calculation, which directly align with the core modifications in the changeset.


if (prevDeltaLine == currDeltaLine) {
  // Exact match
  suffixMatch++;
} else if (!foundBoundary && currDeltaLine - prevDeltaLine == lineOffset) {
Member Author

@copilot does this take text insertion without a line break into account?

Member Author

@coderabbitai or you can answer instead

Contributor

@nixel2007, yes, the code takes text insertion without a line break into account, but it handles it differently from inserting/deleting whole lines.

When text is inserted on the same line (without \n):

  • Tokens on that line get changed deltaStart values (their positions shift)
  • Tokens on subsequent lines keep the same deltaLine values

The algorithm detects such changes by checking all token fields in findSuffixMatchWithOffset:

// First check all fields except deltaLine
for (int j = 1; j < tokenSize; j++) {  // j=1 is deltaStart
  if (prev[prevIdx + j] != curr[currIdx + j]) {
    otherFieldsMatch = false;
    break;
  }
}

If deltaStart differs, the tokens are not considered matching and fall into the edited region.

The lineOffset optimization is specifically for vertical shifts (line insertion/deletion), where deltaLine changes but the tokens themselves stay identical. There is no such optimization for horizontal shifts (insertion on the same line), but that is correct behavior: the token positions really did change and need updating.
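The distinction discussed above can be seen directly in the encoded arrays. The sketch below is illustrative only (token fields reduced to line/start, values invented, not taken from the PR): it encodes the same three tokens before and after inserting four characters on the first line.

```java
import java.util.Arrays;
import java.util.List;

public class InlineInsertSketch {

  // Hypothetical minimal token: length/type/modifiers omitted for brevity.
  record Token(int line, int start) {}

  // Minimal delta encoder over (deltaLine, deltaStart) pairs.
  static int[] encode(List<Token> tokens) {
    int[] out = new int[tokens.size() * 2];
    int prevLine = 0;
    int prevChar = 0;
    int i = 0;
    for (Token t : tokens) {
      int deltaLine = t.line() - prevLine;
      out[i++] = deltaLine;
      out[i++] = t.start() - (deltaLine == 0 ? prevChar : 0);
      prevLine = t.line();
      prevChar = t.start();
    }
    return out;
  }

  public static void main(String[] args) {
    // Before: two tokens on line 0 (cols 0 and 10), one token on line 1 (col 2).
    int[] before = encode(List.of(new Token(0, 0), new Token(0, 10), new Token(1, 2)));
    // After inserting 4 characters at column 5 of line 0: only the second
    // token's column moves; the token on line 1 is untouched.
    int[] after = encode(List.of(new Token(0, 0), new Token(0, 14), new Token(1, 2)));
    System.out.println(Arrays.toString(before)); // [0, 0, 0, 10, 1, 2]
    System.out.println(Arrays.toString(after));  // [0, 0, 0, 14, 1, 2]
    // All deltaLine values are identical; exactly one deltaStart differs.
    // This is the boundary case a suffix matcher must allow when lineOffset == 0.
  }
}
```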

Contributor

Copilot AI commented Dec 28, 2025

@nixel2007 I've opened a new pull request, #3707, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (1)

376-389: Minor: first flag is redundant.

The first flag handling is correct but slightly redundant. Since prevLine and prevChar are initialized to 0, the first iteration would compute the same values without the conditional:

  • deltaLine = tokenEntry.line() - 0 = tokenEntry.line()
  • deltaStart = tokenEntry.start() - 0 = tokenEntry.start()

This is a minor readability observation; the current code is clear in intent.

🔎 Optional simplification
     var prevLine = 0;
     var prevChar = 0;
     var index = 0;
-    var first = true;

     for (SemanticTokenEntry tokenEntry : sorted) {
-      int deltaLine = first ? tokenEntry.line() : (tokenEntry.line() - prevLine);
+      int deltaLine = tokenEntry.line() - prevLine;
       int prevCharOrZero = (deltaLine == 0) ? prevChar : 0;
-      int deltaStart = first ? tokenEntry.start() : (tokenEntry.start() - prevCharOrZero);
+      int deltaStart = tokenEntry.start() - prevCharOrZero;

       data[index++] = deltaLine;
       data[index++] = deltaStart;
       data[index++] = tokenEntry.length();
       data[index++] = tokenEntry.type();
       data[index++] = tokenEntry.modifiers();

       prevLine = tokenEntry.line();
       prevChar = tokenEntry.start();
-      first = false;
     }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e09504 and 65674c2.

📒 Files selected for processing (1)
  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
🧰 Additional context used
📓 Path-based instructions (2)
src/main/java/**/*.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

src/main/java/**/*.java: Follow the Style Guide for code formatting and conventions
Use Lombok annotations to reduce boilerplate code and enable annotation processing in IDE
Write JavaDoc for public APIs and include comments for complex logic
Use meaningful, descriptive names for classes and methods following Java naming conventions
Optimize imports before committing; DO NOT optimize imports across the entire project unless specifically working on that task

Files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
src/**/*.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use Java 17 for language and follow Java naming conventions

Files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
🧠 Learnings (1)
📚 Learning: 2025-11-19T09:02:03.154Z
Learnt from: theshadowco
Repo: 1c-syntax/bsl-language-server PR: 3610
File: src/main/java/com/github/_1c_syntax/bsl/languageserver/folding/QueryPackageFoldingRangeSupplier.java:26-26
Timestamp: 2025-11-19T09:02:03.154Z
Learning: The bsl-language-server project uses a custom ANTLR4 build from 1c-syntax (io.github.1c-syntax:antlr4), which includes the base class org.antlr.v4.runtime.Tokenizer with getAst() and getTokens() methods. SDBLTokenizer and BSLTokenizer inherit from this base class.

Applied to files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: build
  • GitHub Check: Benchmark
  • GitHub Check: build (25, windows-latest)
  • GitHub Check: build (25, ubuntu-latest)
  • GitHub Check: build (17, windows-latest)
  • GitHub Check: build (17, ubuntu-latest)
  • GitHub Check: build (21, ubuntu-latest)
  • GitHub Check: build (21, windows-latest)
  • GitHub Check: build (21, macOS-latest)
  • GitHub Check: CodeQL analysis (java)
  • GitHub Check: Agent
  • GitHub Check: build (25, windows-latest)
  • GitHub Check: build (21, windows-latest)
  • GitHub Check: build (21, ubuntu-latest)
  • GitHub Check: build (25, macOS-latest)
  • GitHub Check: build (25, ubuntu-latest)
  • GitHub Check: build (17, macOS-latest)
  • GitHub Check: build (17, ubuntu-latest)
  • GitHub Check: build (17, windows-latest)
  • GitHub Check: build (21, macOS-latest)
🔇 Additional comments (6)
src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (6)

94-97: Good optimization using primitive arrays.

Switching from List<Integer> to int[] reduces memory overhead and avoids boxing costs. Since CachedTokenData is keyed by resultId and equality on the record itself isn't used for lookups, the reference-based array equality is not a concern here.


113-120: Clean separation: primitives internally, boxing only at API boundary.

The approach of computing with int[] and converting via toList() only when constructing the LSP response is efficient and correct.


220-290: Well-structured delta computation with token grouping.

The algorithm correctly:

  1. Finds the first differing token while tracking the accumulated line position.
  2. Computes the line offset to account for inserted/deleted lines.
  3. Finds matching suffix tokens with line offset tolerance.
  4. Generates a single edit covering the changed range.

The logic handles the semantic token structure (groups of 5) properly.


292-344: Correct handling of delta-encoded line offset at boundary.

The suffix matching logic correctly implements the insight that in delta encoding, only the first token after an edit point will have its deltaLine shifted by lineOffset. The foundBoundary flag ensures we only accept one such boundary token, and all subsequent suffix tokens must match exactly. This is the correct behavior for semantic token delta encoding.
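The insight above can be demonstrated on deltaLine values alone. This is an illustrative sketch with a hypothetical helper, not the provider's code: inserting one line after the first token shifts only the first subsequent deltaLine by the offset, and every later deltaLine is unchanged.

```java
import java.util.Arrays;

public class LineOffsetSketch {

  // Delta-encode a sequence of absolute line numbers (other token fields omitted).
  static int[] deltaLines(int[] absoluteLines) {
    int[] out = new int[absoluteLines.length];
    int prev = 0;
    for (int i = 0; i < absoluteLines.length; i++) {
      out[i] = absoluteLines[i] - prev;
      prev = absoluteLines[i];
    }
    return out;
  }

  public static void main(String[] args) {
    // Tokens on absolute lines 0, 2, 5.
    int[] before = deltaLines(new int[]{0, 2, 5});
    // One line inserted after line 0 pushes every later token down by 1.
    int[] after = deltaLines(new int[]{0, 3, 6});
    System.out.println(Arrays.toString(before)); // [0, 2, 3]
    System.out.println(Arrays.toString(after));  // [0, 3, 3]
    // Only the first token after the edit point absorbs the lineOffset (+1);
    // every later deltaLine is identical, which is why a suffix matcher can
    // accept exactly one offset boundary and require exact matches afterwards.
  }
}
```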


395-397: LGTM!

Clean and idiomatic conversion. Using toList() (which returns an unmodifiable list) is appropriate for the LSP4J response.


134-159: Thread safety consideration for cache operations.

The sequence of get, put, and remove operations on tokenCache is not atomic. However, since each resultId is unique (UUID-based) and clients maintain their own resultId sequence, there's no practical race condition. Different clients won't share or interfere with each other's cached data.

If you anticipate scenarios where the same previousResultId could be used concurrently (unlikely with LSP's request-response model), consider verifying that LSP clients serialize delta requests per document.
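If that ever became a concern, one option is to consume the previous cache entry atomically. The sketch below is illustrative only; CachedTokenData, takePrevious, and putCurrent are hypothetical stand-ins, not the provider's actual API.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class TokenCacheSketch {

  // Hypothetical stand-in for the provider's cached record.
  record CachedTokenData(int[] data) {}

  private final ConcurrentHashMap<String, CachedTokenData> tokenCache = new ConcurrentHashMap<>();

  // Atomically consume the previous entry: ConcurrentHashMap.remove returns the
  // value and deletes it in one step, so two concurrent delta requests carrying
  // the same previousResultId cannot both observe the cached data.
  CachedTokenData takePrevious(String previousResultId) {
    return tokenCache.remove(previousResultId);
  }

  // Store the current encoding under a fresh UUID-based resultId.
  String putCurrent(int[] data) {
    String resultId = UUID.randomUUID().toString();
    tokenCache.put(resultId, new CachedTokenData(data));
    return resultId;
  }

  public static void main(String[] args) {
    var cache = new TokenCacheSketch();
    String id = cache.putCurrent(new int[]{0, 0, 5, 1, 0});
    System.out.println(cache.takePrevious(id) != null);  // first taker wins
    System.out.println(cache.takePrevious(id) == null);  // already consumed
  }
}
```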

Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes the storage and manipulation of semantic tokens by replacing List<Integer> with primitive int[] arrays, reducing memory overhead and boxing costs. Additionally, it fixes the delta calculation algorithm to correctly handle line offsets when tokens are inserted or deleted in the middle of a document.

Key Changes:

  • Changed internal token data storage from List<Integer> to int[] for better memory efficiency
  • Rewrote the computeEdits method with a more sophisticated algorithm that accounts for delta-encoded line positions
  • Added findSuffixMatchWithOffset helper method to correctly match suffix tokens when lines are inserted/deleted

Comment on lines +235 to +241
outer:
for (int i = 0; i < minTokens; i++) {
int base = i * TOKEN_SIZE;
for (int j = 0; j < TOKEN_SIZE; j++) {
if (prev[base + j] != curr[base + j]) {
firstDiffToken = i;
break outer;
Copilot AI Dec 28, 2025

The label name 'outer' is too generic and doesn't convey meaningful information about the loop's purpose. Consider using a more descriptive name like 'findFirstDiff' or 'searchPrefixMatch' to make the control flow clearer.

Suggested change

-    outer:
+    findFirstDiff:
     for (int i = 0; i < minTokens; i++) {
       int base = i * TOKEN_SIZE;
       for (int j = 0; j < TOKEN_SIZE; j++) {
         if (prev[base + j] != curr[base + j]) {
           firstDiffToken = i;
-          break outer;
+          break findFirstDiff;

-    while (prefixMatch < minSize && previousData.get(prefixMatch).equals(currentData.get(prefixMatch))) {
-      prefixMatch++;
+  private static List<SemanticTokensEdit> computeEdits(int[] prev, int[] curr) {
+    final int TOKEN_SIZE = 5;
Copilot AI Dec 28, 2025

The constant TOKEN_SIZE is defined locally in multiple methods (computeEdits and findSuffixMatchWithOffset). Consider extracting this as a class-level private static final constant to avoid duplication and ensure consistency across methods. This would make the code more maintainable if the token structure ever changes.

Comment on lines +220 to 290
  private static List<SemanticTokensEdit> computeEdits(int[] prev, int[] curr) {
    final int TOKEN_SIZE = 5;

    int prevTokenCount = prev.length / TOKEN_SIZE;
    int currTokenCount = curr.length / TOKEN_SIZE;

    if (prevTokenCount == 0 && currTokenCount == 0) {
      return List.of();
    }

-    // If both are identical, return empty edits
-    if (prefixMatch == previousData.size() && prefixMatch == currentData.size()) {
+    // Find the first differing token and accumulate the prefix deltaLine sum
+    int firstDiffToken = 0;
+    int prefixAbsLine = 0;
+    int minTokens = Math.min(prevTokenCount, currTokenCount);
+
+    outer:
+    for (int i = 0; i < minTokens; i++) {
+      int base = i * TOKEN_SIZE;
+      for (int j = 0; j < TOKEN_SIZE; j++) {
+        if (prev[base + j] != curr[base + j]) {
+          firstDiffToken = i;
+          break outer;
+        }
+      }
+      prefixAbsLine += prev[base]; // accumulate deltaLine
+      firstDiffToken = i + 1;
+    }
+
+    // If all tokens are identical
+    if (firstDiffToken == minTokens && prevTokenCount == currTokenCount) {
      return List.of();
    }

-    // Find the last differing index (from the end)
-    int suffixMatch = 0;
-    while (suffixMatch < minSize - prefixMatch
-      && previousData.get(previousData.size() - 1 - suffixMatch)
-        .equals(currentData.get(currentData.size() - 1 - suffixMatch))) {
-      suffixMatch++;
-    }
+    // Compute the line offset incrementally from prefixAbsLine
+    int prevSuffixAbsLine = prefixAbsLine;
+    for (int i = firstDiffToken; i < prevTokenCount; i++) {
+      prevSuffixAbsLine += prev[i * TOKEN_SIZE];
+    }
+    int currSuffixAbsLine = prefixAbsLine;
+    for (int i = firstDiffToken; i < currTokenCount; i++) {
+      currSuffixAbsLine += curr[i * TOKEN_SIZE];
+    }
+    int lineOffset = currSuffixAbsLine - prevSuffixAbsLine;
+
+    // Find the last differing token, allowing for the line offset
+    int suffixMatchTokens = findSuffixMatchWithOffset(prev, curr, firstDiffToken, lineOffset, TOKEN_SIZE);
+
+    // Compute the edit boundaries
+    int deleteEndToken = prevTokenCount - suffixMatchTokens;
+    int insertEndToken = currTokenCount - suffixMatchTokens;

-    // Calculate the range to replace
-    int deleteStart = prefixMatch;
-    int deleteCount = previousData.size() - prefixMatch - suffixMatch;
-    int insertEnd = currentData.size() - suffixMatch;
+    int deleteStart = firstDiffToken * TOKEN_SIZE;
+    int deleteCount = (deleteEndToken - firstDiffToken) * TOKEN_SIZE;
+    int insertEnd = insertEndToken * TOKEN_SIZE;

    if (deleteCount == 0 && deleteStart == insertEnd) {
      return List.of();
    }

-    // Extract the data to insert
-    List<Integer> insertData = currentData.subList(prefixMatch, insertEnd);
+    // Build the insert list from an array slice
+    List<Integer> insertData = toList(Arrays.copyOfRange(curr, deleteStart, insertEnd));

    var edit = new SemanticTokensEdit();
    edit.setStart(deleteStart);
    edit.setDeleteCount(deleteCount);
    if (!insertData.isEmpty()) {
-      edit.setData(new ArrayList<>(insertData));
+      edit.setData(insertData);
    }

    return List.of(edit);
  }
Copilot AI Dec 28, 2025

The test coverage for the new delta calculation algorithm should be expanded. The existing tests (deltaWithLineInsertedAtBeginning and deltaWithLineInsertedInMiddle) verify that the delta is smaller than the full data, but they don't verify the correctness of the actual edit content. Consider adding assertions to verify that applying the edit to the previous data actually produces the current data, to ensure the delta calculation is correct.
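The verification suggested here can be written mechanically. In the sketch below, applyEdit is a hypothetical helper (not part of the PR) that splices a single SemanticTokensEdit-style edit (start, deleteCount, insert data) into the previous int array and checks that the result equals the current array.

```java
import java.util.Arrays;

public class ApplyEditSketch {

  // Apply one splice edit to prev: delete deleteCount ints at start,
  // then insert the given data in their place.
  static int[] applyEdit(int[] prev, int start, int deleteCount, int[] insert) {
    int[] result = new int[prev.length - deleteCount + insert.length];
    System.arraycopy(prev, 0, result, 0, start);
    System.arraycopy(insert, 0, result, start, insert.length);
    System.arraycopy(prev, start + deleteCount, result, start + insert.length,
        prev.length - start - deleteCount);
    return result;
  }

  public static void main(String[] args) {
    // Two tokens before; three tokens after (one token of 5 ints inserted at position 5).
    int[] prev = {0, 0, 5, 1, 0, 1, 2, 3, 1, 0};
    int[] curr = {0, 0, 5, 1, 0, 1, 0, 4, 2, 0, 1, 2, 3, 1, 0};
    int[] applied = applyEdit(prev, 5, 0, new int[]{1, 0, 4, 2, 0});
    System.out.println(Arrays.equals(applied, curr)); // true
  }
}
```

Asserting `applied` equals `curr` in tests would catch any delta that is small but wrong, which is the gap the comment points out.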

Comment on lines +275 to 278
if (deleteCount == 0 && deleteStart == insertEnd) {
return List.of();
}

Copilot AI Dec 28, 2025

The condition check at line 275 may be incorrect. When deleteCount is 0 and deleteStart equals insertEnd, it means there's nothing to delete and nothing to insert. However, this check uses deleteStart (which is firstDiffToken * TOKEN_SIZE) rather than checking if there's actually any difference. A more correct check would be to verify if deleteCount is 0 and the insert range (from deleteStart to insertEnd) is empty, but the current logic already handles this correctly. Consider whether this early return is truly necessary, as the edit creation below would naturally produce an edit with deleteCount=0 and empty insertData, which might be semantically valid for LSP.

Suggested change
if (deleteCount == 0 && deleteStart == insertEnd) {
return List.of();
}

@nixel2007
Member Author

@copilot address the review comments

Contributor

Copilot AI commented Dec 28, 2025

@nixel2007 I've opened a new pull request, #3708, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits December 28, 2025 11:06
When text is inserted on the same line without line breaks (lineOffset=0),
the boundary token may have a different deltaStart value while keeping
the same deltaLine. The algorithm now accounts for this case by allowing
deltaStart to differ for the boundary token when lineOffset=0.

Addresses comment from @nixel2007 about text insertion without line breaks.

Co-authored-by: nixel2007 <[email protected]>
@github-actions
Contributor

github-actions bot commented Dec 28, 2025

Test Results

2 781 files  ±0  2 781 suites  ±0   51m 4s ⏱️ + 3m 2s
1 109 tests +1  1 109 ✅ +1  0 💤 ±0  0 ❌ ±0 
9 981 runs  +9  9 981 ✅ +9  0 💤 ±0  0 ❌ ±0 

Results for commit 9ee333e. ± Comparison against base commit 6e09504.

This pull request removes 8 and adds 5 tests. Note that renamed tests count towards both.

Removed:
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [2] content=#, count=1
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [6] content=, count=0
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [7] content=#, count=1
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [8] content=A, count=1

Added:
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [2] content=, count=0
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [6] content=A, count=1
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [7] content=#, count=1
  • com.github._1c_syntax.bsl.languageserver.diagnostics.MetadataObjectNameLengthDiagnosticTest ‑ [8] content=#, count=1
  • com.github._1c_syntax.bsl.languageserver.providers.SemanticTokensProviderTest ‑ deltaWithTextInsertedOnSameLine_shouldReturnOptimalDelta()

♻️ This comment has been updated with latest results.

Copilot AI and others added 5 commits December 28, 2025 11:16
- Added DELTA_LINE_INDEX and DELTA_START_INDEX constants for clarity
- Simplified redundant condition check
- Improved test assertion to compare against new token size

Co-authored-by: nixel2007 <[email protected]>
…test

Added detailed assertions on decoded token fields (line, start, length, type, modifiers)
for both original and modified token arrays to verify the exact token structure.

Co-authored-by: nixel2007 <[email protected]>
Added specific assertions on delta edit fields:
- start position (5 - after prefix match)
- deleteCount (0 - no deletion needed due to suffix match)
- data size (10 integers = 2 tokens to insert)

The test now verifies that with lineOffset=0, the suffix matching correctly
identifies "А" and ";" tokens despite different deltaStart positions.

Co-authored-by: nixel2007 <[email protected]>
Fix inline text insertion handling in semantic token delta calculation
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (2)

221-221: Extract TOKEN_SIZE as class-level constant.

The TOKEN_SIZE constant is defined locally here and also used in findSuffixMatchWithOffset. Consider extracting it as a private static final int class-level constant to avoid duplication and ensure consistency.


235-241: Consider more descriptive label name.

The label outer is too generic. Consider a more descriptive name like findFirstDiff or prefixScan to clarify the loop's purpose.

🧹 Nitpick comments (1)
src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (1)

300-361: Excellent fix for inline text insertion handling.

This method correctly implements the fix for inline text insertions (lineOffset==0) by allowing deltaStart to differ for the boundary token (lines 318-320, 343-348). This addresses the bug mentioned in the retrieved learnings where tokens after an inline insertion have the same deltaLine but different deltaStart.

Optional refactor: Consider extracting DELTA_LINE_INDEX and DELTA_START_INDEX as class-level constants alongside the suggested TOKEN_SIZE constant for consistency.

Based on learnings, this fix correctly handles the case where the old algorithm failed to match tokens when lineOffset=0 because it checked deltaStart equality.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65674c2 and 9ee333e.

📒 Files selected for processing (2)
  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
  • src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java
🧰 Additional context used
📓 Path-based instructions (3)
src/test/java/**/*Test.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Maintain or improve test coverage and use appropriate test frameworks (JUnit, AssertJ, Mockito)

Files:

  • src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java
src/**/*.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use Java 17 for language and follow Java naming conventions

Files:

  • src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java
  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
src/main/java/**/*.java

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

src/main/java/**/*.java: Follow the Style Guide for code formatting and conventions
Use Lombok annotations to reduce boilerplate code and enable annotation processing in IDE
Write JavaDoc for public APIs and include comments for complex logic
Use meaningful, descriptive names for classes and methods following Java naming conventions
Optimize imports before committing; DO NOT optimize imports across the entire project unless specifically working on that task

Files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: nixel2007
Repo: 1c-syntax/bsl-language-server PR: 0
File: :0-0
Timestamp: 2025-12-28T11:15:32.317Z
Learning: In PR #3706 about semantic token optimization, I incorrectly stated that the algorithm in findSuffixMatchWithOffset correctly handles inline text insertions (text insertion without line breaks). The actual bug is that when lineOffset=0, tokens after the insertion point have the same deltaLine but different deltaStart, and the old algorithm failed to match them because it checked deltaStart equality. PR #3707 fixes this by allowing deltaStart to differ for the boundary token when lineOffset==0.
📚 Learning: 2025-12-28T11:15:32.317Z
Learnt from: nixel2007
Repo: 1c-syntax/bsl-language-server PR: 0
File: :0-0
Timestamp: 2025-12-28T11:15:32.317Z
Learning: In PR #3706 about semantic token optimization, I incorrectly stated that the algorithm in findSuffixMatchWithOffset correctly handles inline text insertions (text insertion without line breaks). The actual bug is that when lineOffset=0, tokens after the insertion point have the same deltaLine but different deltaStart, and the old algorithm failed to match them because it checked deltaStart equality. PR #3707 fixes this by allowing deltaStart to differ for the boundary token when lineOffset==0.

Applied to files:

  • src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java
  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
📚 Learning: 2025-11-19T09:02:03.154Z
Learnt from: theshadowco
Repo: 1c-syntax/bsl-language-server PR: 3610
File: src/main/java/com/github/_1c_syntax/bsl/languageserver/folding/QueryPackageFoldingRangeSupplier.java:26-26
Timestamp: 2025-11-19T09:02:03.154Z
Learning: The bsl-language-server project uses a custom ANTLR4 build from 1c-syntax (io.github.1c-syntax:antlr4), which includes the base class org.antlr.v4.runtime.Tokenizer with getAst() and getTokens() methods. SDBLTokenizer and BSLTokenizer inherit from this base class.

Applied to files:

  • src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Analyse
  • GitHub Check: build (25, windows-latest)
  • GitHub Check: build (21, macOS-latest)
  • GitHub Check: build (25, ubuntu-latest)
  • GitHub Check: build (21, windows-latest)
  • GitHub Check: build (21, ubuntu-latest)
  • GitHub Check: build (17, ubuntu-latest)
  • GitHub Check: build (25, macOS-latest)
  • GitHub Check: build (17, macOS-latest)
  • GitHub Check: build (17, windows-latest)
  • GitHub Check: build
  • GitHub Check: Benchmark
🔇 Additional comments (4)
src/test/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProviderTest.java (1)

1344-1420: LGTM! Comprehensive test for inline text insertion.

This test effectively validates the fix for inline text insertions (lineOffset=0) mentioned in the retrieved learnings. The test covers:

  • Token structure validation before and after the edit
  • Delta response verification
  • Detailed assertions on edit properties (start, deleteCount, data size)
  • Optimality check to ensure the delta is smaller than full tokens

The detailed comments (lines 1396-1404) clearly explain the expected behavior, making the test easy to understand and maintain.

Based on learnings, this test addresses the specific bug where tokens after an inline insertion have the same deltaLine but different deltaStart.

src/main/java/com/github/_1c_syntax/bsl/languageserver/providers/SemanticTokensProvider.java (3)

94-96: Good optimization using primitive array.

Changing from List<Integer> to int[] reduces boxing/unboxing overhead and improves memory efficiency for cached token data.


378-410: LGTM! Efficient array-based encoding.

The refactoring to use int[] instead of List<Integer> reduces boxing overhead during delta encoding. The logic remains correct while improving performance.


412-414: LGTM! Clean conversion helper.

Simple and effective utility method for converting int[] to List<Integer> when interfacing with the LSP4J API.

@sonarqubecloud

sonarqubecloud bot commented Dec 28, 2025

@nixel2007 nixel2007 merged commit 116269d into develop Dec 28, 2025
42 checks passed
@nixel2007 nixel2007 deleted the feature/optimize-delta-semantic-tokens branch December 28, 2025 14:33