Optimised llamacpp_backend.cpp (#330)
📝 Walkthrough

The llamacpp backend generation logic is refactored to use a DFA-based UTF-8 state machine and to implement streaming stop-sequence detection via partial buffers and sliding windows. Token text is accumulated incrementally, validated for UTF-8 safety, and checked against stop sequences before callbacks are invoked. No public APIs are modified.
Important

Looks good to me! 👍

Reviewed everything up to a37fb27 in 13 seconds.

- Reviewed 222 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (1)
402-417: ⚠️ Potential issue | 🟠 Major: Token count will be incorrect after this change.

The generate() function counts callback invocations as tokens_generated (line 406). After the buffering optimization, callbacks emit multi-token chunks rather than individual tokens, so result.tokens_generated will report the number of chunks emitted, not the actual number of tokens generated by the model.

This affects:

- result.tokens_generated accuracy
- finish_reason logic at line 422 (which compares the chunk count to max_tokens)

Consider exposing the actual token count from generate_stream().

🐛 Proposed fix: Add output parameter for generated token count
```diff
 bool LlamaCppTextGeneration::generate_stream(const TextGenerationRequest& request,
                                              TextStreamCallback callback,
-                                             int* out_prompt_tokens) {
+                                             int* out_prompt_tokens,
+                                             int* out_generated_tokens) {
     // ... existing code ...
+    if (out_generated_tokens) {
+        *out_generated_tokens = tokens_generated;
+    }
+
     LOGI("Generation complete: %d tokens", tokens_generated);
     return !cancel_requested_.load();
 }
```

Then update generate():

```diff
+    int actual_tokens_generated = 0;
     bool success = generate_stream(
         request,
         [&](const std::string& token) -> bool {
             generated_text += token;
-            tokens_generated++;
             return !cancel_requested_.load();
         },
-        &prompt_tokens);
+        &prompt_tokens,
+        &actual_tokens_generated);
+
+    result.tokens_generated = actual_tokens_generated;
```
🧹 Nitpick comments (2)
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (2)
517-529: Consider adding a safety bound on partial_utf8_buffer size.

If the tokenizer ever produces invalid UTF-8 that cannot complete a valid sequence, partial_utf8_buffer will grow unboundedly since valid_upto would remain 0. While this shouldn't happen with well-behaved tokenizers, a defensive check would prevent potential memory issues.

🛡️ Optional defensive check

```diff
     if (valid_upto > 0) {
         std::string valid_chunk = partial_utf8_buffer.substr(0, valid_upto);
         stop_window.append(valid_chunk);
         partial_utf8_buffer.erase(0, valid_upto);
+    } else if (partial_utf8_buffer.size() > 16) {
+        // Safety valve: discard suspected invalid bytes if buffer grows too large
+        LOGE("Invalid UTF-8 detected in token stream, clearing buffer");
+        partial_utf8_buffer.clear();
     }
```
575-577: Incomplete UTF-8 at end of generation is silently dropped.

If partial_utf8_buffer contains residual bytes when generation ends (e.g., the model stopped mid-codepoint), they are neither emitted nor logged. Consider adding a debug/warning log to aid troubleshooting.

🔍 Add visibility for dropped bytes

```diff
     if (!cancel_requested_.load() && !stop_sequence_hit && !stop_window.empty()) {
         callback(stop_window);
     }
+
+    if (!partial_utf8_buffer.empty()) {
+        LOGI("Discarded %zu incomplete UTF-8 bytes at end of generation",
+             partial_utf8_buffer.size());
+    }
 
     llama_memory_clear(llama_get_memory(context_), true);
```
Quoted code under review:

```cpp
bool process(uint8_t byte) {
    static const uint8_t utf8d[] = {
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 00..1f
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 20..3f
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 40..5f
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 60..7f
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, // 80..9f
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, // a0..bf
        8,8,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, // c0..df
        0xa,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x4,0x3,0x3, // e0..ef
        0xb,0x6,0x6,0x6,0x5,0x8,0x8,0x8,0x8,0x8,0x8,0x8,0x8,0x8,0x8,0x8, // f0..ff
        0x0,0x1,0x2,0x3,0x5,0x8,0x7,0x1,0x1,0x1,0x4,0x6,0x1,0x1,0x1,0x1, // s0..s0
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,0,1,1,1,1,1,1, // s1..s2
        1,2,1,1,1,1,1,2,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1, // s3..s4
        1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,3,1,1,1,1,1,1, // s5..s6
        1,3,1,1,1,1,1,3,1,3,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // s7..s8
    };

    bytes += 1;
    for (int i = 1; i < num; ++i) {
        if ((*bytes & 0xC0) != 0x80)
            return false;
        bytes += 1;
    }
    uint32_t type = utf8d[byte];
    state = utf8d[256 + state * 16 + type];
    return (state == 0);
```
the process() method doesn't update state correctly - the return value indicates if a character completed a valid sequence, but the state itself is internal to the DFA
Suggested change:

```cpp
bool process(uint8_t byte) {
    static const uint8_t utf8d[] = {
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 00..1f
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 20..3f
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 40..5f
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 60..7f
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, // 80..9f
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, // a0..bf
        8,8,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, // c0..df
        0xa,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x3,0x4,0x3,0x3, // e0..ef
        0xb,0x6,0x6,0x6,0x5,0x8,0x8,0x8,0x8,0x8,0x8,0x8,0x8,0x8,0x8,0x8, // f0..ff
        0x0,0x1,0x2,0x3,0x5,0x8,0x7,0x1,0x1,0x1,0x4,0x6,0x1,0x1,0x1,0x1, // s0..s0
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,0,1,1,1,1,1,1, // s1..s2
        1,2,1,1,1,1,1,2,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1, // s3..s4
        1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,3,1,1,1,1,1,1, // s5..s6
        1,3,1,1,1,1,1,3,1,3,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // s7..s8
    };
    uint32_t type = utf8d[byte];
    uint32_t new_state = utf8d[256 + state * 16 + type];
    state = new_state;
    return (state == 0);
}
```
Quoted code under review:

```cpp
Utf8State scanner_state;
size_t valid_upto = 0;
for (size_t i = 0; i < partial_utf8_buffer.size(); ++i) {
    scanner_state.process(static_cast<uint8_t>(partial_utf8_buffer[i]));
    if (scanner_state.state == 0) {
        valid_upto = i + 1;
    }
}
```
scanner_state should be reused or reset across iterations - creating a new Utf8State inside the loop means each token's UTF-8 validation starts fresh, but partial_utf8_buffer can contain incomplete sequences from previous iterations
Suggested change:

```cpp
Utf8State scanner_state;
size_t valid_upto = 0;
for (size_t i = 0; i < partial_utf8_buffer.size(); ++i) {
    bool complete = scanner_state.process(static_cast<uint8_t>(partial_utf8_buffer[i]));
    if (complete) {
        valid_upto = i + 1;
    }
}
```
Scanner state is being used correctly. Incomplete sequences are considered in the next round. All is good from my side.
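The reply above can be checked with a small simulation (a sketch using a simplified scanner in place of Utf8State; drain_valid is an invented helper mirroring the valid_upto/erase logic): a three-byte character split across two tokens survives, because the incomplete tail stays in the buffer for the next round even though each round starts a fresh scanner.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-in for Utf8State (sequence-structure checks only).
struct Scanner {
    int remaining = 0;
    bool process(uint8_t b) {
        if (remaining == 0) {
            if (b < 0x80) return true;
            if ((b & 0xE0) == 0xC0) remaining = 1;
            else if ((b & 0xF0) == 0xE0) remaining = 2;
            else if ((b & 0xF8) == 0xF0) remaining = 3;
            return false;
        }
        if ((b & 0xC0) != 0x80) { remaining = 0; return false; }
        return --remaining == 0;
    }
};

// One round of the loop under review: scan the buffer with a FRESH
// scanner, move out the longest complete prefix, and keep any
// incomplete trailing bytes for the next round.
std::string drain_valid(std::string& partial_utf8_buffer) {
    Scanner scanner_state;  // fresh each round, as in the PR
    size_t valid_upto = 0;
    for (size_t i = 0; i < partial_utf8_buffer.size(); ++i) {
        if (scanner_state.process(static_cast<uint8_t>(partial_utf8_buffer[i])))
            valid_upto = i + 1;
    }
    std::string out = partial_utf8_buffer.substr(0, valid_upto);
    partial_utf8_buffer.erase(0, valid_upto);
    return out;
}
```

A fresh scanner per round is safe precisely because the scan always restarts from the beginning of the retained buffer, so no cross-round scanner state is needed.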
LGTM, merging. Thanks @VyasGuru |
Optimised llamacpp_backend.cpp

Important

Optimized llamacpp_backend.cpp with DFA-based UTF-8 validation and efficient stop sequence detection, improving performance by 2.8x ~ 3.3x.

Changes

- Replaced is_valid_utf8() with a Utf8State struct using a DFA for UTF-8 validation.
- Implemented streaming stop sequence detection in generate_stream().
- Added a rolling stop_window to efficiently detect stop sequences.
- Preallocated stop_window and partial_utf8_buffer to reduce allocations.
Greptile Overview

Greptile Summary

This PR implements significant performance optimizations to the LlamaCpp text generation backend, achieving a 2.8x-3.3x speedup through three key improvements:

Key Changes:

- DFA-based UTF-8 validation (the Utf8State struct) replacing is_valid_utf8()
- Streaming stop-sequence detection using a rolling stop_window
- Buffer preallocation (reserve()) to prevent repeated allocations during string concatenation

Implementation Details:
The optimizations maintain correct behavior for stop sequence detection and UTF-8 handling while dramatically reducing memory allocations and string scanning overhead.
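The stop-sequence handling described above can be sketched as a simplified model (not the actual backend code; feed_chunk, emit, and max_stop_len are illustrative names): the rolling window holds back just enough trailing text to catch a stop sequence split across chunks, emitting everything else promptly.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One streaming step: append a chunk to the rolling window, check for
// stop sequences, and flush bytes that can no longer be part of one.
// Returns true when a stop sequence was hit.
bool feed_chunk(std::string& stop_window,
                const std::string& chunk,
                const std::vector<std::string>& stop_sequences,
                size_t max_stop_len,
                const std::function<void(const std::string&)>& emit) {
    stop_window.append(chunk);
    for (const auto& stop : stop_sequences) {
        size_t pos = stop_window.find(stop);
        if (pos != std::string::npos) {
            if (pos > 0) emit(stop_window.substr(0, pos));  // text before the stop
            return true;
        }
    }
    // The last max_stop_len-1 bytes could still begin a stop sequence
    // whose remainder arrives in a later chunk; hold them back.
    if (stop_window.size() > max_stop_len) {
        size_t safe_len = stop_window.size() - (max_stop_len - 1);
        emit(stop_window.substr(0, safe_len));
        stop_window.erase(0, safe_len);
    }
    return false;
}
```

Even when the stop marker arrives split across three chunks, nothing after it leaks to the callback and all preceding text is emitted.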
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant generate_stream
    participant Sampler as llama_sampler
    participant Utf8State
    participant StopWindow as Rolling Window
    participant Callback

    Caller->>generate_stream: TextGenerationRequest
    activate generate_stream
    generate_stream->>generate_stream: Initialize stop_window (reserve MAX_STOP_LEN*2)
    generate_stream->>generate_stream: Initialize partial_utf8_buffer (reserve 8)
    loop For each token (up to max_tokens)
        generate_stream->>Sampler: sample next token
        Sampler-->>generate_stream: new_token_id
        generate_stream->>generate_stream: Convert token to string
        generate_stream->>generate_stream: Append to partial_utf8_buffer
        generate_stream->>Utf8State: Create scanner_state
        loop Scan UTF-8 bytes with DFA
            generate_stream->>Utf8State: process(byte)
            Utf8State-->>generate_stream: state == 0 (complete char)
            generate_stream->>generate_stream: Track valid_upto position
        end
        alt valid_upto > 0
            generate_stream->>generate_stream: Extract valid_chunk
            generate_stream->>StopWindow: Append valid_chunk
            generate_stream->>generate_stream: Erase from partial_utf8_buffer
            loop Check stop sequences (O(1) in rolling window)
                generate_stream->>StopWindow: find(stop_seq)
                alt Stop sequence found
                    generate_stream->>Callback: Send text before stop
                    generate_stream-->>Caller: Break (stop detected)
                end
            end
            alt stop_window.size() > MAX_STOP_LEN
                generate_stream->>generate_stream: Calculate safe_len
                generate_stream->>Callback: Send safe text
                generate_stream->>StopWindow: Erase sent portion
            end
        end
        generate_stream->>Sampler: Decode next token batch
    end
    generate_stream->>Callback: Send remaining stop_window
    deactivate generate_stream
    generate_stream-->>Caller: Generation complete
```