Conversation
* feat(lora): add LoRA adapter support across SDK + demo app
Implement LoRA (Low-Rank Adaptation) adapter hot-swapping for llama.cpp
backend across all 6 SDK layers (C++ -> C API -> Component -> JNI ->
Kotlin Bridge -> Kotlin Public API).
- Add load/remove/clear/query LoRA adapter operations
- Use vtable dispatch in component layer to decouple librac_commons
from librac_backend_llamacpp (fixes linker errors)
- Add LoRA vtable entries to rac_llm_service_ops_t
- Fix AttachCurrentThread cast for Android NDK C++ JNI build
- Add RunAnyWhereLora Android demo app with Material 3 Q&A UI
- Add comprehensive implementation docs with C/C++ API reference
* feat(ci): add selectable build targets to Build All workflow + fix Swift concurrency errors
Rewrite build-all-test.yml with 9 boolean checkbox inputs so each build
target can be toggled independently from the GitHub Actions UI:
- C++ Android Backends (arm64-v8a, armeabi-v7a, x86_64 matrix)
- C++ iOS Backends (XCFramework)
- Kotlin SDK (JVM + Android)
- Swift SDK (iOS/macOS)
- Web SDK (TypeScript)
- Flutter SDK (Dart analyze via Melos)
- React Native SDK (TypeScript via Lerna)
- Android Example Apps (RunAnywhereAI + RunAnyWhereLora)
- IntelliJ Plugin
Fix two Swift strict-concurrency errors that fail the Swift SDK build:
- LiveTranscriptionSession: add @unchecked Sendable (safe because the class
  is @MainActor, so all access is serialized)
- RunAnywhere+VisionLanguage: add Sendable conformance to rac_vlm_image_t
so the C struct can cross the Task boundary in the streaming builder;
simplify StreamingCollector to start timing at init
* fix(swift): resolve strict concurrency errors in LiveTranscriptionSession and VLM streaming
LiveTranscriptionSession.swift:
- Replace [weak self] captures with a strong `let session = self` before
  closures to avoid capturing a mutable var in @Sendable/Task contexts (the
  class is @MainActor @unchecked Sendable, so a strong reference is safe,
  bounded by the stream lifecycle)
- Wrap deprecated startStreamingTranscription call in @available helper
to silence deprecation warning until migration to transcribeStream API
RunAnywhere+VisionLanguage.swift:
- Add `let capturedCImage = cImage` before AsyncThrowingStream closure
so the Task captures an immutable let instead of a mutable var
- Add `extension rac_vlm_image_t: @unchecked Sendable {}` for the C
struct to cross Task concurrency boundaries safely
- Simplify StreamingCollector to initialize startTime at init instead
of requiring a separate async start() call
* fix(jni): address CodeRabbit review findings in LoRA JNI functions
- Replace raw -1 returns with RAC_ERROR_INVALID_HANDLE/RAC_ERROR_INVALID_ARGUMENT
to match codebase error handling conventions
- Use getCString() helper instead of raw GetStringUTFChars/ReleaseStringUTFChars
- Add missing result logging to racLlmComponentRemoveLora and racLlmComponentClearLora
- Use rac_free() instead of free() in racLlmComponentGetLoraInfo for consistency
- Clarify LoRA adapter memory ownership comments (adapters freed automatically
with model per llama.cpp b8011 API — llama_adapter_lora_free is deprecated)
* iOS initial changes
* Minimal sample needed to test LoRA
* Updating docs
* Addressed the comments
First version of the optimised RAG pipeline. Not polished yet; once tested, I'll micro-optimise, benchmark, and finish.
Optimised RAG Prototype
Ellipsis: Looks good to me! 👍 Reviewed everything up to 9e4f2df.
Walkthrough

This PR adds context-aware generation capabilities to the LlamaCpp backend, including KV-cache management, confidence probing, and system prompt injection. It extends the RAG pipeline with sentence-level text splitting and adaptive context accumulation based on confidence scoring, while optimizing vector store performance through reduced expansion parameters and i8 quantization.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant RAGBackend
    participant TextGen as TextGenerator
    participant VectorStore
    participant Chunker
    participant LLM as LlamaCpp Engine
    Client->>RAGBackend: query(text)
    RAGBackend->>TextGen: inject_system_prompt(kICLSystemPrompt)
    TextGen->>LLM: tokenize & cache system prompt
    RAGBackend->>VectorStore: retrieve top_k candidates
    VectorStore-->>RAGBackend: parent chunks + similarities
    RAGBackend->>Chunker: split_into_sentences(chunk)
    Chunker-->>RAGBackend: sentence array
    loop for each sentence
        RAGBackend->>TextGen: append_context(sentence)
        TextGen->>LLM: tokenize & cache sentence
        RAGBackend->>TextGen: probe_confidence(context, query)
        TextGen->>LLM: forward pass on "Yes"/"No" logits
        LLM-->>TextGen: confidence score (0.0-1.0)
        TextGen-->>RAGBackend: confidence float
        alt confidence >= threshold
            RAGBackend->>RAGBackend: accumulate_context(sentence)
        else confidence < threshold OR partial context
            RAGBackend->>TextGen: clear_context()
            TextGen->>LLM: reset KV cache
        end
    end
    RAGBackend->>TextGen: generate_from_context(accumulated_context + query_suffix)
    TextGen->>LLM: generate with accumulated KV state
    LLM-->>TextGen: generated text + metadata
    TextGen-->>RAGBackend: GenerationResult
    RAGBackend->>RAGBackend: enrich metadata (sentences_used, confidence, sources)
    RAGBackend-->>Client: GenerationResult with provenance
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Path: `sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp`
Line: 401

```cpp
for (const auto& sentence_result : search_results) {
    const std::string& sentence_text = sentence_result.text;
    std::string append_text = (sentences_used == 0) ? sentence_text : ("\n" + sentence_text);
    text_generator->append_context(append_text);
```

Comment: Return value not checked: if `append_context()` fails (e.g., context full), the loop continues anyway, leading to incorrect confidence probing.
Path: `sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.cpp`
Line: 57-60

```cpp
// Create metric for cosine similarity. Using i8 instead of float to save RAM (quality isn't affected much).
metric_punned_t metric(
    static_cast<std::size_t>(config.dimension),
    metric_kind_t::cos_k,
```

Comment: Quantization change from f32 to i8 significantly reduces memory but may affect retrieval quality. Verify that recall/precision metrics meet requirements before deploying.
Path: `sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.h`
Line: 51-52

```cpp
size_t expansion_add = 40;    // Construction search depth (even a smaller one should be good enough)
size_t expansion_search = 20; // Query search depth (even a smaller one should be good enough)
```

Comment: HNSW parameters reduced by ~70% (expansion_add: 128→40, expansion_search: 64→20). This trades recall quality for speed. Verify search quality meets requirements.
Path: `sdk/runanywhere-commons/src/backends/rag/rag_chunker.cpp`
Line: 32

```cpp
    return text.length() / config_.chars_per_token;
}

// used for focus mode in RAG (not final yet, will minmax this further, but this is a working version)
```

Comment: WIP comment ("not final yet, will minmax this further"). Remove before merging or track in an issue.
Path: `sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp`
Line: 18-23

```cpp
static const std::string kICLSystemPrompt =
    "You are a question-answering assistant. Given context passages and a question, "
    "determine if the passages contain enough information to answer the question.\n\n"
    "Example 1 (Sufficient context):\n"
    "Context: \"The Eiffel Tower was completed in 1889 for the World's Fair in Paris.\"\n"
    "Question: \"When was the Eiffel Tower built?\"\n"
```

Comment: ICL prompt examples show explicit context in the prompt format, but actual probing relies on implicit KV-cache context. This mismatch may reduce the effectiveness of confidence probing.
Path: `sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp`
Line: 228-252

```cpp
std::vector<ScoredSentence> scored_sentences;

for (const auto& parent : parent_chunks) {
    auto sentences = chunker->split_into_sentences(parent.text);
    LOGI("Parent chunk '%s' split into %zu sentences", parent.chunk_id.c_str(), sentences.size());

    for (const auto& sentence : sentences) {
        if (sentence.size() < 3) {
            continue;
        }

        try {
            auto sentence_embedding = embedding_provider->embed(sentence);
            float sim = cosine_similarity(query_embedding, sentence_embedding);

            scored_sentences.push_back({
                sentence,
                sim,
                parent.chunk_id,
                parent.metadata
            });
        } catch (const std::exception& e) {
            LOGE("Failed to embed sentence, skipping: %s", e.what());
        }
    }
```

Comment: Sentence-level embedding for every sentence across 5 parent chunks creates significant computational overhead compared to the previous chunk-only approach. Consider caching sentence embeddings if parent chunks are frequently accessed.
Path: `sdk/runanywhere-commons/src/backends/rag/rag_backend.h`
Line: 29

```diff
 size_t embedding_dimension = 384;
-size_t top_k = 3;
+size_t top_k = 10; // Need to get Golden document
 float similarity_threshold = 0.15f;
```

Comment: Move the inline comment to a separate line above for better readability.
Important

Enhance RAG backend with adaptive text generation, sentence-level chunking, and optimized vector store for improved performance and accuracy.

- Add `probe_confidence()`, `inject_system_prompt()`, `append_context()`, `generate_from_context()`, and `clear_context()` methods to `LlamaCppTextGeneration` in `llamacpp_backend.cpp` and `LlamaCppGenerator` in `llamacpp_generator.cpp`.
- Refactor `RAGBackend` in `rag_backend.cpp` using the new text generation methods.
- Add `split_into_sentences()` to `DocumentChunker` in `rag_chunker.cpp` for sentence-level chunking.
- Update `RAGBackend` in `rag_backend.cpp` to use sentence-level chunking for more focused search results.
- Use `i8` quantization in `VectorStoreUSearch` in `vector_store_usearch.cpp` for reduced memory usage.
- Update `vector_store_usearch.cpp` to ensure top-K results are returned.
- Update `RAGBackendConfig` in `rag_backend.h` to increase the default `top_k` to 10.

This description was generated automatically for 9e4f2df.
Greptile Summary
Implements adaptive RAG optimization with sentence-level retrieval and confidence-based context accumulation. The system now retrieves parent chunks, splits them into sentences, embeds sentences individually, then incrementally adds sentences to KV cache until confidence threshold is reached via logit probing.
Key changes:
- New methods (`inject_system_prompt`, `append_context`, `probe_confidence`, `generate_from_context`, `clear_context`) for stateful generation
- `top_k` increased from 3 to 10

Critical issues:
- `rag_backend.cpp:401`: missing error handling; the `append_context()` return value is not checked, so the loop continues even if the context append fails
- A WIP comment in `rag_chunker.cpp` suggests incomplete implementation

Performance implications:

Confidence Score: 3/5
- `vector_store_usearch.cpp/h` (quantization/HNSW changes affect retrieval quality) and `rag_backend.cpp` (missing error handling in the adaptive loop)

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant RAGBackend
    participant VectorStore
    participant Embedder
    participant Generator
    participant KVCache
    User->>RAGBackend: query(text)
    RAGBackend->>Generator: clear_context()
    Generator->>KVCache: Clear all state
    RAGBackend->>Generator: inject_system_prompt(ICL)
    Generator->>KVCache: Add ICL prompt at pos 0
    RAGBackend->>Embedder: embed(query)
    Embedder-->>RAGBackend: query_embedding
    RAGBackend->>VectorStore: search(query_embedding, top_k=5)
    Note over VectorStore: Retrieve 5 parent chunks
    VectorStore-->>RAGBackend: parent_chunks[5]
    loop For each parent chunk
        RAGBackend->>RAGBackend: split_into_sentences()
        loop For each sentence
            RAGBackend->>Embedder: embed(sentence)
            Embedder-->>RAGBackend: sentence_embedding
            RAGBackend->>RAGBackend: score = cosine_similarity()
        end
    end
    RAGBackend->>RAGBackend: sort sentences by similarity
    Note over RAGBackend: Keep top 10 sentences
    loop Until confidence > 0.8 OR all sentences added
        RAGBackend->>Generator: append_context(sentence)
        Generator->>KVCache: Append sentence tokens
        RAGBackend->>Generator: probe_confidence("", query)
        Generator->>KVCache: Add probe tokens temporarily
        Generator->>Generator: Extract Yes/No logits
        Generator->>Generator: Compute softmax confidence
        Generator->>KVCache: Remove probe tokens
        Generator-->>RAGBackend: confidence_score
        alt confidence > 0.8
            Note over RAGBackend: Threshold reached, stop
        end
    end
    RAGBackend->>Generator: generate_from_context(query_suffix)
    Generator->>KVCache: Add query tokens
    loop Token generation
        Generator->>Generator: Sample next token
        Generator->>KVCache: Append token
    end
    Generator-->>RAGBackend: generated_text
    RAGBackend-->>User: result + metadata
```

Last reviewed commit: 9e4f2df