
Dev #440

Closed

shubhammalhotra28 wants to merge 7 commits into main from dev

Conversation


@shubhammalhotra28 shubhammalhotra28 commented Mar 1, 2026

Description

Brief description of the changes made.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Refactoring

Testing

  • Lint passes locally
  • Added/updated tests for changes

Platform-Specific Testing (check all that apply)

Swift SDK / iOS Sample:

  • Tested on iPhone (Simulator or Device)
  • Tested on iPad / Tablet
  • Tested on Mac (macOS target)

Kotlin SDK / Android Sample:

  • Tested on Android Phone (Emulator or Device)
  • Tested on Android Tablet

Flutter SDK / Flutter Sample:

  • Tested on iOS
  • Tested on Android

React Native SDK / React Native Sample:

  • Tested on iOS
  • Tested on Android

Playground:

  • Tested on target platform
  • Verified no regressions in existing Playground projects

Web SDK / Web Sample:

  • Tested in Chrome (Desktop)
  • Tested in Firefox
  • Tested in Safari
  • WASM backends load (LlamaCpp + ONNX)
  • OPFS storage persistence verified (survives page refresh)
  • Settings persistence verified (localStorage)

Labels

Please add the appropriate label(s):

SDKs:

  • Swift SDK - Changes to Swift SDK (sdk/runanywhere-swift)
  • Kotlin SDK - Changes to Kotlin SDK (sdk/runanywhere-kotlin)
  • Flutter SDK - Changes to Flutter SDK (sdk/runanywhere-flutter)
  • React Native SDK - Changes to React Native SDK (sdk/runanywhere-react-native)
  • Web SDK - Changes to Web SDK (sdk/runanywhere-web)
  • Commons - Changes to shared native code (sdk/runanywhere-commons)

Sample Apps:

  • iOS Sample - Changes to iOS example app (examples/ios)
  • Android Sample - Changes to Android example app (examples/android)
  • Flutter Sample - Changes to Flutter example app (examples/flutter)
  • React Native Sample - Changes to React Native example app (examples/react-native)
  • Web Sample - Changes to Web example app (examples/web)

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed)

Screenshots

Attach relevant UI screenshots for changes (if applicable):

  • Mobile (Phone)
  • Tablet / iPad
  • Desktop / Mac

Important

Enhance RAG backend with adaptive text generation, sentence-level chunking, and optimized vector store for improved performance and accuracy.

  • Text Generation:
    • Add probe_confidence(), inject_system_prompt(), append_context(), generate_from_context(), and clear_context() methods to LlamaCppTextGeneration in llamacpp_backend.cpp and LlamaCppGenerator in llamacpp_generator.cpp.
    • Implement adaptive query loop in RAGBackend in rag_backend.cpp using new text generation methods.
  • Document Handling:
    • Add split_into_sentences() to DocumentChunker in rag_chunker.cpp for sentence-level chunking.
    • Update RAGBackend in rag_backend.cpp to use sentence-level chunking for more focused search results.
  • Vector Store:
    • Use i8 quantization in VectorStoreUSearch in vector_store_usearch.cpp for reduced memory usage.
    • Adjust search threshold logic in vector_store_usearch.cpp to ensure top-K results are returned.
  • Miscellaneous:
    • Update RAGBackendConfig in rag_backend.h to increase top_k default to 10.
    • Add logging improvements across multiple files for better debugging and performance tracking.
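The confidence probing mentioned above boils down to a two-way softmax over the logits of the "Yes" and "No" tokens. A minimal sketch of that numeric core (the llama.cpp forward pass and KV-cache plumbing are omitted; `yes_no_confidence` is an illustrative name, not the PR's API):

```cpp
#include <cassert>
#include <cmath>

// Two-way softmax over the "Yes"/"No" token logits. The PR's
// probe_confidence() would obtain these logits from a llama.cpp forward
// pass after temporarily appending probe tokens to the KV cache; that
// plumbing is omitted here.
inline float yes_no_confidence(float yes_logit, float no_logit) {
    // Subtract the max before exponentiating for numerical stability.
    float m = yes_logit > no_logit ? yes_logit : no_logit;
    float ey = std::exp(yes_logit - m);
    float en = std::exp(no_logit - m);
    return ey / (ey + en);  // P("Yes"), in [0, 1]
}
```

Equal logits yield 0.5; a strongly "Yes"-favoring pass approaches 1.0, which is what the 0.8 threshold in the adaptive loop compares against.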

This description was created by Ellipsis for 9e4f2df.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced RAG system with adaptive context management and confidence scoring for improved answer generation accuracy.
    • Sentence-level search refinement to retrieve more precise context from documents.
    • Confidence-based context selection to maintain higher-quality information in responses.
  • Performance Improvements

    • Optimized vector storage with improved quantization and reduced search parameters for faster retrieval.
    • Increased default search retrieval count from 3 to 10 for better candidate selection.

Greptile Summary

Implements adaptive RAG optimization with sentence-level retrieval and confidence-based context accumulation. The system now retrieves parent chunks, splits them into sentences, embeds sentences individually, then incrementally adds sentences to KV cache until confidence threshold is reached via logit probing.

Key changes:

  • Sentence-level retrieval: Embeds and scores individual sentences from parent chunks instead of using full chunks
  • Adaptive confidence loop: Progressively adds sentences to KV cache and probes model confidence after each addition using Yes/No token logits
  • KV cache management: New methods (inject_system_prompt, append_context, probe_confidence, generate_from_context, clear_context) for stateful generation
  • Vector store optimizations: Changed quantization from f32 to i8 (RAM savings) and reduced HNSW parameters by ~70% (expansion_add: 128→40, expansion_search: 64→20)
  • Configuration changes: Increased default top_k from 3 to 10

Critical issues:

  • rag_backend.cpp:401 - Missing error handling: append_context() return value not checked, loop continues even if context append fails
  • Quantization and HNSW changes need verification to ensure retrieval quality meets requirements
  • WIP comment in rag_chunker.cpp suggests incomplete implementation

Performance implications:

  • Sentence-level embedding creates significant computational overhead (embedding every sentence across 5 parent chunks)
  • Confidence probing runs inference after each sentence addition
  • Trade-off: more precise context vs. increased latency

Confidence Score: 3/5

  • Significant architectural changes with quality/performance trade-offs that need verification before production deployment
  • Score reflects: (1) critical missing error handling in adaptive loop that could cause incorrect behavior when context fills up, (2) unverified quality impact of quantization and HNSW parameter reductions, (3) WIP comments indicating incomplete implementation, (4) significant performance implications of sentence-level embedding that need measurement. The core implementation is sound but needs testing and refinement before production use.
  • Pay close attention to vector_store_usearch.cpp/h (quantization/HNSW changes affect retrieval quality) and rag_backend.cpp (missing error handling in adaptive loop)

Important Files Changed

Filename Overview
sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp Implements adaptive RAG with sentence-level retrieval and confidence probing. Missing error handling for append_context() failures in adaptive loop.
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp Implements KV cache management methods (probe_confidence, inject_system_prompt, append_context, generate_from_context). Complex but well-structured with proper cleanup.
sdk/runanywhere-commons/src/backends/rag/rag_chunker.cpp Added split_into_sentences() method. Contains WIP comment indicating incomplete implementation.
sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.cpp Changed quantization from f32 to i8 and reduced HNSW parameters. Significant quality/performance trade-off that needs verification.
sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.h Reduced HNSW search parameters significantly (expansion_add: 128→40, expansion_search: 64→20). May reduce recall quality.

Sequence Diagram

sequenceDiagram
    participant User
    participant RAGBackend
    participant VectorStore
    participant Embedder
    participant Generator
    participant KVCache

    User->>RAGBackend: query(text)
    RAGBackend->>Generator: clear_context()
    Generator->>KVCache: Clear all state
    RAGBackend->>Generator: inject_system_prompt(ICL)
    Generator->>KVCache: Add ICL prompt at pos 0

    RAGBackend->>Embedder: embed(query)
    Embedder-->>RAGBackend: query_embedding
    RAGBackend->>VectorStore: search(query_embedding, top_k=5)
    Note over VectorStore: Retrieve 5 parent chunks
    VectorStore-->>RAGBackend: parent_chunks[5]

    loop For each parent chunk
        RAGBackend->>RAGBackend: split_into_sentences()
        loop For each sentence
            RAGBackend->>Embedder: embed(sentence)
            Embedder-->>RAGBackend: sentence_embedding
            RAGBackend->>RAGBackend: score = cosine_similarity()
        end
    end

    RAGBackend->>RAGBackend: sort sentences by similarity
    Note over RAGBackend: Keep top 10 sentences

    loop Until confidence > 0.8 OR all sentences added
        RAGBackend->>Generator: append_context(sentence)
        Generator->>KVCache: Append sentence tokens
        RAGBackend->>Generator: probe_confidence("", query)
        Generator->>KVCache: Add probe tokens temporarily
        Generator->>Generator: Extract Yes/No logits
        Generator->>Generator: Compute softmax confidence
        Generator->>KVCache: Remove probe tokens
        Generator-->>RAGBackend: confidence_score
        
        alt confidence > 0.8
            Note over RAGBackend: Threshold reached, stop
        end
    end

    RAGBackend->>Generator: generate_from_context(query_suffix)
    Generator->>KVCache: Add query tokens
    loop Token generation
        Generator->>Generator: Sample next token
        Generator->>KVCache: Append token
    end
    Generator-->>RAGBackend: generated_text

    RAGBackend-->>User: result + metadata

Last reviewed commit: 9e4f2df
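The "sort sentences by similarity, keep top 10" step in the diagram can be sketched as follows. This uses a simplified two-field `ScoredSentence` (the struct quoted in the review excerpts also carries `chunk_id` and metadata):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the PR's ScoredSentence.
struct ScoredSentence {
    std::string text;
    float similarity;
};

// Keep the k highest-similarity sentences, best first. partial_sort
// avoids fully ordering the tail we are about to discard.
inline std::vector<ScoredSentence> top_k_sentences(std::vector<ScoredSentence> scored,
                                                   size_t k = 10) {
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      [](const ScoredSentence& a, const ScoredSentence& b) {
                          return a.similarity > b.similarity;  // descending
                      });
    scored.resize(k);
    return scored;
}
```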

Siddhesh2377 and others added 7 commits February 21, 2026 15:06
* feat(lora): add LoRA adapter support across SDK + demo app

  Implement LoRA (Low-Rank Adaptation) adapter hot-swapping for llama.cpp
  backend across all 6 SDK layers (C++ -> C API -> Component -> JNI ->
  Kotlin Bridge -> Kotlin Public API).

  - Add load/remove/clear/query LoRA adapter operations
  - Use vtable dispatch in component layer to decouple librac_commons
    from librac_backend_llamacpp (fixes linker errors)
  - Add LoRA vtable entries to rac_llm_service_ops_t
  - Fix AttachCurrentThread cast for Android NDK C++ JNI build
  - Add RunAnyWhereLora Android demo app with Material 3 Q&A UI
  - Add comprehensive implementation docs with C/C++ API reference

* feat(ci): add selectable build targets to Build All workflow + fix Swift concurrency errors

  Rewrite build-all-test.yml with 9 boolean checkbox inputs so each build
  target can be toggled independently from the GitHub Actions UI:
  - C++ Android Backends (arm64-v8a, armeabi-v7a, x86_64 matrix)
  - C++ iOS Backends (XCFramework)
  - Kotlin SDK (JVM + Android)
  - Swift SDK (iOS/macOS)
  - Web SDK (TypeScript)
  - Flutter SDK (Dart analyze via Melos)
  - React Native SDK (TypeScript via Lerna)
  - Android Example Apps (RunAnywhereAI + RunAnyWhereLora)
  - IntelliJ Plugin

  Fix two Swift strict-concurrency errors that fail the Swift SDK build:
  - LiveTranscriptionSession: add @unchecked Sendable (safe because class
    is @mainactor, all access serialized)
  - RunAnywhere+VisionLanguage: add Sendable conformance to rac_vlm_image_t
    so the C struct can cross the Task boundary in the streaming builder;
    simplify StreamingCollector to start timing at init

* fix(swift): resolve strict concurrency errors in LiveTranscriptionSession and VLM streaming

  LiveTranscriptionSession.swift:
  - Replace [weak self] captures with strong `let session = self` before
    closures to avoid captured var in @Sendable/@task contexts (class is
    @mainactor @unchecked Sendable so strong ref is safe, bounded by
    stream lifecycle)
  - Wrap deprecated startStreamingTranscription call in @available helper
    to silence deprecation warning until migration to transcribeStream API

  RunAnywhere+VisionLanguage.swift:
  - Add `let capturedCImage = cImage` before AsyncThrowingStream closure
    so the Task captures an immutable let instead of a mutable var
  - Add `extension rac_vlm_image_t: @unchecked Sendable {}` for the C
    struct to cross Task concurrency boundaries safely
  - Simplify StreamingCollector to initialize startTime at init instead
    of requiring a separate async start() call

* fix(jni): address CodeRabbit review findings in LoRA JNI functions

  - Replace raw -1 returns with RAC_ERROR_INVALID_HANDLE/RAC_ERROR_INVALID_ARGUMENT
    to match codebase error handling conventions
  - Use getCString() helper instead of raw GetStringUTFChars/ReleaseStringUTFChars
  - Add missing result logging to racLlmComponentRemoveLora and racLlmComponentClearLora
  - Use rac_free() instead of free() in racLlmComponentGetLoraInfo for consistency
  - Clarify LoRA adapter memory ownership comments (adapters freed automatically
    with model per llama.cpp b8011 API — llama_adapter_lora_free is deprecated)
* ios initial changes

* minimal sample needed to test lora

* updating docs

* addressed the comments
First version of the optimised RAG. Not polished yet; once tested, I'll micro-optimise, benchmark, and finish.

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 9e4f2df in 17 seconds.
  • Reviewed 1515 lines of code in 12 files
  • Skipped 0 files when reviewing
  • Skipped posting 0 draft comments



coderabbitai bot commented Mar 1, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ed42a7 and 9e4f2df.

📒 Files selected for processing (12)
  • sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
  • sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h
  • sdk/runanywhere-commons/src/backends/rag/inference_provider.h
  • sdk/runanywhere-commons/src/backends/rag/llamacpp_generator.cpp
  • sdk/runanywhere-commons/src/backends/rag/llamacpp_generator.h
  • sdk/runanywhere-commons/src/backends/rag/rac_rag_pipeline.cpp
  • sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp
  • sdk/runanywhere-commons/src/backends/rag/rag_backend.h
  • sdk/runanywhere-commons/src/backends/rag/rag_chunker.cpp
  • sdk/runanywhere-commons/src/backends/rag/rag_chunker.h
  • sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.cpp
  • sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.h

📝 Walkthrough

Walkthrough

This PR adds context-aware generation capabilities to the LlamaCpp backend, including KV-cache management, confidence probing, and system prompt injection. It extends the RAG pipeline with sentence-level text splitting and adaptive context accumulation based on confidence scoring, while optimizing vector store performance through reduced expansion parameters and i8 quantization.

Changes

Cohort / File(s) Summary
LlamaCpp Backend Context APIs
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h, sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
Added 5 new methods: probe_confidence(), inject_system_prompt(), append_context(), generate_from_context(), and clear_context() for KV-cache-aware generation workflow with tokenization, sampling, and confidence-based Yes/No scoring via softmax.
Text Generator Interface
sdk/runanywhere-commons/src/backends/rag/inference_provider.h
Extended ITextGenerator with 5 virtual default methods matching LlamaCpp backend APIs: probe_confidence(), inject_system_prompt(), append_context(), generate_from_context(), and clear_context(); defaults return safe no-ops or delegate to existing generate().
LlamaCpp Generator Implementation
sdk/runanywhere-commons/src/backends/rag/llamacpp_generator.h, sdk/runanywhere-commons/src/backends/rag/llamacpp_generator.cpp
Implemented all 5 context-aware methods with mutex-protected synchronization, memory management, and integration with KV-cache state; updated context_size() to derive from impl_ when available.
RAG Backend Core
sdk/runanywhere-commons/src/backends/rag/rag_backend.h, sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp
Introduced system prompt injection (kICLSystemPrompt), added confidence threshold (0.8) and context preservation flags; reworked query path with sentence-level splitting, per-sentence confidence evaluation, and adaptive context accumulation before generate_from_context().
Document Chunker
sdk/runanywhere-commons/src/backends/rag/rag_chunker.h, sdk/runanywhere-commons/src/backends/rag/rag_chunker.cpp
Added split_into_sentences() method for sentence-level text decomposition with boundary detection, trimming, and empty-sentence filtering; enables fine-grained context accumulation in RAG pipeline.
Vector Store Optimization
sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.h, sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.cpp
Switched metric from f32_k to i8_k quantization for memory efficiency; reduced expansion_add (128→40) and expansion_search (64→20) defaults for improved performance/memory trade-off.
RAG Pipeline Configuration
sdk/runanywhere-commons/src/backends/rag/rac_rag_pipeline.cpp
Changed default top_k retrieval from 3 to 10 when config value is absent or non-positive.
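One plausible shape for the `split_into_sentences()` described in the chunker row above: boundary detection on terminal punctuation, whitespace trimming, and empty-sentence filtering. The PR's actual implementation may differ (abbreviations, decimals, etc.):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Plausible sketch of split_into_sentences(): split on '.', '!', '?',
// trim whitespace, drop empties. Not the PR's exact code.
inline std::vector<std::string> split_into_sentences(const std::string& text) {
    auto trim = [](const std::string& s) -> std::string {
        size_t b = s.find_first_not_of(" \t\n\r");
        if (b == std::string::npos) return "";
        size_t e = s.find_last_not_of(" \t\n\r");
        return s.substr(b, e - b + 1);
    };
    std::vector<std::string> out;
    std::string cur;
    for (char c : text) {
        cur += c;
        if (c == '.' || c == '!' || c == '?') {  // sentence boundary
            std::string t = trim(cur);
            if (!t.empty()) out.push_back(t);    // filter empty sentences
            cur.clear();
        }
    }
    std::string tail = trim(cur);                // keep any trailing fragment
    if (!tail.empty()) out.push_back(tail);
    return out;
}
```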

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RAGBackend
    participant TextGen as TextGenerator
    participant VectorStore
    participant Chunker
    participant LLM as LlamaCpp Engine

    Client->>RAGBackend: query(text)
    RAGBackend->>TextGen: inject_system_prompt(kICLSystemPrompt)
    TextGen->>LLM: tokenize & cache system prompt
    
    RAGBackend->>VectorStore: retrieve top_k candidates
    VectorStore-->>RAGBackend: parent chunks + similarities
    
    RAGBackend->>Chunker: split_into_sentences(chunk)
    Chunker-->>RAGBackend: sentence array
    
    loop for each sentence
        RAGBackend->>TextGen: append_context(sentence)
        TextGen->>LLM: tokenize & cache sentence
        
        RAGBackend->>TextGen: probe_confidence(context, query)
        TextGen->>LLM: forward pass on "Yes"/"No" logits
        LLM-->>TextGen: confidence score (0.0-1.0)
        TextGen-->>RAGBackend: confidence float
        
        alt confidence >= threshold
            RAGBackend->>RAGBackend: accumulate_context(sentence)
        else confidence < threshold OR partial context
            RAGBackend->>TextGen: clear_context()
            TextGen->>LLM: reset KV cache
        end
    end
    
    RAGBackend->>TextGen: generate_from_context(accumulated_context + query_suffix)
    TextGen->>LLM: generate with accumulated KV state
    LLM-->>TextGen: generated text + metadata
    TextGen-->>RAGBackend: GenerationResult
    
    RAGBackend->>RAGBackend: enrich metadata (sentences_used, confidence, sources)
    RAGBackend-->>Client: GenerationResult with provenance

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

kotlin-sdk

Suggested reviewers

  • Siddhesh2377

Poem

🐰 Hops through cache with prompts so neat,
Confidence scores make context complete,
Sentences split and accumulated with care,
KV-cache wisdom floating in the air! 🎲✨



@greptile-apps greptile-apps bot left a comment


12 files reviewed, 7 comments


for (const auto& sentence_result : search_results) {
    const std::string& sentence_text = sentence_result.text;
    std::string append_text = (sentences_used == 0) ? sentence_text : ("\n" + sentence_text);
    text_generator->append_context(append_text);

Return value not checked - if append_context() fails (e.g., context full), loop continues anyway, leading to incorrect confidence probing

Path: sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp, line 401
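One concise resolution, sketched with hypothetical stand-ins for the generator API (`append` and `probe` play the roles of `append_context()` and `probe_confidence()`; none of these signatures are confirmed by the diff): treat the append as fallible and stop accumulating the moment it fails, so confidence is never probed against a context that was not actually extended.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Adaptive accumulation with the failure check applied: stop on a failed
// append instead of probing a stale context. `append` and `probe` stand
// in for the PR's append_context()/probe_confidence().
inline size_t accumulate_until_confident(
        const std::vector<std::string>& sentences,
        const std::function<bool(const std::string&)>& append,
        const std::function<float()>& probe,
        float threshold = 0.8f) {
    size_t used = 0;
    for (const auto& s : sentences) {
        if (!append(s)) break;            // context full: generate with what fits
        ++used;
        if (probe() >= threshold) break;  // enough evidence accumulated
    }
    return used;  // sentences actually placed in context
}
```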

Comment on lines +57 to 60
// Create metric for cosine similarity. Using i8 instead of float to save on RAM (quality isn't affected much)
metric_punned_t metric(
    static_cast<std::size_t>(config.dimension),
    metric_kind_t::cos_k,

Quantization change from f32 to i8 significantly reduces memory but may affect retrieval quality. Verify that recall/precision metrics meet requirements before deploying.

Path: sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.cpp, lines 57-60
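To make the trade-off concrete, here is a symmetric per-vector i8 quantizer (illustrative only; usearch's internal quantization scheme may differ). For roughly unit-scale embeddings, cosine similarity survives the rounding with small error, which is why recall often holds up at a quarter of the memory:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-vector quantization to int8 (4x smaller than f32).
inline std::vector<int8_t> quantize_i8(const std::vector<float>& v) {
    float max_abs = 0.f;
    for (float x : v) max_abs = std::max(max_abs, std::fabs(x));
    float scale = max_abs > 0.f ? 127.f / max_abs : 0.f;
    std::vector<int8_t> q(v.size());
    for (size_t i = 0; i < v.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(v[i] * scale));
    return q;
}

// Cosine similarity computed directly on the quantized vectors.
inline float cosine_i8(const std::vector<int8_t>& a, const std::vector<int8_t>& b) {
    double dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += double(a[i]) * b[i];
        na  += double(a[i]) * a[i];
        nb  += double(b[i]) * b[i];
    }
    return (na > 0 && nb > 0) ? float(dot / (std::sqrt(na) * std::sqrt(nb))) : 0.f;
}
```

Measuring recall@k on a held-out query set before and after the switch, as the review suggests, is the only way to confirm the quality claim.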

Comment on lines +51 to +52
size_t expansion_add = 40; // Construction search depth ( even a smaller one should be good enough)
size_t expansion_search = 20; // Query search depth( even a smaller one should be good enough)

HNSW parameters reduced by ~70% (expansion_add: 128→40, expansion_search: 64→20). This trades recall quality for speed. Verify search quality meets requirements.

Path: sdk/runanywhere-commons/src/backends/rag/vector_store_usearch.h, lines 51-52

return text.length() / config_.chars_per_token;
}

// used for focus mode in RAG(not final yet, will minmax this further, but this is a working version)

WIP comment ("not final yet, will minmax this further"). Remove before merging or track in an issue.

Path: sdk/runanywhere-commons/src/backends/rag/rag_chunker.cpp, line 32

Comment on lines +18 to +23
static const std::string kICLSystemPrompt =
"You are a question-answering assistant. Given context passages and a question, "
"determine if the passages contain enough information to answer the question.\n\n"
"Example 1 (Sufficient context):\n"
"Context: \"The Eiffel Tower was completed in 1889 for the World's Fair in Paris.\"\n"
"Question: \"When was the Eiffel Tower built?\"\n"

ICL prompt examples show explicit context in the prompt format, but actual probing relies on implicit KV cache context. This mismatch may reduce effectiveness of confidence probing.

Path: sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp, lines 18-23

Comment on lines +228 to +252
std::vector<ScoredSentence> scored_sentences;

for (const auto& parent : parent_chunks) {
    auto sentences = chunker->split_into_sentences(parent.text);
    LOGI("Parent chunk '%s' split into %zu sentences", parent.chunk_id.c_str(), sentences.size());

    for (const auto& sentence : sentences) {
        if (sentence.size() < 3) {
            continue;
        }

        try {
            auto sentence_embedding = embedding_provider->embed(sentence);
            float sim = cosine_similarity(query_embedding, sentence_embedding);

            scored_sentences.push_back({
                sentence,
                sim,
                parent.chunk_id,
                parent.metadata
            });
        } catch (const std::exception& e) {
            LOGE("Failed to embed sentence, skipping: %s", e.what());
        }
    }

Sentence-level embedding for every sentence across 5 parent chunks creates significant computational overhead compared to the previous chunk-only approach. Consider caching sentence embeddings if parent chunks are frequently accessed.

Path: sdk/runanywhere-commons/src/backends/rag/rag_backend.cpp, lines 228-252
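A minimal sketch of the caching idea from the review above, with illustrative names throughout (`SentenceEmbeddingCache` and `embed_fn` are assumptions, not the PR's code): memoize per-sentence embeddings by `chunk_id` so repeated queries over the same parent chunks skip re-embedding.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Memoizes per-sentence embeddings keyed by chunk_id. On a cache hit the
// embedding provider is never called, which addresses the per-query
// re-embedding overhead flagged in the review.
class SentenceEmbeddingCache {
public:
    using Embedding = std::vector<float>;
    using EmbedFn = std::function<Embedding(const std::string&)>;

    explicit SentenceEmbeddingCache(EmbedFn embed_fn) : embed_(std::move(embed_fn)) {}

    // Returns the chunk's sentence embeddings, embedding only on a miss.
    const std::vector<Embedding>& get(const std::string& chunk_id,
                                      const std::vector<std::string>& sentences) {
        auto it = cache_.find(chunk_id);
        if (it != cache_.end()) return it->second;  // hit: no embedding calls
        std::vector<Embedding> embs;
        embs.reserve(sentences.size());
        for (const auto& s : sentences) embs.push_back(embed_(s));
        return cache_.emplace(chunk_id, std::move(embs)).first->second;
    }

private:
    EmbedFn embed_;
    std::unordered_map<std::string, std::vector<Embedding>> cache_;
};
```

A real version would need an eviction policy and invalidation when a document is re-chunked; this sketch only shows the memoization shape.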

  size_t embedding_dimension = 384;
- size_t top_k = 3;
+ size_t top_k = 10; // Need to get Golden document
  float similarity_threshold = 0.15f;

Move inline comment to separate line above for better readability.

Path: sdk/runanywhere-commons/src/backends/rag/rag_backend.h, line 29


Labels

None yet


4 participants