[Web-SDK] [Web-Example] Web SDK fixes by sanchitmonga22 · Pull Request #433 · RunanywhereAI/runanywhere-sdks

sanchitmonga22 · 2026-02-27T08:46:07Z

Updated imports in speak.ts, transcribe.ts, vision.ts, and voice.ts to use the new core package for audio and video functionalities.
Introduced new AudioCapture, AudioPlayback, and VideoCapture classes to handle audio and video processing more efficiently.
Added backend-agnostic types for STT, TTS, LLM, and VLM in the core types module.
Implemented streaming capabilities for model downloads in ModelDownloader.
Enhanced the overall structure for better modularity and maintainability.

Description

Brief description of the changes made.

Type of Change

Bug fix
New feature
Documentation update
Refactoring

Testing

Lint passes locally
Added/updated tests for changes

Platform-Specific Testing (check all that apply)

Swift SDK / iOS Sample:

Tested on iPhone (Simulator or Device)
Tested on iPad / Tablet
Tested on Mac (macOS target)

Kotlin SDK / Android Sample:

Tested on Android Phone (Emulator or Device)
Tested on Android Tablet

Flutter SDK / Flutter Sample:

Tested on iOS
Tested on Android

React Native SDK / React Native Sample:

Tested on iOS
Tested on Android

Playground:

Tested on target platform
Verified no regressions in existing Playground projects

Web SDK / Web Sample:

Tested in Chrome (Desktop)
Tested in Firefox
Tested in Safari
WASM backends load (LlamaCpp + ONNX)
OPFS storage persistence verified (survives page refresh)
Settings persistence verified (localStorage)

Labels

Please add the appropriate label(s):

SDKs:

Swift SDK - Changes to Swift SDK (sdk/runanywhere-swift)
Kotlin SDK - Changes to Kotlin SDK (sdk/runanywhere-kotlin)
Flutter SDK - Changes to Flutter SDK (sdk/runanywhere-flutter)
React Native SDK - Changes to React Native SDK (sdk/runanywhere-react-native)
Web SDK - Changes to Web SDK (sdk/runanywhere-web)
Commons - Changes to shared native code (sdk/runanywhere-commons)

Sample Apps:

iOS Sample - Changes to iOS example app (examples/ios)
Android Sample - Changes to Android example app (examples/android)
Flutter Sample - Changes to Flutter example app (examples/flutter)
React Native Sample - Changes to React Native example app (examples/react-native)
Web Sample - Changes to Web example app (examples/web)

Checklist

Code follows project style guidelines
Self-review completed
Documentation updated (if needed)

Screenshots

Attach relevant UI screenshots for changes (if applicable):

Mobile (Phone)
Tablet / iPad
Desktop / Mac

Summary by CodeRabbit

New Features
- Added streaming download support with real-time progress tracking for models
- Introduced configurable system prompts and options for vision model processing
- Added hardware acceleration selection (CPU/WebGPU) during initialization
Bug Fixes
- Implemented automatic CPU fallback for models encountering WebGPU compatibility issues
- Enhanced WASM module loading with configurable base URLs and helper paths
- Added token repetition detection to prevent infinite loops
Refactor
- Unified type definitions for speech-to-text, text-to-speech, and voice activity detection across packages
- Reorganized module imports for clearer separation of concerns
- Expanded audio/video capture infrastructure in public APIs
Chores
- Updated package versions to 0.1.0-beta.10

Important

Refactor Web SDK to improve modularity and type safety by centralizing audio/video processing and types in a core package, and implement streaming model downloads.

Behavior:
- Updated imports in speak.ts, transcribe.ts, vision.ts, and voice.ts to use new core package for audio/video functionalities.
- Introduced AudioCapture, AudioPlayback, and VideoCapture classes for efficient audio/video processing.
- Implemented streaming capabilities for model downloads in ModelDownloader.
Types:
- Added backend-agnostic types for STT, TTS, LLM, and VLM in core types module.
Structure:
- Enhanced structure for better modularity and maintainability.

This description was generated automatically for 3a8d798. It will update automatically as commits are pushed.

Greptile Summary

Refactors Web SDK architecture by consolidating backend-agnostic types (STT, TTS, VLM, VAD, LLM) into the core package, enabling cleaner separation between infrastructure and backend implementations.

Key improvements:

Streaming model downloads with automatic fallbacks (OPFS → memory cache) prevent download failures due to quota issues
VLM enhancements: system prompt support, improved sampler chain with DRY sampler to prevent repetition, and CPU fallback for Qwen2-VL on WebGPU
LoRA adapter handling updated to use new llama.cpp batch API (llama_set_adapters_lora)
Audio/video capture classes (AudioCapture, AudioPlayback, VideoCapture) now properly exported from core
Provider interfaces strengthened with concrete return types instead of loose object signatures
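
The streaming-first download with fallback listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ChunkSink`, `streamToSink`, and `storeWithFallback` are hypothetical names rather than the SDK's `ModelDownloader` API, and the download body is modeled as a plain async iterable instead of a real `fetch` response.

```typescript
// Hypothetical sink abstraction standing in for OPFS or an in-memory cache.
interface ChunkSink {
  write(chunk: Uint8Array): Promise<void>;
  close(): Promise<void>;
}

// Streams chunks into one sink. Returns the number of bytes written, or null
// if the sink fails (for example on a quota error) so the caller can fall back.
async function streamToSink(
  chunks: AsyncIterable<Uint8Array>,
  sink: ChunkSink,
  onProgress?: (bytesWritten: number) => void,
): Promise<number | null> {
  let written = 0;
  try {
    for await (const chunk of chunks) {
      await sink.write(chunk);
      written += chunk.length;
      onProgress?.(written);
    }
    await sink.close();
    return written;
  } catch {
    return null;
  }
}

// Tries each sink in order and restarts the chunk source for every attempt.
// Returns the stored size from the first sink that succeeds, or null if all fail.
async function storeWithFallback(
  makeChunks: () => AsyncIterable<Uint8Array>,
  sinks: ChunkSink[],
): Promise<number | null> {
  for (const sink of sinks) {
    const size = await streamToSink(makeChunks(), sink);
    if (size !== null) return size;
  }
  return null;
}
```

Returning the byte count from whichever path succeeded lets the caller report a total size without querying a specific storage backend afterwards.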

Architecture changes:

Types moved from backend packages to @runanywhere/web core: VLMTypes.ts, STTTypes.ts, TTSTypes.ts, VADTypes.ts, LLMTypes.ts
Backend packages now re-export core types and define only backend-specific configs (e.g., VLMModelFamily enum in llamacpp, model file configurations in onnx)
Example apps updated to import audio/video infrastructure from core package

Confidence Score: 4/5

Safe to merge with minor risks - type refactoring is well-executed, but C++ VLM changes add significant new logic
Type consolidation is clean and maintains backward compatibility through re-exports. Streaming download implementation has proper error handling and fallbacks. VLM C++ changes are comprehensive but add complexity (system prompt handling, sampler reconfiguration, CPU fallback for Qwen2-VL). LoRA adapter updates correctly follow new llama.cpp API
Pay attention to rac_vlm_llamacpp.cpp for VLM runtime behavior with new sampler chain and CPU fallback logic

Important Files Changed

| Filename | Overview |
| --- | --- |
| sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts | Streaming model download implementation with proper fallbacks to buffered download and memory cache when OPFS fails |
| sdk/runanywhere-web/packages/core/src/types/VLMTypes.ts | Backend-agnostic VLM type definitions moved to core package for better modularity |
| sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp | Added system prompt support, improved sampler chain with DRY sampler for repetition prevention, and CPU fallback for Qwen2-VL on WebGPU |
| sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp | Updated LoRA adapter API calls to use new llama.cpp API (llama_set_adapters_lora instead of deprecated single-adapter methods) |
| sdk/runanywhere-web/packages/core/src/Infrastructure/ProviderTypes.ts | Improved type safety by using concrete types from core instead of loose object signatures |
| sdk/runanywhere-web/packages/llamacpp/src/Extensions/VLMTypes.ts | Refactored to re-export generic VLM types from core, keeps only backend-specific VLMModelFamily enum |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TB
    Core["@runanywhere/web (core)<br/>- VLMTypes.ts<br/>- STTTypes.ts<br/>- TTSTypes.ts<br/>- LLMTypes.ts<br/>- AudioCapture<br/>- VideoCapture<br/>- ModelDownloader"]
    
    LlamaCpp["@runanywhere/web-llamacpp<br/>(Backend)"]
    ONNX["@runanywhere/web-onnx<br/>(Backend)"]
    
    LlamaCppTypes["VLMTypes.ts<br/>- Re-exports core types<br/>- Adds VLMModelFamily enum"]
    ONNXTypes["STTTypes.ts<br/>- Re-exports core types<br/>- Adds STTModelConfig"]
    
    Providers["Provider Interfaces<br/>(ProviderTypes.ts)<br/>- LLMProvider<br/>- STTProvider<br/>- TTSProvider"]
    
    Examples["Example Apps<br/>(speak.ts, vision.ts, etc.)"]
    
    Core --> |"Exports generic types"| LlamaCpp
    Core --> |"Exports generic types"| ONNX
    Core --> |"Type-safe interfaces"| Providers
    
    LlamaCpp --> |"Re-exports + backend enums"| LlamaCppTypes
    ONNX --> |"Re-exports + model configs"| ONNXTypes
    
    LlamaCppTypes --> |"Implements"| Providers
    ONNXTypes --> |"Implements"| Providers
    
    Core --> |"Imports audio/video"| Examples
    LlamaCppTypes --> |"Uses VLM types"| Examples
    ONNXTypes --> |"Uses STT/TTS types"| Examples
    
    style Core fill:#e1f5ff
    style LlamaCpp fill:#fff4e1
    style ONNX fill:#fff4e1
    style Providers fill:#e8f5e8

Last reviewed commit: 3a8d798


coderabbitai · 2026-02-27T08:46:17Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 400e323c-90e2-456c-aa3f-238f424bf769

📥 Commits

Reviewing files that changed from the base of the PR and between 3a8d798 and c0a1533.

📒 Files selected for processing (7)

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts
sdk/runanywhere-web/packages/llamacpp/src/index.ts
sdk/runanywhere-web/packages/onnx/src/index.ts

Walkthrough

This pull request refactors the RunAnywhere Web SDK module boundaries by consolidating audio I/O infrastructure and speech-related types into the core package, reorganizing type definitions across multiple files to support backend-agnostic interfaces, and adding streaming-to-storage download capability. Additionally, it extends VLM processing with new parameters (system prompt, topP, model family), updates provider interfaces with strongly-typed result types, and introduces configuration options for WASM and helper file base URLs in backend packages.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Example View Imports `examples/web/RunAnywhereAI/src/views/speak.ts`, `transcribe.ts`, `vision.ts`, `voice.ts` | Reorganized imports to use AudioCapture, AudioPlayback, and VideoCapture from core package instead of backend-specific paths; consolidated related imports under unified source modules; added systemPrompt option to VLM worker process. |
| Core Infrastructure `sdk/runanywhere-web/packages/core/src/Infrastructure/AudioCapture.ts`, `AudioPlayback.ts`, `VideoCapture.ts` | Updated import paths for SDKLogger and related dependencies from external package to local relative modules. |
| Core Model Download `sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts` | Added `downloadAndStoreStreaming()` method to support streaming-first download pathway with fallback to buffered download; integrated streaming and buffered progress aggregation for primary and companion files. |
| Core Provider Types `sdk/runanywhere-web/packages/core/src/Infrastructure/ProviderTypes.ts` | Replaced inline result/options types in LLMProvider, STTProvider, and TTSProvider with strongly-typed dedicated types (LLMGenerationResult, STTTranscriptionResult, STTTranscribeOptions, TTSSynthesisResult, TTSSynthesizeOptions); added ProviderMap interface. |
| Core Type Definitions `sdk/runanywhere-web/packages/core/src/types/LLMTypes.ts`, `STTTypes.ts`, `TTSTypes.ts`, `VADTypes.ts`, `VLMTypes.ts`, `enums.ts` | Introduced backend-agnostic type definitions for STT, TTS, VAD, and VLM; expanded ModelCategory enum with speech, vision, and audio categories; updated import sources for HardwareAcceleration. |
| Core Type Exports `sdk/runanywhere-web/packages/core/src/types.ts`, `types/index.ts` | Added comprehensive re-exports of LLM, VLM, STT, TTS, and VAD types from feature-specific type modules; reorganized type groupings for public API surface. |
| Core Public API `sdk/runanywhere-web/packages/core/src/index.ts` | Expanded exports to include AudioCapture, AudioPlayback, AudioFileLoader, and VideoCapture with associated types as public infrastructure components. |
| Core Voice Pipeline `sdk/runanywhere-web/packages/core/src/Public/Extensions/RunAnywhere+VoicePipeline.ts` | Moved redundant provider requirement checks to single upfront validation before processing; removed duplicate checks from individual transcription, generation, and synthesis steps. |
| LlamaCpp Backend Configuration `sdk/runanywhere-web/packages/llamacpp/src/LlamaCPP.ts`, `LlamaCppProvider.ts`, `LlamaCppBridge.ts` | Added LlamaCPPRegisterOptions interface with wasmUrl, webgpuWasmUrl, and acceleration fields; propagated options through register methods; updated WASM binary locating with dynamic baseUrl derivation and locateFile callback. |
| LlamaCpp Type Reorganization `sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+TextGeneration.ts`, `RunAnywhere+VLM.ts`, `Extensions/VLMTypes.ts`, `src/index.ts` | Moved LLM and VLM type imports to central package; narrowed VLM exports to only VLMModelFamily; re-exported generic VLM types from core; added VLMModelFamily enum for architecture support (Qwen2VL, SmolVLM, LLaVA, etc.). |
| LlamaCpp VLM Processing `sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts`, `VLMWorkerRuntime.ts` | Extended VLM process payload and options with topP, systemPrompt, and modelFamily parameters; added QwenVL WebGPU-to-CPU WASM fallback detection and restart logic; integrated WASM baseUrl handling and heap memory allocation for system prompt. |
| ONNX Backend Configuration `sdk/runanywhere-web/packages/onnx/src/ONNX.ts`, `Foundation/SherpaONNXBridge.ts`, `Foundation/SherpaHelperLoader.ts`, `src/index.ts` | Added ONNXRegisterOptions interface with wasmUrl and helperBaseUrl fields; implemented dynamic helperBaseUrl derivation on WASM load; updated helper file URL resolution to use bridge base URL with fallback to import.meta.url. |
| ONNX Type Consolidation `sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+STT.ts`, `RunAnywhere+TTS.ts`, `RunAnywhere+VAD.ts`, `STTTypes.ts`, `TTSTypes.ts`, `VADTypes.ts` | Narrowed public type exports; moved transcription/synthesis result types to core; re-exported shared contract types from central package; expanded backend-specific config interfaces (VADModelConfig, TTSVoiceConfig) with additional parameters. |
| C++ VLM Backend `sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` | Enhanced format_vlm_prompt_with_template to accept system_prompt and model_type; added per-request effective model type resolution; implemented Qwen2VL default system prompt injection; added sampler configuration with penalties, DRY, min_p; implemented repetition guard against token looping; added CPU fallback for M-RoPE WebGPU failures; extended logging with diagnostics and NaN/Inf checks. |
| C++ LoRA Management `sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` | Added batch adapter clear in apply_lora_adapters when no adapters present; removed redundant comments; maintained existing teardown behavior. |
| Package Version Updates `sdk/runanywhere-web/packages/core/package.json`, `llamacpp/package.json`, `onnx/package.json` | Bumped package versions from 0.1.0-beta.9 to 0.1.0-beta.10 across core, llamacpp, and onnx packages. |
| Model Registry `examples/web/RunAnywhereAI/src/services/model-manager.ts` | Added Qwen2-VL 2B Instruct Q4_K_M model entry with M-RoPE and WebGPU NaN mitigation notes. |
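
The code for the "repetition guard against token looping" in the C++ VLM backend is not shown in this thread. One common way to build such a guard is to check whether the tail of the generated token sequence is a short pattern repeated several times. The TypeScript sketch below illustrates that idea only; `isLooping` and its thresholds are hypothetical, and the actual logic in `rac_vlm_llamacpp.cpp` may differ.

```typescript
// Returns true when the last `minRepeats * len` tokens are periodic with
// period `len`, for some pattern length `len` up to `maxPatternLen`.
// Both thresholds are illustrative defaults, not values taken from the PR.
function isLooping(
  tokens: readonly number[],
  maxPatternLen = 8,
  minRepeats = 4,
): boolean {
  for (let len = 1; len <= maxPatternLen; len++) {
    const needed = len * minRepeats;
    if (tokens.length < needed) break; // longer patterns need even more tokens
    const start = tokens.length - needed;
    let repeating = true;
    // The tail is periodic if every token equals the one `len` positions before it.
    for (let i = start + len; i < tokens.length; i++) {
      if (tokens[i] !== tokens[i - len]) {
        repeating = false;
        break;
      }
    }
    if (repeating) return true;
  }
  return false;
}
```

A generation loop could call a check like this after each sampled token and stop early once it returns true.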

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Fix/372 streaming model import #395: Adds streaming support for large model imports (streaming-to-storage, stream-based loaders) with overlapping changes to ModelDownloader download paths.
feat(web-sdk): add full TypeScript definitions and improve TS support #403: Reorganizes core package's public TypeScript types and exports, moving audio/VAD/TTS types into core with re-export surfaces.
Web SDK (Beta) #351: Introduces the Web SDK refactor that originally added/exported AudioPlayback and AudioCapture from the core package, upon which this PR's import changes depend.

Suggested reviewers

Siddhesh2377

🐰 The rabbit hops through code with glee,
Moving audio from onnx to core so free,
Types consolidated, imports rearranged,
VLM now whispers with system prompts changed,
Qwen2 on CPU when GPU won't play!

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | Pull request description is largely incomplete. It includes an initial summary of high-level changes but lacks detailed information in required template sections (Testing, Labels, Checklist), with most checkboxes unchecked and no platform-specific testing results provided. | Complete the PR description by: (1) selecting the appropriate Type of Change, (2) confirming Lint and test status, (3) checking relevant Platform-Specific Testing items (especially Web SDK tests: Chrome/Firefox/Safari, WASM backends, OPFS persistence), (4) adding applicable Labels (Web SDK, Commons, Web Sample), and (5) checking Checklist items to confirm code style compliance and self-review completion. |
| Title check | ❓ Inconclusive | The title '[Web-SDK] [Web-Example] Web SDK fixes' is vague and generic, using 'fixes' which doesn't clearly convey the main change. | Replace with a more specific title that highlights the primary refactoring, e.g., 'Refactor Web SDK audio/video infrastructure and centralize backend-agnostic types in core package'. |


shubhammalhotra28 · 2026-02-27T22:02:31Z

@coderabbitai please do a detailed review. Thanks!

shubhammalhotra28 · 2026-02-27T22:02:36Z

@greptile please do a detailed review. Thanks!

coderabbitai · 2026-02-27T22:02:53Z

@shubhammalhotra28, I'll perform a detailed review of all the changes in this PR. I'll examine:

The updated imports and new audio/video handling classes (AudioCapture, AudioPlayback, VideoCapture)
Backend-agnostic types for STT, TTS, LLM, and VLM
ModelDownloader streaming implementation
Overall code quality, modularity, and potential issues

Let me start the comprehensive review now.

🧠 Learnings used

Learnt from: CR
Repo: RunanywhereAI/runanywhere-sdks PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-27T08:41:48.423Z
Learning: Applies to **/runanywhere-web/packages/core/**/*.{ts,tsx} : Web SDK (TypeScript): Run `npm run typecheck -w packages/core` for type checking TypeScript files in the Web SDK

Learnt from: CR
Repo: RunanywhereAI/runanywhere-sdks PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-27T08:41:48.423Z
Learning: Applies to **/runanywhere-web/packages/core/**/*.{ts,tsx} : Web SDK (TypeScript): Use `npm run build -w packages/core` (from `sdk/runanywhere-web/`) to build; note that `llamacpp` package has a pre-existing duplicate index signature TypeScript error

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts (1)

275-300: ⚠️ Potential issue | 🟡 Minor

Use backend-agnostic size accounting for additional files.

At Line 298, this.storage.getFileSize(fileKey) only checks OPFS. If local storage is active and streaming succeeds there, size can be null, so totalSize is undercounted.

Proposed fix

@@
-      // Download additional files (e.g., mmproj for VLM)
+      // Download additional files (e.g., mmproj for VLM)
+      let additionalTotalSize = 0;
       if (model.additionalFiles && model.additionalFiles.length > 0) {
         for (let i = 0; i < model.additionalFiles.length; i++) {
@@
-          const streamedSize = await this.downloadAndStoreStreaming(file.url, fileKey, fileProgressCb);
-          if (streamedSize === null) {
+          const streamedSize = await this.downloadAndStoreStreaming(file.url, fileKey, fileProgressCb);
+          if (streamedSize === null) {
             const fileData = await this.downloadFile(file.url, fileProgressCb);
             await this.storeInOPFS(fileKey, fileData);
+            additionalTotalSize += fileData.length;
+          } else {
+            additionalTotalSize += streamedSize;
           }
         }
       }
@@
-      let totalSize = primarySize;
-      if (model.additionalFiles) {
-        for (const file of model.additionalFiles) {
-          const fileKey = this.additionalFileKey(modelId, file.filename);
-          const size = await this.storage.getFileSize(fileKey);
-          if (size !== null) totalSize += size;
-        }
-      }
+      const totalSize = primarySize + additionalTotalSize;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts`
around lines 275 - 300, The loop that sums additional file sizes uses
this.storage.getFileSize(fileKey) which only checks OPFS and can return null
when the file exists in the other backend (e.g. local storage after streaming);
implement a backend-agnostic helper (e.g. this.getStoredFileSize(fileKey)) that
first calls this.storage.getFileSize(fileKey) and if that returns null falls
back to retrieving the stored file via an existing read method (e.g.
this.storage.getFile or similar) and computes the size from the returned
Blob/ArrayBuffer, then replace the current call in the ModelDownloader
additional-files loop (where totalSize is computed using additionalFileKey and
storage.getFileSize) to use this new helper so totalSize correctly accounts for
files regardless of storage backend.

🧹 Nitpick comments (6)

sdk/runanywhere-web/packages/core/src/types/TTSTypes.ts (1)

8-18: Consider moving the index signature after named properties.

The index signature on line 9 works correctly, but placing it after the named properties is more conventional and improves readability by showing the core contract first.

♻️ Suggested reordering

 export interface TTSSynthesisResult {
-  [key: string]: unknown;
   /** Raw PCM audio data */
   audioData: Float32Array;
   /** Audio sample rate */
   sampleRate: number;
   /** Duration in milliseconds */
   durationMs: number;
   /** Processing time in milliseconds */
   processingTimeMs: number;
+  /** Allow backend-specific extensions */
+  [key: string]: unknown;
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/core/src/types/TTSTypes.ts` around lines 8 - 18,
The TTSSynthesisResult interface currently places the index signature before the
named properties; reorder it so the named properties (audioData, sampleRate,
durationMs, processingTimeMs) come first and then include the index signature
([key: string]: unknown) after them to improve readability and emphasize the
core contract of TTSSynthesisResult.
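
Wherever the index signature sits, its effect on consumers is the same: named properties keep their declared types, while any extra backend-specific key is typed `unknown` and has to be narrowed before use. A small sketch of that behavior, with the interface copied from the suggestion above; the `voiceId` key and the `describeResult` helper are hypothetical.

```typescript
interface TTSSynthesisResult {
  /** Raw PCM audio data */
  audioData: Float32Array;
  /** Audio sample rate */
  sampleRate: number;
  /** Duration in milliseconds */
  durationMs: number;
  /** Processing time in milliseconds */
  processingTimeMs: number;
  /** Allow backend-specific extensions */
  [key: string]: unknown;
}

// Hypothetical consumer: reads typed core fields directly and narrows an
// extension field before using it.
function describeResult(result: TTSSynthesisResult): string {
  const seconds = result.durationMs / 1000; // number, no narrowing needed
  const voice = result['voiceId']; // unknown, comes from the index signature
  const label = typeof voice === 'string' ? voice : 'default';
  return `${label}: ${seconds.toFixed(2)}s at ${result.sampleRate} Hz`;
}
```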

sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+VAD.ts (1)

251-261: Consider logging suppressed errors in cleanup.

The empty catch block silently swallows all errors during VAD destruction. While cleanup errors are often non-critical, logging them at debug level aids troubleshooting.

💡 Suggested improvement

   cleanup(): void {
     if (this._vadHandle !== 0) {
       try {
         SherpaONNXBridge.shared.module._SherpaOnnxDestroyVoiceActivityDetector(this._vadHandle);
-      } catch { /* ignore */ }
+      } catch (e) {
+        logger.debug('VAD cleanup error (non-critical):', e);
+      }
       this._vadHandle = 0;
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+VAD.ts` around
lines 251 - 261, The cleanup() method currently swallows errors when destroying
the VAD handle (_vadHandle)—update the catch block to log the caught error at
debug level instead of ignoring it; specifically, catch the error from
SherpaONNXBridge.shared.module._SherpaOnnxDestroyVoiceActivityDetector(this._vadHandle)
and call your debug logger (e.g., console.debug or your module logger) with a
short message like "Failed to destroy VAD handle" and include the error object,
then continue to set _vadHandle = 0 and clear _jsActivityCallback,
_lastSpeechState and _speechStartMs as before.

sdk/runanywhere-web/packages/core/src/types/VLMTypes.ts (1)

16-26: Consider stricter discriminated union for VLMImage.

The current interface allows all data fields to be present or absent simultaneously. A discriminated union would provide compile-time enforcement:

💡 Stricter type alternative (optional)

export type VLMImage =
  | { format: VLMImageFormat.FilePath; filePath: string; width?: number; height?: number }
  | { format: VLMImageFormat.RGBPixels; pixelData: Uint8Array; width: number; height: number }
  | { format: VLMImageFormat.Base64; base64Data: string; width?: number; height?: number };

This ensures exactly one data field is provided based on the format. However, the current approach is valid if flexibility is preferred.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/core/src/types/VLMTypes.ts` around lines 16 -
26, Replace the loose VLMImage interface with a discriminated union keyed on the
existing format property (VLMImageFormat) so each variant requires exactly the
appropriate data field: for format VLMImageFormat.FilePath require filePath
(width/height optional), for VLMImageFormat.RGBPixels require pixelData and make
width and height required, and for VLMImageFormat.Base64 require base64Data
(width/height optional); update any usages of VLMImage to accept the new union
type and adjust code that assumed optional fields may be absent.
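
To show what the stricter union would buy at call sites, here is a minimal sketch of narrowing on `format`. The union is copied from the suggestion above; the numeric enum values, the `validateImage` helper, and the assumption of 3 bytes per RGB pixel are illustrative and not taken from the SDK.

```typescript
// Enum member names follow the review comment; the numeric values are assumed.
enum VLMImageFormat {
  FilePath = 0,
  RGBPixels = 1,
  Base64 = 2,
}

type VLMImage =
  | { format: VLMImageFormat.FilePath; filePath: string; width?: number; height?: number }
  | { format: VLMImageFormat.RGBPixels; pixelData: Uint8Array; width: number; height: number }
  | { format: VLMImageFormat.Base64; base64Data: string; width?: number; height?: number };

// Hypothetical validator: returns an error message, or null when the image looks valid.
function validateImage(image: VLMImage): string | null {
  switch (image.format) {
    case VLMImageFormat.FilePath:
      return image.filePath.length > 0 ? null : 'filePath is empty';
    case VLMImageFormat.RGBPixels: {
      // width and height are required in this variant, so no undefined checks.
      const expected = image.width * image.height * 3; // assumes packed 8-bit RGB
      return image.pixelData.length === expected
        ? null
        : `expected ${expected} bytes, got ${image.pixelData.length}`;
    }
    case VLMImageFormat.Base64:
      return image.base64Data.length > 0 ? null : 'base64Data is empty';
  }
}
```

Inside each `case`, the compiler only allows access to the fields of that variant, so reading `pixelData` on a `FilePath` image becomes a compile-time error rather than a runtime `undefined`.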

sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts (1)

96-101: Harden helper URL construction against missing trailing slash.

Line 99 assumes helperBaseUrl always ends with /. If it doesn’t, helper fetches fail due to malformed URLs.

Proposed minimal fix

-  const bridgeBase = SherpaONNXBridge.shared.helperBaseUrl;
-  const url = bridgeBase
-    ? `${bridgeBase}${filename}`
-    : new URL(`../../wasm/sherpa/${filename}`, import.meta.url).href;
+  const bridgeBase = SherpaONNXBridge.shared.helperBaseUrl;
+  const normalizedBase = bridgeBase
+    ? (bridgeBase.endsWith('/') ? bridgeBase : `${bridgeBase}/`)
+    : null;
+  const url = normalizedBase
+    ? `${normalizedBase}${filename}`
+    : new URL(`../../wasm/sherpa/${filename}`, import.meta.url).href;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts`
around lines 96 - 101, The construction of the helper URL uses
SherpaONNXBridge.shared.helperBaseUrl (bridgeBase) but assumes it ends with a
trailing slash, causing malformed URLs when it doesn't; update the logic in
SherpaHelperLoader.ts where bridgeBase is used to build url (the bridgeBase /
url / filename code) to normalize bridgeBase by ensuring it ends with a single
'/' (or use URL(filename, bridgeBase) semantics) before concatenating so
`${bridgeBase}${filename}` always produces a valid path.
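
A standalone version of the normalization described above might look like the sketch below. `joinBaseUrl` is a hypothetical helper, not part of the SDK, and it assumes a non-empty base URL. It uses plain string handling because `new URL(filename, base)` replaces the last path segment of `base` when `base` lacks a trailing slash, which is the failure mode this comment is about.

```typescript
// Joins a base URL and a filename with exactly one '/' between them,
// regardless of whether the base ends with slashes or the filename starts with them.
function joinBaseUrl(baseUrl: string, filename: string): string {
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  const file = filename.replace(/^\/+/, ''); // drop leading slashes
  return `${base}/${file}`;
}
```

For example, `joinBaseUrl('https://cdn.example.com/sherpa', 'helper.js')` and `joinBaseUrl('https://cdn.example.com/sherpa/', 'helper.js')` both produce `https://cdn.example.com/sherpa/helper.js` (the host and filename here are placeholders).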

sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts (1)

567-599: Add a defensive topP fallback before writing VLM options.

At Line 598, topP is written directly. Guarding against non-finite values in the worker keeps inference stable even if payload shape drifts.

Defensive fallback patch

 async function processImage(
   rgbPixels: ArrayBuffer,
   width: number, height: number,
   prompt: string,
   maxTokens: number, temperature: number,
-  topP: number, systemPrompt?: string,
+  topP?: number, systemPrompt?: string,
 ): Promise<VLMWorkerResult> {
@@
-  m.setValue(optPtr + vo.topP, topP, 'float');
+  const safeTopP = Number.isFinite(topP) ? (topP as number) : 0.9;
+  m.setValue(optPtr + vo.topP, safeTopP, 'float');

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts`
around lines 567 - 599, The code writes topP into the VLM options without
validation; before calling m.setValue(optPtr + vo.topP, topP, 'float') in
VLMWorkerRuntime (around where optPtr and vo are used), guard topP with a
defensive fallback such as const safeTopP = Number.isFinite(topP) ? topP : 1.0
and then write safeTopP instead; ensure you reference and update the m.setValue
call that uses vo.topP to use the validated safeTopP variable.
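
The guard can also be written as a small reusable function, sketched below. `sanitizeTopP` is a hypothetical name. The default fallback of 0.9 follows the patch above, but it is left as a parameter because the prompt text mentions 1.0 instead. Clamping to the range 0 to 1 is an extra assumption beyond what the comment asks for.

```typescript
// Returns a usable top-p value: the fallback for anything that is not a finite
// number, otherwise the input clamped to [0, 1].
function sanitizeTopP(topP: unknown, fallback = 0.9): number {
  if (typeof topP !== 'number' || !Number.isFinite(topP)) {
    return fallback;
  }
  return Math.min(1, Math.max(0, topP));
}
```

The result would then be passed to the existing `m.setValue(optPtr + vo.topP, ..., 'float')` call in place of the raw `topP`.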

sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+VLM.ts (1)

25-25: Consider a temporary compatibility re-export for removed VLM types.

At Line 25, this module now exports only VLMModelFamily; consumers importing VLMImageFormat/VLM* types from this module path will break. A one-release compatibility re-export (or explicit migration note) would make upgrades safer.

Compatibility re-export option

-export { VLMModelFamily } from './VLMTypes';
+export { VLMImageFormat, VLMModelFamily } from './VLMTypes';
+export type {
+  VLMImage,
+  VLMGenerationOptions,
+  VLMGenerationResult,
+  VLMStreamingResult,
+} from './VLMTypes';

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+VLM.ts` at
line 25, The export change removed several VLM types consumers may still import
(e.g., VLMImageFormat and other VLM* types), so restore a one-release
compatibility re-export from the original source: re-export VLMImageFormat and
any other VLM* symbols alongside VLMModelFamily from './VLMTypes' in
RunAnywhere+VLM.ts (or add a clear comment indicating it is a temporary shim for
migration), ensuring imports like `VLMImageFormat` continue to resolve until
callers are updated.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sdk/runanywhere-web/packages/llamacpp/src/index.ts`:
- Around line 34-35: Exports were changed to expose VLM and VLMModelFamily but
you removed the original shared VLM contract type exports consumers import from
`@runanywhere/web-llamacpp`; restore compatibility by re-exporting the original
contract type names as aliases from the new module. Update the barrel (index.ts)
to export the original shared types (the same public type names consumers used)
alongside VLM and VLMModelFamily by forwarding them from
'./Extensions/RunAnywhere+VLM' so existing imports continue to compile.

In `@sdk/runanywhere-web/packages/onnx/src/index.ts`:
- Around line 28-34: The public barrel removed shared STT/TTS/VAD contract
exports causing breaking changes; restore backward compatibility by re-exporting
the original shared contract symbols from the ONNX package index: export the
shared STT, TTS and VAD contract types/names under their previous names (e.g.
STT, STTModelType, STTModelConfig, STTWhisperFiles, STTZipformerFiles,
STTParaformerFiles, TTS, TTSVoiceConfig, VAD, VADModelConfig) as aliases that
point to the correct backend-specific or common contract definitions so
consumers importing from `@runanywhere/web-onnx` keep working. Ensure these
re-exports are added to the existing ./index.ts barrel alongside the
backend-specific exports.

In `@sdk/runanywhere-web/packages/onnx/src/ONNX.ts`:
- Around line 47-49: The code sets bridge.helperBaseUrl directly from
options.helperBaseUrl which can omit the required trailing slash; update the
assignment in the initialization block (where options?.helperBaseUrl is handled
before calling ONNXProvider.register()) to normalize the value by appending a
single '/' if one is not present (e.g., check
options.helperBaseUrl.endsWith('/') and add '/' when false) and then assign the
normalized string to bridge.helperBaseUrl so all helper URLs built from it are
valid.
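
The normalization described for ONNX.ts could look roughly like the sketch below. It assumes `helperBaseUrl` is a plain string; `normalizeBaseUrl` is a hypothetical helper name, and the `options` and `bridge` objects are stand-ins rather than the SDK's real types.

```typescript
// Hypothetical helper: make sure a base URL ends with exactly one '/'.
function normalizeBaseUrl(base: string): string {
  // Strip any run of trailing slashes, then append a single one.
  return base.replace(/\/+$/, '') + '/';
}

// Stand-ins for the real initialization options and bridge singleton.
const options: { helperBaseUrl?: string } = { helperBaseUrl: '/assets/sherpa' };
const bridge: { helperBaseUrl: string | null } = { helperBaseUrl: null };

if (options.helperBaseUrl) {
  bridge.helperBaseUrl = normalizeBaseUrl(options.helperBaseUrl);
}

console.log(bridge.helperBaseUrl); // /assets/sherpa/
```

This variant also collapses repeated trailing slashes, which goes slightly beyond the "append if missing" check suggested in the comment.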

---

Outside diff comments:
In `@sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts`:
- Around line 275-300: The loop that sums additional file sizes uses
this.storage.getFileSize(fileKey) which only checks OPFS and can return null
when the file exists in the other backend (e.g. local storage after streaming);
implement a backend-agnostic helper (e.g. this.getStoredFileSize(fileKey)) that
first calls this.storage.getFileSize(fileKey) and if that returns null falls
back to retrieving the stored file via an existing read method (e.g.
this.storage.getFile or similar) and computes the size from the returned
Blob/ArrayBuffer, then replace the current call in the ModelDownloader
additional-files loop (where totalSize is computed using additionalFileKey and
storage.getFileSize) to use this new helper so totalSize correctly accounts for
files regardless of storage backend.
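
A minimal sketch of the backend-agnostic size helper suggested above. The `ModelStorage` interface is an assumption made for illustration (the real storage API in ModelDownloader may expose different method names and return types), and `totalStoredSize` is a hypothetical wrapper for the additional-files loop.

```typescript
// Assumed storage shape; not the SDK's actual interface.
interface ModelStorage {
  getFileSize(key: string): Promise<number | null>;  // assumed: null when the file is not in OPFS
  getFile(key: string): Promise<ArrayBuffer | null>; // assumed fallback read method
}

// Try the cheap metadata lookup first, then fall back to reading the file.
async function getStoredFileSize(storage: ModelStorage, key: string): Promise<number | null> {
  const size = await storage.getFileSize(key);
  if (size !== null) return size;
  const data = await storage.getFile(key);
  return data ? data.byteLength : null;
}

// Sum the sizes of additional files, counting files missing everywhere as 0.
async function totalStoredSize(storage: ModelStorage, keys: string[]): Promise<number> {
  let total = 0;
  for (const key of keys) {
    total += (await getStoredFileSize(storage, key)) ?? 0;
  }
  return total;
}
```

The fallback reads the whole file just to measure it, which may be costly for large model files; a backend that can report sizes directly would be preferable where available.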

---

Nitpick comments:
In `@sdk/runanywhere-web/packages/core/src/types/TTSTypes.ts`:
- Around line 8-18: The TTSSynthesisResult interface currently places the index
signature before the named properties; reorder it so the named properties
(audioData, sampleRate, durationMs, processingTimeMs) come first and then
include the index signature ([key: string]: unknown) after them to improve
readability and emphasize the core contract of TTSSynthesisResult.

In `@sdk/runanywhere-web/packages/core/src/types/VLMTypes.ts`:
- Around line 16-26: Replace the loose VLMImage interface with a discriminated
union keyed on the existing format property (VLMImageFormat) so each variant
requires exactly the appropriate data field: for format VLMImageFormat.FilePath
require filePath (width/height optional), for VLMImageFormat.RGBPixels require
pixelData and make width and height required, and for VLMImageFormat.Base64
require base64Data (width/height optional); update any usages of VLMImage to
accept the new union type and adjust code that assumed optional fields may be
absent.
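
One way the discriminated union could be shaped, as a sketch only. The field names come from the comment above; the `VLMImageFormat` values and the `Uint8Array` pixel type are assumptions, and a plain `as const` object stands in for whatever enum the SDK actually defines.

```typescript
// Stand-in for the SDK's VLMImageFormat; the concrete values are assumed.
const VLMImageFormat = {
  FilePath: 'filePath',
  RGBPixels: 'rgbPixels',
  Base64: 'base64',
} as const;

// Each variant requires exactly the data field that matches its format.
type VLMImage =
  | { format: typeof VLMImageFormat.FilePath; filePath: string; width?: number; height?: number }
  | { format: typeof VLMImageFormat.RGBPixels; pixelData: Uint8Array; width: number; height: number }
  | { format: typeof VLMImageFormat.Base64; base64Data: string; width?: number; height?: number };

// Narrowing on `format` exposes only the fields valid for that variant.
function describeImage(image: VLMImage): string {
  switch (image.format) {
    case VLMImageFormat.FilePath:
      return `file:${image.filePath}`;
    case VLMImageFormat.RGBPixels:
      return `rgb:${image.width}x${image.height} (${image.pixelData.length} bytes)`;
    case VLMImageFormat.Base64:
      return `base64:${image.base64Data.length} chars`;
  }
}
```

With this shape, passing an `RGBPixels` image without `width` and `height` becomes a compile-time error instead of a runtime surprise.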

In `@sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+VLM.ts`:
- Line 25: The export change removed several VLM types consumers may still
import (e.g., VLMImageFormat and other VLM* types), so restore a one-release
compatibility re-export from the original source: re-export VLMImageFormat and
any other VLM* symbols alongside VLMModelFamily from './VLMTypes' in
RunAnywhere+VLM.ts (or add a clear comment indicating it is a temporary shim for
migration), ensuring imports like `VLMImageFormat` continue to resolve until
callers are updated.

In
`@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts`:
- Around line 567-599: The code writes topP into the VLM options without
validation; before calling m.setValue(optPtr + vo.topP, topP, 'float') in
VLMWorkerRuntime (around where optPtr and vo are used), guard topP with a
defensive fallback such as const safeTopP = Number.isFinite(topP) ? topP : 1.0
and then write safeTopP instead; ensure you reference and update the m.setValue
call that uses vo.topP to use the validated safeTopP variable.
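
The guard could be factored out as below. `sanitizeTopP` is a hypothetical name, and the `1.0` default mirrors the fallback suggested in the comment; the actual `m.setValue` write is left out because it depends on the WASM module.

```typescript
// Fall back to a default when topP is NaN, infinite, or not a number at all.
function sanitizeTopP(topP: unknown, fallback = 1.0): number {
  return typeof topP === 'number' && Number.isFinite(topP) ? topP : fallback;
}

console.log(sanitizeTopP(0.9));       // 0.9
console.log(sanitizeTopP(NaN));       // 1
console.log(sanitizeTopP(undefined)); // 1
```

The result would then be written in place of the raw value, for example `m.setValue(optPtr + vo.topP, sanitizeTopP(topP), 'float')`.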

In `@sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+VAD.ts`:
- Around line 251-261: The cleanup() method currently swallows errors when
destroying the VAD handle (_vadHandle)—update the catch block to log the caught
error at debug level instead of ignoring it; specifically, catch the error from
SherpaONNXBridge.shared.module._SherpaOnnxDestroyVoiceActivityDetector(this._vadHandle)
and call your debug logger (e.g., console.debug or your module logger) with a
short message like "Failed to destroy VAD handle" and include the error object,
then continue to set _vadHandle = 0 and clear _jsActivityCallback,
_lastSpeechState and _speechStartMs as before.

In `@sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts`:
- Around line 96-101: The construction of the helper URL uses
SherpaONNXBridge.shared.helperBaseUrl (bridgeBase) but assumes it ends with a
trailing slash, causing malformed URLs when it doesn't; update the logic in
SherpaHelperLoader.ts where bridgeBase is used to build url (the bridgeBase /
url / filename code) to normalize bridgeBase by ensuring it ends with a single
'/' (or use URL(filename, bridgeBase) semantics) before concatenating so
`${bridgeBase}${filename}` always produces a valid path.
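
One caveat if the `URL(filename, bridgeBase)` route is taken: relative URL resolution replaces the last path segment of a base that lacks a trailing slash, so the base most likely still needs normalizing. The sketch below illustrates this with a made-up host and file name.

```typescript
// Illustrative values only; the host and file name are made up.
const helperFile = 'sherpa-onnx-asr.js';

// Base without a trailing slash: the last segment ('sherpa') is replaced.
const withoutSlash = new URL(helperFile, 'https://cdn.example.com/wasm/sherpa').href;

// Base with a trailing slash: the file name is appended as intended.
const withSlash = new URL(helperFile, 'https://cdn.example.com/wasm/sherpa/').href;

console.log(withoutSlash); // https://cdn.example.com/wasm/sherpa-onnx-asr.js
console.log(withSlash);    // https://cdn.example.com/wasm/sherpa/sherpa-onnx-asr.js
```

Note also that the `URL` constructor throws when the base is itself relative (for example `/assets/sherpa`), so plain string concatenation after normalization may be the simpler option when relative bases are allowed.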

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61f6f1a and 4a2ea17.

📒 Files selected for processing (39)

examples/web/RunAnywhereAI/src/views/speak.ts
examples/web/RunAnywhereAI/src/views/transcribe.ts
examples/web/RunAnywhereAI/src/views/vision.ts
examples/web/RunAnywhereAI/src/views/voice.ts
sdk/runanywhere-web/packages/core/src/Infrastructure/AudioCapture.ts
sdk/runanywhere-web/packages/core/src/Infrastructure/AudioFileLoader.ts
sdk/runanywhere-web/packages/core/src/Infrastructure/AudioPlayback.ts
sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts
sdk/runanywhere-web/packages/core/src/Infrastructure/ProviderTypes.ts
sdk/runanywhere-web/packages/core/src/Infrastructure/VideoCapture.ts
sdk/runanywhere-web/packages/core/src/Public/Extensions/RunAnywhere+VoicePipeline.ts
sdk/runanywhere-web/packages/core/src/index.ts
sdk/runanywhere-web/packages/core/src/types.ts
sdk/runanywhere-web/packages/core/src/types/LLMTypes.ts
sdk/runanywhere-web/packages/core/src/types/STTTypes.ts
sdk/runanywhere-web/packages/core/src/types/TTSTypes.ts
sdk/runanywhere-web/packages/core/src/types/VADTypes.ts
sdk/runanywhere-web/packages/core/src/types/VLMTypes.ts
sdk/runanywhere-web/packages/core/src/types/enums.ts
sdk/runanywhere-web/packages/core/src/types/index.ts
sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+TextGeneration.ts
sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+VLM.ts
sdk/runanywhere-web/packages/llamacpp/src/Extensions/VLMTypes.ts
sdk/runanywhere-web/packages/llamacpp/src/Foundation/LlamaCppBridge.ts
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts
sdk/runanywhere-web/packages/llamacpp/src/LlamaCPP.ts
sdk/runanywhere-web/packages/llamacpp/src/LlamaCppProvider.ts
sdk/runanywhere-web/packages/llamacpp/src/index.ts
sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+STT.ts
sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+TTS.ts
sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+VAD.ts
sdk/runanywhere-web/packages/onnx/src/Extensions/STTTypes.ts
sdk/runanywhere-web/packages/onnx/src/Extensions/TTSTypes.ts
sdk/runanywhere-web/packages/onnx/src/Extensions/VADTypes.ts
sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts
sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaONNXBridge.ts
sdk/runanywhere-web/packages/onnx/src/ONNX.ts
sdk/runanywhere-web/packages/onnx/src/index.ts

sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts

coderabbitai · 2026-02-27T22:10:42Z

sdk/runanywhere-web/packages/llamacpp/src/index.ts

+export { VLM, VLMModelFamily } from './Extensions/RunAnywhere+VLM';
 export { ToolCalling, ToolCallFormat, toToolValue, fromToolValue, getStringArg, getNumberArg } from './Extensions/RunAnywhere+ToolCalling';


⚠️ Potential issue | 🟠 Major

Preserve compatibility aliases for moved VLM contract exports.

This barrel now exposes VLM/VLMModelFamily but not shared VLM contract types. Consumers importing those contracts from @runanywhere/web-llamacpp will hit compile-time breakage.

Proposed compatibility alias patch

 export { TextGeneration } from './Extensions/RunAnywhere+TextGeneration';
 export { VLM, VLMModelFamily } from './Extensions/RunAnywhere+VLM';
+// Backward-compatible aliases (deprecate in docs, remove next major)
+export { VLMImageFormat } from '@runanywhere/web';
+export type {
+  VLMGenerationOptions,
+  VLMGenerationResult,
+  VLMStreamingResult,
+} from '@runanywhere/web';

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 export { VLM, VLMModelFamily } from './Extensions/RunAnywhere+VLM';
+// Backward-compatible aliases (deprecate in docs, remove next major)
+export { VLMImageFormat } from '@runanywhere/web';
+export type {
+  VLMGenerationOptions,
+  VLMGenerationResult,
+  VLMStreamingResult,
+} from '@runanywhere/web';
 export { ToolCalling, ToolCallFormat, toToolValue, fromToolValue, getStringArg, getNumberArg } from './Extensions/RunAnywhere+ToolCalling';

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/llamacpp/src/index.ts` around lines 34 - 35, Exports were changed to expose VLM and VLMModelFamily but you removed the original shared VLM contract type exports consumers import from `@runanywhere/web-llamacpp`; restore compatibility by re-exporting the original contract type names as aliases from the new module. Update the barrel (index.ts) to export the original shared types (the same public type names consumers used) alongside VLM and VLMModelFamily by forwarding them from './Extensions/RunAnywhere+VLM' so existing imports continue to compile.

coderabbitai · 2026-02-27T22:10:42Z

sdk/runanywhere-web/packages/onnx/src/index.ts

+// Extensions (backend-specific implementations + backend-specific config types)
 export { STT, STTModelType } from './Extensions/RunAnywhere+STT';
-export type {
-  STTModelConfig, STTWhisperFiles, STTZipformerFiles, STTParaformerFiles,
-  STTTranscriptionResult, STTWord, STTTranscribeOptions, STTStreamCallback, STTStreamingSession,
-} from './Extensions/RunAnywhere+STT';
+export type { STTModelConfig, STTWhisperFiles, STTZipformerFiles, STTParaformerFiles } from './Extensions/RunAnywhere+STT';
 export { TTS } from './Extensions/RunAnywhere+TTS';
-export type { TTSVoiceConfig, TTSSynthesisResult, TTSSynthesizeOptions } from './Extensions/RunAnywhere+TTS';
-export { VAD, SpeechActivity } from './Extensions/RunAnywhere+VAD';
-export type { SpeechActivityCallback, VADModelConfig, SpeechSegment } from './Extensions/RunAnywhere+VAD';
+export type { TTSVoiceConfig } from './Extensions/RunAnywhere+TTS';
+export { VAD } from './Extensions/RunAnywhere+VAD';
+export type { VADModelConfig } from './Extensions/RunAnywhere+VAD';


⚠️ Potential issue | 🟠 Major

Keep ONNX barrel backward-compatible for shared STT/TTS/VAD contracts.

The barrel now mainly exposes backend-specific config types. If consumers import shared contract types from @runanywhere/web-onnx, they will break after this change.

Proposed compatibility alias patch

 export { STT, STTModelType } from './Extensions/RunAnywhere+STT';
 export type { STTModelConfig, STTWhisperFiles, STTZipformerFiles, STTParaformerFiles } from './Extensions/RunAnywhere+STT';
 export { TTS } from './Extensions/RunAnywhere+TTS';
 export type { TTSVoiceConfig } from './Extensions/RunAnywhere+TTS';
 export { VAD } from './Extensions/RunAnywhere+VAD';
 export type { VADModelConfig } from './Extensions/RunAnywhere+VAD';
+
+// Backward-compatible aliases (deprecate in docs, remove next major)
+export type {
+  STTTranscriptionResult,
+  STTWord,
+  STTTranscribeOptions,
+  STTStreamCallback,
+  STTStreamingSession,
+  TTSSynthesisResult,
+  TTSSynthesizeOptions,
+  SpeechActivityCallback,
+  SpeechSegment,
+} from '@runanywhere/web';
+export { SpeechActivity } from '@runanywhere/web';

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 // Extensions (backend-specific implementations + backend-specific config types)
 export { STT, STTModelType } from './Extensions/RunAnywhere+STT';
 export type { STTModelConfig, STTWhisperFiles, STTZipformerFiles, STTParaformerFiles } from './Extensions/RunAnywhere+STT';
 export { TTS } from './Extensions/RunAnywhere+TTS';
 export type { TTSVoiceConfig } from './Extensions/RunAnywhere+TTS';
 export { VAD } from './Extensions/RunAnywhere+VAD';
 export type { VADModelConfig } from './Extensions/RunAnywhere+VAD';
+
+// Backward-compatible aliases (deprecate in docs, remove next major)
+export type {
+  STTTranscriptionResult,
+  STTWord,
+  STTTranscribeOptions,
+  STTStreamCallback,
+  STTStreamingSession,
+  TTSSynthesisResult,
+  TTSSynthesizeOptions,
+  SpeechActivityCallback,
+  SpeechSegment,
+} from '@runanywhere/web';
+export { SpeechActivity } from '@runanywhere/web';

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/onnx/src/index.ts` around lines 28 - 34, The public barrel removed shared STT/TTS/VAD contract exports causing breaking changes; restore backward compatibility by re-exporting the original shared contract symbols from the ONNX package index: export the shared STT, TTS and VAD contract types/names under their previous names (e.g. STT, STTModelType, STTModelConfig, STTWhisperFiles, STTZipformerFiles, STTParaformerFiles, TTS, TTSVoiceConfig, VAD, VADModelConfig) as aliases that point to the correct backend-specific or common contract definitions so consumers importing from `@runanywhere/web-onnx` keep working. Ensure these re-exports are added to the existing ./index.ts barrel alongside the backend-specific exports.

sdk/runanywhere-web/packages/onnx/src/ONNX.ts

…sources
- Bump LLAMACPP_VERSION from b8011 to b8179 for Vulkan/WebGPU precision fixes and FlashAttention improvements
- Adapt LoRA adapter API calls in llamacpp_backend.cpp to the new llama_set_adapters_lora() batch API (replaces removed per-adapter functions)
- Add nemotron-v2-vl.cpp and paddleocr.cpp to CMakeLists.txt for new clip_graph_* model types introduced in b8179
- Document Qwen2-VL CPU fallback performance (~1 tok/s) caused by M-RoPE NaN logits on WebGPU in rac_vlm_llamacpp.cpp, VLMWorkerBridge.ts, and model-manager.ts

shubhammalhotra28 · 2026-02-28T01:01:18Z

@coderabbitai please do a detailed review, thanks!

coderabbitai · 2026-02-28T01:01:45Z

@shubhammalhotra28, I'll perform a detailed review of all the changes in this PR right away. This includes:

Import reorganization and module structure improvements
New audio/video infrastructure classes (AudioCapture, AudioPlayback, VideoCapture)
Backend-agnostic type definitions for STT, TTS, LLM, and VLM
ModelDownloader streaming implementation
Provider types refactoring
Public API surface changes across packages

Starting the comprehensive review now.

🧠 Learnings used

Learnt from: CR
Repo: RunanywhereAI/runanywhere-sdks PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-27T08:41:48.423Z
Learning: Applies to **/runanywhere-web/packages/core/**/*.{ts,tsx} : Web SDK (TypeScript): Run `npm run typecheck -w packages/core` for type checking TypeScript files in the Web SDK

Learnt from: CR
Repo: RunanywhereAI/runanywhere-sdks PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-27T08:41:48.423Z
Learning: Applies to **/runanywhere-web/packages/core/**/*.{ts,tsx} : Web SDK (TypeScript): Use `npm run build -w packages/core` (from `sdk/runanywhere-web/`) to build; note that `llamacpp` package has a pre-existing duplicate index signature TypeScript error

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (4)

examples/web/RunAnywhereAI/src/views/vision.ts (1)
15-16: Long relative import paths are fragile.

The deep relative paths (../../../../../sdk/...) are brittle and will break if the directory structure changes. Consider using npm workspace package names (e.g., @runanywhere/core, @runanywhere/llamacpp) or TypeScript path aliases for cleaner, more maintainable imports.
♻️ Suggested improvement using package names
-import { VideoCapture, type CapturedFrame } from '../../../../../sdk/runanywhere-web/packages/core/src/index';
-import { VLMWorkerBridge } from '../../../../../sdk/runanywhere-web/packages/llamacpp/src/index';
+import { VideoCapture, type CapturedFrame } from '@runanywhere/core';
+import { VLMWorkerBridge } from '@runanywhere/llamacpp';
This requires proper npm workspace configuration, which the SDK likely already supports.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/web/RunAnywhereAI/src/views/vision.ts` around lines 15 - 16, The
imports using deep relative paths for VideoCapture, CapturedFrame and
VLMWorkerBridge are fragile—replace the long ../../../../../sdk/... imports with
the SDK package names or TypeScript path aliases (e.g., import from
`@runanywhere/core` and `@runanywhere/llamacpp` or configured tsconfig paths) so the
module resolution is stable; update the import statements in the file to
reference the package names (VideoCapture, CapturedFrame from the core package
and VLMWorkerBridge from the llamacpp package) and ensure package/tsconfig paths
are configured in the project.
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts (1)
243-263: Qwen2-VL detection may be overly broad.

The regex /qwen/i matches every model name that contains "qwen", including Qwen models that are not VL variants (e.g., a hypothetical text-only "qwen2.5-0.5b-instruct"). Consider a more specific pattern like /qwen.*vl/i or /qwen2.*vl/i to reduce false positives.

Also, the URL replacement on line 257 assumes a specific naming convention (-webgpu.js → .js). If the WASM file naming changes, this could silently fail to switch backends.
🔧 Suggested improvement for more precise detection
-    const isQwenVL = /qwen/i.test(params.modelId) || /qwen/i.test(params.modelName);
+    // Match Qwen VL models specifically (Qwen2-VL, Qwen-VL, etc.)
+    const isQwenVL = /qwen.*vl/i.test(params.modelId) || /qwen.*vl/i.test(params.modelName);
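
To see what the narrower pattern changes, the two regexes can be run against a few model IDs. The IDs below are hypothetical and chosen only to exercise the patterns; they are not taken from the SDK's model list.

```typescript
const broadPattern = /qwen/i;
const narrowPattern = /qwen.*vl/i;

// Hypothetical model IDs for illustration.
const modelIds = [
  'Qwen2-VL-2B-Instruct',  // Qwen VL model: both patterns match
  'qwen2.5-0.5b-instruct', // text-only Qwen: only the broad pattern matches
  'qwerty-vlm',            // contains "qwer", not "qwen": neither pattern matches
];

for (const id of modelIds) {
  console.log(id, { broad: broadPattern.test(id), narrow: narrowPattern.test(id) });
}
```

Neither regex uses the `g` flag, so repeated `test` calls carry no state between IDs.
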
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts`
around lines 243 - 263, The current Qwen2-VL detection (isQwenVL using /qwen/i
on params.modelId/name) is too broad and the wasm URL rewrite
(bridge.wasmUrl.replace(/-webgpu\.js$/, '.js')) is brittle; update isQwenVL to a
narrower pattern such as /qwen2?.*vl/i or /qwen.*-?vl/i to avoid matching
unrelated names, and make the URL swap more robust in LlamaCppBridge.shared
usage by handling variants (e.g., '-webgpu.js', '.webgpu.js', '-webgpu.wasm.js')
and falling back to a safer transform (check endsWith and replace the suffix or
try replacing 'webgpu' token) before calling this.terminate() and
this.init(cpuUrl), ensuring you still skip restart if cpuUrl equals currentUrl
or if bridge.wasmUrl is undefined.
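
A sketch of a more defensive URL swap, under stated assumptions: `toCpuWasmUrl` is a hypothetical helper, the suffix variants are the ones listed in the prompt above, and the CPU-side file names they map to are guesses that would need checking against the real build output.

```typescript
// Hypothetical helper: derive the CPU build URL from a WebGPU build URL.
// Returns null when no known WebGPU suffix is present, so the caller can skip the restart.
function toCpuWasmUrl(url: string | undefined): string | null {
  if (!url) return null;
  // [WebGPU suffix, assumed CPU suffix]; longer suffixes are checked first.
  const suffixes: Array<[string, string]> = [
    ['-webgpu.wasm.js', '.wasm.js'],
    ['.webgpu.js', '.js'],
    ['-webgpu.js', '.js'],
  ];
  for (const [from, to] of suffixes) {
    if (url.endsWith(from)) return url.slice(0, -from.length) + to;
  }
  return null;
}

console.log(toCpuWasmUrl('/wasm/llama-webgpu.js')); // /wasm/llama.js
console.log(toCpuWasmUrl('/wasm/llama.js'));        // null
```

Returning `null` rather than the unchanged URL makes the "nothing to switch" case explicit instead of relying on a string comparison with the current URL.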
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (2)
618-627: Extract model-family override mapping into one helper.

The same switch is duplicated in rac_vlm_llamacpp_process and rac_vlm_llamacpp_process_stream. A shared helper will prevent future drift.

Also applies to: 879-888
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 618 - 627, Extract the duplicated switch that maps options->model_family
to VLMModelType into a single helper (e.g., a static function named
resolve_effective_model_type or rac_vlm_llamacpp_resolve_model_type) that
accepts the backend (or backend->model_type) and options and returns the
resolved VLMModelType; replace the switch blocks in rac_vlm_llamacpp_process and
rac_vlm_llamacpp_process_stream with calls to this helper so both use the same
logic and avoid duplication.
762-795: Gate logits diagnostics behind debug/opt-in.

This full-vocab scan + top-5 logging runs on every request’s first token and logs at INFO. It is noisy and adds avoidable overhead on hot paths.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 762 - 795, The diagnostic full-vocab logits scan inside the i == 0 block
(uses llama_get_logits, n_vocab, top5_val/top5_idx and RAC_LOG_INFO with
LOG_CAT) is running on every request and logging at INFO; gate this work behind
an opt-in debug flag and lower the log level (e.g. DEBUG) to avoid hot-path
overhead. Modify the i == 0 block to first check a runtime-config or environment
flag (e.g. enable_logits_diag or backend->opts.logit_diag) and only perform the
NaN/Inf scan and top-5 computation when that flag is true, and change
RAC_LOG_INFO to a debug-level logger for the diagnostic messages; keep the
existing logic (max_logit, nan/inf counts, top5 arrays) unchanged except for the
surrounding conditional and log level.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt`:
- Around line 156-157: The CMakeLists refers to new source files
nemotron-v2-vl.cpp and paddleocr.cpp that don't exist in the currently fetched
llama.cpp (b8011); update the FetchContent/variable LLAMACPP_VERSION to at least
b8110 so those files are present, or remove the two file entries from the
sources list if you must stay on b8011; specifically, change the
LLAMACPP_VERSION used by the FetchContent declaration (or wherever
LLAMACPP_VERSION is defined) to >= b8110, or delete the references to
${llamacpp_SOURCE_DIR}/tools/mtmd/models/nemotron-v2-vl.cpp and
${llamacpp_SOURCE_DIR}/tools/mtmd/models/paddleocr.cpp from the CMakeLists.txt
sources block.

In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp`:
- Around line 374-375: The calls to llama_set_adapters_lora(context_, ...)
currently ignore the int32_t return value and unconditionally clear the local
lora_adapters_ tracking causing state divergence; change each call site that
uses llama_set_adapters_lora (e.g., the places around where lora_adapters_ is
cleared and the call with context_) to capture and check the return status, only
clear or update lora_adapters_ when the function returns success, and on failure
log or propagate the error (using the same logging/return pattern used where
llama_set_adapters_lora is already checked) so the local state stays consistent
with the llama.cpp context.
- Around line 930-942: The code currently erases from lora_adapters_ before
calling llama_set_adapters_lora(context_, ...) which can desync local state if
the API fails; change the flow in the erase/unload/clear paths (referenced by
lora_adapters_, llama_set_adapters_lora and context_) to first construct the
list of remaining adapters and scales (without mutating lora_adapters_), call
llama_set_adapters_lora and capture/validate its return value, and only upon
success mutate lora_adapters_ (erase or clear) to keep state consistent; apply
the same pattern (check and handle the return value of llama_set_adapters_lora)
at the other call sites mentioned (around lines with unload/clear operations) or
explicitly document why failures can be ignored.
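
The "build the new list, call the API, commit only on success" flow described above is language-neutral, so it is sketched here in TypeScript even though the real code is C++. `AdapterRegistry` is hypothetical, `applyAdapters` stands in for the `llama_set_adapters_lora` call, and treating `0` as the success status is an assumption.

```typescript
interface Adapter {
  path: string;
  scale: number;
}

class AdapterRegistry {
  private adapters: Adapter[] = [];
  // Stand-in for the native batch call; assumed to return 0 on success.
  private applyAdapters: (next: readonly Adapter[]) => number;

  constructor(applyAdapters: (next: readonly Adapter[]) => number) {
    this.applyAdapters = applyAdapters;
  }

  add(adapter: Adapter): boolean {
    const next = [...this.adapters, adapter];
    if (this.applyAdapters(next) !== 0) return false; // local state untouched on failure
    this.adapters = next;
    return true;
  }

  remove(path: string): boolean {
    // Build the remaining list without mutating the tracked state.
    const remaining = this.adapters.filter((a) => a.path !== path);
    if (this.applyAdapters(remaining) !== 0) return false;
    this.adapters = remaining; // commit only after the native call succeeded
    return true;
  }

  list(): readonly Adapter[] {
    return this.adapters;
  }
}
```

Because the tracked list is replaced only after a successful call, a failed removal leaves the local view matching whatever the native context still holds.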

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 193-212: The template fallback currently drops an explicit system
prompt when llama_chat_apply_template(tmpl, messages, 2, ...) fails with
effective_system set; change the fallback so that when the system-inclusive
template application fails you still preserve the system_prompt by either (a)
attempting the fallback formatting with only the user message and then
prepending or injecting the original effective_system/system_prompt into the
returned formatted string (with a clear separator) or (b) re-running
llama_chat_apply_template with a modified messages array that ensures the system
content is included in the output; update the code paths around
effective_system, messages, tmpl, user_content and the alternate/template-only
branch so the final returned string always contains the explicit system prompt,
and apply the identical fix to the later similar block (lines 214-229) that
mirrors this behavior.
- Around line 662-664: The INFO-level RAC_LOG_INFO calls that currently print
user/system prompt content (e.g., the call using LOG_CAT and formatting
full_prompt.c_str(), which appears in the v3-process path and the similar
streaming-path calls) must stop emitting prompt text; instead log only
non-sensitive metadata (prompt length, has_image flag, effective_model_type) or
move the full prompt text to a DEBUG-level log guarded by a verbosity check.
Update the RAC_LOG_INFO invocations that reference full_prompt to remove the
"%.200s" / full_prompt.c_str() argument and log only (int)full_prompt.length(),
has_image ? 1 : 0, and (int)effective_model_type, or change the call to
RAC_LOG_DEBUG and keep the full_prompt there behind a conditional so production
INFO logs never contain prompt contents; apply the same change to all analogous
calls (including the streaming path variants).

---

Nitpick comments:
In `@examples/web/RunAnywhereAI/src/views/vision.ts`:
- Around line 15-16: The imports using deep relative paths for VideoCapture,
CapturedFrame and VLMWorkerBridge are fragile—replace the long
../../../../../sdk/... imports with the SDK package names or TypeScript path
aliases (e.g., import from `@runanywhere/core` and `@runanywhere/llamacpp` or
configured tsconfig paths) so the module resolution is stable; update the import
statements in the file to reference the package names (VideoCapture,
CapturedFrame from the core package and VLMWorkerBridge from the llamacpp
package) and ensure package/tsconfig paths are configured in the project.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 618-627: Extract the duplicated switch that maps
options->model_family to VLMModelType into a single helper (e.g., a static
function named resolve_effective_model_type or
rac_vlm_llamacpp_resolve_model_type) that accepts the backend (or
backend->model_type) and options and returns the resolved VLMModelType; replace
the switch blocks in rac_vlm_llamacpp_process and
rac_vlm_llamacpp_process_stream with calls to this helper so both use the same
logic and avoid duplication.
- Around line 762-795: The diagnostic full-vocab logits scan inside the i == 0
block (uses llama_get_logits, n_vocab, top5_val/top5_idx and RAC_LOG_INFO with
LOG_CAT) is running on every request and logging at INFO; gate this work behind
an opt-in debug flag and lower the log level (e.g. DEBUG) to avoid hot-path
overhead. Modify the i == 0 block to first check a runtime-config or environment
flag (e.g. enable_logits_diag or backend->opts.logit_diag) and only perform the
NaN/Inf scan and top-5 computation when that flag is true, and change
RAC_LOG_INFO to a debug-level logger for the diagnostic messages; keep the
existing logic (max_logit, nan/inf counts, top5 arrays) unchanged except for the
surrounding conditional and log level.

In `@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts`:
- Around line 243-263: The current Qwen2-VL detection (isQwenVL using /qwen/i on
params.modelId/name) is too broad and the wasm URL rewrite
(bridge.wasmUrl.replace(/-webgpu\.js$/, '.js')) is brittle; update isQwenVL to a
narrower pattern such as /qwen2?.*vl/i or /qwen.*-?vl/i to avoid matching
unrelated names, and make the URL swap more robust in LlamaCppBridge.shared
usage by handling variants (e.g., '-webgpu.js', '.webgpu.js', '-webgpu.wasm.js')
and falling back to a safer transform (check endsWith and replace the suffix or
try replacing 'webgpu' token) before calling this.terminate() and
this.init(cpuUrl), ensuring you still skip restart if cpuUrl equals currentUrl
or if bridge.wasmUrl is undefined.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a2ea17 and 6730759.

📒 Files selected for processing (8)

examples/web/RunAnywhereAI/src/services/model-manager.ts
examples/web/RunAnywhereAI/src/views/vision.ts
sdk/runanywhere-commons/VERSIONS
sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts

coderabbitai · 2026-02-28T01:07:53Z

sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt

+        ${llamacpp_SOURCE_DIR}/tools/mtmd/models/nemotron-v2-vl.cpp
+        ${llamacpp_SOURCE_DIR}/tools/mtmd/models/paddleocr.cpp


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check the LLAMACPP_VERSION in the CMakeLists.txt file
cat -n sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt | head -30

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 1260

🏁 Script executed:

# Also check around lines 156-157 to see the context
sed -n '150,165p' sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 738

🌐 Web query:

llama.cpp nemotron-v2-vl.cpp paddleocr.cpp tools/mtmd/models repository

💡 Result:

Those files are in the main llama.cpp repository: ggml-org/llama.cpp. (github.com)

tools/mtmd/models/nemotron-v2-vl.cpp (Nemotron Nano 12B v2 VL / mtmd support was added in llama.cpp). (newreleases.io)

tools/mtmd/models/paddleocr.cpp (PaddleOCR-VL support was added in llama.cpp). (newreleases.io)

Citations:

1: https://github.com/ggml-org/llama.cpp?utm_source=openai

2: https://newreleases.io/project/github/ggml-org/llama.cpp/release/b8054?utm_source=openai

3: https://newreleases.io/project/github/ggml-org/llama.cpp/release/b8110?utm_source=openai

Fix LLAMACPP_VERSION or remove the new model files.

The new source entries nemotron-v2-vl.cpp and paddleocr.cpp point to files that do not exist in llama.cpp b8011:

nemotron-v2-vl.cpp was added in b8054

paddleocr.cpp was added in b8110

The build will fail once FetchContent has checked out b8011, because these files don't exist at that version. Update LLAMACPP_VERSION to at least b8110 to include both files, or remove these references if b8011 is required.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt` around lines 156 - 157, The CMakeLists refers to new source files nemotron-v2-vl.cpp and paddleocr.cpp that don't exist in the currently fetched llama.cpp (b8011); update the FetchContent/variable LLAMACPP_VERSION to at least b8110 so those files are present, or remove the two file entries from the sources list if you must stay on b8011; specifically, change the LLAMACPP_VERSION used by the FetchContent declaration (or wherever LLAMACPP_VERSION is defined) to >= b8110, or delete the references to ${llamacpp_SOURCE_DIR}/tools/mtmd/models/nemotron-v2-vl.cpp and ${llamacpp_SOURCE_DIR}/tools/mtmd/models/paddleocr.cpp from the CMakeLists.txt sources block.

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp

shubhammalhotra28 · 2026-02-28T01:38:07Z

@coderabbitai please do a detailed review. Thanks!
Just addressed a few comments.

coderabbitai · 2026-02-28T01:38:30Z

@shubhammalhotra28, thank you for addressing the comments! I'll perform a comprehensive detailed review of all the changes in this PR, including:

The updated imports and new audio/video infrastructure classes
Backend-agnostic types for STT, TTS, LLM, and VLM
ModelDownloader streaming implementation
Provider types refactoring and public API changes
The recent updates you've made
Overall code quality and modularity

Starting the detailed review now.

🧠 Learnings used

Learnt from: CR
Repo: RunanywhereAI/runanywhere-sdks PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-27T08:41:48.423Z
Learning: Applies to **/runanywhere-web/packages/core/**/*.{ts,tsx} : Web SDK (TypeScript): Use `npm run build -w packages/core` (from `sdk/runanywhere-web/`) to build; note that `llamacpp` package has a pre-existing duplicate index signature TypeScript error

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts (1)

89-90: ⚠️ Potential issue | 🟠 Major

Fail fast when new WASM offset exports are missing.

If vlm_options_system_prompt or vlm_options_model_family exports are absent, workerOffsetOf yields 0, and Line 607 can write into the wrong struct field silently. This is a hard-to-debug ABI mismatch path.

💡 Proposed change

-function workerOffsetOf(m: any, name: string): number {
+function workerOffsetOf(m: any, name: string, required = true): number {
   const fn = m[`_rac_wasm_offsetof_${name}`];
-  return typeof fn === 'function' ? fn() : 0;
+  if (typeof fn !== 'function') {
+    if (required) {
+      throw new Error(`Missing WASM offsetof export: _rac_wasm_offsetof_${name}`);
+    }
+    return 0;
+  }
+  return fn();
 }

     vlmOptions: {
       maxTokens: workerOffsetOf(m, 'vlm_options_max_tokens'),
       temperature: workerOffsetOf(m, 'vlm_options_temperature'),
       topP: workerOffsetOf(m, 'vlm_options_top_p'),
-      streamingEnabled: workerOffsetOf(m, 'vlm_options_streaming_enabled'),
-      systemPrompt: workerOffsetOf(m, 'vlm_options_system_prompt'),
-      modelFamily: workerOffsetOf(m, 'vlm_options_model_family'),
+      streamingEnabled: workerOffsetOf(m, 'vlm_options_streaming_enabled'),
+      systemPrompt: workerOffsetOf(m, 'vlm_options_system_prompt', true),
+      modelFamily: workerOffsetOf(m, 'vlm_options_model_family', true),
     },

Also applies to: 607-607

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts`
around lines 89 - 90, The code currently uses
workerOffsetOf('vlm_options_system_prompt') and
workerOffsetOf('vlm_options_model_family') and will silently continue if those
exports are missing (workerOffsetOf returns 0); change the VLMWorkerRuntime
initialization to immediately validate these offsets: call workerOffsetOf for
'vlm_options_system_prompt' and 'vlm_options_model_family', and if either
returns 0 throw an Error (or assert) with a clear message mentioning the missing
export name and that the ABI is incompatible, so the code fails fast before
writing into the struct (references: workerOffsetOf, vlm_options_system_prompt,
vlm_options_model_family).
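
One way to sketch the fail-fast check is to test for the presence of each export rather than for a return value of 0, since 0 is a valid offset for the first field of a struct. `WasmModule` and `requireOffsets` are hypothetical names used only for illustration; the `_rac_wasm_offsetof_` prefix is the one shown in the proposed change above.

```typescript
// Hypothetical sketch; `WasmModule` and `requireOffsets` are not SDK names.
type WasmModule = Record<string, unknown>;

// Resolve every requested offset up front and fail once with the full list of gaps.
function requireOffsets<K extends string>(
  m: WasmModule,
  names: readonly K[],
): Record<K, number> {
  const missing: string[] = [];
  const offsets = {} as Record<K, number>;
  for (const name of names) {
    const fn = m[`_rac_wasm_offsetof_${name}`];
    if (typeof fn !== 'function') {
      // A missing export means the WASM binary predates this field.
      missing.push(name);
      continue;
    }
    offsets[name] = (fn as () => number)();
  }
  if (missing.length > 0) {
    throw new Error(
      `ABI mismatch: missing WASM offsetof exports: ${missing.join(', ')}`,
    );
  }
  return offsets;
}
```

Collecting all missing names before throwing makes a stale WASM binary obvious from a single error message instead of one failure per field.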

🧹 Nitpick comments (1)

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (1)

631-640: Extract duplicated model-family mapping into a shared helper.

The same options->model_family mapping is duplicated in both process paths. Centralizing it reduces drift risk when adding new families.

Also applies to: 891-900

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 631 - 640, Extract the duplicated switch that maps options->model_family
to VLMModelType into a single helper (e.g., mapModelFamilyToVLMModelType or
resolveModelTypeFromFamily) that accepts the RAC_VLM model_family enum and
returns a VLMModelType; then replace the in-place switches (including the one
currently in rac_vlm_llamacpp.cpp around the VLMModelType effective_model_type
calculation and the other occurrence around lines ~891-900) with calls to that
helper and fall back to backend->model_type when options is null or
model_family==RAC_VLM_MODEL_FAMILY_AUTO. Ensure the helper handles all
RAC_VLM_MODEL_FAMILY_* cases (QWEN2_VL, SMOLVLM, LLAVA, default->Generic) and is
declared/defined in a shared scope so both call sites can use it.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6730759 and 03e2ba6.

📒 Files selected for processing (5)

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts
sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts
sdk/runanywhere-web/packages/onnx/src/ONNX.ts

🚧 Files skipped from review as they are similar to previous changes (2)

sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts
sdk/runanywhere-web/packages/onnx/src/ONNX.ts

coderabbitai · 2026-02-28T01:44:04Z

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp

+        // Diagnostic: on first token, inspect logits for NaN/corruption
+        if (i == 0) {
+            float* logits = llama_get_logits(backend->ctx);
+            int n_vocab = llama_vocab_n_tokens(vocab);
+            if (logits && n_vocab > 0) {
+                float max_logit = logits[0];
+                int max_idx = 0;
+                int nan_count = 0;
+                int inf_count = 0;
+                for (int v = 0; v < n_vocab; v++) {
+                    if (logits[v] != logits[v]) nan_count++;       // NaN check
+                    if (logits[v] > 1e30f || logits[v] < -1e30f) inf_count++;
+                    if (logits[v] > max_logit) { max_logit = logits[v]; max_idx = v; }
+                }
+                RAC_LOG_INFO(LOG_CAT, "[v3-diag] Logits: n_vocab=%d, max_logit=%.4f at token %d, NaN=%d, Inf=%d",
+                             n_vocab, max_logit, max_idx, nan_count, inf_count);
+                // Log top 5 logits
+                float top5_val[5] = {-1e30f, -1e30f, -1e30f, -1e30f, -1e30f};
+                int   top5_idx[5] = {0, 0, 0, 0, 0};
+                for (int v = 0; v < n_vocab; v++) {
+                    if (logits[v] != logits[v]) continue; // skip NaN
+                    for (int k = 0; k < 5; k++) {
+                        if (logits[v] > top5_val[k]) {
+                            for (int j = 4; j > k; j--) { top5_val[j] = top5_val[j-1]; top5_idx[j] = top5_idx[j-1]; }
+                            top5_val[k] = logits[v]; top5_idx[k] = v;
+                            break;
+                        }
+                    }
+                }
+                RAC_LOG_INFO(LOG_CAT, "[v3-diag] Top5: [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f",
+                             top5_idx[0], top5_val[0], top5_idx[1], top5_val[1],
+                             top5_idx[2], top5_val[2], top5_idx[3], top5_val[3],
+                             top5_idx[4], top5_val[4]);
+            }


⚠️ Potential issue | 🟠 Major

Gate first-token full-vocab diagnostics behind a debug/diagnostic switch.

Line 775 currently triggers a full-vocab scan and two RAC_LOG_INFO diagnostics for every request. This adds avoidable latency and production log noise.

💡 Proposed change

-        // Diagnostic: on first token, inspect logits for NaN/corruption
-        if (i == 0) {
+        // Diagnostic: on first token, inspect logits for NaN/corruption
+        // Keep this behind an explicit diagnostics build flag.
+#ifdef RAC_VLM_ENABLE_DIAGNOSTICS
+        if (i == 0) {
             float* logits = llama_get_logits(backend->ctx);
             int n_vocab = llama_vocab_n_tokens(vocab);
             if (logits && n_vocab > 0) {
@@
-                RAC_LOG_INFO(LOG_CAT, "[v3-diag] Logits: n_vocab=%d, max_logit=%.4f at token %d, NaN=%d, Inf=%d",
+                RAC_LOG_DEBUG(LOG_CAT, "[v3-diag] Logits: n_vocab=%d, max_logit=%.4f at token %d, NaN=%d, Inf=%d",
                              n_vocab, max_logit, max_idx, nan_count, inf_count);
@@
-                RAC_LOG_INFO(LOG_CAT, "[v3-diag] Top5: [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f",
+                RAC_LOG_DEBUG(LOG_CAT, "[v3-diag] Top5: [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f",
                              top5_idx[0], top5_val[0], top5_idx[1], top5_val[1],
                              top5_idx[2], top5_val[2], top5_idx[3], top5_val[3],
                              top5_idx[4], top5_val[4]);
             }
         }
+#endif

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 774 - 807, Gate the expensive first-token full-vocab diagnostics behind a runtime debug/diagnostic switch so they only run when explicitly enabled: wrap the entire block that begins with the "if (i == 0) { float* logits = llama_get_logits(...)" and contains the two RAC_LOG_INFO calls and the top-5 scan in a conditional that checks a debug flag (e.g. backend->enable_first_token_diag or a global/Context-level is_diag_enabled() function) and skip the logits access and loops when the flag is false; update any places that construct the backend/context to expose/configure this boolean and ensure the check occurs before calling llama_get_logits or iterating n_vocab to avoid the latency and log noise in production.

shubhammalhotra28 · 2026-02-28T02:23:57Z

LGTM

shubhammalhotra28 · 2026-02-28T02:41:16Z

https://www.npmjs.com/package/@runanywhere/web
https://www.npmjs.com/package/@runanywhere/web-llamacpp
https://www.npmjs.com/package/@runanywhere/web-onnx

shubhammalhotra28 · 2026-02-28T02:51:18Z

Released 0.10-beta version, good to close/merge this PR

sanchitmonga22 · 2026-02-28T05:55:41Z

@coderabbitai please review

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to 3a8d798 in 17 seconds.

Reviewed 2109 lines of code in 47 files
Skipped 0 files when reviewing.
Skipped posting 0 draft comments.

Workflow ID: wflow_SKXRc203oErbRTQd

sanchitmonga22 · 2026-02-28T05:55:47Z

@greptile please review

coderabbitai · 2026-02-28T05:56:05Z

@sanchitmonga22, I'll perform a detailed review of all the changes in this PR. I'll examine:

Updated imports and new audio/video infrastructure classes (AudioCapture, AudioPlayback, VideoCapture, AudioFileLoader)
Backend-agnostic type definitions for STT, TTS, LLM, VLM, and VAD
ModelDownloader streaming implementation
Provider types refactoring and public API changes
Overall code quality, modularity, and potential issues

Starting the comprehensive review now.

🧠 Learnings used

Learnt from: CR
Repo: RunanywhereAI/runanywhere-sdks PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-27T08:41:48.423Z
Learning: Applies to **/runanywhere-web/packages/core/**/*.{ts,tsx} : Web SDK (TypeScript): Use `npm run build -w packages/core` (from `sdk/runanywhere-web/`) to build; note that `llamacpp` package has a pre-existing duplicate index signature TypeScript error

Learnt from: CR
Repo: RunanywhereAI/runanywhere-sdks PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-27T08:41:48.423Z
Learning: Applies to **/runanywhere-web/packages/core/**/*.{ts,tsx} : Web SDK (TypeScript): Run `npm run typecheck -w packages/core` for type checking TypeScript files in the Web SDK

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (1)

774-807: ⚠️ Potential issue | 🟠 Major

Gate first-token full-vocab diagnostics behind an explicit diagnostics switch.

Line 775 still runs an O(vocab) scan (twice) on every request, which adds avoidable latency in production.

💡 Suggested patch

-        // Diagnostic: on first token, inspect logits for NaN/corruption
-        if (i == 0) {
+        // Diagnostic: on first token, inspect logits for NaN/corruption
+#ifdef RAC_VLM_ENABLE_DIAGNOSTICS
+        if (i == 0) {
             float* logits = llama_get_logits(backend->ctx);
             int n_vocab = llama_vocab_n_tokens(vocab);
             if (logits && n_vocab > 0) {
@@
                 RAC_LOG_DEBUG(LOG_CAT, "[v3-diag] Top5: [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f",
                               top5_idx[0], top5_val[0], top5_idx[1], top5_val[1],
                               top5_idx[2], top5_val[2], top5_idx[3], top5_val[3],
                               top5_idx[4], top5_val[4]);
             }
         }
+#endif

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 774 - 807, The diagnostics block that scans the full vocabulary on first
token (the code starting at the comment "// Diagnostic: on first token, inspect
logits for NaN/corruption" and the if (i == 0) {...} block that calls
llama_get_logits and loops over n_vocab) must be gated by an explicit runtime
diagnostics switch to avoid O(vocab) overhead in production; add a boolean flag
(e.g. backend->enable_v3_diagnostics or a similarly named config/option) and
change the condition to if (i == 0 && backend->enable_v3_diagnostics) { ... }
(or check a global/config getter) so the full-vocab scans and top-5 computation
only run when that flag is enabled. Ensure the new flag is default-off and
documented where backend configuration is initialized.

🧹 Nitpick comments (1)

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (1)

631-640: Extract model-family override mapping into a shared helper.

This switch is duplicated in sync and stream paths; centralizing it reduces drift risk when adding new RAC_VLM_MODEL_FAMILY_* values.

Also applies to: 891-900

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 631 - 640, The duplicated switch that maps options->model_family
(RAC_VLM_MODEL_FAMILY_*) to VLMModelType (VLMModelType::Qwen2VL, ::SmolVLM,
::LLaVA, ::Generic) should be extracted into a shared helper function (e.g., a
static helper like ResolveVLMModelTypeFromFamily or
MapModelFamilyToVLMModelType) and used from both the current branch and the
other occurrence in the stream/sync path; update callers to call that helper
(passing options->model_family and backend->model_type as a default) so the
mapping logic lives in one place and both code paths use it.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 180-184: The code treats an empty system_prompt ("") as provided;
update the logic that sets effective_system so empty strings are considered "not
provided" — replace checks like `if (!effective_system && model_type ==
VLMModelType::Qwen2VL)` with a predicate that treats NULL or empty (e.g.,
`system_prompt == nullptr || system_prompt[0] == '\0'`) and apply the same
change wherever `effective_system` is computed (references: effective_system,
system_prompt, model_type, VLMModelType::Qwen2VL) so the default "You are a
helpful assistant." is injected for Qwen2-VL when system_prompt is missing or
empty.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03e2ba6 and 3a8d798.

📒 Files selected for processing (4)

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
sdk/runanywhere-web/packages/core/package.json
sdk/runanywhere-web/packages/llamacpp/package.json
sdk/runanywhere-web/packages/onnx/package.json

✅ Files skipped from review due to trivial changes (1)

sdk/runanywhere-web/packages/llamacpp/package.json

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp

coderabbitai

♻️ Duplicate comments (2)

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (2)

180-184: ⚠️ Potential issue | 🟡 Minor

Treat empty system_prompt as not provided.

effective_system is initialized directly from system_prompt, so "" is treated as provided and can suppress Qwen2-VL default injection while producing an empty system block.

Suggested patch

-    const char* effective_system = system_prompt;
+    const bool has_explicit_system = (system_prompt && system_prompt[0] != '\0');
+    const char* effective_system = has_explicit_system ? system_prompt : nullptr;
     if (!effective_system && model_type == VLMModelType::Qwen2VL) {
         effective_system = "You are a helpful assistant.";
     }
@@
-            bool has_explicit_system = (system_prompt && system_prompt[0] != '\0');
             if (has_explicit_system) {
                 RAC_LOG_WARNING(LOG_CAT, "Template with system failed (size=%d); falling back to manual to preserve explicit system prompt", size);
             } else {

Also applies to: 211-213, 249-253

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 180 - 184, The code currently treats an empty C string as a provided
system_prompt, so change the initialization and checks for effective_system to
treat NULL and empty string equivalently: when assigning effective_system from
system_prompt (used in the block around effective_system, system_prompt,
model_type and VLMModelType::Qwen2VL) use a test like system_prompt == nullptr
|| system_prompt[0] == '\0' (or equivalent) and only inject the Qwen2-VL default
("You are a helpful assistant.") when system_prompt is null or empty; apply the
same fix to the other occurrences referenced (the checks around the other
effective_system assignments at the later locations you flagged).

774-808: ⚠️ Potential issue | 🟠 Major

Gate first-token full-vocab diagnostics behind an explicit diagnostics switch.

This block still runs an O(vocab) scan on every request and adds avoidable latency/log noise in production paths.

Suggested patch

-        // Diagnostic: on first token, inspect logits for NaN/corruption
-        if (i == 0) {
+        // Diagnostic: on first token, inspect logits for NaN/corruption
+#ifdef RAC_VLM_ENABLE_DIAGNOSTICS
+        if (i == 0) {
             float* logits = llama_get_logits(backend->ctx);
             int n_vocab = llama_vocab_n_tokens(vocab);
             if (logits && n_vocab > 0) {
@@
             }
         }
+#endif

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 774 - 808, The O(vocab) first-token diagnostics (the entire if (i == 0)
block that calls llama_get_logits, iterates n_vocab, computes NaN/Inf counts and
top5, and emits RAC_LOG_DEBUG) must be gated behind an explicit diagnostics flag
so it doesn't run in production; modify the code to check a boolean (e.g.,
backend->diagnostics_enabled or a global enable_v3_diag) before executing that
block (and keep the existing logits/null and n_vocab checks inside the gated
block), default the flag to false, and only log via RAC_LOG_DEBUG when the flag
is true.

🧹 Nitpick comments (1)

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (1)

631-640: Extract shared model-family/system-prompt resolution into one helper.

The sync and streaming paths duplicate the same mapping logic; centralizing it will reduce drift risk.

Also applies to: 891-900, 642-643, 902-903

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 631 - 640, Extract the model-family -> VLMModelType resolution into a
single helper (e.g., ResolveEffectiveModelType or getEffectiveModelType) that
accepts the backend (to read backend->model_type) and the options pointer (to
read options->model_family) and returns a VLMModelType; move the switch mapping
(cases for RAC_VLM_MODEL_FAMILY_QWEN2_VL, SMOLVLM, LLAVA, default Generic) into
that helper and default to backend->model_type when options is null or
model_family is AUTO. Replace the duplicated blocks in rac_vlm_llamacpp.cpp (the
sync and streaming codepaths that compute effective_model_type) to call this new
helper wherever effective_model_type is computed (including the other
occurrences called out near the existing switch usage).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03e2ba6 and 3a8d798.

📒 Files selected for processing (4)

sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
sdk/runanywhere-web/packages/core/package.json
sdk/runanywhere-web/packages/llamacpp/package.json
sdk/runanywhere-web/packages/onnx/package.json

✅ Files skipped from review due to trivial changes (2)

sdk/runanywhere-web/packages/llamacpp/package.json
sdk/runanywhere-web/packages/core/package.json

- Updated the logic for determining the effective system prompt to handle empty strings.
- Introduced a new function to resolve the effective VLM model type based on options, simplifying the code in `rac_vlm_llamacpp_process`.
- Improved download progress tracking in `ModelDownloader` by using cumulative byte counts for better accuracy.
- Enhanced type exports in `llamacpp` and `onnx` packages for better compatibility and clarity.
- Adjusted regex for identifying Qwen VL models to ensure more accurate matching.
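
The cumulative byte count idea from the commit message above can be sketched as follows. This is an illustrative assumption about the approach, not the actual `ModelDownloader` code; `CumulativeProgress` is a hypothetical name.

```typescript
// Hypothetical sketch; not the SDK's ModelDownloader implementation.
class CumulativeProgress {
  private readonly totalBytes: number;
  private received = 0;

  constructor(totalBytes: number) {
    this.totalBytes = totalBytes;
  }

  // Record one streamed chunk and return overall progress in [0, 1].
  addChunk(chunkBytes: number): number {
    this.received += chunkBytes;
    // Total unknown (for example, no Content-Length header): report 0.
    if (this.totalBytes <= 0) return 0;
    // Clamp in case the server sends more bytes than it advertised.
    return Math.min(this.received / this.totalBytes, 1);
  }

  get bytesReceived(): number {
    return this.received;
  }
}
```

In a streaming download, each `Uint8Array` read from `response.body.getReader()` would be passed in as `addChunk(value.byteLength)`, so progress reflects the bytes actually received rather than a per-chunk or per-file estimate.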

coderabbitai bot reviewed Feb 27, 2026

shubhammalhotra28 added 2 commits February 27, 2026 16:56

minor fixes

f7cbb78

coderabbitai bot reviewed Feb 28, 2026

addressed code rabbit comments

03e2ba6

coderabbitai bot reviewed Feb 28, 2026

minor fixes

e74dbd9

bumped up the version

3a8d798

sanchitmonga22 marked this pull request as ready for review February 28, 2026 05:55

ellipsis-dev bot reviewed Feb 28, 2026

coderabbitai bot reviewed Feb 28, 2026


sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (resolved)

coderabbitai bot reviewed Feb 28, 2026


sanchitmonga22 added 2 commits March 4, 2026 23:59

Merge origin/main into smonga/web_updates

0af259c

Resolved conflicts in llama.cpp backend:
- VERSIONS: take main's b8201 (newer than b8179)
- CMakeLists.txt: deduplicate mtmd model source files
- llamacpp_backend.cpp: adopt main's simplified LoRA handling

Made-with: Cursor

sanchitmonga22 merged commit 99216a5 into main Mar 5, 2026
13 of 18 checks passed

coderabbitai bot mentioned this pull request Mar 20, 2026

Add Genie NPU backend support (Qualcomm Snapdragon) #462

Open

4 tasks

		export { VLM, VLMModelFamily } from './Extensions/RunAnywhere+VLM';
		export { ToolCalling, ToolCallFormat, toToolValue, fromToolValue, getStringArg, getNumberArg } from './Extensions/RunAnywhere+ToolCalling';

		${llamacpp_SOURCE_DIR}/tools/mtmd/models/nemotron-v2-vl.cpp
		${llamacpp_SOURCE_DIR}/tools/mtmd/models/paddleocr.cpp
