[Web-SDK] [Web-Example] Web SDK fixes #433
- Updated imports in `speak.ts`, `transcribe.ts`, `vision.ts`, and `voice.ts` to use the new core package for audio and video functionalities.
- Introduced new `AudioCapture`, `AudioPlayback`, and `VideoCapture` classes to handle audio and video processing more efficiently.
- Added backend-agnostic types for STT, TTS, LLM, and VLM in the core types module.
- Implemented streaming capabilities for model downloads in `ModelDownloader`.
- Enhanced the overall structure for better modularity and maintainability.
Caution: Review failed. The pull request is closed.
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
📝 Walkthrough
This pull request refactors the RunAnywhere Web SDK module boundaries by consolidating audio I/O infrastructure and speech-related types into the core package, reorganizing type definitions across multiple files to support backend-agnostic interfaces, and adding streaming-to-storage download capability. Additionally, it extends VLM processing with new parameters (system prompt, topP, model family), updates provider interfaces with strongly-typed result types, and introduces configuration options for WASM and helper file base URLs in backend packages.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ 3 failed checks (2 warnings, 1 inconclusive)
@coderabbitai please do a detailed review. Thanks!
@greptile please do a detailed review. Thanks!
Let me start the comprehensive review now.
✅ Actions performed: Review triggered.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts (1)
275-300: ⚠️ Potential issue | 🟡 Minor
Use backend-agnostic size accounting for additional files.
At Line 298, `this.storage.getFileSize(fileKey)` only checks OPFS. If local storage is active and streaming succeeds there, `size` can be `null`, so `totalSize` is undercounted.
Proposed fix
@@
-    // Download additional files (e.g., mmproj for VLM)
+    // Download additional files (e.g., mmproj for VLM)
+    let additionalTotalSize = 0;
     if (model.additionalFiles && model.additionalFiles.length > 0) {
       for (let i = 0; i < model.additionalFiles.length; i++) {
@@
-        const streamedSize = await this.downloadAndStoreStreaming(file.url, fileKey, fileProgressCb);
-        if (streamedSize === null) {
+        const streamedSize = await this.downloadAndStoreStreaming(file.url, fileKey, fileProgressCb);
+        if (streamedSize === null) {
           const fileData = await this.downloadFile(file.url, fileProgressCb);
           await this.storeInOPFS(fileKey, fileData);
+          additionalTotalSize += fileData.length;
+        } else {
+          additionalTotalSize += streamedSize;
         }
       }
     }
@@
-    let totalSize = primarySize;
-    if (model.additionalFiles) {
-      for (const file of model.additionalFiles) {
-        const fileKey = this.additionalFileKey(modelId, file.filename);
-        const size = await this.storage.getFileSize(fileKey);
-        if (size !== null) totalSize += size;
-      }
-    }
+    const totalSize = primarySize + additionalTotalSize;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts` around lines 275 - 300, The loop that sums additional file sizes uses this.storage.getFileSize(fileKey) which only checks OPFS and can return null when the file exists in the other backend (e.g. local storage after streaming); implement a backend-agnostic helper (e.g. this.getStoredFileSize(fileKey)) that first calls this.storage.getFileSize(fileKey) and if that returns null falls back to retrieving the stored file via an existing read method (e.g. this.storage.getFile or similar) and computes the size from the returned Blob/ArrayBuffer, then replace the current call in the ModelDownloader additional-files loop (where totalSize is computed using additionalFileKey and storage.getFileSize) to use this new helper so totalSize correctly accounts for files regardless of storage backend.
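The backend-agnostic size lookup described in this comment could look roughly like the sketch below. It is only an illustration: the `FileStore` interface, `readFile`, `getStoredFileSize`, `totalModelSize`, and `BlindSizeStore` are hypothetical names, not the SDK's actual storage API, and reading a file just to measure it can be expensive for large models.

```typescript
// Hypothetical storage contract for illustration; the SDK's real storage API may differ.
interface FileStore {
  /** Size in bytes, or null when this backend cannot report it. */
  getFileSize(key: string): Promise<number | null>;
  /** Stored bytes, or null when the key is missing. */
  readFile(key: string): Promise<Uint8Array | null>;
}

/** Size lookup that does not depend on which backend holds the file. */
async function getStoredFileSize(store: FileStore, key: string): Promise<number | null> {
  const reported = await store.getFileSize(key);
  if (reported !== null) return reported;
  // Fallback: read the bytes and measure them (costly for large files).
  const data = await store.readFile(key);
  return data === null ? null : data.byteLength;
}

/** Sums the primary size and every additional file whose size can be determined. */
async function totalModelSize(
  store: FileStore,
  primarySize: number,
  additionalKeys: string[],
): Promise<number> {
  let total = primarySize;
  for (const key of additionalKeys) {
    const size = await getStoredFileSize(store, key);
    if (size !== null) total += size;
  }
  return total;
}

/** In-memory stand-in for a backend whose getFileSize() cannot see its own files. */
class BlindSizeStore implements FileStore {
  private readonly files: Map<string, Uint8Array>;

  constructor(files: Map<string, Uint8Array>) {
    this.files = files;
  }

  async getFileSize(_key: string): Promise<number | null> {
    return null;
  }

  async readFile(key: string): Promise<Uint8Array | null> {
    return this.files.get(key) ?? null;
  }
}
```

The proposed fix in the comment itself avoids the extra read by accumulating sizes during download; the fallback above is mainly useful when sizes must be recomputed later, after the download has finished.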
🧹 Nitpick comments (6)
sdk/runanywhere-web/packages/core/src/types/TTSTypes.ts (1)
8-18: Consider moving the index signature after named properties.
The index signature on line 9 works correctly, but placing it after the named properties is more conventional and improves readability by showing the core contract first.
♻️ Suggested reordering
 export interface TTSSynthesisResult {
-  [key: string]: unknown;
   /** Raw PCM audio data */
   audioData: Float32Array;
   /** Audio sample rate */
   sampleRate: number;
   /** Duration in milliseconds */
   durationMs: number;
   /** Processing time in milliseconds */
   processingTimeMs: number;
+  /** Allow backend-specific extensions */
+  [key: string]: unknown;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/core/src/types/TTSTypes.ts` around lines 8 - 18, The TTSSynthesisResult interface currently places the index signature before the named properties; reorder it so the named properties (audioData, sampleRate, durationMs, processingTimeMs) come first and then include the index signature ([key: string]: unknown) after them to improve readability and emphasize the core contract of TTSSynthesisResult.
sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+VAD.ts (1)
251-261: Consider logging suppressed errors in cleanup.
The empty catch block silently swallows all errors during VAD destruction. While cleanup errors are often non-critical, logging them at debug level aids troubleshooting.
💡 Suggested improvement
 cleanup(): void {
   if (this._vadHandle !== 0) {
     try {
       SherpaONNXBridge.shared.module._SherpaOnnxDestroyVoiceActivityDetector(this._vadHandle);
-    } catch { /* ignore */ }
+    } catch (e) {
+      logger.debug('VAD cleanup error (non-critical):', e);
+    }
     this._vadHandle = 0;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+VAD.ts` around lines 251 - 261, The cleanup() method currently swallows errors when destroying the VAD handle (_vadHandle); update the catch block to log the caught error at debug level instead of ignoring it. Specifically, catch the error from SherpaONNXBridge.shared.module._SherpaOnnxDestroyVoiceActivityDetector(this._vadHandle) and call your debug logger (e.g., console.debug or your module logger) with a short message like "Failed to destroy VAD handle" and include the error object, then continue to set _vadHandle = 0 and clear _jsActivityCallback, _lastSpeechState and _speechStartMs as before.
sdk/runanywhere-web/packages/core/src/types/VLMTypes.ts (1)
16-26: Consider stricter discriminated union for `VLMImage`.
The current interface allows all data fields to be present or absent simultaneously. A discriminated union would provide compile-time enforcement:
💡 Stricter type alternative (optional)
export type VLMImage =
  | { format: VLMImageFormat.FilePath; filePath: string; width?: number; height?: number }
  | { format: VLMImageFormat.RGBPixels; pixelData: Uint8Array; width: number; height: number }
  | { format: VLMImageFormat.Base64; base64Data: string; width?: number; height?: number };
This ensures exactly one data field is provided based on the format. However, the current approach is valid if flexibility is preferred.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/core/src/types/VLMTypes.ts` around lines 16 - 26, Replace the loose VLMImage interface with a discriminated union keyed on the existing format property (VLMImageFormat) so each variant requires exactly the appropriate data field: for format VLMImageFormat.FilePath require filePath (width/height optional), for VLMImageFormat.RGBPixels require pixelData and make width and height required, and for VLMImageFormat.Base64 require base64Data (width/height optional); update any usages of VLMImage to accept the new union type and adjust code that assumed optional fields may be absent.
sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts (1)
96-101: Harden helper URL construction against missing trailing slash.
Line 99 assumes `helperBaseUrl` always ends with `/`. If it doesn’t, helper fetches fail due to malformed URLs.
Proposed minimal fix
-    const bridgeBase = SherpaONNXBridge.shared.helperBaseUrl;
-    const url = bridgeBase
-      ? `${bridgeBase}${filename}`
-      : new URL(`../../wasm/sherpa/${filename}`, import.meta.url).href;
+    const bridgeBase = SherpaONNXBridge.shared.helperBaseUrl;
+    const normalizedBase = bridgeBase
+      ? (bridgeBase.endsWith('/') ? bridgeBase : `${bridgeBase}/`)
+      : null;
+    const url = normalizedBase
+      ? `${normalizedBase}${filename}`
+      : new URL(`../../wasm/sherpa/${filename}`, import.meta.url).href;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts` around lines 96 - 101, The construction of the helper URL uses SherpaONNXBridge.shared.helperBaseUrl (bridgeBase) but assumes it ends with a trailing slash, causing malformed URLs when it doesn't; update the logic in SherpaHelperLoader.ts where bridgeBase is used to build url (the bridgeBase / url / filename code) to normalize bridgeBase by ensuring it ends with a single '/' (or use URL(filename, bridgeBase) semantics) before concatenating so `${bridgeBase}${filename}` always produces a valid path.
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts (1)
567-599: Add a defensive `topP` fallback before writing VLM options.
At Line 598, `topP` is written directly. Guarding against non-finite values in the worker keeps inference stable even if payload shape drifts.
Defensive fallback patch
 async function processImage(
   rgbPixels: ArrayBuffer, width: number, height: number,
   prompt: string, maxTokens: number, temperature: number,
-  topP: number, systemPrompt?: string,
+  topP?: number, systemPrompt?: string,
 ): Promise<VLMWorkerResult> {
@@
-  m.setValue(optPtr + vo.topP, topP, 'float');
+  const safeTopP = Number.isFinite(topP) ? (topP as number) : 0.9;
+  m.setValue(optPtr + vo.topP, safeTopP, 'float');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts` around lines 567 - 599, The code writes topP into the VLM options without validation; before calling m.setValue(optPtr + vo.topP, topP, 'float') in VLMWorkerRuntime (around where optPtr and vo are used), guard topP with a defensive fallback such as const safeTopP = Number.isFinite(topP) ? topP : 1.0 and then write safeTopP instead; ensure you reference and update the m.setValue call that uses vo.topP to use the validated safeTopP variable.
sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+VLM.ts (1)
25-25: Consider a temporary compatibility re-export for removed VLM types.
At Line 25, this module now exports only `VLMModelFamily`; consumers importing `VLMImageFormat`/`VLM*` types from this module path will break. A one-release compatibility re-export (or explicit migration note) would make upgrades safer.
Compatibility re-export option
-export { VLMModelFamily } from './VLMTypes';
+export { VLMImageFormat, VLMModelFamily } from './VLMTypes';
+export type {
+  VLMImage,
+  VLMGenerationOptions,
+  VLMGenerationResult,
+  VLMStreamingResult,
+} from './VLMTypes';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+VLM.ts` at line 25, The export change removed several VLM types consumers may still import (e.g., VLMImageFormat and other VLM* types), so restore a one-release compatibility re-export from the original source: re-export VLMImageFormat and any other VLM* symbols alongside VLMModelFamily from './VLMTypes' in RunAnywhere+VLM.ts (or add a clear comment indicating it is a temporary shim for migration), ensuring imports like `VLMImageFormat` continue to resolve until callers are updated.
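To illustrate what the stricter `VLMImage` union from the `VLMTypes.ts` nitpick above buys in practice, here is a minimal sketch of narrowing on `format`. The `VLMImageFormat` object below is a stand-in for the SDK's enum (its string values are assumptions), and `describeImage` is a hypothetical helper, not part of the SDK.

```typescript
// Stand-in for the SDK's VLMImageFormat enum; the string values are assumptions.
const VLMImageFormat = {
  FilePath: 'filePath',
  RGBPixels: 'rgbPixels',
  Base64: 'base64',
} as const;

// Each variant carries exactly the data field that matches its format.
type VLMImage =
  | { format: typeof VLMImageFormat.FilePath; filePath: string; width?: number; height?: number }
  | { format: typeof VLMImageFormat.RGBPixels; pixelData: Uint8Array; width: number; height: number }
  | { format: typeof VLMImageFormat.Base64; base64Data: string; width?: number; height?: number };

/** Hypothetical helper: switching on `format` exposes only that variant's fields. */
function describeImage(image: VLMImage): string {
  switch (image.format) {
    case VLMImageFormat.FilePath:
      return `file:${image.filePath}`;
    case VLMImageFormat.RGBPixels:
      // width and height are required here, so no undefined checks are needed.
      return `rgb:${image.width}x${image.height}:${image.pixelData.length}`;
    case VLMImageFormat.Base64:
      return `base64:${image.base64Data.length}`;
  }
}
```

With the union, an object such as `{ format: VLMImageFormat.RGBPixels, filePath: 'a.png' }` is rejected at compile time, whereas the looser interface would accept it.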
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sdk/runanywhere-web/packages/llamacpp/src/index.ts`:
- Around line 34-35: Exports were changed to expose VLM and VLMModelFamily but
you removed the original shared VLM contract type exports consumers import from
`@runanywhere/web-llamacpp`; restore compatibility by re-exporting the original
contract type names as aliases from the new module. Update the barrel (index.ts)
to export the original shared types (the same public type names consumers used)
alongside VLM and VLMModelFamily by forwarding them from
'./Extensions/RunAnywhere+VLM' so existing imports continue to compile.
In `@sdk/runanywhere-web/packages/onnx/src/index.ts`:
- Around line 28-34: The public barrel removed shared STT/TTS/VAD contract
exports causing breaking changes; restore backward compatibility by re-exporting
the original shared contract symbols from the ONNX package index: export the
shared STT, TTS and VAD contract types/names under their previous names (e.g.
STT, STTModelType, STTModelConfig, STTWhisperFiles, STTZipformerFiles,
STTParaformerFiles, TTS, TTSVoiceConfig, VAD, VADModelConfig) as aliases that
point to the correct backend-specific or common contract definitions so
consumers importing from `@runanywhere/web-onnx` keep working. Ensure these
re-exports are added to the existing ./index.ts barrel alongside the
backend-specific exports.
In `@sdk/runanywhere-web/packages/onnx/src/ONNX.ts`:
- Around line 47-49: The code sets bridge.helperBaseUrl directly from
options.helperBaseUrl which can omit the required trailing slash; update the
assignment in the initialization block (where options?.helperBaseUrl is handled
before calling ONNXProvider.register()) to normalize the value by appending a
single '/' if one is not present (e.g., check
options.helperBaseUrl.endsWith('/') and add '/' when false) and then assign the
normalized string to bridge.helperBaseUrl so all helper URLs built from it are
valid.
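The trailing-slash normalization requested for `helperBaseUrl` can be sketched as below. `normalizeBaseUrl` and `helperUrl` are hypothetical helper names, and the example URLs and filename are placeholders rather than real SDK assets.

```typescript
/** Ensures a base URL ends with exactly one '/', so filenames can be appended safely. */
function normalizeBaseUrl(base: string): string {
  // Strip any run of trailing slashes, then add a single one back.
  return base.replace(/\/+$/, '') + '/';
}

/** Hypothetical helper that joins a normalized base with a helper filename. */
function helperUrl(base: string, filename: string): string {
  return `${normalizeBaseUrl(base)}${filename}`;
}
```

Note that switching to `new URL(filename, base)` does not remove the need for normalization: per standard URL resolution, a base without a trailing slash loses its last path segment, so `new URL('helper.js', 'https://cdn.example.com/sherpa').href` resolves to `https://cdn.example.com/helper.js`. An empty base would normalize to `/`, so callers should keep treating an empty or undefined value as "not configured" before normalizing.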
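The defensive `topP` fallback requested for `VLMWorkerRuntime.ts` in this review can be sketched as a small standalone guard. The review text gives two different fallback values (0.9 in the proposed patch, 1.0 in the agent prompt), so this sketch takes the fallback as a parameter; `sanitizeTopP` and `DEFAULT_TOP_P` are hypothetical names.

```typescript
/** Fallback used when topP is missing or non-finite; the value itself is an assumption. */
const DEFAULT_TOP_P = 0.9;

/** Returns topP when it is a finite number, otherwise the fallback. */
function sanitizeTopP(topP: unknown, fallback: number = DEFAULT_TOP_P): number {
  return typeof topP === 'number' && Number.isFinite(topP) ? topP : fallback;
}
```

In the worker, the sanitized value would be what gets written into the options struct, for example `m.setValue(optPtr + vo.topP, sanitizeTopP(topP), 'float')`. The guard only rejects `undefined`, `NaN`, and infinities; it does not clamp finite values to a valid sampling range.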
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (39)
- examples/web/RunAnywhereAI/src/views/speak.ts
- examples/web/RunAnywhereAI/src/views/transcribe.ts
- examples/web/RunAnywhereAI/src/views/vision.ts
- examples/web/RunAnywhereAI/src/views/voice.ts
- sdk/runanywhere-web/packages/core/src/Infrastructure/AudioCapture.ts
- sdk/runanywhere-web/packages/core/src/Infrastructure/AudioFileLoader.ts
- sdk/runanywhere-web/packages/core/src/Infrastructure/AudioPlayback.ts
- sdk/runanywhere-web/packages/core/src/Infrastructure/ModelDownloader.ts
- sdk/runanywhere-web/packages/core/src/Infrastructure/ProviderTypes.ts
- sdk/runanywhere-web/packages/core/src/Infrastructure/VideoCapture.ts
- sdk/runanywhere-web/packages/core/src/Public/Extensions/RunAnywhere+VoicePipeline.ts
- sdk/runanywhere-web/packages/core/src/index.ts
- sdk/runanywhere-web/packages/core/src/types.ts
- sdk/runanywhere-web/packages/core/src/types/LLMTypes.ts
- sdk/runanywhere-web/packages/core/src/types/STTTypes.ts
- sdk/runanywhere-web/packages/core/src/types/TTSTypes.ts
- sdk/runanywhere-web/packages/core/src/types/VADTypes.ts
- sdk/runanywhere-web/packages/core/src/types/VLMTypes.ts
- sdk/runanywhere-web/packages/core/src/types/enums.ts
- sdk/runanywhere-web/packages/core/src/types/index.ts
- sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+TextGeneration.ts
- sdk/runanywhere-web/packages/llamacpp/src/Extensions/RunAnywhere+VLM.ts
- sdk/runanywhere-web/packages/llamacpp/src/Extensions/VLMTypes.ts
- sdk/runanywhere-web/packages/llamacpp/src/Foundation/LlamaCppBridge.ts
- sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts
- sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts
- sdk/runanywhere-web/packages/llamacpp/src/LlamaCPP.ts
- sdk/runanywhere-web/packages/llamacpp/src/LlamaCppProvider.ts
- sdk/runanywhere-web/packages/llamacpp/src/index.ts
- sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+STT.ts
- sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+TTS.ts
- sdk/runanywhere-web/packages/onnx/src/Extensions/RunAnywhere+VAD.ts
- sdk/runanywhere-web/packages/onnx/src/Extensions/STTTypes.ts
- sdk/runanywhere-web/packages/onnx/src/Extensions/TTSTypes.ts
- sdk/runanywhere-web/packages/onnx/src/Extensions/VADTypes.ts
- sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts
- sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaONNXBridge.ts
- sdk/runanywhere-web/packages/onnx/src/ONNX.ts
- sdk/runanywhere-web/packages/onnx/src/index.ts
sdk/runanywhere-web/packages/llamacpp/src/index.ts, lines 34-35
export { VLM, VLMModelFamily } from './Extensions/RunAnywhere+VLM';
export { ToolCalling, ToolCallFormat, toToolValue, fromToolValue, getStringArg, getNumberArg } from './Extensions/RunAnywhere+ToolCalling';
Preserve compatibility aliases for moved VLM contract exports.
This barrel now exposes VLM/VLMModelFamily but not shared VLM contract types. Consumers importing those contracts from @runanywhere/web-llamacpp will hit compile-time breakage.
Proposed compatibility alias patch
export { TextGeneration } from './Extensions/RunAnywhere+TextGeneration';
export { VLM, VLMModelFamily } from './Extensions/RunAnywhere+VLM';
+// Backward-compatible aliases (deprecate in docs, remove next major)
+export { VLMImageFormat } from '@runanywhere/web';
+export type {
+ VLMGenerationOptions,
+ VLMGenerationResult,
+ VLMStreamingResult,
+} from '@runanywhere/web';
sdk/runanywhere-web/packages/onnx/src/index.ts, lines 28-34
 // Extensions (backend-specific implementations + backend-specific config types)
 export { STT, STTModelType } from './Extensions/RunAnywhere+STT';
-export type {
-  STTModelConfig, STTWhisperFiles, STTZipformerFiles, STTParaformerFiles,
-  STTTranscriptionResult, STTWord, STTTranscribeOptions, STTStreamCallback, STTStreamingSession,
-} from './Extensions/RunAnywhere+STT';
+export type { STTModelConfig, STTWhisperFiles, STTZipformerFiles, STTParaformerFiles } from './Extensions/RunAnywhere+STT';
 export { TTS } from './Extensions/RunAnywhere+TTS';
-export type { TTSVoiceConfig, TTSSynthesisResult, TTSSynthesizeOptions } from './Extensions/RunAnywhere+TTS';
-export { VAD, SpeechActivity } from './Extensions/RunAnywhere+VAD';
-export type { SpeechActivityCallback, VADModelConfig, SpeechSegment } from './Extensions/RunAnywhere+VAD';
+export type { TTSVoiceConfig } from './Extensions/RunAnywhere+TTS';
+export { VAD } from './Extensions/RunAnywhere+VAD';
+export type { VADModelConfig } from './Extensions/RunAnywhere+VAD';
Keep ONNX barrel backward-compatible for shared STT/TTS/VAD contracts.
The barrel now mainly exposes backend-specific config types. If consumers import shared contract types from @runanywhere/web-onnx, they will break after this change.
Proposed compatibility alias patch
export { STT, STTModelType } from './Extensions/RunAnywhere+STT';
export type { STTModelConfig, STTWhisperFiles, STTZipformerFiles, STTParaformerFiles } from './Extensions/RunAnywhere+STT';
export { TTS } from './Extensions/RunAnywhere+TTS';
export type { TTSVoiceConfig } from './Extensions/RunAnywhere+TTS';
export { VAD } from './Extensions/RunAnywhere+VAD';
export type { VADModelConfig } from './Extensions/RunAnywhere+VAD';
+
+// Backward-compatible aliases (deprecate in docs, remove next major)
+export type {
+ STTTranscriptionResult,
+ STTWord,
+ STTTranscribeOptions,
+ STTStreamCallback,
+ STTStreamingSession,
+ TTSSynthesisResult,
+ TTSSynthesizeOptions,
+ SpeechActivityCallback,
+ SpeechSegment,
+} from '@runanywhere/web';
+export { SpeechActivity } from '@runanywhere/web';
…sources
- Bump LLAMACPP_VERSION from b8011 to b8179 for Vulkan/WebGPU precision fixes and FlashAttention improvements
- Adapt LoRA adapter API calls in llamacpp_backend.cpp to the new llama_set_adapters_lora() batch API (replaces removed per-adapter functions)
- Add nemotron-v2-vl.cpp and paddleocr.cpp to CMakeLists.txt for new clip_graph_* model types introduced in b8179
- Document Qwen2-VL CPU fallback performance (~1 tok/s) caused by M-RoPE NaN logits on WebGPU in rac_vlm_llamacpp.cpp, VLMWorkerBridge.ts, and model-manager.ts
@coderabbitai please do a detailed review, thanks!
Starting the comprehensive review now.
✅ Actions performed: Review triggered.
Actionable comments posted: 5
🧹 Nitpick comments (4)
examples/web/RunAnywhereAI/src/views/vision.ts (1)
15-16: Long relative import paths are fragile.
The deep relative paths (`../../../../../sdk/...`) are brittle and will break if the directory structure changes. Consider using npm workspace package names (e.g., `@runanywhere/core`, `@runanywhere/llamacpp`) or TypeScript path aliases for cleaner, more maintainable imports.
♻️ Suggested improvement using package names
-import { VideoCapture, type CapturedFrame } from '../../../../../sdk/runanywhere-web/packages/core/src/index';
-import { VLMWorkerBridge } from '../../../../../sdk/runanywhere-web/packages/llamacpp/src/index';
+import { VideoCapture, type CapturedFrame } from '@runanywhere/core';
+import { VLMWorkerBridge } from '@runanywhere/llamacpp';
This requires proper npm workspace configuration, which the SDK likely already supports.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/web/RunAnywhereAI/src/views/vision.ts` around lines 15 - 16, The imports using deep relative paths for VideoCapture, CapturedFrame and VLMWorkerBridge are fragile; replace the long ../../../../../sdk/... imports with the SDK package names or TypeScript path aliases (e.g., import from `@runanywhere/core` and `@runanywhere/llamacpp` or configured tsconfig paths) so the module resolution is stable. Update the import statements in the file to reference the package names (VideoCapture, CapturedFrame from the core package and VLMWorkerBridge from the llamacpp package) and ensure package/tsconfig paths are configured in the project.
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts (1)
243-263: Qwen2-VL detection may be overly broad.
The regex /qwen/i matches every Qwen model, including text-only ones that are not vision-language models (e.g., a hypothetical "qwen2-instruct" model). Consider a more specific pattern like /qwen.*vl/i or /qwen2.*vl/i to reduce false positives.
Also, the URL replacement on line 257 assumes a specific naming convention (-webgpu.js → .js). If the WASM file naming changes, this could silently fail to switch backends.
🔧 Suggested improvement for more precise detection
- const isQwenVL = /qwen/i.test(params.modelId) || /qwen/i.test(params.modelName);
+ // Match Qwen VL models specifically (Qwen2-VL, Qwen-VL, etc.)
+ const isQwenVL = /qwen.*vl/i.test(params.modelId) || /qwen.*vl/i.test(params.modelName);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts` around lines 243 - 263, The current Qwen2-VL detection (isQwenVL using /qwen/i on params.modelId/name) is too broad and the wasm URL rewrite (bridge.wasmUrl.replace(/-webgpu\.js$/, '.js')) is brittle; update isQwenVL to a narrower pattern such as /qwen2?.*vl/i or /qwen.*-?vl/i to avoid matching unrelated names, and make the URL swap more robust in LlamaCppBridge.shared usage by handling variants (e.g., '-webgpu.js', '.webgpu.js', '-webgpu.wasm.js') and falling back to a safer transform (check endsWith and replace the suffix or try replacing 'webgpu' token) before calling this.terminate() and this.init(cpuUrl), ensuring you still skip restart if cpuUrl equals currentUrl or if bridge.wasmUrl is undefined.
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (2)
618-627: Extract model-family override mapping into one helper.
The same switch is duplicated in rac_vlm_llamacpp_process and rac_vlm_llamacpp_process_stream. A shared helper will prevent future drift.
Also applies to: 879-888
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 618 - 627, Extract the duplicated switch that maps options->model_family to VLMModelType into a single helper (e.g., a static function named resolve_effective_model_type or rac_vlm_llamacpp_resolve_model_type) that accepts the backend (or backend->model_type) and options and returns the resolved VLMModelType; replace the switch blocks in rac_vlm_llamacpp_process and rac_vlm_llamacpp_process_stream with calls to this helper so both use the same logic and avoid duplication.
762-795: Gate logits diagnostics behind debug/opt-in.
This full-vocab scan + top-5 logging runs on every request’s first token and logs at INFO. It is noisy and adds avoidable overhead on hot paths.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 762 - 795, The diagnostic full-vocab logits scan inside the i == 0 block (uses llama_get_logits, n_vocab, top5_val/top5_idx and RAC_LOG_INFO with LOG_CAT) is running on every request and logging at INFO; gate this work behind an opt-in debug flag and lower the log level (e.g. DEBUG) to avoid hot-path overhead. Modify the i == 0 block to first check a runtime-config or environment flag (e.g. enable_logits_diag or backend->opts.logit_diag) and only perform the NaN/Inf scan and top-5 computation when that flag is true, and change RAC_LOG_INFO to a debug-level logger for the diagnostic messages; keep the existing logic (max_logit, nan/inf counts, top5 arrays) unchanged except for the surrounding conditional and log level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt`:
- Around line 156-157: The CMakeLists refers to new source files
nemotron-v2-vl.cpp and paddleocr.cpp that don't exist in the currently fetched
llama.cpp (b8011); update the FetchContent/variable LLAMACPP_VERSION to at least
b8110 so those files are present, or remove the two file entries from the
sources list if you must stay on b8011; specifically, change the
LLAMACPP_VERSION used by the FetchContent declaration (or wherever
LLAMACPP_VERSION is defined) to >= b8110, or delete the references to
${llamacpp_SOURCE_DIR}/tools/mtmd/models/nemotron-v2-vl.cpp and
${llamacpp_SOURCE_DIR}/tools/mtmd/models/paddleocr.cpp from the CMakeLists.txt
sources block.
In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp`:
- Around line 374-375: The calls to llama_set_adapters_lora(context_, ...)
currently ignore the int32_t return value and unconditionally clear the local
lora_adapters_ tracking causing state divergence; change each call site that
uses llama_set_adapters_lora (e.g., the places around where lora_adapters_ is
cleared and the call with context_) to capture and check the return status, only
clear or update lora_adapters_ when the function returns success, and on failure
log or propagate the error (using the same logging/return pattern used where
llama_set_adapters_lora is already checked) so the local state stays consistent
with the llama.cpp context.
- Around line 930-942: The code currently erases from lora_adapters_ before
calling llama_set_adapters_lora(context_, ...) which can desync local state if
the API fails; change the flow in the erase/unload/clear paths (referenced by
lora_adapters_, llama_set_adapters_lora and context_) to first construct the
list of remaining adapters and scales (without mutating lora_adapters_), call
llama_set_adapters_lora and capture/validate its return value, and only upon
success mutate lora_adapters_ (erase or clear) to keep state consistent; apply
the same pattern (check and handle the return value of llama_set_adapters_lora)
at the other call sites mentioned (around lines with unload/clear operations) or
explicitly document why failures can be ignored.
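The "apply first, mutate local state only on success" ordering described in the two llamacpp_backend.cpp comments above can be sketched in TypeScript. This is only an illustration of the ordering, not the backend's code: `AdapterRegistry`, `Adapter`, and `ApplyAdapters` are hypothetical names, and the callback stands in for `llama_set_adapters_lora` under the assumption that it returns 0 on success.

```typescript
// Hypothetical shape of a tracked LoRA adapter.
interface Adapter {
  id: string;
  scale: number;
}

// Stand-in for the native batch call. Assumption: 0 means success.
type ApplyAdapters = (adapters: readonly Adapter[]) => number;

class AdapterRegistry {
  private adapters: Adapter[] = [];
  private readonly apply: ApplyAdapters;

  constructor(apply: ApplyAdapters) {
    this.apply = apply;
  }

  add(adapter: Adapter): boolean {
    return this.commit([...this.adapters, adapter]);
  }

  remove(id: string): boolean {
    // Build the remaining list without touching the tracked state.
    const remaining = this.adapters.filter((a) => a.id !== id);
    if (remaining.length === this.adapters.length) return false; // unknown id
    return this.commit(remaining);
  }

  clear(): boolean {
    return this.commit([]);
  }

  ids(): string[] {
    return this.adapters.map((a) => a.id);
  }

  // Call the native API first; only replace local state when it succeeds.
  private commit(next: Adapter[]): boolean {
    if (this.apply(next) !== 0) return false;
    this.adapters = next;
    return true;
  }
}
```

Because every mutation goes through `commit`, a failed native call leaves the tracked list exactly as it was, which is the consistency property the review asks for.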
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 193-212: The template fallback currently drops an explicit system
prompt when llama_chat_apply_template(tmpl, messages, 2, ...) fails with
effective_system set; change the fallback so that when the system-inclusive
template application fails you still preserve the system_prompt by either (a)
attempting the fallback formatting with only the user message and then
prepending or injecting the original effective_system/system_prompt into the
returned formatted string (with a clear separator) or (b) re-running
llama_chat_apply_template with a modified messages array that ensures the system
content is included in the output; update the code paths around
effective_system, messages, tmpl, user_content and the alternate/template-only
branch so the final returned string always contains the explicit system prompt,
and apply the identical fix to the later similar block (lines 214-229) that
mirrors this behavior.
- Around line 662-664: The INFO-level RAC_LOG_INFO calls that currently print
user/system prompt content (e.g., the call using LOG_CAT and formatting
full_prompt.c_str(), which appears in the v3-process path and the similar
streaming-path calls) must stop emitting prompt text; instead log only
non-sensitive metadata (prompt length, has_image flag, effective_model_type) or
move the full prompt text to a DEBUG-level log guarded by a verbosity check.
Update the RAC_LOG_INFO invocations that reference full_prompt to remove the
"%.200s" / full_prompt.c_str() argument and log only (int)full_prompt.length(),
has_image ? 1 : 0, and (int)effective_model_type, or change the call to
RAC_LOG_DEBUG and keep the full_prompt there behind a conditional so production
INFO logs never contain prompt contents; apply the same change to all analogous
calls (including the streaming path variants).
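A rough TypeScript sketch of the logging rule above: non-debug levels get metadata only, and the truncated prompt text appears only at debug level. The function name `describePromptForLog` and the `LogLevel` type are hypothetical; the real change would be made on the `RAC_LOG_INFO` / `RAC_LOG_DEBUG` calls in the C++ backend.

```typescript
type LogLevel = 'debug' | 'info';

// Builds the log line for a VLM request. Prompt content is included only at
// debug level, truncated to 200 characters like the original "%.200s" format.
function describePromptForLog(
  prompt: string,
  hasImage: boolean,
  modelType: string,
  level: LogLevel,
): string {
  const meta =
    `prompt_len=${prompt.length} has_image=${hasImage ? 1 : 0} model_type=${modelType}`;
  if (level !== 'debug') {
    return meta; // production INFO logs never contain prompt text
  }
  return `${meta} prompt="${prompt.slice(0, 200)}"`;
}
```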
---
Nitpick comments:
In `@examples/web/RunAnywhereAI/src/views/vision.ts`:
- Around line 15-16: The imports using deep relative paths for VideoCapture,
CapturedFrame and VLMWorkerBridge are fragile—replace the long
../../../../../sdk/... imports with the SDK package names or TypeScript path
aliases (e.g., import from `@runanywhere/core` and `@runanywhere/llamacpp` or
configured tsconfig paths) so the module resolution is stable; update the import
statements in the file to reference the package names (VideoCapture,
CapturedFrame from the core package and VLMWorkerBridge from the llamacpp
package) and ensure package/tsconfig paths are configured in the project.
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 618-627: Extract the duplicated switch that maps
options->model_family to VLMModelType into a single helper (e.g., a static
function named resolve_effective_model_type or
rac_vlm_llamacpp_resolve_model_type) that accepts the backend (or
backend->model_type) and options and returns the resolved VLMModelType; replace
the switch blocks in rac_vlm_llamacpp_process and
rac_vlm_llamacpp_process_stream with calls to this helper so both use the same
logic and avoid duplication.
- Around line 762-795: The diagnostic full-vocab logits scan inside the i == 0
block (uses llama_get_logits, n_vocab, top5_val/top5_idx and RAC_LOG_INFO with
LOG_CAT) is running on every request and logging at INFO; gate this work behind
an opt-in debug flag and lower the log level (e.g. DEBUG) to avoid hot-path
overhead. Modify the i == 0 block to first check a runtime-config or environment
flag (e.g. enable_logits_diag or backend->opts.logit_diag) and only perform the
NaN/Inf scan and top-5 computation when that flag is true, and change
RAC_LOG_INFO to a debug-level logger for the diagnostic messages; keep the
existing logic (max_logit, nan/inf counts, top5 arrays) unchanged except for the
surrounding conditional and log level.
In `@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts`:
- Around line 243-263: The current Qwen2-VL detection (isQwenVL using /qwen/i on
params.modelId/name) is too broad and the wasm URL rewrite
(bridge.wasmUrl.replace(/-webgpu\.js$/, '.js')) is brittle; update isQwenVL to a
narrower pattern such as /qwen2?.*vl/i or /qwen.*-?vl/i to avoid matching
unrelated names, and make the URL swap more robust in LlamaCppBridge.shared
usage by handling variants (e.g., '-webgpu.js', '.webgpu.js', '-webgpu.wasm.js')
and falling back to a safer transform (check endsWith and replace the suffix or
try replacing 'webgpu' token) before calling this.terminate() and
this.init(cpuUrl), ensuring you still skip restart if cpuUrl equals currentUrl
or if bridge.wasmUrl is undefined.
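One possible shape for the narrower detection and the suffix-based URL swap described above, as a minimal sketch. The helper names `isQwenVL` and `toCpuWasmUrl` and the suffix list are assumptions for illustration; the actual naming of the WebGPU glue file should be checked against the build output.

```typescript
// Pattern suggested in the review: requires "vl" somewhere after "qwen".
const QWEN_VL_PATTERN = /qwen.*vl/i;

function isQwenVL(modelId: string, modelName: string): boolean {
  return QWEN_VL_PATTERN.test(modelId) || QWEN_VL_PATTERN.test(modelName);
}

// Returns the CPU glue URL, or null when no swap is needed or possible.
// A null result means the caller should skip terminate()/init().
function toCpuWasmUrl(currentUrl: string | undefined): string | null {
  if (!currentUrl) return null;
  const webgpuSuffixes = ['-webgpu.js', '.webgpu.js']; // assumed variants
  for (const suffix of webgpuSuffixes) {
    if (currentUrl.endsWith(suffix)) {
      return currentUrl.slice(0, -suffix.length) + '.js';
    }
  }
  return null; // already a CPU URL, or an unknown naming scheme
}
```

Returning `null` for an unrecognized URL makes the "could not switch backends" case explicit, so the caller can log it instead of silently continuing on WebGPU.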
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- examples/web/RunAnywhereAI/src/services/model-manager.ts
- examples/web/RunAnywhereAI/src/views/vision.ts
- sdk/runanywhere-commons/VERSIONS
- sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt
- sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
- sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
- sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerBridge.ts
- sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts
${llamacpp_SOURCE_DIR}/tools/mtmd/models/nemotron-v2-vl.cpp
${llamacpp_SOURCE_DIR}/tools/mtmd/models/paddleocr.cpp
🧩 Analysis chain
🏁 Script executed:
# Check the LLAMACPP_VERSION in the CMakeLists.txt file
cat -n sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt | head -30
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1260
🏁 Script executed:
# Also check around lines 156-157 to see the context
sed -n '150,165p' sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 738
🌐 Web query:
llama.cpp nemotron-v2-vl.cpp paddleocr.cpp tools/mtmd/models repository
💡 Result:
Those files are in the main llama.cpp repository: ggml-org/llama.cpp. (github.com)
- tools/mtmd/models/nemotron-v2-vl.cpp (Nemotron Nano 12B v2 VL / mtmd support was added in llama.cpp). (newreleases.io)
- tools/mtmd/models/paddleocr.cpp (PaddleOCR-VL support was added in llama.cpp). (newreleases.io)
Citations:
- 1: https://github.com/ggml-org/llama.cpp?utm_source=openai
- 2: https://newreleases.io/project/github/ggml-org/llama.cpp/release/b8054?utm_source=openai
- 3: https://newreleases.io/project/github/ggml-org/llama.cpp/release/b8110?utm_source=openai
Fix LLAMACPP_VERSION or remove the new model files.
The new source entries nemotron-v2-vl.cpp and paddleocr.cpp point to files that do not exist in llama.cpp b8011:
- nemotron-v2-vl.cpp was added in b8054
- paddleocr.cpp was added in b8110
The build will fail because these files don't exist in the tree fetched for the current version. Update LLAMACPP_VERSION to at least b8110 to include both files, or remove these references if b8011 is required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/runanywhere-commons/src/backends/llamacpp/CMakeLists.txt` around lines
156 - 157, The CMakeLists refers to new source files nemotron-v2-vl.cpp and
paddleocr.cpp that don't exist in the currently fetched llama.cpp (b8011);
update the FetchContent/variable LLAMACPP_VERSION to at least b8110 so those
files are present, or remove the two file entries from the sources list if you
must stay on b8011; specifically, change the LLAMACPP_VERSION used by the
FetchContent declaration (or wherever LLAMACPP_VERSION is defined) to >= b8110,
or delete the references to
${llamacpp_SOURCE_DIR}/tools/mtmd/models/nemotron-v2-vl.cpp and
${llamacpp_SOURCE_DIR}/tools/mtmd/models/paddleocr.cpp from the CMakeLists.txt
sources block.
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (outdated, resolved)
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (outdated, resolved)
|
@coderabbitai please do a detailed review. Thanks! |
|
Starting the detailed review now.
✅ Actions performed: Review triggered.
|
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts (1)
89-90: ⚠️ Potential issue | 🟠 Major
Fail fast when new WASM offset exports are missing.
If vlm_options_system_prompt or vlm_options_model_family exports are absent, workerOffsetOf yields 0, and Line 607 can write into the wrong struct field silently. This is a hard-to-debug ABI mismatch path.
💡 Proposed change
-function workerOffsetOf(m: any, name: string): number {
+function workerOffsetOf(m: any, name: string, required = true): number {
   const fn = m[`_rac_wasm_offsetof_${name}`];
-  return typeof fn === 'function' ? fn() : 0;
+  if (typeof fn !== 'function') {
+    if (required) {
+      throw new Error(`Missing WASM offsetof export: _rac_wasm_offsetof_${name}`);
+    }
+    return 0;
+  }
+  return fn();
 }
@@
 vlmOptions: {
   maxTokens: workerOffsetOf(m, 'vlm_options_max_tokens'),
   temperature: workerOffsetOf(m, 'vlm_options_temperature'),
   topP: workerOffsetOf(m, 'vlm_options_top_p'),
-  streamingEnabled: workerOffsetOf(m, 'vlm_options_streaming_enabled'),
-  systemPrompt: workerOffsetOf(m, 'vlm_options_system_prompt'),
-  modelFamily: workerOffsetOf(m, 'vlm_options_model_family'),
+  streamingEnabled: workerOffsetOf(m, 'vlm_options_streaming_enabled'),
+  systemPrompt: workerOffsetOf(m, 'vlm_options_system_prompt', true),
+  modelFamily: workerOffsetOf(m, 'vlm_options_model_family', true),
 },
Also applies to: 607-607
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts` around lines 89 - 90, The code currently uses workerOffsetOf('vlm_options_system_prompt') and workerOffsetOf('vlm_options_model_family') and will silently continue if those exports are missing (workerOffsetOf returns 0); change the VLMWorkerRuntime initialization to immediately validate these offsets: call workerOffsetOf for 'vlm_options_system_prompt' and 'vlm_options_model_family', and if either returns 0 throw an Error (or assert) with a clear message mentioning the missing export name and that the ABI is incompatible, so the code fails fast before writing into the struct (references: workerOffsetOf, vlm_options_system_prompt, vlm_options_model_family).
🧹 Nitpick comments (1)
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (1)
631-640: Extract duplicated model-family mapping into a shared helper.
The same options->model_family mapping is duplicated in both process paths. Centralizing it reduces drift risk when adding new families.
Also applies to: 891-900
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 631 - 640, Extract the duplicated switch that maps options->model_family to VLMModelType into a single helper (e.g., mapModelFamilyToVLMModelType or resolveModelTypeFromFamily) that accepts the RAC_VLM model_family enum and returns a VLMModelType; then replace the in-place switches (including the one currently in rac_vlm_llamacpp.cpp around the VLMModelType effective_model_type calculation and the other occurrence around lines ~891-900) with calls to that helper and fall back to backend->model_type when options is null or model_family==RAC_VLM_MODEL_FAMILY_AUTO. Ensure the helper handles all RAC_VLM_MODEL_FAMILY_* cases (QWEN2_VL, SMOLVLM, LLAVA, default->Generic) and is declared/defined in a shared scope so both call sites can use it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 774-807: Gate the expensive first-token full-vocab diagnostics
behind a runtime debug/diagnostic switch so they only run when explicitly
enabled: wrap the entire block that begins with the "if (i == 0) { float* logits
= llama_get_logits(...)" and contains the two RAC_LOG_INFO calls and the top-5
scan in a conditional that checks a debug flag (e.g.
backend->enable_first_token_diag or a global/Context-level is_diag_enabled()
function) and skip the logits access and loops when the flag is false; update
any places that construct the backend/context to expose/configure this boolean
and ensure the check occurs before calling llama_get_logits or iterating n_vocab
to avoid the latency and log noise in production.
---
Outside diff comments:
In
`@sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts`:
- Around line 89-90: The code currently uses
workerOffsetOf('vlm_options_system_prompt') and
workerOffsetOf('vlm_options_model_family') and will silently continue if those
exports are missing (workerOffsetOf returns 0); change the VLMWorkerRuntime
initialization to immediately validate these offsets: call workerOffsetOf for
'vlm_options_system_prompt' and 'vlm_options_model_family', and if either
returns 0 throw an Error (or assert) with a clear message mentioning the missing
export name and that the ABI is incompatible, so the code fails fast before
writing into the struct (references: workerOffsetOf, vlm_options_system_prompt,
vlm_options_model_family).
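A self-contained sketch of the fail-fast lookup, using a plain object as a stand-in for the Emscripten module. `WasmModuleLike` and `requireOffsetOf` are hypothetical names. Note one design choice that differs from the prompt text above: the sketch checks whether the export exists rather than whether the returned offset is 0, on the assumption that 0 can be a legitimate offset for the first field of a struct.

```typescript
// Minimal stand-in for the WASM module: exports keyed by name.
type WasmModuleLike = Record<string, unknown>;

// Looks up `_rac_wasm_offsetof_<name>` and throws if the export is missing,
// so an ABI mismatch surfaces at init time instead of corrupting a struct.
function requireOffsetOf(m: WasmModuleLike, name: string): number {
  const exportName = `_rac_wasm_offsetof_${name}`;
  const fn = m[exportName];
  if (typeof fn !== 'function') {
    throw new Error(`Missing WASM offsetof export: ${exportName}`);
  }
  return (fn as () => number)();
}
```

Calling this once for every required field during worker initialization would move the failure to startup, before any write into the options struct happens.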
---
Nitpick comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 631-640: Extract the duplicated switch that maps
options->model_family to VLMModelType into a single helper (e.g.,
mapModelFamilyToVLMModelType or resolveModelTypeFromFamily) that accepts the
RAC_VLM model_family enum and returns a VLMModelType; then replace the in-place
switches (including the one currently in rac_vlm_llamacpp.cpp around the
VLMModelType effective_model_type calculation and the other occurrence around
lines ~891-900) with calls to that helper and fall back to backend->model_type
when options is null or model_family==RAC_VLM_MODEL_FAMILY_AUTO. Ensure the
helper handles all RAC_VLM_MODEL_FAMILY_* cases (QWEN2_VL, SMOLVLM, LLAVA,
default->Generic) and is declared/defined in a shared scope so both call sites
can use it.
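The shared-helper idea can be sketched as a single mapping function that both the sync and stream paths would call. This is a TypeScript transliteration for illustration only; the real helper would live in rac_vlm_llamacpp.cpp and use the `RAC_VLM_MODEL_FAMILY_*` enum. The string unions and the name `resolveEffectiveModelType` are assumptions.

```typescript
// Hypothetical mirrors of the C enums, as string unions.
type ModelFamily = 'auto' | 'qwen2_vl' | 'smolvlm' | 'llava' | 'other';
type VLMModelType = 'Qwen2VL' | 'SmolVLM' | 'LLaVA' | 'Generic';

// Resolves the model type for one request. `detected` is the type found at
// load time; an absent or 'auto' family keeps it, anything else overrides it.
function resolveEffectiveModelType(
  detected: VLMModelType,
  family?: ModelFamily,
): VLMModelType {
  switch (family) {
    case undefined:
    case 'auto':
      return detected;
    case 'qwen2_vl':
      return 'Qwen2VL';
    case 'smolvlm':
      return 'SmolVLM';
    case 'llava':
      return 'LLaVA';
    default:
      return 'Generic';
  }
}
```

With one function owning the mapping, adding a new family means touching a single switch instead of two.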
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
- sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
- sdk/runanywhere-web/packages/llamacpp/src/Infrastructure/VLMWorkerRuntime.ts
- sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts
- sdk/runanywhere-web/packages/onnx/src/ONNX.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- sdk/runanywhere-web/packages/onnx/src/Foundation/SherpaHelperLoader.ts
- sdk/runanywhere-web/packages/onnx/src/ONNX.ts
// Diagnostic: on first token, inspect logits for NaN/corruption
if (i == 0) {
    float* logits = llama_get_logits(backend->ctx);
    int n_vocab = llama_vocab_n_tokens(vocab);
    if (logits && n_vocab > 0) {
        float max_logit = logits[0];
        int max_idx = 0;
        int nan_count = 0;
        int inf_count = 0;
        for (int v = 0; v < n_vocab; v++) {
            if (logits[v] != logits[v]) nan_count++; // NaN check
            if (logits[v] > 1e30f || logits[v] < -1e30f) inf_count++;
            if (logits[v] > max_logit) { max_logit = logits[v]; max_idx = v; }
        }
        RAC_LOG_INFO(LOG_CAT, "[v3-diag] Logits: n_vocab=%d, max_logit=%.4f at token %d, NaN=%d, Inf=%d",
                     n_vocab, max_logit, max_idx, nan_count, inf_count);
        // Log top 5 logits
        float top5_val[5] = {-1e30f, -1e30f, -1e30f, -1e30f, -1e30f};
        int top5_idx[5] = {0, 0, 0, 0, 0};
        for (int v = 0; v < n_vocab; v++) {
            if (logits[v] != logits[v]) continue; // skip NaN
            for (int k = 0; k < 5; k++) {
                if (logits[v] > top5_val[k]) {
                    for (int j = 4; j > k; j--) { top5_val[j] = top5_val[j-1]; top5_idx[j] = top5_idx[j-1]; }
                    top5_val[k] = logits[v]; top5_idx[k] = v;
                    break;
                }
            }
        }
        RAC_LOG_INFO(LOG_CAT, "[v3-diag] Top5: [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f",
                     top5_idx[0], top5_val[0], top5_idx[1], top5_val[1],
                     top5_idx[2], top5_val[2], top5_idx[3], top5_val[3],
                     top5_idx[4], top5_val[4]);
    }
Gate first-token full-vocab diagnostics behind a debug/diagnostic switch.
Line 775 currently triggers a full-vocab scan and two RAC_LOG_INFO diagnostics for every request. This adds avoidable latency and production log noise.
💡 Proposed change
- // Diagnostic: on first token, inspect logits for NaN/corruption
- if (i == 0) {
+ // Diagnostic: on first token, inspect logits for NaN/corruption
+ // Keep this behind an explicit diagnostics build flag.
+#ifdef RAC_VLM_ENABLE_DIAGNOSTICS
+ if (i == 0) {
float* logits = llama_get_logits(backend->ctx);
int n_vocab = llama_vocab_n_tokens(vocab);
if (logits && n_vocab > 0) {
@@
- RAC_LOG_INFO(LOG_CAT, "[v3-diag] Logits: n_vocab=%d, max_logit=%.4f at token %d, NaN=%d, Inf=%d",
+ RAC_LOG_DEBUG(LOG_CAT, "[v3-diag] Logits: n_vocab=%d, max_logit=%.4f at token %d, NaN=%d, Inf=%d",
n_vocab, max_logit, max_idx, nan_count, inf_count);
@@
- RAC_LOG_INFO(LOG_CAT, "[v3-diag] Top5: [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f",
+ RAC_LOG_DEBUG(LOG_CAT, "[v3-diag] Top5: [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f",
top5_idx[0], top5_val[0], top5_idx[1], top5_val[1],
top5_idx[2], top5_val[2], top5_idx[3], top5_val[3],
top5_idx[4], top5_val[4]);
}
}
+#endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around
lines 774 - 807, Gate the expensive first-token full-vocab diagnostics behind a
runtime debug/diagnostic switch so they only run when explicitly enabled: wrap
the entire block that begins with the "if (i == 0) { float* logits =
llama_get_logits(...)" and contains the two RAC_LOG_INFO calls and the top-5
scan in a conditional that checks a debug flag (e.g.
backend->enable_first_token_diag or a global/Context-level is_diag_enabled()
function) and skip the logits access and loops when the flag is false; update
any places that construct the backend/context to expose/configure this boolean
and ensure the check occurs before calling llama_get_logits or iterating n_vocab
to avoid the latency and log noise in production.
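To make the runtime-flag variant concrete, here is a TypeScript sketch of the gating order: the flag is checked before the logits are fetched or scanned. The names `maybeDiagnoseFirstToken`, `summarizeLogits`, and `LogitsSummary` are hypothetical, and the scan is a single-pass rewrite of the diagnostic, not the PR's C++ code.

```typescript
interface LogitsSummary {
  nanCount: number;
  infCount: number;
  top5: number[]; // token indices, highest logit first
}

// One O(vocab) pass: counts NaN/Inf and keeps the five largest logits.
function summarizeLogits(logits: Float32Array): LogitsSummary {
  let nanCount = 0;
  let infCount = 0;
  const top5: number[] = [];
  for (let v = 0; v < logits.length; v++) {
    const x = logits[v];
    if (Number.isNaN(x)) {
      nanCount++;
      continue; // NaN never enters the top-5 list
    }
    if (!Number.isFinite(x)) infCount++;
    // Find the insertion point in the descending top-5 list.
    let k = top5.length;
    while (k > 0 && logits[top5[k - 1]] < x) k--;
    if (k < 5) {
      top5.splice(k, 0, v);
      if (top5.length > 5) top5.pop();
    }
  }
  return { nanCount, infCount, top5 };
}

// Returns null without touching the logits unless diagnostics are enabled
// and this is the first generated token.
function maybeDiagnoseFirstToken(
  tokenIndex: number,
  diagnosticsEnabled: boolean,
  getLogits: () => Float32Array | null,
): LogitsSummary | null {
  if (!diagnosticsEnabled || tokenIndex !== 0) return null;
  const logits = getLogits();
  if (!logits || logits.length === 0) return null;
  return summarizeLogits(logits);
}
```

Passing the logits as a getter keeps the disabled path free of both the buffer access and the scan, which is the behavior the comment asks for.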
|
LGTM |
|
Released 0.10-beta version, good to close/merge this PR |
|
@coderabbitai please review |
Important
Looks good to me! 👍
Reviewed everything up to 3a8d798 in 17 seconds. Click for details.
- Reviewed 2109 lines of code in 47 files
- Skipped 0 files when reviewing.
- Skipped posting 0 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_SKXRc203oErbRTQd
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
|
@greptile please review |
|
Starting the comprehensive review now.
✅ Actions performed: Review triggered.
|
Actionable comments posted: 1
♻️ Duplicate comments (1)
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (1)
774-807: ⚠️ Potential issue | 🟠 Major
Gate first-token full-vocab diagnostics behind an explicit diagnostics switch.
Line 775 still runs an O(vocab) scan (twice) on every request, which adds avoidable latency in production.
💡 Suggested patch
- // Diagnostic: on first token, inspect logits for NaN/corruption
- if (i == 0) {
+ // Diagnostic: on first token, inspect logits for NaN/corruption
+#ifdef RAC_VLM_ENABLE_DIAGNOSTICS
+ if (i == 0) {
 float* logits = llama_get_logits(backend->ctx);
 int n_vocab = llama_vocab_n_tokens(vocab);
 if (logits && n_vocab > 0) {
@@
 RAC_LOG_DEBUG(LOG_CAT, "[v3-diag] Top5: [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f [%d]=%.2f",
 top5_idx[0], top5_val[0], top5_idx[1], top5_val[1],
 top5_idx[2], top5_val[2], top5_idx[3], top5_val[3],
 top5_idx[4], top5_val[4]);
 }
 }
+#endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 774 - 807, The diagnostics block that scans the full vocabulary on first token (the code starting at the comment "// Diagnostic: on first token, inspect logits for NaN/corruption" and the if (i == 0) {...} block that calls llama_get_logits and loops over n_vocab) must be gated by an explicit runtime diagnostics switch to avoid O(vocab) overhead in production; add a boolean flag (e.g. backend->enable_v3_diagnostics or a similarly named config/option) and change the condition to if (i == 0 && backend->enable_v3_diagnostics) { ... } (or check a global/config getter) so the full-vocab scans and top-5 computation only run when that flag is enabled. Ensure the new flag is default-off and documented where backend configuration is initialized.
🧹 Nitpick comments (1)
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (1)
631-640: Extract model-family override mapping into a shared helper.
This switch is duplicated in sync and stream paths; centralizing it reduces drift risk when adding new RAC_VLM_MODEL_FAMILY_* values.
Also applies to: 891-900
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 631 - 640, The duplicated switch that maps options->model_family (RAC_VLM_MODEL_FAMILY_*) to VLMModelType (VLMModelType::Qwen2VL, ::SmolVLM, ::LLaVA, ::Generic) should be extracted into a shared helper function (e.g., a static helper like ResolveVLMModelTypeFromFamily or MapModelFamilyToVLMModelType) and used from both the current branch and the other occurrence in the stream/sync path; update callers to call that helper (passing options->model_family and backend->model_type as a default) so the mapping logic lives in one place and both code paths use it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 180-184: The code treats an empty system_prompt ("") as provided;
update the logic that sets effective_system so empty strings are considered "not
provided" — replace checks like `if (!effective_system && model_type ==
VLMModelType::Qwen2VL)` with a predicate that treats NULL or empty (e.g.,
`system_prompt == nullptr || system_prompt[0] == '\0'`) and apply the same
change wherever `effective_system` is computed (references: effective_system,
system_prompt, model_type, VLMModelType::Qwen2VL) so the default "You are a
helpful assistant." is injected for Qwen2-VL when system_prompt is missing or
empty.
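The empty-string rule can be illustrated with a small TypeScript sketch, assuming the C++ logic reduces to "null or empty means not provided". `resolveEffectiveSystem` and the `ModelType` union are hypothetical names used only for this illustration.

```typescript
type ModelType = 'Qwen2VL' | 'SmolVLM' | 'LLaVA' | 'Generic';

const QWEN2_VL_DEFAULT_SYSTEM = 'You are a helpful assistant.';

// Treats null, undefined, and "" the same way: no explicit system prompt.
// Only Qwen2-VL gets a default injected; other models get no system block.
function resolveEffectiveSystem(
  systemPrompt: string | null | undefined,
  modelType: ModelType,
): string | null {
  if (typeof systemPrompt === 'string' && systemPrompt.length > 0) {
    return systemPrompt; // explicit prompt always wins
  }
  return modelType === 'Qwen2VL' ? QWEN2_VL_DEFAULT_SYSTEM : null;
}
```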
---
Duplicate comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 774-807: The diagnostics block that scans the full vocabulary on
first token (the code starting at the comment "// Diagnostic: on first token,
inspect logits for NaN/corruption" and the if (i == 0) {...} block that calls
llama_get_logits and loops over n_vocab) must be gated by an explicit runtime
diagnostics switch to avoid O(vocab) overhead in production; add a boolean flag
(e.g. backend->enable_v3_diagnostics or a similarly named config/option) and
change the condition to if (i == 0 && backend->enable_v3_diagnostics) { ... }
(or check a global/config getter) so the full-vocab scans and top-5 computation
only run when that flag is enabled. Ensure the new flag is default-off and
documented where backend configuration is initialized.
---
Nitpick comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 631-640: The duplicated switch that maps options->model_family
(RAC_VLM_MODEL_FAMILY_*) to VLMModelType (VLMModelType::Qwen2VL, ::SmolVLM,
::LLaVA, ::Generic) should be extracted into a shared helper function (e.g., a
static helper like ResolveVLMModelTypeFromFamily or
MapModelFamilyToVLMModelType) and used from both the current branch and the
other occurrence in the stream/sync path; update callers to call that helper
(passing options->model_family and backend->model_type as a default) so the
mapping logic lives in one place and both code paths use it.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
- sdk/runanywhere-web/packages/core/package.json
- sdk/runanywhere-web/packages/llamacpp/package.json
- sdk/runanywhere-web/packages/onnx/package.json
✅ Files skipped from review due to trivial changes (1)
- sdk/runanywhere-web/packages/llamacpp/package.json
♻️ Duplicate comments (2)
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (2)
180-184: ⚠️ Potential issue | 🟡 Minor
Treat empty system_prompt as not provided.
effective_system is initialized directly from system_prompt, so "" is treated as provided and can suppress Qwen2-VL default injection while producing an empty system block.
Suggested patch
- const char* effective_system = system_prompt;
+ const bool has_explicit_system = (system_prompt && system_prompt[0] != '\0');
+ const char* effective_system = has_explicit_system ? system_prompt : nullptr;
 if (!effective_system && model_type == VLMModelType::Qwen2VL) {
 effective_system = "You are a helpful assistant.";
 }
@@
- bool has_explicit_system = (system_prompt && system_prompt[0] != '\0');
 if (has_explicit_system) {
 RAC_LOG_WARNING(LOG_CAT, "Template with system failed (size=%d); falling back to manual to preserve explicit system prompt", size);
 } else {
Also applies to: 211-213, 249-253
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 180 - 184, The code currently treats an empty C string as a provided system_prompt, so change the initialization and checks for effective_system to treat NULL and empty string equivalently: when assigning effective_system from system_prompt (used in the block around effective_system, system_prompt, model_type and VLMModelType::Qwen2VL) use a test like system_prompt == nullptr || system_prompt[0] == '\0' (or equivalent) and only inject the Qwen2-VL default ("You are a helpful assistant.") when system_prompt is null or empty; apply the same fix to the other occurrences referenced (the checks around the other effective_system assignments at the later locations you flagged).
774-808: ⚠️ Potential issue | 🟠 Major
Gate first-token full-vocab diagnostics behind an explicit diagnostics switch.
This block still runs an O(vocab) scan on every request and adds avoidable latency/log noise in production paths.
Suggested patch

    - // Diagnostic: on first token, inspect logits for NaN/corruption
    - if (i == 0) {
    + // Diagnostic: on first token, inspect logits for NaN/corruption
    +#ifdef RAC_VLM_ENABLE_DIAGNOSTICS
    + if (i == 0) {
          float* logits = llama_get_logits(backend->ctx);
          int n_vocab = llama_vocab_n_tokens(vocab);
          if (logits && n_vocab > 0) {
    @@
          }
      }
    +#endif

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 774 - 808, The O(vocab) first-token diagnostics (the entire if (i == 0) block that calls llama_get_logits, iterates n_vocab, computes NaN/Inf counts and top5, and emits RAC_LOG_DEBUG) must be gated behind an explicit diagnostics flag so it doesn't run in production; modify the code to check a boolean (e.g., backend->diagnostics_enabled or a global enable_v3_diag) before executing that block (and keep the existing logits/null and n_vocab checks inside the gated block), default the flag to false, and only log via RAC_LOG_DEBUG when the flag is true.
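A runtime-flag variant of this gating might look like the sketch below. It is an assumption-laden illustration: `LogitDiagnostics` and `InspectFirstTokenLogits` are hypothetical names, the logits are passed in as a plain array rather than fetched from llama.cpp, and the top-5 ranking from the original block is reduced to a single argmax.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical summary of a first-token logits scan.
struct LogitDiagnostics {
    bool ran = false;            // false when disabled or inputs were invalid
    std::size_t nan_count = 0;   // number of NaN logits
    std::size_t inf_count = 0;   // number of +/-Inf logits
    int argmax = -1;             // index of the largest finite logit, -1 if none
};

// Runs the O(n_vocab) scan only when diagnostics are explicitly enabled.
// The null and size checks stay inside the gated path.
inline LogitDiagnostics InspectFirstTokenLogits(const float* logits,
                                                int n_vocab,
                                                bool diagnostics_enabled) {
    LogitDiagnostics d;
    if (!diagnostics_enabled || logits == nullptr || n_vocab <= 0) {
        return d;  // production path: no scan, no logging
    }
    d.ran = true;
    for (int t = 0; t < n_vocab; ++t) {
        const float v = logits[t];
        if (std::isnan(v)) { ++d.nan_count; continue; }
        if (std::isinf(v)) { ++d.inf_count; continue; }
        if (d.argmax < 0 || v > logits[d.argmax]) {
            d.argmax = t;
        }
    }
    return d;
}
```

A compile-time `#ifdef`, as in the suggested patch, removes the code from release builds entirely; a runtime flag that defaults to false keeps it available for field debugging at the cost of one branch per request.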
🧹 Nitpick comments (1)
sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp (1)
631-640: Extract shared model-family/system-prompt resolution into one helper.
The sync and streaming paths duplicate the same mapping logic; centralizing it will reduce drift risk.
Also applies to: 891-900, 642-643, 902-903
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp` around lines 631 - 640, Extract the model-family -> VLMModelType resolution into a single helper (e.g., ResolveEffectiveModelType or getEffectiveModelType) that accepts the backend (to read backend->model_type) and the options pointer (to read options->model_family) and returns a VLMModelType; move the switch mapping (cases for RAC_VLM_MODEL_FAMILY_QWEN2_VL, SMOLVLM, LLAVA, default Generic) into that helper and default to backend->model_type when options is null or model_family is AUTO. Replace the duplicated blocks in rac_vlm_llamacpp.cpp (the sync and streaming codepaths that compute effective_model_type) to call this new helper wherever effective_model_type is computed (including the other occurrences called out near the existing switch usage).
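The shared helper proposed here could take roughly the following shape. All types below are hypothetical stand-ins: `ModelFamily` mirrors the role of the `RAC_VLM_MODEL_FAMILY_*` constants without claiming their real names or values, and `VlmBackend` / `VlmOptions` model only the two fields the mapping reads.

```cpp
// Hypothetical stand-ins for the backend's types.
enum class VLMModelType { Generic, Qwen2VL, SmolVLM, LLaVA };
enum class ModelFamily { Auto, Qwen2VL, SmolVLM, LLaVA, Other };

struct VlmBackend {
    VLMModelType model_type = VLMModelType::Generic;  // detected at load time
};

struct VlmOptions {
    ModelFamily model_family = ModelFamily::Auto;     // per-request override
};

// Single place for the family -> model-type mapping, so the sync and
// streaming paths cannot drift apart.
inline VLMModelType ResolveEffectiveModelType(const VlmBackend& backend,
                                              const VlmOptions* options) {
    // No options, or AUTO: fall back to what the backend detected.
    if (options == nullptr || options->model_family == ModelFamily::Auto) {
        return backend.model_type;
    }
    switch (options->model_family) {
        case ModelFamily::Qwen2VL: return VLMModelType::Qwen2VL;
        case ModelFamily::SmolVLM: return VLMModelType::SmolVLM;
        case ModelFamily::LLaVA:   return VLMModelType::LLaVA;
        default:                   return VLMModelType::Generic;
    }
}
```

Both call sites would then reduce to a single call such as `ResolveEffectiveModelType(*backend, options)`, with the null check on `options` living inside the helper rather than being repeated.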
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 180-184: The code currently treats an empty C string as a provided
system_prompt, so change the initialization and checks for effective_system to
treat NULL and empty string equivalently: when assigning effective_system from
system_prompt (used in the block around effective_system, system_prompt,
model_type and VLMModelType::Qwen2VL) use a test like system_prompt == nullptr
|| system_prompt[0] == '\0' (or equivalent) and only inject the Qwen2-VL default
("You are a helpful assistant.") when system_prompt is null or empty; apply the
same fix to the other occurrences referenced (the checks around the other
effective_system assignments at the later locations you flagged).
- Around line 774-808: The O(vocab) first-token diagnostics (the entire if (i ==
0) block that calls llama_get_logits, iterates n_vocab, computes NaN/Inf counts
and top5, and emits RAC_LOG_DEBUG) must be gated behind an explicit diagnostics
flag so it doesn't run in production; modify the code to check a boolean (e.g.,
backend->diagnostics_enabled or a global enable_v3_diag) before executing that
block (and keep the existing logits/null and n_vocab checks inside the gated
block), default the flag to false, and only log via RAC_LOG_DEBUG when the flag
is true.
---
Nitpick comments:
In `@sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp`:
- Around line 631-640: Extract the model-family -> VLMModelType resolution into
a single helper (e.g., ResolveEffectiveModelType or getEffectiveModelType) that
accepts the backend (to read backend->model_type) and the options pointer (to
read options->model_family) and returns a VLMModelType; move the switch mapping
(cases for RAC_VLM_MODEL_FAMILY_QWEN2_VL, SMOLVLM, LLAVA, default Generic) into
that helper and default to backend->model_type when options is null or
model_family is AUTO. Replace the duplicated blocks in rac_vlm_llamacpp.cpp (the
sync and streaming codepaths that compute effective_model_type) to call this new
helper wherever effective_model_type is computed (including the other
occurrences called out near the existing switch usage).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- sdk/runanywhere-commons/src/backends/llamacpp/rac_vlm_llamacpp.cpp
- sdk/runanywhere-web/packages/core/package.json
- sdk/runanywhere-web/packages/llamacpp/package.json
- sdk/runanywhere-web/packages/onnx/package.json
✅ Files skipped from review due to trivial changes (2)
- sdk/runanywhere-web/packages/llamacpp/package.json
- sdk/runanywhere-web/packages/core/package.json
Resolved conflicts in llama.cpp backend:
- VERSIONS: take main's b8201 (newer than b8179)
- CMakeLists.txt: deduplicate mtmd model source files
- llamacpp_backend.cpp: adopt main's simplified LoRA handling
Made-with: Cursor
- Updated the logic for determining the effective system prompt to handle empty strings.
- Introduced a new function to resolve the effective VLM model type based on options, simplifying the code in `rac_vlm_llamacpp_process`.
- Improved download progress tracking in `ModelDownloader` by using cumulative byte counts for better accuracy.
- Enhanced type exports in `llamacpp` and `onnx` packages for better compatibility and clarity.
- Adjusted regex for identifying Qwen VL models to ensure more accurate matching.
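- Updated imports in `speak.ts`, `transcribe.ts`, `vision.ts`, and `voice.ts` to use the new core package for audio and video functionalities.
- Introduced new `AudioCapture`, `AudioPlayback`, and `VideoCapture` classes to handle audio and video processing more efficiently.
- Added backend-agnostic types for STT, TTS, LLM, and VLM in the core types module.
- Implemented streaming capabilities for model downloads in `ModelDownloader`.
- Enhanced the overall structure for better modularity and maintainability.
Description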
Brief description of the changes made.
Type of Change
Testing
Platform-Specific Testing (check all that apply)
Swift SDK / iOS Sample:
Kotlin SDK / Android Sample:
Flutter SDK / Flutter Sample:
React Native SDK / React Native Sample:
Playground:
Web SDK / Web Sample:
Labels
Please add the appropriate label(s):
SDKs:
- Swift SDK - Changes to Swift SDK (sdk/runanywhere-swift)
- Kotlin SDK - Changes to Kotlin SDK (sdk/runanywhere-kotlin)
- Flutter SDK - Changes to Flutter SDK (sdk/runanywhere-flutter)
- React Native SDK - Changes to React Native SDK (sdk/runanywhere-react-native)
- Web SDK - Changes to Web SDK (sdk/runanywhere-web)
- Commons - Changes to shared native code (sdk/runanywhere-commons)
Sample Apps:
- iOS Sample - Changes to iOS example app (examples/ios)
- Android Sample - Changes to Android example app (examples/android)
- Flutter Sample - Changes to Flutter example app (examples/flutter)
- React Native Sample - Changes to React Native example app (examples/react-native)
- Web Sample - Changes to Web example app (examples/web)
Checklist
Screenshots
Attach relevant UI screenshots for changes (if applicable):
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Chores
Important
Refactor Web SDK to improve modularity and type safety by centralizing audio/video processing and types in a core package, and implement streaming model downloads.
- `speak.ts`, `transcribe.ts`, `vision.ts`, and `voice.ts` updated to use new core package for audio/video functionalities.
- `AudioCapture`, `AudioPlayback`, and `VideoCapture` classes for efficient audio/video processing.
- Streaming model downloads in `ModelDownloader`.
This description was created automatically for 3a8d798. You can customize this summary. It will automatically update as commits are pushed.
Greptile Summary
Refactors Web SDK architecture by consolidating backend-agnostic types (STT, TTS, VLM, VAD, LLM) into the core package, enabling cleaner separation between infrastructure and backend implementations.
Key improvements:
- [...] `llama_set_adapters_lora`)
- [...] (`AudioCapture`, `AudioPlayback`, `VideoCapture`) now properly exported from core
Architecture changes:
- [...] `@runanywhere/web` core: `VLMTypes.ts`, `STTTypes.ts`, `TTSTypes.ts`, `VADTypes.ts`, `LLMTypes.ts`
- [...] (`VLMModelFamily` enum in llamacpp, model file configurations in onnx)
Confidence Score: 4/5
- [...] `rac_vlm_llamacpp.cpp` for VLM runtime behavior with new sampler chain and CPU fallback logic
Important Files Changed
Flowchart
    %%{init: {'theme': 'neutral'}}%%
    flowchart TB
        Core["@runanywhere/web (core)<br/>- VLMTypes.ts<br/>- STTTypes.ts<br/>- TTSTypes.ts<br/>- LLMTypes.ts<br/>- AudioCapture<br/>- VideoCapture<br/>- ModelDownloader"]
        LlamaCpp["@runanywhere/web-llamacpp<br/>(Backend)"]
        ONNX["@runanywhere/web-onnx<br/>(Backend)"]
        LlamaCppTypes["VLMTypes.ts<br/>- Re-exports core types<br/>- Adds VLMModelFamily enum"]
        ONNXTypes["STTTypes.ts<br/>- Re-exports core types<br/>- Adds STTModelConfig"]
        Providers["Provider Interfaces<br/>(ProviderTypes.ts)<br/>- LLMProvider<br/>- STTProvider<br/>- TTSProvider"]
        Examples["Example Apps<br/>(speak.ts, vision.ts, etc.)"]
        Core --> |"Exports generic types"| LlamaCpp
        Core --> |"Exports generic types"| ONNX
        Core --> |"Type-safe interfaces"| Providers
        LlamaCpp --> |"Re-exports + backend enums"| LlamaCppTypes
        ONNX --> |"Re-exports + model configs"| ONNXTypes
        LlamaCppTypes --> |"Implements"| Providers
        ONNXTypes --> |"Implements"| Providers
        Core --> |"Imports audio/video"| Examples
        LlamaCppTypes --> |"Uses VLM types"| Examples
        ONNXTypes --> |"Uses STT/TTS types"| Examples
        style Core fill:#e1f5ff
        style LlamaCpp fill:#fff4e1
        style ONNX fill:#fff4e1
        style Providers fill:#e8f5e8

Last reviewed commit: 3a8d798