Add memory layer with vector similarity search and RAG pipeline #337
vedantagarwal-web wants to merge 2 commits into RunanywhereAI:main from
Conversation
Replace the full production RunAnywhere app with the clean kotlin-starter-example that matches the blog tutorial series. Uses io.github.sanchitmonga22 Maven Central SDK coordinates. Add kotlin-starter-app entry to Playground README.
Implements a full memory/RAG layer using hnswlib for approximate nearest neighbor search and a flat brute-force backend for small indices. The C++ core provides a vtable-based service abstraction, and the Swift SDK wraps it with a high-level remember/recall/forget API and a RAGAgent that composes memory retrieval with LLM generation.

C++ core:
- Flat backend (exact brute-force, <10K vectors)
- HNSW backend (hnswlib v0.8.0, approximate, scalable)
- Vtable dispatch, module/service registration
- Persistence (.racm format) with save/load
- Embedding extraction via llama.cpp
- 10/10 standalone tests passing

Swift SDK:
- CppBridge.Memory actor (C FFI bridge)
- RAGMemoryService (remember/recall/forget)
- RAGAgent (full RAG: retrieve + augment + generate)
- EmbeddingProvider protocol + LlamaCpp implementation
- MemoryConfiguration, MemoryModule, MemoryTypes

Performance: 0.10ms/query at 10K vectors (128-dim, HNSW)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough

This PR introduces a comprehensive memory/vector search subsystem across the RunAnywhere SDK, adds a new Kotlin Android starter app demonstrating LLM chat, speech-to-text, text-to-speech, and voice pipeline capabilities, and extends iOS test infrastructure with memory testing components.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (warning)
```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <vector>
```
Missing header include
std::priority_queue is used in this file but <queue> is never included, which will fail compilation on standard toolchains.
```suggestion
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <vector>
#include <queue>
```
Path: sdk/runanywhere-commons/src/backends/memory/memory_backend_hnswlib.cpp, line 12:22
```cpp
// Allocate output buffer (caller frees with rac_free)
float* result = static_cast<float*>(malloc(embeddings.size() * sizeof(float)));
if (!result) {
    return RAC_ERROR_OUT_OF_MEMORY;
}

memcpy(result, embeddings.data(), embeddings.size() * sizeof(float));
*out_embeddings = result;
*out_dimension = static_cast<uint32_t>(embeddings.size());
```
Allocator mismatch for embeddings
rac_llm_llamacpp_get_embeddings allocates the output buffer with malloc, but the API contract (and Swift caller) frees it via rac_free. If rac_free is ever swapped to a different allocator (as intended by the wrapper), this becomes a real crash/corruption risk. Allocate with rac_alloc (or at minimum ensure the free function matches the allocator used).
Path: sdk/runanywhere-commons/src/backends/llamacpp/rac_llm_llamacpp.cpp, line 282:290
```cpp
// Get dimension from stats
rac_memory_stats_t stats = {};
if (service->ops->get_stats) {
    service->ops->get_stats(service->impl, &stats);
}

return service->ops->add(service->impl, vectors, ids, metadata, count, stats.dimension);
```
Dimension can be zero
rac_memory_add (and similarly rac_memory_search) derives dimension from get_stats but ignores the return value; if get_stats fails or isn't implemented correctly, stats.dimension stays 0 and every backend will return RAC_ERROR_MEMORY_DIMENSION_MISMATCH even though the handle is valid. This makes the public API brittle (operations can start failing after an internal stats error). At minimum, check get_stats’s result and treat failure as an error instead of passing dimension=0 into the backend.
Path: sdk/runanywhere-commons/src/features/memory/rac_memory_service.cpp, line 72:78
```swift
guard result == RAC_SUCCESS, let emb = embeddings else {
    throw SDKError.memory(.processingFailed,
                          "Failed to extract embeddings: \(result)")
}

defer { rac_free(emb) }

return Array(UnsafeBufferPointer(start: emb, count: Int(outDim)))
```
Embedding dimension not validated
LlamaCppEmbeddingProvider stores an expected dimension, but embed() returns outDim without checking it matches _dimension. If the loaded model’s embedding size differs (or changes after reload), the Swift memory bridge will try to add/search with mismatched dimensions and fail at runtime with RAC_ERROR_MEMORY_DIMENSION_MISMATCH. Validate outDim == _dimension and surface a clear error when it doesn’t match.
Path: sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/EmbeddingProvider.swift, line 60:67
```swift
/// Add vectors with IDs and optional metadata
public func add(vectors: [[Float]], ids: [UInt64], metadata: [String?]? = nil) throws {
    let handle = try getHandle()
    let count = UInt32(vectors.count)
    guard count > 0 else { return }

    // Flatten vectors into contiguous array
    let flat = vectors.flatMap { $0 }
```
Vectors/IDs length mismatch
add(vectors:ids:metadata:) computes count from vectors.count but never verifies ids.count == vectors.count (and same for metadata?.count). If these arrays differ, the bridge will read past the end of ids/metadata buffers when calling rac_memory_add, causing memory-safety issues (crash/corruption). Add explicit count checks before flattening/calling into C.
Path: sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge+Memory.swift, line 68:76
Actionable comments posted: 17
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In
`@Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/services/ModelService.kt`:
- Around line 288-299: unloadAllModels currently wraps
RunAnywhere.unloadLLMModel, unloadSTTModel, and unloadTTSVoice in one try/catch
so a thrown exception stops subsequent unloads and skips refreshModelState;
change it so each unload call (RunAnywhere.unloadLLMModel(),
RunAnywhere.unloadSTTModel(), RunAnywhere.unloadTTSVoice()) is executed in its
own try/catch (or use runCatching) to log/collect individual errors without
aborting the rest, and ensure refreshModelState() is invoked after all attempts
(e.g., in a finally or after all runCatching blocks) so the UI reflects the
final state.
In
`@Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/SpeechToTextScreen.kt`:
- Around line 80-114: The background read Thread started in the recording method
isn’t tracked or joined, so stopRecording() sets isRecording=false and
immediately releases audioRecord and returns audioData while the thread may
still be in audioRecord?.read(...); fix by storing the Thread reference (e.g.,
readingThread) when starting it, then in stopRecording() after setting
isRecording = false call readingThread.join(<short timeout>) and handle
InterruptedException before stopping/releasing audioRecord and returning
audioData to ensure the reader thread has finished accessing audioRecord and
audioData.
In
`@Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/TextToSpeechScreen.kt`:
- Around line 85-111: The AudioTrack created with AudioTrack.Builder (assigned
to audioTrack) can leak if the coroutine is cancelled during the delay: ensure
the play/cleanup sequence is wrapped in a try/finally so stop() and release()
always run; specifically move audioTrack.write(...) and audioTrack.play() into a
try block and call audioTrack.stop() and audioTrack.release() in the finally
block, and apply the same change to the playWavAudio implementation in
VoicePipelineScreen.kt to guarantee cleanup on CancellationException.
In
`@Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/VoicePipelineScreen.kt`:
- Around line 326-335: When handling VoiceSessionEvent.TurnCompleted in the Flow
collector, don't call the suspending playWavAudio(audio) inline because it
blocks collection; instead launch playback in a separate coroutine (e.g.,
CoroutineScope(Dispatchers.IO).launch or viewModelScope.launch) so the collector
can continue receiving events. Set sessionState = VoiceSessionState.SPEAKING
before launching the coroutine, call playWavAudio(audio) inside that coroutine,
and after playback update sessionState = VoiceSessionState.LISTENING and
audioLevel = 0f inside the coroutine to restore UI state without blocking the
Flow collector. Ensure you reference the same symbols:
VoiceSessionEvent.TurnCompleted, playWavAudio, sessionState, and audioLevel.
In `@sdk/runanywhere-commons/include/rac/core/rac_component_types.h`:
- Line 38: Add a switch case for RAC_COMPONENT_MEMORY in component_types.cpp
inside rac_sdk_component_display_name, rac_sdk_component_raw_value, and
rac_component_to_resource_type: return the human label "Memory" from
rac_sdk_component_display_name, the raw identifier "memory" from
rac_sdk_component_raw_value, and map to the same resource-type enum/constant
used for memory in other components (do not return -1)—use the existing
resource-type symbol (e.g., RESOURCE_TYPE_MEMORY or the project’s equivalent) to
keep the pattern consistent with other cases.
In `@sdk/runanywhere-commons/include/rac/core/rac_error.h`:
- Around line 386-395: Add switch cases for the five new memory error macros
inside the rac_error_message() switch (the "OTHER ERRORS (-800 to -899)"
section): handle RAC_ERROR_MEMORY_INDEX_FULL,
RAC_ERROR_MEMORY_DIMENSION_MISMATCH, RAC_ERROR_MEMORY_INDEX_NOT_FOUND,
RAC_ERROR_MEMORY_INVALID_CONFIG, and RAC_ERROR_MEMORY_CORRUPT_INDEX by returning
descriptive string literals (e.g., "Memory index is full (max_elements
reached)", "Vector dimension does not match index dimension", "Memory index not
found at specified path", "Invalid memory index configuration", "Memory index
file is corrupt") so these codes no longer fall through to "Unknown error code".
In `@sdk/runanywhere-commons/src/backends/memory/CMakeLists.txt`:
- Around line 1-46: The CMake file unconditionally fetches hnswlib and builds
the memory backend which violates the RAC_BUILD_* gating guideline; add an
option named RAC_BUILD_MEMORY and early-return if it's OFF, then wrap the
FetchContent_Declare(hnswlib) and the creation of target rac_backend_memory (and
its
target_include_directories/target_link_libraries/target_compile_features/platform-specific
blocks) so they only run when RAC_BUILD_MEMORY is ON; ensure the option text
matches "Build the memory/vector search backend" and that all references to
hnswlib_SOURCE_DIR and target rac_backend_memory are only configured inside that
guarded block.
In `@sdk/runanywhere-commons/src/backends/memory/memory_backend_flat.cpp`:
- Around line 281-296: The save routine currently writes metadata with
fprintf("%llu\t%s\n") which breaks when metadata contains newlines; change the
format to a binary length-prefixed write: for each id in index->ids iterate the
metadata via index->metadata and write id (same type) then a uint32_t length
followed by the raw metadata bytes (use fwrite) instead of the tab/newline text
format, and update the corresponding load code that currently uses fgets (around
the fgets call referenced) to read the id, then read the uint32_t length and
then fread exactly that many bytes to reconstruct meta_it->second (unlike
line-based parsing). Ensure the length type and byte-ordering match the existing
binary header handling for vectors/IDs.
- Around line 316-346: In flat_load, several fread calls (reading version,
index_type, dimension, metric, num_vectors and the bulk reads into
index->vectors.data() and index->ids.data()) are unchecked and can leave the
index in a corrupt state; update flat_load to verify each fread returns the
expected count and on any short read fclose(f) and return
RAC_ERROR_MEMORY_CORRUPT_INDEX, ensuring you validate reads for
sizeof(uint32_t)/sizeof(uint64_t) for the header fields and for (num_vectors *
dimension) floats and num_vectors uint64_t for the bulk reads into
index->vectors.data() and index->ids.data(), and only resize/populate
index->id_to_index and index fields after successful reads.
In `@sdk/runanywhere-commons/src/backends/memory/memory_backend_hnswlib.cpp`:
- Around line 394-401: RAC_DISTANCE_COSINE is currently mapped to
hnswlib::L2Space (in rac_memory_hnsw_create and hnsw_load) which yields
incorrect results for non-normalized vectors; change the mapping so
RAC_DISTANCE_COSINE uses hnswlib::InnerProductSpace and then normalize vectors
to unit length on insertion and before searching: update rac_memory_hnsw_create
and hnsw_load to construct InnerProductSpace when config->metric ==
RAC_DISTANCE_COSINE, and modify hnsw_add and hnsw_search to perform L2
normalization of the input vectors (unit-length) before passing them to hnswlib
so cosine similarity is computed correctly via inner product.
- Around line 262-298: The header fields read after the magic (version,
index_type, dimension, metric, num_vectors, ef_search) are not validated, so
truncated files can leave those locals uninitialized and later crash when used
(e.g., creating hnswlib::L2Space(dimension) or HierarchicalNSW). Fix it by
checking the return value of each fread (or fread into a buffer and validate
total bytes read) and if any read returns fewer items than expected, fclose(f)
and return RAC_ERROR_MEMORY_CORRUPT_INDEX; ensure the checks cover version,
index_type, dimension, metric, num_vectors and ef_search before using them. Also
keep the existing index_type check (index_type != RAC_INDEX_HNSW) after these
validated reads.
In `@sdk/runanywhere-commons/src/features/memory/rac_memory_service.cpp`:
- Around line 128-169: In rac_memory_load: validate the return values of the two
fread calls that read version and index_type and return
RAC_ERROR_MEMORY_CORRUPT_INDEX on failure to avoid using uninitialized locals;
after creating the service (rac_memory_backend_create_service) check that
service->ops is non-null and that service->ops->load is set before calling it,
returning an appropriate error (and cleaning up via rac_memory_destroy(handle)
if needed) — mirror the existing dispatch-pattern used elsewhere in this file
(check service->ops and service->ops->load, then call load(service->impl,
path)).
In `@sdk/runanywhere-commons/tests/CMakeLists.txt`:
- Around line 1-4: Wrap the test target creation for test_memory in an
RAC_BACKEND_MEMORY guard so it is only added when the memory backend is enabled:
conditionally add the add_executable(test_memory ...),
target_link_libraries(test_memory PRIVATE rac_commons rac_backend_memory) and
target_include_directories(test_memory ...) inside an if(RAC_BACKEND_MEMORY) ...
endif() block to avoid linking against rac_backend_memory when the backend is
disabled.
In `@sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/MemoryTypes.swift`:
- Around line 30-45: MemorySearchResult and MemoryRecallResult incorrectly
declare Sendable while using metadata: [String: Any]?; change the metadata type
to a Sendable-safe form (e.g., metadata: [String: any Sendable]?) and update the
corresponding init parameters and stored property types in both structs
(MemorySearchResult and MemoryRecallResult) so the structs can legitimately
conform to Sendable under Swift 6.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/RAGMemoryService.swift`:
- Line 33: The nextId counter resets because nextId (declared in
RAGMemoryService) isn't persisted or recomputed on load, causing ID collisions
when remember() generates new entries after load() or restart; fix by persisting
nextId with the index (e.g., metadata sidecar or dedicated field) during save()
and restoring it in load(), or alternatively implement logic in load() to scan
existing memory IDs (or query backend stats) to compute nextId = maxExistingId +
1 before returning, ensuring remember() uses the updated nextId.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge+Memory.swift`:
- Around line 78-93: Remove the dead preallocation for metadataPointers and
metadataBuffers in CppBridge+Memory.swift: the variables metadataPointers and
metadataBuffers (and the block that builds ptrs/Data from metadata) are never
used because the code later uses strdup-based allocation; delete that entire
if-let metadata block so only the strdup approach remains (search for
metadataPointers, metadataBuffers, and the for meta in metadata loop to locate
the code to remove).
- Around line 69-73: In add(vectors: [[Float]], ids: [UInt64], metadata:
[String?]? = nil) validate that ids.count == vectors.count and if metadata !=
nil then metadata!.count == vectors.count (and not empty for non-zero vectors)
before calling rac_memory_add; if counts mismatch throw an error/return early.
Update the metadata handling so you do not force-unwrap metaBuf.baseAddress!
when metadata is non-nil — only create and pass a metadata buffer when
metadata.count == vectors.count, otherwise pass nil to the C call; reference the
add(...) method, the rac_memory_add invocation, and the metaBuf.baseAddress
usage when making these changes.
🟡 Minor comments (16)
Playground/kotlin-starter-app/.gitignore (1)

82-84: ⚠️ Potential issue | 🟡 Minor

`**/models/` is overly broad and may silently ignore source files. Line 83 already covers the intended `app/src/main/assets/models/` path for downloaded AI models. The `**/models/` glob on line 84 would also match any Kotlin package directory named `models/` (e.g., `com/runanywhere/.../models/`), causing source files to be silently untracked. Consider removing the broad pattern and keeping only the specific path:

Proposed fix:

```diff
 # RunAnywhere downloaded models
 app/src/main/assets/models/
-**/models/
```

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/components/FeatureCard.kt (1)
55-56: ⚠️ Potential issue | 🟡 Minor

Empty `gradientColors` list will crash at runtime. `Brush.linearGradient(gradientColors)` throws `IllegalArgumentException` if fewer than 2 colors are provided. Consider adding a `require(gradientColors.size >= 2)` precondition or documenting the constraint.

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/TextToSpeechScreen.kt (1)
48-49: ⚠️ Potential issue | 🟡 Minor

`audioFormat` is parsed but never validated. Line 49 reads the audio format code (1 = PCM) but the value is never checked. Non-PCM WAV files (e.g., compressed formats) would be silently misinterpreted as raw PCM, producing garbled audio. A guard like `if (audioFormat != 1.toShort().toInt()) return@withContext` would make this defensive.

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/SpeechToTextScreen.kt (1)
if (audioFormat != 1.toShort().toInt()) return@withContextwould make this defensive.Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/SpeechToTextScreen.kt-94-98 (1)
94-98:⚠️ Potential issue | 🟡 MinorSwallowed
SecurityExceptionloses diagnostic context.When
AudioRecordcreation throws aSecurityException, the function returnsfalsebut the exception message is silently discarded. This makes it difficult to diagnose permission-related failures on specific devices.🔧 Proposed fix: log before returning
} catch (e: SecurityException) { + android.util.Log.w("AudioRecorder", "Permission denied for audio recording", e) audioRecord?.release() audioRecord = null return false }Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/HomeScreen.kt-64-109 (1)
64-109: ⚠️ Potential issue | 🟡 Minor

Nested scrollable: `LazyVerticalGrid` inside a `verticalScroll` `Column` is fragile. `LazyVerticalGrid` is a scrollable container placed inside a `verticalScroll` `Column` (line 48). This forces a fixed `.height(700.dp)` (line 68), which defeats lazy loading and breaks on varying screen sizes (clips on small, wastes space on large). Since you only have 4 fixed items in a 2-column grid, replace `LazyVerticalGrid` with a simple `Column` + `Row` layout to avoid nested scrollable issues entirely.

💡 Suggested approach: replace with static grid layout

```diff
-LazyVerticalGrid(
-    columns = GridCells.Fixed(2),
-    horizontalArrangement = Arrangement.spacedBy(16.dp),
-    verticalArrangement = Arrangement.spacedBy(16.dp),
-    modifier = Modifier.height(700.dp)
-) {
-    item {
-        FeatureCard(
-            title = "Chat",
-            subtitle = "LLM Text Generation",
-            icon = Icons.Rounded.Chat,
-            gradientColors = listOf(AccentCyan, Color(0xFF0EA5E9)),
-            onClick = onNavigateToChat
-        )
-    }
-    item {
-        FeatureCard(
-            title = "Speech",
-            subtitle = "Speech to Text",
-            icon = Icons.Rounded.Mic,
-            gradientColors = listOf(AccentViolet, Color(0xFF7C3AED)),
-            onClick = onNavigateToSTT
-        )
-    }
-    item {
-        FeatureCard(
-            title = "Voice",
-            subtitle = "Text to Speech",
-            icon = Icons.Rounded.VolumeUp,
-            gradientColors = listOf(AccentPink, Color(0xFFDB2777)),
-            onClick = onNavigateToTTS
-        )
-    }
-    item {
-        FeatureCard(
-            title = "Pipeline",
-            subtitle = "Voice Agent",
-            icon = Icons.Rounded.AutoAwesome,
-            gradientColors = listOf(AccentGreen, Color(0xFF059669)),
-            onClick = onNavigateToVoicePipeline
-        )
-    }
-}
+Row(horizontalArrangement = Arrangement.spacedBy(16.dp)) {
+    FeatureCard(
+        title = "Chat",
+        subtitle = "LLM Text Generation",
+        icon = Icons.Rounded.Chat,
+        gradientColors = listOf(AccentCyan, Color(0xFF0EA5E9)),
+        onClick = onNavigateToChat,
+        modifier = Modifier.weight(1f)
+    )
+    FeatureCard(
+        title = "Speech",
+        subtitle = "Speech to Text",
+        icon = Icons.Rounded.Mic,
+        gradientColors = listOf(AccentViolet, Color(0xFF7C3AED)),
+        onClick = onNavigateToSTT,
+        modifier = Modifier.weight(1f)
+    )
+}
+Spacer(modifier = Modifier.height(16.dp))
+Row(horizontalArrangement = Arrangement.spacedBy(16.dp)) {
+    FeatureCard(
+        title = "Voice",
+        subtitle = "Text to Speech",
+        icon = Icons.Rounded.VolumeUp,
+        gradientColors = listOf(AccentPink, Color(0xFFDB2777)),
+        onClick = onNavigateToTTS,
+        modifier = Modifier.weight(1f)
+    )
+    FeatureCard(
+        title = "Pipeline",
+        subtitle = "Voice Agent",
+        icon = Icons.Rounded.AutoAwesome,
+        gradientColors = listOf(AccentGreen, Color(0xFF059669)),
+        onClick = onNavigateToVoicePipeline,
+        modifier = Modifier.weight(1f)
+    )
+}
```

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/services/ModelService.kt (1)
277-283: ⚠️ Potential issue | 🟡 Minor

`downloadAndLoadAllModels` silently clears errors from earlier model failures. Each `downloadAndLoad*Suspend()` starts with `errorMessage = null` (e.g., lines 147, 193, 239). When called sequentially here, a failure in the LLM step sets `errorMessage`, but the subsequent STT step immediately clears it. The user loses visibility into the first failure. Consider collecting errors rather than overwriting, or short-circuiting on failure:

Option A: stop on first failure

```diff
 fun downloadAndLoadAllModels() {
     viewModelScope.launch {
-        if (!isLLMLoaded) downloadAndLoadLLMSuspend()
-        if (!isSTTLoaded) downloadAndLoadSTTSuspend()
-        if (!isTTSLoaded) downloadAndLoadTTSSuspend()
+        if (!isLLMLoaded) downloadAndLoadLLMSuspend()
+        if (errorMessage != null) return@launch
+        if (!isSTTLoaded) downloadAndLoadSTTSuspend()
+        if (errorMessage != null) return@launch
+        if (!isTTSLoaded) downloadAndLoadTTSSuspend()
     }
 }
```

sdk/runanywhere-commons/include/rac/core/rac_error.h (1)
383-398: ⚠️ Potential issue | 🟡 Minor

Error codes look well-placed; update the top-level range summary to match. The new -850 to -869 memory error sub-range is correctly carved out of the Other Errors block, but the range table at line 47 still reads "Other errors: -800 to -899". Add a Memory entry to keep the summary consistent with the actual layout.

Suggested range table update (around line 47):

```diff
 // - Event errors: -700 to -799
-// - Other errors: -800 to -899
+// - Other errors: -800 to -849, -870 to -899
+// - Memory/Vector Search: -850 to -869
 // - Reserved: -900 to -999
```

sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/TextChunker.swift (1)
39-56: ⚠️ Potential issue | 🟡 Minor

Chunks can silently exceed `maxCharacters` in two scenarios.

- A single sentence longer than `maxCharacters` is added verbatim at line 41 with no character-level fallback (contrary to the doc at line 19).
- After overlap insertion (line 52), the new `currentChunk` is `overlapTail + " " + sentence`, which is never re-checked against `maxCharacters`, so the next append at line 46 emits an oversized chunk.

Both paths can produce chunks significantly larger than the requested limit, which may cause downstream embedding models to silently truncate or error.

Proposed fix sketch: re-check after overlap and add character-level fallback

```diff
 } else {
     // Current chunk is full
     chunks.append(currentChunk)
     // Start new chunk with overlap from end of previous
     if overlap > 0 && currentChunk.count > overlap {
         let overlapStart = currentChunk.index(currentChunk.endIndex, offsetBy: -overlap)
-        currentChunk = String(currentChunk[overlapStart...]) + " " + sentence
+        let overlapText = String(currentChunk[overlapStart...])
+        if overlapText.count + 1 + sentence.count <= maxCharacters {
+            currentChunk = overlapText + " " + sentence
+        } else {
+            currentChunk = sentence
+        }
     } else {
         currentChunk = sentence
     }
+
+    // If a single sentence exceeds maxCharacters, split by characters
+    while currentChunk.count > maxCharacters {
+        let splitIdx = currentChunk.index(currentChunk.startIndex,
+                                          offsetBy: maxCharacters)
+        chunks.append(String(currentChunk[..<splitIdx]))
+        currentChunk = String(currentChunk[splitIdx...])
+    }
 }
```

sdk/runanywhere-commons/include/rac/features/memory/rac_memory.h (1)
11-12: ⚠️ Potential issue | 🟡 Minor

Use relative includes to match the Swift bridge implementation and avoid unnecessary path verbosity. This header is in `sdk/runanywhere-commons/include/rac/features/memory/` alongside `rac_memory_service.h` and `rac_memory_types.h`. Use relative includes instead of full paths:

```c
#include "rac_memory_service.h"
#include "rac_memory_types.h"
```

This matches the pattern used in the Swift bridge version (`sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_memory.h`) and follows the standard practice for umbrella headers in the same directory.

sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/EmbeddingProvider.swift (1)
48-68: ⚠️ Potential issue | 🟡 Minor

Missing validation: returned embedding dimension vs. expected `_dimension`. `outDim` from the C API may not match the configured `_dimension`. If the model produces embeddings of a different size, this silently returns a mismatched vector that will fail downstream at `rac_memory_add` with a cryptic `DIMENSION_MISMATCH` error. Adding an early check here gives a much clearer error to the caller.

Proposed fix:

```diff
 defer { rac_free(emb) }

+guard outDim == _dimension else {
+    throw SDKError.memory(.processingFailed,
+        "Embedding dimension mismatch: expected \(_dimension), got \(outDim)")
+}
+
 return Array(UnsafeBufferPointer(start: emb, count: Int(outDim)))
```

sdk/runanywhere-commons/tests/test_memory.cpp (1)
494-509: ⚠️ Potential issue | 🟡 Minor

Benchmark silently ignores search failures. Line 498 calls `rac_memory_search` without checking the return value. If the search silently fails (e.g., returns an error), the benchmark will still report 0 ms/query and `PASS`.

Proposed fix:

```diff
 for (int q = 0; q < 100; q++) {
     rac_memory_search_results_t results = {};
-    rac_memory_search(handle, &vecs[q * D], 10, &results);
+    rac_result_t r = rac_memory_search(handle, &vecs[q * D], 10, &results);
+    if (r != RAC_SUCCESS) {
+        rac_memory_destroy(handle);
+        FAIL("search failed during benchmark");
+    }
     rac_memory_search_results_free(&results);
 }
```

sdk/runanywhere-commons/src/backends/memory/memory_backend_hnswlib.cpp (1)
126-136: ⚠️ Potential issue | 🟡 Minor

`setEf` under `shared_lock` is a data race. `index->hnsw->setEf(...)` on line 128 mutates the internal `ef_` member of the hnswlib index, but the lock held is a `shared_lock` (read lock). Concurrent `hnsw_search` calls will race on this write. Although the value is always the same here, this is UB per the C++ memory model. Move `setEf` to index creation and `hnsw_load` instead (where it's set once under exclusive lock), and remove it from the search path.

sdk/runanywhere-commons/src/backends/memory/memory_backend_hnswlib.cpp (1)
283-298:⚠️ Potential issue | 🟡 MinorFixed 65 KB stack buffer for metadata lines may silently truncate large metadata entries.
char line[65536]on line 285 means any metadata JSON longer than ~65 KB is truncated byfgets. This is unlikely for typical use but creates a silent data corruption path. Also, 64 KB on the stack is large for some embedded targets.sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/RAGAgent.swift-110-125 (1)
110-125: ⚠️ Potential issue | 🟡 Minor — Potential resource leak: `rac_llm_result_free` not called on the error path.

If `rac_llm_generate` returns `RAC_SUCCESS` but `result.text` is `nil` (or if the C layer partially allocates on failure), `rac_llm_result_free` is never called. Calling it unconditionally after extracting the text is safer, since the free function should handle zero/nil members gracefully.

Proposed fix

```diff
 guard genResult == RAC_SUCCESS, let text = result.text else {
+    rac_llm_result_free(&result)
     throw SDKError.llm(.generationFailed, "LLM generation failed: \(genResult)")
 }
 let answer = String(cString: text)
 rac_llm_result_free(&result)
```

sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_memory_types.h (1)
92-101: ⚠️ Potential issue | 🟡 Minor — Document `score` semantics for `RAC_DISTANCE_INNER_PRODUCT`.

Line 96 only documents L2/cosine behavior ("lower is closer"). For inner product, higher scores indicate greater similarity, and the polarity is reversed. Callers interpreting scores without knowing which metric is in use could sort or threshold incorrectly.

📝 Proposed documentation fix

```diff
     /** Vector ID */
     uint64_t id;
-    /** Distance/similarity score (lower is closer for L2/cosine) */
+    /** Distance/similarity score.
+     *  For L2/cosine: lower is more similar.
+     *  For inner product: higher is more similar. */
     float score;
```

sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_memory_types.h (1)
106-118: ⚠️ Potential issue | 🟡 Minor — Clarify sort order for inner product results.

Line 107 documents ascending sort for L2/cosine but doesn't specify the order for `RAC_DISTANCE_INNER_PRODUCT`, where "best" results have the highest score. Callers need to know whether results are still ascending (least similar first) or descending (most similar first) when using inner product.

📝 Proposed documentation fix

```diff
-    /** Array of results sorted by score (ascending for L2/cosine) */
+    /** Array of results sorted by relevance (best match first).
+     *  For L2/cosine: ascending score (lowest distance first).
+     *  For inner product: descending score (highest similarity first). */
     rac_memory_result_t* results;
```
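The metric-dependent sort order the documentation fixes describe can be shown in a stand-alone sketch. The `Result` struct, `Metric` enum, and `sort_best_first` helper below are illustrative stand-ins, not the SDK's actual C types:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical mirror of a search result for illustration only.
struct Result {
    uint64_t id;
    float score;
};

enum class Metric { L2, Cosine, InnerProduct };

// Sort results "best match first": ascending for distance metrics,
// descending for inner product similarity.
inline void sort_best_first(std::vector<Result>& rs, Metric m) {
    if (m == Metric::InnerProduct) {
        std::sort(rs.begin(), rs.end(),
                  [](const Result& a, const Result& b) { return a.score > b.score; });
    } else {
        std::sort(rs.begin(), rs.end(),
                  [](const Result& a, const Result& b) { return a.score < b.score; });
    }
}
```

A caller that sorted ascending unconditionally would rank the *least* similar vector first under inner product, which is exactly the confusion the doc comment should prevent.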
🧹 Nitpick comments (33)
Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/components/FeatureCard.kt (1)
30-34: Prefer the `Card(onClick = ...)` overload for built-in click semantics.

Using the non-clickable `Card` + `Modifier.clickable` works, but the `Card(onClick = onClick)` overload provides proper semantic role, state handling, and a consistent interaction indication out of the box.

♻️ Suggested refactor

```diff
 Card(
-    modifier = modifier
-        .fillMaxWidth()
-        .aspectRatio(0.85f)
-        .clickable(onClick = onClick),
+    onClick = onClick,
+    modifier = modifier
+        .fillMaxWidth()
+        .aspectRatio(0.85f),
     shape = RoundedCornerShape(20.dp),
```

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/components/ModelLoaderWidget.kt (1)
76-102: Both progress indicators can be visible simultaneously if `isDownloading` and `isLoading` are both true.

If the caller ever sets both flags to `true`, two stacked progress bars will appear. Consider whether these states should be mutually exclusive or if the component should enforce precedence (e.g., show only the downloading indicator when both are true).

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/TextToSpeechScreen.kt (1)

41-112: Extract `playWavAudio` to a shared utility — implementations have minor differences to reconcile.

This function exists in both `TextToSpeechScreen.kt` and `VoicePipelineScreen.kt`. While the core AudioTrack playback logic is identical, the implementations differ in WAV header parsing: TextToSpeechScreen uses `buffer.position(20)` without RIFF validation, whereas VoicePipelineScreen uses `buffer.position(22)` with RIFF magic byte validation and a raw PCM fallback. Consider extracting to a shared utility (e.g., `AudioPlaybackUtils.kt`) while deciding which parsing approach better suits your needs.

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/VoicePipelineScreen.kt (2)
145-221: `playWavAudio()` is duplicated from `TextToSpeechScreen.kt`.

This function is nearly identical to the one in `TextToSpeechScreen.kt` (Lines 40-111 in that file). Extract it into a shared utility (e.g., a `utils/AudioPlayer.kt`) to avoid maintaining the same WAV-parsing and `AudioTrack` playback logic in two places.

346-353: Swallowing `CancellationException` breaks cooperative cancellation.

While the comment says "Expected when stopping," swallowing `CancellationException` in coroutine code prevents proper cancellation propagation. The idiomatic Kotlin coroutine pattern is to re-throw it.

♻️ Proposed fix

```diff
 } catch (e: CancellationException) {
-    // Expected when stopping
+    throw e // Maintain cooperative cancellation
 } catch (e: Exception) {
```

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/ChatScreen.kt (1)
195-197: Redundant fully qualified call — `chat` is already imported.

Line 59 imports `com.runanywhere.sdk.public.extensions.chat`. The FQN on Line 196 can be simplified.

♻️ Suggested fix

```diff
-    val response = com.runanywhere.sdk.public.RunAnywhere.chat(userMessage)
+    val response = RunAnywhere.chat(userMessage)
```

Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/services/ModelService.kt (1)
139-180: Consider extracting a generic download-and-load helper to reduce duplication.

The LLM, STT, and TTS flows (lines 139–180, 185–226, 231–272) share identical structure. In a playground app this is understandable for readability, but the boilerplate still invites copy-paste divergence over time. A small generic helper parameterized by model ID, state setters, and the load function would keep it maintainable.
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (2)
574-587: Creating a temporary `llama_context` per embedding call is expensive — consider caching it.

Each call to `get_embeddings` allocates and tears down a full `llama_context` with embeddings enabled (Lines 575-587, 650-651). For RAG workflows where `remember()` is called in a loop (e.g., ingesting chunked documents), this repeated allocation/deallocation is a significant bottleneck — context creation involves memory allocation and potentially GPU resource setup.

Consider lazily creating a persistent embedding context (`emb_ctx_`) alongside the generation context, reusing it across calls, and tearing it down in `unload_model_internal()`.

Sketch of caching approach

```diff
 // In the class header (llamacpp_backend.h), add:
+llama_context* emb_ctx_ = nullptr;
+void ensure_embedding_context();

 // In get_embeddings(), replace temp context creation with:
-llama_context* emb_ctx = llama_init_from_model(model_, ctx_params);
-if (!emb_ctx) { ... }
+ensure_embedding_context();
+if (!emb_ctx_) {
+    LOGE("Failed to create embedding context");
+    return {};
+}

 // Remove the per-call llama_free(emb_ctx) at cleanup;
 // instead, free emb_ctx_ in unload_model_internal():
+if (emb_ctx_) {
+    llama_free(emb_ctx_);
+    emb_ctx_ = nullptr;
+}
```
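The lazy-create/reuse/teardown pattern in the sketch above generalizes to any expensive resource. A self-contained illustration, with a stand-in `FakeContext` instead of a real `llama_context` (all names below are hypothetical, chosen for the demo only):

```cpp
// Stand-in for an expensive resource such as a llama_context.
struct FakeContext { int id; };

inline int g_creations = 0;  // counts how many times the resource was built

struct EmbeddingBackend {
    FakeContext* emb_ctx_ = nullptr;

    // Lazily create the context on first use, then reuse it on later calls.
    FakeContext* ensure_embedding_context() {
        if (!emb_ctx_) {
            emb_ctx_ = new FakeContext{++g_creations};
        }
        return emb_ctx_;
    }

    // Tear down exactly once, mirroring unload_model_internal().
    void unload() {
        delete emb_ctx_;
        emb_ctx_ = nullptr;
    }
};
```

Repeated calls to `ensure_embedding_context()` pay the construction cost once; the per-call pattern in the current code would pay it on every invocation.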
604-618: Batch is allocated for full context size — could be sized to actual token count.

Line 605 allocates `llama_batch_init(n_ctx, 0, 1)` using the full context window. For embedding extraction, where inputs are typically short, sizing the batch to `tokens.size()` would reduce memory waste.

Minimal batch allocation

```diff
-llama_batch batch = llama_batch_init(n_ctx, 0, 1);
+llama_batch batch = llama_batch_init((int)tokens.size(), 0, 1);
```

sdk/runanywhere-commons/src/backends/memory/memory_backend_hnswlib.h (1)
1-7: Consider adding lifecycle/ownership documentation.

The Doxygen file header describes what the backend does but doesn't mention that the caller owns the `out_handle` from `rac_memory_hnsw_create` and must destroy it via the vtable's destroy op. Given the coding guideline about documenting lifecycle requirements, a brief note on the `rac_memory_hnsw_create` doc would be helpful. As per coding guidelines, public C API headers must document vtable operations, error codes, and lifecycle requirements.

sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/TextChunker.swift (2)
26-30: No validation on the `maxCharacters` and `overlap` relationship.

If `overlap >= maxCharacters` or `maxCharacters <= 0`, the chunking loop can produce degenerate output (infinite growth or empty chunks). A precondition or early clamp would make this safer.

Proposed guard

```diff
 public static func chunk(
     _ text: String,
     maxCharacters: Int,
     overlap: Int = 50
 ) -> [String] {
+    precondition(maxCharacters > 0, "maxCharacters must be positive")
+    let overlap = min(overlap, maxCharacters / 2)
     guard !text.isEmpty else { return [] }
     guard text.count > maxCharacters else { return [text] }
```
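To see why the clamp keeps the loop advancing, here is a stand-alone character-window chunker mirroring the clamped logic in C++ — a sketch only, since the real `TextChunker` operates on Swift strings:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Character-window chunker with the overlap clamp discussed above.
// Clamping overlap to maxCharacters/2 guarantees the step
// (maxCharacters - overlap) is at least maxCharacters/2 > 0,
// so the loop always terminates.
inline std::vector<std::string> chunk(const std::string& text,
                                      std::size_t maxCharacters,
                                      std::size_t overlap = 50) {
    std::vector<std::string> out;
    if (maxCharacters == 0 || text.empty()) return out;
    overlap = std::min(overlap, maxCharacters / 2);  // the clamp
    if (text.size() <= maxCharacters) return {text};
    std::size_t start = 0;
    while (start < text.size()) {
        out.push_back(text.substr(start, maxCharacters));
        if (start + maxCharacters >= text.size()) break;
        start += maxCharacters - overlap;  // strictly positive step
    }
    return out;
}
```

With an unclamped `overlap >= maxCharacters`, the step would be zero or negative and the window would never advance — the degenerate case the precondition guards against.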
68-90: Naive sentence splitting will mis-split on abbreviations and decimals.

Splitting on every `.` will break on "Dr. Smith", "3.14", "U.S.A.", URLs, etc. For a RAG chunker this can degrade retrieval quality. Consider using `NLTokenizer` with the `.sentence` unit from the NaturalLanguage framework, which handles these cases natively on Apple platforms.

sdk/runanywhere-commons/include/rac/backends/rac_llm_llamacpp.h (1)

184-200: Good API shape and lifecycle documentation.

The ownership contract (caller must free with `rac_free`) is clear. Per project guidelines, consider enumerating the specific error codes this function may return (NULL_POINTER, INVALID_HANDLE, INFERENCE_FAILED, OUT_OF_MEMORY based on the implementation) to complete the public API documentation.

As per coding guidelines: "Public C API headers in include/rac/ must document vtable operations, error codes, and lifecycle requirements."
sdk/runanywhere-commons/src/backends/memory/CMakeLists.txt (1)
35-38: Accelerate framework linked unconditionally despite "if available" comment.

On macOS/iOS this will typically succeed, but for cross-compilation or minimal toolchains it may fail. Consider using `find_library` to verify availability first.

Suggested conditional check

```diff
 if(CMAKE_SYSTEM_NAME STREQUAL "iOS" OR CMAKE_SYSTEM_NAME STREQUAL "Darwin")
-    # Use Accelerate framework for optimized BLAS operations if available
-    target_link_libraries(rac_backend_memory PUBLIC "-framework Accelerate")
+    include(CheckLibraryExists)
+    find_library(ACCELERATE_FRAMEWORK Accelerate)
+    if(ACCELERATE_FRAMEWORK)
+        target_link_libraries(rac_backend_memory PUBLIC "-framework Accelerate")
+        message(STATUS "  Accelerate framework found")
+    endif()
 endif()
```

sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge+Memory.swift (1)
143-154: Metadata assumed to be JSON, fails silently for plain strings.

If the stored metadata is a plain string rather than JSON, `JSONSerialization` fails and `metadata` becomes `nil`, silently dropping the value. If the contract is JSON-only, document it; otherwise consider falling back to a `{"text": rawString}` wrapper.

sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/CppBridge.swift (2)
61-61: NSLock usage violates Swift 6 concurrency guidelines.

This `NSLock` is pre-existing, but new code in this PR (lines 119–127) continues to rely on it. Consider migrating to a Swift 6 concurrency primitive (e.g., an `os_unfair_lock`-based wrapper, or restructuring `CppBridge` as an actor) in a follow-up. As per coding guidelines, `**/*.swift`: "Do not use NSLock as it is outdated. Use Swift 6 concurrency primitives instead."
191-191: Consider logging if memory backend unregistration fails unexpectedly.
`rac_backend_memory_unregister()` can return `RAC_ERROR_MODULE_NOT_FOUND` if something went wrong during init. The return value is silently discarded. This is fine for normal shutdown, but a debug-level log on unexpected failure would be consistent with the registration path (line 122).

🔧 Optional: log unexpected unregister failures

```diff
-rac_backend_memory_unregister()
+let unregResult = rac_backend_memory_unregister()
+if unregResult != RAC_SUCCESS && unregResult != RAC_ERROR_MODULE_NOT_FOUND {
+    SDKLogger(category: "CppBridge").warning("Memory backend unregistration failed: \(unregResult)")
+}
```

sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.h (1)
123-124: Temporary embedding context per call may be a performance bottleneck for batch ingestion.

The implementation (in `llamacpp_backend.cpp`) creates and destroys a `llama_context` with `embeddings = true` on every `get_embeddings()` invocation. For a RAG `remember()` flow ingesting many chunks, this context allocation/deallocation overhead will dominate. Consider lazily creating and caching a dedicated embedding context (similar to the existing `context_` member for generation) and reusing it across calls.

💡 Sketch: cache the embedding context

```diff
 // In llamacpp_backend.h, add a member:
+llama_context* emb_context_ = nullptr;

 // In get_embeddings(), reuse instead of recreating:
-llama_context* emb_ctx = llama_init_from_model(model_, ctx_params);
+if (!emb_context_) {
+    emb_context_ = llama_init_from_model(model_, ctx_params);
+}
+llama_context* emb_ctx = emb_context_;

 // Clean up in unload_model_internal() and the destructor:
+if (emb_context_) { llama_free(emb_context_); emb_context_ = nullptr; }
```

examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Memory/MemoryTestView.swift (1)
11-12: Consider migrating to the `@Observable` macro.

The `@StateObject` + `ObservableObject` pattern works, but the newer `@Observable` macro (Observation framework, iOS 17+) with `@State` is the idiomatic Swift approach, giving better performance by tracking property-level access. As per coding guidelines, `**/*.swift`: Use the latest Swift 6 APIs always.

sdk/runanywhere-commons/include/rac/backends/rac_backend_memory.h (1)
19-36: Document specific error codes per coding guidelines.

The `@return` annotations say "RAC_SUCCESS or error code" without listing which error codes can be returned. From the implementation in `rac_backend_memory_register.cpp`, `register` may return `RAC_ERROR_MODULE_ALREADY_REGISTERED` (if called twice) or propagated registry errors, and `unregister` returns `RAC_ERROR_MODULE_NOT_FOUND` (if not registered). Listing these helps consumers handle failures correctly. Based on learnings: "Public C API headers in include/rac/ must document vtable operations, error codes, and lifecycle requirements."

📝 Proposed documentation improvement

```diff
 /**
  * Registers the Memory backend with the commons module and service registries.
  *
  * Should be called once during SDK initialization.
  * This registers:
  * - Module: "memory" with VECTOR_SEARCH capability
  * - Service provider: Memory vector search provider
  *
- * @return RAC_SUCCESS or error code
+ * @return RAC_SUCCESS on success, or one of:
+ *         - RAC_ERROR_MODULE_ALREADY_REGISTERED if called more than once
+ *         - Other registry error codes on internal failure
  */
 RAC_API rac_result_t rac_backend_memory_register(void);

 /**
  * Unregisters the Memory backend.
  *
- * @return RAC_SUCCESS or error code
+ * @return RAC_SUCCESS on success, or:
+ *         - RAC_ERROR_MODULE_NOT_FOUND if not currently registered
  */
 RAC_API rac_result_t rac_backend_memory_unregister(void);
```

sdk/runanywhere-commons/CMakeLists.txt (1)
279-283: Memory backend bypasses the `RAC_BUILD_BACKENDS` gate — intentional?

The other individual backend options (`RAC_BACKEND_LLAMACPP`, `RAC_BACKEND_ONNX`, `RAC_BACKEND_WHISPERCPP`) are all gated behind `RAC_BUILD_BACKENDS` (line 285). `RAC_BACKEND_MEMORY` is independent and ON by default. This means `cmake -DRAC_BUILD_BACKENDS=OFF` disables ML backends but still builds the memory backend, which may surprise users. If this is intentional (memory has no heavy external deps), the comment on line 279 is helpful, but consider also documenting this in the options section.

📝 Suggested documentation enhancement in options block

```diff
 option(RAC_BUILD_BACKENDS "Build ML backends (LlamaCPP, ONNX, WhisperCPP)" OFF)
-option(RAC_BACKEND_MEMORY "Build Memory/Vector Search backend (hnswlib)" ON)
+option(RAC_BACKEND_MEMORY "Build Memory/Vector Search backend (hnswlib) - independent of RAC_BUILD_BACKENDS" ON)
```

sdk/runanywhere-commons/src/backends/memory/rac_memory_backend.cpp (2)
48-57: Defensive cleanup: destroy `backend_handle` if `malloc` fails, but also if create partially succeeds.

Two concerns with resource management:

- Lines 48-50: If a future backend implementation sets `*out_handle` before returning an error, the handle leaks. Consider destroying the handle when it's non-null regardless of `result`.
- Line 54: `malloc` leaves memory uninitialized. If `rac_memory_service_t` gains additional fields, they'll contain garbage. Prefer `calloc` for zero-initialization.

🛡️ Proposed defensive improvements

```diff
 if (result != RAC_SUCCESS || !backend_handle) {
+    if (backend_handle && ops) {
+        ops->destroy(backend_handle);
+    }
     RAC_LOG_ERROR(LOG_CAT, "Failed to create backend: %d", result);
     return nullptr;
 }

 // Allocate service struct
-auto* service = static_cast<rac_memory_service_t*>(malloc(sizeof(rac_memory_service_t)));
+auto* service = static_cast<rac_memory_service_t*>(calloc(1, sizeof(rac_memory_service_t)));
```
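A small demonstration of the `calloc` point: zero-initialization guarantees that fields added later read as zero, whereas `malloc`'d memory is indeterminate. The `Service` struct here is a hypothetical stand-in for `rac_memory_service_t`:

```cpp
#include <cstdlib>

// Illustrative stand-in: imagine `flags` was added to the struct later.
// With malloc, existing allocation sites would hand out garbage in `flags`;
// with calloc(1, ...) every byte starts at zero.
struct Service {
    void* handle;
    int   flags;
};

inline Service* make_service_zeroed() {
    return static_cast<Service*>(calloc(1, sizeof(Service)));
}
```

This is why the proposed diff swaps `malloc(sizeof(...))` for `calloc(1, sizeof(...))`: it future-proofs the allocation against struct growth at negligible cost.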
25-28: No error code propagation to callers.

`rac_memory_backend_create_service` returns `rac_handle_t` (an opaque pointer), so callers only see `nullptr` on failure with no way to distinguish between a null config, unknown index type, backend creation failure, or OOM. The specific `rac_result_t` from the backend is logged but lost. If this function is part of the public C API surface, consider an out-parameter for the error code or a different return convention.

sdk/runanywhere-commons/tests/test_memory.cpp (1)
45-60: Resource leak on early `FAIL`/`EXPECT_OK` exits in tests.

The `FAIL` macro issues a bare `return`, and `EXPECT_OK` delegates to `FAIL`. When these trigger mid-test (e.g., Line 102 after `rac_memory_create` succeeds but before destroy), the `rac_handle_t` is leaked. This is tolerable for a short-lived test process, but it can mask bugs in the destroy path and makes valgrind/ASAN runs noisy.

A common pattern is a `goto cleanup` label or an RAII guard:

```cpp
struct HandleGuard {
    rac_handle_t h = nullptr;
    ~HandleGuard() { if (h) rac_memory_destroy(h); }
};
```

Not blocking, just a suggestion to improve test robustness.
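The guard can be exercised stand-alone. In this sketch, `rac_handle_t` and `rac_memory_destroy` are stubbed (with a counter in place of the real destroy) so that the early-return path can be shown releasing the handle:

```cpp
// Stubs standing in for the real SDK types, for demonstration only.
using rac_handle_t = void*;

inline int g_destroy_calls = 0;
inline void rac_memory_destroy(rac_handle_t h) { if (h) ++g_destroy_calls; }

// The suggested RAII guard: the destructor fires on every exit path,
// including a bare `return` from a FAIL-style macro.
struct HandleGuard {
    rac_handle_t h = nullptr;
    ~HandleGuard() { if (h) rac_memory_destroy(h); }
};

// Models a test body that acquires a handle and then bails out early.
inline bool test_body_with_early_return() {
    HandleGuard guard;
    static int dummy = 1;
    guard.h = &dummy;   // pretend rac_memory_create succeeded
    return false;       // early exit — the guard still destroys the handle
}
```

With the bare-`return` `FAIL` macro and no guard, this same path would leak the handle; the destructor makes cleanup unconditional.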
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Memory/MemoryTestViewModel.swift (2)
150-182: Loose assertion in `testSaveLoad` — consider asserting exact vector count.

Line 167 checks `stats.numVectors > 0`, but at this point in the test sequence (after `testRemove` deleted id=5), the expected count is exactly 4. A tighter assertion would catch subtle persistence bugs (e.g., duplicates on load, dropped entries).

```diff
-    guard stats.numVectors > 0 else {
-        throw TestError("No vectors after load")
+    guard stats.numVectors == 4 else {
+        throw TestError("Expected 4 vectors after load, got \(stats.numVectors)")
     }
```
35-59: Tests depend on sequential execution order — state leaks between tests.

The test functions (e.g., `testAddVectors`, `testSearchExact`) rely on index state created/mutated by prior tests. If any middle test fails, subsequent tests will also fail with confusing errors (e.g., "Memory index not created"). This is acceptable for a visual test runner, but consider adding a note in the status message when a test fails mid-chain, or short-circuiting remaining tests.

sdk/runanywhere-commons/src/backends/memory/memory_backend_flat.cpp (1)
138-209: `search_time_us` includes lock acquisition time.

The timer at Line 148 starts before the shared_lock on Line 150. Under write contention this inflates the reported search latency. Consider moving `start` after lock acquisition if the metric is meant to reflect pure computation time.

sdk/runanywhere-commons/src/backends/memory/rac_backend_memory_register.cpp (1)

122-136: Unchecked return values from `rac_service_unregister_provider`/`rac_module_unregister`.

Lines 130–131 discard the results of the unregister calls. If either fails (e.g., the provider was already removed externally), `state.registered` is set to `false` anyway, which could leave the registry in an inconsistent state — one component unregistered and the other still present.

Not blocking, but consider logging warnings if either returns an unexpected error.
sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/RAGAgent.swift (1)
106-108: LLM generation options are hardcoded.

`max_tokens` and `temperature` are not configurable per query. Consider accepting an optional configuration parameter for callers who need different generation behavior.

sdk/runanywhere-commons/src/backends/memory/memory_backend_hnswlib.cpp (2)

174-193: Remove silently swallows all exceptions — consider logging.

Line 186 catches all exceptions with `catch (...)` when marking an element for deletion. An element-not-found scenario is expected, but other exceptions (corruption, etc.) are silently lost. A debug-level log would aid troubleshooting.

195-246: Save mixes binary header with text metadata — verify metadata cannot contain raw newlines/tabs.

The format writes a binary header, then metadata as `id\tmetadata\n` lines. This relies on the metadata JSON strings never containing literal tab or newline bytes. Standard JSON serializers escape these, but there's no validation here. A corrupted or hand-crafted metadata string could break the load parser.

sdk/runanywhere-swift/Sources/RunAnywhere/CRACommons/include/rac_memory_service.h (1)
63-64: `index_id` ownership is undocumented.

`const char* index_id` in `rac_memory_service_t` is heap-allocated and freed via `free()` in `rac_memory_destroy` (see `rac_memory_service.cpp` line 194), but the header doesn't document this. Consider adding a brief ownership comment. Based on learnings, "Public C API headers in include/rac/ must document vtable operations, error codes, and lifecycle requirements."

sdk/runanywhere-commons/src/features/memory/rac_memory_service.cpp (1)

59-100: Dimension retrieved via `get_stats` on every `add` and `search` — unnecessary overhead.

`dimension` is immutable after index creation. Calling `get_stats` on every `add`/`search` acquires a shared_lock in the backend just to read the dimension. If `get_stats` is unset or fails, `stats.dimension` remains 0, silently passing a zero dimension to the backend.

Consider storing `dimension` directly in `rac_memory_service_t` at creation time.
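The suggestion amounts to caching an immutable value at construction instead of re-querying it through a locked stats call. A tiny sketch under assumed names (not the SDK's actual struct or functions):

```cpp
#include <cstddef>

// Hypothetical mirror of the service struct: the dimension is fixed when
// the index is created, so it can be read lock-free forever after.
struct MemoryService {
    std::size_t dimension;
};

inline MemoryService make_service(std::size_t dim) {
    return MemoryService{dim};
}

// Validates an incoming vector length without a get_stats() round-trip
// (and thus without taking the backend's shared_lock).
inline bool check_dim(const MemoryService& s, std::size_t vec_len) {
    return vec_len == s.dimension;
}
```

Besides removing the per-call lock, this also eliminates the silent-zero failure mode: a service constructed with a known dimension can never report `dimension == 0` because `get_stats` was unset.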
```kotlin
fun unloadAllModels() {
    viewModelScope.launch {
        try {
            RunAnywhere.unloadLLMModel()
            RunAnywhere.unloadSTTModel()
            RunAnywhere.unloadTTSVoice()
            refreshModelState()
        } catch (e: Exception) {
            errorMessage = "Failed to unload models: ${e.message}"
        }
    }
}
```
unloadAllModels: if the first unload throws, remaining models stay loaded.
The three unload calls are wrapped in a single try/catch. If unloadLLMModel() throws, unloadSTTModel() and unloadTTSVoice() are never invoked, leaving those models resident in memory. refreshModelState() is also skipped, so the UI will not reflect the actual state. For an app managing large on-device models this can silently waste significant memory.
Proposed fix: unload each model independently
```diff
 fun unloadAllModels() {
     viewModelScope.launch {
-        try {
-            RunAnywhere.unloadLLMModel()
-            RunAnywhere.unloadSTTModel()
-            RunAnywhere.unloadTTSVoice()
-            refreshModelState()
-        } catch (e: Exception) {
-            errorMessage = "Failed to unload models: ${e.message}"
-        }
+        val errors = mutableListOf<String>()
+        try { RunAnywhere.unloadLLMModel() } catch (e: Exception) { errors += "LLM: ${e.message}" }
+        try { RunAnywhere.unloadSTTModel() } catch (e: Exception) { errors += "STT: ${e.message}" }
+        try { RunAnywhere.unloadTTSVoice() } catch (e: Exception) { errors += "TTS: ${e.message}" }
+        refreshModelState()
+        if (errors.isNotEmpty()) {
+            errorMessage = "Failed to unload: ${errors.joinToString("; ")}"
+        }
     }
 }
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fun unloadAllModels() { | |
| viewModelScope.launch { | |
| try { | |
| RunAnywhere.unloadLLMModel() | |
| RunAnywhere.unloadSTTModel() | |
| RunAnywhere.unloadTTSVoice() | |
| refreshModelState() | |
| } catch (e: Exception) { | |
| errorMessage = "Failed to unload models: ${e.message}" | |
| } | |
| } | |
| } | |
| fun unloadAllModels() { | |
| viewModelScope.launch { | |
| val errors = mutableListOf<String>() | |
| try { RunAnywhere.unloadLLMModel() } catch (e: Exception) { errors += "LLM: ${e.message}" } | |
| try { RunAnywhere.unloadSTTModel() } catch (e: Exception) { errors += "STT: ${e.message}" } | |
| try { RunAnywhere.unloadTTSVoice() } catch (e: Exception) { errors += "TTS: ${e.message}" } | |
| refreshModelState() | |
| if (errors.isNotEmpty()) { | |
| errorMessage = "Failed to unload: ${errors.joinToString("; ")}" | |
| } | |
| } | |
| } |
```kotlin
        // Start reading audio in a thread
        Thread {
            val buffer = ByteArray(bufferSize)
            while (isRecording) {
                val read = audioRecord?.read(buffer, 0, buffer.size) ?: 0
                if (read > 0) {
                    synchronized(audioData) {
                        audioData.write(buffer, 0, read)
                    }
                }
            }
        }.start()

        return true
    } catch (e: SecurityException) {
        audioRecord?.release()
        audioRecord = null
        return false
    }
}

fun stopRecording(): ByteArray {
    isRecording = false
    audioRecord?.let { record ->
        if (record.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
            record.stop()
        }
        record.release()
    }
    audioRecord = null

    synchronized(audioData) {
        return audioData.toByteArray()
    }
}
```
Reading thread is not joined in stopRecording(), risking incomplete data.
After setting isRecording = false, stopRecording() immediately grabs the audioData lock and returns. The background Thread (Line 81) may still be executing audioRecord?.read(...) — which can block or crash since audioRecord is stopped/released on Line 105-108 before the thread finishes. Consider keeping a reference to the thread and joining it (with a short timeout) before releasing the AudioRecord.
🔧 Proposed fix: join the reading thread before releasing
```diff
 private class AudioRecorder {
     private var audioRecord: AudioRecord? = null
     @Volatile
     private var isRecording = false
     private val audioData = ByteArrayOutputStream()
+    private var readingThread: Thread? = null
     ...
-    Thread {
+    readingThread = Thread {
         val buffer = ByteArray(bufferSize)
         while (isRecording) {
             val read = audioRecord?.read(buffer, 0, buffer.size) ?: 0
             if (read > 0) {
                 synchronized(audioData) {
                     audioData.write(buffer, 0, read)
                 }
             }
         }
-    }.start()
+    }
+    readingThread?.start()
     ...
     fun stopRecording(): ByteArray {
         isRecording = false
+        readingThread?.join(500) // Wait for the reading thread to exit
+        readingThread = null
         audioRecord?.let { record ->
```
🪛 detekt (1.23.8)
[warning] 94-94: The caught exception is swallowed. The original exception could be lost.
(detekt.exceptions.SwallowedException)
🤖 Prompt for AI Agents
In
`@Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/SpeechToTextScreen.kt`
around lines 80 - 114, The background read Thread started in the recording
method isn’t tracked or joined, so stopRecording() sets isRecording=false and
immediately releases audioRecord and returns audioData while the thread may
still be in audioRecord?.read(...); fix by storing the Thread reference (e.g.,
readingThread) when starting it, then in stopRecording() after setting
isRecording = false call readingThread.join(<short timeout>) and handle
InterruptedException before stopping/releasing audioRecord and returning
audioData to ensure the reader thread has finished accessing audioRecord and
audioData.
```kotlin
val audioTrack = AudioTrack.Builder()
    .setAudioAttributes(
        AudioAttributes.Builder()
            .setUsage(AudioAttributes.USAGE_MEDIA)
            .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
            .build()
    )
    .setAudioFormat(
        AudioFormat.Builder()
            .setSampleRate(sampleRate)
            .setEncoding(audioFormatConfig)
            .setChannelMask(channelConfig)
            .build()
    )
    .setBufferSizeInBytes(maxOf(minBufferSize, pcmData.size))
    .setTransferMode(AudioTrack.MODE_STATIC)
    .build()

audioTrack.write(pcmData, 0, pcmData.size)
audioTrack.play()

// Wait for playback to complete
val durationMs = (pcmData.size.toLong() * 1000) / (sampleRate * numChannels * (bitsPerSample / 8))
delay(durationMs + 100) // Add small buffer

audioTrack.stop()
audioTrack.release()
```
AudioTrack leaks if the coroutine is cancelled mid-playback.
If the user navigates away while audio is playing, the delay on Line 108 throws CancellationException, skipping stop() and release(). Wrap the AudioTrack lifecycle in try/finally.
🐛 Proposed fix
```diff
 val audioTrack = AudioTrack.Builder()
     .setAudioAttributes(
         AudioAttributes.Builder()
             .setUsage(AudioAttributes.USAGE_MEDIA)
             .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
             .build()
     )
     .setAudioFormat(
         AudioFormat.Builder()
             .setSampleRate(sampleRate)
             .setEncoding(audioFormatConfig)
             .setChannelMask(channelConfig)
             .build()
     )
     .setBufferSizeInBytes(maxOf(minBufferSize, pcmData.size))
     .setTransferMode(AudioTrack.MODE_STATIC)
     .build()
-audioTrack.write(pcmData, 0, pcmData.size)
-audioTrack.play()
-
-// Wait for playback to complete
-val durationMs = (pcmData.size.toLong() * 1000) / (sampleRate * numChannels * (bitsPerSample / 8))
-delay(durationMs + 100) // Add small buffer
-
-audioTrack.stop()
-audioTrack.release()
+try {
+    audioTrack.write(pcmData, 0, pcmData.size)
+    audioTrack.play()
+
+    // Wait for playback to complete
+    val durationMs = (pcmData.size.toLong() * 1000) / (sampleRate * numChannels * (bitsPerSample / 8))
+    delay(durationMs + 100) // Add small buffer
+} finally {
+    audioTrack.stop()
+    audioTrack.release()
+}
```

Note: The same issue exists in VoicePipelineScreen.kt's playWavAudio (see relevant snippet, around line 213–220).
🤖 Prompt for AI Agents
In
`@Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/TextToSpeechScreen.kt`
around lines 85 - 111, The AudioTrack created with AudioTrack.Builder (assigned
to audioTrack) can leak if the coroutine is cancelled during the delay: ensure
the play/cleanup sequence is wrapped in a try/finally so stop() and release()
always run; specifically move audioTrack.write(...) and audioTrack.play() into a
try block and call audioTrack.stop() and audioTrack.release() in the finally
block, and apply the same change to the playWavAudio implementation in
VoicePipelineScreen.kt to guarantee cleanup on CancellationException.
is VoiceSessionEvent.TurnCompleted -> {
    // Play the synthesized audio
    event.audio?.let { audio ->
        sessionState = VoiceSessionState.SPEAKING
        playWavAudio(audio)
    }
    // Resume listening state
    sessionState = VoiceSessionState.LISTENING
    audioLevel = 0f
}
playWavAudio() blocks event collection during audio playback.
playWavAudio(audio) is a suspending call with an internal delay(durationMs + 100). Since it's called inline inside collect { ... }, it blocks the entire Flow collection for the duration of playback. Any events emitted by the SDK during that time (e.g., Listening, Error) will be buffered and delivered late, causing stale UI state.
Launch playback in a separate coroutine so collection continues unblocked.
🔧 Proposed fix
is VoiceSessionEvent.TurnCompleted -> {
// Play the synthesized audio
event.audio?.let { audio ->
sessionState = VoiceSessionState.SPEAKING
- playWavAudio(audio)
+ launch { playWavAudio(audio) }
}
// Resume listening state
sessionState = VoiceSessionState.LISTENING
audioLevel = 0f
  }

🤖 Prompt for AI Agents
In
`@Playground/kotlin-starter-app/app/src/main/java/com/runanywhere/kotlin_starter_example/ui/screens/VoicePipelineScreen.kt`
around lines 326 - 335, When handling VoiceSessionEvent.TurnCompleted in the
Flow collector, don't call the suspending playWavAudio(audio) inline because it
blocks collection; instead launch playback in a separate coroutine (e.g.,
CoroutineScope(Dispatchers.IO).launch or viewModelScope.launch) so the collector
can continue receiving events. Set sessionState = VoiceSessionState.SPEAKING
before launching the coroutine, call playWavAudio(audio) inside that coroutine,
and after playback update sessionState = VoiceSessionState.LISTENING and
audioLevel = 0f inside the coroutine to restore UI state without blocking the
Flow collector. Ensure you reference the same symbols:
VoiceSessionEvent.TurnCompleted, playWavAudio, sessionState, and audioLevel.
RAC_COMPONENT_VAD = 3,        /**< Voice Activity Detection */
RAC_COMPONENT_VOICE = 4,      /**< Voice Agent */
RAC_COMPONENT_EMBEDDING = 5,  /**< Embedding generation */
RAC_COMPONENT_MEMORY = 6,     /**< Vector memory / RAG */
🧩 Analysis chain
🏁 Script executed:
rg -n "rac_sdk_component_display_name|rac_sdk_component_raw_value" --type=cpp --type=c -B2 -A10

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 7670
🏁 Script executed:
# Also search for where RAC_COMPONENT_MEMORY might be referenced
rg -n "RAC_COMPONENT_MEMORY" --type=cpp --type=c -B2 -A2

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 603
🏁 Script executed:
# Find the header file to understand the context
find . -name "rac_component_types.h" -type f

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 223
🏁 Script executed:
cat -n sdk/runanywhere-commons/src/core/component_types.cpp | head -60

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 2374
🏁 Script executed:
# Also check if there are any default cases or what happens at the end
cat -n sdk/runanywhere-commons/src/core/component_types.cpp

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 4990
Update rac_sdk_component_display_name, rac_sdk_component_raw_value, and rac_component_to_resource_type to handle RAC_COMPONENT_MEMORY.
The enum value was added but the switch statements in component_types.cpp are missing the new case. Currently:
- rac_sdk_component_display_name returns "Unknown" for memory
- rac_sdk_component_raw_value returns "unknown" for memory
- rac_component_to_resource_type returns -1 for memory
Add cases for RAC_COMPONENT_MEMORY in all three functions to match the pattern of existing components.
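The review names the three functions but doesn't show the fix. Below is a minimal sketch of the two string-returning cases; the function signatures are assumed rather than copied from component_types.cpp, the "Memory"/"memory" strings come from the review text above, and the resource-type mapping is omitted because its constant isn't visible in this PR.

```cpp
#include <cassert>
#include <cstring>

// Illustrative stand-in for the real enum in rac_component_types.h;
// only the value under discussion is reproduced here.
enum rac_sdk_component_t { RAC_COMPONENT_MEMORY = 6 };

// Sketch of the missing switch cases (signatures assumed).
const char* rac_sdk_component_display_name(rac_sdk_component_t c) {
    switch (c) {
        case RAC_COMPONENT_MEMORY: return "Memory";
        default: return "Unknown";
    }
}

const char* rac_sdk_component_raw_value(rac_sdk_component_t c) {
    switch (c) {
        case RAC_COMPONENT_MEMORY: return "memory";
        default: return "unknown";
    }
}
```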
🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/include/rac/core/rac_component_types.h` at line 38,
Add a switch case for RAC_COMPONENT_MEMORY in component_types.cpp inside
rac_sdk_component_display_name, rac_sdk_component_raw_value, and
rac_component_to_resource_type: return the human label "Memory" from
rac_sdk_component_display_name, the raw identifier "memory" from
rac_sdk_component_raw_value, and map to the same resource-type enum/constant
used for memory in other components (do not return -1)—use the existing
resource-type symbol (e.g., RESOURCE_TYPE_MEMORY or the project’s equivalent) to
keep the pattern consistent with other cases.
# Memory layer test executable
add_executable(test_memory test_memory.cpp)
target_link_libraries(test_memory PRIVATE rac_commons rac_backend_memory)
target_include_directories(test_memory PRIVATE ${PROJECT_SOURCE_DIR}/include)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if RAC_BUILD_MEMORY option is defined elsewhere
echo "=== Searching for RAC_BUILD_MEMORY option ==="
rg -n 'RAC_BUILD_MEMORY' --type=cmake
echo ""
echo "=== Check existing RAC_BUILD_* patterns for reference ==="
rg -n 'RAC_BUILD_' --type=cmake -C 1 | head -80
echo ""
echo "=== Check how other test targets are defined ==="
fd -t f 'CMakeLists.txt' --exec grep -l 'add_executable' {} \; | head -10

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 5650
🏁 Script executed:
#!/bin/bash
# Check the actual tests/CMakeLists.txt content
echo "=== Contents of tests/CMakeLists.txt ==="
cat -n sdk/runanywhere-commons/tests/CMakeLists.txt
echo ""
echo "=== How tests directory is included in main CMakeLists.txt ==="
rg -n 'add_subdirectory.*tests' --type=cmake
echo ""
echo "=== Check if RAC_BUILD_TESTS guards the tests inclusion ==="
rg -n -B2 -A2 'add_subdirectory.*tests' sdk/runanywhere-commons/CMakeLists.txt --type=cmake
echo ""
echo "=== Check for add_test() calls in the codebase ==="
rg -n 'add_test\(' --type=cmake

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 715
🏁 Script executed:
#!/bin/bash
# Double-check: verify that rac_backend_memory target is created only when RAC_BACKEND_MEMORY is enabled
echo "=== How rac_backend_memory target is created ==="
rg -n -B2 -A5 'add_library.*rac_backend_memory' --type=cmake
echo ""
echo "=== Memory backend CMakeLists.txt structure ==="
if [ -f sdk/runanywhere-commons/src/backends/memory/CMakeLists.txt ]; then
cat sdk/runanywhere-commons/src/backends/memory/CMakeLists.txt
else
echo "File not found"
fi

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 2364
Guard with RAC_BACKEND_MEMORY to prevent build failures when the memory backend is disabled.
The test target links against rac_backend_memory, which is only created when RAC_BACKEND_MEMORY=ON. If the memory backend is disabled while tests are enabled, the build will fail with undefined reference errors. Wrap the test definition in a RAC_BACKEND_MEMORY guard to match the optional backend pattern used elsewhere in the project.
🔧 Proposed fix
-# Memory layer test executable
-add_executable(test_memory test_memory.cpp)
-target_link_libraries(test_memory PRIVATE rac_commons rac_backend_memory)
-target_include_directories(test_memory PRIVATE ${PROJECT_SOURCE_DIR}/include)
+# Memory layer test executable
+if(RAC_BACKEND_MEMORY)
+ add_executable(test_memory test_memory.cpp)
+ target_link_libraries(test_memory PRIVATE rac_commons rac_backend_memory)
+ target_include_directories(test_memory PRIVATE ${PROJECT_SOURCE_DIR}/include)
+endif()

📝 Committable suggestion
# Memory layer test executable
if(RAC_BACKEND_MEMORY)
    add_executable(test_memory test_memory.cpp)
    target_link_libraries(test_memory PRIVATE rac_commons rac_backend_memory)
    target_include_directories(test_memory PRIVATE ${PROJECT_SOURCE_DIR}/include)
endif()
🤖 Prompt for AI Agents
In `@sdk/runanywhere-commons/tests/CMakeLists.txt` around lines 1 - 4, Wrap the
test target creation for test_memory in an RAC_BACKEND_MEMORY guard so it is
only added when the memory backend is enabled: conditionally add the
add_executable(test_memory ...), target_link_libraries(test_memory PRIVATE
rac_commons rac_backend_memory) and target_include_directories(test_memory ...)
inside an if(RAC_BACKEND_MEMORY) ... endif() block to avoid linking against
rac_backend_memory when the backend is disabled.
public struct MemorySearchResult: Sendable {
    /// Vector ID
    public let id: UInt64

    /// Distance/similarity score (lower is closer for L2/cosine)
    public let score: Float

    /// Associated metadata (parsed from JSON)
    public let metadata: [String: Any]?

    public init(id: UInt64, score: Float, metadata: [String: Any]? = nil) {
        self.id = id
        self.score = score
        self.metadata = metadata
    }
}
Sendable conformance is invalid — [String: Any]? is not Sendable.
Both MemorySearchResult and MemoryRecallResult declare Sendable conformance but store metadata: [String: Any]?. Any does not conform to Sendable, so this will produce a compiler error under Swift 6 strict concurrency checking.
Consider using [String: any Sendable]? or a dedicated Sendable-conforming metadata type.
Proposed fix
public struct MemorySearchResult: Sendable {
public let id: UInt64
public let score: Float
- public let metadata: [String: Any]?
+ public let metadata: [String: any Sendable]?
- public init(id: UInt64, score: Float, metadata: [String: Any]? = nil) {
+    public init(id: UInt64, score: Float, metadata: [String: any Sendable]? = nil) {

Apply the same change to MemoryRecallResult.
As per coding guidelines, **/*.swift: "Use the latest Swift 6 APIs always."
Also applies to: 48-67
🤖 Prompt for AI Agents
In `@sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/MemoryTypes.swift`
around lines 30 - 45, MemorySearchResult and MemoryRecallResult incorrectly
declare Sendable while using metadata: [String: Any]?; change the metadata type
to a Sendable-safe form (e.g., metadata: [String: any Sendable]?) and update the
corresponding init parameters and stored property types in both structs
(MemorySearchResult and MemoryRecallResult) so the structs can legitimately
conform to Sendable under Swift 6.
private let bridge: CppBridge.Memory
private let embedder: any EmbeddingProvider
private let config: MemoryConfiguration
private var nextId: UInt64 = 1
nextId counter resets after load() or app restart — causes silent ID collisions.
nextId starts at 1 and is not persisted. After a save()/load() round-trip (or an app relaunch with load()), new remember() calls will produce IDs starting from 1 again. Since the flat backend treats duplicate IDs as upserts, this silently overwrites previously stored memories.
Consider either:
- Persisting nextId alongside the index (e.g., in the metadata sidecar or a dedicated field), or
- After load(), querying stats or scanning existing IDs to set nextId = maxExistingId + 1.
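The second option fits in a few lines. A hedged sketch: `next_id_after_load` and the ID list it scans are stand-ins for whatever ID enumeration the backend actually exposes, not SDK API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch: after load(), derive the next insertion ID from the
// IDs already in the index instead of restarting at 1, so new remember()
// calls cannot silently upsert over persisted entries.
uint64_t next_id_after_load(const std::vector<uint64_t>& loaded_ids) {
    if (loaded_ids.empty()) return 1;  // empty index: start fresh
    return *std::max_element(loaded_ids.begin(), loaded_ids.end()) + 1;
}
```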
🤖 Prompt for AI Agents
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Features/Memory/RAGMemoryService.swift`
at line 33, The nextId counter resets because nextId (declared in
RAGMemoryService) isn't persisted or recomputed on load, causing ID collisions
when remember() generates new entries after load() or restart; fix by persisting
nextId with the index (e.g., metadata sidecar or dedicated field) during save()
and restoring it in load(), or alternatively implement logic in load() to scan
existing memory IDs (or query backend stats) to compute nextId = maxExistingId +
1 before returning, ensuring remember() uses the updated nextId.
public func add(vectors: [[Float]], ids: [UInt64], metadata: [String?]? = nil) throws {
    let handle = try getHandle()
    let count = UInt32(vectors.count)
    guard count > 0 else { return }
Missing count validation between vectors, ids, and metadata.
The C API (rac_memory_add) reads count entries from all three buffers. If ids.count != vectors.count or metadata?.count is non-nil but differs, this causes out-of-bounds reads on the C side (undefined behavior). Additionally, if metadata is non-nil but empty while vectors is non-empty, the force unwrap at line 104 (metaBuf.baseAddress!) will crash.
Proposed validation
public func add(vectors: [[Float]], ids: [UInt64], metadata: [String?]? = nil) throws {
let handle = try getHandle()
let count = UInt32(vectors.count)
guard count > 0 else { return }
+
+ guard vectors.count == ids.count else {
+ throw SDKError.memory(.invalidInput,
+ "vectors.count (\(vectors.count)) != ids.count (\(ids.count))")
+ }
+ if let metadata = metadata {
+ guard metadata.count == vectors.count else {
+ throw SDKError.memory(.invalidInput,
+ "metadata.count (\(metadata.count)) != vectors.count (\(vectors.count))")
+ }
+ }📝 Committable suggestion
public func add(vectors: [[Float]], ids: [UInt64], metadata: [String?]? = nil) throws {
    let handle = try getHandle()
    let count = UInt32(vectors.count)
    guard count > 0 else { return }

    guard vectors.count == ids.count else {
        throw SDKError.memory(.invalidInput,
            "vectors.count (\(vectors.count)) != ids.count (\(ids.count))")
    }
    if let metadata = metadata {
        guard metadata.count == vectors.count else {
            throw SDKError.memory(.invalidInput,
                "metadata.count (\(metadata.count)) != vectors.count (\(vectors.count))")
        }
    }
🤖 Prompt for AI Agents
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge`+Memory.swift
around lines 69 - 73, In add(vectors: [[Float]], ids: [UInt64], metadata:
[String?]? = nil) validate that ids.count == vectors.count and if metadata !=
nil then metadata!.count == vectors.count (and not empty for non-zero vectors)
before calling rac_memory_add; if counts mismatch throw an error/return early.
Update the metadata handling so you do not force-unwrap metaBuf.baseAddress!
when metadata is non-nil — only create and pass a metadata buffer when
metadata.count == vectors.count, otherwise pass nil to the C call; reference the
add(...) method, the rac_memory_add invocation, and the metaBuf.baseAddress
usage when making these changes.
var metadataPointers: [UnsafePointer<CChar>?]?
var metadataBuffers: [Data] = []

if let metadata = metadata {
    var ptrs = [UnsafePointer<CChar>?]()
    for meta in metadata {
        if let m = meta {
            let data = Data(m.utf8 + [0])
            metadataBuffers.append(data)
            ptrs.append(nil) // Will set in withUnsafeBytes
        } else {
            ptrs.append(nil)
        }
    }
    metadataPointers = ptrs
}
🛠️ Refactor suggestion | 🟠 Major
Dead code: metadataPointers and metadataBuffers are never used.
These variables are populated but immediately shadowed by the strdup-based approach at lines 96-110. Remove them to avoid confusion.
Remove dead code
- // Prepare metadata C strings
- var metadataPointers: [UnsafePointer<CChar>?]?
- var metadataBuffers: [Data] = []
-
- if let metadata = metadata {
- var ptrs = [UnsafePointer<CChar>?]()
- for meta in metadata {
- if let m = meta {
- let data = Data(m.utf8 + [0])
- metadataBuffers.append(data)
- ptrs.append(nil) // Will set in withUnsafeBytes
- } else {
- ptrs.append(nil)
- }
- }
- metadataPointers = ptrs
- }
-
 let result: rac_result_t

📝 Committable suggestion
let result: rac_result_t
🤖 Prompt for AI Agents
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge`+Memory.swift
around lines 78 - 93, Remove the dead preallocation for metadataPointers and
metadataBuffers in CppBridge+Memory.swift: the variables metadataPointers and
metadataBuffers (and the block that builds ptrs/Data from metadata) are never
used because the code later uses strdup-based allocation; delete that entire
if-let metadata block so only the strdup approach remains (search for
metadataPointers, metadataBuffers, and the for meta in metadata loop to locate
the code to remove).
Let's close this right now
Summary
- remember() → embed + store
- recall() → embed + search
- RAGAgent → retrieve + augment prompt + LLM generate

Architecture
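The architecture pairs a vtable-style ops struct (so flat and HNSW backends are swappable behind one interface) with a flat backend that answers queries by exact brute-force scan. The sketch below illustrates both ideas together; every name in it is illustrative, not the actual rac_memory_* API.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// A search hit: vector ID plus its (squared L2) distance to the query.
struct SearchHit { uint64_t id; float score; };

// "Vtable" of backend operations; a dispatcher would pick one per backend.
struct BackendOps {
    void (*add)(void* self, const float* vec, uint64_t id);
    SearchHit (*search_one)(void* self, const float* query);
};

// Flat backend state: dense copies of every vector and its ID.
struct FlatBackend {
    size_t dim;
    std::vector<std::vector<float>> vectors;
    std::vector<uint64_t> ids;
};

static void flat_add(void* self, const float* vec, uint64_t id) {
    auto* b = static_cast<FlatBackend*>(self);
    b->vectors.emplace_back(vec, vec + b->dim);  // copy dim floats
    b->ids.push_back(id);
}

// Exact nearest neighbour by brute-force scan: fine for small indices
// (<10K vectors), which is why the PR pairs it with HNSW for larger ones.
static SearchHit flat_search_one(void* self, const float* query) {
    auto* b = static_cast<FlatBackend*>(self);
    SearchHit best{0, INFINITY};
    for (size_t i = 0; i < b->vectors.size(); ++i) {
        float d = 0.0f;
        for (size_t j = 0; j < b->dim; ++j) {
            const float diff = b->vectors[i][j] - query[j];
            d += diff * diff;
        }
        if (d < best.score) best = SearchHit{b->ids[i], d};
    }
    return best;
}

static const BackendOps kFlatOps{flat_add, flat_search_one};
```

An HNSW backend would supply its own `BackendOps` with the same shape, leaving callers unaware of which index structure answered the query.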
What's included
C++ core (~1,500 lines):
- rac_memory_types.h / rac_memory_service.h — types, vtable interface, public C API (create, add, search, remove, save, load)
- Flat backend — exact brute-force search, guarded by shared_mutex
- HNSW backend — hnswlib::HierarchicalNSW<float>, auto-resize, cosine/L2/IP metrics
- Persistence — .racm format with metadata sidecar
- rac_llm_llamacpp_get_embeddings() — extract embeddings from any GGUF model

Swift SDK (~900 lines):
- CppBridge.Memory actor — wraps all C FFI calls
- MemoryConfiguration, MemoryTypes, MemoryModule
- EmbeddingProvider protocol + LlamaCppEmbeddingProvider
- RAGMemoryService — remember/recall/forget with automatic embedding
- RAGAgent — full RAG: retrieve context from memory, augment prompt, generate with LLM
- TextChunker — configurable text splitting for document ingestion

Tests & example app:
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests
Greptile Overview
Greptile Summary
rac_memory_*) and vtable-dispatched backends (Flat brute-force + HNSW via hnswlib), including persistence.
rac_llm_llamacpp_get_embeddings) for use in retrieval workflows.
RAGMemoryService, RAGAgent, TextChunker) and a CppBridge.Memory actor wrapping the C FFI.

Confidence Score: 2/5
memory_backend_hnswlib.cpp uses std::priority_queue without including <queue> (a compile failure), the Swift bridge lacks basic length validation before passing pointers into C (out-of-bounds risk), and the new embeddings API allocates with malloc while callers free via rac_free (allocator contract mismatch).

Important Files Changed
Sequence Diagram
sequenceDiagram
    participant App as Swift App
    participant RAG as RAGAgent (Swift actor)
    participant MemSvc as RAGMemoryService (Swift actor)
    participant Bridge as CppBridge.Memory (Swift actor)
    participant CAPI as rac_memory_* (C API)
    participant Core as rac_memory_service (vtable dispatch)
    participant Backend as Flat/HNSW backend
    participant LLM as rac_llm_llamacpp_get_embeddings
    App->>RAG: query(question,k)
    RAG->>MemSvc: recall(question,k)
    MemSvc->>LLM: embed(question)
    LLM-->>MemSvc: [Float] queryEmbedding
    MemSvc->>Bridge: search(queryEmbedding,k)
    Bridge->>CAPI: rac_memory_search(handle, query, k, &out)
    CAPI->>Core: ops->get_stats() (dimension)
    Core->>Backend: ops->search(query, dimension, k)
    Backend-->>CAPI: rac_memory_search_results_t
    CAPI-->>Bridge: results + metadata C strings
    Bridge-->>MemSvc: [MemorySearchResult]
    MemSvc-->>RAG: [MemoryRecallResult]
    RAG->>App: augmented prompt -> LLM generate
    App->>RAG: ingest(text)
    RAG->>MemSvc: remember(chunk, metadata)
    MemSvc->>LLM: embed(chunk)
    LLM-->>MemSvc: [Float] embedding
    MemSvc->>Bridge: add([embedding],[id],[metadataJson])
    Bridge->>CAPI: rac_memory_add(handle, vectors, ids, metadata, count)
    CAPI->>Core: ops->get_stats() (dimension)
    Core->>Backend: ops->add(vectors, ids, metadata, count, dimension)
    Backend-->>CAPI: RAC_SUCCESS
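The retrieve → augment step shown between MemSvc and RAG above is, at its core, plain string assembly: splice the recalled chunks into the prompt before handing it to the LLM. The sketch below illustrates that step only; the function name and prompt template are assumptions, not the SDK's actual prompt format.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of prompt augmentation: take the chunks recall()
// returned for a question and build the context-stuffed prompt the LLM sees.
std::string augment_prompt(const std::string& question,
                           const std::vector<std::string>& recalled) {
    std::string context;
    for (const auto& chunk : recalled)
        context += "- " + chunk + "\n";  // one bullet per recalled memory
    return "Use the following context to answer.\n"
           "Context:\n" + context +
           "Question: " + question + "\n";
}
```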