Commit 2c9c857

tests
1 parent ee583ad commit 2c9c857

28 files changed: +7896 / -710 lines

Playground/YapRun/mac_plan.md

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
Here's the full Wispr Flow macOS user experience, end to end, so you can replicate it for RunAnywhere.

## Download & Install Experience

The entire install takes under 2 minutes: [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

1. Go to **wisprflow.ai** → click **"Download for macOS"**
2. A `.dmg` file downloads (~100 MB) [reddit](https://www.reddit.com/r/macapps/comments/1nzi52c/wispr_flow_managed_to_get_their_shit_together/)
3. Open the DMG → **drag the Flow icon into Applications** [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
4. Launch Wispr Flow from Applications [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
5. **Sign in** (Apple Sign-In or email) [roadmap.wisprflow](https://roadmap.wisprflow.ai/changelog/pointup2-new-flow-bar-flow-pro)
6. Follow the on-screen setup prompts

## First-Launch Permissions

On first launch, macOS asks for two critical permissions: [docs.wisprflow](https://docs.wisprflow.ai/articles/6409258247-starting-your-first-dictation)

1. **Microphone**: allow Wispr Flow under `System Settings → Privacy & Security → Microphone`
2. **Accessibility**: allow Wispr Flow under `System Settings → Privacy & Security → Accessibility` (needed for global hotkey capture, reading active app context, and simulating paste)

No other configuration is needed; the app is ready to use immediately. [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
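
The two checks above map onto real macOS calls: `AVCaptureDevice.authorizationStatus(for:)` for the microphone, and `AXIsProcessTrusted()` / `AXIsProcessTrustedWithOptions(_:)` for Accessibility. The sketch below assumes a Swift replica; `RequiredPermission` and `missingPermissions` are hypothetical helpers, and the macOS-only calls sit behind `#if os(macOS)` so the pure logic compiles anywhere.

```swift
import Foundation
#if os(macOS)
import AVFoundation
import ApplicationServices
#endif

/// The two permissions the walkthrough lists, with the System Settings path for each.
enum RequiredPermission: String {
    case microphone = "System Settings → Privacy & Security → Microphone"
    case accessibility = "System Settings → Privacy & Security → Accessibility"
}

/// Pure helper: which permissions are still missing, in the order to request them.
func missingPermissions(micGranted: Bool, accessibilityGranted: Bool) -> [RequiredPermission] {
    var missing: [RequiredPermission] = []
    if !micGranted { missing.append(.microphone) }
    if !accessibilityGranted { missing.append(.accessibility) }
    return missing
}

#if os(macOS)
/// Reads the real grant state on macOS.
func currentMissingPermissions() -> [RequiredPermission] {
    let mic = AVCaptureDevice.authorizationStatus(for: .audio) == .authorized
    return missingPermissions(micGranted: mic, accessibilityGranted: AXIsProcessTrusted())
}

/// Shows the system prompts: the microphone dialog, and the Accessibility
/// prompt that sends the user to System Settings to enable the app manually.
func requestPermissions(onMicResult: @escaping (Bool) -> Void) {
    let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true] as CFDictionary
    _ = AXIsProcessTrustedWithOptions(options)
    AVCaptureDevice.requestAccess(for: .audio, completionHandler: onMicResult)
}
#endif
```

Accessibility cannot be granted from a dialog: the prompt only points the user at System Settings, so a replica should re-check `AXIsProcessTrusted()` later (for example when the app becomes active) instead of assuming the grant succeeded.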

## What Appears On Screen

After setup, two things are visible:

### The Flow Bar (Floating Pill)

A small **rounded pill at the bottom center of the screen** [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ). It:

- Sits **always on top** of all windows, on every desktop/Space
- Is semi-transparent and unobtrusive when idle
- **Left-click** it → starts hands-free dictation [roadmap.wisprflow](https://roadmap.wisprflow.ai/changelog/pointup2-new-flow-bar-flow-pro)
- **Right-click** it → shows quick settings (language, microphone selection) [roadmap.wisprflow](https://roadmap.wisprflow.ai/changelog/pointup2-new-flow-bar-flow-pro)
- Can be toggled off in Settings → System → "Show flow bar at all times" [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
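
A minimal sketch of the Flow Bar window, assuming an AppKit replica. The panel settings follow the implementation table later in this document (`.floating` level, borderless, `.canJoinAllSpaces`); `.nonactivatingPanel`, `.fullScreenAuxiliary`, the 120×32 pt size and the 0.8 idle alpha are assumptions added here so the pill never steals focus from the field being dictated into and also shows over full-screen apps. Only the frame math is platform-independent.

```swift
import Foundation
#if os(macOS)
import AppKit
#endif

/// Pure geometry: center a pill of `size` horizontally, a few points above the
/// bottom of the screen's visible frame (which already excludes the Dock).
func flowBarFrame(in visibleFrame: CGRect, size: CGSize, bottomInset: CGFloat = 12) -> CGRect {
    CGRect(x: visibleFrame.midX - size.width / 2,
           y: visibleFrame.minY + bottomInset,
           width: size.width,
           height: size.height)
}

#if os(macOS)
/// Borderless, non-activating panel that floats above normal windows and
/// appears on every Space. Keep a strong reference to the returned panel.
func makeFlowBarPanel(size: CGSize = CGSize(width: 120, height: 32)) -> NSPanel? {
    guard let screen = NSScreen.main else { return nil }
    let panel = NSPanel(contentRect: flowBarFrame(in: screen.visibleFrame, size: size),
                        styleMask: [.borderless, .nonactivatingPanel],
                        backing: .buffered,
                        defer: false)
    panel.level = .floating
    panel.collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]
    panel.isOpaque = false
    panel.backgroundColor = .clear
    panel.hidesOnDeactivate = false   // panels hide when their app deactivates by default
    panel.alphaValue = 0.8            // semi-transparent while idle
    panel.orderFrontRegardless()
    return panel
}
#endif
```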

### Menu Bar Icon

A small icon in the macOS menu bar (top-right area) gives quick access to the Hub (main settings window). [willowvoice](https://willowvoice.com/blog/super-whisper-vs-wispr-flow-comparison-reviews-and-alternatives-in-2025)

## Core Dictation UX (Push-to-Talk)

This is the primary experience: **hold a key, speak, release, and the text appears**. [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

1. **Click into any text field** in any app (Slack, Gmail, Notes, Notion, Cursor, browser, etc.) [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
2. **Hold the Fn key** (default hotkey) [docs.wisprflow](https://docs.wisprflow.ai/articles/6409258247-starting-your-first-dictation)
3. You hear an **audio ping** and the Flow Bar animates with **white moving bars** (a waveform), meaning the mic is live [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
4. **Speak naturally**: ramble, use filler words, change your mind mid-sentence [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
5. **Release the Fn key** [docs.wisprflow](https://docs.wisprflow.ai/articles/6409258247-starting-your-first-dictation)
6. Flow processes your speech (quoted at 200 ms to start and 450 ms to paste in Instant Mode) [wisprflow](https://wisprflow.ai/post/top-10-dictation-tools-december-2025)
7. **Formatted, clean text appears** exactly where your cursor was, with filler words removed, punctuation added, and paragraphs created as needed [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

If you only **tap** the hotkey instead of holding it, a prompt reminds you to hold it down. Press **Esc** while dictating to cancel: nothing is pasted, but the transcript is still saved in Recent Activity. [docs.wisprflow](https://docs.wisprflow.ai/articles/6409258247-starting-your-first-dictation)
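
The hold, tap and Esc rules above form a small state machine. This sketch is hypothetical: the type names and the 0.3 s tap cutoff are assumptions, not Wispr values. It takes timestamps as parameters so it can be driven by a real key listener or by tests.

```swift
import Foundation

enum PTTState: Equatable { case idle, recording(since: TimeInterval) }

enum PTTAction: Equatable {
    case startRecording       // play the ping, animate the Flow Bar, open the mic
    case stopAndPaste         // close the mic, transcribe, clean up, paste at the cursor
    case showHoldReminder     // key was only tapped: discard the clip, show the reminder
    case cancelKeepInHistory  // Esc: paste nothing, but still log to Recent Activity
    case ignored
}

/// Hypothetical push-to-talk controller; `minimumHold` is an assumed tap/hold cutoff.
struct PushToTalk {
    var minimumHold: TimeInterval = 0.3
    private(set) var state: PTTState = .idle

    mutating func fnDown(at time: TimeInterval) -> PTTAction {
        guard state == .idle else { return .ignored }   // ignore repeats while held
        state = .recording(since: time)
        return .startRecording
    }

    mutating func fnUp(at time: TimeInterval) -> PTTAction {
        guard case .recording(let start) = state else { return .ignored }
        state = .idle
        return time - start < minimumHold ? .showHoldReminder : .stopAndPaste
    }

    mutating func escape() -> PTTAction {
        guard case .recording(_) = state else { return .ignored }
        state = .idle
        return .cancelKeepInHistory
    }
}
```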

## Hands-Free Dictation

For longer dictation without holding a key: [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

1. Press **Fn + Space** → hands-free mode activates
2. An **X icon** and a **stop button (circle with square)** appear on the Flow Bar [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
3. The waveform animates to show the mic is active
4. **Speak freely** without holding any key
5. Press **Fn** once to stop and paste the transcript [docs.wisprflow](https://docs.wisprflow.ai/articles/6409258247-starting-your-first-dictation), or **click the stop button** on the Flow Bar [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
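
Both shortcuts hinge on detecting Fn. On macOS, Fn is a modifier, so it arrives as a `flagsChanged` event carrying the `CGEventFlags.maskSecondaryFn` bit rather than as `keyDown`/`keyUp`; Fn + Space shows up as a Space `keyDown` with that bit still set. A minimal sketch, assuming a Quartz event tap: the key codes are the Carbon `kVK_*` values, while `classify`, `HotkeyTap` and the two enums are hypothetical names.

```swift
import Foundation
#if os(macOS)
import CoreGraphics
#endif

// kVK_Function, kVK_Space, kVK_Escape (Carbon HIToolbox Events.h) and the Fn
// modifier bit (NX_SECONDARYFNMASK, CGEventFlags.maskSecondaryFn in Swift).
let fnKeyCode: Int64 = 0x3F
let spaceKeyCode: Int64 = 0x31
let escapeKeyCode: Int64 = 0x35
let secondaryFnMask: UInt64 = 0x800000

enum RawEventKind { case flagsChanged, keyDown }
enum HotkeyEvent: Equatable { case fnDown, fnUp, handsFreeToggle, escape, ignore }

/// Maps raw event fields to the actions the dictation UX cares about.
func classify(_ kind: RawEventKind, keyCode: Int64, flags: UInt64) -> HotkeyEvent {
    let fnHeld = (flags & secondaryFnMask) != 0
    switch kind {
    case .flagsChanged where keyCode == fnKeyCode:
        return fnHeld ? .fnDown : .fnUp
    case .keyDown where keyCode == spaceKeyCode && fnHeld:
        return .handsFreeToggle
    case .keyDown where keyCode == escapeKeyCode:
        return .escape
    default:
        return .ignore
    }
}

#if os(macOS)
/// Listen-only tap: it observes events but cannot swallow them, so Fn + Space still
/// types a space; an active tap (.defaultTap) returning nil would consume it.
/// tapCreate returns nil without the Accessibility or Input Monitoring permission.
final class HotkeyTap {
    let onEvent: (HotkeyEvent) -> Void
    init(onEvent: @escaping (HotkeyEvent) -> Void) { self.onEvent = onEvent }

    func start() -> Bool {
        let mask: CGEventMask = (1 << CGEventType.flagsChanged.rawValue) | (1 << CGEventType.keyDown.rawValue)
        let callback: CGEventTapCallBack = { _, type, event, refcon in
            let kind: RawEventKind
            switch type {
            case .flagsChanged: kind = .flagsChanged
            case .keyDown: kind = .keyDown
            default: return Unmanaged.passUnretained(event)   // e.g. tapDisabledByTimeout
            }
            if let refcon = refcon {
                let tap = Unmanaged<HotkeyTap>.fromOpaque(refcon).takeUnretainedValue()
                let code = event.getIntegerValueField(.keyboardEventKeycode)
                tap.onEvent(classify(kind, keyCode: code, flags: event.flags.rawValue))
            }
            return Unmanaged.passUnretained(event)
        }
        guard let port = CGEvent.tapCreate(tap: .cgSessionEventTap, place: .headInsertEventTap,
                                           options: .listenOnly, eventsOfInterest: mask,
                                           callback: callback,
                                           userInfo: Unmanaged.passUnretained(self).toOpaque())
        else { return false }
        let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, port, 0)
        CFRunLoopAddSource(CFRunLoopGetMain(), source, .commonModes)
        CGEvent.tapEnable(tap: port, enable: true)
        return true
    }
}
#endif
```

The caller must keep a strong reference to the `HotkeyTap`, because the tap only holds an unretained pointer to it. On Macs with a Globe key, the system may also claim Fn presses for its own action, depending on the keyboard setting.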

## Flow Bar Click Dictation

For people who prefer the mouse to the keyboard: [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

1. **Click** the Flow Bar bubble → recording starts
2. Hovering over it shows tooltip instructions
3. **Click again** → recording stops and the text is pasted
4. Same behavior as push-to-talk, just mouse-driven

## What the Text Looks Like

Flow doesn't just dump raw speech. Examples from the tutorial: [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

| What you say | What Flow types |
|--------------|-----------------|
| "Let's meet tomorrow. Actually, let's do Friday instead." | "Let's meet Friday." (one clean sentence) |
| "Grocery list, apples, pears, bananas, strawberries" | A bulleted list with a colon-terminated header and one bullet per item |
| "um so like I was thinking maybe we could uh do that thing" | A clean sentence with the filler removed |
| Mid-sentence corrections | The sentence is reshaped so it reads cleanly [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ) |
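
Flow's cleanup is model-driven: turning "Let's meet tomorrow. Actually, let's do Friday instead." into one sentence needs semantic rewriting, not token filtering. A replica may still want a cheap local pass before (or instead of) a model. The sketch below is a deliberately naive stand-in, not Flow's method: it drops a hypothetical list of unambiguous filler tokens and re-capitalizes, and it leaves words like "like" and "so" alone because removing them blindly would damage real sentences.

```swift
import Foundation

/// Hypothetical filler list; ambiguous words ("like", "so") are intentionally excluded.
let fillerTokens: Set<String> = ["um", "uh", "uhm", "erm", "hmm"]

/// Naive local pre-pass: drop standalone filler tokens, then capitalize the first letter.
/// Self-corrections and reformatting are out of scope for a token filter.
func stripFillers(_ raw: String) -> String {
    let kept = raw.split(separator: " ").filter { token in
        let bare = token.lowercased().trimmingCharacters(in: .punctuationCharacters)
        return !fillerTokens.contains(bare)
    }
    let text = kept.joined(separator: " ")
    return String(text.prefix(1)).uppercased() + String(text.dropFirst())
}
```

On the third example from the table it returns "So like I was thinking maybe we could do that thing", which shows how far a token filter falls short of what Flow produces.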

Flow also **adapts tone per app**: formal in Gmail, casual in iMessage, technical in Cursor. [wisprflow](https://wisprflow.ai/post/wispr-flow-for-seamless-communication)

## The Hub (Main App Window)

Opened from the menu bar icon, the Hub contains these sections: [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

### Home

- **Today**: recent dictation history with full transcripts [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- **Stats**: total words dictated and words per minute (WPM) [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- Quick tips (e.g., a reminder of the hands-free shortcut)
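
The Stats card needs only two running totals. A hypothetical sketch: the `DictationRecord` shape is invented for illustration, and dividing by mic-live time (rather than wall-clock time) is an assumption about how WPM is measured.

```swift
import Foundation

/// One finished dictation as it might be logged for the Today list (hypothetical shape).
struct DictationRecord {
    let text: String
    let duration: TimeInterval   // seconds the mic was live
}

/// Total words, and WPM over speaking time only (assumed formula).
func dictationStats(_ records: [DictationRecord]) -> (totalWords: Int, wpm: Double) {
    let words = records.reduce(0) { total, record in
        total + record.text.split(whereSeparator: { $0.isWhitespace }).count
    }
    let seconds = records.reduce(0.0) { total, record in total + record.duration }
    let wpm = seconds > 0 ? Double(words) / (seconds / 60) : 0
    return (words, wpm)
}
```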

### Dictionary

- Add custom words, names, and jargon [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- **Misspelling correction**: for example, you say "CJ" but want it typed as "C-J" with a dash; add the correction once and it applies from then on [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- **Auto-add to Dictionary** option: Flow watches your corrections and learns them automatically [ppl-ai-file-upload.s3.amazonaws](https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/attachments/images/160267873/a090ff18-8e94-4931-a7cb-616097500514/image.jpg)
- Holds up to 800 words/phrases [wisprflow](https://wisprflow.ai/post/top-10-dictation-tools-december-2025)
- Shareable with your team [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
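
One way a replica could apply corrections such as "CJ" → "C-J" is a whole-word, case-insensitive replacement pass over the transcript before pasting. This is a sketch under that assumption: `CustomDictionary` is a hypothetical type, and only the 800-entry cap comes from the list above. Patterns and templates are escaped with Foundation's `NSRegularExpression` helpers so entries are matched literally.

```swift
import Foundation

/// Hypothetical custom dictionary applied to transcripts before pasting.
struct CustomDictionary {
    static let maxEntries = 800
    private(set) var corrections: [(heard: String, written: String)] = []

    /// Returns false once the cap is reached.
    @discardableResult
    mutating func add(heard: String, written: String) -> Bool {
        guard corrections.count < Self.maxEntries else { return false }
        corrections.append((heard: heard, written: written))
        return true
    }

    /// Whole-word, case-insensitive replacement. Entries that begin or end with
    /// punctuation (e.g. "C++") would need different boundary rules than \b.
    func apply(to text: String) -> String {
        var result = text
        for (heard, written) in corrections {
            let pattern = "\\b" + NSRegularExpression.escapedPattern(for: heard) + "\\b"
            guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
            else { continue }
            let range = NSRange(result.startIndex..., in: result)
            result = regex.stringByReplacingMatches(
                in: result, options: [], range: range,
                withTemplate: NSRegularExpression.escapedTemplate(for: written))
        }
        return result
    }
}
```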

### Snippets

- Saved text shortcuts expanded by voice [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- Say a trigger phrase → Flow expands it to the full saved text
- Great for email signatures, links, addresses, bios, and boilerplate [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
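
Snippets can be modeled as a lookup keyed by a normalized trigger phrase. A minimal sketch, assuming the whole utterance must match the trigger; whether Flow also expands triggers mid-sentence isn't stated here, and `SnippetStore` is a hypothetical name.

```swift
import Foundation

/// Hypothetical snippet store: if the whole utterance matches a saved trigger
/// (ignoring case and surrounding whitespace/punctuation), paste the saved text.
struct SnippetStore {
    private var snippets: [String: String] = [:]

    private static func normalize(_ phrase: String) -> String {
        phrase.lowercased()
            .trimmingCharacters(in: CharacterSet.whitespacesAndNewlines.union(.punctuationCharacters))
    }

    mutating func save(trigger: String, expansion: String) {
        snippets[Self.normalize(trigger)] = expansion
    }

    /// Returns the expansion for a matching trigger, otherwise the utterance unchanged.
    func expand(_ utterance: String) -> String {
        snippets[Self.normalize(utterance)] ?? utterance
    }
}
```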

### Styles

- Control how Flow writes and formats your words [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- Set different tones for different apps/tasks
- Keeps writing style consistent across tools
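
Per-app styles reduce to a lookup keyed by the frontmost app, using the `NSWorkspace.shared.frontmostApplication?.localizedName` call from the implementation table. The app names and style presets below are hypothetical, chosen to echo the Gmail/iMessage/Cursor examples; Gmail runs inside a browser, so a name lookup alone cannot tell it apart from other tabs.

```swift
import Foundation
#if os(macOS)
import AppKit
#endif

/// Hypothetical style presets.
enum WritingStyle { case formal, casual, technical, neutral }

/// Per-app overrides keyed by the frontmost app's localized name.
struct StylePolicy {
    var perApp: [String: WritingStyle] = [
        "Mail": .formal,        // native mail client; browser Gmail needs tab/URL context
        "Messages": .casual,    // iMessage
        "Cursor": .technical,
    ]
    var fallback: WritingStyle = .neutral

    func style(forApp name: String?) -> WritingStyle {
        guard let name = name else { return fallback }
        return perApp[name] ?? fallback
    }
}

#if os(macOS)
func currentStyle(_ policy: StylePolicy) -> WritingStyle {
    policy.style(forApp: NSWorkspace.shared.frontmostApplication?.localizedName)
}
#endif
```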

### Notes

- Built-in quick notes: press the hotkey, speak a note, and it's saved [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- Recent notes are shown at the bottom
- Notes sync across desktop and mobile [ppl-ai-file-upload.s3.amazonaws](https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/attachments/images/160267873/a090ff18-8e94-4931-a7cb-616097500514/image.jpg)

### Settings

Organized into tabs: [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

| Tab | Options |
|-----|---------|
| **General** | Change the push-to-talk hotkey, hands-free shortcut, command mode shortcut, and paste-last-transcript shortcut; Reset to defaults button [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ) |
| **System** | Launch at login (on/off), Show flow bar at all times, Show app in Dock, Dictation sound effects (audio ping), **Mute music while dictating** (ducks music, then restores it) [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ) |
| **Vibe** | Writing style/tone preferences [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ) |
| **Coding** | Developer-specific settings [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ) |
| **Experiment** | Auto-add to dictionary, smart formatting, email signature, creator mode [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ) |
| **Account** | Login, plan management, reset app [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ) |
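
The General and System tabs amount to a small persisted settings model. A sketch assuming JSON stored in `UserDefaults`; the key name and every default value other than the Fn and Fn + Space hotkeys are assumptions.

```swift
import Foundation

/// Hypothetical settings model for the General and System tabs.
struct AppSettings: Codable, Equatable {
    var pushToTalkHotkey = "Fn"
    var handsFreeHotkey = "Fn+Space"
    var launchAtLogin = true
    var showFlowBarAlways = true
    var showInDock = false
    var soundEffects = true
    var muteMusicWhileDictating = false

    static let defaultsKey = "appSettings"   // hypothetical UserDefaults key

    /// Synthesized Decodable requires every key, so a blob saved by an older
    /// version (missing a newer field) falls back to defaults here.
    static func load(from store: UserDefaults = .standard) -> AppSettings {
        guard let data = store.data(forKey: defaultsKey),
              let decoded = try? JSONDecoder().decode(AppSettings.self, from: data)
        else { return AppSettings() }
        return decoded
    }

    func save(to store: UserDefaults = .standard) {
        if let data = try? JSONEncoder().encode(self) {
            store.set(data, forKey: Self.defaultsKey)
        }
    }

    /// "Reset to defaults": drop the stored blob so `load` returns `AppSettings()`.
    static func reset(in store: UserDefaults = .standard) {
        store.removeObject(forKey: defaultsKey)
    }
}
```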

## Command Mode (Pro Feature)

For editing existing text with your voice: [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

- Activate with a separate hotkey
- Speak instructions like "make this more formal", "use lowercase in iMessage", or "break into paragraphs" [wisprflow](https://wisprflow.ai/post/wispr-flow-for-seamless-communication)
- Flow reads the surrounding text via the Accessibility API and applies the edit [ppl-ai-file-upload.s3.amazonaws](https://ppl-ai-file-upload.s3.amazonaws.com/web/direct-files/attachments/images/160267873/a090ff18-8e94-4931-a7cb-616097500514/image.jpg)
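
Command mode needs the focused element's text and the current selection. The Accessibility calls below (`AXUIElementCreateSystemWide`, `AXUIElementCopyAttributeValue`, `AXValueGetValue`) are real; the `contextWindow` helper and the idea of trimming the text to a fixed radius around the selection are assumptions about how a replica might limit what it sends for rewriting. The sketch assumes AX selection ranges use `NSString`-style UTF-16 offsets, hence `NSRange`.

```swift
import Foundation
#if os(macOS)
import ApplicationServices
#endif

/// Hypothetical helper: up to `radius` UTF-16 units on each side of the selection.
func contextWindow(in text: String, selection: NSRange, radius: Int)
    -> (before: String, selected: String, after: String) {
    let ns = NSString(string: text)
    let lo = max(0, min(selection.location, ns.length))
    let hi = max(lo, min(selection.location + selection.length, ns.length))
    let start = max(0, lo - radius)
    let end = min(ns.length, hi + radius)
    return (ns.substring(with: NSRange(location: start, length: lo - start)),
            ns.substring(with: NSRange(location: lo, length: hi - lo)),
            ns.substring(with: NSRange(location: hi, length: end - hi)))
}

#if os(macOS)
/// Reads the focused element's text and selection. Many apps expose these
/// attributes, but not all do, so callers must handle nil.
func focusedTextAndSelection() -> (text: String, selection: NSRange)? {
    let systemWide = AXUIElementCreateSystemWide()
    var focusedRef: CFTypeRef?
    guard AXUIElementCopyAttributeValue(systemWide, kAXFocusedUIElementAttribute as CFString,
                                        &focusedRef) == .success,
          let focused = focusedRef else { return nil }
    let element = focused as! AXUIElement

    var valueRef: CFTypeRef?
    var rangeRef: CFTypeRef?
    guard AXUIElementCopyAttributeValue(element, kAXValueAttribute as CFString, &valueRef) == .success,
          let text = valueRef as? String,
          AXUIElementCopyAttributeValue(element, kAXSelectedTextRangeAttribute as CFString,
                                        &rangeRef) == .success,
          let axRange = rangeRef else { return nil }

    var cfRange = CFRange()
    guard AXValueGetValue(axRange as! AXValue, .cfRange, &cfRange) else { return nil }
    return (text, NSRange(location: cfRange.location, length: cfRange.length))
}
#endif
```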

## How It Stays Running

The app persistence model to replicate: [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)

- **Agent app**: runs without a Dock icon by default (togglable) [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- **Launch at login**: auto-starts with macOS [youtube](https://www.youtube.com/watch?v=qlg3p5HdXRQ)
- **Global hotkey listener**: always watching for the Fn press via a `CGEvent` tap, even when the app has no window [docs.wisprflow](https://docs.wisprflow.ai/articles/6409258247-starting-your-first-dictation)
- **Near-zero CPU when idle**: only activates on hotkey press [reddit](https://www.reddit.com/r/macapps/comments/1nzi52c/wispr_flow_managed_to_get_their_shit_together/)
- **No background audio recording**: the mic is active only while the hotkey is held or hands-free mode is on [reddit](https://www.reddit.com/r/macapps/comments/1nzi52c/wispr_flow_managed_to_get_their_shit_together/)
- **Flow Bar**: an always-on-top overlay showing that the app is ready [roadmap.wisprflow](https://roadmap.wisprflow.ai/changelog/pointup2-new-flow-bar-flow-pro)
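
Launch at login and the Dock toggle map onto `SMAppService` (macOS 13+) and `NSApplication.setActivationPolicy(_:)`. A sketch under those assumptions: `LoginItemStatus` is a hypothetical mirror of `SMAppService.Status` so the decision logic can be tested off-Mac, and the `requiresApproval` branch reflects that the user may have to approve the login item in System Settings.

```swift
import Foundation
#if os(macOS)
import AppKit
import ServiceManagement
#endif

enum LoginItemStatus { case enabled, notRegistered, requiresApproval, notFound }
enum LoginItemAction: Equatable { case register, unregister, openLoginItemsSettings, nothing }

/// Decide what to do when the user flips "Launch at login".
func loginItemAction(wantEnabled: Bool, status: LoginItemStatus) -> LoginItemAction {
    switch (wantEnabled, status) {
    case (true, .enabled), (false, .notRegistered), (false, .notFound):
        return .nothing
    case (true, .requiresApproval):
        return .openLoginItemsSettings   // registered, but the user must approve it
    case (true, _):
        return .register
    case (false, _):
        return .unregister
    }
}

#if os(macOS)
@available(macOS 13.0, *)
func syncLaunchAtLogin(_ wantEnabled: Bool) throws {
    let service = SMAppService.mainApp
    let status: LoginItemStatus
    switch service.status {
    case .enabled: status = .enabled
    case .requiresApproval: status = .requiresApproval
    case .notRegistered: status = .notRegistered
    case .notFound: status = .notFound
    @unknown default: status = .notFound
    }
    switch loginItemAction(wantEnabled: wantEnabled, status: status) {
    case .register: try service.register()
    case .unregister: try service.unregister()
    case .openLoginItemsSettings: SMAppService.openSystemSettingsLoginItems()
    case .nothing: break
    }
}

/// With LSUIElement = YES the app launches as an agent (no Dock icon);
/// the "Show app in Dock" setting can switch the policy at runtime.
func applyDockVisibility(showInDock: Bool) {
    NSApp.setActivationPolicy(showInDock ? .regular : .accessory)
}
#endif
```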

## Technical Implementation Summary for RunAnywhere macOS

| UX Element | Implementation |
|------------|----------------|
| **Flow Bar** | `NSPanel` with `.floating` level, `.borderless` style, `.canJoinAllSpaces` collection behavior |
| **Global Fn hotkey** | `CGEvent.tapCreate` listening for `.flagsChanged` (Fn is a modifier, so check `.maskSecondaryFn`), plus `.keyDown` for Space and Esc |
| **Audio ping on start** | `AVAudioPlayer` playing a short sound file when Fn goes down |
| **Waveform animation** | Core Animation or SwiftUI animation on the Flow Bar showing audio levels from `AVAudioEngine.inputNode.installTap` |
| **Text insertion** | Copy to `NSPasteboard.general`, then simulate `Cmd+V` via `CGEvent(keyboardEventSource:virtualKey:keyDown:)` |
| **Context awareness** | `AXUIElementCopyAttributeValue` on the focused element to read surrounding text |
| **App name detection** | `NSWorkspace.shared.frontmostApplication?.localizedName` |
| **Mute music** | `MediaRemote` private framework, or AppleScript to pause/play Music or Spotify |
| **Launch at login** | `SMAppService.mainApp.register()` (modern) or `SMLoginItemSetEnabled` (legacy) |
| **Agent mode (no Dock)** | `LSUIElement = YES` in Info.plist |
| **Menu bar** | `NSStatusBar.system.statusItem(withLength:)` |
| **Settings/Hub** | Standard `NSWindow` shown on status item click |
| **Recent Activity** | Local SQLite or Core Data store of all transcriptions |
| **Dictionary/Snippets** | JSON or plist in an App Group container, synced via CloudKit if needed |

The key difference for your build: instead of streaming audio to the cloud, you run **Whisper on-device (via Sherpa-ONNX)** in the same process. On your M3 Max, Whisper Tiny should return results in under 500 ms, which is competitive with Wispr's cloud latency. [wisprflow](https://wisprflow.ai/post/top-10-dictation-tools-december-2025)
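
The **Text insertion** row in the implementation table works by borrowing the clipboard. A sketch of that approach, assuming the replica restores the user's previous clipboard text afterwards (a choice the plan does not mention): the `Clipboard` protocol is a hypothetical seam so the save, paste and restore order can be tested, and the macOS side uses `NSPasteboard` plus a synthetic Cmd+V (`kVK_ANSI_V` = 0x09) posted with `CGEvent`.

```swift
import Foundation
#if os(macOS)
import AppKit
#endif

/// Hypothetical seam over the system clipboard (plain text only).
protocol Clipboard: AnyObject {
    func readString() -> String?
    func writeString(_ string: String)
}

/// Puts `text` on the clipboard, fires the paste keystroke, and returns a
/// closure that restores the previous plain-text contents. Run it after a
/// short delay so the target app has time to read the pasteboard.
func pasteViaClipboard(_ text: String, clipboard: Clipboard, sendPaste: () -> Void) -> () -> Void {
    let previous = clipboard.readString()
    clipboard.writeString(text)
    sendPaste()
    return {
        if let previous = previous { clipboard.writeString(previous) }
    }
}

#if os(macOS)
final class SystemClipboard: Clipboard {
    func readString() -> String? { NSPasteboard.general.string(forType: .string) }
    func writeString(_ string: String) {
        NSPasteboard.general.clearContents()
        NSPasteboard.general.setString(string, forType: .string)
    }
}

/// Posts Cmd+V (kVK_ANSI_V = 0x09). Requires the Accessibility permission.
func postCommandV() {
    let source = CGEventSource(stateID: .combinedSessionState)
    for isDown in [true, false] {
        let event = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: isDown)
        event?.flags = .maskCommand
        event?.post(tap: .cghidEventTap)
    }
}

// Usage (macOS):
// let restore = pasteViaClipboard("Let's meet Friday.", clipboard: SystemClipboard(), sendPaste: postCommandV)
// DispatchQueue.main.asyncAfter(deadline: .now() + 0.5, execute: restore)
#endif
```

Only plain text is saved and restored here, so images or rich text on the user's clipboard would be lost; the 0.5 s restore delay is a guess that slower apps may need to exceed.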

examples/ios/RunAnywhereAI/RunAnywhereAI/Features/VoiceKeyboard/DictationActivityAttributes.swift

Lines changed: 5 additions & 0 deletions
@@ -9,11 +9,15 @@
 // The extension gets this file via the manual pbxproj wiring (same pattern as SharedConstants.swift).
 //
 
+#if os(iOS)
 import ActivityKit
+#endif
 import Foundation
 
 /// The static attributes for a dictation flow session.
 /// These do not change after the activity is started.
+#if os(iOS)
+@available(iOS 16.1, *)
 struct DictationActivityAttributes: ActivityAttributes {
 
     /// The dynamic / live state updated throughout the session.
@@ -31,3 +35,4 @@ struct DictationActivityAttributes: ActivityAttributes {
     /// Session identifier — set once at start
     var sessionId: String
 }
+#endif

sdk/runanywhere-commons/cmake/FetchONNXRuntime.cmake

Lines changed: 39 additions & 6 deletions
@@ -69,12 +69,10 @@ elseif(IOS OR CMAKE_SYSTEM_NAME STREQUAL "iOS")
     add_library(onnxruntime STATIC IMPORTED GLOBAL)
 
     # Determine architecture-specific library path
-    if(CMAKE_OSX_SYSROOT MATCHES "simulator")
-        if(CMAKE_OSX_ARCHITECTURES MATCHES "arm64")
-            set(ONNX_FRAMEWORK_ARCH "ios-arm64_x86_64-simulator")
-        else()
-            set(ONNX_FRAMEWORK_ARCH "ios-arm64_x86_64-simulator")
-        endif()
+    # Check both CMAKE_OSX_SYSROOT (case-insensitive) and IOS_PLATFORM from ios.toolchain.cmake
+    string(TOLOWER "${CMAKE_OSX_SYSROOT}" _sysroot_lower)
+    if(_sysroot_lower MATCHES "simulator" OR (DEFINED IOS_PLATFORM AND IOS_PLATFORM MATCHES "SIMULATOR"))
+        set(ONNX_FRAMEWORK_ARCH "ios-arm64_x86_64-simulator")
     else()
         set(ONNX_FRAMEWORK_ARCH "ios-arm64")
     endif()
@@ -138,6 +136,41 @@ elseif(ANDROID)
     )
     target_include_directories(onnxruntime INTERFACE "${ONNX_HEADER_PATH}")
 
+    # Sherpa-ONNX Android prebuilts only ship the C API header.
+    # The ONNX C++ API headers (onnxruntime_cxx_api.h etc.) are header-only
+    # wrappers needed by wakeword_onnx.cpp. Download them if missing.
+    if(NOT EXISTS "${ONNX_HEADER_PATH}/onnxruntime_cxx_api.h")
+        set(ONNX_CXX_HEADER_DIR "${CMAKE_BINARY_DIR}/_deps/onnxruntime-cxx-headers")
+        file(MAKE_DIRECTORY "${ONNX_CXX_HEADER_DIR}")
+
+        set(ONNX_HEADER_BASE_URL "https://raw.githubusercontent.com/microsoft/onnxruntime/v${ONNX_VERSION_ANDROID}/include/onnxruntime/core/session")
+        set(ONNX_CXX_HEADERS
+            onnxruntime_cxx_api.h
+            onnxruntime_cxx_inline.h
+            onnxruntime_float16.h
+            onnxruntime_session_options_config_keys.h
+            onnxruntime_run_options_config_keys.h
+        )
+
+        foreach(header ${ONNX_CXX_HEADERS})
+            if(NOT EXISTS "${ONNX_CXX_HEADER_DIR}/${header}")
+                message(STATUS "Downloading ONNX C++ header: ${header}")
+                file(DOWNLOAD
+                    "${ONNX_HEADER_BASE_URL}/${header}"
+                    "${ONNX_CXX_HEADER_DIR}/${header}"
+                    STATUS download_status
+                )
+                list(GET download_status 0 download_code)
+                if(NOT download_code EQUAL 0)
+                    message(WARNING "Failed to download ${header} (status: ${download_status})")
+                endif()
+            endif()
+        endforeach()
+
+        target_include_directories(onnxruntime INTERFACE "${ONNX_CXX_HEADER_DIR}")
+        message(STATUS "ONNX Runtime C++ headers: ${ONNX_CXX_HEADER_DIR}")
+    endif()
+
     message(STATUS "ONNX Runtime Android library: ${ONNX_LIB_PATH}")
     message(STATUS "ONNX Runtime Android headers: ${ONNX_HEADER_PATH}")
 else()

sdk/runanywhere-commons/src/backends/onnx/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -45,7 +45,8 @@ if(RAC_PLATFORM_IOS)
     message(STATUS "Found Sherpa-ONNX xcframework at: ${SHERPA_ONNX_ROOT}")
     set(SHERPA_ONNX_AVAILABLE ON)
 
-    if(CMAKE_OSX_SYSROOT MATCHES "simulator")
+    string(TOLOWER "${CMAKE_OSX_SYSROOT}" _sherpa_sysroot_lower)
+    if(_sherpa_sysroot_lower MATCHES "simulator" OR (DEFINED IOS_PLATFORM AND IOS_PLATFORM MATCHES "SIMULATOR"))
         set(SHERPA_ARCH "ios-arm64_x86_64-simulator")
     else()
         set(SHERPA_ARCH "ios-arm64")

sdk/runanywhere-commons/src/backends/onnx/rac_backend_onnx_register.cpp

Lines changed: 2 additions & 1 deletion
@@ -243,7 +243,8 @@ rac_bool_t onnx_stt_can_handle(const rac_service_request_t* request, void* user_
 
     if (strstr(path, "whisper") != nullptr || strstr(path, "zipformer") != nullptr ||
         strstr(path, "paraformer") != nullptr || strstr(path, "parakeet") != nullptr ||
-        strstr(path, "nemo") != nullptr || strstr(path, ".onnx") != nullptr) {
+        strstr(path, "nemo") != nullptr || strstr(path, "moonshine") != nullptr ||
+        strstr(path, ".onnx") != nullptr) {
         RAC_LOG_INFO(LOG_CAT, "onnx_stt_can_handle: path matches -> TRUE");
         return RAC_TRUE;
     }

sdk/runanywhere-commons/src/features/stt/rac_stt_service.cpp

Lines changed: 14 additions & 0 deletions
@@ -146,6 +146,20 @@ void rac_stt_result_free(rac_stt_result_t* result) {
         free(result->text);
         result->text = nullptr;
     }
+    if (result->detected_language) {
+        free(result->detected_language);
+        result->detected_language = nullptr;
+    }
+    if (result->words) {
+        for (size_t i = 0; i < result->num_words; i++) {
+            if (result->words[i].text) {
+                free(const_cast<char*>(result->words[i].text));
+            }
+        }
+        free(result->words);
+        result->words = nullptr;
+        result->num_words = 0;
+    }
 }
 
 } // extern "C"
