feat: add text replacements feature #455
base: main
Introduces a new 'replacements' setting for accent-insensitive, punctuation-aware, and capitalization-controlled text substitutions in transcriptions. Adds UI for managing replacements, import/export functionality, backend support, and updates settings/types to support the feature.
(?i)ouvr(ir|ez|e) la parenthèse
This will match: "ouvrez la parenthèse", "Ouvre la parenthèse", "OUVRIR LA PARENTHÈSE"
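In the pattern above, (?i) handles case and the alternation handles verb endings, while accents are handled separately. Here is a minimal sketch of one way to build an accent-insensitive pattern. It assumes each base letter is expanded into a character class of its accented forms before the string is compiled by a regex engine such as the regex crate. `accent_insensitive_pattern` and the variant table are hypothetical names, not the PR's implementation:

```rust
/// Characters with special meaning in regex syntax; escaped when they appear literally.
const REGEX_META: &[char] = &[
    '\\', '.', '+', '*', '?', '(', ')', '|', '[', ']', '{', '}', '^', '$', '#', '&', '-', '~',
];

/// Hypothetical variant table: each lowercase base letter maps to a character class
/// that also accepts its common accented forms. Uppercase forms are left to `(?i)`.
fn accent_class(c: char) -> Option<&'static str> {
    match c {
        'a' | 'à' | 'â' | 'ä' | 'á' => Some("[aàâäá]"),
        'e' | 'é' | 'è' | 'ê' | 'ë' => Some("[eéèêë]"),
        'i' | 'î' | 'ï' | 'í' => Some("[iîïí]"),
        'o' | 'ô' | 'ö' | 'ó' => Some("[oôöó]"),
        'u' | 'ù' | 'û' | 'ü' | 'ú' => Some("[uùûüú]"),
        'c' | 'ç' => Some("[cç]"),
        _ => None,
    }
}

/// Builds a case- and accent-insensitive pattern string from a plain phrase.
fn accent_insensitive_pattern(phrase: &str) -> String {
    let mut pattern = String::from("(?i)");
    for c in phrase.chars() {
        // Multi-char lowercase expansions are truncated to their first char in this sketch.
        let lower = c.to_lowercase().next().unwrap_or(c);
        if let Some(class) = accent_class(lower) {
            pattern.push_str(class);
        } else if REGEX_META.contains(&lower) {
            pattern.push('\\');
            pattern.push(lower);
        } else {
            pattern.push(lower);
        }
    }
    pattern
}
```

With this approach, "parenthese", "parenthèse" and "PARENTHÈSE" all produce the same pattern. A real table would need to cover more letters than this sketch does.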
Introduces magic tags (e.g., [lowercase], [date]) to the replacement system in both backend and frontend. The backend parses the tags and applies the corresponding transformations:
- [lowercase]: Converts the entire text to lowercase
- [uppercase]: Converts the entire text to uppercase
- [capitalize]: Capitalizes the first letter of each word
- [nospace]: Removes all spaces from the text
- [date]: Inserts current date (YYYY-MM-DD)
- [time]: Inserts current time (HH:MM)
The frontend provides tag autocomplete, tooltips, and visual indicators for magic tags in the replacements UI.
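Here is a rough sketch of how tags could be split out of a replacement and then applied. It assumes tags are removed from the replacement text and applied to the whole phrase afterwards. `parse_magic_tags`, `apply_transformation` and `capitalize_words` are hypothetical. [date], [time] and [run] are left out so the sketch needs no dependencies. The enum name mirrors `MagicTransformation` from the diff, but the variants shown here are illustrative:

```rust
/// Illustrative subset of the magic tags; [date], [time] and [run] are omitted.
#[derive(Debug, PartialEq, Clone, Copy)]
enum MagicTransformation {
    Lowercase,
    Uppercase,
    Capitalize,
    NoSpace,
}

/// Hypothetical table mapping literal tag text to its transformation.
const TAGS: [(&str, MagicTransformation); 4] = [
    ("[lowercase]", MagicTransformation::Lowercase),
    ("[uppercase]", MagicTransformation::Uppercase),
    ("[capitalize]", MagicTransformation::Capitalize),
    ("[nospace]", MagicTransformation::NoSpace),
];

/// Removes every known tag from `replacement` and returns the remaining literal
/// text plus the tags found (in table order, not in order of appearance).
fn parse_magic_tags(replacement: &str) -> (String, Vec<MagicTransformation>) {
    let mut literal = replacement.to_string();
    let mut found = Vec::new();
    for (tag, transformation) in TAGS {
        if literal.contains(tag) {
            literal = literal.replace(tag, "");
            found.push(transformation);
        }
    }
    (literal, found)
}

/// Uppercases the first character of each whitespace-separated word if it is a letter.
fn capitalize_words(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut at_word_start = true;
    for c in text.chars() {
        if at_word_start && c.is_alphabetic() {
            out.extend(c.to_uppercase());
        } else {
            out.push(c);
        }
        at_word_start = c.is_whitespace();
    }
    out
}

/// Applies one transformation to the whole phrase.
fn apply_transformation(text: &str, transformation: MagicTransformation) -> String {
    match transformation {
        MagicTransformation::Lowercase => text.to_lowercase(),
        MagicTransformation::Uppercase => text.to_uppercase(),
        MagicTransformation::Capitalize => capitalize_words(text),
        MagicTransformation::NoSpace => text.chars().filter(|c| *c != ' ').collect(),
    }
}
```

A rule like "transform in lowercase" → [lowercase] would then parse to an empty literal plus `Lowercase`, which is applied to the rest of the phrase.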
- You can now trim spaces and punctuation separately, before and/or after.
- Editing replacements no longer requires scrolling.
- Replacement items can now be cloned.
- Better sizing for long texts and large fields.
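To make the four trim toggles concrete, here is a sketch based on one interpretation: each flag strips whitespace or punctuation from one side of the inserted replacement. That interpretation is an assumption. `TrimOptions` and `join_with_trim` are hypothetical names, and "punctuation" is limited to ASCII punctuation here:

```rust
/// Hypothetical mirror of the four trim toggles in the form.
#[derive(Debug, Default, Clone, Copy)]
struct TrimOptions {
    spaces_before: bool,
    spaces_after: bool,
    punctuation_before: bool,
    punctuation_after: bool,
}

/// True if `c` should be removed, given which kinds of trimming are enabled.
fn is_trimmable(c: char, spaces: bool, punctuation: bool) -> bool {
    (spaces && c.is_whitespace()) || (punctuation && c.is_ascii_punctuation())
}

/// Inserts `replacement` between `before` and `after`, trimming the enabled
/// character kinds at each seam.
fn join_with_trim(before: &str, replacement: &str, after: &str, opts: TrimOptions) -> String {
    let before = before
        .trim_end_matches(|c: char| is_trimmable(c, opts.spaces_before, opts.punctuation_before));
    let after = after
        .trim_start_matches(|c: char| is_trimmable(c, opts.spaces_after, opts.punctuation_after));
    format!("{before}{replacement}{after}")
}
```

Under this reading, an "open parenthesis" rule would enable only `spaces_after`, and a "comma" rule would enable `spaces_before` and possibly `punctuation_before`.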
Introduces a global 'replacements_enabled' setting and per-replacement 'enabled' flags to control text replacements. Updates backend, bindings, and UI to support toggling replacements globally and individually, improving flexibility.
New [run] magic tag to run shell commands, passing the text as a parameter
Example:
[run]"C:\Windows\notepad.exe"
[run]"start cmd /k echo {text_nospace_nopunctuation}"
{text}: The original text.
{text_nospace}: The text without spaces.
{text_nopunctuation}: The text without punctuation.
{text_nospace_nopunctuation}: The text without spaces or punctuation.
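Here is a sketch of how these four placeholders could be expanded. It assumes "punctuation" means ASCII punctuation, and `expand_placeholders` is a hypothetical name. Substitution happens in a single left-to-right pass, so dictated text that itself contains something like {text} is not expanded a second time:

```rust
/// Removes ASCII punctuation only (a simplification for this sketch).
fn strip_punctuation(text: &str) -> String {
    text.chars().filter(|c| !c.is_ascii_punctuation()).collect()
}

/// Removes all whitespace.
fn strip_spaces(text: &str) -> String {
    text.chars().filter(|c| !c.is_whitespace()).collect()
}

/// Expands the documented placeholders in one pass over the template.
fn expand_placeholders(template: &str, text: &str) -> String {
    let no_space = strip_spaces(text);
    // Placeholder names and their values; every name ends with '}', so none is a prefix of another.
    let values = [
        ("{text_nospace_nopunctuation}", strip_punctuation(&no_space)),
        ("{text_nopunctuation}", strip_punctuation(text)),
        ("{text_nospace}", no_space.clone()),
        ("{text}", text.to_string()),
    ];
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    'scan: while let Some(c) = rest.chars().next() {
        for (name, value) in &values {
            if let Some(after) = rest.strip_prefix(*name) {
                out.push_str(value);
                rest = after;
                continue 'scan;
            }
        }
        // Not a placeholder: copy one character and move on.
        out.push(c);
        rest = &rest[c.len_utf8()..];
    }
    out
}
```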
Replaced the eye icon with a power on/off icon
Added English and French translations
Add Replacements keys to the German, Spanish, Japanese, Vietnamese and Chinese locale files. Add tab naming support.
I'm going to be honest: in my opinion this adds a lot of UI for a marginal feature. There is already a feature which does this, in addition to post-processing, so I'm not sure we need a third option as well. The magic commands are quite interesting, and I will have to take a deeper look soon. I'm not quite decided on this overall and will come back with more in-depth feedback. The UI is generally well laid out, but it seems to add a lot of complexity, which I'm not sure is a good tradeoff. Again, paging @dannysmith for an opinion.
Hello, just to say this feature is exactly what I need! It would be really great even just to be able to add new lines and other punctuation. Post-processing with an LLM doesn't really work for me. Thanks for creating it, and I really hope that it makes it into Handy.
@dannysmith maybe this is something like a "developer mode", or something like I suggested in #454? I'm just very skeptical of something this complex making it into the main UI for mainstream users. I've also not had a chance to review the code yet, so I'm not sure what the implementation looks like. It's possible a developer/power mode could open up extra keyboard shortcuts too. Paging @VirenMohindra for feedback as well.
I wonder if some standard users could benefit from a "standard" set of pre-set (punctuation) replacements, with more advanced users being able to change the configuration. For certain use cases Handy needs a lot of manual editing afterwards for punctuation - for instance writing sections of prose with speech - because there is no way to insert punctuation.
Perhaps, @jamaggs. I generally favor having good defaults that work for 90% of people out of the box. I haven't considered the prose use case much, but it makes a lot of sense. If you could list which ones would help you the most, that would be great. FWIW, I believe post-processing largely solves this for me; qwen3-a30b has generally been a good model in my experience. But it is a much more heavy-handed way of doing things.
Hello, I think that for out-of-the-box options, mimicking the replacements that Microsoft dictation has under the punctuation heading would be good. It probably also eases usability (I'm forced to use MS at work and others may be too, so the consistency is good).
If there is an option for power users to refine them, then great. For instance, a notable omission from the Microsoft list is the em dash, which is used quite a bit in prose. I am coming at this from the perspective of an English speaker, but it would probably be a different list in different languages.
Thanks for your first feedback :)
Do you mean on the user-facing side, or on the code/implementation side? From the user’s perspective, if it doesn't deserve a dedicated tab in the main settings menu, this UI could easily live behind a “Replacements” button inside the Post‑Processing tab. I deliberately tried to keep the default experience lightweight: when no rules exist, the form is essentially empty. The intent was to make something approachable. A user can start with just two fields: “word to replace” and “replacement” and ignore everything else.
To be honest I don't get this point. I’m not sure how Handy is typically used on your side, but in my own usage it’s hard to imagine using it without some form of replacement layer. Otherwise, users must manually edit nearly every sentence to add punctuation, line breaks, parentheses, etc. Yes, some LLMs can manage punctuation reasonably well, but:
When you say this, do you mean there is already a way to define explicit replacements today? If so, could you point me to where this is currently possible?
Alright 🙂. My feeling is that rule-based replacements are easier to grasp than LLM-based post‑processing, especially for non‑technical users.
Yes, and this is where I think a lot of the long‑term value lies. Magic commands open the door to many workflows, and they already offer a few possibilities, for example:
More importantly, they allow users to pipe their text into any external binary for custom processing. Offering this level of openness will almost certainly lead to unexpected and creative use‑cases from advanced users.
Do you really feel it is complex from the user’s point of view? I understand that the PR itself may look intimidating, but in practice the UI can be used at a very basic level with just two fields. Everything else is optional and intended for advanced usage. If the feature is primarily aimed at advanced users, alternatives could be:
Conceptually, this is simply post‑processing without an LLM; that's why I created a dedicated tab.
I’m confident that this is already true. I will provide an English punctuation preset soon. The current UI already makes it quite easy to create such rules (as shown in the video), and we could initially base them on Microsoft’s punctuation guidelines (even if they’re not always perfect imo). This would also allow you to concretely test the import/export feature 😁. About current post processing, I may be underestimating the current post‑processing feature, but for users who are not already familiar with LLMs, Handy does not yet really hold their hand:
In that context, replacements provide immediate value with almost zero friction, while also enabling things that LLMs simply do not offer. By the way, there is one additional idea I deliberately did not include yet: That said, this illustrates the broader idea: Magic Commands are meant to let users send their text through any kind of processing pipeline, very easily without hardcoding every possible use‑case into Handy itself.
Okay, thank you for a detailed response and also a genuine and interesting position. I think there are a lot of PRs that honestly are not always as well thought out. A lot are entirely AI slop, with a glimmer of an idea behind them but not necessarily a full rationale. So I am often coming at things from a defensive side, as the app's initial purpose is quite slim and there are a lot of outstanding issues that I typically prioritize over features.
mostly on the user-facing side. To be precise, it's around:
For me, Handy is used primarily for programming with LLMs or building up very specific context windows where spelling, punctuation, and precise language are not necessary because a human is not reading it. I do occasionally use it for other things, like typing parts of this message, but my speaking voice is also quite different from my written voice, so there are times when I intentionally choose one over the other. When I'm writing with my voice, I often have to do a lot of editing because my spoken word is a very conversational style that doesn't always lend itself well to reading. So minor errors are even more minor for me compared to the actual syntactic structure of what I'm saying.
I guess this is more the 'dictionary' rather than replacements itself. And upon reading your full message I'm honestly fine to drop/move that feature away from the primary modality potentially.
you are correct.
this is the main reason I support this feature and you overall. You are genuinely thinking, and getting me excited to read the code.
It's the UI point from above: "in practice the UI can be used at a very basic level". This is true, but again, it's about making the "basic level" the primary use case: not adding anything more than is explicitly necessary, and hiding advanced things in creative ways so power users can still do the things they want. I think the app still has much to grow in this way, as I'm still figuring everything out. I'm trying to find the best balance between people using the app without knowing a thing about computers and giving power users the tools they want (there's a reason, and more than one, that A LOT of stuff is in the debug menu). And to be honest with you, I'm the maintainer of a much bigger repo than I had ever imagined, in much less time than I expected. Every PR, GitHub issue, and discussion is still a learning moment for me, and this one is certainly in that category. I could address even more, but I just want you to know I support you and I appreciate the contributions you've made. I think there are some small things we can change, and I'll sleep on it and come back with more concrete feedback for the PR. I also still need to review the code myself and really see how things are working. I love the extensibility you're talking about and I'm intrigued. The standout piece of feedback is just minimizing the amount of UI seen immediately. I don't 100% know whether this is a sidebar feature or not; that might be another thing that needs to be tackled in discussion #449. I think a UI overhaul is long overdue, and I'd really like to get it to a place where we genuinely feel there is a 1.0.0 release at some point. We are not there yet, but as things get more stable, we will start to finalize everything that will go into that release.
@jamaggs, here is a first version of English punctuation rules: handy-replacements-english punctuation-v1.0.json. Just click the "Import" button and select the file to start using it. To illustrate what these replacement rules do, I made this English voice recording (it makes no sense and deliberately overuses punctuation). Then I fed this sound file into Handy (tested here with Parakeet v3): Without rules:
With rules imported:
@schmurtzm looks amazing! I really hope this feature makes it in so that I can make use of this.
Just expressing that I would also see big usability improvements in Handy with this feature. Then I really don't see anything else missing. What this pull request adds would replace lots of tedious manual corrections I had to make in the past.
yeah, I think let's try to do this; that's roughly what I was imagining as well
Totally understandable, and it's still appreciated and welcome! Hell, most of the code in the codebase is written by AI! But I think some uses of it are definitely more tasteful than others. I actually welcome AI-generated code, especially if a human has already reviewed it.
totally understand, and just know we are both in the same boat!
that's the goal :) As a heads up, it will probably take me a few days before I can give this a proper review.
@cjpais I tried to make the interface less overwhelming: by default I hide the options in an "Advanced Options" section and I revised the layout of these options to make it more conventional.
Put replacement options in an "Advanced Options" section for cleaner UI
that actually looks pretty neat
#[cfg(not(target_os = "windows"))]
{
    std::process::Command::new("sh")
        .arg("-c")
        .arg(&cmd_str)
        .spawn()
        .ok();
}
I believe the [run] magic tag allows arbitrary shell command execution. This is a pretty significant security risk. Some possible issues:
- voice input could trigger unintended commands (e.g. "run delete everything")
- no sanitization of {text} placeholders
- command injection is possible
- no user confirmation before execution
- .ok() silently swallows errors
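Here is a sketch of one mitigation for the injection and `.ok()` points. It assumes [run] could accept a program plus arguments instead of a single `sh -c` / `cmd /C` string. The dictated text is passed as its own argument, so no shell parses it, and spawn failures are reported instead of discarded. `build_command` and `spawn_logged` are hypothetical names. This does not help if the program being launched is itself a shell or a batch file, since those re-parse their arguments:

```rust
use std::process::{Child, Command};

/// Hypothetical helper: the dictated text becomes its own argv entry, so no shell
/// parses it and characters like `;`, `&&` or `$(...)` stay literal.
fn build_command(program: &str, fixed_args: &[&str], text: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.args(fixed_args).arg(text);
    cmd
}

/// Spawns the command and reports failures instead of discarding them with `.ok()`.
fn spawn_logged(mut cmd: Command) -> Option<Child> {
    match cmd.spawn() {
        Ok(child) => Some(child),
        Err(err) => {
            // A real implementation would likely use the app's logger and surface this in the UI.
            eprintln!("[run] failed to start {:?}: {err}", cmd.get_program());
            None
        }
    }
}
```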
if we want to add this we should look into a confirmation dialog before running commands (which seems to add a lot of friction and is the opposite of what this PR intends to do) or a separate setting to enable / disable [run] functionality
regardless, we should sanitize the input for placeholder values
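For the existing `sh -c` path, one standard way to sanitize a placeholder value is POSIX single-quoting. Inside single quotes the shell interprets nothing, so only the single quote itself needs handling, and it is rewritten as '\'' (close quote, escaped quote, reopen quote). Here is a minimal sketch with a hypothetical `sh_single_quote` helper. It does not cover `cmd.exe`, whose quoting rules are different:

```rust
/// Quotes a value for safe interpolation into a POSIX `sh -c` command string.
fn sh_single_quote(value: &str) -> String {
    let mut quoted = String::with_capacity(value.len() + 2);
    quoted.push('\'');
    for c in value.chars() {
        if c == '\'' {
            // End the quoted run, add an escaped quote, and start a new quoted run.
            quoted.push_str("'\\''");
        } else {
            quoted.push(c);
        }
    }
    quoted.push('\'');
    quoted
}
```

Each placeholder value would then be substituted as `sh_single_quote(&value)` rather than raw text.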
oooh yeah, we might want to put a pin in this for now; I think we can possibly pick this up in another PR. Largely this is something I'm curious about, but I would love for it to execute in a sandbox where that makes sense.
Obvious point: allowing this, even with sanitisation etc., introduces a future maintenance burden in ensuring new features don't accidentally introduce code that can exploit it.
VirenMohindra left a comment
I think unit tests are needed before we can land something of this magnitude. Some good examples are:
- for accent-insensitive pattern building
- magic tag parsing
- edge cases (empty text, overlapping matches)
we definitely need to add documentation AND a note in the UI warning users about the [run] command's security implications
let re = match regex::Regex::new(&search_pattern) {
    Ok(re) => re,
    Err(_) => continue, // Skip invalid regex
};
this would compile on every transcription; we should probably cache it in the Replacement struct, right?
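Here is a sketch of the caching idea using `std::sync::OnceLock`, so each pattern is compiled at most once and an invalid pattern is logged once instead of being skipped silently on every transcription. To keep the sketch runnable without external crates, `CompiledPattern` and `compile` are stand-ins for `regex::Regex` and `regex::Regex::new`. The trimmed-down `Replacement` struct is hypothetical. If the real struct is deserialized with serde, the cache field would presumably need `#[serde(skip)]`. Editing the search text would also need to reset the cache, for example by constructing a new `Replacement`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts calls to the stand-in compiler so the caching effect is observable.
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for `regex::Regex`.
#[allow(dead_code)]
#[derive(Debug)]
struct CompiledPattern(String);

/// Stand-in for `regex::Regex::new`: treats an empty pattern as invalid.
fn compile(pattern: &str) -> Result<CompiledPattern, String> {
    COMPILATIONS.fetch_add(1, Ordering::SeqCst);
    if pattern.is_empty() {
        Err("empty pattern".to_string())
    } else {
        Ok(CompiledPattern(pattern.to_string()))
    }
}

/// Hypothetical, trimmed-down Replacement with a lazily filled cache.
struct Replacement {
    search: String,
    /// Set on first use; an inner `None` records that compilation failed.
    compiled: OnceLock<Option<CompiledPattern>>,
}

impl Replacement {
    fn new(search: &str) -> Self {
        Self { search: search.to_string(), compiled: OnceLock::new() }
    }

    /// Compiles at most once per Replacement and logs an invalid pattern once.
    fn pattern(&self) -> Option<&CompiledPattern> {
        self.compiled
            .get_or_init(|| match compile(&self.search) {
                Ok(p) => Some(p),
                Err(e) => {
                    eprintln!("invalid replacement pattern {:?}: {e}", self.search);
                    None
                }
            })
            .as_ref()
    }
}
```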
let mut replaced_result = corrected_result.trim().to_string();
let mut global_transformations = Vec::new();

if settings.replacements_enabled {
we should extract this into an appropriately named function apply_replacements()
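Here is a minimal sketch of what the extracted function could look like, using the `apply_replacements(text, settings) -> String` contract suggested in this thread. `AppSettings` and `Replacement` are hypothetical, stripped-down shapes, and only literal (non-regex) matching is shown. The global `replacements_enabled` flag and the per-rule `enabled` flag follow the commit description. Empty search strings are skipped because `str::replace` with an empty pattern would insert the replacement between every character:

```rust
/// Hypothetical, stripped-down rule; the real struct has regex, trim and capitalization options.
struct Replacement {
    search: String,
    replace: String,
    enabled: bool,
}

/// Hypothetical subset of the app settings relevant to replacements.
struct AppSettings {
    replacements_enabled: bool,
    replacements: Vec<Replacement>,
}

/// Trims the transcription, then applies every enabled literal rule in list order.
fn apply_replacements(text: &str, settings: &AppSettings) -> String {
    let mut result = text.trim().to_string();
    if !settings.replacements_enabled {
        return result;
    }
    for rule in settings.replacements.iter().filter(|r| r.enabled && !r.search.is_empty()) {
        result = result.replace(&rule.search, &rule.replace);
    }
    result
}
```

Keeping this as a pure function of text and settings also makes it straightforward to unit-test edge cases such as empty text or disabled rules.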
the contract could look something like
fn apply_replacements(text: &str, settings: &AppSettings) -> String

const CREATE_NO_WINDOW: u32 = 0x08000000;
std::process::Command::new("cmd")
    .args(["/C", &cmd_str])
    // .creation_flags(CREATE_NO_WINDOW)
can we ✂️ ?
// .creation_flags(CREATE_NO_WINDOW)
}
*text = result;
}
MagicTransformation::Run(cmd_template) => {
might make sense to log invalid regexes and surface them to users. Generally, regex seems scary and not UI-friendly in my opinion
the very fact that I have to load the diff on GitHub intimates to me that this file might be too large. We should break it into smaller, more digestible components, such as:
- ReplacementItem.tsx
- ReplacementForm.tsx
- ReplacementImportExport.tsx
for easier review. Happy to take a second pass at this once we compose it better
we're also using inline styles, while the codebase generally relies on Tailwind classes. Think you can throw this over to Claude to bridge the gap?
@cjpais we should add rules to avoid type casting; whenever I see "any" in a codebase, alarm bells start ringing
we will add ESLint rules for sure, and "any" should generally be avoided unless there's a good reason
const MAGIC_TAGS: Record<string, string> = {
  '[lowercase]': 'Converts the entire text to lowercase',
  '[uppercase]': 'Converts the entire text to uppercase',
  '[capitalize]': 'Capitalizes the first letter of each word',
  '[nospace]': 'Removes all spaces from the text',
  '[date]': 'Inserts current date (YYYY-MM-DD)',
  '[time]': 'Inserts current time (HH:MM)',
  '[run]': 'Run a command. Usage: [run]"command {text}"',
};
these should be moved to our new translation files!
We can probably add a rule to look for this kinda stuff when I get around to #409
d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z"
/>
</svg>
{showTooltip && (
nit: it's better to go with a ternary / null combination rather than && to avoid surprising falsy behaviors (e.g. a 0 being rendered)
const [search, setSearch] = useState("");
const [replace, setReplace] = useState("");
const [isRegex, setIsRegex] = useState(false);
const [trimPunctuationBefore, setTrimPunctuationBefore] = useState(false);
const [trimPunctuationAfter, setTrimPunctuationAfter] = useState(false);
const [trimSpacesBefore, setTrimSpacesBefore] = useState(false);
const [trimSpacesAfter, setTrimSpacesAfter] = useState(false);
const [capitalization, setCapitalization] = useState<CapitalizationRule>("none");
const [editingIndex, setEditingIndex] = useState<number | null>(null);
const [isAdding, setIsAdding] = useState(false);
const [filterText, setFilterText] = useState("");
const [lastImportedRange, setLastImportedRange] = useState<{start: number, count: number} | null>(null);
const [showAdvancedOptions, setShowAdvancedOptions] = useState(false);
the number of state variables we're using here concerns me; there has to be a better way to handle "modifying" text, and maybe this should be a single state object
);

if (isValid) {
  // Append imported items to existing replacements, allowing duplicates
is there a product decision on why we're allowing duplicates?
</div>
</div>

{/* Trim Spaces: inline controls right after label */}
we should compose this in a better way and expose the props which are needed, possibly making these inputs controlled from outside, with the handling done via:
- hooks like useReplacementsData() to handle the import/export and initial loading
- utils for helper functions
- components for reuse
    ? "bg-mid-gray/30 text-white"
    : "hover:bg-mid-gray/20"
}`}
title="Force Uppercase"
translations if possible!
<span>{t('settings.replacements.addNew')}</span>
</Button>
)}
{isAdding && renderForm()}
have we looked into using an actual form package? I have had great success with https://react-hook-form.com/
Aside: after seeing so damn many AI slop PRs and subsequent AI slop comments... Reading this discussion is ❤️ (I've been busy - apologies for not chiming in on this properly 🙂)
I want to chime in on the UI question here, because I think this UI is too complicated for your average user of Handy, but I also think @schmurtzm has done a good job of making it as un-complicated as possible. A feature like this is inherently complicated and any UI for it will likely be unintuitive to a lot of people. If it wasn't, this PR wouldn't have needed the (very good ❤️) video explaining how it works.
I disagree. I can't imagine any non-technical users being able to get this set up without referring to some sort of documentation or tutorial video – which non-technical users aren't gonna do. Since #391, macOS users can do this by choosing "Apple Intelligence" and adding a post-processing rule which says "Yo, replace stuff like "full stop" with an actual full stop and "newline" with an actual newline. And make sure everything is well formatted. Like Apple dictation, ya know?" and it'd probably work fine. The UI here reminded me of the settings for iTerm2 or Drafts - extremely powerful and flexible if you are used to looking at "power user" UIs like this and are starting from a mental model which includes "I know what a string is and what string replacement means". I'm certain that the majority of users who are active in this repo/Discord could work out how to use this UI...
But I'd guess there are a huge number of users who wouldn't (we already know a lot of new users struggle to understand the General settings pane on first use). The newer UI is waaaaay better, but still. For me, one of the strengths of Handy is that it's pretty easy for normal people to understand. And hopefully after #449 it'll be even more so. I used to use Voiceink, and tbh one of the reasons I stopped is because the settings UI got so full of features that I had no fucking idea how anything worked, and no inclination to spend 40 minutes of my life learning how to configure my press-a-button-and-talk-to-my-computer-to-save-me-time app. And I'm definitely a technical user. Having said all that, I feel like this is a really valuable feature, which I'd 💯 use myself. Thoughts in no particular order:
@schmurtzm @VirenMohindra @cjpais Just wanna repeat how nice it is to see people who care about quality doing good work. This repo feels like OSS used to feel, but maybe that's just because I've not been coding much for the last ~8 years ❤️
Please do not point out that Handy has clearly consumed > 40m of my time 😂
this feature is great and it's definitely needed! Maybe add it under Advanced to scare off normal users, but please merge this!
I agree with this 1000%. This software is great, but with this addition, it becomes downright magical! Personally, I built (with a lot of difficulty) the executable with this pull request integrated. I now know that I won't be updating Handy until this PR has been integrated into the software (official release). Once the regexes are perfectly configured according to what you want to say (pronounce) for each replacement, it's just magical. It's a shame that there isn't at least a trial version (unofficial release) available for those who would like to try out this feature. I'm running Windows 11. I've put my W11 version here, if it can help anyone: https://drive.google.com/drive/folders/1CWaIn65c-YvUDeCUQ3kT_zhqm-tRh9V8?usp=sharing Because it's easier to judge by testing than just watching schmurtzm's explanatory video and reading the content of this discussion. So I really hope, too, that one day this PR will be added. It seems so obvious to me. And indeed, if it remains hidden in the developer menu, the software is still just as easy to use for novice users. So I really don't see what the problem is with integrating it into the advanced menu. In any case, a big thank you to everyone who contributes to this great software.
I appreciate the overwhelming support. It's going to make it in; it will just take me some time to ultimately merge it. I'm traveling. Let's keep this thread fairly focused on the development effort. To show support, you can heart this message.
🧪 Test Build Ready
Build artifacts for PR #455 are available for testing. Download artifacts from workflow run. Artifacts expire after 30 days.
I just tested this and it's a really great feature. Good improvement! Some questions and observations:
✨ New Feature: Text Replacements
This new feature introduces a simple and intuitive interface for performing text replacements within Handy.
To help better understand this feature in practice, here is a tutorial video (with sound):
Handy.Replacements.mp4
🎯 Purpose and Capabilities
The interface is designed to be simple, offline, powerful, and intuitive.
It allows users to:
🛠️ Replacement Management
It supports both basic and advanced usage through:
The Replacements interface also provides tools to efficiently manage rules:
The UI visually highlights:
🔮 About Magic Commands
Available Magic Commands
[lowercase]: Converts the entire current phrase to lowercase
Example rule: "transform in lowercase" → [lowercase]
Result: "Hello World transform in lowercase" → "hello world"
[uppercase]: Converts the entire phrase to UPPERCASE
Example: "Hello" → "HELLO"
[capitalize]: Capitalizes the first letter of each word
Example: "jean dupont" → "Jean Dupont"
[nospace]: Removes all spaces from the phrase
Example: "a test" → "atest"
[date]: Inserts the current date as YYYY-MM-DD (depending on your current region settings)
[time]: Inserts the current time (HH:MM)
[run]"command…": Executes the specified command and prevents any transcription output for that trigger
The following placeholders can be used inside [run] commands:
{text}: full phrase
{text_nospace}: text without spaces
{text_nopunctuation}: text without punctuation
{text_nospace_nopunctuation}: text without spaces or punctuation
Example: [run]"cmd /k echo {text_nospace_nopunctuation}"
French punctuation sample: handy-french-punctuation-v1.0.json