avoid allocating hats to the first letter of a token #1723

josharian · 2023-08-02T23:34:26Z

We could get much fancier than this,
but after running with this for a day it appears to help some,
and it is nice and simple.

I propose that we declare that it fixes #1658,
at least for now.

Checklist

[/] I have added tests
[/] I have updated the docs and cheatsheet
[/] I have not broken the cheatsheet

josharian · 2023-08-02T23:34:48Z

I plan to keep running this for a little while longer, gathering data, but I thought I would share it in case anyone else wants to play with it.

(I know the tests are busted.)

pokey

Looks good with minor tweak

packages/cursorless-engine/src/tokenGraphemeSplitter/tokenGraphemeSplitter.ts

josharian · 2023-08-08T02:36:16Z

here's another rev. lots of tests are still failing; it's going to be tedious to fix them, so I'd like to wait until we are relatively confident in the rest of the direction.

...rless-engine/src/processTargets/modifiers/scopeHandlers/WordScopeHandler/WordScopeHandler.ts

josharian · 2023-08-12T01:32:56Z

notes to self:

correctly handle _abcTest (are we avoiding _ or a?)
perf test
maybe re-use tokenizers
switch to ranges
tests: stats, fixtures
data gathering for end users
- no phones/replace
- jsonl
- open append/exclusive
- command payload
- rotate monthly
- include extension version
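
A minimal sketch of how the data-gathering notes above might fit together, assuming one JSON object per line (JSONL) and one log file per month. The record shape, the field names, the `hat-stats` prefix, and both function names are hypothetical and are not taken from this PR.

```typescript
// Hypothetical record shape: field names are assumptions for illustration only.
interface HatUsageRecord {
  timestamp: string; // ISO 8601 timestamp of the command
  extensionVersion: string; // "include extension version"
  command: unknown; // "command payload", stored as-is
}

// "rotate monthly": derive the log file name from the UTC year and month,
// so a new file is started automatically when the month changes.
function monthlyLogFileName(date: Date, prefix: string = "hat-stats"): string {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const mm = month < 10 ? "0" + month : String(month);
  return prefix + "-" + year + "-" + mm + ".jsonl";
}

// "jsonl": JSON.stringify never emits a raw newline, so each record
// occupies exactly one line once the trailing "\n" is added.
function toJsonlLine(record: HatUsageRecord): string {
  return JSON.stringify(record) + "\n";
}
```

Each line would then be written in append mode, for example with Node's `fs.appendFile`, so existing records are never rewritten.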

pokey · 2024-06-20T10:20:39Z

update: @AndreasArvidsson is going to have a look and take this one home if it's pretty close to mergeable in its current form

josharian · 2024-06-25T00:20:53Z

> update: @AndreasArvidsson is going to have a look and take this one home if it's pretty close to mergeable in its current form

great, thanks!

AndreasArvidsson · 2024-06-25T04:01:19Z

@josharian Have you evaluated the difference between avoiding just the first character in the token versus the first character in every subword? When I first thought about this problem I kinda just envisioned the first character in the token, but your implementation does every subword, which could be better. Any insight?

josharian · 2024-06-25T04:19:42Z

I remember thinking at the time that doing subwords was important. But it is not something I ever gathered data about, because the effects are purely qualitative. And a lot of time has now gone by…
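
To make the token-versus-subword distinction concrete, here is a rough sketch of the per-subword variant. It is not the code from this PR: `firstLetterOffsets` and `rankHatCandidates` are hypothetical names, the camelCase/snake_case regex is an assumption, and offsets are UTF-16 code units rather than the graphemes the real splitter works with. It also assumes one answer to the `_abcTest` question from the notes above, namely that the leading `_` is skipped and `a` is the letter to avoid.

```typescript
// Hypothetical sketch: offsets of the first letter of every subword in a token.
// Assumption: subwords are an all-caps run (e.g. "HTML"), or an optionally
// capitalized lowercase run (e.g. "Parser", "abc"); separators are skipped.
function firstLetterOffsets(token: string): number[] {
  const subword = /[A-Z]+(?![a-z])|[A-Z]?[a-z]+/g;
  const offsets: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = subword.exec(token)) !== null) {
    offsets.push(match.index);
  }
  return offsets;
}

// Order candidate offsets so that first letters of subwords come last.
// They are still available as a fallback; they are avoided, not forbidden.
function rankHatCandidates(token: string): number[] {
  const avoided = firstLetterOffsets(token);
  const preferred: number[] = [];
  const fallback: number[] = [];
  for (let i = 0; i < token.length; i++) {
    if (avoided.indexOf(i) === -1) {
      preferred.push(i);
    } else {
      fallback.push(i);
    }
  }
  return preferred.concat(fallback);
}
```

Avoiding only the first character of the whole token would correspond to treating offset 0 alone as the fallback, so for a token like `fooBar` the two variants differ only in how they rank the `B`.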

AndreasArvidsson · 2025-02-22T14:32:51Z

I just did some performance tests. Using a single editor with TypeScript, hat allocation went from about 6 ms to 8 ms. Percentage-wise that is quite a lot, but two milliseconds we can live with.

AndreasArvidsson

@josharian Thank you for your immense patience. I have finally had time to properly evaluate this. Great work!

josharian requested a review from pokey as a code owner August 2, 2023 23:34

josharian marked this pull request as draft August 3, 2023 00:33

pokey reviewed Aug 4, 2023

packages/cursorless-engine/src/tokenGraphemeSplitter/tokenGraphemeSplitter.ts (outdated)

avoid allocating hats to the first letter of a word in a token

da3c7f1

I propose that we declare that it fixes cursorless-dev#1658, at least for now.

josharian force-pushed the josh/no-first-word branch from 0385b48 to da3c7f1 Compare August 8, 2023 02:35

pokey reviewed Aug 8, 2023

...rless-engine/src/processTargets/modifiers/scopeHandlers/WordScopeHandler/WordScopeHandler.ts

fix tests

b33055b

pokey assigned AndreasArvidsson Jun 20, 2024

AndreasArvidsson added 3 commits June 25, 2024 03:07

Merge branch 'main' into josh/no-first-word

36904fc

testing

5486b7b

clean up

b3bce4d

AndreasArvidsson and others added 8 commits June 25, 2024 10:25

Added comment

0cd2090

Fix merge conflict in test

021bafa

[pre-commit.ci lite] apply automatic fixes

10dc73e

Testing

fed82c5

testing

90b4e02

Merge branch 'josh/no-first-word' of github.com:josharian/cursorless into josh/no-first-word

ba616c7

update

5ebd10c

Update tests

44aeec7

AndreasArvidsson marked this pull request as ready for review June 25, 2024 10:39

AndreasArvidsson self-requested a review as a code owner June 25, 2024 10:39

Merge branch 'main' into pr/josharian/1723

754b5b6

AndreasArvidsson requested a review from a team as a code owner February 22, 2025 14:02

Clean up comment

5abccf4

AndreasArvidsson added 2 commits February 22, 2025 15:53

Clean up

223943b

More refactoring

d09e752

AndreasArvidsson approved these changes Feb 22, 2025

AndreasArvidsson added this pull request to the merge queue Feb 22, 2025

Merged via the queue into cursorless-dev:main with commit fa01ba9 Feb 22, 2025
16 checks passed
