josharian
Collaborator

We could get much fancier than this,
but after running with this for a day it appears to help some,
and it is nice and simple.

I propose that we declare that it fixes #1658,
at least for now.

Checklist

  • [/] I have added tests
  • [/] I have updated the docs and cheatsheet
  • [/] I have not broken the cheatsheet

@josharian josharian requested a review from pokey as a code owner August 2, 2023 23:34
@josharian
Collaborator Author

josharian commented Aug 2, 2023

I plan to keep running this for a little while longer, gathering data, but I thought I would share it in case anyone else wants to play with it.

(I know the tests are busted.)

@josharian josharian marked this pull request as draft August 3, 2023 00:33
Member

@pokey pokey left a comment

Looks good with minor tweak

I propose that we declare that it fixes cursorless-dev#1658,
at least for now.
@josharian josharian force-pushed the josh/no-first-word branch from 0385b48 to da3c7f1 Compare August 8, 2023 02:35
@josharian
Collaborator Author

here's another rev. lots of tests are still failing; it's going to be tedious to fix them, so I'd like to wait until we are relatively confident in the rest of the direction.

@josharian
Collaborator Author

notes to self:

  • correctly handle _abcTest (are we avoiding _ or a?)
  • perf test
  • maybe re-use tokenizers
  • switch to ranges
  • tests: stats, fixtures
  • data gathering for end users
    • no phones/replace
    • jsonl
    • open append/exclusive
    • command payload
    • rotate monthly
    • include extension version
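
A minimal sketch of how the data-gathering items above (JSONL, append-only writes, command payload, monthly rotation, extension version) might fit together. Every name here (`HatStatRecord`, `monthlyLogName`, `toJsonLine`, `logRecord`, the `hat-stats` prefix) is hypothetical and not taken from the Cursorless codebase, and the file-writing function is injected so the sketch assumes nothing about the actual storage API.

```typescript
// Hypothetical record shape: one JSON object per line (JSONL).
interface HatStatRecord {
  timestamp: string; // ISO 8601, UTC
  extensionVersion: string; // lets later analysis separate versions
  payload: unknown; // the command payload, stored as-is
}

// Assumed signature for an append-only sink, supplied by the caller.
type AppendFn = (fileName: string, line: string) => void;

// Monthly rotation: the file name changes when the UTC month changes.
function monthlyLogName(date: Date, prefix = "hat-stats"): string {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${prefix}-${year}-${month}.jsonl`;
}

// JSON.stringify without an indent argument never emits a raw newline,
// so each record occupies exactly one line.
function toJsonLine(record: HatStatRecord): string {
  return JSON.stringify(record) + "\n";
}

// Build one record and hand it to the append-only sink.
function logRecord(
  append: AppendFn,
  extensionVersion: string,
  payload: unknown,
  now: Date = new Date(),
): void {
  const record: HatStatRecord = {
    timestamp: now.toISOString(),
    extensionVersion,
    payload,
  };
  append(monthlyLogName(now), toJsonLine(record));
}
```

In Node, the sink could be something like `(file, line) => fs.appendFileSync(path.join(dir, file), line)`, which creates the file if it does not exist and otherwise appends to it.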

@pokey
Member

pokey commented Jun 20, 2024

update: @AndreasArvidsson is going to have a look and take this one home if it's pretty close to mergeable in its current form

@josharian
Collaborator Author

> update: @AndreasArvidsson is going to have a look and take this one home if it's pretty close to mergeable in its current form

great, thanks!

@AndreasArvidsson
Member

@josharian Have you evaluated the difference between just avoiding the first character in the token versus the first character in every subword? When I first thought about this problem I kinda just envisioned the first character in the token, but your implementation is doing every subword, which could be better. Any insight?

@josharian
Collaborator Author

I remember thinking at the time that doing subwords was important. But it is not something I ever gathered data about, because the effects are purely qualitative. And a lot of time has now gone by…
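
To make the two options in this exchange concrete, here is a rough sketch of which character offsets each one would steer hats away from. It is not the implementation in this PR: the `SUBWORD` regex, `subwordStartOffsets`, and `avoidedOffsets` are hypothetical, and the sketch assumes subwords are camelCase or snake_case pieces and that separators such as `_` are not subwords themselves.

```typescript
// Assumed subword pattern: a capitalized or lowercase word, an acronym
// run that stops before a following capitalized word, or a digit run.
const SUBWORD = /[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+/g;

// Offsets of the first character of every subword in the token.
function subwordStartOffsets(token: string): number[] {
  const offsets: number[] = [];
  for (const match of token.matchAll(SUBWORD)) {
    offsets.push(match.index ?? 0);
  }
  return offsets;
}

// "token" avoids only the token's first character;
// "subword" avoids the first character of each subword.
function avoidedOffsets(token: string, mode: "token" | "subword"): number[] {
  if (mode === "token") {
    return token.length > 0 ? [0] : [];
  }
  return subwordStartOffsets(token);
}
```

Under these assumptions, `avoidedOffsets("fooBarBaz", "token")` gives `[0]` while `avoidedOffsets("fooBarBaz", "subword")` gives `[0, 3, 6]`. For the `_abcTest` case from the notes above, token mode avoids the `_` at offset 0, whereas subword mode avoids `a` and `T` at offsets 1 and 4, so the two approaches differ on exactly the question raised there.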

@AndreasArvidsson AndreasArvidsson marked this pull request as ready for review June 25, 2024 10:39
@AndreasArvidsson AndreasArvidsson self-requested a review as a code owner June 25, 2024 10:39
@AndreasArvidsson AndreasArvidsson requested a review from a team as a code owner February 22, 2025 14:02
@AndreasArvidsson
Member

AndreasArvidsson commented Feb 22, 2025

I just ran some performance tests. With a single TypeScript editor, hat allocation went from about 6 ms to about 8 ms. That is a large increase percentage-wise, but we can live with two milliseconds.
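
For anyone wanting to repeat a comparison like this, a timing harness could look roughly like the sketch below. It is not the method used for the numbers above: `meanMillis`, `relativeIncrease`, and the stand-in workload are hypothetical, and the real hat allocation call would have to be substituted for the stand-in.

```typescript
// Mean wall-clock time per call in milliseconds, after warm-up runs
// that give the JIT a chance to settle.
function meanMillis(fn: () => void, runs = 200, warmup = 20): number {
  for (let i = 0; i < warmup; i++) fn();
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn();
  return (performance.now() - start) / runs;
}

// Relative change between two timings, e.g. 0.5 means 50% slower.
function relativeIncrease(beforeMs: number, afterMs: number): number {
  return (afterMs - beforeMs) / beforeMs;
}

// Stand-in workload (an assumption, not hat allocation): count the
// characters of a synthetic document.
function standInWorkload(): void {
  const counts = new Map<string, number>();
  const text = "const fooBarBaz = _abcTest + snake_case;\n".repeat(100);
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
}

const perCallMs = meanMillis(standInWorkload);
console.log(`mean per call: ${perCallMs.toFixed(3)} ms`);
```

With the figures reported above, `relativeIncrease(6, 8)` comes out at about one third.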

Member

@AndreasArvidsson AndreasArvidsson left a comment

@josharian Thank you for your immense patience. I have finally had time to properly evaluate this. Great work!

@AndreasArvidsson AndreasArvidsson added this pull request to the merge queue Feb 22, 2025
Merged via the queue into cursorless-dev:main with commit fa01ba9 Feb 22, 2025
16 checks passed

Development

Successfully merging this pull request may close these issues.

choose hats to avoid the Stroop effect?