camc314
Contributor

@camc314 camc314 commented Aug 11, 2025

When using project services, ToFileNameLowerCase is called 10,312,056 times on various paths, which makes it an incredibly hot path.

Since most projects won't have non-ASCII filenames, we can optimize for this case. This yields up to a 5x perf improvement in some scenarios.

Please feel free to close if the perf degradation for non-ASCII file paths outweighs the perf gains for the common case.

| Test Case                     | Likelihood Category | Old (ns) | New (ns) |
|-------------------------------|---------------------|----------|----------|
| All lowercase ASCII short     | High                | 45.1     | 8.5      |
| All uppercase ASCII short     | Medium              | 71.8     | 37.9     |
| Mixed case ASCII short        | High                | 62.2     | 41.9     |
| Mixed case ASCII longer       | High                | 134.0    | 60.9     |
| Non-ASCII without İ (e.g., ß) | Low                 | 152.0    | 166.7    |
| Non-ASCII with İ              | Very low            | 147.5    | 156.8    |
| Non-ASCII with ı              | Low                 | 128.2    | 138.1    |
| Long mixed ASCII              | Low                 | 1260.6   | 547.8    |
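
For readers following along, here is a minimal sketch of the kind of ASCII fast path described above. It is not the code merged in this PR: the names `toFileNameLowerCaseSketch` and `lowerUnicodeSketch` are hypothetical, and the slow path assumes the function lowercases every rune except İ (U+0130), which is an assumption based on the test cases in the table.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// toFileNameLowerCaseSketch is a hypothetical illustration of an ASCII fast
// path; it is not the implementation from this PR.
func toFileNameLowerCaseSketch(fileName string) string {
	hasUpper := false
	for i := 0; i < len(fileName); i++ {
		c := fileName[i]
		if c >= utf8.RuneSelf {
			// Non-ASCII byte found: give up on the fast path.
			return lowerUnicodeSketch(fileName)
		}
		if 'A' <= c && c <= 'Z' {
			hasUpper = true
		}
	}
	if !hasUpper {
		// Pure ASCII and already lowercase: return the input unchanged.
		return fileName
	}
	// Pure ASCII with at least one uppercase letter: lowercase byte by byte.
	buf := make([]byte, len(fileName))
	for i := 0; i < len(fileName); i++ {
		c := fileName[i]
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		buf[i] = c
	}
	return string(buf)
}

// lowerUnicodeSketch is the assumed slow path: it lowercases each rune but
// leaves İ (U+0130) untouched.
func lowerUnicodeSketch(fileName string) string {
	return strings.Map(func(r rune) rune {
		if r == '\u0130' {
			return r
		}
		return unicode.ToLower(r)
	}, fileName)
}

func main() {
	for _, p := range []string{"/path/to/file.ext", "/PATH/TO/FILE.EXT", "/Path/İ/FILE.EXT"} {
		fmt.Println(toFileNameLowerCaseSketch(p))
	}
}
```

The trade-off is the one the table shows: pure-ASCII inputs skip rune decoding entirely, while inputs containing a non-ASCII byte pay for the partial scan before falling back.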

@jakebailey
Member

jakebailey commented Aug 11, 2025

Can you include some benchmarks that show the difference? As in, Go benchmarks in path_test.go?

I'm not surprised by this PR, of course; strings.ToLower makes this same optimization, we just can't use it directly. (It might be worth simply using the same code as ToLower and tweaking it for the special case.)

@camc314
Contributor Author

camc314 commented Aug 11, 2025

> Can you include some benchmarks that show the difference? As in, Go benchmarks in path_test.go?

Mind clarifying exactly what you're looking for?

func BenchmarkToFileNameLowerCase(b *testing.B) {

exists already, which is what I've been using to bench the perf differences. Are you looking for a bench function that compares the two (old vs. new impl)?
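
A side-by-side comparison of two implementations could look roughly like the sketch below. Everything in it is an assumption for illustration: `lowerOld` and `lowerNew` are hypothetical stand-ins rather than the functions in internal/tspath, and the inputs are the short paths that appear in the benchmark names in this thread. In a real repository these would normally be `Benchmark...` functions in a `_test.go` file run with `go test -bench`; `testing.Benchmark` is used here only so the example runs standalone.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
	"unicode"
	"unicode/utf8"
)

// lowerOld is a hypothetical stand-in for the previous implementation:
// lowercase every rune except İ (U+0130).
func lowerOld(s string) string {
	return strings.Map(func(r rune) rune {
		if r == '\u0130' {
			return r
		}
		return unicode.ToLower(r)
	}, s)
}

// lowerNew is a hypothetical stand-in for an ASCII-first implementation.
func lowerNew(s string) string {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return lowerOld(s) // non-ASCII byte: fall back to the rune-based path
		}
	}
	// Pure ASCII, so İ cannot occur and strings.ToLower is safe to use.
	return strings.ToLower(s)
}

var inputs = []string{"/path/to/file.ext", "/PATH/TO/FILE.EXT", "/path/to/FILE.EXT"}

// sink keeps the compiler from optimizing the calls away.
var sink string

// benchmarkImpl wraps an implementation in a benchmark function.
func benchmarkImpl(impl func(string) string) func(b *testing.B) {
	return func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			for _, in := range inputs {
				sink = impl(in)
			}
		}
	}
}

func main() {
	testing.Init() // registers testing flags when running outside "go test"
	fmt.Println("old:", testing.Benchmark(benchmarkImpl(lowerOld)))
	fmt.Println("new:", testing.Benchmark(benchmarkImpl(lowerNew)))
}
```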

@camc314
Contributor Author

camc314 commented Aug 11, 2025

@microsoft-github-policy-service agree

@jakebailey
Member

Ah, yes. I didn't realize that's what you were doing. I think I was expecting results through https://pkg.go.dev/golang.org/x/perf/benchstat.

Comparing main and this PR with go test -run=- -bench='BenchmarkToFileNameLowerCase' -benchmem -count=10 ./internal/tspath:

goos: linux
goarch: amd64
pkg: github.com/microsoft/typescript-go/internal/tspath
cpu: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
                                                     │   old.txt    │               new.txt               │
                                                     │    sec/op    │   sec/op     vs base                │
ToFileNameLowerCase//path/to/file.ext-20               58.495n ± 2%   9.892n ± 1%  -83.09% (p=0.000 n=10)
ToFileNameLowerCase//PATH/TO/FILE.EXT-20               101.25n ± 1%   45.84n ± 0%  -54.73% (p=0.000 n=10)
ToFileNameLowerCase//path/to/FILE.EXT-20                85.81n ± 2%   46.68n ± 1%  -45.60% (p=0.000 n=10)
ToFileNameLowerCase//user/UserName/proje...etc-20      186.45n ± 1%   97.24n ± 1%  -47.84% (p=0.000 n=10)
ToFileNameLowerCase//user/UserName/proje...etc#01-20    206.2n ± 1%   222.8n ± 1%   +8.05% (p=0.000 n=10)
ToFileNameLowerCase//user/UserName/proje...etc#02-20    202.5n ± 1%   215.2n ± 1%   +6.25% (p=0.000 n=10)
ToFileNameLowerCase//user/UserName/proje...etc#03-20    174.4n ± 1%   190.2n ± 1%   +9.06% (p=0.000 n=10)
ToFileNameLowerCase/FoO/FoO/FoO/FoO/FoO/...etc-20      1623.5n ± 1%   952.5n ± 1%  -41.33% (p=0.000 n=10)
geomean                                                 180.4n        107.5n       -40.38%
                                                     │   old.txt    │                new.txt                │
                                                     │     B/op     │    B/op     vs base                   │
ToFileNameLowerCase//path/to/file.ext-20               0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//PATH/TO/FILE.EXT-20               24.00 ± 0%     24.00 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//path/to/FILE.EXT-20               24.00 ± 0%     24.00 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//user/UserName/proje...etc-20      48.00 ± 0%     96.00 ± 0%  +100.00% (p=0.000 n=10)
ToFileNameLowerCase//user/UserName/proje...etc#01-20   48.00 ± 0%     48.00 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//user/UserName/proje...etc#02-20   48.00 ± 0%     48.00 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//user/UserName/proje...etc#03-20   48.00 ± 0%     48.00 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase/FoO/FoO/FoO/FoO/FoO/...etc-20      416.0 ± 0%     832.0 ± 0%  +100.00% (p=0.000 n=10)
geomean                                                           ²                +18.92%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
                                                     │   old.txt    │                new.txt                │
                                                     │  allocs/op   │ allocs/op   vs base                   │
ToFileNameLowerCase//path/to/file.ext-20               0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//PATH/TO/FILE.EXT-20               1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//path/to/FILE.EXT-20               1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//user/UserName/proje...etc-20      1.000 ± 0%     2.000 ± 0%  +100.00% (p=0.000 n=10)
ToFileNameLowerCase//user/UserName/proje...etc#01-20   1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//user/UserName/proje...etc#02-20   1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase//user/UserName/proje...etc#03-20   1.000 ± 0%     1.000 ± 0%         ~ (p=1.000 n=10) ¹
ToFileNameLowerCase/FoO/FoO/FoO/FoO/FoO/...etc-20      1.000 ± 0%     2.000 ± 0%  +100.00% (p=0.000 n=10)
geomean                                                           ²                +18.92%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Which I think matches what you were quoting. Surprised the other cases get slower, honestly.

@camc314
Contributor Author

camc314 commented Aug 11, 2025

> Ah, yes. I didn't realize that's what you were doing. I think I was expecting results through https://pkg.go.dev/golang.org/x/perf/benchstat.

Yes. Although I can write Go, it's been a while and I am far from an expert. I'm also not up to date with the latest tooling, so there are definitely cases where I could be using a tool to do what I'm doing manually 😅

> Surprised the other cases get slower, honestly.

The other cases will have slowed down because we now have to iterate over the string twice. This shouldn't be a problem in 99% of cases, since most file paths will be ASCII and won't contain the special İ character.

@jakebailey
Member

> the other cases will have slowed down because we now have to iterate over the string twice, this shouldn't be a problem in 99% of cases since most file paths will be ascii, and won't contain that special i character

Or at least for usernames with non-ASCII characters, they'll be early in the path via /home/<name> or /Users/<name> or C:/Users/<name>.

@jakebailey
Member

> yess, although I can write go, it's been a while and I am far from an expert, i'm also not up to date with the latest tooling, so there's definitly cases where I could be using a tool to do what i'm doing manually 😅

For the record, I do:

$ go test -run=- -bench='BenchmarkToFileNameLowerCase' -benchmem -count=10 ./internal/tspath | tee old.txt
# switch branch
$ go test -run=- -bench='BenchmarkToFileNameLowerCase' -benchmem -count=10 ./internal/tspath | tee new.txt
$ benchstat old.txt new.txt

@jakebailey
Member

Just out of curiosity, where did you experience this being a bottleneck?

@Boshen

Boshen commented Aug 12, 2025

> Just out of curiosity, where did you experience this being a bottleneck?

We are diving into

i.e. running tsgolint in large monorepos (1000+ files).

@camc314
Contributor Author

camc314 commented Aug 12, 2025

> Just out of curiosity, where did you experience this being a bottleneck?

Yep, to add to Boshen's point, I'm testing this on the vscode repo (~6100 files), during which this function is called 10 million+ times.

Thanks for the extra benchmarking info Jake, I'll be using that next time!

Member

@jakebailey jakebailey left a comment

Fuzz test is clean after running for a bit.

(We don't check these in CI. It would be good if we did, though they take a while, and you need to select a specific test to fuzz rather than just asking for all of them to be fuzzed.)

@jakebailey jakebailey enabled auto-merge August 12, 2025 17:52
@jakebailey jakebailey added this pull request to the merge queue Aug 12, 2025
Merged via the queue into microsoft:main with commit ae0b6e5 Aug 12, 2025
22 checks passed