-
Notifications
You must be signed in to change notification settings - Fork 30
Update lobsters #358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update lobsters #358
Conversation
|
Profile after the upgrade: https://share.firefox.dev/4hrxsdY Observations:
Nothing else very obvious just yet, after that it's all |
Holy cow, it's digesting the entire backtrace to make a unique key for queries: https://github.com/charkost/prosopite/blob/36f7f5a4fa800fcfa1f9bef5cae94c9739b128fc/lib/prosopite.rb#L263-L272 |
|
Thanks for pinging me, I'm very curious to hear what you find. Though we don't have APM set up, I think we would've noticed a prod slowdown that big, so Prosopite or another dev/test change seems plausible. I should mention two in-progress projects, neither of which has merged to
(EDIT: I have been scratching my head for two days trying to remember where I first learned of Phlex, and it just clicked that it's this thread that also includes some criticism from you. Small world!) |
Note that the benchmark runs in production environment. AFAICT this Prosopite overhead isn't dev/test only.
Not really, we're only testing a handful of endpoint, and I had to get rid of some queries. It's probably a tons of work to migrate, but that's your call, doesn't affect us much.
Probably yes. We can always chose to stop following lobsters and just freeze the codebase forever, just to Rails upgrades once in a while. |
|
Worth profiling and maybe looking at how much code YJIT generates before/after since the change is so massive, also yjit insn executed count, percentage in YJIT. |
Before: After: |
|
Alright, so a major piece of that was my mistake. I thought the benchmark had caching entirely disabled, but actually no. If I re-enable the cache like it was before, it's already much closer: Now it's only about 50% slower. I'll see if I can figure out where the remaining difference comes from. |
While upgrading the lobsters benchmark in yjit-bench I noticed 2.8% of the overall time spent in `Dir[]`, all coming from FileUpdateChecker, which shouldn't be a think in production. After tracking it down, it comes from mission-control. Profile: https://share.firefox.dev/4hBdygq Ref: ruby/ruby-bench#358
|
Found a 3% regression caused by mission-control: rails/mission_control-jobs#241 I'll keep digging. |
|
Found a regression in |
|
Seems like a big regression is the |
While upgrading the lobsters benchmark in yjit-bench I noticed 2.8% of the overall time spent in `Dir[]`, all coming from FileUpdateChecker, which shouldn't be a think in production. After tracking it down, it comes from mission-control. Profile: https://share.firefox.dev/4hBdygq Ref: ruby/ruby-bench#358
While upgrading the lobsters benchmark in yjit-bench I noticed 2.8% of the overall time spent in `Dir[]`, all coming from FileUpdateChecker, which shouldn't be a think in production. After tracking it down, it comes from mission-control. Profile: https://share.firefox.dev/4hBdygq Ref: ruby/ruby-bench#358
|
Another important change seems to be: get "/threads" => "comments#threads"That became: get "/threads" => "comments#user_threads"And that new action a at least one order of magnitude more costly. But in a way that is legit, because the old action would simply redirect back to |
|
Alright, I went over all routes one by one, the final big difference comes from Somehow the Noah had left a comment in the route generator that seems related: Which makes me wonder if perhaps he had disabled some part of the action before, I haven't yet identified what. |
|
Something interesting, but not necessarily a huge deal. The newer version of lobsters has way more handcrafted SQL, which causes Active Record to attempt to parse some of it to get some metadata. E.g.: Comment
.joins(<<~SQL
inner join (
with recursive discussion as (
select
c.id,
0 as depth,
(select count(*) from comments where parent_comment_id = c.id) as reply_count,
cast(confidence_order as char(#{Comment::COP_LENGTH})) as confidence_order_path
from comments c
where
thread_id in (#{thread_ids.join(", ")}) and
parent_comment_id is null
union all
select
c.id,
discussion.depth + 1,
(select count(*) from comments where parent_comment_id = c.id),
cast(concat(
discussion.confidence_order_path, 3 * (depth + 1),
c.confidence_order
) as char(#{Comment::COP_LENGTH}))
from comments c join discussion on c.parent_comment_id = discussion.id
)
select * from discussion as comments
) as comments_recursive on comments.id = comments_recursive.id
SQL
)
.order("comments.thread_id desc, comments_recursive.confidence_order_path")
.select('
comments.*,
comments_recursive.depth as depth,
comments_recursive.reply_count as reply_count
')Here since the And since the string is pretty big, that end up accounting for 2% of the runtime on |
THe initial snapshot was from a few years back, the application changed a lot since then. I had to run a bunch of migrations and had to fiddle with various sqlite3/msyql incompatibilities. Also some route changes, but with a bit of effort I managed to make it work again. What is interesting is it seems lobste.rs became 2-3-x as slow in these couple years, I haven't yet profiled to know why. This branch: ``` itr: time ruby#1: 1532ms ruby#2: 1013ms ruby#3: 910ms ruby#4: 852ms ruby#5: 851ms ruby#6: 901ms ruby#7: 857ms ruby#8: 886ms ruby#9: 875ms ``` main: ``` itr: time ruby#1: 884ms ruby#2: 482ms ruby#3: 344ms ruby#4: 358ms ruby#5: 300ms ruby#6: 316ms ruby#7: 315ms ruby#8: 328ms ruby#9: 309ms ```
202d155 to
d332d91
Compare
|
Since I don't see any more obvious illegitimate slowdowns, here's main: updated-lobsters: Based on my profiling, most of the slowdown comes from logic that has moved from Ruby to SQL queries. Now there's the question of whether we wish to upgrade or not, given that arguably this newer version of lobsters has much less Ruby code in it. On the other hand, the displayed speedup is more in line with what Rails apps seem to experience in production, so perhaps it's actually a good thing? Dunno. |
|
Thanks for the investigation Jean 🙏 How does the percentage in YJIT compare before/after?
Yeah I'm torn. On the one hand it's nice if the benchmark is bigger. On the other hand we kind of need as much Ruby code as possible to focus our optimization efforts I feel. When YJIT team members try to implement small optimizations, we look at performance differences on lobsters a lot (because it's the biggest benchmark). If the difference is below 1%, it's hard for us to make the call. |
Good question, I need to figure out how to extract that one. |
|
Ok, found it. Full YJIT stats in https://gist.github.com/byroot/e0d86a2ab174b140968f3b9b8e3d439d Before: After: |
|
Tagging @XrXr to have his opinion. If the slowdown is indeed more logic moving to SQL, then we may want to either:
|
We are already. |
XrXr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's the mostly the same amount of ruby work, but there is now more non-ruby workload in the mix to make it slower overall. I share the concern about having a weaker signal for small changes, but a 30% speedup stills seems like a good chunk of the workload. Tracking new changes to keep benchmark relevant is probably important (Rails 8 instead of 7, for example).
I vote that we upgrade.
| @@ -1,75 +1,82 @@ | |||
| source "https://rubygems.org" | |||
|
|
|||
| # Everything except Action Cable. It's unused and it installs native gems. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we upgrade, I'd like to get rid of nio4r, still, since the bench doesn't use Action Cable. We can do that in a separate PR, though.
Jean suggested we could just upgrade Rails while keeping the benchmark largely the same and it shouldn't be too much trouble? We're agreeing on going that path?
👍 |
That sounds good to me too! |
Fix: ruby#358 Before: ``` inline_code_size: 6,752,176 outlined_code_size: 2,067,600 code_region_size: 17,268,736 yjit_alloc_size: 25,934,245 yjit_compile_time: 989.69ms RSS: 393.7MiB MAXRSS: 393.9MiB Average of last 15, non-warmup iters: 299ms Total time spent benchmarking: 27s interp: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin23] yjit: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23] -------- ----------- ---------- --------- ---------- ------------ ----------- bench interp (ms) stddev (%) yjit (ms) stddev (%) yjit 1st itr interp/yjit lobsters 463.9 1.3 299.8 2.6 0.563 1.547 -------- ----------- ---------- --------- ---------- ------------ ----------- ``` After: ``` inline_code_size: 6,808,680 outlined_code_size: 2,178,968 code_region_size: 17,448,960 yjit_alloc_size: 26,006,453 yjit_compile_time: 978.48ms RSS: 380.1MiB MAXRSS: 380.3MiB Average of last 16, non-warmup iters: 296ms Total time spent benchmarking: 27s interp: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin23] yjit: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23] -------- ----------- ---------- --------- ---------- ------------ ----------- bench interp (ms) stddev (%) yjit (ms) stddev (%) yjit 1st itr interp/yjit lobsters 463.1 1.8 296.3 2.2 0.596 1.563 -------- ----------- ---------- --------- ---------- ------------ ----------- ```
Fix: ruby#358 Before: ``` inline_code_size: 6,752,176 outlined_code_size: 2,067,600 code_region_size: 17,268,736 yjit_alloc_size: 25,934,245 yjit_compile_time: 989.69ms RSS: 393.7MiB MAXRSS: 393.9MiB Average of last 15, non-warmup iters: 299ms Total time spent benchmarking: 27s interp: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin23] yjit: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23] -------- ----------- ---------- --------- ---------- ------------ ----------- bench interp (ms) stddev (%) yjit (ms) stddev (%) yjit 1st itr interp/yjit lobsters 463.9 1.3 299.8 2.6 0.563 1.547 -------- ----------- ---------- --------- ---------- ------------ ----------- ``` After: ``` inline_code_size: 6,808,680 outlined_code_size: 2,178,968 code_region_size: 17,448,960 yjit_alloc_size: 26,006,453 yjit_compile_time: 978.48ms RSS: 380.1MiB MAXRSS: 380.3MiB Average of last 16, non-warmup iters: 296ms Total time spent benchmarking: 27s interp: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin23] yjit: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23] -------- ----------- ---------- --------- ---------- ------------ ----------- bench interp (ms) stddev (%) yjit (ms) stddev (%) yjit 1st itr interp/yjit lobsters 463.1 1.8 296.3 2.2 0.596 1.563 -------- ----------- ---------- --------- ---------- ------------ ----------- ```
Fix: #358 Before: ``` inline_code_size: 6,752,176 outlined_code_size: 2,067,600 code_region_size: 17,268,736 yjit_alloc_size: 25,934,245 yjit_compile_time: 989.69ms RSS: 393.7MiB MAXRSS: 393.9MiB Average of last 15, non-warmup iters: 299ms Total time spent benchmarking: 27s interp: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin23] yjit: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23] -------- ----------- ---------- --------- ---------- ------------ ----------- bench interp (ms) stddev (%) yjit (ms) stddev (%) yjit 1st itr interp/yjit lobsters 463.9 1.3 299.8 2.6 0.563 1.547 -------- ----------- ---------- --------- ---------- ------------ ----------- ``` After: ``` inline_code_size: 6,808,680 outlined_code_size: 2,178,968 code_region_size: 17,448,960 yjit_alloc_size: 26,006,453 yjit_compile_time: 978.48ms RSS: 380.1MiB MAXRSS: 380.3MiB Average of last 16, non-warmup iters: 296ms Total time spent benchmarking: 27s interp: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [arm64-darwin23] yjit: ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23] -------- ----------- ---------- --------- ---------- ------------ ----------- bench interp (ms) stddev (%) yjit (ms) stddev (%) yjit 1st itr interp/yjit lobsters 463.1 1.8 296.3 2.2 0.596 1.563 -------- ----------- ---------- --------- ---------- ------------ ----------- ```
|
I found this again while cleaning out some old tabs. I hadn't wanted to distract from your PR work.
I rewrote this because the ERB produced invalid RSS. It's the same reason I'm looking to replace ERB at the view layer, we're constantly firefighting very basic HTML bugs like missing closing tags. Lobsters didn't get slower because it replaced ERB with a builder, it got correct. ERB reminds me very strongly of 1990s C++ where a tool supposedly produced high-performance code, but it achieved it by completely lacking any kind of safety. The C++ culture reinforced it by blaming the developer for bugs that tools could have prevented. If ERB can't help this program produce correct output, it's going to get replaced with one that can. I don't have large amounts of developer time to throw after the unending stream of bugs it produces, and I definitely have too much professional self-respect to accept that correctness should be, or even can be handled by blaming individuals instead of developing reliable systems and processes. I saw the newly-announced herb tooling and I'll give that a few months to see if it produces tools like to address ERB's fundamental shortcomings. I'm not optimistic for staying. To echo your comment: I guess that's expected, can't beat a Ruby DSL with string soup. |
|
THe initial snapshot was from a few years back, the application changed a lot since then.
I had to run a bunch of migrations and had to fiddle with various sqlite3/msyql incompatibilities. Also some route changes, but with a bit of effort I managed to make it work again.
What is interesting is it seems lobste.rs became almost 3x as slow in these couple years, I haven't yet profiled to know why.
This branch:
main:
cc @rwstauner
Also FYI @pushcx, I'll try to let you know if I find the source of this massive slow-down. But perhaps it's just a red herring.