Skip to content

Conversation

@Earlopain
Copy link
Collaborator

@Earlopain Earlopain commented Jan 24, 2026

Using it seems pretty bad for performance:

require "benchmark/ips"
require "prism"
require "ripper"

codes = Dir["**/*.rb"].map { File.read(it) }

Benchmark.ips do |x|
  x.report("prism") { codes.each { Prism::Translation::Ripper.lex(it) } }
  x.report("ripper") { codes.each { Ripper.lex(it) } }
  x.compare!
end
# Before
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [x86_64-linux]
Warming up --------------------------------------
               prism     1.000 i/100ms
              ripper     1.000 i/100ms
Calculating -------------------------------------
               prism      0.319 (± 0.0%) i/s     (3.14 s/i) -      2.000 in   6.276154s
              ripper      0.647 (± 0.0%) i/s     (1.54 s/i) -      4.000 in   6.182662s

Comparison:
              ripper:        0.6 i/s
               prism:        0.3 i/s - 2.03x  slower
# After
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [x86_64-linux]
Warming up --------------------------------------
               prism     1.000 i/100ms
              ripper     1.000 i/100ms
Calculating -------------------------------------
               prism      0.482 (± 0.0%) i/s     (2.08 s/i) -      3.000 in   6.225603s
              ripper      0.645 (± 0.0%) i/s     (1.55 s/i) -      4.000 in   6.205636s

Comparison:
              ripper:        0.6 i/s
               prism:        0.5 i/s - 1.34x  slower

vernier tells me it does method_missing even for explicitly defined methods like location. Probably that is for [].

@Earlopain Earlopain force-pushed the ripper-translator-no-delegate branch 3 times, most recently from ebf61ff to 111c256 Compare January 24, 2026 17:52
# little easier to work with. We delegate all other methods to the array.
class Token < SimpleDelegator
# @dynamic initialize, each, []
class Token < BasicObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not just subclass Array?
That would avoid an extra indirection and (in theory at least) be more compatible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this but then ignores the == from the subclass. Somehow also slower on cruby

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, that's subtle, I think it's because of these semantics: https://github.com/truffleruby/truffleruby/blob/a48ecf894f19636f09235f5ac70ea6d9889be255/src/main/ruby/truffleruby/core/array.rb#L129-L132
i.e. if other is not an Array then always use other == self, but if other is an Array then just compare all elements and don't call other.==

Copy link
Member

@eregon eregon Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change turned out to reveal a bug in TruffleRuby, as TruffleRuby wouldn't destructure/splat such "BasicObject arrays" since BasicObject doesn't have respond_to?.
Fixed in truffleruby/truffleruby#4128

Fixing/revealing that bug is good, but perf-wise having to go through that slow path coercion (rb_convert_type()) doesn't seem great, I think we should use regular arrays like ::Ripper and then maybe have some private constants so we can do things like:

token[LOCATION]
token[EVENT]
token[VALUE]
token[STATE]

Any thoughts on that?

Copy link
Member

@eregon eregon Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Token subclasses we'd still need to use those to have the custom ==, but for instances of Token we could use a plain Array instead.
It would also mean less allocations which is always helpful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this https://github.com/ruby/prism/pull/3868/changes#r2724651412? Sorry, I didn't see that comment at all. Would defining to_ary do it?

Copy link
Member

@eregon eregon Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would still go through rb_convert_type() but more direct than without:
https://github.com/truffleruby/truffleruby/blob/4180ab960fe223d333e86b6f3263eb1e29d8f942/src/main/ruby/truffleruby/core/type.rb#L301
vs
https://github.com/truffleruby/truffleruby/blob/4180ab960fe223d333e86b6f3263eb1e29d8f942/src/main/ruby/truffleruby/core/type.rb#L303

So yeah that'd be good but even better would be to use real arrays as I'd think e.g. all method lookups in rb_convert_type() are uncached on CRuby, and likely megamorphic on TruffleRuby.

@eregon
Copy link
Member

eregon commented Jan 24, 2026

Great speedup and nice to the extra require.
I wonder if we could just subclass Array though.

Using it seems pretty bad for performance:

```rb
require "benchmark/ips"
require "prism"
require "ripper"

codes = Dir["**/*.rb"].map { File.read(it) }

Benchmark.ips do |x|
  x.report("prism") { codes.each { Prism::Translation::Ripper.lex(it) } }
  x.report("ripper") { codes.each { Ripper.lex(it) } }
  x.compare!
end
```

```
# Before
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [x86_64-linux]
Warming up --------------------------------------
               prism     1.000 i/100ms
              ripper     1.000 i/100ms
Calculating -------------------------------------
               prism      0.319 (± 0.0%) i/s     (3.14 s/i) -      2.000 in   6.276154s
              ripper      0.647 (± 0.0%) i/s     (1.54 s/i) -      4.000 in   6.182662s

Comparison:
              ripper:        0.6 i/s
               prism:        0.3 i/s - 2.03x  slower
# After
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [x86_64-linux]
Warming up --------------------------------------
               prism     1.000 i/100ms
              ripper     1.000 i/100ms
Calculating -------------------------------------
               prism      0.482 (± 0.0%) i/s     (2.08 s/i) -      3.000 in   6.225603s
              ripper      0.645 (± 0.0%) i/s     (1.55 s/i) -      4.000 in   6.205636s

Comparison:
              ripper:        0.6 i/s
               prism:        0.5 i/s - 1.34x  slower
```

`vernier` tells me it does `method_missing` even for explicitly defined methods like `location`.
@Earlopain Earlopain force-pushed the ripper-translator-no-delegate branch from 111c256 to 2ea8139 Compare January 24, 2026 19:35
# We want to pretend that this is just an Array.
def ==(other) # :nodoc:
@array == other
end
Copy link
Member

@eregon eregon Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For speed it might be relevant to define to_ary, since that will be checked by Array#== and more methods when using this Token as an Array (e.g. https://github.com/truffleruby/truffleruby/blob/a48ecf894f19636f09235f5ac70ea6d9889be255/src/main/ruby/truffleruby/core/array.rb#L130).
Defining to_ary will avoid going through respond_to_missing? for that case.

@kddnewton kddnewton merged commit e11d393 into ruby:main Jan 24, 2026
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants