-
Notifications
You must be signed in to change notification settings - Fork 1
Home
Educational project exploring the utility in searching and manipulating codebases using S-expressions.
I once found wide spread use of a "magical" Ruby method which was unnecessary. The intent of this method was to relieve the developer from the repetition of setting instance variables from method parameters. How this magic method did this was difficult to understand for most. Upon examining this method, I noticed it made costly calls to do at run time what could have been done by a developer with a keyboard, or if you wanted to, with code at load time.
I was new to the team and project, and because the use of this method was wide spread, I wanted a systematic and repeatable approach to refactoring it out of existence so that I and my new colleagues could trust the widespread change.
In Chapter 6 of Practical Object-Oriented Design in Ruby by Sandi Metz, part of the discussion is focused on finding the ideal coupling between child and parent classes. One proposal introduced in that chapter is to use hook methods, instead of calling super.
Lets imagine a scenario where we have achieved total consensus in an organization, and the new direction is to dogmatically use hook methods, instead of calling super.
- Replace methods that call super with a hook method
- Modify parent class' implementation of the supered method to call hook methods
We will begin with the classes Bicycle, RoadBike and MountainBike. We will build them up to the state from Managing Coupling Between Superclasses and Subclasses until we can recognize the important parts of the "discernible pattern."
Things we must be able to interogate about this code:
- Which are the children, and which is the parent class?
- Which methods call super, and which is the method that responds to super?
- What in each method that calls super needs to be in the respective hook method?
- What change needs to occur in the method responding to super to leverage the hook methods?
An s-expression for an empty class in Ruby, as parsed by ruby_parser, looks like this:
class Bicycle
ends(:class, :Bicycle, nil)An s-expression has a type and a body. The above s-expression's type is :class and the body is s(:Bicycle, nil).
An s-expression for an empty class with a parent looks like this:
class MountainBike < Bicycle
ends(:class, :MountainBike, s(:const, Bicycle))This s-expression's type is still :class, but the body is: s(:MountainBike, s(:const, :Bicycle)).
An s-expression is a representation of the abstract syntax tree, and the s-expressions generated by ruby_parser use this sexp_body recursion to create that tree.
ruby_parser comes with a class Sexp::Matcher which provides a terse syntax that we can use to select nodes from the s-expression tree.
The Sexp::Matcher expression that matches any class definition is: (class ___). That expression uses the triple underscore ___ wildcard to match anything following a class type s-expression.
The Sexp::Matcher expression that matches any class with an explicit parent is: (class _ (const _) ___). This uses the single underscore _ positional wild card match, and then matches the constant s-expression containing the parent class.
It is also possible to include negation in Sexp::Matcher. A class with an implicit parent does not have the constant s-expression (const _). Right now, our class s-expression matcher, (class ___) matches all our classes. To match only Bicycle we must use negation. That s-expression is (class _ [not? (const _)] ___).
Knowing the syntax for Sexp::Matcher expressions gives us some confidence that we can start iterating on a tool to help us achieve our goal. The implicit expectation in the project name is that a command line interface is provided. To complete an initial release of a command line tool, we'll use the rubygem aruba to help with test setup and teardown.
The sexp command offers a convenient shortcut to the Sexp::Matcher expressions we'll develop. As we figure out the s-expression matchers along the way, we can add to the list of known matchers to create simple shortcuts, like with the builtin sexp find child-class or sexp find parent-class.
- Checkout the tests for examples of how to test drive your own.
- Checkout the implementation to see how easy it is to add one.
What isn't shown in the commit which added the Sexp::Matcher is the trial and error in the console trying to remember the terse rules.
Setting up a unit test can help close that iteration loop. Consider the unit test for SexpCliTools::Matchers::SuperCaller
Allowing users to experiment with s-expressions might enable exploration and discovery. The sexp find command also supports inline s-expressions. Try these in your projects:
-
sexp find '(class ___)'to find class definitions -
sexp find '[child (class ___)]'to find class definitions nested in a namespace -
sexp find '[child (case ___)]'to find case statements
So having test driven the development of the super-caller matcher next we have to find the methods that respond to super.
So far we've been using Sexp::Matcher strings to find quite abstract parts of our code. But, it's completely possible to fill in what in the parts we know that we'd like to find.
-
sexp find '(class :Bicycle ___)'from my working copy of this project turns up the test fixture file forBicycle, as well as the copy of itarubamakes in thetmp/directory for testing purposes. -
sexp find '[child (defn :initialize ___)]'only turns up the test fixture file forRoadBike. I guess it is time to fill in more of ourBicycleclass!
Finding the super implementation will involve finding a class that contains a method defintion. So far, our matchers haven't taken any parameters. A (naive) matcher for a super implementation might have two parameters, the name of the class we expect to define the method, and the name of the method.
Early on I chose to have the second sequence argument to the command line interface sexp find the glob pattern of files to include in the search. However, I want to prioritize matcher parameters for that position now. Although my test coverage didn't include tests for that glob pattern, I did document it.
So, when I moved that out into the --include command line option, that was a breaking change to the public interface. That would necessitate incrementing the major version number according to semantic versioning. I have a hunch that because I'm still in the 0 major release, I could get away with not bumping it. But, I think the --include is something I can stick to.
What I remember about semantic versioning is that additions can just be minor version bumps. So, as long as I don't make a backwards incompatible change to the find command or the --include option I should be good.
Following merge of: ✨ sexp find method-implementation passed_method lists files that define the passed method I'll release v1.0.0! In that PR I chose to do inside-out testing because the aruba tests are a bit slow.
I found it helpful to run just the CLI command tests I was working on using the TEST and TESTOPTS options to the rake test tast, like so:
rake TEST='test/sexp_cli_tools/cli_test.rb' TESTOPTS="--name=/method-implementation/"I believe it is useful to think of Sexp::Matcher patterns as analogous to Regexp patterns.
One of the differences is that a Regexp matches a pattern in a String, whereas Sexp::Matcher matches an s-expressions "as a whole or in a sub-tree".
Revisiting our example of the empy Bicycle class:
class Bicycle
ends(:class, :Bicycle, nil)We can consider that a Regexp /class \w+/ matches part of the String of the contents of the file, and the Sexp::Matcher (class _ ___) similarly matches a sub-tree of the s-expression.
With Sexp::Matcher we could match the whole s-expression tree with: (class _ nil).
A feature of Regexp that I think would get us closer to our goals is something analogous to MatchData. MatchData returned from a Regexp match captures the fragment in the String the Regexp matched.
Our primative class statement Regexp's MatchData would include "class Bicycle". This is roughly similar to using Sexp::Matcher#/ or Sexp::Matcher#search_each which return or yeild the matching sub-trees.
Returned MatchData also includes any capture groups the Regexp matched. If we change our primative Regexp to use a named capture group we'd get a Hash like mapping for our named captured groups to the fragments they captured. So, /class (?<class_name>\w+)/ would return a MatchData with match_data[:class_name] == 'Bicycle'
We need to be able to capture the name of the method that calls super, in order to find the correct method implementation in the superclass to modify with a hook method call. We also need to find the name of the superclass for the subclass with a method that calls super.
We'll next create an API inspired by MatchData and named capture groups. Then, we'll modify our super-caller matcher to capture the name of the method and the name of the superclass.
Given we used test driven development to create our SuperCaller we now have a good basis from which we can explore the impacts of enhancing #satisfy? to return so-called match data.
####### Experimentation notes
- Will our
wont_be :satisfy?ormust_be :satisfy?expectations break if we return something different?- Right now
SexpCliTools::Matchers::SuperCallerisn't a class we've defined, but a simple instance ofSexp::Matcher- Refactor to create a class that passes the current tests.
- Change the definition to be a class with the same name
- Capture the
Sexp::Matcherinstance in a class constant - Define a class method
satisfy?that callssatisfy?on the class constantSexp::Matcher
- Change the
#satisfy?return value
- What is the returned by
Sexp::Matcher#satisfy??- Calling
#tapon the last call in a method and callingbinding.pryis a handy way to check out what a method will return. - I saw
trueorfalse
- Calling
- If we return an instance of a
Structwith fields for the data we want to capture, will it still pass?- Yes an instance of most objects is truthy.
- Define a
Structinternal to ourSuperCallermatcher - If our
Sexp::Matcheris satisfied by the passed in s-expression, return a new instance of thatStruct
- Can we always return an object, or if there is no match, do we need to return something that is already falsey? Can we make an object which is falsey?
- Not sure if this is valuable
- Would allow me to expect that the return could respond to captured data names, even if it didn't match. Don't know if that is actually a benefit.
- Right now
- How can we select the part of the s-expression that contains the method name?
- Setup a failing test expecting a specific method name
- Try a betterspecs.org style nested describe focusing on the
#method_namecapture data- 2 failures and 1 error
- the error is because of
nilreturnNoMethodError
- Try a betterspecs.org style nested describe focusing on the
- Expirement with
Sexp::Matcher#/and see what we can find in the returned/yielded data.- Is an empty
MatchCollectionfalsey?- Replace the call to
MATCHER.satisfy?(sexp)withMATCHER / sexp- An empty
MatchCollectionis not falsey - Change the
if ... satisfy?tounless ... / ... empty?
- An empty
- Replace the call to
- What is the first element of the collection if the
Sexp::Matcheris only looking any sub-tree with a call to super?- Replace the
SexpMatchData.new(:some_method)with a call tobinding.pryand investigate -
MountainBike- The first element is the whole s-expression input
- The second element is the child sub-tree that contains the
supercall, in this case, the spares definition! - The third element is the call sub-tree of the method that contains the
supercall, chainingmergeoff of the return ofsuper. - The last element was the matching subtree, just the
s(:zsuper)for the call tosuperwith no arguments
-
RoadBike- The second element is the child sub-tree that contains the
super(args)call, in this case, a method definition! - The first element is the whole s-expression input.
- The last element was the matching subtree, just the
s(:super, s(:lvar, :args))for the call tosuper(args).
- The second element is the child sub-tree that contains the
- Replace the
- Is an empty
- Iterate on the
SuperCaller::MATCHERto include a method definition in the search if we need more context to find the method name.- If the
Sexp::Matcheris changed to include a method definition containing a call to super, will the method definition sub-tree be the first element of theMatchCollection?- It will likely be the last element, based on above observations.
- However, maybe we could traverse a portion of the
MatchCollectionlooking for the nearest method definition!
- If the
-
MatchCollectionalso responds to/, what is the resultingMatchCollectionif we drop the first and last elements and look for the first method definition?- I was surprised that I couldn't get
/off of a newMatchCollectionto work. - I achieved to use
#findwithsatisfy?on a specificSexp::Matcher - I can use
Arrayunpacking/destructuring to grab the method name
- I was surprised that I couldn't get
- Setup a failing test expecting a specific method name
For now, our initial implementation of capturing data relevant to our matcher works. We likely need support for multiple matches within the same file. Still, I have a hunch that the affordances that Sexp::Matcher#/ provides will enable composition of matchers and captured data, which I hope means it will remain relatively easy.
I did notice that ruby_parser isn't the only tool available to parse ruby into s-expressions or abstract syntax trees. RuboCop is built on the parser and ast gem. parser has node matching syntax which supports, among other things, capture groups. It could be an interesting exercise to refactor our current implementation to enable swapping in an alternate parser library.
For now, I'll continue setting up examples to test drive thie make it work implementation. The Bicycle, MountainBike and RoadBike examples could be filled in a bit more, so we can observe how our current implementation works when there are multiple methods that call super in a single file.
####### Capture groups iteration 2
Observe
- Our
satisfy?method now leveragesSexp::Matcher#/ -
Sexp::Matcher#/returns anMatchCollection < Arrayof sub-trees - In the single result case, the
MatchCollectionelements are the shortest path from the root of the s-expression tree, to the leaf containing the matching s-expression. -
Sexp::Matcher#search_eachorSexp#search_eachoffers a recursive search through the tree. - I believe that with the
MatchCollectioninterface I could useEnumerable#slice_afterto group tree walks into the paths to the matching result, with the last node being the specific match. - Goal: support for classes/files with multiple methods calling super
- Goal: support for methods with multiple calls to super
Orient
- Is the call sequence for the block
#search_eachroughly equivalent to depth first search? - What does a
MatchCollectionfor a class with multiple methods that callsuperlook like? - What does a
MatchCollectionfor a method that calls super multiple times look like, compared to the call sequence through#search_each? - Now, we want the name of the method captured, but later we'll want the expression that includes
super. If we include the method definition in theSexp::Matcherto make capturing the method name easier, will we still need to take a second pass to find thesuperexpressions to modify? Likewise with the superclass name. - What are all the variations of method definitions that could match a
supercall? Which are the most common or idiomatic?
Decide
- Add all the methods that call super to our bikes classes and observe how the tests fail. Use
binding.pryto observe how to group the matches to find the paths to the calls. - Expand the captured data to include the superclass name and the
superexpression, to get more information for considering how to proceed at this iteration. - Pull up the development console chain
#search_eachwith#each_with_indexandputseach call, compare to#each_with_indexfrom aMatchCollection - Write an
Sexp::MatcherforSuperCallerthat includes the method definition, and consider if to find thesuperexpression or superclass name we could avoid additional passes at the s-expression with otherSexp::Matchers - Scaffold out a new project with the
arubaacceptance tests and see what its like to useparserand it's node matchers, evaluate if their capture groups or parent nodes readily solve this, or just make the problem different. - Ask Ryan Davis, author of
ruby_parserandSexpProcessor
Act
I had been sharing a little bit of this work with SeattleRB, and the first reaction I got was surprise that I was figuring out Sexp::Matcher. When I describe my goal as trying to guess Bicycle#spare_parts from:
class MountainBike < Bicycle
def spare_parts
super.merge({fork: "suspended"})
end
endRyan was kind enough to point me towards MethodBasedSexpProcessor and after reading some of its, and the containing, source code, it has dawned on me that there's a different way to tackle this problem.
Observe
-
SexpProcessorsets up hook methods to call as it traverses an AST. - The hook methods have a
processandrewriteform, and they are specific to ansexp_type. IE:process_callorrewrite_call, orprocess_zsuperandrewrite_zsuper. - Subclasses of
SexpProcessorimplement those hook methods to perform "processing" or rewriting when a node of asexp_typeis encountered. -
MethodBasedSexpProcessorimplementsprocess_class,process_module,process_defnandprocess_defs… if you choose to need those methods, then you must callsuper… -
MethodBasedSexpProcessorbuilds up a context stack, capturing the namespace as it descends into nested module(s), class(es) or singleton class, and finally, the method. - Goal: include superclass in the context
-
SexpCliTools::Matchers::SuperCaller#in_superclassused in theprocess_classhook
-
- Goal: learn how to build an
SexpProcessorsubclass that'll pass the current tests: provide a reader to the method name that calls super - Goal: add to the test coverage to capture the superclass name too
- although this seems like a means not an ends
- Goal: finish the PR by providing a flag that outputs the captured data in
Superclass#method_namenotation.- in the PR I state that we need to output the match to support finding the implementation. I believe a json output will simplify consuming the results
- After calls to
MethodBasedSexpProcessor#in_methodtheMethodBasedSexpProcessorinstance is modified. The readerMethodBasedSexpProcessor#method_locationsreturns a map of rdoc method signatures to filename and line numbers. -
MethodBasedSexpProcessor#signaturereduces the class and method stack to the currentClassName#method_name
Orient
- Is there a Ruby or Rdoc convention for including the superclass in a subclass's method signature?
- When a subclass of
MethodBasedSexpProcessorencounters a(class _ (const _) ___)willprocess_classconsume the type and subclass name, or will it callprocess_constafter theprocess_classhook adds to theMethodBasedSexpProcessor#class_stack?- it will
yieldthe remaining expression in a call tosuperwith a block - it will continue proccessing the remaining expression if
superis called without a block
- it will
- Does it make sense to continue relying on
Sexp::Matcher#/or can I refactor that completely to use aMethodBasedSexpProcessorsubclass only?- A subclass of
MethodBasedSexpProcessorcan do this job - If
Sexp::Matcherhad "capture groups" like inprocessor/astthen could make a pattern that matches on class, captures the superclass name, matches on define method, captures the method name, and then captures the expression which includessuperand that's your first pass.
- A subclass of
- What I remember of reading the inline comments in
lib/sexp_processor.rbis thatrewrite_*hooks rewrite, so why does it look likeprocess_hooks can too?- Modern
SexpProcessordesign favours layering over preparing in arewrite_*hook. -
ruby2corruby2rubyprobably contain the only valid examples ofrewrite_*hooks.
- Modern
Decide
- Test drive
#super_signatureon a sublcass ofMethodBasedSexpProcessor - Nerd party with Ryan Davis
Act
Nerd party with Ryan Davis
- What is the difference between process and rewrite?
- rewrite: lighter weight for normalization
- not meant for real work
- IE: ensure
ifhave bothtrueandfalse - Old project
ParseTreeused it a lot - Somewhat in
ruby2c - Is actually used in
ruby2ruby- Probably only valid examples
- New project: probably wouldn't use rewrite
- Would have processor layers
- process: for the real stuff
- rewrite: lighter weight for normalization
-
MethodBasedSexpProcessor#process_class, does it hookprocess_constor does it handle that part of the s-expression?- shifts off the node type, shift off the class name
- would process the superclass const
- There was a paper that tried getting from a diff to an ast transformation
- Maybe wrote a SexpDiff?
- Racket has some Sexp stuff which might include Sexp diffs
- Maybe port it over?
- Racket project https://docs.racket-lang.org/sexp-diff/index.html
Test drive #super_signature on a sublcass of MethodBasedSexpProcessor
I started by choosing a small change, adding failing tests for #method_name. I don't mind continuing to support #method_name for this SexpCliTools::Matchers::SuperCaller class, but the thing that best captures what I need is super_signature
- Refactor to
MethodBasedSexpProcessor
- Change the superclass of
SexpCliTools::Matchers::SuperCallertoMethodBasedSexpProcessor- need to
require 'sexp_processor' - should I make this dependency explicit in the gemspec, or rely on the fact that
ruby_parserdepends on it?
- need to
- What is the entrypoint to a
SexpProcessor?.new#process(sexp)- How bad do the tests fail if I just use the default behaviour of
MethodBasedSexpProcessor#method_locations?- Obviously the hash is wrong, so lets just start by munging with the keys.
- This is really close!
- Implement the
process_defnandprocess_defsmethods to:- So far I've gotten away with just
process_defn - What is
s(:defs, )? - call
superpassing in a block - in the block, check if the rest of the expression matches a call to super
- capture that method name in a
SexpMatchData- I've made this pass without that, but it is nasty
- So I'm going to add test coverage for files with multiple
supercallers to see if that adds pressure on the implementation.- If I add more methods with calls to
superin the test fixtures, will that cause wider spread regressions?- Just one, where I was lazy in reusing
MountainBikeas a test file fixture of a class without an#initializemethod
- Just one, where I was lazy in reusing
- Can I create inline examples and still play nice with rubocop?
- If I add more methods with calls to
- have the return value of
satisfy?remain thatSexpMatchDatainstance
- So far I've gotten away with just
- Capture the superclass
- Add failing test coverage for
super_signature- I wonder if minitest has more expressive
ArrayorEnumerablematchers - Adding the minitest rubocop to see if that catches anything
- nothing, unless I need to create and update a .rubocop.yml file to make it happen.
- I wonder if minitest has more expressive
- Implement the
process_classmethods to:- call
superpassing in a block - in the block, check if the next expression is a
nilor providing a superclass - add
Objectto the list of superclass ifnil- No test requires this
- add the superclass expression to the list of superclasses otherwise
- also needed to add
process_until_emptyto continue processing, regression test covered that!
- call
- Refactor towards the stack methods like
MethodBasedSexpProcessor
- assert members of
superclass_stackcontain the argument within thein_superclassblock- went with more bdd style, asserted behaviour of
#superclass_name - copy
test_sexp_processor.rbcoverage for different forms of class
- went with more bdd style, asserted behaviour of
- assert value of
superclass_namegiven state ofsuperclass_stack - refactor
SexpMatchDatato returnsuper_signatureandsignatureinstead of the smaller components. - experiment with
process_superandprocess_zsuperinstead ofprocess_defnwith asatisfy?-
process_supermight work for thepost_initializecase -
process_zsuperhas zero context - these are good for detecting, but not for the refactoring step
-
Observe
- Goal: finish the PR by providing a flag that outputs the captured data in
Superclass#method_namenotation. - in the PR I state that we need to output the match to support finding the implementation. I believe a json output will simplify consuming the results
-
SexpProcessorREADME illustrates passing a path toRubyParser#processand I recall that supplies additional context inSexpProcessorsublcasses - This is supposed to fulfill a future dependency, which I haven't yet created.
- The future dependency needs at a minimum the sender and the receiver of
super.
Orient
- We need the path to parse, without which we're going to be searching through a set hoping to find a method with the listed signature.
- We'll have the sender's path because we're looking for calls to super. Maybe we can model the problem as a dependency graph? IE: the refactoring has dependencies on the sender and receiver, and to fulfill those needs their path, so they also have a dependency on the path. If the path is not provided it could searching to find it.
- The inference our sexp processor comes up with doesn't consider modules prepended to the superclass, or if the superclass is an intermediary. How could we observe the runtime receiver of
super?
Decide
Act
####### Tdd #to_json
- I like the
#as_jsonpattern, but don't depend on ActiveSupport so I wonder if or what rubocop will want me to delegate toas_json. - I'm unpracticed with minitest, compared to rspec, for hash matching, will need to check whats available.
-
assert_includes/must_include(usesinclude?) orassert_match(with something that responds to=~) -
Hash#include?is aliased tokey? -
Hashdoesn't implement=~
-
-
Struct#as_jsonreturns theJSONrepresentation which could load it back up as another instance of theStruct- despite this statement in the ruby-lang docs, we might need to require 'json'
- nope kaizen that's an unknown to me, https://docs.ruby-lang.org/en/3.0/Struct.html#method-i-as_json
- despite this statement in the ruby-lang docs, we might need to require 'json'
- write a method with the name
- write a method with one hard coded value
- generalize the method
- got partway to here and dropped into pry to find where I could get the file and line numbers from, and it's in the
Sexp-
Sexp#file,Sexp#line,Sexp#line_max
-
- got partway to here and dropped into pry to find where I could get the file and line numbers from, and it's in the