Skip to content

Conversation

@PerfectSlayer
Copy link
Contributor

What Does This Do

This PR  provides small feedback about the newly created project.
The commits are suggestions only, feel free to cherry-pick them or not.
They should not be considered as change request, only change examples.

Questions

  • datadog.instrument.classmatch.ClassHeader#interfaces()
    Is it okay to allocate an ArrayList every time the interfaces are accessed?
    Would accessing the array be good enough? (or is it about defensive copy?)
    Same applies for fields, methods, and annotations from datadog.instrument.classmatch.ClassOutline.

  • datadog.instrument.classmatch.ClassHeader#access
    What about documenting how to test the value? It can be referring to the JVMLS or referring to java.lang.reflect.Modifier for example.
    It would make it easier for the API consumer to test flags presence with existing constants.
    Same applies for datadog.instrument.classmatch.FieldOutline#access.

  • datadog.instrument.classmatch.TypeMatcher
    Could is(String) be useful?

  • I could not find usage / documentation about ClassInfoCache, ClassNameFilter, ClassNameTrie
    It could be useful to explicit the feature in the README too for discoverability, or to move them to utils if they are internal optimization only.

Suggestion

  • class-match module source structure
    I would introduce package to help with discoverability:

    • parser (parse, outline, hearder)
    • matcher (all matcher interfaces and internal implementation)
    • data structure (cache, filter, trie)
  • datadog.instrument.classmatch.ClassFile
    What about moving the parsing of fields and methods to their related classes?
    This would reduce the ClassFile complexity to avoid trying to parse everything in one place.

Nitpick

  • datadog.instrument.classmatch.ClassFile is more about parser / reader than file / file system.

  • datadog.instrument.classmatch.ClassMatcher#declares(java.util.function.IntPredicate, datadog.instrument.classmatch.FieldMatcher)
    This kind of method - using two predicates / filters on the same type feels weird when using the fluent approach.
    I would have expected to use declares(field("fieldname").access(Modifier::isPublic)) rather than declares(Modifier::isPublic, field("fieldname")) because the "fluent" API allow me to build / specify the filter on the fly directly on the type I’m filtering -- the feeling is a bit hard to express, sorry.
    To illustrate what happened to me, I had to look at the implementation to check if you were doing some specific optimization as the fluent way seem more natural, and I could not find a reason to have the overloaded method.

Motivation

Provide preliminary feedback before wider adoption.

Additional Notes

Contributor Checklist

Jira ticket: [PROJ-IDENT]

@PerfectSlayer PerfectSlayer requested a review from a team as a code owner September 26, 2025 08:38
@PerfectSlayer PerfectSlayer added the question Further information is requested label Sep 26, 2025
@PerfectSlayer PerfectSlayer requested review from mcculls and removed request for manuel-alvarez-alvarez September 26, 2025 08:39
@mcculls
Copy link
Collaborator

mcculls commented Sep 26, 2025

I'll add separate comments for each of the other questions (not necessarily in the same order as above!)

datadog.instrument.classmatch.ClassFile is more about parser / reader than file / file system.

This class deals with reading the class-file format as defined in https://docs.oracle.com/javase/specs/jvms/se24/html/jvms-4.html - in this context "class-file" has a specific meaning which I want to keep in this particular class name.

(I'm also preferring short, snappy class names where possible)

@mcculls
Copy link
Collaborator

mcculls commented Sep 26, 2025

datadog.instrument.classmatch.ClassMatcher#declares(java.util.function.IntPredicate, datadog.instrument.classmatch.FieldMatcher)
This kind of method - using two predicates / filters on the same type feels weird when using the fluent approach.
I would have expected to use declares(field("fieldname").access(Modifier::isPublic)) rather than declares(Modifier::isPublic, field("fieldname")) because the "fluent" API allow me to build / specify the filter on the fly directly on the type I’m filtering -- the feeling is a bit hard to express, sorry.
To illustrate what happened to me, I had to look at the implementation to check if you were doing some specific optimization as the fluent way seem more natural, and I could not find a reason to have the overloaded method.

This is syntactic sugar to make matchers read more like source code - there are predefined access matchers in StandardMatchers to go with this

For example rather than writing:

declares(field("myField").access(Modifier::isPublic))

You'd write:

declares(PUBLIC, field("myField")) 

@mcculls
Copy link
Collaborator

mcculls commented Sep 26, 2025

datadog.instrument.classmatch.ClassFile
What about moving the parsing of fields and methods to their related classes?
This would reduce the ClassFile complexity to avoid trying to parse everything in one place

This file has been extensively optimized through benchmarking - the complexity comes from the class-file format and the need to avoid allocations, virtual calls, and abstractions while parsing.

There's a primary method that reads through the class-file in one pass, with utility methods to handle situations where we need to jump over large sections. But we deliberately wanted to avoid state, as otherwise you'd have to continually create parser instances. This also means avoiding methods where you take a cursor position and want to return both a parsed value and the updated cursor position, because that would involve further allocations to hold both pieces of data. The final design has the primary method maintaining the cursor, and IMHO the code nicely aligns with the class-file format document.

@mcculls
Copy link
Collaborator

mcculls commented Sep 26, 2025

class-match module source structure
I would introduce package to help with discoverability:
parser (parse, outline, hearder)
matcher (all matcher interfaces and internal implementation)
data structure (cache, filter, trie)

There is intentional coupling between the class-file parser and the matcher code for performance reasons (driven by benchmarking) that relies on package-level access

Moving the classes to separate packages would mean changing this implementation coupling to be public - which would bloat the public API, or at least muddy the waters between public API for internal use and public API for external use.

It's also not expected for users to work with the class-file parser or outlines directly. This will become clearer soon with more instrumentation examples, but the idea is for the user to primarily to work with the matcher APIs. The class-file class is currently public in case others wanted to take advantage of its quick (limited) parsing, but most users will not be aware of it.

@mcculls
Copy link
Collaborator

mcculls commented Sep 26, 2025

I could not find usage / documentation about ClassInfoCache, ClassNameFilter, ClassNameTrie
It could be useful to explicit the feature in the README too for discoverability, or to move them to utils if they are internal optimization only.

They're utility classes - mainly used to support matching but could be used elsewhere - I've moved them to utils (some began in utils before I moved them to class-match)

@mcculls
Copy link
Collaborator

mcculls commented Sep 26, 2025

datadog.instrument.classmatch.TypeMatcher
Could is(String) be useful?

Yes, TypeMatcher is going to get a few more methods

@mcculls
Copy link
Collaborator

mcculls commented Sep 26, 2025

datadog.instrument.classmatch.ClassHeader#access
What about documenting how to test the value? It can be referring to the JVMLS or referring to java.lang.reflect.Modifier for example.
It would make it easier for the API consumer to test flags presence with existing constants.
Same applies for datadog.instrument.classmatch.FieldOutline#access.

The access values are defined under the access_flags section of https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html - this also matches the values in https://docs.oracle.com/javase/8/docs/api/java/lang/reflect/Modifier.html (note that some values only apply to certain elements)

It is not expected that users will deal with the access values directly, instead preferring to use the StandardMatchers pre-define matchers. The access fields are currently public in case other people want to re-use the class-file approach separately to our matchers, in which case the user should already be aware that they come from the class-file spec.

@mcculls
Copy link
Collaborator

mcculls commented Sep 26, 2025

datadog.instrument.classmatch.ClassHeader#interfaces()
Is it okay to allocate an ArrayList every time the interfaces are accessed?
Would accessing the array be good enough? (or is it about defensive copy?)
Same applies for fields, methods, and annotations from datadog.instrument.classmatch.ClassOutline.

Originally the arrays were public, like the other record-style elements, but I thought it might be safer to provide list access in the public API while keeping the raw access for internal usage.

However this was an over-complication and it is more consistent to have the outlines mainly act like records with public fields containing the parsed data. They may still have private/package visible methods for situations where we need to associate a small amount of cached data with the outline, as that avoids having to maintain a side-cache. Where possible though the outlines should be as compact as possible, so we can store more of them and avoid having to reload and re-parse old content.

@mcculls mcculls merged commit fa4278f into main Sep 26, 2025
4 checks passed
@mcculls mcculls deleted the bbujon/feedback branch September 26, 2025 20:33
@PerfectSlayer
Copy link
Contributor Author

This is syntactic sugar to make matchers read more like source code - there are predefined access matchers in StandardMatchers to go with this

For example rather than writing:

declares(field("myField").access(Modifier::isPublic))

You'd write:

declares(PUBLIC, field("myField")) 

Agreed, I found the StandardMatchers after writing this comment.

I like the way you can read it, I mostly surprised when it comes to write it where I would have expected the access predicate to relate to the field matcher like declares(field(PUBLIC, "myfield")). But readability is more important than its writing counterpart 👍

@PerfectSlayer
Copy link
Contributor Author

This also means avoiding methods where you take a cursor position and want to return both a parsed value and the updated cursor position, because that would involve further allocations to hold both pieces of data.

Yeah, I though about multiple outputs issue. There would be ways to avoid it but that would not make the design better by then end 😒

@PerfectSlayer
Copy link
Contributor Author

Moving the classes to separate packages would mean changing this implementation coupling to be public - which would bloat the public API, or at least muddy the waters between public API for internal use and public API for external use.

That's something module and exports would help with, but again, stuck on Java 8 😅

It's also not expected for users to work with the class-file parser or outlines directly. This will become clearer soon with more instrumentation examples, but the idea is for the user to primarily to work with the matcher APIs. The class-file class is currently public in case others wanted to take advantage of its quick (limited) parsing, but most users will not be aware of it.

Interesting. Can't wait to discover them then 😃

@PerfectSlayer
Copy link
Contributor Author

I thought it might be safer to provide list access in the public API while keeping the raw access for internal usage.

I doubt people using this kind of library are going to maliciously trick the lib behavior by tempering the array content.

@PerfectSlayer
Copy link
Contributor Author

Thanks for the chat and welcoming my feedback! 🙏
(so I might start to understand why the GitHub discussion session exists 😅 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

question Further information is requested

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants