-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Pull out ElementType implementations into separate classes #133713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @Override | ||
| VectorData parseVectorArray(DocumentParserContext context, int dims, IntBooleanConsumer dimChecker, VectorSimilarity similarity) | ||
| throws IOException { | ||
| return super.parseVectorArray( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-using the implementation for bytes does mean the square magnitude is calculated then thrown away
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
| r -> r | ||
| ); | ||
| } | ||
| public void writeValue(ByteBuffer byteBuffer, float value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling with introducing a delegate. Can't these methods be declared abstract on the enum, and implemented on the different enum constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what they were. But you can't inherit implementations between different enum values, you have to copy them. Hence the behavior classes (and bfloat16 will later extend from float behavior)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can't inherit implementations between different enum values, you have to copy them.
Can't we define common implementation methods on the enum class itself, something like
public enum Enum {
ONE {
@Override
public boolean myOverridenMethod() {
return mySharedMethod();
}
},
TWO {
@Override
public boolean myOverridenMethod() {
return !mySharedMethod();
}
};
public abstract boolean myOverridenMethod();
protected boolean mySharedMethod(){
return true;
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some method calls are layers deep, with the inner methods overridden (eg Bit parseVectorArray). And there's more than one superclass involved - BitBehavior overrides ElementTypeBehavior.checkDimensions, its super-super-class. That's not easily representable with anything other than a proper class hierarchy.
Another reason for doing this refactor - I think ElementType was getting very heavyweight for an enum. Ideally the public methods would be elsewhere, and ElementType is just a lookup key, but I don't want to change the public API here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks!
I think ElementType was getting very heavyweight for an enum. Ideally the public methods would be elsewhere
Agreed. Why not use a non-enum based class hierarchy instead of a delegate? You mention about not changing the public API, I guess that would imply changing other parts of the code that depend on the enum values.
I wonder though if we should go for that to avoid having another layer of indirection 🤔 . You probably already explored that 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the way you put it. How about keeping the identity separate by using a bare enum? Something like:
enum Element {
FLOAT, BYTE, BIT;
}
abstract class ElementType {
// behavior methods
public abstract Element type();
}Behavior would be encapsulated in the class hierarchy, and identity would be asked to the behavior via type() to be used in switch statements.
I know it's not ideal, but in my mind we're trying to overcome language limitations (enum inheritance limitations and switch matching).
I find keeping identity and behavior separate is good, just trying to find a way to avoid indirection if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would work - I'm starting on those changes, but it is a lot of changes - is it worth it throughout the codebase to remove complexity from within DenseVectorFieldMapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a cleaner solution. Though I agree the changes can be impactful throughout the codebase, I don't think we can reduce complexity otherwise. IMO maintaining the current public API of ElementType limits our ability to clean the internal representation.
I'll let others chime in here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Propagating those changes ^^ is making me nervous about breaking something non-obvious - it is very impactful. I suggest we split it up - have this PR improve the internal implementation without changing the API, and explore splitting up ElementType and Element in a separate PR, which will be easier to review than both changes together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as an intermediate step - we should keep pushing to have less complexity by removing the behavior and use a non-enum class hierarchy.
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177 HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146 Branch=main
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177 HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146 Branch=main
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177 HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146 Branch=main
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177 HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146 Branch=main
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177 HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146 Branch=main
There is a lot of duplication of code within
ElementTypeimplementations. This pulls out the implementations into separate classes, allowing a type hierarchy to be established and implementations to be shared and overridden. In particular, the BIT implementation is significantly smaller.Due to the nature of these changes, the diff is pretty much unintelligible. It's best to review the files in their entirety side-by-side.