Skip to content

Conversation

@thecoop
Copy link
Member

@thecoop thecoop commented Aug 28, 2025

There is a lot of duplication of code within ElementType implementations. This pulls out the implementations into separate classes, allowing a type hierarchy to be established and implementations to be shared and overridden. In particular, the BIT implementation is significantly smaller.

Due to the nature of these changes, the diff is pretty much unintelligible. It's best to review the files in their entirety side-by-side.

@Override
VectorData parseVectorArray(DocumentParserContext context, int dims, IntBooleanConsumer dimChecker, VectorSimilarity similarity)
throws IOException {
return super.parseVectorArray(
Copy link
Member Author

@thecoop thecoop Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-using the implementation for bytes does mean the square magnitude is calculated then thrown away

@thecoop thecoop marked this pull request as ready for review August 28, 2025 14:06
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Aug 28, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

r -> r
);
}
public void writeValue(ByteBuffer byteBuffer, float value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling with introducing a delegate. Can't these methods be declared abstract on the enum, and implemented on the different enum constants?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what they were. But you can't inherit implementations between different enum values, you have to copy them. Hence the behavior classes (and bfloat16 will later extend from float behavior)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can't inherit implementations between different enum values, you have to copy them.

Can't we define common implementation methods on the enum class itself, something like

    public enum Enum {
        ONE {
            @Override
            public boolean myOverridenMethod() {
                return mySharedMethod();
            }
        },
        TWO {
            @Override
            public boolean myOverridenMethod() {
                return !mySharedMethod();
            }
        };
        
        public abstract boolean myOverridenMethod();
        
        protected boolean mySharedMethod(){
            return true;
        }  
    }

Copy link
Member Author

@thecoop thecoop Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some method calls are layers deep, with the inner methods overridden (eg Bit parseVectorArray). And there's more than one superclass involved - BitBehavior overrides ElementTypeBehavior.checkDimensions, its super-super-class. That's not easily representable with anything other than a proper class hierarchy.

Another reason for doing this refactor - I think ElementType was getting very heavyweight for an enum. Ideally the public methods would be elsewhere, and ElementType is just a lookup key, but I don't want to change the public API here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks!

I think ElementType was getting very heavyweight for an enum. Ideally the public methods would be elsewhere

Agreed. Why not use a non-enum based class hierarchy instead of a delegate? You mention about not changing the public API, I guess that would imply changing other parts of the code that depend on the enum values.

I wonder though if we should go for that to avoid having another layer of indirection 🤔 . You probably already explored that 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way you put it. How about keeping the identity separate by using a bare enum? Something like:

enum Element {
   FLOAT, BYTE, BIT;
}

abstract class ElementType {

  // behavior methods

  public abstract Element type();
}

Behavior would be encapsulated in the class hierarchy, and identity would be asked to the behavior via type() to be used in switch statements.

I know it's not ideal, but in my mind we're trying to overcome language limitations (enum inheritance limitations and switch matching).

I find keeping identity and behavior separate is good, just trying to find a way to avoid indirection if possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work - I'm starting on those changes, but it is a lot of changes - is it worth it throughout the codebase to remove complexity from within DenseVectorFieldMapper?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a cleaner solution. Though I agree the changes can be impactful throughout the codebase, I don't think we can reduce complexity otherwise. IMO maintaining the current public API of ElementType limits our ability to clean the internal representation.

I'll let others chime in here as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagating those changes ^^ is making me nervous about breaking something non-obvious - it is very impactful. I suggest we split it up - have this PR improve the internal implementation without changing the API, and explore splitting up ElementType and Element in a separate PR, which will be easier to review than both changes together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me 👍

Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as an intermediate step - we should keep pushing to have less complexity by removing the behavior and use a non-enum class hierarchy.

@thecoop thecoop merged commit 04a5b1b into elastic:main Sep 4, 2025
32 of 33 checks passed
@thecoop thecoop deleted the element-type-behavior branch September 4, 2025 11:51
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Sep 23, 2025
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177
HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 2, 2025
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177
HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 6, 2025
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177
HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 17, 2025
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177
HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146
Branch=main
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Oct 23, 2025
BASE=d5fa4e6f91ed3556f50de013e248bba2d25b4177
HEAD=eb5a32836b5b12b196375f1dc55d467e8280d146
Branch=main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>refactoring :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants