Merged
19 changes: 19 additions & 0 deletions enginetest/queries/join_queries.go
@@ -1161,6 +1161,25 @@ var JoinScriptTests = []ScriptTest{
},
},
},
{
// Since hash.HashOf takes in a sql.Schema to convert and hash keys,
Contributor

This comment is confusing without context. This is a regression test, but it doesn't describe what issue was being fixed.

Contributor Author

The issue is that we were passing in the wrong schema; now, we're passing in the right one.

// we need to pass in the right schema for the key columns.
Name: "HashLookups regression test",
SetUpScript: []string{
"create table t1 (i int primary key, j varchar(1), k int);",
"create table t2 (i int primary key, k int);",
"insert into t1 values (111111, 'a', 111111);",
"insert into t2 values (111111, 111111);",
},
Assertions: []ScriptTestAssertion{
{
Query: "select /*+ HASH_JOIN(t1, t2) */ * from t1 join t2 on t1.i = t2.i and t1.k = t2.k;",
Expected: []sql.Row{
{111111, "a", 111111, 111111, 111111},
},
},
},
},
}

var LateralJoinScriptTests = []ScriptTest{
9 changes: 8 additions & 1 deletion sql/plan/hash_lookup.go
@@ -19,6 +19,7 @@ import (
"sync"

"github.com/dolthub/go-mysql-server/sql"
"github.com/dolthub/go-mysql-server/sql/expression"
"github.com/dolthub/go-mysql-server/sql/hash"
"github.com/dolthub/go-mysql-server/sql/types"
)
@@ -127,7 +128,13 @@ func (n *HashLookup) GetHashKey(ctx *sql.Context, e sql.Expression, row sql.Row)
return nil, err
}
if s, ok := key.([]interface{}); ok {
return hash.HashOf(ctx, n.Schema(), s)
var sch sql.Schema
if tup, isTup := e.(*expression.Tuple); isTup {
for _, expr := range tup.Children() {
sch = append(sch, &sql.Column{Type: expr.Type()})
Contributor

I'm worried about the performance implications of creating a schema every time we call GetHashKey. And it's not clear what was wrong with using the HashLookup's precomputed schema: does the value of row have a different schema based on which side of the join we're computing the hash for?

Contributor Author

I can pass in nil for schema instead, or just store it once we create the key.

HashLookup's "precomputed" schema is the schema of the join, not the key.

Contributor

We shouldn't pass in nil, because we need to convert the keys to the same type before computing the hash. Otherwise, if the tables being joined have different-but-convertible types for the join column, we might not detect that two rows are equal because they'll have different hashes.

It looks like in the proposed change, we're passing in a nil schema if the expression isn't a tuple. That's probably incorrect. I'm surprised that it's not breaking any tests.

We should precompute the schema of the key column(s) and store it in the HashLookup. We should also add a test where the tables being joined have different types that would produce different hashes, to make sure we handle that correctly.

}
}
return hash.HashOf(ctx, sch, s)
}
// byte slices are not hashable
if k, ok := key.([]byte); ok {
Expand Down
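As context for the last suggestion above, here is a minimal sketch of what precomputing the key schema could look like. It reuses the tuple-walking logic from the diff, but runs it once when the lookup is constructed rather than on every GetHashKey call. The helper name keySchemaFor and the idea of storing its result in a field on HashLookup are illustrative assumptions, not code from this PR:

// Illustrative sketch (not part of this PR): build the schema for the
// hash-key expression once, so GetHashKey can reuse it instead of
// rebuilding it for every row.
func keySchemaFor(e sql.Expression) sql.Schema {
	if tup, isTup := e.(*expression.Tuple); isTup {
		sch := make(sql.Schema, 0, len(tup.Children()))
		for _, expr := range tup.Children() {
			sch = append(sch, &sql.Column{Type: expr.Type()})
		}
		return sch
	}
	// Single-column key: still record its type so hash.HashOf can convert
	// values consistently before hashing, avoiding the nil-schema case
	// called out in the review.
	return sql.Schema{&sql.Column{Type: e.Type()}}
}

GetHashKey would then hash against the stored schema, e.g. return hash.HashOf(ctx, n.keySchema, s), where keySchema is a hypothetical field populated from keySchemaFor when the HashLookup is created.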