-
-
Notifications
You must be signed in to change notification settings - Fork 575
Open
dolthub/go-mysql-server
#3099Labels
Description
TLDR; slow left join where primary key can be used as index, but isn't. benchmark size 30k rows, slow query takes 0.30 sec
, optimal query takes 0.00 sec
.
there's a case where the primary key isn't used in a left join, when it should be.
here's the query plan for two identical queries on two tables, test
and test2
. test
uses primary key. test2
doesn't have a primary key, and uses an index on col, id
.
here's the reproduction:
# setup
SQL_CMD="dolt sql" # change to `mysql` to test difference in behavior
$SQL_CMD <<< "CREATE DATABASE test"
$SQL_CMD <<< "DROP TABLE IF EXISTS test"
$SQL_CMD <<< "CREATE TABLE test (id VARCHAR(255), parent VARCHAR(255), primary key (id))"
$SQL_CMD <<< "DROP TABLE IF EXISTS test2"
$SQL_CMD <<< "CREATE TABLE test2 (id VARCHAR(255), parent VARCHAR(255))"
$SQL_CMD <<< "CREATE UNIQUE INDEX idx_test2_id ON test2 (id)"
# data generation (not really needed for query plan, but here for performance benchmarks)
TABLE="test" # change this test2 to generate for
# 10,000 grandparents
for j in {0..9}; do
sql="INSERT INTO $TABLE (id, parent) VALUES ('test_$(( j * 1000 + 1 ))', NULL)"
for ((i = j * 1000 + 2; i <= (j + 1) * 1000; i++)); do
sql+=",('test_$i', NULL)"
done
$SQL_CMD <<< "$sql"
done
# 10,000 parents
for j in {0..9}; do
sql="INSERT INTO $TABLE (id, parent) VALUES ('test_$(( j * 1000 + 10001 ))', 'test_$(( j * 1000 ))')"
for ((i = j * 1000 + 10002; i <= (j + 1) * 1000 + 10000; i++)); do
sql+=",('test_$i', 'test_$(( i - 10000 ))')"
done
$SQL_CMD <<< "$sql"
done
# 10,000 children
for j in {0..9}; do
sql="INSERT INTO $TABLE (id, parent) VALUES ('test_$(( j * 1000 + 20001 ))', 'test_$(( j * 1000 + 10000 ))')"
for ((i = j * 1000 + 20002; i <= (j + 1) * 1000 + 20000; i++)); do
sql+=",('test_$i', 'test_$(( i - 10000 ))')"
done
$SQL_CMD <<< "$sql"
done
now for the query:
select *
from test child
left join test parent on parent.id = child.parent
left join test grandparent on grandparent.id = parent.parent
where child.id in ('test_0', 'test_5000', 'test_10000', 'test_15000', 'test_20000', 'test_25000')
Change the table from test
to test2
for the other side.
performance different on my machine:
test/main*> select *
-> from test child
-> left join test parent on parent.id = child.parent
-> left join test grandparent on grandparent.id = parent.parent
-> where child.id in ('test_0', 'test_5000', 'test_10000', 'test_15000', 'test_20000', 'test_25000');
+------------+------------+------------+-----------+-----------+--------+
| id | parent | id | parent | id | parent |
+------------+------------+------------+-----------+-----------+--------+
| test_10000 | NULL | NULL | NULL | NULL | NULL |
| test_15000 | test_5000 | test_5000 | NULL | NULL | NULL |
| test_20000 | test_10000 | test_10000 | NULL | NULL | NULL |
| test_25000 | test_15000 | test_15000 | test_5000 | test_5000 | NULL |
| test_5000 | NULL | NULL | NULL | NULL | NULL |
+------------+------------+------------+-----------+-----------+--------+
5 rows in set (0.30 sec)
test/main*> select *
-> from test2 child
on parent.id = child.parent
-> left join test2 parent on parent.id = child.parent
-> left join test2 grandparent on grandparent.id = parent.parent
-> where child.id in ('test_0', 'test_5000', 'test_10000', 'test_15000', 'test_20000', 'test_25000');
+------------+------------+------------+-----------+-----------+--------+
| id | parent | id | parent | id | parent |
+------------+------------+------------+-----------+-----------+--------+
| test_10000 | NULL | NULL | NULL | NULL | NULL |
| test_15000 | test_5000 | test_5000 | NULL | NULL | NULL |
| test_20000 | test_10000 | test_10000 | NULL | NULL | NULL |
| test_25000 | test_15000 | test_15000 | test_5000 | test_5000 | NULL |
| test_5000 | NULL | NULL | NULL | NULL | NULL |
+------------+------------+------------+-----------+-----------+--------+
5 rows in set (0.00 sec)