Skip to content

[Query engine] cross shard queries with multiple shard keys provided seem to randomly route to a shard #639

@mijoharas

Description

@mijoharas

Now this one I'm not sure if it's expected to be working, but I didn't see anything clearly in the docs about it. (I may have missed it of course).

(michael)@127.0.0.1:6432 16:55:33 [repro_sharded]
# select count(1), user_id from example where user_id in (1,3) group by user_id;
 count | user_id
-------+---------
     3 |       1
(1 row)

(michael)@127.0.0.1:6432 16:56:03 [repro_sharded]
# select count(1), user_id from example where user_id in (1,3) group by user_id;
 count | user_id
-------+---------
     2 |       3
(1 row)

Now, in this example , we ask for a group/count that should hit two shards. We seem to just connect to one of the shards (non-deterministically) and execute the query there.

# explain select count(1), user_id from example where user_id in (1,3) group by user_id;
                           QUERY PLAN            
----------------------------------------------------------------
 HashAggregate  (cost=35.60..35.79 rows=19 width=12)
   Group Key: user_id
   ->  Seq Scan on example  (cost=0.00..35.50 rows=20 width=4)
         Filter: (user_id = ANY ('{1,3}'::integer[]))

 PgDog Routing:
   Summary: shard=0 role=replica
   Shard 0: matched sharding key example.user_id using constant
   Shard 1: matched sharding key example.user_id using constant
(9 rows)

Full repro details follow:

  • first create a db
createdb pgdog_repro
createdb shard_repro_1
createdb shard_repro_2
  • then connect and create a table
CREATE TABLE example (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id INTEGER
);
  • let's throw some data in there
INSERT INTO example (user_id) VALUES (1), (2), (3), (1), (2), (3), (1);
  • now, let's shard that into a couple of shards, gotta create them first
createdb shard_repro_1
createdb shard_repro_2
[[databases]]
name = "pgdog_repro"
host = "127.0.0.1"
port = 5432
role = "primary"

[[databases]]
name = "repro_sharded"
database_name = "shard_repro_1"
host = "127.0.0.1"
port = 5432
role = "primary"
shard = 0

[[databases]]
name = "repro_sharded"
database_name = "shard_repro_2"
host = "127.0.0.1"
port = 5432
role = "primary"
shard = 1

[[sharded_tables]]
database = "repro_sharded"
name = "example"
column = "user_id"
data_type = "bigint" # it's actually integer, but this should work the same

(ensure you have a user with schema_admin = true)

[[users]]
name = "whatever"
database = "repro_sharded"
schema_admin = true

[[users]]
name = "whatever"
database = "pgdog_repro"
schema_admin = true
  • now, let's reshard everything
create publication sharding_test for table "example";
./target/release/pgdog -c ./local.pgdog.toml -u ./local.users.toml schema-sync --from-database pgdog_repro --to-database repro_sharded --publication sharding_test

# not specifying an option means sync then replicate
./target/release/pgdog -c ./local.pgdog.toml -u ./local.users.toml data-sync --from-database pgdog_repro --to-database repro_sharded --publication sharding_test --replication-slot sharding_test --sync-only

./target/release/pgdog -c ./local.pgdog.toml -u ./local.users.toml schema-sync --from-database pgdog_repro --to-database repro_sharded --publication sharding_test --data-sync-complete
./target/release/pgdog -c ./local.pgdog.toml -u ./local.users.toml schema-sync --from-database pgdog_repro --to-database repro_sharded --publication sharding_test --cutover
  • then let's run pgdog
./target/release/pgdog -c ./local.pgdog.toml -u ./local.users.toml
  • and connect to it
psql -h 127.0.0.1 -p 6432 repro_sharded
  • and we've got our repro!
(michael)@127.0.0.1:6432 16:55:11 [repro_sharded]
# select count(1) from example;
 count
-------
     7
(1 row)

(michael)@127.0.0.1:6432 16:55:24 [repro_sharded]
# select count(1), user_id from example group by user_id;
unexpected field count in "D" message
(michael)@127.0.0.1:6432 16:55:33 [repro_sharded]
# select count(1), user_id from example where user_id in (1,3) group by user_id;
 count | user_id
-------+---------
     3 |       1
(1 row)

(michael)@127.0.0.1:6432 16:56:03 [repro_sharded]
# select count(1), user_id from example where user_id in (1,3) group by user_id;
 count | user_id
-------+---------
     2 |       3
(1 row)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions