sql: optimize query to populate crdb_internal.table_row_statistics
The query to populate the `crdb_internal.table_row_statistics` virtual
table has been rewritten to avoid scanning the `system.table_statistics`
table twice.
Before this commit the query plan was:
• group (hash)
│ group by: tableID
│
└── • hash join
    │ actual row count: 515,822
    │ equality: (tableID, createdAt) = (tableID, max)
    │
    ├── • scan
    │     table: table_statistics@primary
    │     spans: FULL SCAN
    │
    └── • group (streaming)
        │ group by: tableID
        │ ordered: +"tableID"
        │
        └── • scan
              table: table_statistics@primary
              spans: FULL SCAN
Now it is:
• distinct
│ distinct on: tableID
│ order key: tableID
│
└── • sort
    │ order: +"tableID",-"createdAt",-"rowCount"
    │ already ordered: +"tableID"
    │
    └── • scan
          table: table_statistics@primary
          spans: FULL SCAN
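As a rough illustration of why the two plans above return the same rows, here is a small Python model of both shapes. It is a sketch, not the SQL from this commit: the sample rows, the integer timestamps, and both function names are hypothetical, and it assumes the outer aggregate in the old plan is `max(rowCount)`, which the plan output does not show.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (tableID, createdAt, rowCount).
# Table 1 has two rows sharing its latest createdAt, so the tie-break matters.
ROWS = [
    (1, 10, 100),
    (1, 20, 150),
    (1, 20, 175),
    (2, 5, 7),
    (3, 30, 0),
    (3, 15, 42),
]


def latest_row_counts_two_pass(rows):
    """Old plan shape: aggregate, join back to the input, aggregate again."""
    # First scan (inner group): latest createdAt per tableID.
    latest = {}
    for table_id, created_at, _ in rows:
        latest[table_id] = max(created_at, latest.get(table_id, created_at))
    # Second scan (hash join + outer group): keep rows at the latest
    # createdAt, then take the largest rowCount per tableID (assumed).
    result = {}
    for table_id, created_at, row_count in rows:
        if created_at == latest[table_id]:
            result[table_id] = max(row_count, result.get(table_id, row_count))
    return result


def latest_row_counts_one_pass(rows):
    """New plan shape: sort once, then keep the first row per tableID."""
    # Mirrors the sort order +tableID, -createdAt, -rowCount.
    ordered = sorted(rows, key=lambda r: (r[0], -r[1], -r[2]))
    # Mirrors "distinct on: tableID": the first row of each group wins.
    return {
        table_id: next(group)[2]
        for table_id, group in groupby(ordered, key=itemgetter(0))
    }


if __name__ == "__main__":
    print(latest_row_counts_two_pass(ROWS))
    print(latest_row_counts_one_pass(ROWS))
```

In this model, sorting by `rowCount` descending after `createdAt` is what lets the single pass match the two-pass result when several rows share the latest `createdAt`.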
The `crdb_internal.table_row_statistics` table is used to populate the
`estimated_row_count` column in the output of `SHOW TABLES`. In
pathological cases where there are many rows in `system.table_statistics`,
this new query makes `SHOW TABLES` significantly faster. In a test of
mine with ~1.1 million rows in `system.table_statistics`, set up with
[this gist](https://gist.github.com/mgartner/b72b39901be0d942d5a026054e688a8c),
I observed the latency of `SHOW TABLES` drop by ~50% with the new query,
from ~720ms to ~350ms.
Informs #143438
Release note (performance improvement): `SHOW TABLES` is now faster,
especially in cases where there are many tables, both live and
previously dropped.