feat: add query phase tracking for SHOW QUERIES#34706

Open
yihaoDeng wants to merge 31 commits into 3.0 from feat/addShowQuery

@yihaoDeng

Add current_phase and action_start_time fields to track query execution stages:

  • 0=query, 1=fetch, 2=query_callback, 3=fetch_callback

This helps monitor what phase a query is in and how long each phase takes.
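
The phase numbering above can be read as a small enumeration plus an elapsed-time calculation. The following is a minimal Python sketch rather than the PR's C code: `QueryPhase`, `phase_name`, and `elapsed_ms` are hypothetical names, and the millisecond unit for the start time is an assumption. Only the four numeric values and their labels come from the description.

```python
from enum import IntEnum


class QueryPhase(IntEnum):
    """Phase codes as listed in the PR description (class name is illustrative)."""
    QUERY = 0
    FETCH = 1
    QUERY_CALLBACK = 2
    FETCH_CALLBACK = 3


def phase_name(code: int) -> str:
    """Map a raw phase code to a lowercase label, or 'unknown' for other values."""
    try:
        return QueryPhase(code).name.lower()
    except ValueError:
        return "unknown"


def elapsed_ms(action_start_time_ms: int, now_ms: int) -> int:
    """Time spent in the current phase; assumes both timestamps are milliseconds."""
    return max(0, now_ms - action_start_time_ms)
```

Clamping at zero is a design choice for the sketch: it avoids negative durations if the client and server clocks disagree.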

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings March 7, 2026 11:54
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability of query execution by adding detailed phase tracking. It introduces new fields to monitor the current stage of a query and the timestamp when that stage began, making this information accessible through the SHOW QUERIES command. This improvement allows for better performance analysis, bottleneck identification, and a clearer understanding of query lifecycle, ultimately aiding in debugging and optimization efforts.

Highlights

  • Query Phase Tracking: Introduced currentPhase and actionStartTime fields in SQueryDesc and SRequestObj to track the current execution stage and its start time for queries.
  • Visibility in SHOW QUERIES: Exposed the new current_phase and action_start_time fields in the SHOW QUERIES output, providing users with real-time insights into query progress.
  • Phase Lifecycle Management: Implemented logic to update the currentPhase and actionStartTime during various stages of query execution, including initial query, data fetching, and their respective callbacks.
  • Comprehensive Testing: Added a new Python test suite to validate the functionality, schema, and accuracy of the query phase tracking feature under different scenarios, including long-running and concurrent queries.

Changelog
  • include/common/tmsg.h
    • Added currentPhase and actionStartTime fields to the SQueryDesc structure.
  • source/client/inc/clientInt.h
    • Added currentPhase and actionStartTime fields to the SRequestObj structure.
  • source/client/src/clientEnv.c
    • Initialized currentPhase and actionStartTime for new SRequestObj instances.
    • Removed an unnecessary memory allocation for msgBuf.
  • source/client/src/clientHb.c
    • Populated currentPhase and actionStartTime in SQueryDesc from the SRequestObj during heartbeat request building.
  • source/client/src/clientImpl.c
    • Updated currentPhase and actionStartTime when doRequestCallback is invoked for query and fetch callbacks.
  • source/client/src/clientMain.c
    • Set currentPhase to 'query' and recorded actionStartTime at the beginning of doAsyncQuery.
    • Set currentPhase to 'fetch' and recorded actionStartTime at the beginning of taos_fetch_rows_a.
  • source/common/src/msg/tmsg.c
    • Modified tSerializeSClientHbReq to serialize the new currentPhase and actionStartTime fields.
    • Modified tDeserializeSClientHbReq to deserialize the new currentPhase and actionStartTime fields.
  • source/common/src/systable.c
    • Added current_phase (VARCHAR) and action_start_time (TIMESTAMP) columns to the querySchema for SHOW QUERIES.
  • source/dnode/mnode/impl/src/mndProfile.c
    • Implemented logic to convert currentPhase integer to a human-readable string and added it to the SHOW QUERIES output.
    • Added actionStartTime to the SHOW QUERIES output.
  • test/cases/24-Users/test_query_phase_tracking.py
    • Added a new Python test file test_query_phase_tracking.py to validate the query phase tracking feature.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces query phase tracking for SHOW QUERIES by adding current_phase and action_start_time fields. The changes are well-contained and correctly implemented across the data structures, client logic, and server-side display logic. My main suggestion is to introduce an enum for the query phases to replace the magic numbers currently used, which will enhance code readability and maintainability. I've also provided a suggestion to strengthen the new test case for timing accuracy.

Note: Security Review did not run due to the size of the PR.

Copilot AI left a comment

Pull request overview

This PR adds query execution phase tracking for the SHOW QUERIES command in TDengine. It introduces two new columns (current_phase and action_start_time) to the query schema, tracking which execution stage (query, fetch, query_callback, fetch_callback) a query is in and when that stage began.

Changes:

  • New currentPhase and actionStartTime fields added to SRequestObj and SQueryDesc structs, with lifecycle tracking at each execution phase
  • Heartbeat serialization/deserialization updated to transmit the new fields to the MNode, and MNode updated to pack them into the SHOW QUERIES block
  • New test file added to validate the new columns and phase values

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| include/common/tmsg.h | Adds currentPhase and actionStartTime fields to SQueryDesc |
| source/client/inc/clientInt.h | Adds the same fields to SRequestObj |
| source/client/src/clientEnv.c | Initializes new fields; accidentally removes msgBuf allocation |
| source/client/src/clientMain.c | Sets phase=0 at query start, phase=1 at fetch start |
| source/client/src/clientImpl.c | Transitions phase to 2/3 in doRequestCallback |
| source/client/src/clientHb.c | Copies new fields into heartbeat descriptor |
| source/common/src/msg/tmsg.c | Encodes/decodes new fields in heartbeat (breaking wire change) |
| source/common/src/systable.c | Adds two new columns to querySchema |
| source/dnode/mnode/impl/src/mndProfile.c | Packs phase string and start time into SHOW QUERIES result block |
| test/cases/24-Users/test_query_phase_tracking.py | New test file for the feature |

Comments suppressed due to low confidence (1)

source/client/src/clientEnv.c:604

  • The line (*pRequest)->msgBuf = taosMemoryCalloc(1, ERROR_MSG_BUF_DEFAULT_SIZE); was accidentally removed from createRequest(). Since *pRequest is zero-initialized via taosMemoryCalloc, msgBuf will always be NULL, causing the null-check on line 601 to always trigger and createRequest to always fail. This breaks all query requests, as msgBuf is used by the parse context in multiple places (e.g., clientMain.c:1964, clientImpl.c:378, clientImpl.c:600).
  if (NULL == (*pRequest)->msgBuf) {
    code = terrno;
    goto _return;
  }

Use EQueryExecPhase enum (none/parse/catalog/plan/schedule/execute/fetch/done)
instead of raw integer phases. Fix field name mismatches, serialization order,
and backward-compatible deserialization for SHOW QUERIES phase tracking.

Made-with: Cursor
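
The backward-compatible deserialization mentioned in this commit can be illustrated with a short sketch. It is written in Python, not the project's C, and the wire layout is hypothetical: a fixed 64-bit id followed by an optional tail holding the phase and its start time. Only the idea comes from the PR: newer fields are appended at the end, and the decoder reads them only if bytes remain (the quoted C code uses `tDecodeIsEnd` for that check). The names `encode_desc`, `decode_desc`, and the value of `QUERY_PHASE_NONE` are assumptions.

```python
import struct

QUERY_PHASE_NONE = 0  # assumed default; the real enum value is not shown in the PR text

_BASE = struct.Struct("<q")    # hypothetical fixed part: a 64-bit query id
_PHASE = struct.Struct("<iq")  # optional tail: int32 phase + int64 start time


def encode_desc(query_id, phase=None, start_ms=0):
    """Encode a descriptor; an older sender omits the phase tail entirely."""
    buf = _BASE.pack(query_id)
    if phase is not None:
        buf += _PHASE.pack(phase, start_ms)
    return buf


def decode_desc(buf):
    """Decode a descriptor, falling back to defaults when the tail is absent."""
    (query_id,) = _BASE.unpack_from(buf, 0)
    phase, start_ms = QUERY_PHASE_NONE, 0  # defaults are set before the optional read
    if len(buf) > _BASE.size:              # plays the role of an "is end" check
        phase, start_ms = _PHASE.unpack_from(buf, _BASE.size)
    return {"query_id": query_id, "phase": phase, "start_ms": start_ms}
```

In this sketch the defaults are assigned before the optional read, so a message from an older client still yields well-defined values.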
Copilot AI review requested due to automatic review settings March 10, 2026 14:28
Extend SQuerySubDesc with startTs/endTs from scheduler task profile.
Update sub_status format to tid:status:startMs:endMs for each sub-task.
Backward-compatible serialization via tDecodeIsEnd guard.

Made-with: Cursor
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings March 12, 2026 09:22
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.


- Remove endTs from sub_status (always 0 for active queries)
- Change startTs display from unix timestamp to human-readable format
  (YYYY-MM-DD HH:MM:SS.mmm)
- Set startTs at task init time so INIT state tasks have creation time
- Fix test API calls and add database cleanup in setup

Made-with: Cursor
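
A consumer of `sub_status` might parse the resulting `tid:status:startTime` entries as sketched below. This is a Python illustration under stated assumptions: entries are comma-separated and a start time of `-` means "not set" (both taken from the test code in this PR), while `parse_sub_status` and the status strings in the usage note are hypothetical.

```python
from datetime import datetime


def parse_sub_status(sub_status):
    """Split 'tid:status:startTime' entries into (tid, status, start) tuples."""
    tasks = []
    for entry in filter(None, sub_status.split(",")):
        # maxsplit=2 keeps the colons inside HH:MM:SS in the third field
        tid, status, start = entry.split(":", 2)
        started = None if start == "-" else datetime.strptime(
            start, "%Y-%m-%d %H:%M:%S.%f")
        tasks.append((int(tid), status, started))
    return tasks
```

For example, `parse_sub_status("12:executing:2026-03-12 12:24:00.123,13:init:-")` yields one entry with a parsed timestamp and one with `None`. Limiting the split to two separators matters because the human-readable timestamp itself contains colons.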
Copilot AI review requested due to automatic review settings March 12, 2026 12:24
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.


Copilot AI review requested due to automatic review settings March 12, 2026 13:05
@yihaoDeng yihaoDeng requested a review from zitsen as a code owner March 13, 2026 12:50
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 13, 2026 12:52
Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.


| 11 | sub_num | INT | Number of sub-queries |
| 12 | sub_status | BINARY(1000) | Sub-query status |
| 13 | sql | BINARY(1024) | SQL statement |
| 14 | user_app | BINARY(24) | Application name (set by the client) |

Where do columns 14 and 15 come from? They seem redundant; this information is already shown under connection.

int64_t startTs; // sub-task first execution start time, us
} SQuerySubDesc;

typedef enum EQueryExecPhase {

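The original intent of adding this field was not to display these phases, although displaying them is fine. The core problem we need to solve is that the SCHEDULE and FETCH phases must be subdivided before we can tell what the query is actually doing at a given moment. These two phases therefore need finer granularity, for example which step of FETCH the query is currently in: server-side processing, response handling, and so on.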

- Introduced new phases and sub-phases for query execution in `tmsg.h`.
- Added `schedulerGetJobPhase` function to retrieve job execution phase.
- Updated `clientHb.c`, `clientImpl.c`, and `scheduler.c` to utilize new phase tracking.
- Enhanced `test_query_phase_tracking.py` to validate new phases in query output.
- Fix phase_state column width from 16 to 32 bytes to hold longest
  phase string (fetch:preparing_response = 24 chars, 25 bytes with the NUL terminator)
- Fix variable shadowing in clientHb.c (code redeclared inside
  hbBuildQueryDesc)
- Fix QUERY_PHASE_PLAN never being set; planner phase was incorrectly
  mapped to SCHEDULE_PLANNING
- Fix ANALYSIS phase immediately overwritten by PLANNING in
  schedulerExecJob; insert real work (schSwitchJobStatus INIT) between
  them
- Fix FETCH_CLIENT_REQUEST immediately overwritten by
  SERVER_PROCESSING in client; remove client-side fetch sub-phase
  writes, let scheduler be the single authority
- Unify phase write ownership: client writes main phases (PARSE,
  CATALOG, PLAN, SCHEDULE, EXECUTE, FETCH, DONE), scheduler writes
  sub-phases (SCHEDULE_*, EXEC_*, FETCH_*)
- Fix non-atomic writes to execPhase/phaseStartTime in
  doRequestCallback and asyncExecSchQuery
- Fix concurrent scan tasks overwriting phaseStartTime; use CAS
  check in schBuildAndSendMsg
- Remove dead QUERY_PHASE_SCHEDULE_RESOURCE_ALLOC enum value and
  reorder enum groups (4x SCHEDULE, 5x EXECUTE, 6x FETCH)
- Improve test reliability and add phase_state max length test

Made-with: Cursor
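
The enum grouping and the column-width fix in this commit can be sketched as two small helpers. This is a Python illustration, not the PR's C code. It assumes that "4x", "5x", and "6x" mean the tens digit of a two-digit sub-phase code, and that the column width is compared against the byte length of the string; `main_phase_of` and `fits_column` are hypothetical names.

```python
# Tens digit -> main phase, following the "4x SCHEDULE, 5x EXECUTE, 6x FETCH"
# grouping in the commit message. Other groups are not described there.
_GROUPS = {4: "schedule", 5: "execute", 6: "fetch"}


def main_phase_of(sub_phase_code):
    """Return the main phase a two-digit sub-phase code belongs to, or None."""
    return _GROUPS.get(sub_phase_code // 10)


def fits_column(phase_text, width=32):
    """Check that a phase string fits the phase_state column width in bytes."""
    return len(phase_text.encode("utf-8")) <= width
```

Under these assumptions, `fits_column("fetch:preparing_response", width=16)` is false while the default width of 32 is sufficient, which is the motivation the commit gives for widening the column.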
Copilot AI review requested due to automatic review settings March 16, 2026 11:48
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 9 comments.


Comment on lines +467 to +474
if (!tDecodeIsEnd(pDecoder)) {
  desc.execPhase = QUERY_PHASE_NONE;
  desc.phaseStartTime = 0;
  code = tDecodeI32(pDecoder, &desc.execPhase);
  TAOS_CHECK_GOTO(code, &line, _error);
  code = tDecodeI64(pDecoder, &desc.phaseStartTime);
  TAOS_CHECK_GOTO(code, &line, _error);
}
| 13 | sql | BINARY(1024) | SQL statement |
| 14 | user_app | BINARY(24) | Application name (set by the client) |
| 15 | user_ip | BINARY(16) | IP address used by the application (set by the client) |
| 16 | phase_state | BINARY(16) | Current query phase / state |

print("test phase state max length ....................... [passed]")

def cleanup_class(cls):
Comment on lines +91 to +97

for row in range(tdSql.getRows()):
    if phase_idx >= 0:
        phase_value = tdSql.getData(row, phase_idx)
        tdLog.info(f"Row {row} phase: {phase_value}")
        assert phase_value in self.VALID_PHASES, \
            f"Phase should be one of {self.VALID_PHASES}, got {phase_value}"
Comment on lines +240 to +268
tdSql.query(f"select count(*) from db2.stb2 group by tbname")

tdSql.query(f"show queries")
if tdSql.getRows() > 0:
    col_names = [desc[0] for desc in tdSql.cursor.description]
    sub_status_idx = self._get_col_idx(col_names, "sub_status")
    sub_num_idx = self._get_col_idx(col_names, "sub_num")

    if sub_num_idx >= 0:
        sub_num = tdSql.getData(0, sub_num_idx)
        tdLog.info(f"Sub plan num: {sub_num}")

    if sub_status_idx >= 0:
        sub_status = tdSql.getData(0, sub_status_idx)
        tdLog.info(f"Sub status: {sub_status}")
        if sub_status:
            parts = sub_status.split(",")
            for part in parts:
                fields = part.split(":", 2)
                tdLog.info(f" Sub-task fields: {fields}")
                assert len(fields) == 3, \
                    f"sub_status entry should have 3 fields (tid:status:startTime), got {len(fields)}: {part}"
                tid_str, status, start_time = fields
                assert tid_str.isdigit(), f"tid should be numeric, got: {tid_str}"
                assert len(status) > 0, f"status should not be empty"
                if start_time != "-":
                    assert "." in start_time, \
                        f"startTime should be human-readable (YYYY-MM-DD HH:MM:SS.ms) or '-', got: {start_time}"

SQuerySubDesc *sDesc = taosArrayGet(desc->subDesc, m);
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->tid));
TAOS_CHECK_RETURN(tEncodeCStr(pEncoder, sDesc->status));
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->startTs));
}

offset += tsnprintf(subStatus + offset, sizeof(subStatus) - offset,
"%" PRIu64 ":%s:%s", pDesc->tid, pDesc->status, startBuf);
schedulerExecFp execFp;
schedulerFetchFp fetchFp;
void *cbParam;
void *pRequest; // Add pointer to request object for phase tracking
def test_phase_state_max_length(self):
"""MaxLen: Verify phase_state column can hold the longest phase string

1. The longest phase string is 'fetch:preparing_response' (24 chars, 25 bytes with the NUL terminator)
When a super table query spans multiple vnodes, the scheduler dispatches
scan tasks to each vnode then waits for all to complete before launching
the merge task. This transition was previously invisible.

Add QUERY_PHASE_EXEC_WAITING_CHILDREN (53) to track the window between
the first scan task completing and the merge task launching. Uses CAS to
set the phase only once when transitioning from EXEC_DATA_QUERY.

SHOW QUERIES phase_state now shows:
  execute:data_query → execute:waiting → execute:merge_query

Made-with: Cursor
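
The "set the phase only once" behaviour described in this commit is a compare-and-set transition. The sketch below emulates it in Python with a lock, since Python has no direct equivalent of the C atomics the PR uses. `PhaseCell` and `on_scan_task_done` are hypothetical names, and the value 51 for `EXEC_DATA_QUERY` is an assumption; only 53 is stated in the commit message.

```python
import threading

EXEC_DATA_QUERY = 51        # hypothetical value; only 53 appears in the commit
EXEC_WAITING_CHILDREN = 53  # value given in the commit message


class PhaseCell:
    """A phase value with compare-and-set semantics, emulated with a lock."""

    def __init__(self, phase):
        self._phase = phase
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        """Set the phase to `new` only if it currently equals `expected`."""
        with self._lock:
            if self._phase != expected:
                return False
            self._phase = new
            return True

    def get(self):
        with self._lock:
            return self._phase


def on_scan_task_done(cell):
    """Called once per finished scan task; only the first call changes the phase."""
    return cell.compare_and_set(EXEC_DATA_QUERY, EXEC_WAITING_CHILDREN)
```

With several scan tasks finishing concurrently, only the first caller observes `EXEC_DATA_QUERY` and performs the transition; later callers see a different phase and leave it untouched.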
Copilot AI review requested due to automatic review settings March 16, 2026 12:31
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.


}

offset += tsnprintf(subStatus + offset, sizeof(subStatus) - offset,
"%" PRIu64 ":%s:%s", pDesc->tid, pDesc->status, startBuf);
| 11 | sub_num | INT | Number of sub-queries |
| 12 | sub_status | BINARY(1000) | Sub-query status |
| 13 | sql | BINARY(1024) | SQL statement |
| 14 | phase_state | BINARY(16) | Current query phase / state |
Comment on lines +1458 to +1459
atomic_store_32((int32_t*)&pRequest->execPhase, QUERY_PHASE_PLAN);
atomic_store_64((int64_t*)&pRequest->phaseStartTime, taosGetTimestampMs());
Comment on lines +468 to +475
if (!tDecodeIsEnd(pDecoder)) {
  desc.execPhase = QUERY_PHASE_NONE;
  desc.phaseStartTime = 0;
  code = tDecodeI32(pDecoder, &desc.execPhase);
  TAOS_CHECK_GOTO(code, &line, _error);
  code = tDecodeI64(pDecoder, &desc.phaseStartTime);
  TAOS_CHECK_GOTO(code, &line, _error);
}
Wrap all execPhase/phaseStartTime atomic stores with a guard that
checks whether the phase value is actually different before writing.
This prevents phaseStartTime from being silently reset while the
query remains in the same phase.

Made-with: Cursor
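
The guard described in this commit can be sketched as follows. This is a Python illustration with a lock standing in for the C atomic stores; `PhaseTracker` and `set_phase` are hypothetical names, and the initial phase value of 0 is an assumption.

```python
import threading


class PhaseTracker:
    """Tracks (phase, start_ms); the start time changes only when the phase does."""

    def __init__(self):
        self._lock = threading.Lock()
        self.phase = 0
        self.start_ms = 0

    def set_phase(self, phase, now_ms):
        """Record a phase transition; a repeated write of the same phase is a no-op."""
        with self._lock:
            if self.phase == phase:
                return False  # same phase: keep the original start time
            self.phase = phase
            self.start_ms = now_ms
            return True
```

Calling `set_phase(5, 100)` and then `set_phase(5, 250)` leaves `start_ms` at 100, so the reported duration of a phase is measured from its first entry rather than from the most recent redundant write.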
Copilot AI review requested due to automatic review settings March 16, 2026 13:51
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.


Comment on lines +64 to +65
| 14 | phase_state | BINARY(16) | Current query phase / state |
| 15 | phase_start_time | TIMESTAMP | Start time of the current phase |
Comment on lines +468 to +472
if (!tDecodeIsEnd(pDecoder)) {
desc.execPhase = QUERY_PHASE_NONE;
desc.phaseStartTime = 0;
code = tDecodeI32(pDecoder, &desc.execPhase);
TAOS_CHECK_GOTO(code, &line, _error);
Comment on lines 328 to 336
for (int32_t m = 0; m < snum; ++m) {
SQuerySubDesc *sDesc = taosArrayGet(desc->subDesc, m);
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->tid));
TAOS_CHECK_RETURN(tEncodeCStr(pEncoder, sDesc->status));
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->startTs));
}
TAOS_CHECK_RETURN(tEncodeI32(pEncoder, desc->execPhase));
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, desc->phaseStartTime));
}
}

offset += tsnprintf(subStatus + offset, sizeof(subStatus) - offset,
"%" PRIu64 ":%s:%s", pDesc->tid, pDesc->status, startBuf);
| 11 | sub_num | INT | Number of sub-queries |
| 12 | sub_status | BINARY(1000) | Sub-query status |
| 13 | sql | BINARY(1024) | SQL statement |
| 14 | phase_state | BINARY(16) | Current query phase / state |
schedulerExecFp execFp;
schedulerFetchFp fetchFp;
void *cbParam;
void *pRequest; // Add pointer to request object for phase tracking
Comment on lines +343 to +345
// Phase tracking helper functions
void schSetExecPhase(void *pRequest, int32_t phase);

Improve SHOW QUERIES sub_status with fine-grained task state strings while keeping tid:status:startTime output format. Add a full branch-level documentation note that summarizes feat/addShowQuery changes excluding merge main/3.0 logic.

Made-with: Cursor
Revert the branch-level design notes document commit while keeping the related sub_status implementation and tests unchanged.

Made-with: Cursor
Copilot AI review requested due to automatic review settings March 16, 2026 14:22
Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.


Comment on lines +468 to +475
if (!tDecodeIsEnd(pDecoder)) {
  desc.execPhase = QUERY_PHASE_NONE;
  desc.phaseStartTime = 0;
  code = tDecodeI32(pDecoder, &desc.execPhase);
  TAOS_CHECK_GOTO(code, &line, _error);
  code = tDecodeI64(pDecoder, &desc.phaseStartTime);
  TAOS_CHECK_GOTO(code, &line, _error);
}
}

offset += tsnprintf(subStatus + offset, sizeof(subStatus) - offset,
"%" PRIu64 ":%s:%s", pDesc->tid, pDesc->status, startBuf);
| 11 | sub_num | INT | Number of sub-queries |
| 12 | sub_status | BINARY(1000) | Sub-query status |
| 13 | sql | BINARY(1024) | SQL statement |
| 14 | phase_state | BINARY(16) | Current query phase / state |
Comment on lines +344 to +345
void schSetExecPhase(void *pRequest, int32_t phase);

SQuerySubDesc *sDesc = taosArrayGet(desc->subDesc, m);
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->tid));
TAOS_CHECK_RETURN(tEncodeCStr(pEncoder, sDesc->status));
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->startTs));
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->tid));
TAOS_CHECK_RETURN(tEncodeCStr(pEncoder, sDesc->status));
TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->startTs));
}
Keep main phase_state names unchanged and convert sub-phase values to */* format for consistency with sub_status naming style.

Made-with: Cursor