feat: add query phase tracking for SHOW QUERIES#34706
Add `current_phase` and `action_start_time` fields to track query execution stages:
- 0 = query
- 1 = fetch
- 2 = query_callback
- 3 = fetch_callback

This helps monitor what phase a query is in and how long each phase takes.
Summary of Changes (Gemini Code Assist): This pull request enhances the observability of query execution by adding detailed phase tracking. It introduces new fields that record the current stage of a query and the timestamp when that stage began, and exposes this information through `SHOW QUERIES`.
Code Review
This pull request introduces query phase tracking for SHOW QUERIES by adding current_phase and action_start_time fields. The changes are well-contained and correctly implemented across the data structures, client logic, and server-side display logic. My main suggestion is to introduce an enum for the query phases to replace the magic numbers currently used, which will enhance code readability and maintainability. I've also provided a suggestion to strengthen the new test case for timing accuracy.
Note: Security Review did not run due to the size of the PR.
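The review's main suggestion, replacing the raw integers 0 to 3 with an enum, could look roughly like the sketch below. The enum, table and function names (`EQueryPhaseSketch`, `kPhaseNames`, `phaseName`, `phaseFromName`) are hypothetical; the PR later introduced its own `EQueryExecPhase` with different members. Mapping through a bounds-checked table also gives a safe fallback for values sent by a newer or misbehaving peer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical enum replacing the original magic numbers
 * (0=query, 1=fetch, 2=query_callback, 3=fetch_callback). */
typedef enum EQueryPhaseSketch {
  PHASE_QUERY = 0,
  PHASE_FETCH = 1,
  PHASE_QUERY_CALLBACK = 2,
  PHASE_FETCH_CALLBACK = 3,
  PHASE_COUNT /* number of valid phases; keep last */
} EQueryPhaseSketch;

/* Display strings, indexed by enum value. */
static const char *const kPhaseNames[PHASE_COUNT] = {
    [PHASE_QUERY] = "query",
    [PHASE_FETCH] = "fetch",
    [PHASE_QUERY_CALLBACK] = "query_callback",
    [PHASE_FETCH_CALLBACK] = "fetch_callback",
};

/* Maps a raw value (e.g. decoded from a heartbeat) to its display string;
 * out-of-range values become "unknown" instead of indexing out of bounds. */
static const char *phaseName(int32_t phase) {
  if (phase < 0 || phase >= PHASE_COUNT) return "unknown";
  return kPhaseNames[phase];
}

/* Reverse lookup, e.g. for checking a string column read back by a test.
 * Returns -1 for unrecognized names. */
static int32_t phaseFromName(const char *name) {
  for (int32_t i = 0; i < PHASE_COUNT; ++i) {
    if (strcmp(kPhaseNames[i], name) == 0) return i;
  }
  return -1;
}
```

With the enum in place, call sites read as `pRequest->phase = PHASE_FETCH;` rather than `= 1`, and adding a phase only requires one new enum member and one table entry.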
Pull request overview
This PR adds query execution phase tracking for the SHOW QUERIES command in TDengine. It introduces two new columns (current_phase and action_start_time) to the query schema, tracking which execution stage (query, fetch, query_callback, fetch_callback) a query is in and when that stage began.
Changes:
- New `currentPhase` and `actionStartTime` fields added to `SRequestObj` and `SQueryDesc` structs, with lifecycle tracking at each execution phase
- Heartbeat serialization/deserialization updated to transmit the new fields to the MNode, and MNode updated to pack them into the `SHOW QUERIES` block
- New test file added to validate the new columns and phase values
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `include/common/tmsg.h` | Adds `currentPhase` and `actionStartTime` fields to `SQueryDesc` |
| `source/client/inc/clientInt.h` | Adds the same fields to `SRequestObj` |
| `source/client/src/clientEnv.c` | Initializes new fields; accidentally removes `msgBuf` allocation |
| `source/client/src/clientMain.c` | Sets phase=0 at query start, phase=1 at fetch start |
| `source/client/src/clientImpl.c` | Transitions phase to 2/3 in `doRequestCallback` |
| `source/client/src/clientHb.c` | Copies new fields into heartbeat descriptor |
| `source/common/src/msg/tmsg.c` | Encodes/decodes new fields in heartbeat (breaking wire change) |
| `source/common/src/systable.c` | Adds two new columns to `querySchema` |
| `source/dnode/mnode/impl/src/mndProfile.c` | Packs phase string and start time into `SHOW QUERIES` result block |
| `test/cases/24-Users/test_query_phase_tracking.py` | New test file for the feature |
Comments suppressed due to low confidence (1)
source/client/src/clientEnv.c:604
- The line `(*pRequest)->msgBuf = taosMemoryCalloc(1, ERROR_MSG_BUF_DEFAULT_SIZE);` was accidentally removed from `createRequest()`. Since `*pRequest` is zero-initialized via `taosMemoryCalloc`, `msgBuf` will always be NULL, causing the null-check on line 601 to always trigger and `createRequest` to always fail. This breaks all query requests, as `msgBuf` is used by the parse context in multiple places (e.g., `clientMain.c:1964`, `clientImpl.c:378`, `clientImpl.c:600`).
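```c
(*pRequest)->msgBuf = taosMemoryCalloc(1, ERROR_MSG_BUF_DEFAULT_SIZE);  // restore the removed allocation
if (NULL == (*pRequest)->msgBuf) {
  code = terrno;
  goto _return;
}
```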
Use EQueryExecPhase enum (none/parse/catalog/plan/schedule/execute/fetch/done) instead of raw integer phases. Fix field name mismatches, serialization order, and backward-compatible deserialization for SHOW QUERIES phase tracking. Made-with: Cursor
Extend SQuerySubDesc with startTs/endTs from scheduler task profile. Update sub_status format to tid:status:startMs:endMs for each sub-task. Backward-compatible serialization via tDecodeIsEnd guard. Made-with: Cursor
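The "backward-compatible via `tDecodeIsEnd` guard" idea is that a newer decoder reads optional trailing fields only if bytes remain, and otherwise falls back to defaults. The sketch below shows the pattern with a tiny hypothetical encoder/decoder (`Enc`, `Dec`, `decIsEnd` and `Desc` are stand-ins, not TDengine's encoder API) and sets the defaults before the guard so that old-format messages still decode to a well-defined "no phase" state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-ins for an encoder/decoder pair (hypothetical; NOT the
 * TDengine tEncodeXxx / tDecodeXxx functions). Host byte order. */
typedef struct { uint8_t buf[64]; size_t len; } Enc;
typedef struct { const uint8_t *buf; size_t len; size_t pos; } Dec;

static void encPut(Enc *e, const void *v, size_t n) {
  assert(e->len + n <= sizeof e->buf); /* sketch: fixed-size buffer */
  memcpy(e->buf + e->len, v, n);
  e->len += n;
}

static int decGet(Dec *d, void *v, size_t n) {
  if (d->len - d->pos < n) return -1; /* truncated message */
  memcpy(v, d->buf + d->pos, n);
  d->pos += n;
  return 0;
}

static int decIsEnd(const Dec *d) { return d->pos >= d->len; }

typedef struct {
  int64_t queryId;
  int32_t execPhase;      /* new field */
  int64_t phaseStartTime; /* new field, ms */
} Desc;

/* Old senders stop after queryId; new senders append the phase fields. */
static void encodeDesc(Enc *e, const Desc *d, int withPhase) {
  encPut(e, &d->queryId, sizeof d->queryId);
  if (withPhase) {
    encPut(e, &d->execPhase, sizeof d->execPhase);
    encPut(e, &d->phaseStartTime, sizeof d->phaseStartTime);
  }
}

/* Defaults are set before the end-of-input check, so messages from old
 * senders decode with execPhase = 0 (none) and phaseStartTime = 0. */
static int decodeDesc(Dec *dec, Desc *d) {
  if (decGet(dec, &d->queryId, sizeof d->queryId) != 0) return -1;
  d->execPhase = 0;
  d->phaseStartTime = 0;
  if (!decIsEnd(dec)) {
    if (decGet(dec, &d->execPhase, sizeof d->execPhase) != 0) return -1;
    if (decGet(dec, &d->phaseStartTime, sizeof d->phaseStartTime) != 0) return -1;
  }
  return 0;
}
```

In this sketch the end check looks at the whole buffer, so it is only safe when the new fields are the last bytes of the message. If several descriptors are encoded back to back (the heartbeat encoder excerpt later in this thread appends `execPhase` inside a per-query loop), an old sender's next descriptor could be misread as phase fields unless each descriptor sits in its own length-delimited block; whether the real decoder scopes `tDecodeIsEnd` that way is worth confirming.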
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
- Remove endTs from sub_status (always 0 for active queries)
- Change startTs display from a Unix timestamp to a human-readable format (YYYY-MM-DD HH:MM:SS.mmm)
- Set startTs at task init time so tasks in the INIT state show their creation time
- Fix test API calls and add database cleanup in setup

Made-with: Cursor
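The switch from a raw Unix timestamp to `YYYY-MM-DD HH:MM:SS.mmm` amounts to splitting a microsecond value into whole seconds and milliseconds. The sketch below assumes `startTs` is in microseconds (as the `SQuerySubDesc` comment in this PR states), renders in UTC with POSIX `gmtime_r` so the output is reproducible (the server may well use local time instead), and prints `-` for an unset value to match the placeholder the test accepts. `formatStartTs` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L /* for gmtime_r */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Writes "YYYY-MM-DD HH:MM:SS.mmm" (UTC) for a microsecond epoch value,
 * or "-" when the value is unset (<= 0). outLen should be at least 24. */
static void formatStartTs(int64_t us, char *out, size_t outLen) {
  if (us <= 0) {
    snprintf(out, outLen, "-");
    return;
  }
  time_t secs = (time_t)(us / 1000000);
  int ms = (int)((us % 1000000) / 1000); /* drop sub-millisecond digits */
  struct tm tm;
  char date[20]; /* "YYYY-MM-DD HH:MM:SS" + NUL */
  if (gmtime_r(&secs, &tm) == NULL ||
      strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", &tm) == 0) {
    snprintf(out, outLen, "-"); /* conversion failed; show placeholder */
    return;
  }
  snprintf(out, outLen, "%s.%03d", date, ms);
}
```

For example, `formatStartTs(1700000000123456, buf, sizeof buf)` yields `2023-11-14 22:13:20.123`, which contains the `.` the test checks for.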
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
Made-with: Cursor
…se_start_time Made-with: Cursor
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
| # | Column | Data type | Description |
|---|---|---|---|
| 11 | sub_num | INT | Number of subqueries |
| 12 | sub_status | BINARY(1000) | Subquery status |
| 13 | sql | BINARY(1024) | SQL statement |
| 14 | user_app | BINARY(24) | Application name (set by the client) |
Where do columns 14 and 15 come from? They seem somewhat redundant; this information is already shown in the connection information.
```c
  int64_t startTs;  // sub-task first execution start time, us
} SQuerySubDesc;

typedef enum EQueryExecPhase {
```
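Displaying these phases was not the original intent of adding this field, though showing them is fine. The core problem we need to solve is that the SCHEDULE and FETCH phases must be subdivided further before we can tell what the query is actually doing at any moment. So these two phases need finer-grained sub-phases: for example, which step the FETCH phase is currently at, whether it is being processed on the server side or the response is being handled, and so on.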
- Introduced new phases and sub-phases for query execution in `tmsg.h`.
- Added a `schedulerGetJobPhase` function to retrieve the job execution phase.
- Updated `clientHb.c`, `clientImpl.c`, and `scheduler.c` to use the new phase tracking.
- Enhanced `test_query_phase_tracking.py` to validate the new phases in query output.
- Fix phase_state column width from 16 to 32 bytes to hold the longest phase string (fetch:preparing_response = 24 chars, 25 bytes with the NUL terminator)
- Fix variable shadowing in clientHb.c (`code` redeclared inside hbBuildQueryDesc)
- Fix QUERY_PHASE_PLAN never being set; the planner phase was incorrectly mapped to SCHEDULE_PLANNING
- Fix the ANALYSIS phase being immediately overwritten by PLANNING in schedulerExecJob; insert real work (schSwitchJobStatus INIT) between them
- Fix FETCH_CLIENT_REQUEST being immediately overwritten by SERVER_PROCESSING in the client; remove client-side fetch sub-phase writes and let the scheduler be the single authority
- Unify phase write ownership: the client writes main phases (PARSE, CATALOG, PLAN, SCHEDULE, EXECUTE, FETCH, DONE); the scheduler writes sub-phases (`SCHEDULE_*`, `EXEC_*`, `FETCH_*`)
- Fix non-atomic writes to execPhase/phaseStartTime in doRequestCallback and asyncExecSchQuery
- Fix concurrent scan tasks overwriting phaseStartTime; use a CAS check in schBuildAndSendMsg
- Remove the dead QUERY_PHASE_SCHEDULE_RESOURCE_ALLOC enum value and reorder enum groups (4x SCHEDULE, 5x EXECUTE, 6x FETCH)
- Improve test reliability and add a phase_state max length test

Made-with: Cursor
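Two of the fixes above can be checked mechanically: the tens-digit grouping of sub-phase codes and the phase_state column width. In the sketch below only code 53 (`QUERY_PHASE_EXEC_WAITING_CHILDREN`, added in a later commit of this PR) and the strings `execute:data_query`, `execute:waiting`, `execute:merge_query` and `fetch:preparing_response` come from the commit messages; the remaining codes and names are illustrative guesses derived from the enum names, and `PHASE_STATE_LEN`, `mainGroup`, `subPhaseName` and `longestSubPhaseName` are hypothetical. It derives the main phase from `code / 10` and confirms that the longest name needs more than the original 16 bytes but fits in 32 with its NUL terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PHASE_STATE_LEN 32 /* phase_state width after the fix, in bytes */

typedef struct {
  int32_t     code;
  const char *name;
} PhaseName;

/* Tens digit = main phase group: 4x SCHEDULE, 5x EXECUTE, 6x FETCH.
 * Only 53 is taken from this PR; the other codes are illustrative. */
static const PhaseName kSubPhases[] = {
    {40, "schedule:analysis"},       {41, "schedule:planning"},
    {51, "execute:data_query"},      {52, "execute:merge_query"},
    {53, "execute:waiting"},         {60, "fetch:client_request"},
    {61, "fetch:server_processing"}, {62, "fetch:preparing_response"},
};
#define SUB_PHASE_COUNT (sizeof kSubPhases / sizeof kSubPhases[0])

/* Recovers the main phase group from a sub-phase code. */
static const char *mainGroup(int32_t code) {
  switch (code / 10) {
    case 4: return "schedule";
    case 5: return "execute";
    case 6: return "fetch";
    default: return "unknown";
  }
}

/* Looks up the display string; NULL if the code is not a known sub-phase. */
static const char *subPhaseName(int32_t code) {
  for (size_t i = 0; i < SUB_PHASE_COUNT; ++i) {
    if (kSubPhases[i].code == code) return kSubPhases[i].name;
  }
  return NULL;
}

/* Longest name in characters, excluding the NUL terminator. */
static size_t longestSubPhaseName(void) {
  size_t best = 0;
  for (size_t i = 0; i < SUB_PHASE_COUNT; ++i) {
    size_t n = strlen(kSubPhases[i].name);
    if (n > best) best = n;
  }
  return best;
}
```

A check like `longestSubPhaseName() + 1 <= PHASE_STATE_LEN` in a unit test would catch a future sub-phase name that outgrows the column.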
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 9 comments.
```c
desc.execPhase = QUERY_PHASE_NONE;  // defaults apply when the sender omits the new fields
desc.phaseStartTime = 0;
if (!tDecodeIsEnd(pDecoder)) {
  code = tDecodeI32(pDecoder, &desc.execPhase);
  TAOS_CHECK_GOTO(code, &line, _error);
  code = tDecodeI64(pDecoder, &desc.phaseStartTime);
  TAOS_CHECK_GOTO(code, &line, _error);
}
```
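| # | Column | Data type | Description |
|---|---|---|---|
| 13 | sql | BINARY(1024) | SQL statement |
| 14 | user_app | BINARY(24) | Application name (set by the client) |
| 15 | user_ip | BINARY(16) | IP address used by the application (set by the client) |
| 16 | phase_state | BINARY(16) | Current query phase / state |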
```python
        print("test phase state max length ....................... [passed]")

    def cleanup_class(cls):
```

```python
for row in range(tdSql.getRows()):
    if phase_idx >= 0:
        phase_value = tdSql.getData(row, phase_idx)
        tdLog.info(f"Row {row} phase: {phase_value}")
        assert phase_value in self.VALID_PHASES, \
            f"Phase should be one of {self.VALID_PHASES}, got {phase_value}"
```

```python
tdSql.query("select count(*) from db2.stb2 group by tbname")

tdSql.query("show queries")
if tdSql.getRows() > 0:
    col_names = [desc[0] for desc in tdSql.cursor.description]
    sub_status_idx = self._get_col_idx(col_names, "sub_status")
    sub_num_idx = self._get_col_idx(col_names, "sub_num")

    if sub_num_idx >= 0:
        sub_num = tdSql.getData(0, sub_num_idx)
        tdLog.info(f"Sub plan num: {sub_num}")

    if sub_status_idx >= 0:
        sub_status = tdSql.getData(0, sub_status_idx)
        tdLog.info(f"Sub status: {sub_status}")
        if sub_status:
            parts = sub_status.split(",")
            for part in parts:
                # maxsplit=2 keeps the colons inside the start time intact
                fields = part.split(":", 2)
                tdLog.info(f"  Sub-task fields: {fields}")
                assert len(fields) == 3, \
                    f"sub_status entry should have 3 fields (tid:status:startTime), got {len(fields)}: {part}"
                tid_str, status, start_time = fields
                assert tid_str.isdigit(), f"tid should be numeric, got: {tid_str}"
                assert len(status) > 0, "status should not be empty"
                if start_time != "-":
                    assert "." in start_time, \
                        f"startTime should be human-readable (YYYY-MM-DD HH:MM:SS.ms) or '-', got: {start_time}"
```
```c
    SQuerySubDesc *sDesc = taosArrayGet(desc->subDesc, m);
    TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->tid));
    TAOS_CHECK_RETURN(tEncodeCStr(pEncoder, sDesc->status));
    TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->startTs));
  }
```

```c
offset += tsnprintf(subStatus + offset, sizeof(subStatus) - offset,
                    "%" PRIu64 ":%s:%s", pDesc->tid, pDesc->status, startBuf);
```
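The `tsnprintf` line above accumulates `offset` across sub-task entries. The sketch below builds the same `tid:status:startTime` format (comma-separated, as the test expects) with standard `snprintf`, plus a matching parser. It illustrates two points under stated assumptions: with standard `snprintf` semantics the return value is the length that *would* have been written, so an unchecked `offset +=` can push `offset` past the buffer size once it fills (whether `tsnprintf` clamps its return value is not shown in this thread); and because the human-readable start time contains colons, a parser must split each entry on its first two colons only. `SubTask`, `buildSubStatus`, `parseEntry` and the status strings are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-task record mirroring tid/status/start. */
typedef struct {
  uint64_t    tid;
  const char *status;
  const char *start; /* preformatted time or "-" */
} SubTask;

/* Joins "tid:status:start" entries with ','. Stops as soon as an entry
 * does not fit, so `size - off` can never wrap around.
 * Returns 0 on success, -1 if the output was truncated. */
static int buildSubStatus(const SubTask *t, size_t n, char *out, size_t size) {
  assert(size > 0);
  size_t off = 0;
  out[0] = '\0';
  for (size_t i = 0; i < n; ++i) {
    int w = snprintf(out + off, size - off, "%s%" PRIu64 ":%s:%s",
                     i > 0 ? "," : "", t[i].tid, t[i].status, t[i].start);
    if (w < 0 || (size_t)w >= size - off) return -1;
    off += (size_t)w;
  }
  return 0;
}

/* Splits one entry at its first two ':' only, so the start time keeps
 * its own colons (like Python's part.split(":", 2)). */
static int parseEntry(const char *entry, uint64_t *tid, char *status,
                      size_t statusLen, const char **start) {
  const char *c1 = strchr(entry, ':');
  if (c1 == NULL) return -1;
  const char *c2 = strchr(c1 + 1, ':');
  if (c2 == NULL) return -1;

  if (!isdigit((unsigned char)entry[0])) return -1; /* no sign or spaces */
  char *end = NULL;
  unsigned long long v = strtoull(entry, &end, 10);
  if (end != c1) return -1; /* tid must be all digits up to the first ':' */

  size_t len = (size_t)(c2 - c1 - 1);
  if (len == 0 || len >= statusLen) return -1; /* empty or too long */
  memcpy(status, c1 + 1, len);
  status[len] = '\0';

  *tid = (uint64_t)v;
  *start = c2 + 1;
  return 0;
}
```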
```c
  schedulerExecFp  execFp;
  schedulerFetchFp fetchFp;
  void            *cbParam;
  void            *pRequest;  // request object, used for phase tracking
```
```python
    def test_phase_state_max_length(self):
        """MaxLen: Verify phase_state column can hold the longest phase string

        1. The longest phase string is 'fetch:preparing_response' (24 chars, 25 bytes with the NUL terminator)
```
When a super table query spans multiple vnodes, the scheduler dispatches scan tasks to each vnode then waits for all to complete before launching the merge task. This transition was previously invisible. Add QUERY_PHASE_EXEC_WAITING_CHILDREN (53) to track the window between the first scan task completing and the merge task launching. Uses CAS to set the phase only once when transitioning from EXEC_DATA_QUERY. SHOW QUERIES phase_state now shows: execute:data_query → execute:waiting → execute:merge_query Made-with: Cursor
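The once-only transition described above maps naturally onto a compare-and-swap. The sketch uses C11 `<stdatomic.h>`; only the value 53 comes from this PR, while the other codes, the `PhaseState` struct and `onScanTaskDone` are hypothetical. Whichever scan task finishes first and still observes `EXEC_DATA_QUERY` wins the swap and stamps the phase start time; later finishers, or a job that has already moved on to the merge phase, change nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* 53 is from the PR; 51 and 52 are illustrative. */
enum { EXEC_DATA_QUERY = 51, EXEC_MERGE_QUERY = 52, EXEC_WAITING_CHILDREN = 53 };

typedef struct {
  _Atomic int32_t execPhase;
  _Atomic int64_t phaseStartTime; /* ms */
} PhaseState;

/* Called whenever a scan task completes. Returns 1 if this call performed
 * the DATA_QUERY -> WAITING_CHILDREN transition, 0 otherwise. */
static int onScanTaskDone(PhaseState *s, int64_t nowMs) {
  int32_t expected = EXEC_DATA_QUERY;
  if (atomic_compare_exchange_strong(&s->execPhase, &expected,
                                     EXEC_WAITING_CHILDREN)) {
    atomic_store(&s->phaseStartTime, nowMs); /* only the winner stamps time */
    return 1;
  }
  return 0; /* already waiting, merging, or in some other phase */
}
```

Because the phase and its start time are two separate atomics, a concurrent reader can briefly see the new phase with the previous timestamp. For a monitoring view such as SHOW QUERIES that is usually acceptable, but it is a trade-off rather than a guarantee.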
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| # | Column | Data type | Description |
|---|---|---|---|
| 11 | sub_num | INT | Number of subqueries |
| 12 | sub_status | BINARY(1000) | Subquery status |
| 13 | sql | BINARY(1024) | SQL statement |
| 14 | phase_state | BINARY(16) | Current query phase / state |
source/client/src/clientImpl.c (outdated)
```c
atomic_store_32((int32_t*)&pRequest->execPhase, QUERY_PHASE_PLAN);
atomic_store_64((int64_t*)&pRequest->phaseStartTime, taosGetTimestampMs());
```
Wrap all execPhase/phaseStartTime atomic stores with a guard that checks whether the phase value is actually different before writing. This prevents phaseStartTime from being silently reset while the query remains in the same phase. Made-with: Cursor
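A guarded store along the lines of this commit might look like the sketch below. It goes slightly further than a plain load-compare-store: a CAS loop ensures that when two threads race to set the same new phase, only one of them resets the start time. `ReqPhase` and `setPhaseIfChanged` are hypothetical names, not the PR's helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
  _Atomic int32_t execPhase;
  _Atomic int64_t phaseStartTime; /* ms */
} ReqPhase;

/* Writes the phase and its start time only when the phase actually changes,
 * so repeated calls with the current phase keep the original start time.
 * Returns 1 if this call changed the phase, 0 if it was already set. */
static int setPhaseIfChanged(ReqPhase *r, int32_t phase, int64_t nowMs) {
  int32_t cur = atomic_load(&r->execPhase);
  while (cur != phase) {
    if (atomic_compare_exchange_weak(&r->execPhase, &cur, phase)) {
      atomic_store(&r->phaseStartTime, nowMs);
      return 1;
    }
    /* CAS failed: cur now holds the latest value; the loop re-checks it. */
  }
  return 0;
}
```

A plain `if (load != phase) { store; store; }` is simpler and may be enough when a single thread owns each phase, which is what the write-ownership split in the earlier commit aims for.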
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 7 comments.
| # | Column | Data type | Description |
|---|---|---|---|
| 14 | phase_state | BINARY(16) | Current query phase / state |
| 15 | phase_start_time | TIMESTAMP | Start time of the current phase |
```c
    for (int32_t m = 0; m < snum; ++m) {
      SQuerySubDesc *sDesc = taosArrayGet(desc->subDesc, m);
      TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->tid));
      TAOS_CHECK_RETURN(tEncodeCStr(pEncoder, sDesc->status));
      TAOS_CHECK_RETURN(tEncodeI64(pEncoder, sDesc->startTs));
    }
    TAOS_CHECK_RETURN(tEncodeI32(pEncoder, desc->execPhase));
    TAOS_CHECK_RETURN(tEncodeI64(pEncoder, desc->phaseStartTime));
  }
```
```c
// Phase tracking helper functions
void schSetExecPhase(void *pRequest, int32_t phase);
```
Improve SHOW QUERIES sub_status with fine-grained task state strings while keeping tid:status:startTime output format. Add a full branch-level documentation note that summarizes feat/addShowQuery changes excluding merge main/3.0 logic. Made-with: Cursor
Revert the branch-level design notes document commit while keeping the related sub_status implementation and tests unchanged. Made-with: Cursor
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
Keep main phase_state names unchanged and convert sub-phase values to `*/*` format for consistency with the sub_status naming style. Made-with: Cursor
Description
Issue(s)
Checklist
Please check the items in the checklist if applicable.