[feature] Add get_table_summary tool #36
base: master
Conversation
uchenily commented on Jul 30, 2025
- Fix SQL execution error caused by trailing whitespace characters
Can this tool be combined with the get_table_schema tool? I don't quite see the value of a standalone tool. Could parameters on the same tool control which information is returned?
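The reviewer's idea can be sketched as a single tool whose output is shaped by parameters instead of two overlapping tools. This is an illustrative sketch only; the function name and the `include_schema`/`include_sample` parameters are hypothetical, not the project's actual API.

```python
# Hypothetical sketch: one parameter-controlled tool instead of two
# overlapping tools. The stubbed return values stand in for real queries.

def get_table_info(table_name: str,
                   include_schema: bool = True,
                   include_sample: bool = False) -> dict:
    result = {"table_name": table_name}
    if include_schema:
        # In a real server this would query information_schema.columns.
        result["columns"] = [{"column_name": "user_id", "data_type": "bigint"}]
    if include_sample:
        # In a real server this would run a SELECT ... LIMIT query.
        result["sample_data"] = [{"user_id": 2}]
    return result
```

With this shape, the default call returns only schema information, and callers opt in to the heavier sample-data query.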
You are right; I also think there was functional overlap in the implementation at that time:
The sample output of `get_table_schema`:

```json
{
  "success": true,
  "timestamp": "2025-08-05 09:18:19",
  "result": [
    {
      "column_name": "user_id",
      "data_type": "bigint",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "true",
      "extra": ""
    },
    {
      "column_name": "name",
      "data_type": "varchar(20)",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "false",
      "extra": "NONE"
    },
    {
      "column_name": "age",
      "data_type": "int",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "false",
      "extra": "NONE"
    }
  ],
  "message": "Operation successful",
  "_execution_info": {
    "tool_name": "get_table_schema",
    "execution_time": 0.007,
    "timestamp": "2025-08-05T09:18:19.837167"
  }
}
```

The sample output of `get_table_summary`:

```json
{
  "table_name": "test_streamload",
  "comment": "",
  "row_count": 10,
  "create_time": "2025-08-04 09:29:58",
  "engine": "Doris",
  "column_count": 3,
  "columns": [
    {
      "column_name": "user_id",
      "data_type": "bigint",
      "is_nullable": "NO",
      "column_key": "DUP",
      "column_comment": "User ID"
    },
    {
      "column_name": "name",
      "data_type": "varchar",
      "is_nullable": "YES",
      "column_key": "",
      "column_comment": "User name"
    },
    {
      "column_name": "age",
      "data_type": "int",
      "is_nullable": "YES",
      "column_key": "",
      "column_comment": "User age"
    }
  ],
  "sample_data": [
    {
      "user_id": 2,
      "name": "Benjamin",
      "age": 35
    },
    {
      "user_id": 3,
      "name": "Olivia",
      "age": 28
    }
  ],
  "_execution_info": {
    "tool_name": "get_table_summary",
    "execution_time": 0.218,
    "timestamp": "2025-08-05T09:29:47.557253"
  }
}
```

Based on the above information and your suggestions, here is a simple proposal:

Tool Proposal: get_table_information

Parameters
Response Structure:

```json
{
  "basic_info": {
    "table_name": "string",
    "db_name": "string",
    "row_count": "number",
    "comment": "string"
  },
  "columns": [
    {
      "name": "string",
      "type": "string",
      "comment": "string",
      "is_nullable": "boolean"
    }
  ],
  "sample_data": [
    {
      // sample rows
    }
  ]
}
```

What do you think of this proposal? @FreeOnePlus
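A minimal sketch of how the proposed unified response could be assembled from the data the two existing tools already return. The function name and its parameters are hypothetical; note the normalization of the `is_nullable` string flag ("YES"/"NO" in the `get_table_summary` output) into the boolean the proposal specifies.

```python
# Hypothetical sketch of assembling the proposed get_table_information
# response from already-fetched schema rows, row count, and sample rows.

def build_table_information(table_name: str, db_name: str,
                            schema_rows: list, row_count: int,
                            comment: str, sample_rows: list) -> dict:
    """Merge schema and summary data into the proposed unified structure."""
    return {
        "basic_info": {
            "table_name": table_name,
            "db_name": db_name,
            "row_count": row_count,
            "comment": comment,
        },
        "columns": [
            {
                "name": col["column_name"],
                "type": col["data_type"],
                "comment": col.get("comment", ""),
                # Normalize the string flag ("YES"/"NO") to a real boolean.
                "is_nullable": col.get("is_nullable") in (True, "YES", "true"),
            }
            for col in schema_rows
        ],
        "sample_data": sample_rows,
    }

info = build_table_information(
    "test_streamload", "test_db",
    [{"column_name": "user_id", "data_type": "bigint", "is_nullable": "NO"}],
    10, "", [{"user_id": 2, "name": "Benjamin", "age": 35}],
)
```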
I think this is a good suggestion, but during implementation you may need to handle some edge cases, such as when the table has no preview data, and decide whether to return specific responses that help users understand the results more clearly. I agree with using a get_table_information tool to unify the capabilities of the two tools. In subsequent commits, please also include screenshots of your tests with tools like Dify and Cursor to confirm correct execution in both Streamable HTTP and Stdio modes. Thank you.
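The empty-table edge case mentioned above can be handled by returning an explicit message instead of a bare empty list, so clients can distinguish "table has no rows" from a failed preview query. This is an illustrative sketch; the helper name and message text are assumptions.

```python
# Hypothetical helper for the empty-table edge case: an explicit message
# accompanies an empty sample so clients don't mistake it for an error.

def format_sample_data(rows: list) -> dict:
    if not rows:
        return {
            "sample_data": [],
            "message": "Table contains no rows to preview",
        }
    return {"sample_data": rows}
```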
While making the modifications according to this plan, I ran into some issues and noticed that there are already many tools for retrieving table-related information, such as:
If we were to create a … Currently, I lean toward keeping these tools separate, each performing a single, specific function. For example, we could add a … @FreeOnePlus Do you have any ideas? Feedback is welcome :)
We may need to discuss the positioning of each tool in different scenarios. Frankly, I think there are a bit too many tools at the moment, especially basic ones. This excessive number of basic tools can cause context overflow when an LLM's context length is limited (especially noticeable in small distilled models), which hurts overall agent and workflow design.

Your other suggestion is excellent: perhaps we should consider dynamic, hierarchical convergence from an architectural perspective, with tiers such as basic and advanced tools, or groupings such as table-information tools, cluster-information tools, analysis tools, and data-governance tools. A two- to three-tier dynamic design could also effectively alleviate the context issue.

On the second point, I'm also considering contributing to the MCP protocol standard to standardize this; otherwise we may face additional exceptions caused by clients or hosts that cannot recognize such operations. Alternatively, you could submit a draft for discussion, and we can focus on this design specifically to get the full benefit of brainstorming. Thank you.
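The tiered-tools idea above can be sketched as a registry that only advertises lower-tier tools by default, keeping the tool list (and thus the prompt context sent to the LLM) small. The class name, tier labels, and registered tool names here are all illustrative assumptions, not the project's design.

```python
# Hypothetical two-tier tool registry: only "basic" tools are listed by
# default; "advanced" tools are exposed on demand to limit context size.

class TieredToolRegistry:
    _ORDER = {"basic": 0, "advanced": 1}

    def __init__(self):
        self._tools = {}  # name -> (tier, callable)

    def register(self, name: str, tier: str, fn) -> None:
        self._tools[name] = (tier, fn)

    def list_tools(self, max_tier: str = "basic") -> list:
        limit = self._ORDER[max_tier]
        return [name for name, (tier, _) in self._tools.items()
                if self._ORDER[tier] <= limit]

reg = TieredToolRegistry()
reg.register("get_table_information", "basic", lambda: None)
reg.register("analyze_data_quality", "advanced", lambda: None)
```

A client with a small context window would call `list_tools()` and see only the basic tier, while a full-featured agent could request `list_tools("advanced")`.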