[feature]Add get_table_summary tool #36

uchenily · 2025-07-30T03:25:30Z

Fix SQL execution error caused by trailing whitespace characters

FreeOnePlus · 2025-08-04T06:50:39Z

Could this tool be combined with the get_table_schema tool? I don't quite see the value of a standalone tool. Could parameters on a single tool control which information is returned?

uchenily · 2025-08-05T01:57:45Z

You are right. I agree that the two tools overlap in functionality as currently implemented:

get_table_schema: Provides column structure, types, and comments
get_table_summary: Provides table metadata and optional sample data

The sample output of get_table_schema:

{
  "success": true,
  "timestamp": "2025-08-05 09:18:19",
  "result": [
    {
      "column_name": "user_id",
      "data_type": "bigint",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "true",
      "extra": ""
    },
    {
      "column_name": "name",
      "data_type": "varchar(20)",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "false",
      "extra": "NONE"
    },
    {
      "column_name": "age",
      "data_type": "int",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "false",
      "extra": "NONE"
    }
  ],
  "message": "Operation successful",
  "_execution_info": {
    "tool_name": "get_table_schema",
    "execution_time": 0.007,
    "timestamp": "2025-08-05T09:18:19.837167"
  }
}

The sample output of get_table_summary:

{
  "table_name": "test_streamload",
  "comment": "",
  "row_count": 10,
  "create_time": "2025-08-04 09:29:58",
  "engine": "Doris",
  "column_count": 3,
  "columns": [
    {
      "column_name": "user_id",
      "data_type": "bigint",
      "is_nullable": "NO",
      "column_key": "DUP",
      "column_comment": "用户 ID"
    },
    {
      "column_name": "name",
      "data_type": "varchar",
      "is_nullable": "YES",
      "column_key": "",
      "column_comment": "用户姓名"
    },
    {
      "column_name": "age",
      "data_type": "int",
      "is_nullable": "YES",
      "column_key": "",
      "column_comment": "用户年龄"
    }
  ],
  "sample_data": [
    {
      "user_id": 2,
      "name": "Benjamin",
      "age": 35
    },
    {
      "user_id": 3,
      "name": "Olivia",
      "age": 28
    }
  ],
  "_execution_info": {
    "tool_name": "get_table_summary",
    "execution_time": 0.218,
    "timestamp": "2025-08-05T09:29:47.557253"
  }
}

Based on the above and your suggestion, here is a simple proposal:

Tool Proposal: get_table_information

Parameters

Parameter	Type	Required	Default	Description
table_name	string	Yes	-	Name of the table to analyze
db_name	string	No	Current DB	Target database name
include_schema	boolean	No	true	Whether to include schema information (columns, types, comments)
include_sample	boolean	No	false	Whether to include sample data rows
sample_size	integer	No	3	Number of sample rows to return when include_sample=true

Response Structure

{
  "basic_info": {
    "table_name": "string",
    "db_name": "string",
    "row_count": "number",
    "comment": "string"
  },
  "columns": [
    {
      "name": "string",
      "type": "string",
      "comment": "string",
      "is_nullable": "boolean"
    }
  ],
  "sample_data": [
    {
      // sample rows
    }
  ]
}
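To make the proposal concrete, here is a minimal sketch of how the response could be assembled once the metadata has been fetched. It is only an illustration under assumptions: the function name build_table_information and its pre-fetched inputs are hypothetical, not existing code in this repository. It also shows one way to reconcile the two current column formats ("YES"/"NO" strings versus booleans) into the boolean is_nullable used in the proposal.

```python
from __future__ import annotations

from typing import Any


def build_table_information(
    table_name: str,
    db_name: str,
    row_count: int,
    comment: str,
    columns: list[dict[str, Any]],
    sample_rows: list[dict[str, Any]],
    include_schema: bool = True,
    include_sample: bool = False,
    sample_size: int = 3,
) -> dict[str, Any]:
    """Assemble the proposed response from already-fetched metadata (hypothetical helper)."""
    response: dict[str, Any] = {
        "basic_info": {
            "table_name": table_name,
            "db_name": db_name,
            "row_count": row_count,
            "comment": comment,
        }
    }
    if include_schema:
        response["columns"] = [
            {
                "name": col["column_name"],
                "type": col["data_type"],
                # Accept either key, since the two existing tools name it differently.
                "comment": col.get("column_comment", col.get("comment", "")),
                # get_table_summary reports "YES"/"NO"; the proposal uses a boolean.
                "is_nullable": col.get("is_nullable") in (True, "YES"),
            }
            for col in columns
        ]
    if include_sample:
        # Never return more rows than requested; a negative size yields no rows.
        response["sample_data"] = sample_rows[: max(sample_size, 0)]
    return response
```

With the defaults, the result contains basic_info and columns only; sample_data appears only when include_sample is true.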

What do you think of this proposal? @FreeOnePlus

FreeOnePlus · 2025-08-05T09:36:24Z

I think this is a good suggestion. During implementation, though, please be mindful of edge cases, such as a table with no data to preview, and consider returning specific messages in those cases so that users can understand the results more clearly.
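As an illustration of the empty-preview case, the sketch below wraps the sample rows with an explanatory note. The helper name describe_sample and the sample_note field are assumptions made for this example; neither is part of the proposal or the existing tools.

```python
from __future__ import annotations

from typing import Any


def describe_sample(rows: list[dict[str, Any]], row_count: int) -> dict[str, Any]:
    """Return sample rows plus a note that explains an empty result (hypothetical helper)."""
    if rows:
        note = f"Returned {len(rows)} sample row(s)."
    elif row_count == 0:
        note = "The table is empty, so no sample rows are available."
    else:
        # Metadata and query disagree; say so rather than return a bare empty list.
        note = f"The table reports {row_count} row(s), but the sample query returned none."
    return {"sample_data": rows, "sample_note": note}
```

An explicit note like this lets a client distinguish "the table has no rows" from "the sample query returned nothing" without guessing from an empty list.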

I agree with using the get_table_information tool to unify the capabilities of the two tools.

In subsequent commits, please also include screenshots of your tests using tools like Dify and Cursor to ensure proper execution in both Streamable HTTP and Stdio modes. Thank you.
@uchenily

uchenily · 2025-08-06T02:39:37Z

While making changes according to this plan, I ran into some issues and noticed that there are already many tools for retrieving table-related information, such as:

get_table_schema
get_table_comment
get_table_column_comments
get_table_indexes
get_table_data_size
get_table_basic_info

If we were to create a get_table_information tool now, it would inevitably overlap in functionality with these existing tools to some extent.

Currently, I lean toward keeping these tools separate, with each performing a single, specific function. For example, we could add a get_table_sample tool. Perhaps in the future, a unified tool could be introduced to manage all these table-related tools.
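A single-purpose get_table_sample tool could be as small as a query builder plus an execution call. The sketch below covers only the query-building part and is an assumption-laden illustration: build_sample_query is a hypothetical name, and the identifier check is deliberately conservative (ASCII letters, digits, and underscores), which is stricter than what the database may actually accept.

```python
from __future__ import annotations

import re
from typing import Optional

# Conservative identifier pattern (an assumption for this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_sample_query(table_name: str, db_name: Optional[str] = None, sample_size: int = 3) -> str:
    """Build a SELECT ... LIMIT statement for previewing a table (hypothetical helper)."""
    # Tolerate stray leading or trailing whitespace from callers.
    table = table_name.strip()
    db = db_name.strip() if db_name else None
    for ident in (table, db):
        if ident is not None and not _IDENTIFIER.fullmatch(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    target = f"`{db}`.`{table}`" if db else f"`{table}`"
    # No trailing whitespace or semicolon in the generated statement.
    return f"SELECT * FROM {target} LIMIT {int(sample_size)}"
```

For example, build_sample_query("test_streamload", sample_size=2) produces a statement that selects two rows from test_streamload in the current database.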

@FreeOnePlus Do you have any thoughts? Feedback is welcome :)

FreeOnePlus · 2025-08-06T09:20:50Z

We may need to discuss and agree on the nature and positioning of each tool in different scenarios. Frankly, I think there are too many tools at the moment, especially basic ones. With so many basic tools, the tool definitions can overflow the context window of LLMs with short context lengths (this is especially noticeable with small distilled models), which hurts overall agent and workflow design. Your other suggestion is excellent: perhaps we should consider dynamic, hierarchical convergence at the architecture level, for example grouping by level (basic tools and advanced tools) or by domain (table information, cluster information, analysis tools, and data governance). A two- to three-tier dynamic design could also effectively alleviate the context issue.
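One way to picture the tiered idea is a registry that exposes only group names at the top level and lists a group's tools on request. This is a rough sketch, not a design for this project: ToolRegistry and the group names table_info and table_data are hypothetical, and get_table_sample is the tool proposed earlier in this thread rather than an existing one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolRegistry:
    """Two-tier registry: groups first, tools only when a group is expanded."""

    groups: dict[str, list[str]] = field(default_factory=dict)

    def register(self, group: str, tool: str) -> None:
        self.groups.setdefault(group, []).append(tool)

    def list_groups(self) -> list[str]:
        # Top tier: only the group names need to be placed in the model's context.
        return sorted(self.groups)

    def list_tools(self, group: str) -> list[str]:
        # Second tier: tools are listed only for the group the agent asks about.
        return list(self.groups.get(group, []))


registry = ToolRegistry()
for name in ("get_table_schema", "get_table_comment", "get_table_indexes"):
    registry.register("table_info", name)
for name in ("get_table_data_size", "get_table_sample"):
    registry.register("table_data", name)
```

Here the top-level listing has two entries instead of five tool definitions, and the gap widens as more tools are added to each group.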

Regarding the second point, I'm also considering contributing to the MCP protocol standard to standardize this. Otherwise, we may hit additional exceptions when a client or host does not recognize such operations. Alternatively, you could submit a draft for discussion, and we can focus on this design specifically so that we get the benefit of brainstorming it together.

Thank you.
@uchenily

uchenily added 3 commits July 30, 2025 11:22

[feature]Add get_table_summary tool

e8ff2cf

- Fix SQL execution error caused by trailing whitespace characters

Update readme

4e14ece

Make the db_name parameter optional

cbb093c

FreeOnePlus added the "tools" (Add new tool) and "Next Release" (The next Release version will be merged) labels on Aug 5, 2025
