[feature]Add get_table_summary tool #36

Status: Open (wants to merge 3 commits into base: master)

Conversation

uchenily
Contributor

  • Fix SQL execution error caused by trailing whitespace characters

uchenily added 3 commits July 30, 2025 11:22
- Fix SQL execution error caused by trailing whitespace characters
@FreeOnePlus
Member

Could this tool be merged into the get_table_schema tool? I don't quite see the value of a standalone tool. Could parameters on a single tool control which information is returned?

@uchenily
Contributor Author

uchenily commented Aug 5, 2025

You are right. I agree that the current implementation overlaps in functionality with get_table_schema:

  • get_table_schema: Provides column structure, types, and comments
  • get_table_summary: Provides table metadata and optional sample data

Sample output of get_table_schema:

{
  "success": true,
  "timestamp": "2025-08-05 09:18:19",
  "result": [
    {
      "column_name": "user_id",
      "data_type": "bigint",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "true",
      "extra": ""
    },
    {
      "column_name": "name",
      "data_type": "varchar(20)",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "false",
      "extra": "NONE"
    },
    {
      "column_name": "age",
      "data_type": "int",
      "is_nullable": false,
      "default_value": null,
      "comment": "",
      "key": "false",
      "extra": "NONE"
    }
  ],
  "message": "Operation successful",
  "_execution_info": {
    "tool_name": "get_table_schema",
    "execution_time": 0.007,
    "timestamp": "2025-08-05T09:18:19.837167"
  }
}

Sample output of get_table_summary:

{
  "table_name": "test_streamload",
  "comment": "",
  "row_count": 10,
  "create_time": "2025-08-04 09:29:58",
  "engine": "Doris",
  "column_count": 3,
  "columns": [
    {
      "column_name": "user_id",
      "data_type": "bigint",
      "is_nullable": "NO",
      "column_key": "DUP",
      "column_comment": "用户 ID"
    },
    {
      "column_name": "name",
      "data_type": "varchar",
      "is_nullable": "YES",
      "column_key": "",
      "column_comment": "用户姓名"
    },
    {
      "column_name": "age",
      "data_type": "int",
      "is_nullable": "YES",
      "column_key": "",
      "column_comment": "用户年龄"
    }
  ],
  "sample_data": [
    {
      "user_id": 2,
      "name": "Benjamin",
      "age": 35
    },
    {
      "user_id": 3,
      "name": "Olivia",
      "age": 28
    }
  ],
  "_execution_info": {
    "tool_name": "get_table_summary",
    "execution_time": 0.218,
    "timestamp": "2025-08-05T09:29:47.557253"
  }
}

Based on the above and your suggestion, here is a simple proposal:

Tool Proposal: get_table_information

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| table_name | string | Yes | - | Name of the table to analyze |
| db_name | string | No | Current DB | Target database name |
| include_schema | boolean | No | true | Whether to include schema information (columns, types, comments) |
| include_sample | boolean | No | false | Whether to include sample data rows |
| sample_size | integer | No | 3 | Number of sample rows to return when include_sample=true |

Response Structure

{
  "basic_info": {
    "table_name": "string",
    "db_name": "string",
    "row_count": "number",
    "comment": "string"
  },
  "columns": [
    {
      "name": "string",
      "type": "string",
      "comment": "string",
      "is_nullable": "boolean"
    }
  ],
  "sample_data": [
    {
      // sample rows
    }
  ]
}
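
As a rough illustration of how the two flags could shape the response, here is a minimal Python sketch. It is not the PR's implementation: the function name `build_table_information` is hypothetical, the metadata is assumed to be fetched already, and omitting a section entirely when its flag is false is an assumption the proposal does not spell out.

```python
from typing import Any


def build_table_information(
    basic_info: dict[str, Any],
    columns: list[dict[str, Any]],
    rows: list[dict[str, Any]],
    include_schema: bool = True,
    include_sample: bool = False,
    sample_size: int = 3,
) -> dict[str, Any]:
    """Assemble the proposed response from already-fetched metadata.

    Hypothetical helper: a section is left out when its flag is false
    (an assumption; the proposal does not say how disabled sections look).
    """
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")

    # basic_info is always present in the proposed structure.
    response: dict[str, Any] = {"basic_info": basic_info}
    if include_schema:
        response["columns"] = columns
    if include_sample:
        # Return at most sample_size rows.
        response["sample_data"] = rows[:sample_size]
    return response
```

With the defaults from the parameter table, a call returns `basic_info` and `columns` only; `sample_data` appears once `include_sample=True` is passed.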

What do you think of this proposal? @FreeOnePlus

@FreeOnePlus
Member

I think this is a good suggestion. During implementation, please watch for edge cases, such as a table with no rows to preview, and consider returning explicit messages in those cases so that users can understand the result more clearly.
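
One possible way to handle the empty-table case, sketched in Python. The helper `describe_sample` and the `sample_count` and `note` fields are hypothetical, not part of the proposed response structure; the sketch also assumes the rows come from a query limited to `requested` rows, so fewer rows than requested means the table is small.

```python
from typing import Any


def describe_sample(rows: list[dict[str, Any]], requested: int) -> dict[str, Any]:
    """Wrap sample rows with an explicit note so an empty result is unambiguous.

    Hypothetical helper: 'sample_count' and 'note' are illustrative fields.
    """
    if not rows:
        note = "Table has no rows; no sample data is available."
    elif len(rows) < requested:
        # Assumes the rows came from a query limited to `requested` rows.
        note = f"Table has only {len(rows)} row(s); fewer than the {requested} requested."
    else:
        note = f"Returned {len(rows)} sample row(s)."
    return {"sample_data": rows, "sample_count": len(rows), "note": note}
```

An explicit note lets a client tell "the table is empty" apart from "sampling was not requested" without inspecting the list itself.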

I agree with using the get_table_information tool to unify the capabilities of the two tools.

In subsequent commits, please also include screenshots of your tests using tools like Dify and Cursor to ensure proper execution in both Streamable HTTP and Stdio modes. Thank you.
@uchenily

@FreeOnePlus added the "tools" (Add new tool) and "Next Release" (The next Release version will be merged) labels on Aug 5, 2025
@uchenily
Contributor Author

uchenily commented Aug 6, 2025

While making changes according to this plan, I ran into some issues and noticed that there are already many tools for retrieving table-related information, such as:

  • get_table_schema
  • get_table_comment
  • get_table_column_comments
  • get_table_indexes
  • get_table_data_size
  • get_table_basic_info

If we were to create a get_table_information tool now, it would inevitably overlap in functionality with these existing tools to some extent.

Currently, I lean toward keeping these tools separate, with each performing a single, specific function. For example, we could add a get_table_sample tool. Perhaps in the future, a unified tool could be introduced to manage all these table-related tools.
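
To make the single-purpose idea concrete, here is a minimal Python sketch of the query such a get_table_sample tool might build. Everything in it is an assumption for illustration: the function name `build_sample_query`, the conservative ASCII-only identifier check, the cap of 100 rows, and the MySQL-style backtick quoting.

```python
import re
from typing import Optional

# Conservative identifier check (an assumption; real table names may allow more).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_sample_query(
    table_name: str,
    db_name: Optional[str] = None,
    sample_size: int = 3,
) -> str:
    """Build a 'SELECT * ... LIMIT n' statement for a hypothetical get_table_sample tool."""
    for ident in (table_name, db_name):
        # fullmatch rejects anything with whitespace, quotes, or punctuation.
        if ident is not None and not _IDENTIFIER.fullmatch(ident):
            raise ValueError(f"invalid identifier: {ident!r}")
    if not 1 <= sample_size <= 100:  # the upper bound is an arbitrary safety cap
        raise ValueError("sample_size must be between 1 and 100")

    target = f"`{db_name}`.`{table_name}`" if db_name else f"`{table_name}`"
    return f"SELECT * FROM {target} LIMIT {sample_size}"
```

Keeping the tool this narrow means it does not duplicate what get_table_schema or get_table_basic_info already return.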

@FreeOnePlus Do you have any thoughts? Feedback is welcome :)

@FreeOnePlus
Member

We may need to discuss and clarify the nature and positioning of each tool in different scenarios. Frankly, I think there are currently too many tools, especially basic ones. Too many basic tools can overflow the context when an LLM's context length is limited (especially noticeable in small distilled models), which hurts overall agent and workflow design. Your other suggestion is excellent: perhaps we should consider dynamic, hierarchical consolidation at the architecture level, for example grouping tools into tiers such as basic and advanced, or into categories such as table information, cluster information, analysis, and data governance. A two- to three-tier dynamic design could also effectively alleviate the context issue.
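
A tiered design along these lines could be sketched as follows. This is only an illustration in Python: `ToolSpec`, `visible_tools`, the category name, and the tier assigned to each tool are all hypothetical; only the tool names come from this thread.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ToolSpec:
    name: str
    category: str
    tier: int  # 1 = basic; higher numbers = more advanced (illustrative)


# Tier and category assignments below are made up for the example.
TOOLS = [
    ToolSpec("get_table_schema", "table_info", 1),
    ToolSpec("get_table_comment", "table_info", 1),
    ToolSpec("get_table_indexes", "table_info", 2),
    ToolSpec("get_table_data_size", "table_info", 2),
]


def visible_tools(
    tools: Iterable[ToolSpec],
    max_tier: int,
    categories: Optional[set[str]] = None,
) -> list[str]:
    """Return the names of tools to advertise to a client, filtered by tier and category."""
    return [
        t.name
        for t in tools
        if t.tier <= max_tier and (categories is None or t.category in categories)
    ]
```

A client with a small context window could then be offered only tier 1 tools, while a larger model is offered every tier.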

Regarding the second point, I'm also considering contributing to the MCP protocol standard to standardize this. Otherwise, we may run into additional exceptions when a client or host does not recognize such operations. Alternatively, you could submit a draft for discussion, and we can focus on this design specifically and brainstorm together to settle the issue.

Thank you.
@uchenily
