feat: Add comprehensive rule management tools for Prometheus #3

oberkem · 2025-07-18T21:54:08Z

Add create_recording_rule and create_alerting_rule tools
Add update_rule and delete_rule tools for rule management
Add validate_rule tool for PromQL expression validation
Add list_rule_files and get_rule_file_content utilities
Add reload_config tool for triggering Prometheus configuration reloads
Implement comprehensive file-based rule management with YAML support
Add automatic rule group creation and cleanup
Add JSON-based label and annotation support
Include comprehensive test suite with 11 test cases
Add detailed documentation in RULE_MANAGEMENT_TOOLS.md

This addresses the missing rule management capability identified in the original MCP server, providing a complete solution for creating, updating, and deleting both recording and alerting rules.

- Add create_recording_rule and create_alerting_rule tools - Add update_rule and delete_rule tools for rule management - Add validate_rule tool for PromQL expression validation - Add list_rule_files and get_rule_file_content utilities - Add reload_config tool for triggering Prometheus configuration reloads - Implement comprehensive file-based rule management with YAML support - Add automatic rule group creation and cleanup - Add JSON-based label and annotation support - Include comprehensive test suite with 11 test cases - Add detailed documentation in RULE_MANAGEMENT_TOOLS.md This addresses the missing rule management capability identified in the original MCP server, providing a complete solution for creating, updating, and deleting both recording and alerting rules. Signed-off-by: Orcun Berkem <[email protected]>

tjhop

Thanks! I took a first pass and have a few thoughts.

Like we discussed on slack, considering Prometheus itself doesn't have any kind of rule management support in the API, we kinda need to resort to file management and triggered reloads to do manage rules. It's awkward, but we're working with what we have.

Philosophical notes:

this approach currently side steps any form of gitops/configuration management style approaches, and may even conflict if the given instance is already under any kind of rule file management
this will only work if the prometheus-mcp-server has write access to the same storage that the prometheus instance has access to. this means like it likely needs to run alongside prometheus and/or potentially have volume mounts adjusted etc
this would only update the prometheus it is connected to (theoretically), yet it's very common to run prometheus in HA duplicate pairs, thoughts on how to handle that scenario? idk if I'd trust LLMs enough to re-make the exact same rules/edits against multiple prometheus instances if eg a user connects the prometheus-mcp-server multiple times on different ports with different prometheus urls configured, etc.

Technical notes:

was this AI generated? things hinting at it to me:
- the doc style, and the addition of a doc file for just a few tools that discusses everything from usage to go tests, which feels AI summation-like
- regurgitation of ioutil.ReadFile() calls that have been deprecated for years (yet littered across the internet and making great training material for AI lol)
- some scattered odd syntax/formattings that should've been caught by the linter, but apparently I have it enabled to approve workflow runs, so should've done that sooner 😅
left some notes on the /-/reload endpoint support, but that needs some adjustments please
a very general request to not re-invent the wheel and try to base off of and use upstream as much as possible. the beautiful part about having prometheus written in go is that we can literally use prometheus as a library, and they've already done a lot of the hard work for us. the rules and rulefmt packages will be extremely useful here. this should eliminate the need for a lot of the custom types here, dramatically reduce the test coverage needed since we benefit from prometheus' testing, etc.

tjhop · 2025-07-21T02:35:35Z

pkg/mcp/rule_tools.go

+func validateRuleToolHandler(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
+	expr, err := request.RequireString("expr")
+	if err != nil {
+		return mcp.NewToolResultError("expr must be a string"), nil
+	}
+
+	// Use the existing parse query API call if available
+	isValid, err := validatePromQLExpression(ctx, expr)
+	if err != nil {
+		return mcp.NewToolResultError(fmt.Sprintf("Failed to validate expression: %v", err)), nil
+	}
+
+	if isValid {
+		return mcp.NewToolResultText(fmt.Sprintf("PromQL expression is valid: %s", expr)), nil
+	}
+
+	return mcp.NewToolResultText(fmt.Sprintf("PromQL expression is invalid: %s", expr)), nil
+}


Can we refactor this to better lean on upstream rule group validation?
https://pkg.go.dev/github.com/prometheus/[email protected]/model/rulefmt#RuleGroups.Validate

tjhop · 2025-07-21T02:43:26Z

pkg/mcp/rule_utils.go

+		return &PrometheusRule{Groups: []RuleGroup{}}, nil
+	}
+
+	data, err := ioutil.ReadFile(filePath)


ioutil has been deprecated for years now

tjhop · 2025-07-21T02:45:00Z

pkg/mcp/rule_utils.go

+	if prometheusURL == "" {
+		// For now, we'll assume the user provides the URL
+		// In a full implementation, you'd get this from the global configuration
+		return fmt.Errorf("prometheus_url must be provided")


static error doesn't need formatting string -- linter should catch this

tjhop · 2025-07-21T02:51:27Z

pkg/mcp/rule_utils.go

+	// Make HTTP POST request to reload endpoint
+	req, err := http.NewRequestWithContext(ctx, "POST", prometheusURL, nil)
+	if err != nil {
+		return fmt.Errorf("failed to create request: %w", err)
+	}
+
+	client := &http.Client{}
+	resp, err := client.Do(req)
+	if err != nil {
+		return fmt.Errorf("failed to make request: %w", err)
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		return fmt.Errorf("reload request failed with status %d", resp.StatusCode)
+	}


We should reuse the client/transport from the v1 API client -- that's what has any HTTP/auth configs set on it from the --http.config flag. Creating a naked request and using the default client negates all of that. This will fail if a user has basic auth set, etc etc.

Better yet, we should discuss getting first party support for Prometheus' management API (which includes the /-/reload endpoint) into prometheus/client_golang so we don't have to put together our own HTTP handling here. I've already brought this up and discussed with @bwplotka a bit on slack, but there's more work to do there:
https://cloud-native.slack.com/archives/C01AUBA4PFE/p1747158200164229

tjhop requested changes Jul 21, 2025

View reviewed changes

tjhop mentioned this pull request Jul 26, 2025

Consider supporting 3rd party Prometheus HTTP API implementations via flag (ex: --backend=thanos) #4

Open

oberkem marked this pull request as draft July 28, 2025 22:25

tjhop force-pushed the main branch from 668e40e to ef44bd9 Compare August 26, 2025 16:48

tjhop mentioned this pull request Oct 31, 2025

feat: allow customizing tools for select backends, add thanos support #58

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add comprehensive rule management tools for Prometheus #3

feat: Add comprehensive rule management tools for Prometheus #3

Uh oh!

oberkem commented Jul 18, 2025

Uh oh!

tjhop left a comment

Uh oh!

tjhop Jul 21, 2025

Uh oh!

tjhop Jul 21, 2025

Uh oh!

tjhop Jul 21, 2025

Uh oh!

tjhop Jul 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: Add comprehensive rule management tools for Prometheus #3

Are you sure you want to change the base?

feat: Add comprehensive rule management tools for Prometheus #3

Uh oh!

Conversation

oberkem commented Jul 18, 2025

Uh oh!

tjhop left a comment

Choose a reason for hiding this comment

Uh oh!

tjhop Jul 21, 2025

Choose a reason for hiding this comment

Uh oh!

tjhop Jul 21, 2025

Choose a reason for hiding this comment

Uh oh!

tjhop Jul 21, 2025

Choose a reason for hiding this comment

Uh oh!

tjhop Jul 21, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants