Add LLM-KG-Bench evaluation for the rudof MCP server #492

@samuel-bustamante

Description

Summary

This issue tracks the implementation of an evaluation framework for the rudof MCP server using a fork of LLM-KG-Bench.

Motivation

The rudof_mcp crate exposes the main functionality of the rudof library as an MCP (Model Context Protocol) server. To measure how well LLMs can use this server to solve Knowledge Graph tasks, we need a systematic evaluation framework.

Approach

We forked LLM-KG-Bench into rudof-project/LLM-KG-Bench-rudof and will adapt it to support MCP servers, allowing LLMs to interact with external tools through the Model Context Protocol during evaluation. This will enable us to evaluate the rudof MCP server specifically.
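A core piece of the adaptation is exposing the MCP server's tools to the model under evaluation. As a minimal, purely illustrative sketch (the `validate_shex` tool name and schema below are assumptions, not the actual rudof_mcp tool listing), this converts MCP tool descriptors, as returned by an MCP `tools/list` call, into the generic function-calling format most LLM chat APIs accept:

```python
def mcp_tools_to_llm_functions(tools: list[dict]) -> list[dict]:
    """Convert MCP tool descriptors (name / description / inputSchema)
    into the function-calling spec format common to LLM chat APIs.
    Field names follow the MCP tools/list response shape."""
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                # MCP inputSchema is already JSON Schema, which is what
                # function-calling APIs expect for "parameters".
                "parameters": t.get(
                    "inputSchema", {"type": "object", "properties": {}}
                ),
            },
        }
        for t in tools
    ]

# Hypothetical example of a tool the rudof MCP server might expose:
tools = [{
    "name": "validate_shex",
    "description": "Validate RDF data against a ShEx schema",
    "inputSchema": {
        "type": "object",
        "properties": {
            "data": {"type": "string"},
            "schema": {"type": "string"},
        },
        "required": ["data", "schema"],
    },
}]
specs = mcp_tools_to_llm_functions(tools)
print(specs[0]["function"]["name"])  # validate_shex
```

During a benchmark run, the adapted framework would fetch the real tool list from the running rudof MCP server, pass the converted specs to the LLM, and route any tool calls the model makes back through the MCP session.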

Tasks

  • Adapt LLM-KG-Bench to support MCP servers as tool providers for LLMs during evaluation
  • Document how to run the benchmark against the rudof MCP server
  • Add BENCHMARKING.md to the rudof_mcp crate linking to the benchmark repo

Metadata

Labels

  • MCP: Related to Model Context Protocol
  • tests: Related to the tests of the implemented features
