Commit d6c9974

Add command-line interface and improve documentation
- Add bin/llmbench executable script for CLI usage
- Update main() function to work as CLI entry point
- Add CI badges to README
- Expand documentation with CLI usage examples
- Add examples for both LLMBenchSimple and custom implementations
- Fix argument handling for both programmatic and CLI usage
1 parent 4c148ac commit d6c9974

3 files changed: 51 additions and 16 deletions

README.md

Lines changed: 35 additions & 1 deletion
@@ -1,5 +1,8 @@
 # LLMBenchMCPServer.jl
 
+[![CI](https://github.com/JuliaComputing/LLMBenchMCPServer.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/JuliaComputing/LLMBenchMCPServer.jl/actions/workflows/CI.yml)
+[![codecov](https://codecov.io/gh/JuliaComputing/LLMBenchMCPServer.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/JuliaComputing/LLMBenchMCPServer.jl)
+
 A Julia package that implements the full taiga spec as an MCP (Model Context Protocol) server for LLM benchmarking.
 
 ## Features
@@ -50,14 +53,45 @@ run_stdio_server(server)
 
 ### Module-Based Execution
 
+#### Command Line Usage
+
+```bash
+# Using the provided script
+./bin/llmbench MyBenchmarkModule [options]
+
+# Or directly with Julia
+julia --project -e 'using LLMBenchMCPServer; LLMBenchMCPServer.main()' -- MyBenchmarkModule [options]
+```
+
+Options:
+- `--workdir PATH`: Set the working directory (default: current directory)
+- `--no-basic-tools`: Disable basic tools (bash, str_replace_editor)
+- `--verbose`: Enable verbose output
+- `--help, -h`: Show help message
+
+#### Programmatic Usage
+
 ```julia
 # Run with a benchmark module
 # The module must export setup_problem and grade functions
-LLMBenchMCPServer.main("MyBenchmarkModule")
+LLMBenchMCPServer.main(["MyBenchmarkModule", "--verbose"])
 ```
 
 ### Creating a Benchmark Module
 
+#### Option 1: Using LLMBenchSimple
+
+```julia
+module MyBenchmark
+using LLMBenchSimple
+
+@bench "addition" prompt"What is 2 + 2?" == 4
+@bench "capital" prompt"What is the capital of France?" == "Paris"
+end
+```
+
+#### Option 2: Custom Implementation
+
 ```julia
 module MyBenchmark
 
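
Reviewer note: the README hunk documents four options, but the code that parses them is not part of this diff. As a rough sketch of how an argument vector in that documented shape could be interpreted, the following uses a hypothetical helper `parse_cli` (the name, the returned fields, and the error messages are all assumptions for illustration, not the package's API):

```julia
# Hypothetical helper: NOT part of LLMBenchMCPServer's confirmed API.
# It only illustrates the option shapes documented in the README hunk.
function parse_cli(args::AbstractVector{<:AbstractString})
    module_name = nothing   # first positional argument
    workdir = pwd()         # documented default: current directory
    basic_tools = true      # switched off by --no-basic-tools
    verbose = false
    help = false

    i = 1
    while i <= length(args)
        arg = args[i]
        if arg == "--workdir"
            # --workdir consumes the next argument as its PATH value
            i < length(args) || error("--workdir requires a PATH argument")
            workdir = args[i + 1]
            i += 1
        elseif arg == "--no-basic-tools"
            basic_tools = false
        elseif arg == "--verbose"
            verbose = true
        elseif arg in ("--help", "-h")
            help = true
        elseif startswith(arg, "-")
            error("unknown option: $arg")
        elseif module_name === nothing
            module_name = arg
        else
            error("unexpected extra argument: $arg")
        end
        i += 1
    end
    return (; module_name, workdir, basic_tools, verbose, help)
end

opts = parse_cli(["MyBenchmarkModule", "--workdir", "/tmp/run", "--verbose"])
```

Returning a `NamedTuple` keeps the sketch free of dependencies; the real `main` may well structure this differently.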

bin/llmbench

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#!/usr/bin/env -S julia --project
+
+# Entry point script for LLMBenchMCPServer
+using LLMBenchMCPServer
+
+# Pass command line arguments to main
+LLMBenchMCPServer.main(ARGS)

src/server.jl

Lines changed: 9 additions & 15 deletions
@@ -40,25 +40,19 @@ function LLMBenchServer(;
     return server
 end
 
-"""
-    main(args=ARGS)
-
-Main entry point for the LLMBenchMCPServer.
-
-Usage:
-    julia --project -e 'using LLMBenchMCPServer; LLMBenchMCPServer.main()' -- ModuleName [--workdir /path]
-
-The module should export:
-- `setup_problem(workdir::String)` - Returns problem description
-- `grade(workdir::String, transcript::String)` - Returns grading result
-"""
+# Main entry point for the LLMBenchMCPServer.
+# Usage: julia --project -m LLMBenchMCPServer ModuleName [--workdir /path]
 function main(args=ARGS)
-    if isempty(args) || args[1] in ["--help", "-h"]
+    # Handle both array and varargs inputs
+    if isa(args, Tuple)
+        args = collect(args)
+    end
+    if isempty(args) || (length(args) == 1 && args[1] in ["--help", "-h"])
         println("""
         LLMBenchMCPServer - MCP server for LLM benchmarking
 
         Usage:
-            julia --project -e 'using LLMBenchMCPServer; LLMBenchMCPServer.main()' -- ModuleName [options]
+            julia --project -m LLMBenchMCPServer ModuleName [options]
 
         Arguments:
             ModuleName    Name of the module containing setup_problem and grade functions
@@ -77,7 +71,7 @@ function main(args=ARGS)
             Returns grading result with subscores, weights, and total score
 
         Example:
-            julia --project -e 'using LLMBenchMCPServer; LLMBenchMCPServer.main()' -- MyBenchmark
+            julia --project -m LLMBenchMCPServer MyBenchmark
         """)
         return 0
     end
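
Reviewer note: the new guard at the top of `main` changes when the help text is printed. Previously any argument list whose first element was `--help` or `-h` triggered it; now the flag must be the only argument. The sketch below reproduces just those two checks in isolation so the behavior is easy to try; `normalize_args` and `wants_help` are illustrative names and are not defined by the package. Whether `--help` combined with other arguments is handled further down in `main` is not visible in this hunk.

```julia
# Mirrors the two checks added at the top of `main` in this commit.
# `normalize_args` and `wants_help` are illustrative names only.

# A Tuple of arguments is converted to a Vector; anything else passes through.
normalize_args(args) = args isa Tuple ? collect(args) : args

function wants_help(args)
    args = normalize_args(args)
    # Help is shown for an empty argument list, or when the single
    # argument is a help flag.
    return isempty(args) || (length(args) == 1 && args[1] in ["--help", "-h"])
end

wants_help(String[])                   # true: no arguments at all
wants_help(("--help",))                # true: Tuple input is collected first
wants_help(["-h"])                     # true
wants_help(["MyBenchmark", "--help"])  # false: flag is not the only argument
wants_help(["--help", "--verbose"])    # false under the new condition
```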
