# EvalOps CLI

The EvalOps CLI is a powerful tool for evaluating code against Large Language Models (LLMs) using the EvalOps platform. It allows you to define, validate, and run evaluations directly from your command line.

## Features

- **Initialize Projects**: Quickly set up a new EvalOps project with `evalops init`.
- **Validate Configurations**: Ensure your `evalops.yaml` file is correctly formatted and your test cases are discoverable with `evalops validate`.
- **Upload Test Suites**: Upload your evaluation configurations to the EvalOps platform with `evalops upload`.
- **Local Evaluations (Coming Soon)**: Run evaluations locally against different providers with `evalops run`.
- **Automatic Test Discovery**: Automatically discover test cases in your codebase defined with `@evalops_test` decorators or `evalops_test()` function calls.

## Installation

```bash
npm install -g evalops-cli
```

## Getting Started

1. **Initialize a new project:**

   ```bash
   evalops init
   ```

   This creates an `evalops.yaml` file in your current directory. You can use the interactive prompt to configure your project or start with a template:

   ```bash
   evalops init --template basic
   ```

2. **Define your evaluation in `evalops.yaml`:**

   The `evalops.yaml` file is the heart of your evaluation. Here you can define:
   - A description and version for your evaluation.
   - The prompts to be used.
   - The LLM providers to test against.
   - Default and specific test cases with assertions.
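
   As a rough illustration of those sections, a minimal configuration might look like the following. The field names match the sections documented later in this README, but the provider identifier and assertion type shown here are assumptions for the sake of the example, not an exhaustive reference:

   ```yaml
   description: "Sanity checks for the summarizer prompt"
   version: "1.0"
   prompts:
     - "Summarize the following text: {{text}}"
   providers:
     - openai:gpt-4   # illustrative provider id; check the platform docs for supported values
   defaultTest:
     assert:
       - type: contains   # illustrative assertion type
         value: "summary"
   tests:
     - description: "Short input"
       vars:
         text: "EvalOps evaluates LLM outputs."
   ```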
3. **Add test cases to your code:**

   The CLI can automatically discover test cases in your code. You can define a test case using the `@evalops_test` decorator or the `evalops_test()` function.

   **Using a decorator:**
   ```typescript
   import { evalops_test } from 'evalops-cli';

   @evalops_test({
     description: 'Test case for my function',
     tags: ['critical', 'refactor'],
   })
   function myFunction() {
     // Your code to be evaluated
   }
   ```

   **Using a function call:**
   ```typescript
   import { evalops_test } from 'evalops-cli';

   evalops_test({
     description: 'Another test case',
   }, () => {
     // Your code to be evaluated
   });
   ```
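
   To build intuition for how function-call-based discovery can work, here is a self-contained sketch of a registration mechanism: calling `evalops_test()` records the test's metadata in a registry as the file is loaded. This is a hypothetical model, not the actual `evalops-cli` internals; all names besides `evalops_test` are illustrative:

   ```typescript
   // Hypothetical sketch of registration-based test discovery.
   interface TestMeta {
     description: string;
     tags?: string[];
   }

   interface RegisteredTest {
     meta: TestMeta;
     fn: () => void;
   }

   // Discovered test cases accumulate here as modules are loaded.
   const registry: RegisteredTest[] = [];

   function evalops_test(meta: TestMeta, fn: () => void): void {
     registry.push({ meta, fn });
   }

   // Loading a file that calls evalops_test() registers its test case:
   evalops_test({ description: 'Another test case', tags: ['critical'] }, () => {
     // Your code to be evaluated
   });

   console.log(registry.length);              // 1
   console.log(registry[0].meta.description); // Another test case
   ```

   A decorator-based variant would work the same way, with the decorator pushing the wrapped function into the registry at class-load time.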

4. **Validate your configuration:**

   Before uploading, it's a good practice to validate your configuration and discover your test cases:

   ```bash
   evalops validate
   ```

5. **Upload your test suite:**

   Once you're ready, upload your test suite to the EvalOps platform:

   ```bash
   evalops upload
   ```

   You will need to provide your EvalOps API key. You can do this by setting the `EVALOPS_API_KEY` environment variable or by using the `--api-key` flag.

## CLI Commands

### `init`

Initialize a new EvalOps project.

**Options:**
- `-f, --force`: Overwrite existing `evalops.yaml` file.
- `--template <template>`: Use a specific template (`basic`, `advanced`).

### `validate`

Validate the `evalops.yaml` file and discovered test cases.

**Options:**
- `-v, --verbose`: Show detailed validation output.
- `-f, --file <file>`: Path to `evalops.yaml` file (default: `./evalops.yaml`).

### `upload`

Upload test suite to the EvalOps platform.

**Options:**
- `-f, --file <file>`: Path to `evalops.yaml` file (default: `./evalops.yaml`).
- `--api-key <key>`: EvalOps API key.
- `--api-url <url>`: EvalOps API URL (default: `https://api.evalops.dev`).
- `--name <name>`: Name for the test suite.
- `--dry-run`: Preview what would be uploaded without actually uploading.

### `run`

Run evaluation locally (not yet implemented).

**Options:**
- `-f, --file <file>`: Path to `evalops.yaml` file (default: `./evalops.yaml`).
- `--provider <provider>`: Specify provider to use.
- `--output <output>`: Output file path.

## Configuration

The `evalops.yaml` file supports the following main sections:

- `description`: A brief description of the evaluation.
- `version`: The version of the evaluation configuration.
- `prompts`: The prompts to be sent to the LLM. Can be a single prompt or a list of messages with roles.
- `providers`: A list of LLM providers to use for the evaluation.
- `defaultTest`: Default assertions and variables for all test cases.
- `tests`: A list of specific test cases.
- `config`: Execution configuration like iterations, parallelism, and timeout.
- `outputPath`: The path to store the results of a local run.
- `outputFormat`: The format of the output file (`json`, `yaml`, `csv`).
- `sharing`: Configuration for sharing the evaluation results.
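
An end-to-end sketch combining these sections, using the message-list form of `prompts` and the execution `config`. All field values are illustrative assumptions (in particular the provider id, assertion type, and the units of `timeout`); consult the platform documentation for the authoritative schema:

```yaml
description: "Regression suite for the support-bot prompt"
version: "1.0"
prompts:
  - role: system
    content: "You are a helpful support agent."
  - role: user
    content: "{{question}}"
providers:
  - openai:gpt-4       # illustrative provider id
defaultTest:
  assert:
    - type: contains   # illustrative assertion type
      value: "refund"
tests:
  - description: "Refund policy question"
    vars:
      question: "How do I request a refund?"
config:
  iterations: 1
  parallelism: 2
  timeout: 60          # units assumed; verify against the schema
outputPath: ./results.json
outputFormat: json
```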