evalops
diff --git a/‎.claude/settings.local.json‎
Lines changed: 8 additions & 1 deletion b/‎.claude/settings.local.json‎
Lines changed: 8 additions & 1 deletion
diff --git a/‎.gitignore‎
Lines changed: 70 additions & 0 deletions b/‎.gitignore‎
Lines changed: 70 additions & 0 deletions
diff --git a/‎CLAUDE.md‎
Lines changed: 39 additions & 0 deletions b/‎CLAUDE.md‎
Lines changed: 39 additions & 0 deletions
diff --git a/‎README.md‎
Lines changed: 167 additions & 30 deletions b/‎README.md‎
Lines changed: 167 additions & 30 deletions
diff --git a/‎evalops.yaml‎
Lines changed: 24 additions & 0 deletions b/‎evalops.yaml‎
Lines changed: 24 additions & 0 deletions
@@ -5,7 +5,14 @@
       "Bash(npm install:*)",
       "Bash(mkdir:*)",
       "WebFetch(domain:tree-sitter.github.io)",
-      "Bash(npm uninstall:*)"
+      "Bash(npm uninstall:*)",
+      "Bash(npm run build:*)",
+      "Bash(rm:*)",
+      "Bash(npm run dev:*)",
+      "Bash(npm test)",
+      "Bash(npm test:*)",
+      "Bash(git init:*)",
+      "Bash(git add:*)"
     ],
     "deny": []
   }
 
@@ -0,0 +1,70 @@
+# Dependencies
+node_modules/
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+pnpm-debug.log*
+
+# Build outputs
+dist/
+build/
+*.tsbuildinfo
+
+# Testing
+coverage/
+.nyc_output
+*.lcov
+
+# Environment variables
+.env
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+
+# IDE and editor files
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS files
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Logs
+logs
+*.log
+
+# Runtime data
+pids
+*.pid
+*.seed
+*.pid.lock
+
+# Optional npm cache directory
+.npm
+
+# Optional REPL history
+.node_repl_history
+
+# Output of 'npm pack'
+*.tgz
+
+# Yarn Integrity file
+.yarn-integrity
+
+# EvalOps specific
+results.json
+results.yaml
+results.csv
+
+# Temporary test files
+temp/
+tmp/
@@ -0,0 +1,39 @@
+# EvalOps CLI Development Guidelines
+
+## TypeScript Standards
+
+**CRITICAL**: We must NEVER have type `any` anywhere, unless absolutely, positively necessary. 
+
+- Use proper TypeScript types for all variables, parameters, and return types
+- When interfacing with external libraries that don't have proper types, create interface definitions
+- Use type assertions (`as Type`) rather than `any` when you know the type
+- For Tree-sitter and other complex parsing libraries, create proper type definitions
+
+## Code Quality Standards
+
+- All functions must have explicit return types
+- All parameters must have explicit types
+- Use `const` assertions for literal types
+- Prefer interfaces over type aliases for object shapes
+- Use utility types (Partial, Pick, Omit) when appropriate
+
+## Testing
+
+- All core functionality must have unit tests
+- Use proper mocking for external dependencies
+- Test both success and failure cases
+- Mock file system operations in tests
+
+## Error Handling
+
+- Always provide meaningful error messages
+- Use proper error types and inheritance
+- Handle both expected and unexpected errors gracefully
+- Log warnings for non-critical failures
+
+## Dependencies
+
+- Minimize external dependencies
+- Prefer well-maintained, popular packages
+- Always check for security vulnerabilities
+- Document any complex dependency choices
@@ -4,18 +4,28 @@ The EvalOps CLI is a powerful tool for evaluating code against Large Language Mo
 
 ## Features
 
-- **Initialize Projects**: Quickly set up a new EvalOps project with `evalops init`.
-- **Validate Configurations**: Ensure your `evalops.yaml` file is correctly formatted and your test cases are discoverable with `evalops validate`.
-- **Upload Test Suites**: Upload your evaluation configurations to the EvalOps platform with `evalops upload`.
-- **Local Evaluations (Coming Soon)**: Run evaluations locally against different providers with `evalops run`.
-- **Automatic Test Discovery**: Automatically discover test cases in your codebase defined with `@evalops_test` decorators or `evalops_test()` function calls.
+- **Initialize Projects**: Quickly set up a new EvalOps project with `evalops init`
+- **Validate Configurations**: Ensure your `evalops.yaml` file is correctly formatted and your test cases are discoverable with `evalops validate`
+- **Upload Test Suites**: Upload your evaluation configurations to the EvalOps platform with `evalops upload`
+- **Local Evaluations (Coming Soon)**: Run evaluations locally against different providers with `evalops run`
+- **Automatic Test Discovery**: Automatically discover test cases in your codebase using Tree-sitter parsing
+- **TypeScript & JavaScript Support**: Full support for both TypeScript and JavaScript test files
+- **Multiple Test Patterns**: Support for decorators, function calls, and various file patterns
 
 ## Installation
 
+Install globally via npm:
+
 ```bash
 npm install -g evalops-cli
 ```
 
+Or install locally in your project:
+
+```bash
+npm install --save-dev evalops-cli
+```
+
 ## Getting Started
 
 1. **Initialize a new project:**
@@ -40,32 +50,57 @@ npm install -g evalops-cli
 
 3. **Add test cases to your code:**
 
-   The CLI can automatically discover test cases in your code. You can define a test case using the `@evalops_test` decorator or the `evalops_test()` function.
+   The CLI can automatically discover test cases in your code. You can define test cases in special `.eval.ts` or `.eval.js` files using decorators or function calls.
 
-   **Using Decorator:**
+   **Using Decorator (TypeScript):**
    ```typescript
-   import { evalops_test } from 'evalops-cli';
-
+   // mycode.eval.ts
    @evalops_test({
-     description: 'Test case for my function',
-     tags: ['critical', 'refactor'],
+     prompt: 'Analyze this function: {{code}}',
+     asserts: [
+       { type: 'contains', value: 'function', weight: 0.5 },
+       { type: 'llm-judge', value: 'Is the analysis accurate?', weight: 0.8 }
+     ],
+     tags: ['analysis', 'functions']
    })
-   function myFunction() {
-     // Your code to be evaluated
+   function testMyFunction() {
+     /**
+      * This function calculates the factorial of a number
+      */
+     function factorial(n: number): number {
+       if (n <= 1) return 1;
+       return n * factorial(n - 1);
+     }
+     
+     return factorial;
    }
    ```
 
-   **Using Function Call:**
-   ```typescript
-   import { evalops_test } from 'evalops-cli';
-
+   **Using Function Call (JavaScript):**
+   ```javascript
+   // mycode.eval.js
    evalops_test({
-     description: 'Another test case',
-   }, () => {
-     // Your code to be evaluated
+     prompt: 'Review this code for potential issues: {{code}}',
+     asserts: [
+       { type: 'contains', value: 'error handling', weight: 0.6 },
+       { type: 'llm-judge', value: 'Does the review identify key issues?', weight: 0.9 }
+     ],
+     description: 'Test async function review'
+   }, function() {
+     async function fetchData(url) {
+       const response = await fetch(url);
+       return response.json();
+     }
+     
+     return fetchData;
    });
    ```
 
+   **File Patterns:**
+   The CLI automatically discovers files matching these patterns:
+   - `**/*.eval.{js,ts}` - Dedicated evaluation files
+   - `**/*.test.{js,ts}` - Test files with evaluation decorators
+
 4. **Validate your configuration:**
 
    Before uploading, it's a good practice to validate your configuration and discover your test cases:
@@ -126,13 +161,115 @@ Run evaluation locally (not yet implemented).
 
 The `evalops.yaml` file supports the following main sections:
 
-- `description`: A brief description of the evaluation.
-- `version`: The version of the evaluation configuration.
-- `prompts`: The prompts to be sent to the LLM. Can be a single prompt or a list of messages with roles.
-- `providers`: A list of LLM providers to use for the evaluation.
-- `defaultTest`: Default assertions and variables for all test cases.
-- `tests`: A list of specific test cases.
-- `config`: Execution configuration like iterations, parallelism, and timeout.
-- `outputPath`: The path to store the results of a local run.
-- `outputFormat`: The format of the output file (`json`, `yaml`, `csv`).
-- `sharing`: Configuration for sharing the evaluation results.
+### Basic Configuration
+
+```yaml
+description: "My Code Evaluation Project"
+version: "1.0"
+
+# Prompts can be strings, objects, or arrays
+prompts:
+  - role: "system"
+    content: "You are a helpful code reviewer."
+  - role: "user" 
+    content: "Analyze this code: {{code}}"
+
+# Providers can be simple strings or detailed configurations
+providers:
+  - "openai/gpt-4"
+  - provider: "anthropic"
+    model: "claude-2"
+    temperature: 0.7
+
+# Default assertions applied to all test cases
+defaultTest:
+  assert:
+    - type: "contains"
+      value: "analysis"
+      weight: 0.5
+    - type: "llm-judge"
+      value: "Is the analysis helpful?"
+      weight: 0.8
+
+# Test cases (auto-discovered from code or defined manually)
+tests: []
+
+# Execution settings
+config:
+  iterations: 1
+  parallel: true
+  timeout: 60
+
+# Output configuration
+outputPath: "results.json"
+outputFormat: "json"
+
+# Sharing settings
+sharing:
+  public: false
+  allowForks: true
+```
+
+### File References
+
+You can reference external files using the `@` prefix:
+
+```yaml
+prompts: "@prompts/system-prompt.txt"
+
+# Or in nested structures
+prompts:
+  - role: "system"
+    content: "@prompts/system.txt"
+  - role: "user"
+    content: "@prompts/user.txt"
+```
+
+### Assertion Types
+
+The CLI supports various assertion types:
+
+- `contains` / `not-contains`: Check if output contains specific text
+- `equals` / `not-equals`: Exact match comparisons
+- `llm-judge`: Use another LLM to judge the output quality
+- `regex`: Regular expression matching
+- `json-path`: Extract and validate JSON path values
+- `similarity`: Semantic similarity scoring
+
+### Environment Variables
+
+- `EVALOPS_API_KEY`: Your EvalOps API key
+- `EVALOPS_API_URL`: Custom API URL (defaults to `https://api.evalops.dev`)
+
+## Examples
+
+Check the `examples/` directory for complete examples:
+
+- `examples/basic.eval.ts` - TypeScript decorator examples
+- `examples/functional-approach.eval.js` - JavaScript function call examples
+
+## Development
+
+To build and test the CLI locally:
+
+```bash
+# Install dependencies
+npm install
+
+# Build the project
+npm run build
+
+# Run tests
+npm test
+
+# Test CLI locally
+npm run dev -- init --template basic
+```
+
+## Contributing
+
+Contributions are welcome! Please read the contributing guidelines and submit pull requests to the main repository.
+
+## License
+
+MIT License - see LICENSE file for details.
@@ -0,0 +1,24 @@
+description: Basic EvalOps evaluation
+version: '1.0'
+prompts:
+  - role: system
+    content: You are a helpful assistant.
+  - role: user
+    content: 'Analyze the following code: {{code}}'
+providers:
+  - openai/gpt-4
+defaultTest:
+  assert:
+    - type: contains
+      value: analysis
+      weight: 0.5
+    - type: llm-judge
+      value: Is the analysis accurate?
+      weight: 0.8
+tests: []
+config:
+  iterations: 1
+  parallel: true
+  timeout: 60
+outputPath: results.json
+outputFormat: json