Commit 735750d

remove evals (#62)
* remove evals
* update readme
1 parent c013933 commit 735750d

8 files changed

+0
-1117
lines changed

README.md

Lines changed: 0 additions & 37 deletions
@@ -449,43 +449,6 @@ config = StagehandConfig(
 )
 ```
 
-## Evaluations
-
-The Stagehand Python SDK includes a set of evaluations to test its core functionality. These evaluations are organized by the primary methods they test: `act`, `extract`, and `observe`.
-
-### Running Evaluations
-
-You can run evaluations using the `run_all_evals.py` script in the `evals/` directory:
-
-```bash
-# Run only observe evaluations (default behavior)
-python -m evals.run_all_evals
-
-# Run all evaluations (act, extract, and observe)
-python -m evals.run_all_evals --all
-
-# Run a specific evaluation
-python -m evals.run_all_evals --all --eval observe_taxes
-python -m evals.run_all_evals --all --eval google_jobs
-
-# Specify a different model
-python -m evals.run_all_evals --model gpt-4o-mini
-```
-
-### Evaluation Types
-
-The evaluations test the following capabilities:
-
-- **act**: Tests for browser actions (clicking, typing)
-- `google_jobs`: Google jobs search and extraction
-
-- **extract**: Tests for data extraction capabilities
-- `extract_press_releases`: Extracting press releases from a dummy site
-
-- **observe**: Tests for element observation and identification
-- `observe_taxes`: Tax form elements observation
-
-Results are printed to the console with a summary showing success/failure for each evaluation.
 
 ## License
 

evals/act/google_jobs.py

Lines changed: 0 additions & 191 deletions
This file was deleted.

evals/env_loader.py

Lines changed: 0 additions & 41 deletions
This file was deleted.
