This is the base Jekyll theme. You can find out more info about customizing your Jekyll theme, as well as basic Jekyll usage documentation at [jekyllrb.com](https://jekyllrb.com/)
+## Overview
 
-You can find the source code for Minima at GitHub:
-[jekyll][jekyll-organization] /
-[minima](https://github.com/jekyll/minima)
+CAT Harness provides the infrastructure needed to:
 
-You can find the source code for Jekyll at GitHub:
-[jekyll][jekyll-organization] /
-[jekyll](https://github.com/jekyll/jekyll)
+- Run and track CAT tests against LLM outputs
+- Store and analyze test results over time
+- Monitor changes in LLM behavior as prompts/models/data evolve
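To make the three goals in the new overview concrete, here is a minimal sketch in plain Python. It does not use the CAT Harness API: `run_trials`, `has_regressed`, and `flaky_check` are hypothetical names invented for illustration, and `flaky_check` is a seeded random stand-in for a real LLM call plus validation of its output.

```python
import random
from typing import Callable


def run_trials(check: Callable[[], bool], runs: int) -> float:
    """Call a pass/fail check `runs` times and return the fraction that passed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check())
    return passed / runs


def has_regressed(previous_rate: float, current_rate: float,
                  tolerance: float = 0.05) -> bool:
    """Report a regression when the success rate drops by more than `tolerance`."""
    return current_rate < previous_rate - tolerance


def flaky_check(rng: random.Random, pass_probability: float = 0.9) -> bool:
    """Hypothetical stand-in for one LLM call followed by output validation."""
    return rng.random() < pass_probability


# Seeded so the sketch is repeatable; a real check would call the model.
rng = random.Random(42)
rate = run_trials(lambda: flaky_check(rng), runs=200)
print(f"success rate over 200 runs: {rate:.2%}")
```

Storing each run's `rate` alongside the prompt and model version is what would let a later run be compared with `has_regressed`.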
docs/getting-started.md (4 additions & 3 deletions)
@@ -7,13 +7,14 @@ title: Getting Started
 
 ## Poetry
 ```sh
-poetry install cat-ai
-
+poetry add cat-ai
+```
 ## UV
 
+```sh
 uv add cat-ai
 ```
 
 # Driving out non-deterministic projects with CAT
 
-Let's do a step by step journey through the lifecycle of a project to show how and why to use CAT. We will use an example of a project using an LLM and prompt to give recommendations of software teams for a project. The first step will be working with the prompt and LLM in [local development](local-development.md)
+Let's do a step by step journey through the lifecycle of a project to show how and why to use CAT. We will use an example of a project using an LLM and prompt to give recommendations of software teams for a project. The first step will be working with the prompt and LLM in [local development](local-development.html)
docs/local-development.md (19 additions & 14 deletions)
@@ -8,20 +8,25 @@ The first step will be just to be able to run the first version of your prompt a
 Imagine we have a python project called `team_recommender` where we recommend teams of developers to be used on a given project. The basic structure looks like this:
 
 ```
-team_recommender/
-├── README.md
-├── requirements.txt
-├── src/
-│   ├── __init__.py
-│   ├── main.py
-│   └── utils.py
-└── tests/
-    ├── fixtures/
-    |   ├── example_output.json
-    |   └── skills.json
-    ├── __init__.py
-    ├── test_allocations.py
+examples/team_recommender
+├── conftest.py
+├── readme.md
+└── tests
+    ├── example_0_text_output
+    ├── example_1_unit
+    │   └── test_allocations_unit.py
+    ├── example_2_loop
+    │   └── test_allocations_loop.py
+    ├── example_3_loop_no_hallucinating
+    │   └── test_allocations_hallucinating.py
+    ├── example_4_gate_on_success_threshold
+    │   └── test_allocations_threshold.py
+    ├── fixtures
+    │   ├── example_output.json
+    │   ├── output_schema.json
+    │   └── skills.json
     └── settings.py
+
 ```
 
 ## Single Test
@@ -457,4 +462,4 @@ O.k! Great! Lets look at our second failure:
 }
 }
 ```
-WOW! We didn't get any developers at all. Great! We can work with this! From here we can update our prompt to be more reslient. Once we make our updates, we will want to make sure these promblems are decreasing and not not regressing over time. Obviously, that isn't something you would try to control on your local machine, and the amount of test runs to get statisticle confidence about the rates of failure/hallucination are staying low. The best surface to gate and monitor this is going to be in your [Continous Integration](running-in-ci.md).
+WOW! We didn't get any developers at all. Great! We can work with this! From here we can update our prompt to be more resilient. Once we make our updates, we will want to make sure these problems are decreasing and not regressing over time. Obviously, that isn't something you would try to control on your local machine, given the number of test runs needed to get statistical confidence that the rates of failure/hallucination are staying low. The best surface to gate and monitor this is going to be in your [Continuous Integration](running-in-ci.html).
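
The point about statistical confidence can be sketched with a Wilson score interval on the observed failure rate. This is an assumption about one reasonable way to gate, not a description of how CAT does it: `wilson_interval`, the counts of 3 failures in 100 runs, and the 10% ceiling are all hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the Wilson score interval for a failure rate (z=1.96 is about 95%)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= failures <= runs:
        raise ValueError("failures must be between 0 and runs")
    p = failures / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical numbers: 3 failed runs out of 100.
low, high = wilson_interval(failures=3, runs=100)
print(f"failure rate is plausibly between {low:.1%} and {high:.1%}")

# Gate on the pessimistic end of the interval, not on the raw rate.
MAX_FAILURE_RATE = 0.10  # hypothetical ceiling
gate_passes = high <= MAX_FAILURE_RATE
print("gate passes" if gate_passes else "gate fails")
```

Gating on the upper bound is what makes the run count matter: with zero failures in only 10 runs, the upper bound is still roughly 28%, so a small local sample cannot show that the failure rate is under 10%.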