Commit 77c2e1c

Merge pull request #1636 from stanfordnlp/docs_oct2024
Docs
2 parents 7c29c86 + 251ca88 commit 77c2e1c

2 files changed (+24, -195 lines)

docs/docs/quick-start/getting-started-01.md

Lines changed: 6 additions & 117 deletions
````diff
@@ -45,52 +45,10 @@ Let's see it directly. You can inspect the `n` last prompts sent by DSPy easily.
 dspy.inspect_history(n=1)
 ```
 
-**Output:**
-```
-System message:
-
-Your input fields are:
-1. `question` (str)
-
-Your output fields are:
-1. `response` (str)
-
-All interactions will be structured in the following way, with the appropriate values filled in.
-
-[[ ## question ## ]]
-{question}
-
-[[ ## response ## ]]
-{response}
-
-[[ ## completed ## ]]
-
-In adhering to this structure, your objective is:
-Given the fields `question`, produce the fields `response`.
-
-
-User message:
-
-[[ ## question ## ]]
-what are high memory and low memory on linux?
-
-Respond with the corresponding output fields, starting with the field `response`, and then ending with the marker for `completed`.
+**Output:**
+See this [gist](https://gist.github.com/okhat/aff3c9788ccddf726fdfeb78e40e5d22)
 
 
-Response:
-
-[[ ## response ## ]]
-In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of 32-bit architectures.
-
-- **Low Memory**: This typically refers to the first 896 MB of memory in a 32-bit system. It is directly accessible by the kernel and is used for kernel data structures and user processes. The low memory region is where most of the system's memory management occurs, and it is where the kernel can allocate memory for processes without needing special handling.
-
-- **High Memory**: This refers to memory above the 896 MB threshold in a 32-bit system. The kernel cannot directly access this memory without special mechanisms because of the limitations of the 32-bit address space. High memory is used for user processes that require more memory than what is available in the low memory region. The kernel can manage high memory through techniques like "highmem" support, which allows it to map high memory pages into the kernel's address space when needed.
-
-In summary, low memory is directly accessible by the kernel, while high memory requires additional handling for the kernel to access it, especially in 32-bit systems. In 64-bit systems, this distinction is less relevant as the addressable memory space is significantly larger.
-
-[[ ## completed ## ]]
-```
-
 DSPy has various built-in modules, e.g. `dspy.ChainOfThought`, `dspy.ProgramOfThought`, and `dspy.ReAct`. These are interchangeable with basic `dspy.Predict`: they take your signature, which is specific to your task, and they apply general-purpose prompting techniques and inference-time strategies to it.
 
 For example, `dspy.ChainOfThought` is an easy way to elicit `reasoning` out of your LM before it commits to the outputs requested in your signature.
````
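
In the prompt format visible in the removed `inspect_history` output, each field is introduced by a `[[ ## name ## ]]` header and the completion ends with a `[[ ## completed ## ]]` marker. A `dspy.ChainOfThought` completion would, under that layout, carry a `reasoning` field ahead of the requested outputs. The sketch below splits such a completion back into fields. It is a hypothetical plain-Python helper (`parse_fields` is an invented name), not DSPy's own adapter code, and the sample text is made up.

```python
from __future__ import annotations

import re

# Matches a header line such as "[[ ## response ## ]]" and captures the field name.
FIELD_HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\]$", re.MULTILINE)


def parse_fields(completion: str) -> dict[str, str]:
    """Split a completion into {field_name: value} using [[ ## name ## ]] headers.

    Hypothetical helper for illustration; parsing stops at the `completed` marker.
    """
    fields: dict[str, str] = {}
    headers = list(FIELD_HEADER.finditer(completion))
    for i, header in enumerate(headers):
        name = header.group(1)
        if name == "completed":
            break
        # A field's value runs from the end of its header to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(completion)
        fields[name] = completion[header.end():end].strip()
    return fields


# Made-up completion in the same layout as the prompts shown in this diff.
sample = """[[ ## reasoning ## ]]
Step-by-step reasoning goes here.

[[ ## response ## ]]
The final answer goes here.

[[ ## completed ## ]]
"""

print(parse_fields(sample))
# {'reasoning': 'Step-by-step reasoning goes here.', 'response': 'The final answer goes here.'}
```

Keying on the header lines rather than on blank lines means a multi-paragraph field value, like the bulleted answer removed above, stays in one piece.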

````diff
@@ -125,14 +83,14 @@ That said, you're likely here because you want to build a high-quality system an
 
 To measure the quality of your DSPy system, you need (1) a bunch of input values, like `question`s for example, and (2) a `metric` that can score the quality of an output from your system. Metrics vary widely. Some metrics need ground-truth labels of ideal outputs, e.g. for classification or question answering. Other metrics are self-supervised, e.g. checking faithfulness or lack of hallucination, perhaps using a DSPy program as a judge of these qualities.
 
-Let's load a dataset of questions and their (pretty long) gold answers. Since we started this notebook with the goal of building **a system for answering Tech questions**, we obtained a bunch of StackExchange-based questions and their correct answers from the RAG-QA Arena dataset. (Learn more about the [development cycle](/docs/building-blocks/solving_your_task) if you don't have data for your task.)
+Let's load a dataset of questions and their (pretty long) gold answers. Since we started this notebook with the goal of building **a system for answering Tech questions**, we obtained a bunch of StackExchange-based questions and their correct answers from the [RAG-QA Arena](https://arxiv.org/abs/2407.13998) dataset. (Learn more about the [development cycle](/docs/building-blocks/solving_your_task) if you don't have data for your task.)
 
 
 ```python
 import ujson
 
 # Download 500 question--answer pairs from the RAG-QA Arena "Tech" dataset.
-# !wget https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_500.json
+!wget https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_500.json
 
 with open('ragqa_arena_tech_500.json') as f:
     data = ujson.load(f)
````
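
Point (2) in the paragraph on measuring quality asks for a `metric`, and gold answers like the ones loaded here make a reference-based metric possible. The tutorial's own metric has an LM judge recall and precision (see the removed `inspect_history` output for it). As a minimal sketch, the hypothetical `token_f1` below swaps in a purely lexical stand-in: it measures token overlap with the gold answer and combines recall and precision as their harmonic mean (F1).

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; a deliberately crude normalization.
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction: str, gold: str) -> float:
    """Lexical F1 between a system response and a gold answer.

    recall    = shared tokens / gold tokens       (how much of the gold answer is covered)
    precision = shared tokens / prediction tokens (how much of the response is in the gold answer)
    """
    pred_tokens, gold_tokens = tokenize(prediction), tokenize(gold)
    # Multiset intersection: each shared token counts min(count in pred, count in gold) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    recall = overlap / len(gold_tokens)
    precision = overlap / len(pred_tokens)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# 4 shared tokens: precision 4/5, recall 4/7, so F1 = 2/3.
print(token_f1("the kernel maps low memory", "low memory is mapped by the kernel"))
```

Token overlap misses paraphrases ("maps" vs. "mapped"), which is one reason the tutorial delegates this judgment to an LM instead.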

````diff
@@ -233,75 +191,8 @@ The final DSPy module call above actually happens inside `metric`. You might be
 dspy.inspect_history(n=1)
 ```
 
-**Output:**
-```
-System message:
-
-Your input fields are:
-1. `question` (str)
-2. `ground_truth` (str)
-3. `system_response` (str)
-
-Your output fields are:
-1. `reasoning` (str)
-2. `recall` (float): fraction (out of 1.0) of ground truth covered by the system response
-3. `precision` (float): fraction (out of 1.0) of system response covered by the ground truth
-
-All interactions will be structured in the following way, with the appropriate values filled in.
-
-[[ ## question ## ]]
-{question}
-
-[[ ## ground_truth ## ]]
-{ground_truth}
-
-[[ ## system_response ## ]]
-{system_response}
-
-[[ ## reasoning ## ]]
-{reasoning}
-
-[[ ## recall ## ]]
-{recall}
-
-[[ ## precision ## ]]
-{precision}
-
-[[ ## completed ## ]]
-
-In adhering to this structure, your objective is:
-Compare a system's response to the ground truth to compute its recall and precision.
-If asked to reason, enumerate key ideas in each response, and whether they are present in the other response.
-
-
-User message:
-
-[[ ## question ## ]]
-what are high memory and low memory on linux?
-
-[[ ## ground_truth ## ]]
-"High Memory" refers to the application or user space, the memory that user programs can use and which isn't permanently mapped in the kernel's space, while "Low Memory" is the kernel's space, which the kernel can address directly and is permanently mapped.
-The user cannot access the Low Memory as it is set aside for the required kernel programs.
-
-[[ ## system_response ## ]]
-In Linux, "low memory" refers to the memory that is directly accessible by the kernel and user processes, typically the first 4GB on a 32-bit system. "High memory" refers to memory above this limit, which is not directly accessible by the kernel in a 32-bit environment. This distinction is crucial for memory management, particularly in systems with large amounts of RAM, as it influences how memory is allocated and accessed.
-
-Respond with the corresponding output fields, starting with the field `reasoning`, then `recall`, then `precision`, and then ending with the marker for `completed`.
-
-
-Response:
-
-[[ ## reasoning ## ]]
-The system response correctly identifies "low memory" as the memory directly accessible by the kernel and user processes, and it mentions the typical limit of 4GB on a 32-bit system. It also accurately describes "high memory" as the memory above this limit that is not directly accessible by the kernel. However, it does not explicitly mention that "low memory" is set aside for kernel programs, which is a key point in the ground truth. The system response provides a good overview of the concepts but lacks the detail about user access to low memory and the permanent mapping aspect mentioned in the ground truth.
-
-[[ ## recall ## ]]
-0.75
-
-[[ ## precision ## ]]
-0.85
-
-[[ ## completed ## ]]
-```
+**Output:**
+See this [gist](https://gist.github.com/okhat/57bf86472d1e14812c0ae33fba5353f8)
 
 For evaluation, you could use the metric above in a simple loop and just average the score. But for nice parallelism and utilities, we can rely on `dspy.Evaluate`.
 
````
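
The "simple loop" that averages the metric can be sketched as follows. `average_score` and the toy `exact_match` metric are hypothetical stand-ins. In the tutorial, the program and metric are DSPy objects, and `dspy.Evaluate` adds parallelism and other utilities on top of the same idea. The last line repeats the arithmetic behind the `Average Metric: 59.565342393613165 / 150 (39.7)` progress line in the `evaluate(cot)` output: summed score divided by the number of examples, expressed as a percentage.

```python
from __future__ import annotations

from typing import Callable, Iterable


def average_score(
    pairs: Iterable[tuple[str, str]],
    metric: Callable[[str, str], float],
) -> tuple[float, int]:
    """Score (prediction, gold) pairs one at a time; return (sum of scores, count)."""
    total, count = 0.0, 0
    for prediction, gold in pairs:
        total += metric(prediction, gold)
        count += 1
    return total, count


def exact_match(prediction: str, gold: str) -> float:
    # Toy stand-in metric: 1.0 if the normalized strings match, else 0.0.
    return float(prediction.strip().lower() == gold.strip().lower())


total, count = average_score([("Paris", "paris"), ("4", "four"), ("blue", "Blue ")], exact_match)
print(f"Average Metric: {total} / {count} ({100 * total / count:.1f})")  # Average Metric: 2.0 / 3 (66.7)

# The same arithmetic on the totals in the evaluate(cot) output shown in this diff:
print(round(100 * 59.565342393613165 / 150, 2))  # 39.71
```

Returning the raw sum together with the count, rather than only the mean, matches the `sum / count (percent)` shape of the progress line and avoids dividing by zero inside the helper when there are no examples.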

````diff
@@ -317,8 +208,6 @@ evaluate(cot)
 **Output:**
 ```
 Average Metric: 59.565342393613165 / 150 (39.7): 100%|██████████| 150/150 [00:00<00:00, 432.92it/s]
-[TABLE HERE]
-39.71
 ```
 
 The table you'll get in the output would look like:
````
