
Commit 4ad8ffd

Address code review comments
Signed-off-by: Bill Murdock <[email protected]>
1 parent 6535df3 commit 4ad8ffd

File tree

3 files changed (+8, -8 lines changed)


notebooks/evaluation/evaluate-using-sample-questions-lls-vs-li.ipynb

Lines changed: 2 additions & 2 deletions
@@ -440,7 +440,7 @@
 "- Content is from the URLs configured in CONTENT_URLS at the top of this notebook\n",
 "- Milvus-lite inline vector IO provider\n",
 "- granite-embedding-125m embedding model\n",
-"- meta-llama/llama-3-3-70b-instruct generative model using the watsonx inference provider\n",
+"- gpt-3.5-turbo generative model\n",
 "- max_tokens for output is 4096"
 ]
 },
@@ -837,7 +837,7 @@
 "- Content is from the URLs configured in CONTENT_URLS at the top of this notebook\n",
 "- Milvus vector IO provider\n",
 "- granite-embedding-125m embedding model\n",
-"- meta-llama/llama-3-3-70b-instruct generative model using the watsonx inference provider\n",
+"- gpt-3.5-turbo generative model\n",
 "- max_tokens for output is 4096\n",
 "- number of search results to return is 5"
 ]
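Note on the updated configuration above: the evaluated setup now calls gpt-3.5-turbo instead of llama-3-3-70b on watsonx, with max_tokens capped at 4096. The notebook routes generation through the Llama Stack OpenAI provider configured in run.yaml; the snippet below is only a rough, hypothetical illustration of those two settings using the plain OpenAI client, not the notebook's actual code:

# Hypothetical illustration of the settings named above (gpt-3.5-turbo,
# max_tokens=4096); the notebook itself goes through Llama Stack, not this client.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "key-not-set"))
reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Answer using only the retrieved context: ..."}],
    max_tokens=4096,  # output cap mentioned in the notebook description
)
print(reply.choices[0].message.content)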

notebooks/evaluation/make-sample-questions.ipynb

Lines changed: 2 additions & 2 deletions
@@ -11,10 +11,10 @@
 "1. It uses the abstract description of the documents to generate a bunch of questions by calling a question generator model, which is currently set to gpt-4o. You want a very powerful and smart model for that purpose because generating a large volume of questions from an abstract description is a pretty challenging task.\n",
 "2. It builds a vector database from the content of those documents using Docling to analyze them.\n",
 "3. It uses RAG and a reference answer generator model (also gpt-4o currently) to generate reference answers. You really need a very powerful model to be the reference answer generator because you're going to be treating these reference answers as ground truth for the smaller and presumably less powerful models that you were trying to actually evaluate in the next notebook.\n",
-"4. It through each of the reference answers and asks the reference answer generator model to assess whether the answer is really answering the question or just saying that it doesn't know. This is important because often you want a separate analysis for how well each model works on those questions that have reference answers versus how well each model works on those questions where the reference behavior is do not answer because the content doesn't say.\n",
+"4. It iterates through each of the reference answers and asks the reference answer generator model to assess whether the answer is really answering the question or just saying that it doesn't know. This is important because often you want a separate analysis for how well each model works on those questions that have reference answers versus how well each model works on those questions where the reference behavior is do not answer because the content doesn't say.\n",
 "5. It stores all of this information in a file for use in the next notebook, [evaluate-using-sample-questions.ipynb](./evaluate-using-sample-questions.ipynb).\n",
 "\n",
-"If you have time, you should also get a human to vet the reference answers and improve them, but that's expensive to do at scale so I think in practice often that's not going to happen."
+"If you have time, you should also get a human to vet the reference answers and improve them, but that's expensive to do at scale."
 ]
 },
 {
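Step 4 in the description above is essentially an LLM-as-judge pass over the generated reference answers. The notebook's actual implementation is not part of this diff; the sketch below only illustrates the idea, with a hypothetical is_real_answer helper and gpt-4o as the judge (the reference answer generator model named in the text):

# Hypothetical sketch of step 4: ask the reference-answer model whether each
# reference answer actually answers its question or only says it doesn't know.
from openai import OpenAI

client = OpenAI()

def is_real_answer(question: str, reference_answer: str) -> bool:
    prompt = (
        "Does the following answer actually answer the question, or does it only "
        "say that the information is unavailable? Reply ANSWERED or NOT_ANSWERED.\n\n"
        f"Question: {question}\nAnswer: {reference_answer}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=5,
    )
    return reply.choices[0].message.content.strip().upper().startswith("ANSWERED")

# Placeholder data; in the notebook these would come from the generated questions file.
samples = [{"question": "What is X?", "reference_answer": "The content does not say."}]
answerable = [s for s in samples if is_real_answer(s["question"], s["reference_answer"])]
unanswerable = [s for s in samples if s not in answerable]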

notebooks/evaluation/run.yaml

Lines changed: 4 additions & 4 deletions
@@ -17,17 +17,17 @@ providers:
     provider_type: remote::watsonx
     config:
       url: ${env.WATSONX_BASE_URL:https://us-south.ml.cloud.ibm.com}
-      api_key: ${env.WATSONX_API_KEY}
-      project_id: ${env.WATSONX_PROJECT_ID}
+      api_key: ${env.WATSONX_API_KEY:key-not-set}
+      project_id: ${env.WATSONX_PROJECT_ID:project-not-set}
       timeout: 1200
   - provider_id: llama-openai-compat
     provider_type: remote::llama-openai-compat
     config:
-      api_key: ${env.LLAMA_API_KEY}
+      api_key: ${env.LLAMA_API_KEY:key-not-set}
   - provider_id: openai
     provider_type: remote::openai
     config:
-      api_key: ${env.OPENAI_API_KEY}
+      api_key: ${env.OPENAI_API_KEY:key-not-set}
   - provider_id: sentence-transformers
     provider_type: inline::sentence-transformers
     config: {}
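The run.yaml change above gives each credential placeholder a fallback value, so the configuration can be loaded even when a provider's key is not set in the environment. The sketch below shows how ${env.NAME:default} placeholders of that shape could be resolved, assuming the text after the colon acts as the default; it is an illustration, not Llama Stack's actual substitution code:

# Illustration only: resolve ${env.NAME:default} placeholders in the style used
# by the run.yaml values above. Not Llama Stack's actual implementation.
import os
import re

PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z0-9_]+)(?::([^}]*))?\}")

def resolve(value: str) -> str:
    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return os.environ.get(name, default if default is not None else "")
    return PLACEHOLDER.sub(substitute, value)

print(resolve("${env.OPENAI_API_KEY:key-not-set}"))  # prints key-not-set if unset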
