
Commit 1391ff7

Chore: [AEA-6070] - Add Unit Tests (#240)
## Summary

Adds unit tests for handling citations within AI responses.

### Details

- Splits slack_event tests into "message", "event", and "citation" for easier navigation
- Adds tests for adding citations
- Adds tests for handling formatting within messages and citations
1 parent 744124d commit 1391ff7
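The citation handling described in the commit message lends itself to small, focused unit tests. The sketch below is illustrative only and is not the repository's actual test code: the stand-in `format_citation` helper exists solely so the assertion has something runnable to exercise.

```python
# Illustrative sketch of the kind of citation-formatting test this commit adds.
# The helper below is a stand-in, not the project's real code under test.
import re

CIT_PATTERN = re.compile(r"<cit>(.*?)</cit>")


def format_citation(raw: str) -> str:
    """Stand-in: turn one <cit> block into a Slack mrkdwn bibliography line."""
    number, title, _excerpt, _score, source = CIT_PATTERN.search(raw).group(1).split("||")
    return f"{number}. _{title}_ ({source})"


def test_citation_rendered_as_mrkdwn():
    raw = "<cit>1||A document||Snippet answering the question.||0.98||very_helpful_doc.pdf</cit>"
    assert format_citation(raw) == "1. _A document_ (very_helpful_doc.pdf)"
```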

10 files changed (+1667, -1008 lines)
Lines changed: 89 additions & 49 deletions
@@ -1,49 +1,89 @@
-You are an AI assistant designed to provide guidance and references from your knowledge base to help users make decisions when onboarding. It is *VERY* important you return *ALL* references, for user examination.
-
-# Response
-## Response Structure
-- *Summary*: 100 characters maximum, capturing core answer
-- *Answer* (use "mrkdown") (< 800 characters)
-- Page break (use `------`)
-- \[Bibliography\]
-
-## Formatting ("mrkdwn")
-a. *Bold* for:
-- Headings, subheadings: *Answer:*, *Bibliography:*
-- Source names: *NHS England*, *EPS*
-b. _Italic_ for:
-- Citations, references, document titles
-c. Block Quotes for:
-- Direct quotes >1 sentence
-- Technical specifications, parameters
-- Examples
-d. `Inline code` for:
-- System names, field names: `PrescriptionID`
-- Short technical terms: `HL7 FHIR`
-e. Links:
-- Do not provide links
-
-# Thinking
-## Question Handling
-- Detect whether the query contains one or multiple questions
-- Split complex queries into individual sub-questions
-- Identify question type: factual, procedural, diagnostic, troubleshooting, or clarification-seeking
-- For multi-question queries: number sub-questions clearly (Q1, Q2, etc)
-
-## RAG & Knowledge Base Integration
-- Relevance threshold handling:
-- Score > 0.85 (High confidence)
-- Score 0.70 - 0.85 (Medium confidence)
-- Score < 0.70 (Low confidence)
-
-## Corrections
-- Change _National Health Service Digital (NHSD)_ references to _National Health Service England (NHSE)_
-
-# Bibliography
-## Format
-<cit>source number||summary title||link||filename||text snippet||reasoning</cit>\n
-
-## Requirements
-- Return **ALL** retrieved documents, their name and a text snippet, from "CONTEXT"
-- Get full text references from search results for Bibliography
-- Title should be less than 50 characters
+# 1. Persona
+You are an AI assistant designed to provide guidance and references from your knowledge base to help users make decisions during onboarding.
+
+It is **VERY** important that you return **ALL** references found in the context for user examination.
+
+---
+
+# 2. THINKING PROCESS & LOGIC
+Before generating a response, adhere to these processing rules:
+
+## A. Context Verification
+Scan the retrieved context for the specific answer.
+1. **No information found**: If the information is not present in the context:
+- Do NOT formulate a general answer.
+- Do NOT use external resources (i.e., websites, etc.) to get an answer.
+- Do NOT infer an answer from the user's question.
+
+## B. Question Analysis
+1. **Detection:** Determine if the query contains one or multiple questions.
+2. **Decomposition:** Split complex queries into individual sub-questions.
+3. **Classification:** Identify if the question is Factual, Procedural, Diagnostic, Troubleshooting, or Clarification-seeking.
+4. **Multi-Question Strategy:** Number sub-questions clearly (Q1, Q2, etc).
+5. **No Information:** If there is no information supporting an answer to the query, do not try to fill in the information.
+6. **Strictness:** Do not infer information; be strict on evidence.
+
+## C. Entity Correction
+- If you encounter "National Health Service Digital (NHSD)", automatically treat and output it as **"National Health Service England (NHSE)"**.
+
+## D. RAG Confidence Scoring
+```
+Evaluate retrieved context using these relevance score thresholds:
+- `Score > 0.9` : **Diamond** (Definitive source)
+- `Score 0.8 - 0.9` : **Gold** (Strong evidence)
+- `Score 0.7 - 0.8` : **Silver** (Partial context)
+- `Score 0.6 - 0.7` : **Bronze** (Weak relevance)
+- `Score < 0.6` : **Scrap** (Ignore completely)
+```
+
+---
+
+# 3. OUTPUT STRUCTURE
+Construct your response in this exact order:
+
+1. **Summary:** A concise overview (Maximum **100 characters**).
+2. **Answer:** The core response using the specific "mrkdwn" styling defined below (Maximum **800 characters**).
+3. **Separator:** A literal line break using `------`.
+4. **Bibliography:** The list of all sources used.
+
+---
+
+# 4. FORMATTING RULES ("mrkdwn")
+You must use a specific variation of markdown. Follow this table strictly:
+
+| Element | Style to Use | Example |
+| :--- | :--- | :--- |
+| **Headings / Subheadings** | Bold (`*`) | `*Answer:*`, `*Bibliography:*` |
+| **Source Names** | Bold (`*`) | `*NHS England*`, `*EPS*` |
+| **Citations / Titles** | Italic (`_`) | `_Guidance Doc v1_` |
+| **Quotes (>1 sentence)** | Blockquote (`>`) | `> text` |
+| **Tech Specs / Examples** | Blockquote (`>`) | `> param: value` |
+| **System / Field Names** | Inline Code (`` ` ``) | `` `PrescriptionID` `` |
+| **Technical Terms** | Inline Code (`` ` ``) | `` `HL7 FHIR` `` |
+| **Hyperlinks** | **NONE** | Do not output any URLs. |
+
+---
+
+# 5. BIBLIOGRAPHY GENERATOR
+**Requirements:**
+- Return **ALL** retrieved documents from the context.
+- Title length must be **< 50 characters**.
+- Use the exact string format below (do not render it as a table or list).
+
+**Template:**
+```text
+<cit>source number||summary title||excerpt||relevance score||source name</cit>
+
+# 6. Example
+"""
+*Summary*
+Short summary text
+
+* Answer *
+A longer answer, going into more detail gained from the knowledge base and using critical thinking.
+
+------
+<cit>1||A document||This is the precise snippet of the pdf file which answers the question.||0.98||very_helpful_doc.pdf</cit>
+<cit>2||Another file||A 500 word text excerpt which gives some inference to the answer, but the long citation helps fill in the information for the user, so it's worth the tokens.||0.76||something_interesting.txt</cit>
+<cit>3||A useless file||This file doesn't contain anything that useful||0.05||folder/another/some_file.txt</cit>
+"""
Lines changed: 2 additions & 4 deletions
@@ -1,6 +1,4 @@
-# QUERY
-{{user_query}}
+<user_query>{{user_query}}<user_query>
 
 # CONTEXT
-## Results $search_results$
-## LIST ALL RESULTS IN TABLE
+<search_results>$search_results$<search_results>

packages/cdk/resources/BedrockPromptResources.ts

Lines changed: 6 additions & 8 deletions
@@ -20,14 +20,12 @@ export class BedrockPromptResources extends Construct {
   constructor(scope: Construct, id: string, props: BedrockPromptResourcesProps) {
     super(scope, id)
 
-    // Nova Pro is recommended for text generation tasks requiring high accuracy and complex understanding.
-    const novaProModel = BedrockFoundationModel.AMAZON_NOVA_PRO_V1
-    // Nova Lite is recommended for tasks
-    const novaLiteModel = BedrockFoundationModel.AMAZON_NOVA_LITE_V1
+    const ragModel = new BedrockFoundationModel("meta.llama3-70b-instruct-v1:0")
+    const reformulationModel = BedrockFoundationModel.AMAZON_NOVA_LITE_V1
 
     const queryReformulationPromptVariant = PromptVariant.text({
       variantName: "default",
-      model: novaLiteModel,
+      model: reformulationModel,
       promptVariables: ["topic"],
       promptText: props.settings.reformulationPrompt.text
     })
@@ -41,7 +39,7 @@
 
     const ragResponsePromptVariant = PromptVariant.chat({
       variantName: "default",
-      model: novaProModel,
+      model: ragModel,
       promptVariables: ["query", "search_results"],
       system: props.settings.systemPrompt.text,
       messages: [props.settings.userPrompt]
@@ -59,8 +57,8 @@
     })
 
     // expose model IDs for use in Lambda environment variables
-    this.ragModelId = novaProModel.modelId
-    this.queryReformulationModelId = novaLiteModel.modelId
+    this.ragModelId = ragModel.modelId
+    this.queryReformulationModelId = reformulationModel.modelId
 
     this.queryReformulationPrompt = queryReformulationPrompt
     this.ragResponsePrompt = ragPrompt
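The construct continues to expose both model IDs so the Lambda can pick them up at runtime. A small sketch of the consuming side, assuming environment variable names such as `RAG_MODEL_ID`; the actual names are set by the stack's Lambda resources, which are not part of this diff:

```python
# Sketch of the Lambda-side configuration consuming the IDs exposed above.
# The environment variable names here are an assumption, not taken from this diff.
import os

RAG_MODEL_ID = os.environ["RAG_MODEL_ID"]
QUERY_REFORMULATION_MODEL_ID = os.environ["QUERY_REFORMULATION_MODEL_ID"]
```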

packages/slackBotFunction/app/services/bedrock.py

Lines changed: 5 additions & 14 deletions
@@ -42,8 +42,10 @@ def query_bedrock(user_query: str, session_id: str = None) -> RetrieveAndGenerat
             "type": "KNOWLEDGE_BASE",
             "knowledgeBaseConfiguration": {
                 "knowledgeBaseId": config.KNOWLEDGEBASE_ID,
-                "modelArn": config.RAG_MODEL_ID,
-                "retrievalConfiguration": {"vectorSearchConfiguration": {"numberOfResults": 5}},
+                "modelArn": prompt_template.get("model_id", config.RAG_MODEL_ID),
+                "retrievalConfiguration": {
+                    "vectorSearchConfiguration": {"numberOfResults": 5, "overrideSearchType": "SEMANTIC"}
+                },
                 "generationConfiguration": {
                     "guardrailConfiguration": {
                         "guardrailId": config.GUARD_RAIL_ID,
@@ -58,16 +60,6 @@ def query_bedrock(user_query: str, session_id: str = None) -> RetrieveAndGenerat
                         }
                     },
                 },
-            "orchestrationConfiguration": {
-                "inferenceConfig": {
-                    "textInferenceConfig": {
-                        **inference_config,
-                        "stopSequences": [
-                            "Human:",
-                        ],
-                    }
-                },
-            },
             },
         },
     }
@@ -87,6 +79,7 @@ def query_bedrock(user_query: str, session_id: str = None) -> RetrieveAndGenerat
     else:
         logger.info("Starting new conversation")
 
+    logger.debug("Retrieve and Generate", extra={"params": request_params})
     response = client.retrieve_and_generate(**request_params)
     logger.info(
         "Got Bedrock response",
@@ -100,10 +93,8 @@ def invoke_model(prompt: str, model_id: str, client: BedrockRuntimeClient, infer
         modelId=model_id,
         body=json.dumps(
             {
-                "anthropic_version": "bedrock-2023-05-31",
                 "temperature": inference_config["temperature"],
                 "top_p": inference_config["topP"],
-                "top_k": 50,
                 "max_tokens": inference_config["maxTokens"],
                 "messages": [{"role": "user", "content": prompt}],
            }
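With `anthropic_version` and `top_k` removed, the `invoke_model` body is limited to the fields shown in the last hunk. A hedged sketch of the kind of unit test that could pin that down, using a stub client rather than the project's actual fixtures (the `build_body` helper simply mirrors the dict assembled above and is not the repo's real function):

```python
# Sketch only: a stub client captures the serialized body so a test can assert
# which fields are sent. The project's real test fixtures are not shown here.
import json


class StubBedrockRuntime:
    def __init__(self):
        self.last_body = None

    def invoke_model(self, modelId, body):
        self.last_body = json.loads(body)
        return {"body": b"{}"}


def build_body(prompt, inference_config):
    # Mirrors the request body assembled in bedrock.py after this change.
    return {
        "temperature": inference_config["temperature"],
        "top_p": inference_config["topP"],
        "max_tokens": inference_config["maxTokens"],
        "messages": [{"role": "user", "content": prompt}],
    }


def test_invoke_model_body_has_no_anthropic_fields():
    client = StubBedrockRuntime()
    cfg = {"temperature": 0, "topP": 1, "maxTokens": 1500}
    client.invoke_model(modelId="meta.llama3-70b-instruct-v1:0", body=json.dumps(build_body("hi", cfg)))
    assert "anthropic_version" not in client.last_body
    assert "top_k" not in client.last_body
    assert client.last_body["messages"] == [{"role": "user", "content": "hi"}]
```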

packages/slackBotFunction/app/services/prompt_loader.py

Lines changed: 11 additions & 5 deletions
@@ -92,23 +92,28 @@ def load_prompt(prompt_name: str, prompt_version: str = None) -> dict:
 
         logger.info(
             f"Loading prompt {prompt_name}' (ID: {prompt_id})",
-            extra={"prompt_name": prompt_name, "prompt_id": prompt_id, "prompt_version": prompt_version},
+            extra={"prompt_version": prompt_version},
         )
 
         if is_explicit_version:
             response = client.get_prompt(promptIdentifier=prompt_id, promptVersion=selected_version)
         else:
             response = client.get_prompt(promptIdentifier=prompt_id)
 
+        logger.info("Prompt Found", extra={"prompt": response})
+
+        variant = response["variants"][0]
+
         # Extract and render the prompt template
-        template_config = response["variants"][0]["templateConfiguration"]
+        template_config = variant["templateConfiguration"]
         prompt_text = _render_prompt(template_config)
         actual_version = response.get("version", "DRAFT")
 
         # Extract inference configuration with defaults
         default_inference = {"temperature": 0, "topP": 1, "maxTokens": 1500}
-        raw_inference = response["variants"][0].get("inferenceConfiguration", {})
-        raw_text_config = raw_inference.get("textInferenceConfiguration", {})
+        model_id = variant.get("modelId", "")
+        raw_inference = variant.get("inferenceConfiguration", {})
+        raw_text_config = raw_inference.get("text", {})
         inference_config = {**default_inference, **raw_text_config}
 
         logger.info(
@@ -117,10 +122,11 @@ def load_prompt(prompt_name: str, prompt_version: str = None) -> dict:
                 "prompt_name": prompt_name,
                 "prompt_id": prompt_id,
                 "version_used": actual_version,
+                "model_id": model_id,
                 **inference_config,
             },
         )
-        return {"prompt_text": prompt_text, "inference_config": inference_config}
+        return {"prompt_text": prompt_text, "model_id": model_id, "inference_config": inference_config}
 
     except ClientError as e:
         error_code = e.response.get("Error", {}).get("Code", "Unknown")
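The new extraction assumes each prompt variant carries an optional `modelId` and a text inference block keyed `text`. A rough sketch of the shape being read and the resulting merged config; the sample values are made up for illustration, not captured from the service:

```python
# Illustrative only: the variant shape load_prompt now reads, and the merge of
# defaults with the variant's text inference settings. Sample values are made up.
sample_variant = {
    "modelId": "meta.llama3-70b-instruct-v1:0",
    "templateConfiguration": {"text": {"text": "You are an AI assistant..."}},
    "inferenceConfiguration": {"text": {"temperature": 0.2, "maxTokens": 1000}},
}

default_inference = {"temperature": 0, "topP": 1, "maxTokens": 1500}
raw_text_config = sample_variant.get("inferenceConfiguration", {}).get("text", {})
inference_config = {**default_inference, **raw_text_config}

print(sample_variant.get("modelId", ""))  # model_id returned alongside the prompt text
print(inference_config)                   # {'temperature': 0.2, 'topP': 1, 'maxTokens': 1000}
```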
