Commit dc44d69

committed
Add include category feature
- Modified approach.py to include category logic
- Updated models.ts with category types
- Added translations for category("All") in en, es, fr, and ja locales
- Updated Ask.tsx and Chat.tsx to handle category
- Updated data ingestion documentation
1 parent be26d31 commit dc44d69

9 files changed

+117
-2
lines changed

app/backend/approaches/approach.py

Lines changed: 3 additions & 0 deletions
@@ -123,9 +123,12 @@ def __init__(
         self.vision_token_provider = vision_token_provider
 
     def build_filter(self, overrides: dict[str, Any], auth_claims: dict[str, Any]) -> Optional[str]:
+        include_category = overrides.get("include_category")
         exclude_category = overrides.get("exclude_category")
         security_filter = self.auth_helper.build_security_filters(overrides, auth_claims)
         filters = []
+        if include_category:
+            filters.append("category eq '{}'".format(include_category.replace("'", "''")))
         if exclude_category:
             filters.append("category ne '{}'".format(exclude_category.replace("'", "''")))
         if security_filter:
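
The hunk above stops before the method returns, so the following is only a minimal standalone sketch of how the include and exclude clauses might combine. It assumes the collected clauses are joined with `and` and that an empty list yields `None`; the free function `build_filter`, the helper `escape_odata`, and the `security_filter` parameter are hypothetical stand-ins for the method and `auth_helper` call shown in the diff.

```python
from typing import Any, Optional


def escape_odata(value: str) -> str:
    """Escape a value for an OData string literal by doubling single quotes."""
    return value.replace("'", "''")


def build_filter(overrides: dict[str, Any], security_filter: Optional[str] = None) -> Optional[str]:
    """Hypothetical standalone version of the filter logic in the diff."""
    include_category = overrides.get("include_category")
    exclude_category = overrides.get("exclude_category")
    filters = []
    if include_category:
        # Keep only documents whose category equals the selected one.
        filters.append("category eq '{}'".format(escape_odata(include_category)))
    if exclude_category:
        # Drop documents whose category equals the excluded one.
        filters.append("category ne '{}'".format(escape_odata(exclude_category)))
    if security_filter:
        filters.append(security_filter)
    # Assumption: clauses are AND-ed; no clauses means no filter at all.
    return " and ".join(filters) if filters else None


if __name__ == "__main__":
    print(build_filter({"include_category": "HR", "exclude_category": "Legal"}))
```

Because a missing or empty `include_category` adds no clause, the "All" option in the dropdown leaves the search unfiltered by category.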

app/frontend/src/api/models.ts

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ export type ChatAppRequestOverrides = {
     retrieval_mode?: RetrievalMode;
     semantic_ranker?: boolean;
     semantic_captions?: boolean;
+    include_category?: string;
     exclude_category?: string;
     seed?: number;
     top?: number;
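
For readers calling the backend from somewhere other than the React frontend, here is a rough Python sketch of how the optional `include_category` override could be populated. It mirrors the frontend convention in this commit, where an empty string means "All" and the field is left undefined (and therefore absent from the serialized JSON). The function `build_overrides` and its default for `top` are hypothetical and not part of the commit.

```python
import json
from typing import Any


def build_overrides(include_category: str = "", exclude_category: str = "", top: int = 3) -> dict[str, Any]:
    """Hypothetical helper: build the overrides object, omitting empty categories."""
    candidates: dict[str, Any] = {
        # An empty string stands for "All", so it maps to None and is dropped below.
        "include_category": include_category or None,
        "exclude_category": exclude_category or None,
        "top": top,
    }
    return {key: value for key, value in candidates.items() if value is not None}


if __name__ == "__main__":
    # Only the keys that carry a value end up in the serialized body.
    print(json.dumps({"context": {"overrides": build_overrides(include_category="HR")}}))
```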

app/frontend/src/locales/en/translation.json

Lines changed: 6 additions & 0 deletions
@@ -81,6 +81,10 @@
     "minimumSearchScore": "Minimum search score",
     "minimumRerankerScore": "Minimum reranker score",
     "retrieveCount": "Retrieve this many search results:",
+    "includeCategory": "Include category",
+    "includeCategoryOptions": {
+        "all": "All"
+    },
     "excludeCategory": "Exclude category",
     "useSemanticRanker": "Use semantic ranker for retrieval",
     "useSemanticCaptions": "Use semantic captions",
@@ -127,6 +131,8 @@
         "Sets a minimum score for search results coming back from the semantic reranker. The score always ranges between 0-4. The higher the score, the more semantically relevant the result is to the question.",
     "retrieveNumber":
         "Sets the number of search results to retrieve from Azure AI search. More results may increase the likelihood of finding the correct answer, but may lead to the model getting 'lost in the middle'.",
+    "includeCategory":
+        "Specifies a category to include in the search results. There are no categories used in the default data set.",
     "excludeCategory":
         "Specifies a category to exclude from the search results. There are no categories used in the default data set.",
     "useSemanticReranker":

app/frontend/src/locales/es/translation.json

Lines changed: 6 additions & 0 deletions
@@ -81,6 +81,10 @@
     "minimumSearchScore": "Puntaje mínimo de búsqueda",
     "minimumRerankerScore": "Puntaje mínimo de re-clasificación",
     "retrieveCount": "Obtén éste número resultados de búsqueda:",
+    "includeCategory": "Incluir categoría",
+    "includeCategoryOptions": {
+        "all": "Todos"
+    },
     "excludeCategory": "Excluir categoría",
     "useSemanticRanker": "Usar clasificador semántico para la recuperación",
     "useSemanticCaptions": "Usar subtítulos semánticos",
@@ -128,6 +132,8 @@
         "Establece una puntuación mínima para los resultados de búsqueda que vuelven del re-clasificador semántico. La puntuación siempre varía entre 0-4. Cuanto mayor es la puntuación, más relevante es semánticamente el resultado a la pregunta.",
     "retrieveNumber":
         "Establece el número de resultados de búsqueda para recuperar de Azure AI search. Más resultados pueden aumentar la probabilidad de encontrar la respuesta correcta, pero pueden provocar que el modelo se 'pierda en el medio'.",
+    "includeCategory":
+        "Especifica una categoría para incluir en los resultados de búsqueda. No se utilizan categorías en el conjunto de datos predeterminado.",
     "excludeCategory":
         "Especifica una categoría para excluir de los resultados de búsqueda. No se utilizan categorías en el conjunto de datos predeterminado.",
     "useSemanticReranker":

app/frontend/src/locales/fr/translation.json

Lines changed: 6 additions & 0 deletions
@@ -81,6 +81,10 @@
     "minimumSearchScore": "Score de recherche minimum",
     "minimumRerankerScore": "Score minimum du reclasseur sémantique",
     "retrieveCount": "Récupérer ce nombre de résultats de recherche :",
+    "includeCategory": "Inclure la catégorie",
+    "includeCategoryOptions": {
+        "all": "Tous"
+    },
     "excludeCategory": "Exclure la catégorie",
     "useSemanticRanker": "Utiliser le reclasseur sémantique",
     "useSemanticCaptions": "Utiliser les titres sémantiques",
@@ -128,6 +132,8 @@
         "Définit un score minimum pour les résultats de recherche provenant du reranker sémantique. Le score varie toujours entre 0 et 4. Plus le score est élevé, plus le résultat est sémantiquement pertinent par rapport à la question.",
     "retrieveNumber":
         "Définit le nombre de résultats de recherche à récupérer d'Azure AI Search. Plus de résultats peuvent augmenter la probabilité de trouver la bonne réponse, mais peuvent amener le modèle à se 'perdre au milieu'.",
+    "includeCategory":
+        "Spécifie une catégorie à inclure dans les résultats de recherche. Il n'y a aucune catégorie utilisée dans l'ensemble de données par défaut.",
     "excludeCategory":
         "Spécifie une catégorie à exclure des résultats de recherche. Il n'y a aucune catégorie utilisée dans l'ensemble de données par défaut.",
     "useSemanticReranker":

app/frontend/src/locales/ja/translation.json

Lines changed: 5 additions & 0 deletions
@@ -81,6 +81,10 @@
     "minimumSearchScore": "最小検索スコア",
     "minimumRerankerScore": "最小リランキング・スコア",
     "retrieveCount": "ここで指定する検索結果数を取得:",
+    "includeCategory": "カテゴリを指定",
+    "includeCategoryOptions": {
+        "all": "全て"
+    },
     "excludeCategory": "カテゴリを除外",
     "useSemanticRanker": "取得にセマンティック・ランカーを使用",
     "useSemanticCaptions": "セマンティック・キャプションを使用",
@@ -127,6 +131,7 @@
         "セマンティック・リランカーから返される検索結果の最小スコアを設定します。スコアの値は0から4の範囲で変更できます。スコアの値が大きいほど、質問に対する結果の意味的な関連性が高まります。",
     "retrieveNumber":
         "Azure AI Searchの検索結果から取得する数を設定します。結果が多ければ多いほど、正しい答えを見つける可能性は高まるかもしれませんが、モデルが「途中で迷子になる」可能性もあります。",
+    "includeCategory": "検索結果に含めるカテゴリを指定します。デフォルトのデータセットはカテゴリを使用していません。",
     "excludeCategory": "検索結果から除外するカテゴリを指定します。デフォルトのデータセットはカテゴリを使用していません。",
     "useSemanticReranker":
         "Azure AI Searchのセマンティック・ランカーを有効にします(ユーザーのクエリに対するセマンティック類似性に基づいて検索結果をリランク付けするモデル)。",

app/frontend/src/pages/ask/Ask.tsx

Lines changed: 40 additions & 1 deletion
@@ -1,7 +1,18 @@
 import { useContext, useEffect, useRef, useState } from "react";
 import { useTranslation } from "react-i18next";
 import { Helmet } from "react-helmet-async";
-import { Checkbox, Panel, DefaultButton, Spinner, TextField, ICheckboxProps, ITextFieldProps } from "@fluentui/react";
+import {
+    Checkbox,
+    Panel,
+    DefaultButton,
+    Spinner,
+    TextField,
+    ICheckboxProps,
+    ITextFieldProps,
+    Dropdown,
+    IDropdownOption,
+    IDropdownProps
+} from "@fluentui/react";
 import { useId } from "@fluentui/react-hooks";
 
 import styles from "./Ask.module.css";
@@ -38,6 +49,7 @@ export function Component(): JSX.Element {
     const [useSemanticCaptions, setUseSemanticCaptions] = useState<boolean>(false);
     const [useGPT4V, setUseGPT4V] = useState<boolean>(false);
     const [gpt4vInput, setGPT4VInput] = useState<GPT4VInput>(GPT4VInput.TextAndImages);
+    const [includeCategory, setIncludeCategory] = useState<string>("");
     const [excludeCategory, setExcludeCategory] = useState<string>("");
     const [question, setQuestion] = useState<string>("");
     const [vectorFieldList, setVectorFieldList] = useState<VectorFieldOptions[]>([VectorFieldOptions.Embedding, VectorFieldOptions.ImageEmbedding]);
@@ -120,6 +132,7 @@ export function Component(): JSX.Element {
     prompt_template: promptTemplate.length === 0 ? undefined : promptTemplate,
     prompt_template_prefix: promptTemplatePrefix.length === 0 ? undefined : promptTemplatePrefix,
     prompt_template_suffix: promptTemplateSuffix.length === 0 ? undefined : promptTemplateSuffix,
+    include_category: includeCategory.length === 0 ? undefined : includeCategory,
     exclude_category: excludeCategory.length === 0 ? undefined : excludeCategory,
     top: retrieveCount,
     temperature: temperature,
@@ -181,6 +194,10 @@ export function Component(): JSX.Element {
         setUseSemanticCaptions(!!checked);
     };
 
+    const onIncludeCategoryChanged = (_ev?: React.FormEvent<HTMLElement | HTMLInputElement>, option?: IDropdownOption) => {
+        setIncludeCategory((option?.key as string) || "");
+    };
+
     const onExcludeCategoryChanged = (_ev?: React.FormEvent, newValue?: string) => {
         setExcludeCategory(newValue || "");
     };
@@ -228,6 +245,8 @@ export function Component(): JSX.Element {
     const rerankerScoreFieldId = useId("rerankerScoreField");
     const retrieveCountId = useId("retrieveCount");
     const retrieveCountFieldId = useId("retrieveCountField");
+    const includeCategoryId = useId("includeCategory");
+    const includeCategoryFieldId = useId("includeCategoryField");
     const excludeCategoryId = useId("excludeCategory");
     const excludeCategoryFieldId = useId("excludeCategoryField");
     const semanticRankerId = useId("semanticRanker");
@@ -407,6 +426,26 @@ export function Component(): JSX.Element {
         )}
     />
 
+    <Dropdown
+        id={includeCategoryFieldId}
+        className={styles.chatSettingsSeparator}
+        label={t("labels.includeCategory")}
+        selectedKey={includeCategory}
+        onChange={onIncludeCategoryChanged}
+        aria-labelledby={includeCategoryId}
+        options={[
+            { key: '', text: t("labels.includeCategoryOptions.all") }
+        ]}
+        onRenderLabel={(props: IDropdownProps | undefined) => (
+            <HelpCallout
+                labelId={includeCategoryId}
+                fieldId={includeCategoryFieldId}
+                helpText={t("helpTexts.includeCategory")}
+                label={props?.label}
+            />
+        )}
+    />
+
     <TextField
         id={excludeCategoryFieldId}
         className={styles.chatSettingsSeparator}

app/frontend/src/pages/chat/Chat.tsx

Lines changed: 43 additions & 1 deletion
@@ -1,7 +1,17 @@
 import { useRef, useState, useEffect, useContext } from "react";
 import { useTranslation } from "react-i18next";
 import { Helmet } from "react-helmet-async";
-import { Checkbox, Panel, DefaultButton, TextField, ITextFieldProps, ICheckboxProps } from "@fluentui/react";
+import {
+    Checkbox,
+    Panel,
+    DefaultButton,
+    TextField,
+    ITextFieldProps,
+    ICheckboxProps,
+    Dropdown,
+    IDropdownOption,
+    IDropdownProps
+} from "@fluentui/react";
 import { SparkleFilled } from "@fluentui/react-icons";
 import { useId } from "@fluentui/react-hooks";
 import readNDJSONStream from "ndjson-readablestream";
@@ -53,6 +63,7 @@ const Chat = () => {
     const [useSemanticRanker, setUseSemanticRanker] = useState<boolean>(true);
     const [shouldStream, setShouldStream] = useState<boolean>(true);
     const [useSemanticCaptions, setUseSemanticCaptions] = useState<boolean>(false);
+    const [includeCategory, setIncludeCategory] = useState<string>("");
     const [excludeCategory, setExcludeCategory] = useState<string>("");
     const [useSuggestFollowupQuestions, setUseSuggestFollowupQuestions] = useState<boolean>(false);
     const [vectorFieldList, setVectorFieldList] = useState<VectorFieldOptions[]>([VectorFieldOptions.Embedding]);
@@ -184,6 +195,7 @@ const Chat = () => {
     context: {
         overrides: {
             prompt_template: promptTemplate.length === 0 ? undefined : promptTemplate,
+            include_category: includeCategory.length === 0 ? undefined : includeCategory,
             exclude_category: excludeCategory.length === 0 ? undefined : excludeCategory,
             top: retrieveCount,
             temperature: temperature,
@@ -291,6 +303,10 @@ const Chat = () => {
         setShouldStream(!!checked);
     };
 
+    const onIncludeCategoryChanged = (_ev?: React.FormEvent<HTMLElement | HTMLInputElement>, option?: IDropdownOption) => {
+        setIncludeCategory((option?.key as string) || "");
+    };
+
     const onExcludeCategoryChanged = (_ev?: React.FormEvent, newValue?: string) => {
         setExcludeCategory(newValue || "");
     };
@@ -345,6 +361,8 @@ const Chat = () => {
     const rerankerScoreFieldId = useId("rerankerScoreField");
     const retrieveCountId = useId("retrieveCount");
     const retrieveCountFieldId = useId("retrieveCountField");
+    const includeCategoryId = useId("includeCategory");
+    const includeCategoryFieldId = useId("includeCategoryField");
     const excludeCategoryId = useId("excludeCategory");
     const excludeCategoryFieldId = useId("excludeCategoryField");
     const semanticRankerId = useId("semanticRanker");
@@ -607,6 +625,30 @@ const Chat = () => {
         )}
     />
 
+    <Dropdown
+        id={includeCategoryFieldId}
+        className={styles.chatSettingsSeparator}
+        label={t("labels.includeCategory")}
+        selectedKey={includeCategory}
+        onChange={onIncludeCategoryChanged}
+        aria-labelledby={includeCategoryId}
+        options={[
+            { key: '', text: t("labels.includeCategoryOptions.all") },
+            // You can add a category key here for ingested data like below:
+            // { key: 'categoryName', text: 'Meaningful Category Name' }
+            // Alternatively, display the key to guide the user on what to type
+            // in the "Exclude category" field (e.g., 'Meaningful Category Name(categoryName)').
+        ]}
+        onRenderLabel={(props: IDropdownProps | undefined) => (
+            <HelpCallout
+                labelId={includeCategoryId}
+                fieldId={includeCategoryFieldId}
+                helpText={t("helpTexts.includeCategory")}
+                label={props?.label}
+            />
+        )}
+    />
+
     <TextField
         id={excludeCategoryFieldId}
         className={styles.chatSettingsSeparator}

docs/data_ingestion.md

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,7 @@ This guide provides more details for using the `prepdocs` script to index docume
 - [Supported document formats](#supported-document-formats)
 - [Overview of the manual indexing process](#overview-of-the-manual-indexing-process)
   - [Chunking](#chunking)
+  - [Categorizing data for enhanced search](#enhancing-search-functionality-with-data-categorization)
   - [Indexing additional documents](#indexing-additional-documents)
   - [Removing documents](#removing-documents)
 - [Overview of Integrated Vectorization](#overview-of-integrated-vectorization)
@@ -41,6 +42,12 @@ The script uses the following steps to index documents:
 3. Split the PDFs into chunks of text.
 4. Upload the chunks to Azure AI Search. If using vectors (the default), also compute the embeddings and upload those alongside the text.
 
+### Enhancing search functionality with data categorization
+
+To enhance search functionality, categorize data during ingestion with the `--category` argument, for example `scripts/prepdocs.ps1 --category ExampleCategoryName`. This argument specifies the category to which the data belongs, so you can filter search results by category.
+
+After running the script with the desired category, add that category to the "Include Category" dropdown list in the developer settings. The default option for this dropdown is "All". Selecting a specific category restricts search results to documents in that category.
+
 ### Chunking
 
 We're often asked why we need to break up the PDFs into chunks when Azure AI Search supports searching large documents.
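
To make the categorization section added above more concrete, here is a small in-memory Python sketch of the two steps it describes: listing ingested categories as dropdown options with "All" first, and narrowing results to the selected category. It only simulates the behaviour and does not call Azure AI Search; the sample chunks, the category names, and the functions `dropdown_options` and `filter_chunks` are all hypothetical.

```python
from typing import Any

# Hypothetical chunks, as if ingested by separate runs with different --category values.
CHUNKS: list[dict[str, Any]] = [
    {"id": "1", "category": "Benefits", "content": "Health plan overview"},
    {"id": "2", "category": "Policies", "content": "Remote work policy"},
    {"id": "3", "category": "Benefits", "content": "Retirement plan details"},
]


def dropdown_options(categories: list[str], all_label: str = "All") -> list[dict[str, str]]:
    """Return dropdown entries with the empty-key 'All' option first."""
    options = [{"key": "", "text": all_label}]
    options.extend({"key": name, "text": name} for name in sorted(set(categories)))
    return options


def filter_chunks(chunks: list[dict[str, Any]], include_category: str = "") -> list[dict[str, Any]]:
    """Simulate the include filter: an empty key means 'All', so nothing is filtered."""
    if not include_category:
        return list(chunks)
    return [chunk for chunk in chunks if chunk.get("category") == include_category]


if __name__ == "__main__":
    print(dropdown_options([chunk["category"] for chunk in CHUNKS]))
    print([chunk["id"] for chunk in filter_chunks(CHUNKS, "Benefits")])
```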
