diff --git a/Part 3 - Template Exploration/README.md b/Part 3 - Template Exploration/README.md
index 8973513..dc300b0 100644
--- a/Part 3 - Template Exploration/README.md
+++ b/Part 3 - Template Exploration/README.md
@@ -213,6 +213,231 @@ Key points about `IChatClient`:
 1. It supports both one-off responses and conversation history
 1. It enables function calling and other advanced features
 
+## How Function Invocation Connects the LLM to the Vector Database
+
+One of the most powerful features of the AI Web Chat template is how it uses **function invocation** to let the large language model (LLM) search the vector database when needed. This creates a Retrieval Augmented Generation (RAG) system in which the AI can access your custom data.
+
+### Function Invocation Architecture
+
+Here's how the LLM decides when to search the vector database and retrieve relevant information:
+
+```mermaid
+%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#f4f4f4', 'primaryTextColor': '#000', 'primaryBorderColor': '#333', 'lineColor': '#333', 'secondaryColor': '#e1f5fe', 'tertiaryColor': '#f3e5f5' }}}%%
+sequenceDiagram
+    participant User
+    participant Chat as Chat.razor
+    participant LLM as IChatClient<br/>(LLM with Function Calling)
+    participant Search as SearchAsync Function
+    participant VDB as Vector Database<br/>(Qdrant)
+
+    User->>Chat: "What are the GPS watch features?"
+
+    Chat->>LLM: Send message + available tools<br/>(SearchAsync function)
+
+    Note over LLM: LLM analyzes the question<br/>and decides it needs to<br/>search for information
+
+    LLM->>Chat: Function call request<br/>SearchAsync("GPS watch features")
+
+    Chat->>Search: Execute SearchAsync<br/>searchPhrase: "GPS watch features"
+
+    Search->>VDB: Vector similarity search<br/>for "GPS watch features"
+
+    VDB-->>Search: Top 5 relevant chunks<br/>from ingested documents
+
+    Search-->>Chat: Formatted results in XML format
+
+    Chat->>LLM: Function result (search results)
+
+    Note over LLM: LLM uses search results<br/>to generate accurate answer<br/>with citations
+
+    LLM-->>Chat: Final response with citations
+
+    Chat-->>User: Answer with citations in XML format
+```
+
+This sequence diagram shows the complete flow from user question to AI response, highlighting how the LLM autonomously decides to call the search function.
+
+**Note on data format:** The search results are returned in XML format like `<result filename="string" page_number="number">text</result>`, and the LLM includes citations as `<citation filename='string' page_number='number'>quote</citation>`.
+
+### Enabling Function Invocation in Program.cs
+
+Function invocation is enabled when configuring the chat client:
+
+```csharp
+// From Program.cs
+var openai = builder.AddAzureOpenAIClient("openai");
+openai.AddChatClient("gpt-4o-mini")
+    .UseFunctionInvocation() // This enables the LLM to call functions
+    .UseOpenTelemetry(configure: c =>
+        c.EnableSensitiveData = builder.Environment.IsDevelopment());
+```
+
+The `.UseFunctionInvocation()` middleware:
+
+1. **Intercepts function call requests** from the LLM
+2. **Executes the requested function** with the parameters the LLM provides
+3. **Returns the function results** back to the LLM
+4. **Continues the conversation** so the LLM can use the results
+
+Without this middleware, your application would only receive function call *requests* from the LLM; nothing would execute them or return the results.
+
+### Registering the Search Function in Chat.razor
+
+The search function is registered as a tool that the LLM can use:
+
+```csharp
+// From Chat.razor OnInitialized
+protected override void OnInitialized()
+{
+    statefulMessageCount = 0;
+    messages.Add(new(ChatRole.System, SystemPrompt));
+    chatOptions.Tools = [AIFunctionFactory.Create(SearchAsync)];
+}
+```
+
+**What happens here:**
+
+1. `AIFunctionFactory.Create(SearchAsync)` analyzes the `SearchAsync` method
+2. It reads the `[Description]` attributes to understand what the function does
+3. It creates a tool definition that gets sent to the LLM with each request
+4. The LLM sees the tool definition and knows it can call this function when needed
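+
+As an aside, you can inspect the generated tool definition yourself. The following is a minimal, hypothetical sketch (not part of the template) that builds an `AIFunction` from a stand-in lambda with `AIFunctionFactory.Create` and prints the JSON schema that would be sent to the LLM; the `"Search"` name, the lambda body, and the console-app framing are assumptions for illustration:
+
+```csharp
+// Hypothetical console sketch using the Microsoft.Extensions.AI abstractions.
+using System.ComponentModel;
+using Microsoft.Extensions.AI;
+
+// Create a tool from a stand-in lambda instead of the template's SearchAsync.
+AIFunction tool = AIFunctionFactory.Create(
+    [Description("Searches for information using a phrase or keyword")]
+    ([Description("The phrase to search for.")] string searchPhrase) =>
+        $"<result>placeholder results for '{searchPhrase}'</result>",
+    name: "Search");
+
+Console.WriteLine(tool.Name);       // Function name exposed to the LLM
+Console.WriteLine(tool.JsonSchema); // Parameter schema the LLM receives
+```
+
+Attaching such an `AIFunction` to `ChatOptions.Tools` is exactly what the `OnInitialized` code above does with the real `SearchAsync` method.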
+
+The tool definition sent to the LLM looks conceptually like this:
+
+```json
+{
+  "name": "SearchAsync",
+  "description": "Searches for information using a phrase or keyword",
+  "parameters": {
+    "searchPhrase": {
+      "type": "string",
+      "description": "The phrase to search for."
+    },
+    "filenameFilter": {
+      "type": "string",
+      "description": "If possible, specify the filename to search that file only. If not provided or empty, the search includes all files.",
+      "required": false
+    }
+  }
+}
+```
+
+### The SearchAsync Function Implementation
+
+Here's how the search function is implemented:
+
+```csharp
+[Description("Searches for information using a phrase or keyword")]
+private async Task<IEnumerable<string>> SearchAsync(
+    [Description("The phrase to search for.")] string searchPhrase,
+    [Description("If possible, specify the filename to search that file only. If not provided or empty, the search includes all files.")] string? filenameFilter = null)
+{
+    await InvokeAsync(StateHasChanged); // Update UI to show function is being called
+    var results = await Search.SearchAsync(searchPhrase, filenameFilter, maxResults: 5);
+    return results.Select(result =>
+        $"<result filename=\"{result.DocumentId}\" page_number=\"{result.PageNumber}\">{result.Text}</result>");
+}
+```
+
+**Key aspects of this function:**
+
+1. **Description Attributes**: The `[Description]` attributes provide context to the LLM about when and how to use this function
+2. **Parameters**: The LLM can extract search phrases from the user's question and pass them as parameters
+3. **Vector Search**: Calls `SemanticSearch.SearchAsync()` to find the most relevant chunks from the vector database
+4. **Formatted Results**: Returns results in a structured XML format that the LLM can parse and use
+5. **UI Update**: Calls `StateHasChanged()` so users can see when a search is happening
+
+### How the LLM Decides to Call the Function
+
+The LLM uses several factors to decide when to call the search function:
+
+1. **System Prompt**: The system prompt tells the LLM to "Use the search tool to find relevant information"
+2. **Function Description**: The description "Searches for information using a phrase or keyword" helps the LLM understand the function's purpose
+3. **User Question**: When a user asks a question that requires specific information, the LLM recognizes it needs to search
+4. **Context**: If the conversation history doesn't contain the needed information, the LLM will search
+
+**Example decision process:**
+
+- User asks: "What are the GPS watch features?"
+- LLM thinks: "I need specific information about GPS watch features"
+- LLM sees: SearchAsync tool is available for finding information
+- LLM decides: Call `SearchAsync(searchPhrase: "GPS watch features")`
+- LLM receives: Relevant text chunks about GPS watches
+- LLM generates: Answer based on the search results with citations
+
+### The Complete RAG Flow
+
+Here's what happens in a complete Retrieval Augmented Generation (RAG) interaction:
+
+```mermaid
+%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#f4f4f4', 'primaryTextColor': '#000', 'primaryBorderColor': '#333', 'lineColor': '#333', 'secondaryColor': '#e1f5fe', 'tertiaryColor': '#f3e5f5' }}}%%
+flowchart TB
+    Start([User Asks Question]) --> AddMsg[Add user message<br/>to conversation]
+
+    AddMsg --> Stream[Call ChatClient.<br/>GetStreamingResponseAsync]
+
+    Stream --> LLMAnalyze{LLM Analyzes:<br/>Do I need more info?}
+
+    LLMAnalyze -->|Yes, need to search| FuncCall[LLM returns<br/>function call request]
+    LLMAnalyze -->|No, can answer directly| DirectAnswer[LLM generates<br/>direct response]
+
+    FuncCall --> Middleware[UseFunctionInvocation<br/>middleware intercepts]
+
+    Middleware --> Execute[Execute SearchAsync<br/>with LLM's parameters]
+
+    Execute --> VectorSearch[Vector similarity search<br/>in Qdrant]
+
+    VectorSearch --> Results[Return top 5<br/>matching chunks]
+
+    Results --> BackToLLM[Send results<br/>back to LLM]
+
+    BackToLLM --> LLMGenerate[LLM generates answer<br/>using search results]
+
+    LLMGenerate --> AddCitations[LLM adds citations<br/>in XML format]
+
+    AddCitations --> StreamToUser[Stream response<br/>to user]
+
+    DirectAnswer --> StreamToUser
+
+    StreamToUser --> End([User sees answer<br/>with citations])
+
+    style Start fill:#e8f5e8
+    style LLMAnalyze fill:#fff4e6
+    style FuncCall fill:#f9d5e5
+    style Middleware fill:#e1f5fe
+    style VectorSearch fill:#e1f5fe
+    style LLMGenerate fill:#d5e8d4
+    style End fill:#e8f5e8
+```
+
+**Step-by-step breakdown:**
+
+1. **User Question**: User submits a question through the chat interface
+2. **Message Added**: The question is added to the conversation history
+3. **LLM Analysis**: The LLM analyzes whether it needs additional information
+4. **Function Call Decision**:
+   - If needed: LLM requests to call SearchAsync with specific parameters
+   - If not needed: LLM generates a direct response
+5. **Middleware Interception**: `UseFunctionInvocation()` middleware catches the function call request
+6. **Function Execution**: `SearchAsync` is executed with the LLM's chosen parameters
+7. **Vector Search**: The search service queries Qdrant for semantically similar content
+8. **Results Return**: Top 5 matching chunks are formatted and returned to the middleware
+9. **Back to LLM**: The function results are sent back to the LLM as part of the conversation
+10. **Answer Generation**: The LLM uses the search results to generate an accurate, grounded response
+11. **Citations Added**: The LLM includes citations in the specified XML format
+12. **Streaming Response**: The complete answer is streamed back to the user
+
+### Why This Approach is Powerful
+
+This function invocation pattern provides several benefits:
+
+1. **Autonomous Decision Making**: The LLM decides when it needs to search, not hardcoded logic
+2. **Dynamic Parameter Selection**: The LLM extracts the right search terms from the user's question
+3. **Grounded Responses**: Answers are based on actual data from your documents, not the LLM's training data
+4. **Transparent Citations**: Users can see exactly which documents and pages were used
+5. **Flexible RAG**: The same pattern can be extended with additional functions (e.g., calculator, weather, database queries)
+6. **Provider Agnostic**: Works with any LLM that supports function calling (OpenAI, Azure OpenAI, etc.)
+
 ## Microsoft Extensions for Vector Data with Vector Collections
 
 The template uses Microsoft Extensions for Vector Data to implement document ingestion and semantic search. Instead of using a separate database for tracking ingested documents, everything is stored directly in vector collections.
@@ -533,6 +758,10 @@ The semantic search process:
 - How services are configured and orchestrated in .NET Aspire
 - How the main application is structured and configured
 - How `IChatClient` is set up and used for interacting with AI models
+- **How function invocation enables the LLM to autonomously search the vector database**
+- **How `.UseFunctionInvocation()` middleware processes function calls**
+- **How `AIFunctionFactory.Create()` registers functions as tools for the LLM**
+- **How the complete RAG (Retrieval Augmented Generation) flow works from question to cited answer**
 - How vector collections are used to store both document chunks and metadata
 - How Microsoft Extensions for Vector Data simplifies document ingestion with vector-native storage
 - How the simplified architecture eliminates the need for separate ingestion cache databases