feat: Enhance n8dex with multi-LLM, local search, and UI improvements #51

Open · wants to merge 3 commits into `main`
96 changes: 83 additions & 13 deletions README.md
@@ -8,12 +8,20 @@ This project demonstrates a fullstack application using a React frontend and a LangGraph backend.

- 💬 Fullstack application with a React frontend and LangGraph backend.
- 🧠 Powered by a LangGraph agent for advanced research and conversational AI.
- 💡 **Multi-LLM Support:** Flexibility to use different LLM providers (Gemini, OpenRouter, DeepSeek).
- 🔍 Dynamic search query generation using the configured LLM.
- 🌐 Integrated web research via Google Search API.
- 🏠 **Local Network Search:** Optional capability to search within configured local domains.
- 🔄 **Flexible Search Modes:** Control whether to search internet, local network, or both, and in which order.
- 🤔 Reflective reasoning to identify knowledge gaps and refine searches.
- 📄 Generates answers with citations from gathered sources.
- 🎨 **Updated UI Theme:** Modern, light theme for improved readability and a professional look.
- 🛠️ **Configurable Tracing:** LangSmith tracing can be enabled/disabled.
- 🔄 Hot-reloading for both frontend and backend during development.

### Upcoming Features
- Dedicated "Finance" and "HR" sections for specialized research tasks.

## Project Structure

The project is divided into two main directories:
@@ -29,10 +37,7 @@ Follow these steps to get the application running locally for development and testing.

- Node.js and npm (or yarn/pnpm)
- Python 3.8+
- **API Keys & Configuration:** The backend agent requires API keys depending on the chosen LLM provider and other features. See the "Configuration" section below for details on setting up your `.env` file in the `backend/` directory.

**2. Install Dependencies:**

@@ -42,6 +47,11 @@ Follow these steps to get the application running locally for development and testing.
cd backend
pip install .
```
*Note: If you plan to use the Local Network Search feature, ensure you install its dependencies:*
```bash
pip install ".[local_search]"
```
*(Or `pip install requests beautifulsoup4` if you manage dependencies manually)*

**Frontend:**

@@ -57,21 +67,76 @@ npm install
```bash
make dev
```
This will run the backend and frontend development servers. Open your browser and navigate to the frontend development server URL (e.g., `http://localhost:5173/app`).

_Alternatively, you can run the backend and frontend development servers separately. For the backend, open a terminal in the `backend/` directory and run `langgraph dev`. The backend API will be available at `http://127.0.0.1:2024`. It will also open a browser window to the LangGraph UI. For the frontend, open a terminal in the `frontend/` directory and run `npm run dev`. The frontend will be available at `http://localhost:5173`._

## Configuration

Create a `.env` file in the `backend/` directory by copying `backend/.env.example`. Below are the available environment variables:

### Core Agent & LLM Configuration
- `GEMINI_API_KEY`: Your Google Gemini API key. Required if using "gemini" as the LLM provider for any task or for Google Search functionality.
- `LLM_PROVIDER`: Specifies the primary LLM provider for core agent tasks (query generation, reflection, answer synthesis).
- Options: `"gemini"`, `"openrouter"`, `"deepseek"`.
- Default: `"gemini"`.
- `LLM_API_KEY`: The API key for the selected `LLM_PROVIDER`.
- Example: If `LLM_PROVIDER="openrouter"`, this should be your OpenRouter API key.
- `OPENROUTER_MODEL_NAME`: Specify the full model string if using OpenRouter (e.g., `"anthropic/claude-3-haiku"`). This can be used by the agent if specific task models are not set.
- `DEEPSEEK_MODEL_NAME`: Specify the model name if using DeepSeek (e.g., `"deepseek-chat"`). This can be used by the agent if specific task models are not set.
- `QUERY_GENERATOR_MODEL`: Model used for generating search queries. Interpreted based on `LLM_PROVIDER`.
- Default for Gemini: `"gemini-1.5-flash"`
- `REFLECTION_MODEL`: Model used for reflection and knowledge gap analysis. Interpreted based on `LLM_PROVIDER`.
- Default for Gemini: `"gemini-1.5-flash"`
- `ANSWER_MODEL`: Model used for synthesizing the final answer. Interpreted based on `LLM_PROVIDER`.
- Default for Gemini: `"gemini-1.5-pro"`
- `NUMBER_OF_INITIAL_QUERIES`: Number of initial search queries to generate. Default: `3`.
- `MAX_RESEARCH_LOOPS`: Maximum number of research refinement loops. Default: `2`.
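
For example, a minimal `.env` for the default Gemini setup might look like the following (all values are placeholders):

```bash
# Placeholder values; substitute your own keys and preferred models
LLM_PROVIDER="gemini"
GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
# For OpenRouter instead, you would set:
# LLM_PROVIDER="openrouter"
# LLM_API_KEY="YOUR_OPENROUTER_API_KEY"
# OPENROUTER_MODEL_NAME="anthropic/claude-3-haiku"
QUERY_GENERATOR_MODEL="gemini-1.5-flash"
REFLECTION_MODEL="gemini-1.5-flash"
ANSWER_MODEL="gemini-1.5-pro"
NUMBER_OF_INITIAL_QUERIES=3
MAX_RESEARCH_LOOPS=2
```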

### LangSmith Tracing
- `LANGSMITH_ENABLED`: Master switch to enable (`true`) or disable (`false`) LangSmith tracing for the backend. Default: `true`.
- If `true`, various LangSmith environment variables below should also be set.
- If `false`, tracing is globally disabled for the application process, and the UI toggle cannot override this.
- `LANGCHAIN_API_KEY`: Your LangSmith API key. Required if `LANGSMITH_ENABLED` is true.
- `LANGCHAIN_TRACING_V2`: Set to `"true"` to use the V2 tracing protocol. Usually managed by the `LANGSMITH_ENABLED` setting.
- `LANGCHAIN_ENDPOINT`: LangSmith API endpoint. Defaults to `"https://api.smith.langchain.com"`.
- `LANGCHAIN_PROJECT`: Name of the project in LangSmith.
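
A typical tracing-enabled fragment of the `.env` might look like this (the project name is illustrative):

```bash
LANGSMITH_ENABLED=true
LANGCHAIN_API_KEY="YOUR_LANGSMITH_API_KEY"
LANGCHAIN_TRACING_V2="true"
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_PROJECT="my-research-agent"  # illustrative project name
```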

### Local Network Search
- `ENABLE_LOCAL_SEARCH`: Set to `true` to enable searching within local network domains. Default: `false`.
- `LOCAL_SEARCH_DOMAINS`: A comma-separated list of base URLs or domains for local search.
- Example: `"http://intranet.mycompany.com,http://docs.internal.team"`
- `SEARCH_MODE`: Defines the search behavior when both internet and local search capabilities might be active.
- `"internet_only"` (Default): Searches only the public internet.
* `"local_only"`: Searches only configured local domains (requires `ENABLE_LOCAL_SEARCH=true` and `LOCAL_SEARCH_DOMAINS` to be set).
* `"internet_then_local"`: Performs internet search first, then local search if enabled.
* `"local_then_internet"`: Performs local search first if enabled, then internet search.

## Frontend UI Settings

The user interface provides several controls to customize the agent's behavior for each query:

- **Effort Level:** (Low, Medium, High) - Adjusts the number of initial queries and maximum research loops.
- **Reasoning Model:** (Flash/Fast, Pro/Advanced) - Selects a class of model for reasoning tasks (reflection, answer synthesis). The actual model used depends on the selected LLM Provider.
- **LLM Provider:** (Gemini, OpenRouter, DeepSeek) - Choose the primary LLM provider for the current query. Requires corresponding API keys to be configured on the backend.
- **LangSmith Monitoring:** (Toggle Switch) - If LangSmith is enabled globally on the backend, this allows users to toggle tracing for their specific session/query.
- **Search Scope:** (Internet Only, Local Only, Internet then Local, Local then Internet) - Defines where the agent should search for information. "Local" options require backend configuration for local search.
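
These selections are sent to the backend per request and surface as `configurable` keys on the LangGraph `RunnableConfig`, which `Configuration.from_runnable_config` reads (see `backend/src/agent/configuration.py`). A hedged sketch, assuming the keys mirror the configuration field names:

```python
# Illustrative only: the exact key names the frontend sends are assumptions
# based on the field names in backend/src/agent/configuration.py.
config = {
    "configurable": {
        "llm_provider": "openrouter",
        "search_mode": "local_then_internet",
        "langsmith_enabled": False,
    }
}
# agent_graph.invoke({"messages": [...]}, config=config)  # hypothetical call
```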

## How the Backend Agent Works (High-Level)

The core of the backend is a LangGraph agent defined in `backend/src/agent/graph.py`. It follows these steps:

![Agent Flow](./agent.png)

1. **Configure:** Reads settings from environment variables and per-request UI selections.
2. **Generate Initial Queries:** Based on your input and configured model, it generates initial search queries.
3. **Web/Local Research:** Depending on the `SEARCH_MODE`:
   - Performs searches using the Google Search API (for internet results).
   - Performs searches using the custom `LocalSearchTool` against configured domains (for local results).
   - Combines results if applicable.
4. **Reflection & Knowledge Gap Analysis:** The agent analyzes the search results to determine if the information is sufficient or if there are knowledge gaps.
5. **Iterative Refinement:** If gaps are found, it generates follow-up queries and repeats the research and reflection steps.
6. **Finalize Answer:** Once research is sufficient, the agent synthesizes the information into a coherent answer with citations, using the configured answer model.
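
For orientation, here is a minimal sketch of how such a loop might be wired in LangGraph. The state shape, node names, and placeholder logic are assumptions for illustration, not the actual contents of `backend/src/agent/graph.py`:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END


class AgentState(TypedDict, total=False):
    question: str
    queries: list[str]
    results: list[str]
    loops: int
    sufficient: bool
    answer: str


def generate_queries(state: AgentState) -> dict:
    # Would call the configured query-generator model here.
    return {"queries": [state["question"]], "loops": 0}


def research(state: AgentState) -> dict:
    # Would dispatch to the Google Search API and/or LocalSearchTool
    # according to SEARCH_MODE, then merge the results.
    return {"results": state.get("results", []) + ["<search result>"]}


def reflect(state: AgentState) -> dict:
    # Would ask the reflection model whether knowledge gaps remain;
    # here a simple loop counter stands in for that judgment.
    return {"sufficient": state["loops"] + 1 >= 2, "loops": state["loops"] + 1}


def finalize(state: AgentState) -> dict:
    # Would synthesize the cited answer with the configured answer model.
    return {"answer": "<final answer with citations>"}


builder = StateGraph(AgentState)
builder.add_node("generate_queries", generate_queries)
builder.add_node("research", research)
builder.add_node("reflect", reflect)
builder.add_node("finalize", finalize)
builder.add_edge(START, "generate_queries")
builder.add_edge("generate_queries", "research")
builder.add_edge("research", "reflect")
builder.add_conditional_edges(
    "reflect", lambda s: "finalize" if s["sufficient"] else "research"
)
builder.add_edge("finalize", END)
graph = builder.compile()
```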

## Deployment

@@ -89,8 +154,12 @@ _Note: If you are not running the docker-compose.yml example or exposing the backend..._
```
**2. Run the Production Server:**

Adjust the `docker-compose.yml` or your deployment environment to include all necessary environment variables as described in the "Configuration" section.
Example:
```bash
# Ensure your .env file (if used by docker-compose) or environment variables are set
# e.g., GEMINI_API_KEY, LLM_PROVIDER, LLM_API_KEY, LANGSMITH_API_KEY (if LangSmith enabled), etc.
docker-compose up
```

Open your browser and navigate to `http://localhost:8123/app/` to see the application. The API will be available at `http://localhost:8123`.
@@ -101,7 +170,8 @@ Open your browser and navigate to `http://localhost:8123/app/` to see the application.
- [Tailwind CSS](https://tailwindcss.com/) - For styling.
- [Shadcn UI](https://ui.shadcn.com/) - For components.
- [LangGraph](https://github.com/langchain-ai/langgraph) - For building the backend research agent.
- LLMs: [Google Gemini](https://ai.google.dev/models/gemini), with support for other providers such as [OpenRouter](https://openrouter.ai/) and [DeepSeek](https://www.deepseek.com/).
- Search: Google Search API, Custom Local Network Search (Python `requests` & `BeautifulSoup`).

## License

2 changes: 2 additions & 0 deletions backend/pyproject.toml
@@ -18,6 +18,8 @@ dependencies = [
"langgraph-api",
"fastapi",
"google-genai",
"requests>=2.25.0,<3.0.0",
"beautifulsoup4>=4.9.0,<5.0.0",
]


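The `requests` and `beautifulsoup4` additions back the Local Network Search feature. As a rough illustration of how a `LocalSearchTool` could be built on them, here is a minimal sketch; the actual class in this PR may differ, and all names and signatures below are assumptions:

```python
import requests
from bs4 import BeautifulSoup


class LocalSearchTool:
    """Hypothetical sketch: fetch configured intranet pages, rank by keyword hits."""

    def __init__(self, domains: list[str], timeout: float = 5.0):
        self.domains = domains
        self.timeout = timeout

    def search(self, query: str) -> list[dict]:
        terms = [t.lower() for t in query.split()]
        results = []
        for base_url in self.domains:
            try:
                resp = requests.get(base_url, timeout=self.timeout)
                resp.raise_for_status()
            except requests.RequestException:
                continue  # skip unreachable domains
            soup = BeautifulSoup(resp.text, "html.parser")
            text = soup.get_text(separator=" ", strip=True)
            score = sum(text.lower().count(t) for t in terms)
            if score > 0:
                results.append({
                    "url": base_url,
                    "title": soup.title.string if soup.title else base_url,
                    "snippet": text[:300],
                    "score": score,
                })
        return sorted(results, key=lambda r: r["score"], reverse=True)
```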
120 changes: 107 additions & 13 deletions backend/src/agent/configuration.py
Original file line number Diff line number Diff line change
@@ -1,31 +1,59 @@
import os
from pydantic import BaseModel, Field, field_validator
from typing import Any, Optional, List

from langchain_core.runnables import RunnableConfig


class Configuration(BaseModel):
"""The configuration for the agent."""

llm_provider: str = Field(
default="gemini",
metadata={
"description": "The LLM provider to use (e.g., 'gemini', 'openrouter', 'deepseek'). Environment variable: LLM_PROVIDER"
},
)

llm_api_key: Optional[str] = Field(
default=None,
metadata={
"description": "The API key for the selected LLM provider. Environment variable: LLM_API_KEY"
},
)

openrouter_model_name: Optional[str] = Field(
default=None,
metadata={
"description": "The specific OpenRouter model string (e.g., 'anthropic/claude-3-haiku'). Environment variable: OPENROUTER_MODEL_NAME"
},
)

deepseek_model_name: Optional[str] = Field(
default=None,
metadata={
"description": "The specific DeepSeek model (e.g., 'deepseek-chat'). Environment variable: DEEPSEEK_MODEL_NAME"
},
)

query_generator_model: str = Field(
default="gemini-2.0-flash",
default="gemini-1.5-flash",
metadata={
"description": "The name of the language model to use for the agent's query generation."
"description": "The name of the language model to use for the agent's query generation. Interpreted based on llm_provider (e.g., 'gemini-1.5-flash' for Gemini, part of model string for OpenRouter). Environment variable: QUERY_GENERATOR_MODEL"
},
)

reflection_model: str = Field(
default="gemini-2.5-flash-preview-04-17",
default="gemini-1.5-flash",
metadata={
"description": "The name of the language model to use for the agent's reflection."
"description": "The name of the language model to use for the agent's reflection. Interpreted based on llm_provider. Environment variable: REFLECTION_MODEL"
},
)

answer_model: str = Field(
default="gemini-2.5-pro-preview-05-06",
default="gemini-1.5-pro",
metadata={
"description": "The name of the language model to use for the agent's answer."
"description": "The name of the language model to use for the agent's answer. Interpreted based on llm_provider. Environment variable: ANSWER_MODEL"
},
)

@@ -39,6 +67,44 @@ class Configuration(BaseModel):
metadata={"description": "The maximum number of research loops to perform."},
)

langsmith_enabled: bool = Field(
default=True,
metadata={
"description": "Controls LangSmith tracing. Set to false to disable. If true, ensure LANGCHAIN_API_KEY and other relevant LangSmith environment variables (LANGCHAIN_TRACING_V2, LANGCHAIN_ENDPOINT, LANGCHAIN_PROJECT) are set. Environment variable: LANGSMITH_ENABLED"
},
)

enable_local_search: bool = Field(
default=False,
metadata={
"description": "Enable or disable local network search functionality. Environment variable: ENABLE_LOCAL_SEARCH"
},
)

local_search_domains: List[str] = Field(
default_factory=list, # Use default_factory for mutable types like list
metadata={
"description": "Comma-separated list of base URLs or domains for local network search (e.g., 'http://intranet.mycompany.com,http://docs.internal'). Environment variable: LOCAL_SEARCH_DOMAINS"
},
)

search_mode: str = Field(
default="internet_only",
metadata={
"description": "Search behavior: 'internet_only', 'local_only', 'internet_then_local', 'local_then_internet'. Environment variable: SEARCH_MODE"
},
)

@validator("local_search_domains", pre=True, always=True)
def parse_local_search_domains(cls, v: Any) -> List[str]:
if isinstance(v, str):
if not v: # Handle empty string case
return []
return [domain.strip() for domain in v.split(',')]
if v is None: # Handle None if default_factory is not triggered early enough by env var
return []
return v # Already a list or handled by Pydantic

@classmethod
def from_runnable_config(
cls, config: Optional[RunnableConfig] = None
@@ -48,13 +114,41 @@ def from_runnable_config(
config["configurable"] if config and "configurable" in config else {}
)

        # Resolve each field preferentially from the environment (using the
        # convention FIELD_NAME.upper()), then from the runnable config, then
        # from the Field default, and finally from `default_value`.
        def get_value(field_name: str, default_value: Any = None) -> Any:
            value = os.environ.get(field_name.upper(), configurable.get(field_name))
            if value is None:
                field_info = cls.model_fields.get(field_name)
                if field_info is not None and field_info.default is not None:
                    return field_info.default
                return default_value
            return value

raw_values: dict[str, Any] = {
name: get_value(name, cls.model_fields[name].default)
for name in cls.model_fields.keys()
}

        # Pass along only values that were actually resolved. Pydantic applies
        # Field defaults for anything omitted, accepts None for Optional
        # fields, and raises for required fields left unset, which is the
        # desired behavior.

        values_to_pass = {}
        for name in cls.model_fields:
            val = raw_values.get(name)
            if val is not None:
                values_to_pass[name] = val

return cls(**values_to_pass)
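
For reference, a hedged example of how this configuration might be instantiated, with per-request overrides supplied through the `configurable` mapping (the import path is assumed):

```python
from agent.configuration import Configuration  # assumed module path

# Environment variables win, then "configurable" values, then Field defaults.
config = Configuration.from_runnable_config(
    {"configurable": {"llm_provider": "deepseek", "search_mode": "local_only"}}
)
print(config.llm_provider, config.search_mode)
```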