| 1 | +# OpenDeepResearcher |
| 2 | + |
| 3 | +This project is based on the [OpenDeepResearcher](https://github.com/mshumer/OpenDeepResearcher) repository. It implements an AI researcher that keeps searching for information on a user query until the system is confident that it has gathered all the necessary details. The user interface is built with [Reflex](https://reflex.dev/). The app relies on the following services: |
| 4 | + |
| 5 | +### Services Used: |
| 6 | +- **SERPAPI**: To perform Google searches. |
| 7 | +- **Jina**: To fetch and extract webpage content. |
| 8 | +- **Google Gemini**: To interact with an LLM for generating search queries, evaluating page relevance, and extracting context. |
| 9 | + |
| 10 | +### Features: |
| 11 | +- **Iterative Research Loop**: The system refines its search queries iteratively until no further queries are required. |
| 12 | +- **Asynchronous Processing**: Searches, webpage fetching, evaluation, and context extraction are performed concurrently to improve speed. |
| 13 | +- **Duplicate Filtering**: Aggregates and deduplicates links within each round, ensuring that the same link isn’t processed twice. |
| 14 | +- **LLM-Powered Decision Making**: Uses Google Gemini to generate new search queries, decide on page usefulness, extract relevant context, and produce a final comprehensive report. |
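
The duplicate filtering described above can be illustrated with an order-preserving pass over the links returned by all queries in one round. This is only a minimal sketch, not the project's actual code: the function name `dedupe_links` and the sample URLs are hypothetical.

```python
from typing import Iterable, List, Set


def dedupe_links(results_per_query: Iterable[Iterable[str]]) -> List[str]:
    """Flatten per-query link lists and drop repeats, keeping first-seen order."""
    seen: Set[str] = set()
    unique: List[str] = []
    for links in results_per_query:
        for link in links:
            if link not in seen:
                seen.add(link)
                unique.append(link)
    return unique


# Hypothetical results from two search queries in the same round.
round_results = [
    ["https://example.com/a", "https://example.com/b"],
    ["https://example.com/b", "https://example.com/c"],
]
print(dedupe_links(round_results))
```

Keeping first-seen order means links from earlier queries are processed first, which makes runs easier to compare.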
| 15 | + |
| 16 | +### Requirements: |
| 17 | +API access and keys for: |
| 18 | +- Google Gemini API |
| 19 | +- SERPAPI API |
| 20 | +- Jina API |
| 21 | + |
| 22 | +### Setup: |
| 23 | + |
| 24 | +1. **Clone or Open the Notebook**: |
| 25 | + - Download the notebook file or open it directly in Google Colab. |
| 26 | + |
| 27 | +2. **Install nest_asyncio**: |
| 28 | + - Run the first cell to set up nest_asyncio. |
| 29 | + |
| 30 | +3. **Configure API Keys**: |
| 31 | + - Replace the placeholder values in the notebook for `GOOGLE_GEMINI_API_KEY`, `SERPAPI_API_KEY`, and `JINA_API_KEY` with your actual API keys. |
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +### Getting Started |
| 36 | + |
| 37 | +1. **Clone the Repository** |
| 38 | + Clone the GitHub repository to your local machine: |
| 39 | + ```bash |
| 40 | + git clone https://github.com/reflex-dev/reflex-llm-examples.git |
| 41 | + cd reflex-llm-examples/open_deep_researcher |
| 42 | + ``` |
| 43 | + |
| 44 | +2. **Install Dependencies** |
| 45 | + Install the required dependencies: |
| 46 | + ```bash |
| 47 | + pip install -r requirements.txt |
| 48 | + ``` |
| 49 | + |
| 50 | +3. **Set Up API Keys** |
| 51 | + To use the Gemini 2.0 Flash model, SERPAPI, and Jina, you need an API key for each service. Follow these steps: |
| 52 | + |
| 53 | + - **Google Gemini API Key**: |
| 54 | + Go to [Google AI Studio](https://cloud.google.com/ai), get your API key, and set it as an environment variable: |
| 55 | + ```bash |
| 56 | + export GOOGLE_API_KEY="your-api-key-here" |
| 57 | + ``` |
| 58 | + |
| 59 | + - **SERPAPI API Key**: |
| 60 | + Go to [SERPAPI](https://serpapi.com/), sign up, and obtain your API key. Set it as an environment variable: |
| 61 | + ```bash |
| 62 | + export SERPAPI_API_KEY="your-serpapi-api-key-here" |
| 63 | + ``` |
| 64 | + |
| 65 | + - **Jina API Key**: |
| 66 | + Go to [Jina AI](https://jina.ai/), create an account, and obtain your API key. Set it as an environment variable: |
| 67 | + ```bash |
| 68 | + export JINA_API_KEY="your-jina-api-key-here" |
| 69 | + ``` |
| 70 | + |
| 71 | +4. **Run the Reflex App** |
| 72 | + Start the application: |
| 73 | + ```bash |
| 74 | + reflex run |
| 75 | + ``` |
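
Before starting the app, it can help to confirm that the three environment variables from step 3 are visible to Python. The check below is a small sketch under that assumption; the helper name `missing_keys` is hypothetical and is not part of the repository.

```python
import os

# Variable names taken from the export commands in step 3.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "SERPAPI_API_KEY", "JINA_API_KEY")


def missing_keys(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Run it in the same shell session where the `export` commands were issued, since exported variables do not carry over to new terminals.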
| 76 | + |
| 77 | +--- |
| 78 | + |
| 79 | +### How It Works: |
| 80 | +1. **Input & Query Generation**: |
| 81 | + - The user enters a research topic, and Google Gemini generates up to four distinct search queries. |
| 82 | + |
| 83 | +2. **Concurrent Search & Processing**: |
| 84 | + - **SERPAPI**: Each search query is sent to SERPAPI concurrently. |
| 85 | + - **Deduplication**: All retrieved links are aggregated and deduplicated within the current iteration. |
| 86 | + - **Jina & Google Gemini**: Each unique link is processed concurrently to fetch webpage content via Jina, evaluate its usefulness with Google Gemini, and extract relevant information if the page is deemed useful. |
| 87 | + |
| 88 | +3. **Iterative Refinement**: |
| 89 | + - The system passes the aggregated context to Google Gemini to determine whether further search queries are needed. If so, Gemini generates new queries and the loop repeats; otherwise, the loop terminates. |
| 90 | + |
| 91 | +4. **Final Report Generation**: |
| 92 | + - All gathered context is compiled and sent to Google Gemini to produce a final, comprehensive report addressing the original query. |
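
The steps above can be sketched as a single asynchronous loop. This is a minimal illustration with stubbed services, not the project's implementation: `generate_queries`, `search`, `fetch_and_extract`, and `next_queries` are hypothetical placeholders for the Gemini, SERPAPI, and Jina calls, and `max_rounds` is an assumed safety cap that the description above does not mention.

```python
import asyncio
from typing import List, Optional


# Hypothetical stand-ins for the real Gemini, SERPAPI, and Jina calls.
async def generate_queries(topic: str) -> List[str]:
    return [f"{topic} overview", f"{topic} examples"]


async def search(query: str) -> List[str]:
    slug = query.replace(" ", "-")
    return [f"https://example.com/{slug}", "https://example.com/shared"]


async def fetch_and_extract(link: str, topic: str) -> Optional[str]:
    # A real version would fetch the page, judge its usefulness, and extract context.
    return None if link.endswith("shared") else f"context from {link}"


async def next_queries(topic: str, contexts: List[str]) -> List[str]:
    # A real version would ask the LLM; this stub requests no further queries.
    return []


async def research(topic: str, max_rounds: int = 5) -> List[str]:
    contexts: List[str] = []
    queries = await generate_queries(topic)
    for _ in range(max_rounds):
        if not queries:
            break
        # Send every search query concurrently.
        results = await asyncio.gather(*(search(q) for q in queries))
        # Aggregate and deduplicate links within this round, keeping order.
        links = list(dict.fromkeys(link for batch in results for link in batch))
        # Process every unique link concurrently; None marks a page judged not useful.
        extracted = await asyncio.gather(*(fetch_and_extract(l, topic) for l in links))
        contexts.extend(c for c in extracted if c is not None)
        # Ask whether more searching is needed; an empty list ends the loop.
        queries = await next_queries(topic, contexts)
    return contexts


if __name__ == "__main__":
    print(asyncio.run(research("solar panels")))
```

In this sketch the final report step is left out: the returned `contexts` list is what would be handed to the LLM to write the report.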
| 93 | + |
| 94 | +--- |