Commit 3b0bb82

add open_deep_researcher
1 parent 6e3797e commit 3b0bb82

7 files changed

+470
-0
lines changed

open_deep_researcher/.gitignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
*.db
assets/external/
*.py[cod]
.web
__pycache__/

open_deep_researcher/LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Pynecone, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

open_deep_researcher/README.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# OpenDeepResearcher

This project is based on the [OpenDeepResearcher](https://github.com/mshumer/OpenDeepResearcher) repository. It implements an AI researcher that keeps searching for information on a user query until the system is confident that it has gathered all the necessary details. The user interface is built with [Reflex](https://reflex.dev/). The researcher relies on the following services:

### Services Used:
- **SERPAPI**: To perform Google searches.
- **Jina**: To fetch and extract webpage content.
- **Google Gemini**: To interact with an LLM for generating search queries, evaluating page relevance, and extracting context.
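
As a rough illustration of how the first two services are typically addressed, the sketch below only builds the requests and sends nothing. The endpoint `https://serpapi.com/search`, the Reader prefix `https://r.jina.ai/`, and both helper names are assumptions for this example and are not taken from this repository; check each provider's documentation before relying on them.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Assumed endpoints; not confirmed by this repository.
SERPAPI_URL = "https://serpapi.com/search"
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_search_url(query: str, api_key: str) -> str:
    """Return a GET URL for a Google search via SERPAPI (hypothetical helper)."""
    params = {"engine": "google", "q": query, "api_key": api_key}
    return f"{SERPAPI_URL}?{urlencode(params)}"


def build_fetch_request(page_url: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return the Reader URL and headers for fetching one page (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    return JINA_READER_PREFIX + page_url, headers
```

Keeping request construction separate from the network call makes the parameters easy to inspect and test without spending API quota.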

### Features:
- **Iterative Research Loop**: The system refines its search queries iteratively until no further queries are required.
- **Asynchronous Processing**: Searches, webpage fetching, evaluation, and context extraction are performed concurrently to improve speed.
- **Duplicate Filtering**: Aggregates and deduplicates links within each round, ensuring that the same link isn’t processed twice.
- **LLM-Powered Decision Making**: Uses Google Gemini to generate new search queries, decide on page usefulness, extract relevant context, and produce a final comprehensive report.
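
The duplicate filtering described above can be sketched in a few lines. This is a minimal illustration, assuming each search query returns a list of link strings; `dedupe_links` is a hypothetical name and not necessarily how the app implements it.

```python
from typing import Iterable, List


def dedupe_links(results_per_query: Iterable[Iterable[str]]) -> List[str]:
    """Flatten per-query link lists, keeping the first occurrence of each link."""
    seen = set()
    unique = []
    for links in results_per_query:
        for link in links:
            if link not in seen:
                seen.add(link)
                unique.append(link)
    return unique
```

Preserving first-seen order keeps the results of earlier queries ahead of later ones, which can matter if only the top links are processed.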

### Requirements:
API access and keys for:
- Google Gemini API
- SERPAPI API
- Jina API

### Setup:

1. **Clone or Open the Notebook**:
   - Download the notebook file or open it directly in Google Colab.

2. **Install nest_asyncio**:
   - Run the first cell to set up nest_asyncio.

3. **Configure API Keys**:
   - Replace the placeholder values in the notebook for `GOOGLE_GEMINI_API_KEY`, `SERPAPI_API_KEY`, and `JINA_API_KEY` with your actual API keys.

---

### Getting Started

1. **Clone the Repository**
   Clone the GitHub repository to your local machine:
   ```bash
   git clone https://github.com/reflex-dev/reflex-llm-examples.git
   cd reflex-llm-examples/open_deep_researcher
   ```

2. **Install Dependencies**
   Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. **Set Up API Keys**
   To use the Gemini 2.0 Flash model, SERPAPI, and Jina, you need an API key for each service. Follow these steps:

   - **Google Gemini API Key**:
     Go to [Google AI Studio](https://aistudio.google.com/), get your API key, and set it as an environment variable:
     ```bash
     export GOOGLE_API_KEY="your-api-key-here"
     ```

   - **SERPAPI API Key**:
     Go to [SERPAPI](https://serpapi.com/), sign up, and obtain your API key. Set it as an environment variable:
     ```bash
     export SERPAPI_API_KEY="your-serpapi-api-key-here"
     ```

   - **Jina API Key**:
     Go to [Jina AI](https://jina.ai/), create an account, and obtain your API key. Set it as an environment variable:
     ```bash
     export JINA_API_KEY="your-jina-api-key-here"
     ```

4. **Run the Reflex App**
   Start the application:
   ```bash
   reflex run
   ```
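
Before starting the app, it can help to confirm that the three environment variables from step 3 are actually set. The check below is a small sketch; `REQUIRED_KEYS` and `missing_api_keys` are hypothetical names for this example, and the variable names are the ones exported in step 3.

```python
import os

# Variable names as exported in step 3 of Getting Started.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "SERPAPI_API_KEY", "JINA_API_KEY")


def missing_api_keys(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

Calling `missing_api_keys()` with no arguments inspects the current process environment; an empty list means all three keys are present.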

---

### How It Works:
1. **Input & Query Generation**:
   - The user enters a research topic, and Google Gemini generates up to four distinct search queries.

2. **Concurrent Search & Processing**:
   - **SERPAPI**: Each search query is sent to SERPAPI concurrently.
   - **Deduplication**: All retrieved links are aggregated and deduplicated within the current iteration.
   - **Jina & Google Gemini**: Each unique link is processed concurrently to fetch webpage content via Jina, evaluate its usefulness with Google Gemini, and extract relevant information if the page is deemed useful.

3. **Iterative Refinement**:
   - The system passes the aggregated context to Google Gemini to determine if further search queries are needed. New queries are generated if required; otherwise, the loop terminates.

4. **Final Report Generation**:
   - All gathered context is compiled and sent to Google Gemini to produce a final, comprehensive report addressing the original query.
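
Steps 2 and 3 above can be sketched with `asyncio.gather`. This is an illustrative outline only: `search`, `process_link`, and `next_queries` are hypothetical callables standing in for the SERPAPI, Jina, and Gemini calls, and the `max_rounds` cap is an assumption added here as a safety limit, not something the README specifies.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical stand-ins for the real service calls.
Search = Callable[[str], Awaitable[List[str]]]            # query -> links
ProcessLink = Callable[[str], Awaitable[Optional[str]]]   # link -> context, or None if not useful
NextQueries = Callable[[List[str]], Awaitable[List[str]]]  # contexts -> new queries (empty = done)


async def research_loop(
    queries: List[str],
    search: Search,
    process_link: ProcessLink,
    next_queries: NextQueries,
    max_rounds: int = 5,  # assumed cap, not from the README
) -> List[str]:
    contexts: List[str] = []
    for _ in range(max_rounds):
        if not queries:
            break
        # Send every search query concurrently.
        batches = await asyncio.gather(*(search(q) for q in queries))
        # Deduplicate links within this round, preserving first-seen order.
        links = list(dict.fromkeys(link for batch in batches for link in batch))
        # Fetch, evaluate, and extract from each unique link concurrently.
        extracted = await asyncio.gather(*(process_link(link) for link in links))
        contexts.extend(c for c in extracted if c is not None)
        # Ask the LLM stand-in whether more searching is needed.
        queries = await next_queries(contexts)
    return contexts
```

The returned contexts would then be passed to the report-generation step. Because the service calls are injected as parameters, the loop can be exercised with simple stub coroutines before any API keys are configured.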

---
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
reflex
google-generativeai

open_deep_researcher/researcher/__init__.py

Whitespace-only changes.
