Commit 4f237ca — Add open assistants integration (1 parent: c0bb840)

4 files changed: +304 −0

File 1: 281 additions & 0 deletions
---
title: OpenAI Assistants integration
sidebar_label: OpenAI Assistants
description: Learn how to integrate Apify with OpenAI Assistants to provide real-time search data and to save it into the OpenAI Vector Store
sidebar_position: 1
slug: /integrations/openai-assistants
---
**Learn how to integrate Apify with OpenAI Assistants to provide real-time search data and to save it into the OpenAI Vector Store.**

---
The [OpenAI Assistants API](https://platform.openai.com/docs/assistants/overview) allows you to build your own AI applications such as chatbots, virtual assistants, and more.
OpenAI Assistants can access the OpenAI knowledge base ([vector store](https://platform.openai.com/docs/api-reference/vector-stores)) via file search and use function calling for dynamic interaction and data retrieval.

Unlike Custom GPTs, OpenAI Assistants are available via API, enabling integration with Apify to automatically update assistant data and deliver real-time information, improving the quality of answers.

In this tutorial, we'll start by demonstrating how to create an assistant and integrate real-time data using function calling with the [RAG-Web-Browser](https://apify.com/apify/rag-web-browser).
Next, we'll show how to save data from Apify Actors into the OpenAI Vector Store for easy retrieval through [file-search](https://platform.openai.com/docs/assistants/tools/file-search).

## Real-time search data for OpenAI Assistant

We'll use [RAG-Web-Browser](https://apify.com/apify/rag-web-browser) to fetch the latest information from the web and provide it to the OpenAI Assistant through [function calling](https://platform.openai.com/docs/assistants/tools/function-calling?context=without-streaming).
To begin, we need to create an OpenAI Assistant with the appropriate instructions.
After that, we can initiate a conversation with the assistant by creating a thread, adding messages, and running the assistant to receive responses.

The following image provides an overview of the Apify-OpenAI Assistant integration:

[//]: # (![Apify-OpenAI Assistant integration](../images/apify-openai-assistant-integration.png))

Before we start, we need to install all dependencies:
```bash
pip install apify-client openai
```
Import all required packages:

```python
from __future__ import annotations  # lets the TYPE_CHECKING-only imports below be used in annotations

import json
from typing import TYPE_CHECKING

from apify_client import ApifyClient
from openai import OpenAI, Stream
from openai.types.beta.threads.run_submit_tool_outputs_params import ToolOutput

if TYPE_CHECKING:
    from openai.types.beta import AssistantStreamEvent
    from openai.types.beta.threads import Run
```
Find your [Apify API token](https://console.apify.com/account/integrations) and [OpenAI API key](https://platform.openai.com/account/api-keys) and initialize the OpenAI and Apify clients:

```python
client = OpenAI(api_key="YOUR OPENAI API KEY")
apify_client = ApifyClient("YOUR APIFY API TOKEN")
```
First, let us specify the assistant's instructions. Here, we ask the assistant to always provide answers based on the latest information from the internet and to include relevant sources whenever possible.
In a real-world scenario, you can customize the instructions based on your requirements.

```python
INSTRUCTIONS = """You are a smart and helpful assistant. Maintain an expert, friendly, and informative tone in your responses.
Your task is to answer questions based on information from the internet.
Always call the call_rag_web_browser function to retrieve the latest and most relevant online results.
Never provide answers based solely on your own knowledge.
For each answer, always include relevant sources whenever possible.
"""
```
Next, we define a function description with two parameters: the search query (`query`) and the number of results to retrieve (`maxResults`).
The RAG-Web-Browser can be called with more parameters; check the [Actor input schema](https://apify.com/apify/rag-web-browser/input-schema) for details.

```python
rag_web_browser_function = {
    "type": "function",
    "function": {
        "name": "call_rag_web_browser",
        "description": "Query Google search, scrape the top N pages from the results, and return their cleaned content as markdown",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Use regular search words or enter Google Search URLs."},
                "maxResults": {"type": "integer", "description": "The number of top organic search results to return and scrape text from"},
            },
            "required": ["query"],
        },
    },
}
```
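Before wiring the schema into an assistant, it can help to sanity-check that a set of arguments satisfies the schema's `required` list. The helper below is a minimal, hypothetical sketch (it is not part of the OpenAI SDK):

```python
def missing_required_args(args: dict, function_schema: dict) -> list[str]:
    """Return the names of required parameters missing from args (hypothetical helper)."""
    required = function_schema["function"]["parameters"].get("required", [])
    return [name for name in required if name not in args]

# A trimmed-down copy of the function schema above, enough for the check.
schema = {"function": {"parameters": {"required": ["query"]}}}

print(missing_required_args({"maxResults": 3}, schema))   # ['query']
print(missing_required_args({"query": "LLM news"}, schema))  # []
```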
We also need to implement the `call_rag_web_browser` function, which will be used to retrieve the search data.

```python
def call_rag_web_browser(query: str, max_results: int) -> list[dict]:
    """
    Query Google search, scrape the top N pages from the results, and return their cleaned content as markdown.
    First start the Actor and wait for it to finish. Then fetch results from the Actor run's default dataset.
    """
    actor_call = apify_client.actor("apify/rag-web-browser").call(run_input={"query": query, "maxResults": max_results})
    return apify_client.dataset(actor_call["defaultDatasetId"]).list_items().items
```
Now, we can create an assistant with the specified instructions and function description:

```python
my_assistant = client.beta.assistants.create(
    instructions=INSTRUCTIONS,
    name="OpenAI Assistant with Web Browser",
    tools=[rag_web_browser_function],
    model="gpt-4o-mini",
)
```
Once the assistant is created, we can initiate a conversation.
Start by creating a thread, adding messages to it, and then calling the run method.
Since runs are asynchronous, we need to continuously poll the `Run` object until it reaches a terminal status.
To simplify this, we use the `create_and_poll` convenience function, which both initiates the run and polls it until completion.

```python
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What are the latest LLM news?"
)

run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=my_assistant.id)
```
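For context, `create_and_poll` is a convenience wrapper: conceptually, it keeps re-fetching the run until its status is terminal. Here is a simplified, framework-free sketch of that loop; the status names match the Assistants API, while `fetch_run` is a stand-in for the actual retrieve call:

```python
import time

TERMINAL_STATUSES = {"requires_action", "completed", "failed", "cancelled", "expired"}

def poll_until_terminal(fetch_run, interval_seconds: float = 0.0) -> dict:
    """Repeatedly fetch the run until it reaches a terminal status."""
    while True:
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        time.sleep(interval_seconds)

# Simulate a run that completes on the third poll.
statuses = iter(["queued", "in_progress", "completed"])
final_run = poll_until_terminal(lambda: {"status": next(statuses)})
print(final_run["status"])  # completed
```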
Finally, we need to check the run status to determine whether the assistant requires any action to retrieve the search data.
If it does, we must submit the results using the `submit_tool_outputs` function.
This function will trigger the RAG-Web-Browser to fetch the search data and submit it to the assistant for processing.

Let's implement the `submit_tool_outputs` function:

```python
def submit_tool_outputs(run_: Run) -> Run | Stream[AssistantStreamEvent]:
    """Submit tool outputs to continue the run."""
    tool_output = []
    for tool in run_.required_action.submit_tool_outputs.tool_calls:
        if tool.function.name == "call_rag_web_browser":
            d = json.loads(tool.function.arguments)
            output = call_rag_web_browser(query=d["query"], max_results=d["maxResults"])
            tool_output.append(ToolOutput(tool_call_id=tool.id, output=json.dumps(output)))
            print("RAG-Web-Browser added as a tool output.")

    return client.beta.threads.runs.submit_tool_outputs_and_poll(thread_id=run_.thread_id, run_id=run_.id, tool_outputs=tool_output)
```
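Note that `tool.function.arguments` arrives as a JSON-encoded string, not a dict, which is why the code above runs it through `json.loads`. A standalone illustration of that decoding step (the argument values here are made up):

```python
import json

# The model returns function-call arguments as a JSON string.
raw_arguments = '{"query": "latest LLM news", "maxResults": 3}'

d = json.loads(raw_arguments)
print(d["query"], d["maxResults"])  # latest LLM news 3
```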
Now, we can check the run status and submit the tool outputs if required:

```python
if run.status == "requires_action":
    run = submit_tool_outputs(run)
```
The `submit_tool_outputs` function also polls the run until it reaches a terminal status.
After the run is completed, we can print the assistant's response:

```python
print("Assistant response:")
for m in client.beta.threads.messages.list(thread_id=run.thread_id):
    print(m.content[0].text.value)
```
For the question "What are the latest LLM news?" the assistant's response might look like this:

```plaintext
Assistant response:
The latest news on LLM is as follows:
- [OpenAI](https://openai.com) has released a new version of GPT-4.
- [Hugging Face](https://huggingface.co) has updated their Transformers library.
- [Apify](https://apify.com) has released a new RAG-Web-Browser.
```
## Save data into OpenAI Vector Store and use it in the assistant

To provide real-time or proprietary data, OpenAI Assistants can access the [OpenAI Vector Store](https://platform.openai.com/docs/assistants/tools/file-search/vector-stores) to retrieve information for their answers.
With the [Apify OpenAI Vector Store Integration](https://apify.com/jiri.spilka/openai-vector-store-integration), saving data to and updating the OpenAI Vector Store can be fully automated.
For more information on automating this process, check out the blog post [How we built an AI salesman with the OpenAI Assistants API](https://blog.apify.com/enterprise-support-openai-assistant/).

The following image illustrates the Apify-OpenAI Vector Store integration:

[//]: # (![Apify-OpenAI Vector Store integration](../images/apify-openai-vector-store-integration.png))

In this example, we'll demonstrate how to save data into the OpenAI Vector Store and use it in the assistant.

Before we start, we need to install all dependencies:
```bash
pip install apify-client openai
```
Find your [Apify API token](https://console.apify.com/account/integrations) and [OpenAI API key](https://platform.openai.com/account/api-keys) and initialize the OpenAI and Apify clients:

```python
from apify_client import ApifyClient
from openai import OpenAI

client = OpenAI(api_key="YOUR OPENAI API KEY")
apify_client = ApifyClient("YOUR APIFY API TOKEN")
```
Create an assistant with the instructions and the `file_search` tool:

```python
my_assistant = client.beta.assistants.create(
    instructions="As a customer support agent at Apify, your role is to assist customers",
    name="Support assistant",
    tools=[{"type": "file_search"}],
    model="gpt-4o-mini",
)
```
Next, create a vector store and attach it to the assistant:

```python
vector_store = client.beta.vector_stores.create(name="Support assistant vector store")

assistant = client.beta.assistants.update(
    assistant_id=my_assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```
Now, use [Website Content Crawler](https://apify.com/apify/website-content-crawler) to crawl the web and save the data into Apify's dataset:

```python
run_input = {"startUrls": [{"url": "https://docs.apify.com/platform"}], "maxCrawlPages": 10, "crawlerType": "cheerio"}
actor_call_website_crawler = apify_client.actor("apify/website-content-crawler").call(run_input=run_input)

dataset_id = actor_call_website_crawler["defaultDatasetId"]
```
Finally, save the data into the OpenAI Vector Store using the [OpenAI Vector Store Integration](https://apify.com/jiri.spilka/openai-vector-store-integration):

```python
run_input_vs = {
    "datasetId": dataset_id,
    "assistantId": my_assistant.id,
    "datasetFields": ["text", "url"],
    "openaiApiKey": "YOUR-OPENAI-API-KEY",
    "vectorStoreId": vector_store.id,
}

apify_client.actor("jiri.spilka/openai-vector-store-integration").call(run_input=run_input_vs)
```
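The `datasetFields` input tells the integration which fields of each dataset item to keep before uploading to the vector store. Conceptually, the selection works like this (a simplified sketch, not the Actor's actual implementation):

```python
def select_fields(items: list[dict], fields: list[str]) -> list[dict]:
    """Keep only the chosen fields from each dataset item (simplified sketch)."""
    return [{field: item.get(field) for field in fields} for item in items]

# A made-up dataset item: "text" and "url" are kept, everything else is dropped.
items = [{"text": "Apify is a web scraping platform...", "url": "https://docs.apify.com/platform", "html": "<p>...</p>"}]
print(select_fields(items, ["text", "url"]))
```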
250+
Now, the assistant can access the data stored in the OpenAI Vector Store and use it in its responses.
251+
Start by creating a thread and adding messages to it.
252+
Then, initiate a run and poll for the results.
253+
Once the run is completed, you can print the assistant's response.
254+
255+
```python
256+
thread = client.beta.threads.create()
257+
message = client.beta.threads.messages.create(
258+
thread_id=thread.id, role="user", content="How can I scrape a website using Apify?"
259+
)
260+
261+
run = client.beta.threads.runs.create_and_poll(
262+
thread_id=thread.id,
263+
assistant_id=assistant.id,
264+
tool_choice={"type": "file_search"}
265+
)
266+
267+
print("Assistant response:")
268+
for m in client.beta.threads.messages.list(thread_id=run.thread_id):
269+
print(m.content[0].text.value)
270+
```
271+
For the question "How can I scrape a website using Apify?" the assistant's response might look like this:

```plaintext
Assistant response:
You can scrape a website using Apify by following these steps:
1. Visit the [Apify website](https://apify.com) and create an account.
2. Go to the [Apify Store](https://apify.com/store) and choose a web scraper.
3. Configure the web scraper with the URL of the website you want to scrape.
4. Run the web scraper and download the data.
```

sources/platform/integrations/index.mdx (7 additions & 0 deletions):

```
<Card
  title="OpenAI Assistants"
  to="./integrations/openai-assistants"
  imageUrl="/img/platform/integrations/openai.svg"
  imageUrlDarkTheme="/img/platform/integrations/openai-white.png"
  smallImage
/>
```

Two further changed files (8 additions & 0 deletions each) are not shown.