---
title: OpenAI Assistants integration
sidebar_label: OpenAI Assistants
description: Learn how to integrate Apify with OpenAI Assistants to provide real-time search data and save it into the OpenAI Vector Store
sidebar_position: 1
slug: /integrations/openai-assistants
---

**Learn how to integrate Apify with OpenAI Assistants to provide real-time search data and save it into the OpenAI Vector Store.**

---

[OpenAI Assistants API](https://platform.openai.com/docs/assistants/overview) allows you to build your own AI applications such as chatbots, virtual assistants, and more.
OpenAI Assistants can access an OpenAI knowledge base ([vector store](https://platform.openai.com/docs/api-reference/vector-stores)) via file search and use function calling for dynamic interaction and data retrieval.

Unlike Custom GPTs, OpenAI Assistants are available via API, enabling integration with Apify to automatically update assistant data and deliver real-time information, improving the quality of answers.

In this tutorial, we’ll start by demonstrating how to create an assistant and integrate real-time data using function calling with the [RAG-Web-Browser](https://apify.com/apify/rag-web-browser).
Next, we’ll show how to save data from Apify Actors into the OpenAI Vector Store for easy retrieval through [file search](https://platform.openai.com/docs/assistants/tools/file-search).

## Real-time search data for OpenAI Assistant

We'll use [RAG-Web-Browser](https://apify.com/apify/rag-web-browser) to fetch the latest information from the web and provide it to the OpenAI Assistant through [function calling](https://platform.openai.com/docs/assistants/tools/function-calling?context=without-streaming).
To begin, we need to create an OpenAI Assistant with the appropriate instructions.
After that, we can initiate a conversation with the assistant by creating a thread, adding messages, and running the assistant to receive responses.

The following image provides an overview of the Apify-OpenAI Assistant integration:

[//]: # ()

Before we start, we need to install all dependencies:

```bash
pip install apify-client openai
```

Import all required packages:

```python
# Postponed evaluation of annotations lets us reference the
# TYPE_CHECKING-only imports in function signatures below
from __future__ import annotations

import json
from typing import TYPE_CHECKING

from apify_client import ApifyClient
from openai import OpenAI, Stream
from openai.types.beta.threads.run_submit_tool_outputs_params import ToolOutput

if TYPE_CHECKING:
    from openai.types.beta import AssistantStreamEvent
    from openai.types.beta.threads import Run
```

Find your [Apify API token](https://console.apify.com/account/integrations) and [OpenAI API key](https://platform.openai.com/account/api-keys) and initialize OpenAI and Apify clients:

```python
client = OpenAI(api_key="YOUR OPENAI API KEY")
apify_client = ApifyClient("YOUR APIFY API TOKEN")
```
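
Hardcoding credentials is fine for a quick experiment, but in practice you'll usually load them from environment variables. A minimal sketch, assuming `OPENAI_API_KEY` and `APIFY_API_TOKEN` are set in your shell:

```python
import os

# OpenAI() picks up OPENAI_API_KEY from the environment automatically
client = OpenAI()
apify_client = ApifyClient(os.environ["APIFY_API_TOKEN"])
```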

First, let's specify the assistant's instructions. Here, we ask the assistant to always provide answers based on the latest information from the internet and include relevant sources whenever possible.
In a real-world scenario, you can customize the instructions based on your requirements.

```python
INSTRUCTIONS = """You are a smart and helpful assistant. Maintain an expert, friendly, and informative tone in your responses.
    Your task is to answer questions based on information from the internet.
    Always call the call_rag_web_browser function to retrieve the latest and most relevant online results.
    Never provide answers based solely on your own knowledge.
    For each answer, always include relevant sources whenever possible.
"""
```

Next, we define a function description with two parameters: the search query (`query`) and the number of results to retrieve (`maxResults`).
The RAG-Web-Browser can be called with more parameters; check the [Actor input schema](https://apify.com/apify/rag-web-browser/input-schema) for details.

```python
rag_web_browser_function = {
    "type": "function",
    "function": {
        "name": "call_rag_web_browser",
        "description": "Query Google search, scrape the top N pages from the results, and return their cleaned content as markdown",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Use regular search words or enter Google Search URLs."},
                "maxResults": {"type": "integer", "description": "The number of top organic search results to return and scrape text from"}
            },
            "required": ["query"]
        }
    }
}
```

We also need to implement the `call_rag_web_browser` function, which will be used to retrieve the search data.

```python
def call_rag_web_browser(query: str, max_results: int) -> list[dict]:
    """
    Query Google search, scrape the top N pages from the results, and return their cleaned content as markdown.
    First start the Actor and wait for it to finish. Then fetch results from the Actor run's default dataset.
    """
    actor_call = apify_client.actor("apify/rag-web-browser").call(run_input={"query": query, "maxResults": max_results})
    return apify_client.dataset(actor_call["defaultDatasetId"]).list_items().items
```
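
Before wiring this into the assistant, you can sanity-check the function directly. The query below is just an illustrative example; the exact fields in each result item depend on the Actor's output:

```python
# Quick standalone test: fetch two results for a sample query
results = call_rag_web_browser("latest LLM news", max_results=2)
print(len(results), "pages retrieved")
print(results[0].keys())  # inspect the fields available in each result item
```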

Now, we can create an assistant with the specified instructions and function description:

```python
my_assistant = client.beta.assistants.create(
    instructions=INSTRUCTIONS,
    name="OpenAI Assistant with Web Browser",
    tools=[rag_web_browser_function],
    model="gpt-4o-mini",
)
```

Once the assistant is created, we can initiate a conversation.
Start by creating a thread, adding messages to it, and then calling the run method.
Since runs are asynchronous, we need to continuously poll the `Run` object until it reaches a terminal status.
To simplify this, we use the `create_and_poll` convenience function, which both initiates the run and polls it until completion.

```python
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What are the latest LLM news?"
)

run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=my_assistant.id)
```

Finally, we need to check the run status to determine if the assistant requires any action to retrieve the search data.
If it does, we must submit the results using the `submit_tool_outputs` function.
This function will trigger the RAG-Web-Browser to fetch the search data and submit it to the assistant for processing.

Let's implement the `submit_tool_outputs` function:

```python
def submit_tool_outputs(run_: Run) -> Run | Stream[AssistantStreamEvent]:
    """Submit tool outputs to continue the run."""
    tool_output = []
    for tool in run_.required_action.submit_tool_outputs.tool_calls:
        if tool.function.name == "call_rag_web_browser":
            d = json.loads(tool.function.arguments)
            # maxResults is optional in the function schema, so fall back to a default
            output = call_rag_web_browser(query=d["query"], max_results=d.get("maxResults", 3))
            tool_output.append(ToolOutput(tool_call_id=tool.id, output=json.dumps(output)))
            print("RAG-Web-Browser added as a tool output.")

    return client.beta.threads.runs.submit_tool_outputs_and_poll(thread_id=run_.thread_id, run_id=run_.id, tool_outputs=tool_output)
```

Now, we can check the run status and submit the tool outputs if required:

```python
if run.status == "requires_action":
    run = submit_tool_outputs(run)
```
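
The happy path above assumes the run finishes in the `completed` status. Runs can also end as `failed`, `expired`, or `cancelled`, so a defensive check before reading messages is a good idea; a minimal sketch:

```python
# Guard against unsuccessful terminal statuses before reading messages
if run.status != "completed":
    # run.last_error holds the error code and message when a run fails
    raise RuntimeError(f"Run ended with status {run.status!r}: {run.last_error}")
```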

The `submit_tool_outputs` function also polls the run until it reaches a terminal status.
After the run is completed, we can print the assistant's response:

```python
print("Assistant response:")
for m in client.beta.threads.messages.list(thread_id=run.thread_id):
    print(m.content[0].text.value)
```

For the question "What are the latest LLM news?" the assistant's response might look like this:

```plaintext
Assistant response:
The latest news on LLM is as follows:
- [OpenAI](https://openai.com) has released a new version of GPT-4.
- [Hugging Face](https://huggingface.co) has updated their Transformers library.
- [Apify](https://apify.com) has released a new RAG-Web-Browser.
```
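
When you're done experimenting, you can delete the thread and assistant so they don't accumulate in your OpenAI account; a minimal cleanup sketch:

```python
# Remove the resources created in this tutorial
client.beta.threads.delete(thread_id=thread.id)
client.beta.assistants.delete(assistant_id=my_assistant.id)
```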

## Save data into OpenAI Vector Store and use it in the assistant

To provide real-time or proprietary data, OpenAI Assistants can access the [OpenAI Vector Store](https://platform.openai.com/docs/assistants/tools/file-search/vector-stores) to retrieve information for their answers.
With the [Apify OpenAI Vector Store Integration](https://apify.com/jiri.spilka/openai-vector-store-integration), saving data to and updating the OpenAI Vector Store can be fully automated.
For more information on automating this process, check out the blog post [How we built an AI salesman with the OpenAI Assistants API](https://blog.apify.com/enterprise-support-openai-assistant/).

The following image illustrates the Apify-OpenAI Vector Store integration:

[//]: # ()

In this example, we'll demonstrate how to save data into the OpenAI Vector Store and use it in the assistant.

Before we start, we need to install all dependencies:

```bash
pip install apify-client openai
```

Find your [Apify API token](https://console.apify.com/account/integrations) and [OpenAI API key](https://platform.openai.com/account/api-keys) and initialize OpenAI and Apify clients:

```python
from apify_client import ApifyClient
from openai import OpenAI

client = OpenAI(api_key="YOUR OPENAI API KEY")
apify_client = ApifyClient("YOUR APIFY API TOKEN")
```

Create an assistant with the instructions and the `file_search` tool:

```python
my_assistant = client.beta.assistants.create(
    instructions="As a customer support agent at Apify, your role is to assist customers",
    name="Support assistant",
    tools=[{"type": "file_search"}],
    model="gpt-4o-mini",
)
```

Next, create a vector store and attach it to the assistant:

```python
vector_store = client.beta.vector_stores.create(name="Support assistant vector store")

assistant = client.beta.assistants.update(
    assistant_id=my_assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```

Now, use [Website Content Crawler](https://apify.com/apify/website-content-crawler) to crawl the web and save the data into Apify's dataset:

```python
run_input = {"startUrls": [{"url": "https://docs.apify.com/platform"}], "maxCrawlPages": 10, "crawlerType": "cheerio"}
actor_call_website_crawler = apify_client.actor("apify/website-content-crawler").call(run_input=run_input)

dataset_id = actor_call_website_crawler["defaultDatasetId"]
```
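
Optionally, peek at the crawled dataset to confirm the fields you want to upload are present; the `text` and `url` fields referenced in the next step come from the Website Content Crawler output:

```python
# Preview the first crawled item to verify its fields
first_item = apify_client.dataset(dataset_id).list_items(limit=1).items[0]
print(first_item["url"])
print(first_item["text"][:200])  # first 200 characters of the page text
```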

Finally, save the data into the OpenAI Vector Store using the [OpenAI Vector Store Integration](https://apify.com/jiri.spilka/openai-vector-store-integration):

```python
run_input_vs = {
    "datasetId": dataset_id,
    "assistantId": my_assistant.id,
    "datasetFields": ["text", "url"],
    "openaiApiKey": "YOUR-OPENAI-API-KEY",
    "vectorStoreId": vector_store.id,
}

apify_client.actor("jiri.spilka/openai-vector-store-integration").call(run_input=run_input_vs)
```
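
To verify the upload, you can list the files attached to the vector store; each file should eventually reach the `completed` status. A short sketch:

```python
# Confirm the integration uploaded files into the vector store
for f in client.beta.vector_stores.files.list(vector_store_id=vector_store.id):
    print(f.id, f.status)
```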

Now, the assistant can access the data stored in the OpenAI Vector Store and use it in its responses.
Start by creating a thread and adding messages to it.
Then, initiate a run and poll for the results.
Once the run is completed, you can print the assistant's response.

```python
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="How can I scrape a website using Apify?"
)

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
    tool_choice={"type": "file_search"}
)

print("Assistant response:")
for m in client.beta.threads.messages.list(thread_id=run.thread_id):
    print(m.content[0].text.value)
```

For the question "How can I scrape a website using Apify?" the assistant's response might look like this:

```plaintext
Assistant response:
You can scrape a website using Apify by following these steps:
1. Visit the [Apify website](https://apify.com) and create an account.
2. Go to the [Apify Store](https://apify.com/store) and choose a web scraper.
3. Configure the web scraper with the URL of the website you want to scrape.
4. Run the web scraper and download the data.
```