| 1 | + |
| 2 | +## Creating vector stores and adding files |
| 3 | + |
| 4 | +You can create a vector store and add files to it in a single API call: |
| 5 | + |
| 6 | +```python |
| 7 | +vector_store = project_client.agents.create_vector_store_and_poll( |
| 8 | +    name="my_vector_store", |
| 9 | +    file_ids=['file_1', 'file_2', 'file_3', 'file_4', 'file_5'] |
| 10 | +) |
| 11 | +``` |
| 12 | + |
| 13 | +Adding files to vector stores is an async operation. To ensure the operation is complete, we recommend that you use the 'create and poll' helpers in our official SDKs. If you're not using the SDKs, you can retrieve the `vector_store` object and monitor its `file_counts` property to see the result of the file ingestion operation. |
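For callers who are not using the SDK helpers, the manual check described above might look like the following minimal sketch. It assumes `file_counts` can be read as a mapping with `in_progress`, `completed`, and `failed` counts. `fetch_file_counts` is a hypothetical callable standing in for whatever request retrieves the vector store, and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable, Mapping


def wait_for_ingestion(
    fetch_file_counts: Callable[[], Mapping[str, int]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> Mapping[str, int]:
    """Poll until no files are in progress, then return the final counts.

    fetch_file_counts is a hypothetical stand-in for the call that retrieves
    the vector store and reads its file_counts property.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        counts = fetch_file_counts()
        # Assumption: an "in_progress" count of zero means ingestion finished.
        if counts.get("in_progress", 0) == 0:
            return counts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"ingestion still in progress: {dict(counts)}")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Simulated responses: two files finish ingesting over successive polls.
    responses = iter([
        {"in_progress": 2, "completed": 0, "failed": 0},
        {"in_progress": 1, "completed": 1, "failed": 0},
        {"in_progress": 0, "completed": 2, "failed": 0},
    ])
    final = wait_for_ingestion(lambda: next(responses), interval_s=0.01)
    print(final)
```

Checking the `failed` count on the returned mapping is left to the caller, since a finished ingestion does not imply that every file succeeded.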
| 14 | + |
| 15 | +Files can also be added to a vector store after it's created by creating vector store files. |
| 16 | + |
| 17 | +```python |
| 18 | + |
| 19 | +# create a vector store with no files and wait for it to be processed |
| 20 | +vector_store = project_client.agents.create_vector_store_and_poll(data_sources=[], name="sample_vector_store") |
| 21 | +print(f"Created vector store, vector store ID: {vector_store.id}") |
| 22 | + |
| 23 | +# add a previously uploaded file (`file`) to the vector store; alternatively, supply file IDs when creating the vector store |
| 24 | +vector_store_file_batch = project_client.agents.create_vector_store_file_batch_and_poll( |
| 25 | +    vector_store_id=vector_store.id, file_ids=[file.id] |
| 26 | +) |
| 27 | +print(f"Created vector store file batch, vector store file batch ID: {vector_store_file_batch.id}") |
| 28 | + |
| 29 | +``` |
| 30 | + |
| 31 | +Alternatively, you can add several files to a vector store by creating batches of up to 500 files. |
| 32 | + |
| 33 | +```python |
| 34 | +batch = project_client.agents.create_vector_store_file_batch_and_poll( |
| 35 | +    vector_store_id=vector_store.id, |
| 36 | +    file_ids=['file_1', 'file_2', 'file_3', 'file_4', 'file_5'] |
| 37 | +) |
| 38 | +``` |
| 39 | + |
| 40 | +### Basic agent setup: Deleting files from vector stores |
| 41 | +Files can be removed from a vector store by either: |
| 42 | + |
| 43 | +* Deleting the vector store file object, or |
| 44 | +* Deleting the underlying file object, which removes the file from all `vector_store` and `code_interpreter` configurations across all agents and threads in your organization |
| 45 | + |
| 46 | +The maximum file size is 512 MB. Each file should contain no more than 5,000,000 tokens (computed automatically when you attach a file). |
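A local pre-check of the size limit can catch oversized files before an upload is attempted. The sketch below is illustrative only: `within_size_limit` is a hypothetical helper, and it assumes that 1 MB means 1024 * 1024 bytes. The token limit is not checked here because the text says it is computed when the file is attached.

```python
import os
import tempfile

# 512 MB limit from the text; assumes 1 MB = 1024 * 1024 bytes.
MAX_FILE_BYTES = 512 * 1024 * 1024


def within_size_limit(path: str, max_bytes: int = MAX_FILE_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `max_bytes`."""
    return os.path.getsize(path) <= max_bytes


if __name__ == "__main__":
    # Write a small temporary file and check it against the limit.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello vector store")
        tmp_path = tmp.name
    try:
        print(within_size_limit(tmp_path))
    finally:
        os.remove(tmp_path)
```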
| 47 | + |
| 48 | + |
| 49 | +## Remove vector store |
| 50 | + |
| 51 | +You can remove a vector store from the file search tool. |
| 52 | + |
| 53 | +```python |
| 54 | +file_search_tool.remove_vector_store(vector_store.id) |
| 55 | +print(f"Removed vector store from file search, vector store ID: {vector_store.id}") |
| 56 | + |
| 57 | +project_client.agents.update_agent( |
| 58 | +    assistant_id=agent.id, tools=file_search_tool.definitions, tool_resources=file_search_tool.resources |
| 59 | +) |
| 60 | +print(f"Updated agent, agent ID: {agent.id}") |
| 61 | + |
| 62 | +``` |
| 63 | + |
| 64 | +## Deleting vector stores |
| 65 | +```python |
| 66 | +project_client.agents.delete_vector_store(vector_store.id) |
| 67 | +print("Deleted vector store") |
| 68 | +``` |
| 69 | + |
| 70 | +## Ensuring vector store readiness before creating runs |
| 71 | + |
| 72 | +We highly recommend that you ensure all files in a vector store are fully processed before you create a run, so that all the data in the vector store is searchable. You can check for vector store readiness by using the polling helpers in the SDKs, or by manually polling the `vector_store` object until its status is `completed`. |
| 73 | + |
| 74 | +As a fallback, there's a 60-second maximum wait in the run object when the thread's vector store contains files that are still being processed. This is to ensure that any files your users upload in a thread are fully searchable before the run proceeds. This fallback wait does not apply to the agent's vector store. |
| 75 | + |
| 76 | +## Managing costs with expiration policies |
| 77 | + |
| 78 | +For basic agent setup, the `file_search` tool uses the `vector_stores` object as its resource, and you are billed based on the size of the `vector_store` objects created. The size of a vector store object is the sum of all the parsed chunks from your files and their corresponding embeddings. |
| 79 | + |
| 80 | +To help you manage the costs associated with these `vector_store` objects, the `vector_store` object supports expiration policies. You can set these policies when creating or updating the `vector_store` object. |
| 81 | + |
| 82 | +```python |
| 83 | +vector_store = project_client.agents.create_vector_store_and_poll( |
| 84 | +    name="Product Documentation", |
| 85 | +    file_ids=['file_1', 'file_2', 'file_3', 'file_4', 'file_5'], |
| 86 | +    expires_after={ |
| 87 | +        "anchor": "last_active_at", |
| 88 | +        "days": 7 |
| 89 | +    } |
| 90 | +) |
| 91 | +``` |
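To make the policy above concrete, the following sketch works out when a store would expire, assuming the policy means "expire `days` days after the anchor timestamp". The service applies the policy itself; `expiry_time` and `is_expired` are hypothetical helpers for reasoning about it, not SDK functions.

```python
from datetime import datetime, timedelta, timezone


def expiry_time(last_active_at: datetime, days: int) -> datetime:
    """Assumed semantics: the store expires `days` days after it was last active."""
    return last_active_at + timedelta(days=days)


def is_expired(last_active_at: datetime, days: int, now: datetime) -> bool:
    """Return True once `now` has reached the computed expiry time."""
    return now >= expiry_time(last_active_at, days)


if __name__ == "__main__":
    last_active = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(expiry_time(last_active, 7).isoformat())
    print(is_expired(last_active, 7, datetime(2024, 1, 5, tzinfo=timezone.utc)))
```

Under this reading, each run that uses the store moves `last_active_at` forward, so a store in regular use would not expire.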
| 92 | + |
| 93 | +### Thread vector stores have default expiration policies |
| 94 | + |
| 95 | +Vector stores created using thread helpers (like `tool_resources.file_search.vector_stores` in Threads or `message.attachments` in Messages) have a default expiration policy of seven days after they were last active (defined as the last time the vector store was part of a run). |
| 96 | + |
| 97 | +When a vector store expires, runs on that thread fail. To fix this, create a new vector store with the same files and reattach it to the thread. |
| 98 | + |
| 99 | +```python |
| 100 | +all_files = list(client.beta.vector_stores.files.list("vs_expired")) |
| 101 | + |
| 102 | +vector_store = client.beta.vector_stores.create(name="rag-store") |
| 103 | +client.beta.threads.update( |
| 104 | + "thread_abc123", |
| 105 | + tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}, |
| 106 | +) |
| 107 | + |
| 108 | +for file_batch in chunked(all_files, 100):  # chunked: any helper that splits a list into batches of 100 |
| 109 | +    client.beta.vector_stores.file_batches.create_and_poll( |
| 110 | +        vector_store_id=vector_store.id, file_ids=[file.id for file in file_batch] |
| 111 | +    ) |
| 112 | +``` |
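The loop above relies on a `chunked` helper that the snippet does not define. A minimal standard-library sketch of such a helper is shown below. It is an assumption about what the snippet intends (splitting the file list into fixed-size batches), not part of any SDK.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # Take the next `size` items; an empty list means the input is exhausted.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    print(list(chunked(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

The last batch may be shorter than `size`, which is fine here because each batch is submitted independently.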