Replies: 6 comments
-
Hi @luismanez, the files in storage are currently not considered "temp files". The plan (still under development) is to let users open the relevant files when they receive an answer: the "citations" returned with an answer will let them reach and download or open the stored files. This is still TBD. For example, it should be configurable, and we have to consider the impact of duplicating files. Depending on how one uses the service, the files could be considered temporary, an unwanted duplication, or useful. The "CleanUp" step has a different purpose and is mostly internal, so I would suggest ignoring it rather than relying on it. Its goal is to keep the storage consistent when a deletion is requested (in your case you're not requesting a deletion, you're uploading information). Since we'd like to leave this open and configurable, please feel free to tell us how you would like to configure it and how you're using citations (if at all), so we can design accordingly :-) Thanks!
-
Hi @dluc, many thanks for the detailed explanation; you made a great point. Let me explain our scenario (I know it's not "one size fits all", but I think it's a pretty common one nowadays).

We are indexing documents coming from different sources (mainly SharePoint document libraries). Our solution runs on Azure, so we're using Azure Cognitive Search and Blob storage (our "Indexer" is an Azure Function). We fall under your "unwanted duplication" case. For citations, we want to point to the original file in SharePoint, so we index a custom tag with the full SharePoint URL. We don't want to duplicate those files, and not just because of the added cost, but also for security and compliance reasons. The documents remain in M365. We do index chunks in Azure Search, but security teams view the Search service more favorably than Blob storage :)

In this scenario, I'd like a flag in the library to say "hey, delete all the physical files once this document is indexed". Also, when a document is deleted, its files are deleted, but the pipeline_status.json file is still there. I'd like another flag to delete that one too.

One last comment, not related to files: we also miss the ability to tune the final prompt that is sent to OpenAI. For instance, the library currently assumes you only want an answer based on the facts, but this won't always be the case. And what about sending the chat history? Let me know if you have any questions. Happy to share more details and help improve the library. Thanks!
-
Thanks for the notes!
That's strange; is it happening with the latest code? Deleting that file when a document is deleted is exactly the purpose of CleanUpAfterCompletionAsync, although the implementation is best effort, with no retry logic.
Agreed, customizing prompts and other parameters is on our roadmap :-) (using the Search API is the current workaround, and I'm glad to see you're on that path). Things will improve over time. Thank you!
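To illustrate what "best effort, with no retry logic" can mean in practice, here is a minimal Python sketch. It is a hypothetical model, not the library's C# code. `FlakyStorage` and `clean_up_best_effort` are invented names. The point is that each file gets a single delete attempt, and a failed delete is logged and skipped. That is one possible way a file such as pipeline_status.json could survive a deletion.

```python
import logging

logger = logging.getLogger(__name__)


class FlakyStorage:
    """Hypothetical in-memory storage whose deletes can fail (not a real storage API)."""

    def __init__(self, files, failing=()):
        self.files = set(files)
        self.failing = set(failing)

    def delete(self, name):
        if name in self.failing:
            raise OSError(f"transient error deleting {name}")
        self.files.discard(name)


def clean_up_best_effort(storage, names):
    """Try each delete exactly once; log and skip failures instead of retrying."""
    for name in names:
        try:
            storage.delete(name)
        except OSError as err:
            # No retry: the file simply stays behind in storage.
            logger.warning("cleanup skipped %s: %s", name, err)


storage = FlakyStorage(
    ["doc1/content.txt", "doc1/chunk-0.txt", "doc1/pipeline_status.json"],
    failing=["doc1/pipeline_status.json"],
)
clean_up_best_effort(storage, sorted(storage.files))
print(sorted(storage.files))  # ['doc1/pipeline_status.json']
```

Adding a bounded retry loop around `storage.delete` would be the obvious hardening step, at the cost of slower deletions when storage is genuinely unavailable.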
-
@luismanez FYI, it's now possible to customize prompts via configuration; see #110. Let me know if we need to keep this issue open. I believe the issue with deletions was addressed in a recent bugfix. Thanks!
-
Hi @dluc, thanks for the heads-up! ... I like the IPromptProvider approach. This issue, though, was more about having an option so that when a document is imported/indexed, all the generated files (extraction, chunks, summaries...) are deleted from storage (disk, Azure Storage...). As discussed, in our approach the docs are stored in SharePoint and the chunks are indexed in Azure Search, so we don't need anything else. Having the generated physical files in Azure Storage is more a security/compliance concern than a benefit. I don't think there's an option to do that yet, but I'm happy to be proven wrong 😄
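The option requested above can be modelled as an opt-in flag checked after the last pipeline step. The following is a minimal Python sketch under stated assumptions. `ImportPipeline` and `delete_generated_files` are hypothetical names, not the library's C# API. Storage is a plain dict, and the chunks are assumed to already live in an external index such as Azure Cognitive Search.

```python
from dataclasses import dataclass, field


@dataclass
class ImportPipeline:
    """Hypothetical model of an import pipeline; not the library's actual API."""

    document_id: str
    steps: list
    # Hypothetical opt-in flag: delete generated files once every step has run.
    delete_generated_files: bool = False
    storage: dict = field(default_factory=dict)

    def run(self):
        for step in self.steps:
            # Each step persists its output (extracted text, chunks, summaries...).
            self.storage[f"{self.document_id}/{step}.txt"] = f"output of {step}"
        # Records are assumed to be in the external index already, so the stored
        # files are only intermediate artifacts at this point.
        if self.delete_generated_files:
            prefix = f"{self.document_id}/"
            for name in [n for n in self.storage if n.startswith(prefix)]:
                del self.storage[name]
        return self.storage


steps = ["extract", "partition", "summarize"]
print(len(ImportPipeline("doc1", steps).run()))  # 3
print(len(ImportPipeline("doc1", steps, delete_generated_files=True).run()))  # 0
```

Keeping the flag off by default preserves the citation use case described earlier in the thread, where stored files may later be opened from an answer.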
-
#177 merged - thank you!!
-
I've been doing some tests with `memory.ImportDocumentAsync` (using both the local file system and Azure Blob Storage). In both cases, the temp files are not deleted when the pipeline finishes successfully. Since they are temp files, shouldn't they be deleted by default once the pipeline completes?
According to the source code, the `BaseOrchestrator` class has a `CleanUpAfterCompletionAsync` method, but that method is only executed if you add the delete_document handler to the pipeline. Besides, the code in the orchestrators is misleading: it says that when the pipeline is complete, it will do a CleanUp, which is exactly what I'd want and expect. However, as said before, that method only runs when the DeleteDocument or DeleteIndex handlers are in the pipeline:
Is there any way to really CleanUp those temp files?
Thanks!
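The control flow described in this question can be modelled in a few lines of Python. This is a hypothetical sketch, not the library's C# code: `run_pipeline`, `clean_up_after_completion` and the `.out` file names are invented. Only the gating rule comes from the post: cleanup runs only when a deletion step is part of the pipeline, so a plain import leaves its intermediate files in storage.

```python
# Hypothetical names; only the gating rule mirrors the behaviour described above.
DELETE_STEPS = {"delete_document", "delete_index"}


def clean_up_after_completion(storage, document_id):
    """Remove every stored file that belongs to the document."""
    prefix = f"{document_id}/"
    for name in [n for n in storage if n.startswith(prefix)]:
        storage.discard(name)


def run_pipeline(steps, storage, document_id):
    for step in steps:
        if step not in DELETE_STEPS:
            # Ingestion steps leave an intermediate file behind.
            storage.add(f"{document_id}/{step}.out")
    # Cleanup is gated on a deletion step being present in the pipeline.
    if DELETE_STEPS.intersection(steps):
        clean_up_after_completion(storage, document_id)
    return storage


imported = run_pipeline(["extract", "partition", "embed"], set(), "doc1")
print(sorted(imported))  # ['doc1/embed.out', 'doc1/extract.out', 'doc1/partition.out']

deleted = run_pipeline(["delete_document"], set(imported), "doc1")
print(sorted(deleted))  # []
```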