Description
I'm beginning to look at Microsoft.Extensions.DataIngestion pipelines. As a test, I considered using an IngestionPipeline to ingest content stored in a CMS SQL database and create a vector store for use with RAG. However, I'm unclear on how to implement it when the data to be ingested is stored in a database.
Currently, both overloads of the ProcessAsync method require file system objects: one takes a DirectoryInfo and the other takes an IEnumerable<FileInfo>.
extensions/src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Lines 80 to 81 in 15ffd76

    public async IAsyncEnumerable<IngestionResult> ProcessAsync(DirectoryInfo directory, string searchPattern = "*.*",
        SearchOption searchOption = SearchOption.TopDirectoryOnly, [EnumeratorCancellation] CancellationToken cancellationToken = default)

and
extensions/src/Libraries/Microsoft.Extensions.DataIngestion/IngestionPipeline.cs
Lines 107 to 108 in 15ffd76

    public async IAsyncEnumerable<IngestionResult> ProcessAsync(IEnumerable<FileInfo> files, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {

Perhaps I misunderstand the pipeline's purpose or how it's meant to be used, but it appears that it can only ingest data that originates from files. Is that the case?