Using IngestionPipeline for content not originating from the file system #7082

@f2bo

I'm beginning to look at Microsoft.Extensions.DataIngestion pipelines. As a test, I considered using an IngestionPipeline to ingest content stored in a CMS SQL database and create a vector store for use with RAG. However, I'm unclear on how to implement it when the data to be ingested is stored in a database.

Currently, both overloads of the ProcessAsync method require file system objects:

public async IAsyncEnumerable<IngestionResult> ProcessAsync(
    DirectoryInfo directory,
    string searchPattern = "*.*",
    SearchOption searchOption = SearchOption.TopDirectoryOnly,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)

and

public async IAsyncEnumerable<IngestionResult> ProcessAsync(
    IEnumerable<FileInfo> files,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)

Perhaps I misunderstand its purpose or how it's meant to be used, but it would appear that it can only ingest data originating from files. Is that the case?