Labels: RFC, Store
Description
I currently use the code below to index all non-empty Markdown files. I feel this use case is common enough to warrant better DX by integrating the Finder directly, but I'm curious to hear alternative solutions for making the loader more flexible.
$documentsLoader = new TextFileLoader();
$vectorizer = new Vectorizer(
    $this->aiPlatform->getPlatform(),
    AIPlatform::EMBEDDING_MODEL
);

// Collect every non-empty Markdown file below $folderPath.
$docFiles = Finder::create()->files()->in($folderPath)->name('*.md')->size('> 0')->getIterator();
$docsArray = \array_map(
    fn(\SplFileInfo $file) => $file->getRealPath() ?: throw new \RuntimeException('File path not found'),
    iterator_to_array($docFiles)
);

$output->writeln(\sprintf('Indexing %d documents...', \count($docsArray)));

$indexer = new Indexer(
    loader: $documentsLoader,
    vectorizer: $vectorizer,
    store: $this->vektor,
    source: $docsArray,
    transformers: [
        new RecursiveCharacterTextTransformer(
            // "\n" needs double quotes; '\n' would be a literal backslash followed by "n".
            separators: ['#', '##', "\n", ' '],
            chunkSize: 1000,
        )
    ]
);
$indexer->index();
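One alternative that needs no change to the loader is to move the file-collection step into a small helper. The following is only a sketch of that idea: `collectNonEmptyMarkdown()` is a hypothetical name, not part of Symfony AI or the Finder component, and it uses plain SPL iterators so it runs without any dependency. Unlike Finder, it does not skip dot files or VCS directories by default.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Symfony AI or Finder): recursively collects
 * the real paths of all non-empty *.md files below $folderPath.
 *
 * @return list<string>
 */
function collectNonEmptyMarkdown(string $folderPath): array
{
    if (!is_dir($folderPath)) {
        throw new \InvalidArgumentException(sprintf('"%s" is not a directory.', $folderPath));
    }

    // Walk the directory tree; SKIP_DOTS omits the "." and ".." entries.
    $files = new \RecursiveIteratorIterator(
        new \RecursiveDirectoryIterator($folderPath, \FilesystemIterator::SKIP_DOTS)
    );

    $paths = [];
    foreach ($files as $file) {
        /** @var \SplFileInfo $file */
        // Keep regular files with an "md" extension (case-sensitive) and a size above zero.
        if (!$file->isFile() || 'md' !== $file->getExtension() || 0 === $file->getSize()) {
            continue;
        }

        $realPath = $file->getRealPath();
        if (false !== $realPath) {
            $paths[] = $realPath;
        }
    }

    // Sort for a deterministic order; the result is a plain 0-indexed list.
    sort($paths);

    return $paths;
}
```

With such a helper, the `Finder` and `array_map` lines above could be replaced by `source: collectNonEmptyMarkdown($folderPath)` in the `Indexer` call, assuming `source` keeps accepting an array of file paths as it does in the snippet above.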