-
Notifications
You must be signed in to change notification settings - Fork 19
feat(chunks): add POST /v1/chunks endpoint #660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
✨ Finishing touches🧪 Generate unit tests (beta)
Tip 🧪 Unit Test Generation v2 is now available!We have significantly improved our unit test generation capabilities. To enable: Add this to your reviews:
finishing_touches:
unit_tests:
enabled: trueTry it out by using the Have feedback? Share your thoughts on our Discord thread! Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
3170632 to
f3b7959
Compare
f3b7959 to
845690e
Compare
| await self._upsert( | ||
| collection_id=collection_id, | ||
| document_id=document_id, | ||
| document_name=document_name, | ||
| chunks=chunks, | ||
| redis_client=redis_client, | ||
| postgres_session=postgres_session, | ||
| model_registry=model_registry, | ||
| request_context=request_context, | ||
| ) |
Check failure
Code scanning / CodeQL
Wrong number of arguments in a call Error
method DocumentManager._upsert
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
To fix the problem, the call to _upsert must supply all required arguments, specifically elasticsearch_vector_store and elasticsearch_client. The _upsert method signature is already correct and makes use of these parameters when performing the final upsert into Elasticsearch, so we should not change the method definition or remove its parameters. Instead, we update the _vectorize (or equivalent) method around line 333–382 to pass the two missing keyword arguments when invoking _upsert. Both elasticsearch_vector_store and elasticsearch_client are already in scope in this method (they are used at line 354 to compute last_chunk_id), so we simply forward them to _upsert. No new imports or definitions are needed; this is a straightforward addition of two keyword arguments at the call site, preserving existing functionality.
Concretely, in api/helpers/_documentmanager.py, in the block starting at line 366 where await self._upsert( is called, add elasticsearch_vector_store=elasticsearch_vector_store, and elasticsearch_client=elasticsearch_client, to the argument list, aligning with the parameters declared in _upsert at line 520.
-
Copy modified lines R376-R377
| @@ -373,6 +373,8 @@ | ||
| postgres_session=postgres_session, | ||
| model_registry=model_registry, | ||
| request_context=request_context, | ||
| elasticsearch_vector_store=elasticsearch_vector_store, | ||
| elasticsearch_client=elasticsearch_client, | ||
| ) | ||
| except Exception as e: | ||
| raise VectorizationFailedException(detail=f"Vectorization failed: {e}") |
845690e to
ffda2ec
Compare
| ) | ||
| raise VectorizationFailedException(detail=f"Vectorization failed: {e}") | ||
| if file: | ||
| chunks = [InputChunk(id=i, content=chunk, metadata=metadata) for i, chunk in enumerate(chunks)] |
Check failure
Code scanning / CodeQL
Potentially uninitialized local variable Error
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general, this pattern is fixed by ensuring the local variable is initialized on all code paths before any possible use. That can be done by either (a) initializing it to a default value before any conditional branches, or (b) restructuring the code so that any reads are strictly dominated by the write (for example, by nesting the later logic inside the same if block).
Here, the least intrusive fix that preserves existing behavior is to initialize chunks to an empty list before the if file: block. Because all uses of chunks are inside if file: blocks, this will not change behavior: when file is truthy, chunks will be overwritten by the parsed/split content as before; when file is falsy, the lower if file: will skip the entire vectorization branch, and the initial empty list will never be used. To implement this, add chunks: list[str] = [] (or simply chunks = []) immediately after computing document_name and before the if file: block in api/helpers/_documentmanager.py.
No new imports or helper methods are required; we only add one local initialization line.
-
Copy modified line R192
| @@ -189,6 +189,7 @@ | ||
|
|
||
| # get document name | ||
| document_name = name or file.filename if file else name | ||
| chunks: list[str] = [] | ||
|
|
||
| if file: | ||
| # parse the file |
ffda2ec to
4a93330
Compare
No description provided.