-
Couldn't load subscription status.
- Fork 1.2k
fix!: remove chunk_id property from Chunk class #3954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
this relates to conversation in #3895, fixes like this will make the API easier to split into a separate extensions package. |
| return str(uuid.UUID(hashlib.sha256(hash_input).hexdigest()[:32])) | ||
|
|
||
| @property | ||
| def chunk_id(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the bigger issue is its existence in the API file itself. The API should never have behavior like this at all. It should be data types and method declarations. Not implementation. Can we actually remove the usage of this property itself? You should use immutable datatypes -- something at the top level in the routers, etc. should say "oh there is no chunk id so I will generate one and then I will initialize an immutable object with this chunk id"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes sure, that makes more sense
| chunk_window = f"{i}-{i + len(toks)}" | ||
| chunk_id = generate_chunk_id(chunk, text, chunk_window) | ||
| chunk_metadata = metadata.copy() | ||
| chunk_metadata["chunk_id"] = chunk_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need this in both places
chunk_id in the Chunk class executes actual logic to compute a chunk ID. This sort of logic should not live in the API spec. Instead, the providers should be in charge of calling generate_chunk_id, and pass it to `Chunk`. this removes the incorrect dependency between Provider impl and API impl Signed-off-by: Charlie Doern <[email protected]>
What does this PR do?
chunk_id in the Chunk class executes actual logic to compute a chunk ID. This sort of logic should not live in the API spec.
Instead, the providers should be in charge of calling generate_chunk_id, and pass it to
Chunk.this removes the incorrect dependency between Provider impl and API impl