-
Notifications
You must be signed in to change notification settings - Fork 0
feat: Enable full object checksum validation on JSON path #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Adds validation for client-provided (pre-calculated) and client-calculated CRC32C and MD5 hashes when the final upload request is made via the JSON API path (status 200). This ensures consistency checks are performed even when the `Upload` stream is finalized, preventing silent data corruption if the server-reported hash (in the response body) mismatches the client's expected hash.
Summary of ChangesHello @thiyaguk09, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the reliability of object uploads by integrating comprehensive client-side checksum validation. It ensures that data uploaded through the JSON API path maintains integrity by verifying CRC32C and MD5 hashes, either provided by the client or calculated during the upload process, against the hashes reported by the server. This proactive approach aims to eliminate silent data corruption and provide immediate feedback on data consistency. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds checksum validation for uploads using the JSON API path. The changes introduce logic to calculate CRC32C and MD5 hashes on the client-side and validate them against the server's response. My review has identified one critical issue and one area for improvement. There is a critical bug in hash-stream-validator.ts where hash.digest() could be called multiple times on the same hash object, which will cause a runtime error. Additionally, there is some code duplication in resumable-upload.ts for handling checksum headers in single-chunk and multi-chunk uploads, which could be refactored for better maintainability. Overall, the changes are well-structured and address the feature requirement, but the critical issue should be fixed before merging.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds checksum validation for client-side calculated CRC32C and MD5 hashes on the final upload request. The changes are logical and well-implemented, introducing new configuration options and updating the upload process to include the X-Goog-Hash header.
I've identified a critical issue in src/hash-stream-validator.ts that could lead to a runtime error due to a race condition when calculating the MD5 digest. I've also pointed out an opportunity for refactoring in src/resumable-upload.ts to reduce code duplication and improve maintainability.
Once these points are addressed, the PR should be in good shape.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces checksum validation for resumable uploads, a valuable addition for data integrity. The implementation is well-structured, and the new tests are thorough. However, I've identified a critical logic issue in how single-chunk uploads are handled when the content length is unknown, which could result in incorrect hash headers being sent. My review includes a suggested fix for this issue. With that addressed, the changes look solid.
Adds the 'Validation of Client Checksums Against Server Response' test suite. Fixes test failures in client-provided hash scenarios by updating mock responses to ensure server-reported checksums match the client's expected values.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable feature for ensuring data integrity by adding checksum validation for uploads. The changes allow for both client-calculated and pre-calculated CRC32C and MD5 hashes to be sent and validated against the server's response. The implementation is mostly solid, with good test coverage for the new functionality.
I've identified one critical bug in the logic for constructing the X-Goog-Hash header, which could lead to pre-calculated hashes being silently ignored in certain scenarios. I've provided a suggested fix for this. I also made a minor suggestion to refactor a small part of the code for improved clarity.
Once the critical issue is addressed, this will be a great addition to the library.
Refactors four duplicate test cases (CRC32C/MD5 success and failure) into a single, parameterized test block within the 'Validation of Client Checksums Against Server Response' suite. This improves test clarity and reduces code duplication by dynamically generating test scenarios for post-upload hash validation.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable feature for ensuring data integrity during uploads by adding client-side checksum validation for CRC32C and MD5. The implementation is well-structured, with clear separation of concerns for hash calculation and application. The test coverage is extensive and covers many scenarios, including single-chunk, multi-chunk, and pre-calculated hashes.
I've found one critical issue related to streaming uploads that could cause crashes, a high-severity issue where validation may not run for streams, and one medium-severity suggestion to improve code maintainability by reducing duplication. Please see the detailed comments.
|
/gemini summary |
Summary of ChangesThis pull request significantly enhances data integrity for object uploads by introducing comprehensive client-side checksum validation. It ensures that data uploaded through the JSON API path remains consistent by verifying both client-provided and client-calculated CRC32C and MD5 hashes against the server's reported hashes. This mechanism proactively detects and prevents silent data corruption, providing a more reliable upload experience. Highlights
Changelog
Activity
|
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces client-side checksum validation for uploads, a great feature for ensuring data integrity. The implementation is comprehensive, adding new configuration options, hash calculation, and validation against server responses. The test coverage is extensive, with both unit and integration tests for various success and failure scenarios.
However, I've identified a critical issue in the handling of single-chunk uploads where the content is streamed in multiple parts. The hash is finalized prematurely, leading to incorrect checksums. I've provided a suggestion to fix this by buffering the content before hash calculation, aligning it with the logic for multi-chunk uploads.
|
/gemini summary |
Summary of ChangesThis pull request significantly enhances the data integrity of object uploads by implementing robust client-side checksum validation for both CRC32C and MD5 hashes. It introduces new configuration options for users to either provide pre-calculated hashes or enable automatic calculation, ensuring that data remains consistent throughout the upload process. Any detected mismatch between client and server hashes will proactively terminate the upload, preventing silent data corruption and improving overall reliability. Highlights
Changelog
Activity
|
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces client-side checksum validation for resumable uploads, supporting both CRC32C and MD5 hashes. It adds new UploadConfig options (clientCrc32c, clientMd5Hash, crc32c, md5) to allow users to provide pre-calculated hashes or enable client-side calculation. A new HashStreamValidator class is integrated into the Upload stream to compute hashes as data is written. The X-Goog-Hash header is now conditionally applied to upload requests, containing either the calculated or provided hashes, specifically on the final chunk for multi-chunk uploads or on the single request for non-resumable uploads. After a successful upload response, the client-side calculated or provided hashes are compared against the server-returned hashes, and the upload stream is destroyed with an UPLOAD_MISMATCH error if a discrepancy is found. New unit and system tests have been added to cover various scenarios, including successful hash validation, checksum mismatches, and correct X-Goog-Hash header injection for both single and multi-chunk uploads.
This commit introduces several stability fixes for the ResumableUpload class: 1. **Fixes Timeouts in Unit Tests:** Updates `makeRequestStream` mocks to fully drain the request body stream, resolving stream consumption deadlocks and timeouts in `#startUploading` unit tests. 2. **Fixes Multi-Part Hang:** Correctly finalizes the `pipeline` for partial chunks (`isPartialUpload=true`) by calling `pipelineCallback()` immediately after successful chunk upload, preventing indefinite hangs in multi-session tests. 3. **Fixes Single-Chunk Checksum Missing Header:** Applies the `X-Goog-Hash` header unconditionally in single-chunk mode if a validator is present, ensuring checksum validation is active even when `contentLength` is unknown.
Simplify `HashStreamValidator._flush` by utilizing `md5Digest` getter.
Description
Impact
Testing
Additional Information
Checklist
Fixes #