-
Notifications
You must be signed in to change notification settings - Fork 1
_base64 #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
_base64 #6
Conversation
WalkthroughAdds a new Base64 encoder implemented in Rust with PyO3 bindings as Changes
Sequence DiagramsequenceDiagram
participant Python
participant PyO3 as _base64::standard_b64encode
participant Encoder as encode_into
Python->>PyO3: standard_b64encode(data)
activate PyO3
PyO3->>PyO3: compute encoded_output_len(input_len)
alt output length overflow
PyO3-->>Python: memory_error
else valid length
PyO3->>PyO3: allocate output buffer
PyO3->>Encoder: encode_into(input, output)
activate Encoder
Encoder->>Encoder: process full 3-byte chunks -> 4 bytes each
Encoder->>Encoder: handle remaining 1 or 2 bytes + padding
Encoder-->>PyO3: written length
deactivate Encoder
PyO3-->>Python: Vec<u8> (encoded bytes)
end
deactivate PyO3
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
crates/stdlib/src/base64.rs (1)
61-90: Docstring mentionsaltcharsargument that is not implementedThe Rust doc comments for
standard_b64encodedescribe analtcharsargument and alternative alphabet semantics, but the function signature only acceptsdata: ArgBytesLikeand does not expose any altchars parameter yet. This makes the exported Python docstring misleading.Consider either updating the docstring to match the current API or adding an
altcharsparameter and wiring it into the encoding logic.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/stdlib/src/base64.rs(1 hunks)crates/stdlib/src/lib.rs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
crates/stdlib/src/lib.rs (3)
crates/stdlib/src/array.rs (1)
make_module(5-30)crates/stdlib/src/pyexpat.rs (1)
make_module(4-13)crates/stdlib/src/sha256.rs (1)
make_module(3-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Check the WASM package and demo
- GitHub Check: Check Rust code with clippy
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: auto_format
🔇 Additional comments (2)
crates/stdlib/src/lib.rs (1)
11-11: _base64 module wiring looks consistent and correct
mod base64;plus the"_base64" => base64::make_moduleentry follow the same pattern as existing stdlib modules and correctly match the_base64pymodule name frombase64.rs. No additional cfg-guards seem necessary here.Also applies to: 124-156
crates/stdlib/src/base64.rs (1)
5-59: Core base64 encoding and size computation look correct and overflow-safeThe combination of
encoded_output_lenandencode_intois sound:
- Output length formula
((input_len + 2) / 3) * 4is implemented withchecked_addandchecked_mul, preventing usize overflow.encode_intocorrectly processes full 3-byte chunks and 1- or 2-byte remainders with the standard base64 alphabet and padding, and the returneddst_indexmatches the precomputed output length (asserted viadebug_assert_eq!).Given that
standard_b64encodealways allocatesoutputbased onencoded_output_lenand then callsencode_into, the indexing intooutputis bounded and consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
crates/stdlib/src/base64.rs(1 hunks)crates/stdlib/src/lib.rs(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Check Rust code with clippy
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run tests under miri
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: auto_format
🔇 Additional comments (5)
crates/stdlib/src/lib.rs (1)
11-11: LGTM! Module registration follows project conventions.The module declaration and registration follow the established patterns in this file, with appropriate naming convention (underscore prefix for private modules) and placement.
Also applies to: 127-127
crates/stdlib/src/base64.rs (4)
5-6: LGTM! Standard Base64 constants.The padding byte and encoding table correctly implement the standard Base64 alphabet as defined in RFC 4648.
8-14: LGTM! Proper overflow protection.The output length calculation is mathematically correct for Base64 encoding (4 bytes output per 3 bytes input), and the use of checked arithmetic prevents integer overflow.
17-59: LGTM! Core encoding logic is correct.The bit manipulation correctly implements Base64 encoding:
- Full 3-byte chunks are properly converted to four 6-bit indices
- Remaining bytes (1 or 2) are correctly handled with appropriate padding
- The unreachable!() macro is safe here since the match covers all possible cases (0, 1, or 2 remaining bytes)
74-76: Good Python compatibility check.The
isize::MAXcheck is important for Python compatibility, as Python's memory model uses signed sizes. This prevents potential issues when passing the buffer to Python code.
| #[pymodule(name = "_base64")] | ||
| mod _base64 { | ||
| use crate::vm::{PyResult, VirtualMachine, function::ArgBytesLike}; | ||
|
|
||
| #[pyfunction] | ||
| fn standard_b64encode(data: ArgBytesLike, vm: &VirtualMachine) -> PyResult<Vec<u8>> { | ||
| data.with_ref(|input| { | ||
| let input_len = input.len(); | ||
|
|
||
| let Some(output_len) = super::encoded_output_len(input_len) else { | ||
| return Err(vm.new_memory_error("output length overflow".to_owned())); | ||
| }; | ||
|
|
||
| if output_len > isize::MAX as usize { | ||
| return Err(vm.new_memory_error("output too large".to_owned())); | ||
| } | ||
|
|
||
| let mut output = vec![0u8; output_len]; | ||
| let written = super::encode_into(input, &mut output); | ||
| debug_assert_eq!(written, output_len); | ||
|
|
||
| Ok(output) | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify completeness of the module implementation.
The module currently only implements standard_b64encode. CPython's _base64 module typically provides both encoding and decoding functions. Additionally, consider whether using the well-established base64 crate would be preferable to a custom implementation for better performance and test coverage.
Run the following script to check what functions CPython's _base64 module provides:
🏁 Script executed:
#!/bin/bash
# Check what functions are typically in Python's _base64 module
python3 -c "import _base64; print(dir(_base64))" 2>/dev/null || echo "_base64 module not available in standard Python"
# Search for any existing tests for this module in the repository
fd -e rs -e py test | xargs rg -l "base64|b64encode" | head -20Length of output: 47
I need to gather more information. Let me search for CPython's _base64 module documentation and check the full implementation in the repository.
Now let me search the codebase to see the complete implementation:
Let me search the repository for the complete base64 implementation:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Check the complete base64.rs file
cat crates/stdlib/src/base64.rs
# Also check if base64 crate is in dependencies
grep -r "base64" Cargo.toml 2>/dev/null | head -20Length of output: 2948
Now let me verify what CPython's _base64 module actually provides:
[/web_search]
🌐 Web query:
CPython _base64 module functions standard_b64encode standard_b64decode
💡 Result:
The CPython _base64 module provides the C-optimized implementations used by the Python base64 module. The two functions you asked about behave as follows:
- standard_b64encode(s): encode the bytes-like object s using the standard Base64 alphabet (+/ and = padding) and return the encoded bytes. [1][2]
- standard_b64decode(s): decode a Base64-encoded bytes-like object or ASCII string s using the standard Base64 alphabet and return the decoded bytes; incorrect padding or invalid characters raise an error (binascii.Error / TypeError depending on the API and Python version). [1][2]
Use the Python-level aliases in the base64 module (base64.standard_b64encode / base64.standard_b64decode) which call into the _base64 implementation for speed. [1][2]
Sources: Python base64 documentation and CPython base64 module source. [1] Python docs (base64). [2] CPython/base64 implementation (stdlib source).
Add missing standard_b64decode function to complete the _base64 module implementation.
The CPython _base64 module provides standard_b64decode alongside standard_b64encode, but the current implementation only includes encoding. The module needs to implement standard_b64decode to match CPython's API. Additionally, consider whether using an established base64 crate would be preferable to the custom implementation for maintenance and performance.
🤖 Prompt for AI Agents
In crates/stdlib/src/base64.rs around lines 61 to 85, the module implements
standard_b64encode but is missing standard_b64decode; add a #[pyfunction] fn
standard_b64decode(data: ArgBytesLike, vm: &VirtualMachine) -> PyResult<Vec<u8>>
that mirrors the encode path: call data.with_ref(|input| { ... }), compute the
decoded output length using the existing helper for decoded size (or implement
decoded_output_len), check for overflow and isize::MAX, allocate a Vec<u8> with
that capacity, call the existing decode_into (or implement the decode logic) to
write into the buffer, handle invalid input by returning vm.new_value_error or
vm.new_bytes_like_error as appropriate, return Ok(output[..written].to_vec())
and add the function to the module exports so _base64 exposes both
standard_b64encode and standard_b64decode; alternatively, consider replacing the
custom encoder/decoder with a well-tested base64 crate and adapt wrappers
accordingly.
crates/stdlib/src/base64.rs
Outdated
| fn standard_b64encode(data: ArgBytesLike, vm: &VirtualMachine) -> PyResult<Vec<u8>> { | ||
| data.with_ref(|input| { | ||
| let input_len = input.len(); | ||
|
|
||
| let Some(output_len) = super::encoded_output_len(input_len) else { | ||
| return Err(vm.new_memory_error("output length overflow".to_owned())); | ||
| }; | ||
|
|
||
| if output_len > isize::MAX as usize { | ||
| return Err(vm.new_memory_error("output too large".to_owned())); | ||
| } | ||
|
|
||
| let mut output = vec![0u8; output_len]; | ||
| let written = super::encode_into(input, &mut output); | ||
| debug_assert_eq!(written, output_len); | ||
|
|
||
| Ok(output) | ||
| }) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring for the Python function.
The standard_b64encode function lacks a docstring, which means Python users won't see any documentation when they call help(_base64.standard_b64encode). Consider adding a docstring that describes the function's purpose, parameters, and return value.
Apply this diff to add a docstring:
#[pyfunction]
fn standard_b64encode(data: ArgBytesLike, vm: &VirtualMachine) -> PyResult<Vec<u8>> {
+ /// Encode bytes using the standard Base64 alphabet.
+ ///
+ /// Args:
+ /// data: Bytes-like object to encode
+ ///
+ /// Returns:
+ /// Encoded bytes using Base64 encoding
data.with_ref(|input| {Committable suggestion skipped: line range outside the PR's diff.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.