Skip to content

Conversation

@youknowone
Copy link
Owner

@youknowone youknowone commented Nov 20, 2025

Summary by CodeRabbit

  • New Features
    • Added Base64 encoding to the standard library with a built-in Python-accessible interface. Users can encode binary data to standard Base64 with correct padding.
    • Includes safeguards for very large inputs (size checks and memory-safety handling) to prevent allocation issues and ensure reliable behavior across edge cases.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 20, 2025

Walkthrough

Adds a new Base64 encoder implemented in Rust with PyO3 bindings as _base64.standard_b64encode, plus stdlib registration for the module; includes output length checks, buffer allocation, chunked encoding, and padding handling.

Changes

Cohort / File(s) Summary
Base64 encoding module
crates/stdlib/src/base64.rs
New file implementing standard Base64 encoding: constants (PAD_BYTE, ENCODE_TABLE), helpers (encoded_output_len, encode_into), chunked 3-byte→4-byte encoding with 1/2-byte padding handling, overflow/memory checks, and a PyO3-exposed standard_b64encode function.
Stdlib registration
crates/stdlib/src/lib.rs
Adds mod base64; and registers the _base64 module in the get_module_inits mapping to expose the PyO3 module at runtime.

Sequence Diagram

sequenceDiagram
    participant Python
    participant PyO3 as _base64::standard_b64encode
    participant Encoder as encode_into

    Python->>PyO3: standard_b64encode(data)
    activate PyO3
    PyO3->>PyO3: compute encoded_output_len(input_len)
    alt output length overflow
        PyO3-->>Python: memory_error
    else valid length
        PyO3->>PyO3: allocate output buffer
        PyO3->>Encoder: encode_into(input, output)
        activate Encoder
        Encoder->>Encoder: process full 3-byte chunks -> 4 bytes each
        Encoder->>Encoder: handle remaining 1 or 2 bytes + padding
        Encoder-->>PyO3: written length
        deactivate Encoder
        PyO3-->>Python: Vec<u8> (encoded bytes)
    end
    deactivate PyO3
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review attention: correctness of 3-byte chunk encoding and padding branches, overflow checks (encoded_output_len vs isize::MAX), memory-allocation and error paths, and PyO3 argument/return handling.

Poem

🐰 I nibble bytes and count in four,
I hop through chunks to pad and more.
With tables, checks, and careful art,
I bind to Python — a tiny part.
Encode away! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title '_base64' is vague and only names the module being added without describing the primary change or purpose of the pull request. Use a more descriptive title that explains what is being implemented, such as 'Add Base64 encoding support' or 'Implement standard_b64encode function for Base64 encoding'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch _base64

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e71701 and 1aef3bd.

📒 Files selected for processing (2)
  • crates/stdlib/src/base64.rs (1 hunks)
  • crates/stdlib/src/lib.rs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/stdlib/src/base64.rs
🔇 Additional comments (2)
crates/stdlib/src/lib.rs (2)

11-11: LGTM! Module declaration follows existing patterns.

The private module declaration is correctly placed and follows the project's conventions for stdlib modules.


127-127: LGTM, original review comment was incorrect.

The registration at line 127 is valid. The base64::make_module resolves correctly through the re-export chain: pub(crate) use _base64::make_module; in base64.rs imports from the _base64 module, which is marked with #[pymodule]. This procedural macro automatically generates the make_module function at compile time, so it won't appear as explicit source code. The pattern is correct and consistent with rustpython's module system.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/stdlib/src/base64.rs (1)

61-90: Docstring mentions altchars argument that is not implemented

The Rust doc comments for standard_b64encode describe an altchars argument and alternative alphabet semantics, but the function signature only accepts data: ArgBytesLike and does not expose any altchars parameter yet. This makes the exported Python docstring misleading.

Consider either updating the docstring to match the current API or adding an altchars parameter and wiring it into the encoding logic.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7ddcd2 and 644a7fb.

📒 Files selected for processing (2)
  • crates/stdlib/src/base64.rs (1 hunks)
  • crates/stdlib/src/lib.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
crates/stdlib/src/lib.rs (3)
crates/stdlib/src/array.rs (1)
  • make_module (5-30)
crates/stdlib/src/pyexpat.rs (1)
  • make_module (4-13)
crates/stdlib/src/sha256.rs (1)
  • make_module (3-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: auto_format
🔇 Additional comments (2)
crates/stdlib/src/lib.rs (1)

11-11: _base64 module wiring looks consistent and correct

mod base64; plus the "_base64" => base64::make_module entry follow the same pattern as existing stdlib modules and correctly match the _base64 pymodule name from base64.rs. No additional cfg-guards seem necessary here.

Also applies to: 124-156

crates/stdlib/src/base64.rs (1)

5-59: Core base64 encoding and size computation look correct and overflow-safe

The combination of encoded_output_len and encode_into is sound:

  • Output length formula ((input_len + 2) / 3) * 4 is implemented with checked_add and checked_mul, preventing usize overflow.
  • encode_into correctly processes full 3-byte chunks and 1- or 2-byte remainders with the standard base64 alphabet and padding, and the returned dst_index matches the precomputed output length (asserted via debug_assert_eq!).

Given that standard_b64encode always allocates output based on encoded_output_len and then calls encode_into, the indexing into output is bounded and consistent.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 644a7fb and 0e71701.

📒 Files selected for processing (2)
  • crates/stdlib/src/base64.rs (1 hunks)
  • crates/stdlib/src/lib.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: auto_format
🔇 Additional comments (5)
crates/stdlib/src/lib.rs (1)

11-11: LGTM! Module registration follows project conventions.

The module declaration and registration follow the established patterns in this file, with appropriate naming convention (underscore prefix for private modules) and placement.

Also applies to: 127-127

crates/stdlib/src/base64.rs (4)

5-6: LGTM! Standard Base64 constants.

The padding byte and encoding table correctly implement the standard Base64 alphabet as defined in RFC 4648.


8-14: LGTM! Proper overflow protection.

The output length calculation is mathematically correct for Base64 encoding (4 bytes output per 3 bytes input), and the use of checked arithmetic prevents integer overflow.


17-59: LGTM! Core encoding logic is correct.

The bit manipulation correctly implements Base64 encoding:

  • Full 3-byte chunks are properly converted to four 6-bit indices
  • Remaining bytes (1 or 2) are correctly handled with appropriate padding
  • The unreachable!() macro is safe here since the match covers all possible cases (0, 1, or 2 remaining bytes)

74-76: Good Python compatibility check.

The isize::MAX check is important for Python compatibility, as Python's memory model uses signed sizes. This prevents potential issues when passing the buffer to Python code.

Comment on lines 61 to 90
#[pymodule(name = "_base64")]
mod _base64 {
use crate::vm::{PyResult, VirtualMachine, function::ArgBytesLike};

#[pyfunction]
fn standard_b64encode(data: ArgBytesLike, vm: &VirtualMachine) -> PyResult<Vec<u8>> {
data.with_ref(|input| {
let input_len = input.len();

let Some(output_len) = super::encoded_output_len(input_len) else {
return Err(vm.new_memory_error("output length overflow".to_owned()));
};

if output_len > isize::MAX as usize {
return Err(vm.new_memory_error("output too large".to_owned()));
}

let mut output = vec![0u8; output_len];
let written = super::encode_into(input, &mut output);
debug_assert_eq!(written, output_len);

Ok(output)
})
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify completeness of the module implementation.

The module currently only implements standard_b64encode. CPython's _base64 module typically provides both encoding and decoding functions. Additionally, consider whether using the well-established base64 crate would be preferable to a custom implementation for better performance and test coverage.

Run the following script to check what functions CPython's _base64 module provides:


🏁 Script executed:

#!/bin/bash
# Check what functions are typically in Python's _base64 module
python3 -c "import _base64; print(dir(_base64))" 2>/dev/null || echo "_base64 module not available in standard Python"

# Search for any existing tests for this module in the repository
fd -e rs -e py test | xargs rg -l "base64|b64encode" | head -20

Length of output: 47


I need to gather more information. Let me search for CPython's _base64 module documentation and check the full implementation in the repository.

Now let me search the codebase to see the complete implementation:

Let me search the repository for the complete base64 implementation:

[/run_scripts]


🏁 Script executed:

#!/bin/bash
# Check the complete base64.rs file
cat crates/stdlib/src/base64.rs

# Also check if base64 crate is in dependencies
grep -r "base64" Cargo.toml 2>/dev/null | head -20

Length of output: 2948


Now let me verify what CPython's _base64 module actually provides:

[/web_search]


🌐 Web query:

CPython _base64 module functions standard_b64encode standard_b64decode

💡 Result:

The CPython _base64 module provides the C-optimized implementations used by the Python base64 module. The two functions you asked about behave as follows:

  • standard_b64encode(s): encode the bytes-like object s using the standard Base64 alphabet (+/ and = padding) and return the encoded bytes. [1][2]
  • standard_b64decode(s): decode a Base64-encoded bytes-like object or ASCII string s using the standard Base64 alphabet and return the decoded bytes; incorrect padding or invalid characters raise an error (binascii.Error / TypeError depending on the API and Python version). [1][2]

Use the Python-level aliases in the base64 module (base64.standard_b64encode / base64.standard_b64decode) which call into the _base64 implementation for speed. [1][2]

Sources: Python base64 documentation and CPython base64 module source. [1] Python docs (base64). [2] CPython/base64 implementation (stdlib source).


Add missing standard_b64decode function to complete the _base64 module implementation.

The CPython _base64 module provides standard_b64decode alongside standard_b64encode, but the current implementation only includes encoding. The module needs to implement standard_b64decode to match CPython's API. Additionally, consider whether using an established base64 crate would be preferable to the custom implementation for maintenance and performance.

🤖 Prompt for AI Agents
In crates/stdlib/src/base64.rs around lines 61 to 85, the module implements
standard_b64encode but is missing standard_b64decode; add a #[pyfunction] fn
standard_b64decode(data: ArgBytesLike, vm: &VirtualMachine) -> PyResult<Vec<u8>>
that mirrors the encode path: call data.with_ref(|input| { ... }), compute the
decoded output length using the existing helper for decoded size (or implement
decoded_output_len), check for overflow and isize::MAX, allocate a Vec<u8> with
that capacity, call the existing decode_into (or implement the decode logic) to
write into the buffer, handle invalid input by returning vm.new_value_error or
vm.new_bytes_like_error as appropriate, return Ok(output[..written].to_vec())
and add the function to the module exports so _base64 exposes both
standard_b64encode and standard_b64decode; alternatively, consider replacing the
custom encoder/decoder with a well-tested base64 crate and adapt wrappers
accordingly.

Comment on lines 66 to 84
fn standard_b64encode(data: ArgBytesLike, vm: &VirtualMachine) -> PyResult<Vec<u8>> {
data.with_ref(|input| {
let input_len = input.len();

let Some(output_len) = super::encoded_output_len(input_len) else {
return Err(vm.new_memory_error("output length overflow".to_owned()));
};

if output_len > isize::MAX as usize {
return Err(vm.new_memory_error("output too large".to_owned()));
}

let mut output = vec![0u8; output_len];
let written = super::encode_into(input, &mut output);
debug_assert_eq!(written, output_len);

Ok(output)
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add docstring for the Python function.

The standard_b64encode function lacks a docstring, which means Python users won't see any documentation when they call help(_base64.standard_b64encode). Consider adding a docstring that describes the function's purpose, parameters, and return value.

Apply this diff to add a docstring:

     #[pyfunction]
     fn standard_b64encode(data: ArgBytesLike, vm: &VirtualMachine) -> PyResult<Vec<u8>> {
+        /// Encode bytes using the standard Base64 alphabet.
+        ///
+        /// Args:
+        ///     data: Bytes-like object to encode
+        ///
+        /// Returns:
+        ///     Encoded bytes using Base64 encoding
         data.with_ref(|input| {

Committable suggestion skipped: line range outside the PR's diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants