Skip to content

The undefined behavior in buffered::bufreader::ReadBuffer::read_to_string #11

@WIZeaz

Description

@WIZeaz

Description

buffered::bufreader::ReadBuffer implements axio::Read in arceos-main/modules/axio/src/buffered/bufreader.rs. In the implementation of read_to_string:

// The inner reader might have an optimized `read_to_end`. Drain our buffer and then
// delegate to the inner implementation.
#[cfg(feature = "alloc")]
fn read_to_string(&mut self, buf: &mut String) -> Result<usize> {
// In the general `else` case below we must read bytes into a side buffer, check
// that they are valid UTF-8, and then append them to `buf`. This requires a
// potentially large memcpy.
//
// If `buf` is empty--the most common case--we can leverage `append_to_string`
// to read directly into `buf`'s internal byte buffer, saving an allocation and
// a memcpy.
if buf.is_empty() {
// `append_to_string`'s safety relies on the buffer only being appended to since
// it only checks the UTF-8 validity of new data. If there were existing content in
// `buf` then an untrustworthy reader (i.e. `self.inner`) could not only append
// bytes but also modify existing bytes and render them invalid. On the other hand,
// if `buf` is empty then by definition any writes must be appends and
// `append_to_string` will validate all of the new bytes.
unsafe { crate::append_to_string(buf, |b| self.read_to_end(b)) }
} else {
// We cannot append our byte buffer directly onto the `buf` String as there could
// be an incomplete UTF-8 sequence that has only been partially read. We must read
// everything into a side buffer first and then call `from_utf8` on the complete
// buffer.
let mut bytes = Vec::new();
self.read_to_end(&mut bytes)?;
let string = core::str::from_utf8(&bytes).map_err(|_| {
axerrno::ax_err_type!(InvalidData, "stream did not contain valid UTF-8")
})?;
*buf += string;
Ok(string.len())
}
}
}

When buf is empty, read_to_string calls append_to_string, which reads raw bytes directly into the buf and performs UTF-8 validation afterwards. If validation fails, the already written bytes are not rolled back, leaving the buf in an invalid UTF-8 state and violating Rust’s String invariant.

When the destination buf is not empty, append_to_string uses a temporary buffer and validates UTF-8 before appending, so this issue does not occur. This makes the behavior depend on buf.is_empty(), which is inconsistent.

Reproduction

This issue can be reproduced by the PoC following:

#[test]
fn poc_read_to_string_empty() {
    let a = [0xFFu8];
    let mut s = String::new();
    let mut reader = BufReader::new(&a[..]);

    let res = reader.read_to_string(&mut s);

    // read_to_string return Err, but s is already changed!
    assert!(res.is_err());

    // Now s = [0xFF], which is an invalid string
    // String::from_utf8 will fail
    String::from_utf8( s.as_bytes().to_vec()).unwrap(); // fail here
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions