Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 22 additions & 12 deletions crates/ast-engine/src/replacer/indent.rs
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,8 @@
//!
//! ## Limitations
//!
//! - Only supports space-based indentation (tabs not fully supported)
//! - Handles both space-based and tab-based indentation; mixed indentation
//! (spaces and tabs on the same line) falls back to space-based re-indentation
//! - Assumes well-formed input indentation
//! - Performance overhead for large code blocks
//! - Complex algorithm with edge cases
Expand Down Expand Up @@ -120,13 +121,13 @@ pub enum DeindentedExtract<'a, C: Content> {

/// Multi-line content with original indentation level recorded.
///
/// Contains the content bytes and the number of spaces that were used
/// for indentation in the original context. The first line's indentation
/// is not included in the content.
/// Contains the content bytes and the number of whitespace characters
/// (spaces or tabs) used for indentation in the original context. The first
/// line's indentation is not included in the content.
///
/// # Fields
/// - Content bytes with relative indentation preserved
/// - Original indentation level (number of spaces)
/// - Original indentation level (number of whitespace characters)
MultiLine(&'a [C::Underlying], usize),
}

Expand Down Expand Up @@ -251,32 +252,40 @@ pub fn get_indent_at_offset<C: Content>(src: &[C::Underlying]) -> usize {
get_indent_at_offset_with_tab::<C>(src).0
}

/// returns (indent, `is_tab`)
/// Returns `(indent_count, is_tab)` for the current line's leading whitespace.
///
/// `is_tab` is `true` only when the entire indentation prefix consists of tab
/// characters. For mixed indentation (e.g. `" \t"`) `is_tab` is `false` so that
/// re-indentation falls back to space-based expansion rather than silently
/// replacing the prefix with all tabs.
pub fn get_indent_at_offset_with_tab<C: Content>(src: &[C::Underlying]) -> (usize, bool) {
let lookahead = src.len().max(MAX_LOOK_AHEAD) - MAX_LOOK_AHEAD;

let mut indent = 0;
let mut is_tab = false;
let mut has_tab = false;
let mut has_space = false;
let new_line = get_new_line::<C>();
let space = get_space::<C>();
let tab = get_tab::<C>();
for c in src[lookahead..].iter().rev() {
if *c == new_line {
return (indent, is_tab);
return (indent, has_tab && !has_space);
}
if *c == space {
indent += 1;
has_space = true;
} else if *c == tab {
indent += 1;
is_tab = true;
has_tab = true;
} else {
indent = 0;
is_tab = false;
has_tab = false;
has_space = false;
}
}
// lookahead == 0 means we have indentation at first line.
if lookahead == 0 && indent != 0 {
(indent, is_tab)
(indent, has_tab && !has_space)
} else {
(0, false)
}
Expand Down Expand Up @@ -328,7 +337,8 @@ mod test {
.count();
let end = source.chars().count() - trailing_white;
let extracted = extract_with_deindent(&source, start..end);
let result_bytes = indent_lines::<String>(0, &extracted, source.contains('\t'));
let (_, is_tab) = get_indent_at_offset_with_tab::<String>(&source.as_bytes()[..start]);
let result_bytes = indent_lines::<String>(0, &extracted, is_tab);
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start/end in test_deindent are derived from .chars().count() and then used as byte offsets (extract_with_deindent(&source, start..end) and &source.as_bytes()[..start]). This is inconsistent for non-ASCII input (multi-byte whitespace or trailing chars) and can panic or produce invalid UTF-8 slices. Consider computing start/end as byte indices (e.g., via char_indices()/len_utf8()), or otherwise deriving the indentation prefix using source.get_range(0..start_byte) where start_byte is guaranteed to be a byte offset.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 36da33f. Replaced the .chars().count() arithmetic with str::trim_start / str::trim_end length subtraction so start and end are always byte offsets:

let leading_ws_bytes = source[offset..].len() - source[offset..].trim_start().len();
let start = offset + leading_ws_bytes;
let end = source.trim_end().len();

Both values are now guaranteed to be valid UTF-8 byte boundaries, so &source.as_bytes()[..start] and extract_with_deindent(&source, start..end) are safe for non-ASCII input.

let actual = std::str::from_utf8(&result_bytes).unwrap();
assert_eq!(actual, expected);
}
Expand Down
2 changes: 1 addition & 1 deletion crates/ast-engine/src/replacer/template.rs
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ fn replace_fixer<D: Doc>(fixer: &TemplateFix, env: &MetaVarEnv<'_, D>) -> Underl
ret.extend_from_slice(&D::Source::decode_str(frag));
}
for ((var, indent, is_tab), frag) in vars.zip(frags) {
if let Some(bytes) = maybe_get_var(env, var, indent.to_owned(), is_tab.to_owned()) {
if let Some(bytes) = maybe_get_var(env, var, *indent, *is_tab) {
ret.extend_from_slice(&bytes);
}
ret.extend_from_slice(&D::Source::decode_str(frag));
Expand Down