Skip to content

Commit 724abef

Browse files
authored
fix(assembly): regression in link-time symbol resolution involving quoted path components (#2627)
* test: add regression test for linking items with quoted path components This test fails due to some subtle issues around how symbols and paths are handled during parsing, semantic analysis, and linking/assembly. * fix(assembly): unquote quoted path components when converting to str This ensures that converting a path component to an identifier/str, then pushing that identifier/str as a path component produces an identical path. While we want to preserve quotes in the underlying buffer of Path/PathBuf, we don't want to surface quotes in strings produced by the Path/PathBuf APIs. This is consistent with how Ident internally represents symbols (i.e. unquoted). Instead, we rely on the builder/constructor methods of `PathBuf` to restore quotes as-needed. Note that it is possible to construct a path involving quoted components that don't actually need to be quoted. In a future commit, canonicalization will be added so that you can always obtain a canonical form from a possible non-canonical path. * fix(assembly): ensure PathComponent::requires_quoting considers whether a string is a valid identifier, not just whether it contains `::` * fix(assembly): distinguish path join semantics for paths vs components Previously, `Path::join` required the item to be joined to implement `AsRef<Path>`, and implementations for this exist for both `str` and `Path`/`PathBuf` - thus, it always treated the item to be joined as a full path. The problem is that we are frequently obtaining a path component as a string, and then constructing new paths with `Path::join`. When these strings are produced, they are produced unquoted - consequently, if those paths happened to contain `::` delimiters, `Path::join` would produce a path Fundamentally, the issue is that the semantics of `Path::join` differ depending on whether you are joining a `Path`-like item, or a `PathComponent`-like item. We are always joining strings to a path as a single component, never as a multi-component path; while we always join paths to a path as multi-component paths. This commit reworks `Path::join` to distinguish between the two use cases. It provides a way to be explicit about what behavior you want from `join`, while still providing the ergonomics from before. There is still a subtle edge case where mistakes could be made, if someone is attempting to join a multi-component path in string form (it will now be treated as a single quoted component) - but we have no such uses in our code base, and this is now documented clearly to warn those who might do such a thing. * fix(assembly): simplify path handling in grammar * Properly handle quoted path components * Remove unnecessary grammar rules * fix(build): broken Makefile targets * test(assembly): add test for intra-kernel library linking * fix(assembly): implement path canonicalization and fix issues uncovered This commit introduces two main features: * Automatic canonicalization of paths when constructed via PathBuf * Explicit canonicalization of Path via `Path::canonicalize` * Tests were added for joining and canonicalization It also addresses the following issues: * We no longer treat $kernel/$exec specially during component parsing and in certain Path methods like `is_absolute`, instead, these paths are canonicalized to absolute and then treated like any other path. * Some of the TryFrom/From conversions were not using the semantically correct method when constructing a PathBuf * Ident::requires_quoting was unused as a method, but has value as a static function, and has been converted to one. It was also updated to properly consider whether an identifier containing $ requires escaping based on whether it is recognized as a reserved symbol, e.g. $kernel * Deserialization of paths was requiring that the input be a borrowed str, due to reusing the deserializer for Path, this is unnecessarily limiting, and caused deserialization errors when the input string did not have a lifetime sufficient to allow the input to be borrowed. Instead, we now deserialize via PathBuf, and PathBufs deserialization code was improved to allow deserializing from both borrowed and owned strings, and to provide better error feedback
1 parent 2e66bd2 commit 724abef

File tree

13 files changed

+708
-227
lines changed

13 files changed

+708
-227
lines changed

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,11 @@
11
# Changelog
22

3+
## 0.20.5 (TBD)
4+
5+
- Fixed issue with deserialization of Paths due to lifetime restrictions [#2627](https://github.com/0xMiden/miden-vm/pull/2627)
6+
- Implemented path canonicalization and modified Path/PathBuf APIs to canonicalize paths during construction. This also addressed some issues uncovered during testing where some APIs were not canonicalizing paths, or path-related functions were inconsistent in their behavior due to special-casing that was previously needed [#2627](https://github.com/0xMiden/miden-vm/pull/2627)
7+
8+
39
## 0.20.4 (2026-01-30)
410

511
- Fixed issue with handling of quoted components in `PathBuf` [#2618](https://github.com/0xMiden/miden-vm/pull/2618)

Makefile

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ help:
1414
@printf " make test-assembly # Test assembly crate\n"
1515
@printf " make test-assembly-syntax # Test assembly-syntax crate\n"
1616
@printf " make test-core # Test core crate\n"
17-
@printf " make test-miden-vm # Test miden-vm crate\n"
17+
@printf " make test-vm # Test miden-vm crate\n"
1818
@printf " make test-processor # Test processor crate\n"
1919
@printf " make test-prover # Test prover crate\n"
2020
@printf " make test-core-lib # Test core-lib crate\n"
@@ -48,10 +48,11 @@ FEATURES_air := testing
4848
FEATURES_assembly := testing
4949
FEATURES_assembly-syntax := testing,serde
5050
FEATURES_core :=
51-
FEATURES_miden-vm := concurrent,executable,metal,internal
51+
FEATURES_vm := concurrent,executable,metal,internal
5252
FEATURES_processor := concurrent,testing,bus-debugger
5353
FEATURES_prover := concurrent,metal
54-
FEATURES_core-lib :=FEATURES_verifier :=
54+
FEATURES_core-lib :=
55+
FEATURES_verifier :=
5556

5657
# -- linting --------------------------------------------------------------------------------------
5758

crates/assembly-syntax/src/ast/ident.rs

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -135,8 +135,13 @@ impl Ident {
135135
}
136136

137137
/// Returns true if this identifier must be quoted in Miden Assembly syntax
138-
pub fn requires_quoting(&self) -> bool {
139-
!self.name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
138+
pub fn requires_quoting(ident: impl AsRef<str>) -> bool {
139+
match ident.as_ref() {
140+
crate::Path::KERNEL_PATH
141+
| crate::Path::EXEC_PATH
142+
| crate::ast::ProcedureName::MAIN_PROC_NAME => false,
143+
ident => !ident.chars().all(|c| c.is_ascii_alphanumeric() || c == '_'),
144+
}
140145
}
141146

142147
/// Applies the default [Ident] validation rules to `source`.

crates/assembly-syntax/src/ast/path/components.rs

Lines changed: 46 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -22,31 +22,33 @@ pub enum PathComponent<'a> {
2222

2323
impl<'a> PathComponent<'a> {
2424
/// Get this component as a [prim@str]
25-
#[inline(always)]
25+
///
26+
/// NOTE: If the component is quoted, the resulting string does _not_ contain quotes. Depending
27+
/// on how the resulting string is used, you may need to ensure quotes are added manually. In
28+
/// general, the `Path`/`PathBuf` APIs handle this for you.
2629
pub fn as_str(&self) -> &'a str {
2730
match self {
2831
Self::Root => "::",
32+
Self::Normal(id) if id.starts_with('"') && id.ends_with('"') => &id[1..(id.len() - 1)],
2933
Self::Normal(id) => id,
3034
}
3135
}
3236

3337
/// Get this component as an [Ident], if it represents an identifier
3438
#[inline]
3539
pub fn to_ident(&self) -> Option<Ident> {
36-
match self {
37-
Self::Root => None,
38-
Self::Normal(id) => Some(Ident::from_raw_parts(Span::unknown(Arc::from(
39-
id.to_string().into_boxed_str(),
40-
)))),
40+
if matches!(self, Self::Root) {
41+
None
42+
} else {
43+
Some(Ident::from_raw_parts(Span::unknown(Arc::from(
44+
self.as_str().to_string().into_boxed_str(),
45+
))))
4146
}
4247
}
4348

4449
/// Get the size in [prim@char]s of this component when printed
4550
pub fn char_len(&self) -> usize {
46-
match self {
47-
Self::Root => 2,
48-
Self::Normal(id) => id.chars().count(),
49-
}
51+
self.as_str().chars().count()
5052
}
5153

5254
/// Returns true if this path component is a quoted string
@@ -56,7 +58,16 @@ impl<'a> PathComponent<'a> {
5658

5759
/// Returns true if this path component requires quoting when displayed/stored as a string
5860
pub fn requires_quoting(&self) -> bool {
59-
matches!(self, Self::Normal(component) if component.contains("::"))
61+
match self {
62+
Self::Root => false,
63+
Self::Normal(Path::KERNEL_PATH | Path::EXEC_PATH) => false,
64+
Self::Normal(component)
65+
if component.contains("::") || Ident::requires_quoting(component) =>
66+
{
67+
true
68+
},
69+
Self::Normal(_) => false,
70+
}
6071
}
6172
}
6273

@@ -132,8 +143,8 @@ impl<'a> Iterator for Iter<'a> {
132143

133144
fn next(&mut self) -> Option<Self::Item> {
134145
match self.components.next() {
135-
Some(Ok(PathComponent::Normal(component)))
136-
if component.len() > Path::MAX_COMPONENT_LENGTH =>
146+
Some(Ok(component @ PathComponent::Normal(_)))
147+
if component.as_str().chars().count() > Path::MAX_COMPONENT_LENGTH =>
137148
{
138149
Some(Err(PathError::InvalidComponent(crate::ast::IdentError::InvalidLength {
139150
max: Path::MAX_COMPONENT_LENGTH,
@@ -147,8 +158,8 @@ impl<'a> Iterator for Iter<'a> {
147158
impl<'a> DoubleEndedIterator for Iter<'a> {
148159
fn next_back(&mut self) -> Option<Self::Item> {
149160
match self.components.next_back() {
150-
Some(Ok(PathComponent::Normal(component)))
151-
if component.len() > Path::MAX_COMPONENT_LENGTH =>
161+
Some(Ok(component @ PathComponent::Normal(_)))
162+
if component.as_str().chars().count() > Path::MAX_COMPONENT_LENGTH =>
152163
{
153164
Some(Err(PathError::InvalidComponent(crate::ast::IdentError::InvalidLength {
154165
max: Path::MAX_COMPONENT_LENGTH,
@@ -215,12 +226,6 @@ impl<'a> Iterator for Components<'a> {
215226
self.front_pos += 2;
216227
return Some(Ok(PathComponent::Root));
217228
},
218-
None if self.path.starts_with(Path::KERNEL_PATH)
219-
|| self.path.starts_with(Path::EXEC_PATH) =>
220-
{
221-
self.front = State::Body;
222-
return Some(Ok(PathComponent::Root));
223-
},
224229
None => {
225230
self.front = State::Body;
226231
},
@@ -342,12 +347,7 @@ impl<'a> DoubleEndedIterator for Components<'a> {
342347
return Some(Ok(PathComponent::Root));
343348
},
344349
other => {
345-
assert!(
346-
other.starts_with(Path::KERNEL_PATH)
347-
|| other.starts_with(Path::EXEC_PATH),
348-
"expected path in start state to be a valid path prefix, got '{other}'"
349-
);
350-
return Some(Ok(PathComponent::Root));
350+
return Some(Ok(PathComponent::Normal(other)));
351351
},
352352
}
353353
},
@@ -385,13 +385,7 @@ impl<'a> DoubleEndedIterator for Components<'a> {
385385
None => {
386386
self.back = State::Start;
387387
let component = self.path;
388-
if component.starts_with(Path::KERNEL_PATH)
389-
|| component.starts_with(Path::EXEC_PATH)
390-
{
391-
self.path = "::";
392-
} else {
393-
self.path = "";
394-
}
388+
self.path = "";
395389
self.back_pos = 0;
396390
if let Err(err) =
397391
Ident::validate(component).map_err(PathError::InvalidComponent)
@@ -557,6 +551,10 @@ mod tests {
557551
#[test]
558552
fn special_path() {
559553
let mut components = Iter::new("$kernel");
554+
assert_matches!(components.next(), Some(Ok(PathComponent::Normal("$kernel"))));
555+
assert_matches!(components.next(), None);
556+
557+
let mut components = Iter::new("::$kernel");
560558
assert_matches!(components.next(), Some(Ok(PathComponent::Root)));
561559
assert_matches!(components.next(), Some(Ok(PathComponent::Normal("$kernel"))));
562560
assert_matches!(components.next(), None);
@@ -566,13 +564,22 @@ mod tests {
566564
fn special_path_back() {
567565
let mut components = Iter::new("$kernel");
568566
assert_matches!(components.next_back(), Some(Ok(PathComponent::Normal("$kernel"))));
567+
assert_matches!(components.next_back(), None);
568+
569+
let mut components = Iter::new("::$kernel");
570+
assert_matches!(components.next_back(), Some(Ok(PathComponent::Normal("$kernel"))));
569571
assert_matches!(components.next_back(), Some(Ok(PathComponent::Root)));
570572
assert_matches!(components.next_back(), None);
571573
}
572574

573575
#[test]
574576
fn special_nested_path() {
575577
let mut components = Iter::new("$kernel::bar");
578+
assert_matches!(components.next(), Some(Ok(PathComponent::Normal("$kernel"))));
579+
assert_matches!(components.next(), Some(Ok(PathComponent::Normal("bar"))));
580+
assert_matches!(components.next(), None);
581+
582+
let mut components = Iter::new("::$kernel::bar");
576583
assert_matches!(components.next(), Some(Ok(PathComponent::Root)));
577584
assert_matches!(components.next(), Some(Ok(PathComponent::Normal("$kernel"))));
578585
assert_matches!(components.next(), Some(Ok(PathComponent::Normal("bar"))));
@@ -584,6 +591,11 @@ mod tests {
584591
let mut components = Iter::new("$kernel::bar");
585592
assert_matches!(components.next_back(), Some(Ok(PathComponent::Normal("bar"))));
586593
assert_matches!(components.next_back(), Some(Ok(PathComponent::Normal("$kernel"))));
594+
assert_matches!(components.next_back(), None);
595+
596+
let mut components = Iter::new("::$kernel::bar");
597+
assert_matches!(components.next_back(), Some(Ok(PathComponent::Normal("bar"))));
598+
assert_matches!(components.next_back(), Some(Ok(PathComponent::Normal("$kernel"))));
587599
assert_matches!(components.next_back(), Some(Ok(PathComponent::Root)));
588600
assert_matches!(components.next_back(), None);
589601
}

0 commit comments

Comments
 (0)