Skip to content

Canonicalise URL IDs on all platforms#1082

Open
lionel- wants to merge 3 commits intomainfrom
bugfix/canonical-paths
Open

Canonicalise URL IDs on all platforms#1082
lionel- wants to merge 3 commits intomainfrom
bugfix/canonical-paths

Conversation

@lionel-
Copy link
Contributor

@lionel- lionel- commented Mar 4, 2026

Fixes breakpoint tests skipped in posit-dev/positron#12119. The frontend tests run from a temporary folder, and the way we handled URIs caused verification mismatches because temp paths go through symlinks on macOS.

File URIs for the same file arrive from four independent sources, each with its own representation: DAP (file paths from the frontend), LSP (editor URIs), execute requests (editor document model URIs), and the R runtime (paths that went through normalizePath(), resolving symlinks). These must all agree on file identity because breakpoints set via DAP are looked up in a HashMap keyed by URI when code is executed or sourced, and invalidated when documents change via LSP.

Previously, ExtUrl::normalize() only handled the Windows encoding issue (file:///c%3A/... vs file:///C:/...) and was a no-op on other platforms. ExtUrl::from_file_path() did resolve symlinks, but from_url paths (LSP, execute requests) did not. On macOS, /var/folders is a symlink to /private/var/folders, and tempfile::tempdir() creates directories under /var/folders. So breakpoints set via DAP (which canonicalized through from_file_path) would use /private/var/... as the key, while execute request URIs (which came straight from the editor) would look up /var/... and miss.

New UrlId type

This PR replaces ExtUrl::normalize() with a UrlId type that canonicalizes file URIs on all platforms. UrlId wraps a Url that has been through std::fs::canonicalize() to resolve symlinks and round-tripped through the filesystem path to normalize encoding. It also makes sure the drive letter on Windows is uppercased.

UrlId is constructed at every entry point where a URI will be used as an identity key:

  • UrlId::from_file_path() for DAP SetBreakpoints paths
  • UrlId::from_url() for LSP didChange URIs
  • UrlId::from_code_location() for execute request URIs
  • UrlId::parse() for URI strings from R callbacks

The breakpoints HashMap in Dap is now keyed by UrlId instead of Url, and all lookup/mutation sites use UrlId. When the file doesn't exist on disk, canonicalization falls back to the original URI.

Canonical URIs must not leak back to R or the frontend. An early version of this branch canonicalized URIs everywhere, which caused the frontend to open files in new editors instead of reusing existing ones (the frontend didn't recognise /private/var/... as the same file it had open as /var/...). So UrlId is strictly for internal identity. In console_annotate.rs, the raw Url is kept for embedding in R code (#line directives, breakpoint calls) while UrlId is used only for HashMap lookups.

The LSP only uses UrlId at the boundary where it notifies the console about document changes for breakpoint invalidation. Its own internal document state is still keyed by raw Url. We might want to switch the LSP internals to UrlId in the future if symlink mismatches become an issue there too, or simply for internal consistency.

@lionel- lionel- force-pushed the bugfix/canonical-paths branch from aea831a to e645459 Compare March 4, 2026 12:31
},
};

let uri = Url::from_file_path(&path).unwrap_or(uri);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log when this fails?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants