Skip to content

Conversation

@hannesrudolph
Copy link
Collaborator

@hannesrudolph hannesrudolph commented Jun 25, 2025

Description

This PR addresses YAML parsing edge cases and formatting issues:

  1. BOM and invisible character handling: Strips UTF-8/UTF-16 BOMs and cleans problematic invisible characters that can corrupt YAML parsing
  2. Fancy quote normalization: Converts smart quotes and special dashes to standard ASCII equivalents
  3. YAML line width control: Prevents automatic line wrapping by setting lineWidth: 0 and defaultStringType: 'PLAIN' options

Changes Made

CustomModesManager.ts

  • Added stripBOM() method to remove Byte Order Marks
  • Added cleanInvisibleCharacters() method to handle problematic Unicode characters
  • Added parseYamlSafely() method with enhanced error handling
  • Updated all yaml.stringify() calls to use { lineWidth: 0, defaultStringType: 'PLAIN' }

SimpleInstaller.ts & migrateSettings.ts

  • Updated yaml.stringify() calls to use the same options for consistency

Testing

  • Existing tests pass
  • Manual testing with files containing BOMs and special characters
  • Verified YAML output preserves user formatting without automatic line breaks

Notes

The YAML library may still convert between scalar styles (e.g., from folded >- to literal |-) when parsing and re-stringifying content. This is inherent to YAML normalization and ensures content integrity while minimizing formatting changes.

- Add BOM (Byte Order Mark) stripping for UTF-8 and UTF-16
- Normalize invisible characters including non-breaking spaces
- Replace fancy quotes and dashes with standard characters
- Remove zero-width characters that can cause parsing issues
- Add comprehensive test coverage for all edge cases

This fixes the YAML parsing limitations documented in PR #237 by
implementing proper preprocessing before parsing YAML content.
Copilot AI review requested due to automatic review settings June 25, 2025 03:39
@hannesrudolph hannesrudolph requested review from cte, jr and mrubens as code owners June 25, 2025 03:39
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. bug Something isn't working labels Jun 25, 2025
@delve-auditor
Copy link

delve-auditor bot commented Jun 25, 2025

We have finished reviewing your PR. We have found no vulnerabilities.

Reply to this PR with @delve-auditor followed by a description of what change you want and we'll auto-submit a change to this PR to implement it.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances YAML parsing in CustomModesManager by preprocessing content to strip BOMs and normalize problematic Unicode characters, updating error handling, and adding extensive edge-case tests.

  • Added private methods stripBOM and cleanInvisibleCharacters for content normalization.
  • Refactored parseYamlSafely to apply preprocessing and surface user-friendly errors.
  • Introduced a new test suite CustomModesManager.yamlEdgeCases.spec.ts covering BOMs, invisible characters, fancy quotes, dashes, error scenarios, and international characters.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/core/config/CustomModesManager.ts Implemented BOM stripping, invisible character cleaning, and updated YAML parsing with enhanced error reporting
src/core/config/tests/CustomModesManager.yamlEdgeCases.spec.ts Added comprehensive tests for YAML edge cases, including BOMs, invisible chars, quotes, dashes, errors, and UTF-8/Intl
Comments suppressed due to low confidence (2)

src/core/config/CustomModesManager.ts:76

  • Add @param and @returns annotations to the JSDoc for stripBOM, cleanInvisibleCharacters, and parseYamlSafely to clarify expected input and output types.
	/**

src/core/config/tests/CustomModesManager.yamlEdgeCases.spec.ts:389

  • Consider adding a test for files with Windows-style CRLF line endings to ensure the preprocessing handles all common line ending conventions.
			expect(modes[0].name).toBe("📝 Writing Mode")

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jun 25, 2025
- Fix BOM handling to correctly handle UTF-16 (all BOMs appear as \uFEFF when decoded)
- Optimize cleanInvisibleCharacters with single regex pass for better performance
- Prevent duplicate error messages by marking errors as already handled
- Refactor test file to use mockFsReadFile helper function to reduce duplication
- Fix YAML indentation in tests (use spaces instead of tabs)
- Add ESLint disable comment for character class warning (regex is correct)
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 25, 2025
Copy link
Collaborator Author

@hannesrudolph hannesrudolph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review! I've addressed all the feedback and pushed the changes. All tests are passing and the implementation is now more robust and performant.

- Added lineWidth: 0 option to all yaml.stringify() calls
- Prevents automatic line wrapping at 80 characters
- Improves readability of YAML output for long strings
- Applied to CustomModesManager, SimpleInstaller, and migrateSettings
@hannesrudolph
Copy link
Collaborator Author

✅ Added the YAML line width fix as requested!

I've updated all yaml.stringify() calls to include { lineWidth: 0 } option, which prevents the YAML library from automatically breaking long lines at the default 80-character limit.

Changes made:

  • CustomModesManager.ts: 4 instances updated
  • SimpleInstaller.ts: 2 instances updated
  • migrateSettings.ts: 1 instance updated

This ensures that long strings in YAML output remain on a single line, improving readability as you requested.

- Added defaultStringType: 'PLAIN' to minimize formatting changes
- This helps preserve plain scalars when possible
- Works alongside lineWidth: 0 to prevent automatic line wrapping
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jun 25, 2025
@hannesrudolph
Copy link
Collaborator Author

I've pushed an additional update that adds defaultStringType: 'PLAIN' to all yaml.stringify() calls. This helps minimize formatting changes when possible.

However, I need to explain what's happening with the line breaks you're seeing:

  1. Your original YAML uses folded scalars (>-) which join multiple lines into a single line with spaces
  2. When the YAML library parses this, it creates a string with the lines joined
  3. When re-stringifying, if the content has actual newlines, the library uses literal scalars (|-) to preserve them

The lineWidth: 0 option prevents the library from adding NEW line breaks due to length, but it doesn't prevent the style conversion from folded (>-) to literal (|-) scalars.

To truly preserve exact YAML formatting without any changes would require avoiding parsing/re-stringifying altogether, but that's not possible when we need to modify the content (add/remove modes).

The current solution ensures:

  • No automatic line wrapping at 80 characters
  • Plain scalars are used when possible
  • User's manual line breaks are preserved

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jun 25, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Jun 25, 2025
- Move regex pattern to PROBLEMATIC_CHARS_REGEX static constant
- Add comprehensive documentation for each character range
- Improves maintainability and makes the pattern reusable
- Add test for mixed line endings (CRLF vs LF)
- Add test for multiple BOMs in sequence
- Add test for deeply nested structures with problematic characters
- Ensures robustness across different real-world scenarios
- Update CustomModesManager tests to expect translation keys
- Fix YAML edge case tests to match new i18n error messages
- All tests now pass with the i18n integration
Copy link
Member

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is awesome!

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jun 25, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap Jun 25, 2025
/**
* Strip BOM (Byte Order Mark) from the beginning of a string
*/
private stripBOM(content: string): string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a stripbom method elsewhere too - could we move to a shared utility file?

.map((issue) => `• ${issue.path.join(".")}: ${issue.message}`)
.join("\n")

vscode.window.showErrorMessage(t("common:customModes.errors.schemaValidationError", { issues }))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

- Replace custom stripBOM method with existing strip-bom package
- Fix duplicate error handling in parseYamlSafely by returning empty object instead of re-throwing
- Addresses review comments from PR #5099
@mrubens mrubens merged commit e559bee into main Jun 26, 2025
11 checks passed
@mrubens mrubens deleted the fix/yaml-parsing-edge-cases branch June 26, 2025 00:22
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Jun 26, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jun 26, 2025
Alorse pushed a commit to Alorse/Roo-Code that referenced this pull request Jun 27, 2025
…#5099)

* fix: handle YAML parsing edge cases in CustomModesManager

- Add BOM (Byte Order Mark) stripping for UTF-8 and UTF-16
- Normalize invisible characters including non-breaking spaces
- Replace fancy quotes and dashes with standard characters
- Remove zero-width characters that can cause parsing issues
- Add comprehensive test coverage for all edge cases

This fixes the YAML parsing limitations documented in PR RooCodeInc#237 by
implementing proper preprocessing before parsing YAML content.

* fix: address PR review comments

- Fix BOM handling to correctly handle UTF-16 (all BOMs appear as \uFEFF when decoded)
- Optimize cleanInvisibleCharacters with single regex pass for better performance
- Prevent duplicate error messages by marking errors as already handled
- Refactor test file to use mockFsReadFile helper function to reduce duplication
- Fix YAML indentation in tests (use spaces instead of tabs)
- Add ESLint disable comment for character class warning (regex is correct)

* fix: prevent YAML line breaks by setting lineWidth to 0

- Added lineWidth: 0 option to all yaml.stringify() calls
- Prevents automatic line wrapping at 80 characters
- Improves readability of YAML output for long strings
- Applied to CustomModesManager, SimpleInstaller, and migrateSettings

* fix: add defaultStringType option to yaml.stringify calls

- Added defaultStringType: 'PLAIN' to minimize formatting changes
- This helps preserve plain scalars when possible
- Works alongside lineWidth: 0 to prevent automatic line wrapping

* refactor: extract problematic characters regex as a named constant

- Move regex pattern to PROBLEMATIC_CHARS_REGEX static constant
- Add comprehensive documentation for each character range
- Improves maintainability and makes the pattern reusable

* test: add comprehensive edge case tests for YAML parsing

- Add test for mixed line endings (CRLF vs LF)
- Add test for multiple BOMs in sequence
- Add test for deeply nested structures with problematic characters
- Ensures robustness across different real-world scenarios

* feat(i18n): add error messages for custom modes in multiple languages

* fix: update tests to expect i18n keys instead of hardcoded strings

- Update CustomModesManager tests to expect translation keys
- Fix YAML edge case tests to match new i18n error messages
- All tests now pass with the i18n integration

* refactor: use strip-bom package and fix error handling

- Replace custom stripBOM method with existing strip-bom package
- Fix duplicate error handling in parseYamlSafely by returning empty object instead of re-throwing
- Addresses review comments from PR RooCodeInc#5099

---------

Co-authored-by: Daniel Riccio <[email protected]>
@HumanSpark
Copy link

Thanks folks

daniel-lxs added a commit that referenced this pull request Jun 27, 2025
- Add JSON fallback to parseYamlSafely for reading old JSON files
- Remove defaultStringType: 'PLAIN' from yaml.stringify to ensure JSON-compatible output
- Fixes issue introduced in #5099
hannesrudolph added a commit that referenced this pull request Jul 3, 2025
* fix: handle YAML parsing edge cases in CustomModesManager

- Add BOM (Byte Order Mark) stripping for UTF-8 and UTF-16
- Normalize invisible characters including non-breaking spaces
- Replace fancy quotes and dashes with standard characters
- Remove zero-width characters that can cause parsing issues
- Add comprehensive test coverage for all edge cases

This fixes the YAML parsing limitations documented in PR #237 by
implementing proper preprocessing before parsing YAML content.

* fix: address PR review comments

- Fix BOM handling to correctly handle UTF-16 (all BOMs appear as \uFEFF when decoded)
- Optimize cleanInvisibleCharacters with single regex pass for better performance
- Prevent duplicate error messages by marking errors as already handled
- Refactor test file to use mockFsReadFile helper function to reduce duplication
- Fix YAML indentation in tests (use spaces instead of tabs)
- Add ESLint disable comment for character class warning (regex is correct)

* fix: prevent YAML line breaks by setting lineWidth to 0

- Added lineWidth: 0 option to all yaml.stringify() calls
- Prevents automatic line wrapping at 80 characters
- Improves readability of YAML output for long strings
- Applied to CustomModesManager, SimpleInstaller, and migrateSettings

* fix: add defaultStringType option to yaml.stringify calls

- Added defaultStringType: 'PLAIN' to minimize formatting changes
- This helps preserve plain scalars when possible
- Works alongside lineWidth: 0 to prevent automatic line wrapping

* refactor: extract problematic characters regex as a named constant

- Move regex pattern to PROBLEMATIC_CHARS_REGEX static constant
- Add comprehensive documentation for each character range
- Improves maintainability and makes the pattern reusable

* test: add comprehensive edge case tests for YAML parsing

- Add test for mixed line endings (CRLF vs LF)
- Add test for multiple BOMs in sequence
- Add test for deeply nested structures with problematic characters
- Ensures robustness across different real-world scenarios

* feat(i18n): add error messages for custom modes in multiple languages

* fix: update tests to expect i18n keys instead of hardcoded strings

- Update CustomModesManager tests to expect translation keys
- Fix YAML edge case tests to match new i18n error messages
- All tests now pass with the i18n integration

* refactor: use strip-bom package and fix error handling

- Replace custom stripBOM method with existing strip-bom package
- Fix duplicate error handling in parseYamlSafely by returning empty object instead of re-throwing
- Addresses review comments from PR #5099

---------

Co-authored-by: Daniel Riccio <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer PR - Needs Review size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants