Add string format tools to library. #2373

toinehartman · 2025-08-27T15:16:21Z

This PR adds many generic tools that can be used to process/format strings.

TODO

Document extensively (once we have figured out which ones we will actually keep)

codecov · 2025-08-27T15:21:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 47%. Comparing base (875fe06) to head (8d8de3e).

Additional details and impacted files

@@           Coverage Diff           @@
##              main   #2373   +/-   ##
=======================================
  Coverage       47%     47%           
- Complexity    6560    6571   +11     
=======================================
  Files          780     780           
  Lines        64398   64398           
  Branches      9628    9628           
=======================================
+ Hits         30509   30517    +8     
+ Misses       31569   31555   -14     
- Partials      2320    2326    +6

rodinaarssen

Cool! Just a few remarks and questions.

src/org/rascalmpl/library/String.rsc

DavyLandman

Pinging @jurgenvinju since he has in the past done some more stuff around formatting & indentation.

I have the feeling we're missing something in this PR: a feature we already have in Rascal (aside from the fact that I don't really like all these regexps and char loops).

DavyLandman · 2025-08-29T11:04:23Z

src/org/rascalmpl/library/String.rsc

+}
+
+@synopsis{Split a string to an indentation prefix and the remainder of the string.}
+tuple[str indentation, str rest] splitIndentation(/^<indentation:\s*><rest:.*>/)


What if the string contains multiple lines?

I think that should be an explicit exception (invalid argument or similar).
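To make the suggestion concrete, here is a minimal Java sketch of the guarded behaviour: reject multi-line input with an explicit exception, otherwise split off the leading whitespace. The class name `IndentationSplitter`, the nested `Split` type, and the use of `Character.isWhitespace` (which may not match the regex class `\s` exactly) are assumptions for illustration, not code from this PR.

```java
// Hypothetical sketch; names and whitespace definition are assumptions.
final class IndentationSplitter {

    /** Simple holder for the two parts of a line. */
    static final class Split {
        final String indentation;
        final String rest;

        Split(String indentation, String rest) {
            this.indentation = indentation;
            this.rest = rest;
        }
    }

    /**
     * Splits a single line into its leading whitespace and the remainder.
     * Throws IllegalArgumentException when the input spans multiple lines.
     */
    static Split splitIndentation(String line) {
        // Only "\n" and "\r" are treated as line breaks here (an assumption).
        if (line.indexOf('\n') >= 0 || line.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("expected a single line, got a multi-line string");
        }
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return new Split(line.substring(0, i), line.substring(i));
    }
}
```

With this shape, a caller that passes a multi-line string fails loudly instead of silently getting only part of the input split.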

DavyLandman · 2025-08-29T11:12:21Z

src/org/rascalmpl/library/String.rsc

+        int count = size(findAll(input, nl));
+        linesepCounts[nl] = count;
+        // subtract all occurrences of substrings of newline characters that we counted before
+        for (str snl <- substrings(nl), linesepCounts[snl]?) {


This almost looks like pattern matching on strings (for which we only have reasonable support)?

For example:

rascal>visit("abcd") { case str m : println(m); }
abcd
bcd
cd
d

Come to think of it, this whole function smells like a parsing automaton, where we build a big state table of all the possible matches, then iterate through all the chars and count the matches based on their state.

In Java this would be 20 to 30 lines, but in Rascal we might be missing some primitives (as we don't have a character loop).

If we hard-code the set of newline characters (e.g. to all Unicode newline chars), we could write it as a grammar and use the parser generator. The downside (as we discussed) is that all (transitive) imports of this module will trigger generation of a parser. We could also move some of those to a specific Format module.
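As a rough illustration of the Java character loop mentioned above, the sketch below counts line separators in a single pass. It uses a greedy longest-match scan rather than a full state table, so `\r\n` is counted once and never also as a separate `\r` and `\n`. The class name `LineSeparatorCounter` and the hard-coded separator set (`\r\n`, `\n`, `\r`) are assumptions for illustration; a real version would likely cover all Unicode newline characters.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the separator set below is an assumption.
final class LineSeparatorCounter {

    // Longest separator first, so "\r\n" wins over its substrings "\r" and "\n".
    private static final List<String> SEPARATORS = List.of("\r\n", "\n", "\r");

    /** Counts how often each line separator occurs, without double counting. */
    static Map<String, Integer> count(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int i = 0;
        while (i < input.length()) {
            String matched = null;
            for (String sep : SEPARATORS) {
                if (input.startsWith(sep, i)) {
                    matched = sep;
                    break;
                }
            }
            if (matched == null) {
                i++; // ordinary character, move on
            } else {
                counts.merge(matched, 1, Integer::sum);
                i += matched.length(); // skip the whole separator
            }
        }
        return counts;
    }
}
```

Because every position is consumed exactly once, there is no need to subtract occurrences of shorter separators afterwards, which is the step the Rascal version performs with `substrings(nl)`.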

toinehartman requested a review from DavyLandman August 27, 2025 15:16

toinehartman self-assigned this Aug 27, 2025

toinehartman added enhancement library labels Aug 27, 2025

DavyLandman requested a review from jurgenvinju August 27, 2025 15:18

toinehartman mentioned this pull request Aug 27, 2025

Parametric: formatting usethesource/rascal-language-servers#677

Draft

rodinaarssen requested changes Aug 28, 2025

DavyLandman reviewed Aug 29, 2025

toinehartman force-pushed the feature/format-tools branch from fa8fe8b to 836f117 Compare September 4, 2025 07:36

toinehartman added 5 commits September 4, 2025 12:47

Add string format tools to library.

1fbeaf9

Document substrings pitfalls.

9c39584

Small fixes (h/t @rodinaarssen)

9754bd4

Optionally include empty last line when string ends with newline.

4be1f35

Optimize substrings for performance (h/t @rodinaarssen)

8d8de3e

toinehartman force-pushed the feature/format-tools branch from 836f117 to 8d8de3e Compare September 4, 2025 10:47

toinehartman marked this pull request as draft October 28, 2025 08:23
