- 
                Notifications
    You must be signed in to change notification settings 
- Fork 81
Add string format tools to library. #2373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@           Coverage Diff           @@
##              main   #2373   +/-   ##
=======================================
  Coverage       47%     47%           
- Complexity    6560    6571   +11     
=======================================
  Files          780     780           
  Lines        64398   64398           
  Branches      9628    9628           
=======================================
+ Hits         30509   30517    +8     
+ Misses       31569   31555   -14     
- Partials      2320    2326    +6     ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Just a few remarks and questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinging @jurgenvinju since he has in the past done some more stuff around formatting & indentation.
I have the feeling we're missing something in this PR, a feature we already have in rascal. (aside from the fact I don't really like all these regexps and char loops)
| } | ||
| @synopsis{Split a string to an indentation prefix and the remainder of the string.} | ||
| tuple[str indentation, str rest] splitIndentation(/^<indentation:\s*><rest:.*>/) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the string contains multiple lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should be an explicit exception (invalid argument or similar)
| int count = size(findAll(input, nl)); | ||
| linesepCounts[nl] = count; | ||
| // subtract all occurrences of substrings of newline characters that we counted before | ||
| for (str snl <- substrings(nl), linesepCounts[snl]?) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this almost looks like pattern matching on strings? (which we only have reasonable support over)
for example:
rascal>visit("abcd") { case str m : println(m); }
abcd
bcd
cd
d
come to think of it, this whole function smells like an parsing automata. Where we build a big state table of all the possible matches and then iterate through all the chars and count the matches based on their state.
In java this would be 20/30 lines, but in rascal we might be missing some primitives (as we don't have a character loop).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we hard-code the set of newline characters (e.g. to all Unicode newline chars), we could write it as a grammar and use the parser generator. Downside (as we discussed) is that all (transitive) imports of this module will trigger generation of a parser. We could also move some of those to a specific Format module.
fa8fe8b    to
    836f117      
    Compare
  
    836f117    to
    8d8de3e      
    Compare
  
    
This PR adds many generic tools that can be used to process/format strings.
TODO