Skip to content

Conversation

@ratmice
Copy link
Collaborator

@ratmice ratmice commented Mar 21, 2025

So here is a similar thing to PR #476 except instead of adding an empty symbol, it adds an
Option<Span> to Production.

the assertion in nimbleparse should now be okay, but only because nimbleparse will always produce a grammar from a parser which ensures that either symbols has a symbol, containing a span or empty_span is Some.

That is, this isn't true in general from an AST built using the public constructors.

4| Expr: %empty | Factor ;
         ------- Reduced productions

4| Expr: Factor | %empty ;
                  ------- Reduced productions

4| Expr: Factor | ;
                  - Reduced productions

4| Expr: | Factor ;
         - Reduced productions

This entails adding a span argument to add_prod and the Productions struct.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

The error messages, and underlining is not great. It just just kind of the position we are at when we finally can figure out that we have no symbol. E.g. underlining the ; or the |.

Seems like it should be possible to get rid of the whitespace though in the %empty case.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

There, that is better:

4| Expr: %empty | Factor ;
         ------ Reduced productions

4| Expr: Factor | %empty ;
                  ------ Reduced productions

@ltratt
Copy link
Member

ltratt commented Mar 21, 2025

This looks like a good start!

Can we simplify this so we always give Production a Span? There might be other reasons later for people to want to know the Span of non-empty productions, so the simpler API might be the more powerful one!

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

Edit: Some of what I originally said here was based on an incorrect reading of parse_rules.
But trying it out, it is still pretty hairy.

I think the ideal value for the Production's span is something like:
Production.span == Span::new(symbol.spans.minimum_start() , symbol.spans.maximum_end())

For the case where symbols is not empty, not even close but will keep trying.

@ltratt
Copy link
Member

ltratt commented Mar 21, 2025

I think the ideal value for the Production's span is something like:
Production.span == Span::new(symbol.spans.minimum_start() , symbol.spans.maximum_end())

Agreed, that would be great if possible! Looking at yacc/parser.rs I think the crucial bit might be:

while i < self.src.len() {
    if let Some(j) = self.lookahead_is("|", i) {
	self.ast.add_prod(rn.clone(), syms, prec, action);
	syms = Vec::new();
	prec = None;
	action = None;
	i = self.parse_ws(j, true)?;
	continue;
    } else if let Some(j) = self.lookahead_is(";", i) {
	self.ast.add_prod(rn, syms, prec, action);
	return Ok(j);
    }
   ...

That's mostly at the start of the loop to deal with R: | ... i.e. an empty first production AFAICS. We can fairly easily expand that code a bit if we need to.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

I think the ideal value for the Production's span is something like:
Production.span == Span::new(symbol.spans.minimum_start() , symbol.spans.maximum_end())

Agreed, that would be great if possible! Looking at yacc/parser.rs I think the crucial bit might be:

One of the things that is making this specific thing difficult is that a Symbol::Token's span points inside the quotes.
It seems as though that might be the issue I'm running into at least, in trying to hit that ideal for cases where symbols isn't empty.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

So 94622a2 is as close as I've gotten.

It is quite often +1 or -1 when the production starts or ends with a Symbol::Token because

Rule: "a";
       ^ symbol span between quotes
      ^^^ production span overlapping quotes.

And just for clarity, here's permutations of the examples again which cause the assertion on the latest version of the patch:

4| Expr: %empty | Factor ;
         ------ Reduced productions

4| Expr: | Factor ;
         - Reduced productions

4| Expr: Factor | ;
                  - Reduced productions
4| Expr: Factor | %empty ;
                  ------ Reduced productions
4| Expr: Factor |;
                 - Reduced productions
4| Expr:|Factor;
        - Reduced productions

The single character underlines pointing at the last character in the sequences " ;"... " |"... "|;"... and ":|" are spans pointing between the characters.
I'm not sure I can get much better than that with the limitations of ascii?

@ratmice ratmice mentioned this pull request Mar 21, 2025
@ltratt
Copy link
Member

ltratt commented Mar 21, 2025

This feels like a significant improvement! Do we have tests for all the cases you've listed above?

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

Kind of, except for the last two, which are just whitespace variations,
the first four are from the following test.

#[test]
fn test_empty_production_spans_issue_473

Except that output is when run through nimbleparse, but we don't really have any systematic way to test nimbleparse output yet. So that test just tests that the spans which drive nimbleparse return the expected results.

I suppose I should/need to add the last two variants.

@ltratt
Copy link
Member

ltratt commented Mar 21, 2025

If we get those two tests in, I think we'll be ready to squash.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

Should be fixed in 09d9efb

@ltratt
Copy link
Member

ltratt commented Mar 21, 2025

Please squash.

@ratmice ratmice force-pushed the issue_473_assertion_take2 branch from 09d9efb to e854d81 Compare March 21, 2025 23:29
@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

Oops, typos.

This provides a span for `Production`, even when the prod is
empty. The production will span all it's `Symbols`. Previously when
a production was empty, or `%empty` it had no Symbols and subsequently
no span. Now an empty production will at least have a zero length span
on the production.
@ratmice ratmice force-pushed the issue_473_assertion_take2 branch from e854d81 to d1ff8d9 Compare March 21, 2025 23:31
@ratmice
Copy link
Collaborator Author

ratmice commented Mar 21, 2025

Squashed.

@ltratt ltratt added this pull request to the merge queue Mar 21, 2025
Merged via the queue into softdevteam:master with commit ea448ef Mar 21, 2025
2 checks passed
@ratmice ratmice deleted the issue_473_assertion_take2 branch March 22, 2025 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants