@mzuenni
Collaborator

mzuenni commented Nov 17, 2025

solves #312
@thorehusfeldt do you mind adjusting the schemas?

mzuenni requested a review from mpsijm November 17, 2025 21:58
mzuenni marked this pull request as ready for review November 19, 2025 16:22
@RagnarGrootKoerkamp
Owner

Should we also allow this on the answer files, to ensure a testcase is (im)possible as intended?

@thorehusfeldt
Collaborator

Consider bumping the generator framework version.

The sample generators.yaml script linked from the docs might want to include

version: 2025-12

and the default in the CUE schema generators key (and presumably the JSON file) should be updated accordingly.

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 3, 2025

@RagnarGrootKoerkamp :

> Should we also allow this on the answer files?

But ans: possible and ans: impossible can already be specified. I guess what you are proposing would be to support “make sure the answer is not impossible”, for instance by

ans-match: \d+

Hm. I find this useful, but now it’s getting ugly.

An idea for syntax that is consistent with the current proposal:

match:
  in: foo
  ans: bar
---
match:
  in: [42, forty-two] 
  ans: bar
---
match: \w+\s\w+ # same as match: { in: \w+\s\w+ }
---
# same as match: { in: [42, forty-two] }:
match:
  - 42
  - forty-two

I guess the schema is

match: string | [...string] | {
  in: string | [...string]
  ans: string | [...string]
}
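
To make the equivalences above concrete, here is a minimal Python sketch of how a tool could normalize the three accepted shapes of match into one mapping. The function name normalize_match and the helper _as_list are hypothetical and not taken from the PR, and the sketch assumes the YAML has already been loaded and that every pattern is a string.

```python
from typing import Dict, List, Union

# Hypothetical types mirroring the proposed schema:
#   match: string | [...string] | { in: ..., ans: ... }
Patterns = Union[str, List[str]]
MatchSpec = Union[str, List[str], Dict[str, Patterns]]

ALLOWED_KEYS = ("in", "ans")


def _as_list(value: Patterns) -> List[str]:
    """Turn a single pattern or a list of patterns into a list of patterns."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return list(value)
    raise TypeError(f"expected a string or a list of strings, got {value!r}")


def normalize_match(spec: MatchSpec) -> Dict[str, List[str]]:
    """Normalize a match value to a mapping from 'in'/'ans' to pattern lists.

    A bare string or list is treated as shorthand for { in: ... }.
    """
    if isinstance(spec, dict):
        unknown = set(spec) - set(ALLOWED_KEYS)
        if unknown:
            raise ValueError(f"unknown match keys: {sorted(unknown)}")
        return {key: _as_list(spec[key]) for key in ALLOWED_KEYS if key in spec}
    return {"in": _as_list(spec)}
```

One caveat with the examples: a YAML loader will typically read an unquoted 42 as an integer, so [42, forty-two] would need quoting (or the tool would need to coerce scalars) before it fits a [...string] schema. This sketch rejects non-string entries instead of guessing.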

@mzuenni
Collaborator Author

mzuenni commented Dec 3, 2025

Even though this might be less YAML-like, I would prefer not to nest these and instead go for something like in.match or match.in?

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

Just to make sure I was clear: I propose to retain

match: \d+

as a valid expression, and expect it to be the widest-used form. I propose that the above is the same as

match:
  in: \d+

(which, thanks to standard YAML syntax, can also be written as a one-liner, match: { in: \d+ }, but which is not the same as match.in: \d+.)

The situation in which the “mapping” form would mainly arise is when you want to specify something about ans, like “the answer is not impossible”. Not sure what kinds of conventions will arise among authors, but here are some suggestions:

match: { ans: ^[^i] }
---
match: { ans: \d+ }
---
match: { ans: ^(?!impossible$).* }
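
The three suggestions accept different sets of answers. As a rough illustration (not part of the proposal), the Python sketch below checks each pattern against a few sample answers. It assumes the tool would apply the pattern with re.search and no flags, which this thread does not specify; the names CONVENTIONS and accepted_by are made up for the example.

```python
import re

# The three suggested conventions, labeled by what they actually test.
CONVENTIONS = {
    "does not start with i": r"^[^i]",
    "contains a digit": r"\d+",
    "is not exactly impossible": r"^(?!impossible$).*",
}


def accepted_by(answer: str) -> list:
    """Return the labels of all conventions whose pattern is found in answer.

    Assumption: patterns are applied with re.search and default flags.
    """
    return [
        label
        for label, pattern in CONVENTIONS.items()
        if re.search(pattern, answer)
    ]


for sample in ["42\n", "impossible\n", "infinity\n", "impossible 3\n"]:
    print(repr(sample), "->", accepted_by(sample))
```

Under these assumptions, only the negative lookahead rejects exactly impossible while still accepting an answer such as infinity, and \d+ alone would accept impossible 3.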

I would advise against introducing more keys in the top-level mapping (such as match, match.in, and match.ans); tool support for YAML is just better when we stick to YAML conventions.


The main alternatives I can see to my proposal would be to add pattern to in and ans. (I use pattern here instead of match just to keep the proposals syntactically separate.)

[in|ans]: string | {
  value: string
  pattern: string | [...string]
}

so you’d have expressions like this:

generate: make_random_tree -n 100 --balanced {seed:0}
in:
  pattern: \d+
ans: impossible

This doesn’t smell right to me, but it’s just a hunch.

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

I notice that we already have a plethora of stuff, namely

["in" | "in.statement" | "in.download" |
    "ans" | "ans.statement" | "ans.download" |
    "out"]: string

The current semantics is that the key: value pair means "<testcasename>.<key> must equal value". What we’re looking for in the current proposal is a semantics that says "<testcasename>.<key> should obey constraint".

This is a case against introducing keys like in.match, by the way. You’d need ans.statement.match etc.

My hunch is that the cleanest way is to enrich the right-hand side, instead of introducing more left-hand sides of such expressions.

I think what I’m saying is

let extension = "in" | "in.statement" | "in.download" | "ans" | "ans.statement" | "ans.download" | "out"
[extension]: string # as we have now
match: string | { [extension]: string } # default string same as { in: string }

allowing

in: foo
match:
  ans.statement: \d\w+ 

Alternatively,

let extension = "in" | "in.statement" | "in.download" | "ans" | "ans.statement" | "ans.download" | "out"
[extension]: string | { match: string }

in: foo
ans.statement:
  match: \d\w+

Dream state

The dream state would be what CUE already supports out of the box:

ans: "impossible"  # ans must equal impossible
---
in: number & >0 # in must be a number, and strictly larger than 0
---
ans: "yes" | "no" # ans  must be either "yes" or "no"
in.statement: =~"^\w\w$" # in.statement has two letters
in: !~"impossible" # in does not contain impossible
in: in.statement # in and in.statement are identical

In other words, there’s a whole grammar on the right-hand side supporting |, &, literal match, and =~ and !~ for regex match and unmatch.

Note that explicit creation and constraint checking are the same: CUE just unifies everything it knows about, say .in (including whatever copy or generate may have produced) and expects the result to be a singleton. Otherwise it complains. Specifying a constraint is the same as specifying a value (the latter is just a constraint with a singleton valid instantiation.)

This would be sah-weet!

@RagnarGrootKoerkamp
Owner

Interesting idea to do in: {match: ...}, sounds reasonable as well to me, but no strong opinion either way.

Should it be matches instead of match maybe? As in the .ans matches X Y Z.

@mzuenni
Collaborator Author

mzuenni commented Dec 4, 2025

> The main alternatives I can see to my proposal would be to add pattern to in and ans. (I use pattern here instead of match just to keep the proposals syntactically separate.)

I don't like that, also feels weird in combination with generated testcases...

> ["in" | "in.statement" | "in.download" |
>    "ans" | "ans.statement" | "ans.download" |
>    "out"]: string

I don't think we need this for anything other than .in and .ans, since this is only intended to additionally check generated files. The others are typically hardcoded already?

> and =~ and !~ for regex match and unmatch.

Unmatch would certainly be nice...

> match: string | [...string] | {
>   in: string | [...string]
>   ans: string | [...string]
> }

I am fine with that, even though I like to not nest things... :D

The question is if/how we want to support unmatch then?

@thorehusfeldt
Collaborator

thorehusfeldt commented Dec 4, 2025

> The question is if/how we want to support unmatch then?

The current proposal already supports “unmatching”, since regexen support that. Here are the three examples from upthread again, for a problem with output impossible or some numbers:

match: { ans: ^[^i] }
---
match: { ans: \d+ }
---
match: { ans: ^(?!impossible$).* } 

CUE of course would make this nicer to look at:

ans: !~"impossible"

@mzuenni
Collaborator Author

mzuenni commented Dec 4, 2025

> match: { ans: ^(?!impossible$).* }

I don't think that one is right? (\A(?!.*^impossible$).*\Z would work, but it is not very nice...) We could say that if the string starts with ! we do an unmatch, and if it starts with = we do a match.

@thorehusfeldt
Collaborator

The only thing I’m unsure about for my negative lookahead regex is what to do with a possible trailing newline. (I don’t understand the specification well enough.) So maybe it should be ^(?!impossible).* instead. Otherwise I’m pretty sure it’s fine.
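
Whether these patterns behave as intended depends on how the tool applies them, which is not settled in this thread. The Python sketch below compares the three negative-lookahead variants under two assumed settings: re.search with default flags, and re.search with re.MULTILINE (plus re.DOTALL for the \A...\Z variant). The helper matches and the constant names are hypothetical.

```python
import re

STRICT = r"^(?!impossible$).*"            # rejects exactly "impossible"
LOOSE = r"^(?!impossible).*"              # rejects anything starting with it
WHOLE_FILE = r"\A(?!.*^impossible$).*\Z"  # rejects any line equal to it


def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Assumed semantics: the pattern is searched for anywhere in text."""
    return re.search(pattern, text, flags) is not None


ML = re.MULTILINE
ML_DOTALL = re.MULTILINE | re.DOTALL

# Default flags: "$" also matches just before a final newline, so the
# trailing newline does not let "impossible" slip through.
print(matches(STRICT, "impossible\n"))        # False
print(matches(STRICT, "impossible 3\n"))      # True
print(matches(LOOSE, "impossible 3\n"))       # False

# Default flags only anchor at the start of the file, so a later line
# equal to "impossible" is not noticed.
print(matches(STRICT, "3\nimpossible\n"))     # True

# With MULTILINE, Python's "^" also matches right after the final newline,
# so an empty match at the very end makes the check pass.
print(matches(STRICT, "impossible\n", ML))    # True

# The \A...\Z variant with MULTILINE and DOTALL looks at every line.
print(matches(WHOLE_FILE, "impossible\n", ML_DOTALL))     # False
print(matches(WHOLE_FILE, "3\nimpossible\n", ML_DOTALL))  # False
print(matches(WHOLE_FILE, "42\n", ML_DOTALL))             # True
```

So under Python's default flags the trailing newline is harmless for ^(?!impossible$).*, while a multiline search changes the outcome. Other regex engines may treat ^, $ and \Z differently, so the chosen matching mode would need to be documented alongside the syntax.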
