Skip to content

Conversation

@NWilson
Copy link
Member

@NWilson NWilson commented Nov 25, 2024

This is a cosmetic change in the tests. It looks much better.

I've also fixed a highly confusing comment in printint.c which suggests that the character map for XCLASS behaves differently to NCLASS (it doesn't now - perhaps it did previously).

@zherczeg
Copy link
Collaborator

It looks better, but we loose the "nclass" information from the output. Could we keep it some way? (Note: classes can contain ^ as a letter).

@zherczeg
Copy link
Collaborator

  re> /[\^a]/B
------------------------------------------------------------------
        Bra
        [^a]
        Ket
        End
------------------------------------------------------------------

@NWilson NWilson force-pushed the user/niwilson/nclass-print branch from 0e2bf18 to 77dbf7c Compare November 25, 2024 12:26
@NWilson
Copy link
Member Author

NWilson commented Nov 25, 2024

Well-spotted. This is an existing bug for XCLASS printing - and worse, even backslash isn't escaped currently!

I've fixed those cases too (and added tests).

@NWilson NWilson force-pushed the user/niwilson/nclass-print branch from 77dbf7c to 913ca2e Compare November 25, 2024 12:29
@zherczeg
Copy link
Collaborator

I think [] printing has never been intended to be accurate, just debug data. Since eclasses are introduced, I would prefer ![] more for negation, since we could use it for [^a] (OP_NOTCHAR) as well. This way it would be obvious that "something happens behind" even for a new contributor. This is just my thoughts. What others think?

@NWilson
Copy link
Member Author

NWilson commented Nov 25, 2024

You're saying that, because we use "[^a]" to represent OP_NOT, you want the output for an OP_NCLASS to be distinguishable. That makes sense. And we're using the "!" notation in eclasses, so that's a consistent notation to use for OP_NCLASS.

I am half-way through the optimisation work. You might be happy to know that the OP_ECLASS_NOT code will disappear when I'm finished (it will always be constant-folded in to atomic classes within the eclass).

@zherczeg
Copy link
Collaborator

Yes. The purpose of /B is debugging, and less constructs have the same text representation is the better.

Removing OP_ECLASS_NOT is good news indeed, nice work! How do you encode [^[\pC]&&[\pL]] without it?

@NWilson
Copy link
Member Author

NWilson commented Nov 25, 2024

How do you encode [^[\pC]&&[\pL]] without it?

We know that !(A && B) = !A || !B. So we can distribute the NOT over the AND and OR (and SUB and XOR), and flip the operands and flip the operators.

@NWilson
Copy link
Member Author

NWilson commented Nov 25, 2024

Actually... I think I'd rather keep it so that NCLASS prints as [^foo], and change it so that OP_NOT prints as [^x] (not). If we want to make them look separate, then OP_NCLASS should use ordinary class notation, and should be OP_NOT which changes to make it distinguishable.

@NWilson NWilson force-pushed the user/niwilson/nclass-print branch from 913ca2e to a3fa1f1 Compare November 25, 2024 20:04
@NWilson NWilson force-pushed the user/niwilson/nclass-print branch from a3fa1f1 to e24fded Compare November 26, 2024 09:06
Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zherczeg zherczeg merged commit 04ab344 into PCRE2Project:master Nov 26, 2024
19 checks passed
@NWilson NWilson deleted the user/niwilson/nclass-print branch December 17, 2024 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants