-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Make dumping work with the irM.instN
tagged InstId format
#6168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
Just to note, on this PR, |
I have used That said, I would be just fine with that if we omitted the ir from the id printing when it's from the current IR. I do want to do that, but it requires us to plumb File into Print, which I think we need to have a discussion about separate from just making dumping work again. |
Yes, that's right. The incorrect prefix seemed to be part of why you were confused here, and it confused me too, so I'd thought maybe it'd be worth the characters. |
I see, yeah the biggest confusion was that I thought it must be referring to a foreign IR if it was appearing in the id like that. But the id number didn't match the foreign file. I did not at all consider it could be referring to the current file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be sure, I have very strong concerns about the multiple different uses of "ir". Look for example at the YAML format in testdata -- line 91 and line 124 use "ir1" with different meaning. This seems like it'll be a significant debugging hindrance; I'm maybe too familiar with what the prefixes mean, but also do you expect people joining to be able to understand this?
Also, taking a step back to one thing you said earlier about "check_ir" feeling "long". Is the main use of this copy-paste from other debug output, or are you expecting you'll usually type it in from scratch?
Speaking for myself, I see three possible answers here:
- Change
IdTag
to matchCheckIRId
ImportIRId
-> "ir",CheckIRId
-> "check_ir",IdTag
-> "check_ir"
- Change
ImportIRId
andIdTag
to always be explicitImportIRId
-> "import_ir",CheckIRId
-> "check_ir",IdTag
-> "check_ir"
- Swap
ImportIRId
andCheckIRId
ImportIRId
-> "import_ir",CheckIRId
-> "ir",IdTag
-> "ir"
Personally, I'd favor (2) actually -- we should probably always clarify because confusion isn't worth the shorter name. (I could mainly see (3) if typing this out is a common use-case) But, partly noting this if it's worth further discussion (I assume this wasn't discussed in detail since the review was reassigned to me).
if m := re.fullmatch("([a-z_]+)(\\d+)", args[1]): | ||
if m[1] in id_types: | ||
# Look for <irN>.<type><id> as a single argument. | ||
if m := re.fullmatch("ir(\\d+)\\.([a-z_]+)(\\d+)", args[1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had you considered making this optional (e.g.. (ir(\\d+)\\.)?
) to make this a single regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arguments to the MakeInstId
function are different depending on if there's an ir or not. I guess I could look for the group being present and branch, but this feels at least as straightforward to me.
scripts/lldbinit.py
Outdated
a Parse::Context, or a Lex::TokenizeBuffer. | ||
EXPR is a C++ expression such as a variable name. Use `--` to prevent it from | ||
being treated as a TYPE and ID. | ||
IR is the CheckIRId(N) in the form `irN`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this mention that it only supports inst
? That seems new, and doesn't appear to be otherwise documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will become wrong in the future, so I don't feel that we should. Maybe I can say something about what the dumped ids look like instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps still document that it's a subset, and where to look for the supported list? Or change the approach, and just drop the IR on the floor when it's not used?
Right now reading TYPE
documentation says everything is allowed, so as long as it's a subset I think it's worth clarifying that the disagreement between TYPE
documentation and behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I have added some clarification and mentioned InstId without saying it's the only one that will be like this.
if len(args) != 3: | ||
# Look for <type><id> as a single argument. | ||
if m := re.fullmatch("([a-z_]+)(\\d+)", args[1]): | ||
if m[1] in untagged_id_types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you dropping support for dumping inst17
without ir.
prefixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That form does not work anymore, unfortunately. At the moment, trying to dump an inst id just crashes. The InstId needs an ir index to add as a tag.
While we could add the current file's ir index to it, at the moment we're printing them as irN.instM
so I don't see a lot of point in consuming them in a different way.
I want to change that but it's something beyond just getting dumping working again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to be expecting that it's always irN
, but it's conditional. See https://github.com/carbon-language/carbon-lang/blob/trunk/toolchain/sem_ir/ids.cpp#L21 or the example https://github.com/carbon-language/carbon-lang/blob/trunk/toolchain/check/testdata/basics/raw_sem_ir/multifile.carbon#L60.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's either inst<none>
or irN.instM
, but you can't dump a none, so for our purposes here it's always irN.instM
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you look at the link?
// CHECK:STDOUT: inst14: {kind: Namespace, arg0: name_scope0, arg1: inst<none>, type: type(inst(NamespaceType))}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I did.. I was looking at the rhs at arg1. I don't know how we get an inst with a none for the CheckIRId like that, hm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that inst14
is PackageInstId
:
carbon-lang/toolchain/sem_ir/typed_insts.h
Line 1243 in 49213b1
static constexpr InstId PackageInstId = InstId(SingletonInstKinds.size()); |
It's.. like a singleton but isn't a singleton. It is the only inst like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this +1
is what prevents that InstId from getting tagged on Add as well:
carbon-lang/toolchain/sem_ir/file.cpp
Line 39 in 49213b1
insts_(this, SingletonInstKinds.size() + 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. there's more InstIds constructed without a tag here, inside formatting, which I am struggling to understand at the moment:
carbon-lang/toolchain/sem_ir/formatter.cpp
Lines 1239 to 1251 in 49213b1
InstId stripped_inst_id( | |
import_ir.sem_ir->insts().GetRawIndex(import_ir_inst.inst_id())); | |
switch (loc_id.kind()) { | |
case LocId::Kind::None: { | |
out_ << stripped_inst_id << " [no loc]"; | |
break; | |
} | |
case LocId::Kind::ImportIRInstId: { | |
// TODO: Probably don't want to format each indirection, but maybe | |
// reuse GetCanonicalImportIRInst? | |
out_ << stripped_inst_id << " [indirect]"; | |
break; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These special cases are removed in #6173
I thought about 3 as well after I left work last night. I think it might be the best option, because yeah sometimes you do have to type things - I tended to type If possible I would like to separate all of these concerns from getting debugging/dumping functioning again though. Right now, we are printing inst ids that we can not dump, because constructing an inst id in the debugger does so without a tag, and the functions have no way to provide the ir id for the tag. Will address the other feedback as well, and will reassign this review to you (thanks for looking at it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL
scripts/lldbinit.py
Outdated
a Parse::Context, or a Lex::TokenizeBuffer. | ||
EXPR is a C++ expression such as a variable name. Use `--` to prevent it from | ||
being treated as a TYPE and ID. | ||
IR is the CheckIRId(N) in the form `irN`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will become wrong in the future, so I don't feel that we should. Maybe I can say something about what the dumped ids look like instead.
if m := re.fullmatch("([a-z_]+)(\\d+)", args[1]): | ||
if m[1] in id_types: | ||
# Look for <irN>.<type><id> as a single argument. | ||
if m := re.fullmatch("ir(\\d+)\\.([a-z_]+)(\\d+)", args[1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arguments to the MakeInstId
function are different depending on if there's an ir or not. I guess I could look for the group being present and branch, but this feels at least as straightforward to me.
if len(args) != 3: | ||
# Look for <type><id> as a single argument. | ||
if m := re.fullmatch("([a-z_]+)(\\d+)", args[1]): | ||
if m[1] in untagged_id_types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That form does not work anymore, unfortunately. At the moment, trying to dump an inst id just crashes. The InstId needs an ir index to add as a tag.
While we could add the current file's ir index to it, at the moment we're printing them as irN.instM
so I don't see a lot of point in consuming them in a different way.
I want to change that but it's something beyond just getting dumping working again.
I have done the third one in #6172 |
As of #5997, InstId now contains a tag of the
CheckIRId
inside it, which works fine for debugging when dumping a C++ variable, but did not work anymore when trying to dump other InstIds that had been dumped. Teach lldb'sdump
command to pull apart their1.inst2
format and change theMakeInstId
methods in dump.cpp to take theCheckIRId
as an input. Since you could now possibly dump inst ids from another IR, provide an overload inCheck::
that allows for this. And CHECK inSemIR::
if you try to dump something from the wrong IR.Now this works again: