@danakj (Contributor) commented Oct 6, 2025

As of #5997, InstId now contains a tag of the CheckIRId inside it, which works fine for debugging when dumping a C++ variable, but did not work anymore when trying to dump other InstIds that had been dumped. Teach lldb's dump command to pull apart the ir1.inst2 format and change the MakeInstId methods in dump.cpp to take the CheckIRId as an input. Since you could now possibly dump inst ids from another IR, provide an overload in Check:: that allows for this. And CHECK in SemIR:: if you try to dump something from the wrong IR.

Now this works again:

(lldb) dump context ir1.inst21
ir1.inst21: {kind: ClassType, arg0: class0, arg1: specific<none>, type: type(TypeType)}
  - type: type(TypeType): type; {kind: TypeType, type: type(TypeType)}
  - value: concrete_constant(ir1.inst21)
  - loc: LocId(<none>)
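For readers unfamiliar with the format, here is a minimal sketch of how an argument such as `ir1.inst21` could be pulled apart. It reuses the regex shape quoted later in this review; the helper name `split_tagged_id` is hypothetical and is not the actual lldb script.

```python
import re

# Hypothetical helper for illustration only; not the code in the lldb script.
# Matches the <irN>.<type><id> form, for example "ir1.inst21".
TAGGED_ID = re.compile(r"ir(\d+)\.([a-z_]+)(\d+)")


def split_tagged_id(text):
    """Returns (ir_index, id_type, id_index), or None if `text` is not tagged."""
    m = TAGGED_ID.fullmatch(text)
    if m is None:
        return None
    return int(m[1]), m[2], int(m[3])


print(split_tagged_id("ir1.inst21"))
print(split_tagged_id("inst21"))
```

The first call yields the IR index, the id type, and the id number as separate values, which is what a tagged `MakeInstId`-style call would need; the second returns `None` because the `irN.` prefix is missing.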

@danakj danakj requested a review from a team as a code owner October 6, 2025 19:12
@danakj danakj requested review from josh11b and removed request for a team October 6, 2025 19:12
@jonmeow (Contributor) commented Oct 6, 2025

Just to note, on this PR, CheckIRId uses "check_ir" and ImportIRId uses "ir". It sounds like you're saying InstId/IdTag is using "ir" for CheckIRId; had you considered aligning the prefix as part of adding a dependency on which it is?

@danakj (Contributor Author) commented Oct 6, 2025

> Just to note, on this PR, CheckIRId uses "check_ir" and ImportIRId uses "ir". It sounds like you're saying InstId/IdTag is using "ir" for CheckIRId; had you considered aligning the prefix as part of adding a dependency on which it is?

I have used check_ir_id in the variable names I am introducing. Are you suggesting check_ir1.inst2 as the dump format? That feels like it's getting long to me...

That said, I would be just fine with that if we omitted the ir from the id printing when it's from the current IR. I do want to do that, but it requires us to plumb File into Print, which I think we need to have a discussion about separate from just making dumping work again.

@jonmeow (Contributor) commented Oct 6, 2025

Yes, that's right. The incorrect prefix seemed to be part of why you were confused here, and it confused me too, so I'd thought maybe it'd be worth the characters.

@josh11b josh11b requested review from jonmeow and removed request for josh11b October 6, 2025 20:24
@danakj (Contributor Author) commented Oct 6, 2025

> Yes, that's right. The incorrect prefix seemed to be part of why you were confused here, and it confused me too, so I'd thought maybe it'd be worth the characters.

I see, yeah the biggest confusion was that I thought it must be referring to a foreign IR if it was appearing in the id like that. But the id number didn't match the foreign file. I did not at all consider it could be referring to the current file.

@jonmeow (Contributor) left a comment

To be sure, I have very strong concerns about the multiple different uses of "ir". Look for example at the YAML format in testdata -- line 91 and line 124 use "ir1" with different meaning. This seems like it'll be a significant debugging hindrance; I'm maybe too familiar with what the prefixes mean, but also do you expect people joining to be able to understand this?

Also, taking a step back to one thing you said earlier about "check_ir" feeling "long". Is the main use of this copy-paste from other debug output, or are you expecting you'll usually type it in from scratch?

Speaking for myself, I see three possible answers here:

  1. Change IdTag to match CheckIRId
    • ImportIRId -> "ir", CheckIRId -> "check_ir", IdTag -> "check_ir"
  2. Change ImportIRId and IdTag to always be explicit
    • ImportIRId -> "import_ir", CheckIRId -> "check_ir", IdTag -> "check_ir"
  3. Swap ImportIRId and CheckIRId
    • ImportIRId -> "import_ir", CheckIRId -> "ir", IdTag -> "ir"

Personally, I'd favor (2) actually -- we should probably always clarify because confusion isn't worth the shorter name. (I could mainly see (3) if typing this out is a common use-case) But, partly noting this if it's worth further discussion (I assume this wasn't discussed in detail since the review was reassigned to me).

if m := re.fullmatch("([a-z_]+)(\\d+)", args[1]):
if m[1] in id_types:
# Look for <irN>.<type><id> as a single argument.
if m := re.fullmatch("ir(\\d+)\\.([a-z_]+)(\\d+)", args[1]):
Contributor

Had you considered making this optional (e.g., `(ir(\\d+)\\.)?`) to make this a single regex?

Contributor Author

The arguments to the MakeInstId function are different depending on if there's an ir or not. I guess I could look for the group being present and branch, but this feels at least as straightforward to me.
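As a sketch of the trade-off being discussed, the single-regex variant with an optional `irN.` group might look like the following. The function name `parse_id` and the returned tuples are hypothetical; the real script calls different `MakeInstId` forms instead.

```python
import re

# Hypothetical single-pattern variant: the "irN." prefix is optional.
ID_PATTERN = re.compile(r"(?:ir(\d+)\.)?([a-z_]+)(\d+)")


def parse_id(text):
    """Parses "<type><id>" or "ir<N>.<type><id>"; returns None on no match."""
    m = ID_PATTERN.fullmatch(text)
    if m is None:
        return None
    ir, id_type, index = m.groups()
    # The branch on the optional group is still needed, because the two
    # forms would be passed on with different arguments.
    if ir is not None:
        return ("tagged", int(ir), id_type, int(index))
    return ("untagged", id_type, int(index))


print(parse_id("ir1.inst21"))
print(parse_id("class0"))
```

One pattern replaces two, but the code still branches on whether the group matched, which is the point made in the reply above: two separate regexes are about as straightforward.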

a Parse::Context, or a Lex::TokenizeBuffer.
EXPR is a C++ expression such as a variable name. Use `--` to prevent it from
being treated as a TYPE and ID.
IR is the CheckIRId(N) in the form `irN`.
Contributor

Should this mention that it only supports inst? That seems new, and doesn't appear to be otherwise documented.

Contributor Author

That will become wrong in the future, so I don't feel that we should. Maybe I can say something about what the dumped ids look like instead.

Contributor

Perhaps still document that it's a subset, and where to look for the supported list? Or change the approach, and just drop the IR on the floor when it's not used?

Right now, the TYPE documentation reads as though everything is allowed, so as long as only a subset is supported, I think it's worth clarifying the disagreement between the TYPE documentation and the behavior.

Contributor Author

Yep, I have added some clarification and mentioned InstId without saying it's the only one that will be like this.

if len(args) != 3:
# Look for <type><id> as a single argument.
if m := re.fullmatch("([a-z_]+)(\\d+)", args[1]):
if m[1] in untagged_id_types:
Contributor

Are you dropping support for dumping inst17 without ir. prefixed?

Contributor Author

That form does not work anymore, unfortunately. At the moment, trying to dump an inst id just crashes. The InstId needs an ir index to add as a tag.

While we could add the current file's ir index to it, at the moment we're printing them as irN.instM so I don't see a lot of point in consuming them in a different way.

I want to change that but it's something beyond just getting dumping working again.

Contributor

Contributor Author

It's either inst<none> or irN.instM, but you can't dump a none, so for our purposes here it's always irN.instM.

Contributor

Did you look at the link?

https://github.com/carbon-language/carbon-lang/blob/trunk/toolchain/check/testdata/basics/raw_sem_ir/multifile.carbon#L60

// CHECK:STDOUT:     inst14:          {kind: Namespace, arg0: name_scope0, arg1: inst<none>, type: type(inst(NamespaceType))}

Contributor Author

Oh, I did.. I was looking at the rhs at arg1. I don't know how we get an inst with a none for the CheckIRId like that, hm...

Contributor Author

I see, that inst14 is PackageInstId:

static constexpr InstId PackageInstId = InstId(SingletonInstKinds.size());

It's.. like a singleton but isn't a singleton. It is the only inst like this.

Contributor Author

It looks like this +1 is what prevents that InstId from getting tagged on Add as well:

insts_(this, SingletonInstKinds.size() + 1),
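To make the observation concrete, here is a toy model of the behavior described in this thread: ids below a reserved count are left untagged, and everything added afterwards carries the IR tag. The class `ToyInstStore`, its fields, and the count of 14 singletons are assumptions for illustration only. The real value store and IdTag in Carbon work differently in detail, and how singleton ids themselves print is not modeled here.

```python
class ToyInstStore:
    """Toy model only; not the real Carbon value store or IdTag."""

    def __init__(self, ir_index, reserved_count):
        self.ir_index = ir_index
        # Ids below this count are never tagged in this model.
        self.reserved_count = reserved_count

    def format_id(self, raw_index):
        if raw_index < self.reserved_count:
            return f"inst{raw_index}"
        return f"ir{self.ir_index}.inst{raw_index}"


# Assume 14 singleton kinds, so the reserved count is 14 + 1 and the
# id right after the singletons (14) stays untagged.
store = ToyInstStore(ir_index=1, reserved_count=14 + 1)
print(store.format_id(14))
print(store.format_id(21))
```

Under these assumptions, id 14 formats without a prefix while id 21 formats as `ir1.inst21`, mirroring the untagged `inst14` seen in the linked test data.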

Contributor Author

Ah.. there's more InstIds constructed without a tag here, inside formatting, which I am struggling to understand at the moment:

InstId stripped_inst_id(
    import_ir.sem_ir->insts().GetRawIndex(import_ir_inst.inst_id()));
switch (loc_id.kind()) {
  case LocId::Kind::None: {
    out_ << stripped_inst_id << " [no loc]";
    break;
  }
  case LocId::Kind::ImportIRInstId: {
    // TODO: Probably don't want to format each indirection, but maybe
    // reuse GetCanonicalImportIRInst?
    out_ << stripped_inst_id << " [indirect]";
    break;
  }

Contributor Author

These special cases are removed in #6173

@danakj (Contributor Author) commented Oct 7, 2025

> To be sure, I have very strong concerns about the multiple different uses of "ir". Look for example at the YAML format in testdata -- line 91 and line 124 use "ir1" with different meaning. This seems like it'll be a significant debugging hindrance; I'm maybe too familiar with what the prefixes mean, but also do you expect people joining to be able to understand this?
>
> Also, taking a step back to one thing you said earlier about "check_ir" feeling "long". Is the main use of this copy-paste from other debug output, or are you expecting you'll usually type it in from scratch?
>
> Speaking for myself, I see three possible answers here:
>
>   1. Change IdTag to match CheckIRId
>     • ImportIRId -> "ir", CheckIRId -> "check_ir", IdTag -> "check_ir"
>   2. Change ImportIRId and IdTag to always be explicit
>     • ImportIRId -> "import_ir", CheckIRId -> "check_ir", IdTag -> "check_ir"
>   3. Swap ImportIRId and CheckIRId
>     • ImportIRId -> "import_ir", CheckIRId -> "ir", IdTag -> "ir"
>
> Personally, I'd favor (2) actually -- we should probably always clarify because confusion isn't worth the shorter name. (I could mainly see (3) if typing this out is a common use-case) But, partly noting this if it's worth further discussion (I assume this wasn't discussed in detail since the review was reassigned to me).

I thought about 3 as well after I left work last night. I think it might be the best option, because yeah sometimes you do have to type things - I tended to type inst31 or whatever a lot before. Which is kind of why I want to get rid of the irN prefix for the current file too.

If possible I would like to separate all of these concerns from getting debugging/dumping functioning again though. Right now, we are printing inst ids that we can not dump, because constructing an inst id in the debugger does so without a tag, and the functions have no way to provide the ir id for the tag.

Will address the other feedback as well, and will reassign this review to you (thanks for looking at it)

@danakj (Contributor Author) left a comment

PTAL

@danakj danakj requested a review from jonmeow October 7, 2025 14:31
@danakj (Contributor Author) commented Oct 7, 2025

> > To be sure, I have very strong concerns about the multiple different uses of "ir". Look for example at the YAML format in testdata -- line 91 and line 124 use "ir1" with different meaning. This seems like it'll be a significant debugging hindrance; I'm maybe too familiar with what the prefixes mean, but also do you expect people joining to be able to understand this?
> >
> > Also, taking a step back to one thing you said earlier about "check_ir" feeling "long". Is the main use of this copy-paste from other debug output, or are you expecting you'll usually type it in from scratch?
> >
> > Speaking for myself, I see three possible answers here:
> >
> >   1. Change IdTag to match CheckIRId
> >     • ImportIRId -> "ir", CheckIRId -> "check_ir", IdTag -> "check_ir"
> >   2. Change ImportIRId and IdTag to always be explicit
> >     • ImportIRId -> "import_ir", CheckIRId -> "check_ir", IdTag -> "check_ir"
> >   3. Swap ImportIRId and CheckIRId
> >     • ImportIRId -> "import_ir", CheckIRId -> "ir", IdTag -> "ir"
> >
> > Personally, I'd favor (2) actually -- we should probably always clarify because confusion isn't worth the shorter name. (I could mainly see (3) if typing this out is a common use-case) But, partly noting this if it's worth further discussion (I assume this wasn't discussed in detail since the review was reassigned to me).
>
> I thought about 3 as well after I left work last night. I think it might be the best option, because yeah sometimes you do have to type things - I tended to type inst31 or whatever a lot before. Which is kind of why I want to get rid of the irN prefix for the current file too.
>
> If possible I would like to separate all of these concerns from getting debugging/dumping functioning again though. Right now, we are printing inst ids that we can not dump, because constructing an inst id in the debugger does so without a tag, and the functions have no way to provide the ir id for the tag.
>
> Will address the other feedback as well, and will reassign this review to you (thanks for looking at it)

I have done the third one in #6172
