Skip to content

Conversation

@tonykuttai
Copy link

@tonykuttai tonykuttai commented Oct 18, 2025

Add support for the #pragma comment (copyright, "Copyright info") directive for AIX.

This builds on top of PR #159096

Refernce: XL documentation for #pragma comment

/*Name=*/"__aix_copyright_str");
StrGV->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
StrGV->setAlignment(Align(1));
StrGV->setSection("__llvm_copyright");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments for this transformation do not say anything about this. What is the benefit for doing this? It increases the chances of the string being kept under LTO situations, but is that always a good thing?

Also, if this is AIX-specific, why call this section __llvm_copyright?

Copy link
Author

@tonykuttai tonykuttai Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • setUnnamedAddr will allow linker to merge identical strings. I think using InternalLinkage will do the exact same thing.
  • I will change the name to __aix_copyright.
    xlc also emitted a section and .ref was pointing to it as shown below.
	.csect	H.10.NO_SYMBOL{PR}, 7   
.func1:                                 # 0x00000000 (H.10.NO_SYMBOL)
	cal        r3,1(r0)
	bcr        BO_ALWAYS,CR0_LT
	.long	0x00000000

	.ref	E.16.__STATIC{RW}.   ############# .ref to E.16.__STATIC section
	.csect	E.16.__STATIC{RW}, 3    
	.long	0x436f7079              # "Copy"
	.long	0x72696768              # "righ"
	.long	0x74203230              # "t 20"
	.long	0x32352043              # "25 C"
	.long	0x6f6d7061              # "ompa"
	.long	0x6e792041              # "ny A"
# End	csect	E.16.__STATIC{RW}
	.long	0x00000000              # "\0\0\0\0"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xlc puts all internal linkage data into the same section (which unnecessarily bloats the loaded image if some of the data is unreferenced and could otherwise be garbage-collected, but it has the benefit of reducing TOC usage).

Providing the disassembly here for discussion is a good call: It seems xlc put the string into a writable section. There is no documented rationale for that. We should consider marking the string as read-only (and testing for that).

Copy link
Author

@tonykuttai tonykuttai Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are generating a csect with RO mapping class as shown below:

	.file	"file1.c",,"LLVM version 22.0.0git"
	.csect ..text..[PR],5
	.rename	..text..[PR],""
	.machine	"PWR7"
	.globl	func1[DS]                       # -- Begin function func1
	.globl	.func1
	.align	4
	.csect func1[DS],2
	.vbyte	4, .func1                       # @func1
	.vbyte	4, TOC[TC0]
	.vbyte	4, 0
	.csect ..text..[PR],5
.func1:
	.ref __aix_copyright_str
# %bb.0:                                # %entry
	blr
L..func10:
	.vbyte	4, 0x00000000                   # Traceback table begin
	.byte	0x00                            # Version = 0
	.byte	0x09                            # Language = CPlusPlus
	.byte	0x20                            # -IsGlobaLinkage, -IsOutOfLineEpilogOrPrologue
                                        # +HasTraceBackTableOffset, -IsInternalProcedure
                                        # -HasControlledStorage, -IsTOCless
                                        # -IsFloatingPointPresent
                                        # -IsFloatingPointOperationLogOrAbortEnabled
	.byte	0x40                            # -IsInterruptHandler, +IsFunctionNamePresent, -IsAllocaUsed
                                        # OnConditionDirective = 0, -IsCRSaved, -IsLRSaved
	.byte	0x00                            # -IsBackChainStored, -IsFixup, NumOfFPRsSaved = 0
	.byte	0x00                            # -HasExtensionTable, -HasVectorInfo, NumOfGPRsSaved = 0
	.byte	0x00                            # NumberOfFixedParms = 0
	.byte	0x01                            # NumberOfFPParms = 0, +HasParmsOnStack
	.vbyte	4, L..func10-.func1             # Function size
	.vbyte	2, 0x0005                       # Function name len = 5
	.byte	"func1"                         # Function Name
                                        # -- End function
	.csect __aix_copyright[RO],2
	.lglobl	__aix_copyright_str             # @__aix_copyright_str
__aix_copyright_str:
	.string	"Copyright 2025 TU A"
	.toc

We should consider marking the string as read-only (and testing for that).

Marking the csect as RO would suffice right?
To test this, we can do // RUN: llc -mtriple=powerpc-ibm-aix -filetype=asm %t/file1.bc -o - | FileCheck %s --check-prefix=CHECK-ASM like we did in the previous commit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reintroduced llvm/test/CodeGen/PowerPC/pragma-comment-copyright-aix.ll for backend testing. We can remove this if not required.

@@ -0,0 +1,110 @@
// REQUIRES: powerpc-registered-target, system-aix, clang

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test fits with the LLVM unit testing philosophy.

My understanding is that we have tests

  • for the front-end generation of the aix.copyright.comment metadata, and
  • for the transformation of the aix.copyright.comment metadata to implicit.ref, etc.

What we do not have are targeted tests that specifically verify that the transformation is run at the right point in the various possible pipelines.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right. Removed this test.

Copy link
Author

@tonykuttai tonykuttai Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we do not have are targeted tests that specifically verify that the transformation is run at the right point in the various possible pipelines.

Added the opt levels testing to llvm/test/Transforms/CopyrightMetadata/copyright-metadata.ll. Do we need an end-to-end test ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a end-to-end test ?

I can't say that we shouldn't have one. It is just that I am not aware of there being any in upstream LLVM that involve the linker (except when it's an lld test).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find anything similar as well.

@tonykuttai tonykuttai force-pushed the tvarghese/pragma-comment-copyright branch from 438f04c to cfa6019 Compare October 21, 2025 14:22
@tonykuttai tonykuttai force-pushed the tvarghese/pragma-comment-copyright branch from 252f063 to 82254ee Compare October 22, 2025 11:58
@tonykuttai
Copy link
Author

Added the test clang/test/CodeGen/PowerPC/pragma-comment-copyright-aix-modules.cpp to test the C++ 20 Modules getting ignored in the importing TU.

Copy link

@hubert-reinterpretcast hubert-reinterpretcast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some partial review comments...

Comment on lines +4 to +7
; RUN: opt --O0 -S %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-O0
; RUN: opt --O1 -S %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-ON
; RUN: opt --O2 -S %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-ON
; RUN: opt --O3 -S %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK-ON

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar enough with how the pipelines are built between different tools (clang, opt, etc.) to know if this is the right way to check that the pass is in the pipeline at the right place (especially for LTO).

@w2yehia @mandlebug, your input would be appreciated.

; - Emits the string in a dedicated read-only csect
;
; CHECK-ASM: .ref __aix_copyright_str
; CHECK-ASM: .csect __aix_copyright[RO],2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the churn. The check in llvm/test/Transforms/CopyrightMetadata/copyright-metadata.ll for constant in the IR is sufficient for establishing that we are generating as read-only, so the testing request from #1 (comment) was already satisfied (and we don't need this test just to cover that).

It is still a question whether we want to group the strings into one csect (which probably has no effect except under "full" LTO where it will increase the chance of the string being kept).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, sure. Removing the llvm/test/CodeGen/PowerPC/pragma-comment-copyright-aix.ll.

It is still a question whether we want to group the strings into one csect (which probably has no effect except under "full" LTO where it will increase the chance of the string being kept).

Are there any downsides to the current approach ? I think it is cleaner to group them seperately in a csect.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any downsides to the current approach ?

It generates "false positives" to the question of whether the content associated with the copyright string was actually retained/relevant to the result.


// Emit copyright metadata for AIX
if (AIXCopyrightComment) {
auto *NMD = getModule().getOrInsertNamedMetadata("aix.copyright.comment");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename "aix.copyright.comment" to be something generic so other platforms can use it. Something like "comment.module". Hubert had a better suggestion than the word "module". I'd use that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to loadtime.copyright.comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop copyright from the name. That reflects a use, not the purpose.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, it may be helpful (in case there are additional #pragma comment IR annotations in the future) to name it comment_string.loadtime so that we can have a family of comment_string.* annotations with the "high-order qualifier" first (assuming that the comments will uniformly be generated as strings out of the front-end).


/// Single module-level copyright comment for AIX (if any).
/// We only ever accept one per TU.
llvm::MDNode *AIXCopyrightComment = nullptr;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this a SmallVector<> too. AIX might limit this to only one pragma, other platforms may not.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to evaluate the impact of making this a SmallVector. Will it cause the code that processes it to become a loop? If so, can that loop be truly tested in this PR? We should not be leaving "inoperative" code paths lying around.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Hubert here, we can always change it to a vector if a need arises.

Comment on lines +7 to +10
// verify that module interface emit copyright string when compiled to assembly
// RUN: %clang_cc1 -std=c++20 -triple powerpc-ibm-aix -S %t/copymod.cppm -o - \
// RUN: | FileCheck %s --check-prefix=CHECK-MOD
// CHECK-MOD: .string "module me"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test can be written to check LLVM IR (to avoid the PowerPC backend requirement).


// Emit copyright metadata for AIX
if (AIXCopyrightComment) {
auto *NMD = getModule().getOrInsertNamedMetadata("aix.copyright.comment");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, it may be helpful (in case there are additional #pragma comment IR annotations in the future) to name it comment_string.loadtime so that we can have a family of comment_string.* annotations with the "high-order qualifier" first (assuming that the comments will uniformly be generated as strings out of the front-end).

void CodeGenModule::ProcessPragmaComment(PragmaMSCommentKind Kind,
StringRef Comment) {

if (!getTriple().isOSAIX() || Kind != PCK_Copyright)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider asserting either if this is called not for copyright or if it is called for copyright but not for AIX.
Silently doing nothing seems questionable (e.g., why was the function called to do nothing in the first place?).

/// A vector of metadata strings for dependent libraries for ELF.
SmallVector<llvm::MDNode *, 16> ELFDependentLibraries;

/// Single module-level copyright comment (if any).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid "module" without qualification. "Single" is also redundant given the next sentence.

Suggested change
/// Single module-level copyright comment (if any).
/// Metadata for copyright pragma comment (if present).

Comment on lines +3216 to +3219
// Restrict copyright to AIX targets only
if (!PP.getTargetInfo().getTriple().isOSAIX()) {
switch (Kind) {
case PCK_Copyright:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This choice of control flow does not stand on its own. To explain the choice, a comment should be added to the code expressing future direction that justifies this structure.

For example, we can say that this should be applied for z/OS and extended with other IBM pragma comment kinds.

Comment on lines +14 to +19
// into concrete, translation-unit–local globals to ensure that copyright
// strings:
// - survive all optimization and LTO pipelines,
// - are not removed by linker garbage collection, and
// - remain visible in the final XCOFF binary.
//

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this. It is inaccurate and otherwise redundant with the detailed description below.

// - are not removed by linker garbage collection, and
// - remain visible in the final XCOFF binary.
//
// For each module (translation unit), the pass performs the following:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "only for AIX".

// 1. Create a single NULL-terminated string global
Constant *StrInit = ConstantDataArray::getString(Ctx, Text, /*AddNull=*/true);

// Internal, constant, TU-local — avoids duplicate symbol issues across TUs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try to avoid gratuitous UTF-8.

Suggested change
// Internal, constant, TU-localavoids duplicate symbol issues across TUs.
// Internal, constant, TU-local--avoids duplicate symbol issues across TUs.

Comment on lines +125 to +127
// Backend maps this to an appropriate XCOFF csect (typically [RO])
// The section will appear in assembly as:
// .csect __copyright_comment[RO],2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's limit the amount that we say about XCOFF in this IR pass.

Suggested change
// Backend maps this to an appropriate XCOFF csect (typically [RO])
// The section will appear in assembly as:
// .csect __copyright_comment[RO],2
// The GV is constant, so we expect a read-only section.

MDNode *ImplicitRefMD = MDNode::get(Ctx, Ops);

// Lambda to attach implicit.ref metadata to a function.
// The backend will translate this into .ref assembly directives.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The backend will translate this into .ref assembly directives.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet