Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions llvm/include/llvm/BinaryFormat/ELF.h
Original file line number Diff line number Diff line change
Expand Up @@ -1123,6 +1123,8 @@ struct Elf64_Shdr {
Elf64_Xword sh_entsize;
};

enum { PN_XNUM = 0xffff };

// Special section indices.
enum {
SHN_UNDEF = 0, // Undefined, missing, irrelevant, or meaningless
Expand Down
60 changes: 41 additions & 19 deletions llvm/include/llvm/Object/ELF.h
Original file line number Diff line number Diff line change
Expand Up @@ -278,9 +278,16 @@ class ELFFile {
std::vector<Elf_Shdr> FakeSections;
SmallString<0> FakeSectionStrings;

Elf_Word RealPhNum;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth a comment immediately before this block of variables explaining why they are needed and Header.e_phnum etc can't be relied upon directly?

Also, as these are no longer referring directly to a specific field in the ELF (since they could be the value in the ELF header or the 0th section header), I'd stop using Elf_* types for them, perhaps using size_t instead, since by this point they are more about being indexes and counts of things in memory/in a file.

Elf_Word RealShNum;
Elf_Word RealShStrNdx;

ELFFile(StringRef Object);

public:
Elf_Word getPhNum() const { return RealPhNum; }
Elf_Word getShNum() const { return RealShNum; }
Elf_Word getShStrNdx() const { return RealShStrNdx; }
const Elf_Ehdr &getHeader() const {
return *reinterpret_cast<const Elf_Ehdr *>(base());
}
Expand Down Expand Up @@ -379,22 +386,21 @@ class ELFFile {

/// Iterate over program header table.
Expected<Elf_Phdr_Range> program_headers() const {
if (getHeader().e_phnum && getHeader().e_phentsize != sizeof(Elf_Phdr))
if (RealPhNum && getHeader().e_phentsize != sizeof(Elf_Phdr))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're introducing a getter, perhaps it makes sense to always use the getter, even within the class?

return createError("invalid e_phentsize: " +
Twine(getHeader().e_phentsize));

uint64_t HeadersSize =
(uint64_t)getHeader().e_phnum * getHeader().e_phentsize;
uint64_t HeadersSize = (uint64_t)RealPhNum * getHeader().e_phentsize;
uint64_t PhOff = getHeader().e_phoff;
if (PhOff + HeadersSize < PhOff || PhOff + HeadersSize > getBufSize())
return createError("program headers are longer than binary of size " +
Twine(getBufSize()) + ": e_phoff = 0x" +
Twine::utohexstr(getHeader().e_phoff) +
", e_phnum = " + Twine(getHeader().e_phnum) +
", e_phnum = " + Twine(RealPhNum) +
", e_phentsize = " + Twine(getHeader().e_phentsize));

auto *Begin = reinterpret_cast<const Elf_Phdr *>(base() + PhOff);
return ArrayRef(Begin, Begin + getHeader().e_phnum);
return ArrayRef(Begin, Begin + RealPhNum);
}

/// Get an iterator over notes in a program header.
Expand Down Expand Up @@ -772,18 +778,10 @@ template <class ELFT>
Expected<StringRef>
ELFFile<ELFT>::getSectionStringTable(Elf_Shdr_Range Sections,
WarningHandler WarnHandler) const {
uint32_t Index = getHeader().e_shstrndx;
if (Index == ELF::SHN_XINDEX) {
// If the section name string table section index is greater than
// or equal to SHN_LORESERVE, then the actual index of the section name
// string table section is contained in the sh_link field of the section
// header at index 0.
if (Sections.empty())
return createError(
"e_shstrndx == SHN_XINDEX, but the section header table is empty");

Index = Sections[0].sh_link;
}
uint32_t Index = RealShStrNdx;
if (Index == ELF::SHN_XINDEX)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a bug here: if there happen to be exactly SHN_XINDEX section headers, this check will spuriously report an error, because the sh_size field of the null section header will happen to be 0xffff.

return createError(
"e_shstrndx == SHN_XINDEX, but the section header table is empty");

// There is no section name string table. Return FakeSectionStrings which
// is non-empty if we have created fake sections.
Expand Down Expand Up @@ -889,7 +887,31 @@ Expected<uint64_t> ELFFile<ELFT>::getDynSymtabSize() const {
return 0;
}

template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {}
template <class ELFT> ELFFile<ELFT>::ELFFile(StringRef Object) : Buf(Object) {
const Elf_Ehdr &Header = getHeader();
RealPhNum = Header.e_phnum;
RealShNum = Header.e_shnum;
RealShStrNdx = Header.e_shstrndx;
if (!Header.hasPhdrNumExtension())
return;

// An ELF binary may report `hasExtendedHeader` as true but not actually
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasExtendedHeader is stale, I think?

More generally, this is an example of why I think it makes more sense to lazily initialise the Real* values. By lazily initialising, we don't have to wonder whether these values have actually been set properly or not and therefore whether we need to look at section 0 again. It would also help with the other case I've commented on: if we make Real* values std::optional, we can detect whether they've been set or not, then once they've been set, we can rely on these values always being the real values.

Copy link
Contributor Author

@aokblast aokblast Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use std::optional for the new getter?
Or just returns Real_* ? Real_ : getHeader()->e_*.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the getter should be the thing that does the initialization (if needed).

So something like:

mutable std::optional<uint32_t> RealPhNum;
mutable std::optional<uint32_t> RealShNum;
mutable std::optional<uint32_t> RealShStrndx;

Expected<uint32_t> getPhNum() const { // Probably should be defined in a .cpp at this point.
  if (!RealPhNum)
    if (Error E = readShdrZero())
      return std::move(E);
  return *RealPhNum;
}
// Same for ShNum and ShStrndx.

Error readShdrZero() {
  const Elf_Ehdr &Header = getHeader();
  if (Header.e_phnum == PN_XNUM || Header.e_shnum == 0 || Header.e_shstrndx == SHN_XINDEX)
  auto SecOrErr = getSection(0); // NB: maybe should use Expected<...> here instead, to fit LLVM coding standards.
  if (!SecOrErr)
    return SecOrErr.takeError();

  RealPhNum = Header.e_phnum == PN_XNUM ? (*SecOrErr)->sh_info : Header.e_phnum;
  RealShNum = Header.e_shnum == 0 ? (*SecOrErrr)->sh_size : Header.e_shnum;
  RealShStrndx = Header.e_shstrndx == SHN_XINDEX ? (*SecOrErr)->sh_link : Header.e_shstrndx;
}

Miscellaneous notes about the above:

  • The above is not likely to be 100% correct, as I haven't tested it, but should help clarify what I'm expecting.
  • The Elf gABI says e_shnum == 0 indicates to use the 0th section header sh_size field, not SHN_UNDEF. While the two technically are the same value, it's more correct to compare against the literal 0 and not SHN_UNDEF.
  • I've used PN_XNUM, not 0xffff, since we should use the constants when we have them and the gABI does (NB: the draft gABI hasn't been updated with the PN_XNUM extension yet, but it has been agreed on the mailing list).
  • By doing this, the error isn't reported until the first (and subsequent) attempts to read the the real values, so if the values aren't needed, no error is reported, and the error is only reported once it is actually needed. If multiple reports of the error are undesirable, the caller can handle it (see e.g. llvm-readelf's reportUniqueWarning).
  • If doing this, it's essential we don't reference Real... directly outside the example code I've posted, even if it's accessible. You could go as far as wrapping the fields and initialisation in another class, to enforce this, but I don't think it's worth doing that, personally.

Copy link
Contributor Author

@aokblast aokblast Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I get what you mean.

A question is that should we consume up the error or report it back when the caller is inside the class?

Since some member functions like program_headers do not expect that it fails on reading e_phnum. That means we need to change some testcase if we want to report it back since the user of program_headers would be expected to early return and early report of the error.

But you said that we don't want to see any difference in a user's viewpoint. That is why I choose to set these field in the constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we don't want to introduce a test with 0xffff program headers, we can create a test, adapted from test/llvm-objcopy/ELF/invalid-e_phoff.test, that shows the llvm-readelf behavior when e_phnum is 0xffff, but the section header table doesn't contain that many of sections. The diagnostic is probably different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question is that should we consume up the error or report it back when the caller is inside the class?

I'd propagate it as far as possible until you have to change an API, then consume the error with a TODO comment to highlight we'd like to propagate it further. We've taken this approach in other cases. A future change could be made to propagate it further, where we'd evaluate what should be done on a case-by-case basis.

As I understand it, with my suggested approach, the only time we'd see an error is if e_phnum is PN_XNUM/e_shstrndx is SHN_XINDEX/e_shnum is 0 and the section header table can't be read (or there are no section headers in it). As a result, this should preserve the existing behaviour for all current test cases.

Regarding testing, strictly-speaking, I expect the final gABI wording to match that of the Oracle document, which technically means the extension should only be used if there are more than 65534 program headers. However, fundamentally I don't see any reason we'd enforce this in the binary tools (possibly with the exception in llvm-objcopy when writing an object out that used the extension unnecessarily), so if we used yaml2obj to specify e_phnum to be PN_XNUM and to manually populate section header 0, we could show that it correctly reads this in one test and then in a different test case (or cases) we could have it fail because e.g. the section header table is missing (there's a yaml2obj feature that allows you to suppress the table).

Copy link
Contributor Author

@aokblast aokblast Oct 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please check if the current change is what you expected? If that is the case, I will rebase my llvm-objcopy branch and force push just for once, since another branch is so dirty. Sorry for the bothering and nuance.

There are actually some corner test cases has the problem you have mentioned (SHN_UNDEF and the section size is actually 0). I just add some comments there.

// include an extended header. For example, a core dump can contain 65,535
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term "extended header" isn't used in the ELF gABI, so I don't think we should use it here. The header hasn't really been extended, in my opinion, so I think the term used in the Oracle docs (and possibly elsewhere) is misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the better choice for us?

According to gabi, they don't provide special term for this condition. Also, extended header is used in LLDB which also provides mechanism to parse special section 0.

Should we call hasSection0Header? Or something else?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the code suggestion I've given, the hasPhdrNumExtension should be redundant. If a comment is needed within the code suggestion, I'd just refer to it as "the section header at index 0" or something to that effect. I expect you'll see other examples in the code or commit history.

// segments but no sections at all. We defer reporting an error until section
// 0 is accessed. Consumers should handle and emit the error themselves when
// they attempt to access it.
auto SecOrErr = getSection(0);
if (!SecOrErr) {
consumeError(SecOrErr.takeError());
return;
}
if (RealPhNum == 0xFFFF)
RealPhNum = (*SecOrErr)->sh_info;
if (RealShNum == ELF::SHN_UNDEF)
RealShNum = (*SecOrErr)->sh_size;
if (RealShStrNdx == ELF::SHN_XINDEX)
RealShStrNdx = (*SecOrErr)->sh_link;
}

template <class ELFT>
Expected<ELFFile<ELFT>> ELFFile<ELFT>::create(StringRef Object) {
Expand Down Expand Up @@ -956,7 +978,7 @@ Expected<typename ELFT::ShdrRange> ELFFile<ELFT>::sections() const {
const Elf_Shdr *First =
reinterpret_cast<const Elf_Shdr *>(base() + SectionTableOffset);

uintX_t NumSections = getHeader().e_shnum;
uintX_t NumSections = RealShNum;
if (NumSections == 0)
NumSections = First->sh_size;

Expand Down
5 changes: 5 additions & 0 deletions llvm/include/llvm/Object/ELFTypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -529,6 +529,11 @@ struct Elf_Ehdr_Impl {

unsigned char getFileClass() const { return e_ident[ELF::EI_CLASS]; }
unsigned char getDataEncoding() const { return e_ident[ELF::EI_DATA]; }
bool hasPhdrNumExtension() const {
return (e_phnum == ELF::PN_XNUM || e_shnum == ELF::SHN_UNDEF ||
e_shstrndx == ELF::SHN_XINDEX) &&
e_shoff != 0;
}
};

template <endianness Endianness>
Expand Down