22 changes: 21 additions & 1 deletion llvm/docs/Extensions.rst
@@ -535,6 +535,27 @@ Example of BBAddrMap with PGO data:
.uleb128 1000 # BB_3 basic block frequency (only when enabled)
.uleb128 0 # BB_3 successors count (only enabled with branch probabilities)

``SHT_LLVM_FUNC_MAP`` Section (function address map)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Collaborator

Nit: Pretty sure the underline is supposed to match the title length.

This section stores the mapping from the binary address of function to its
related metadata features. It is used to emit function-level analysis data and
Collaborator

Suggested change:
- This section stores the mapping from the binary address of function to its
- related metadata features. It is used to emit function-level analysis data and
+ This section stores the mapping from the binary address of functions to their
+ related metadata features. It is used to emit function-level analysis data and

can be enabled through ``--func-map`` option. The fields are encoded in the
following format:

#. A version number byte used for backward compatibility.
Collaborator

I'm not convinced we want a version byte for every single entry. That feels wasteful when there could be thousands of functions. However, we also can't just have a single version byte at the start of the section, because then the section becomes ambiguous when concatenated together by the linker from two different objects.

Would a better idea be to have a header, consisting of a size (or count of function entries in this table) and a version number (for now), followed by all the function address/count entries? A section might consist of one or more of these header + address/count blocks, to allow for this concatenation. The end of a block (and therefore start of the next block) is identified via the size/count member of the header.

Another idea, if you adopt the header + body approach, is to split the entries into separate function list/data, i.e. you'd have something like the following:

funcaddr1
funcaddr2
funcaddr3
data-for-func1
data-for-func2
data-for-func3

This would be useful for reducing the amount of data that needs to be read to find out information for a specific function. However, it can only be used if the data is fixed size for all entries within a block (i.e. no ULEBs), because accessing the data requires finding the right function and then using its index in the function address list to jump to the right block of data.
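
For illustration, the header-plus-body idea with split address and data lists could be walked roughly as follows. This is only a sketch under assumed field widths: the block layout (a 1-byte version, a 4-byte little-endian entry count, then `count` 8-byte addresses followed by `count` 8-byte values) and the names `readLE` and `lookupInBlocks` are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical block layout (an assumption, NOT the format in this patch):
//   u8  version
//   u32 count                 (little-endian)
//   u64 address[count]        (little-endian)
//   u64 dynInstCount[count]   (little-endian)
// A linked section is a plain concatenation of such blocks.

// Reads an N-byte little-endian integer at offset Off.
static uint64_t readLE(const std::vector<uint8_t> &Buf, size_t Off, size_t N) {
  assert(Off + N <= Buf.size() && "read past end of section");
  uint64_t V = 0;
  for (size_t I = 0; I < N; ++I)
    V |= static_cast<uint64_t>(Buf[Off + I]) << (8 * I);
  return V;
}

// Returns the value recorded for Addr, or nullopt if it is absent or the
// section is malformed. Only headers and address lists are read until a
// match is found; the data list is indexed directly.
std::optional<uint64_t> lookupInBlocks(const std::vector<uint8_t> &Sec,
                                       uint64_t Addr) {
  size_t Off = 0;
  while (Off + 5 <= Sec.size()) {
    if (Sec[Off] != 1)
      return std::nullopt; // Unknown version: layout is not known.
    size_t Count = static_cast<size_t>(readLE(Sec, Off + 1, 4));
    size_t Addrs = Off + 5;
    size_t Data = Addrs + 8 * Count;
    size_t End = Data + 8 * Count;
    if (End > Sec.size())
      return std::nullopt; // Truncated block.
    for (size_t I = 0; I < Count; ++I)
      if (readLE(Sec, Addrs + 8 * I, 8) == Addr)
        return readLE(Sec, Data + 8 * I, 8);
    Off = End; // The size in the header locates the next block.
  }
  return std::nullopt;
}
```

The count in each header is what lets a reader step over a whole block, which is why plain concatenation by the linker stays unambiguous in this scheme.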

Contributor

@jh7370 Does the header approach require adding custom merging support in the linker?

Collaborator

No, it's the same approach as taken by many DWARF sections. Pull the common stuff into a table header and then put the data in the table body, with a size in the header to indicate how much data there is (an alternative approach would be some kind of end marker in the data, but that has issues under some conditions, depending on the values used).

Contributor Author

That sounds like a great idea, I will give it a try! Thank you!

Contributor

I see. So we will get one section per module with multiple function entries and one header. Then the linker will simply concatenate these sections. So we may end up with multiple headers. Is that right?

Contributor Author

Yes, this is my revised suggestion as an alternative. This may not be ideal, because it impedes size reductions through gc-sections/COMDAT deduplication etc as it leaves dead entries in the data.

Whether you adopt either of these approaches or stick with the original design really needs to be a decision that you as clients of the functionality make. Keep in mind that having more data will make it slower to read and write the data. Functionality like gc-sections can help improve this, at a cost to what the section format might look like.

@jh7370 Sorry for the late reply, and thank you for the detailed clarification; that was super helpful!
I've now implemented the single-section approach you suggested, and it does emit the correct data. (IIUC, the first suggestion might rely on features that aren't ready yet.)
To understand the trade-offs, I ran some experiments comparing the original design (duplicated version field) with the single-section design. I ran them on one of our top services (a large binary containing 1M+ functions) and noticed one significant difference in the final binary's section size:

  • Original design: 25MB.
  • Single section design: 69MB.

That is 2~3x larger, which I think is due to the dead entries (no gc-sections). The other overheads are not significant for our system. Builds of our major services can take 30+ minutes, so the extra link time for this section is too small to measure, and the disk/network overhead from the small increase in intermediate ELF object size is fine. For the final binary size, however, given that we may add more data in the future, each additional field would cost 2~3x more because of the dead entries, which I feel could be a problem in the long run. Given this, I'm leaning towards the original design. What do you think?

Collaborator

No problem. It's important to get these things right! On the plus side, as long as the first part of an entry is the version byte, if we come up with a better format in the future, all we have to do is bump the version number and ensure the header or first entry has that version byte first again, i.e. the header approach (should you decide to switch in the future for whatever reason) is interchangeable with the version-per-entry approach, assuming you have the correct version byte, since the version-per-entry approach is effectively "header per entry" where the version field is the sole component of the header.

Another consideration: is it important to be able to traverse this structure quickly to find the appropriate entry?

Contributor Author

> No problem. It's important to get these things right! On the plus side, as long as the first part of an entry is the version byte, if we come up with a better format in the future, all we have to do is bump the version number and ensure the header or first entry has that version byte first again, i.e. the header approach (should you decide to switch in the future for whatever reason) is interchangeable with the version-per-entry approach, assuming you have the correct version byte, since the version-per-entry approach is effectively "header per entry" where the version field is the sole component of the header.

Got it!

> Another consideration: is it important to be able to traverse this structure quickly to find the appropriate entry?

Is this related to your earlier comment about "using a fixed size instead of the ULEBs"? That sounds good. Since we may add more data in the future, being able to look up an entry quickly should be beneficial. I will change it to use a fixed size so that we can skip parsing the unused data.

Collaborator

With fixed-sized entries, you can then use a binary search algorithm to search the section for a specific address, assuming that the entries are in address order. I think this will be guaranteed by the SHF_LINK_ORDER flag. You might want to set the section's sh_entsize appropriately too.

Some related thoughts:

  1. If the function addresses are ordered, you can use a binary search algorithm to find the specific one you care about, without needing to read any extra data at all (just the ones that get picked in the search) but only if the entry sizes are fixed.
  2. I imagine that a future version of the structure might change the size of entries. In this case, you could end up with two objects with function maps of different versions and then different sh_entsize values. I've forgotten how linkers handle this. My hope is that in such a situation they set the sh_entsize to 0 for the combined section, but you'd need to check. A value of 0 would then mean that "fast traversal via binary search without reading the full structure first" isn't possible (in that case, you'd need to read each entry sequentially, although you could skip the data that isn't actually important after checking the version number to determine the size).
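
To make the trade-off concrete, a lookup over fixed-size entries might look like the sketch below. It assumes a hypothetical 17-byte entry (1-byte version, 8-byte little-endian address, 8-byte little-endian count) and entries sorted by address; `EntSize`, `readU64LE` and `findDynInstCount` are illustrative names, and none of this is the format currently in the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed fixed-size entry: u8 version, u64 address, u64 count (all LE).
constexpr size_t EntSize = 1 + 8 + 8;

static uint64_t readU64LE(const uint8_t *P) {
  uint64_t V = 0;
  for (int I = 0; I < 8; ++I)
    V |= static_cast<uint64_t>(P[I]) << (8 * I);
  return V;
}

// Binary search by address. ShEntSize is the section header's sh_entsize.
// If it is 0 (for example after mixing versions) or does not match the
// assumed layout, the caller has to fall back to a sequential scan.
std::optional<uint64_t> findDynInstCount(const std::vector<uint8_t> &Sec,
                                         uint64_t ShEntSize, uint64_t Addr) {
  if (ShEntSize != EntSize || Sec.size() % EntSize != 0)
    return std::nullopt;
  size_t Lo = 0, Hi = Sec.size() / EntSize;
  while (Lo < Hi) {
    size_t Mid = Lo + (Hi - Lo) / 2;
    const uint8_t *E = Sec.data() + Mid * EntSize;
    uint64_t MidAddr = readU64LE(E + 1); // Skip the version byte.
    if (MidAddr == Addr)
      return readU64LE(E + 9);
    if (MidAddr < Addr)
      Lo = Mid + 1;
    else
      Hi = Mid;
  }
  return std::nullopt;
}
```

Only the entries probed by the search are read, which is the benefit described in point 1; the `ShEntSize` check corresponds to the mixed-version case in point 2.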

Contributor Author

> With fixed-sized entries, you can then use a binary search algorithm to search the section for a specific address, assuming that the entries are in address order. I think this will be guaranteed by the SHF_LINK_ORDER flag. You might want to set the section's sh_entsize appropriately too.
>
> Some related thoughts:
>
> 1. If the function addresses are ordered, you can use a binary search algorithm to find the specific one you care about, without needing to read any extra data at all (just the ones that get picked in the search) but only if the entry sizes are fixed.

Appreciate the suggestion! Updated to set the entry size.

> 2. I imagine that a future version of the structure might change the size of entries. In this case, you could end up with two objects with function maps of different versions and then different sh_entsize values. I've forgotten how linkers handle this. My hope is that in such a situation they set the sh_entsize to 0 for the combined section, but you'd need to check. A value of 0 would then mean that "fast traversal via binary search without reading the full structure first" isn't possible (in that case, you'd need to read each entry sequentially, although you could skip the data that isn't actually important after checking the version number to determine the size).

I verified this with a local test: the linker does set sh_entsize to 0 when the entry sizes differ (for example, when combining entries from two versions with different sizes).

#. The function's entry address.
#. Dynamic Instruction Count, which is calculated as the total PGO counts for all
instructions within the function.

Example:

.. code-block:: gas

.section ".llvm_func_map","",@llvm_func_map
.byte 1 # version number
.quad .Lfunc_begin1 # function address
.uleb128 1000 # dynamic instruction count
Collaborator

We probably want 2+ functions in the example.

I'm not convinced we want to use ULEBs in this section. Using them means the section entries have variable width, which in turn means the only way of finding information for a specific function in the map is to read the whole map, rather than just the function addresses. Of course, it's a space versus speed trade-off, so it depends on how this section will likely work in the future.
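
As a small illustration of the variable width in question, here is a standalone ULEB128 encoder. It is a sketch written for this comment rather than LLVM's own helper, and `encodeULEB` is an illustrative name.

```cpp
#include <cstdint>
#include <vector>

// Encodes V as ULEB128: 7 payload bits per byte, least significant group
// first, with the high bit set on every byte except the last.
std::vector<uint8_t> encodeULEB(uint64_t V) {
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    if (V != 0)
      Byte |= 0x80; // More bytes follow.
    Out.push_back(Byte);
  } while (V != 0);
  return Out;
}
```

With this encoding the count 1000 from the documentation example occupies two bytes (`0xE8 0x07`), 16 occupies one byte, and the largest 64-bit value occupies ten. Entry sizes therefore depend on the values stored, which is what rules out indexing by entry number.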


``SHT_LLVM_OFFLOADING`` Section (offloading data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section stores the binary data used to perform offloading device linking
@@ -725,4 +746,3 @@ follows:
add x16, x16, :lo12:__chkstk
blr x16
sub sp, sp, x15, lsl #4

32 changes: 1 addition & 31 deletions llvm/include/llvm/Object/ELFTypes.h
@@ -1029,38 +1029,8 @@ struct PGOAnalysisMap {

// Struct representing the FuncMap for one function.
struct FuncMap {

// Bitfield of optional features to control the extra information
// emitted/encoded in the the section.
struct Features {
bool DynamicInstCount : 1;

// Encodes to minimum bit width representation.
uint8_t encode() const {
return (static_cast<uint8_t>(DynamicInstCount) << 0);
}

// Decodes from minimum bit width representation and validates no
// unnecessary bits are used.
static Expected<Features> decode(uint8_t Val) {
Features Feat{static_cast<bool>(Val & (1 << 0))};
if (Feat.encode() != Val)
return createStringError(std::error_code(),
"invalid encoding for FuncMap::Features: 0x%x",
Val);
return Feat;
}

bool operator==(const Features &Other) const {
return DynamicInstCount == Other.DynamicInstCount;
}
};

uint64_t FunctionAddress = 0; // Function entry address.
uint64_t DynamicInstCount = 0; // Dynamic instruction count for this function

// Flags to indicate if each feature was enabled in this function
Features FeatEnable;
uint64_t DynamicInstCount = 0; // Dynamic instruction count for this function.

uint64_t getFunctionAddress() const { return FunctionAddress; }
};
3 changes: 1 addition & 2 deletions llvm/include/llvm/ObjectYAML/ELFYAML.h
@@ -197,9 +197,8 @@ struct PGOAnalysisMapEntry {

struct FuncMapEntry {
uint8_t Version;
llvm::yaml::Hex8 Feature;
llvm::yaml::Hex64 Address;
llvm::yaml::Hex64 DynamicInstCount;
uint64_t DynamicInstCount;
Collaborator

If I'm not mistaken, not using Hex64 means you can't use hex encoding for this value. I feel like that's a mistake, personally.

Contributor

Right. This is just a count, so there is no point in using hex encoding in the YAML.

};

struct StackSizeEntry {
7 changes: 2 additions & 5 deletions llvm/lib/ObjectYAML/ELFEmitter.cpp
@@ -1550,20 +1550,17 @@ void ELFState<ELFT>::writeSectionContent(Elf_Shdr &SHeader,
return;

for (const auto &[Idx, E] : llvm::enumerate(*Section.Entries)) {
// Write version and feature values.
if (Section.Type == llvm::ELF::SHT_LLVM_FUNC_MAP) {
if (E.Version > 1)
WithColor::warning() << "unsupported SHT_LLVM_FUNC_MAP version: "
<< static_cast<int>(E.Version)
<< "; encoding using the most recent version";
CBA.write(E.Version);
CBA.write(E.Feature);
SHeader.sh_size += 2;
SHeader.sh_size += 1;
}
CBA.write<uintX_t>(E.Address, ELFT::Endianness);
Collaborator

Correct me if I'm wrong, but shouldn't this be using the Elf_Addr type, since it's representing an address? They might amount to the same thing, but it conveys the meaning better. (NB: I haven't tested it, so this might not work as desired)

Suggested change:
- CBA.write<uintX_t>(E.Address, ELFT::Endianness);
+ CBA.write<ELFT::Elf_Addr>(E.Address, ELFT::Endianness);

SHeader.sh_size += sizeof(uintX_t);
if (E.DynamicInstCount)
SHeader.sh_size += CBA.writeULEB128(E.DynamicInstCount);
SHeader.sh_size += CBA.writeULEB128(E.DynamicInstCount);
}
}

3 changes: 1 addition & 2 deletions llvm/lib/ObjectYAML/ELFYAML.cpp
@@ -1865,9 +1865,8 @@ void MappingTraits<ELFYAML::FuncMapEntry>::mapping(IO &IO,
ELFYAML::FuncMapEntry &E) {
assert(IO.getContext() && "The IO context is not initialized");
IO.mapRequired("Version", E.Version);
IO.mapOptional("Feature", E.Feature, Hex8(0));
IO.mapOptional("Address", E.Address, Hex64(0));
IO.mapOptional("DynInstCnt", E.DynamicInstCount, Hex64(0));
IO.mapOptional("DynInstCnt", E.DynamicInstCount, 0);
}

void MappingTraits<ELFYAML::BBAddrMapEntry>::mapping(
29 changes: 10 additions & 19 deletions llvm/test/tools/obj2yaml/ELF/func-map.yaml
@@ -15,16 +15,14 @@
# VALID-NEXT: Type: SHT_LLVM_FUNC_MAP
# VALID-NEXT: Entries:
# VALID-NEXT: - Version: 1
# VALID-NEXT: Feature: 0x1
## The 'Address' field is omitted when it's zero.
# VALID-NEXT: DynInstCnt: 0x10
# VALID-NEXT: DynInstCnt: 16
## The 'DynInstCnt' field is omitted when it's zero.
# VALID-NEXT: - Version: 1
## The 'Feature' field is omitted when it's zero.
# VALID-NEXT: Address: 0x1
# VALID-NEXT: - Version: 1
# VALID-NEXT: Feature: 0x1
# VALID-NEXT: Address: 0xFFFFFFFFFFFFFFF1
# VALID-NEXT: DynInstCnt: 0xFFFFFFFFFFFFFFF2
# VALID-NEXT: DynInstCnt: 100001

--- !ELF
FileHeader:
@@ -37,16 +35,14 @@ Sections:
ShSize: [[SIZE=<none>]]
Entries:
- Version: 1
Feature: 0x1
Address: 0x0
DynInstCnt: 0x10
DynInstCnt: 16
- Version: 1
Feature: 0x0
Address: 0x1
DynInstCnt: 0
- Version: 1
Feature: 0x1
Address: 0xFFFFFFFFFFFFFFF1
DynInstCnt: 0xFFFFFFFFFFFFFFF2
DynInstCnt: 100001

## Check obj2yaml can dump empty .llvm_func_map sections.

@@ -88,16 +84,14 @@ Sections:
# MULTI-NEXT: Type: SHT_LLVM_FUNC_MAP
# MULTI-NEXT: Entries:
# MULTI-NEXT: - Version: 1
# MULTI-NEXT: Feature: 0x1
# MULTI-NEXT: Address: 0x2
# MULTI-NEXT: DynInstCnt: 0x3
# MULTI-NEXT: DynInstCnt: 3
# MULTI-NEXT: - Name: '.llvm_func_map (1)'
# MULTI-NEXT: Type: SHT_LLVM_FUNC_MAP
# MULTI-NEXT: Entries:
# MULTI-NEXT: - Version: 1
# MULTI-NEXT: Feature: 0x1
# MULTI-NEXT: Address: 0xA
# MULTI-NEXT: DynInstCnt: 0xB
# MULTI-NEXT: DynInstCnt: 100

--- !ELF
FileHeader:
@@ -109,16 +103,14 @@ Sections:
Type: SHT_LLVM_FUNC_MAP
Entries:
- Version: 1
Feature: 0x1
Address: 0x2
DynInstCnt: 0x3
DynInstCnt: 3
- Name: '.llvm_func_map (1)'
Type: SHT_LLVM_FUNC_MAP
Entries:
- Version: 1
Feature: 0x1
Address: 0xA
DynInstCnt: 0xB
DynInstCnt: 100

## Check that obj2yaml uses the "Content" tag to describe an .llvm_func_map section
## when it can't extract the entries, for example, when the section is truncated.
@@ -135,5 +127,4 @@ Sections:
# INVALID-NEXT: Sections:
# INVALID-NEXT: - Name: .llvm_func_map
# INVALID-NEXT: Type: SHT_LLVM_FUNC_MAP
# BADNUM-NEXT: Content: {{([[:xdigit:]]+)}}{{$}}
# TRUNCATED-NEXT: Content: '{{([[:xdigit:]]{16})}}'{{$}}
Collaborator

I'd be tempted to explicitly check the expected 8 bytes here, to show that it's not producing garbage here.

3 changes: 1 addition & 2 deletions llvm/test/tools/yaml2obj/ELF/func-map.yaml
@@ -36,7 +36,7 @@
# Case 4: Specify Entries.
# CHECK: Name: .llvm_func_map (1)
# CHECK: SectionData (
# CHECK-NEXT: 0000: 01012222 02000000 000010
# CHECK-NEXT: 0000: 01222202 00000000 0010
# CHECK-NEXT: )
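
For readers checking the updated byte string, the sketch below decodes a section sequentially using the documented field order (version byte, function address, ULEB128 dynamic instruction count). It assumes a 64-bit little-endian address, as the bytes in this test suggest; `Entry` and `decodeFuncMap` are illustrative names and this is not the obj2yaml implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Entry {
  uint8_t Version;
  uint64_t Address;
  uint64_t DynInstCount;
};

// Decodes entries back to back; returns nullopt on truncation, an
// unsupported version, or an over-long ULEB128 value.
std::optional<std::vector<Entry>>
decodeFuncMap(const std::vector<uint8_t> &Sec) {
  std::vector<Entry> Out;
  size_t Off = 0;
  while (Off < Sec.size()) {
    if (Sec.size() - Off < 9)
      return std::nullopt; // Not enough bytes for version + address.
    Entry E{};
    E.Version = Sec[Off++];
    if (E.Version > 1)
      return std::nullopt;
    for (int I = 0; I < 8; ++I) // Assumed 8-byte little-endian address.
      E.Address |= static_cast<uint64_t>(Sec[Off++]) << (8 * I);
    unsigned Shift = 0;
    while (true) { // ULEB128: 7 bits per byte, low group first.
      if (Off >= Sec.size() || Shift >= 64)
        return std::nullopt;
      uint8_t B = Sec[Off++];
      E.DynInstCount |= static_cast<uint64_t>(B & 0x7f) << Shift;
      Shift += 7;
      if (!(B & 0x80))
        break;
    }
    Out.push_back(E);
  }
  return Out;
}
```

Fed the ten bytes `01 22 22 02 00 00 00 00 00 10`, this yields one entry with version 1, address 0x22222 and a count of 16. The removed feature byte accounts for the expected data being one byte shorter than before.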


@@ -75,7 +75,6 @@ Sections:
Type: SHT_LLVM_FUNC_MAP
Entries:
- Version: 1
Feature: 0x1
Address: 0x22222
DynInstCnt: 0x10

16 changes: 6 additions & 10 deletions llvm/tools/obj2yaml/elf2yaml.cpp
@@ -1012,22 +1012,18 @@ ELFDumper<ELFT>::dumpFuncMapSection(const Elf_Shdr *Shdr) {
std::vector<ELFYAML::FuncMapEntry> Entries;
DataExtractor::Cursor Cur(0);
uint8_t Version = 0;
uint8_t Feature = 0;
uint64_t Address = 0;
while (Cur && Cur.tell() < Content.size()) {
if (Shdr->sh_type == ELF::SHT_LLVM_FUNC_MAP) {
Collaborator

What's the purpose of this check? When could it be false if you're inside this function?

Version = Data.getU8(Cur);
Feature = Data.getU8(Cur);
if (Cur && Version > 1)
return createStringError(errc::invalid_argument,
Collaborator

Test case?

"invalid SHT_LLVM_FUNC_MAP section version: " +
Twine(static_cast<int>(Version)));
}
auto FeatureOrErr = llvm::object::FuncMap::Features::decode(Feature);
if (!FeatureOrErr)
return FeatureOrErr.takeError();

Address = Data.getAddress(Cur);

uint64_t DynamicInstCount =
FeatureOrErr->DynamicInstCount ? Data.getULEB128(Cur) : 0;
Entries.push_back({Version, Feature, Address, DynamicInstCount});
uint64_t DynamicInstCount = Data.getULEB128(Cur);
Entries.push_back({Version, Address, DynamicInstCount});
}

if (!Cur) {