[llvm] Proofread SourceLevelDebugging.rst #152646

Merged
80 changes: 40 additions & 40 deletions llvm/docs/SourceLevelDebugging.rst
@@ -1300,28 +1300,28 @@ calls. This descriptor results in the following DWARF tag:
Debugging information format
============================

-Debugging Information Extension for Objective C Properties
+Debugging Information Extension for Objective-C Properties
----------------------------------------------------------

Introduction
^^^^^^^^^^^^

-Objective C provides a simpler way to declare and define accessor methods using
+Objective-C provides a simpler way to declare and define accessor methods using
declared properties. The language provides features to declare a property and
to let compiler synthesize accessor methods.

-The debugger lets developer inspect Objective C interfaces and their instance
+The debugger lets developer inspect Objective-C interfaces and their instance
variables and class variables. However, the debugger does not know anything
-about the properties defined in Objective C interfaces. The debugger consumes
+about the properties defined in Objective-C interfaces. The debugger consumes
information generated by compiler in DWARF format. The format does not support
-encoding of Objective C properties. This proposal describes DWARF extensions to
-encode Objective C properties, which the debugger can use to let developers
-inspect Objective C properties.
+encoding of Objective-C properties. This proposal describes DWARF extensions to
+encode Objective-C properties, which the debugger can use to let developers
+inspect Objective-C properties.

Proposal
^^^^^^^^

-Objective C properties exist separately from class members. A property can be
+Objective-C properties exist separately from class members. A property can be
defined only by "setter" and "getter" selectors, and be calculated anew on each
access. Or a property can just be a direct access to some declared ivar.
Finally it can have an ivar "automatically synthesized" for it by the compiler,
@@ -1624,24 +1624,24 @@ The BUCKETS are an array of offsets to DATA for each hash:

So for ``bucket[3]`` in the example above, we have an offset into the table
0x000034f0 which points to a chain of entries for the bucket. Each bucket must
-contain a next pointer, full 32 bit hash value, the string itself, and the data
+contain a next pointer, full 32-bit hash value, the string itself, and the data
for the current string value.

.. code-block:: none

.------------.
0x000034f0: | 0x00003500 | next pointer
-| 0x12345678 | 32 bit hash
+| 0x12345678 | 32-bit hash
| "erase" | string value
| data[n] | HashData for this bucket
|------------|
0x00003500: | 0x00003550 | next pointer
-| 0x29273623 | 32 bit hash
+| 0x29273623 | 32-bit hash
| "dump" | string value
| data[n] | HashData for this bucket
|------------|
0x00003550: | 0x00000000 | next pointer
-| 0x82638293 | 32 bit hash
+| 0x82638293 | 32-bit hash
| "main" | string value
| data[n] | HashData for this bucket
`------------'
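
The chained layout in the diagram above can be modeled in a few lines of C to make its lookup cost concrete. This is a minimal sketch, not LLVM's reader: `struct ChainEntry` and `chain_lookup` are hypothetical names, entries are linked by pointers rather than file offsets, and the `HashData` payload is left out. The `touched` counter records how many entries had to be read, so a miss visibly walks the whole chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory model of one entry in the chained layout.
   A NULL next pointer stands in for the 0x00000000 next pointer. */
struct ChainEntry {
  const struct ChainEntry *next; /* next entry in this bucket's chain */
  uint32_t hash;                 /* full 32-bit hash of name */
  const char *name;              /* string value, e.g. "erase" */
};

/* Walks one bucket's chain looking for name. *touched receives the number
   of entries read; in a real file each one may sit on a different page. */
static const struct ChainEntry *chain_lookup(const struct ChainEntry *head,
                                             uint32_t hash, const char *name,
                                             unsigned *touched) {
  *touched = 0;
  for (const struct ChainEntry *e = head; e != NULL; e = e->next) {
    ++*touched;
    if (e->hash == hash && strcmp(e->name, name) == 0)
      return e;
  }
  return NULL; /* reached the end of the chain: no match */
}
```

With the three entries from the diagram linked as "erase", "dump", "main", finding "dump" reads two entries, while a miss such as "printf" reads all three.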
@@ -1650,17 +1650,17 @@ The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present. So
if we were to lookup "``printf``" in the table above, we would make a 32-bit
hash for "``printf``", it might match ``bucket[3]``. We would need to go to
-the offset 0x000034f0 and start looking to see if our 32 bit hash matches. To
+the offset 0x000034f0 and start looking to see if our 32-bit hash matches. To
do so, we need to read the next pointer, then read the hash, compare it, and
skip to the next bucket. Each time we are skipping many bytes in memory and
-touching new pages just to do the compare on the full 32 bit hash. All of
+touching new pages just to do the compare on the full 32-bit hash. All of
these accesses then tell us that we didn't have a match.

Name Hash Tables
""""""""""""""""

To solve the issues mentioned above we have structured the hash tables a bit
-differently: a header, buckets, an array of all unique 32 bit hash values,
+differently: a header, buckets, an array of all unique 32-bit hash values,
followed by an array of hash value data offsets, one for each hash value, then
the data for all hash values:

@@ -1679,11 +1679,11 @@ the data for all hash values:
`-------------'

The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By
-making all of the full 32 bit hash values contiguous in memory, we allow
+making all of the full 32-bit hash values contiguous in memory, we allow
ourselves to efficiently check for a match while touching as little memory as
-possible. Most often checking the 32 bit hash values is as far as the lookup
+possible. Most often checking the 32-bit hash values is as far as the lookup
goes. If it does match, it usually is a match with no collisions. So for a
-table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
+table with "``n_buckets``" buckets, and "``n_hashes``" unique 32-bit hash
values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
``OFFSETS`` as:

@@ -1698,11 +1698,11 @@ values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
| HEADER.header_data_len | uint32_t
| HEADER_DATA | HeaderData
|-------------------------|
-| BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes
+| BUCKETS | uint32_t[n_buckets] // 32-bit hash indexes
|-------------------------|
-| HASHES | uint32_t[n_hashes] // 32 bit hash values
+| HASHES | uint32_t[n_hashes] // 32-bit hash values
|-------------------------|
-| OFFSETS | uint32_t[n_hashes] // 32 bit offsets to hash value data
+| OFFSETS | uint32_t[n_hashes] // 32-bit offsets to hash value data
|-------------------------|
| ALL HASH DATA |
`-------------------------'
@@ -1761,26 +1761,26 @@ with:
| |
|------------|
0x000034f0: | 0x00001203 | .debug_str ("erase")
-| 0x00000004 | A 32 bit array count - number of HashData with name "erase"
+| 0x00000004 | A 32-bit array count - number of HashData with name "erase"
| 0x........ | HashData[0]
| 0x........ | HashData[1]
| 0x........ | HashData[2]
| 0x........ | HashData[3]
| 0x00000000 | String offset into .debug_str (terminate data for hash)
|------------|
0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
-| 0x00000002 | A 32 bit array count - number of HashData with name "collision"
+| 0x00000002 | A 32-bit array count - number of HashData with name "collision"
| 0x........ | HashData[0]
| 0x........ | HashData[1]
| 0x00001203 | String offset into .debug_str ("dump")
-| 0x00000003 | A 32 bit array count - number of HashData with name "dump"
+| 0x00000003 | A 32-bit array count - number of HashData with name "dump"
| 0x........ | HashData[0]
| 0x........ | HashData[1]
| 0x........ | HashData[2]
| 0x00000000 | String offset into .debug_str (terminate data for hash)
|------------|
0x00003550: | 0x00001203 | String offset into .debug_str ("main")
-| 0x00000009 | A 32 bit array count - number of HashData with name "main"
+| 0x00000009 | A 32-bit array count - number of HashData with name "main"
| 0x........ | HashData[0]
| 0x........ | HashData[1]
| 0x........ | HashData[2]
@@ -1795,13 +1795,13 @@ with:

So we still have all of the same data, we just organize it more efficiently for
debugger lookup. If we repeat the same "``printf``" lookup from above, we
-would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
+would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32-bit
hash value and modulo it by ``n_buckets``. ``BUCKETS[3]`` contains "6" which
is the index into the ``HASHES`` table. We would then compare any consecutive
-32 bit hashes values in the ``HASHES`` array as long as the hashes would be in
+32-bit hashes values in the ``HASHES`` array as long as the hashes would be in
``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3. In the case of a failed lookup we would access the
-memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
+memory for ``BUCKETS[3]``, and then compare a few consecutive 32-bit hashes
before we know that we have no match. We don't end up marching through
multiple words of memory and we really keep the number of processor data cache
lines being accessed as small as possible.
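
The lookup walked through above can be sketched in C. This is a hedged illustration rather than LLVM's code: `djb_hash` and `find_hash_index` are hypothetical names, the tables are assumed to be already extracted into host-order arrays, and using `UINT32_MAX` to mark an empty bucket is an assumption not stated in this excerpt. The hash is assumed to be the classic Bernstein function (`h = h * 33 + c`, seeded with 5381) that `eHashFunctionDJB` names.

```c
#include <stdint.h>

/* Bernstein hash: h = h * 33 + c, seeded with 5381 (assumed to match
   eHashFunctionDJB). */
static uint32_t djb_hash(const char *s) {
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
    h = (h << 5) + h + *p;
  return h;
}

/* Returns the index into hashes[] whose value equals hash, or -1 on a miss.
   Only buckets[bucket] and the contiguous run of hashes[] belonging to the
   same bucket are read. Assumes bucket_count > 0. */
static long find_hash_index(const uint32_t *buckets, uint32_t bucket_count,
                            const uint32_t *hashes, uint32_t hashes_count,
                            uint32_t hash) {
  uint32_t bucket = hash % bucket_count;
  uint32_t i = buckets[bucket];
  if (i == UINT32_MAX) /* assumption: marker for an empty bucket */
    return -1;
  for (; i < hashes_count; ++i) {
    if (hashes[i] % bucket_count != bucket)
      break; /* walked past the hashes that belong to this bucket */
    if (hashes[i] == hash)
      return (long)i; /* the same index selects the entry in offsets[] */
  }
  return -1;
}
```

A returned index only means the 32-bit hashes agree; the string offsets in the hash data still have to be compared to rule out a collision.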
@@ -1842,10 +1842,10 @@ header is:
HeaderData header_data; // Implementation specific header data
};

-The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
+The header starts with a 32-bit "``magic``" value which must be ``'HASH'``
encoded as an ASCII integer. This allows the detection of the start of the
hash table and also allows the table's byte order to be determined so the table
-can be correctly extracted. The "``magic``" value is followed by a 16 bit
+can be correctly extracted. The "``magic``" value is followed by a 16-bit
``version`` number which allows the table to be revised and modified in the
future. The current version number is 1. ``hash_function`` is a ``uint16_t``
enumeration that specifies which hash function was used to produce this table.
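
A reader can use the magic value to decide whether the rest of the table needs byte swapping. The sketch below assumes that `'HASH'` "encoded as an ASCII integer" means 0x48415348 (the bytes 'H', 'A', 'S', 'H' taken most-significant first); `detect_byte_order` and `bswap32` are hypothetical helpers, not LLVM functions.

```c
#include <stdint.h>
#include <string.h>

/* 'H' 'A' 'S' 'H' = 0x48 0x41 0x53 0x48 as one 32-bit integer (assumption). */
#define HASH_MAGIC 0x48415348u

/* Reverses the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) | ((v << 8) & 0x00ff0000u) |
         (v << 24);
}

/* Reads the first four bytes of a table in host byte order. Returns 0 if the
   table matches the host, 1 if every field must be byte-swapped, and -1 if
   the bytes are not a 'HASH' magic value in either order. */
static int detect_byte_order(const unsigned char *table) {
  uint32_t magic;
  memcpy(&magic, table, sizeof magic); /* avoids an unaligned load */
  if (magic == HASH_MAGIC)
    return 0;
  if (magic == bswap32(HASH_MAGIC))
    return 1;
  return -1;
}
```

Once the byte order is known, the corresponding 16- or 32-bit swap would apply to `version`, `hash_function`, and the other header fields; a reader might also check that `version` is 1, the current version.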
@@ -1858,8 +1858,8 @@ The current values for the hash function enumerations include:
eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
};

-``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
-are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit
+``bucket_count`` is a 32-bit unsigned integer that represents how many buckets
+are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32-bit
hash values that are in the ``HASHES`` array, and is the same number of offsets
are contained in the ``OFFSETS`` array. ``header_data_len`` specifies the size
in bytes of the ``HeaderData`` that is filled in by specialized versions of
@@ -1875,12 +1875,12 @@ The header is followed by the buckets, hashes, offsets, and hash value data.
struct FixedTable
{
uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below
-uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table
+uint32_t hashes [Header.hashes_count]; // Every unique 32-bit hash for the entire table is in this table
uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above
};

-``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The
-``hashes`` array contains all of the 32 bit hash values for all names in the
+``buckets`` is an array of 32-bit indexes into the ``hashes`` array. The
+``hashes`` array contains all of the 32-bit hash values for all names in the
hash table. Each hash in the ``hashes`` table has an offset in the ``offsets``
array that points to the data for the hash value.
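
Because the three arrays follow the header back to back, their positions can be computed from the header fields alone. This sketch assumes the fixed header fields are `magic`, `version`, `hash_function`, `bucket_count`, `hashes_count`, and `header_data_len` (20 bytes in total, no padding), followed by `header_data_len` bytes of `HeaderData`; `table_layout` and `struct TableLayout` are hypothetical names.

```c
#include <stdint.h>

/* Assumed size of the fixed header fields:
   magic (4) + version (2) + hash_function (2) +
   bucket_count (4) + hashes_count (4) + header_data_len (4). */
#define FIXED_HEADER_SIZE 20u

/* Byte offsets, from the start of the table, of each part of the layout. */
struct TableLayout {
  uint64_t buckets_off; /* uint32_t buckets[bucket_count] */
  uint64_t hashes_off;  /* uint32_t hashes[hashes_count] */
  uint64_t offsets_off; /* uint32_t offsets[hashes_count] */
  uint64_t data_off;    /* first byte after the three arrays */
};

static struct TableLayout table_layout(uint32_t bucket_count,
                                       uint32_t hashes_count,
                                       uint32_t header_data_len) {
  struct TableLayout l;
  /* 64-bit arithmetic so large counts cannot overflow. */
  l.buckets_off = FIXED_HEADER_SIZE + (uint64_t)header_data_len;
  l.hashes_off = l.buckets_off + 4u * (uint64_t)bucket_count;
  l.offsets_off = l.hashes_off + 4u * (uint64_t)hashes_count;
  l.data_off = l.offsets_off + 4u * (uint64_t)hashes_count;
  return l;
}
```

For example, a table with 4 buckets, 5 unique hashes, and 12 bytes of `HeaderData` would place `buckets` at byte 32, `hashes` at 48, `offsets` at 68, and the first byte after the arrays at 88. Since `hashes[i]` and `offsets[i]` share the index `i`, a matching hash leads straight to its data offset.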

@@ -1967,13 +1967,13 @@ array to be:
HeaderData.atoms[0].form = DW_FORM_data4;

This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
-encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
+encoded as a 32-bit value (DW_FORM_data4). This allows a single name to have
multiple matching DIEs in a single file, which could come up with an inlined
function for instance. Future tables could include more information about the
DIE such as flags indicating if the DIE is a function, method, block,
or inlined.

-The KeyType for the DWARF table is a 32 bit string table offset into the
+The KeyType for the DWARF table is a 32-bit string table offset into the
".debug_str" table. The ".debug_str" is the string table for the DWARF which
may already contain copies of all of the strings. This helps make sure, with
help from the compiler, that we reuse the strings between all of the DWARF
@@ -1982,7 +1982,7 @@ compiler generate all strings as DW_FORM_strp in the debug info, is that
DWARF parsing can be made much faster.

After a lookup is made, we get an offset into the hash data. The hash data
-needs to be able to deal with 32 bit hash collisions, so the chunk of data
+needs to be able to deal with 32-bit hash collisions, so the chunk of data
at the offset in the hash data consists of a triple:

.. code-block:: c
@@ -1992,7 +1992,7 @@ at the offset in the hash data consists of a triple:
HashData[hash_data_count]

If "str_offset" is zero, then the bucket contents are done. 99.9% of the
-hash data chunks contain a single item (no 32 bit hash collision):
+hash data chunks contain a single item (no 32-bit hash collision):

.. code-block:: none

@@ -2025,7 +2025,7 @@ If there are collisions, you will have multiple valid string offsets:
`------------'

Current testing with real world C++ binaries has shown that there is around 1
-32 bit hash collision per 100,000 name entries.
+32-bit hash collision per 100,000 name entries.
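
Walking the hash data for one `OFFSETS` entry follows directly from the triple described above. This is a minimal sketch under stated assumptions: each `HashData` is a single `DW_FORM_data4` DIE offset (the one-atom DWARF table described above), the data is already in host byte order, and `collect_dies` is a hypothetical name. Because a zero `str_offset` ends the list, the walk steps over colliding names without knowing in advance how many there are.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walks (str_offset, hash_data_count, HashData[hash_data_count]) triples
   until a str_offset of 0. For the triple whose .debug_str name equals name,
   copies up to max_out DIE offsets into out and returns how many were copied. */
static size_t collect_dies(const uint32_t *data, const char *debug_str,
                           const char *name, uint32_t *out, size_t max_out) {
  size_t n = 0;
  while (data[0] != 0) {
    uint32_t str_offset = data[0];
    uint32_t count = data[1];
    const uint32_t *items = data + 2; /* one DIE offset per HashData */
    if (strcmp(debug_str + str_offset, name) == 0) {
      for (uint32_t i = 0; i < count && n < max_out; ++i)
        out[n++] = items[i];
    }
    data = items + count; /* next triple: a colliding name, or the 0 end */
  }
  return n;
}
```

The sketch compares strings rather than `str_offset` values, so it does not depend on the compiler having uniqued every name in `.debug_str`.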

Contents
^^^^^^^^