Conversation

@Camsyn Camsyn (Contributor) commented Jun 18, 2025

This commit changes the interval check for shadow/meta addresses from inclusive-inclusive ( $[\mathrm{start}, \mathrm{end}]$ ) to inclusive-exclusive ( $[\mathrm{start}, \mathrm{end})$ ) to resolve the ambiguity of the endpoint address. This also aligns the logic with the check for IsAppMem (which is already inclusive-exclusive), ensuring consistent behavior across all memory classifications.

  1. The IsShadowMem and IsMetaMem checks previously used an inclusive-inclusive interval, i.e., $[\mathrm{start}, \mathrm{end}]$, which could lead to a boundary address being incorrectly classified as both Shadow and Meta memory, e.g., 0x3000_0000_0000 in Mapping48AddressSpace.

    • Moreover, even when Shadow does not border Meta, ShadowMem::end cannot be considered a legal shadow address, because TSan protects the gap, i.e., ProtectRange(ShadowEnd(), MetaShadowBeg());
  2. ShadowMem/MetaMem addresses are derived from AppMem using an affine-like transformation (* factor + bias). This transformation includes two extra modifications: high- and low-order bits are masked out, and for Shadow memory an optional XOR may be applied to prevent conflicts with certain AppMem regions.

    • Given that all AppMem regions are defined as inclusive-exclusive intervals, $[\mathrm{start}, \mathrm{end})$, the resulting Shadow/Meta regions should logically also be inclusive-exclusive.

Note: This change is purely for improving code consistency and should have no functional impact. In practice, the exact endpoint addresses of the Shadow/Meta regions are generally not reached.
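
To make the boundary ambiguity concrete, here is a small standalone sketch. The constants are illustrative only: the Shadow-end == Meta-begin boundary at 0x3000_0000_0000 is taken from the example above, while the other bounds are assumed and are not copied from tsan_platform.h. With an inclusive end, the boundary address satisfies both predicates; with an exclusive end, it belongs to exactly one region.

```cpp
#include <cstdint>
#include <cstdio>

using uptr = uint64_t;

constexpr uptr kShadowBeg     = 0x010000000000ull;  // assumed
constexpr uptr kShadowEnd     = 0x300000000000ull;  // borders Meta, per the example above
constexpr uptr kMetaShadowBeg = 0x300000000000ull;
constexpr uptr kMetaShadowEnd = 0x340000000000ull;  // assumed

// Old checks: inclusive end, so the boundary satisfies both predicates.
bool IsShadowMemOld(uptr mem) { return mem >= kShadowBeg && mem <= kShadowEnd; }
bool IsMetaMemOld(uptr mem) { return mem >= kMetaShadowBeg && mem <= kMetaShadowEnd; }

// New checks: exclusive end, so the boundary belongs to exactly one region.
bool IsShadowMemNew(uptr mem) { return mem >= kShadowBeg && mem < kShadowEnd; }
bool IsMetaMemNew(uptr mem) { return mem >= kMetaShadowBeg && mem < kMetaShadowEnd; }

int main() {
  uptr boundary = 0x300000000000ull;
  printf("old: shadow=%d meta=%d\n", IsShadowMemOld(boundary), IsMetaMemOld(boundary));  // 1 1
  printf("new: shadow=%d meta=%d\n", IsShadowMemNew(boundary), IsMetaMemNew(boundary));  // 0 1
}
```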

@llvmbot (Member) commented Jun 18, 2025

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Kunqiu Chen (Camsyn)



Full diff: https://github.com/llvm/llvm-project/pull/144647.diff

2 Files Affected:

  • (modified) compiler-rt/lib/tsan/rtl/tsan_platform.h (+2-2)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_rtl_access.cpp (+3-3)
diff --git a/compiler-rt/lib/tsan/rtl/tsan_platform.h b/compiler-rt/lib/tsan/rtl/tsan_platform.h
index 354f6da6a64a1..ada594bc11fc7 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_platform.h
+++ b/compiler-rt/lib/tsan/rtl/tsan_platform.h
@@ -931,7 +931,7 @@ bool IsAppMem(uptr mem) { return SelectMapping<IsAppMemImpl>(mem); }
 struct IsShadowMemImpl {
   template <typename Mapping>
   static bool Apply(uptr mem) {
-    return mem >= Mapping::kShadowBeg && mem <= Mapping::kShadowEnd;
+    return mem >= Mapping::kShadowBeg && mem < Mapping::kShadowEnd;
   }
 };
 
@@ -943,7 +943,7 @@ bool IsShadowMem(RawShadow *p) {
 struct IsMetaMemImpl {
   template <typename Mapping>
   static bool Apply(uptr mem) {
-    return mem >= Mapping::kMetaShadowBeg && mem <= Mapping::kMetaShadowEnd;
+    return mem >= Mapping::kMetaShadowBeg && mem < Mapping::kMetaShadowEnd;
   }
 };
 
diff --git a/compiler-rt/lib/tsan/rtl/tsan_rtl_access.cpp b/compiler-rt/lib/tsan/rtl/tsan_rtl_access.cpp
index cf07686d968dc..b0ce5680c95c1 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_rtl_access.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_rtl_access.cpp
@@ -525,7 +525,7 @@ ALWAYS_INLINE USED void UnalignedMemoryAccess(ThreadState* thr, uptr pc,
 void ShadowSet(RawShadow* p, RawShadow* end, RawShadow v) {
   DCHECK_LE(p, end);
   DCHECK(IsShadowMem(p));
-  DCHECK(IsShadowMem(end));
+  DCHECK(p == end || IsShadowMem(end - 1));
   UNUSED const uptr kAlign = kShadowCnt * kShadowSize;
   DCHECK_EQ(reinterpret_cast<uptr>(p) % kAlign, 0);
   DCHECK_EQ(reinterpret_cast<uptr>(end) % kAlign, 0);
@@ -675,7 +675,7 @@ void MemoryAccessRangeT(ThreadState* thr, uptr pc, uptr addr, uptr size) {
     Printf("Access to non app mem start: %p\n", (void*)addr);
     DCHECK(IsAppMem(addr));
   }
-  if (!IsAppMem(addr + size - 1)) {
+  if (size > 0 && !IsAppMem(addr + size - 1)) {
     Printf("Access to non app mem end: %p\n", (void*)(addr + size - 1));
     DCHECK(IsAppMem(addr + size - 1));
   }
@@ -686,7 +686,7 @@ void MemoryAccessRangeT(ThreadState* thr, uptr pc, uptr addr, uptr size) {
 
   RawShadow* shadow_mem_end = reinterpret_cast<RawShadow*>(
       reinterpret_cast<uptr>(shadow_mem) + size * kShadowMultiplier - 1);
-  if (!IsShadowMem(shadow_mem_end)) {
+  if (size > 0 && !IsShadowMem(shadow_mem_end)) {
     Printf("Bad shadow end addr: %p (%p)\n", shadow_mem_end,
            (void*)(addr + size - 1));
     Printf(

@melver melver requested a review from dvyukov June 18, 2025 07:29
     DCHECK(IsAppMem(addr));
   }
-  if (!IsAppMem(addr + size - 1)) {
+  if (size > 0 && !IsAppMem(addr + size - 1)) {
Contributor

This is called from MemoryAccessRange, which checks for size == 0, and since size is a uptr, size will always be > 0 here. Therefore this check is pointless.

@Camsyn Camsyn (Contributor, Author) commented Jun 18, 2025

I understand this. But MemoryAccessRangeT does not guarantee size != 0 in its context -- is it possible for MemoryAccessRangeT to be called in the future without checking for size == 0?

It just so happens that these checks are wrapped in the SANITIZER_DEBUG macro, so I think adding this check is better than adding an assertion DCHECK(size != 0).

Contributor

> But MemoryAccessRangeT does not guarantee size != 0 in its context -- is it possible for MemoryAccessRangeT to be called in the future without checking for size == 0?

We don't know.

But it's clear that the current invariant here is size != 0, so why not add a DCHECK and catch it if that assumption changes in the future?

@Camsyn Camsyn (Contributor, Author)


> But it's clear that the current invariant here is size != 0, so why not add a DCHECK and catch it if that assumption changes in the future?

Done.
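
A minimal, standalone sketch of the direction settled on above: assert the size != 0 invariant once, instead of adding size > 0 guards to the individual checks. The DCHECK macros, the IsAppMem stand-in, and the app-range constants here are toy stand-ins for illustration, not the merged TSan code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

using uptr = uint64_t;

// Toy stand-ins for the sanitizer's DCHECK macros.
#define DCHECK(x) assert(x)
#define DCHECK_NE(a, b) assert((a) != (b))

// Hypothetical app range, standing in for the real mapping.
constexpr uptr kAppBeg = 0x7e8000000000ull;
constexpr uptr kAppEnd = 0x800000000000ull;
static bool IsAppMem(uptr mem) { return mem >= kAppBeg && mem < kAppEnd; }

// Pattern discussed above: the size != 0 invariant is asserted up front,
// so the end-of-range check may safely use addr + size - 1.
static void MemoryAccessRangeChecks(uptr addr, uptr size) {
  DCHECK_NE(size, 0);                 // callers currently filter out size == 0
  DCHECK(IsAppMem(addr));             // range start must be app memory
  DCHECK(IsAppMem(addr + size - 1));  // safe: size != 0 was asserted above
}

int main() {
  MemoryAccessRangeChecks(kAppBeg, 4096);  // passes
  // MemoryAccessRangeChecks(kAppBeg, 0);  // would trip DCHECK_NE(size, 0)
  printf("range checks passed\n");
}
```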

@melver melver requested a review from thurstond June 18, 2025 08:52
@thurstond thurstond (Contributor) left a comment


Thanks for the patch!

@thurstond (Contributor) commented

@hctim FYI Meta is becoming exclusive

@vitalybuka vitalybuka (Collaborator) left a comment


Unfortunately, most of the sanitizer code uses an inclusive end,
which is annoying given that the typical C++ convention is an exclusive end.

LGTM to change this incrementally, especially if there is an issue.

@Camsyn Camsyn merged commit 681db06 into llvm:main Jun 19, 2025
7 checks passed
@Camsyn Camsyn (Contributor, Author) commented Jun 20, 2025

This change introduces some test failures in mmap_lots.cpp.

After analysis, I believe it is due to the following:

A large mmap -> MemoryRangeSet -> only the first and last pages of the shadow are set.

static void MemoryRangeSet(uptr addr, uptr size, RawShadow val) {
  if (size == 0)
    return;
  RawShadow* begin = MemToShadow(addr);
  // end can equal ShadowMem::end
  RawShadow* end = begin + size / kShadowCell * kShadowCnt;
  ...
  RawShadow* mid2 = RoundDown(end, kPageSize);
  // What if mid2 == end == ShadowMem::end?
  ShadowSet(mid2, end, val);

void ShadowSet(RawShadow* p, RawShadow* end, RawShadow v) {
  DCHECK_LE(p, end);      // passes: p == end
  DCHECK(IsShadowMem(p)); // fails: p == ShadowMem::end is no longer a shadow address
  DCHECK(p == end || IsShadowMem(end - 1));

I think this actually exposes something of a flaw in the current implementation.
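
A toy, self-contained reproduction of that corner case (small made-up constants, not the real TSan mapping): when the shadow range of the mapped region ends exactly at the page-aligned end of the shadow region, RoundDown leaves end unchanged, so ShadowSet is entered with p == end == ShadowMem::end and the now-exclusive IsShadowMem(p) check fails.

```cpp
#include <cstdint>
#include <cstdio>

using uptr = uint64_t;

// Made-up, page-aligned shadow bounds for illustration only.
constexpr uptr kPageSize  = 4096;
constexpr uptr kShadowBeg = 0x100000;
constexpr uptr kShadowEnd = 0x200000;

static uptr RoundDown(uptr x, uptr align) { return x & ~(align - 1); }

// The post-patch, inclusive-exclusive check.
static bool IsShadowMem(uptr mem) { return mem >= kShadowBeg && mem < kShadowEnd; }

int main() {
  uptr end  = kShadowEnd;                 // shadow end of a huge mmap'ed range
  uptr mid2 = RoundDown(end, kPageSize);  // == end, since end is page-aligned
  printf("mid2 == end: %d\n", (int)(mid2 == end));            // 1
  printf("IsShadowMem(mid2): %d\n", (int)IsShadowMem(mid2));  // 0 -> DCHECK(IsShadowMem(p)) fires
}
```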


Possible fixes:

  1. Do not allow p == end in ShadowSet:
    • This is reasonable because both callers of ShadowSet (i.e., MemoryRangeSet and __tsan_java_move) actually guarantee size > 0.
    • Although MemoryRangeSet's dedicated handling for overly large mmaps bypasses this limit, we can append a guard that checks for size > 0, i.e., if (mid2 < end) ShadowSet(mid2, end, val); (see the sketch after this list)

    //   DCHECK_LE(p, end);
    DCHECK_LT(p, end);

  2. Modify the assertion in ShadowSet as follows:

    // DCHECK(IsShadowMem(p));
    DCHECK(p == end || IsShadowMem(p));

  3. Have ShadowSet simply return when p == end:

    void ShadowSet(RawShadow* p, RawShadow* end, RawShadow v) {
      if (p == end) return;
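
For concreteness, a tiny standalone sketch of option 1 (toy DCHECK, made-up page size and addresses; not the actual TSan code): the if (mid2 < end) guard keeps an empty range from ever reaching ShadowSet, which can then assert the strict inequality.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

using uptr = uint64_t;

#define DCHECK_LT(a, b) assert((a) < (b))

constexpr uptr kPageSize = 4096;

static uptr RoundDown(uptr x, uptr align) { return x & ~(align - 1); }

// Stand-in for ShadowSet under option 1: callers must pass a non-empty range.
static void ShadowSetToy(uptr p, uptr end) {
  DCHECK_LT(p, end);  // was DCHECK_LE(p, end)
  printf("setting shadow [%#llx, %#llx)\n",
         (unsigned long long)p, (unsigned long long)end);
}

// Stand-in for the tail of MemoryRangeSet's large-range path.
static void SetLastPage(uptr end) {
  uptr mid2 = RoundDown(end, kPageSize);
  if (mid2 < end)             // guard: skip the call when the range is empty
    ShadowSetToy(mid2, end);
}

int main() {
  SetLastPage(0x201100);  // mid2 < end: the last partial page's shadow is set
  SetLastPage(0x200000);  // mid2 == end: guard skips the call, no DCHECK fires
}
```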
