Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions clang/include/clang/Basic/BuiltinsX86.td
Original file line number Diff line number Diff line change
Expand Up @@ -146,8 +146,13 @@ let Attributes = [Const, NoThrow, RequiredVectorWidth<256>], Features = "avx" in
// current formulation is based on what was easiest to recognize from the
// pre-TableGen version.

let Features = "mmx", Attributes = [NoThrow, Const] in {
def _mm_prefetch : X86NoPrefixBuiltin<"void(char const *, int)">;
let Features = "mmx", Header = "immintrin.h", Attributes = [NoThrow, Const] in {
def _mm_prefetch : X86LibBuiltin<"void(char const *, int)">;
}

let Features = "mmx", Header = "intrin.h", Attributes = [NoThrow, Const] in {
def _m_prefetch : X86LibBuiltin<"void(void *)">;
def _m_prefetchw : X86LibBuiltin<"void(void volatile const *)">;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefetchw should map to feature prfchw?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should! I looked at the Intel intrinsic documentation, and it said these intrinsics were part of the deprecated 3dnow ISA extension, and I wasn't sure what to. However, I took the time to check the Intel ISA manual and I updated this feature set and the _mm_prefetch feature check to "sse", since that seems to be the correct feature. PTAL, since I've expanded scope a bit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole thing is sort of confusing...

AMD originally implemented 3dnow including prefetch and prefetchw instructions. Intel then implemented SSE with different prefetch instructions... but didn't include one with a write hint. Later, they implemented prefetchw, and added a corresponding CPUID bit.

Modern LLVM never generates "prefetch"; _m_prefetch is actually lowered to the SSE prefetcht0.

_mm_prefetch(x, _MM_HINT_ET0) generates different instructions depending on the command-line: if the target only supports SSE, it generates prefetcht0. If it supports prefetchw (-mprfchw), it generates prefetchw.

I guess given that behavior, this feature mapping is probably fine?

}

let Features = "sse", Attributes = [NoThrow] in {
Expand Down
11 changes: 11 additions & 0 deletions clang/lib/CodeGen/CGBuiltin.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15254,6 +15254,17 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
Function *F = CGM.getIntrinsic(Intrinsic::prefetch, Address->getType());
return Builder.CreateCall(F, {Address, RW, Locality, Data});
}
case X86::BI_m_prefetch:
case X86::BI_m_prefetchw: {
Value *Address = Ops[0];
// The 'w' suffix implies write.
Value *RW =
ConstantInt::get(Int32Ty, BuiltinID == X86::BI_m_prefetchw ? 1 : 0);
Value *Locality = ConstantInt::get(Int32Ty, 0x3);
Value *Data = ConstantInt::get(Int32Ty, 1);
Function *F = CGM.getIntrinsic(Intrinsic::prefetch, Address->getType());
return Builder.CreateCall(F, {Address, RW, Locality, Data});
}
case X86::BI_mm_clflush: {
return Builder.CreateCall(CGM.getIntrinsic(Intrinsic::x86_sse2_clflush),
Ops[0]);
Expand Down
23 changes: 10 additions & 13 deletions clang/lib/Headers/prfchwintrin.h
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,10 @@
#ifndef __PRFCHWINTRIN_H
#define __PRFCHWINTRIN_H

#if defined(__cplusplus)
extern "C" {
#endif

/// Loads a memory sequence containing the specified memory address into
/// all data cache levels.
///
Expand All @@ -26,11 +30,7 @@
///
/// \param __P
/// A pointer specifying the memory address to be prefetched.
static __inline__ void __attribute__((__always_inline__, __nodebug__))
_m_prefetch(void *__P)
{
__builtin_prefetch (__P, 0, 3 /* _MM_HINT_T0 */);
}
void _m_prefetch(void *__P);

/// Loads a memory sequence containing the specified memory address into
/// the L1 data cache and sets the cache-coherency state to modified.
Expand All @@ -48,13 +48,10 @@ _m_prefetch(void *__P)
///
/// \param __P
/// A pointer specifying the memory address to be prefetched.
static __inline__ void __attribute__((__always_inline__, __nodebug__))
_m_prefetchw(volatile const void *__P)
{
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wcast-qual"
__builtin_prefetch ((const void*)__P, 1, 3 /* _MM_HINT_T0 */);
#pragma clang diagnostic pop
}
void _m_prefetchw(volatile const void *__P);

#if defined(__cplusplus)
} // extern "C"
#endif

#endif /* __PRFCHWINTRIN_H */