jmorse commented Sep 26, 2024

As of rev ea222be, LLVM's assembler actually tries to honour the "fill value" part of p2align directives. X86 always prints this as 0x90; however, I don't believe that's what it actually wants. If you compile an LLVM-IR file with -filetype=obj, you get multi-byte nops for .text padding. If you go via a textual assembly file, you now get single-byte-nop padding. This divergent behaviour is undesirable IMO.

To fix: don't set the byte padding field for x86, which allows the assembler to pick multi-byte nops. Test that we get the same multi-byte padding whether compiling via textual assembly or directly to an object file. Added same-align-bytes-with-llasm-llobj.ll to that effect.
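To make the difference concrete, here is a rough Python model of the two padding strategies. It is only a sketch: the byte sequences are the multi-byte NOP encodings recommended in the Intel manuals, but the greedy "longest NOP first" policy, the 8-byte cap, and all function names are assumptions for illustration, not LLVM's actual selection logic (which depends on the subtarget).

```python
# Recommended x86 multi-byte NOP encodings, indexed by length in bytes.
MULTI_BYTE_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}


def padding_needed(offset, p2align):
    """Bytes required to round `offset` up to a 2**p2align boundary."""
    alignment = 1 << p2align
    return (-offset) % alignment


def pad_with_fill_byte(n, fill=0x90):
    """Model of `.p2align N, 0x90`: one single-byte instruction per padding byte."""
    return [bytes([fill])] * n


def pad_with_multibyte_nops(n, max_len=8):
    """Model of `.p2align N` in .text: greedily emit the longest NOP that fits.

    The greedy policy and max_len=8 are assumptions made for this sketch.
    """
    instructions = []
    while n > 0:
        length = min(n, max_len)
        instructions.append(MULTI_BYTE_NOPS[length])
        n -= length
    return instructions


# Hypothetical case: a loop header that would otherwise start at offset 0x13.
pad = padding_needed(0x13, 4)
print(pad, len(pad_with_fill_byte(pad)), len(pad_with_multibyte_nops(pad)))
```

Both strategies emit the same number of padding bytes; the difference is how many instructions the CPU has to decode when it falls through the padding.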

A whole load of test files get updated because of this change; the vast majority of the updates are coincidental. The most suspicious ones are:

  • loop-align-debug.ll: this is actually checking for line-number assignments.
  • xray-tail-call-sled.ll: XRay installs a nop sled for reasons I don't understand, but it's independent of .p2align.
  • code-align-loops.ll: this checks that various IR constructs lead to .p2align directives, not what the padding is.

Everything else looked totally coincidental.
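For readers skimming the diff, the mechanical part of the test churn amounts to dropping the trailing fill operand from CHECK lines. The snippet below is a hypothetical illustration of that rewrite; the helper name and regex are assumptions, and nothing here claims this is how the tests in the PR were actually regenerated.

```python
import re

# Matches `.p2align <n>, 0x90` only when the fill value is the last operand,
# so three-operand forms (fill value plus max-skip) are left untouched.
P2ALIGN_FILL = re.compile(r"(\.p2align\s+\d+),\s*0x90[ \t]*$")


def drop_fill_value(line):
    """Rewrite `.p2align 4, 0x90` to `.p2align 4`; other lines pass through."""
    return P2ALIGN_FILL.sub(r"\1", line)


print(drop_fill_value("; CHECK-NEXT:    .p2align 4, 0x90"))
```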

llvmbot commented Sep 26, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-backend-x86

Author: Jeremy Morse (jmorse)

Patch is 568.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/110134.diff

216 Files Affected:

  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp (+8-4)
  • (modified) llvm/test/CodeGen/X86/2006-08-21-ExtraMovInst.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll (+17-17)
  • (modified) llvm/test/CodeGen/X86/2007-03-15-GEP-Idx-Sink.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2007-10-12-CoalesceExtSubReg.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2007-10-12-SpillerUnfold2.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2007-11-06-InstrSched.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2007-11-30-LoadFolding-Bug.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/2008-04-28-CoalescerBug.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/2008-08-06-CmpStride.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2008-12-01-loop-iv-used-outside-loop.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2009-02-26-MachineLICMBug.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2009-04-25-CoalescerBug.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2009-08-12-badswitch.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-across-func.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-ldtilecfg-insert.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-lower-tile-copy.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-spill-merge.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/AMX/amx-tile-basic.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/MachineSink-Issue98477.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/MergeConsecutiveStores.ll (+41-41)
  • (modified) llvm/test/CodeGen/X86/PR71178-register-coalescer-crash.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/SwitchLowering.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/addr-mode-matcher-2.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/align-basic-block-sections.mir (+1-1)
  • (modified) llvm/test/CodeGen/X86/and-sink.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/apx/push2-pop2.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/apx/setzucc.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/assertzext-demanded.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/atom-pad-short-functions.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/atomic-bit-test.ll (+7-7)
  • (modified) llvm/test/CodeGen/X86/atomic-flags.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/atomic-idempotent.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/atomic-minmax-i6432.ll (+10-10)
  • (modified) llvm/test/CodeGen/X86/atomic-non-integer-fp128.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/atomic-non-integer.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/atomic-rm-bit-test-64.ll (+37-37)
  • (modified) llvm/test/CodeGen/X86/atomic-rm-bit-test.ll (+170-170)
  • (modified) llvm/test/CodeGen/X86/atomic-xor.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/atomic128.ll (+11-12)
  • (modified) llvm/test/CodeGen/X86/atomicrmw-cond-sub-clamp.ll (+16-16)
  • (modified) llvm/test/CodeGen/X86/atomicrmw-fadd-fp-vector.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/atomicrmw-uinc-udec-wrap.ll (+8-8)
  • (modified) llvm/test/CodeGen/X86/avx-cmp.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/avx-vbroadcast.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/avx-vzeroupper.ll (+8-8)
  • (modified) llvm/test/CodeGen/X86/avx2-vbroadcast.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/avx512-broadcast-unfold.ll (+132-132)
  • (modified) llvm/test/CodeGen/X86/avx512-i1test.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/avx512vnni-combine.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/avxvnni-combine.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/block-placement.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/break-false-dep.ll (+20-20)
  • (modified) llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/cast-vsel.ll (+8-8)
  • (modified) llvm/test/CodeGen/X86/cmpxchg-clobber-flags.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/cmpxchg-i128-i1.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/coalesce-esp.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/coalescer-breaks-subreg-to-reg-liveness-reduced.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/coalescer-commute1.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/coalescer-commute4.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/coalescer-dead-flag-verifier-error.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/code-align-loops.ll (+14-14)
  • (modified) llvm/test/CodeGen/X86/code_placement_align_all.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/combine-pmuldq.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/constant-pool-sharing.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/copy-eflags.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/dag-update-nodetomatch.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/fdiv-combine.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/fixup-lea.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/fma-commute-loop.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/fma-intrinsics-phi-213-to-231.ll (+24-24)
  • (modified) llvm/test/CodeGen/X86/fold-call-3.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/fold-loop-of-urem.ll (+28-28)
  • (modified) llvm/test/CodeGen/X86/fp-une-cmp.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/hoist-invariant-load.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/i128-mul.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/i386-shrink-wrapping.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/icmp-shift-opt.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/ifunc-asm.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/innermost-loop-alignment.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/ins_subreg_coalesce-3.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/issue76416.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/kcfi.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/known-bits.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/lea-opt-cse2.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/lea-opt-cse4.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/licm-symbol.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/loop-search.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/loop-strength-reduce5.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/loop-strength-reduce7.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/lsr-addrecloops.ll (+5-5)
  • (modified) llvm/test/CodeGen/X86/lsr-interesting-step.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/lsr-loop-exit-cond.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/lsr-negative-stride.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/lsr-sort.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/lsr-static-addr.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/machine-cp.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/machine-cse.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/madd.ll (+51-51)
  • (modified) llvm/test/CodeGen/X86/masked-iv-safe.ll (+8-8)
  • (modified) llvm/test/CodeGen/X86/masked-iv-unsafe.ll (+13-13)
  • (modified) llvm/test/CodeGen/X86/merge_store.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/min-legal-vector-width.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/mmx-arith.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/negative-stride-fptosi-user.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/optimize-max-0.ll (+10-10)
  • (modified) llvm/test/CodeGen/X86/optimize-max-1.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/optimize-max-2.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/or-address.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/overflowing-iv-codegen.ll (+5-5)
  • (modified) llvm/test/CodeGen/X86/patchable-prologue.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/pcsections-atomics.ll (+177-177)
  • (modified) llvm/test/CodeGen/X86/peep-test-0.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/peep-test-1.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/peephole-copy.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pic-load-remat.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/postalloc-coalescing.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr14314.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr22338.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/pr30562.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr32108.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr33290.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/pr33747.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/pr37916.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr38185.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr38217.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr38539.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr38795.ll (+10-10)
  • (modified) llvm/test/CodeGen/X86/pr42565.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr42909.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr43529.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr44140.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr44412.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/pr47874.ll (+8-8)
  • (modified) llvm/test/CodeGen/X86/pr49393.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr49451.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/pr50374.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr50782.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr51371.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/pr5145.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/pr51615.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/pr53842.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr53990-incorrect-machine-sink.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr55648.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr61923.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr63108.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/pr63692.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr65895.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr68539.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr93000.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/promote-sra-by-itself.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/rdrand.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/retpoline.ll (+5-5)
  • (modified) llvm/test/CodeGen/X86/reverse_branches.ll (+5-5)
  • (modified) llvm/test/CodeGen/X86/sad.ll (+17-17)
  • (modified) llvm/test/CodeGen/X86/saddo-redundant-add.ll (+1-1)
  • (added) llvm/test/CodeGen/X86/same-align-bytes-with-llasm-llobj.ll (+46)
  • (modified) llvm/test/CodeGen/X86/scalar_widen_div.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/setcc-lowering.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/setcc-non-simple-type.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/shift-parts.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/sink-out-of-loop.ll (+5-5)
  • (modified) llvm/test/CodeGen/X86/speculative-load-hardening.ll (+8-8)
  • (modified) llvm/test/CodeGen/X86/split-extend-vector-inreg.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/sse-domains.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/stack-coloring-wineh.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/switch.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/tail-dup-merge-loop-headers.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/tail-dup-multiple-latch-loop.ll (+7-7)
  • (modified) llvm/test/CodeGen/X86/tail-dup-partial.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/tail-dup-repeat.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/tailcall-cgp-dup.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/tls-loads-control3.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/trunc-store.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/twoaddr-coalesce.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/twoaddr-lea.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/unaligned-load.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/undef-label.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/vec_setcc-2.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/vector-fshl-128.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/vector-fshl-256.ll (+10-10)
  • (modified) llvm/test/CodeGen/X86/vector-pack-128.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/vector-shift-by-select-loop.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/vector-shuffle-combining.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/vselect-avx.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/widen_arith-1.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/widen_arith-2.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/widen_arith-3.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/widen_arith-4.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/widen_arith-5.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/widen_arith-6.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/widen_cast-1.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/widen_cast-2.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/widen_cast-4.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/x86-shrink-wrapping.ll (+24-24)
  • (modified) llvm/test/CodeGen/X86/x86-win64-shrink-wrapping.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/xor.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/xray-attribute-instrumentation.ll (+5-5)
  • (modified) llvm/test/CodeGen/X86/xray-custom-log.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/xray-partial-instrumentation-skip-entry.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/xray-partial-instrumentation-skip-exit.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/xray-selective-instrumentation.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/xray-tail-call-sled.ll (+7-7)
  • (modified) llvm/test/DebugInfo/COFF/pieces.ll (+1-1)
  • (modified) llvm/test/DebugInfo/X86/header.ll (+1-1)
  • (modified) llvm/test/DebugInfo/X86/loop-align-debug.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopStrengthReduce/X86/2011-11-29-postincphi.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopStrengthReduce/X86/lsr-insns-1.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopStrengthReduce/X86/macro-fuse-cmp.ll (+2-2)
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
index 3ce044387ada29..5f5c4055bf1ba1 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
@@ -43,7 +43,8 @@ X86MCAsmInfoDarwin::X86MCAsmInfoDarwin(const Triple &T) {
 
   AssemblerDialect = AsmWriterFlavor;
 
-  TextAlignFillValue = 0x90;
+  // This will be padded with appropriately sized nops.
+  TextAlignFillValue = 0;
 
   if (!is64Bit)
     Data64bitsDirective = nullptr;       // we can't emit a 64-bit unit
@@ -93,7 +94,8 @@ X86ELFMCAsmInfo::X86ELFMCAsmInfo(const Triple &T) {
 
   AssemblerDialect = AsmWriterFlavor;
 
-  TextAlignFillValue = 0x90;
+  // This will be padded with appropriately sized nops.
+  TextAlignFillValue = 0;
 
   // Debug Information
   SupportsDebugInformation = true;
@@ -132,7 +134,8 @@ X86MCAsmInfoMicrosoft::X86MCAsmInfoMicrosoft(const Triple &Triple) {
 
   AssemblerDialect = AsmWriterFlavor;
 
-  TextAlignFillValue = 0x90;
+  // This will be padded with appropriately sized nops.
+  TextAlignFillValue = 0;
 
   AllowAtInName = true;
 }
@@ -167,7 +170,8 @@ X86MCAsmInfoGNUCOFF::X86MCAsmInfoGNUCOFF(const Triple &Triple) {
 
   AssemblerDialect = AsmWriterFlavor;
 
-  TextAlignFillValue = 0x90;
+  // This will be padded with appropriately sized nops.
+  TextAlignFillValue = 0;
 
   AllowAtInName = true;
 }
diff --git a/llvm/test/CodeGen/X86/2006-08-21-ExtraMovInst.ll b/llvm/test/CodeGen/X86/2006-08-21-ExtraMovInst.ll
index ac749bccb3c55d..f3bdf561a94569 100644
--- a/llvm/test/CodeGen/X86/2006-08-21-ExtraMovInst.ll
+++ b/llvm/test/CodeGen/X86/2006-08-21-ExtraMovInst.ll
@@ -7,7 +7,7 @@ define i32 @foo(i32 %t, i32 %C) {
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %ecx
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; CHECK-NEXT:    decl %eax
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_1: # %cond_true
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    incl %eax
diff --git a/llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll b/llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll
index 46dddd8fcd851a..1e5ee2f71d9b47 100644
--- a/llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll
+++ b/llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll
@@ -27,7 +27,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    js .LBB0_14
 ; CHECK-NEXT:  # %bb.12:
 ; CHECK-NEXT:    xorl %r8d, %r8d
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_13: # %a25b
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    testb %r8b, %r8b
@@ -38,7 +38,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    jne .LBB0_1
 ; CHECK-NEXT:  # %bb.15:
 ; CHECK-NEXT:    xorl %r8d, %r8d
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_16: # %a25b140
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    testb %r8b, %r8b
@@ -56,7 +56,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    xorps %xmm0, %xmm0
 ; CHECK-NEXT:    movb $1, %r10b
 ; CHECK-NEXT:    jmp .LBB0_3
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_9: # %b1606
 ; CHECK-NEXT:    # in Loop: Header=BB0_3 Depth=1
 ; CHECK-NEXT:    testb %r9b, %r9b
@@ -83,7 +83,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # in Loop: Header=BB0_3 Depth=1
 ; CHECK-NEXT:    testq %rdx, %rdx
 ; CHECK-NEXT:    js .LBB0_18
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_36: # %a30b
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
@@ -93,7 +93,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # in Loop: Header=BB0_3 Depth=1
 ; CHECK-NEXT:    testb %r10b, %r10b
 ; CHECK-NEXT:    jne .LBB0_4
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_19: # %a30b294
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
@@ -115,7 +115,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # in Loop: Header=BB0_3 Depth=1
 ; CHECK-NEXT:    testb %r8b, %r8b
 ; CHECK-NEXT:    jne .LBB0_8
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_33: # %a74b
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
@@ -128,7 +128,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # in Loop: Header=BB0_3 Depth=1
 ; CHECK-NEXT:    testl %eax, %eax
 ; CHECK-NEXT:    js .LBB0_9
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_35: # %a97b
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # => This Inner Loop Header: Depth=2
@@ -142,7 +142,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    testb %r9b, %r9b
 ; CHECK-NEXT:    jne .LBB0_35
 ; CHECK-NEXT:    jmp .LBB0_9
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_21: # %b377
 ; CHECK-NEXT:    # in Loop: Header=BB0_20 Depth=2
 ; CHECK-NEXT:    testb %r9b, %r9b
@@ -153,7 +153,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # Child Loop BB0_37 Depth 3
 ; CHECK-NEXT:    testq %rsi, %rsi
 ; CHECK-NEXT:    js .LBB0_21
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_37: # %a35b
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # Parent Loop BB0_20 Depth=2
@@ -161,7 +161,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    testb %r9b, %r9b
 ; CHECK-NEXT:    je .LBB0_37
 ; CHECK-NEXT:    jmp .LBB0_21
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_27: # %b1016
 ; CHECK-NEXT:    # in Loop: Header=BB0_25 Depth=2
 ; CHECK-NEXT:    testq %rsi, %rsi
@@ -173,7 +173,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # Child Loop BB0_28 Depth 3
 ; CHECK-NEXT:    testq %rdx, %rdx
 ; CHECK-NEXT:    js .LBB0_26
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_38: # %a53b
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # Parent Loop BB0_25 Depth=2
@@ -184,7 +184,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # in Loop: Header=BB0_25 Depth=2
 ; CHECK-NEXT:    testb %r10b, %r10b
 ; CHECK-NEXT:    jne .LBB0_27
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_28: # %a53b1019
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # Parent Loop BB0_25 Depth=2
@@ -192,7 +192,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    testq %rdx, %rdx
 ; CHECK-NEXT:    jle .LBB0_28
 ; CHECK-NEXT:    jmp .LBB0_27
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_31: # %b1263
 ; CHECK-NEXT:    # in Loop: Header=BB0_29 Depth=2
 ; CHECK-NEXT:    testq %rdx, %rdx
@@ -204,7 +204,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # Child Loop BB0_32 Depth 3
 ; CHECK-NEXT:    testq %rsi, %rsi
 ; CHECK-NEXT:    js .LBB0_30
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_39: # %a63b
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # Parent Loop BB0_29 Depth=2
@@ -215,7 +215,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # in Loop: Header=BB0_29 Depth=2
 ; CHECK-NEXT:    testq %rsi, %rsi
 ; CHECK-NEXT:    jle .LBB0_31
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_32: # %a63b1266
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # Parent Loop BB0_29 Depth=2
@@ -223,7 +223,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    testq %rsi, %rsi
 ; CHECK-NEXT:    jle .LBB0_32
 ; CHECK-NEXT:    jmp .LBB0_31
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_24: # %b712
 ; CHECK-NEXT:    # in Loop: Header=BB0_22 Depth=2
 ; CHECK-NEXT:    testb %r9b, %r9b
@@ -234,7 +234,7 @@ define dso_local void @foo(ptr %a0, ptr %a1, ptr %a2, ptr %a3, ptr %a4, ptr %a5)
 ; CHECK-NEXT:    # Child Loop BB0_23 Depth 3
 ; CHECK-NEXT:    testq %rdx, %rdx
 ; CHECK-NEXT:    js .LBB0_24
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_23: # %a45b
 ; CHECK-NEXT:    # Parent Loop BB0_3 Depth=1
 ; CHECK-NEXT:    # Parent Loop BB0_22 Depth=2
diff --git a/llvm/test/CodeGen/X86/2007-03-15-GEP-Idx-Sink.ll b/llvm/test/CodeGen/X86/2007-03-15-GEP-Idx-Sink.ll
index f21aaca7ca5f17..49e2bf207e52a8 100644
--- a/llvm/test/CodeGen/X86/2007-03-15-GEP-Idx-Sink.ll
+++ b/llvm/test/CodeGen/X86/2007-03-15-GEP-Idx-Sink.ll
@@ -15,7 +15,7 @@ define void @foo(ptr %buf, i32 %size, i32 %col, ptr %p) nounwind {
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %esi
 ; CHECK-NEXT:    addl $8, %ecx
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB0_2: ## %bb
 ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    movl (%esi), %edi
diff --git a/llvm/test/CodeGen/X86/2007-10-12-CoalesceExtSubReg.ll b/llvm/test/CodeGen/X86/2007-10-12-CoalesceExtSubReg.ll
index 2f75ab29e708fe..cfb3e508576dda 100644
--- a/llvm/test/CodeGen/X86/2007-10-12-CoalesceExtSubReg.ll
+++ b/llvm/test/CodeGen/X86/2007-10-12-CoalesceExtSubReg.ll
@@ -9,7 +9,7 @@ define signext i16 @f(ptr %bp, ptr %ss)   {
 ; CHECK-NEXT:    .cfi_offset %esi, -8
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_1: # %cond_next127
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    movl (%eax), %edx
diff --git a/llvm/test/CodeGen/X86/2007-10-12-SpillerUnfold2.ll b/llvm/test/CodeGen/X86/2007-10-12-SpillerUnfold2.ll
index f9996e2df50e0e..6ebb97d63e7c65 100644
--- a/llvm/test/CodeGen/X86/2007-10-12-SpillerUnfold2.ll
+++ b/llvm/test/CodeGen/X86/2007-10-12-SpillerUnfold2.ll
@@ -6,7 +6,7 @@ define signext   i16 @t(ptr %qmatrix, ptr %dct, ptr %acBaseTable, ptr %acExtTabl
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %ecx
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_1: # %cond_next127
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    movl %eax, %edx
diff --git a/llvm/test/CodeGen/X86/2007-11-06-InstrSched.ll b/llvm/test/CodeGen/X86/2007-11-06-InstrSched.ll
index 750d06d9e6031f..bbce246a5d394a 100644
--- a/llvm/test/CodeGen/X86/2007-11-06-InstrSched.ll
+++ b/llvm/test/CodeGen/X86/2007-11-06-InstrSched.ll
@@ -14,7 +14,7 @@ define float @foo(ptr %x, ptr %y, i32 %c) nounwind {
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %edx
 ; CHECK-NEXT:    xorps %xmm0, %xmm0
 ; CHECK-NEXT:    xorl %esi, %esi
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_3: # %bb18
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    xorps %xmm1, %xmm1
diff --git a/llvm/test/CodeGen/X86/2007-11-30-LoadFolding-Bug.ll b/llvm/test/CodeGen/X86/2007-11-30-LoadFolding-Bug.ll
index 68566c7b370979..8d690ba06e3bd6 100644
--- a/llvm/test/CodeGen/X86/2007-11-30-LoadFolding-Bug.ll
+++ b/llvm/test/CodeGen/X86/2007-11-30-LoadFolding-Bug.ll
@@ -16,7 +16,7 @@ define fastcc void @mp_sqrt(i32 %n, i32 %radix, ptr %in, ptr %out, ptr %tmp1, pt
 ; CHECK-NEXT:    movb $1, %cl
 ; CHECK-NEXT:    movl $1, %ebx
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %esi
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_1: # %bb.i5
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    movl %ecx, %eax
@@ -37,7 +37,7 @@ define fastcc void @mp_sqrt(i32 %n, i32 %radix, ptr %in, ptr %out, ptr %tmp1, pt
 ; CHECK-NEXT:    xorl %eax, %eax
 ; CHECK-NEXT:    xorl %ecx, %ecx
 ; CHECK-NEXT:    xorpd %xmm1, %xmm1
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_7: # %bb.i28.i
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    cvttsd2si %xmm1, %edi
@@ -85,7 +85,7 @@ define fastcc void @mp_sqrt(i32 %n, i32 %radix, ptr %in, ptr %out, ptr %tmp1, pt
 ; CHECK-NEXT:    popl %ebx
 ; CHECK-NEXT:    popl %ebp
 ; CHECK-NEXT:    retl
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_9: # %bb.i.i
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    jmp .LBB0_9
diff --git a/llvm/test/CodeGen/X86/2008-04-28-CoalescerBug.ll b/llvm/test/CodeGen/X86/2008-04-28-CoalescerBug.ll
index 8e6d2c11b7b3dc..c95fc00b3ee6d4 100644
--- a/llvm/test/CodeGen/X86/2008-04-28-CoalescerBug.ll
+++ b/llvm/test/CodeGen/X86/2008-04-28-CoalescerBug.ll
@@ -16,13 +16,13 @@ define void @t(ptr %depth, ptr %bop, i32 %mode) nounwind  {
 ; CHECK-NEXT:  ## %bb.1: ## %entry
 ; CHECK-NEXT:    cmpl $1, %edx
 ; CHECK-NEXT:    jne LBB0_10
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB0_2: ## %bb2898.us
 ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    jmp LBB0_2
 ; CHECK-NEXT:  LBB0_3: ## %bb13086.preheader
 ; CHECK-NEXT:    movb $1, %al
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB0_4: ## %bb13088
 ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    testb %al, %al
@@ -31,7 +31,7 @@ define void @t(ptr %depth, ptr %bop, i32 %mode) nounwind  {
 ; CHECK-NEXT:    ## in Loop: Header=BB0_4 Depth=1
 ; CHECK-NEXT:    xorl %ecx, %ecx
 ; CHECK-NEXT:    jmp LBB0_7
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB0_5: ## in Loop: Header=BB0_4 Depth=1
 ; CHECK-NEXT:    movl $65535, %ecx ## imm = 0xFFFF
 ; CHECK-NEXT:  LBB0_7: ## %bb13107
diff --git a/llvm/test/CodeGen/X86/2008-08-06-CmpStride.ll b/llvm/test/CodeGen/X86/2008-08-06-CmpStride.ll
index ca92c555058abb..5086ed40a43a21 100644
--- a/llvm/test/CodeGen/X86/2008-08-06-CmpStride.ll
+++ b/llvm/test/CodeGen/X86/2008-08-06-CmpStride.ll
@@ -10,7 +10,7 @@ define i32 @main() nounwind {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    pushq %rbx
 ; CHECK-NEXT:    movl $10271, %ebx # imm = 0x281F
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_1: # %forbody
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    movl $.str, %edi
diff --git a/llvm/test/CodeGen/X86/2008-12-01-loop-iv-used-outside-loop.ll b/llvm/test/CodeGen/X86/2008-12-01-loop-iv-used-outside-loop.ll
index 18d3cec442c6c2..c2a7d6be8baa00 100644
--- a/llvm/test/CodeGen/X86/2008-12-01-loop-iv-used-outside-loop.ll
+++ b/llvm/test/CodeGen/X86/2008-12-01-loop-iv-used-outside-loop.ll
@@ -11,7 +11,7 @@ define ptr @test(ptr %Q, ptr %L) nounwind {
 ; CHECK:       ## %bb.0: ## %entry
 ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
 ; CHECK-NEXT:    jmp LBB0_2
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB0_1: ## %bb
 ; CHECK-NEXT:    ## in Loop: Header=BB0_2 Depth=1
 ; CHECK-NEXT:    incl %eax
diff --git a/llvm/test/CodeGen/X86/2009-02-26-MachineLICMBug.ll b/llvm/test/CodeGen/X86/2009-02-26-MachineLICMBug.ll
index 7807d49269e64c..c421541001c5d8 100644
--- a/llvm/test/CodeGen/X86/2009-02-26-MachineLICMBug.ll
+++ b/llvm/test/CodeGen/X86/2009-02-26-MachineLICMBug.ll
@@ -20,7 +20,7 @@ define ptr @t(ptr %desc, i64 %p) nounwind ssp {
 ; CHECK-NEXT:    movq %rdi, %r14
 ; CHECK-NEXT:    orq $2097152, %rbx ## imm = 0x200000
 ; CHECK-NEXT:    andl $15728640, %ebx ## imm = 0xF00000
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB0_1: ## %bb4
 ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    xorl %eax, %eax
diff --git a/llvm/test/CodeGen/X86/2009-04-25-CoalescerBug.ll b/llvm/test/CodeGen/X86/2009-04-25-CoalescerBug.ll
index ce28893090c43f..1dd30e82630992 100644
--- a/llvm/test/CodeGen/X86/2009-04-25-CoalescerBug.ll
+++ b/llvm/test/CodeGen/X86/2009-04-25-CoalescerBug.ll
@@ -8,7 +8,7 @@ define i64 @test(ptr %tmp13) nounwind {
 ; CHECK-NEXT:    movl (%rdi), %ecx
 ; CHECK-NEXT:    movl %ecx, %eax
 ; CHECK-NEXT:    shrl %eax
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB0_1: # %while.cond
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    testb $1, %cl
diff --git a/llvm/test/CodeGen/X86/2009-08-12-badswitch.ll b/llvm/test/CodeGen/X86/2009-08-12-badswitch.ll
index 4b8085a995f083..7050889d71029c 100644
--- a/llvm/test/CodeGen/X86/2009-08-12-badswitch.ll
+++ b/llvm/test/CodeGen/X86/2009-08-12-badswitch.ll
@@ -123,7 +123,7 @@ define internal fastcc i32 @foo(i64 %bar) nounwind ssp {
 ; CHECK-NEXT:    xorl %eax, %eax
 ; CHECK-NEXT:    popq %rcx
 ; CHECK-NEXT:    retq
-; CHECK-NEXT:    .p2align 2, 0x90
+; CHECK-NEXT:    .p2align 2
 ; CHECK-NEXT:    .data_region jt32
 ; CHECK-NEXT:  .set L0_0_set_3, LBB0_3-LJTI0_0
 ; CHECK-NEXT:  .set L0_0_set_4, LBB0_4-LJTI0_0
diff --git a/llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll b/llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll
index 43b52898c79a2c..22bf4581c6b42a 100644
--- a/llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll
+++ b/llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll
@@ -4,7 +4,7 @@
 define i32 @test_01(ptr %p, i64 %len, i32 %x) {
 ; CHECK-LABEL: test_01:
 ; CHECK:       ## %bb.0: ## %entry
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB0_1: ## %loop
 ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    subq $1, %rsi
@@ -44,7 +44,7 @@ failure:                                          ; preds = %backedge
 define i32 @test_01a(ptr %p, i64 %len, i32 %x) {
 ; CHECK-LABEL: test_01a:
 ; CHECK:       ## %bb.0: ## %entry
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB1_1: ## %loop
 ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    subq $1, %rsi
@@ -84,7 +84,7 @@ failure:                                          ; preds = %backedge
 define i32 @test_02(ptr %p, i64 %len, i32 %x) {
 ; CHECK-LABEL: test_02:
 ; CHECK:       ## %bb.0: ## %entry
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB2_1: ## %loop
 ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    subq $1, %rsi
@@ -126,7 +126,7 @@ failure:                                          ; preds = %backedge
 define i32 @test_03(ptr %p, i64 %len, i32 %x) {
 ; CHECK-LABEL: test_03:
 ; CHECK:       ## %bb.0: ## %entry
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  LBB3_1: ## %loop
 ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    subq $1, %rsi
diff --git a/llvm/test/CodeGen/X86/AMX/amx-across-func.ll b/llvm/test/CodeGen/X86/AMX/amx-across-func.ll
index ae0be9b5a5bcd9..320c96535abba0 100644
--- a/llvm/test/CodeGen/X86/AMX/amx-across-func.ll
+++ b/llvm/test/CodeGen/X86/AMX/amx-across-func.ll
@@ -235,7 +235,7 @@ define dso_local i32 @test_loop(i32 %0) nounwind {
 ; CHECK-NEXT:    movl $32, %r15d
 ; CHECK-NEXT:    movw $8, %r12w
 ; CHECK-NEXT:    movl $buf+2048, %r13d
-; CHECK-NEXT:    .p2align 4, 0x90
+; CHECK-NEXT:    .p2align 4
 ; CHECK-NEXT:  .LBB2_2: # =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    tileloadd (%r14,%r15), %tmm0
 ; CHECK-NEXT:    movabsq $64, %rax
@@ -300,7 +300,7 @@ define dso_local i32 @test_loop(i32 %0) nounwind {
 ; IPRA-NEXT:    movl $32, %esi
 ; IPRA-NEXT:    movw $8, %di
 ; IPRA-NEXT:    movl $buf+2048, %r8d
-; IPRA-NEXT:    .p2align 4, 0x90
+; IPRA-NEXT:    .p2align 4
 ; IPRA-NEXT:  .LBB2_2: # =>This Inner Loop Header: Depth=1
 ; IPRA-NEXT:    tileloadd (%rdx,%rsi), %tmm0
 ; IPRA-NEXT:    callq foo
@@ -494,7 +494,7 @@ define dso_local void @test_loop2(i32 %0) nounwind {
 ; CHECK-...
[truncated]

Member

@MaskRay MaskRay left a comment

LGTM.

Is there any test that uses .p2align 4{{$}} so that it will catch .p2align 4, 0x90 regression?
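
To illustrate why the anchor matters: a FileCheck pattern matches anywhere within a line, so a plain `.p2align 4` check would still accept `.p2align 4, 0x90`. The Python sketch below is only a rough stand-in for that behaviour; `filecheck_like_match` is a hypothetical helper, not part of FileCheck, and it only understands a trailing `{{$}}`.

```python
import re


def filecheck_like_match(pattern: str, line: str) -> bool:
    """Hypothetical model of one CHECK pattern: literal text found anywhere
    in the line, with a trailing '{{$}}' treated as an end-of-line anchor."""
    anchor = "{{$}}"
    anchored = pattern.endswith(anchor)
    literal = pattern[: -len(anchor)] if anchored else pattern
    regex = re.escape(literal) + ("$" if anchored else "")
    return re.search(regex, line) is not None


old_output = "\t.p2align 4, 0x90"  # what x86 printed before this PR
new_output = "\t.p2align 4"        # what x86 prints after this PR

# The unanchored pattern cannot tell the two apart.
print(filecheck_like_match(".p2align 4", old_output))
print(filecheck_like_match(".p2align 4", new_output))

# The anchored pattern rejects the old form with its trailing fill operand.
print(filecheck_like_match(".p2align 4{{$}}", old_output))
print(filecheck_like_match(".p2align 4{{$}}", new_output))
```

Under this model, only the anchored pattern would catch a regression back to an explicit fill value.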

@lenary
Member

lenary commented Sep 26, 2024

I agree it's undesirable for the text assembler to do something different to integrated assembler. Sorry I didn't notice this before.

x86 is the only (upstream) target using TextAlignFillValue - maybe that whole mechanism in MCAsmInfo should be removed if it will only be set upstream to the default value (0)?

At the very least, the x86 target doesn't need to set this to its documented default value, so you can probably delete all assignments to TextAlignFillValue.
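
As a simplified model of the point above (not LLVM's actual printing code): when the fill value is left at its default of 0, the directive is printed without a fill operand, which matches the before/after check lines in this diff. `p2align_directive` is a hypothetical name used only for illustration.

```python
def p2align_directive(log2_align: int, fill_value: int = 0) -> str:
    """Hypothetical model of how a text-section alignment directive is printed.

    A fill value of 0 is treated as "unset", so no fill operand is emitted and
    the assembler is free to choose the padding itself.
    """
    if fill_value == 0:
        return f".p2align {log2_align}"
    return f".p2align {log2_align}, {fill_value:#x}"


# Before this PR, x86 set the fill value to 0x90.
print(p2align_directive(4, 0x90))

# After this PR, x86 leaves the fill value at the default.
print(p2align_directive(4))
```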

@lenary
Member

lenary commented Sep 26, 2024

Also, sorry for the behaviour divergence. I'm glad I timed my PR to land just after branching so we had time to find issues like this.

@MaskRay
Member

MaskRay commented Sep 26, 2024

Agreed that TextAlignFillValue = 0 can be removed, as x86 has the only use (and downstream targets tend to be RISC and are unlikely to need such a mechanism).

Also check for p2align not having any trailing operands
@jmorse
Member Author

jmorse commented Oct 1, 2024

Thanks for the reviews; I've added a {{$}} to same-align-bytes-with-llasm-llobj.ll to ensure there are no trailing operands to p2align, and removed the now-redundant assignments to TextAlignFillValue.

> Also, sorry for the behaviour divergence. I'm glad I timed my PR to land just after branching so we had time to find issues like this

Timing greatly appreciated, this did indeed take a decent period to filter through to the right tests.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Oct 1, 2024
@jmorse jmorse merged commit e6bf48d into llvm:main Oct 2, 2024
5 checks passed
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Oct 3, 2024
As of rev ea222be, LLVM's assembler will actually try to honour the
"fill value" part of p2align directives. X86 printed these as 0x90, which
isn't actually what it wanted: we want multi-byte nops for .text
padding. Compiling via a textual assembly file produces single-byte
nop padding since ea222be but the built-in assembler will produce
multi-byte nops. This divergent behaviour is undesirable.

To fix: don't set the byte padding field for x86, which allows the
assembler to pick multi-byte nops. Test that we get the same multi-byte
padding when compiled via textual assembly or directly to object file.
Added same-align-bytes-with-llasm-llobj.ll to that effect, updated
numerous other tests to not contain check-lines for the explicit padding.
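
A minimal sketch of the divergence described above, assuming the commonly documented Intel multi-byte NOP encodings and a greedy longest-first choice. The NOP sequences LLVM actually emits depend on the target CPU, and the helper names here are hypothetical.

```python
# Recommended x86 NOP encodings by length (assumption: the target supports
# the 0F 1F "long NOP" forms; LLVM's real choice depends on the subtarget).
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0f 1f 00"),
    4: bytes.fromhex("0f 1f 40 00"),
    5: bytes.fromhex("0f 1f 44 00 00"),
    6: bytes.fromhex("66 0f 1f 44 00 00"),
    7: bytes.fromhex("0f 1f 80 00 00 00 00"),
    8: bytes.fromhex("0f 1f 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0f 1f 84 00 00 00 00 00"),
}


def padding_needed(offset: int, log2_align: int) -> int:
    """Bytes required to advance offset to the next 2**log2_align boundary."""
    return (-offset) % (1 << log2_align)


def pad_with_fill(count: int, fill: int = 0x90) -> list[bytes]:
    """Explicit fill value: every padding byte is its own single-byte NOP."""
    return [bytes([fill])] * count


def pad_with_multibyte_nops(count: int) -> list[bytes]:
    """No fill value: greedily use the longest NOP that still fits."""
    out = []
    while count > 0:
        size = min(count, max(NOPS))
        out.append(NOPS[size])
        count -= size
    return out


# Example: code ends at offset 5 and the next block wants `.p2align 4`.
gap = padding_needed(5, 4)
via_fill = pad_with_fill(gap)
via_nops = pad_with_multibyte_nops(gap)
print(gap, len(via_fill), len(via_nops))
print([nop.hex() for nop in via_nops])
```

Both strategies fill the same number of bytes; the difference is how many instructions the CPU must decode to step over the padding.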

Labels

backend:X86 clang Clang issues not falling into any other category debuginfo llvm:transforms