[X86][APX] Do optimizeMemoryInst for v1X masked load/store #151331

phoebewang · 2025-07-30T13:18:32Z

Fix redundant LEA: https://godbolt.org/z/34xEYE818

Fix redundant LEA: https://godbolt.org/z/hrP1eox4Y

llvmbot · 2025-07-30T13:19:10Z

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

Fix redundant LEA: https://godbolt.org/z/hrP1eox4Y

Full diff: https://github.com/llvm/llvm-project/pull/151331.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/CodeGenPrepare.cpp (+23)
(modified) llvm/test/CodeGen/X86/apx/cf.ll (+19)

diff --git a/llvm/lib/CodeGen/CodeGenPrepare.cpp b/llvm/lib/CodeGen/CodeGenPrepare.cpp
index 416c56d5a36f8..f16283be1b996 100644
--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp
@@ -2769,6 +2769,29 @@ bool CodeGenPrepare::optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT) {
       return optimizeGatherScatterInst(II, II->getArgOperand(0));
     case Intrinsic::masked_scatter:
       return optimizeGatherScatterInst(II, II->getArgOperand(1));
+    case Intrinsic::masked_load:
+      // Treat v1X masked load as load X type.
+      if (auto *VT = dyn_cast<FixedVectorType>(II->getType())) {
+        if (VT->getNumElements() == 1) {
+          Value *PtrVal = II->getArgOperand(0);
+          unsigned AS = PtrVal->getType()->getPointerAddressSpace();
+          if (optimizeMemoryInst(II, PtrVal, VT->getElementType(), AS))
+            return true;
+        }
+      }
+      return false;
+    case Intrinsic::masked_store:
+      // Treat v1X masked store as store X type.
+      if (auto *VT =
+              dyn_cast<FixedVectorType>(II->getArgOperand(0)->getType())) {
+        if (VT->getNumElements() == 1) {
+          Value *PtrVal = II->getArgOperand(1);
+          unsigned AS = PtrVal->getType()->getPointerAddressSpace();
+          if (optimizeMemoryInst(II, PtrVal, VT->getElementType(), AS))
+            return true;
+        }
+      }
+      return false;
     }
 
     SmallVector<Value *, 2> PtrOps;
diff --git a/llvm/test/CodeGen/X86/apx/cf.ll b/llvm/test/CodeGen/X86/apx/cf.ll
index b111ae542d93a..8c9869207f775 100644
--- a/llvm/test/CodeGen/X86/apx/cf.ll
+++ b/llvm/test/CodeGen/X86/apx/cf.ll
@@ -194,3 +194,22 @@ entry:
   call void @llvm.masked.store.v1i64.p0(<1 x i64> %3, ptr %p, i32 4, <1 x i1> %0)
   ret void
 }
+
+define void @sink_gep(ptr %p, i1 %cond) {
+; CHECK-LABEL: sink_gep:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    testb $1, %sil
+; CHECK-NEXT:    cfcmovnel %eax, 112(%rdi)
+; CHECK-NEXT:    movl $0, (%rdi)
+; CHECK-NEXT:    retq
+entry:
+  %0 = getelementptr i8, ptr %p, i64 112
+  br label %next
+
+next:
+  %1 = bitcast i1 %cond to <1 x i1>
+  call void @llvm.masked.store.v1i32.p0(<1 x i32> zeroinitializer, ptr %0, i32 1, <1 x i1> %1)
+  store i32 0, ptr %p, align 4
+  ret void
+}

KanRobert · 2025-07-30T14:20:01Z

llvm/lib/CodeGen/CodeGenPrepare.cpp

@@ -2769,6 +2769,29 @@ bool CodeGenPrepare::optimizeCallInst(CallInst *CI, ModifyDT &ModifiedDT) {
      return optimizeGatherScatterInst(II, II->getArgOperand(0));
    case Intrinsic::masked_scatter:
      return optimizeGatherScatterInst(II, II->getArgOperand(1));
+    case Intrinsic::masked_load:
+      // Treat v1X masked load as load X type.
+      if (auto *VT = dyn_cast<FixedVectorType>(II->getType())) {


Add a test for load too?

We have one in llvm/test/CodeGen/X86/isel-sink.ll: https://godbolt.org/z/5zhjr4ses

I misunderstood your point. Added in f51c6bd

llvm-ci · 2025-07-31T03:59:51Z

LLVM Buildbot has detected a new failure on builder mlir-nvidia running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/16879

Here is the relevant piece of the build log for the reference

Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x56a5d3d6b100 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x56a5d3d6b100 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

llvm-ci · 2025-07-31T04:08:39Z

LLVM Buildbot has detected a new failure on builder clang-armv8-quick running on linaro-clang-armv8-quick while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/19558

Here is the relevant piece of the build log for the reference

Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clangd Unit Tests :: ./ClangdTests/76/165' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/tcwg-buildbot/worker/clang-armv8-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests-Clangd Unit Tests-1686115-76-165.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=165 GTEST_SHARD_INDEX=76 /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/tools/clang/tools/extra/clangd/unittests/./ClangdTests
--

Note: This is test shard 77 of 165.
[==========] Running 8 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 1 test from LSPTest
[ RUN      ] LSPTest.FeatureModulesThreadingTest
[       OK ] LSPTest.FeatureModulesThreadingTest (8 ms)
[----------] 1 test from LSPTest (8 ms total)

[----------] 1 test from CompletionStringTest
[ RUN      ] CompletionStringTest.GetDeclCommentBadUTF8
Built preamble of size 341332 for file /clangd-test/TestTU.cpp version null in 0.03 seconds
[       OK ] CompletionStringTest.GetDeclCommentBadUTF8 (67 ms)
[----------] 1 test from CompletionStringTest (67 ms total)

[----------] 1 test from IncludeFixerTest
[ RUN      ] IncludeFixerTest.NoCrashMemberAccess
Built preamble of size 341332 for file /clangd-test/TestTU.cpp version null in 0.01 seconds
[       OK ] IncludeFixerTest.NoCrashMemberAccess (19 ms)
[----------] 1 test from IncludeFixerTest (19 ms total)

[----------] 1 test from FuzzyMatch
[ RUN      ] FuzzyMatch.Matches
[       OK ] FuzzyMatch.Matches (7 ms)
[----------] 1 test from FuzzyMatch (7 ms total)

[----------] 1 test from ParameterHints
[ RUN      ] ParameterHints.VariadicReferenceHint
Built preamble of size 341172 for file /clangd-test/TestTU.cpp version null in 0.02 seconds
[       OK ] ParameterHints.VariadicReferenceHint (39 ms)
[----------] 1 test from ParameterHints (39 ms total)

[----------] 1 test from CrossFileRenameTests
[ RUN      ] CrossFileRenameTests.WithUpToDateIndex
ASTWorker building file /clangd-test/foo.h version null with command 
[/clangd-test]
clang -xobjective-c++ /clangd-test/foo.h
Driver produced command: cc1 -cc1 -triple armv8a-unknown-linux-gnueabihf -fsyntax-only -disable-free -clear-ast-before-backend -main-file-name foo.h -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -target-feature +read-tp-tpidruro -target-feature +vfp2 -target-feature +vfp2sp -target-feature +vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature +vfp3sp -target-feature +fp16 -target-feature +vfp4 -target-feature +vfp4d16 -target-feature +vfp4d16sp -target-feature +vfp4sp -target-feature +fp-armv8 -target-feature +fp-armv8d16 -target-feature +fp-armv8d16sp -target-feature +fp-armv8sp -target-feature -fullfp16 -target-feature +fp64 -target-feature +d32 -target-feature +sha2 -target-feature +aes -target-feature -fp16fml -target-feature +neon -target-abi aapcs-linux -mfloat-abi hard -debugger-tuning=gdb -fdebug-compilation-dir=/clangd-test -fcoverage-compilation-dir=/clangd-test -resource-dir lib/clang/22 -internal-isystem lib/clang/22/include -internal-isystem /usr/local/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fobjc-runtime=gcc -fobjc-encode-cxx-class-template-spec -fobjc-exceptions -fcxx-exceptions -fexceptions -no-round-trip-args -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x objective-c++ /clangd-test/foo.h
Building first preamble for /clangd-test/foo.h version null
Built preamble of size 420384 for file /clangd-test/foo.h version null in 0.12 seconds
indexed preamble AST for /clangd-test/foo.h version null:
  symbol slab: 0 symbols, 68 bytes
  ref slab: 0 symbols, 0 refs, 72 bytes
  relations slab: 0 relations, 12 bytes
indexed file AST for /clangd-test/foo.h version null:
...

[X86][APX] Do optimizeMemoryInst for v1X masked load/store

bf50bed

Fix redundant LEA: https://godbolt.org/z/hrP1eox4Y

phoebewang requested review from RKSimon, KanRobert and fzou1 July 30, 2025 13:18

llvmbot added backend:X86 llvm:codegen labels Jul 30, 2025

KanRobert reviewed Jul 30, 2025

View reviewed changes

KanRobert approved these changes Jul 31, 2025

View reviewed changes

phoebewang added 2 commits July 31, 2025 10:50

Add load

f51c6bd

Merge branch 'main' into APX

a9cb823

phoebewang merged commit ebf96f9 into llvm:main Jul 31, 2025
9 checks passed

phoebewang deleted the APX branch July 31, 2025 03:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[X86][APX] Do optimizeMemoryInst for v1X masked load/store #151331

[X86][APX] Do optimizeMemoryInst for v1X masked load/store #151331

Uh oh!

phoebewang commented Jul 30, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jul 30, 2025

Uh oh!

KanRobert Jul 30, 2025

Uh oh!

phoebewang Jul 31, 2025

Uh oh!

phoebewang Jul 31, 2025

Uh oh!

Uh oh!

llvm-ci commented Jul 31, 2025

Uh oh!

llvm-ci commented Jul 31, 2025

Uh oh!

Uh oh!

[X86][APX] Do optimizeMemoryInst for v1X masked load/store #151331

[X86][APX] Do optimizeMemoryInst for v1X masked load/store #151331

Uh oh!

Conversation

phoebewang commented Jul 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jul 30, 2025

Uh oh!

KanRobert Jul 30, 2025

Choose a reason for hiding this comment

Uh oh!

phoebewang Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

phoebewang Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Jul 31, 2025

Uh oh!

llvm-ci commented Jul 31, 2025

Uh oh!

Uh oh!

phoebewang commented Jul 30, 2025 •

edited

Loading