Skip to content

[AArch64] Dont inline streaming fn into non-streaming caller #150595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 1, 2025

Conversation

sdesmalen-arm
Copy link
Collaborator

@sdesmalen-arm sdesmalen-arm commented Jul 25, 2025

Without this change, the following test would fail to compile
with -march=armv8-a+sme:

  void func1(const svuint32_t *in, svuint32_t *out) {
    [&]() __arm_streaming { *out = *in; }();
  }

But in general, it's probably better never to inline
streaming functions into non-streaming functions, because
they will have been marked as 'streaming' for a reason
by the user.

@llvmbot
Copy link
Member

llvmbot commented Jul 25, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Sander de Smalen (sdesmalen-arm)

Changes

Without this change, the following test would fail to compile
with -march=armv8-a+sme:

  void func1(const svuint32_t *in, svuint32_t *out) {
    [&]() __arm_streaming { *out = *in; }();
  }

Full diff: https://github.com/llvm/llvm-project/pull/150595.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+9-3)
  • (modified) llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll (+22)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 40f49dade6131..38711206a9ea1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -280,6 +280,15 @@ bool AArch64TTIImpl::areInlineCompatible(const Function *Caller,
   if (CallAttrs.callee().isNewZA() || CallAttrs.callee().isNewZT0())
     return false;
 
+  const TargetMachine &TM = getTLI()->getTargetMachine();
+  const FeatureBitset &CallerBits =
+      TM.getSubtargetImpl(*Caller)->getFeatureBits();
+
+  // Cannot inline a streaming function into a non-streaming function,
+  // if the caller has no SVE.
+  if (CallAttrs.requiresSMChange() && !CallerBits.test(AArch64::FeatureSVE))
+    return false;
+
   if (CallAttrs.requiresLazySave() || CallAttrs.requiresSMChange() ||
       CallAttrs.requiresPreservingZT0() ||
       CallAttrs.requiresPreservingAllZAState()) {
@@ -287,9 +296,6 @@ bool AArch64TTIImpl::areInlineCompatible(const Function *Caller,
       return false;
   }
 
-  const TargetMachine &TM = getTLI()->getTargetMachine();
-  const FeatureBitset &CallerBits =
-      TM.getSubtargetImpl(*Caller)->getFeatureBits();
   const FeatureBitset &CalleeBits =
       TM.getSubtargetImpl(*Callee)->getFeatureBits();
   // Adjust the feature bitsets by inverting some of the bits. This is needed
diff --git a/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll b/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll
index 6cb16928ae6ca..ce323701cbe48 100644
--- a/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll
+++ b/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll
@@ -676,4 +676,26 @@ define void @streaming_caller_multiple_streaming_compatible_callees_inline() #0
   ret void
 }
 
+define void @nosve_streaming_function(ptr %ptr) "target-features"="+sme" "aarch64_pstate_sm_enabled" {
+; CHECK-LABEL: define void @nosve_streaming_function
+; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    store <vscale x 4 x i32> zeroinitializer, ptr [[PTR]], align 16
+; CHECK-NEXT:    ret void
+;
+  store <vscale x 4 x i32> zeroinitializer, ptr %ptr
+  ret void
+}
+
+; Don't allow inlining a streaming function into a non-streaming function
+; if the non-streaming function has no SVE.
+define void @nosve_non_streaming_caller_streaming_callee_dont_inline(ptr %ptr) "target-features"="+sme"  {
+; CHECK-LABEL: define void @nosve_non_streaming_caller_streaming_callee_dont_inline
+; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    call void @nosve_streaming_function(ptr [[PTR]])
+; CHECK-NEXT:    ret void
+;
+  call void @nosve_streaming_function(ptr %ptr)
+  ret void
+}
+
 attributes #0 = { "target-features"="+sve,+sme" }

@llvmbot
Copy link
Member

llvmbot commented Jul 25, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

Without this change, the following test would fail to compile
with -march=armv8-a+sme:

  void func1(const svuint32_t *in, svuint32_t *out) {
    [&amp;]() __arm_streaming { *out = *in; }();
  }

Full diff: https://github.com/llvm/llvm-project/pull/150595.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+9-3)
  • (modified) llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll (+22)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 40f49dade6131..38711206a9ea1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -280,6 +280,15 @@ bool AArch64TTIImpl::areInlineCompatible(const Function *Caller,
   if (CallAttrs.callee().isNewZA() || CallAttrs.callee().isNewZT0())
     return false;
 
+  const TargetMachine &TM = getTLI()->getTargetMachine();
+  const FeatureBitset &CallerBits =
+      TM.getSubtargetImpl(*Caller)->getFeatureBits();
+
+  // Cannot inline a streaming function into a non-streaming function,
+  // if the caller has no SVE.
+  if (CallAttrs.requiresSMChange() && !CallerBits.test(AArch64::FeatureSVE))
+    return false;
+
   if (CallAttrs.requiresLazySave() || CallAttrs.requiresSMChange() ||
       CallAttrs.requiresPreservingZT0() ||
       CallAttrs.requiresPreservingAllZAState()) {
@@ -287,9 +296,6 @@ bool AArch64TTIImpl::areInlineCompatible(const Function *Caller,
       return false;
   }
 
-  const TargetMachine &TM = getTLI()->getTargetMachine();
-  const FeatureBitset &CallerBits =
-      TM.getSubtargetImpl(*Caller)->getFeatureBits();
   const FeatureBitset &CalleeBits =
       TM.getSubtargetImpl(*Callee)->getFeatureBits();
   // Adjust the feature bitsets by inverting some of the bits. This is needed
diff --git a/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll b/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll
index 6cb16928ae6ca..ce323701cbe48 100644
--- a/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll
+++ b/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll
@@ -676,4 +676,26 @@ define void @streaming_caller_multiple_streaming_compatible_callees_inline() #0
   ret void
 }
 
+define void @nosve_streaming_function(ptr %ptr) "target-features"="+sme" "aarch64_pstate_sm_enabled" {
+; CHECK-LABEL: define void @nosve_streaming_function
+; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    store <vscale x 4 x i32> zeroinitializer, ptr [[PTR]], align 16
+; CHECK-NEXT:    ret void
+;
+  store <vscale x 4 x i32> zeroinitializer, ptr %ptr
+  ret void
+}
+
+; Don't allow inlining a streaming function into a non-streaming function
+; if the non-streaming function has no SVE.
+define void @nosve_non_streaming_caller_streaming_callee_dont_inline(ptr %ptr) "target-features"="+sme"  {
+; CHECK-LABEL: define void @nosve_non_streaming_caller_streaming_callee_dont_inline
+; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR1]] {
+; CHECK-NEXT:    call void @nosve_streaming_function(ptr [[PTR]])
+; CHECK-NEXT:    ret void
+;
+  call void @nosve_streaming_function(ptr %ptr)
+  ret void
+}
+
 attributes #0 = { "target-features"="+sve,+sme" }

Copy link
Member

@MacDue MacDue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (with a minor comment, but I think this is okay as-is)

Comment on lines 287 to 288
// Cannot inline a streaming function into a non-streaming function,
// if the caller has no SVE.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checking here is okay, but this feels like something hasPossibleIncompatibleOps should notice (given in your example, the use of scalable types makes the operations incompatible).

Comment on lines 679 to 695
define void @nosve_streaming_function(ptr %ptr) "target-features"="+sme" "aarch64_pstate_sm_enabled" {
; CHECK-LABEL: define void @nosve_streaming_function
; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR2]] {
; CHECK-NEXT: store <vscale x 4 x i32> zeroinitializer, ptr [[PTR]], align 16
; CHECK-NEXT: ret void
;
store <vscale x 4 x i32> zeroinitializer, ptr %ptr
ret void
}

; Don't allow inlining a streaming function into a non-streaming function
; if the non-streaming function has no SVE.
define void @nosve_non_streaming_caller_streaming_callee_dont_inline(ptr %ptr) "target-features"="+sme" {
; CHECK-LABEL: define void @nosve_non_streaming_caller_streaming_callee_dont_inline
; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR1]] {
; CHECK-NEXT: call void @nosve_streaming_function(ptr [[PTR]])
; CHECK-NEXT: ret void
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the ATTR2 and ATTR1 checks are failing CI:

2025-07-25T09:28:26.9143177Z /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll:681:15: error: CHECK-SAME: expected string not found in input
2025-07-25T09:28:26.9144429Z ; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR2]] {
2025-07-25T09:28:26.9144838Z               ^
2025-07-25T09:28:26.9145147Z <stdin>:278:38: note: scanning from here
2025-07-25T09:28:26.9145614Z define void @nosve_streaming_function(ptr %ptr) #9 {
2025-07-25T09:28:26.9146067Z                                      ^
2025-07-25T09:28:26.9146497Z <stdin>:278:38: note: with "ATTR2" equal to "2"
2025-07-25T09:28:26.9146989Z define void @nosve_streaming_function(ptr %ptr) #9 {
2025-07-25T09:28:26.9147446Z                                      ^
2025-07-25T09:28:26.9148713Z /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/Transforms/Inline/AArch64/sme-pstatesm-attrs.ll:693:15: error: CHECK-SAME: expected string not found in input
2025-07-25T09:28:26.9149942Z ; CHECK-SAME: (ptr [[PTR:%.*]]) #[[ATTR1]] {
2025-07-25T09:28:26.9150351Z               ^
2025-07-25T09:28:26.9150656Z <stdin>:283:69: note: scanning from here
2025-07-25T09:28:26.9151289Z define void @nosve_non_streaming_caller_streaming_callee_dont_inline(ptr %ptr) #10 {
2025-07-25T09:28:26.9151948Z                                                                     ^
2025-07-25T09:28:26.9152585Z <stdin>:283:69: note: with "ATTR1" equal to "1"
2025-07-25T09:28:26.9153230Z define void @nosve_non_streaming_caller_streaming_callee_dont_inline(ptr %ptr) #10 {

I think this is due to the addition of target-features

@sdesmalen-arm sdesmalen-arm force-pushed the sme-fix-inlining-nosve branch from 8e6e7ee to 932c27b Compare July 25, 2025 13:34
@sdesmalen-arm sdesmalen-arm changed the title [AArch64] Don't inline streaming Fn if caller has no SVE [AArch64] Dont inline streaming fn into non-streaming caller Jul 25, 2025
; CHECK-LABEL: define void @streaming_caller_multiple_streaming_callees_dont_inline
; CHECK-SAME: () #[[ATTR6]] {
; CHECK-NEXT: call void @streaming_caller_multiple_streaming_callees()
define void @nonstreaming_caller_multiple_nonstreaming_callees_dont_inline() #0 "aarch64_pstate_sm_enabled" {
Copy link
Member

@MacDue MacDue Jul 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is aarch64_pstate_sm_enabled intentional here? The function is nonstreaming_caller, so I'd guess not? Generally, I'd prefer that the precommit does not change any existing tests other than adding new cases (since it makes it harder to tell what's changed). If tests need tweaking, I'd be more obvious to see why as a commit after the LLVM change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is intentional, although I agree the name of the function makes no sense, but the name never made much sense before either. The name seems to relate to the mode of the callee and its callee's.

The reason for making this function aarch64_pstate_sm_enabled is described in the commit message of the precommit:

I've had to modify some of the other tests to test inlining with streaming mode changes the
other way around (streaming <- non-streaming instead of non-streaming <- streaming)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I'd prefer that the precommit does not change any existing tests other than adding new cases (since it makes it harder to tell what's changed)

I've rejigged the test changes a bit. Let me know if this makes it easier to review.

Without this change, the following test would fail to compile
with `-march=armv8-a+sme`:

  void func1(const svuint32_t *in, svuint32_t *out) {
    [&]() __arm_streaming { *out = *in; }();
  }

But in general, it's probably better never to inline
streaming functions into non-streaming functions, because
they will have been marked as 'streaming' for a reason
by the user.
@sdesmalen-arm sdesmalen-arm force-pushed the sme-fix-inlining-nosve branch from 932c27b to c7b4c03 Compare July 31, 2025 13:02
Because the behaviour now changed, I've had to modify some of the
other tests to test inlining with streaming mode changes the
other way around (streaming <- non-streaming instead of
non-streaming <- streaming)
@sdesmalen-arm sdesmalen-arm force-pushed the sme-fix-inlining-nosve branch from c7b4c03 to 443e282 Compare July 31, 2025 14:06
Copy link
Member

@MacDue MacDue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Just a few nits on comments in the tests:

Copy link
Contributor

@gbossu gbossu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sdesmalen-arm sdesmalen-arm merged commit 76ce464 into llvm:main Aug 1, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants