Commit e9ac757

[AArch64] Don't expand memcmp in strict align mode.

7aecf23 fixed the bug where we would miscompile, but we still generate a crazy amount of code. Turn off the expansion until someone implements an appropriate heuristic.

Differential Revision: https://reviews.llvm.org/D77599

1 parent f596ab4 commit e9ac757

2 files changed, 11 additions and 12 deletions.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Lines changed: 6 additions & 1 deletion
@@ -629,7 +629,12 @@ int AArch64TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
 AArch64TTIImpl::TTI::MemCmpExpansionOptions
 AArch64TTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
   TTI::MemCmpExpansionOptions Options;
-  Options.AllowOverlappingLoads = !ST->requiresStrictAlign();
+  if (ST->requiresStrictAlign()) {
+    // TODO: Add cost modeling for strict align. Misaligned loads expand to
+    // a bunch of instructions when strict align is enabled.
+    return Options;
+  }
+  Options.AllowOverlappingLoads = true;
   Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
   Options.NumLoadsPerBlock = Options.MaxNumLoads;
   // TODO: Though vector loads usually perform well on AArch64, in some targets
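
The effect of the early return can be sketched outside of LLVM. The following is a minimal model, not the real API: `MemCmpOptionsModel`, `SubtargetModel`, `enableMemCmpExpansionModel`, and the `MaxExpandSize` parameter are hypothetical stand-ins, and the assumption that a default-constructed options object (zero `MaxNumLoads`) means "do not expand" should be checked against the `TargetTransformInfo` headers for the LLVM revision in question.

```cpp
// Hypothetical stand-in for TTI::MemCmpExpansionOptions (illustration only).
struct MemCmpOptionsModel {
  bool AllowOverlappingLoads = false;
  unsigned MaxNumLoads = 0;      // Assumed: zero loads allowed means no expansion.
  unsigned NumLoadsPerBlock = 1;

  // Expansion is considered enabled only if at least one load is permitted.
  bool enabled() const { return MaxNumLoads > 0; }
};

// Hypothetical stand-in for the AArch64 subtarget query used in the patch.
struct SubtargetModel {
  bool StrictAlign;
  bool requiresStrictAlign() const { return StrictAlign; }
};

// Mirrors the control flow of the patched function: strict-align targets get
// the default (disabled) options; everything else gets overlapping loads.
MemCmpOptionsModel enableMemCmpExpansionModel(const SubtargetModel &ST,
                                              unsigned MaxExpandSize) {
  MemCmpOptionsModel Options;
  if (ST.requiresStrictAlign())
    return Options; // Leave memcmp/bcmp as a library call.
  Options.AllowOverlappingLoads = true;
  Options.MaxNumLoads = MaxExpandSize;
  Options.NumLoadsPerBlock = Options.MaxNumLoads;
  return Options;
}
```

In this model, a strict-align subtarget yields options for which `enabled()` is false, which corresponds to the `bl bcmp` lines the updated test expects under the `CHECKS` prefix.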

llvm/test/CodeGen/AArch64/bcmp-inline-small.ll

Lines changed: 5 additions & 11 deletions
@@ -11,12 +11,12 @@ entry:
   ret i1 %ret
 
 ; CHECK-LABEL: test_b2:
-; CHECK-NOT: bl bcmp
+; CHECKN-NOT: bl bcmp
 ; CHECKN: ldr x
 ; CHECKN-NEXT: ldr x
 ; CHECKN-NEXT: ldur x
 ; CHECKN-NEXT: ldur x
-; CHECKS-COUNT-30: ldrb w
+; CHECKS: bl bcmp
 }
 
 define i1 @test_b2_align8(i8* align 8 %s1, i8* align 8 %s2) {
@@ -26,19 +26,13 @@ entry:
   ret i1 %ret
 
 ; CHECK-LABEL: test_b2_align8:
-; CHECK-NOT: bl bcmp
+; CHECKN-NOT: bl bcmp
 ; CHECKN: ldr x
 ; CHECKN-NEXT: ldr x
 ; CHECKN-NEXT: ldur x
 ; CHECKN-NEXT: ldur x
-; CHECKS: ldr x
-; CHECKS-NEXT: ldr x
-; CHECKS-NEXT: ldr w
-; CHECKS-NEXT: ldr w
-; CHECKS-NEXT: ldrh w
-; CHECKS-NEXT: ldrh w
-; CHECKS-NEXT: ldrb w
-; CHECKS-NEXT: ldrb w
+; TODO: Four loads should be within the limit, but the heuristic isn't implemented.
+; CHECKS: bl bcmp
 }
 
 define i1 @test_bs(i8* %s1, i8* %s2) optsize {
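
To make the check lines above concrete, here is a rough C++ sketch of the two strategies they describe. The 15-byte length is an inference, not something the hunks state: the removed aligned sequence loads 8 + 4 + 2 + 1 bytes per buffer, and the removed `CHECKS-COUNT-30: ldrb w` line matches 15 byte loads from each of two buffers. The function names are hypothetical, and this is plain C++ meant to mirror the load pattern, not what the compiler emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads 8 bytes from an arbitrary address. memcpy keeps this well-defined in
// C++; a compiler targeting hardware that permits unaligned access will
// typically lower it to a single load.
static std::uint64_t load64(const unsigned char *P) {
  std::uint64_t V;
  std::memcpy(&V, P, sizeof(V));
  return V;
}

// Equality-only comparison of exactly 15 bytes using two overlapping 8-byte
// loads per buffer, covering bytes [0, 8) and [7, 15). This corresponds to
// the ldr/ldur pattern checked under the CHECKN prefix.
bool equal15Overlapping(const void *A, const void *B) {
  const auto *PA = static_cast<const unsigned char *>(A);
  const auto *PB = static_cast<const unsigned char *>(B);
  std::uint64_t Lo = load64(PA) ^ load64(PB);
  std::uint64_t Hi = load64(PA + 7) ^ load64(PB + 7);
  return (Lo | Hi) == 0;
}

// Byte-at-a-time comparison: 15 loads per buffer, 30 in total. This is the
// shape of the code the old strict-align expansion produced for unaligned
// pointers.
bool equal15Bytewise(const void *A, const void *B) {
  const auto *PA = static_cast<const unsigned char *>(A);
  const auto *PB = static_cast<const unsigned char *>(B);
  unsigned char Diff = 0;
  for (std::size_t I = 0; I < 15; ++I)
    Diff = static_cast<unsigned char>(Diff | (PA[I] ^ PB[I]));
  return Diff == 0;
}
```

Both functions return the same answer for any pair of 15-byte buffers. The difference is the number of memory operations, which is why the commit prefers a single `bcmp` call over the byte-wise expansion until a cost heuristic exists.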
