[TailDup] Delay aggressive computed-goto taildup to after RegAlloc. #150911

Merged
merged 1 commit into from
Jul 31, 2025

llvm/lib/CodeGen/TailDuplicator.cpp (16 changes: 10 additions & 6 deletions)
@@ -610,6 +610,15 @@ bool TailDuplicator::shouldTailDuplicate(bool IsSimple,
   if (HasIndirectbr && PreRegAlloc)
     MaxDuplicateCount = TailDupIndirectBranchSize;
 
+  // Allow higher limits when the block has computed-gotos and running after
+  // register allocation. NB. This basically unfactors computed gotos that were
+  // factored early on in the compilation process to speed up edge based data
+  // flow. If we do not unfactor them again, it can seriously pessimize code
+  // with many computed jumps in the source code, such as interpreters.
+  // Therefore we do not restrict the computed gotos.
+  if (HasComputedGoto && !PreRegAlloc)
+    MaxDuplicateCount = std::max(MaxDuplicateCount, 10u);
+
   // Check the instructions in the block to determine whether tail-duplication
   // is invalid or unlikely to be profitable.
   unsigned InstrCount = 0;
@@ -663,12 +672,7 @@ bool TailDuplicator::shouldTailDuplicate(bool IsSimple,
   // Duplicating a BB which has both multiple predecessors and successors will
   // may cause huge amount of PHI nodes. If we want to remove this limitation,
   // we have to address https://github.com/llvm/llvm-project/issues/78578.
-  // NB. This basically unfactors computed gotos that were factored early on in
-  // the compilation process to speed up edge based data flow. If we do not
-  // unfactor them again, it can seriously pessimize code with many computed
-  // jumps in the source code, such as interpreters. Therefore we do not
-  // restrict the computed gotos.
-  if (!HasComputedGoto && TailBB.pred_size() > TailDupPredSize &&
+  if (PreRegAlloc && TailBB.pred_size() > TailDupPredSize &&
       TailBB.succ_size() > TailDupSuccSize) {
     // If TailBB or any of its successors contains a phi, we may have to add a
     // large number of additional phis with additional incoming values.

llvm/test/CodeGen/AArch64/late-taildup-computed-goto.ll (77 changes: 29 additions & 48 deletions)
@@ -25,77 +25,58 @@ define void @test_interp(ptr %frame, ptr %dst) {
; CHECK-NEXT: adrp x21, _opcode.targets@PAGE
; CHECK-NEXT: Lloh1:
; CHECK-NEXT: add x21, x21, _opcode.targets@PAGEOFF
; CHECK-NEXT: mov x22, xzr
; CHECK-NEXT: mov x24, xzr
; CHECK-NEXT: add x8, x21, xzr, lsl #3
; CHECK-NEXT: mov x19, x1
; CHECK-NEXT: mov x20, x0
; CHECK-NEXT: add x23, x22, #1
; CHECK-NEXT: mov x23, xzr
; CHECK-NEXT: mov w22, #1 ; =0x1
; CHECK-NEXT: add x24, x24, #1
; CHECK-NEXT: br x8
; CHECK-NEXT: Ltmp0: ; Block address taken
; CHECK-NEXT: LBB0_1: ; %loop.header
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: add x8, x21, x23, lsl #3
; CHECK-NEXT: add x8, x21, x24, lsl #3
; CHECK-NEXT: mov x20, xzr
; CHECK-NEXT: mov x22, xzr
; CHECK-NEXT: add x23, x23, #1
; CHECK-NEXT: mov x23, xzr
; CHECK-NEXT: add x24, x24, #1
; CHECK-NEXT: br x8
; CHECK-NEXT: Ltmp1: ; Block address taken
; CHECK-NEXT: LBB0_2: ; %op1.bb
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: str xzr, [x19]
; CHECK-NEXT: mov w8, #1 ; =0x1
; CHECK-NEXT: Ltmp2: ; Block address taken
; CHECK-NEXT: LBB0_3: ; %op6.bb
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: ldr x0, [x20, #-8]!
; CHECK-NEXT: ldr x9, [x0, #8]
; CHECK-NEXT: str x8, [x0]
; CHECK-NEXT: ldr x8, [x9, #48]
; CHECK-NEXT: ldr x8, [x0, #8]
; CHECK-NEXT: str x22, [x0]
; CHECK-NEXT: ldr x8, [x8, #48]
; CHECK-NEXT: blr x8
; CHECK-NEXT: add x8, x21, x23, lsl #3
; CHECK-NEXT: add x23, x23, #1
; CHECK-NEXT: add x8, x21, x24, lsl #3
; CHECK-NEXT: add x24, x24, #1
; CHECK-NEXT: br x8
; CHECK-NEXT: Ltmp2: ; Block address taken
; CHECK-NEXT: LBB0_3: ; %op2.bb
; CHECK-NEXT: Ltmp3: ; Block address taken
; CHECK-NEXT: LBB0_4: ; %op2.bb
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: add x8, x21, x23, lsl #3
; CHECK-NEXT: add x8, x21, x24, lsl #3
; CHECK-NEXT: mov x20, xzr
; CHECK-NEXT: add x23, x23, #1
; CHECK-NEXT: str x22, [x19]
; CHECK-NEXT: mov x22, xzr
; CHECK-NEXT: str x23, [x19]
; CHECK-NEXT: mov x23, xzr
; CHECK-NEXT: add x24, x24, #1
; CHECK-NEXT: br x8
; CHECK-NEXT: Ltmp3: ; Block address taken
; CHECK-NEXT: LBB0_4: ; %op4.bb
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: str x22, [x19]
; CHECK-NEXT: add x10, x21, x23, lsl #3
; CHECK-NEXT: add x23, x23, #1
; CHECK-NEXT: ldur x8, [x22, #12]
; CHECK-NEXT: ldur x9, [x20, #-8]
; CHECK-NEXT: add x22, x22, #20
; CHECK-NEXT: stp x8, x9, [x20, #-8]
; CHECK-NEXT: add x20, x20, #8
; CHECK-NEXT: br x10
; CHECK-NEXT: Ltmp4: ; Block address taken
; CHECK-NEXT: LBB0_5: ; %op5.bb
; CHECK-NEXT: LBB0_5: ; %op4.bb
; CHECK-NEXT: Ltmp5: ; Block address taken
; CHECK-NEXT: LBB0_6: ; %op5.bb
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: str x22, [x19]
; CHECK-NEXT: add x10, x21, x23, lsl #3
; CHECK-NEXT: add x23, x23, #1
; CHECK-NEXT: ldur x8, [x22, #12]
; CHECK-NEXT: str x23, [x19]
; CHECK-NEXT: ldur x8, [x23, #12]
; CHECK-NEXT: ldur x9, [x20, #-8]
; CHECK-NEXT: add x22, x22, #20
; CHECK-NEXT: add x23, x23, #20
; CHECK-NEXT: stp x8, x9, [x20, #-8]
; CHECK-NEXT: add x8, x21, x24, lsl #3
; CHECK-NEXT: add x20, x20, #8
; CHECK-NEXT: br x10
; CHECK-NEXT: Ltmp5: ; Block address taken
; CHECK-NEXT: LBB0_6: ; %op6.bb
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: ldr x0, [x20, #-8]!
; CHECK-NEXT: mov w8, #1 ; =0x1
; CHECK-NEXT: ldr x9, [x0, #8]
; CHECK-NEXT: str x8, [x0]
; CHECK-NEXT: ldr x8, [x9, #48]
; CHECK-NEXT: blr x8
; CHECK-NEXT: add x8, x21, x23, lsl #3
; CHECK-NEXT: add x23, x23, #1
; CHECK-NEXT: add x24, x24, #1
; CHECK-NEXT: br x8
; CHECK-NEXT: .loh AdrpAdd Lloh0, Lloh1
entry:
@@ -1,6 +1,8 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
 # RUN: llc -mtriple=x86_64-unknown-linux-gnu -run-pass=early-tailduplication -tail-dup-pred-size=1 -tail-dup-succ-size=1 %s -o - | FileCheck %s
-# Check that only the computed goto is not be restrict by tail-dup-pred-size and tail-dup-succ-size.
+#
+# Check that only the computed goto and others are restricted by tail-dup-pred-size and tail-dup-succ-size.
+#
 --- |
   @computed_goto.dispatch = constant [5 x ptr] [ptr null, ptr blockaddress(@computed_goto, %bb1), ptr blockaddress(@computed_goto, %bb2), ptr blockaddress(@computed_goto, %bb3), ptr blockaddress(@computed_goto, %bb4)]
   declare i64 @f0()
@@ -30,54 +32,54 @@ tracksRegLiveness: true
body: |
; CHECK-LABEL: name: computed_goto
; CHECK: bb.0:
; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f0, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: [[COPY:%[0-9]+]]:gr64_nosp = COPY $rax
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gr64_nosp = COPY [[COPY]]
; CHECK-NEXT: JMP64m $noreg, 8, [[COPY]], @computed_goto.dispatch, $noreg
; CHECK-NEXT: [[COPY:%[0-9]+]]:gr64 = COPY $rax
; CHECK-NEXT: JMP_1 %bb.5
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.bb1 (ir-block-address-taken %ir-block.bb1):
; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f1, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gr64_nosp = COPY $rax
; CHECK-NEXT: [[COPY3:%[0-9]+]]:gr64_nosp = COPY [[COPY2]]
; CHECK-NEXT: JMP64m $noreg, 8, [[COPY2]], @computed_goto.dispatch, $noreg
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gr64 = COPY $rax
; CHECK-NEXT: JMP_1 %bb.5
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.bb2 (ir-block-address-taken %ir-block.bb2):
; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f2, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: [[COPY4:%[0-9]+]]:gr64_nosp = COPY $rax
; CHECK-NEXT: [[COPY5:%[0-9]+]]:gr64_nosp = COPY [[COPY4]]
; CHECK-NEXT: JMP64m $noreg, 8, [[COPY4]], @computed_goto.dispatch, $noreg
; CHECK-NEXT: [[COPY2:%[0-9]+]]:gr64 = COPY $rax
; CHECK-NEXT: JMP_1 %bb.5
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.3.bb3 (ir-block-address-taken %ir-block.bb3):
; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f3, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: [[COPY6:%[0-9]+]]:gr64_nosp = COPY $rax
; CHECK-NEXT: [[COPY7:%[0-9]+]]:gr64_nosp = COPY [[COPY6]]
; CHECK-NEXT: JMP64m $noreg, 8, [[COPY6]], @computed_goto.dispatch, $noreg
; CHECK-NEXT: [[COPY3:%[0-9]+]]:gr64 = COPY $rax
; CHECK-NEXT: JMP_1 %bb.5
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.4.bb4 (ir-block-address-taken %ir-block.bb4):
; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
; CHECK-NEXT: successors: %bb.5(0x80000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: CALL64pcrel32 target-flags(x86-plt) @f4, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
; CHECK-NEXT: ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
; CHECK-NEXT: [[COPY8:%[0-9]+]]:gr64_nosp = COPY $rax
; CHECK-NEXT: [[COPY9:%[0-9]+]]:gr64_nosp = COPY [[COPY8]]
; CHECK-NEXT: JMP64m $noreg, 8, [[COPY8]], @computed_goto.dispatch, $noreg
; CHECK-NEXT: [[COPY4:%[0-9]+]]:gr64 = COPY $rax
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.5:
; CHECK-NEXT: successors: %bb.1(0x20000000), %bb.2(0x20000000), %bb.3(0x20000000), %bb.4(0x20000000)
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[PHI:%[0-9]+]]:gr64_nosp = PHI [[COPY]], %bb.0, [[COPY4]], %bb.4, [[COPY3]], %bb.3, [[COPY2]], %bb.2, [[COPY1]], %bb.1
; CHECK-NEXT: JMP64m $noreg, 8, [[PHI]], @computed_goto.dispatch, $noreg
bb.0:
ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
CALL64pcrel32 target-flags(x86-plt) @f0, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def $rax