With EVL tail folding, the latch branch today is `bne iv, trip-count`. For example, this loop:

```c
void f(int *x, int n) {
    for (int i = 0; i < n; i++)
        x[i]++;
}
```

generates:
```asm
# %bb.1:                                # %for.body.preheader
    li      a2, 0
.LBB0_2:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
    sub     a3, a1, a2
    sh2add  a4, a2, a0
    vsetvli a3, a3, e32, m2, ta, ma
    vle32.v v8, (a4)
    vadd.vi v8, v8, 1
    add     a2, a2, a3
    vse32.v v8, (a4)
    bne     a2, a1, .LBB0_2
```
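A scalar C model of this loop may make its data flow easier to follow. It is only a sketch, not LLVM's output: `f_evl_iv` and `vsetvl_model` are hypothetical names, `vlmax` stands in for VLMAX (VLEN/16 for e32, m2), and `vsetvl_model` assumes the common choice vl = min(AVL, VLMAX), although the RVV spec also permits other values when VLMAX < AVL < 2*VLMAX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vsetvli: assumes vl = min(avl, vlmax). This is one
   valid choice; the RVV spec also allows other vl values when
   vlmax < avl < 2*vlmax. */
static size_t vsetvl_model(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* Scalar model of the current EVL tail-folded loop: the AVL is recomputed
   from the IV every iteration, and the latch compares the IV against the
   trip count, so the trip count stays live across the loop. */
static void f_evl_iv(int *x, int n, size_t vlmax) {
    assert(vlmax > 0);
    if (n <= 0)                 /* assumed: the elided guard skips the loop */
        return;
    size_t tc = (size_t)n;      /* trip count, held in a1 */
    size_t iv = 0;              /* li a2, 0 */
    do {
        size_t avl = tc - iv;                  /* sub a3, a1, a2 */
        size_t evl = vsetvl_model(avl, vlmax); /* vsetvli a3, a3, e32, m2 */
        for (size_t j = 0; j < evl; j++)       /* vle32.v / vadd.vi / vse32.v */
            x[iv + j]++;
        iv += evl;                             /* add a2, a2, a3 */
    } while (iv != tc);                        /* bne a2, a1, .LBB0_2 */
}
```

With `n = 10` and `vlmax = 4`, the model processes chunks of 4, 4 and 2 elements, and every latch check reads the trip count `tc`.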
Since the loop is countable (its trip count `n` is known before the loop is entered), we should instead be able to emit something like:
```asm
# %bb.1:                                # %for.body.preheader
    li      a2, 0
    mv      a3, a1
.LBB0_2:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
    vsetvli a4, a3, e32, m2, ta, ma
    sh2add  a5, a2, a0
    vle32.v v8, (a5)
    add     a2, a2, a4
    vadd.vi v8, v8, 1
    vse32.v v8, (a5)
    sub     a3, a3, a4
    bnez    a3, .LBB0_2
```
To do this, we need a separate phi that carries the AVL, instead of recomputing the AVL from the EVL-based IV. This is closer to the strip-mining examples in the RVV spec, and it is what GCC currently does.
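A hedged scalar C sketch of the proposed form: the AVL lives in its own loop-carried variable, mirroring the separate phi, and `n` is read only before the loop. `f_evl_avl` and `vsetvl_model` are hypothetical names, `vlmax` stands in for VLMAX, and `vsetvl_model` assumes vl = min(AVL, VLMAX), which is one of the values the RVV spec allows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vsetvli: assumes vl = min(avl, vlmax). */
static size_t vsetvl_model(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* Scalar model of the proposed loop: a separate variable (the extra phi)
   carries the remaining AVL, and the latch tests it against zero, so the
   trip count is not read inside the loop. */
static void f_evl_avl(int *x, int n, size_t vlmax) {
    assert(vlmax > 0);
    if (n <= 0)                 /* assumed: the elided guard skips the loop */
        return;
    size_t iv = 0;              /* li a2, 0 */
    size_t avl = (size_t)n;     /* mv a3, a1: the only read of the trip count */
    do {
        size_t evl = vsetvl_model(avl, vlmax); /* vsetvli a4, a3, e32, m2 */
        for (size_t j = 0; j < evl; j++)       /* vle32.v / vadd.vi / vse32.v */
            x[iv + j]++;
        iv += evl;                             /* add a2, a2, a4 */
        avl -= evl;                            /* sub a3, a3, a4 */
    } while (avl != 0);                        /* bnez a3, .LBB0_2 */
}
```

This form terminates for any vl the spec permits, not just the `min` choice modeled here: for AVL > 0, `vsetvli` returns 1 <= vl <= AVL, so `avl` strictly decreases and reaches exactly 0.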
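This removes the use of the trip count in the branch (the latch becomes `bnez` on the remaining AVL, so `a1` is not read inside the loop). It also shortens the dependency chain for the AVL: `vsetvli` now depends only on the previous iteration's `sub a3, a3, a4`, instead of on `sub a3, a1, a2`, which itself has to wait for the IV update `add a2, a2, a3`.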
I have two patches to submit for this soon: one to add the separate phi that carries the AVL, and another to change the branch.