[LV] Increase vectorize-memory-check-threshold to 256 #151712

igogo-x86 · 2025-08-01T15:29:44Z

We have a benchmark with large loops that benefit from vectorisation; however, they currently require several thousand runtime checks due to the way LoopAccessAnalysis is implemented. I would like to improve LAA to enable vectorisation with significantly fewer checks - though still somewhat more than the current limit of 128. Before committing to this task, I need to know whether we can raise this threshold. I checked and found that increasing it to 256 caused no performance or compile-time regressions, including when using the benchmarks from https://llvm-compile-time-tracker.com/

llvmbot · 2025-08-01T15:30:14Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Igor Kirillov (igogo-x86)

Changes

We have a benchmark with large loops that benefit from vectorisation; however, they currently require several thousand runtime checks due to the way LoopAccessAnalysis is implemented. I would like to improve LAA to enable vectorisation with significantly fewer checks - though still somewhat more than the current limit of 128. Before committing to this task, I need to know whether we can raise this threshold. I checked and found that increasing it to 256 caused no performance or compile-time regressions, including when using the benchmarks from https://llvm-compile-time-tracker.com/

Full diff: https://github.com/llvm/llvm-project/pull/151712.diff

1 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 850c4a11edc67..45460003f4a4e 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -203,7 +203,7 @@ static cl::opt<unsigned> TinyTripCountVectorThreshold(
              "are incurred."));
 
 static cl::opt<unsigned> VectorizeMemoryCheckThreshold(
-    "vectorize-memory-check-threshold", cl::init(128), cl::Hidden,
+    "vectorize-memory-check-threshold", cl::init(256), cl::Hidden,
     cl::desc("The maximum allowed number of runtime memory checks"));
 
 // Option prefer-predicate-over-epilogue indicates that an epilogue is undesired,

fhahn

Can you share a reproducer that shows the issue?

igogo-x86 · 2025-08-01T15:55:08Z

One of the planned improvements addresses the following problem. In the example below, LLVM currently generates 21 pairwise pointer-disjointness checks to prove the loop is safe to vectorise:

void test(int *a, int *b, int off_1, int off_2, int off_3, int *x, int *y, int *z) {
    for (int i = 0; i < 100000; ++i) {
        x[i] = a[i + off_1] + b[i + off_1];
        y[i] = a[i + off_2] - b[i + off_2];
        z[i] = a[i + off_3] * b[i + off_3];
    }
}

Below is the complete list of 21 pairwise disjointness (non-aliasing) checks needed for vectorization safety. These ensure that the read and write memory regions accessed do not overlap:

#	Check Description
1	a + off_1 vs x
2	a + off_1 vs y
3	a + off_1 vs z
4	b + off_1 vs x
5	b + off_1 vs y
6	b + off_1 vs z
7	a + off_2 vs x
8	a + off_2 vs y
9	a + off_2 vs z
10	b + off_2 vs x
11	b + off_2 vs y
12	b + off_2 vs z
13	a + off_3 vs x
14	a + off_3 vs y
15	a + off_3 vs z
16	b + off_3 vs x
17	b + off_3 vs y
18	b + off_3 vs z
19	x vs y
20	x vs z
21	y vs z
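
The count of 21 follows directly from the shape of the table above: each of the 6 read accesses is compared with each of the 3 written arrays, and the 3 written arrays are compared with each other. A minimal sketch of that arithmetic, where `pairwise_checks` is a hypothetical helper written for illustration and not an LLVM function:

```c
#include <stddef.h>

/* Hypothetical helper (not part of LLVM): counts the checks in the
 * table above. Every read access is checked against every written
 * object, and every pair of written objects is checked once.
 * Reads are never checked against other reads. */
static size_t pairwise_checks(size_t reads, size_t writes) {
    size_t read_write = reads * writes;
    size_t write_write = writes * (writes - 1) / 2;
    return read_write + write_write;
}
```

For the loop above, `pairwise_checks(6, 3)` gives 18 + 3 = 21. The read-versus-write term grows with the product of the two counts, which is why loops with many distinct read accesses hit the threshold quickly.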

Instead of checking every pair, we can derive lower/upper bounds on the regions accessed in a and b and reduce the number of checks to 13:

#	Check Description	Comment
1	a + off_1 vs a + off_2	Determine boundaries of a
2	prev-range vs a + off_3	Determine boundaries of a
3	b + off_1 vs b + off_2	Determine boundaries of b
4	prev-range vs b + off_3	Determine boundaries of b
5	range of a vs x
6	range of a vs y
7	range of a vs z
8	range of b vs x
9	range of b vs y
10	range of b vs z
11	x vs y
12	x vs z
13	y vs z
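
The idea in the table above can be sketched as follows, assuming each access is summarised as a half-open address range covering everything it touches over the whole loop. The names `range_t`, `range_union`, `ranges_disjoint` and `grouped_checks` are hypothetical and chosen for illustration; this is not how LoopAccessAnalysis is implemented today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [lo, hi) touched by one access over the loop.
 * Hypothetical type for illustration only. */
typedef struct {
    uintptr_t lo;
    uintptr_t hi;
} range_t;

/* Smallest range covering both inputs: one "determine boundaries" step. */
static range_t range_union(range_t p, range_t q) {
    range_t r;
    r.lo = p.lo < q.lo ? p.lo : q.lo;
    r.hi = p.hi > q.hi ? p.hi : q.hi;
    return r;
}

/* One disjointness check: true when the two ranges do not overlap. */
static bool ranges_disjoint(range_t p, range_t q) {
    return p.hi <= q.lo || q.hi <= p.lo;
}

/* Check count for the grouped scheme, assuming every read group has the
 * same number of accesses (at least 1): the merge steps per group, then
 * each merged group against each written object, then each pair of
 * written objects. */
static size_t grouped_checks(size_t groups, size_t accesses_per_group,
                             size_t writes) {
    size_t merges = groups * (accesses_per_group - 1);
    size_t group_write = groups * writes;
    size_t write_write = writes * (writes - 1) / 2;
    return merges + group_write + write_write;
}
```

With 2 groups (`a` and `b`), 3 accesses per group and 3 written arrays, `grouped_checks(2, 3, 3)` gives 4 + 6 + 3 = 13, matching the table. Note that the merged range is a bounding range, so it may also cover memory between the individual accesses that the loop never touches; the combined check is therefore more conservative than the pairwise ones.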

In the benchmark, there are approximately 20–30 groups of objects being read, followed by 6–11 objects being written. These 20–30 groups access different memory locations multiple times, depending on an outer loop variable, which makes the number of required aliasing checks overwhelming.
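
As a rough illustration only, the group and write counts quoted above can be plugged into the same counting model as the 13-check table. The sketch below ignores the boundary-determination steps, because the number of accesses per group is not stated, so it gives a lower bound under that model and not a measurement from the benchmark. `grouped_lower_bound` is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical helper: lower bound on the grouped check count, counting
 * only merged-group-vs-write checks and write-vs-write checks. The merge
 * steps inside each group are left out because the per-group access
 * count is unknown. */
static size_t grouped_lower_bound(size_t groups, size_t writes) {
    size_t group_write = groups * writes;
    size_t write_write = writes * (writes - 1) / 2;
    return group_write + write_write;
}
```

Under these assumptions, `grouped_lower_bound(20, 6)` is 120 + 15 = 135 and `grouped_lower_bound(30, 11)` is 330 + 55 = 385.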

[LV] Increase vectorize-memory-check-threshold to 256

6972dbe

igogo-x86 requested review from artagnon, fhahn and david-arm August 1, 2025 15:29

llvmbot added vectorizers llvm:transforms labels Aug 1, 2025

fhahn reviewed Aug 1, 2025
