[LV] Increase vectorize-memory-check-threshold to 256 #151712
@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

Author: Igor Kirillov (igogo-x86)

Changes

We have a benchmark with large loops that benefit from vectorisation; however, they currently require several thousand runtime checks due to the way

Full diff: https://github.com/llvm/llvm-project/pull/151712.diff

1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 850c4a11edc67..45460003f4a4e 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -203,7 +203,7 @@ static cl::opt<unsigned> TinyTripCountVectorThreshold(
              "are incurred."));
 
 static cl::opt<unsigned> VectorizeMemoryCheckThreshold(
-    "vectorize-memory-check-threshold", cl::init(128), cl::Hidden,
+    "vectorize-memory-check-threshold", cl::init(256), cl::Hidden,
     cl::desc("The maximum allowed number of runtime memory checks"));
 
 // Option prefer-predicate-over-epilogue indicates that an epilogue is undesired,
Can you share a reproducer that shows the issue?
One of the improvements would address this problem. In the following example, LLVM currently generates 21 pairwise pointer-disjointness checks to prove that the loop is safe to vectorise:
Below is the complete list of 21 pairwise disjointness (non-aliasing) checks needed for vectorization safety. These ensure that the read and write memory regions accessed do not overlap:
Instead of checking every pair, we can derive lower/upper bounds on the regions accessed in
In the benchmark, there are approximately 20–30 groups of objects being read, followed by 6–11 objects being written. These 20–30 groups access different memory locations multiple times, depending on an outer loop variable, which makes the number of required aliasing checks overwhelming.
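To make the contrast between pairwise checks and bounds-based checks concrete, here is a minimal, self-contained sketch. It is not LoopAccessAnalysis code: `ByteRange`, `CheckResult`, `pairwiseChecks` and `mergedReadChecks` are hypothetical names, and the merging strategy (one bounding range over all reads, compared against each write) is only one possible reading of the idea described above. The split of 6 read groups and 3 write groups mentioned in the comments is also an assumption, chosen because it happens to yield 21 pairwise checks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open byte range [lo, hi) touched by one pointer group over the loop.
struct ByteRange {
  std::uintptr_t lo;
  std::uintptr_t hi;
};

// Two half-open ranges are disjoint when one ends before the other starts.
inline bool disjoint(const ByteRange &a, const ByteRange &b) {
  return a.hi <= b.lo || b.hi <= a.lo;
}

struct CheckResult {
  bool noAlias;       // true if every comparison proved disjointness
  std::size_t checks; // number of range comparisons performed
};

// Pairwise scheme: each write is compared with every read and with every
// later write. For 6 reads and 3 writes this is 6*3 + 3 = 21 comparisons.
inline CheckResult pairwiseChecks(const std::vector<ByteRange> &reads,
                                  const std::vector<ByteRange> &writes) {
  CheckResult res{true, 0};
  for (std::size_t i = 0; i < writes.size(); ++i) {
    for (const ByteRange &r : reads) {
      ++res.checks;
      res.noAlias = res.noAlias && disjoint(writes[i], r);
    }
    for (std::size_t j = i + 1; j < writes.size(); ++j) {
      ++res.checks;
      res.noAlias = res.noAlias && disjoint(writes[i], writes[j]);
    }
  }
  return res;
}

// Bounds-based scheme (assumed strategy): merge all reads into one bounding
// range, compare each write against it, and keep the write/write pairs.
// This is conservative: it can reject inputs the pairwise scheme accepts.
inline CheckResult mergedReadChecks(const std::vector<ByteRange> &reads,
                                    const std::vector<ByteRange> &writes) {
  CheckResult res{true, 0};
  if (!reads.empty()) {
    ByteRange hull = reads.front();
    for (const ByteRange &r : reads) {
      hull.lo = std::min(hull.lo, r.lo);
      hull.hi = std::max(hull.hi, r.hi);
    }
    for (const ByteRange &w : writes) {
      ++res.checks;
      res.noAlias = res.noAlias && disjoint(w, hull);
    }
  }
  for (std::size_t i = 0; i < writes.size(); ++i) {
    for (std::size_t j = i + 1; j < writes.size(); ++j) {
      ++res.checks;
      res.noAlias = res.noAlias && disjoint(writes[i], writes[j]);
    }
  }
  return res;
}
```

The trade-off in this sketch is precision for check count: the merged version performs fewer comparisons, but a write that lies in a gap between two read ranges is reported as a potential alias even though the pairwise version would accept it.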
We have a benchmark with large loops that benefit from vectorisation; however, they currently require several thousand runtime checks due to the way LoopAccessAnalysis is implemented. I would like to improve LAA to enable vectorisation with significantly fewer checks - though still somewhat more than the current limit of 128. Before committing to this task, I need to know whether we can raise this threshold. I checked and found that increasing it to 256 caused no performance or compile-time regressions, including when using the benchmarks from https://llvm-compile-time-tracker.com/
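As a rough illustration of why the limit matters, the arithmetic can be sketched under a simplified counting model. The model is an assumption, not how LoopAccessAnalysis counts: it takes one check per read-group/write-group pair plus one per pair of write groups, and it ignores the repeated accesses at different locations that push the real benchmark into the thousands. `pairwiseCheckCount` and `fitsThreshold` are hypothetical helpers.

```cpp
#include <cstdint>

// Simplified model (assumption): every write group is checked against every
// read group and against every other write group exactly once.
constexpr std::uint64_t pairwiseCheckCount(std::uint64_t readGroups,
                                           std::uint64_t writeGroups) {
  return readGroups * writeGroups + writeGroups * (writeGroups - 1) / 2;
}

// The option is described as the maximum allowed number of runtime memory
// checks, so a count equal to the threshold is treated as acceptable here.
constexpr bool fitsThreshold(std::uint64_t checks, std::uint64_t threshold) {
  return checks <= threshold;
}

// Lower end of the group counts quoted for the benchmark: 20 reads, 6 writes.
static_assert(pairwiseCheckCount(20, 6) == 135, "20*6 + 6*5/2");
// Upper end: 30 reads, 11 writes.
static_assert(pairwiseCheckCount(30, 11) == 385, "30*11 + 11*10/2");
```

Under this model the quoted group counts give between 135 and 385 checks, so even the smallest case exceeds 128, and only the lower part of the range fits under 256.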