[DWARF] Speedup .gdb_index dumping #151806
@llvm/pr-subscribers-debuginfo
Author: None (itrofimow)
Changes
This patch drastically speeds up dumping .gdb_index for large indexes.
Full diff: https://github.com/llvm/llvm-project/pull/151806.diff
1 Files Affected:
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp b/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
index 987e63963a068..c0ad2a38df373 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
@@ -17,6 +17,7 @@
#include <cinttypes>
#include <cstdint>
#include <set>
+#include <unordered_map>
#include <utility>
using namespace llvm;
@@ -60,6 +61,24 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
", filled slots:",
SymbolTableOffset, (uint64_t)SymbolTable.size())
<< '\n';
+
+ std::unordered_map<uint32_t, decltype(ConstantPoolVectors)::const_iterator>
+ CuVectorMap{};
+ CuVectorMap.reserve(ConstantPoolVectors.size());
+ const auto FindCuVector =
+ [&CuVectorMap, notFound = ConstantPoolVectors.end()](uint32_t vecOffset) {
+ const auto it = CuVectorMap.find(vecOffset);
+ if (it != CuVectorMap.end()) {
+ return it->second;
+ }
+
+ return notFound;
+ };
+ for (auto it = ConstantPoolVectors.begin(); it != ConstantPoolVectors.end();
+ ++it) {
+ CuVectorMap.emplace(it->first, it);
+ }
+
uint32_t I = -1;
for (const SymTableEntry &E : SymbolTable) {
++I;
@@ -72,11 +91,7 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
StringRef Name = ConstantPoolStrings.substr(
ConstantPoolOffset - StringPoolOffset + E.NameOffset);
- auto CuVector = llvm::find_if(
- ConstantPoolVectors,
- [&](const std::pair<uint32_t, SmallVector<uint32_t, 0>> &V) {
- return V.first == E.VecOffset;
- });
+ auto CuVector = FindCuVector(E.VecOffset);
assert(CuVector != ConstantPoolVectors.end() && "Invalid symbol table");
uint32_t CuVectorId = CuVector - ConstantPoolVectors.begin();
OS << format(" String name: %s, CU vector index: %d\n", Name.data(),
I have a binary with gdb-index of size ~250Mb, and for that binary
Thanks for the improvement!
Hmm, actually at a high level: I guess this ConstantPoolVectors isn't sorted, is it? So we can't do a binary search... could we sort it? I guess not, since we do want to dump it in a way that matches the input too (in case the on-disk ordering is important to debugging the data at some point)?
(oh, and high level question, if you're interested/able: What's your interest in gdb_index? Myself, I've worked on various indexing solutions at Google due to the large size of single binaries we have, for a while but we rarely see traction/interest in these tools outside of Google - so it's always interesting to make friends with folks who are facing similar problems)
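One way to get binary search without reordering the dumped data is to sort a separate index and leave the pool alone. The following is only a sketch of that idea, not what the patch does: `CuVectorPool` is a `std::vector` stand-in for `ConstantPoolVectors`, and `buildSortedIndex` and `findCuVectorId` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for ConstantPoolVectors: (offset, CU list) pairs in on-disk order.
using CuVectorPool = std::vector<std::pair<uint32_t, std::vector<uint32_t>>>;
// (offset, position in the pool) pairs, sorted by offset.
using SortedIndex = std::vector<std::pair<uint32_t, std::size_t>>;

// Hypothetical helper: builds the sorted side index. The pool itself is not
// modified, so a dump that walks it still follows the on-disk order.
SortedIndex buildSortedIndex(const CuVectorPool &Pool) {
  SortedIndex Index;
  Index.reserve(Pool.size());
  for (std::size_t I = 0; I != Pool.size(); ++I)
    Index.emplace_back(Pool[I].first, I);
  std::sort(Index.begin(), Index.end());
  return Index;
}

// Hypothetical helper: binary search for Offset. Returns the position of the
// matching entry in the pool, or Index.size() if no entry has that offset.
std::size_t findCuVectorId(const SortedIndex &Index, uint32_t Offset) {
  auto It = std::lower_bound(
      Index.begin(), Index.end(), Offset,
      [](const std::pair<uint32_t, std::size_t> &Entry, uint32_t Key) {
        return Entry.first < Key;
      });
  if (It != Index.end() && It->first == Offset)
    return It->second;
  return Index.size();
}
```

Building the index costs one sort, and each lookup is then logarithmic. The hash map in the patch gets expected constant-time lookups from a single pass, so the sorted index mainly matters if hashing is to be avoided.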
@@ -60,6 +61,24 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
", filled slots:",
SymbolTableOffset, (uint64_t)SymbolTable.size())
<< '\n';
+
+ std::unordered_map<uint32_t, decltype(ConstantPoolVectors)::const_iterator>
+ CuVectorMap{};
Probably drop these {} as they're redundant.
@@ -60,6 +61,24 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
", filled slots:",
SymbolTableOffset, (uint64_t)SymbolTable.size())
<< '\n';
+
+ std::unordered_map<uint32_t, decltype(ConstantPoolVectors)::const_iterator>
Probably use a DenseMap? (https://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task)
+ const auto it = CuVectorMap.find(vecOffset);
+ if (it != CuVectorMap.end()) {
+ return it->second;
+ }
LLVM's coding style skips {} on single-line blocks, and reduces scope where possible, so probably something like:
if (auto it = CuVectorMap.find(vecOffset); it != CuVectorMap.end())
  return it->second;
+ CuVectorMap{};
+ CuVectorMap.reserve(ConstantPoolVectors.size());
+ const auto FindCuVector =
+ [&CuVectorMap, notFound = ConstantPoolVectors.end()](uint32_t vecOffset) {
I probably wouldn't bother separately capturing things like this in a locally scoped non-erased functor. Probably just use [&] and return ConstantPoolVectors.end(); in the fail-case.
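Taken together with the earlier style notes, the lookup helper might end up looking roughly like the sketch below. It is written against the standard library only so that it compiles on its own: `CuVectorPool` is a `std::vector` stand-in for `ConstantPoolVectors`, `resolveOffsets` is a hypothetical driver that plays the role of the symbol-table loop, and the if-with-initializer form assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for ConstantPoolVectors: (offset, CU list) pairs in on-disk order.
using CuVectorPool = std::vector<std::pair<uint32_t, std::vector<uint32_t>>>;

// Hypothetical driver: maps each requested offset to its position in Pool.
std::vector<std::size_t> resolveOffsets(const CuVectorPool &Pool,
                                        const std::vector<uint32_t> &Offsets) {
  std::unordered_map<uint32_t, CuVectorPool::const_iterator> CuVectorMap;
  CuVectorMap.reserve(Pool.size());
  for (auto It = Pool.begin(); It != Pool.end(); ++It)
    CuVectorMap.emplace(It->first, It);

  // [&] is enough here: the lambda never leaves this scope, so there is no
  // need to capture the map and the end iterator one by one.
  auto FindCuVector = [&](uint32_t VecOffset) {
    if (auto It = CuVectorMap.find(VecOffset); It != CuVectorMap.end())
      return It->second;
    return Pool.end();
  };

  std::vector<std::size_t> Ids;
  Ids.reserve(Offsets.size());
  for (uint32_t Offset : Offsets) {
    auto CuVector = FindCuVector(Offset);
    assert(CuVector != Pool.end() && "Invalid symbol table");
    Ids.push_back(static_cast<std::size_t>(CuVector - Pool.begin()));
  }
  return Ids;
}
```

Both return statements in the lambda yield the same `const_iterator` type, so the return type is still deduced without an explicit annotation.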
+ ++it) {
+ CuVectorMap.emplace(it->first, it);
+ }
Could drop the {} here.
Could use a range-based for loop, and instead of putting iterators as values in the map, use pointers (then you can get a pointer to the value in the range based for loop where there aren't any visible/name-able iterators)
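A minimal sketch of that suggestion, assuming the container stores its elements contiguously (true for the `std::vector` stand-in used here): the map values are pointers to entries, and the entry's index is recovered by pointer subtraction. `CuVectorEntry`, `buildCuVectorMap`, and `cuVectorId` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-ins for the element and container types of ConstantPoolVectors.
using CuVectorEntry = std::pair<uint32_t, std::vector<uint32_t>>;
using CuVectorPool = std::vector<CuVectorEntry>;

// Hypothetical helper: maps each vector offset to a pointer to its entry.
// The pointers stay valid only while Pool is neither resized nor destroyed.
std::unordered_map<uint32_t, const CuVectorEntry *>
buildCuVectorMap(const CuVectorPool &Pool) {
  std::unordered_map<uint32_t, const CuVectorEntry *> Map;
  Map.reserve(Pool.size());
  for (const CuVectorEntry &V : Pool)
    Map.emplace(V.first, &V);
  return Map;
}

// Hypothetical helper: recovers an entry's index from its pointer. This
// relies on the pool storing its elements contiguously.
std::size_t cuVectorId(const CuVectorPool &Pool, const CuVectorEntry *Entry) {
  return static_cast<std::size_t>(Entry - Pool.data());
}
```

Because `emplace` does not overwrite an existing key, a duplicated offset keeps pointing at its first entry, which matches what the original `find_if` scan would have returned.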