[DWARF] Speedup .gdb_index dumping #151806

Open
wants to merge 1 commit into
base: main

Conversation

itrofimow
Contributor

This patch drastically speeds up dumping .gdb_index for large indexes.

@llvmbot
Member

llvmbot commented Aug 2, 2025

@llvm/pr-subscribers-debuginfo

Author: None (itrofimow)

Changes

This patch drastically speeds up dumping .gdb_index for large indexes.


Full diff: https://github.com/llvm/llvm-project/pull/151806.diff

1 Files Affected:

  • (modified) llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp (+20-5)
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp b/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
index 987e63963a068..c0ad2a38df373 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFGdbIndex.cpp
@@ -17,6 +17,7 @@
 #include <cinttypes>
 #include <cstdint>
 #include <set>
+#include <unordered_map>
 #include <utility>
 
 using namespace llvm;
@@ -60,6 +61,24 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
                ", filled slots:",
                SymbolTableOffset, (uint64_t)SymbolTable.size())
      << '\n';
+
+  std::unordered_map<uint32_t, decltype(ConstantPoolVectors)::const_iterator>
+      CuVectorMap{};
+  CuVectorMap.reserve(ConstantPoolVectors.size());
+  const auto FindCuVector =
+      [&CuVectorMap, notFound = ConstantPoolVectors.end()](uint32_t vecOffset) {
+        const auto it = CuVectorMap.find(vecOffset);
+        if (it != CuVectorMap.end()) {
+          return it->second;
+        }
+
+        return notFound;
+      };
+  for (auto it = ConstantPoolVectors.begin(); it != ConstantPoolVectors.end();
+       ++it) {
+    CuVectorMap.emplace(it->first, it);
+  }
+
   uint32_t I = -1;
   for (const SymTableEntry &E : SymbolTable) {
     ++I;
@@ -72,11 +91,7 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
     StringRef Name = ConstantPoolStrings.substr(
         ConstantPoolOffset - StringPoolOffset + E.NameOffset);
 
-    auto CuVector = llvm::find_if(
-        ConstantPoolVectors,
-        [&](const std::pair<uint32_t, SmallVector<uint32_t, 0>> &V) {
-          return V.first == E.VecOffset;
-        });
+    auto CuVector = FindCuVector(E.VecOffset);
     assert(CuVector != ConstantPoolVectors.end() && "Invalid symbol table");
     uint32_t CuVectorId = CuVector - ConstantPoolVectors.begin();
     OS << format("      String name: %s, CU vector index: %d\n", Name.data(),

@itrofimow
Contributor Author

I have a binary with a .gdb_index of ~250 MB, and for that binary llvm-dwarfdump --gdb-index takes basically forever (10+ minutes) to complete.
With the patch applied, it takes ~5 s.
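
The gap between the two timings comes from replacing one linear scan per symbol with a single pass that builds a hash map. A minimal sketch of the two lookup strategies follows, assuming a plain std::vector as a stand-in for ConstantPoolVectors; the helper names findLinear, buildOffsetMap and findMapped are hypothetical and do not appear in the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in (assumption) for the index's constant-pool vectors:
// (offset, CU vector) pairs kept in on-disk order.
using CuVectorList = std::vector<std::pair<uint32_t, std::vector<uint32_t>>>;

// Old approach: a linear scan for every symbol, so the whole dump costs
// O(symbols * vectors). Returns Vectors.size() when nothing matches.
std::size_t findLinear(const CuVectorList &Vectors, uint32_t VecOffset) {
  auto It = std::find_if(Vectors.begin(), Vectors.end(),
                         [&](const auto &V) { return V.first == VecOffset; });
  return static_cast<std::size_t>(It - Vectors.begin());
}

// New approach, step 1: build an offset -> position map once, in O(vectors).
std::unordered_map<uint32_t, std::size_t>
buildOffsetMap(const CuVectorList &Vectors) {
  std::unordered_map<uint32_t, std::size_t> Map;
  Map.reserve(Vectors.size());
  for (std::size_t I = 0; I < Vectors.size(); ++I)
    Map.emplace(Vectors[I].first, I); // emplace keeps the first entry per key
  return Map;
}

// New approach, step 2: each lookup is O(1) on average.
std::size_t findMapped(const std::unordered_map<uint32_t, std::size_t> &Map,
                       std::size_t NotFound, uint32_t VecOffset) {
  auto It = Map.find(VecOffset);
  return It != Map.end() ? It->second : NotFound;
}
```

Both functions return the same position for any offset; only the cost per lookup differs.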

Collaborator

@dwblaikie dwblaikie left a comment

Thanks for the improvement!

Hmm, actually at a high level: I guess this ConstantPoolVectors isn't sorted, is it? So we can't do a binary search... could we sort it? I guess not - since we do want to dump it in a way that matches the input too (in case the on-disk ordering is important to debugging the data at some point)?

(Oh, and a high-level question, if you're interested/able: what's your interest in gdb_index? I've worked on various indexing solutions at Google for a while, due to the large size of our single binaries, but we rarely see traction/interest in these tools outside of Google, so it's always interesting to make friends with folks who are facing similar problems.)
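
On the sorting question: one way to get a binary search without reordering the dumped data is to sort a separate side index and leave the original list untouched. The sketch below is only an illustration of that idea, not something proposed in the patch; it assumes a plain std::vector as a stand-in for ConstantPoolVectors, and buildSortedIndex and findSorted are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in (assumption) for the constant-pool vectors, in on-disk order.
using CuVectorList = std::vector<std::pair<uint32_t, std::vector<uint32_t>>>;

// Side index of (offset, position in the original list), sorted by offset.
using SortedIndex = std::vector<std::pair<uint32_t, std::size_t>>;

// Builds the side index in O(n log n); the original list is never reordered,
// so it can still be dumped in on-disk order.
SortedIndex buildSortedIndex(const CuVectorList &Vectors) {
  SortedIndex Index;
  Index.reserve(Vectors.size());
  for (std::size_t I = 0; I < Vectors.size(); ++I)
    Index.emplace_back(Vectors[I].first, I);
  std::sort(Index.begin(), Index.end());
  return Index;
}

// Binary search, O(log n) per lookup. Returns NotFound for unknown offsets.
std::size_t findSorted(const SortedIndex &Index, std::size_t NotFound,
                       uint32_t VecOffset) {
  auto It = std::lower_bound(
      Index.begin(), Index.end(), VecOffset,
      [](const std::pair<uint32_t, std::size_t> &Entry, uint32_t Offset) {
        return Entry.first < Offset;
      });
  if (It != Index.end() && It->first == VecOffset)
    return It->second;
  return NotFound;
}
```

Compared with the hash map in the patch, this trades average O(1) lookups for O(log n) lookups and a sort up front, so it is an alternative rather than an obvious improvement.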

@@ -60,6 +61,24 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
", filled slots:",
SymbolTableOffset, (uint64_t)SymbolTable.size())
<< '\n';

std::unordered_map<uint32_t, decltype(ConstantPoolVectors)::const_iterator>
CuVectorMap{};
Collaborator

Probably drop these {} as they're redundant.

@@ -60,6 +61,24 @@ void DWARFGdbIndex::dumpSymbolTable(raw_ostream &OS) const {
", filled slots:",
SymbolTableOffset, (uint64_t)SymbolTable.size())
<< '\n';

std::unordered_map<uint32_t, decltype(ConstantPoolVectors)::const_iterator>
Collaborator

Comment on lines +70 to +73
const auto it = CuVectorMap.find(vecOffset);
if (it != CuVectorMap.end()) {
return it->second;
}
Collaborator

LLVM's coding style skips {} on single-line blocks, and reduces scope where possible, so probably something like:

if (auto it = CuVectorMap.find(vecOffset); it != CuVectorMap.end())
  return it->second;

CuVectorMap{};
CuVectorMap.reserve(ConstantPoolVectors.size());
const auto FindCuVector =
[&CuVectorMap, notFound = ConstantPoolVectors.end()](uint32_t vecOffset) {
Collaborator

I probably wouldn't bother separately capturing things like this in a locally scoped non-erased functor. Probably just use [&] and return ConstantPoolVectors.end(); in the fail-case.
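
A rough sketch of what that simplification might look like, assuming a plain std::vector as a stand-in for ConstantPoolVectors. The wrapper function resolveOffsets is hypothetical and exists only so the lambda has a scope to live in; it is not part of the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in (assumption) for the constant-pool vectors, in on-disk order.
using CuVectorList = std::vector<std::pair<uint32_t, std::vector<uint32_t>>>;

// Maps each requested offset to its position in Vectors, or to
// Vectors.size() if the offset is unknown.
std::vector<std::size_t> resolveOffsets(const CuVectorList &Vectors,
                                        const std::vector<uint32_t> &Offsets) {
  std::unordered_map<uint32_t, CuVectorList::const_iterator> CuVectorMap;
  CuVectorMap.reserve(Vectors.size());
  for (auto It = Vectors.begin(); It != Vectors.end(); ++It)
    CuVectorMap.emplace(It->first, It);

  // Capture everything by reference: the lambda never outlives this scope,
  // so there is no need to list captures or store a separate notFound copy.
  auto FindCuVector = [&](uint32_t VecOffset) {
    auto It = CuVectorMap.find(VecOffset);
    return It != CuVectorMap.end() ? It->second : Vectors.end();
  };

  std::vector<std::size_t> Result;
  Result.reserve(Offsets.size());
  for (uint32_t Offset : Offsets)
    Result.push_back(
        static_cast<std::size_t>(FindCuVector(Offset) - Vectors.begin()));
  return Result;
}
```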

Comment on lines +78 to +80
++it) {
CuVectorMap.emplace(it->first, it);
}
Collaborator

Could drop the {} here.

Could use a range-based for loop, and instead of putting iterators as values in the map, use pointers (then you can get a pointer to the value in the range based for loop where there aren't any visible/name-able iterators)
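
The pointer-valued variant could be sketched roughly as follows, assuming a plain std::vector as a stand-in for ConstantPoolVectors. The names CuVectorEntry, buildPointerMap and indexOf are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in (assumption) for one constant-pool entry and the whole list.
using CuVectorEntry = std::pair<uint32_t, std::vector<uint32_t>>;
using CuVectorList = std::vector<CuVectorEntry>;

// Range-based for loop: no iterator is visible, so the map stores a pointer
// to each element instead. The pointers stay valid as long as Vectors is
// not resized or destroyed.
std::unordered_map<uint32_t, const CuVectorEntry *>
buildPointerMap(const CuVectorList &Vectors) {
  std::unordered_map<uint32_t, const CuVectorEntry *> Map;
  Map.reserve(Vectors.size());
  for (const CuVectorEntry &V : Vectors)
    Map.emplace(V.first, &V);
  return Map;
}

// The dump still needs the entry's position; with contiguous storage it can
// be recovered by pointer subtraction against the start of the list.
std::size_t indexOf(const CuVectorList &Vectors, const CuVectorEntry *Entry) {
  return static_cast<std::size_t>(Entry - Vectors.data());
}
```

With pointers as values, a failed lookup can be reported as nullptr rather than an end iterator, which would also remove the need to capture a sentinel.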


3 participants