[lld][LoongArch] GOT indirection to PC relative optimization #123743

Merged
merged 8 commits into from
Aug 1, 2025

ylzsx
Contributor

@ylzsx ylzsx commented Jan 21, 2025

On LoongArch, lld attempts the GOT-indirection-to-PC-relative optimization in the normal and medium code models, whether or not the relocations are accompanied by R_LARCH_RELAX.

From:

  • pcalau12i $a0, %got_pc_hi20(sym_got)
  • ld.w/d $a0, $a0, %got_pc_lo12(sym_got)

To:

  • pcalau12i $a0, %pc_hi20(sym)
  • addi.w/d $a0, $a0, %pc_lo12(sym)

If the original code sequence can be relaxed into a single pcaddi instruction, this optimization is not applied (see #123566).
The GOT-related optimization is split across two locations because relax() is part of an iterative fixed-point algorithm; the work done there should be kept minimal for better linker performance.

Note: Although the optimization has been performed, the GOT entries still exist, similarly to AArch64. Eliminating the entries would increase code complexity.
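
As a rough illustration of what the rewritten pcalau12i/addi.[wd] pair has to encode, the sketch below splits a target address into the two immediates. `PcalaImms` and `splitPcala` are hypothetical names used only here, not lld functions, and the sketch assumes a 64-bit target whose address is within reach of the pair; lld's own computation lives in `getLoongArchPageDelta` and may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not lld's API: the immediates for a
// pcalau12i/addi.[wd] pair that materializes `dest` when executed at `pc`.
struct PcalaImms {
  int64_t hi20; // immediate for pcalau12i, counted in 4 KiB pages
  int64_t lo12; // immediate for addi.w/d, sign-extended by the instruction
};

inline PcalaImms splitPcala(uint64_t pc, uint64_t dest) {
  // addi.w/d sign-extends its 12-bit immediate, so view the low 12 bits of
  // the target as a value in [-2048, 2047].
  int64_t lo12 = static_cast<int64_t>(dest & 0xfff);
  if (lo12 >= 0x800)
    lo12 -= 0x1000;
  // pcalau12i clears the low 12 bits of PC and adds hi20 << 12, so hi20 is
  // the page distance from PC's page to the page that lo12 is relative to.
  int64_t pageDelta = static_cast<int64_t>((dest - lo12) - (pc & ~0xfffULL));
  int64_t hi20 = pageDelta / 4096; // exact: both operands are page-aligned
  // Postcondition: replaying the two instructions yields `dest`.
  assert((pc & ~0xfffULL) + static_cast<uint64_t>(hi20 * 4096 + lo12) == dest);
  return {hi20, lo12};
}
```

With `pc = 0x10008` and `dest = 0x410000` this gives `hi20 = 1024` and `lo12 = 0`, which lines up with the `pcalau12i $a0, 1024` / `addi.[wd] $a0, $a0, 0` pairs expected by the updated test.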

@llvmbot
Member

llvmbot commented Jan 21, 2025

@llvm/pr-subscribers-backend-loongarch

@llvm/pr-subscribers-lld-elf

Author: Zhaoxin Yang (ylzsx)

Changes

In LoongArch, this optimization is only supported when relaxation is enabled.
From:

  • pcalau12i $a0, %got_pc_hi20(sym_got)
  • ld.w/d $a0, $a0, %got_pc_lo12(sym_got)

To:

  • pcalau12i $a0, %pc_hi20(sym)
  • addi.w/d $a0, $a0, %pc_lo12(sym)

If the original code sequence can be relaxed into a single pcaddi instruction, this optimization is not applied (see #123566).
The GOT-related implementation is split across two locations because relax() is part of an iterative fixed-point algorithm; the work done there should be kept minimal for better linker performance.

FIXME: Although the optimization has been performed, the GOT entries still exist, similarly to AArch64. Eliminating the entries may require additional marking in the common code.


Full diff: https://github.com/llvm/llvm-project/pull/123743.diff

2 Files Affected:

  • (modified) lld/ELF/Arch/LoongArch.cpp (+70)
  • (modified) lld/test/ELF/loongarch-relax-pc-hi20-lo12.s (+6-4)
diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 5f49b23e8ffb1a..6ae45e109a6dec 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -47,6 +47,8 @@ class LoongArch final : public TargetInfo {
   void tlsIeToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
   void tlsdescToIe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
   void tlsdescToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
+  bool tryGotToPCRel(uint8_t *loc, const Relocation &rHi20,
+                     const Relocation &rLo12, uint64_t secAddr) const;
 };
 } // end anonymous namespace
 
@@ -1150,6 +1152,58 @@ void LoongArch::tlsdescToLe(uint8_t *loc, const Relocation &rel,
   }
 }
 
+// Try GOT indirection to PC relative optimization when relaxation is enabled.
+// From:
+//  * pcalau12i $a0, %got_pc_hi20(sym_got)
+//  * ld.w/d    $a0, $a0, %got_pc_lo12(sym_got)
+// To:
+//  * pcalau12i $a0, %pc_hi20(sym)
+//  * addi.w/d  $a0, $a0, %pc_lo12(sym)
+//
+// FIXME: Although the optimization has been performed, the GOT entries still
+// exist, similarly to AArch64. Eliminating the entries may require
+// additional marking in the common code.
+bool LoongArch::tryGotToPCRel(uint8_t *loc, const Relocation &rHi20,
+                              const Relocation &rLo12, uint64_t secAddr) const {
+  if (!rHi20.sym->isDefined() || rHi20.sym->isPreemptible ||
+      rHi20.sym->isGnuIFunc() ||
+      (ctx.arg.isPic && !cast<Defined>(*rHi20.sym).section))
+    return false;
+
+  Symbol &sym = *rHi20.sym;
+  uint64_t symLocal = sym.getVA(ctx) + rHi20.addend;
+  // Check if the address difference is within +/-2GB range.
+  // For simplicity, the range mentioned here is an approximate estimate and is
+  // not fully equivalent to the entire region that PC-relative addressing can
+  // cover.
+  int64_t pageOffset =
+      getLoongArchPage(symLocal) - getLoongArchPage(secAddr + rHi20.offset);
+  if (!isInt<20>(pageOffset >> 12))
+    return false;
+
+  Relocation newRHi20 = {RE_LOONGARCH_PAGE_PC, R_LARCH_PCALA_HI20, rHi20.offset,
+                         rHi20.addend, &sym};
+  Relocation newRLo12 = {R_ABS, R_LARCH_PCALA_LO12, rLo12.offset, rLo12.addend,
+                         &sym};
+
+  const uint32_t currInsn = read32le(loc);
+  const uint32_t nextInsn = read32le(loc + 4);
+  // Check if use the same register.
+  if (getD5(currInsn) != getJ5(nextInsn) || getJ5(nextInsn) != getD5(nextInsn))
+    return false;
+
+  uint64_t pageDelta =
+      getLoongArchPageDelta(symLocal, secAddr + rHi20.offset, rHi20.type);
+  // pcalau12i $a0, %pc_hi20
+  write32le(loc, insn(PCALAU12I, getD5(currInsn), 0, 0));
+  relocate(loc, newRHi20, pageDelta);
+  // addi.w/d $a0, $a0, %pc_lo12
+  write32le(loc + 4, insn(ctx.arg.is64 ? ADDI_D : ADDI_W, getD5(nextInsn),
+                          getJ5(nextInsn), 0));
+  relocate(loc + 4, newRLo12, SignExtend64(symLocal, 64));
+  return true;
+}
+
 // During TLSDESC GD_TO_IE, the converted code sequence always includes an
 // instruction related to the Lo12 relocation (ld.[wd]). To obtain correct val
 // in `getRelocTargetVA`, expr of this instruction should be adjusted to
@@ -1259,6 +1313,22 @@ void LoongArch::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
         tlsdescToLe(loc, rel, val);
       }
       continue;
+    case RE_LOONGARCH_GOT_PAGE_PC:
+      // In LoongArch, we try GOT indirection to PC relative optimization only
+      // when relaxation is enabled. This approach avoids determining whether
+      // relocation types are paired and whether the destination register of
+      // pcalau12i is only used by the immediately following instruction.
+      // Moreover, if the original code sequence can be relaxed to a single
+      // instruction `pcaddi`, the first instruction will be removed and it will
+      // not reach here.
+      if (isPairRelaxable(relocs, i) && rel.type == R_LARCH_GOT_PC_HI20 &&
+          relocs[i + 2].type == R_LARCH_GOT_PC_LO12 &&
+          tryGotToPCRel(loc, rel, relocs[i + 2], secAddr)) {
+        i = i + 3; // skip relocations R_LARCH_RELAX, R_LARCH_GOT_PC_LO12,
+                   // R_LARCH_RELAX
+        continue;
+      }
+      break;
     default:
       break;
     }
diff --git a/lld/test/ELF/loongarch-relax-pc-hi20-lo12.s b/lld/test/ELF/loongarch-relax-pc-hi20-lo12.s
index 760fe77d774e30..ae3b29e14fb3c1 100644
--- a/lld/test/ELF/loongarch-relax-pc-hi20-lo12.s
+++ b/lld/test/ELF/loongarch-relax-pc-hi20-lo12.s
@@ -30,24 +30,26 @@
 ## offset = 0x410000 - 0x10000: 0x400 pages, page offset 0
 # NORELAX32-NEXT:  10000:  pcalau12i     $a0, 1024
 # NORELAX32-NEXT:          addi.w        $a0, $a0, 0
+## Not relaxation, conversion to PCRel.
 # NORELAX32-NEXT:          pcalau12i     $a0, 1024
-# NORELAX32-NEXT:          ld.w          $a0, $a0, 4
+# NORELAX32-NEXT:          addi.w        $a0, $a0, 0
 # NORELAX32-NEXT:          pcalau12i     $a0, 1024
 # NORELAX32-NEXT:          addi.w        $a0, $a0, 0
 # NORELAX32-NEXT:          pcalau12i     $a0, 1024
-# NORELAX32-NEXT:          ld.w          $a0, $a0, 4
+# NORELAX32-NEXT:          addi.w        $a0, $a0, 0
 
 # NORELAX64-LABEL: <_start>:
 ## offset exceed range of pcaddi
 ## offset = 0x410000 - 0x10000: 0x400 pages, page offset 0
 # NORELAX64-NEXT:  10000:  pcalau12i     $a0, 1024
 # NORELAX64-NEXT:          addi.d        $a0, $a0, 0
+## Not relaxation, conversion to PCRel.
 # NORELAX64-NEXT:          pcalau12i     $a0, 1024
-# NORELAX64-NEXT:          ld.d          $a0, $a0, 8
+# NORELAX64-NEXT:          addi.d        $a0, $a0, 0
 # NORELAX64-NEXT:          pcalau12i     $a0, 1024
 # NORELAX64-NEXT:          addi.d        $a0, $a0, 0
 # NORELAX64-NEXT:          pcalau12i     $a0, 1024
-# NORELAX64-NEXT:          ld.d          $a0, $a0, 8
+# NORELAX64-NEXT:          addi.d        $a0, $a0, 0
 
 .section .text
 .global _start
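
The two guards in `tryGotToPCRel` (the page-distance check and the same-register check) can be tried in isolation with the sketch below. It reimplements `getD5` and `getJ5` with the same names as in the diff but is not lld's code; `encode`, `usesSingleRegister`, and `pageDeltaFits` are hypothetical names, and the opcode constants and field positions are stated here as assumptions about the LoongArch encoding rather than taken from the patch.

```cpp
#include <cstdint>

// Assumed LoongArch opcode values; rd sits in bits 4..0 and rj in bits 9..5.
constexpr uint32_t PCALAU12I = 0x1a000000;
constexpr uint32_t LD_D = 0x28c00000;

// Hypothetical encoder: opcode | rd | rj << 5 | imm << 10.
constexpr uint32_t encode(uint32_t op, uint32_t d, uint32_t j, uint32_t k) {
  return op | d | (j << 5) | (k << 10);
}

constexpr uint32_t getD5(uint32_t insn) { return insn & 0x1f; }
constexpr uint32_t getJ5(uint32_t insn) { return (insn >> 5) & 0x1f; }

// True when pcalau12i's destination is both the base and the destination of
// the following load, i.e. the pair reads and writes a single register.
constexpr bool usesSingleRegister(uint32_t hiInsn, uint32_t loInsn) {
  return getD5(hiInsn) == getJ5(loInsn) && getJ5(loInsn) == getD5(loInsn);
}

// True when the distance in 4 KiB pages fits a signed 20-bit immediate,
// the approximate +/-2GB test described in the diff's comment.
// Assumes both addresses are below 2^63.
constexpr bool pageDeltaFits(uint64_t pc, uint64_t dest) {
  int64_t pages =
      static_cast<int64_t>(dest >> 12) - static_cast<int64_t>(pc >> 12);
  return pages >= -(int64_t{1} << 19) && pages < (int64_t{1} << 19);
}

// $a0 is r4, $a1 is r5.
static_assert(usesSingleRegister(encode(PCALAU12I, 4, 0, 0),
                                 encode(LD_D, 4, 4, 0)),
              "pcalau12i $a0 ; ld.d $a0, $a0 qualifies");
static_assert(!usesSingleRegister(encode(PCALAU12I, 4, 0, 0),
                                  encode(LD_D, 5, 4, 0)),
              "ld.d $a1, $a0 leaves $a0 live, so it does not qualify");
```

The register check matters because the rewrite turns a load into an address computation: if the load wrote a different register than pcalau12i, the pcalau12i result could still be consumed elsewhere.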

@llvmbot
Member

llvmbot commented Jan 21, 2025

@llvm/pr-subscribers-lld

@ylzsx
Contributor Author

ylzsx commented Jan 21, 2025

I have submitted all the patches related to relaxation in lld for LoongArch. Below is a list for peer review:

@MaskRay
Member

MaskRay commented Jan 22, 2025

Can you drop the trailing . in the subject?

@ylzsx ylzsx changed the title [lld][LoongArch] GOT indirection to PC relative optimization. [lld][LoongArch] GOT indirection to PC relative optimization Jan 22, 2025
@ylzsx ylzsx force-pushed the users/ylzsx/r-got-to-pcrel branch from 30a9eb5 to e3fc1d6 Compare January 22, 2025 06:14
@ylzsx ylzsx force-pushed the users/ylzsx/r-tlsdesc-to-iele-relax branch from e024b7c to 99a1e07 Compare January 22, 2025 06:15
@ylzsx ylzsx force-pushed the users/ylzsx/r-got-to-pcrel branch from e3fc1d6 to 47d84d9 Compare January 22, 2025 07:32
@SixWeining
Contributor

cc @xen0n

@ylzsx ylzsx force-pushed the users/ylzsx/r-tlsdesc-to-iele-relax branch from 99a1e07 to f74a55b Compare May 14, 2025 04:05
@ylzsx ylzsx force-pushed the users/ylzsx/r-got-to-pcrel branch from 47d84d9 to 2d92dc3 Compare May 14, 2025 04:09
@SixWeining
Contributor

cc @MQ-mengqing @xry111

@ylzsx ylzsx force-pushed the users/ylzsx/r-tlsdesc-to-iele-relax branch from f74a55b to fd76622 Compare July 3, 2025 07:01
Base automatically changed from users/ylzsx/r-tlsdesc-to-iele-relax to main July 23, 2025 09:12
ylzsx added 3 commits July 24, 2025 10:22
In LoongArch, this optimization is only supported when relaxation is enabled.
From:
 * pcalau12i $a0, %got_pc_hi20(sym_got)
 * ld.w/d $a0, $a0, %got_pc_lo12(sym_got)
To:
 * pcalau12i $a0, %pc_hi20(sym)
 * addi.w/d $a0, $a0, %pc_lo12(sym)

If the original code sequence can be relaxed into a single instruction
`pcaddi`, this patch will not be taken (see https://).
The implementation related to `got` is split into two locations because
the `relax()` function is part of an iteration fixed-point algorithm. We
should minimize it to achieve better linker performance.

FIXME: Although the optimization has been performed, the GOT entries still
exist, similarly to AArch64. Eliminating the entries may require
additional marking in the common code.
@ylzsx ylzsx force-pushed the users/ylzsx/r-got-to-pcrel branch from 2d92dc3 to 9b06f46 Compare July 24, 2025 03:56
@ylzsx ylzsx force-pushed the users/ylzsx/r-got-to-pcrel branch from 6d1faaa to 0ff13d8 Compare July 28, 2025 12:07
@@ -1167,28 +1167,49 @@ void LoongArch::tlsdescToLe(uint8_t *loc, const Relocation &rel,
// complexity.
bool LoongArch::tryGotToPCRel(uint8_t *loc, const Relocation &rHi20,
Contributor

Only apply this relax when --relax is enabled for lld.

Contributor

Clarification: After careful consideration, I think we do not need to check the --relax option, because linker relaxation is about reducing the number of instructions, and this PR does not do that.

Member

It seems that the term "linker optimization" is often used when the number of bytes does not change while "linker relaxation" is used when the number of bytes decreases.

While x86-64 and s390x don't have linker relaxation, they do support --no-relax. --no-relax is useful to disable this optimization.

@ylzsx ylzsx merged commit 283c47b into main Aug 1, 2025
9 checks passed
@ylzsx ylzsx deleted the users/ylzsx/r-got-to-pcrel branch August 1, 2025 06:45
@llvm-ci
Collaborator

llvm-ci commented Aug 1, 2025

LLVM Buildbot has detected a new failure on builder hip-third-party-libs-test running on ext_buildbot_hw_05-hip-docker while building lld at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/206/builds/4179

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py --jobs=32' (failure)
...
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 107, in step
    yield
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 84, in main
    run_command(cmake_command)
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 120, in run_command
    util.report_run_cmd(cmd, cwd=directory)
  File "/home/botworker/bbot/hip-third-party-libs-test/llvm-zorg/zorg/buildbot/builders/annotated/util.py", line 49, in report_run_cmd
    subprocess.check_call(cmd, shell=shell, *args, **kwargs)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '-GNinja', '-B', 'TS-build', '-S', '.', '-DTEST_SUITE_EXTERNALS_DIR=/opt/botworker/llvm/External', '-DAMDGPU_ARCHS=gfx90a', '-DTEST_SUITE_SUBDIRS=External', '-DEXTERNAL_HIP_TESTS_KOKKOS=ON', '-DCMAKE_CXX_COMPILER=/opt/botworker/llvm/llvm-test-suite/bin/clang++', '-DCMAKE_C_COMPILER=/opt/botworker/llvm/llvm-test-suite/bin/clang']' returned non-zero exit status 1.
@@@STEP_FAILURE@@@
@@@BUILD_STEP build kokkos and test suite@@@
@@@HALT_ON_FAILURE@@@
Running: cmake --build TS-build --parallel --target build-kokkos
ninja: error: loading 'build.ninja': No such file or directory
['cmake', '--build', 'TS-build', '--parallel', '--target', 'build-kokkos'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 107, in step
    yield
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 92, in main
    run_command(["cmake", "--build", test_suite_build_dir, "--parallel", "--target", "build-kokkos"])
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 120, in run_command
    util.report_run_cmd(cmd, cwd=directory)
  File "/home/botworker/bbot/hip-third-party-libs-test/llvm-zorg/zorg/buildbot/builders/annotated/util.py", line 49, in report_run_cmd
    subprocess.check_call(cmd, shell=shell, *args, **kwargs)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', 'TS-build', '--parallel', '--target', 'build-kokkos']' returned non-zero exit status 1.
@@@STEP_FAILURE@@@
@@@BUILD_STEP run kokkos test suite@@@
@@@HALT_ON_FAILURE@@@
Running: cmake --build TS-build --target test-kokkos
ninja: error: loading 'build.ninja': No such file or directory
['cmake', '--build', 'TS-build', '--target', 'test-kokkos'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 107, in step
    yield
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 98, in main
    run_command(["cmake", "--build", test_suite_build_dir, "--target", "test-kokkos"])
  File "/home/botworker/bbot/hip-third-party-libs-test/build/../llvm-zorg/zorg/buildbot/builders/annotated/hip-tpl.py", line 120, in run_command
    util.report_run_cmd(cmd, cwd=directory)
  File "/home/botworker/bbot/hip-third-party-libs-test/llvm-zorg/zorg/buildbot/builders/annotated/util.py", line 49, in report_run_cmd
    subprocess.check_call(cmd, shell=shell, *args, **kwargs)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', 'TS-build', '--target', 'test-kokkos']' returned non-zero exit status 1.
@@@STEP_FAILURE@@@

tru pushed a commit to llvmbot/llvm-project that referenced this pull request Aug 5, 2025
…3743)

In LoongArch, we try GOT indirection to PC relative optimization in
normal or medium code model, whether or not with R_LARCH_RELAX
relocation.

From:
* pcalau12i $a0, %got_pc_hi20(sym_got)
* ld.w/d $a0, $a0, %got_pc_lo12(sym_got)

To:
* pcalau12i $a0, %pc_hi20(sym)
* addi.w/d $a0, $a0, %pc_lo12(sym)

If the original code sequence can be relaxed into a single instruction
`pcaddi`, this patch will not be taken (see
llvm#123566).
The optimization related to GOT is split into two locations because the
`relax()` function is part of an iteration fixed-point algorithm. We
should minimize it to achieve better linker performance.

Note: Although the optimization has been performed, the GOT entries
still exist, similarly to AArch64. Eliminating the entries will
increase code complexity.

(cherry picked from commit 283c47b)