Skip to content

[CUDA] Use --image3 to construct fat binary #151760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 1, 2025
Merged

Conversation

Artem-B
Copy link
Member

@Artem-B Artem-B commented Aug 1, 2025

CUDA-12.9 has removed fatbinary tool's --image argument we've been using till now.

--image3 has been supported since cuda-9, so we do not need CUDA SDK version checks.

@Artem-B Artem-B requested a review from jhuber6 August 1, 2025 19:46
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Aug 1, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 1, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Artem Belevich (Artem-B)

Changes

CUDA-12.9 has removed fatbinary tool's --image argument we've been using till now.

--image3 has been supported since cuda-9, so we do not need CUDA SDK version checks.


Full diff: https://github.com/llvm/llvm-project/pull/151760.diff

3 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/Cuda.cpp (+7-13)
  • (modified) clang/test/Driver/cuda-arch-translation.cu (+13-13)
  • (modified) clang/test/Driver/cuda-options.cu (+10-10)
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 1f0b478c02b25..fdfcea852b4f2 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -549,22 +549,16 @@ void NVPTX::FatBinary::ConstructJob(Compilation &C, const JobAction &JA,
     auto *A = II.getAction();
     assert(A->getInputs().size() == 1 &&
            "Device offload action is expected to have a single input");
-    const char *gpu_arch_str = A->getOffloadingArch();
-    assert(gpu_arch_str &&
+    StringRef GpuArch = A->getOffloadingArch();
+    assert(!GpuArch.empty() &&
            "Device action expected to have associated a GPU architecture!");
-    OffloadArch gpu_arch = StringToOffloadArch(gpu_arch_str);
 
-    if (II.getType() == types::TY_PP_Asm &&
-        !shouldIncludePTX(Args, gpu_arch_str))
+    if (II.getType() == types::TY_PP_Asm && !shouldIncludePTX(Args, GpuArch))
       continue;
-    // We need to pass an Arch of the form "sm_XX" for cubin files and
-    // "compute_XX" for ptx.
-    const char *Arch = (II.getType() == types::TY_PP_Asm)
-                           ? OffloadArchToVirtualArchString(gpu_arch)
-                           : gpu_arch_str;
-    CmdArgs.push_back(
-        Args.MakeArgString(llvm::Twine("--image=profile=") + Arch +
-                           ",file=" + getToolChain().getInputFilename(II)));
+    StringRef Kind = (II.getType() == types::TY_PP_Asm) ? "ptx" : "elf";
+    CmdArgs.push_back(Args.MakeArgString(
+        "--image3=kind=" + Kind + ",sm=" + GpuArch.drop_front(3) +
+        ",file=" + getToolChain().getInputFilename(II)));
   }
 
   for (const auto &A : Args.getAllArgValues(options::OPT_Xcuda_fatbinary))
diff --git a/clang/test/Driver/cuda-arch-translation.cu b/clang/test/Driver/cuda-arch-translation.cu
index e4f83740a92eb..b4a521df2e6f5 100644
--- a/clang/test/Driver/cuda-arch-translation.cu
+++ b/clang/test/Driver/cuda-arch-translation.cu
@@ -68,19 +68,19 @@
 
 // HIP: clang-offload-bundler
 
-// SM20:--image=profile=sm_20{{.*}}
-// SM21:--image=profile=sm_21{{.*}}
-// SM30:--image=profile=sm_30{{.*}}
-// SM32:--image=profile=sm_32{{.*}}
-// SM35:--image=profile=sm_35{{.*}}
-// SM37:--image=profile=sm_37{{.*}}
-// SM50:--image=profile=sm_50{{.*}}
-// SM52:--image=profile=sm_52{{.*}}
-// SM53:--image=profile=sm_53{{.*}}
-// SM60:--image=profile=sm_60{{.*}}
-// SM61:--image=profile=sm_61{{.*}}
-// SM62:--image=profile=sm_62{{.*}}
-// SM70:--image=profile=sm_70{{.*}}
+// SM20:--image3=kind=elf,sm=20{{.*}}
+// SM21:--image3=kind=elf,sm=21{{.*}}
+// SM30:--image3=kind=elf,sm=30{{.*}}
+// SM32:--image3=kind=elf,sm=32{{.*}}
+// SM35:--image3=kind=elf,sm=35{{.*}}
+// SM37:--image3=kind=elf,sm=37{{.*}}
+// SM50:--image3=kind=elf,sm=50{{.*}}
+// SM52:--image3=kind=elf,sm=52{{.*}}
+// SM53:--image3=kind=elf,sm=53{{.*}}
+// SM60:--image3=kind=elf,sm=60{{.*}}
+// SM61:--image3=kind=elf,sm=61{{.*}}
+// SM62:--image3=kind=elf,sm=62{{.*}}
+// SM70:--image3=kind=elf,sm=70{{.*}}
 // GFX600:-targets=host-x86_64-unknown-linux-gnu,hipv4-amdgcn-amd-amdhsa--gfx600
 // GFX601:-targets=host-x86_64-unknown-linux-gnu,hipv4-amdgcn-amd-amdhsa--gfx601
 // GFX602:-targets=host-x86_64-unknown-linux-gnu,hipv4-amdgcn-amd-amdhsa--gfx602
diff --git a/clang/test/Driver/cuda-options.cu b/clang/test/Driver/cuda-options.cu
index db6536ca9e03b..fc8e83a2bb279 100644
--- a/clang/test/Driver/cuda-options.cu
+++ b/clang/test/Driver/cuda-options.cu
@@ -243,10 +243,10 @@
 
 // INCLUDES-DEVICE:fatbinary
 // INCLUDES-DEVICE-DAG: "--create" "[[FATBINARY:[^"]*]]"
-// INCLUDES-DEVICE-DAG: "--image=profile=sm_{{[0-9]+}},file=[[CUBINFILE]]"
-// INCLUDES-DEVICE-DAG: "--image=profile=compute_{{[0-9]+}},file=[[PTXFILE]]"
-// INCLUDES-DEVICE2-DAG: "--image=profile=sm_{{[0-9]+}},file=[[CUBINFILE2]]"
-// INCLUDES-DEVICE2-DAG: "--image=profile=compute_{{[0-9]+}},file=[[PTXFILE2]]"
+// INCLUDES-DEVICE-DAG: "--image3=kind=elf,sm={{[0-9]+}},file=[[CUBINFILE]]"
+// INCLUDES-DEVICE-DAG: "--image3=kind=ptx,sm={{[0-9]+}},file=[[PTXFILE]]"
+// INCLUDES-DEVICE2-DAG: "--image3=kind=elf,sm={{[0-9]+}},file=[[CUBINFILE2]]"
+// INCLUDES-DEVICE2-DAG: "--image3=kind=ptx,sm={{[0-9]+}},file=[[PTXFILE2]]"
 
 // Match host-side preprocessor job with -save-temps.
 // HOST-SAVE: "-cc1" "-triple" "x86_64-unknown-linux-gnu"
@@ -288,9 +288,9 @@
 
 // FATBIN-COMMON:fatbinary
 // FATBIN-COMMON: "--create" "[[FATBINARY:[^"]*]]"
-// FATBIN-COMMON: "--image=profile=sm_52,file=
-// PTX-SM52: "--image=profile=compute_52,file=
-// NOPTX-SM52-NOT: "--image=profile=compute_52,file=
-// FATBIN-COMMON: "--image=profile=sm_60,file=
-// PTX-SM60: "--image=profile=compute_60,file=
-// NOPTX-SM60-NOT: "--image=profile=compute_60,file=
+// FATBIN-COMMON: "--image3=kind=elf,sm=52,file=
+// PTX-SM52: "--image3=kind=ptx,sm=52,file=
+// NOPTX-SM52-NOT: "--image3=kind=ptx,sm=52,file=
+// FATBIN-COMMON: "--image3=kind=elf,sm=60,file=
+// PTX-SM60: "--image3=kind=ptx,sm=60,file=
+// NOPTX-SM60-NOT: "--image3=kind=ptx,sm=60,file=

CUDA-12.9 has removed fatbinary tool's `--image` argument we've been using till now.

--image3 has been supported since cuda-9, so we do not need CUDA SDK version checks.
@Artem-B Artem-B merged commit 12eab1a into llvm:main Aug 1, 2025
9 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 1, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building clang at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/21980

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: commands/help/TestHelp.py (188 of 2294)
PASS: lldb-api :: commands/platform/basic/TestPlatformPython.py (189 of 2294)
PASS: lldb-api :: commands/memory/write/TestMemoryWrite.py (190 of 2294)
PASS: lldb-api :: commands/platform/basic/TestPlatformCommand.py (191 of 2294)
PASS: lldb-api :: commands/platform/file/close/TestPlatformFileClose.py (192 of 2294)
PASS: lldb-api :: commands/platform/file/read/TestPlatformFileRead.py (193 of 2294)
PASS: lldb-api :: commands/memory/read/TestMemoryRead.py (194 of 2294)
PASS: lldb-api :: commands/platform/connect/TestPlatformConnect.py (195 of 2294)
UNSUPPORTED: lldb-api :: commands/platform/sdk/TestPlatformSDK.py (196 of 2294)
UNRESOLVED: lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py (197 of 2294)
******************** TEST 'lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads -p TestGuiSpawnThreads.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision 12eab1a7b82d8db0e781862902b674332298162f)
  clang revision 12eab1a7b82d8db0e781862902b674332298162f
  llvm revision 12eab1a7b82d8db0e781862902b674332298162f
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
======================================================================
ERROR: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 151, in wrapper
    return func(*args, **kwargs)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads/TestGuiSpawnThreads.py", line 44, in test_gui
    self.child.expect_exact(f"thread #{i + 2}: tid =")
  File "/usr/local/lib/python3.10/dist-packages/pexpect/spawnbase.py", line 432, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 179, in expect_loop
    return self.eof(e)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 122, in eof
    raise exc
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
<pexpect.pty_spawn.spawn object at 0xf360cceb6050>
command: /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb
args: ['/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear --all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.disable-aslr false', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'settings set show-statusline false', '--file', '/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out']
buffer (last 100 chars): b''
before (last 100 chars): b'2 0x0000b12ef7b15030 _start (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb+0x45030)\n'
after: <class 'pexpect.exceptions.EOF'>

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 1, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-sie-win running on sie-win-worker while building clang at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/46/builds/21078

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lld :: COFF/import_weak_alias.test' FAILED ********************
Exit Code: 3221225477

Command Output (stdout):
--
# RUN: at line 3
split-file Z:\b\llvm-clang-x86_64-sie-win\llvm-project\lld\test\COFF\import_weak_alias.test Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.dir
# executed command: split-file 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\lld\test\COFF\import_weak_alias.test' 'Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.dir'
# note: command had no output on stdout or stderr
# RUN: at line 4
z:\b\llvm-clang-x86_64-sie-win\build\bin\llvm-mc.exe --filetype=obj -triple=x86_64-windows-msvc Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.dir/foo.s -o Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.foo.obj
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\llvm-mc.exe' --filetype=obj -triple=x86_64-windows-msvc 'Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.dir/foo.s' -o 'Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.foo.obj'
# note: command had no output on stdout or stderr
# RUN: at line 5
z:\b\llvm-clang-x86_64-sie-win\build\bin\llvm-mc.exe --filetype=obj -triple=x86_64-windows-msvc Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.dir/qux.s -o Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.qux.obj
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\llvm-mc.exe' --filetype=obj -triple=x86_64-windows-msvc 'Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.dir/qux.s' -o 'Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.qux.obj'
# note: command had no output on stdout or stderr
# RUN: at line 6
z:\b\llvm-clang-x86_64-sie-win\build\bin\lld-link.exe Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.qux.obj Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.foo.obj -out:Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.dll -dll
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\lld-link.exe' 'Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.qux.obj' 'Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.foo.obj' '-out:Z:\b\llvm-clang-x86_64-sie-win\build\tools\lld\test\COFF\Output\import_weak_alias.test.tmp.dll' -dll
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
# | Stack dump:
# | 0.	Program arguments: z:\\b\\llvm-clang-x86_64-sie-win\\build\\bin\\lld-link.exe Z:\\b\\llvm-clang-x86_64-sie-win\\build\\tools\\lld\\test\\COFF\\Output\\import_weak_alias.test.tmp.qux.obj Z:\\b\\llvm-clang-x86_64-sie-win\\build\\tools\\lld\\test\\COFF\\Output\\import_weak_alias.test.tmp.foo.obj -out:Z:\\b\\llvm-clang-x86_64-sie-win\\build\\tools\\lld\\test\\COFF\\Output\\import_weak_alias.test.tmp.dll -dll
# | Exception Code: 0xC0000005
# | #0 0x00007ff8e5361b39 (C:\Windows\System32\KERNELBASE.dll+0x41b39)
# | #1 0x00007ff7ea67bb18 (z:\b\llvm-clang-x86_64-sie-win\build\bin\lld-link.exe+0xcbb18)
# | #2 0x00007ff7ea7032db (z:\b\llvm-clang-x86_64-sie-win\build\bin\lld-link.exe+0x1532db)
# | #3 0x00007ff7ea65d9aa (z:\b\llvm-clang-x86_64-sie-win\build\bin\lld-link.exe+0xad9aa)
# | #4 0x00007ff7ea65da14 (z:\b\llvm-clang-x86_64-sie-win\build\bin\lld-link.exe+0xada14)
# | #5 0x00007ff7ecdd0294 (z:\b\llvm-clang-x86_64-sie-win\build\bin\lld-link.exe+0x2820294)
# | #6 0x00007ff8e7947ac4 (C:\Windows\System32\KERNEL32.DLL+0x17ac4)
# | #7 0x00007ff8e879a8c1 (C:\Windows\SYSTEM32\ntdll.dll+0x5a8c1)
# `-----------------------------
# error: command failed with exit status: 0xc0000005

--

********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants