Commit 6e4cbad

add gpu device side instructions
1 parent 049583c commit 6e4cbad

3 files changed: +83 -28 lines changed

src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -103,6 +103,7 @@
 - [The `rustdoc-json` test suite](./rustdoc-internals/rustdoc-json-test-suite.md)
 - [GPU offload internals](./offload/internals.md)
 - [Installation](./offload/installation.md)
+- [Usage](./offload/usage.md)
 - [Autodiff internals](./autodiff/internals.md)
 - [Installation](./autodiff/installation.md)
 - [How to debug](./autodiff/debugging.md)

src/offload/installation.md

Lines changed: 1 addition & 28 deletions
@@ -1,6 +1,6 @@
 # Installation

-In the future, `std::offload` should become available in nightly builds for users. For now, everyone still needs to build rustc from source.
+`std::offload` is partly available in nightly builds for users. For now, however, everyone still needs to build rustc from source to use all of its features.

 ## Build instructions

@@ -42,30 +42,3 @@ run
 ```
 ./x test --stage 1 tests/codegen-llvm/gpu_offload
 ```
-
-## Usage
-It is important to use a clang compiler build on the same llvm as rustc. Just calling clang without the full path will likely use your system clang, which probably will be incompatible.
-```
-/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc --edition=2024 --crate-type cdylib src/main.rs --emit=llvm-ir -O -C lto=fat -Cpanic=abort -Zoffload=Enable
-/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/llvm/bin/clang++ -fopenmp --offload-arch=native -g -O3 main.ll -o main -save-temps
-LIBOMPTARGET_INFO=-1 ./main
-```
-The first step will generate a `main.ll` file, which has enough instructions to cause the offload runtime to move data to and from a gpu.
-The second step will use clang as the compilation driver to compile our IR file down to a working binary. Only a very small Rust subset will work out of the box here, unless
-you use features like build-std, which are not covered by this guide. Look at the codegen test to get a feeling for how to write a working example.
-In the last step you can run your binary, if all went well you will see a data transfer being reported:
-```
-omptarget device 0 info: Entering OpenMP data region with being_mapper at unknown:0:0 with 1 arguments:
-omptarget device 0 info: tofrom(unknown)[1024]
-omptarget device 0 info: Creating new map entry with HstPtrBase=0x00007fffffff9540, HstPtrBegin=0x00007fffffff9540, TgtAllocBegin=0x0000155547200000, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=1, HoldRefCount=0, Name=unknown
-omptarget device 0 info: Copying data from host to device, HstPtr=0x00007fffffff9540, TgtPtr=0x0000155547200000, Size=1024, Name=unknown
-omptarget device 0 info: OpenMP Host-Device pointer mappings after block at unknown:0:0:
-omptarget device 0 info: Host Ptr Target Ptr Size (B) DynRefCount HoldRefCount Declaration
-omptarget device 0 info: 0x00007fffffff9540 0x0000155547200000 1024 1 0 unknown at unknown:0:0
-// some other output
-omptarget device 0 info: Exiting OpenMP data region with end_mapper at unknown:0:0 with 1 arguments:
-omptarget device 0 info: tofrom(unknown)[1024]
-omptarget device 0 info: Mapping exists with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=0 (decremented, delayed deletion), HoldRefCount=0
-omptarget device 0 info: Copying data from device to host, TgtPtr=0x0000155547200000, HstPtr=0x00007fffffff9540, Size=1024, Name=unknown
-omptarget device 0 info: Removing map entry with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, Name=unknown
-```

src/offload/usage.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+# Usage
+
+This feature is a work in progress and not yet ready for general use. The instructions here are for contributors and for people interested in following the latest progress.
+We are currently working on launching the following Rust kernel on the GPU. To follow along, copy it to a `src/lib.rs` file.
+
+```rust
+#![feature(abi_gpu_kernel)]
+#![no_std]
+
+#[panic_handler]
+fn panic(_: &core::panic::PanicInfo) -> ! {
+    loop {}
+}
+
+#[unsafe(no_mangle)]
+#[inline(never)]
+fn main() {
+    let mut x = [3.0; 256];
+    //if cfg!(target_os = "linux") {
+    #[cfg(target_os = "linux")]
+    {
+        kernel_1(&mut x);
+    }
+    core::hint::black_box(&x);
+}
+
+#[cfg(target_os = "linux")]
+#[unsafe(no_mangle)]
+#[inline(never)]
+pub fn kernel_1(x: &mut [f32; 256]) {
+    x[0] = 21.0;
+    //for i in 0..256 {
+    //    x[i] = 21.0;
+    //}
+}
+
+#[cfg(not(target_os = "linux"))]
+#[unsafe(no_mangle)]
+#[inline(never)]
+pub extern "gpu-kernel" fn kernel_2(x: &mut [f32; 256]) {
+    x[0] = 21.0;
+    //for i in 0..256 {
+    //    x[i] = 21.0;
+    //}
+}
+```
+
+
+## Usage for memory transfer
+It is important to use a clang compiler built on the same LLVM as rustc. Calling clang without the full path will likely pick up your system clang, which will probably be incompatible.
+```
+/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc +offload --edition=2024 --crate-type cdylib src/lib.rs --emit=llvm-ir -O -C lto=fat -Cpanic=abort -Zoffload=Enable
+/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/llvm/bin/clang++ -fopenmp --offload-arch=native -g -O3 lib.ll -o main -save-temps
+LIBOMPTARGET_INFO=-1 ./main
+```
+The first step generates a `lib.ll` file, which contains enough instructions to cause the offload runtime to move data to and from a GPU.
+The second step uses clang as the compilation driver to compile our IR file down to a working binary. Only a very small subset of Rust will work out of the box here.
+In the last step you can run your binary. If all went well, you will see a data transfer being reported:
+```
+omptarget device 0 info: Entering OpenMP data region with being_mapper at unknown:0:0 with 1 arguments:
+omptarget device 0 info: tofrom(unknown)[1024]
+omptarget device 0 info: Creating new map entry with HstPtrBase=0x00007fffffff9540, HstPtrBegin=0x00007fffffff9540, TgtAllocBegin=0x0000155547200000, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=1, HoldRefCount=0, Name=unknown
+omptarget device 0 info: Copying data from host to device, HstPtr=0x00007fffffff9540, TgtPtr=0x0000155547200000, Size=1024, Name=unknown
+omptarget device 0 info: OpenMP Host-Device pointer mappings after block at unknown:0:0:
+omptarget device 0 info: Host Ptr Target Ptr Size (B) DynRefCount HoldRefCount Declaration
+omptarget device 0 info: 0x00007fffffff9540 0x0000155547200000 1024 1 0 unknown at unknown:0:0
+// some other output
+omptarget device 0 info: Exiting OpenMP data region with end_mapper at unknown:0:0 with 1 arguments:
+omptarget device 0 info: tofrom(unknown)[1024]
+omptarget device 0 info: Mapping exists with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=0 (decremented, delayed deletion), HoldRefCount=0
+omptarget device 0 info: Copying data from device to host, TgtPtr=0x0000155547200000, HstPtr=0x00007fffffff9540, Size=1024, Name=unknown
+omptarget device 0 info: Removing map entry with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, Name=unknown
+```
+
+## Usage for GPU kernel launches
+This feature is not fully implemented yet. For experiments, we recommend checking out the following PR: https://github.com/rust-lang/rust/pull/142696
+It allows compiling Rust code for a GPU and inspecting the IR, but it will not launch the kernel yet. In a follow-up PR we will automate this step and unify it with the usage above.
+```
+RUSTFLAGS="-Ctarget-cpu=gfx90a" cargo +offload build -Zunstable-options -r --target amdgcn-amd-amdhsa -Zbuild-std=core
+```
+You will need to adjust the target-cpu to match your local GPU.
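
As a sanity check of what the kernel added in `src/offload/usage.md` is meant to compute, the following host-only sketch runs the same body on the CPU. It is not part of the commit and does not touch the offload runtime. `kernel_host` and `fill_all` are hypothetical names chosen for this illustration, and `fill_all` enables the loop that the committed kernel leaves commented out.

```rust
/// Host-only stand-in for `kernel_1` (hypothetical name, not from the commit).
/// Like the committed kernel, it writes only the first element.
pub fn kernel_host(x: &mut [f32; 256]) {
    x[0] = 21.0;
}

/// Hypothetical variant with the commented-out loop enabled:
/// every element is overwritten, not just the first.
pub fn fill_all(x: &mut [f32; 256]) {
    for v in x.iter_mut() {
        *v = 21.0;
    }
}

/// Size in bytes of the array the kernel operates on.
/// 256 elements of 4-byte `f32` give 1024 bytes.
pub fn buffer_bytes() -> usize {
    core::mem::size_of::<[f32; 256]>()
}
```

Calling `kernel_host` on `[3.0_f32; 256]` changes only `x[0]`, and `buffer_bytes()` returns 1024, which lines up with the `Size=1024` and `tofrom(unknown)[1024]` entries in the `LIBOMPTARGET_INFO` output quoted in the memory-transfer section.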
