| 1 | +# Usage |
| 2 | + |
| 3 | +This feature is a work in progress and not ready for general use. The instructions here are for contributors and for people interested in following the latest progress. |
| 4 | +We are currently working on launching the following Rust kernel on the GPU. To follow along, copy it to a `src/lib.rs` file. |
| 5 | + |
| 6 | +```rust |
| 7 | +#![feature(abi_gpu_kernel)] |
| 8 | +#![no_std] |
| 9 | + |
| 10 | +#[panic_handler] |
| 11 | +fn panic(_: &core::panic::PanicInfo) -> ! { |
| 12 | + loop {} |
| 13 | +} |
| 14 | + |
| 15 | +#[unsafe(no_mangle)] |
| 16 | +#[inline(never)] |
| 17 | +fn main() { |
| 18 | + let mut x = [3.0; 256]; |
| 19 | + //if cfg!(target_os = "linux") { |
| 20 | + #[cfg(target_os = "linux")] |
| 21 | + { |
| 22 | + kernel_1(&mut x); |
| 23 | + } |
| 24 | + core::hint::black_box(&x); |
| 25 | +} |
| 26 | + |
| 27 | +#[cfg(target_os = "linux")] |
| 28 | +#[unsafe(no_mangle)] |
| 29 | +#[inline(never)] |
| 30 | +pub fn kernel_1(x: &mut [f32; 256]) { |
| 31 | + x[0] = 21.0; |
| 32 | + //for i in 0..256 { |
| 33 | + // x[i] = 21.0; |
| 34 | + //} |
| 35 | +} |
| 36 | + |
| 37 | +#[cfg(not(target_os = "linux"))] |
| 38 | +#[unsafe(no_mangle)] |
| 39 | +#[inline(never)] |
| 40 | +pub extern "gpu-kernel" fn kernel_2(x: &mut [f32; 256]) { |
| 41 | + x[0] = 21.0; |
| 42 | + //for i in 0..256 { |
| 43 | + // x[i] = 21.0; |
| 44 | + //} |
| 45 | +} |
| 46 | +``` |
| 47 | + |
| 48 | + |
| 49 | +## Usage for memory transfer |
| 50 | +It is important to use a clang compiler built on the same LLVM as rustc. Calling clang without the full path will likely pick up your system clang, which will probably be incompatible. |
| 51 | +``` |
| 52 | +/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc +offload --edition=2024 --crate-type cdylib src/lib.rs --emit=llvm-ir -O -C lto=fat -Cpanic=abort -Zoffload=Enable |
| 53 | +/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/llvm/bin/clang++ -fopenmp --offload-arch=native -g -O3 lib.ll -o main -save-temps |
| 54 | +LIBOMPTARGET_INFO=-1 ./main |
| 55 | +``` |
| 56 | +The first step generates a `lib.ll` file, which contains enough instructions to make the offload runtime move data to and from a GPU. |
| 57 | +The second step uses clang as the compilation driver to compile the IR file down to a working binary. Only a very small subset of Rust will work out of the box here. |
| 58 | +In the last step you run the binary. If all went well, you will see a data transfer being reported: |
| 59 | +``` |
| 60 | +omptarget device 0 info: Entering OpenMP data region with being_mapper at unknown:0:0 with 1 arguments: |
| 61 | +omptarget device 0 info: tofrom(unknown)[1024] |
| 62 | +omptarget device 0 info: Creating new map entry with HstPtrBase=0x00007fffffff9540, HstPtrBegin=0x00007fffffff9540, TgtAllocBegin=0x0000155547200000, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=1, HoldRefCount=0, Name=unknown |
| 63 | +omptarget device 0 info: Copying data from host to device, HstPtr=0x00007fffffff9540, TgtPtr=0x0000155547200000, Size=1024, Name=unknown |
| 64 | +omptarget device 0 info: OpenMP Host-Device pointer mappings after block at unknown:0:0: |
| 65 | +omptarget device 0 info: Host Ptr Target Ptr Size (B) DynRefCount HoldRefCount Declaration |
| 66 | +omptarget device 0 info: 0x00007fffffff9540 0x0000155547200000 1024 1 0 unknown at unknown:0:0 |
| 67 | +// some other output |
| 68 | +omptarget device 0 info: Exiting OpenMP data region with end_mapper at unknown:0:0 with 1 arguments: |
| 69 | +omptarget device 0 info: tofrom(unknown)[1024] |
| 70 | +omptarget device 0 info: Mapping exists with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=0 (decremented, delayed deletion), HoldRefCount=0 |
| 71 | +omptarget device 0 info: Copying data from device to host, TgtPtr=0x0000155547200000, HstPtr=0x00007fffffff9540, Size=1024, Name=unknown |
| 72 | +omptarget device 0 info: Removing map entry with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, Name=unknown |
| 73 | +``` |
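
The `Size=1024` reported in the log above matches the size of the kernel argument: `[f32; 256]` holds 256 elements of 4 bytes each. A minimal sketch of that arithmetic, meant to run as an ordinary host program; the helper name `mapped_bytes` is hypothetical and not part of the offload feature:

```rust
use core::mem::size_of;

/// Hypothetical helper (not part of the offload feature): the number of
/// bytes occupied by the `[f32; 256]` array passed to the kernel above.
fn mapped_bytes() -> usize {
    size_of::<[f32; 256]>()
}

fn main() {
    // 256 elements * 4 bytes per f32.
    assert_eq!(mapped_bytes(), 256 * size_of::<f32>());
    println!("{}", mapped_bytes()); // prints 1024
}
```

If you change the element type or the array length in `src/lib.rs`, the size reported by the runtime should change accordingly.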
| 74 | + |
| 75 | +## Usage for gpu kernel launches |
| 76 | +This feature is not fully implemented yet. For experiments, we recommend checking out the following PR: https://github.com/rust-lang/rust/pull/142696 |
| 77 | +It allows compiling Rust code for a GPU and inspecting the resulting IR, but it does not launch the kernel yet. In a follow-up PR we will automate this step and unify it with the usage above. |
| 78 | +``` |
| 79 | +RUSTFLAGS="-Ctarget-cpu=gfx90a" cargo +offload build -Zunstable-options -r --target amdgcn-amd-amdhsa -Zbuild-std=core |
| 80 | +``` |
| 81 | +You will need to adjust the target-cpu to match your local GPU. |