# Thrust

* Template library of parallel algorithms and data structures
* STL-like interfaces (Containers, Algorithms)
* Built on top of CUDA / HIP, but at a *higher level of abstraction*
* Links to external resources:
  * [Thrust Homepage @ NVIDIA](https://nvidia.github.io/cccl/thrust/) (documentation and API reference)
  * [rocThrust documentation](https://rocm.docs.amd.com/projects/rocThrust/en/latest/)

## Supported Compilers

* AMD ROCm compiler
* LLVM compiler
* NVIDIA CUDA compiler
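
When the same source file is built with several of these compilers, it can help to know which one produced a binary. The following is a sketch under the assumption that the usual predefined macros are available (`__NVCC__` for nvcc, `__HIP__` for clang in HIP mode, `__CUDA__` together with `__clang__` for clang in CUDA mode); the function name `device_compiler_name` is hypothetical.

```c++
#include <string>

// Hypothetical helper: names the compiler front end that built this
// translation unit, based on predefined macros (assumed as described above).
std::string device_compiler_name() {
#if defined(__NVCC__)
    return "NVIDIA CUDA compiler (nvcc)";
#elif defined(__HIP__)
    return "clang in HIP mode (hipcc or clang++ -x hip)";
#elif defined(__CUDA__) && defined(__clang__)
    return "clang in CUDA mode (clang++ -x cuda)";
#else
    // None of the device-compiler macros is set
    return "host-only C++ compiler";
#endif
}
```

Printing the result of `device_compiler_name()` at program start is one way to confirm which toolchain a given binary came from.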

## Hardware Portability

* AMD GPUs ([ROCm Libraries - rocthrust](https://github.com/ROCm/rocm-libraries))
* NVIDIA GPUs ([CUDA Core Compute Libraries - Thrust](https://nvidia.github.io/cccl/thrust/))

## Example Code

* Source code available in [exampleThrust.cu](../src/exampleThrust.cu)

* Include the Thrust headers in your code

  ```c++
  #include <thrust/device_vector.h>
  #include <thrust/execution_policy.h>
  #include <thrust/for_each.h>
  #include <thrust/host_vector.h>

  int main() {
    // ..
  }
  ```

* Platform information: not offered by Thrust itself, only by the backend in use (CUDA or HIP)

* Allocate memory on the host

  ```c++
  thrust::host_vector<double> a(size);
  ```

* Allocate memory on the device

  ```c++
  thrust::device_vector<double> device_a(size);
  ```

* Pre-process / initialize data on the host
  e.g. read data from storage

  ```c++
  for (unsigned int i = 0; i < size; i++) {
      a[i] = 1;
  }
  ```

* Copy data from the host to the device

  ```c++
  thrust::copy(a.begin(), a.end(), device_a.begin());
  ```

* Compute on the device

  * Without access to vector index

    ```c++
    thrust::for_each_n(
        thrust::device,
        device_a.begin(),
        device_a.size(),
        // kernel expressed as lambda expression
        [=] __device__ __host__(double &a_i) {
            a_i++;
        });
    ```

  * With access to vector index

    ```c++
    double *data_ptr = thrust::raw_pointer_cast(device_a.data());
    thrust::for_each(
        thrust::device,
        thrust::make_counting_iterator<size_t>(0),
        thrust::make_counting_iterator<size_t>(size),
        // kernel expressed as lambda expression
        [=] __device__ __host__(size_t i) {
            data_ptr[i]++;
        });
    ```

* Transfer data back from the device to the host

  ```c++
  thrust::copy(device_a.begin(), device_a.end(), a.begin());
  ```

* Post-process data on the host
  e.g. write data to storage or perform consistency checks

  ```c++
  for (unsigned int i = 0; i < size; i++) {
      if (a[i] != 3.) {
          cout << "a[" << i << "] = " << a[i] << endl;
          errx(2, "Computation on GPU failed");
      }
  }
  ```

### Compilation

In [1]:
#!/usr/bin/bash
# AMD ROCm Compiler
! hipcc \
    -O2 -march=native -flto -Wall -Wextra -std=c++20 \
    --offload-arch=native -foffload-lto \
    "../src/exampleThrust.cu" -o "../bin/exampleThrust"

In [None]:
#!/usr/bin/bash
# LLVM for AMD GPUs
! clang++-21 \
    -O2 -march=native -Wall -Wextra -fuse-ld=lld -Wl,-rpath=/usr/lib/llvm-21/lib \
    -x hip --offload-arch=native \
    "../src/exampleThrust.cu" -o "../bin/exampleThrust"

In [None]:
#!/usr/bin/bash
# LLVM for NVIDIA GPUs
! clang++-21 \
    -O2 -march=native -Wall -Wextra -fuse-ld=lld -Wl,-rpath=/usr/lib/llvm-21/lib \
    -x cuda --offload-arch=native \
    "../src/exampleThrust.cu" -o "../bin/exampleThrust" \
    -L/usr/local/cuda/targets/x86_64-linux/lib -lcudart

In [2]:
#!/usr/bin/bash
# NVIDIA CUDA Compiler
! module purge; \
  module add devel/cuda/12.9; \
  nvcc \
    -O2 -ccbin='gcc' --forward-unknown-to-host-compiler -march=native -Wall -Wextra \
    --extended-lambda --gpu-architecture=native \
    "../src/exampleThrust.cu" -o "../bin/exampleThrust"

### Execution

In [3]:
#!/usr/bin/bash
# Run the binary built with the NVIDIA CUDA compiler
! module purge; \
  module add devel/cuda/12.9; \
  ../bin/exampleThrust

* Allocate memory on the host
* Allocate memory on the device
* Pre-process / initialize data on the host
  e.g. read data from storage
* Copy data from the host to the device
* Compute on the device
* Transfer data back from the device to the host
* Post-process data on the host
  e.g. write data to storage or perform consistency checks
