# OpenACC (Open Accelerators)

* Directive-based parallel programming model for C, C++ and Fortran
  <!-- Similar concept as OpenMP
       * primary mode of programming is directives
       * Additional runtime functions -->
* Released before OpenMP supported GPU offloading, hence its focus on accelerators
* Many of the OpenACC concepts have since been incorporated into OpenMP
* Managed by the nonprofit OpenACC Organization
* Links to external resources:
  * [OpenACC Homepage](https://www.openacc.org) (Specification, Guides and Tutorials)
  * [OpenACC Programming and Best Practices Guide](https://openacc-best-practices-guide.readthedocs.io/en/latest/)

## Supported Compilers

* GCC (supports OpenACC 2.6, the 2017 version of the specification)
* NVIDIA HPC SDK Compiler

## Hardware Portability

* CPUs
* NVIDIA GPUs

## Example Code

* Source code available in [exampleOpenACC.c](../src/exampleOpenACC.c)

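* Include the OpenACC header and stop compilation if OpenACC is not enabled

  ```c
  #include <err.h>     // errx
  #include <stdio.h>   // printf
  #include <stdlib.h>  // malloc, free

  #ifdef _OPENACC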
      #include <openacc.h>
  #else
      #error "OpenACC support required"
  #endif

  int main() {
      // ...

      return 0;
  }
  ```
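If a serial host build should remain possible instead of stopping with `#error`, the guard can fall back to plain C. A minimal sketch, assuming the compiler ignores `#pragma acc` lines when OpenACC is not enabled (GCC does, warning only under `-Wunknown-pragmas`); the helpers `openacc_version` and `print_build_mode` are hypothetical names:

```c
#include <stdio.h>

#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: returns the _OPENACC value (yyyymm) the code was
   compiled against, or 0 for a serial host build without OpenACC. */
long openacc_version(void) {
#ifdef _OPENACC
    return (long) _OPENACC;
#else
    return 0L;
#endif
}

/* Hypothetical helper: reports which variant of the program was built. */
void print_build_mode(void) {
    long v = openacc_version();
    if (v > 0) {
        printf("Built with OpenACC (%ld)\n", v);
    } else {
        printf("Built without OpenACC: running serially on the host\n");
    }
}
```

Calling `print_build_mode()` at the start of `main` makes it obvious from the program output whether the offloading build was actually used.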

* Query platform information

  ```c
  acc_device_t devicetype = acc_device_default;
  int num_devices = acc_get_num_devices(devicetype);
  int device_num  = acc_get_device_num(devicetype);
  acc_set_device_num(device_num, devicetype);
  // _OPENACC holds the date (yyyymm) of the supported specification:
  // 201306 -> 2.0
  // 201510 -> 2.5
  // 201711 -> 2.6
  printf("OpenACC version (yyyymm): %i\n", _OPENACC);
  printf("Number of OpenACC devices: %i\n", num_devices);
  printf("OpenACC Device number: %i\n", device_num);
  // acc_get_property and acc_get_property_string were introduced with OpenACC 2.6
  #if _OPENACC >= 201711
  // acc_get_property returns a size_t; memory sizes are reported in bytes
  size_t      mem             = acc_get_property(       device_num, acc_device_current, acc_property_memory);
  size_t      free_mem        = acc_get_property(       device_num, acc_device_current, acc_property_free_memory);
  const char *property_name   = acc_get_property_string(device_num, acc_device_current, acc_property_name);
  const char *property_vendor = acc_get_property_string(device_num, acc_device_current, acc_property_vendor);
  const char *property_driver = acc_get_property_string(device_num, acc_device_current, acc_property_driver);
  printf("Memory on OpenACC device: %zu\n", mem);
  printf("Free Memory on OpenACC device: %zu\n", free_mem);
  if (property_name != NULL) {
      printf("OpenACC device name: %s\n", property_name);
  }
  if (property_vendor != NULL) {
      printf("OpenACC device vendor: %s\n", property_vendor);
  }
  if (property_driver != NULL) {
      printf("OpenACC device driver: %s\n", property_driver);
  }
  #endif
  ```
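The version table in the comments above can also be turned into a lookup that prints a readable version number. A minimal sketch covering only the three dates listed there; `openacc_version_name` and `print_openacc_version` are hypothetical helpers:

```c
#include <stdio.h>

/* Maps an _OPENACC value (yyyymm) to a specification version string.
   Only the dates listed in the example are covered; everything else
   is reported as "unknown". */
const char *openacc_version_name(long yyyymm) {
    switch (yyyymm) {
        case 201306: return "2.0";
        case 201510: return "2.5";
        case 201711: return "2.6";
        default:     return "unknown";
    }
}

/* Prints both the readable version and the raw yyyymm value. */
void print_openacc_version(long yyyymm) {
    printf("OpenACC version: %s (%ld)\n", openacc_version_name(yyyymm), yyyymm);
}
```

In the example program this would be called as `print_openacc_version(_OPENACC);` inside the `#ifdef _OPENACC` branch.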

* Alternatively, the device can be selected by setting the environment variable `ACC_DEVICE_NUM` (and the device type with `ACC_DEVICE_TYPE`)
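Inside the program, roughly the same choice can be made with `acc_set_device_num`, after checking the request against `acc_get_num_devices` (both calls appear in the platform query above). A minimal sketch; `select_device` is a hypothetical name, and the device type is assumed to be `acc_device_default`:

```c
#include <stdio.h>

#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: selects OpenACC device `requested` of the default
   device type. Returns the selected device number, or -1 if the request
   is out of range or the program was built without OpenACC. */
int select_device(int requested) {
#ifdef _OPENACC
    int num_devices = acc_get_num_devices(acc_device_default);
    if (requested < 0 || requested >= num_devices) {
        fprintf(stderr, "Invalid device %d (available: %d)\n", requested, num_devices);
        return -1;
    }
    acc_set_device_num(requested, acc_device_default);
    return acc_get_device_num(acc_device_default);
#else
    (void) requested;  /* unused in a serial host build */
    return -1;
#endif
}
```

Validating the number first gives a readable error message instead of relying on the runtime's behavior for an invalid device number.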

* Allocate memory on the host

  ```c
  double *a = (double *) malloc(size * sizeof(double));
  if (a == NULL) {
      errx(1, "malloc a[] failed");
  }
  ```
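For very large `size`, the product `size * sizeof(double)` can overflow `size_t` and silently allocate a too-small buffer. A minimal sketch of an overflow-checked allocation; `alloc_doubles` is a hypothetical helper:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n doubles on the host. Returns NULL if
   n is 0, if n * sizeof(double) would overflow size_t, or if malloc fails. */
double *alloc_doubles(size_t n) {
    if (n == 0 || n > SIZE_MAX / sizeof(double)) {
        return NULL;
    }
    return (double *) malloc(n * sizeof(double));
}
```

Rejecting `n == 0` is a design choice: `malloc(0)` may return either `NULL` or a unique pointer, which would make the caller's `NULL` check ambiguous.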


* Pre-process / initialize data on the host
  e.g. read data from storage

  ```c
  for (unsigned int i = 0; i < size; i++) {
      a[i] = 1.;
  }
  ```
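If the initial values can be computed instead of read from storage, the initialization itself can run on the device. A minimal sketch with the hypothetical function `init_on_device`: the `copyout` clause allocates device memory without copying the host values in, and copies the results back when the loop finishes:

```c
#include <stddef.h>

/* Hypothetical helper: fills a[0:n] with 1. on the device.
   copyout: allocate on the device, skip the host-to-device copy,
   copy device values back to the host at the end of the construct. */
void init_on_device(double *a, size_t n) {
    #pragma acc parallel loop copyout(a[0:n])
    for (size_t i = 0; i < n; i++) {
        a[i] = 1.;
    }
}
```

Without OpenACC enabled, the pragma is ignored and the loop simply runs on the host.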


* Automatically allocate memory on the device
* Copy data from the host to the device

  ```c
  #pragma acc enter data copyin(a[0:size])
  ```
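`enter data` and `exit data` give the device copy an unstructured lifetime that may span several functions. When the lifetime fits into a single block, a structured `data` region is an alternative; unlike `enter data`, it does apply to the following `{}` block. A minimal sketch with the hypothetical function `increment_structured`, where `copy` copies in at region entry and back out at region exit:

```c
#include <stddef.h>

/* Hypothetical helper: increments a[0:n] inside a structured data region. */
void increment_structured(double *a, size_t n) {
    /* copy: copyin when entering the region, copyout when leaving it */
    #pragma acc data copy(a[0:n])
    {
        #pragma acc parallel loop present(a[0:n])
        for (size_t i = 0; i < n; i++) {
            a[i]++;
        }
    }
}
```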


* Compute on the device

  ```c
  #pragma acc parallel loop present(a[0:size]) wait
  for (unsigned int i = 0; i < size; i++) {
      a[i]++;
  }
  ```
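Loops that combine values from all iterations, such as a sum, need a `reduction` clause so that the partial results computed on the device are combined correctly. A minimal sketch with the hypothetical function `sum_on_device`; it uses `copyin` rather than `present` so that it works on its own:

```c
#include <stddef.h>

/* Hypothetical helper: sums a[0:n] on the device.
   reduction(+:sum) gives each gang/worker/vector lane a private partial
   sum and adds all partial sums into `sum` at the end of the loop. */
double sum_on_device(const double *a, size_t n) {
    double sum = 0.;
    #pragma acc parallel loop copyin(a[0:n]) reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Because the partial sums are added in an unspecified order, results for general floating-point data may differ slightly from a serial sum.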


* Transfer data back from the device to the host

  ```c
  #pragma acc update host(a[0:size])
  ```
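In the example, the compute loop and `update host` run synchronously, so the `wait` clause on the compute loop has no pending asynchronous work to wait for. Waiting becomes relevant once work is queued with `async`. A minimal sketch, assuming queue number 1 (an arbitrary choice) and the hypothetical function name `increment_async`:

```c
#include <stddef.h>

/* Hypothetical helper: increments a[0:n] using async queue 1. */
void increment_async(double *a, size_t n) {
    #pragma acc enter data copyin(a[0:n])

    /* launch the loop on queue 1; the host continues immediately */
    #pragma acc parallel loop present(a[0:n]) async(1)
    for (size_t i = 0; i < n; i++) {
        a[i]++;
    }

    /* enqueued after the loop on the same queue, so it copies the results */
    #pragma acc update host(a[0:n]) async(1)

    /* block the host until all work on queue 1 has finished */
    #pragma acc wait(1)

    #pragma acc exit data delete(a[0:n])
}
```

Between the `async` directives and `wait(1)`, the host could do independent work, which is the point of using asynchronous queues.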


* Delete data on the device

  ```c
  #pragma acc exit data delete(a[0:size])
  ```
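When the results are only needed after the device copy is released, `update host` followed by `exit data delete` can be combined into `exit data copyout`. A minimal sketch with the hypothetical function `increment_then_copyout`. Note that `copyout` on `exit data` only transfers data when the device copy is actually released (its reference count drops to zero), whereas `update host` always transfers:

```c
#include <stddef.h>

/* Hypothetical helper: increments a[0:n] on the device and copies the
   results back while releasing the device copy. */
void increment_then_copyout(double *a, size_t n) {
    #pragma acc enter data copyin(a[0:n])

    #pragma acc parallel loop present(a[0:n])
    for (size_t i = 0; i < n; i++) {
        a[i]++;
    }

    /* copy back to the host and free device memory in one directive */
    #pragma acc exit data copyout(a[0:n])
}
```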


* Post-process data on the host
  e.g. write data to storage

  ```c
  for (unsigned int i = 0; i < size; i++) {
      if (a[i] != 2.) {
          errx(2, "Computation on GPU failed");
      }
  }
  ```
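The exact comparison `a[i] != 2.` is safe here because `1. + 1.` is exactly `2.` in floating point. Once results involve rounding, a tolerance is safer. A minimal sketch with the hypothetical helper `count_mismatches`:

```c
#include <stddef.h>

/* Hypothetical helper: counts entries of a[0:n] that differ from
   `expected` by more than `tol`. NaN entries also count as mismatches,
   because every comparison with NaN is false. */
size_t count_mismatches(const double *a, size_t n, double expected, double tol) {
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        double diff = a[i] - expected;
        if (diff < 0.) {
            diff = -diff;
        }
        if (!(diff <= tol)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

The check in the example would then read `if (count_mismatches(a, size, 2., 1e-12) != 0) errx(2, ...)`, with the tolerance chosen to suit the computation.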

* Free memory on the host

  ```c
  free(a);
  ```


### Compilation

```bash
# GCC
gcc \
    -O2 -march=native -Wall -Wextra \
    -fopenacc \
    "../src/exampleOpenACC.c" -o "../bin/exampleOpenACC"
```

```bash
# NVIDIA HPC SDK Compiler
nvc \
    -O2 -tp=host -Minform=inform \
    -acc -target=gpu -gpu=ccnative \
    "../src/exampleOpenACC.c" -o "../bin/exampleOpenACC"
```

### Execution

```bash
../bin/exampleOpenACC
```

Example output:

```
OpenACC version (yyyymm): 201711
Number of OpenACC devices: 4
OpenACC Device number: 0
Memory on OpenACC device: 99960487936
Free Memory on OpenACC device: 99374923776
OpenACC device name: NVIDIA H100
OpenACC device vendor: NVIDIA
OpenACC device driver: 12080
* Allocate memory on the host
* Pre-process / initialize data on the host
  e.g. read data from storage
* Automatically allocate memory on the device
* Copy data from the host to the device
* Compute on the device
* Transfer data back from the device to the host
* Delete data on the device
* Post-process data on the host
  e.g. write data to storage
* Free memory on the host
```
