Introduction
This article shows how to use Intel® oneAPI Math Kernel Library (oneMKL) BLAS functions in Data Parallel C++ (DPC++) and how to offload the computation to devices such as CPUs and Intel® GPUs.
This article is the first in the oneMKL series. Future articles will show how to use oneMKL functions from LAPACK and from other domains such as Vector Math, Fast Fourier Transforms (FFT), and Random Number Generators (RNG), as well as how to offload computation to GPUs from languages other than DPC++, such as C.
Before continuing, let’s briefly look at what DPC++ and oneMKL are.
What is DPC++?
Data Parallel C++ (DPC++) is the oneAPI implementation of SYCL*. It is an open, cross-architecture language designed for data-parallel programming and heterogeneous computing.
What is oneMKL?
Intel® oneAPI Math Kernel Library (oneMKL) is a high-performance math library containing highly optimized, threaded, and vectorized routines for scientific, engineering, and financial applications. It provides key functionality for dense and sparse linear algebra (BLAS, LAPACK, PARDISO), FFTs, vector math, summary statistics, splines, and more.
In addition, oneMKL automatically takes advantage of special features like Intel® Advanced Vector Extensions 512 (Intel® AVX-512) to improve the application performance. More information about oneMKL can be found at the link in the reference section.
To use oneMKL, download and install the Intel® oneAPI Base Toolkit. This article utilizes version 2024.2 of the Intel® oneAPI Base Toolkit, which includes the DPC++ compiler and oneMKL.
The following section shows how to check whether a system has a supported GPU.
How to Check if a System Supports an Intel® GPU
Use the sycl-ls command to see whether the system has a supported GPU. At the command prompt, type:
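```
sycl-ls
```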
You will see something similar to this:
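```
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz 3.0
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) UHD Graphics 3.0
[level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) UHD Graphics 1.1
```
(Illustrative output; the exact devices, backends, and driver versions will vary from system to system.)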
If you see entries such as opencl:gpu:2… or level_zero:gpu…, the system supports GPU offload.
For the list of supported GPUs, please see Intel® oneAPI Math Kernel Library System Requirements. The next section will show how to offload computation to GPUs.
How to Use oneMKL Functions in a Program
The example below shows how to perform a matrix multiplication (GEMM) on single-precision (float) data:
C = alpha * op(A) * op(B) + beta * C
Where:
A: Matrix of m x k dimensions
B: Matrix of k x n dimensions
C: Matrix of m x n dimensions
alpha, beta: Scalar constants
Note that op() denotes an optional transposition: each of A and B can independently be one of three types: nontrans, trans, or conjtrans.
1) Include appropriate header files
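For a oneMKL DPC++ program built with the 2024.2 toolkit, a minimal set of headers looks like this (older releases used <CL/sycl.hpp> instead of <sycl/sycl.hpp>):

```cpp
#include <sycl/sycl.hpp>   // SYCL runtime
#include "oneapi/mkl.hpp"  // oneMKL DPC++ APIs
#include <cstdint>
#include <iostream>
#include <vector>
```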
2) Initialize data
Note that A, B, and C will be stored as vectors (std::vector), here in column-major order, as shown in the sketch below.
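The dimensions, alpha, and beta below are illustrative placeholders, and the leading dimensions assume column-major storage:

```cpp
// Problem sizes and scalars (illustrative values)
std::int64_t m = 64, n = 64, k = 64;
float alpha = 1.0f, beta = 0.0f;

// Leading dimensions for column-major storage
std::int64_t lda = m, ldb = k, ldc = m;

// A is m x k, B is k x n, C is m x n, each stored as a flat std::vector
std::vector<float> A(lda * k, 1.0f);
std::vector<float> B(ldb * n, 2.0f);
std::vector<float> C(ldc * n, 0.0f);
```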
3) Specify whether the matrices are transposed
In this case, both A and B are non-transposed:
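Using the oneapi::mkl::transpose enumeration:

```cpp
oneapi::mkl::transpose transA = oneapi::mkl::transpose::nontrans;
oneapi::mkl::transpose transB = oneapi::mkl::transpose::nontrans;
```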
4) Add an exception handler
The following exception handler will catch asynchronous exceptions:
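A typical handler, following the common pattern from the oneAPI samples:

```cpp
// Rethrow each captured asynchronous exception and report it on the host
auto exception_handler = [](sycl::exception_list exceptions) {
    for (std::exception_ptr const &e : exceptions) {
        try {
            std::rethrow_exception(e);
        } catch (sycl::exception const &e) {
            std::cerr << "Caught asynchronous SYCL exception: "
                      << e.what() << std::endl;
        }
    }
};
```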
5) Select devices to offload computation
If the desired device is a GPU, use:
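For example, with the SYCL 2020 selector shortcuts (older code uses selector objects such as sycl::gpu_selector{} instead):

```cpp
sycl::device dev(sycl::gpu_selector_v);  // pick a GPU device
```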
If the desired device is a CPU, use:
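```cpp
sycl::device dev(sycl::cpu_selector_v);  // pick a CPU device
```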
6) Create queue and buffers
Note that the member function data() points to the first element of the matrix, while size() returns the number of elements in the matrix.
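A sketch, assuming the device and exception handler from the previous steps:

```cpp
// Queue tied to the chosen device, with the asynchronous exception handler
sycl::queue q(dev, exception_handler);

// Buffers wrap the host vectors holding the matrices
sycl::buffer<float, 1> bufA(A.data(), sycl::range<1>(A.size()));
sycl::buffer<float, 1> bufB(B.data(), sycl::range<1>(B.size()));
sycl::buffer<float, 1> bufC(C.data(), sycl::range<1>(C.size()));
```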
7) Call oneMKL functions
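For single-precision matrix multiplication, the buffer-based API is oneapi::mkl::blas::column_major::gemm. A sketch, assuming the column-major layout set up above:

```cpp
// C = alpha * op(A) * op(B) + beta * C
oneapi::mkl::blas::column_major::gemm(q, transA, transB, m, n, k,
                                      alpha, bufA, lda, bufB, ldb,
                                      beta, bufC, ldc);
q.wait_and_throw();  // surface any asynchronous exceptions
```

When bufC goes out of scope (or a host accessor is created for it), the result is copied back into the C vector.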
How to Build and Link to oneMKL
The build instructions in this article are for Linux. It is highly recommended to use the Intel® oneAPI Math Kernel Library Link Line Advisor to build and link a program to oneMKL.
The following shows how to build a program and dynamically link it to oneMKL:
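One possible link line for the 2024.2 toolkit (a sketch; verify the exact form with the Link Line Advisor for your configuration):

```
icpx -fsycl -DMKL_ILP64 -I${MKLROOT}/include gemm.cpp -o gemm \
    -L${MKLROOT}/lib -lmkl_sycl -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core \
    -lsycl -lOpenCL -lpthread -lm -ldl
```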
Note that <gemm.cpp> must be replaced with the actual program source file.
The above link line links to the single-threaded (sequential) version of oneMKL. To link to the multi-threaded version of oneMKL, use the following link line:
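For example, with TBB threading (again a sketch; confirm with the Link Line Advisor):

```
icpx -fsycl -DMKL_ILP64 -I${MKLROOT}/include gemm.cpp -o gemm \
    -L${MKLROOT}/lib -lmkl_sycl -lmkl_intel_ilp64 -lmkl_tbb_thread -lmkl_core \
    -lsycl -lOpenCL -ltbb -lpthread -lm -ldl
```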
To statically link to oneMKL, use the following link line:
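A sketch of a static, sequential link line (the Link Line Advisor will produce the exact form for your setup):

```
icpx -fsycl -DMKL_ILP64 -I${MKLROOT}/include gemm.cpp -o gemm \
    -Wl,--start-group ${MKLROOT}/lib/libmkl_sycl.a \
    ${MKLROOT}/lib/libmkl_intel_ilp64.a ${MKLROOT}/lib/libmkl_sequential.a \
    ${MKLROOT}/lib/libmkl_core.a -Wl,--end-group \
    -lsycl -lOpenCL -lpthread -lm -ldl
```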
Conclusion
Using DPC++ allows users to develop a single version of the code that can run on either a CPU or a GPU simply by selecting the target device, which simplifies maintenance. In general, using oneMKL functions not only reduces the development time of an application but also improves its performance. In addition, oneMKL automatically takes advantage of special features like Intel® Advanced Vector Extensions 512 (Intel® AVX-512), so users do not have to enable such features manually and can concentrate on the functionality of their applications.