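Vector Addition
Program flow: create the platform, select a device, create the context, create the command queue, create the program object, build the program, allocate memory, create the kernel object, set the kernel arguments, run the kernel, copy the result back to the host, and release resources. The walkthrough is followed by the complete code, a result screenshot, and a Hello World example.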
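//Create the platform object
status = clGetPlatformIDs(1, &platform, NULL);
Note: the call above simply selects the first platform by default. What if more than one OpenCL platform is installed, and how do we pick the one we need? For example, I currently have both the Intel and NVIDIA platforms installed, so a selection step is required. The first call to clGetPlatformIDs retrieves the number of platforms, which is stored in numPlatforms; the second call retrieves the available platforms. I have also left out error-checking code, but I did add a status variable: when a function succeeds, the returned value is normally 0 (CL_SUCCESS).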
/*Step1: Getting platforms and choose an available one.*/
cl_uint numPlatforms; //the NO. of platforms
cl_platform_id platform = NULL; //the chosen platform
cl_int status = clGetPlatformIDs(0, NULL, &numPlatforms);
if (status != CL_SUCCESS)
{
    cout << "Error: Getting platforms!" << endl;
    return FAILURE;
}
/*For clarity, choose the first available platform. */
if(numPlatforms > 0)
{
    cl_platform_id* platforms = (cl_platform_id* )malloc(numPlatforms* sizeof(cl_platform_id));
    status = clGetPlatformIDs(numPlatforms, platforms, NULL);
    platform = platforms[0];
    free(platforms);
}
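The note above mentions choosing between the Intel and NVIDIA platforms, while the code only takes the first one. One possible way to make that choice is to match on the platform name. The sketch below assumes the names have already been read, one per platform, with clGetPlatformInfo and CL_PLATFORM_NAME; pickPlatformIndex and its parameters are hypothetical names used for illustration, not part of OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first platform whose name contains `wanted`,
// or -1 when no name matches.
// Assumption: `names` was filled beforehand, one entry per platform,
// for example via clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, ...).
int pickPlatformIndex(const std::vector<std::string>& names,
                      const std::string& wanted)
{
    assert(!wanted.empty()); // an empty pattern would match every name
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i].find(wanted) != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With a returned index i, platform = platforms[i] would take the place of platform = platforms[0] in Step 1; a result of -1 means that no platform matched.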
Query the available devices and choose one.
/*Step 2: Query the platform and choose the first GPU device if it has one. Otherwise use the CPU as the device.*/
cl_uint numDevices = 0;
cl_device_id *devices;
status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
if (numDevices == 0) //no GPU available.
{
    cout << "No GPU device available." << endl;
    cout << "Choose CPU as default device." << endl;
    status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 0, NULL, &numDevices);
    devices = (cl_device_id*)malloc(numDevices * sizeof(cl_device_id));
    status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, numDevices, devices, NULL);
}
else
{
    devices = (cl_device_id*)malloc(numDevices * sizeof(cl_device_id));
    status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, numDevices, devices, NULL);
}
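Next, let us look at the concept of a Context in OpenCL. A Context is the environment in which OpenCL objects and resources are managed. To manage an OpenCL program, the following objects must all be associated with a Context:
• Devices: execute the kernel program.
• Program objects: the kernel program source code.
• Kernels: functions that run on OpenCL devices.
• Memory objects: hold data on the device.
• Command queues: the mechanism for interacting with a device. The commands placed in a queue are:
    • Memory commands (used to copy data between host memory and device memory)
    • Kernel execution
    • Synchronization
Note: when creating a Context, we must associate one or more devices with it. Other OpenCL resources must also be associated with a Context when they are created; the OpenCL functions that create them generally take the context as an input parameter.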
/*Step 3: Create context.*/
cl_context context = clCreateContext(NULL,1, devices,NULL,NULL,NULL);
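Next comes the command queue. In OpenCL, the command queue is the mechanism through which requests from the host are executed on a device.
Before a kernel runs, we usually need to copy some memory, for example transferring data from host memory to device memory.
A few more points to note: each device has its own independent command queue; the commands in a queue (such as kernel launches) may be synchronous or asynchronous, and they may execute in order or out of order.
A command queue establishes a connection between a device and a context.
The command queue properties specify the following:
• Whether commands execute out of order (AMD GPUs did not appear to support out-of-order execution at the time of writing).
• Whether profiling is enabled. Profiling uses the event mechanism to obtain useful information such as kernel execution time, but it carries some overhead of its own.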
/*Step 4: Creating command queue associate with the context.*/
cl_command_queue commandQueue = clCreateCommandQueue(context, devices[0], 0, NULL);
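As a small illustration of the profiling property: when a queue is created with CL_QUEUE_PROFILING_ENABLE, clGetEventProfilingInfo can report the CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END timestamps of an event as nanosecond counters. The sketch below only shows the conversion of two such timestamps to milliseconds; elapsedMilliseconds is a hypothetical helper, and the two values are assumed to have been queried already.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of profiling timestamps, given in nanoseconds,
// into the elapsed time in milliseconds.
// Assumption: startNs and endNs come from the same event.
double elapsedMilliseconds(std::uint64_t startNs, std::uint64_t endNs)
{
    assert(endNs >= startNs); // the end timestamp cannot precede the start
    return static_cast<double>(endNs - startNs) / 1.0e6;
}
```

The queue in Step 4 is created with properties set to 0, so profiling would have to be enabled there before any timestamps are available.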
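clCreateProgramWithSource() creates a program object from source code (strings). The count argument gives the number of source strings, and lengths gives the length of each string (it can be NULL when the strings are null-terminated). We also have to write our own function to read the source string from a file.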
/*Step 5: Create program object */
const char *filename = "Vadd.cl";
string sourceStr;
status = convertToString(filename, sourceStr);
const char *source = sourceStr.c_str();
size_t sourceSize[] = {strlen(source)};
cl_program program = clCreateProgramWithSource(context, 1, &source, sourceSize, NULL);
//Function that reads the source string from a file
/* convert the kernel file into a string */
int convertToString(const char *filename, std::string& s)
{
    size_t size;
    char* str;
    std::fstream f(filename, (std::fstream::in | std::fstream::binary));
    if(f.is_open())
    {
        size_t fileSize;
        f.seekg(0, std::fstream::end);
        size = fileSize = (size_t)f.tellg();
        f.seekg(0, std::fstream::beg);
        str = new char[size+1];
        if(!str)
        {
            f.close();
            return FAILURE; //report the allocation failure instead of returning success
        }
        f.read(str, fileSize);
        f.close();
        str[size] = '\0';
        s = str;
        delete[] str;
        return 0;
    }
    cout<<"Error: failed to open file\n:"<<filename<<endl;
    return FAILURE;
}
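For comparison, the same file-to-string step can be written without manual memory management. This is a minimal sketch using only the standard library; readTextFile is a hypothetical name, and it returns a bool rather than the integer status used in the article.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file at `path` into `out`.
// Returns true on success and false when the file cannot be opened.
bool readTextFile(const std::string& path, std::string& out)
{
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file.is_open()) {
        return false;
    }
    std::ostringstream contents;
    contents << file.rdbuf(); // stream the entire file into the buffer
    out = contents.str();
    return true;
}
```

The resulting string can be passed to clCreateProgramWithSource in the same way as sourceStr in Step 5, through c_str().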
Build the program
/*Step 6: Build program. */
status=clBuildProgram(program, 1,devices,NULL,NULL,NULL);
if(status != 0)
{return -1;} //clBuildProgram returns 0 (CL_SUCCESS) when the build succeeds.
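Since most steps in this walkthrough store a status without inspecting it, a small helper can make the checks less repetitive. The sketch below is one possible shape, not the author's code: checkStatus is a hypothetical name, and it relies only on the convention stated earlier that 0 means success.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throws when an OpenCL-style status code signals failure.
// Assumption: 0 (CL_SUCCESS) means success and any other value is an error.
void checkStatus(int status, const std::string& what)
{
    if (status != 0) {
        throw std::runtime_error(what + " failed with status " +
                                 std::to_string(status));
    }
}
```

A call such as checkStatus(status, "clBuildProgram") placed after each API call would then stop at the first failing step and name it.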
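Allocate memory
OpenCL memory objects hold OpenCL data. The data usually lives in device memory and can be copied in and out. There are two kinds of memory object: buffer objects and image objects.
• Buffer objects: contiguous blocks of memory, stored sequentially, that can be accessed directly through pointers, array indexing, and similar means.
• Image objects: 2-D or 3-D memory objects that a kernel can only access through the read_image and write_image families of built-in functions. Within a kernel, an image object can be readable or writable, but not both at once.
clCreateBuffer creates a buffer object in the specified context; image objects are more involved and are left for later. The flags parameter specifies the read/write properties of the buffer. host_ptr can be NULL; if it is not NULL, it normally points to valid host memory, and with the CL_MEM_COPY_HOST_PTR flag the function copies the contents of that host buffer into the new OpenCL buffer.
Before a kernel runs, the original input data on the host must be transferred explicitly to the device, and after the kernel finishes, the result must be transferred from device memory back to host memory. These two operations are done mainly with the clEnqueue{Read|Write}{Buffer|Image} functions: clEnqueueWrite* goes from host to device, and clEnqueueRead* goes from device to host. A clEnqueueWrite* command both initializes the memory object and transfers the host data to device memory. As described in the previous paragraph, a host buffer pointer can also be passed directly to clCreateBuffer to perform the write implicitly.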
/*Step 7: Create the memory objects for the kernel. buf1 and buf2 are host arrays holding BUFSIZE floats each.*/
//clbuf1 is filled implicitly through CL_MEM_COPY_HOST_PTR; clbuf2 is filled explicitly with clEnqueueWriteBuffer.
cl_mem clbuf1 = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, BUFSIZE * sizeof(cl_float), buf1, NULL);
cl_mem clbuf2 = clCreateBuffer(context, CL_MEM_READ_ONLY, BUFSIZE * sizeof(cl_float), NULL, NULL);
status = clEnqueueWriteBuffer(commandQueue, clbuf2, CL_TRUE, 0, BUFSIZE * sizeof(cl_float), buf2, 0, NULL, NULL);
//buffer receives the result of the kernel.
cl_mem buffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY, BUFSIZE * sizeof(cl_float), NULL, NULL);
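Create the kernel object
Note in particular that the name passed to clCreateKernel, here "vecadd", must match the name of the __kernel function in the program source.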
/*Step 8: Create kernel object */
cl_kernel kernel = clCreateKernel(program,"vecadd", NULL);
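Set the kernel arguments
These calls set the arguments passed to the kernel: 0, 1, 2 is the argument index in declaration order, sizeof gives the size of the argument, and the last parameter is a pointer to the argument value, here the handle of a memory object that lives on the device.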
/*Step 9: Sets Kernel arguments.*/
cl_int clnum = BUFSIZE;
status = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*) &clbuf1);
status = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*) &clbuf2);
clSetKernelArg(kernel, 2, sizeof(cl_mem), (void*) &buffer);
Run the kernel
//Run the kernel over a 1-dimensional range with BUFSIZE work-items. No work-group size is set (local_work_size is NULL), so the implementation chooses a default work-group size.
/*Step 10: Running the kernel.*/
cl_event ev;
size_t global_work_size = BUFSIZE;
status = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, &ev);
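Step 10 leaves the work-group size to the implementation. If an explicit local_work_size were passed instead, OpenCL 1.x requires the global size to be a multiple of it. A common approach, sketched here with the hypothetical helper roundUpGlobalSize, is to round the global size up; the kernel would then need a bounds check such as if (id < n) so that the padding work-items do nothing.

```cpp
#include <cassert>
#include <cstddef>

// Rounds globalSize up to the nearest multiple of localSize.
// Assumption: localSize is the explicit work-group size chosen by the caller.
std::size_t roundUpGlobalSize(std::size_t globalSize, std::size_t localSize)
{
    assert(localSize > 0); // a zero work-group size is not meaningful
    std::size_t remainder = globalSize % localSize;
    return remainder == 0 ? globalSize : globalSize + (localSize - remainder);
}
```

For example, 1000 work-items with a work-group size of 64 would be padded to 1024.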
Copy the result back to the host
/*Step 11: Read the output back to host memory.*/
//Map the result buffer into host memory; CL_TRUE makes the call block until the data is ready.
cl_float *ptr;
ptr = (cl_float *) clEnqueueMapBuffer(commandQueue, buffer, CL_TRUE, CL_MAP_READ, 0, BUFSIZE * sizeof(cl_float), 0, NULL, NULL, NULL);
Release resources
/*Step 12: Clean the resources.*/
status = clEnqueueUnmapMemObject(commandQueue, buffer, ptr, 0, NULL, NULL); //Unmap the result buffer once the host is done with ptr.
status = clReleaseKernel(kernel); //Release kernel.
status = clReleaseProgram(program); //Release the program object.
status = clReleaseMemObject(clbuf1); //Release mem objects.
status = clReleaseMemObject(clbuf2);
status = clReleaseMemObject(buffer);
status = clReleaseCommandQueue(commandQueue); //Release Command queue.
status = clReleaseContext(context); //Release context.
//cl_mem handles are released with clReleaseMemObject, never with free(); free() is only for host memory obtained from malloc.
if (devices != NULL)
{
    free(devices);
    devices = NULL;
}
Complete code
kernel
__kernel void vecadd(__global const float* A, __global const float* B, __global float* C)
{
    int id = get_global_id(0);
    C[id] = A[id] + B[id];
}
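To make the execution model concrete, the kernel can be mirrored on the host: each work-item handles one index, so a plain loop over the index range computes the same result. This is an illustrative sketch only; vecaddReference is a hypothetical name, and the loop variable stands in for get_global_id(0).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for the vecadd kernel: the loop index plays the role
// of get_global_id(0), with one iteration per work-item.
std::vector<float> vecaddReference(const std::vector<float>& a,
                                   const std::vector<float>& b)
{
    assert(a.size() == b.size()); // both inputs must have the same length
    std::vector<float> c(a.size());
    for (std::size_t id = 0; id < a.size(); ++id) {
        c[id] = a[id] + b[id];
    }
    return c;
}
```

The host program does the same thing when it fills buf with buf1[i] + buf2[i] for later verification.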
c++
#include <CL/cl.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
#define NWITEMS 6
//Read a text file into a string
int convertToString(const char *filename, std::string& s)
{
    size_t size;
    char* str;
    std::fstream f(filename, (std::fstream::in | std::fstream::binary));
    if (f.is_open())
    {
        size_t fileSize;
        f.seekg(0, std::fstream::end);
        size = fileSize = (size_t)f.tellg();
        f.seekg(0, std::fstream::beg);
        str = new char[size + 1];
        if (!str)
        {
            f.close();
            return 1; //the function returns int, so report failure with a non-zero code
        }
        f.read(str, fileSize);
        f.close();
        str[size] = '\0';
        s = str;
        delete[] str;
        return 0;
    }
    printf("Error: Failed to open file %s\n", filename);
    return 1;
}
int main(int argc, char* argv[])
{
    //Create three buffers in host memory
    float *buf1 = 0;
    float *buf2 = 0;
    float *buf = 0;
    buf1 = (float *)malloc(NWITEMS * sizeof(float));
    buf2 = (float *)malloc(NWITEMS * sizeof(float));
    buf = (float *)malloc(NWITEMS * sizeof(float));
    //Initialize buf1 and buf2 from standard input
    int i;
    for (i = 0; i < NWITEMS; i++)
        cin >> buf1[i];
    for (i = 0; i < NWITEMS; i++)
        cin >> buf2[i];
    //Compute the reference result on the CPU
    for (i = 0; i < NWITEMS; i++)
        buf[i] = buf1[i] + buf2[i];
    cl_int status; //OpenCL error codes are negative, so the status must be signed
    cl_platform_id platform;
    //Create the platform object
    status = clGetPlatformIDs(1, &platform, NULL);
    cl_device_id device;
    //Get a GPU device
    status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    //Create the context
    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    //Create the command queue
    cl_command_queue queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, NULL);
    //Create three OpenCL memory objects. buf1 is copied into clbuf1 implicitly,
    //and buf2 is copied into clbuf2 explicitly.
    cl_mem clbuf1 = clCreateBuffer(context,
        CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
        NWITEMS * sizeof(cl_float), buf1,
        NULL);
    cl_mem clbuf2 = clCreateBuffer(context,
        CL_MEM_READ_ONLY,
        NWITEMS * sizeof(cl_float), NULL,
        NULL);
    status = clEnqueueWriteBuffer(queue, clbuf2, CL_TRUE,
        0, NWITEMS * sizeof(cl_float), buf2, 0, NULL, NULL);
    cl_mem buffer = clCreateBuffer(context,
        CL_MEM_WRITE_ONLY,
        NWITEMS * sizeof(cl_float),
        NULL, NULL);
    const char * filename = "Vadd.cl";
    std::string sourceStr;
    status = convertToString(filename, sourceStr);
    const char * source = sourceStr.c_str();
    size_t sourceSize[] = { strlen(source) };
    //Create the program object
    cl_program program = clCreateProgramWithSource(context, 1, &source, sourceSize, NULL);
    //Build the program object
    status = clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    if (status != 0)
    {
        //Print the build log so that compile errors in the kernel are visible
        printf("clBuild failed:%d\n", status);
        char tbuf[0x10000];
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0x10000, tbuf, NULL);
        printf("\n%s\n", tbuf);
        //return -1;
    }
    //Create the kernel object; the name must match the __kernel function in Vadd.cl
    cl_kernel kernel = clCreateKernel(program, "vecadd", NULL);
    //Set the kernel arguments
    status = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&clbuf1);
    if (status)
        cout << status << endl;
    status = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*)&clbuf2);
    if (status)
        cout << status << endl;
    status = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void*)&buffer);
    if (status)
        cout << status << endl;
    //Run the kernel
    cl_event ev;
    size_t global_work_size = NWITEMS;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, &ev);
    //clFinish(queue);
    //Copy the data back to host memory (blocking map)
    cl_float *ptr;
    ptr = (cl_float *)clEnqueueMapBuffer(queue, buffer, CL_TRUE, CL_MAP_READ, 0,
        NWITEMS * sizeof(cl_float), 0, NULL, NULL, NULL);
    //Verify the result against the CPU computation
    for (int i = 0; i < NWITEMS; i++)
        cout << ptr[i] << endl;
    if (!memcmp(buf, ptr, NWITEMS * sizeof(float))) //memcmp takes a byte count, not an element count
        printf("Verify passed\n");
    else printf("verify failed\n");
    if (buf)
        free(buf);
    if (buf1)
        free(buf1);
    if (buf2)
        free(buf2);
    //Release the OpenCL resource objects
    clEnqueueUnmapMemObject(queue, buffer, ptr, 0, NULL, NULL);
    clReleaseEvent(ev);
    clReleaseMemObject(clbuf1);
    clReleaseMemObject(clbuf2);
    clReleaseMemObject(buffer);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);
    system("pause"); //Windows only: keeps the console window open
    return 0;
}
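A note on the verification step: memcmp takes a byte count, so comparing NWITEMS floats needs NWITEMS * sizeof(float) bytes. An alternative is to compare element by element with a tolerance, which also tolerates small rounding differences between devices in kernels more complex than an addition. The sketch below is one way to write that; nearlyEqual and its tolerance parameter are hypothetical, not part of the original program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Compares two float arrays element by element.
// Returns true when every pair differs by at most `tolerance`.
bool nearlyEqual(const float* expected, const float* actual,
                 std::size_t count, float tolerance)
{
    assert(tolerance >= 0.0f); // a negative tolerance would reject everything
    for (std::size_t i = 0; i < count; ++i) {
        if (std::fabs(expected[i] - actual[i]) > tolerance) {
            return false;
        }
    }
    return true;
}
```

In the program above, a call such as nearlyEqual(buf, ptr, NWITEMS, 1e-6f) could stand in for the memcmp test.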
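Result screenshot
[Screenshot of the program output]
Hello World
The code is as follows.
The process is written out in great detail, with every step shown. If a mere hello world is already this involved, it is hard to imagine what porting any other algorithm would take.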
/**********************************************************************
Copyright ©2014 Advanced Micro Devices, Inc. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
********************************************************************/
// For clarity,error checking has been omitted.
#include <CL/cl.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
#include <fstream>
#define SUCCESS 0
#define FAILURE 1
using namespace std;
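/* convert the kernel file into a string */
int convertToString(const char *filename, std::string& s)
{
    size_t size;
    char* str;
    std::fstream f(filename, (std::fstream::in | std::fstream::binary));
    if(f.is_open())
    {
        size_t fileSize;
        f.seekg(0, std::fstream::end);
        size = fileSize = (size_t)f.tellg();
        f.seekg(0, std::fstream::beg);
        str = new char[size+1];
        if(!str)
        {
            f.close();
            return FAILURE; //report the allocation failure instead of returning success
        }
        f.read(str, fileSize);
        f.close();
        str[size] = '\0';
        s = str;
        delete[] str;
        return 0;
    }
    cout<<"Error: failed to open file\n:"<<filename<<endl;
    return FAILURE;
}
int main(int argc, char* argv[])
{
    /*Step1: Getting platforms and choose an available one.*/
    cl_uint numPlatforms; //the NO. of platforms
    cl_platform_id platform = NULL; //the chosen platform
    cl_int status = clGetPlatformIDs(0, NULL, &numPlatforms);
    if (status != CL_SUCCESS)
    {
        cout << "Error: Getting platforms!" << endl;
        return FAILURE;
    }
    /*For clarity, choose the first available platform. */
    if(numPlatforms > 0)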
    {
        cl_platform_id* platforms = (cl_platform_id* )malloc(numPlatforms* sizeof(cl_platform_id));
        status = clGetPlatformIDs(numPlatforms, platforms, NULL);
        platform = platforms[0];
        free(platforms);
    }
    /*Step 2: Query the platform and choose the first GPU device if it has one. Otherwise use the CPU as the device.*/
    cl_uint numDevices = 0;
    cl_device_id *devices;
    status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
    if (numDevices == 0) //no GPU available.
    {
        cout << "No GPU device available." << endl;
        cout << "Choose CPU as default device." << endl;
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 0, NULL, &numDevices);
        devices = (cl_device_id*)malloc(numDevices * sizeof(cl_device_id));
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, numDevices, devices, NULL);
    }
    else
    {
        devices = (cl_device_id*)malloc(numDevices * sizeof(cl_device_id));
        status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, numDevices, devices, NULL);
    }
    /*Step 3: Create context.*/
    cl_context context = clCreateContext(NULL,1, devices,NULL,NULL,NULL);
    /*Step 4: Creating command queue associate with the context.*/
    cl_command_queue commandQueue = clCreateCommandQueue(context, devices[0], 0, NULL);
    /*Step 5: Create program object */
    const char *filename = "HelloWorld_Kernel.cl";
    string sourceStr;
    status = convertToString(filename, sourceStr);
    const char *source = sourceStr.c_str();
    size_t sourceSize[] = {strlen(source)};
    cl_program program = clCreateProgramWithSource(context, 1, &source, sourceSize, NULL);
    /*Step 6: Build program. */
    status=clBuildProgram(program, 1,devices,NULL,NULL,NULL);
    /*Step 7: Initial input,output for the host and create memory objects for the kernel*/
    const char* input = "GdkknVnqkc";
    size_t strlength = strlen(input);
    cout << "input string:" << endl;
    cout << input << endl;
    char *output = (char*) malloc(strlength + 1);
    cl_mem inputBuffer = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR, (strlength + 1) * sizeof(char),(void *) input, NULL);
    cl_mem outputBuffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY , (strlength + 1) * sizeof(char), NULL, NULL);
    /*Step 8: Create kernel object */
    cl_kernel kernel = clCreateKernel(program,"helloworld", NULL);
    /*Step 9: Sets Kernel arguments.*/
    status = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&inputBuffer);
    status = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&outputBuffer);
    /*Step 10: Running the kernel.*/
    size_t global_work_size[1] = {strlength};
    status = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL, global_work_size, NULL, 0, NULL, NULL);
    /*Step 11: Read the output back to host memory.*/
    status = clEnqueueReadBuffer(commandQueue, outputBuffer, CL_TRUE, 0, strlength * sizeof(char), output, 0, NULL, NULL);
    output[strlength] = '\0'; //Add the terminal character to the end of output.
    cout << "\noutput string:" << endl;
    cout << output << endl;
    /*Step 12: Clean the resources.*/
    status = clReleaseKernel(kernel); //Release kernel.
    status = clReleaseProgram(program); //Release the program object.
    status = clReleaseMemObject(inputBuffer); //Release mem object.
    status = clReleaseMemObject(outputBuffer);
    status = clReleaseCommandQueue(commandQueue); //Release Command queue.
    status = clReleaseContext(context); //Release context.
    if (output != NULL)
    {
        free(output);
        output = NULL;
    }
    if (devices != NULL)
    {
        free(devices);
        devices = NULL;
    }
    std::cout<<"Passed!\n";
    return SUCCESS;
}
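The kernel file HelloWorld_Kernel.cl is not shown in this article. Assuming it does what the input string suggests, namely that each work-item writes its input character shifted up by one, the effect can be reproduced on the host as follows. shiftByOne is a hypothetical name, and the add-one behaviour is an assumption about the kernel, not something the listing above confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Host-side sketch of what the helloworld kernel is assumed to do:
// every output character is the matching input character plus one.
std::string shiftByOne(const std::string& input)
{
    std::string output(input.size(), '\0');
    for (std::size_t i = 0; i < input.size(); ++i) {
        // index i corresponds to one work-item in the kernel launch
        output[i] = static_cast<char>(input[i] + 1);
    }
    return output;
}
```

Under that assumption, the input "GdkknVnqkc" used in Step 7 decodes to "HelloWorld", one character per work-item.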