Skin Smoothing (Beauty Filter) with an OpenCL Bilateral Filter
OpenCL is a framework for parallel, heterogeneous computing. Many vendors support it, including Intel, AMD and NVIDIA, although NVIDIA only supports it up to version 1.2 and mainly pushes its own CUDA instead. I have not used CUDA myself, but my impression is that CUDA is a bit nicer to work with, while OpenCL has much broader coverage: CPUs, GPUs, DSPs, FPGAs and other chips can all support OpenCL. OpenCL kernels are also quite similar to pixel shaders in D3D.
1. How the bilateral filter works. For the theory behind the bilateral filter, see Rachel-Zhang's blog post on the principle and implementation of the bilateral filter. Quoting from that post, the idea is as follows:
The bilateral filter is an edge-preserving, denoising filter. It achieves this because its weights are built from two functions: one whose coefficients are determined by the geometric (spatial) distance, and another whose coefficients are determined by the difference in pixel values. Two filters worth comparing it against are the Gaussian low-pass filter (http://en.wikipedia.org/wiki/Gaussian_filter) and the α-trimmed mean filter (which discards the smallest and largest α fraction of the pixels and averages what remains).
In a bilateral filter, the output pixel value is a weighted combination of the values of the neighbouring pixels:

$$g(i,j) = \frac{\sum_{k,l} f(k,l)\, w(i,j,k,l)}{\sum_{k,l} w(i,j,k,l)}$$

The weight w(i,j,k,l) is the product of a domain (spatial) kernel and a range kernel, so it accounts for both the spatial distance and the difference in pixel values, whereas the Gaussian filter and the α-trimmed mean filter consider only the spatial domain and only the value range, respectively.
This article implements bilateral filtering in OpenCL based on this formula to do the skin smoothing.
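With Gaussian kernels for both terms, the weight that the kernel below actually uses (computed per colour channel, with sigma_d = 3 and sigma_r = 0.1; it appears in the code as exp(dkl + rkl)) is

$$w(i,j,k,l) = \exp\!\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2} - \frac{\bigl(f(i,j) - f(k,l)\bigr)^2}{2\sigma_r^2}\right).$$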
2. Kernel function. The skin-smoothing algorithm is based on http://www.zealfilter.com/portal.php?mod=view&aid=138. Its skin-colour detection step did not work well, so I removed it. I had also planned to add a sharpening pass, but the result looks decent enough without one, so I left it out for now and will add sharpening as I learn more OpenCL.
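In formula terms, the whole smoothing step is simply "bilateral filter, then blend with the original". With α the blend factor (alpha = 0.2 in the kernel below) and g the bilateral-filtered value:

$$g'(i,j) = \alpha\, f(i,j) + (1-\alpha)\, g(i,j).$$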
const sampler_t sampler = CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

kernel void bilateralBlur(read_only image2d_t src, write_only image2d_t dst)
{
    int x = (int)get_global_id(0);
    int y = (int)get_global_id(1);
    if (x >= get_image_width(src) || y >= get_image_height(src))
        return;

    int ksize = 11;           // filter window size
    float sigma_d = 3.0f;     // spatial (domain) sigma
    float sigma_r = 0.1f;     // range sigma
    float alpha = 0.2f;       // blend factor with the original pixel

    float4 fij = read_imagef(src, sampler, (int2)(x, y));
    float4 fkl;
    float dkl;
    float4 rkl;
    float4 wkl;
    float4 numerator = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
    float4 denominator = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

    for (int K = -ksize / 2; K <= ksize / 2; K++)
    {
        for (int L = -ksize / 2; L <= ksize / 2; L++)
        {
            fkl = read_imagef(src, sampler, (int2)(x + K, y + L));
            dkl = -(K * K + L * L) / (2 * sigma_d * sigma_d);
            rkl.x = -(fij.x - fkl.x) * (fij.x - fkl.x) / (2 * sigma_r * sigma_r);
            rkl.y = -(fij.y - fkl.y) * (fij.y - fkl.y) / (2 * sigma_r * sigma_r);
            rkl.z = -(fij.z - fkl.z) * (fij.z - fkl.z) / (2 * sigma_r * sigma_r);
            wkl.x = exp(dkl + rkl.x);
            wkl.y = exp(dkl + rkl.y);
            wkl.z = exp(dkl + rkl.z);
            numerator.x += fkl.x * wkl.x;
            numerator.y += fkl.y * wkl.y;
            numerator.z += fkl.z * wkl.z;
            denominator.x += wkl.x;
            denominator.y += wkl.y;
            denominator.z += wkl.z;
        }
    }

    float4 gij = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
    if (denominator.x > 0 && denominator.y > 0 && denominator.z > 0)
    {
        gij.x = numerator.x / denominator.x;
        gij.y = numerator.y / denominator.y;
        gij.z = numerator.z / denominator.z;
        // After the bilateral filter, blend the result back with the original pixel
        gij.x = fij.x * alpha + gij.x * (1.0f - alpha);
        gij.y = fij.y * alpha + gij.y * (1.0f - alpha);
        gij.z = fij.z * alpha + gij.z * (1.0f - alpha);
    }
    write_imagef(dst, (int2)(x, y), gij);
}
The kernel is essentially just the mathematical formula written out directly, so it is very simple.
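The filter parameters (ksize, sigma_d, sigma_r, alpha) are hard-coded in the kernel. One possible refinement, which is not part of the original code, is to pass them in as kernel arguments so they can be tuned from the host without editing bilateralBlur.cl. A minimal host-side sketch, assuming the kernel signature is extended with matching parameters (the helper name SetBilateralParams is hypothetical):

#include <CL/cl.h>

// Hypothetical helper: sets the extra filter parameters, assuming the kernel
// signature has been extended to
//   kernel void bilateralBlur(read_only image2d_t src, write_only image2d_t dst,
//                             int ksize, float sigma_d, float sigma_r, float alpha)
cl_int SetBilateralParams(cl_kernel kernel, cl_int ksize, cl_float sigma_d,
                          cl_float sigma_r, cl_float alpha)
{
    cl_int err  = clSetKernelArg(kernel, 2, sizeof(cl_int),   &ksize);
    err        |= clSetKernelArg(kernel, 3, sizeof(cl_float), &sigma_d);
    err        |= clSetKernelArg(kernel, 4, sizeof(cl_float), &sigma_r);
    err        |= clSetKernelArg(kernel, 5, sizeof(cl_float), &alpha);
    return err;
}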
3. Host code. An OpenCL program is split into host code and device code. Kernels run on the parallel device; the host is better suited to serial, control-heavy logic, while the device is better suited to computation such as convolutions. On a PC the CPU usually acts as the host and the GPU as the device, although the CPU can also serve as a device, since Intel provides an OpenCL implementation for its CPUs. In this article the CPU is the host and the GPU is the device.
#include "stdafx.h" #include <iostream> #include <fstream> #include <sstream> #include <malloc.h> #include <string.h> #include <opencv2/opencv.hpp> #include <CL/cl.h> //----------獲取OpenCL平台設備信息--------- void DisplayPlatformInfo( cl_platform_id id, cl_platform_info name, std::string str) { cl_int errNum; std::size_t paramValueSize; errNum = clGetPlatformInfo( id, name, 0, NULL, ¶mValueSize); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl; return; } char * info = (char *)alloca(sizeof(char) * paramValueSize); errNum = clGetPlatformInfo( id, name, paramValueSize, info, NULL); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL platform " << str << "." << std::endl; return; } std::cout << "\t" << str << ":\t" << info << std::endl; } template<typename T> void appendBitfield(T info, T value, std::string name, std::string & str) { if (info & value) { if (str.length() > 0) { str.append(" | "); } str.append(name); } } /// // Display information for a particular device. // As different calls to clGetDeviceInfo may return // values of different types a template is used. // As some values returned are arrays of values, a templated class is // used so it can be specialized for this case, see below. // template <typename T> class InfoDevice { public: static void display( cl_device_id id, cl_device_info name, std::string str) { cl_int errNum; std::size_t paramValueSize; errNum = clGetDeviceInfo( id, name, 0, NULL, ¶mValueSize); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl; return; } T * info = (T *)alloca(sizeof(T) * paramValueSize); errNum = clGetDeviceInfo( id, name, paramValueSize, info, NULL); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL device info " << str << "." 
<< std::endl; return; } // Handle a few special cases switch (name) { case CL_DEVICE_TYPE: { std::string deviceType; appendBitfield<cl_device_type>( *(reinterpret_cast<cl_device_type*>(info)), CL_DEVICE_TYPE_CPU, "CL_DEVICE_TYPE_CPU", deviceType); appendBitfield<cl_device_type>( *(reinterpret_cast<cl_device_type*>(info)), CL_DEVICE_TYPE_GPU, "CL_DEVICE_TYPE_GPU", deviceType); appendBitfield<cl_device_type>( *(reinterpret_cast<cl_device_type*>(info)), CL_DEVICE_TYPE_ACCELERATOR, "CL_DEVICE_TYPE_ACCELERATOR", deviceType); appendBitfield<cl_device_type>( *(reinterpret_cast<cl_device_type*>(info)), CL_DEVICE_TYPE_DEFAULT, "CL_DEVICE_TYPE_DEFAULT", deviceType); std::cout << "\t\t" << str << ":\t" << deviceType << std::endl; } break; case CL_DEVICE_SINGLE_FP_CONFIG: { std::string fpType; appendBitfield<cl_device_fp_config>( *(reinterpret_cast<cl_device_fp_config*>(info)), CL_FP_DENORM, "CL_FP_DENORM", fpType); appendBitfield<cl_device_fp_config>( *(reinterpret_cast<cl_device_fp_config*>(info)), CL_FP_INF_NAN, "CL_FP_INF_NAN", fpType); appendBitfield<cl_device_fp_config>( *(reinterpret_cast<cl_device_fp_config*>(info)), CL_FP_ROUND_TO_NEAREST, "CL_FP_ROUND_TO_NEAREST", fpType); appendBitfield<cl_device_fp_config>( *(reinterpret_cast<cl_device_fp_config*>(info)), CL_FP_ROUND_TO_ZERO, "CL_FP_ROUND_TO_ZERO", fpType); appendBitfield<cl_device_fp_config>( *(reinterpret_cast<cl_device_fp_config*>(info)), CL_FP_ROUND_TO_INF, "CL_FP_ROUND_TO_INF", fpType); appendBitfield<cl_device_fp_config>( *(reinterpret_cast<cl_device_fp_config*>(info)), CL_FP_FMA, "CL_FP_FMA", fpType); #ifdef CL_FP_SOFT_FLOAT appendBitfield<cl_device_fp_config>( *(reinterpret_cast<cl_device_fp_config*>(info)), CL_FP_SOFT_FLOAT, "CL_FP_SOFT_FLOAT", fpType); #endif std::cout << "\t\t" << str << ":\t" << fpType << std::endl; } case CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: { std::string memType; appendBitfield<cl_device_mem_cache_type>( *(reinterpret_cast<cl_device_mem_cache_type*>(info)), CL_NONE, "CL_NONE", memType); appendBitfield<cl_device_mem_cache_type>( *(reinterpret_cast<cl_device_mem_cache_type*>(info)), CL_READ_ONLY_CACHE, "CL_READ_ONLY_CACHE", memType); appendBitfield<cl_device_mem_cache_type>( *(reinterpret_cast<cl_device_mem_cache_type*>(info)), CL_READ_WRITE_CACHE, "CL_READ_WRITE_CACHE", memType); std::cout << "\t\t" << str << ":\t" << memType << std::endl; } break; case CL_DEVICE_LOCAL_MEM_TYPE: { std::string memType; appendBitfield<cl_device_local_mem_type>( *(reinterpret_cast<cl_device_local_mem_type*>(info)), CL_GLOBAL, "CL_LOCAL", memType); appendBitfield<cl_device_local_mem_type>( *(reinterpret_cast<cl_device_local_mem_type*>(info)), CL_GLOBAL, "CL_GLOBAL", memType); std::cout << "\t\t" << str << ":\t" << memType << std::endl; } break; case CL_DEVICE_EXECUTION_CAPABILITIES: { std::string memType; appendBitfield<cl_device_exec_capabilities>( *(reinterpret_cast<cl_device_exec_capabilities*>(info)), CL_EXEC_KERNEL, "CL_EXEC_KERNEL", memType); appendBitfield<cl_device_exec_capabilities>( *(reinterpret_cast<cl_device_exec_capabilities*>(info)), CL_EXEC_NATIVE_KERNEL, "CL_EXEC_NATIVE_KERNEL", memType); std::cout << "\t\t" << str << ":\t" << memType << std::endl; } break; case CL_DEVICE_QUEUE_PROPERTIES: { std::string memType; appendBitfield<cl_device_exec_capabilities>( *(reinterpret_cast<cl_device_exec_capabilities*>(info)), CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE", memType); appendBitfield<cl_device_exec_capabilities>( *(reinterpret_cast<cl_device_exec_capabilities*>(info)), 
CL_QUEUE_PROFILING_ENABLE, "CL_QUEUE_PROFILING_ENABLE", memType); std::cout << "\t\t" << str << ":\t" << memType << std::endl; } break; default: std::cout << "\t\t" << str << ":\t" << *info << std::endl; break; } } }; /// // Simple trait class used to wrap base types. // template <typename T> class ArrayType { public: static bool isChar() { return false; } }; /// // Specialized for the char (i.e. null terminated string case). // template<> class ArrayType<char> { public: static bool isChar() { return true; } }; /// // Specialized instance of class InfoDevice for array types. // template <typename T> class InfoDevice<ArrayType<T> > { public: static void display( cl_device_id id, cl_device_info name, std::string str) { cl_int errNum; std::size_t paramValueSize; errNum = clGetDeviceInfo( id, name, 0, NULL, ¶mValueSize); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl; return; } T * info = (T *)alloca(sizeof(T) * paramValueSize); errNum = clGetDeviceInfo( id, name, paramValueSize, info, NULL); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL device info " << str << "." << std::endl; return; } if (ArrayType<T>::isChar()) { std::cout << "\t" << str << ":\t" << info << std::endl; } else if (name == CL_DEVICE_MAX_WORK_ITEM_SIZES) { cl_uint maxWorkItemDimensions; errNum = clGetDeviceInfo( id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, sizeof(cl_uint), &maxWorkItemDimensions, NULL); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL device info " << "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS." << std::endl; return; } std::cout << "\t" << str << ":\t"; for (cl_uint i = 0; i < maxWorkItemDimensions; i++) { std::cout << info[i] << " "; } std::cout << std::endl; } } }; /// // Enumerate platforms and display information about them // and their associated devices. // void displayInfo(void) { cl_int errNum; cl_uint numPlatforms; cl_platform_id * platformIds; cl_context context = NULL; // First, query the total number of platforms errNum = clGetPlatformIDs(0, NULL, &numPlatforms); if (errNum != CL_SUCCESS || numPlatforms <= 0) { std::cerr << "Failed to find any OpenCL platform." << std::endl; return; } // Next, allocate memory for the installed plaforms, and qeury // to get the list. platformIds = (cl_platform_id *)alloca(sizeof(cl_platform_id) * numPlatforms); // First, query the total number of platforms errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find any OpenCL platforms." << std::endl; return; } std::cout << "Number of platforms: \t" << numPlatforms << std::endl; // Iterate through the list of platforms displaying associated information for (cl_uint i = 0; i < numPlatforms; i++) { // First we display information associated with the platform DisplayPlatformInfo( platformIds[i], CL_PLATFORM_PROFILE, "CL_PLATFORM_PROFILE"); DisplayPlatformInfo( platformIds[i], CL_PLATFORM_VERSION, "CL_PLATFORM_VERSION"); DisplayPlatformInfo( platformIds[i], CL_PLATFORM_VENDOR, "CL_PLATFORM_VENDOR"); DisplayPlatformInfo( platformIds[i], CL_PLATFORM_EXTENSIONS, "CL_PLATFORM_EXTENSIONS"); // Now query the set of devices associated with the platform cl_uint numDevices; errNum = clGetDeviceIDs( platformIds[i], CL_DEVICE_TYPE_ALL, 0, NULL, &numDevices); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL devices." 
<< std::endl; return; } cl_device_id * devices = (cl_device_id *)alloca(sizeof(cl_device_id) * numDevices); errNum = clGetDeviceIDs( platformIds[i], CL_DEVICE_TYPE_ALL, numDevices, devices, NULL); if (errNum != CL_SUCCESS) { std::cerr << "Failed to find OpenCL devices." << std::endl; return; } std::cout << "\tNumber of devices: \t" << numDevices << std::endl; // Iterate through each device, displaying associated information for (cl_uint j = 0; j < numDevices; j++) { InfoDevice<cl_device_type>::display( devices[j], CL_DEVICE_TYPE, "CL_DEVICE_TYPE"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_VENDOR_ID, "CL_DEVICE_VENDOR_ID"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MAX_COMPUTE_UNITS, "CL_DEVICE_MAX_COMPUTE_UNITS"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS"); InfoDevice<ArrayType<size_t> >::display( devices[j], CL_DEVICE_MAX_WORK_ITEM_SIZES, "CL_DEVICE_MAX_WORK_ITEM_SIZES"); InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, "CL_DEVICE_MAX_WORK_GROUP_SIZE"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, "CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, "CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, "CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, "CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, "CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, "CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE"); #ifdef CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, "CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, "CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, "CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, "CL_DEVICE_NATIVE_VECTOR_WIDTH_INT"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, "CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, "CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, "CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, "CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF"); #endif InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MAX_CLOCK_FREQUENCY, "CL_DEVICE_MAX_CLOCK_FREQUENCY"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_ADDRESS_BITS, "CL_DEVICE_ADDRESS_BITS"); InfoDevice<cl_ulong>::display( devices[j], CL_DEVICE_MAX_MEM_ALLOC_SIZE, "CL_DEVICE_MAX_MEM_ALLOC_SIZE"); InfoDevice<cl_bool>::display( devices[j], CL_DEVICE_IMAGE_SUPPORT, "CL_DEVICE_IMAGE_SUPPORT"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MAX_READ_IMAGE_ARGS, "CL_DEVICE_MAX_READ_IMAGE_ARGS"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MAX_WRITE_IMAGE_ARGS, "CL_DEVICE_MAX_WRITE_IMAGE_ARGS"); InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_IMAGE2D_MAX_WIDTH, 
"CL_DEVICE_IMAGE2D_MAX_WIDTH"); InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_IMAGE2D_MAX_WIDTH, "CL_DEVICE_IMAGE2D_MAX_WIDTH"); InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_IMAGE2D_MAX_HEIGHT, "CL_DEVICE_IMAGE2D_MAX_HEIGHT"); InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_IMAGE3D_MAX_WIDTH, "CL_DEVICE_IMAGE3D_MAX_WIDTH"); InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_IMAGE3D_MAX_HEIGHT, "CL_DEVICE_IMAGE3D_MAX_HEIGHT"); InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_IMAGE3D_MAX_DEPTH, "CL_DEVICE_IMAGE3D_MAX_DEPTH"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MAX_SAMPLERS, "CL_DEVICE_MAX_SAMPLERS"); InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_MAX_PARAMETER_SIZE, "CL_DEVICE_MAX_PARAMETER_SIZE"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MEM_BASE_ADDR_ALIGN, "CL_DEVICE_MEM_BASE_ADDR_ALIGN"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE, "CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE"); InfoDevice<cl_device_fp_config>::display( devices[j], CL_DEVICE_SINGLE_FP_CONFIG, "CL_DEVICE_SINGLE_FP_CONFIG"); InfoDevice<cl_device_mem_cache_type>::display( devices[j], CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, "CL_DEVICE_GLOBAL_MEM_CACHE_TYPE"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE"); InfoDevice<cl_ulong>::display( devices[j], CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, "CL_DEVICE_GLOBAL_MEM_CACHE_SIZE"); InfoDevice<cl_ulong>::display( devices[j], CL_DEVICE_GLOBAL_MEM_SIZE, "CL_DEVICE_GLOBAL_MEM_SIZE"); InfoDevice<cl_ulong>::display( devices[j], CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, "CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE"); InfoDevice<cl_uint>::display( devices[j], CL_DEVICE_MAX_CONSTANT_ARGS, "CL_DEVICE_MAX_CONSTANT_ARGS"); InfoDevice<cl_device_local_mem_type>::display( devices[j], CL_DEVICE_LOCAL_MEM_TYPE, "CL_DEVICE_LOCAL_MEM_TYPE"); InfoDevice<cl_ulong>::display( devices[j], CL_DEVICE_LOCAL_MEM_SIZE, "CL_DEVICE_LOCAL_MEM_SIZE"); InfoDevice<cl_bool>::display( devices[j], CL_DEVICE_ERROR_CORRECTION_SUPPORT, "CL_DEVICE_ERROR_CORRECTION_SUPPORT"); #ifdef CL_DEVICE_HOST_UNIFIED_MEMORY InfoDevice<cl_bool>::display( devices[j], CL_DEVICE_HOST_UNIFIED_MEMORY, "CL_DEVICE_HOST_UNIFIED_MEMORY"); #endif InfoDevice<std::size_t>::display( devices[j], CL_DEVICE_PROFILING_TIMER_RESOLUTION, "CL_DEVICE_PROFILING_TIMER_RESOLUTION"); InfoDevice<cl_bool>::display( devices[j], CL_DEVICE_ENDIAN_LITTLE, "CL_DEVICE_ENDIAN_LITTLE"); InfoDevice<cl_bool>::display( devices[j], CL_DEVICE_AVAILABLE, "CL_DEVICE_AVAILABLE"); InfoDevice<cl_bool>::display( devices[j], CL_DEVICE_COMPILER_AVAILABLE, "CL_DEVICE_COMPILER_AVAILABLE"); InfoDevice<cl_device_exec_capabilities>::display( devices[j], CL_DEVICE_EXECUTION_CAPABILITIES, "CL_DEVICE_EXECUTION_CAPABILITIES"); InfoDevice<cl_command_queue_properties>::display( devices[j], CL_DEVICE_QUEUE_PROPERTIES, "CL_DEVICE_QUEUE_PROPERTIES"); InfoDevice<cl_platform_id>::display( devices[j], CL_DEVICE_PLATFORM, "CL_DEVICE_PLATFORM"); InfoDevice<ArrayType<char> >::display( devices[j], CL_DEVICE_NAME, "CL_DEVICE_NAME"); InfoDevice<ArrayType<char> >::display( devices[j], CL_DEVICE_VENDOR, "CL_DEVICE_VENDOR"); InfoDevice<ArrayType<char> >::display( devices[j], CL_DRIVER_VERSION, "CL_DRIVER_VERSION"); InfoDevice<ArrayType<char> >::display( devices[j], CL_DEVICE_PROFILE, "CL_DEVICE_PROFILE"); InfoDevice<ArrayType<char> >::display( devices[j], CL_DEVICE_VERSION, "CL_DEVICE_VERSION"); #ifdef CL_DEVICE_OPENCL_C_VERSION InfoDevice<ArrayType<char> >::display( 
            InfoDevice<ArrayType<char> >::display(devices[j], CL_DEVICE_EXTENSIONS, "CL_DEVICE_EXTENSIONS");
            std::cout << std::endl << std::endl;
        }
    }
}
//----------- The code above queries and displays OpenCL device information -----------

cl_program CreateProgram(cl_context context, cl_device_id device, const char* fileName)
{
    cl_int errNum;
    cl_program program;

    std::ifstream kernelFile(fileName, std::ios::in);
    if (!kernelFile.is_open())
    {
        std::cerr << "Failed to open file for reading: " << fileName << std::endl;
        return NULL;
    }
    std::ostringstream oss;
    oss << kernelFile.rdbuf();
    std::string srcStdStr = oss.str();
    const char *srcStr = srcStdStr.c_str();
    program = clCreateProgramWithSource(context, 1, (const char**)&srcStr, NULL, NULL);
    if (program == NULL)
    {
        std::cerr << "Failed to create CL program from source." << std::endl;
        return NULL;
    }
    errNum = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    if (errNum != CL_SUCCESS)
    {
        // Determine the reason for the error
        char buildLog[16384];
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, sizeof(buildLog), buildLog, NULL);
        std::cerr << "Error in kernel: " << std::endl;
        std::cerr << buildLog;
        clReleaseProgram(program);
        return NULL;
    }
    return program;
}

void Cleanup(cl_context context, cl_command_queue commandQueue, cl_program program, cl_kernel kernel, cl_mem imageObjects[2])
{
    for (int i = 0; i < 2; i++)
    {
        if (imageObjects[i] != 0)
            clReleaseMemObject(imageObjects[i]);
    }
    if (commandQueue != 0)
        clReleaseCommandQueue(commandQueue);
    if (kernel != 0)
        clReleaseKernel(kernel);
    if (program != 0)
        clReleaseProgram(program);
    if (context != 0)
        clReleaseContext(context);
}

cl_mem LoadImage(cl_context context, const char *fileName, int &width, int &height)
{
    cv::Mat image1 = cv::imread(fileName);
    if (image1.empty())
    {
        std::cerr << "Failed to read image: " << fileName << std::endl;
        return 0;
    }
    width = image1.cols;
    height = image1.rows;

    // Repack the BGR image into an RGBA byte buffer (bottom-up, matching the read-back loop in main)
    char *buffer = new char[width * height * 4];
    int w = 0;
    for (int v = height - 1; v >= 0; v--)
    {
        for (int u = 0; u < width; u++)
        {
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[0];
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[1];
            buffer[w++] = image1.at<cv::Vec3b>(v, u)[2];
            buffer[w++] = (char)255;  // alpha channel (unused by the kernel)
        }
    }

    // Create OpenCL image
    cl_image_format clImageFormat;
    clImageFormat.image_channel_order = CL_RGBA;
    clImageFormat.image_channel_data_type = CL_UNORM_INT8;

    cl_int errNum;
    cl_mem clImage;
    clImage = clCreateImage2D(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, &clImageFormat, width, height, 0, buffer, &errNum);
    delete[] buffer;
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error creating CL image object" << std::endl;
        return 0;
    }
    return clImage;
}

size_t RoundUp(int groupSize, int globalSize)
{
    int r = globalSize % groupSize;
    if (r == 0)
    {
        return globalSize;
    }
    else
    {
        return globalSize + groupSize - r;
    }
}

int main(int argc, char** argv)
{
    cl_context context = 0;
    cl_command_queue commandQueue = 0;
    cl_program program = 0;
    cl_device_id device = 0;
    cl_kernel kernel = 0;
    cl_mem imageObjects[2] = { 0, 0 };
    cl_int errNum;

    // Print information about every OpenCL platform and device
    displayInfo();

    cl_uint numplatforms;
    errNum = clGetPlatformIDs(0, NULL, &numplatforms);
    if (errNum != CL_SUCCESS || numplatforms <= 0)
    {
        printf("No OpenCL platform found\n");
        return 1;
    }
    cl_platform_id * platformIds;
    platformIds = (cl_platform_id*)alloca(sizeof(cl_platform_id) * numplatforms);
    errNum = clGetPlatformIDs(numplatforms, platformIds, NULL);
    if (errNum != CL_SUCCESS)
    {
        printf("No OpenCL platform found\n");
        return 1;
    }
    printf("Number of platforms: %d \n", numplatforms);

    // Pick the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE
    cl_uint numDevices, index_platform = 0;
    cl_device_id *devicesIds;
    cl_device_id selectedDevice = 0;
    std::size_t paramValueSize = 0;
    for (cl_uint i = 0; i < numplatforms; i++)
    {
        errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
        if (errNum != CL_SUCCESS || numDevices <= 0)
        {
            printf("No GPU device found on platform %d\n", i);
            continue;
        }
        devicesIds = (cl_device_id*)alloca(sizeof(cl_device_id) * numDevices);
        errNum = clGetDeviceIDs(platformIds[i], CL_DEVICE_TYPE_GPU, numDevices, devicesIds, NULL);
        if (errNum != CL_SUCCESS)
        {
            printf("Failed to get device IDs on platform %d\n", i);
            continue;
        }
        for (cl_uint j = 0; j < numDevices; j++)
        {
            std::size_t tmpSize = 0;
            errNum = clGetDeviceInfo(devicesIds[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &tmpSize, NULL);
            if (errNum != CL_SUCCESS)
            {
                std::cerr << "Failed to find OpenCL device info " << std::endl;
                continue;
            }
            if (tmpSize >= paramValueSize)
            {
                paramValueSize = tmpSize;
                index_platform = i;
                selectedDevice = devicesIds[j];
            }
        }
    }
    device = selectedDevice;

    cl_context_properties contextProperties[] =
    {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platformIds[index_platform],
        0
    };
    context = clCreateContext(contextProperties, 1, &device, NULL, NULL, &errNum);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Failed to Create Context " << std::endl;
        return 1;
    }

    // Create a command-queue on the selected device, with profiling enabled
    commandQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &errNum);
    if (commandQueue == NULL)
    {
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }

    // Make sure the device supports images, otherwise exit
    cl_bool imageSupport = CL_FALSE;
    clGetDeviceInfo(device, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool), &imageSupport, NULL);
    if (imageSupport != CL_TRUE)
    {
        std::cerr << "OpenCL device does not support images." << std::endl;
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }

    // Load input image from file and load it into an OpenCL image object
    int width, height;
    const char *src0 = "test.png";
    imageObjects[0] = LoadImage(context, src0, width, height);
    if (imageObjects[0] == 0)
    {
        std::cerr << "Error loading: " << std::string(src0) << std::endl;
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }

    // Create output image object
    cl_image_format clImageFormat;
    clImageFormat.image_channel_order = CL_RGBA;
    clImageFormat.image_channel_data_type = CL_UNORM_INT8;
    imageObjects[1] = clCreateImage2D(context, CL_MEM_WRITE_ONLY, &clImageFormat, width, height, 0, NULL, &errNum);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error creating CL output image object." << std::endl;
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }

    // Create OpenCL program
    program = CreateProgram(context, device, "bilateralBlur.cl");
    if (program == NULL)
    {
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }

    // Create OpenCL kernel
    kernel = clCreateKernel(program, "bilateralBlur", NULL);
    if (kernel == NULL)
    {
        std::cerr << "Failed to create kernel" << std::endl;
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }

    // Set the kernel arguments
    errNum = clSetKernelArg(kernel, 0, sizeof(cl_mem), &imageObjects[0]);
    errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &imageObjects[1]);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error setting kernel arguments." << std::endl;
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }

    size_t localWorkSize[2] = { 32, 32 };
    size_t globalWorkSize[2] = { RoundUp(localWorkSize[0], width), RoundUp(localWorkSize[1], height) };
    cl_event prof_event;

    // Queue the kernel up for execution
    errNum = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL, globalWorkSize, localWorkSize, 0, NULL, &prof_event);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error queuing kernel for execution." << std::endl;
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }
    clFinish(commandQueue);

    errNum = clWaitForEvents(1, &prof_event);
    if (errNum)
    {
        printf("clWaitForEvents() failed for bilateralBlur kernel. (%d)\n", errNum);
        return EXIT_FAILURE;
    }
    cl_ulong ev_start_time = (cl_ulong)0;
    cl_ulong ev_end_time = (cl_ulong)0;
    size_t return_bytes;
    errNum = clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &ev_start_time, &return_bytes);
    errNum |= clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &ev_end_time, &return_bytes);
    if (errNum)
    {
        printf("clGetEventProfilingInfo() failed for kernel. (%d)\n", errNum);
        return EXIT_FAILURE;
    }
    double run_time = (double)(ev_end_time - ev_start_time);
    printf("Image dimensions: %d x %d pixels, Image type = CL_RGBA, CL_UNORM_INT8\n", width, height);
    printf("Work Timer:%lfms\n", run_time / 1000000);
    clReleaseEvent(prof_event);

    // Read the output buffer back to the Host
    char *buffer = new char[width * height * 4];
    size_t origin[3] = { 0, 0, 0 };
    size_t region[3] = { (size_t)width, (size_t)height, 1 };
    errNum = clEnqueueReadImage(commandQueue, imageObjects[1], CL_TRUE, origin, region, 0, 0, buffer, 0, NULL, NULL);
    if (errNum != CL_SUCCESS)
    {
        std::cerr << "Error reading result buffer." << std::endl;
        Cleanup(context, commandQueue, program, kernel, imageObjects);
        system("pause");
        return 1;
    }
    std::cout << std::endl;
    std::cout << "Executed program successfully." << std::endl;

    // Save the image out to disk
    const char *saveImage = "output.jpg";
    cv::Mat imageColor = cv::imread(src0);
    cv::Mat imageColor2;
    imageColor2.create(imageColor.rows, imageColor.cols, imageColor.type());
    int w = 0;
    for (int v = imageColor2.rows - 1; v >= 0; v--)
    {
        for (int u = 0; u < imageColor2.cols; u++)
        {
            imageColor2.at<cv::Vec3b>(v, u)[0] = buffer[w++];
            imageColor2.at<cv::Vec3b>(v, u)[1] = buffer[w++];
            imageColor2.at<cv::Vec3b>(v, u)[2] = buffer[w++];
            w++;
        }
    }
    cv::imshow("Original", imageColor);
    cv::imshow("Smoothed", imageColor2);
    cv::imwrite(saveImage, imageColor2);
    cv::waitKey(0);

    delete[] buffer;
    Cleanup(context, commandQueue, program, kernel, imageObjects);
    return 0;
}
The host program uses a little OpenCV, mainly to read the image; any other way of loading images would work just as well. OpenCV itself actually has an ocl module, apparently an OpenCL extension contributed by AMD, which provides OpenCL implementations of many common OpenCV functions, including bilateral filtering and adaptive bilateral filtering.
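As a quick sanity check against an existing implementation, OpenCV's built-in CPU bilateral filter plus the same alpha blend gives a rough reference result. This is only a sketch, not a parameter-for-parameter match: cv::bilateralFilter works on 0-255 pixel values, so the sigmaColor value below is a guess rather than a calibrated equivalent of the kernel's sigma_r = 0.1.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("test.png");   // same input image as the OpenCL program
    if (src.empty())
        return 1;
    cv::Mat blurred, smoothed;
    // d = 11 matches ksize in the kernel; sigmaSpace = 3 matches sigma_d;
    // sigmaColor = 25 is only a rough guess for sigma_r = 0.1 on normalized values.
    cv::bilateralFilter(src, blurred, 11, 25.0, 3.0);
    // Same blend as the kernel: 0.2 * original + 0.8 * filtered.
    cv::addWeighted(src, 0.2, blurred, 0.8, 0.0, smoothed);
    cv::imwrite("output_opencv.jpg", smoothed);
    return 0;
}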
This program picks the GPU with the largest CL_DEVICE_MAX_WORK_GROUP_SIZE. Choosing the best OpenCL device really ought to weigh several factors together: by that criterion alone, the CPU on my machine would look like the best OpenCL device, and many of the parameters it reports beat the GPU's, yet its actual run time on this kernel is several times longer than the GPU's. So I am not sure which parameters best indicate that a device is the fastest choice; pointers from readers who know would be very welcome.
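For what it is worth, one rough heuristic (purely my own assumption, not an established rule) is to prefer GPUs and rank devices by compute units times clock frequency. Even that is shaky, because compute units are not comparable across vendors, so in practice benchmarking the actual kernel on each candidate device is probably the most reliable way to choose.

#include <CL/cl.h>

// Crude device score: compute units * clock frequency, with a GPU bias.
// The weighting is arbitrary; treat it as a starting point only.
static double DeviceScore(cl_device_id dev)
{
    cl_device_type type = 0;
    cl_uint units = 0, mhz = 0;
    clGetDeviceInfo(dev, CL_DEVICE_TYPE, sizeof(type), &type, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(units), &units, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(mhz), &mhz, NULL);
    double score = (double)units * (double)mhz;
    if (type & CL_DEVICE_TYPE_GPU)
        score *= 4.0;   // arbitrary bias toward GPUs
    return score;
}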
Apart from that, the program is actually very simple: the genuinely useful code is only 300-odd lines. The device-information code is only there so you can see which OpenCL devices your machine has and what their parameters are; the displayInfo() call in main can simply be commented out.
As for obtaining the OpenCL headers and libraries, they are available from Intel, NVIDIA, AMD and others. The headers and .lib file I used were copied out of NVIDIA's CUDA toolkit; you are welcome to just use mine.
4. Results (1) Hardware information
(2) OpenCL device information printed to the console
Number of platforms: 2
CL_PLATFORM_PROFILE: FULL_PROFILE
CL_PLATFORM_VERSION: OpenCL 2.0
CL_PLATFORM_VENDOR: Intel(R) Corporation
CL_PLATFORM_EXTENSIONS: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Number of devices: 2
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_VENDOR_ID: 32902
CL_DEVICE_MAX_COMPUTE_UNITS: 24
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 256 256 256
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 0
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 1
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1050
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 390280806
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 128
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 16384
CL_DEVICE_IMAGE3D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 16384
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 16
CL_DEVICE_MAX_PARAMETER_SIZE: 1024
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF
CL_DEVICE_SINGLE_FP_CONFIG: CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 524288
CL_DEVICE_GLOBAL_MEM_SIZE: 1561123226
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 8
CL_DEVICE_LOCAL_MEM_TYPE:
CL_DEVICE_LOCAL_MEM_SIZE: 65536
CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
CL_DEVICE_HOST_UNIFIED_MEMORY: 1
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 83
CL_DEVICE_ENDIAN_LITTLE: 1
CL_DEVICE_AVAILABLE: 1
CL_DEVICE_COMPILER_AVAILABLE: 1
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_PLATFORM: 00DEC488
CL_DEVICE_NAME: Intel(R) HD Graphics 520
CL_DEVICE_VENDOR: Intel(R) Corporation
CL_DRIVER_VERSION: 20.19.15.4364
CL_DEVICE_PROFILE: FULL_PROFILE
CL_DEVICE_VERSION: OpenCL 2.0
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 2.0
CL_DEVICE_EXTENSIONS: cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_ctz cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing cl_intel_motion_estimation cl_intel_simultaneous_sharing cl_intel_subgroups cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_fp16 cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_khr_spir
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_VENDOR_ID: 32902
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 8192 8192 8192
CL_DEVICE_MAX_WORK_GROUP_SIZE: 8192
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 32
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 16
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 8
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 8
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 0
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2500
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 536838144
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 480
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 480
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 16384
CL_DEVICE_IMAGE3D_MAX_WIDTH: 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 2048
CL_DEVICE_MAX_SAMPLERS: 480
CL_DEVICE_MAX_PARAMETER_SIZE: 3840
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST
CL_DEVICE_SINGLE_FP_CONFIG: CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 262144
CL_DEVICE_GLOBAL_MEM_SIZE: 2147352576
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 131072
CL_DEVICE_MAX_CONSTANT_ARGS: 480
CL_DEVICE_LOCAL_MEM_TYPE: CL_LOCAL | CL_GLOBAL
CL_DEVICE_LOCAL_MEM_SIZE: 32768
CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
CL_DEVICE_HOST_UNIFIED_MEMORY: 1
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 395
CL_DEVICE_ENDIAN_LITTLE: 1
CL_DEVICE_AVAILABLE: 1
CL_DEVICE_COMPILER_AVAILABLE: 1
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL | CL_EXEC_NATIVE_KERNEL
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_PLATFORM: 00DEC488
CL_DEVICE_NAME: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
CL_DEVICE_VENDOR: Intel(R) Corporation
CL_DRIVER_VERSION: 5.2.0.10094
CL_DEVICE_PROFILE: FULL_PROFILE
CL_DEVICE_VERSION: OpenCL 2.0 (Build 10094)
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 2.0
CL_DEVICE_EXTENSIONS: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer
CL_PLATFORM_PROFILE: FULL_PROFILE
CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 8.0.44
CL_PLATFORM_VENDOR: NVIDIA Corporation
CL_PLATFORM_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
Number of devices: 1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_VENDOR_ID: 4318
CL_DEVICE_MAX_COMPUTE_UNITS: 3
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 1024 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF: 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 0
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1241
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 536870912
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_WIDTH: 16384
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 16384
CL_DEVICE_IMAGE3D_MAX_WIDTH: 4096
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 4096
CL_DEVICE_IMAGE3D_MAX_DEPTH: 4096
CL_DEVICE_MAX_SAMPLERS: 32
CL_DEVICE_MAX_PARAMETER_SIZE: 4352
CL_DEVICE_MEM_BASE_ADDR_ALIGN: 4096
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE: 128
CL_DEVICE_SINGLE_FP_CONFIG: CL_FP_DENORM | CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST | CL_FP_ROUND_TO_ZERO | CL_FP_ROUND_TO_INF | CL_FP_FMA
CL_DEVICE_SINGLE_FP_CONFIG: CL_READ_ONLY_CACHE | CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: CL_READ_WRITE_CACHE
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 128
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 49152
CL_DEVICE_GLOBAL_MEM_SIZE: 2147483648
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536
CL_DEVICE_MAX_CONSTANT_ARGS: 9
CL_DEVICE_LOCAL_MEM_TYPE:
CL_DEVICE_LOCAL_MEM_SIZE: 49152
CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
CL_DEVICE_HOST_UNIFIED_MEMORY: 0
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1000
CL_DEVICE_ENDIAN_LITTLE: 1
CL_DEVICE_AVAILABLE: 1
CL_DEVICE_COMPILER_AVAILABLE: 1
CL_DEVICE_EXECUTION_CAPABILITIES: CL_EXEC_KERNEL
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_PLATFORM: 00E30580
CL_DEVICE_NAME: GeForce 940MX
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 369.30
CL_DEVICE_PROFILE: FULL_PROFILE
CL_DEVICE_VERSION: OpenCL 1.2 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
CL_DEVICE_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts
Number of platforms: 2
Image dimensions: 273 x 415 pixels, Image type = CL_RGBA, CL_UNORM_INT8
Work Timer:3.422816ms
Executed program successfully.
A 273×415 image takes less than 4 ms.
(3) Bilateral filtering results
The effect is quite obvious. But without the skin-colour detection, the final sharpening pass, and proper parameter tuning, even my friend said the smoothing looks too soft and fake, so on the algorithm side this still needs work.
In terms of speed, there should also still be room to optimize this implementation.
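One concrete direction (a sketch, not something the posted code does): the spatial term exp(-(K*K + L*L)/(2*sigma_d*sigma_d)) does not depend on the pixel values, so it can be computed once on the host and handed to the kernel, for example through a __constant buffer, instead of being folded into the exp() call for every pixel. Caching the image tile in local memory is another common option.

#include <cmath>
#include <vector>

// Precompute the ksize x ksize table of spatial weights on the host.
// The kernel would then multiply table[(K + r) * ksize + (L + r)] by the
// range term instead of recomputing the spatial term per pixel.
std::vector<float> MakeSpatialWeights(int ksize, float sigma_d)
{
    std::vector<float> w(ksize * ksize);
    int r = ksize / 2;
    for (int K = -r; K <= r; ++K)
        for (int L = -r; L <= r; ++L)
            w[(K + r) * ksize + (L + r)] =
                std::exp(-(K * K + L * L) / (2.0f * sigma_d * sigma_d));
    return w;
}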
Source code: http://download.csdn.net/download/qq_33892166/9761287
If the source code fails with "Error queuing kernel for execution.", try changing size_t localWorkSize[2] = { 32, 32 }; to size_t localWorkSize[2] = { 16, 16 };.
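That error usually means the requested work-group size (32 × 32 = 1024 work items) is larger than the device or kernel allows; the Intel GPU above, for example, reports CL_DEVICE_MAX_WORK_GROUP_SIZE = 256. A more robust option than hard-coding 16 × 16 is to query the limit for this kernel on the chosen device and shrink the local size until it fits (a sketch, assuming the kernel and device variables from the listing; globalWorkSize must then be recomputed with RoundUp):

// Query the largest work-group size this kernel supports on this device
// and halve the 2-D local size until it fits.
size_t maxWorkGroupSize = 0;
clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                         sizeof(size_t), &maxWorkGroupSize, NULL);
size_t localWorkSize[2] = { 32, 32 };
while (localWorkSize[0] * localWorkSize[1] > maxWorkGroupSize)
{
    localWorkSize[0] /= 2;
    if (localWorkSize[0] * localWorkSize[1] <= maxWorkGroupSize)
        break;
    localWorkSize[1] /= 2;
}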