Fast Canny algorithm based on GPU + CPU

TANG Bin; LONG Wen

doi:10.3788/yjyxs20163107.0714

Chinese Journal of Liquid Crystals and Displays, 2016, 31 (7): 714, published online: 2016-08-29

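TANG Bin ¹, LONG Wen ²

Affiliations

¹ School of Information, Guizhou University of Finance and Economics, Guiyang, Guizhou 550025

² Guizhou Key Laboratory of Economic System Simulation, Guizhou University of Finance and Economics, Guiyang, Guizhou 550025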

Keywords: acceleration; Canny; CUDA; GPU

Abstract

This paper presents a fast implementation of the Canny operator based on GPU + CPU. The operator is first divided into a parallel part and a serial part. Gaussian filtering, gradient magnitude and direction computation, non-maximum suppression, and double thresholding are performed on the GPU. The two-dimensional Gaussian filter is decomposed into two one-dimensional passes, one horizontal and one vertical, to reduce computational complexity. CUDA is then used to run the computation across many threads in parallel, and shared memory is used to hide the latency of global memory accesses. On the CPU, a FIFO queue is used for edge linking. Simulation results show that an 8-bit image with a resolution of 1 024×1 024 is processed in 122 ms, and that the speedup over a CPU-only implementation reaches up to 5.39 times. The method therefore takes full advantage of the parallelism of the GPU and the serial processing capability of the CPU.
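The CPU-side edge linking mentioned in the abstract can be illustrated with a minimal sketch. It is plain Python, not the authors' code, and it assumes details the abstract does not state: 8-connected neighbours, thresholds applied directly to the non-maximum-suppressed gradient magnitude, and the hypothetical names `link_edges`, `magnitude`, `low`, and `high`.

```python
from collections import deque


def link_edges(magnitude, low, high):
    """Hysteresis edge linking driven by a FIFO queue (assumed design).

    magnitude: 2-D list of gradient magnitudes after non-maximum suppression.
    low, high: the two thresholds (illustrative values chosen by the caller).
    Returns a 2-D list of 0/1 flags marking the final edge pixels.
    """
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()

    # Seed the queue with every strong pixel (magnitude >= high).
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                edges[r][c] = 1
                queue.append((r, c))

    # Pop pixels in first-in, first-out order and grow each edge into
    # 8-connected neighbours that are at least weak (magnitude >= low).
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc]
                        and magnitude[nr][nc] >= low):
                    edges[nr][nc] = 1
                    queue.append((nr, nc))
    return edges
```

Each pixel enters the queue at most once, so the traversal is linear in the number of pixels. The work is also inherently sequential, since each step depends on pixels accepted earlier, which is consistent with the abstract's choice to leave this stage on the CPU.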

TANG Bin, LONG Wen. Fast Canny algorithm based on GPU + CPU[J]. Chinese Journal of Liquid Crystals and Displays, 2016, 31(7): 714.
