Chinese Optics Letters, 2018, 16 (1): 013501, Published Online: Jul. 17, 2018   

Fusion of the low-light-level visible and infrared images for night-vision context enhancement

Author Affiliations
School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China
Abstract
For better night-vision applications using low-light-level visible and infrared imaging, a fusion framework for night-vision context enhancement (FNCE) is proposed. An adaptive brightness stretching method is first proposed for enhancing the visible image. Then, a hybrid multi-scale decomposition with edge-preserving filtering is proposed to decompose the source images. Finally, the fused result is obtained by combining the decomposed images under three different rules. Experimental results demonstrate that the FNCE method performs better in terms of details (edges), contrast, sharpness, and human visual perception. Therefore, better results for night-vision context enhancement can be achieved.

In night vision, low-light-level visible images always provide the details and background scenery, while the target is often detected/recognized via infrared imaging[1,2]. As visible and infrared image fusion can improve the perception of the scene in addition to the ability to detect/recognize the target[3], fusion of the low-light-level visible and infrared images plays a significant role in night vision and has been successfully applied in the areas of defense and security[4].

However, night-vision images usually have relatively strong noise, low contrast, and unclear details (including edges), and human eyes are very sensitive to details and noise. As these factors have not been considered in most existing fusion methods, it is difficult for them to achieve good results in night vision. Thus, an appropriate fusion technique is required to obtain better night-vision context enhancement.

Liu et al. proposed a modified method to fuse the visible and infrared images for night vision[5]. In that method, the visible image is enhanced via the corresponding infrared image, and the fused result is obtained using a conventional multi-scale fusion method. However, the details of the visible image are not fully enhanced, and salient targets in the infrared image are displayed as dark pixels, which is not good for visual perception. A fusion method for low-light visible and infrared images based on the contourlet transform was proposed in Ref. [6], in which different rules are used to combine the low-frequency and high-frequency information; the details of the visible image are not fully enhanced either. Zhou et al. proposed a guided-filter-based context enhancement (GFCE) fusion method for night vision[7]. In the results of the GFCE method, the noise is amplified along with the detail enhancement, and some distortions may emerge in the bright regions due to over-enhancement. In all of these methods, neither a denoising step nor a detail-enhancement step is used, and the details (including edges) cannot be preserved well enough during the fusion process. Thus, further research is needed to obtain better context-enhancement results for low-light-level visible and infrared imaging.

In order to address the above problems for better night-vision applications, a low-light-level visible and infrared image fusion framework for night-vision context enhancement (FNCE) is proposed in this Letter, as shown in Fig. 1. The FNCE method can be divided into two parts: the initial enhancement and the fusion process. In the initial enhancement, an adaptive brightness stretching method is first proposed to enhance the visibility of the low-light-level visible image, and denoising and detail-enhancement methods are applied to the source images. In the fusion process, because a multi-scale decomposition (MSD) based on edge-preserving filtering can accurately extract details at different scales[8], and the gradient domain guided image filtering (GDGF)[9] has better edge performance than the guided image filtering (GF)[10], a hybrid MSD structure with the GF and the GDGF is proposed to fully decompose the enhanced source images. In addition, multi-scale weight maps are obtained using a perception-based saliency detection technique at each scale. Finally, the fused result is obtained by combining the decomposed images with the multi-scale weight maps under three different combination rules according to scale.

Fig. 1. The proposed infrared and visible image FNCE.


The “Queen’s Road” source images, as shown in Fig. 2(a), are collected from the website http://www.imagefusion.org/. The “Buildings” source images, as shown in Fig. 2(b), are captured by a low-light-sensitive CMOS camera and a mid-wave infrared camera. The two test pairs are typical scenes of urban surveillance applications. As shown in Fig. 2, the source images usually exhibit unclear details as well as some noise. Moreover, the contrast of the low-light-level visible image is always low. Thus, as shown in Fig. 1, some enhancement methods must be applied to the source images before the fusion process.

Fig. 2. Test pairs of visible and infrared images captured under low-light-level conditions. (a) “Queen’s Road,” (b) “Buildings.”


First, a denoising method based on the edge-preserving GDGF[9] is applied to both source images to reduce the noise. The filtering parameters of the GDGF are r=2, λ=0.0001 for the low-light-level visible image and r=3, λ=0.001 for the infrared image.
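The GDGF itself is not reproduced in this Letter; as a rough stand-in, the following sketch applies a plain guided filter[10] with the image as its own guidance, which is the usual way this family of filters is used for edge-preserving denoising. The helper name guided_filter, the use of scipy.ndimage.uniform_filter for the box means, and the assumption that intensities are normalized to [0, 1] (so that the λ values above can be passed as eps) are illustrative choices, not the authors' implementation.

```python
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, r, eps):
    """Plain guided image filter (He et al.[10]) with a (2r+1)x(2r+1) window.
    Used here only as a stand-in for the GDGF, which additionally applies
    gradient-domain edge weighting."""
    size = 2 * r + 1
    mean_I  = uniform_filter(guide, size)
    mean_p  = uniform_filter(src, size)
    corr_Ip = uniform_filter(guide * src, size)
    corr_II = uniform_filter(guide * guide, size)
    var_I   = corr_II - mean_I * mean_I
    cov_Ip  = corr_Ip - mean_I * mean_p
    a = cov_Ip / (var_I + eps)          # per-pixel linear coefficients
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * guide + uniform_filter(b, size)

def denoise(img, r, eps):
    """Self-guided edge-preserving smoothing, as in the initial denoising step."""
    return guided_filter(img, img, r, eps)

# vis_dn = denoise(vis, r=2, eps=1e-4)   # low-light-level visible image
# ir_dn  = denoise(ir,  r=3, eps=1e-3)   # infrared image
```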

Following this, an adaptive brightness stretching method is proposed for enhancing the visibility of the low-light-level visible image:
$$I_{BS}=\begin{cases}\dfrac{m}{\mu_I}\,I, & \text{if } I<\mu_I,\\[4pt] I, & \text{if } I\ge H,\\[4pt] \dfrac{H-m}{H-\mu_I}\,(I-\mu_I)+m, & \text{if } \mu_I\le I<H,\end{cases}$$
where $I$ is the input image, $I_{BS}$ is the stretched image, $\mu_I$ is the mean of the input image, and $(\mu_I, m)$ and $(H, H)$ are the two inflection points of the piecewise-linear stretching. As shown in Fig. 3, values smaller than $\mu_I$ are linearly stretched up to $m$, values larger than $H$ are retained, and the rest are linearly mapped between $m$ and $H$. The parameter values in our work are $m=3\mu_I$ with $90<m<150$, and $H=220$. The mean $\mu_I$ is thus stretched to three times its original value, while $m$ is kept between 90 and 150, the range in which the mean of an image with normal illumination typically lies. Therefore, the values of the low-light-level visible image can be effectively enhanced to an appropriate range, without insufficient or over-enhancement, via the proposed adaptive brightness stretching method.

Fig. 3. Adaptive brightness stretching method.

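A minimal NumPy sketch of the stretching above, assuming an 8-bit grayscale image; reading "90 < m < 150" as clamping m = 3μ_I into that interval is an interpretation on my part.

```python
import numpy as np

def adaptive_brightness_stretch(img, H=220.0):
    """Piecewise-linear stretching with inflection points (mu_I, m) and (H, H)."""
    I = img.astype(np.float32)
    mu = float(I.mean())
    m = float(np.clip(3.0 * mu, 90.0, 150.0))   # m = 3*mu_I, kept within (90, 150)
    out = np.empty_like(I)
    low, high = I < mu, I >= H
    mid = ~(low | high)
    out[low]  = (m / mu) * I[low]                        # dark pixels stretched up to m
    out[high] = I[high]                                  # already-bright pixels retained
    out[mid]  = (H - m) / (H - mu) * (I[mid] - mu) + m   # map [mu_I, H) onto [m, H)
    return np.clip(out, 0, 255).astype(img.dtype)
```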

Finally, a detail enhancement method with the GF[10] is applied to both source images. The GF parameters are r=2, λ=0.5 with the detail layer boosted 2.5 times for the low-light-level visible image, and r=3, λ=0.5 with the detail layer boosted 3 times for the infrared image.
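Detail boosting with the GF follows the usual base/detail split: filter the image, treat the residual as the detail layer, and add it back amplified. The sketch below reuses the guided_filter helper from the denoising sketch; passing the λ values directly as eps assumes the same intensity normalization as before and is not necessarily the authors' exact implementation.

```python
def enhance_details(img, r, eps, boost):
    """Base/detail decomposition with the GF; the detail layer is amplified."""
    base = guided_filter(img, img, r, eps)   # guided_filter as defined above
    detail = img - base
    return base + boost * detail

# vis_en = enhance_details(vis_dn, r=2, eps=0.5, boost=2.5)  # visible image
# ir_en  = enhance_details(ir_dn,  r=3, eps=0.5, boost=3.0)  # infrared image
```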

The initial enhancement results for the “Queen’s Road” visible image are shown in Fig. 4. Close-up views for the labeled regions are presented. It can be seen that more and clearer details are presented with less noise using our enhancing method. Thus, the proposed enhancing method for the low-light-level visible image is more effective.

Fig. 4. Initial enhancement results for the “Queen’s Road” visible image. (a) The original, (b) result with Zhou’s method[7], and (c) result with the proposed method.


In order to make full use of the information at different scales, a hybrid MSD with the GF and the GDGF is proposed to decompose both source images. The structure of the proposed hybrid MSD is designed as shown in Fig. 5. The GDGF is used to obtain the details of the image (including edges). As an adequate amount of low-frequency information is difficult to obtain via the GDGF, the low-frequency information is obtained using the strong GF. As shown in Fig. 5, there are three levels in the decomposition: the small-scale detail level, the large-scale detail level, and the base level. As the finest details of an image lie at the first scale of the hybrid MSD, the detail images from the first scale are regarded as the images of the small-scale detail level. The detail images from the second to the nth scale are regarded as the images of the large-scale detail level. The coarsest-scale information is obtained for the base level and roughly represents the energy distribution.

Fig. 5. Structure of the hybrid MSD with the GF and the GDGF.


In the proposed hybrid MSD structure, the texture information $D^{(i,0)}$ and the edge information $D^{(i,1)}$ at the $i$th scale are computed as
$$D^{(i,0)}=I_s^{(i-1)}-I_g^{(i)},\qquad D^{(i,1)}=I_g^{(i)}-I_s^{(i)},\qquad i=1,\dots,n,$$
where $I_s^{(i)}$ and $I_g^{(i)}$ are the filtered images at the $i$th scale with the GF and the GDGF, respectively, both $I_s^{(0)}$ and $I_g^{(0)}$ are the input image $I$, and $n$ is the number of decomposition scales. $I_s^{(i)}$ and $I_g^{(i)}$ are obtained as
$$I_s^{(i)}=\mathrm{GF}_{r_s^{(i)},\lambda_s}\big(I_s^{(i-1)},I_s^{(i-1)}\big),\qquad I_g^{(i)}=\mathrm{GDGF}_{r_g^{(i)},\lambda_g^{(i)}}\big(I_g^{(i-1)},I_g^{(i-1)}\big),\qquad i=1,\dots,n,$$
where $r_s^{(i)}$ and $\lambda_s$ are the filtering parameters of the GF at the $i$th scale, $r_s^{(i+1)}=k\,r_s^{(i)}$, $k$ is the decomposition factor between adjacent scales, and $\lambda_s$ is set to be very large ($1\times10^{4}$) to acquire low-frequency information; $r_g^{(i)}$ and $\lambda_g^{(i)}$ are the filtering parameters of the GDGF at the $i$th scale, with $r_g^{(i+1)}=k\,r_g^{(i)}$ and $\lambda_g^{(i+1)}=\lambda_g^{(i)}/k$. Furthermore, as shown in Fig. 5, the filtered image of the GF at the $n$th scale, $I_s^{(n)}$, is taken as the base image $B$.
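The decomposition loop can be sketched as below, following the scale schedule above (radii multiplied by k, λ_g divided by k between scales). The plain guided_filter from the earlier sketch stands in for both the GF and the GDGF, which is an assumption; only the arrangement of the two filtering chains and the difference images follows the equations.

```python
def hybrid_msd(img, n=4, k=2, rs1=3, lam_s=1e4, rg1=2, lam_g1=0.05):
    """Hybrid MSD: returns the detail pairs (D(i,0), D(i,1)) for i = 1..n
    and the base image B = Is(n)."""
    Is_prev, Ig_prev = img, img              # Is(0) = Ig(0) = I
    rs, rg, lam_g = rs1, rg1, lam_g1
    details = []
    for i in range(1, n + 1):
        Is_cur = guided_filter(Is_prev, Is_prev, rs, lam_s)  # strong GF chain (low frequency)
        Ig_cur = guided_filter(Ig_prev, Ig_prev, rg, lam_g)  # GDGF stand-in chain (edges)
        details.append((Is_prev - Ig_cur,                    # D(i,0): texture information
                        Ig_cur - Is_cur))                    # D(i,1): edge information
        Is_prev, Ig_prev = Is_cur, Ig_cur
        rs, rg, lam_g = k * rs, k * rg, lam_g / k            # scale schedule
    return details, Is_prev                                  # Is_prev is now Is(n) = B
```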

In the fusion process, three different combination rules are respectively designed for the three different levels. The frequency-tuned filtering computes bottom-up saliency[11]. The output of the saliency model strongly correlates with human visual perception[12]. For better weight maps, the frequency-tuned filtering is used to extract the saliency information from the background at each scale. As the targets are always more significant in the infrared images, the weight maps are mainly based on the infrared information, and the infrared characteristic information of the target is maximally highlighted at each scale.

The fused image $\mathrm{Fuse}$ is obtained via weighted combinations of the decomposed images as
$$\mathrm{Fuse}=F_B+\sum_{i=1}^{n}F_D^{(i)},$$
where $F_B$ is the fused base image and $F_D^{(i)}$ is the fused detail image at the $i$th scale.

For the small-scale detail level, the fused image $F_D^{(1)}$ is obtained as
$$F_D^{(1)}=\sum_{j=0}^{1}\Big[W_{IR}^{(1,j)}D_{IR}^{(1,j)}+\big(1-W_{IR}^{(1,j)}\big)D_{Vis}^{(1,j)}\Big],$$
where $D_{IR}^{(1,j)}$ and $D_{Vis}^{(1,j)}$ are the infrared and visible detail images at the first scale, and $W_{IR}^{(1,j)}$ are the weight maps of the infrared images for the small-scale detail level.

The saliency maps of the infrared and visible images for the small-scale detail level, $S_{IR}^{(1,j)}$ and $S_{Vis}^{(1,j)}$ ($j=0,1$), are obtained by applying the frequency-tuned filtering to $D_{IR}^{(1,j)}$ and $D_{Vis}^{(1,j)}$, respectively. Following this, the binary weight maps of the infrared images $BW_{IR}^{(1,j)}$ are computed as
$$BW_{IR}^{(1,j)}=\begin{cases}1, & \text{if } S_{IR}^{(1,j)}\ge S_{Vis}^{(1,j)},\\ 0, & \text{otherwise},\end{cases}\qquad j=0,1.$$

The resulting binary weight maps are noisy and typically not well aligned with object boundaries. Therefore, spatial consistency is restored through the GDGF, with the corresponding detail images $D_{IR}^{(1,j)}$ used as guidance images. Finally, the weight maps of the infrared images for the small-scale detail level, $W_{IR}^{(1,j)}$, are obtained as
$$W_{IR}^{(1,j)}=\mathrm{GDGF}_{r^{(1)},\lambda^{(1)}}\big(D_{IR}^{(1,j)},BW_{IR}^{(1,j)}\big),\qquad j=0,1,$$
where the filtering parameters are $r^{(1)}=r_g^{(1)}$ and $\lambda^{(1)}=\lambda_g^{(1)}/10$.
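Strung together, the small-scale rule looks roughly like the sketch below: a frequency-tuned saliency map for each detail image, the binary comparison, guided-filter refinement (standing in for the GDGF), and the weighted sum. Writing the frequency-tuned saliency[11] of a single-channel detail image as |mean − Gaussian-blurred| with σ = 1 is my adaptation of the published color-image formulation, and guided_filter is the helper from the earlier sketches.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ft_saliency(d, sigma=1.0):
    """Frequency-tuned saliency for a single-channel detail image (assumed form)."""
    return np.abs(d.mean() - gaussian_filter(d, sigma=sigma))

def fuse_small_scale(d_ir, d_vis, r1, lam1):
    """d_ir, d_vis: [D(1,0), D(1,1)] infrared/visible detail images at scale 1."""
    fused = np.zeros_like(d_ir[0])
    for j in (0, 1):
        s_ir, s_vis = ft_saliency(d_ir[j]), ft_saliency(d_vis[j])
        bw = (s_ir >= s_vis).astype(np.float32)           # binary weight map BW_IR(1,j)
        w = guided_filter(d_ir[j], bw, r1, lam1 / 10.0)   # restore spatial consistency
        fused += w * d_ir[j] + (1.0 - w) * d_vis[j]
    return fused
```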

For the large-scale detail level, the combination rule is similar to that for the small-scale detail level. The fused image for the large-scale detail level $F_D^{(i)}$ is obtained as
$$F_D^{(i)}=\sum_{j=0}^{1}\Big[W_{IR}^{(i)}D_{IR}^{(i,j)}+\big(1-W_{IR}^{(i)}\big)D_{Vis}^{(i,j)}\Big],\qquad i=2,\dots,n,$$
where $D_{IR}^{(i,j)}$ and $D_{Vis}^{(i,j)}$ are the infrared and visible detail images at the $i$th scale, and $W_{IR}^{(i)}$ is the weight map of the infrared images at the corresponding scale.

It should be noted that there is only one weight map $W_{IR}^{(i)}$ for the images at the $i$th scale of the large-scale detail level. This is because the two kinds of detail images, $D^{(i,0)}$ and $D^{(i,1)}$ ($i=2,\dots,n$), always have similar structures, while the edge detail image $D^{(i,1)}$ has better edge performance than the texture detail image $D^{(i,0)}$. Thus, only $D^{(i,1)}$ is used to derive the weight maps for the large-scale detail level.

The saliency maps of the infrared and visible images at the $i$th scale for the large-scale detail level, $S_{IR}^{(i,1)}$ and $S_{Vis}^{(i,1)}$ ($i=2,\dots,n$), are also obtained by applying the frequency-tuned filtering to the corresponding detail images $D_{IR}^{(i,1)}$ and $D_{Vis}^{(i,1)}$. Following this, the binary weight maps of the infrared images $BW_{IR}^{(i)}$ ($i=2,\dots,n$) are computed as
$$BW_{IR}^{(i)}=\begin{cases}1, & \text{if } S_{IR}^{(i,1)}\ge S_{Vis}^{(i,1)},\\ 0, & \text{otherwise},\end{cases}\qquad i=2,\dots,n.$$

The binary weight maps $BW_{IR}^{(i)}$ ($i=2,\dots,n$) are also filtered using the GDGF, with the corresponding detail images $D_{IR}^{(i,1)}$ as guidance images. Finally, the weight maps of the infrared images for the large-scale detail level, $W_{IR}^{(i)}$, are obtained as
$$W_{IR}^{(i)}=\mathrm{GDGF}_{r^{(i)},\lambda^{(i)}}\big(D_{IR}^{(i,1)},BW_{IR}^{(i)}\big),\qquad i=2,\dots,n,$$
where $r^{(i)}=r_g^{(i)}$ and $\lambda^{(i)}=\lambda_g^{(i)}/10$ for the GDGF.
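For completeness, a sketch of the analogous large-scale rule, under my reading that the single weight map per scale is derived from the edge detail image D(i,1) and then applied to both detail components; the helpers are the ones assumed in the previous sketches.

```python
def fuse_large_scale(d_ir, d_vis, r_i, lam_i):
    """d_ir, d_vis: [D(i,0), D(i,1)] infrared/visible detail images at scale i >= 2."""
    s_ir, s_vis = ft_saliency(d_ir[1]), ft_saliency(d_vis[1])   # saliency from edge details only
    bw = (s_ir >= s_vis).astype(np.float32)                     # binary weight map BW_IR(i)
    w = guided_filter(d_ir[1], bw, r_i, lam_i / 10.0)           # one weight map per scale
    return sum(w * d_ir[j] + (1.0 - w) * d_vis[j] for j in (0, 1))
```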

For the base level, the fused image $F_B$ is computed as
$$F_B=W_{IR}^{B}\,B_{IR}+\big(1-W_{IR}^{B}\big)B_{Vis},$$
where $B_{IR}$ and $B_{Vis}$ are the base images of the infrared and visible images, respectively, and $W_{IR}^{B}$ is the weight map of the infrared image for the base level.

The saliency maps of the infrared and visible images for the base level, $S_{IR}^{B}$ and $S_{Vis}^{B}$, are obtained via the frequency-tuned filtering on the corresponding base images. Then, the binary weight map of the infrared image $BW_{IR}^{B}$ is computed as
$$BW_{IR}^{B}=\begin{cases}1, & \text{if } S_{IR}^{B}\ge S_{Vis}^{B},\\ 0, & \text{otherwise}.\end{cases}$$

The binary weight map $BW_{IR}^{B}$ is smoothed using a Gaussian filter to suit the combination of the extremely coarse-scale information. Finally, the weight map for the base level, $W_{IR}^{B}$, is obtained as
$$W_{IR}^{B}=g_{\sigma_b}\big(BW_{IR}^{B}\big),$$
where $\sigma_b=2\,r_s^{(n)}$ for the Gaussian filter $g(\cdot)$.
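The base-level rule is short enough to show directly; ft_saliency and the normalization assumptions are carried over from the previous sketches, and σ_b = 2·r_s^(n) follows the text above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_base(b_ir, b_vis, rs_n):
    """Base-level combination with a Gaussian-smoothed binary weight map."""
    s_ir, s_vis = ft_saliency(b_ir), ft_saliency(b_vis)
    bw = (s_ir >= s_vis).astype(np.float32)          # binary weight map BW_IR^B
    w = gaussian_filter(bw, sigma=2.0 * rs_n)        # sigma_b = 2 * rs^(n)
    return w * b_ir + (1.0 - w) * b_vis
```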

In order to test the proposed FNCE method, three state-of-the-art fusion methods are selected for comparison: the guided filtering fusion (GFF) method[13], the gradient transfer fusion (GTF) method[14], and the GFCE method[7]. All of the comparison methods are implemented using their publicly available codes, with the parameters set according to the corresponding publications. In the FNCE method, the number of decomposition scales is $n=4$, the decomposition factor is $k=2$, the initial values of the GF are $r_s^{(1)}=3$ and $\lambda_s=1\times10^{4}$, and the initial values of the GDGF are $r_g^{(1)}=2$ and $\lambda_g^{(1)}=0.05$.

It can be seen from the fusion results of the test images in Fig. 6 that the results of the FNCE method have clearer details (including edges), more salient targets, better contrast, and less noise than those of the other methods. Close-up views of the labeled regions are presented below the images. The results of the GFF method contain little detail information from the visible image and are similar to the infrared image, with unclear details, as shown in Fig. 6(a); moreover, the clouds from the infrared image are nearly lost in the “Buildings” result. For the GTF method, as shown in Fig. 6(b), although the results have the least noise, the details are not clear enough, and much information from the visible image, such as the lights, is lost. For the GFCE method, as shown in Fig. 6(c), the bright parts (for example, the labeled building with lights) are obviously over-enhanced, and the noise in the sky is obvious in the “Buildings” result; the results contain obvious noise, the details (edges) are not clear enough, and some distortions may occur due to the over-enhancement. For the FNCE method, as shown in Fig. 6(d), the road sign indicated by the red arrow is the clearest and free of distortions in the “Queen’s Road” result. Therefore, the proposed FNCE method is able to achieve better results for human visual perception in night vision.

Fig. 6. Fusion results of different methods for the test images.


Information entropy (IE), average gradient (AG), the gradient-based fusion metric (QG)[15], the metric based on perceptual saliency (PS)[7], and the fusion metric based on visual information fidelity (VIFF)[16] are selected for the objective assessment. IE evaluates the amount of information contained in an image. AG indicates the degree of sharpness. QG, which is recommended for night-vision applications[17], evaluates the amount of edge information transferred from the source images. PS measures the saliency of the perceptual information contained in an image. VIFF evaluates the quality of the fused image in terms of human visual perception. Table 1 gives the quantitative assessments of the different fusion methods, averaged over the four test image pairs, with the best results highlighted in bold. It can be seen from Table 1 that IE, AG, PS, and VIFF all achieve their best values with the FNCE method, which means the proposed FNCE method extracts more information and provides better sharpness, more saliency information, and better human visual perception. In addition, the QG value of the FNCE method ranks second, which means that edges are also relatively well preserved by the FNCE method.

Table 1. Quantitative Assessments of Different Methods

Method   IE       AG       QG       PS        VIFF
GFF      6.6093   0.0100   0.6132   16.4800   0.4278
GTF      6.3275   0.0061   0.2847   13.7586   0.2205
GFCE     6.8106   0.0152   0.3612   19.5798   0.5648
FNCE     6.9786   0.0187   0.6029   20.6976   0.6580


The average running times of the different methods on 640×480 source images are shown in Table 2. All of the compared methods are implemented in MATLAB on a computer with an Intel i5 3.40 GHz CPU and 4 GB RAM.

Table 2. Average Running Time on 640 × 480 Images

Method     GFF     GTF     GFCE    FNCE
Time (s)   0.526   6.833   1.605   2.069


From the experimental comparisons, it can be seen that better human visual perception is achieved by the FNCE method, with more salient targets, better detail (edge) performance, better contrast, better sharpness, and less noise. The proposed FNCE method is therefore more effective and will help to obtain better context enhancement for night-vision imaging. Although the FNCE method is somewhat time-consuming, this is acceptable considering the better fused results.

In conclusion, an FNCE method is proposed. First, an adaptive brightness stretching method is proposed to enhance the visibility of the low-light-level visible image. Following this, a structure of the hybrid MSD with the GF and the GDGF is proposed for fully decomposing the enhanced source images. In addition, weight maps are obtained via a perception-based saliency detection technology at each scale.

Experimental results show that better results for night-vision context enhancement can be acquired via the proposed FNCE method. In the future, the idea of the fast GF[18] may be introduced into the simplifications of the FNCE method for practical applications. Moreover, the previous frame video image may be used as the guidance image of the current frame to reduce the delay.

Jin Zhu, Weiqi Jin, Li Li, Zhenghao Han, Xia Wang. Fusion of the low-light-level visible and infrared images for night-vision context enhancement[J]. Chinese Optics Letters, 2018, 16(1): 013501.
