1. Introduction
The change detection task involves analyzing and comparing images acquired at different times over the same location, identifying the changed and unchanged areas, and ultimately representing the result as a binary image. Because synthetic aperture radar (SAR) offers all-day, all-weather coverage and strong penetration [1,2], it is widely used in land cover mapping, disaster monitoring and assessment, urban planning, and related applications [3,4]. Many SAR satellites can now acquire large volumes of multi-temporal SAR images. Owing to the high-resolution imaging technology of SAR systems [5,6], SAR images are characterized by high resolution, frequent repeat coverage, and wide coverage. It is therefore important and meaningful to develop new intelligent remote sensing image processing algorithms that can analyze SAR imagery simply, quickly, and accurately.
Due to the special imaging mechanism of SAR, SAR images inevitably contain speckle noise, and overcoming its influence is very challenging. A large number of experts and scholars have made contributions in this regard [7,8,9,10]. At the same time, excessive denoising leads to the loss of image details, so striking a balance between denoising and detail preservation is a problem worth studying. For unsupervised change detection methods, denoising is particularly important. Unsupervised SAR image change detection is usually divided into the following steps: image preprocessing, difference image generation, and difference image analysis. Image preprocessing primarily involves image registration and geometric correction [11]. The quality of the generated difference image directly affects the subsequent difference image analysis, so a large number of experts and scholars have been committed to studying better difference image generation methods [12,13,14]. The difference image contains many potential change areas, and the purpose of difference image analysis is to identify the areas that are most likely to have changed and those that are most likely unchanged. Since the noise model of SAR images is multiplicative, the logarithmic ratio (LR) operator [15] converts multiplicative noise into additive noise, thus preserving the background and emphasizing the changed areas. The mean ratio operator [16] takes the average gray value of each pixel's neighborhood, which suppresses speckle noise well in general but performs poorly in areas with severe noise. Gong et al. designed a local similarity measurement to judge the similarity of pixel patches in [17]. Multi-difference image fusion obtains different information from different difference images and uses their complementary information to form a better image, which provides reliable information for the subsequent difference image analysis. Zheng et al. performed parameter weighting on different difference images in [18]. Peng et al. proposed a visual saliency difference image generation method [19], which represents the difference by multi-dimensional difference features and fuses the difference images of three different algorithms for feature fusion. These fused difference images usually show good denoising performance. However, combining different difference images in the spatial domain usually represents only the image intensity information and often cannot represent image texture and edges well. Therefore, many experts and scholars later studied fusing difference images in the frequency domain. Hou et al. used the discrete wavelet transform to fuse complementary information in the high and low frequency bands of different difference images in [20], and used the contourlet transform to threshold the high-frequency band coefficients of the fused difference image for denoising. Ma et al. proposed a wavelet-based fused difference image [21], which combines the advantages of the mean ratio operator and the LR operator. Zhang et al. proposed using the contourlet transform to fuse different difference images in [22]. The contourlet transform can effectively capture multi-scale features and provides good directionality, so it can better represent image features with edge and curve structures. However, these methods do not divide and fuse the frequency-domain information at a sufficiently fine level.
Difference image analysis classifies pixels into two categories in order to delineate the changed areas. Thresholding and clustering are the most popular unsupervised approaches. The threshold method selects a specific threshold to divide the image into several meaningful areas; selecting the optimal threshold requires building a model to fit the conditional distributions of changed and unchanged pixels [23,24]. In contrast, clustering algorithms are more flexible, have better generalization ability, do not require model building, and are more robust to noise. Fuzzy clustering is widely used for image segmentation because fuzzifying the membership matrix preserves more pixel information. Fuzzy c-means clustering (FCM) [25] is the most classic clustering algorithm, but it is sensitive to noise because it ignores contextual information. Subsequently, many experts and scholars have proposed variants of FCM. Krinidis et al. incorporated local information into FCM in [26], making the resulting fuzzy local information c-means (FLICM) clustering algorithm more robust. An improved FLICM algorithm was later proposed in [27], which enlarged the neighborhood size, reduced the interference of isolated noise on FLICM, and made the calculation of the membership matrix more accurate. Many experts and scholars have further improved the FCM algorithm for change detection in [28,29]. The FLICM algorithm performs well in suppressing speckle noise, and its framework is well suited to SAR image change detection. Building on FLICM, we add local direction weights and spatial weights to the fuzzy factor, which allows the image to be segmented effectively when the changed area is large and complex; supplementing FLICM with local similar-patch information and spatially weighted information improves its denoising performance.
Deep learning is becoming increasingly popular and efficient, and trained deep learning models have been widely used in difference image analysis. These models usually rely on unsupervised methods to extract labeled samples, and the trained models mainly serve as classifiers to identify potential changed pixels. Cui et al. proposed a hierarchical clustering method for pixel pre-classification in [30] and then trained a network to perform binary classification on the uncertain pixels. Gao et al. proposed PCANet to analyze SAR images in [31]. The convolutional neural network (CNN) is widely used in the vision field, and many CNN-based variants have been designed for SAR image change detection. Gao et al. combined wavelets and convolutional neural networks in [32]. Zhang et al. performed a threshold analysis on the difference image in [33] by minimizing the histogram fitting error and then fed the difference image into a CNN. Zhang et al. combined CNN and Transformer in both series and parallel modes [34], enhancing the connection between features. Recently, Chen et al. proposed an efficient spatial-frequency network in [35], which extracts noise-robust features in both the spatial and frequency domains and significantly improves computational efficiency. Dong et al. proposed a dynamic bilinear fusion network (DBFNet) [36] to capture the dependency between spatio-temporal features. Xie et al. proposed a wavelet-based bi-dimensional aggregation network (WBANet) [37], which introduces wavelets into the self-attention block. Deep neural networks are now commonly used in various fields [38,39]. Gong et al. and Zhao et al. proposed end-to-end deep neural networks that directly classify the original images in [40,41] without generating difference images. Dong et al. designed a shearlet denoising layer to enhance feature extraction and proposed a deep shearlet convolutional neural network in [42]. Zhang et al. designed deep convolutional generative adversarial networks that add pseudo labels to address the sample difference problem in [43]. These network models usually achieve excellent performance, but their complex architectures and the randomness of sample selection remain important considerations.
Until now, deep learning has made significant progress in SAR image change detection, but sample uncertainty and complex architectures remain difficult problems. In this paper, we focus on generating difference images with more detailed information while suppressing noise in SAR images. First, a better difference image is needed. We adopt a curvelet fusion strategy and use the complementary information of different operator images to form a fused image, which provides powerful support for the subsequent difference image analysis. The curvelet transform (CT) is used to decompose the difference images in the frequency domain, dividing the frequency-domain information at different scales into low-, medium-, and high-frequency components. Different fusion rules are designed for the three levels of information, and the fused difference image is then obtained through the inverse transform. A clustering method is used for difference image analysis: local patch similarity information and local spatial weighting are introduced into the FLICM clustering algorithm to enhance its robustness to noise. The improved algorithm, fuzzy local patch similarity information c-means (FLPSICM), is well suited to detecting changes in SAR images. Combining curvelet fusion and clustering, we propose a change detection framework for SAR images based on curvelet fusion and local patch similarity information clustering (CF-LPSICM), as shown in Figure 1. The main contributions of this paper are as follows:
(1) A novel unsupervised framework is proposed for SAR image change detection.
(2) A curvelet fusion model based on local gradient weighting and regional energy is designed to generate high-quality difference images, effectively utilizing the complementary information of the two difference images.
(3) A new clustering algorithm combining local patch similarity information and spatial weighting is proposed, which suppresses noise while enhancing pixel structure information; the unsupervised analysis method is also more flexible and faster.
(4) Experimental results on four different scenes demonstrate that the CF-LPSICM method achieves state-of-the-art performance and can accurately detect the changed areas.
The rest of this article is organized as follows.
Section 2 details the proposed approach,
Section 3 shows the experiments and analysis, and
Section 4 concludes.
2. Method
The framework of the proposed CF-LPSICM method is shown in Figure 1. The method can be divided into the following steps: difference image generation, curvelet fusion, local patch similarity information clustering, and output of the detection result.
Firstly, in order to fully utilize complementary information, the LR operator image and the ratio-mean ratio (RMR) operator image [44] are used as the source images for fusion. The LR operator image retains a lot of background information, while the RMR operator image suppresses noise. The curvelet transform is applied to both difference images. The frequency-domain information is then divided hierarchically, and fusion schemes are designed for the high-, medium-, and low-frequency sub-band coefficients obtained from the curvelet transform. The low-frequency sub-band coefficients represent background contours and should be retained as much as possible; the medium-frequency sub-band coefficients represent image textures and should be highlighted; the high-frequency sub-band coefficients mainly represent noise and should be suppressed. Finally, in the clustering analysis of the difference image, a local spatial weight and a local patch similarity measurement are introduced into the fuzzy factor of the FLICM algorithm in order to effectively suppress speckle noise in SAR images and enhance the structural information of the changed area. The details of difference image generation, curvelet fusion of the difference images, and local patch similarity information clustering are reported in Sections 2.1–2.3.
2.1. Difference Image Generation
Since SAR images contain multiplicative speckle noise, noise greatly affects the detection results. The mean ratio operator can suppress the speckle noise in background information and acts as a spatial filter, but it is not effective in suppressing areas with severe noise. Xuan et al. proposed the RMR operator difference image [44], which multiplies the ratio operator with the mean ratio operator and greatly improves the suppression of severely noisy areas. The LR operator is a nonlinear operator that retains most background information. In the experiments, the LR and RMR operators are used to generate the difference images: the LR operator preserves details, and the RMR operator suppresses noise. The two operators are defined as follows:

$$D_{LR}(i,j) = \left| \log X_2(i,j) - \log X_1(i,j) \right|,$$

$$D_{RMR}(i,j) = D_R(i,j) \cdot D_{MR}(i,j), \qquad D_{MR}(i,j) = 1 - \min\!\left( \frac{\mu_1(i,j)}{\mu_2(i,j)}, \frac{\mu_2(i,j)}{\mu_1(i,j)} \right),$$

where $X_1$ and $X_2$ are the gray values of the two co-registered SAR images of the same size acquired at different times, $D_R$ is the ratio operator, and $\mu_1(i,j)$ and $\mu_2(i,j)$ are the average gray values of the pixels in the neighborhood windows centered at coordinate $(i,j)$ in the two images. The two difference images are then normalized to regularize their gray values.
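As a concrete illustration, the following NumPy sketch computes the LR and RMR difference images under the standard log-ratio and mean-ratio definitions assumed above; the exact normalization of the RMR operator in [44] may differ, so this is a sketch rather than the paper's implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lr_difference(x1, x2, eps=1e-6):
    """Log-ratio (LR) difference image: |log X2 - log X1|."""
    return np.abs(np.log(x2 + eps) - np.log(x1 + eps))

def rmr_difference(x1, x2, win=3, eps=1e-6):
    """Ratio-mean-ratio (RMR) difference image (assumed form).

    Pixel-wise ratio operator multiplied by the neighborhood mean-ratio
    operator; the exact definition used in [44] may differ.
    """
    mu1 = uniform_filter(x1.astype(float), size=win)
    mu2 = uniform_filter(x2.astype(float), size=win)
    ratio = 1.0 - np.minimum((x1 + eps) / (x2 + eps), (x2 + eps) / (x1 + eps))
    mean_ratio = 1.0 - np.minimum((mu1 + eps) / (mu2 + eps), (mu2 + eps) / (mu1 + eps))
    return ratio * mean_ratio

def normalize(d):
    """Scale a difference image to [0, 1]."""
    d = d.astype(float)
    return (d - d.min()) / (d.max() - d.min() + 1e-12)

# Example usage with two co-registered SAR intensity arrays x1, x2:
# d_lr, d_rmr = normalize(lr_difference(x1, x2)), normalize(rmr_difference(x1, x2))
```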
2.2. Curvelet Fusion Difference Image
The curvelet transform [45] is an image decomposition method developed from the wavelet transform. It is anisotropic and multi-directional, and its coefficients can carry more information about the original image than wavelet coefficients. CT is sensitive to image edges and textures and can represent texture information at different scales through a small number of sparse non-zero coefficients. In this paper, we adopt the second-generation curvelet transform [46], which avoids the ridgelet computation of the first-generation transform while remaining multi-scale and multi-directional, making the algorithm simpler and faster. First, the image is subjected to a two-dimensional fast Fourier transform (FFT); the two-dimensional frequency plane is then partitioned with wedge-shaped windows, the curvelet coefficients are obtained for each scale j and direction l, and finally these windowed components are transformed back with the inverse FFT.
The frequency-domain tiling produced by the discrete curvelet transform is shown in Figure 2; the surrounding small images visualize the outermost curvelet coefficients when the decomposition scale is 4. The curvelet transform divides the frequency domain evenly into concentric rings containing different directions. As the decomposition scale increases, the image is decomposed into more directions in the frequency domain, and texture information in more directions is extracted. Curvelets provide a mathematical framework that is well suited to representing objects that are smooth apart from discontinuities along curves, such as images with edges, and this matches the appearance of changed areas in SAR images.
Choosing a fusion method for the high-, medium-, and low-frequency information of the image is very important. The decomposition scale of the curvelet transform varies with the image size; the default number of decomposition scales proposed in [46] is $\lceil \log_2(\min(H, W)) - 3 \rceil$, where $H$ and $W$ are the height and width of the image. After the curvelet transform, the representation has strong anisotropy at the fine scales and can express the texture and edges of the image well in different directions. The curvelet transform of an image yields coefficient matrices $C_{j,l}$ in different directions at different scales, where $j$ denotes the scale and $l$ the direction at that scale. For example, if the LR operator difference image is decomposed with a scale number of 5, the innermost and outermost layers are the coefficient matrices $C_1$ and $C_5$, composed of low-frequency and high-frequency coefficients, respectively, and the numbers of directions of $C_2$, $C_3$, and $C_4$ at the 2nd, 3rd, and 4th scales are 16, 32, and 32, respectively. The fusion scheme is shown in Figure 1. First, the two difference images are decomposed by the curvelet transform to generate frequency-domain sub-band coefficients at different scales, and different fusion rules are designed for the sub-band coefficients at each scale. The coefficients at the first scale show the overall contour information, that is, the low-frequency information. The sub-band coefficients at the second to fourth scales mostly show the texture of the image, with edges also carrying contour information, so they are treated as medium-frequency information. The coefficients at the last scale show the edges of the image but also contain a lot of speckle noise and are treated as high-frequency information. Three different fusion rules are designed for the low-, medium-, and high-frequency information. In the following, $C_l$ denotes a sub-band coefficient of the LR difference image and $C_m$ denotes a sub-band coefficient of the RMR difference image; the default decomposition scale number is set to 5.
a. For the low-frequency sub-band, that is, the coefficients at the first scale, a fusion rule similar to averaging is adopted, with the weighting parameter set to 1.7, which retains the most background information; the fused coefficient tensor is denoted $C_F^1$.
b. For the fusion of the medium-frequency sub-bands: the medium-frequency coefficients usually depict the texture and edge information of the image and reflect its clarity. The gradient of the image pixels reflects image sharpness: the larger the gradient, the sharper the image, the more texture information it contains, and the less it suffers from noise and loss of detail. The image gradient is calculated by median-value difference, and a feature measurement is designed from it, where $I(x, y)$ denotes the gray value of the pixel at coordinate $(x, y)$. From this measurement, we construct an average sharpness model: the local gradient $G$ represents the texture and edge information of the image within a local window, where $r$ is the window size.
For the medium-frequency sub-band coefficients $C_l^j$ and $C_m^j$, the local gradients $G_l$ and $G_m$ are calculated respectively, and local gradient weighting is used to fuse the sub-band coefficients, yielding the fused coefficient $C_F^j$ for all directions at scale $j$, where $j$ denotes the scale number ($j = 2, 3, 4$).
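A minimal sketch of this medium-frequency rule is given below. Because the paper's exact gradient measurement and weighting formula are not reproduced above, the sketch assumes a central-difference gradient averaged over a local window and a normalized-gradient weighting, which is one common instantiation of "local gradient weighting".

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_gradient(coeff, win=3):
    """Average gradient magnitude in a local window (assumed sharpness measure)."""
    gy, gx = np.gradient(coeff.astype(float))   # central differences along rows/cols
    return uniform_filter(np.sqrt(gx ** 2 + gy ** 2), size=win)

def fuse_mid_band(c_l, c_m, win=3, eps=1e-12):
    """Fuse one medium-frequency sub-band by local-gradient weighting.

    c_l, c_m: sub-band coefficient matrices of the LR and RMR difference
    images at the same scale and direction. The normalized-gradient
    weighting is an assumed form of the paper's rule.
    """
    g_l = local_gradient(c_l, win)
    g_m = local_gradient(c_m, win)
    w_l = g_l / (g_l + g_m + eps)
    return w_l * c_l + (1.0 - w_l) * c_m
```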
c. For the fusion of the high-frequency sub-band coefficients, that is, the coefficients at the fifth scale, we adopt a minimum local-area energy rule: the fused high-frequency coefficient is taken from the difference image with the smaller regional energy, where $E_l$ and $E_m$ denote the regional energies of the LR difference image and the RMR difference image in the different directions at this scale, and the regional energy is computed as the sum of the squared coefficients in a local window. The high-frequency sub-band coefficients are usually very small and accompanied by a lot of noise; in this case, selecting the coefficients with the minimum energy in the local area effectively avoids the influence of noise.
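The following sketch illustrates the minimum-local-energy selection. The pixel-wise selection and the window-based energy are assumptions; the paper's rule could equally be applied per sub-band.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def regional_energy(coeff, win=3):
    """Regional energy: sum of squared coefficients in a local window."""
    return uniform_filter(coeff.astype(float) ** 2, size=win) * win * win

def fuse_high_band(c_l, c_m, win=3):
    """Fuse one high-frequency sub-band by the minimum-local-energy rule.

    At each position, keep the coefficient from the difference image whose
    regional energy is smaller (assumed pixel-wise selection).
    """
    e_l = regional_energy(c_l, win)
    e_m = regional_energy(c_m, win)
    return np.where(e_l <= e_m, c_l, c_m)
```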
Finally, the inverse curvelet transform is applied to the fused coefficients to obtain the fused difference image.
2.3. Difference Image Cluster Analysis
Difference image analysis divides the image into meaningful areas and finally classifies the pixels into two categories: unchanged and changed. Image clustering divides the image into several non-overlapping areas according to the characteristics of different areas and is an effective method for detecting changed areas. FCM is the most traditional clustering method, but it is not robust to noise. FLICM [26] introduces local spatial information into FCM through a fuzzy factor $G_{ki}$ that combines spatial and gray-level information, expressed as

$$G_{ki} = \sum_{\substack{j \in N_i \\ j \neq i}} \frac{1}{d_{ij} + 1} \, (1 - u_{kj})^m \, \| x_j - v_k \|^2,$$

where the $i$th pixel is the central pixel of the local window $N_i$, $x_j$ is the gray-level value of a neighborhood pixel $j$ surrounding the $i$th pixel in the set, $d_{ij}$ is the spatial distance between the neighborhood pixel and the central pixel, $u_{kj}$ is the fuzzy membership value of the $j$th pixel with respect to cluster $k$, $v_k$ is the prototype of cluster center $k$, and $m$ is the weighting exponent of each fuzzy membership.
The cluster center prototypes and membership values are calculated iteratively as

$$v_k = \frac{\sum_{i=1}^{N} u_{ki}^m x_i}{\sum_{i=1}^{N} u_{ki}^m}, \qquad u_{ki} = \frac{1}{\sum_{l=1}^{c} \left( \dfrac{\| x_i - v_k \|^2 + G_{ki}}{\| x_i - v_l \|^2 + G_{li}} \right)^{1/(m-1)}}.$$

The cluster centers and membership values are updated so that the objective function reaches a local minimum. The objective function of the FLICM algorithm is

$$J_m = \sum_{i=1}^{N} \sum_{k=1}^{c} \left[ u_{ki}^m \| x_i - v_k \|^2 + G_{ki} \right].$$
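For reference, a compact NumPy sketch of the standard FLICM iteration on a gray-level image is given below. It assumes a square neighborhood window and random membership initialization; parameter names are illustrative, not the paper's.

```python
import numpy as np

def flicm(image, c=2, m=2.0, win=3, max_iter=50, eps=1e-3, seed=0):
    """Standard FLICM clustering of a 2-D gray-level image (sketch).

    Returns hard labels of shape image.shape with values in {0..c-1}.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    x = image.astype(float).ravel()                      # pixel gray values
    n = x.size

    # Precompute neighbor indices and spatial distances for a win x win window.
    half = win // 2
    offsets = [(dy, dx) for dy in range(-half, half + 1)
               for dx in range(-half, half + 1) if (dy, dx) != (0, 0)]
    rows, cols = np.indices((h, w))
    nbr_idx, nbr_dist = [], []
    for dy, dx in offsets:
        r = np.clip(rows + dy, 0, h - 1)
        cc = np.clip(cols + dx, 0, w - 1)
        nbr_idx.append((r * w + cc).ravel())
        nbr_dist.append(np.hypot(dy, dx))
    nbr_idx = np.stack(nbr_idx, axis=1)                  # (n, #neighbors)
    nbr_dist = np.asarray(nbr_dist)                      # (#neighbors,)

    u = rng.random((c, n))
    u /= u.sum(axis=0, keepdims=True)                    # random membership init

    v_prev = None
    for _ in range(max_iter):
        v = (u ** m @ x) / (u ** m).sum(axis=1)          # cluster centers
        # Fuzzy factor G_ki combining spatial and gray-level information.
        g = np.zeros((c, n))
        for k in range(c):
            term = (1.0 - u[k, nbr_idx]) ** m * (x[nbr_idx] - v[k]) ** 2
            g[k] = (term / (nbr_dist + 1.0)).sum(axis=1)
        d = (x[None, :] - v[:, None]) ** 2 + g           # distance plus fuzzy factor
        d = np.maximum(d, 1e-12)
        u = 1.0 / (d ** (1.0 / (m - 1.0)))
        u /= u.sum(axis=0, keepdims=True)                # membership update
        if v_prev is not None and np.abs(v - v_prev).max() < eps:
            break
        v_prev = v
    return u.argmax(axis=0).reshape(h, w)
```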
Although FLICM combines local spatial information and local gray-level information and suppresses local noise, it does not consider the influence of pixel patches in different directions on the local central pixel, which may cause the loss of central-pixel details in areas with severe noise. In addition, the fuzzy factor introduced in FLICM ensures that neighboring pixels have similar memberships, thereby achieving a denoising effect; however, when the noise distribution is concentrated, FLICM will mistakenly assign noisy pixels to the same cluster. Zhang et al. proposed an improved image clustering method that simultaneously considers self-similarity and back-projection to enhance robustness in [47], achieving a balance between noise suppression and detail preservation. Inspired by [47], and considering that pixel patch weights in different directions can supplement the details of the central pixel and avoid the influence of noise in noisy areas, we expand the search window to find pixels whose local structure is similar to that of the central pixel. The schematic diagram is shown in Figure 3: patches with different similarities are assigned different weights, with structurally similar patches receiving higher weights and vice versa. This provides supplementary detail information for pixels sharing the same texture and edge information and also effectively suppresses patchy noise. A pixel similarity measurement model is introduced within the search window, and adjacent pixel patches are considered with different weights.
In addition, FLICM considers pixel correlation only through spatial distance, which is insufficient and further limits performance. Therefore, in order to effectively suppress speckle noise, we improve FLICM by expanding the search window and introducing a pixel similarity measurement and spatially weighted information.
Referring to the pixel relevance model in [47], a calculation example is shown in Figure 4. For any pixel in the image, an image patch centered on it is constructed, together with the patches centered on its adjacent pixels. In the figure, the pixel patch size is 3 × 3 and the search window size is assumed to be 5 × 5.
The steps for calculating the pixel relevance are as follows (a code sketch is given after the list):
1. Construct the nine image patches centered on the adjacent pixels, as shown in Figure 4a;
2. Calculate the difference between each adjacent pixel patch and the central pixel patch, and then compute the weights of the different directions from these patch differences; a small constant, usually set to 0.1, prevents the result from becoming too small. The whole process is shown in Figure 4b;
3. From the patch differences and directional weights obtained in step 2, calculate the weighted distances in the different directions, where ⊙ denotes the element-wise product of the two matrices, as shown in Figure 4c;
4. Calculate the relevance between the adjacent pixels and the central pixel, as shown in Figure 4d.
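Since the exact formulas of [47] are not reproduced above, the following sketch shows one plausible realization of steps 1-4: patch differences, direction weights derived from those differences with the 0.1 offset, weighted distances, and a final exponential relevance. The function name and the specific weighting/relevance forms are assumptions.

```python
import numpy as np

def pixel_relevance(image, row, col, patch=3, offset=0.1, sigma=1.0):
    """Relevance between the adjacent pixels and the central pixel (sketch).

    Builds the patches centered on the eight adjacent pixels, computes patch
    differences and direction weights (the 0.1 offset keeps weights from
    vanishing), forms weighted distances, and maps them to relevance values.
    Returns a dict {(dy, dx): relevance}.
    """
    p = patch // 2
    img = np.pad(image.astype(float), p + 1, mode="reflect")
    r0, c0 = row + p + 1, col + p + 1                       # center in padded image

    def get_patch(r, c):
        return img[r - p:r + p + 1, c - p:c + p + 1]

    center = get_patch(r0, c0)                              # step 1: central patch
    relevance = {}
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dy, dx) == (0, 0):
                continue
            diff = get_patch(r0 + dy, c0 + dx) - center     # step 2: patch difference
            weight = np.exp(-np.abs(diff)) + offset         # step 2: direction weight (assumed form)
            dist = np.sum(weight * diff ** 2)               # step 3: weighted distance
            relevance[(dy, dx)] = np.exp(-dist / sigma ** 2)  # step 4: relevance (assumed form)
    return relevance
```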
We modify the fuzzy factor $G_{ki}$ by adding the local patch pixel relevance and a spatial information weight, generating a new factor, where $N_j$ denotes the set of neighborhood pixels centered on pixel $j$, $r$ denotes the side length of the neighborhood window, and the window size is set to 5 × 5. The spatial local information is introduced to weight the pixels so that every pixel in the window can be reconstructed from its neighboring pixels, which strongly suppresses isolated speckle noise and makes the algorithm more robust. The weight coefficient is larger when the central pixel is closer to its neighboring pixel, and vice versa, and it is calculated from the Euclidean distance between the two pixels. The improved fuzzy factor reflects the similarity between the neighborhood pixel patches in different directions and the central pixel and supplements the structural information of the changed area, while the pixel reconstruction suppresses speckle noise.
The clustering objective function of the improved algorithm replaces the original fuzzy factor with the new one, and the cluster centers and the membership matrix are updated by Formulas (17) and (18), respectively. In the objective function of the proposed algorithm, the local patch similarity information extended around each pixel is therefore further used to group similar elements into one category, while the spatial information weighting increases the noise resistance and improves the clustering performance. The improved clustering algorithm in this paper is called Fuzzy Local Patch Similarity Information C-Means (FLPSICM), and its steps are shown in Algorithm 1.
Algorithm 1: Procedure of FLPSICM |
Step 1 | Set the number c of cluster centers, the hyperparameter m, and the stopping condition ε. |
Step 2 | Randomly initialize the membership matrix. |
Step 3 | Set the loop count, starting from 0. |
Step 4 | Update the cluster center by Formula (17). |
Step 5 | Update the membership matrix through Formula (18). |
Step 6 | Stop if the distance between successive cluster centers is less than ε or the maximum number of iterations is reached; otherwise repeat Steps 4 and 5. |
Step 7 | Assign each pixel to the class with the highest membership value. |
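Because Formulas (17) and (18) are not reproduced here, the sketch below only mirrors the Algorithm 1 control flow on the fused difference image, reusing the flicm() sketch given earlier as a placeholder for the FLPSICM updates; the rule for picking the "changed" cluster is an assumption.

```python
import numpy as np

def change_map(fused_di, c=2, m=2.0, max_iter=50, eps=1e-3):
    """Algorithm 1 control flow on a fused difference image (sketch).

    flicm() stands in for the FLPSICM updates (Formulas (17)-(18) are not
    reproduced). Returns a binary map where 1 marks the changed class,
    assumed to be the cluster with the larger mean difference value.
    """
    labels = flicm(fused_di, c=c, m=m, max_iter=max_iter, eps=eps)  # Steps 1-7
    means = [fused_di[labels == k].mean() if np.any(labels == k) else -np.inf
             for k in range(c)]
    changed = int(np.argmax(means))
    return (labels == changed).astype(np.uint8)
```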