The overall structure of EGSDK-Net is shown in Figure 2. First, we utilize a backbone composed of the convolutional layers and RTFormer blocks proposed in RTFormer [27] to obtain the basic feature maps. Since RTFormer [27] employs a dual-branch design starting from stage three, the top layer of the backbone outputs a low-resolution and a high-resolution feature map, denoted as $F_{low}$ and $F_{high}$, respectively. We denote the output of the second stage as $F_2$. Since $F_2$ has a high resolution and is rich in detailed information, we use it as the input to the real-time edge guidance module (RTEGM). The edge feature map extracted and supervised by the RTEGM is subsequently applied to $F_{low}$ as a weighting. We choose $F_{low}$ rather than $F_{high}$ because $F_{low}$ retains a significant amount of semantic information but loses many detailed cues, so enhancing its discrimination of target regions with the edge feature map is more effective.
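To make this weighting concrete, the sketch below shows one plausible PyTorch realization, assuming the RTEGM predicts a single-channel edge map from $F_2$ that, after resizing, gates $F_{low}$ through a residual sigmoid weighting; the module structure, channel arguments, and gating form are illustrative assumptions rather than the exact RTEGM design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EdgeGuidance(nn.Module):
    """Minimal sketch of edge extraction and weighted application.

    Hypothetical layer names and shapes; the paper only states that an
    edge map extracted from the stage-2 feature F_2 re-weights the
    low-resolution feature F_low.
    """

    def __init__(self, c2: int):
        super().__init__()
        # Lightweight edge head on the high-resolution stage-2 feature.
        self.edge_head = nn.Sequential(
            nn.Conv2d(c2, c2, 3, padding=1, bias=False),
            nn.BatchNorm2d(c2),
            nn.ReLU(inplace=True),
            nn.Conv2d(c2, 1, 1),  # single-channel edge logits
        )

    def forward(self, f2: torch.Tensor, f_low: torch.Tensor):
        edge_logits = self.edge_head(f2)                  # (B, 1, H2, W2)
        # Resize the edge map to F_low's (lower) resolution.
        edge_small = F.interpolate(
            edge_logits, size=f_low.shape[-2:],
            mode="bilinear", align_corners=False)
        # Weighted application: gate F_low with the edge probability,
        # keeping a residual path so non-edge regions are not zeroed out.
        f_low_guided = f_low * (1.0 + torch.sigmoid(edge_small))
        # edge_logits is additionally supervised against ground-truth edges.
        return f_low_guided, edge_logits
```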
The edge-guided $F_{low}$ is then enhanced through DAPPM [28] to improve its ability to perceive multi-scale objects. After upsampling, the output of this process is combined with $F_{high}$ and processed through an additional convolutional module to mitigate aliasing effects, yielding the feature map $F$. The stage that follows the application of the edge feature map represents the neck of the entire model.
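Under the same assumptions, a minimal sketch of the neck is given below; DAPPM is treated as a black-box module following [28], and channel-wise concatenation followed by a 3x3 convolution stands in for the unspecified fusion and anti-aliasing step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Neck(nn.Module):
    """Sketch of the neck: DAPPM on the edge-guided low-resolution
    feature, upsampling, fusion with F_high, and a smoothing conv.

    `dappm` stands in for the DAPPM module of [28]; channel sizes and
    the concat-based fusion are illustrative assumptions.
    """

    def __init__(self, dappm: nn.Module, c_high: int, c_out: int):
        super().__init__()
        self.dappm = dappm
        # Convolutional module applied after fusion to reduce aliasing.
        self.fuse = nn.Sequential(
            nn.Conv2d(c_out + c_high, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_low_guided: torch.Tensor, f_high: torch.Tensor):
        x = self.dappm(f_low_guided)                      # multi-scale context
        x = F.interpolate(x, size=f_high.shape[-2:],
                          mode="bilinear", align_corners=False)
        # Combine with the high-resolution branch and smooth the result,
        # yielding the feature map F fed to the head.
        return self.fuse(torch.cat([x, f_high], dim=1))
```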
The output $F$ from the neck is then fed into the head, which consists of the stepwise dual kernel update module (SDKUM), to obtain the model's mask predictions and class probability predictions. Since the kernels convolve the features to produce the mask predictions, additional supervision of the feature map $F$ is required to ensure accurate masks. Following [5], we use the auxiliary loss function as follows:
$$\mathcal{L}_{aux} = \lambda_{mask\text{-}id}\,\mathcal{L}_{mask\text{-}id} + \lambda_{ce}\,\mathcal{L}_{ce} + \lambda_{cont}\,\mathcal{L}_{cont},$$
where $\lambda_{mask\text{-}id}$, $\lambda_{ce}$, and $\lambda_{cont}$ are the balancing factors used in the auxiliary loss function to balance $\mathcal{L}_{mask\text{-}id}$, $\mathcal{L}_{ce}$, and $\mathcal{L}_{cont}$, consistent with the design in RT-K-Net [5]. $\mathcal{L}_{mask\text{-}id}$ represents the mask-ID cross-entropy loss, $\mathcal{L}_{ce}$ denotes the cross-entropy loss, and $\mathcal{L}_{cont}$ is the contrastive loss introduced by RT-K-Net [5]. The overall loss function of EGSDK-Net can be formulated as follows:
$$\mathcal{L} = \lambda_{bce}\,\mathcal{L}_{bce} + \lambda_{dice}\,\mathcal{L}_{dice} + \lambda_{focal}\,\mathcal{L}_{focal} + \lambda_{edge}\,\mathcal{L}_{edge} + \mathcal{L}_{aux},$$
where $\lambda_{bce}$, $\lambda_{dice}$, $\lambda_{focal}$, and $\lambda_{edge}$ are the balancing factors for the overall loss function. $\lambda_{bce}$, $\lambda_{dice}$, and $\lambda_{focal}$ are consistent with the design in RT-K-Net [5], while $\lambda_{edge}$ is set by us. $\mathcal{L}_{bce}$ refers to the binary cross-entropy loss, $\mathcal{L}_{dice}$ represents the Dice loss, $\mathcal{L}_{focal}$ denotes the focal loss, and $\mathcal{L}_{edge}$ uses the balanced cross-entropy loss function. The applications of $\mathcal{L}_{bce}$, $\mathcal{L}_{dice}$, and $\mathcal{L}_{focal}$ are consistent with those in K-Net [2] and RT-K-Net [5], improving the guidance of stuff masks and thing masks during training. Moreover, we adopt the same training and inference optimizations, post-processing methods, and instance-based cropping augmentation as RT-K-Net [5]; please refer to [5] for more information on these steps.
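As $\mathcal{L}_{edge}$ is the component we introduce on top of RT-K-Net's losses, the sketch below illustrates a class-balanced binary cross-entropy for edge supervision; the HED-style weighting shown is one common choice and is our assumption, since the exact balancing scheme is not detailed in this section.

```python
import torch
import torch.nn.functional as F


def balanced_bce(edge_logits: torch.Tensor, edge_gt: torch.Tensor):
    """Sketch of a class-balanced BCE for edge supervision.

    Edge pixels are rare, so positives are weighted by the fraction of
    negatives and negatives by the fraction of positives; this HED-style
    scheme is an assumption, not necessarily the authors' exact choice.
    """
    pos = edge_gt.float().sum()
    neg = edge_gt.numel() - pos
    beta = neg / (pos + neg)                        # fraction of non-edge pixels
    # Per-pixel weights: beta for edge pixels, (1 - beta) for the rest.
    weight = torch.where(edge_gt > 0.5, beta, 1.0 - beta)
    return F.binary_cross_entropy_with_logits(
        edge_logits, edge_gt.float(), weight=weight)
```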