Single-view 3D Reconstruction Based on Residual Convolution and Channel Attention

Abstract: Accurate 3D reconstruction of objects from a single RGB image remains a challenging task in computer vision, particularly in industrial applications such as quality inspection and virtual assembly, where existing methods still struggle with complex occlusions and multi-object interactions. This paper proposes a single-view 3D reconstruction method that integrates a channel attention mechanism with residual convolutional networks to improve reconstruction accuracy and robustness in complex scenes. The method adopts an encoder-decoder architecture: the encoder incorporates an improved channel attention module to strengthen the extraction of local features and features in occluded regions, while the decoder predicts grid displacements at sub-voxel precision along back-projection rays to recover fine-grained object structure. Experiments on the ShapeNet synthetic dataset show that, at 128³ resolution, the proposed method achieves a mean intersection-over-union (mIoU) of 62.4% on single-object reconstruction, a 3.3% improvement over CoReNet; on multi-object reconstruction it reaches 48.2% mIoU, a 4.3% improvement; and in heavily occluded scenes (occlusion rate above 50%) it reaches 42.1% mIoU, a 4.6% improvement, confirming the method's robustness. The proposed approach offers clear advantages in detail reconstruction and multi-object interaction modeling, providing a high-precision 3D reconstruction solution for industrial applications.
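The channel attention mechanism referenced in the abstract follows the general squeeze-and-excitation pattern: globally pool each channel, pass the pooled vector through a bottleneck, and use sigmoid gates to reweight the channels. The abstract does not specify the internals of the paper's "improved" module, so the following is only a minimal NumPy sketch of the generic mechanism, with random placeholder weights standing in for learned parameters:

```python
import numpy as np

def channel_attention(feat, reduction=4):
    """SE-style channel attention over a (C, H, W) feature map.

    Illustrative sketch only: the paper's improved module is not
    detailed in the abstract, and the weights below are random
    placeholders for what would be learned parameters.
    """
    C, H, W = feat.shape
    rng = np.random.default_rng(0)
    # Squeeze: global average pooling gives one scalar per channel.
    z = feat.mean(axis=(1, 2))                # shape (C,)
    # Excitation: bottleneck of C // reduction units, then expand back.
    w1 = rng.standard_normal((C // reduction, C)) * 0.1
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)               # ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))       # sigmoid gates in (0, 1)
    # Scale: reweight each input channel by its gate.
    return feat * s[:, None, None]

out = channel_attention(np.ones((8, 4, 4)))
```

Because the gates lie strictly in (0, 1), the module can only attenuate channels relative to one another; the residual convolutions mentioned in the title would typically wrap such a block so that the original features are preserved alongside the reweighted ones.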

       
