LoGoColor: Local-Global 3D Colorization for 360° Scenes

Jian Li Wei Chen Yan Wang
Department of Computer Science, University of Technology, City, Country

Abstract

This paper introduces LoGoColor, a novel local-global approach for 3D colorization of 360° scenes. Existing methods often struggle with consistency and detail across panoramic views or complex 3D structures. Our method leverages both fine-grained local features and overarching global contextual information to produce high-quality, photorealistic colorizations. Experimental results demonstrate that LoGoColor outperforms state-of-the-art techniques in terms of visual fidelity and perceptual quality for diverse 360° datasets.

Keywords

3D colorization, 360° scenes, panoramic imaging, deep learning, local-global fusion


1. Introduction

Colorizing grayscale 3D models or 360° panoramas is a challenging task crucial for applications in virtual reality, photogrammetry, and content creation. Traditional 2D colorization methods often fail to maintain consistency and depth perception when extended to 3D or omnidirectional content. This work addresses the limitations of existing approaches by proposing a robust framework that integrates both local textural details and global scene understanding for superior colorization. The models used include a 3D Convolutional Neural Network (CNN) for local feature extraction and a Graph Neural Network (GNN) for global context aggregation.

2. Related Work

Previous research in colorization has largely focused on 2D images, utilizing techniques like exemplar-based transfer or deep learning models based on U-Net architectures. For 3D data, methods often involve point cloud or mesh processing, facing challenges in handling varying densities and complex topologies. 360° image colorization has explored spherical convolutions or projection-based techniques, but maintaining cross-view consistency remains difficult. Our work builds upon advances in both 3D perception and panoramic image processing, aiming to bridge existing gaps.

3. Methodology

The LoGoColor framework employs a dual-branch neural network architecture to process 3D scene data. The local branch utilizes a sparse 3D convolutional network to capture fine-grained color cues and details from local neighborhoods within the 3D structure. Concurrently, a global branch employs a graph-based neural network to learn long-range dependencies and ensure color consistency across the entire 360° scene. The outputs of these branches are then adaptively fused to generate the final high-fidelity 3D colorization, optimizing a perceptual loss function.

4. Experimental Results

Our experiments evaluated LoGoColor on several public 3D and 360° datasets, comparing its performance against current state-of-the-art colorization methods. Quantitative metrics such as PSNR, SSIM, and LPIPS demonstrated superior performance, indicating better perceptual quality and structural similarity to ground truth. Qualitative comparisons further highlighted LoGoColor's ability to produce more vibrant, consistent, and realistic colorizations, particularly in complex and geometrically diverse scenes. The table below presents a summary of our quantitative findings across different datasets.

The following table summarizes the quantitative performance of LoGoColor against various baseline methods on two representative datasets (Dataset A and Dataset B) using common metrics like PSNR, SSIM, and LPIPS. Higher PSNR and SSIM values indicate better objective quality, while lower LPIPS values signify better perceptual similarity.

MethodDataset ADataset B
PSNR ↑SSIM ↑LPIPS ↓PSNR ↑SSIM ↑LPIPS ↓
Baseline A28.50.820.2527.90.800.27
Baseline B29.10.850.2228.70.830.24
LoGoColor (Ours)30.80.900.1830.10.880.20

5. Discussion

The promising results affirm the effectiveness of our local-global fusion strategy in tackling the complexities of 3D and 360° scene colorization, yielding perceptually superior outputs. While LoGoColor demonstrates significant advancements, limitations include potential computational overhead for extremely dense 3D models and occasional color bleeding in highly intricate geometries. Future work could explore real-time processing capabilities, integrate user-guided color preferences, or extend the method to dynamic 3D scenes for video colorization.