RGB2Point, WACV 2025

Abstract

We introduce RGB2Point, a Transformer-based method that generates a 3D point cloud from a single unposed RGB image. RGB2Point takes an input image of an object and generates a dense 3D point cloud. In contrast to prior works based on CNN layers and diffusion-denoising approaches, we use pre-trained Transformer layers that are fast and generate high-quality point clouds with consistent quality across the available categories. Our generated point clouds demonstrate high quality on a real-world dataset, as evidenced by improved Chamfer distance (51.15%) and Earth Mover’s distance (36.17%) compared to the current state-of-the-art. Our approach also achieves better quality on a synthetic dataset, with improved Chamfer distance (39.26%), Earth Mover’s distance (26.95%), and F-score (47.16%). Moreover, our method produces 63.1% more consistent high-quality results across object categories than prior works. Finally, RGB2Point is computationally efficient, requiring only 2.3GB of VRAM to reconstruct a 3D point cloud from a single RGB image, and our implementation generates results 15,133× faster than a SOTA diffusion-based model.
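
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of Chamfer distance and F-score between two point clouds. It assumes one common convention (mean squared nearest-neighbour distance summed over both directions, and a strict distance threshold for F-score); the exact variants and thresholds used in the paper may differ. The function names are illustrative and are not taken from the RGB2Point code.

```python
import math

def nearest_distance(point, cloud):
    """Euclidean distance from `point` to its nearest neighbour in `cloud`."""
    return min(math.dist(point, other) for other in cloud)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance (assumed convention: mean of squared
    nearest-neighbour distances, summed over both directions)."""
    a_to_b = sum(nearest_distance(p, b) ** 2 for p in a) / len(a)
    b_to_a = sum(nearest_distance(q, a) ** 2 for q in b) / len(b)
    return a_to_b + b_to_a

def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall, where a point counts as
    matched if its nearest neighbour is closer than `threshold`."""
    precision = sum(nearest_distance(p, gt) < threshold for p in pred) / len(pred)
    recall = sum(nearest_distance(q, pred) < threshold for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy clouds: one point matches exactly, the other is off by 0.5 along z.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
print(chamfer_distance(pred, gt))  # 0.25
print(f_score(pred, gt, 0.1))      # 0.5
```

Lower Chamfer distance and higher F-score indicate a closer match. This brute-force version is quadratic in the number of points and is meant only to illustrate the definitions.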

Overview

RGB2Point takes a single-view RGB image and extracts image features with a pre-trained ViT. The Contextual Feature Integrator then refines the extracted features by applying a multi-head attention mechanism that highlights specific regions of interest within them. The weighted features are forwarded to the Geometric Projection Module, which maps them into 3D space, resulting in a point cloud representation. We carefully designed RGB2Point so that it requires only 2.3GB of VRAM to generate a 3D point cloud from a single RGB image.
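
The data flow described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the released model: the ViT output is replaced by random features, the weights are random rather than trained, the attention is assumed to be plain self-attention without residuals or normalization, the projection is assumed to be a single linear layer, and all sizes and names (`num_tokens`, `dim`, `num_points`, `project_to_points`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Stand-in for the attention step: (T, D) tokens -> (T, D) weighted features."""
    t, d = tokens.shape
    head_dim = d // num_heads

    def split(x):
        # (T, D) -> (H, T, head_dim)
        return x.reshape(t, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (H, T, T)
    weights = softmax(scores, axis=-1)                     # attention over tokens
    heads = weights @ v                                    # (H, T, head_dim)
    merged = heads.transpose(1, 0, 2).reshape(t, d)        # back to (T, D)
    return merged @ w_o

def project_to_points(features, w_proj, b_proj, num_points):
    """Stand-in for the projection step: flatten features, map to (N, 3) coordinates."""
    flat = features.reshape(-1)
    return (flat @ w_proj + b_proj).reshape(num_points, 3)

# Hypothetical sizes, chosen small so the sketch runs instantly.
num_tokens, dim, num_heads, num_points = 16, 32, 4, 128

vit_features = rng.standard_normal((num_tokens, dim))  # placeholder for ViT features
w_q, w_k, w_v, w_o = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(4))
w_proj = rng.standard_normal((num_tokens * dim, num_points * 3)) * 0.01
b_proj = np.zeros(num_points * 3)

attended = multi_head_self_attention(vit_features, w_q, w_k, w_v, w_o, num_heads)
point_cloud = project_to_points(attended, w_proj, b_proj, num_points)
print(point_cloud.shape)  # (128, 3)
```

The sketch shows only the shapes involved: per-token image features go in, attention reweights them, and a projection produces one (x, y, z) triple per output point.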

Visualization

We show 3D point clouds reconstructed from single real-world images.

Various Point Cloud Resolutions

We demonstrate reconstruction results using different point cloud resolutions.

BibTeX

@article{lee2024rgb2point,
  title={RGB2Point: 3D Point Cloud Generation from Single RGB Images},
  author={Lee, Jae Joong and Benes, Bedrich},
  journal={arXiv preprint arXiv:2407.14979},
  year={2024}
}