Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors

ECCV 2024


1Purdue University, Department of Computer Science

2Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science

3Google

4Purdue University, Department of Forestry & Natural Resources

Tree-D Fusion creates a simulation-ready tree model from unposed images, such as those in Google Street View. The generated shape is then used to simulate realistic trees.

Abstract

We introduce Tree-D Fusion, featuring the first collection of 600,000 environmentally aware, 3D simulation-ready tree models generated through diffusion priors. Each reconstructed 3D tree model corresponds to an image from Google's Auto Arborist Dataset, which comprises street view images and associated genus labels of trees across North America. Our method distills the scores of two tree-adapted diffusion models, using text prompts that specify the tree genus to guide shape reconstruction. This process reconstructs a 3D tree envelope filled with point markers, which are then used to estimate the tree's branching structure via a space colonization algorithm conditioned on the specified genus.
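To make the score-distillation idea concrete, here is a minimal sketch of one optimization step that combines gradients from two diffusion priors. It is an illustration only: eps_text and eps_view are hypothetical placeholders standing in for the text-conditioned and view-conditioned priors, and the toy parameter/rendering pair stands in for the NeRF and its differentiable renderer; none of the paper's actual models or weights are used.

import torch

# Hypothetical placeholders for the two tree-adapted priors described above;
# the real models (a LoRA-finetuned Stable Diffusion and Zero123) are not loaded here.
def eps_text(x_t, t):
    return torch.randn_like(x_t)   # stand-in for the text-conditioned noise prediction

def eps_view(x_t, t):
    return torch.randn_like(x_t)   # stand-in for the view-conditioned noise prediction

def sds_step(render, alphas_cumprod, w_text=1.0, w_view=1.0):
    # One score-distillation step: noise the rendering, query both priors,
    # and push the weighted noise residuals back through the rendering.
    t = torch.randint(20, 980, (1,))
    a = alphas_cumprod[t].view(1, 1, 1, 1)
    noise = torch.randn_like(render)
    x_t = a.sqrt() * render + (1.0 - a).sqrt() * noise          # forward diffusion of the render
    grad = w_text * (eps_text(x_t, t) - noise) + w_view * (eps_view(x_t, t) - noise)
    render.backward(gradient=grad.detach())                     # SDS skips the U-Net Jacobian

betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

theta = torch.rand(1, 3, 64, 64, requires_grad=True)            # stand-in for the NeRF parameters
render = torch.sigmoid(theta)                                   # stand-in for differentiable rendering
sds_step(render, alphas_cumprod)
print(theta.grad.shape)                                          # distilled update direction for theta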

Overview


The input to Tree-D Fusion is an RGB image of a tree and its genus. To perform shape reconstruction, we minimize the loss function w.r.t. the NeRF parameters θ. The loss function is constructed from two diffusion models, Stable Diffusion with LoRA and Zero123, trained on real tree images and synthetic 3D tree models. The output is an optimized NeRF τ(θ*), a detailed 3D tree envelope. We then populate the volume of τ(θ*) with markers derived from the envelope and reconstruct the tree with a genus-conditioned space colonization algorithm.
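The final step can be pictured with a minimal, generic implementation of the classic space colonization algorithm (Runions et al.). This is a sketch under simplifying assumptions: the marker set stands in for points sampled inside the reconstructed envelope τ(θ*), and the hard-coded growth parameters (step length, influence and kill radii) are illustrative placeholders, whereas the paper conditions such growth behavior on the tree genus.

import numpy as np

def space_colonization(markers, root, step=0.05, influence=0.3, kill=0.08, iters=200):
    # Grow a branching skeleton toward attraction markers sampled inside the envelope.
    nodes = [np.asarray(root, dtype=float)]
    parents = [-1]
    markers = np.asarray(markers, dtype=float)
    for _ in range(iters):
        if len(markers) == 0:
            break
        node_arr = np.stack(nodes)
        # Distance from every marker to every existing node.
        d = np.linalg.norm(markers[:, None, :] - node_arr[None, :, :], axis=-1)
        nearest = d.argmin(axis=1)
        in_range = d[np.arange(len(markers)), nearest] < influence
        # Accumulate normalized growth directions per node.
        dirs = np.zeros_like(node_arr)
        counts = np.zeros(len(nodes))
        for m, n, ok in zip(markers, nearest, in_range):
            if ok:
                v = m - node_arr[n]
                dirs[n] += v / (np.linalg.norm(v) + 1e-8)
                counts[n] += 1
        grew = False
        for n in np.nonzero(counts)[0]:
            direction = dirs[n] / (np.linalg.norm(dirs[n]) + 1e-8)
            nodes.append(node_arr[n] + step * direction)
            parents.append(n)
            grew = True
        if not grew:
            break
        # Remove markers that a branch node has reached.
        node_arr = np.stack(nodes)
        d = np.linalg.norm(markers[:, None, :] - node_arr[None, :, :], axis=-1)
        markers = markers[d.min(axis=1) > kill]
    return np.stack(nodes), np.array(parents)

# Toy usage: markers sampled uniformly inside a spherical "crown" above the root.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (2000, 3))
pts = pts[np.linalg.norm(pts, axis=1) < 1.0] * 0.5 + np.array([0.0, 0.0, 1.0])
skeleton, parent = space_colonization(pts, root=[0.0, 0.0, 0.0])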

Qualitative Comparisons

Quantitative Comparisons

BibTeX

coming soon