Tutorial 1

Title: From Traditional to AI-based 3D Scene Capture and Modeling

Target Audience: PhD students, researchers, and practitioners

Level: Beginner to expert

Date and time: June 10, 2024, at 9:00 am

Duration: 3 hours

Prerequisites: Participants are expected to be familiar with Python.

Teaching resources will be distributed (i.e., the PowerPoint presentations, image datasets, and Python code).

Overview

Deep learning has led to significant breakthroughs in various fields. The advent of implicit, neural-network-based scene representations such as Neural Radiance Fields (NeRFs) marks a major leap in photogrammetric computer vision and novel view synthesis, as well as in related applications in robotics, urban mapping, autonomous navigation, virtual/augmented reality, and beyond. Employing neural networks to efficiently encode high-resolution scene information has been demonstrated to yield precise 3D models that are, at the same time, more compact than explicit scene representations such as point clouds or voxel block models. Through a blend of theoretical insights, visual illustrations, and practical exercises, this course will delve into the core concepts, implementation strategies, and advanced applications of traditional and neural-network-based 3D scene capture and visualization, providing you with the skills and knowledge to reflect on the strengths, innovation potential, and limitations of current approaches.

Topics Tackled

Photogrammetry, point cloud generation, machine/deep learning, conventional and learning-based approaches for 3D reconstruction from imagery, Structure-from-Motion (SfM), Multi-View Stereo (MVS), Neural Radiance Fields (NeRFs), 3D Gaussian Splatting

Module 1: Traditional Photogrammetry

This module provides an overview of basic concepts of geometry acquisition via (passive and active) optical 3D sensing techniques. A particular focus will be placed on traditional approaches for generating 3D scene representations from imagery acquired with low-cost consumer hardware, as commonly applied for large-scale 3D reconstruction from aerial imagery or for indoor 3D mapping. In this regard, the organization of large unstructured (e.g., crowdsourced) image collections and well-established 3D reconstruction techniques (from either imagery or RGB-D data) will be considered in detail. Representations of the acquired data in the form of 3D point clouds, voxel occupancy grids, and 3D meshes will be discussed, as well as different approaches for closed surface reconstruction from 3D point clouds (see the short code sketch below).

Exemplary 3D point cloud obtained with a traditional approach for 3D reconstruction from imagery
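
The closed surface reconstruction mentioned above can be illustrated with a minimal Python sketch. It assumes the Open3D library and uses Poisson surface reconstruction to turn an oriented point cloud into a closed triangle mesh; the file names are placeholders.

    import open3d as o3d

    # Load a 3D point cloud (the file name is a placeholder).
    pcd = o3d.io.read_point_cloud("scene.ply")

    # Estimate per-point normals, which Poisson reconstruction requires.
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

    # Poisson surface reconstruction: fit an implicit indicator function
    # to the oriented points and extract a closed triangle mesh from it.
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=9)

    o3d.io.write_triangle_mesh("scene_mesh.ply", mesh)

The depth parameter controls the resolution of the underlying octree and thus the level of recoverable surface detail.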

Module 2: Deep Learning meets Photogrammetry

This module addresses developments that leverage the potential of deep learning for 3D reconstruction from imagery while still relying on conventional scene representations in the form of 3D point clouds, voxel occupancy grids, 3D meshes, or depth maps. After an introduction to deep learning, including standard network architectures, gradient-based optimization, and regularization schemes, we will consider how different approaches can benefit from deep learning principles (a minimal training sketch is given below). In addition to learning-based multi-view 3D reconstruction, we will also revisit approaches for the ill-posed scenario of 3D scene reconstruction from a single image.

Exemplary visualization of an advanced 3D scene representation inferred from imagery
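
To give a flavour of the gradient-based optimization discussed in this module, the following minimal sketch (assuming PyTorch) trains a tiny convolutional network to predict a dense depth map from a single image; the random tensors merely stand in for real training data.

    import torch
    import torch.nn as nn

    # A tiny convolutional network mapping an RGB image to a per-pixel depth map.
    model = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1),
    )

    # Gradient-based optimization with weight decay as a simple regularization scheme.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

    image = torch.rand(1, 3, 64, 64)   # placeholder for a real input image
    target = torch.rand(1, 1, 64, 64)  # placeholder for ground-truth depth

    for step in range(100):
        optimizer.zero_grad()
        loss = nn.functional.l1_loss(model(image), target)
        loss.backward()   # backpropagation computes the gradients
        optimizer.step()  # parameter update along the negative gradient direction

Networks used in practice are of course much deeper and trained on large datasets, but the optimization loop has exactly this structure.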

Module 3: Advanced Scene Representations for 3D Scene Reconstruction

This module covers recent developments regarding advanced scene representations for 3D scene reconstruction. This includes the recently emerged implicit neural scene representations such as Neural Radiance Fields (NeRFs), whose underlying idea is to represent a scene in terms of the weights of a neural network. This network is optimized in a supervised manner so that the scene appearance it predicts for given views matches the input photographs observed from those views (see the sketch below). Furthermore, we will focus on extensions towards improved model quality, accelerated training of the underlying network, sparse input data, handling photo collections taken in the wild, and handling large-scale scenarios. Finally, since training the underlying neural network makes these approaches computationally costly, we will also discuss 3D Gaussian Splatting, which offers an attractive trade-off between training time and model quality.

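As a rough illustration of the NeRF idea (a simplified sketch assuming PyTorch, not a faithful reimplementation of any particular method), the following code queries a small MLP for density and colour along a camera ray and composites the samples by volume rendering; view-direction conditioning and hierarchical sampling are omitted for brevity.

    import torch
    import torch.nn as nn

    def positional_encoding(x, n_freqs=6):
        # Map coordinates to sin/cos features of increasing frequency.
        feats = [x]
        for k in range(n_freqs):
            feats += [torch.sin(2**k * x), torch.cos(2**k * x)]
        return torch.cat(feats, dim=-1)

    # The scene is stored in the weights of this MLP:
    # encoded 3D point -> (density sigma, RGB colour).
    mlp = nn.Sequential(
        nn.Linear(3 + 3 * 2 * 6, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 4),
    )

    def render_ray(origin, direction, n_samples=64, near=0.0, far=4.0):
        # Sample points along the ray and query the network.
        t = torch.linspace(near, far, n_samples)
        points = origin + t[:, None] * direction
        out = mlp(positional_encoding(points))
        sigma = torch.relu(out[:, :1])   # volume density
        rgb = torch.sigmoid(out[:, 1:])  # colour
        # Volume rendering: alpha-composite the sampled colours along the ray.
        delta = (far - near) / n_samples
        alpha = 1.0 - torch.exp(-sigma * delta)
        trans = torch.cumprod(torch.cat([torch.ones(1, 1), 1.0 - alpha[:-1]]), dim=0)
        weights = alpha * trans
        return (weights * rgb).sum(dim=0)

    # Hypothetical camera ray through the scene.
    colour = render_ray(torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]))

Training then amounts to minimizing the difference between such rendered pixel colours and the colours observed in the input photographs, which is what makes these approaches computationally demanding.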

Instructors

Michael Weinmann

Delft University of Technology

The Netherlands

M.Weinmann@tudelft.nl

Dennis Haitz

Karlsruhe Institute of Technology, Germany

Fondazione Bruno Kessler, Italy

dennis.haitz@kit.edu

Martin Weinmann

Karlsruhe Institute of Technology

Germany

martin.weinmann@kit.edu

Website created by Henry Redder (2024)