Stochastic Interpolants for Controllable Scene Generation
Stochastic interpolants provide a unifying framework for generative modeling that covers denoising diffusion, flow matching, and inductive moment matching. In the context of ETHAR, they offer a promising path toward controllable sampling and synthesis of multimodal content across 2D and 3D domains. A motivating scenario is a user who wants to reimagine or enhance a physical environment starting from sparse imagery and high-level guidance. We propose to develop a framework, based on stochastic interpolants with variational regularization, for interactive, semantically guided, and geometrically coherent content creation. The framework is intended to generate semantically consistent and geometrically plausible 3D reconstructions, including novel views that complete and enrich the scene beyond what is directly observable, in service of immersive user experiences and augmented reality.
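
For readers unfamiliar with the term, the following is a minimal sketch of what a stochastic interpolant looks like in its simplest form. It is an illustration only, not the proposed method: the linear schedule, the latent-noise shape sqrt(t(1 - t)), and the names interpolant, velocity_target, and noise_scale are assumptions chosen for the example. With noise_scale set to zero, the path reduces to the straight-line interpolation commonly used in flow matching, where a model would be trained to regress the velocity x1 - x0.

```python
import numpy as np

def interpolant(x0, x1, t, z=None, noise_scale=0.0):
    """Illustrative interpolant x_t = (1 - t) * x0 + t * x1 + gamma(t) * z.

    gamma(t) = noise_scale * sqrt(t * (1 - t)) is an assumed schedule; it
    vanishes at t = 0 and t = 1, so the endpoints are always x0 and x1.
    """
    t = np.asarray(t, dtype=float)
    if z is None:
        z = np.zeros_like(x0)  # no latent noise unless it is supplied
    gamma = noise_scale * np.sqrt(t * (1.0 - t))
    return (1.0 - t) * x0 + t * x1 + gamma * z

def velocity_target(x0, x1):
    """Time derivative of x_t in the deterministic case (noise_scale = 0)."""
    return x1 - x0

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 3))        # base samples, e.g. Gaussian noise
x1 = rng.standard_normal((4, 3)) + 5.0  # placeholder for data samples
z = rng.standard_normal((4, 3))         # latent noise along the path

x_half = interpolant(x0, x1, 0.5)                         # deterministic midpoint
x_half_noisy = interpolant(x0, x1, 0.5, z, noise_scale=1.0)  # noisy midpoint
target = velocity_target(x0, x1)                          # regression target
```

In this sketch, conditioning signals such as images or text guidance would enter through the model that predicts the velocity, which is outside the scope of the example.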