摘要
This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from panorama or multi-view images given geometry (depth maps and poses).
Unlike prior models that rely on iterative image warping and inpainting, MVDiffusion concurrently generates all images with a global awareness, encompassing high resolution and rich content, effectively addressing the error accumulation prevalent in preceding models.
MVDiffusion specifically incorporates a correspondence-aware attention mechanism, enabling effective cross-view interaction.
This mechanism underpins three pivotal modules:
1) a generation module that produc