TwoSquared: 4D Reconstruction from 2D Image Pairs

Lu Sang*1,2, Zehranaz Canfes*2, Dongliang Cao3, Riccardo Marin1,2, Florian Bernard3, Daniel Cremers1,2

1Technical University of Munich, 2Munich Center for Machine Learning, 3University of Bonn
* equal contribution

We present TwoSquared, a method that takes as input a pair of 2D images showing the initial and final states of an object and generates continuous 4D sequences that are consistent in both texture and geometry. It is designed to be robust to varying input quality and requires no predefined templates or object-class priors. This lets it process diverse images while maintaining structural integrity and visual coherence throughout the generated sequences. As demonstrated, our approach handles humans, animals, and inanimate objects.

Method Overview

TwoSquared processes the two input images through an image-to-3D generation block, producing two 3D meshes. We then extract per-vertex features and compute a cosine similarity map, which is refined by a functional map module and a closed-loop check module to obtain point-to-point correspondences. The registered points are fed into our shape deformation module, which models the trajectory of the deformed point cloud. At inference time, we apply the learned deformation directly to the textured mesh generated from I0 to obtain the 4D sequence.
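
The similarity and closed-loop steps described above might be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names are hypothetical, the closed-loop check is assumed here to be a mutual nearest-neighbor test, and the functional map refinement is omitted entirely.

```python
import numpy as np

def cosine_similarity_map(feat_a, feat_b, eps=1e-8):
    """Return the (n_a, n_b) matrix of cosine similarities between feature rows.

    feat_a, feat_b: per-vertex feature arrays of shape (n_a, d) and (n_b, d).
    """
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + eps)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + eps)
    return a @ b.T

def cycle_consistent_matches(sim):
    """Keep pairs (i, j) where j is the best match of i and i is the best match of j.

    Assumption: this mutual nearest-neighbor test stands in for the
    closed-loop check; the actual module may differ.
    """
    a_to_b = sim.argmax(axis=1)  # best vertex in B for each vertex in A
    b_to_a = sim.argmax(axis=0)  # best vertex in A for each vertex in B
    idx_a = np.arange(sim.shape[0])
    keep = b_to_a[a_to_b] == idx_a  # going A -> B -> A returns to the start
    return np.stack([idx_a[keep], a_to_b[keep]], axis=1)
```

Given two feature arrays, `cycle_consistent_matches(cosine_similarity_map(feat_a, feat_b))` returns an array of index pairs; vertices whose best match is not reciprocated are dropped rather than forced into a correspondence.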

Comparisons

[Five side-by-side video comparisons with V2M4. Panels per example: V2M4 Input Video, V2M4, V2M4 Mesh, Ours input, Ours, Ours Mesh.]

4D Sequences Generated from Real-World Images

[Five real-world examples. Panels per example: Input Image (t=0), 4D motion from the left, front, right, and back views, Input Image (t=1), and Overlay Image.]

Citation


@misc{sang2025twosquared4dgeneration2d,
  title={TwoSquared: 4D Generation from 2D Image Pairs},
  author={Lu Sang and Zehranaz Canfes and Dongliang Cao and Riccardo Marin and Florian Bernard and Daniel Cremers},
  year={2025},
  eprint={2504.12825},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.12825},
}