We present TwoSquared, a method that takes as input a pair of 2D images depicting the initial and final states of an object and generates a continuous, texture-consistent, geometry-consistent 4D sequence. It is robust to varying input quality and requires neither predefined templates nor object-class priors, which lets it process diverse images while maintaining structural integrity and visual coherence throughout the generated sequence. As demonstrated, our approach effectively handles humans, animals, and inanimate objects.
TwoSquared processes the two input images through an image-to-3D generation block, producing two 3D meshes. We then extract per-vertex features and compute a cosine similarity map, which is refined with a functional map module and a closed-loop check module to obtain point-to-point correspondences. The registered points are fed into our shape deformation module, which models the trajectory of the deforming point cloud. At inference time, we can directly deform the textured mesh generated from I0 to obtain the 4D sequence.
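To make the correspondence step concrete, below is a minimal sketch of the cosine similarity map and the closed-loop (cycle-consistency) check, assuming per-vertex features have already been extracted for the two generated meshes. All names and shapes are illustrative assumptions, and the linear blend at the end is only a placeholder: the actual pipeline refines the similarity map with a functional map module and models trajectories with a learned deformation network.

```python
import numpy as np

def cosine_similarity_map(feat0, feat1):
    """Cosine similarity between all vertex-feature pairs.

    feat0: (N0, D) per-vertex features of mesh 0.
    feat1: (N1, D) per-vertex features of mesh 1.
    Returns an (N0, N1) similarity map.
    """
    a = feat0 / np.linalg.norm(feat0, axis=1, keepdims=True)
    b = feat1 / np.linalg.norm(feat1, axis=1, keepdims=True)
    return a @ b.T

def cycle_consistent_matches(sim):
    """Closed-loop check: keep a match i -> j only if j maps back to i."""
    fwd = sim.argmax(axis=1)  # best match in mesh 1 for each vertex of mesh 0
    bwd = sim.argmax(axis=0)  # best match in mesh 0 for each vertex of mesh 1
    keep = bwd[fwd] == np.arange(sim.shape[0])
    return np.nonzero(keep)[0], fwd[keep]

def linear_trajectory(x0, x1, t):
    """Placeholder for the learned deformation module:
    linearly blend corresponding points at time t in [0, 1]."""
    return (1.0 - t) * x0 + t * x1

# Toy usage with random features and vertex positions.
rng = np.random.default_rng(0)
f0, f1 = rng.normal(size=(100, 64)), rng.normal(size=(120, 64))
v0, v1 = rng.normal(size=(100, 3)), rng.normal(size=(120, 3))
sim = cosine_similarity_map(f0, f1)
idx0, idx1 = cycle_consistent_matches(sim)
frame = linear_trajectory(v0[idx0], v1[idx1], t=0.5)  # intermediate state
```

In the full method, the deformation module yields a continuous trajectory for every registered point, so any time t can be queried to deform the textured mesh generated from I0.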
[Qualitative video comparisons: five examples, each showing the V2M4 input video, V2M4 result, and V2M4 mesh alongside our input image pair, our result, and our mesh.]
@misc{sang2025twosquared4dgeneration2d,
title={TwoSquared: 4D Generation from 2D Image Pairs},
author={Lu Sang and Zehranaz Canfes and Dongliang Cao and Riccardo Marin and Florian Bernard and Daniel Cremers},
year={2025},
eprint={2504.12825},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.12825},
}