We learn to predict the shape and pose of an object from a single input view. Our framework can leverage training data in the form of multi-view observations of objects, and learns shape and pose prediction despite the absence of any direct shape or pose supervision.
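The core idea can be illustrated with a toy sketch: if a predicted shape is correct, its rendered silhouettes should match the observed masks from every available view. Below, the full differentiable rendering and pose prediction of the actual framework are replaced with a simplistic stand-in (orthographic projection of a voxel grid along a chosen axis), and the `project_silhouette` and `consistency_loss` names are illustrative, not from the paper.

```python
import numpy as np

def project_silhouette(occupancy, axis):
    # Orthographic silhouette: a pixel is filled if any voxel along
    # the viewing ray is occupied (max over the projection axis).
    # This is a toy stand-in for a differentiable renderer.
    return occupancy.max(axis=axis)

def consistency_loss(occupancy, views):
    # Mean squared error between rendered silhouettes of the predicted
    # shape and the observed masks, summed over the available views.
    # `views` is a list of (axis, observed_mask) pairs standing in for
    # the (pose, image) observations used as supervision.
    loss = 0.0
    for axis, mask in views:
        rendered = project_silhouette(occupancy, axis)
        loss += float(np.mean((rendered - mask) ** 2))
    return loss

# Toy example: a 4x4x4 grid with a cube in one corner, observed from two axes.
shape = np.zeros((4, 4, 4))
shape[:2, :2, :2] = 1.0
views = [(0, shape.max(axis=0)), (1, shape.max(axis=1))]
print(consistency_loss(shape, views))  # a shape consistent with its views scores 0.0
```

In the actual framework this loss would be minimized with respect to the parameters of a network that predicts both the shape and the camera pose, so that multi-view agreement alone drives learning.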
Paper
Shubham Tulsiani, Alexei A. Efros, Jitendra Malik.
Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction.
Learning from online images. We can train our system using images downloaded from eBay together with automatically obtained segmentations. We visualize the shape and pose predictions learned from the depicted training data.
Acknowledgements
We thank David Fouhey for insightful discussions, and Saurabh Gupta and Tinghui Zhou for helpful comments. This work was supported in part by Intel/NSF VEC award IIS-1539099 and NSF Award IIS-1212798. We gratefully acknowledge NVIDIA Corporation for the donation of GPUs used for this research. This webpage template was borrowed from some colorful folks.