State-of-the-art in deep learning approaches for automatic single-panorama indoor modeling and exploration

Authors

Giovanni Pintore CRS4, Italy
Marco Agus HBKU, Qatar
Jens Schneider HBKU, Qatar
Enrico Gobbetti CRS4, Italy

Abstract

A single surround-view panoramic image provides complete coverage of the environment visible from a single viewpoint and inherently supports dynamic exploration, especially when viewed through a head-mounted display. For these reasons, single or linked 360-degree panoramas have become a widely adopted modality for indoor scene acquisition and virtual tour creation. Despite their popularity, panoramas present inherent limitations: they statically represent the captured scene, do not provide an explicit 3D architectural structure or geometry, and exhibit minimal parallax due to their single viewpoint. These limitations restrict their applicability and often require significant modeling efforts to generate missing data. In our survey, we provided an up-to-date, integrative overview of recent techniques to address these challenges, bringing together complementary perspectives from machine learning, computer vision, and computer graphics.

Note: This repository accompanies our survey published in Computer Graphics Forum and presented at Eurographics 2026. It provides direct links to relevant works, organized by topic. We will strive to keep this repository updated also after the survey publication.

How to Cite

If you find this survey or the organized repository helpful for your research, please cite our work as follows:

APA:

Pintore, G., Agus, M., Schneider, J., & Gobbetti, E. (2026). State-of-the-art in deep learning approaches for automatic single-panorama indoor modeling and exploration. Computer Graphics Forum, 45(2), doi: 10.1111/cgf.70396

BibTeX:

@article{Pintore:2026:SDL,
  title={State-of-the-art in deep learning approaches for automatic single-panorama indoor modeling and exploration},
  author={Giovanni Pintore and Marco Agus and Jens Schneider and Enrico Gobbetti},
  journal={Computer Graphics Forum},
  volume={45},
  number={2},
  year={2026},
  doi={10.1111/cgf.70396}
}

Acknowledgments

This work was supported by NPRP-S 14th Cycle grant 0403-210132 AIN2 from the Qatar National Research Fund (a member of Qatar Foundation) and by the Sardinian Regional Authorities under the project XDATA (Art9 LR 20/2015).

Related surveys

Our work examines methods for automatically transforming indoor panoramic shots into representations that support structural and geometric recovery, dynamic exploration, and basic editing. Our focus lies on techniques that operate mostly from asingle panoramic image, covering both analysis (for model reconstruction) and synthesis (for immersive viewing, navigation, and editing). Several other surveys provide complementary information. We list here relevant works.

References:

[1] H. Ai, Z. Cao, and L. Wang, "A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision," Int J Comput Vis, 2025, doi: 10.1007/s11263-025-02391-w.

[2] T. L. da Silveira, P. G. Pinto, J. Murrugarra-Llerena, and C. R. Jung, "3D scene geometry estimation from 360° imagery: A survey," ACM Computing Surveys, 2022, doi: 10.1145/3519021.

[3] S. Gao, K. Yang, H. Shi, K. Wang, and J. Bai, "Review on Panoramic Imaging and Its Applications in Scene Understanding," IEEE TIM, 2022, doi: 10.1109/TIM.2022.3216675.

[4] J. Li, W. Gao, Y. Wu, Y. Liu, and Y. Shen, "High-quality indoor scene 3D reconstruction with RGB-D cameras: A brief review," Computational Visual Media, 2022, doi: 10.1007/s41095-021-0250-8.

[5] M. Meng, Y. Zhu, Y. Zhao, Z. Li, and Z. Zhu, "3D Indoor Scene Geometry Estimation from a Single Omnidirectional Image: A Comprehensive Survey," Computational Visual Media, 2025, doi: 10.26599/CVM.2025.9450438.

[6] A. G. Patil, S. G. Patil, M. Li, M. Fisher, M. Savva, and H. Zhang, "Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes," Computer Graphics Forum, 2024, doi: 10.1111/cgf.14927.

[7] G. Pintore, C. Mura, F. Ganovelli, L. Fuentes-Perez, R. Pajarola, and E. Gobbetti, "State-of-the-art in Automatic 3D Reconstruction of Structured Indoor Environments," Comput. Graph. Forum, 2020, doi: 10.1111/cgf.14021.

[8] M. Tukur, S. Jashari, M. Alzubaidi, B. A. Salami, Y. Boraey, S. Yong, D. Saleh, G. Pintore, E. Gobbetti, J. Schneider, N. Fetais, and M. Agus, "Panoramic Imaging in Immersive Extended Reality: A Scoping Review of Technologies, Applications, Perceptual Studies, and User Experience Challenges," Frontiers in Virtual Reality, 2025, doi: 10.3389/frvir.2025.1622605.

[9] H. Wang and M. Li, "A new era of indoor scene reconstruction: A survey," IEEE Access, 2024, doi: 10.1109/ACCESS.2024.3440260.

[10] J. Yu, A. C. P. Grassi, and G. Hirtz, "Applications of Deep Learning for Top-View Omnidirectional Imaging: A Survey," Proc. CVPRW, 2023, doi: 10.1109/CVPRW59228.2023.00683.

[11] J. Zhu, C. Gao, Q. Sun, M. Wang, and Z. Deng, "A survey of indoor 3D reconstruction based on RGB-D cameras," IEEE Access, 2024, doi: 10.1109/ACCESS.2024.3443065.

[12] M. Zollhöfer, P. Stotko, A. Görlitz, C. Theobalt, M. Niessner, R. Klein, and A. Kolb, "State-of-the-art on 3D reconstruction with RGB-D cameras," Computer graphics forum, 2018, doi: 10.1111/cgf.13386.

[13] C. Zou, J. W. Su, C. H. Peng, A. Colburn, Q. Shan, P. Wonka, H. K. Chu, and D. Hoiem, "Manhattan Room Layout Reconstruction from a Single 360 Image: A Comparative Study of State-of-the-Art Methods," International Journal of Computer Vision, 2021, doi: 10.1007/s11263-020-01426-8.

↑ Back to Top

Background

In addition to reviewing state-of-the-art methods, we outline key background concepts, including omnidirectional representations, gravity alignment, architectural priors, and major datasets supporting 360-degree indoor research.

↑ Back to Top

360-degree image capture and processing representation

References:

[1] R. Cabral and Y. Furukawa, "Piecewise Planar and Compact Floorplan Reconstruction from Images," Proc. CVPR, 2014, doi: 10.1109/CVPR.2014.546.

[2] M. L. Champel, R. Doré, and N. Mollet, "Key factors for a high-quality VR experience," Applications of Digital Image Processing XL, 2017, doi: 10.1117/12.2274336.

[3] J. M. Coughlan and A. L. Yuille, "Manhattan World: Compass direction from a single image by Bayesian inference," Proc. ICCV, 1999, doi: 10.1109/ICCV.1999.790349.

[4] T. L. da Silveira, P. G. Pinto, J. Murrugarra-Llerena, and C. R. Jung, "3D scene geometry estimation from 360° imagery: A survey," ACM Computing Surveys, 2022, doi: 10.1145/3519021.

[5] E. Delage, H. Lee, and A. Y. Ng, "A Dynamic Bayesian Network Model for Autonomous 3D Reconstruction from a Single Indoor Image," Proc. CVPR, 2006, doi: 10.1109/CVPR.2006.23.

[6] M. Eder, M. Shvets, J. Lim, and J. M. Frahm, "Tangent Images for Mitigating Spherical Distortion," Proc. CVPR, 2020, doi: 10.1109/CVPR42600.2020.01244.

[7] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, "Reconstructing building interiors from images," Proc. ICCV, 2009, doi: 10.1109/ICCV.2009.5459145.

[8] C. Godard, O. M. Aodha, and G. J. Brostow, "Unsupervised Monocular Depth Estimation with Left-Right Consistency," Proc. CVPR, 2017, doi: 10.1109/CVPR.2017.699.

[9] V. Hedau, D. Hoiem, and D. Forsyth, "Recovering the spatial layout of cluttered rooms," Proc. ICCV, 2009, doi: 10.1109/ICCV.2009.5459411.

[10] J. Huang, Z. Chen, D. Ceylan, and H. Jin, "6-DOF VR videos with a single 360-camera," Proc. IEEE VR, 2017, doi: 10.1109/VR.2017.7892229.

[11] S. Im, H. Ha, F. Rameau, H. G. Jeon, G. Choe, and I. S. Kweon, "All-Around Depth from Small Motion with a Spherical Panoramic Camera," Proc. ECCV, 2016, doi: 10.1007/978-3-319-46487-9_10.

[12] T. Jokela, J. Ojala, and K. Väänänen, "How people use 360-degree cameras," Proc. International Conference on Mobile and Ubiquitous Multimedia, 2019, doi: 10.1145/3365610.3365645.

[13] D. C. Lee, M. Hebert, and T. Kanade, "Geometric reasoning for single image structure recovery," Proc. CVPR, 2009, doi: 10.1109/CVPR.2009.5206872.

[14] G. Pintore, F. Ganovelli, R. Pintus, R. Scopigno, and E. Gobbetti, "3D floor plan recovery from overlapping spherical images," Computational Visual Media, 2018, doi: 10.1007/s41095-018-0125-9.

[15] G. Pintore, R. Pintus, F. Ganovelli, R. Scopigno, and E. Gobbetti, "Recovering 3D existing-conditions of indoor structures from spherical images," Computers & Graphics, 2018, doi: 10.1016/j.cag.2018.09.013.

[16] G. Pintore, C. Mura, F. Ganovelli, L. Fuentes-Perez, R. Pajarola, and E. Gobbetti, "State-of-the-art in Automatic 3D Reconstruction of Structured Indoor Environments," Comput. Graph. Forum, 2020, doi: 10.1111/cgf.14021.

[17] G. Pintore, E. Almansa, M. Agus, and E. Gobbetti, "Deep3DLayout: 3D Reconstruction of an Indoor Layout from a Spherical Panoramic Image," ACM TOG, 2021, doi: 10.1145/3478513.3480480.

[18] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth: High-Resolution 360° Monocular Depth Estimation," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00374.

[19] G. Schindler and F. Dellaert, "Atlanta world: an expectation maximization framework for simultaneous low-level edge grouping and camera calibration in complex man-made environments," Proc. CVPR, 2004, doi: 10.1109/CVPR.2004.1315033.

↑ Back to Top

Gravity alignment

References:

[1] B. Davidson, M. S. Alvi, and J. F. H. Henriques, "360 Camera Alignment via Segmentation," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58604-1_35.

[2] R. Jung, A. S. J. Lee, A. Ashtari, and J. Bazin, "Deep360Up: A Deep Learning-Based Approach for Automatic VR Image Upright Adjustment," Proc. IEEE VR, 2019, doi: 10.1109/VR.2019.8798326.

[3] G. Pintore, M. Agus, E. Almansa, J. Schneider, and E. Gobbetti, "SliceNet: deep dense depth estimation from a single indoor panorama using a slice-based representation," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01137.

[4] W. Xian, Z. Li, M. Fisher, J. Eisenmann, E. Shechtman, and N. Snavely, "UprightNet: geometry-aware camera orientation estimation from single images," Proc. ICCV, 2019, doi: 10.1109/ICCV.2019.01007.

↑ Back to Top

Architectural and data-driven priors

References:

[1] M. Berger, A. Tagliasacchi, L. M. Seversky, P. Alliez, G. Guennebaud, J. A. Levine, A. Sharf, and C. T. Silva, "A Survey of Surface Reconstruction from Point Clouds," Computer Graphics Forum, 2017, doi: 10.1111/cgf.12802.

[2] J. M. Coughlan and A. L. Yuille, "Manhattan World: Compass direction from a single image by Bayesian inference," Proc. ICCV, 1999, doi: 10.1109/ICCV.1999.790349.

[3] Stanford-University, "BuildingParser Dataset," Public dataset, 2017, url: https://sdss.redivis.com/datasets/9q3m-9w5pa1a2h.

[4] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth Matterport3D," Public dataset, 2022, url: https://researchdata.bath.ac.uk/1126/.

[5] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth Replica," Public dataset, 2022, url: https://manurare.github.io/360monodepth/.

[6] I. Armeni, Z. Y. He, J. Gwak, A. R. Zamir, M. Fischer, J. Malik, and S. Savarese, "The 3DSceneGraph Dataset," Public dataset, 2019, url: https://github.com/StanfordVL/3DSceneGraph/.

[7] Z. Li, L. Wang, X. Huang, C. Pan, and J. Yang, "FutureHouse Dataset," Public dataset, 2022, url: https://github.com/LZleejean/FutureHouse.

[8] F. Xia, A. R. Zamir, Z. He, A. Sax, J. Malik, and S. Savarese, "GibsonV2," Public dataset, 2018, url: https://github.com/StanfordVL/GibsonEnv/tree/master/gibson/data.

[9] Matterport, "Matterport3D," Public dataset, 2017, url: https://github.com/niessner/Matterport.

[10] C. Zou, J. W. Su, C. H. Peng, A. Colburn, Q. Shan, P. Wonka, H. K. Chu, and D. Hoiem, "MatterportLayout," Public dataset, 2017, url: https://github.com/ericsujw/Matterport3DLayoutAnnotation.

[11] G. Albanis, N. Zioulis, P. Drakoulis, V. Gkitsas, V. Sterzentsenko, F. Alvarez, D. Zarpalas, and P. Daras, "Pano3D," Public dataset, 2021, url: https://vcl3d.github.io/Pano3D/download/.

[12] Y. Zhang, S. Song, P. Tan, and J. Xiao, "PanoContext dataset," Public dataset, 2014, url: https://panocontext.cs.princeton.edu/.

[13] M. Khan, J. Abu-Khalaf, and D. Suter, "PanoSCU Dataset," Public dataset, 2025, url: https://doi.org/10.25958/4p6j-z657.

[14] J. Xu, J. Zheng, Y. Xu, R. Tang, and S. Gao, "PNVS Dataset," Public dataset, 2021, url: https://github.com/bluestyle97/PNVS.

[15] ShanghaiTech-University and Kujiale-com, "Shanghaitech-Kujiale Indoor 360° (SKI360) dataset," Public dataset, 2020, url: https://svip-lab.github.io/dataset/indoor_360.html.

[16] J. Zheng, J. Zhang, J. Li, R. Tang, S. Gao, and Z. Zhou, "Indoor3Dmapping dataset," Public dataset, 2020, url: https://structured3d-dataset.org/.

[17] S. Cruz, W. Hutchcroft, Y. Li, N. Khosravan, I. Boyadzhiev, and S. B. Kang, "Zillow Indoor Dataset (ZInD)," Public dataset, 2021, url: https://github.com/zillow/zind.

[18] T. Deb, L. Wang, Z. Bessinger, N. Khosravan, E. Penner, and S. B. Kang, "ZInD-Tell Dataset," Public dataset, 2024, url: https://github.com/zillow/zindtell.

[19] E. Delage, H. Lee, and A. Y. Ng, "A Dynamic Bayesian Network Model for Autonomous 3D Reconstruction from a Single Indoor Image," Proc. CVPR, 2006, doi: 10.1109/CVPR.2006.23.

[20] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, "Reconstructing building interiors from images," Proc. ICCV, 2009, doi: 10.1109/ICCV.2009.5459145.

[21] V. Hedau, D. Hoiem, and D. Forsyth, "Recovering the spatial layout of cluttered rooms," Proc. ICCV, 2009, doi: 10.1109/ICCV.2009.5459411.

[22] D. C. Lee, M. Hebert, and T. Kanade, "Geometric reasoning for single image structure recovery," Proc. CVPR, 2009, doi: 10.1109/CVPR.2009.5206872.

[23] G. Pintore, F. Ganovelli, R. Pintus, R. Scopigno, and E. Gobbetti, "3D floor plan recovery from overlapping spherical images," Computational Visual Media, 2018, doi: 10.1007/s41095-018-0125-9.

[24] G. Pintore, C. Mura, F. Ganovelli, L. Fuentes-Perez, R. Pajarola, and E. Gobbetti, "State-of-the-art in Automatic 3D Reconstruction of Structured Indoor Environments," Comput. Graph. Forum, 2020, doi: 10.1111/cgf.14021.

[25] G. Pintore, M. Agus, E. Almansa, J. Schneider, and E. Gobbetti, "SliceNet: deep dense depth estimation from a single indoor panorama using a slice-based representation," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01137.

[26] G. Schindler and F. Dellaert, "Atlanta world: an expectation maximization framework for simultaneous low-level edge grouping and camera calibration in complex man-made environments," Proc. CVPR, 2004, doi: 10.1109/CVPR.2004.1315033.

[27] J. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba, "Recognizing scene viewpoint using panoramic place representation," Proc. CVPR, 2012, doi: 10.1109/CVPR.2012.6247991.

[28] N. Zioulis, A. Karakottas, D. Zarpalas, and P. Daras, "OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas," Proc. ECCV, 2018, doi: 10.1007/978-3-030-01231-1_28.

[29] C. Zou, J. W. Su, C. H. Peng, A. Colburn, Q. Shan, P. Wonka, H. K. Chu, and D. Hoiem, "3D Manhattan Room Layout Reconstruction from a Single 360 Image," ArXiv e-print arXiv:1910.04099, 2019, doi: 10.48550/arXiv.1910.04099.

[30] C. Zou, J. W. Su, C. H. Peng, A. Colburn, Q. Shan, P. Wonka, H. K. Chu, and D. Hoiem, "Manhattan Room Layout Reconstruction from a Single 360 Image: A Comparative Study of State-of-the-Art Methods," International Journal of Computer Vision, 2021, doi: 10.1007/s11263-020-01426-8.

↑ Back to Top

Open research data

References:

[1] G. Albanis, N. Zioulis, P. Drakoulis, V. Gkitsas, V. Sterzentsenko, F. Alvarez, D. Zarpalas, and P. Daras, "Pano3D: A holistic benchmark and a solid baseline for 360° depth estimation," Proc. CVPR Workshops, 2021, doi: 10.1109/CVPRW53098.2021.00413.

[2] I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese, "3D Semantic Parsing of Large-scale Indoor Spaces," Proc. CVPR, 2016, doi: 10.1109/CVPR.2016.170.

[3] I. Armeni, Z. Y. He, J. Gwak, A. R. Zamir, M. Fischer, J. Malik, and S. Savarese, "3D Scene Graph: A structure for unified semantics, 3D space, and camera," Proc. CVPR, 2019, doi: 10.1109/ICCV.2019.00576.

[4] B. Attal, S. Ling, A. Gokaslan, C. Richardt, and J. Tompkin, "MatryODShka: Real-time 6DoF video view synthesis using multi-sphere images," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58452-8_26.

[5] A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, "Matterport3D: Learning from RGB-D Data in Indoor Environments," Proc. 3DV, 2017, doi: 10.1109/3DV.2017.00081.

[6] S. Cruz, W. Hutchcroft, Y. Li, N. Khosravan, I. Boyadzhiev, and S. B. Kang, "Zillow indoor dataset: Annotated floor plans with 360° panoramas and 3D room layouts," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.00217.

[7] Stanford-University, "BuildingParser Dataset," Public dataset, 2017, url: https://sdss.redivis.com/datasets/9q3m-9w5pa1a2h.

[8] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth Matterport3D," Public dataset, 2022, url: https://researchdata.bath.ac.uk/1126/.

[9] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth Replica," Public dataset, 2022, url: https://manurare.github.io/360monodepth/.

[10] I. Armeni, Z. Y. He, J. Gwak, A. R. Zamir, M. Fischer, J. Malik, and S. Savarese, "The 3DSceneGraph Dataset," Public dataset, 2019, url: https://github.com/StanfordVL/3DSceneGraph/.

[11] Z. Li, L. Wang, X. Huang, C. Pan, and J. Yang, "FutureHouse Dataset," Public dataset, 2022, url: https://github.com/LZleejean/FutureHouse.

[12] F. Xia, A. R. Zamir, Z. He, A. Sax, J. Malik, and S. Savarese, "GibsonV2," Public dataset, 2018, url: https://github.com/StanfordVL/GibsonEnv/tree/master/gibson/data.

[13] A. A. Sánchez Alcázar, G. Pintore, and M. Sgrenzaroli, "Indoor3Dmapping dataset," Public dataset, 2022, url: https://doi.org/10.5281/zenodo.6367381.

[14] Matterport, "Matterport3D," Public dataset, 2017, url: https://github.com/niessner/Matterport.

[15] C. Zou, J. W. Su, C. H. Peng, A. Colburn, Q. Shan, P. Wonka, H. K. Chu, and D. Hoiem, "MatterportLayout," Public dataset, 2017, url: https://github.com/ericsujw/Matterport3DLayoutAnnotation.

[16] G. Albanis, N. Zioulis, P. Drakoulis, V. Gkitsas, V. Sterzentsenko, F. Alvarez, D. Zarpalas, and P. Daras, "Pano3D," Public dataset, 2021, url: https://vcl3d.github.io/Pano3D/download/.

[17] Y. Zhang, S. Song, P. Tan, and J. Xiao, "PanoContext dataset," Public dataset, 2014, url: https://panocontext.cs.princeton.edu/.

[18] M. Khan, J. Abu-Khalaf, and D. Suter, "PanoSCU Dataset," Public dataset, 2025, url: https://doi.org/10.25958/4p6j-z657.

[19] J. Xu, J. Zheng, Y. Xu, R. Tang, and S. Gao, "PNVS Dataset," Public dataset, 2021, url: https://github.com/bluestyle97/PNVS.

[20] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma, A. Clarkson, M. Yan, B. Budge, Y. Yan, X. Pan, J. Yon, Y. Zou, K. Leon, N. Carter, J. Briales, T. Gillingham, E. Mueggler, L. Pesqueira, M. Savva, D. Batra, H. M. Strasdat, R. D. Nardi, M. Goesele, S. Lovegrove, and R. Newcombe, "The Replica Dataset," Public dataset, 2019, url: https://github.com/facebookresearch/Replica-Dataset.

[21] ShanghaiTech-University and Kujiale-com, "Shanghaitech-Kujiale Indoor 360° (SKI360) dataset," Public dataset, 2020, url: https://svip-lab.github.io/dataset/indoor_360.html.

[22] J. Zheng, J. Zhang, J. Li, R. Tang, S. Gao, and Z. Zhou, "Indoor3Dmapping dataset," Public dataset, 2020, url: https://structured3d-dataset.org/.

[23] S. Cruz, W. Hutchcroft, Y. Li, N. Khosravan, I. Boyadzhiev, and S. B. Kang, "Zillow Indoor Dataset (ZInD)," Public dataset, 2021, url: https://github.com/zillow/zind.

[24] T. Deb, L. Wang, Z. Bessinger, N. Khosravan, E. Penner, and S. B. Kang, "ZInD-Tell Dataset," Public dataset, 2024, url: https://github.com/zillow/zindtell.

[25] T. Deb, L. Wang, Z. Bessinger, N. Khosravan, E. Penner, and S. B. Kang, "ZInD-Tell: Towards Translating Indoor Panoramas into Descriptions," Proc. CVPR Workshops, 2024, doi: 10.1109/CVPRW63382.2024.00210.

[26] L. Jin, Y. Xu, J. Zheng, J. Zhang, R. Tang, S. Xu, J. Yu, and S. Gao, "Geometric Structure Based and Regularized Depth Estimation from 360 Indoor Imagery," Proc. CVPR, 2020, doi: 10.1109/CVPR42600.2020.00097.

[27] M. Khan, Y. Qiu, Y. Cong, J. Abu-Khalaf, D. Suter, and B. Rosenhahn, "PanoSCU: A Simulation-Based Dataset for Panoramic Indoor Scene Understanding," IEEE Access, 2025, doi: 10.1109/ACCESS.2025.3561055.

[28] E. Kolve, R. Mottaghi, W. Han, E. VanderBilt, L. Weihs, A. Herrasti, M. Deitke, K. Ehsani, D. Gordon, Y. Zhu, A. Kembhavi, A. Gupta, and A. Farhadi, "AI2-THOR: An interactive 3D environment for visual AI," arXiv preprint arXiv:1712.05474, 2017, doi: 10.48550/arXiv.1712.05474.

[29] Z. Li, L. Wang, X. Huang, C. Pan, and J. Yang, "PhyIR: Physics-based Inverse Rendering for Panoramic Indoor Images," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.01238.

[30] T. Li, Z. Zhang, Y. Wang, Y. Cui, Y. Li, D. Zhou, B. Yin, and X. Yang, "Self-supervised indoor scene point cloud completion from a single panorama," The Visual Computer, 2025, doi: 10.1007/s00371-024-03509-w.

[31] P. Moulon, P. Monasse, R. Perrot, and R. Marlet, "OpenMVG: Open multiple view geometry," International Workshop on Reproducible Research in Pattern Recognition, 2016, doi: 10.1007/978-3-319-56414-2_5.

[32] A. G. Patil, S. G. Patil, M. Li, M. Fisher, M. Savva, and H. Zhang, "Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes," Computer Graphics Forum, 2024, doi: 10.1111/cgf.14927.

[33] G. Pintore, M. Agus, and E. Gobbetti, "AtlantaNet: Inferring the 3D Indoor Layout from a Single 360 Image Beyond the Manhattan World Assumption," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58598-3_26.

[34] G. Pintore, C. Mura, F. Ganovelli, L. Fuentes-Perez, R. Pajarola, and E. Gobbetti, "State-of-the-art in Automatic 3D Reconstruction of Structured Indoor Environments," Comput. Graph. Forum, 2020, doi: 10.1111/cgf.14021.

[35] G. Pintore, M. Agus, E. Almansa, J. Schneider, and E. Gobbetti, "SliceNet: deep dense depth estimation from a single indoor panorama using a slice-based representation," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01137.

[36] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth: High-Resolution 360° Monocular Depth Estimation," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00374.

[37] M. A. Shabani, W. Song, M. Odamaki, H. Fujiki, and Y. Furukawa, "Extreme structure from motion for indoor panoramas without visual overlaps," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.00565.

[38] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma, A. Clarkson, M. Yan, B. Budge, Y. Yan, X. Pan, J. Yon, Y. Zou, K. Leon, N. Carter, J. Briales, T. Gillingham, E. Mueggler, L. Pesqueira, M. Savva, D. Batra, H. M. Strasdat, R. D. Nardi, M. Goesele, S. Lovegrove, and R. Newcombe, "The Replica Dataset: A Digital Replica of Indoor Spaces," ArXiv e-print arXiv:1906.05797, 2019, doi: 10.48550/arXiv.1906.05797.

[39] F. Xia, A. R. Zamir, Z. He, A. Sax, J. Malik, and S. Savarese, "Gibson Env: real-world perception for embodied agents," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00945.

[40] J. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba, "Recognizing scene viewpoint using panoramic place representation," Proc. CVPR, 2012, doi: 10.1109/CVPR.2012.6247991.

[41] J. Xu, J. Zheng, Y. Xu, R. Tang, and S. Gao, "Layout-guided novel view synthesis from a single indoor panorama," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01617.

[42] Y. Zhang, S. Song, P. Tan, and J. Xiao, "PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding," Proc. ECCV, 2014, doi: 10.1007/978-3-319-10599-4_43.

[43] J. Zheng, J. Zhang, J. Li, R. Tang, S. Gao, and Z. Zhou, "Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58545-7_30.

[44] C. Zhuang, Z. Lu, Y. Wang, J. Xiao, and Y. Wang, "SPDET: Edge-aware self-supervised panoramic depth estimation transformer with spherical geometry," IEEE TPAMI, 2023, doi: 10.1109/TPAMI.2023.3272949.

[45] N. Zioulis, A. Karakottas, D. Zarpalas, and P. Daras, "OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas," Proc. ECCV, 2018, doi: 10.1007/978-3-030-01231-1_28.

[46] C. Zou, J. W. Su, C. H. Peng, A. Colburn, Q. Shan, P. Wonka, H. K. Chu, and D. Hoiem, "Manhattan Room Layout Reconstruction from a Single 360 Image: A Comparative Study of State-of-the-Art Methods," International Journal of Computer Vision, 2021, doi: 10.1007/s11263-020-01426-8.

↑ Back to Top

Automatic geometric and structural modeling

These methods infer geometric, structural, and visual properties of indoor scenes from a single omnidirectional panorama (or approximately one per room in multi-room settings). Approaches range from pixel-level augmentation (e.g., depth, normals) to higher-level structural abstractions of room layouts and objects, and extend to sparse multi-room reasoning that recovers layouts and inter-room connectivity for navigation and exploration.

↑ Back to Top

Depth and other pixel-wise information inference

References:

[1] H. Ai, Z. Cao, Y. P. Cao, Y. Shan, and L. Wang, "HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-With-Regional Depth Distributions," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.01275.

[2] H. Ai and L. Wang, "Elite360D: Towards Efficient 360 Depth Estimation via Semantic-and Distance-Aware Bi-Projection Fusion," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.00947.

[3] Y. Cao, Z. Wu, and C. Shen, "Estimating Depth From Monocular Images as Classification Using Deep Fully Convolutional Residual Networks," IEEE TCSVT, 2018, doi: 10.1109/TCSVT.2017.2740321.

[4] Z. Cao, J. Zhu, W. Zhang, H. Ai, H. Bai, H. Zhao, and L. Wang, "PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.00100.

[5] H. Cheng, C. Chao, J. Dong, H. Wen, T. Liu, and M. Sun, "Cube Padding for Weakly-Supervised Saliency Prediction in 360 Videos," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00154.

[6] D. Eigen, C. Puhrsch, and R. Fergus, "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network," Proc. NeurIPS, 2014, url: https://dl.acm.org/doi/abs/10.5555/2969033.2969091.

[7] D. Eigen and R. Fergus, "Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture," Proc. ICCV, 2015, doi: 10.1109/ICCV.2015.304.

[8] H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao, "Deep Ordinal Regression Network for Monocular Depth Estimation," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00214.

[9] G. Payen de La Garanderie, A. Atapour Abarghouei, and T. P. Breckon, "Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360 Panoramic Imagery," Proc. ECCV, 2018, doi: 10.1007/978-3-030-01261-8_48.

[10] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," Proc. CVPR, 2016, doi: 10.1109/CVPR.2016.90.

[11] C. Huang, F. Shao, H. Chen, B. Mu, and L. Xu, "GADFNet: Geometric Priors Assisted Dual-Projection Fusion Network for Monocular Panoramic Depth Estimation," IEEE Trans. Circuits and Systems for Video Technology, 2025, doi: 10.1109/TCSVT.2025.3553472.

[12] H. Jiang, Z. Sheng, S. Zhu, Z. Dong, and R. Huang, "Unifuse: Unidirectional fusion for 360 panorama depth estimation," IEEE Robotics and Automation Letters, 2021, doi: 10.1109/LRA.2021.3058957.

[13] H. Jiang, Z. Song, Z. Lou, R. Xu, and M. Tan, "Depth Anything in 360°: Towards Scale Invariance in the Wild," arXiv preprint arXiv:2512.22819, 2025, doi: 10.48550/arXiv.2512.22819.

[14] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, "Deeper Depth Prediction with Fully Convolutional Residual Networks," Proc. 3DV, 2016, doi: 10.1109/3DV.2016.32.

[15] S. Lambert-Lacroix and L. Zwald, "The adaptive BerHu penalty in robust regression," Journal of Nonparametric Statistics, 2016, doi: 10.1080/10485252.2016.1190359.

[16] J. Lee, M. Heo, K. Kim, and C. Kim, "Single-Image Depth Estimation Based on Fourier Domain Analysis," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00042.

[17] J. Lee, H. Park, B. U. Lee, and K. Joo, "HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.01547.

[18] Y. Li, Y. Guo, Z. Yan, X. Huang, Y. Duan, and L. Ren, "Omnifusion: 360 monocular depth estimation via geometry-aware fusion," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00282.

[19] F. Liu, C. Shen, and G. Lin, "Deep convolutional neural fields for depth estimation from a single image," Proc. CVPR, 2015, doi: 10.1109/CVPR.2015.7299152.

[20] G. Pintore, M. Agus, E. Almansa, J. Schneider, and E. Gobbetti, "SliceNet: deep dense depth estimation from a single indoor panorama using a slice-based representation," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01137.

[21] G. Pintore, M. Agus, A. Signoroni, and E. Gobbetti, "DDD: Deep indoor panoramic Depth estimation with Density maps consistency," STAG: Smart Tools and Applications in Graphics, 2024, doi: 10.2312/stag.20241336.

[22] G. Pintore, M. Agus, A. Signoroni, and E. Gobbetti, "DDD++: Exploiting Density map consistency for Deep Depth estimation in indoor environments," Graphical Models, 2025, doi: 10.1016/j.gmod.2025.101281.

[23] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, " Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer ," IEEE Transactions on Pattern Analysis & Machine Intelligence, 2022, doi: 10.1109/TPAMI.2020.3019967.

[24] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth: High-Resolution 360° Monocular Depth Estimation," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00374.

[25] A. Saxena, M. Sun, and A. Y. Ng, "Make3D: Learning 3D Scene Structure from a Single Still Image," IEEE TPAMI, 2009, doi: 10.1109/TPAMI.2008.132.

[26] Z. Shen, C. Lin, K. Liao, L. Nie, Z. Zheng, and Y. Zhao, "PanoFormer: Panorama Transformer for Indoor 360 Depth Estimation," Proc. ECCV, 2022, doi: 10.1007/978-3-031-19769-7_12.

[27] Z. Shen, Z. Zheng, C. Lin, L. Nie, K. Liao, S. Zheng, and Y. Zhao, "Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.01663.

[28] Y. C. Su and K. Grauman, "Learning Spherical Convolution for Fast Features from 360 Imagery," Proc. NeurIPS, 2017, url: https://dl.acm.org/doi/abs/10.5555/3294771.3294822.

[29] Y. Su and K. Grauman, "Kernel Transformer Networks for Compact Spherical Convolution," Proc. CVPR, 2019, doi: 10.1109/CVPR.2019.00967.

[30] C. Sun, M. Sun, and H. T. Chen, "HoHoNet: 360° Indoor Holistic Understanding with Latent Horizontal Features," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.00260.

[31] K. Tateno, N. Navab, and F. Tombari, "Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images," Proc. ECCV, 2018, doi: 10.1007/978-3-030-01270-0_43.

[32] P. Wang, X. Shen, Z. Lin, S. Cohen, B. Price, and A. Yuille, "Towards unified depth and semantic prediction from a single image," Proc. CVPR, 2015, doi: 10.1109/CVPR.2015.7298897.

[33] F. E. Wang, H. N. Hu, H. T. Cheng, J. T. Lin, S. T. Yang, M. L. Shih, H. K. Chu, and M. Sun, "Self-supervised learning of depth and camera motion from 360 videos," Proc. ACCV, 2018, doi: 10.1007/978-3-030-20873-8_4.

[34] F. E. Wang, Y. H. Yeh, M. Sun, W. C. Chiu, and Y. H. Tsai, "BiFuse: Monocular 360 Depth Estimation via Bi-Projection Fusion," Proc. CVPR, 2020, doi: 10.1109/CVPR42600.2020.00054.

[35] F. E. Wang, Y. H. Yeh, Y. H. Tsai, W. C. Chiu, and M. Sun, "Bifuse++: Self-supervised and efficient bi-projection fusion for 360 depth estimation," IEEE TPAMI, 2022, doi: 10.1109/TPAMI.2022.3203516.

[36] N. H. A. Wang and Y. L. Liu, "Depth anywhere: Enhancing 360 monocular depth estimation via perspective distillation and unlabeled data augmentation," Proc. NeurIPS, 2024, url: https://dl.acm.org/doi/10.5555/3737916.3741972.

[37] X. Wang, Z. He, Q. Zhang, Y. Yang, T. Zhao, and J. Jiang, "Geometry-Aware Self-Supervised Indoor 360° Depth Estimation via Asymmetric Dual-Domain Collaborative Learning," IEEE Trans. Multimedia, 2025, doi: 10.1109/TMM.2025.3535340.

[38] D. Xu, W. Wang, H. Tang, H. Liu, N. Sebe, and E. Ricci, "Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00412.

[39] Q. Yan, Q. Wang, K. Zhao, J. Chen, B. Li, X. Chu, and F. Deng, "SphereFusion: Efficient Panorama Depth Estimation via Gated Fusion," Proc. 3DV, 2025, doi: 10.1109/3DV66043.2025.00084.

[40] L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, "Depth Anything V2," Proc. NeurIPS, 2024, doi: 10.52202/079017-0688.

[41] Y. Zhang, S. Song, P. Tan, and J. Xiao, "PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding," Proc. ECCV, 2014, doi: 10.1007/978-3-319-10599-4_43.

[42] N. Zioulis, A. Karakottas, D. Zarpalas, and P. Daras, "OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas," Proc. ECCV, 2018, doi: 10.1007/978-3-030-01231-1_28.

[43] N. Zioulis, A. Karakottas, D. Zarpalas, F. Alvarez, and P. Daras, "Spherical View Synthesis for Self-Supervised 360° Depth Estimation," Proc. 3DV, 2019, doi: 10.1109/3DV.2019.00081.

↑ Back to Top

Single-image room layout estimation

References:

[1] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, "ShapeNet: An Information-Rich 3D Model Repository," arXiv preprint arXiv:1512.03012, 2015, doi: 10.48550/arXiv.1512.03012.

[2] Z. Chen, V. Badrinarayanan, C. Y. Lee, and A. Rabinovich, "GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks," Proc. ICML, 2018, doi: 10.48550/arXiv.1711.02257.

[3] X. Chen, H. Zhao, G. Zhou, and Y. Q. Zhang, "PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds," Proc. CVPR, 2022, doi: 10.1109/LRA.2022.3143224.

[4] Y. Dai, N. Fei, and Z. Lu, "Improvable gap balancing for multi-task learning," Uncertainty in Artificial Intelligence, 2023, doi: 10.48550/arXiv.2307.15429.

[5] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth Replica," Public dataset, 2022, url: https://manurare.github.io/360monodepth/.

[6] M. Dong, L. Huan, H. Xiong, S. Shen, and X. Zheng, "Shape anchor guided holistic indoor scene understanding," Proc. ICCV, 2023, doi: 10.1109/ICCV51070.2023.02003.

[7] Y. Dong, C. Fang, L. Bo, Z. Dong, and P. Tan, "PanoContext-Former: Panoramic total scene understanding with a transformer," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.02653.

[8] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020, doi: 10.48550/arXiv.2010.11929.

[9] V. Gkitsas, V. Sterzentsenko, N. Zioulis, G. Albanis, and D. Zarpalas, "PanoDR: Spherical Panorama Diminished Reality for Indoor Scenes," Proc. CVPR Workshops, 2021, doi: 10.1109/CVPRW53098.2021.00412.

[10] V. Hedau, D. Hoiem, and D. Forsyth, "Recovering the spatial layout of cluttered rooms," Proc. ICCV, 2009, doi: 10.1109/ICCV.2009.5459411.

[11] D. Hoiem, A. A. Efros, and M. Hebert, "Recovering Surface Layout from an Image," International Journal of Computer Vision, 2007, doi: 10.1007/s11263-006-0031-y.

[12] R. Hu and A. Singh, "Uniting Vision-and-Language Tasks via Text Generation," Proc. ICML, 2021, doi: 10.48550/arXiv.2102.02779.

[13] S. Huang, S. Qi, and Y. Zhu, "Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation," Proc. NeurIPS, 2018, url: https://dl.acm.org/doi/abs/10.5555/3326943.3326963.

[14] S. Ikehata, H. Yang, and Y. Furukawa, "Structured Indoor Modeling," Proc. ICCV, 2015, doi: 10.1109/ICCV.2015.156.

[15] Z. Jiang, Z. Xiang, J. Xu, and M. Zhao, "LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00170.

[16] L. Jin, Y. Xu, J. Zheng, J. Zhang, R. Tang, S. Xu, J. Yu, and S. Gao, "Geometric Structure Based and Regularized Depth Estimation from 360 Indoor Imagery," Proc. CVPR, 2020, doi: 10.1109/CVPR42600.2020.00097.

[17] A. Kendall, Y. Gal, and R. Cipolla, "Multi-task learning using uncertainty to weigh losses for scene geometry and semantics," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00781.

[18] J. Lambert, Y. Li, I. Boyadzhiev, L. Wixson, M. Narayana, W. Hutchcroft, J. Hays, F. Dellaert, and S. B. Kang, "Salve: Semantic alignment verification for floorplan reconstruction from sparse panoramas," Proc. ECCV, 2022, doi: 10.1007/978-3-031-19821-2_37.

[19] D. C. Lee, M. Hebert, and T. Kanade, "Geometric reasoning for single image structure recovery," Proc. CVPR, 2009, doi: 10.1109/CVPR.2009.5206872.

[20] J. H. Lee, C. Lee, and C. S. Kim, "Learning multiple pixelwise tasks based on loss scale balancing," Proc. ICCV, 2021.

[21] J. Lee, H. Park, B. U. Lee, and K. Joo, "HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.01547.

[22] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension," arXiv preprint arXiv:1910.13461, 2019, doi: 10.48550/arXiv.1910.13461.

[23] Y. Li, I. Boyadzhiev, Z. Liu, L. Shapiro, and A. Colburn, "BADGR: Bundle Adjustment Diffusion Conditioned by Gradients for Wide-Baseline Floor Plan Reconstruction," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.01564.

[24] B. Liu, X. Liu, X. Jin, P. Stone, and Q. Liu, "Conflict-averse gradient descent for multi-task learning," Proc. NeurIPS, 2021, doi: 10.48550/arXiv.2110.14048.

[25] Z. Liu, Z. Zhang, Y. Cao, H. Hu, X. Tong, B. Dai, and S. Lin, "Group-Free 3D Object Detection via Transformers," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.00294.

[26] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.00986.

[27] H. Liu, Y. Zheng, G. Chen, S. Cui, and X. Han, "Towards High-Fidelity Single-View Holistic Reconstruction of Indoor Scenes," Proc. ECCV, 2022, doi: 10.1007/978-3-031-19769-7_25.

[28] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, "Occupancy Networks: Learning 3D Reconstruction in Function Space," Proc. CVPR, 2019, doi: 10.1109/CVPR.2019.00459.

[29] Y. Nie, J. Hou, X. Han, and M. Nießner, "Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image," Proc. CVPR, 2020, doi: 10.1109/CVPR42600.2020.00013.

[30] G. Pintore, M. Agus, and E. Gobbetti, "AtlantaNet: Inferring the 3D Indoor Layout from a Single 360 Image Beyond the Manhattan World Assumption," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58598-3_26.

[31] G. Pintore, C. Mura, F. Ganovelli, L. Fuentes-Perez, R. Pajarola, and E. Gobbetti, "State-of-the-art in Automatic 3D Reconstruction of Structured Indoor Environments," Comput. Graph. Forum, 2020, doi: 10.1111/cgf.14021.

[32] G. Pintore, E. Almansa, M. Agus, and E. Gobbetti, "Deep3DLayout: 3D Reconstruction of an Indoor Layout from a Spherical Panoramic Image," ACM TOG, 2021, doi: 10.1145/3478513.3480480.

[33] G. Pintore, M. Agus, E. Almansa, and E. Gobbetti, "Instant Automatic Emptying of Panoramic Indoor Scenes," IEEE TVCG, 2022, doi: 10.1109/TVCG.2022.3202999.

[34] G. Pintore, U. Shah, M. Agus, and E. Gobbetti, "NadirFloorNet: reconstructing multi-room floorplans from a small set of registered panoramic images," Proc. CVPR, 2025, doi: 10.1109/CVPRW67362.2025.00186.

[35] G. Pintore, S. Jashari, M. Agus, and E. Gobbetti, "PanoFloor: Reconstruction and Immersive Exploration of Large Multi-Room Scenes from a Minimal Set of Registered Panoramic Images Using Denoised Density Maps," Proc. ISMAR, 2025, doi: 10.1109/ISMAR67309.2025.00052.

[36] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, "Frustum PointNets for 3D Object Detection from RGB-D Data," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00102.

[37] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning Transferable Visual Models From Natural Language Supervision," Proc. ICML, 2021, url: https://icml.cc/virtual/2021/oral/9194.

[38] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," J. Mach. Learn. Res., 2020.

[39] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, doi: 10.1109/TPAMI.2016.2577031.

[40] J. Schult, F. Engelmann, T. Kontogianni, and B. Leibe, "Mask3D: Mask Transformer for 3D Semantic Instance Segmentation," Proc. ICCV, 2023, doi: 10.1109/ICRA48891.2023.10160590.

[41] O. Sener and V. Koltun, "Multi-task learning as multi-objective optimization," Proc. NeurIPS, 2018, doi: 10.48550/arXiv.1810.04650.

[42] M. A. Shabani, W. Song, M. Odamaki, H. Fujiki, and Y. Furukawa, "Extreme structure from motion for indoor panoramas without visual overlaps," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.00565.

[43] Z. Shen, Z. Zheng, C. Lin, L. Nie, K. Liao, S. Zheng, and Y. Zhao, "Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.01663.

[44] V. Sitzmann, J. Martel, A. Bergman, D. Lindell, and G. Wetzstein, "Implicit Neural Representations with Periodic Activation Functions," Proc. NeurIPS, 2020, doi: 10.48550/arXiv.2006.09661.

[45] S. Song and J. Xiao, "Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images," Proc. CVPR, 2016, doi: 10.1109/CVPR.2016.94.

[46] C. Sun, C. W. Hsiao, M. Sun, and H. T. Chen, "HorizonNet: Learning room layout with 1D representation and pano stretch data augmentation," Proc. CVPR, 2019, doi: 10.1109/CVPR.2019.00114.

[47] C. Sun, W. E. Tai, Y. L. Shih, K. W. Chen, Y. J. Syu, Y. C. F. Wang, and H. T. Chen, "Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360° Room Layout Reconstruction," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.00993.

[48] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, "Training Data-efficient Image Transformers & Distillation through Attention," Proc. ICML, 2021, url: https://proceedings.mlr.press/v139/touvron21a.

[49] P. V. Tran, "SSLayout360: Semi-supervised indoor layout estimation from 360deg panorama," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01510.

[50] Y. J. Tsai, J. C. Jhang, J. Zheng, W. Wang, A. Y. Chen, M. Sun, C. H. Kuo, and M. H. Yang, "No more ambiguity in 360° room layout via bi-layout estimation," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.02650.

[51] S. Tulsiani, T. Zhou, A. A. Efros, and J. Malik, "Multi-View Supervision for Single-View Reconstruction via Differentiable Ray Consistency," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, doi: 10.1109/TPAMI.2019.2898859.

[52] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. NeurIPS, 2017.

[53] F. E. Wang, Y. H. Yeh, M. Sun, W. C. Chiu, and Y. H. Tsai, "LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01276.

[54] Y. Wang, T. Ye, L. Cao, W. Huang, F. Sun, F. He, and D. Tao, "Bridged transformer for vision and point cloud 3D object detection," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.01180.

[55] J. Xu, J. Zheng, Y. Xu, R. Tang, and S. Gao, "Layout-guided novel view synthesis from a single indoor panorama," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01617.

[56] S. T. Yang, F. E. Wang, C. H. Peng, P. Wonka, M. Sun, and H. K. Chu, "DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama," Proc. CVPR, 2019, doi: 10.1109/CVPR.2019.00348.

[57] T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn, "Gradient surgery for multi-task learning," Proc. NeurIPS, 2020, doi: 10.48550/arXiv.2001.06782.

[58] H. Yu, L. He, B. Jian, W. Feng, and S. Liu, "PanelNet: Understanding 360 indoor environment via panel representation," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.00091.

[59] C. Zhang, Z. Cui, C. Chen, S. Liu, B. Zeng, H. Bao, and Y. Zhang, "DeepPanoContext: Panoramic 3D scene understanding with holistic scene context graph and relation-based optimization," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.01240.

[60] Y. Zhang, S. Song, P. Tan, and J. Xiao, "PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding," Proc. ECCV, 2014, doi: 10.1007/978-3-319-10599-4_43.

[61] Y. Zhao, C. Wen, Z. Xue, and Y. Gao, "3D Room Layout Estimation from a Cubemap of Panorama Image via Deep Manhattan Hough Transform," Proc. ECCV, 2022, doi: 10.1007/978-3-031-19769-7_37.

[62] C. Zou, A. Colburn, Q. Shan, and D. Hoiem, "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00219.

[63] C. Zou, J. W. Su, C. H. Peng, A. Colburn, Q. Shan, P. Wonka, H. K. Chu, and D. Hoiem, "3D Manhattan Room Layout Reconstruction from a Single 360 Image," ArXiv e-print arXiv:1910.04099, 2019, doi: 10.48550/arXiv.1910.04099.

[64] C. Zou, J. W. Su, C. H. Peng, A. Colburn, Q. Shan, P. Wonka, H. K. Chu, and D. Hoiem, "Manhattan Room Layout Reconstruction from a Single 360 Image: A Comparative Study of State-of-the-Art Methods," International Journal of Computer Vision, 2021, doi: 10.1007/s11263-020-01426-8.

↑ Back to Top

Positioning and connecting single rooms

References:

[1] R. Cabral and Y. Furukawa, "Piecewise Planar and Compact Floorplan Reconstruction from Images," Proc. CVPR, 2014, doi: 10.1109/CVPR.2014.546.

[2] J. Chen, C. Liu, J. Wu, and Y. Furukawa, "Floor-SP: Inverse CAD for floorplans by sequential room-wise shortest path," Proc. CVPR, 2019, doi: 10.1109/ICCV.2019.00275.

[3] J. Chen, Y. Qian, and Y. Furukawa, "HEAT: Holistic Edge Attention Transformer for Structured Reconstruction," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00384.

[4] J. Chen, R. Deng, and Y. Furukawa, "Polydiffuse: Polygonal shape reconstruction via guided set diffusion models," NeurIPS, 2024, url: https://dl.acm.org/doi/abs/10.5555/3666122.3666212.

[5] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, "Reconstructing building interiors from images," Proc. ICCV, 2009, doi: 10.1109/ICCV.2009.5459145.

[6] K. He, G. Gkioxari, P. Dolláar, and R. Girshick, "Mask R-CNN," Proc. ICCV, 2017, doi: 10.1109/ICCV.2017.322.

[7] S. Ikehata, H. Yang, and Y. Furukawa, "Structured Indoor Modeling," Proc. ICCV, 2015, doi: 10.1109/ICCV.2015.156.

[8] J. Lambert, Y. Li, I. Boyadzhiev, L. Wixson, M. Narayana, W. Hutchcroft, J. Hays, F. Dellaert, and S. B. Kang, "Salve: Semantic alignment verification for floorplan reconstruction from sparse panoramas," Proc. ECCV, 2022, doi: 10.1007/978-3-031-19821-2_37.

[9] Y. Li, I. Boyadzhiev, Z. Liu, L. Shapiro, and A. Colburn, "BADGR: Bundle Adjustment Diffusion Conditioned by Gradients for Wide-Baseline Floor Plan Reconstruction," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.01564.

[10] C. Liu, J. Wu, and Y. Furukawa, "FloorNet: A unified framework for floorplan reconstruction from 3D scans," Proc. ECCV, 2018, doi: 10.1007/978-3-030-01231-1_13.

[11] A. Monszpart, N. Mellado, G. J. Brostow, and N. J. Mitra, "RAPter: rebuilding man-made scenes with regular arrangements of planes.," ACM TOG, 2015, doi: 10.1145/2766995.

[12] N. Nauata and Y. Furukawa, "Vectorizing world buildings: Planar graph reconstruction by primitive detection and relationship inference," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58598-3_42.

[13] G. Pintore, F. Ganovelli, A. J. Villanueva, and E. Gobbetti, "Automatic modeling of cluttered floorplans from panoramic images," Computer Graphics Forum, 2019.

[14] G. Pintore, U. Shah, M. Agus, and E. Gobbetti, "NadirFloorNet: reconstructing multi-room floorplans from a small set of registered panoramic images," Proc. CVPR, 2025, doi: 10.1109/CVPRW67362.2025.00186.

[15] G. Pintore, S. Jashari, M. Agus, and E. Gobbetti, "PanoFloor: Reconstruction and Immersive Exploration of Large Multi-Room Scenes from a Minimal Set of Registered Panoramic Images Using Denoised Density Maps," Proc. ISMAR, 2025, doi: 10.1109/ISMAR67309.2025.00052.

[16] G. Pu, Y. Zhao, and Z. Lian, "Pano2Room: Novel View Synthesis from a Single Indoor Panorama," Proc. SIGGRAPH Asia Conference Papers, 2024, doi: 10.1145/3680528.3687616.

[17] M. A. Shabani, W. Song, M. Odamaki, H. Fujiki, and Y. Furukawa, "Extreme structure from motion for indoor panoramas without visual overlaps," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.00565.

[18] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from rgbd images," Proc. ECCV, 2012, doi: 10.1007/978-3-642-33715-4_54.

[19] S. Stekovic, M. Rad, F. Fraundorfer, and V. Lepetit, "Montefloor: Extending MCTS for reconstructing accurate large-scale floor plans," Proc. CVPR, 2021, doi: 10.1109/ICCV48922.2021.01573.

[20] J. W. Su, K. Y. Tung, C. H. Peng, P. Wonka, and H. K. Chu, "SLIBO-Net: Floorplan Reconstruction via Slicing Box Representation with Local Geometry Regularization," Proc. NeurIPS, 2023, url: https://dl.acm.org/doi/abs/10.5555/3666122.3668240.

[21] Y. Yue, T. Kontogianni, K. Schindler, and F. Engelmann, "Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.00088.

↑ Back to Top

Novel view synthesis and immersive model exploration

While geometric and structural descriptions provide a strong foundation, they are not enough for interactive use. We review techniques for exploration from a single enriched panorama, including methods that recover stereo cues and motion parallax and support advanced navigation in sparsely-sampled multi-room environments, and approaches that go beyond plain navigation.

↑ Back to Top

View synthesis from a single panorama

References:

[1] H. AlZayer, H. Lin, and K. Bala, "Autophoto: Aesthetic photo capture using reinforcement learning," Proc. IROS, 2021, doi: 10.1109/IROS51168.2021.9636788.

[2] B. Attal, S. Ling, A. Gokaslan, C. Richardt, and J. Tompkin, "MatryODShka: Real-time 6DoF video view synthesis using multi-sphere images," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58452-8_26.

[3] M. T. Bagdasarian, P. Knoll, Y. Li, F. Barthel, A. Hilsmann, P. Eisert, and W. Morgenstern, "3DGS.zip: A survey on 3D gaussian splatting compression methods," Computer Graphics Forum, 2025, doi: 10.1111/cgf.70078.

[4] J. Bai, L. Huang, J. Guo, W. Gong, Y. Li, and Y. Guo, "360-GS: Layout-Guided Panoramic Gaussian Splatting for Indoor Roaming," Proc. 3DV, 2025, doi: 10.1109/3DV66043.2025.00101.

[5] T. Bertel, N. D. Campbell, and C. Richardt, "MegaParallax: Casual 360° panoramas with motion parallax," IEEE TVCG, 2019, doi: 10.1109/TVCG.2019.2898799.

[6] T. Bertel, M. Yuan, R. Lindroos, and C. Richardt, "OmniPhotos: Casual 360° VR Photography," ACM TOG, 2020, doi: 10.1145/3414685.3417770.

[7] S. Boorboor, Y. Kim, P. Hu, J. M. Moses, B. A. Colle, and A. E. Kaufman, "Submerse: Visualizing storm surge flooding simulations in immersive display ecologies," IEEE TVCG, 2023, doi: 10.1109/TVCG.2023.3332511.

[8] P. Bourke, "Capturing omni-directional stereoscopic spherical projections with a single camera," Proc. IEEE VSMM, 2010, doi: 10.1109/VSMM.2010.5665988.

[9] M. Broxton, J. Flynn, R. Overbeck, D. Erickson, P. Hedman, M. DuVall, J. Dourgarian, J. Busch, M. Whalen, and P. Debevec, "Immersive Light Field Video with a Layered Mesh Representation," ACM TOG, 2020, doi: 10.1145/3386569.3392485.

[10] S. Cai, A. Obukhov, D. Dai, and L. Van Gool, "Pix2nerf: Unsupervised conditional P-GAN for single image to neural radiance fields translation," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00395.

[11] D. Charatan, S. L. Li, A. Tagliasacchi, and V. Sitzmann, "PixelSplat: 3D Gaussian Splats from image pairs for scalable generalizable 3D reconstruction," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.01840.

[12] Z. Chen, Y. P. Cao, Y. C. Guo, C. Wang, Y. Shan, and S. H. Zhang, "PanoGRF: generalizable spherical radiance fields for wide-baseline panoramas," Proc. NeurIPS, 2023, doi: 10.48550/arXiv.2306.01531.

[13] J. Xu, J. Zheng, Y. Xu, R. Tang, and S. Gao, "PNVS Dataset," Public dataset, 2021, url: https://github.com/bluestyle97/PNVS.

[14] C. Deng, C. Jiang, C. R. Qi, X. Yan, Y. Zhou, L. Guibas, and D. Anguelov, "NeRDI: Single-view NeRF synthesis with language-guided diffusion as general image priors," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.01977.

[15] M. Di Benedetto, F. Ganovelli, M. Balsa Rodriguez, A. Jaspe Villanueva, R. Scopigno, and E. Gobbetti, "ExploreMaps: Efficient Construction and Ubiquitous Exploration of Panoramic View Graphs of Complex 3D Environments," Comput. Graph. Forum, 2014, doi: 10.1111/cgf.12334.

[16] Y. Dong, C. Fang, L. Bo, Z. Dong, and P. Tan, "PanoContext-Former: Panoramic total scene understanding with a transformer," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.02653.

[17] M. Feixas, M. Sbert, and F. González, "A unified information-theoretic framework for viewpoint selection and mesh saliency," ACM Trans. Appl. Percept., 2009, doi: 10.1145/1462055.1462056.

[18] A. Gokaslan, A. F. Cooper, J. Collins, L. Seguin, A. Jacobson, M. Patel, J. Frankle, C. Stephenson, and V. Kuleshov, "CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.00788.

[19] J. Gu, A. Trevithick, K. E. Lin, J. M. Susskind, C. Theobalt, L. Liu, and R. Ramamoorthi, "NeRFDiff: Single-image view synthesis with NeRF-guided distillation from 3D-aware diffusion," Proc. ICML, 2023, url: https://dl.acm.org/doi/abs/10.5555/3618408.3618881.

[20] P. Hedman, S. Alsisan, R. Szeliski, and J. Kopf, "Casual 3D photography," ACM Transactions on Graphics (TOG), 2017, doi: 10.1145/3130800.3130828.

[21] P. Hedman and J. Kopf, "Instant 3D Photography," ACM TOG., 2018, doi: 10.1145/3197517.3201384.

[22] J. Huang, Z. Chen, D. Ceylan, and H. Jin, "6-DOF VR videos with a single 360-camera," Proc. IEEE VR, 2017, doi: 10.1109/VR.2017.7892229.

[23] H. Huang, Y. Chen, T. Zhang, and S. K. Yeung, "360Roam: Real-Time Indoor Roaming Using Geometry-Aware 360° Radiance Fields," arXiv preprint arXiv:2208.02705, 2022, doi: 10.48550/arXiv.2208.02705.

[24] A. Jain, M. Tancik, and P. Abbeel, "Putting NeRF on a diet: Semantically consistent few-shot view synthesis," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.00583.

[25] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, "3D Gaussian Splatting for Real-Time Radiance Field Rendering," ACM TOG, 2023, doi: 10.1145/3592433.

[26] S. Kulkarni, P. Yin, and S. Scherer, "360FusionNeRF: Panoramic Neural Radiance Fields with Joint Guidance," Proc. IROS, 2023, doi: 10.1109/IROS55552.2023.10341346.

[27] D. Li, Y. Zhang, C. Häne, D. Tang, A. Varshney, and R. Du, "OmniSyn: Synthesizing 360 videos with wide-baseline panoramas," Proc. VRW, 2022, doi: 10.48550/arXiv.2202.08752.

[28] K. E. Lin, Z. Xu, B. Mildenhall, P. P. Srinivasan, Y. Hold-Geoffroy, S. DiVerdi, Q. Sun, K. Sunkavalli, and R. Ramamoorthi, "Deep multi depth panoramas for view synthesis," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58601-0_20.

[29] B. Luo, F. Xu, C. Richardt, and J. H. Yong, "Parallax360: Stereoscopic 360° Scene Representation for Head-Motion Parallax," IEEE TVCG, 2018, doi: 10.1109/TVCG.2018.2794071.

[30] T. Marrinan and M. E. Papka, "Real-time omnidirectional stereo rendering: generating 360° surround-view panoramic images for comfortable immersive viewing," IEEE TVCG, 2021, doi: 10.1109/TVCG.2021.3067780.

[31] K. Matzen, M. F. Cohen, B. Evans, J. Kopf, and R. Szeliski, "Low-cost 360 Stereo Photography and Video Capture," ACM TOG, 2017, doi: 10.1145/3072959.3073645.

[32] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, "NeRF: representing scenes as neural radiance fields for view synthesis," Commun. ACM, 2021, doi: 10.1145/3503250.

[33] S. Peleg and M. Ben-Ezra, "Stereo panorama with a single camera," Proc. CVPR, 1999, doi: 10.1109/CVPR.1999.786969.

[34] G. Pintore, F. Bettio, M. Agus, and E. Gobbetti, "Deep scene synthesis of Atlanta-world interiors from a single omnidirectional image," IEEE TVCG, 2023, doi: 10.1109/TVCG.2023.3320219.

[35] G. Pintore, A. J. Villanueva, M. Hadwiget, E. Gobbetti, J. Schneider, and M. Agus, "PanoVerse: automatic generation of stereoscopic environments from single indoor panoramic images for Metaverse applications," Proc. ACM Web3D, 2023, doi: 10.1145/3611314.3615914.

[36] G. Pintore, A. Jaspe-Villanueva, M. Hadwiger, J. Schneider, M. Agus, F. Marton, F. Bettio, and E. Gobbetti, "Deep synthesis and exploration of omnidirectional stereoscopic environments from a single surround-view panoramic image," Computers & Graphics, 2024, doi: 10.1016/j.cag.2024.103907.

[37] G. Pintore, S. Jashari, M. Agus, and E. Gobbetti, "PanoFloor: Reconstruction and Immersive Exploration of Large Multi-Room Scenes from a Minimal Set of Registered Panoramic Images Using Denoised Density Maps," Proc. ISMAR, 2025, doi: 10.1109/ISMAR67309.2025.00052.

[38] G. Pu, Y. Zhao, and Z. Lian, "Pano2Room: Novel View Synthesis from a Single Indoor Panorama," Proc. SIGGRAPH Asia Conference Papers, 2024, doi: 10.1145/3680528.3687616.

[39] P. Rademacher and G. Bishop, "Multiple-center-of-projection images," Proc. SIGGRAPH, 1998, doi: 10.1145/280814.280871.

[40] M. Rey-Area and C. Richardt, "360° 3D Photos from a Single 360° Input Image," IEEE TVCG, 2025, doi: 10.1109/TVCG.2025.3549538.

[41] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.01042.

[42] A. Serrano, I. Kim, Z. Chen, S. DiVerdi, D. Gutierrez, A. Hertzmann, and B. Masia, "Motion parallax for 360 RGBD video," IEEE TVCG, 2019, doi: 10.1109/TVCG.2019.2898757.

[43] K. Song and L. Zhang, "Novel view synthesis with wide-baseline stereo pairs based on local--global information," Computers & Graphics, 2025, doi: 10.1016/j.cag.2024.104139.

[44] G. Sridevi and S. Srinivas Kumar, "Image inpainting based on fractional-order nonlinear diffusion for image reconstruction," Circuits, Systems, and Signal Processing, 2019, doi: 10.1007/s00034-019-01029-w.

[45] R. Tucker and N. Snavely, "Single-view view synthesis with multiplane images," Proc. CVPR, 2020, doi: 10.1109/CVPR42600.2020.00063.

[46] M. Tukur, G. Pintore, E. Gobbetti, J. Schneider, and M. Agus, "SPIDER: A framework for processing, editing and presenting immersive high-resolution spherical indoor scenes," Graphical Models, 2023, doi: 10.1016/j.gmod.2023.101182.

[47] T. Uchida, Y. Kanamori, and Y. Endo, "3D View Optimization for Improving Image Aesthetics," Proc. ICASSP, 2025, doi: 10.1109/ICASSP49660.2025.10888966.

[48] J. Waidhofer, R. Gadgil, A. Dickson, S. Zollmann, and J. Ventura, "PanoSynthVR: Toward Light-weight 360-Degree View Synthesis from a Single Panoramic Input," Proc. ISMAR, 2022, doi: 10.1109/ISMAR55827.2022.00075.

[49] G. Wang, P. Wang, Z. Chen, W. Wang, C. C. Loy, and Z. Liu, "PERF: Panoramic Neural Radiance Field from a Single Panorama," IEEE TPAMI, 2024, doi: 10.1109/TPAMI.2024.3387307.

[50] Z. Wei, J. Zhang, X. Shen, Z. Lin, R. Mech, M. Hoai, and D. Samaras, "Good view hunting: Learning photo composition from dense view pairs," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00570.

[51] O. Wiles, G. Gkioxari, R. Szeliski, and J. Johnson, "Synsin: End-to-end view synthesis from a single image," Proc. CVPR, 2020, doi: 10.1109/CVPR42600.2020.00749.

[52] D. Xie, P. Hu, X. Sun, S. Pirk, J. Zhang, R. Mech, and A. E. Kaufman, "GAIT: Generating aesthetic indoor tours with deep reinforcement learning," Proc. ICCV, 2023, doi: 10.1109/ICCV51070.2023.00681.

[53] J. Xu, B. Stenger, T. Kerola, and T. Tung, "Pano2CAD: Room Layout from a Single Panorama Image," Proc. WACV, 2017, doi: 10.1109/WACV.2017.46.

[54] J. Xu, J. Zheng, Y. Xu, R. Tang, and S. Gao, "Layout-guided novel view synthesis from a single indoor panorama," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01617.

[55] D. Xu, Y. Jiang, P. Wang, Z. Fan, H. Shi, and Z. Wang, "SinNeRF: Training neural radiance fields on complex scenes from a single image," Proc. ECCV, 2022, doi: 10.1007/978-3-031-20047-2_42.

[56] Y. Yang, S. Jin, R. Liu, and J. and Yu, "Automatic 3D Indoor Scene Modeling From Single Panorama," Proc. CVPR, 2018, doi: 10.1109/CVPR.2018.00413.

[57] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, "Free-form image inpainting with gated convolution," Proc. ICCV, 2019, doi: 10.1109/ICCV.2019.00457.

[58] Y. Zeng, J. Fu, H. Chao, and B. Guo, "Learning pyramid-context encoder network for high-quality image inpainting," Proc. CVPR, 2019, doi: 10.1109/CVPR.2019.00158.

[59] C. Zhang, Z. Cui, C. Chen, S. Liu, B. Zeng, H. Bao, and Y. Zhang, "DeepPanoContext: Panoramic 3D scene understanding with holistic scene context graph and relation-based optimization," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.01240.

[60] Q. Zhao, L. Wan, W. Feng, J. Zhang, and T. T. Wong, "Cube2Video: Navigate between cubic panoramas in real-time," IEEE Trans. Multimedia, 2013, doi: 10.1109/TMM.2013.2280249.

[61] T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely, "Stereo Magnification: Learning View Synthesis Using Multiplane Images," ACM TOG, 2018, doi: 10.1145/3197517.3201323.

[62] H. Zhou, X. Cheng, W. Yu, Y. Tian, and L. Yuan, "HoloDreamer: Holistic 3D panoramic world generation from text descriptions," arXiv preprint arXiv:2407.15187, 2024, doi: 10.48550/arXiv.2407.15187.

↑ Back to Top

Dynamic modifications

References:

[1] A. Dolhasz, C. Ma, D. Gausebeck, K. Chen, G. Miller, L. Hayne, G. Hovden, A. Sabik, O. Brandt, and M. Slavcheva, "Defurnishing with X-Ray Vision: Joint Removal of Furniture from Panoramas and Mesh," Proc. CVPR, 2025, doi: 10.1109/CVPRW67362.2025.00624.

[2] C. Fang, Y. Dong, K. Luo, X. Hu, R. Shrestha, and P. Tan, "Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints," Proc. 3DV, 2025, doi: 10.1109/3DV66043.2025.00069.

[3] V. Gkitsas, V. Sterzentsenko, N. Zioulis, G. Albanis, and D. Zarpalas, "PanoDR: Spherical Panorama Diminished Reality for Indoor Scenes," Proc. CVPR Workshops, 2021, doi: 10.1109/CVPRW53098.2021.00412.

[4] Z. Huang, J. He, J. Ye, L. Jiang, W. Li, Y. Chen, and T. Han, "Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.02489.

[5] Z. Li, L. Wang, X. Huang, C. Pan, and J. Yang, "PhyIR: Physics-based Inverse Rendering for Panoramic Indoor Images," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.01238.

[6] R. Liang, Z. Gojcic, M. Nimier-David, D. Acuna, N. Vijaykumar, S. Fidler, and Z. Wang, "Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering," Proc. ECCV, 2024, doi: 10.1007/978-3-031-73030-6_25.

[7] L. Lyu, V. Deschaintre, Y. Hold-Geoffroy, M. Hasan, J. S. Yoon, T. Leimkühler, C. Theobalt, and I. Georgiev, "IntrinsicEdit: Precise generative image manipulation in intrinsic space," ACM Trans. Graph., 2025, doi: 10.1145/3731173.

[8] G. Pintore, M. Agus, E. Almansa, and E. Gobbetti, "Instant Automatic Emptying of Panoramic Indoor Scenes," IEEE TVCG, 2022, doi: 10.1109/TVCG.2022.3202999.

[9] G. Pu, Y. Zhao, and Z. Lian, "Pano2Room: Novel View Synthesis from a Single Indoor Panorama," Proc. SIGGRAPH Asia Conference Papers, 2024, doi: 10.1145/3680528.3687616.

[10] U. Shah, M. Tukur, M. Alzubaidi, G. Pintore, E. Gobbetti, M. Househ, J. Schneider, and M. Agus, "MultiPanoWise: holistic deep architecture for multi-task dense prediction from a single panoramic image," Proc. CVPRW - OmniCV, 2024, doi: 10.1109/CVPRW63382.2024.00138.

[11] U. Shah, S. Jashari, M. Tukur, M. Househ, J. Schneider, G. Pintore, E. Gobbetti, and M. Agus, "Virtual Staging of Indoor Panoramic Images via Multi-task Learning and Inverse Rendering," IEEE CG&A, 2025, doi: 10.1109/MCG.2025.3605806.

[12] S. Shen, Z. Bao, W. Xu, and C. Xiao, "IllumiDiff: Indoor Illumination Estimation From a Single Image With Diffusion Model," IEEE TVCG, 2025, doi: 10.1109/TVCG.2025.3553853.

[13] M. Slavcheva, D. Gausebeck, K. Chen, D. Buchhofer, A. Sabik, C. Ma, S. Dhillon, O. Brandt, and A. Dolhasz, "An Empty Room is All We Want: Automatic Defurnishing of Indoor Panoramas," Proc. CVPRW, 2024, doi: 10.1109/CVPRW63382.2024.00734.

[14] M. Tukur, A. U. Rehman, G. Pintore, E. Gobbetti, J. Schneider, and M. Agus, "PanoStyle: Semantic, Geometry-Aware and Shading Independent Photorealistic Style Transfer for Indoor Panoramic Scenes," Proc. ICCVW, 2023, doi: 10.1109/ICCVW60793.2023.00170.

[15] M. Tukur, S. Jashari, D. Abou Hassanain, F. Bettio, J. Schneider, G. Pintore, E. Gobbetti, and M. Agus, "PanoStyleVR: style-based similarity metrics for Web-based immersive panoramic style transfer," Proceedings of the 30th International Conference on 3D Web Technology, 2025, doi: 10.1145/3746237.3746287.

[16] W. Wang, C. Qing, J. Tan, and X. Xu, "Multi-view Panoramic Image Style Transfer with Multi-scale Attention and Global Sharing," ACM Trans. Multimedia Comput. Commun. Appl., 2025, doi: 10.1145/3735137.

[17] X. Xing, K. Groh, S. Karaoglu, T. Gevers, and A. Bhattad, "Luminet: Latent intrinsics meets diffusion models for indoor scene relighting," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.00050.

[18] S. Yang, J. Tan, M. Zhang, T. Wu, G. Wetzstein, Z. Liu, and D. Lin, "LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation," Proc. SIGGRAPH Conference Papers, 2025, doi: 10.1145/3721238.3730643.

[19] W. Ye, C. Ji, Z. Chen, J. Gao, X. Huang, S. H. Zhang, W. Ouyang, T. He, C. Zhao, and G. Zhang, "DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion," Proc. NeurIPS, 2024, doi: 10.48550/arXiv.2410.24203.

[20] W. Ye, Y. Wang, Y. Liu, W. Lin, and X. Xiang, "Panoramic Arbitrary Style Transfer with Deformable Distortion Constraints," Journal of Visual Communication and Image Representation, 2025, doi: 10.1016/j.jvcir.2024.104344.

[21] H. X. Yu, H. Duan, C. Herrmann, W. T. Freeman, and J. Wu, "WonderWorld: Interactive 3D Scene Generation from a Single Image," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.00555.

[22] C. Zhang, Q. Wu, C. C. Gambardella, X. Huang, D. Phung, W. Ouyang, and J. Cai, "Taming Stable Diffusion for Text to 360° Panorama Image Generation," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.00607.

[23] T. Zhi, B. Chen, I. Boyadzhiev, S. B. Kang, M. Hebert, and S. G. Narasimhan, "Semantically supervised appearance decomposition for virtual staging from a single panorama," ACM TOG, 2022, doi: 10.1145/3528223.3530148.

[24] S. Zhou, Z. Fan, D. Xu, H. Chang, P. Chari, T. Bharadwaj, S. You, Z. Wang, and A. Kadambi, "DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting," Proc. ECCV, 2025, doi: 10.1007/978-3-031-72658-3_19.

↑ Back to Top

View synthesis

References:

[1] B. Attal, S. Ling, A. Gokaslan, C. Richardt, and J. Tompkin, "MatryODShka: Real-time 6DoF video view synthesis using multi-sphere images," Proc. ECCV, 2020, doi: 10.1007/978-3-030-58452-8_26.

[2] T. Bertel, M. Yuan, R. Lindroos, and C. Richardt, "OmniPhotos: Casual 360° VR Photography," ACM TOG, 2020, doi: 10.1145/3414685.3417770.

[3] S. Cai, A. Obukhov, D. Dai, and L. Van Gool, "Pix2nerf: Unsupervised conditional P-GAN for single image to neural radiance fields translation," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00395.

[4] C. Deng, C. Jiang, C. R. Qi, X. Yan, Y. Zhou, L. Guibas, and D. Anguelov, "NeRDI: Single-view NeRF synthesis with language-guided diffusion as general image priors," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.01977.

[5] J. Gu, A. Trevithick, K. E. Lin, J. M. Susskind, C. Theobalt, L. Liu, and R. Ramamoorthi, "NeRFDiff: Single-image view synthesis with NeRF-guided distillation from 3D-aware diffusion," Proc. ICML, 2023, url: https://dl.acm.org/doi/abs/10.5555/3618408.3618881.

[6] P. Hedman and J. Kopf, "Instant 3D Photography," ACM TOG., 2018, doi: 10.1145/3197517.3201384.

[7] J. Huang, Z. Chen, D. Ceylan, and H. Jin, "6-DOF VR videos with a single 360-camera," Proc. IEEE VR, 2017, doi: 10.1109/VR.2017.7892229.

[8] A. Jain, M. Tancik, and P. Abbeel, "Putting NeRF on a diet: Semantically consistent few-shot view synthesis," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.00583.

[9] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, "3D Gaussian Splatting for Real-Time Radiance Field Rendering," ACM TOG, 2023, doi: 10.1145/3592433.

[10] B. Luo, F. Xu, C. Richardt, and J. H. Yong, "Parallax360: Stereoscopic 360° Scene Representation for Head-Motion Parallax," IEEE TVCG, 2018, doi: 10.1109/TVCG.2018.2794071.

[11] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, "NeRF: representing scenes as neural radiance fields for view synthesis," Commun. ACM, 2021, doi: 10.1145/3503250.

[12] G. Pintore, F. Ganovelli, E. Gobbetti, and R. Scopigno, "Mobile Mapping and Visualization of Indoor Structures to Simplify Scene Understanding and Location Awareness," Proc. ECCV Workshops, 2016.

[13] G. Pintore, F. Bettio, M. Agus, and E. Gobbetti, "Deep scene synthesis of Atlanta-world interiors from a single omnidirectional image," IEEE TVCG, 2023, doi: 10.1109/TVCG.2023.3320219.

[14] G. Pintore, A. Jaspe-Villanueva, M. Hadwiger, J. Schneider, M. Agus, F. Marton, F. Bettio, and E. Gobbetti, "Deep synthesis and exploration of omnidirectional stereoscopic environments from a single surround-view panoramic image," Computers & Graphics, 2024, doi: 10.1016/j.cag.2024.103907.

[15] G. Pu, Y. Zhao, and Z. Lian, "Pano2Room: Novel View Synthesis from a Single Indoor Panorama," Proc. SIGGRAPH Asia Conference Papers, 2024, doi: 10.1145/3680528.3687616.

[16] M. Tukur, G. Pintore, E. Gobbetti, J. Schneider, and M. Agus, "SPIDER: A framework for processing, editing and presenting immersive high-resolution spherical indoor scenes," Graphical Models, 2023, doi: 10.1016/j.gmod.2023.101182.

[17] J. Waidhofer, R. Gadgil, A. Dickson, S. Zollmann, and J. Ventura, "PanoSynthVR: Toward Light-weight 360-Degree View Synthesis from a Single Panoramic Input," Proc. ISMAR, 2022, doi: 10.1109/ISMAR55827.2022.00075.

[18] G. Wang, P. Wang, Z. Chen, W. Wang, C. C. Loy, and Z. Liu, "PERF: Panoramic Neural Radiance Field from a Single Panorama," IEEE TPAMI, 2024, doi: 10.1109/TPAMI.2024.3387307.

[19] J. Xu, J. Zheng, Y. Xu, R. Tang, and S. Gao, "Layout-guided novel view synthesis from a single indoor panorama," Proc. CVPR, 2021, doi: 10.1109/CVPR46437.2021.01617.

[20] D. Xu, Y. Jiang, P. Wang, Z. Fan, H. Shi, and Z. Wang, "SinNeRF: Training neural radiance fields on complex scenes from a single image," Proc. ECCV, 2022, doi: 10.1007/978-3-031-20047-2_42.

[21] T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely, "Stereo Magnification: Learning View Synthesis Using Multiplane Images," ACM TOG, 2018, doi: 10.1145/3197517.3201323.

↑ Back to Top

Exploration

References:

[1] T. Marrinan and M. E. Papka, "Real-time omnidirectional stereo rendering: generating 360° surround-view panoramic images for comfortable immersive viewing," IEEE TVCG, 2021, doi: 10.1109/TVCG.2021.3067780.

[2] K. Matzen, M. F. Cohen, B. Evans, J. Kopf, and R. Szeliski, "Low-cost 360 Stereo Photography and Video Capture," ACM TOG, 2017, doi: 10.1145/3072959.3073645.

[3] G. Pintore, F. Bettio, M. Agus, and E. Gobbetti, "Deep scene synthesis of Atlanta-world interiors from a single omnidirectional image," IEEE TVCG, 2023, doi: 10.1109/TVCG.2023.3320219.

[4] G. Pintore, A. Jaspe-Villanueva, M. Hadwiger, J. Schneider, M. Agus, F. Marton, F. Bettio, and E. Gobbetti, "Deep synthesis and exploration of omnidirectional stereoscopic environments from a single surround-view panoramic image," Computers & Graphics, 2024, doi: 10.1016/j.cag.2024.103907.

[5] M. Tukur, A. U. Rehman, G. Pintore, E. Gobbetti, J. Schneider, and M. Agus, "PanoStyle: Semantic, Geometry-Aware and Shading Independent Photorealistic Style Transfer for Indoor Panoramic Scenes," Proc. ICCVW, 2023, doi: 10.1109/ICCVW60793.2023.00170.

↑ Back to Top

Emerging pre-trained and foundation models

Single-panorama indoor inference is highly ill-posed and relies on strong priors typically learned from large annotated datasets, whose creation and use are costly and often task-specific. Foundation models trained on large-scale data are emerging as a possible alternative by enabling reusable representations that can be adapted to many tasks. In 360-degree analysis, they are mainly used either by repurposing standard vision models for omnidirectional problems or by developing models specifically designed for panoramic imagery.

References:

[1] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, "Language models are few-shot learners," Proc. NeurIPS, 2020, doi: 10.48550/arXiv.2005.14165.

[2] M. Chen, A. Radford, R. Child, J. Wu, H. Jun, D. Luan, and I. Sutskever, "Generative pretraining from pixels," Proc. PMLR, 2020, url: https://proceedings.mlr.press/v119/chen20s.html.

[3] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, "Scaling laws for neural language models," arXiv preprint arXiv:2001.08361, 2020, doi: 10.48550/arXiv.2001.08361.

[4] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W. Y. Lo, P. Dollár, and R. Girshick, "Segment Anything," Proc. ICCV, 2023, doi: 10.1109/ICCV51070.2023.00371.

[5] H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, "A Comprehensive Overview of Large Language Models," ACM Trans. Intell. Syst. Technol., 2025, doi: 10.1145/3744746.

[6] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning Transferable Visual Models From Natural Language Supervision," Proc. ICML, 2021, url: https://icml.cc/virtual/2021/oral/9194.

[7] M. Xu, Z. Zhang, H. Hu, J. Wang, L. Wang, F. Wei, X. Bai, and Z. Liu, "End-to-End Semi-Supervised Object Detection with Soft Teacher," Proc. ICCV, 2021, doi: 10.1109/ICCV48922.2021.00305.

↑ Back to Top

Transfer and composition of conventional vision foundation models for 360-degree tasks

References:

[1] H. Ai, Z. Cao, Y. P. Cao, Y. Shan, and L. Wang, "HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-With-Regional Depth Distributions," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.01275.

[2] H. Ai, Z. Cao, and L. Wang, "A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision," Int J Comput Vis, 2025, doi: 10.1007/s11263-025-02391-w.

[3] Z. Cao, J. Zhu, H. Ai, L. Jiang, Y. Lyu, and H. Xiong, "ST$^2$360D: Spatial-to-Temporal Consistency for Training-free 360 Monocular Depth Estimation," Proc. NeurIPS, 2025, url: [https://neurips.cc/virtual/2025/loc/san-diego/poster/ 117039](https://neurips.cc/virtual/2025/loc/san-diego/poster/ 117039).

[4] S. Chen, H. Guo, S. Zhu, F. Zhang, Z. Huang, J. Feng, and B. Kang, "Video Depth Anything: Consistent Depth Estimation for Super-Long Videos," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.02126.

[5] A. Gokaslan, A. F. Cooper, J. Collins, L. Seguin, A. Jacobson, M. Patel, J. Frankle, C. Stephenson, and V. Kuleshov, "CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.00788.

[6] H. Jiang, Z. Song, Z. Lou, R. Xu, and M. Tan, "Depth Anything in 360°: Towards Scale Invariance in the Wild," arXiv preprint arXiv:2512.22819, 2025, doi: 10.48550/arXiv.2512.22819.

[7] S. Kulkarni, P. Yin, and S. Scherer, "360FusionNeRF: Panoramic Neural Radiance Fields with Joint Guidance," Proc. IROS, 2023, doi: 10.1109/IROS55552.2023.10341346.

[8] Y. Li, Y. Guo, Z. Yan, X. Huang, Y. Duan, and L. Ren, "Omnifusion: 360 monocular depth estimation via geometry-aware fusion," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00282.

[9] X. Luo, J. B. Huang, R. Szeliski, K. Matzen, and J. Kopf, "Consistent video depth estimation," ACM Trans. Graph., 2020, doi: 10.1145/3386569.3392377.

[10] C. H. Peng and J. Zhang, "High-Resolution Depth Estimation for 360° Panoramas through Perspective and Panoramic Depth Images Registration," Proc. WACV, 2023, doi: 10.1109/WACV56688.2023.00313.

[11] G. Pu, Y. Zhao, and Z. Lian, "Pano2Room: Novel View Synthesis from a Single Indoor Panorama," Proc. SIGGRAPH Asia Conference Papers, 2024, doi: 10.1145/3680528.3687616.

[12] M. Rey-Area, M. Yuan, and C. Richardt, "360MonoDepth: High-Resolution 360° Monocular Depth Estimation," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.00374.

[13] M. Rey-Area and C. Richardt, "360° 3D Photos from a Single 360° Input Image," IEEE TVCG, 2025, doi: 10.1109/TVCG.2025.3549538.

[14] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.01042.

[15] Z. Shen, Z. Zheng, C. Lin, L. Nie, K. Liao, S. Zheng, and Y. Zhao, "Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness," Proc. CVPR, 2023, doi: 10.1109/CVPR52729.2023.01663.

[16] N. H. A. Wang and Y. L. Liu, "Depth anywhere: Enhancing 360 monocular depth estimation via perspective distillation and unlabeled data augmentation," Proc. NeurIPS, 2024, url: https://dl.acm.org/doi/10.5555/3737916.3741972.

[17] L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng, and H. Zhao, "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.00987.

[18] L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, "Depth Anything V2," Proc. NeurIPS, 2024, doi: 10.52202/079017-0688.

↑ Back to Top

Native 360-degree foundation models

References:

[1] Z. Cao, J. Zhu, W. Zhang, H. Ai, H. Bai, H. Zhao, and L. Wang, "PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation," Proc. CVPR, 2025, doi: 10.1109/CVPR52734.2025.00100.

[2] B. Cottier, R. Rahman, L. Fattorini, N. Maslej, T. Besiroglu, and D. Owen, "The rising costs of training frontier AI models," arXiv preprint arXiv:2405.21015, 2025, doi: 10.48550/arXiv.2405.21015.

[3] H. Jiang, Z. Song, Z. Lou, R. Xu, and M. Tan, "Depth Anything in 360°: Towards Scale Invariance in the Wild," arXiv preprint arXiv:2512.22819, 2025, doi: 10.48550/arXiv.2512.22819.

[4] X. Liu, T. Zhou, C. Wang, Y. Wang, Y. Wang, Q. Cao, W. Du, Y. Yang, J. He, Y. Qiao, and others, "Toward the unification of generative and discriminative visual foundation model: a survey," The Visual Computer, 2025, doi: 10.1007/s00371-024-03608-8.

[5] N. Maslej, L. Fattorini, R. Perrault, Y. Gil, V. Parli, N. Kariuki, E. Capstick, A. Reuel, E. Brynjolfsson, J. Etchemendy, and others, "Artificial intelligence index report 2025," arXiv preprint arXiv:2504.07139, 2025, doi: 10.48550/arXiv.2310.03715.

[6] M. Wallingford, A. Bhattad, A. Kusupati, V. Ramanujan, M. Deitke, A. Kembhavi, R. Mottaghi, W. C. Ma, and A. Farhadi, "From an image to a scene: Learning to imagine the world from a million 360 videos," Proc. NeurIPS, 2024, doi: 10.48550/arXiv.2412.07770.

[7] T. Wiedemer, Y. Li, P. Vicol, S. S. Gu, N. Matarese, K. Swersky, B. Kim, P. Jaini, and R. Geirhos, "Video models are zero-shot learners and reasoners," arXiv preprint arXiv:2509.20328, 2025, doi: 10.48550/arXiv.2509.20328.

[8] L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng, and H. Zhao, "Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data," Proc. CVPR, 2024, doi: 10.1109/CVPR52733.2024.00987.

[9] L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, "Depth Anything V2," Proc. NeurIPS, 2024, doi: 10.52202/079017-0688.

↑ Back to Top

Opportunities and Limitations

Foundation models are increasingly used for single-view 360-degree inference by enabling knowledge reuse and improved generalization, though most approaches adapt general vision models rather than 360-specific ones. However, their high training and deployment cost, in terms of data, computation, and energy, and the growing concentration within large industrial actors, place them largely beyond the reach of individual researchers. Moreover, inference cost is typically high. Lightweight task-specific models are thus likely to remain important alongside them.

References:

[1] R. Bommasani, S. Kapoor, K. Klyman, S. Longpre, A. Ramaswami, D. Zhang, M. Schaake, D. E. Ho, A. Narayanan, and P. Liang, "Considerations for governing open foundation models," Science, 2024, doi: 10.1126/science.adp1848.

[2] N. Maslej, L. Fattorini, R. Perrault, Y. Gil, V. Parli, N. Kariuki, E. Capstick, A. Reuel, E. Brynjolfsson, J. Etchemendy, and others, "Artificial intelligence index report 2025," arXiv preprint arXiv:2504.07139, 2025, doi: 10.48550/arXiv.2310.03715.

↑ Back to Top

Applications

Panorama-first workflows are enabling practical applications across domains such as AEC/BIM documentation, education and safety training, cultural heritage dissemination, remote collaboration, and real estate marketing, where plausible and easily captured models are often more valuable than precise measurements.

References:

[1] M. De Fino, C. Ceppi, and F. Fatiguso, "Virtual tours and informational models for improving territorial attractiveness and the smart management of architectural heritage: The 3D-IMP-ACT project," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2020, doi: 10.5194/isprs-archives-XLIV-M-1-2020-473-2020.

[2] M. De Fino, S. Bruno, and F. Fatiguso, "Dissemination, assessment and management of historic buildings by thematic virtual tours and 3D models," Virtual Archaeology Review, 2022, doi: 10.4995/var.2022.15426.

[3] R. Dieb, A. Alsalloum, and N. Webb, "Interactive 360° media for the dissemination of endangered world heritage sites: the ancient city of Palmyra in Syria," Built Heritage, 2024, doi: 10.1186/s43238-024-00126-3.

[4] R. Eiris, M. Gheisari, and B. Esmaeili, "Desktop-based safety training using 360-degree panorama and static virtual reality techniques: A comparative experimental study," Automation in construction, 2020, doi: 10.1016/j.autcon.2019.102969.

[5] R. Eiris, J. Wen, and M. Gheisari, "iVisit-Collaborate: Collaborative problem-solving in multiuser 360-degree panoramic site visits," Computers & Education, 2022, doi: 10.1016/j.compedu.2021.104365.

[6] X. Fang, H. Li, H. Wu, L. Fan, T. Kong, and Y. Wu, "A fast end-to-end method for automatic interior progress evaluation using panoramic images," Engineering Applications of Artificial Intelligence, 2023, doi: 10.1016/j.engappai.2023.106733.

[7] J. Isingizwe, R. Eiris, and A. J. Al-Bayati, "Enhancing safety training engagement through immersive storytelling: A case study in the residential construction," Safety Science, 2024, doi: 10.1016/j.ssci.2024.106631.

[8] M. Khan, Y. Qiu, Y. Cong, J. Abu-Khalaf, D. Suter, and B. Rosenhahn, "PanoSCU: A Simulation-Based Dataset for Panoramic Indoor Scene Understanding," IEEE Access, 2025, doi: 10.1109/ACCESS.2025.3561055.

[9] J. S. Kim, T. Leathem, and J. Liu, "Comparing virtual reality modalities and 360 photography in a construction management classroom," 55th ASC Annual International Conference Proceedings, 2019, url: [http://ascpro0.ascweb.org/archives/cd/2019/paper/ CERT284002019.pdf](http://ascpro0.ascweb.org/archives/cd/2019/paper/ CERT284002019.pdf).

[10] Z. Li, L. Wang, X. Huang, C. Pan, and J. Yang, "PhyIR: Physics-based Inverse Rendering for Panoramic Indoor Images," Proc. CVPR, 2022, doi: 10.1109/CVPR52688.2022.01238.

[11] G. Pintore, R. Pintus, F. Ganovelli, R. Scopigno, and E. Gobbetti, "Recovering 3D existing-conditions of indoor structures from spherical images," Computers & Graphics, 2018, doi: 10.1016/j.cag.2018.09.013.

[12] G. Pintore, M. Agus, E. Almansa, and E. Gobbetti, "Instant Automatic Emptying of Panoramic Indoor Scenes," IEEE TVCG, 2022, doi: 10.1109/TVCG.2022.3202999.

[13] U. Shah, S. Jashari, M. Tukur, M. Househ, J. Schneider, G. Pintore, E. Gobbetti, and M. Agus, "Virtual Staging of Indoor Panoramic Images via Multi-task Learning and Inverse Rendering," IEEE CG&A, 2025, doi: 10.1109/MCG.2025.3605806.

[14] Y. Shinde, K. Lee, B. Kiper, M. Simpson, and S. Hasanzadeh, "A Systematic Literature Review on 360° Panoramic Applications in Architecture, Engineering, and Construction (AEC) Industry," Journal of Information Technology in Construction (ITcon), 2023.

[15] P. Subramanian and M. Gheisari, "Using 360-degree panoramic photogrammetry and laser scanning techniques to create point cloud data: A comparative pilot study," Proc. Associated Schools of Construction Annual Conference, 2019, url: [http://ascpro0.ascweb.org/archives/cd/2019/paper/ CPRT305002019.pdf](http://ascpro0.ascweb.org/archives/cd/2019/paper/ CPRT305002019.pdf).

[16] M. Tukur, A. U. Rehman, G. Pintore, E. Gobbetti, J. Schneider, and M. Agus, "PanoStyle: Semantic, Geometry-Aware and Shading Independent Photorealistic Style Transfer for Indoor Panoramic Scenes," Proc. ICCVW, 2023, doi: 10.1109/ICCVW60793.2023.00170.

[17] M. Tukur, S. Jashari, M. Alzubaidi, B. A. Salami, Y. Boraey, S. Yong, D. Saleh, G. Pintore, E. Gobbetti, J. Schneider, N. Fetais, and M. Agus, "Panoramic Imaging in Immersive Extended Reality: A Scoping Review of Technologies, Applications, Perceptual Studies, and User Experience Challenges," Frontiers in Virtual Reality, 2025, doi: 10.3389/frvir.2025.1622605.

[18] S. Wang, Y. Lu, Z. Xia, Z. Chen, Y. Wang, L. Zhong, and T. Zhong, "Integrating visual-SLAM and multi-view panoramas for efficient indoor 3D layout reconstruction in building projects," Advanced Engineering Informatics, 2025, doi: 10.1016/j.aei.2025.103629.

[19] W. Wang, C. Qing, J. Tan, and X. Xu, "Multi-view Panoramic Image Style Transfer with Multi-scale Attention and Global Sharing," ACM Trans. Multimedia Comput. Commun. Appl., 2025, doi: 10.1145/3735137.

[20] Y. Xia, S. Weng, S. Yang, J. Liu, C. Zhu, M. Teng, Z. Jia, H. Jiang, and B. Shi, "PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms," arXiv preprint arXiv:2505.22016, 2025, doi: 10.48550/arXiv.2505.22016.

[21] S. Yang, J. Tan, M. Zhang, T. Wu, G. Wetzstein, Z. Liu, and D. Lin, "LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation," Proc. SIGGRAPH Conference Papers, 2025, doi: 10.1145/3721238.3730643.

[22] W. Ye, Y. Wang, Y. Liu, W. Lin, and X. Xiang, "Panoramic Arbitrary Style Transfer with Deformable Distortion Constraints," Journal of Visual Communication and Image Representation, 2025, doi: 10.1016/j.jvcir.2024.104344.

[23] T. Zhi, B. Chen, I. Boyadzhiev, S. B. Kang, M. Hebert, and S. G. Narasimhan, "Semantically supervised appearance decomposition for virtual staging from a single panorama," ACM TOG, 2022, doi: 10.1145/3528223.3530148.

[24] S. Zhou, Z. Fan, D. Xu, H. Chang, P. Chari, T. Bharadwaj, S. You, Z. Wang, and A. Kadambi, "DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting," Proc. ECCV, 2025, doi: 10.1007/978-3-031-72658-3_19.

↑ Back to Top

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
assets		assets
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

State-of-the-art in deep learning approaches for automatic single-panorama indoor modeling and exploration

Authors

Abstract

How to Cite

Acknowledgments

Table of Contents

Related surveys

Background

360-degree image capture and processing representation

Gravity alignment

Architectural and data-driven priors

Open research data

Automatic geometric and structural modeling

Depth and other pixel-wise information inference

Single-image room layout estimation

Positioning and connecting single rooms

Novel view synthesis and immersive model exploration

View synthesis from a single panorama

Dynamic modifications

View synthesis

Exploration

Emerging pre-trained and foundation models

Transfer and composition of conventional vision foundation models for 360-degree tasks

Native 360-degree foundation models

Opportunities and Limitations

Applications

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

State-of-the-art in deep learning approaches for automatic single-panorama indoor modeling and exploration

Authors

Abstract

How to Cite

Acknowledgments

Table of Contents

Related surveys

Background

360-degree image capture and processing representation

Gravity alignment

Architectural and data-driven priors

Open research data

Automatic geometric and structural modeling

Depth and other pixel-wise information inference

Single-image room layout estimation

Positioning and connecting single rooms

Novel view synthesis and immersive model exploration

View synthesis from a single panorama

Dynamic modifications

View synthesis

Exploration

Emerging pre-trained and foundation models

Transfer and composition of conventional vision foundation models for 360-degree tasks

Native 360-degree foundation models

Opportunities and Limitations

Applications

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages