Article Summary

Showing results for: Scene Understanding

Binary-Gaussian: Compact and Progressive Representation for 3D Gaussian Segmentation
A. R. Author, B. M. Contributor, C. D. Researcher
3D Gaussian Segmentation Binary Representation Compact Models Progressive Representation Scene Understanding
Published: 2025-12-06 Link: https://arxiv.org/pdf/2512.00944.pdf

S^2-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance
Ling Li, Wei Wang, Chen Zhang, Ying Liu, Jian Xu
MLLMs Spatial Reasoning 3D Visual Grounding Structural Guidance Multimodal AI Scene Understanding
Published: 2025-12-04 Link: https://arxiv.org/pdf/2512.01223.pdf

ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes
Alice Chen, Bob Davis, Carla Evans
3D Object Articulation Large Language Models Scene Understanding Computer Graphics Robotics
Published: 2025-11-24 Link: https://arxiv.org/pdf/2511.12977.pdf

LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs
Ava Chen, Leo Kim, Maya Singh
3D Scene Understanding Vision-Language Models LLaVA Cubist Representation Multi-modal Learning
Published: 2025-11-22 Link: https://arxiv.org/pdf/2511.16454.pdf

Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments
Jian Li, Wei Chen, Sara Khan, David Kim
Vision-Language Models Zero-Shot Learning Scene Understanding Multimodal AI Real-World Perception
Published: 2025-11-04 Link: https://arxiv.org/pdf/2510.25070.pdf

PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments
Jian Li, Wei Chen, Sarah Miller, David G. Thompson
Multimodal Large Language Models Active Visual Reasoning Physical Environments Reinforcement Learning Scene Understanding
Published: 2025-11-03 Link: https://arxiv.org/pdf/2510.21111.pdf

PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors
Jian Li, Wei Chen, Meng Wang, Xin Yu
3D Gaussian Splatting Indoor Reconstruction Planar Priors Vision-Language Models High-Fidelity Scene Understanding
Published: 2025-10-30 Link: https://arxiv.org/pdf/2510.23930.pdf

Structured Interfaces for Automated Reasoning with 3D Scene Graphs
Xiaoke Shen, Yifan Li, Wenqiang Xu, Yuexin Ma, Jiayuan Mao, S. M. Ali Eslami, Jonathan How, Joshua B. Tenenbaum, Jiajun Wu
3D Scene Graphs Automated Reasoning Neuro-Symbolic AI Scene Understanding Structured Interfaces
Published: 2025-10-24 Link: https://arxiv.org/pdf/2510.16643.pdf

ViBED-Net: Video Based Engagement Detection Network Using Face-Aware and Scene-Aware Spatiotemporal Cues
John Doe, Jane Smith, Robert Johnson
Engagement Detection Video Analysis Spatiotemporal Cues Facial Expression Recognition Scene Understanding
Published: 2025-10-22 Link: https://arxiv.org/pdf/2510.18016.pdf