Article Summary

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images Alice Chen, Bob Johnson, Carol White
MLLMs Cognitive Perception Image Understanding Affective Computing Human-AI Alignment
Published: 2025-12-05 Link: https://arxiv.org/pdf/2511.22805.pdf
S^2-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance Ling Li, Wei Wang, Chen Zhang, Ying Liu, Jian Xu
MLLMs Spatial Reasoning 3D Visual Grounding Structural Guidance Multimodal AI Scene Understanding
Published: 2025-12-04 Link: https://arxiv.org/pdf/2512.01223.pdf
RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios Jian Zhang, Li Wang, Wei Chen
Multimodal Large Language Models MLLMs Spatial Understanding Reasoning Urban Scenarios Benchmarking Autonomous Driving
Published: 2025-11-29 Link: https://arxiv.org/pdf/2511.18011.pdf
D$^{3}$ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs (authors not listed)
Dynamic Token Merging Diffusion MLLMs Model Acceleration Decider-Guided Computational Efficiency
Published: 2025-11-20 Link: https://arxiv.org/pdf/2511.12280.pdf
MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence J. Lee, S. Kim, M. Chen, A. Rodriguez
Multi-agent systems Robotics MLLMs Assistive intelligence Human-robot interaction
Published: 2025-11-06 Link: https://arxiv.org/pdf/2511.01594.pdf
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs Jian Li, Wei Chen, Xin Wang
MLLMs Physical Reasoning Tool Understanding Benchmarking Multimodal AI
Published: 2025-10-19 Link: https://arxiv.org/pdf/2510.09507.pdf