Article Summary
-
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
Alice Chen, Bob Johnson, Carol White
Published: 2025-12-05
Link: https://arxiv.org/pdf/2511.22805.pdf
-
S^2-MLLM: Boosting Spatial Reasoning Capability of MLLMs for 3D Visual Grounding with Structural Guidance
Ling Li, Wei Wang, Chen Zhang, Ying Liu, Jian Xu
Published: 2025-12-04
Link: https://arxiv.org/pdf/2512.01223.pdf
-
RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios
Jian Zhang, Li Wang, Wei Chen
Published: 2025-11-29
Link: https://arxiv.org/pdf/2511.18011.pdf
-
D^3ToM: Decider-Guided Dynamic Token Merging for Accelerating Diffusion MLLMs
Authors: not available
Published: 2025-11-20
Link: https://arxiv.org/pdf/2511.12280.pdf
-
MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence
J. Lee, S. Kim, M. Chen, A. Rodriguez
Published: 2025-11-06
Link: https://arxiv.org/pdf/2511.01594.pdf
-
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Jian Li, Wei Chen, Xin Wang
Published: 2025-10-19
Link: https://arxiv.org/pdf/2510.09507.pdf