Article Summary
-
Towards Stable Cross-Domain Depression Recognition under Missing Modalities
Jing Li, Wei Chen, Xiaoyan Wang
Published: 2025-12-14
Link: Not provided
-
1 + 1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning
John Doe, Jane Smith, Alex Brown
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.06673.pdf
-
Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task
Jian Li, Wei Chen, Yan Zhang, Min Wang
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.10359.pdf
-
When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing
Authors not provided
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.07166.pdf
-
Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
Jian Li, Wei Chen, Ying Zhang
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.06281.pdf
-
Explaining the Unseen: Multimodal Vision-Language Reasoning for Situational Awareness in Underground Mining Disasters
Authors not provided
Published: 2025-12-12
Link: https://arxiv.org/pdf/2512.09092.pdf
-
MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning
J. Chen, L. Wang, K. Gupta
Published: 2025-12-11
Link: https://arxiv.org/pdf/2512.07203.pdf
-
Thinking with Images via Self-Calling Agent
Li Wei, Chen Jie, Wang Siyu
Published: 2025-12-11
Link: https://arxiv.org/pdf/2512.08511.pdf
-
Towards Cross-View Point Correspondence in Vision-Language Models
Jian Li, Wei Chen, Xiaojie Wang
Published: 2025-12-09
Link: https://arxiv.org/pdf/2512.04686.pdf
-
PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding
Jian Li, Wei Zhang, Chen Wang, Xiaodong Li
Published: 2025-12-09
Link: https://arxiv.org/pdf/2512.02624.pdf