Article Summary
-
1 + 1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning
John Doe, Jane Smith, Alex Brown
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.06673.pdf
-
Concept-based Explainable Data Mining with VLM for 3D Detection
Jian Li, Wei Zhang, Chen Wang, Xiaoyu Liu
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.05482.pdf
-
When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing
Not Provided
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.07166.pdf
-
Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
Jian Li, Wei Chen, Ying Zhang
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.06281.pdf
-
Building Reasonable Inference for Vision-Language Models in Blind Image Quality Assessment
Not Provided
Published: 2025-12-12
Link: https://arxiv.org/pdf/2512.09555.pdf
-
Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation
John Doe, Jane Smith, Robert Johnson
Published: 2025-12-11
Link: https://arxiv.org/pdf/2512.06105.pdf
-
MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning
J. Chen, L. Wang, K. Gupta
Published: 2025-12-11
Link: https://arxiv.org/pdf/2512.07203.pdf
-
RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation
Jian Li, Wei Chen, Yan Wang
Published: 2025-12-10
Link: https://arxiv.org/pdf/2512.07273.pdf
-
SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
Ava Chen, Benjamin Lee, Sophia Garcia, Daniel Kim
Published: 2025-12-09
Link: https://arxiv.org/pdf/2512.05955.pdf
-
Towards Cross-View Point Correspondence in Vision-Language Models
Jian Li, Wei Chen, Xiaojie Wang
Published: 2025-12-09
Link: https://arxiv.org/pdf/2512.04686.pdf