Article Summary
-
1 + 1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning
John Doe, Jane Smith, Alex Brown
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.06673.pdf
-
When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing
Authors not provided
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.07166.pdf
-
Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models
Jian Li, Wei Chen, Ying Zhang
Published: 2025-12-13
Link: https://arxiv.org/pdf/2512.06281.pdf
-
Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation
John Doe, Jane Smith, Robert Johnson
Published: 2025-12-11
Link: https://arxiv.org/pdf/2512.06105.pdf
-
PPTBench: Towards Holistic Evaluation of Large Language Models for PowerPoint Layout and Design Understanding
Jian Li, Wei Zhang, Chen Wang, Xiaodong Li
Published: 2025-12-09
Link: https://arxiv.org/pdf/2512.02624.pdf
-
VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack
Alice Chen, Bob Davis, Carol White
Published: 2025-12-08
Link: https://arxiv.org/pdf/2512.05853.pdf
-
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration
Jian Li, Wei Chen, Yan Wang, Ming Zhang, Xiaodong Li
Published: 2025-12-03
Link: https://arxiv.org/pdf/2512.02981.pdf
-
TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs
Jian Li, Wei Chen, Ying Zhao, Peng Wang
Published: 2025-12-01
Link: https://arxiv.org/pdf/2511.20965.pdf
-
Test-Time Temporal Sampling for Efficient MLLM Video Understanding
Jian Li, Wei Zhang, Chen Xu
Published: 2025-12-01
Link: https://arxiv.org/pdf/2511.17945.pdf
-
RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios
Jian Zhang, Li Wang, Wei Chen
Published: 2025-11-29
Link: https://arxiv.org/pdf/2511.18011.pdf