Article Summary

Showing results for: Multimodal Models

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
Authors: First Author, Second Author, Third Author
Tags: Multimodal Models, Visual Representations, Unified Architectures, Cross-modal Learning, Deep Learning
Published: 2025-12-05
Link: https://arxiv.org/pdf/2512.02014.pdf

Architecture Decoupling Is Not All You Need For Unified Multimodal Model
Authors: A. Research, B. Scientist, C. Innovator
Tags: Multimodal Models, Architecture Decoupling, Unified Learning, Cross-Modal Interaction, Deep Learning
Published: 2025-12-01
Link: https://arxiv.org/pdf/2511.22663.pdf

While recognizing actions, LMMs struggle to detect core interaction events
Authors: Anonymous Author 1, Anonymous Author 2
Tags: LMMs, Action Recognition, Interaction Events, Multimodal Models, Event Detection
Published: 2025-11-28
Link: https://arxiv.org/pdf/2511.20162.pdf

DeepEyesV2: Toward Agentic Multimodal Model
Authors: J. Doe, A. Smith, B. Lee
Tags: Multimodal Models, Agentic AI, Deep Learning, Vision-Language Models, Autonomous Agents
Published: 2025-11-15
Link: https://arxiv.org/pdf/2511.05271.pdf

QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
Authors: Alice Chen, Bob Davis, Carol Evans
Tags: Large Multimodal Models, Image Captioning, Question Answering, Chain-of-Thought, Natural Language Generation
Published: 2025-11-09
Link: https://arxiv.org/pdf/2511.03206.pdf

Emu3.5: Native Multimodal Models are World Learners
Authors: A. Researcher, B. Developer, C. Engineer
Tags: Multimodal AI, Large Multimodal Models, World Models, Native Multimodality, Emu3.5
Published: 2025-11-06
Link: https://arxiv.org/pdf/2510.26583.pdf