Article Summary
-
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
Alice Chen, Bob Davis, Carol White, David Green
Published: 2025-12-06
Link: https://arxiv.org/pdf/2512.01816.pdf
-
RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios
Jian Zhang, Li Wang, Wei Chen
Published: 2025-11-29
Link: https://arxiv.org/pdf/2511.18011.pdf
-
AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models
Ava Chen, Benjamin Lee, Chloe Kim, Daniel Wang
Published: 2025-11-18
Link: https://arxiv.org/pdf/2511.12149.pdf
-
STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models
Alice Chen, Bob Davis, Carla Evans, David Foster
Published: 2025-11-02
Link: https://arxiv.org/pdf/2510.22571.pdf
-
Evaluation of Vision-LLMs in Surveillance Video
Jian Li, Wei Chen, Mei Lin
Published: 2025-10-30
Link: https://arxiv.org/pdf/2510.23190.pdf
-
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
John Doe, Jane Smith, Robert Johnson
Published: 2025-10-29
Link: https://arxiv.org/pdf/2510.24563.pdf
-
How to Evaluate Monocular Depth Estimation?
Jane Doe, John Smith, Alice Johnson
Published: 2025-10-24
Link: https://arxiv.org/pdf/2510.19814.pdf
-
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
Jia Li, Wei Wang, Min Chen
Published: 2025-10-23
Link: https://arxiv.org/pdf/2510.14949.pdf
-
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
Jian Li, Wei Chen, Xin Wang
Published: 2025-10-19
Link: https://arxiv.org/pdf/2510.09507.pdf
-
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Jane Doe, John Smith, Alice Wonderland
Published: 2025-10-18
Link: https://arxiv.org/pdf/2510.13626.pdf