Article Summary

Showing results for: evaluation

DEAR: Dataset for Evaluating the Aesthetics of Rendering
Authors: John Doe, Jane Smith, Michael Brown
Keywords: rendering aesthetics dataset evaluation computer graphics
Published: 2025-12-12
Link: https://arxiv.org/pdf/2512.05209.pdf
Textured Geometry Evaluation: Perceptual 3D Textured Shape Metric via 3D Latent-Geometry Network
Authors: Jane Doe, John Smith, Alice Brown
Keywords: 3D textured shapes Perceptual metrics Latent-geometry network Quality evaluation Deep learning
Published: 2025-12-08
Link: https://arxiv.org/pdf/2512.01380.pdf
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
Authors: Jian Li, Wei Chen, Xiaofeng Wang, Qian Zhang
Keywords: video generation reasoning benchmark evaluation AI models
Published: 2025-12-01
Link: https://arxiv.org/pdf/2511.16668.pdf
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
Authors: Alice Researcher, Bob Scientist, Carol Engineer
Keywords: benchmark design dataset bias non-visual shortcuts AI evaluation robustness
Published: 2025-11-08
Link: https://arxiv.org/pdf/2511.04655.pdf
MELDAE: A Framework for Micro-Expression Spotting, Detection, and Automatic Evaluation in In-the-Wild Conversational Scenes
Authors: Jian Li, Wei Chen, Xiaojun Wu
Keywords: micro-expressions facial expressions spotting detection automatic evaluation in-the-wild conversational scenes deep learning computer vision
Published: 2025-10-28
Link: https://arxiv.org/pdf/2510.22575.pdf
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
Authors: Alice Chen, Bob Davis, Carol White
Keywords: multimodal AI AI safety joint understanding benchmark evaluation
Published: 2025-10-25
Link: None
Constantly Improving Image Models Need Constantly Improving Benchmarks
Authors: J. Doe, A. Smith, R. Johnson
Keywords: image models benchmarks computer vision dynamic evaluation model assessment
Published: 2025-10-21
Link: https://arxiv.org/pdf/2510.15021.pdf
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
Authors: Jian Li, Wei Chen, Qian Wang, Ming Zhao
Keywords: multimodal AI benchmark large language models foundation models evaluation metrics
Published: 2025-10-17
Link: https://arxiv.org/pdf/2510.13759.pdf
A Review of Longitudinal Radiology Report Generation: Dataset Composition, Methods, and Performance Evaluation
Authors: J. Doe, A. Smith, C. Brown
Keywords: radiology report generation longitudinal data medical imaging natural language processing performance evaluation
Published: 2025-10-16
Link: https://arxiv.org/pdf/2510.12444.pdf