1. Introduction
The growing global concern for mental health necessitates innovative tools to support self-expression and well-being. Text-to-image (T2I) generation models offer a novel avenue for individuals to externalize complex emotions. This work addresses the critical need for a human-centered evaluation of such models when applied to the sensitive context of mental distress. Images were generated with several contemporary T2I models, and GPT-4o was used to systematically create the evaluation dataset.
2. Related Work
Existing research explores the application of AI in mental health, ranging from conversational agents to digital therapeutics, but often overlooks expressive creative AI. Previous studies have advanced text-to-image generation capabilities, focusing on realism and artistic style rather than emotional nuance or user well-being. This work builds upon human-computer interaction principles for sensitive applications and ethical AI development, recognizing the unique challenges of AI in mental health contexts.
3. Methodology
We used GPT-4o to create a dataset of diverse prompts reflecting various facets of mental distress. Each prompt in this dataset was then submitted to several leading text-to-image models to obtain generated images. We established a human-centered evaluation framework in which a cohort of participants interacted with the generated images and provided both qualitative and quantitative feedback. Ethical guidelines and privacy considerations were paramount throughout the data collection and analysis phases.
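As a rough illustration of the generation step described above, the sketch below pairs every prompt with every model so that all models are evaluated on the same inputs. The names `Prompt`, `GenerationTask`, and `build_tasks`, the category labels, and the model identifiers are all hypothetical placeholders; the source does not specify the dataset schema or the models used.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Prompt:
    prompt_id: str
    category: str  # hypothetical distress-category label
    text: str


@dataclass(frozen=True)
class GenerationTask:
    prompt_id: str
    category: str
    model_name: str


def build_tasks(prompts, model_names):
    """Pair every prompt with every model (full cross product)."""
    return [
        GenerationTask(p.prompt_id, p.category, m)
        for p, m in product(prompts, model_names)
    ]


# Placeholder data, not taken from the actual dataset.
prompts = [
    Prompt("p1", "category_1", "placeholder prompt text 1"),
    Prompt("p2", "category_2", "placeholder prompt text 2"),
]
tasks = build_tasks(prompts, ["model_a", "model_b"])
```

Building the full cross product keeps the comparison balanced: each model receives exactly the same prompts, so differences in ratings are not confounded by differences in prompt sets.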
4. Experimental Results
The evaluation revealed substantial variation in how accurately and sensitively different T2I models represent mental distress. User feedback highlighted that emotional resonance and symbolic representation matter more than literal depiction for effective self-expression. For instance, in specific distress categories, participants rated images generated by Model A higher than those generated by Model B on both emotional accuracy and perceived helpfulness.
Explanation: The table below gives a hypothetical comparison of two text-to-image models (Model A and Model B) on three human-centered evaluation metrics: emotional accuracy, perceived helpfulness, and overall user satisfaction. Emotional accuracy and perceived helpfulness are scored on a scale of 1-5, while user satisfaction is the percentage of positive responses.
| Metric | Model A | Model B |
|---|---|---|
| Emotional Accuracy (1-5) | 4.2 | 3.5 |
| Perceived Helpfulness (1-5) | 3.9 | 3.1 |
| User Satisfaction (%) | 78 | 62 |
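A minimal sketch of how the three metrics in the table could be computed from raw participant responses follows. It assumes each response carries two 1-5 ratings and a boolean `satisfied` flag marking a positive response; the field names, the `summarize` function, and the sample responses are illustrative assumptions and are not data from the study.

```python
from statistics import mean


def summarize(responses):
    """Aggregate per-participant responses for one model.

    Each response is a dict with 1-5 ratings under "emotional_accuracy" and
    "perceived_helpfulness", plus a boolean "satisfied" (assumed schema).
    """
    if not responses:
        raise ValueError("no responses to summarize")
    n_positive = sum(1 for r in responses if r["satisfied"])
    return {
        "emotional_accuracy": round(
            mean(r["emotional_accuracy"] for r in responses), 1
        ),
        "perceived_helpfulness": round(
            mean(r["perceived_helpfulness"] for r in responses), 1
        ),
        # Share of positive responses, expressed as a percentage.
        "user_satisfaction_pct": round(100 * n_positive / len(responses), 1),
    }


# Made-up responses for illustration only.
sample_responses = [
    {"emotional_accuracy": 5, "perceived_helpfulness": 4, "satisfied": True},
    {"emotional_accuracy": 4, "perceived_helpfulness": 4, "satisfied": True},
    {"emotional_accuracy": 4, "perceived_helpfulness": 3, "satisfied": True},
    {"emotional_accuracy": 3, "perceived_helpfulness": 3, "satisfied": False},
]
summary = summarize(sample_responses)
```

Running the same function once per model, or once per model and distress category, would yield rows comparable to those in the table.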
5. Discussion
The results suggest that while current T2I models show promise, significant advancements are needed to robustly support self-expression of mental distress with sensitivity and accuracy. The human-centered evaluation underscored that mere technical proficiency is insufficient; empathy and contextual understanding are crucial for effective application. Future work should focus on fine-tuning models with emotionally resonant data and integrating ethical design principles to ensure user safety and therapeutic benefit.