The ViStoryBench Dataset

A meticulously curated collection designed to advance story visualization research, featuring diverse narratives, artistic styles, and complex character interactions.

Core Mission

ViStoryBench aims to provide a comprehensive and challenging benchmark to rigorously evaluate and compare story visualization models. It focuses on real-world complexities such as character consistency, narrative coherence, and stylistic integrity across a sequence of images.

Task Definition for Story Visualization

The primary goal is to generate a sequence of images \(I_1, I_2, ..., I_m\) that faithfully and consistently visualizes a given story, which is specified by character information and a series of storyboard shot descriptions.

Given Inputs:

Character Information (\(C_1, ..., C_n\)): For each of the \(n\) characters, the input includes:
- Textual appearance description (\(T_i\)).
- Corresponding reference image(s) (\(S_i\)) consistent with \(T_i\).
Storyboard Shots (\(Shot_1, ..., Shot_m\)): For each of the \(m\) shots, the input includes a textual description covering:
- Setting Description: The environment and context of the current scene.
- Plot Correspondence: The segment of the original narrative relevant to this shot.
- Onstage Characters: A list of characters present and active in this shot.
- Static Shot Description: Detailed depiction of characters' static actions, expressions, positions, and significant objects within the frame.
- Shot Perspective Design: Cinematographic details like shot scale (e.g., close-up, wide shot), camera angle, and shot type.
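
The inputs above map naturally onto a small set of record types. The following is a minimal sketch, not the benchmark's actual data format: the class names (`Character`, `Shot`, `Story`), the field names, and the `validate` helper are all hypothetical, chosen only to mirror the per-character description and reference images and the five textual fields of each shot.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """One character C_i: appearance text T_i plus reference images S_i."""
    name: str
    description: str              # T_i
    reference_images: list[str]   # S_i as file paths (hypothetical layout)


@dataclass
class Shot:
    """One storyboard shot Shot_j with its five textual fields."""
    setting: str                    # Setting Description
    plot: str                       # Plot Correspondence
    onstage_characters: list[str]   # Onstage Characters (names)
    static_description: str         # Static Shot Description
    perspective: str                # Shot Perspective Design


@dataclass
class Story:
    characters: dict[str, Character]
    shots: list[Shot] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the inputs are consistent."""
        problems = []
        for name, character in self.characters.items():
            if not character.reference_images:
                problems.append(f"character {name!r} has no reference image")
        for j, shot in enumerate(self.shots, start=1):
            for name in shot.onstage_characters:
                if name not in self.characters:
                    problems.append(f"shot {j} references unknown character {name!r}")
        return problems
```

A check like `validate` is useful before generation because every onstage character in a shot needs a matching description and at least one reference image for consistency to be measurable.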

Objective:

To generate an image sequence \(I_1, ..., I_m\) where each image \(I_j\) accurately and aesthetically portrays the specified \(Shot_j\). This includes:

- Faithful depiction of characters according to their references and descriptions.
- Correct representation of character actions, interactions, and expressions.
- Accurate portrayal of the scene setting and atmosphere.
- Adherence to the specified camera perspectives and shot compositions.
- Visual and stylistic consistency across the image sequence.
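
Read as an interface, the objective is a function from \(m\) shots to exactly \(m\) images, where each image may depend on the reference images of the onstage characters and on the images generated so far. The sketch below illustrates that shape under stated assumptions: `ShotGenerator`, `visualize_story`, the use of raw `bytes` for images, and the idea of passing earlier images back as context are illustrative choices, not part of ViStoryBench's tooling or any particular model's API.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical model interface:
# (shot text, reference images of onstage characters, earlier images) -> new image
ShotGenerator = Callable[[str, Sequence[bytes], Sequence[bytes]], bytes]


def visualize_story(
    shots: Sequence[tuple[str, Sequence[str]]],    # (shot description, onstage character names)
    references: Mapping[str, Sequence[bytes]],     # character name -> reference images S_i
    generate: ShotGenerator,
) -> list[bytes]:
    """Produce one image I_j per shot Shot_j, in order."""
    images: list[bytes] = []
    for text, onstage in shots:
        # Collect reference images only for characters present in this shot.
        refs = [img for name in onstage for img in references[name]]
        # Earlier images are passed as context so a model can keep the style consistent.
        images.append(generate(text, refs, tuple(images)))
    assert len(images) == len(shots)  # the sequence must have exactly m images
    return images
```

Methods that generate every shot independently would simply ignore the third argument; the interface keeps it so that both kinds of method fit the same loop.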

Dataset Core Characteristics

Rich Diversity

Encompasses varied story types (comedy, horror, fantasy) and artistic styles (anime, 3D, realistic) for comprehensive model testing.

Complex Narratives

Features stories with single/multiple protagonists, intricate plots, and detailed world-building to challenge generation accuracy and consistency.

Character-Centric

Focuses on maintaining character consistency across shots, providing detailed descriptions and multiple reference images per character.

Significant Scale

Comprises 80 story segments, 344 unique characters, and 509 reference images, ensuring a robust evaluation base.
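
The counts above imply rough per-story and per-character averages. The short calculation below derives them; it assumes each unique character belongs to a single story, which the figures above do not state, and the function name `scale_summary` is purely illustrative.

```python
def scale_summary(stories: int, characters: int, reference_images: int) -> dict[str, float]:
    """Derive simple averages from the dataset's headline counts."""
    return {
        # Assumes characters are not shared between stories.
        "characters_per_story": round(characters / stories, 2),
        "reference_images_per_character": round(reference_images / characters, 2),
    }


summary = scale_summary(stories=80, characters=344, reference_images=509)
print(summary)
# {'characters_per_story': 4.3, 'reference_images_per_character': 1.48}
```

On average, then, a story involves about four characters, and a character has between one and two reference images.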

News & Updates

[2025] 🏆 Ongoing leaderboard maintenance and evaluation of new story visualization methods.
[2025.08.19] 🛠️ Major code update: Full benchmark implementation released.
[2025.08.12] 📄 arXiv v3 is now available.
[2025.06.25] 📄 arXiv v2 has been published.
[2025.05.30] 📝 Technical report v1 released on arXiv.
[2025.05.21] 🚀 Initial project launch and code release.

About

Citation

If you use ViStoryBench in your research, please cite our paper:

@article{zhuang2025vistorybench,
  title={ViStoryBench: Comprehensive Benchmark Suite for Story Visualization},
  author={Cailin Zhuang and Ailin Huang and Wei Cheng and Jingwei Wu and Yaoqi Hu and Jiaqi Liao and Hongyuan Wang and Xinyao Liao and Weiwei Cai and Hengyuan Xu and Xuanyang Zhang and Xianfang Zeng and Zhewei Huang and Gang Yu and Chi Zhang},
  journal={arXiv preprint arXiv:2505.24862},
  year={2025}
}

License

The benchmark code and results are licensed under the MIT License.