Storytelling Benchmark Overview & Leaderboard

The ViStoryBench Dataset

A meticulously curated collection designed to advance story visualization research, featuring diverse narratives, artistic styles, and complex character interactions.

Core Mission

ViStoryBench aims to provide a comprehensive and challenging benchmark to rigorously evaluate and compare story visualization models. It focuses on real-world complexities such as character consistency, narrative coherence, and stylistic integrity across a sequence of images.

Leaderboard Rankings

Task Definition for Story Visualization

The primary goal is to generate a sequence of images \(I_1, I_2, ..., I_m\) that faithfully and consistently visualizes a given story, which is specified by character information and a series of storyboard shot descriptions.

Given Inputs:

  1. Character Information (\(C_1, ..., C_n\)): Each of the \(n\) characters is specified by:
    • Textual appearance description (\(T_i\)).
    • Corresponding reference image(s) (\(S_i\)) consistent with \(T_i\).
  2. Storyboard Shots (\(Shot_1, ..., Shot_m\)): Each of the \(m\) shots is specified by a textual description covering:
    • Setting Description: The environment and context of the current scene.
    • Plot Correspondence: The segment of the original narrative relevant to this shot.
    • Onstage Characters: A list of characters present and active in this shot.
    • Static Shot Description: Detailed depiction of characters' static actions, expressions, positions, and significant objects within the frame.
    • Shot Perspective Design: Cinematographic details like shot scale (e.g., close-up, wide shot), camera angle, and shot type.
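The inputs listed above can be pictured as a small set of record types. The following is a minimal sketch only: the class names, field names, and the helper `unknown_onstage_names` are hypothetical and chosen for illustration; they are not the benchmark's actual data format or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Character:
    """One character C_i (hypothetical layout, not the official schema)."""
    name: str
    appearance: str  # textual appearance description T_i
    reference_images: List[str] = field(default_factory=list)  # paths for S_i


@dataclass
class Shot:
    """One storyboard shot Shot_j with the five described fields."""
    setting: str                   # Setting Description
    plot: str                      # Plot Correspondence
    onstage_characters: List[str]  # Onstage Characters (by name)
    static_description: str        # Static Shot Description
    perspective: str               # Shot Perspective Design


@dataclass
class Story:
    characters: List[Character]  # C_1 ... C_n
    shots: List[Shot]            # Shot_1 ... Shot_m

    def unknown_onstage_names(self) -> List[str]:
        """Return onstage names that have no matching character entry."""
        known = {c.name for c in self.characters}
        missing = {
            name
            for shot in self.shots
            for name in shot.onstage_characters
            if name not in known
        }
        return sorted(missing)
```

A check such as `unknown_onstage_names` is one way to catch a shot that lists a character without an appearance description or reference image before any image is generated.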

Objective:

To generate an image sequence \(I_1, ..., I_m\) where each image \(I_j\) accurately and aesthetically portrays the specified \(Shot_j\). This includes:

  • Faithful depiction of characters according to their references and descriptions.
  • Correct representation of character actions, interactions, and expressions.
  • Accurate portrayal of the scene setting and atmosphere.
  • Adherence to the specified camera perspectives and shot compositions.
  • Visual and stylistic consistency across the image sequence.
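The objective implies a simple contract: one image per shot, in shot order. The sketch below illustrates that contract under stated assumptions. The function `visualize_story` and the callback `render_shot` are hypothetical names, and passing previously generated images as context is only one possible way a model might pursue cross-shot consistency, not something the benchmark prescribes.

```python
from typing import Callable, List, Sequence


def visualize_story(
    shots: Sequence[str],
    render_shot: Callable[[str, List[str]], str],
) -> List[str]:
    """Produce one image per shot description, in order.

    `render_shot` is a stand-in for any generation model (assumption):
    it receives the current shot description and the images produced so
    far, and returns an identifier (e.g. a file path) for the new image.
    """
    images: List[str] = []
    for shot in shots:
        # Pass a copy so the callback cannot mutate the running sequence.
        images.append(render_shot(shot, list(images)))
    # The task requires exactly m images for m shots.
    if len(images) != len(shots):
        raise ValueError("expected one image per shot")
    return images
```

For example, a dummy callback such as `lambda shot, prev: f"{len(prev)}:{shot}"` lets the loop be exercised without any real model.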

Dataset Core Characteristics

Rich Diversity

Encompasses varied story types (comedy, horror, fantasy) and artistic styles (anime, 3D, realistic) for comprehensive model testing.

Complex Narratives

Features stories with single/multiple protagonists, intricate plots, and detailed world-building to challenge generation accuracy and consistency.

Character-Centric

Focuses on maintaining character consistency across shots, providing detailed descriptions and multiple reference images per character.

Significant Scale

Comprises 80 story segments, 344 unique characters, and 509 reference images, providing a broad base for evaluation.

News & Updates

  • [Date]: [Placeholder: Benchmark v1.0 launched!]
  • [Date]: [Placeholder: Added new model 'XYZ' to the evaluation.]

Resources

About

Citation

[Placeholder: If you use this benchmark in your research, please cite it as follows:]

[Placeholder: BibTeX entry]

License

[Placeholder: The benchmark code and results are licensed under the [License Name, e.g., Apache 2.0] License. The dataset might have its own license.]