ViStoryBench Dataset Deep Dive

How ViStoryBench Was Built

Defining the Task: Comprehensive Story Visualization

ViStoryBench is built around a comprehensive story visualization task: generate a sequence of images \(I_1, \dots, I_m\) that faithfully represents \(m\) storyboard shot descriptions. The inputs include appearance descriptions \(T_1, T_2, \dots, T_n\) and corresponding reference images \(S_1, S_2, \dots, S_n\) for \(n\) characters \(C_1, C_2, \dots, C_n\), where each character is the pair \(C_i = (T_i, S_i)\).

Each of the \(m\) storyboard shot descriptions (Shot\(_i\)) includes several key components to guide image generation:

Setting Description: Details of the current scene's environment and atmosphere.
Plot Correspondence: The specific segment from the original narrative corresponding to this shot.
Onstage Characters: A list of characters present in the current shot.
Static Shot Description: A description of characters' static actions or positions and object placements, representing a fixed visual state.
Shot Perspective Design: Photography information, including shot scale, type, and camera angle.

The objective is to accurately depict characters, their actions, scene settings, and specified camera perspectives, adhering to all provided information.
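
The task inputs above can be pictured as plain data containers. The following is a minimal Python sketch; the class and field names (`Character`, `Shot`, `StoryTask`) are assumptions chosen for readability and are not taken from the ViStoryBench code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Character:
    """One character C_i = (T_i, S_i)."""
    name: str
    appearance: str              # T_i: textual appearance description
    reference_images: List[str]  # S_i: paths to reference images


@dataclass
class Shot:
    """One storyboard shot description with its five components."""
    setting_description: str
    plot_correspondence: str
    onstage_characters: List[str]
    static_shot_description: str
    shot_perspective_design: str


@dataclass
class StoryTask:
    """Task inputs: n characters and m shots; the output is m images."""
    characters: List[Character] = field(default_factory=list)
    shots: List[Shot] = field(default_factory=list)

    def expected_image_count(self) -> int:
        # One image is generated per shot, so the sequence has m images.
        return len(self.shots)
```

Keeping characters and shots as separate lists mirrors the task statement: the character list is fixed for the whole story, while each shot names only the characters that are on stage.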

Story & Script Curation

To ensure diversity, we manually collected 80 story segments from various sources, including film and television scripts, literary masterpieces, global legends, novels, and picture books. This extensive collection covers a wide range of narrative styles and themes.

Lengthy stories were summarized to a few hundred words each with the help of Large Language Models (LLMs). These summaries were then converted into structured scripts, again with LLM assistance. Each script provides character descriptions and a detailed storyboard for every shot, laying a clear foundation for visual generation. This process is exemplified in Figure 1(a) of the ViStoryBench paper.

Character Reference Imagery

We manually collected reference images from the internet that aligned with the character descriptions. A crucial aspect was maintaining stylistic consistency within each story, ensuring all character images in a single narrative shared the same artistic style (e.g., all anime or all realistic), as shown in Figure 1(b) of the paper.

For a small portion of characters where specific visual traits were needed, reference images were generated using SDXL. This dual strategy resulted in 344 unique characters and 509 reference images in total. Basic statistics about characters and story genres are presented in Figure 1(c) of the ViStoryBench paper.

Visualizing Dataset Composition

Character Category Distribution

Real Human: 190
Virtual Human: 135
Non-human: (count not shown)

Character Gender Distribution

Male: 210
Female: 108
Agender/Non-binary: (count not shown)

Story Genre Distribution (80 Stories)

Bar chart showing story counts per genre.

Dataset Analytics: Shots, Characters & Density

The following scatter plots illustrate the relationship between the number of shots and unique characters per story, categorized by style. The size of each point reflects its "Point Density" value from the dataset, indicating areas of higher data concentration.

Realistic Stories: Shots vs. Unique Characters (Realistic)
Unrealistic Stories: Shots vs. Unique Characters (Unrealistic)
All Stories: Shots vs. Unique Characters (All Combined)
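
The plotted quantities can be derived from the dataset itself. The following is a minimal sketch, assuming the `stories_data` layout described under Detailed Data Structure and assuming that "Point Density" means the number of stories that share the same (shots, unique characters) pair. The page does not define that term, so treat this reading as a guess; the function name `scatter_points` is hypothetical.

```python
from collections import Counter


def scatter_points(stories_data):
    """Return {(num_shots, num_unique_characters): density} over all stories."""
    pairs = []
    for story in stories_data.values():
        shots = story.get("shots", [])
        # Count characters that appear on stage in at least one shot.
        onstage = set()
        for shot in shots:
            onstage.update(shot.get("Onstage Characters", []))
        pairs.append((len(shots), len(onstage)))
    # Assumed meaning of "Point Density": stories landing on the same point.
    return dict(Counter(pairs))
```

Each key of the returned dictionary is one point of the scatter plot, and its value can drive the marker size.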

Detailed Data Structure

The dataset is organized in a JSON-like structure. Below is a representation of the stories_data object, which holds all narrative and character information:


stories_data = {
    "story_id (e.g., 01)": {
        "type": "[e.g., WildStory_en]", // Source or version identifier (e.g., dataset_language)
        "shots": [
            {
                "shot_id": "00", // Unique identifier for the shot within the story
                "index": 1,      // Sequential order of the shot
                // --- Storyboard Information ---
                "Setting Description": "[Detailed description of the scene's environment and atmosphere]",
                "Plot Correspondence": "[The segment from the original story narrative that this shot visualizes]",
                "Onstage Characters": ["[Character Name A]", "[Character Name B]", ...], // List of characters appearing in this shot
                "Static Shot Description": "[A comprehensive description of the visual elements: character poses, expressions, object placements, etc.]",
                "Shot Perspective Design": "[Specifies camera work: e.g., 'Medium shot, eye level', 'High-angle wide shot']"
            },
            // ... additional shots for this story
        ],
        "characters": {
            "character_key (e.g., Little_Brown_Rabbit)": {
                "name": "Little Brown Rabbit",        // Canonical name of the character
                // "name_en": "Little Brown Rabbit",  // Example: English name (if localization is used)
                // "name_zh": "小棕兔",                // Example: Chinese name (if localization is used)
                "key": "Little_Brown_Rabbit",         // Unique key for the character, often matching the name
                "prompt": "[A detailed textual description of the character's appearance, attire, and defining features]",
                "tag": "[e.g., 'protagonist', 'animal', 'child', 'elderly']", // Descriptive tags for categorization
                "num_of_appearance": 16,              // Total number of shots this character appears in within this story
                "ref_images": [                       // Array of filenames for reference images
                    "ref_image_01.jpg",
                    "ref_image_02.png"
                    // These filenames are relative to a path like: /data/datasets/{dataset_type}/{story_id}/image/{character_key}/
                ]
            },
            // ... additional characters in this story
        }
    },
    // ... additional stories in the dataset
}
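
To make the schema concrete, the sketch below builds a one-story instance in Python and checks it for internal consistency. It assumes that entries in "Onstage Characters" match a character's "name" field and that "num_of_appearance" counts the shots listing that name; neither rule is stated explicitly here. The helper `check_story` is hypothetical, and the field values are shortened for illustration.

```python
def check_story(story):
    """Return a list of human-readable problems found in one story entry."""
    problems = []
    names = {c["name"] for c in story["characters"].values()}
    # Every on-stage name should belong to a defined character.
    for shot in story["shots"]:
        for name in shot["Onstage Characters"]:
            if name not in names:
                problems.append(f"shot {shot['shot_id']}: unknown character {name!r}")
    # The stored appearance count should match the shots that list the name.
    for key, char in story["characters"].items():
        seen = sum(char["name"] in s["Onstage Characters"] for s in story["shots"])
        if seen != char["num_of_appearance"]:
            problems.append(
                f"{key}: num_of_appearance={char['num_of_appearance']}, counted {seen}"
            )
    return problems


# Toy story with a single shot and a single character.
story = {
    "type": "WildStory_en",
    "shots": [{
        "shot_id": "00",
        "index": 1,
        "Setting Description": "Nighttime, bedroom",
        "Plot Correspondence": "It's time for the little rabbit to go to bed.",
        "Onstage Characters": ["Little Brown Rabbit"],
        "Static Shot Description": "Little Brown Rabbit sits on a quilt.",
        "Shot Perspective Design": "Medium shot, eye level",
    }],
    "characters": {
        "Little_Brown_Rabbit": {
            "name": "Little Brown Rabbit",
            "key": "Little_Brown_Rabbit",
            "prompt": "A small rabbit with brown fur.",
            "tag": "animal",
            "num_of_appearance": 1,
            "ref_images": ["ref_image_01.jpg"],
        }
    },
}
print(check_story(story))  # an empty list means no inconsistencies were found
```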

Key Points about Structure:

Each story is an entry in stories_data, keyed by its story_id.
The shots array contains the ordered shot descriptions, whose fields map directly to the storyboard components defined in the paper.
The characters object contains the details for each character, including a descriptive prompt and the filenames of its reference images.
Reference image filenames are typically relative; they are resolved against a directory derived from the dataset type, story ID, and character key.
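
The path resolution in the last point can be sketched as follows. This assumes the directory template quoted in the structure comment, `/data/datasets/{dataset_type}/{story_id}/image/{character_key}/`, and assumes that `{dataset_type}` corresponds to the story's "type" field. The function name `resolve_ref_images` and its `root` parameter are hypothetical, and the actual loader may differ.

```python
from pathlib import PurePosixPath


def resolve_ref_images(root, dataset_type, story_id, character):
    """Build full paths for a character's reference images."""
    # Assumed layout: {root}/{dataset_type}/{story_id}/image/{character_key}/
    base = PurePosixPath(root) / dataset_type / story_id / "image" / character["key"]
    return [str(base / filename) for filename in character["ref_images"]]


character = {"key": "Little_Brown_Rabbit", "ref_images": ["ref_image_01.jpg"]}
paths = resolve_ref_images("/data/datasets", "WildStory_en", "01", character)
# paths[0] == "/data/datasets/WildStory_en/01/image/Little_Brown_Rabbit/ref_image_01.jpg"
```

Passing the root as a parameter keeps the sketch usable when the dataset is stored somewhere other than `/data/datasets`.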

Illustrative Data Examples

Example: Storyboard Shot

(Story 01, Shot 00 - Little Brown Rabbit)

Setting Description: Nighttime, bedroom, cozy atmosphere, soft moonlight filtering through a window.

Plot Correspondence: It's time for the little rabbit to go to bed, but he tightly holds onto the big rabbit's very long ears, wanting to ensure Big Brown Rabbit is listening.

Onstage Characters: Little Brown Rabbit, Big Brown Rabbit.

Static Shot Description: The Little Brown Rabbit, looking sleepy but mischievous, is sitting on a patchwork quilt bed. He has a firm grasp on Big Brown Rabbit's ears. Big Brown Rabbit looks down with a patient, loving expression.

Shot Perspective Design: Medium shot, eye level with Little Brown Rabbit, capturing the interaction and expressions of both characters.
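
One way a generation pipeline might consume such a shot is to flatten its fields into a single text prompt. The sketch below is an illustration only: this page does not prescribe a prompt format, `build_prompt` is a hypothetical helper, and the `character_prompts` mapping (character name to appearance description) is an assumed input.

```python
def build_prompt(shot, character_prompts):
    """Join a shot's storyboard fields and its characters' descriptions."""
    parts = [
        f"Setting: {shot['Setting Description']}",
        f"Scene: {shot['Static Shot Description']}",
        f"Camera: {shot['Shot Perspective Design']}",
    ]
    for name in shot["Onstage Characters"]:
        # Skip characters without a stored appearance description.
        if name in character_prompts:
            parts.append(f"{name}: {character_prompts[name]}")
    return "\n".join(parts)
```

Plot Correspondence is left out of this sketch because it quotes the source narrative rather than describing what is visible; whether a given method uses it is a design choice.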

Example: Character Profile

(Little Brown Rabbit from Story 01)

Name: Little Brown Rabbit

Key: Little_Brown_Rabbit

Prompt: "An adorable small rabbit with soft, chestnut-brown fur, large expressive hazel eyes, and a twitchy pink nose. Wears tiny blue pajamas. Appears curious and a little cheeky."

Number of Appearances: 16

Reference Images:

(Actual reference images for the character would be shown here)