Speech results from "Comic-Guided Speech Synthesis"

Abstract: We introduce a novel approach to synthesize realistic speeches for comics. Using a comic page as input, our approach synthesizes speeches for each comic character in a lively and engaging manner following the reading flow. It adopts a cascading strategy to synthesize a speech in two stages: Comic Visual Analysis and Comic Speech Synthesis. In the first stage, it analyzes the input comic page to identify the gender and age of the comic characters, as well as the texts each character speaks and the corresponding speaking emotion. Guided by such an analysis, our approach synthesizes realistic speeches for each character in the second stage that are consistent with the visual observations. In our experiments, we show that the proposed approach can synthesize realistic and lively speeches for different types of comics. Perceptual studies performed on the synthesis results of multiple example comics validate the efficacy of our approach.
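The two-stage cascade described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the `Utterance` record, the `plan_synthesis` function, the field names, and the speaker labels are all hypothetical, and the second stage here only builds a synthesis plan rather than producing audio.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    """Hypothetical output of stage 1 (Comic Visual Analysis) for one speech balloon."""
    order: int     # position in the reading flow
    speaker: str   # character label (assumed, not from the paper)
    gender: str
    age: str
    emotion: str
    text: str


def plan_synthesis(utterances: List[Utterance]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for stage 2 (Comic Speech Synthesis).

    Returns one request per utterance, in reading order, pairing the text
    with a voice and emotion consistent with the visual analysis.
    """
    ordered = sorted(utterances, key=lambda u: u.order)
    return [
        {"voice": f"{u.gender}-{u.age}", "emotion": u.emotion, "text": u.text}
        for u in ordered
    ]


# Illustrative stage-1 results; the attribute values are made up.
analysis = [
    Utterance(2, "character_b", "male", "child", "disappointed",
              "I knew it. Nothing to hear here."),
    Utterance(1, "character_a", "female", "child", "pleading",
              "Please, please. Say something."),
]
plan = plan_synthesis(analysis)
```

Keeping the analysis result as plain records makes the hand-off between the two stages explicit: whatever recognizes the characters and emotions only needs to fill in these fields.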

Contents

Comic Speech Results

American Hero Comics
Western Fairy Tale Comics
Japanese Detective Manga
Educational Comics
Real-Life Comics
Poster Comics

Particular Common Scenarios in Comics

Different Emotions
Different Degrees of Same Emotion
Combined Emotions

Our Approach vs. Other Methods

Our Approach
Manual Synthesis
Professional Narration

Comic Speech Results

We applied our approach to synthesize speeches for four comic pages, which cover different themes and styles.

American Hero Comics (Infinity Countdown: Captain Marvel #1 (May 2018))

Western Fairy Tale Comics (Briar Rose)

Please refer to https://disney.fandom.com/wiki/Disney_Princess_(comic_book) (Disney Princess - Issue #1 page 6) for the comics.

Japanese Detective Manga (Detective Conan - Chapter 1025 (Dec. 2018))

Note: the reading flow of Japanese manga is from right to left and from top to bottom.
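As a rough illustration of this reading flow, the sketch below orders panels top to bottom and, within each row, right to left. It is a simplified assumption rather than the method used in the paper: the `reading_order` function, the panel dictionaries with `x`/`y` centers, and the `row_tolerance` value are all hypothetical.

```python
def reading_order(panels, right_to_left=True, row_tolerance=50):
    """Order panels by row (top to bottom), then by column within each row.

    `panels` is a list of dicts with hypothetical keys "name", "x", "y"
    giving each panel's center in pixels. Panels whose y values lie within
    `row_tolerance` of the first panel in a row are treated as the same row.
    """
    rows = []
    for panel in sorted(panels, key=lambda p: p["y"]):
        if rows and abs(panel["y"] - rows[-1][0]["y"]) <= row_tolerance:
            rows[-1].append(panel)
        else:
            rows.append([panel])

    ordered = []
    for row in rows:
        # reverse=True puts the largest x (rightmost panel) first.
        ordered.extend(sorted(row, key=lambda p: p["x"], reverse=right_to_left))
    return ordered


# A made-up 2x2 page layout.
page = [
    {"name": "A", "x": 100, "y": 100},
    {"name": "B", "x": 400, "y": 110},
    {"name": "C", "x": 100, "y": 500},
    {"name": "D", "x": 400, "y": 490},
]
manga_flow = [p["name"] for p in reading_order(page, right_to_left=True)]
western_flow = [p["name"] for p in reading_order(page, right_to_left=False)]
```

Real comic pages often have irregular or overlapping panels, so a fixed row tolerance like this would only cover simple grid layouts.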

Educational Comics (The Magic School Bus Rides Again - Season 2, Episode 10 - Tim and the Talking Trees. (Apr. 2018))

Real-Life Comics

Poster Comics

Particular Common Scenarios in Comics

In addition, we demonstrate how our approach can be applied to tackle several common scenarios in comics narration:

(1) Different Emotions

The same speech text can be spoken with different emotions as inferred from the different facial expressions.

Text: How to draw comics when I can't actually draw.

(2) Different Degrees of the Same Emotion

The same speech text can be spoken with different degrees of the same emotion. In the first case, the three characters speak the same speech text with different degrees of anger as inferred from their facial expressions. In the second case, the three characters speak the same speech text with different degrees of sadness as inferred from their facial expressions.

Part 1: Different Comic Characters with Same Speech Text

Text: You really caught our attention.

Text: Why do you want to break up with her?

Part 2: Same Comic Character with Different Speech Texts

Text: I didn't mean it that way.

Text: You see, I have been trying since the very beginning.

Text: I'm sorry for your loss.

(3) Combined Emotions

Speech texts can be spoken with different combined emotions as inferred from the characters’ faces and their speech texts. For example, a speech text can be spoken with the combined emotion states fearful and sad, or fearful and angry.

Text: This does not suppose to be this way.

Text: Are you sure to do that?

Our Approach vs. Other Methods

We compare the comic speeches synthesized by our approach with those produced by other methods. The compared approaches consist of our approach, professional narration, and user generation (i.e., manual synthesis with Tacotron text-to-speech techniques and audio editing tools like CoolEdit). We use the educational comics "The Magic School Bus Rides Again" (© Joanna Cole, Bruce Degen / Netflix Inc.) to conduct this experiment.

		Oh, the forest, where since the beginning of time people have come for peace and tranquility.
		Shh, I'm trying to hear the trees you guys.
		Please, please. Say something.
		I knew it. Nothing to hear here.
		Untrue. I hear animal noises.
		But is that noise or is it communication between animals?
		Noises sound without meaning.
		And communication is about sending a message.