Speech results from "Comic-Guided Speech Synthesis"

Abstract: We introduce a novel approach to synthesize realistic speeches for comics. Using a comic page as input, our approach synthesizes speeches for each comic character in a lively and engaging manner following the reading flow. It adopts a cascading strategy to synthesize a speech in two stages: Comic Visual Analysis and Comic Speech Synthesis. In the first stage, it analyzes the input comic page to identify the gender and age of the comic characters, as well as the texts each character speaks and the corresponding speaking emotion. Guided by such an analysis, our approach synthesizes realistic speeches for each character in the second stage that are consistent with the visual observations. In our experiments, we show that the proposed approach can synthesize realistic and lively speeches for different types of comics. Perceptual studies performed on the synthesis results of multiple example comics validate the efficacy of our approach.
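
The two-stage cascade described in the abstract can be pictured as a simple data flow. The sketch below is only an illustration and not the authors' implementation: the `Utterance` fields, the `analyze_page` and `build_synthesis_requests` functions, and the gender, age, and emotion values in the sample annotations are all hypothetical. Stage one is replaced by hand-written annotations, and stage two only builds a request per utterance rather than producing audio.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    """Hypothetical stage-one output for one speech balloon."""
    speaker: str
    gender: str
    age_group: str
    text: str
    emotion: str
    order: int  # position of the balloon in the reading flow


def analyze_page(annotations: List[Dict]) -> List[Utterance]:
    """Stand-in for Comic Visual Analysis: wraps hand-made annotations
    and returns them sorted by reading order."""
    utterances = [Utterance(**a) for a in annotations]
    return sorted(utterances, key=lambda u: u.order)


def build_synthesis_requests(utterances: List[Utterance]) -> List[Dict]:
    """Stand-in for Comic Speech Synthesis: emits one request per
    utterance instead of synthesizing audio."""
    return [
        {"text": u.text, "voice": f"{u.gender}-{u.age_group}", "emotion": u.emotion}
        for u in utterances
    ]


# Made-up annotations; the attribute values are assumptions for illustration.
annotations = [
    {"speaker": "B", "gender": "male", "age_group": "child",
     "text": "I knew it. Nothing to hear here.", "emotion": "neutral", "order": 2},
    {"speaker": "A", "gender": "female", "age_group": "child",
     "text": "Please, please. Say something.", "emotion": "sad", "order": 1},
]

requests = build_synthesis_requests(analyze_page(annotations))
for r in requests:
    print(r)
```

Keeping the analysis result as plain records makes the hand-off between the two stages explicit: whatever the visual analysis infers is the only information the synthesis stage sees.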


Comic Speech Results

We applied our approach to synthesize speech for four comic pages covering different themes and styles.

American Hero Comics (Infinity Countdown: Captain Marvel #1 (May 2018))

© Jim McCann, Diego Olortegui, In-Hyuk Lee / Marvel Comics Inc.





Western Fairy Tale Comics (Briar Rose)


© Geoffrey Golden, Georgia Ball, Amy Mebberson / Disney Inc.

Please refer to https://disney.fandom.com/wiki/Disney_Princess_(comic_book) (Disney Princess - Issue #1 page 6) for the comics.


Japanese Detective Manga (Detective Conan - Chapter 1025 (Dec. 2018))

Note: the reading flow of Japanese manga is from right to left and from top to bottom.

© Gosho Aoyama / VIZ Media LLC (English)
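
The right-to-left, top-to-bottom reading flow noted above can be sketched as a sort over balloon positions. This is a minimal illustration under simplifying assumptions, not the method used in the paper: the `Balloon` class, the `manga_reading_order` function, and the fixed `row_height` banding are hypothetical, and real pages would need panel detection rather than fixed-height rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Balloon:
    text: str
    x: float  # left edge of the balloon's bounding box, in pixels
    y: float  # top edge of the balloon's bounding box, in pixels


def manga_reading_order(balloons: List[Balloon], row_height: float = 100.0) -> List[Balloon]:
    """Order balloons top to bottom by row band, then right to left
    within each band (assumed fixed-height rows)."""
    def key(b: Balloon):
        row = int(b.y // row_height)  # which horizontal band the balloon falls in
        return (row, -b.x)            # larger x (further right) is read first
    return sorted(balloons, key=key)


# Made-up balloon positions, labeled with their expected reading position.
page = [
    Balloon("C", x=50, y=20),
    Balloon("A", x=400, y=30),
    Balloon("D", x=380, y=150),
    Balloon("B", x=220, y=10),
]
ordered = [b.text for b in manga_reading_order(page)]
print(ordered)  # ['A', 'B', 'C', 'D']
```

For comics read left to right, the same sketch would apply with `b.x` in place of `-b.x` in the sort key.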

Educational Comics (The Magic School Bus Rides Again - Season 2, Episode 10 - Tim and the Talking Trees (Apr. 2018))


© Joanna Cole, Bruce Degen / Netflix Inc.


Real-Life Comics

© Cassandra Calin



Poster Comics





Particular Common Scenarios in Comics

In addition, we demonstrate how our approach can be applied to tackle several common scenarios in comics narration:

(1) Different Emotions

The same speech text can be spoken with different emotions as inferred from the different facial expressions.

Text: How to draw comics when I can't actually draw.

(2) Different Degrees of the Same Emotion

The same speech text can be spoken with different degrees of the same emotion. In the first case, the three characters speak the same speech text with different degrees of anger, as inferred from their facial expressions. In the second case, the three characters speak the same speech text with different degrees of sadness, as inferred from their facial expressions.

Part 1: Different Comic Characters with Same Speech Text

Text: You really caught our attention.

Text: Why do you want to break up with her?

Part 2: Same Comic Character with Different Speech Texts

Text: I didn't mean it that way.
Text: You see, I have been trying since the very beginning.
Text: I'm sorry for your loss.

(3) Combined Emotions

Speech texts can be spoken with different combined emotions as inferred from the characters’ faces and their speech texts. For example, a speech text can be spoken with the emotion states fearful and sad, or fearful and angry.

Text: This does not suppose to be this way.
Text: Are you sure to do that?



Our Approach vs. Other Methods

We compare the comic speech synthesized by our approach with professional narration and user generation (i.e., speech produced with Tacotron text-to-speech techniques and audio editing tools like CoolEdit). We use the educational comic "The Magic School Bus Rides Again" (© Joanna Cole, Bruce Degen / Netflix Inc.) to conduct this experiment.

Oh, the forest, where since the beginning of time people have come for peace and tranquility.

Shh, I'm trying to hear the trees you guys.

Please, please. Say something.

I knew it. Nothing to hear here.

Untrue. I hear animal noises.

But is that noise or is it communication between animals?

Noises sound without meaning.

And communication is about sending a message.