Code-switching speech synthesis for Mandarin-English using FastSpeech2: A unified IPA-based approach

Author: Wang Yinqiu

Overview:

This research explores two main methods for synthesizing natural-sounding Mandarin-English code-switched speech with the FastSpeech2 model.

Experimental Setup

For Method 1, three sets of experiments were conducted (Groups A, B, and C).

For the unified IPA-based approach (Method 2, Group D), both Mandarin and English inputs were represented as phonological features derived from the International Phonetic Alphabet (IPA). This method was implemented with the IMS-Toucan repository by fine-tuning its pre-trained model on only 500 high-quality mixed Mandarin-English sentences.
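To illustrate what a unified IPA-based input means, here is a minimal Python sketch. It assumes a small, hypothetical articulatory feature table: `FEATURES`, `IPA_FEATURES`, `phoneme_to_vector`, and `encode` are illustrative names, not IMS-Toucan's API, and its real feature inventory is far richer. Because features are keyed by IPA symbol rather than by language, a phoneme shared by Mandarin and English receives the same vector, and a code-switched sentence becomes one sequence in one feature space. The transcriptions are hand-written broad IPA; grapheme-to-phoneme conversion and Mandarin tones are left out.

```python
from typing import Dict, FrozenSet, List

# Hypothetical, heavily simplified articulatory feature inventory.
# Illustrative only: real IPA-based frontends use many more features.
FEATURES: List[str] = [
    "consonant", "vowel", "voiced", "nasal", "stop", "fricative",
    "labial", "alveolar", "velar", "high", "low", "front", "back",
    "rounded", "lax",
]

# Simplified features for a few IPA symbols. The table is keyed by IPA symbol,
# not by language, so Mandarin and English phonemes share one feature space.
IPA_FEATURES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"consonant", "stop", "labial"}),
    "t": frozenset({"consonant", "stop", "alveolar"}),
    "k": frozenset({"consonant", "stop", "velar"}),
    "n": frozenset({"consonant", "voiced", "nasal", "alveolar"}),
    "s": frozenset({"consonant", "fricative", "alveolar"}),
    "i": frozenset({"vowel", "voiced", "high", "front"}),
    "ɪ": frozenset({"vowel", "voiced", "high", "front", "lax"}),
    "ɑ": frozenset({"vowel", "voiced", "low", "back"}),
    "u": frozenset({"vowel", "voiced", "high", "back", "rounded"}),
}


def phoneme_to_vector(phoneme: str) -> List[int]:
    """Encode one IPA symbol as a binary vector aligned with FEATURES."""
    if phoneme not in IPA_FEATURES:
        raise ValueError(f"no features defined for IPA symbol {phoneme!r}")
    active = IPA_FEATURES[phoneme]
    return [1 if name in active else 0 for name in FEATURES]


def encode(phonemes: List[str]) -> List[List[int]]:
    """Encode an IPA sequence; the same code path serves both languages."""
    return [phoneme_to_vector(p) for p in phonemes]


# Hand-written broad IPA transcriptions (no G2P step; Mandarin tone omitted).
english_topic = ["t", "ɑ", "p", "ɪ", "k"]  # English "topic" (American)
mandarin_di = ["t", "i"]                   # Mandarin pinyin "di"

english = encode(english_topic)
mandarin = encode(mandarin_di)
print(english[0] == mandarin[0])      # True: /t/ gets one vector in both languages
print(len(english), len(english[0]))  # 5 15
```

In this sketch, similar sounds such as /i/ and /ɪ/ differ in a single feature instead of being unrelated symbol IDs, which is one way a shared feature space can let the two languages share what the model learns.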

Audio Samples
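Groups A, B, C, and D each synthesized the same three code-switched test sentences:
1. Tiktok是最近非常热门的一款APP。 ("TikTok is a very popular app these days.")
2. 我对这个topic很感兴趣。 ("I am very interested in this topic.")
3. 没关系,也许之后能有更好的chance。 ("Never mind, maybe there will be a better chance later.")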
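The text front end of each method is not described here, but any system synthesizing sentences like these has to decide which spans are Mandarin and which are English before phonemizing them. Below is a minimal Python sketch of that step. It assumes simple script-based segmentation (CJK ideographs versus Latin letters), and `split_code_switched` is a hypothetical helper. Punctuation is simply dropped.

```python
import re
from typing import List, Tuple

# Assumption: Mandarin spans are runs of CJK Unified Ideographs and English
# spans are runs of Latin letters (apostrophes allowed, e.g. "don't").
_SPAN = re.compile(r"(?P<zh>[\u4e00-\u9fff]+)|(?P<en>[A-Za-z']+)")


def split_code_switched(text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: split mixed text into (language, span) pairs.

    Punctuation, digits and whitespace match neither group and are dropped.
    """
    # lastgroup names whichever alternative ("zh" or "en") matched.
    return [(m.lastgroup, m.group()) for m in _SPAN.finditer(text)]


for sentence in [
    "Tiktok是最近非常热门的一款APP。",
    "我对这个topic很感兴趣。",
    "没关系,也许之后能有更好的chance。",
]:
    print(split_code_switched(sentence))
# The second sentence yields [('zh', '我对这个'), ('en', 'topic'), ('zh', '很感兴趣')]
```

Script-based splitting is a simplification: it cannot recognize romanized Mandarin or English words written in Chinese characters, so a practical front end would need more than character ranges.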

Relevance:

Successful development of code-switching text-to-speech (TTS) systems can facilitate communication across languages, with applications in education, media, and assistive technologies that improve accessibility in multilingual societies.