Yang Liu / Shanghai Liulishuo Information Technology Co. Ltd.
Han Liu / Shanghai University
Wei Chu / Shanghai Liulishuo Information Technology Co. Ltd.
Yalei Zhao / Shanghai Jiao Tong University
Abstract: In this paper, a novel face reenactment-based method is proposed to assist the learning of English pronunciation by synthesizing a virtual pronunciation video of a student from a portrait photo of the student and a pre-recorded canonical pronunciation portrait video of a teacher. First, shape models are constructed from the mouth regions of both the teacher and the student through face analysis. Then, sequential changes in the teacher's shape model are learned from the frame sequence of the video. By applying affine-transformed changes from the teacher's shape model, a sequence of mouth shapes for the student is constructed that achieves the standard degree of lip-rounding without losing the student's own mouth characteristics. Delaunay triangulation-based face morphing is then applied to reconstruct the texture of the mouth region. After the teacher's teeth and tongue are in-painted into the student's mouth, Poisson editing and histogram equalization in RGB color space are employed to make the synthesized student face look natural. Finally, a virtual pronunciation video of the student with the canonical audio track from the teacher is generated, letting the student clearly see how his/her own mouth should move during pronunciation. Our controlled subjective testing demonstrates that the proposed method is promising for pronunciation learning.
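As a rough illustration of two of the building blocks named in the abstract, the sketch below shows piece-wise affine warping of a single Delaunay triangle and Poisson (seamless-clone) blending of a mouth patch. This is not the authors' implementation; it assumes OpenCV, and the function names `warp_triangle` and `blend_mouth` are purely illustrative.

```python
# Minimal sketch, assuming OpenCV: warp one Delaunay triangle with an affine
# map, then Poisson-blend a reconstructed mouth patch into the student's face.
import cv2
import numpy as np

def warp_triangle(src_img, dst_img, tri_src, tri_dst):
    """Affine-warp the triangle tri_src in src_img onto tri_dst in dst_img."""
    r_src = cv2.boundingRect(np.float32([tri_src]))
    r_dst = cv2.boundingRect(np.float32([tri_dst]))
    # Triangle coordinates relative to their bounding boxes.
    t_src = [(p[0] - r_src[0], p[1] - r_src[1]) for p in tri_src]
    t_dst = [(p[0] - r_dst[0], p[1] - r_dst[1]) for p in tri_dst]
    patch = src_img[r_src[1]:r_src[1] + r_src[3], r_src[0]:r_src[0] + r_src[2]]
    M = cv2.getAffineTransform(np.float32(t_src), np.float32(t_dst))
    warped = cv2.warpAffine(patch, M, (r_dst[2], r_dst[3]),
                            flags=cv2.INTER_LINEAR,
                            borderMode=cv2.BORDER_REFLECT_101)
    # Mask restricts the copy to the destination triangle only.
    mask = np.zeros((r_dst[3], r_dst[2], 3), dtype=np.float32)
    cv2.fillConvexPoly(mask, np.int32(t_dst), (1.0, 1.0, 1.0), cv2.LINE_AA)
    roi = dst_img[r_dst[1]:r_dst[1] + r_dst[3], r_dst[0]:r_dst[0] + r_dst[2]]
    roi[:] = roi * (1 - mask) + warped * mask

def blend_mouth(student_face, warped_mouth, mouth_mask, center):
    """Poisson-blend the reconstructed mouth patch into the student's face.

    mouth_mask is a binary uint8 mask of the mouth region in warped_mouth,
    and center is the (x, y) location of the patch in student_face.
    """
    return cv2.seamlessClone(warped_mouth, student_face, mouth_mask,
                             center, cv2.NORMAL_CLONE)
```

In a full pipeline, `warp_triangle` would be called for every triangle of the Delaunay mesh over the mouth landmarks before the blended result is written back into each frame of the output video.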