Visyllable Based Speech Animation

Kshirsagar, S. and Magnenat-Thalmann, N.

Abstract: Visemes are the visual counterparts of phonemes. Traditionally, speech animation of 3D synthetic faces involves extracting visemes from input speech and then applying co-articulation rules to generate realistic animation. In this paper, we take a novel approach to speech animation: using visyllables, the visual counterparts of syllables. The approach results in a concatenative visyllable-based speech animation system. The key contributions of this paper lie in two main areas. Firstly, we define a set of visyllable units for spoken English along with the associated phonological rules for valid syllables. Based on these rules, we have implemented a syllabification algorithm that segments a given phoneme stream into syllables and subsequently into visyllables. Secondly, we have recorded a database of visyllables using a facial motion capture system. The recorded visyllable units are post-processed semi-automatically to ensure continuity at the vowel boundaries of the visyllables. We define each visyllable in terms of Facial Movement Parameters (FMPs), which are obtained from a statistical analysis of the facial motion capture data. The FMPs allow a compact representation of the visyllables and also facilitate the formulation of rules for boundary matching and smoothing after the visyllable units are concatenated. Ours is the first visyllable-based speech animation system. The proposed technique is easy to implement, effective for real-time as well as non-real-time applications, and results in realistic speech animation.
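
The syllabification step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's phonological rules, so everything below is an assumption: the ARPAbet-style phoneme symbols, the small `CLUSTERS` table of permitted onsets, and the use of the common maximal-onset heuristic (consonants between two vowels are attached to the following syllable as long as they form a legal onset). It only shows how a phoneme stream might be segmented into syllables; mapping syllables to visyllables is not covered.

```python
# Illustrative only: a simplified, hypothetical rule set, not the one defined in the paper.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

# Assumed (incomplete) table of legal multi-consonant English onsets.
CLUSTERS = {
    ("P", "L"), ("P", "R"), ("B", "L"), ("B", "R"), ("T", "R"), ("D", "R"),
    ("K", "L"), ("K", "R"), ("G", "L"), ("G", "R"), ("F", "L"), ("F", "R"),
    ("S", "P"), ("S", "T"), ("S", "K"), ("S", "L"), ("S", "M"), ("S", "N"),
    ("S", "W"), ("S", "T", "R"), ("S", "P", "R"), ("S", "K", "R"),
}


def is_legal_onset(cluster):
    """Return True if the consonant sequence may start a syllable."""
    if len(cluster) == 0:
        return True
    if len(cluster) == 1:
        return cluster[0] != "NG"  # "NG" never starts an English syllable
    return tuple(cluster) in CLUSTERS


def syllabify(phonemes):
    """Split a list of phonemes into syllables using the maximal-onset heuristic."""
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not nuclei:
        return [list(phonemes)] if phonemes else []

    syllables = []
    start = 0
    for cur, nxt in zip(nuclei, nuclei[1:]):
        between = phonemes[cur + 1:nxt]
        # Give the next syllable the longest legal onset; the rest stays as coda.
        split = len(between)
        for k in range(len(between) + 1):
            if is_legal_onset(between[k:]):
                split = k
                break
        boundary = cur + 1 + split
        syllables.append(list(phonemes[start:boundary]))
        start = boundary
    syllables.append(list(phonemes[start:]))
    return syllables
```

For example, under these assumed rules `syllabify(["EH", "K", "S", "T", "R", "AH"])` ("extra") splits the stream as `EH K | S T R AH`, because `S T R` is the longest consonant run that is listed as a legal onset.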

  journal = {Computer Graphics Forum (Proc. Eurographics '03)},
  author = {Kshirsagar, S. and Magnenat-Thalmann, N.},
  title = {Visyllable Based Speech Animation},
  publisher = {Blackwell Publishing},
  volume = {22},
  number = {3},
  pages = {632--640},
  month = sep,
  year = {2003},
  topic = {Facial Animation}