Multimedia Communication with Virtual Humans

Kshirsagar, S. and Magnenat-Thalmann, N.

Abstract: A virtual environment is typically inhabited by a clone or an avatar representing a real person, and by a virtual, autonomous actor. In order to have a multimodal interaction between these entities, we must consider the mimicking aspect of the clone as well as the autonomy of the virtual actor. The mimicking can be considered for speech, facial expressions and body gestures. In this paper, we concentrate on speech and facial expressions as output media for the avatar as well as for the autonomous actor. Speech and text are used as input media for a dialogue with the autonomous virtual human. Lip synchronization is an important aspect of speech animation for virtual humans. This paper describes a Linear Predictive (LP) analysis method to extract lip synchronization information from the speech signal and apply it to a synthetic MPEG-4 compatible 3D face. We use neural networks to classify the acoustic characteristics (coefficients obtained from LP analysis) in order to recognize phonemes/visemes. In addition, we use the energy variation to generate satisfactory co-articulation for the speech animation. In this paper, we also discuss the implementation of an emotionally autonomous actor and the possibility of dialogue with it. We use MPEG-4 compatible synthetic faces for representing virtual humans or avatars. Though we are currently using text input for this dialogue, it can be extended to speech input with the help of powerful speech recognition software.
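The abstract's LP-analysis step can be illustrated with a minimal sketch: per speech frame, compute the autocorrelation sequence and run the Levinson-Durbin recursion to obtain LP coefficients (the acoustic features that the paper's neural network would classify into phonemes/visemes), along with the residual prediction energy. This is a generic textbook formulation, not the authors' implementation; the function name and the AR test signal below are illustrative assumptions.

```python
def lp_coefficients(frame, order=10):
    """Return LP polynomial coefficients a[1..order] (with implicit a[0] = 1,
    convention s[n] + sum_j a[j] s[n-j] = residual) and the residual energy."""
    n = len(frame)
    # Autocorrelation sequence r[0..order]
    r = [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    e = r[0]  # prediction error energy, shrinks at each recursion step
    for i in range(1, order + 1):
        # Reflection coefficient for order i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e
        # Update coefficients from order i-1 to order i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        e *= (1.0 - k * k)
    return a[1:], e
```

Applied to a synthetic AR signal, the recursion recovers the generating coefficients; in the pipeline the abstract describes, vectors of such coefficients (one per windowed frame) would be fed to a phoneme/viseme classifier, and the frame energy would drive co-articulation.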

@inproceedings{kshirsagar2000multimedia,
  author    = {Kshirsagar, S. and Magnenat-Thalmann, N.},
  title     = {Multimedia Communication with Virtual Humans},
  booktitle = {Proceedings of Euromedia 2000},
  publisher = {Society for Computer Simulation},
  month     = may,
  year      = {2000},
  topic     = {Facial Animation}
}