Toward Development of Spoken Dialogue System Based on Deep Learning and Non-Extensive Statistics

1.Abstract

The purpose of this research is to develop automatic speech recognition (ASR) and
natural language processing (NLP) based on deep learning (DL) and non-extensive
statistics (NE). DL has become the state of the art for many pattern recognition
problems, especially for natural signals such as speech, language, and video.
However, some fundamental problems remain for ASR and DL, among them the high
computational load of training and the problem of local minima. In this study, we
investigate the use of NE formulations to address some of the problems of DL-based
ASR and NLP. We expect this study to help launch the development of spoken dialogue
systems in the future.

2.Keywords
Automatic speech recognition, natural language processing, deep learning, non-extensive statistics
3.Objective

The objective of this study is to develop ASR and NLP systems for large-vocabulary,
spontaneous speech that are robust against environmental distortions. We aim to test
our hypothesis that implementing NE frameworks in feature extraction for deep neural
network (DNN) systems can improve the performance of current ASR and NLP systems. We
expect this study to lay the groundwork for developing spoken dialogue systems (SDS)
in the future.

4.Methodology

To achieve our objective, we first derive a new model for speech within NE frameworks,
assuming that speech exhibits complexity. Various formulations of NE frameworks will be
studied, including the use of q-log and q-exp for speech dynamics, the use of the
q-Gaussian to model the speech distribution, and q-operators to express various relations
among speech components. A feature extraction method based on this model is then developed.
The parameter q is known to be influential in many NE frameworks. Therefore, two aspects
will be investigated: (1) which values of q achieve the best performance, and (2) the
relation between q and phoneme classes. We plan to evaluate the proposed method on
large-vocabulary tasks using the LibriSpeech corpus and the AMI corpus, both of which are
freely available for research purposes. Extending the evaluation to more difficult tasks
is also a possibility, in order to assess how well the features generalize across tasks.
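
For readers unfamiliar with the q-log, q-exp, and q-Gaussian mentioned above, the following is a minimal sketch of their standard definitions from non-extensive (Tsallis) statistics, written with NumPy. The function names, the `beta` width parameter, and the example values of q are illustrative assumptions, not part of this proposal; the q-Gaussian is left unnormalized, and the proposal does not specify how these functions would enter the feature extraction pipeline.

```python
import numpy as np


def q_log(x, q):
    """q-logarithm: (x**(1-q) - 1) / (1-q), defined for x > 0.

    Reduces to the natural logarithm as q approaches 1.
    """
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_exp(x, q):
    """q-exponential: [1 + (1-q) x]_+ ** (1 / (1-q)).

    Inverse of q_log on its valid domain; reduces to exp(x) as q approaches 1.
    The bracket is clipped at zero (the usual cut-off condition).
    """
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.exp(x)
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)
    return base ** (1.0 / (1.0 - q))


def q_gaussian_unnormalized(x, q, beta=1.0):
    """Unnormalized q-Gaussian shape: q_exp(-beta * x**2, q).

    `beta` (assumed name) controls the width. For q > 1 the tails are
    heavier than a Gaussian; for q = 1 this is exp(-beta * x**2).
    """
    x = np.asarray(x, dtype=float)
    return q_exp(-beta * x ** 2, q)


if __name__ == "__main__":
    # Illustrative values only: sweep a few q values over a small grid.
    grid = np.linspace(0.5, 4.0, 4)
    for q in (0.8, 1.0, 1.5):
        print(q, q_log(grid, q), q_gaussian_unnormalized(grid, q))
```

Sweeping q as in the last lines mirrors the first question raised in the methodology (which values of q perform best); q = 1 recovers the ordinary logarithm, exponential, and Gaussian, which gives a natural baseline for comparison.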

5.Team

1. Hilman F. Pardede (Group Leader). Researcher at Research Center for Informatics, Indonesian Institute of Sciences
2. Driszal Fryantoni. Researcher at Research Center for Informatics, Indonesian Institute of Sciences
3. Iftitahu Ni'mah. Researcher at Research Center for Informatics, Indonesian Institute of Sciences
4. Raden S. Yuwana. Researcher at Research Center for Informatics, Indonesian Institute of Sciences
5. Vicky Zilvan. Researcher at Technical Management Unit for Signal and Navigation Development, Indonesian Institute of Sciences
6. Achmad F. Abka. Research assistant at Research Center for Informatics, Indonesian Institute of Sciences
7. Asri R. Yuliani. Research assistant at Research Center for Informatics, Indonesian Institute of Sciences

6.Computation plan (required processor core hours, data storage, software, etc.)

Required software:
Kaldi toolkit http://kaldi.sourceforge.net/about.html

Data storage: 10 TB

 

7.Source of funding
Independent research
8.Target/outputs
1 International journal and 2 proceedings.
9.Date of usage
07/01/2016 - 31/12/2019
10.GPU usage
Use GPU
11.Supporting files
prop_1450774645.pdf
12.Created at
22/12/2015
13.Approval status
approved