548 / 2019-03-14 18:01:31
MSAM: A Multi-Layer Bi-LSTM Based Speech to Vector Model with Residual Attention Mechanism
recurrent neural network, speech to vector, multi-LSTM layers, residual attention mechanism
Final version
Dongdong Cui / Tsinghua University
Shouyi Yin / Tsinghua University
Jiangyuan Gu / Tsinghua University
Leibo Liu / Tsinghua University
Shaojun Wei / Tsinghua University
Word embedding is one of the most popular representations of a document's vocabulary. It is capable of capturing the context and the semantic and syntactic similarity of words in a document. Word2vec is a well-known technique that learns word embeddings of fixed dimensionality using shallow neural networks, and it can also be used to transform the audio segment of each word into a vector. In this paper, a deep neural network based speech to vector model is proposed to learn the vector directly from the speech segment, such that the vector represents some semantic information. Unlike previous methods such as speech2vec, our proposed model adopts a high-performance parser based on a residual attention mechanism, which uses a multi-layer bi-directional long short-term memory (LSTM) network to learn representations of the audio segment. Finally, our proposed speech to vector model is analyzed and evaluated on 12 public datasets that are widely used in word similarity and word analogy benchmarks.
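The abstract describes pooling variable-length audio-frame representations (e.g. the outputs of a Bi-LSTM encoder) into a fixed-size word vector via an attention mechanism with a residual connection. The paper's actual architecture is not given here, so the following is only a minimal NumPy sketch of the general idea: additive attention weights pool the frame features, and a mean-pooled residual path is added on top. All dimensions, parameter names, and the choice of residual are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_pool(frames, w):
    """Pool variable-length frame features (T, d) into one (d,) vector.

    frames: per-frame encoder outputs, shape (T, d) -- e.g. Bi-LSTM states
    w:      attention parameter vector, shape (d,) -- assumed, for the sketch
    """
    scores = frames @ w                # (T,) one relevance score per frame
    alpha = softmax(scores)            # attention weights, sum to 1
    attended = alpha @ frames          # (d,) attention-weighted sum of frames
    residual = frames.mean(axis=0)     # (d,) plain average as a residual path
    return attended + residual         # residual attention output

# toy usage: a 7-frame "utterance" with 4-dim frame features
rng = np.random.default_rng(0)
frames = rng.normal(size=(7, 4))
w = rng.normal(size=4)
vec = attentive_pool(frames, w)
print(vec.shape)  # fixed-size vector regardless of utterance length
```

Because the output size depends only on the feature dimension, utterances of any duration map to vectors of the same dimensionality, which is what allows the resulting speech vectors to be evaluated on word-similarity and word-analogy benchmarks.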
Important Dates
  • Conference dates: June 12 – June 14, 2019
  • June 12, 2019: first-draft submission deadline
  • June 14, 2019: registration deadline

Organizer
Xi'an University of Technology