Long short-term memory (LSTM) is a specific recurrent neural network (RNN) architecture that is designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we propose to use deep bidirectional LSTM (BLSTM) for audio/visual modeling in our photo-real talking head system. An audio/visual database of a subject’s talking is firstly reco