Abstract
Developing deep neural models for continuous recognition of sign gestures and for generating sign videos from spoken sentences remains challenging and has received little attention in earlier studies. Although recent approaches offer plausible solutions for these tasks, they still struggle with continuous sentences and with visual quality. Recent advances in deep learning techniques have set new milestones in handling such complex tasks and producing impressive results. This paper proposes a novel deep neural framework for recognizing multilingual and multimodal sign gestures. In addition, the proposed model generates sign gesture videos from spoken sentences. In the first fold, it handles sign gesture recognition using a hybrid CNN-LSTM algorithm. In the second fold, it uses hybrid NMT-GAN techniques to produce high-quality sign gesture videos. The proposed model has been evaluated using different quality metrics, and its performance has also been compared qualitatively on different benchmark sign language datasets. The proposed model achieves 98% classification accuracy in sign language recognition and improved video quality in video generation.
Keywords: Convolutional Neural Network, Feature Extraction, Generative Adversarial Networks, Long Short-Term Memory, Machine Translation, Recognition, Sign Language, Subunit Modeling, Translation, Video Generation.
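To make the two-fold design concrete, the sketch below illustrates the first fold: a hybrid CNN-LSTM recognizer in which a per-frame CNN extracts spatial features and an LSTM models the temporal sequence before classification. This is a minimal illustrative assumption, not the paper's exact configuration; the layer sizes, class count, and module names are hypothetical.

```python
# Minimal sketch of a hybrid CNN-LSTM sign gesture recognizer (assumed
# structure, not the authors' exact architecture): a small CNN encodes
# each frame, and an LSTM classifies the resulting feature sequence.
import torch
import torch.nn as nn

class CNNLSTMRecognizer(nn.Module):
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        # Per-frame spatial feature extractor (hypothetical sizes).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (N, 64, 1, 1) per frame
        )
        # Temporal model over the sequence of per-frame features.
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden_size,
                            batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.view(b * t, c, h, w)).view(b, t, -1)
        out, _ = self.lstm(feats)            # (batch, time, hidden)
        return self.classifier(out[:, -1])   # classify from the last step

# Usage with dummy data: 2 clips of 16 RGB frames at 64x64, 100 classes.
logits = CNNLSTMRecognizer(num_classes=100)(torch.randn(2, 16, 3, 64, 64))
```

The second fold, pairing an NMT-style encoder-decoder with a GAN for video synthesis, is not sketched here, as the abstract does not specify its components in enough detail.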