Computer Optics

Computer Optics, 2022, Volume 46, Issue 6, Pages 955–962
DOI: https://doi.org/10.18287/2412-6179-CO-1092
(Mi co1091)
 

IMAGE PROCESSING, PATTERN RECOGNITION

Method for visual analysis of driver's face for automatic lip-reading in the wild

A. A. Axyonov, D. A. Ryumin, A. M. Kashevnik, D. V. Ivanko, A. A. Karpov

St. Petersburg Federal Research Center of the Russian Academy of Sciences
Abstract: The paper proposes a method of visual analysis for automatic recognition of a vehicle driver's speech. Speech recognition in acoustically noisy conditions is one of the major challenges of artificial intelligence. Effective automatic lip-reading in a vehicle environment remains an unsolved problem because of various kinds of interference (frequent turns of the driver's head, vibration, varying lighting conditions, etc.). The problem is further aggravated by the lack of available databases on this topic. MediaPipe Face Mesh is used to find and extract the region of interest (ROI). We have developed an end-to-end neural network architecture for visual speech analysis. Visual features are extracted from individual images using a convolutional neural network (CNN) in conjunction with a fully connected layer. The extracted features are fed into a Long Short-Term Memory (LSTM) neural network. Because of the small amount of training data, we apply a transfer learning method. The experiments on visual analysis and speech recognition show the potential of this approach for automatic lip-reading. The experiments were performed on RUSAVIC, an in-house multi-speaker audio-visual dataset. The maximum recognition accuracy on a vocabulary of 62 commands is 64.09%. The results can be used in various automatic speech recognition systems, especially in acoustically noisy conditions on the road (high speed, open windows or a sunroof, background music, poor noise insulation, etc.).
Keywords: vehicle, driver, visual speech recognition, automated lip-reading, machine learning, End-to-End, CNN, LSTM
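The abstract describes a data flow in which each frame's lip ROI passes through a CNN with a fully connected layer, the per-frame features are fed into an LSTM, and the sequence is classified into one of 62 commands. The following is a minimal, untrained, pure-Python sketch of that data flow under stated assumptions; it is not the authors' implementation. It assumes the ROI has already been cropped (for example with MediaPipe Face Mesh) and resized to a small grayscale grid. `FrameEncoder` is a hypothetical stand-in for the CNN plus fully connected layer (a single linear layer with ReLU, no convolutions). All sizes except the 62 output classes are illustrative assumptions.

```python
import math
import random

NUM_CLASSES = 62  # vocabulary size reported in the abstract (62 commands)


def matvec(W, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class FrameEncoder:
    """Hypothetical stand-in for the per-frame CNN + fully connected layer:
    flattens a grayscale lip ROI and applies one linear layer with ReLU."""

    def __init__(self, in_dim, feat_dim, rng):
        self.W = rand_matrix(feat_dim, in_dim, rng)
        self.b = [0.0] * feat_dim

    def __call__(self, roi):
        flat = [p for row in roi for p in row]
        return [max(0.0, z) for z in add(matvec(self.W, flat), self.b)]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates
    and candidate (g), each acting on the concatenation [x; h]."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.W = {k: rand_matrix(hid_dim, in_dim + hid_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hid_dim for k in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation, not addition
        i = sigmoid(add(matvec(self.W["i"], xh), self.b["i"]))
        f = sigmoid(add(matvec(self.W["f"], xh), self.b["f"]))
        o = sigmoid(add(matvec(self.W["o"], xh), self.b["o"]))
        g = [math.tanh(z) for z in add(matvec(self.W["g"], xh), self.b["g"])]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LipReader:
    """Per-frame encoder -> LSTM over frames -> softmax over commands."""

    def __init__(self, roi_size=8, feat_dim=16, hid_dim=16,
                 num_classes=NUM_CLASSES, seed=0):
        rng = random.Random(seed)
        self.encoder = FrameEncoder(roi_size * roi_size, feat_dim, rng)
        self.lstm = LSTMCell(feat_dim, hid_dim, rng)
        self.W_out = rand_matrix(num_classes, hid_dim, rng)
        self.b_out = [0.0] * num_classes

    def __call__(self, frames):
        h = [0.0] * self.lstm.hid_dim
        c = [0.0] * self.lstm.hid_dim
        for roi in frames:  # one lip ROI per video frame
            h, c = self.lstm.step(self.encoder(roi), h, c)
        # Classify the whole utterance from the last hidden state.
        return softmax(add(matvec(self.W_out, h), self.b_out))


# Usage: a synthetic 10-frame clip of 8x8 grayscale ROIs (random pixels).
data_rng = random.Random(1)
clip = [[[data_rng.random() for _ in range(8)] for _ in range(8)]
        for _ in range(10)]
probs = LipReader()(clip)
best = max(range(len(probs)), key=probs.__getitem__)
```

Classifying from the final hidden state is one common choice for isolated command recognition. In a real system, transfer learning would typically mean initializing the encoder (and possibly the LSTM) with weights pretrained on a larger dataset instead of the random weights used here.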
Funding:
  • Russian Foundation for Basic Research, grant No. 19-29-09081-мк
  • Ministry of Science and Higher Education of the Russian Federation, grant No. FFZF-2022-0005
  • Grant Council of the President of the Russian Federation, grant No. НШ-17.2022.1.6
This work was partly funded by the Russian Foundation for Basic Research under grant No. 19-29-09081 and the state research project No. 0073-2019-0005.
Received: 25.12.2021
Accepted: 30.04.2022
Document Type: Article
Language: Russian
Citation: A. A. Axyonov, D. A. Ryumin, A. M. Kashevnik, D. V. Ivanko, A. A. Karpov, “Method for visual analysis of driver's face for automatic lip-reading in the wild”, Computer Optics, 46:6 (2022), 955–962
Citation in AMSBIB format
\Bibitem{AxyRyuKas22}
\by A.~A.~Axyonov, D.~A.~Ryumin, A.~M.~Kashevnik, D.~V.~Ivanko, A.~A.~Karpov
\paper Method for visual analysis of driver's face for automatic lip-reading in the wild
\jour Computer Optics
\yr 2022
\vol 46
\issue 6
\pages 955--962
\mathnet{http://mi.mathnet.ru/co1091}
\crossref{https://doi.org/10.18287/2412-6179-CO-1092}
Linking options:
  • https://www.mathnet.ru/eng/co1091
  • https://www.mathnet.ru/eng/co/v46/i6/p955