Computing, Telecommunication and Control, 2025, Volume 18, Issue 1, Pages 60–71
DOI: https://doi.org/10.18721/JCSTCS.18105
(Mi ntitu389)
 

Circuits and Systems for Receiving, Transmitting and Signal Processing

Resnet-SV: Fast and accurate speaker verification with a multi-layer cascade attention mechanism

A. A. Aliyev, S. A. Molodyakov

Peter the Great St. Petersburg Polytechnic University
Abstract: One of the most challenging issues in the rapid development of voice biometrics is the need for methods that combine speed and accuracy. Traditional solutions tend to compromise between these two aspects, which either complicates the speaker verification process or reduces accuracy, especially under real-world conditions where background noise and variability in speech are substantial obstacles. This paper examines modern approaches and their architectural features. The proposed architecture is based on ResNet, originally designed for computer vision tasks, which was modified and adapted for speech processing. The proposed modification, based on a multi-layer cascade attention mechanism that extracts features from the convolutional blocks, is described in detail. This modification makes it possible to use fewer layers for feature extraction, thereby increasing the speed of the model, and to handle noise in the audio signal more effectively. The paper concludes with the model parameters used in training and with key metrics, namely the equal error rate (EER) and the minimum detection cost function (minDCF), computed on the VoxCeleb1 dataset. The results are compared with solutions built on other architectures. In the experiments, the authors achieved a high level of accuracy with a smaller number of neural network parameters. This work brings voice biometric systems closer to wider application in various scenarios.
Keywords: speaker verification, speaker identification, voice biometrics, convolutional neural networks, attention mechanism, speech processing.
Received: 24.11.2024
Document Type: Article
UDC: 004.89
Language: English
Citation: A. A. Aliyev, S. A. Molodyakov, “Resnet-SV: Fast and accurate speaker verification with a multi-layer cascade attention mechanism”, Computing, Telecommunication and Control, 18:1 (2025), 60–71
Citation in format AMSBIB
\Bibitem{AliMol25}
\by A.~A.~Aliyev, S.~A.~Molodyakov
\paper Resnet-SV: Fast and accurate speaker verification with a multi-layer cascade attention mechanism
\jour Computing, Telecommunication and Control
\yr 2025
\vol 18
\issue 1
\pages 60--71
\mathnet{http://mi.mathnet.ru/ntitu389}
\crossref{https://doi.org/10.18721/JCSTCS.18105}
Linking options:
  • https://www.mathnet.ru/eng/ntitu389
  • https://www.mathnet.ru/eng/ntitu/v18/i1/p60