Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia, 2024, Volume 520, Number 2, Pages 260–266
DOI: https://doi.org/10.31857/S2686954324700620
(Mi danma605)
 

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

MDS-ViTNet: Improving Saliency Prediction for Eye-Tracking with Vision Transformer

I. Polezhaev^{a,b}, I. Goncharenko^{b,c}, N. Yurina^{c}

a Yandex, Moscow, Russia
b Moscow Institute of Physics and Technology, Dolgoprudny, Moscow oblast, Russia
c Sber, Moscow, Russia
Abstract: In this paper, we present a novel methodology we call MDS-ViTNet (Multi Decoder Saliency by Vision Transformer Network) for enhancing visual saliency prediction for eye-tracking. This approach holds significant potential for diverse fields, including marketing, medicine, robotics, and retail. We propose a network architecture that leverages the Vision Transformer, moving beyond the conventional ImageNet backbone. The framework adopts an encoder-decoder structure, with the encoder utilizing a Swin transformer to efficiently embed the most important features. This process involves a transfer learning method, wherein layers from the Vision Transformer are converted by the encoder transformer and seamlessly integrated into a CNN decoder, ensuring minimal information loss from the original input image. The decoder employs a multi-decoding technique: two distinct decoders generate two separate attention maps, which are subsequently combined into a single output by an additional CNN model. Our trained model MDS-ViTNet achieves state-of-the-art results across several benchmarks. Committed to fostering further collaboration, we intend to make our code, models, and datasets accessible to the public.
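
The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of the multi-decoder pattern it outlines (encoder, two parallel decoders, and a small fusion CNN that combines the two attention maps). All names here (ToyEncoder, CNNDecoder, MultiDecoderSaliency) are hypothetical, and a tiny CNN stands in for the paper's Swin transformer encoder so the example stays self-contained; it is not the authors' released implementation.

# Illustrative sketch of the multi-decoder saliency idea; hypothetical
# module names, NOT the authors' code. A small CNN replaces the Swin
# transformer encoder to keep the example runnable without extra deps.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for the Swin transformer encoder (downsamples 4x)."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, out_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.body(x)

class CNNDecoder(nn.Module):
    """One of the two decoders; upsamples features to a 1-channel map."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.ConvTranspose2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, feats):
        return self.body(feats)

class MultiDecoderSaliency(nn.Module):
    """Encoder -> two decoders -> CNN that fuses the two attention maps."""
    def __init__(self):
        super().__init__()
        self.encoder = ToyEncoder()
        self.decoder_a = CNNDecoder()
        self.decoder_b = CNNDecoder()
        self.fuse = nn.Sequential(  # combines two maps into one output
            nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, x):
        feats = self.encoder(x)
        map_a = self.decoder_a(feats)
        map_b = self.decoder_b(feats)
        return self.fuse(torch.cat([map_a, map_b], dim=1))

if __name__ == "__main__":
    model = MultiDecoderSaliency()
    out = model(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 1, 224, 224])

Running the script prints the output shape, confirming that the fused saliency map matches the input resolution; a learned fusion CNN (rather than, say, averaging) lets the combination of the two decoder outputs be weighted per pixel.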
Received: 27.09.2024
Accepted: 02.10.2024
English version:
Doklady Mathematics, 2024, Volume 110, Issue suppl. 1, Pages S230–S235
DOI: https://doi.org/10.1134/S1064562424602117
Document Type: Article
UDC: 004.8
Language: Russian
Citation: I. Polezhaev, I. Goncharenko, N. Yurina, “MDS-ViTNet: Improving Saliency Prediction for Eye-Tracking with Vision Transformer”, Dokl. RAN. Math. Inf. Proc. Upr., 520:2 (2024), 260–266; Dokl. Math., 110:suppl. 1 (2024), S230–S235
Citation in format AMSBIB
\Bibitem{PolGonYur24}
\by I.~Polezhaev, I.~Goncharenko, N.~Yurina
\paper MDS-ViTNet: Improving Saliency Prediction for Eye-Tracking with Vision Transformer
\jour Dokl. RAN. Math. Inf. Proc. Upr.
\yr 2024
\vol 520
\issue 2
\pages 260--266
\mathnet{http://mi.mathnet.ru/danma605}
\elib{https://elibrary.ru/item.asp?id=80287453}
\transl
\jour Dokl. Math.
\yr 2024
\vol 110
\issue suppl. 1
\pages S230--S235
\crossref{https://doi.org/10.1134/S1064562424602117}
Linking options:
  • https://www.mathnet.ru/eng/danma605
  • https://www.mathnet.ru/eng/danma/v520/i2/p260