Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia, 2024, Volume 520, Number 2, Pages 260–266 DOI: https://doi.org/10.31857/S2686954324700620
(Mi danma605)
SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES
MDS-ViTNet: Improving Saliency Prediction for Eye-Tracking with Vision Transformer
I. Polezhaev (a,b), I. Goncharenko (b,c), N. Yurina (c)
a Yandex, Moscow, Russia
b Moscow Institute of Physics and Technology, Dolgoprudny, Moscow oblast, Russia
c Sber, Moscow, Russia
Abstract:
In this paper, we present a novel methodology we call MDS-ViTNet (Multi Decoder Saliency by Vision Transformer Network) for enhancing visual saliency prediction for eye-tracking. This approach holds significant potential for diverse fields, including marketing, medicine, robotics, and retail. We propose a network architecture that leverages the Vision Transformer, moving beyond the conventional ImageNet backbone. The framework adopts an encoder-decoder structure, with the encoder utilizing a Swin transformer to efficiently embed the most important features. This process involves a transfer learning method, wherein layers from the Vision Transformer are converted by the encoder transformer and seamlessly integrated into a CNN decoder. This methodology ensures minimal information loss from the original input image. The decoder employs a multi-decoding technique, utilizing dual decoders to generate two distinct attention maps. These maps are subsequently combined into a single output via an additional CNN model. Our trained model MDS-ViTNet achieves state-of-the-art results across several benchmarks. Committed to fostering further collaboration, we intend to make our code, models, and datasets publicly accessible.
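The multi-decoder idea summarized in the abstract can be pictured with a short sketch. The following is a minimal illustrative outline, assuming a PyTorch setting with a torchvision Swin-T backbone; the class and layer names (MultiDecoderSaliency, the decoder and fusion heads) and all layer sizes are hypothetical assumptions, not the authors' released implementation.

# Minimal sketch of a Swin-encoder / dual-CNN-decoder saliency model.
# All names and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
from torchvision.models import swin_t, Swin_T_Weights

class MultiDecoderSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained Swin transformer backbone used as the encoder (transfer learning).
        self.encoder = swin_t(weights=Swin_T_Weights.DEFAULT).features
        # Two independent CNN decoders, each producing its own attention map.
        self.decoder1 = self._make_decoder()
        self.decoder2 = self._make_decoder()
        # Small CNN that fuses the two maps into a single saliency output.
        self.fusion = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    @staticmethod
    def _make_decoder():
        # Upsample the 768-channel Swin-T features back to the input resolution.
        return nn.Sequential(
            nn.Conv2d(768, 256, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1),
        )

    def forward(self, x):
        feats = self.encoder(x)            # (B, H/32, W/32, 768), channels-last
        feats = feats.permute(0, 3, 1, 2)  # channels-first for the CNN decoders
        m1, m2 = self.decoder1(feats), self.decoder2(feats)
        return self.fusion(torch.cat([m1, m2], dim=1))

# Usage: a 224x224 RGB batch yields a single-channel saliency map of the same size.
model = MultiDecoderSaliency()
saliency = model(torch.randn(1, 3, 224, 224))

In this reading of the abstract, keeping two independent decoders lets each head learn a distinct attention map, while the small fusion CNN learns how to merge them into the final prediction.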
Received: 27.09.2024; accepted: 02.10.2024
Citation:
I. Polezhaev, I. Goncharenko, N. Yurina, “MDS-ViTNet: Improving Saliency Prediction for Eye-Tracking with Vision Transformer”, Dokl. RAN. Math. Inf. Proc. Upr., 520:2 (2024), 260–266; Dokl. Math., 110:suppl. 1 (2024), S230–S235
Linking options:
https://www.mathnet.ru/eng/danma605
https://www.mathnet.ru/eng/danma/v520/i2/p260