Zapiski Nauchnykh Seminarov POMI
Zapiski Nauchnykh Seminarov POMI, 2024, Volume 540, Pages 178–193 (Mi znsl7550)  

An opensource library for AutoML multimodal clustering on Apache Spark

S. Muravyov (a), V. Kazakovtsev (b), I. Usov (a), P. Shpineva (a), O. Muravyova (a), A. Shalyto (a)

a ITMO University, St. Petersburg, Russia
b Siberian Federal University, Krasnoyarsk, Russia
Abstract: We present a library for choosing and configuring a clustering algorithm for multimodal datasets, i.e., data in which every object is stored not as a single vector but can be represented as a vector, a text, and an image at the same time, with every modality significant. The library automatically balances exploration and exploitation over a set of implemented clustering algorithms for the input data, guided by the selected internal clustering validation index. It also implements a recommender system for the internal validation index that can predict the best-fitting measure for the input data. The clustering algorithms are implemented on Apache Spark, so the library can be run on a distributed computing system to cluster big multimodal data.
Key words and phrases: automatic machine learning, multimodal models, clustering, Apache Spark.
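
The abstract does not say which exploration/exploitation strategy, which clustering algorithms, or which validation index the library uses. As a rough illustration of the general idea only, the following single-machine sketch treats each algorithm configuration as an arm of a multi-armed bandit and spends a fixed evaluation budget with UCB1, scoring every run with the silhouette index. UCB1, k-means, the silhouette index, and all names here (`kmeans`, `silhouette`, `ucb1_select`, `ARMS`, `make_blobs`) are assumptions chosen for the example; none of them is taken from the library, and the code does not use Apache Spark.

```python
import math
import random


def make_blobs(rng, centers=((0, 0), (10, 0), (0, 10)), n=20):
    """Toy 2-D data: n Gaussian points around each centre (illustrative only)."""
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(n)]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means with random initial centres; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster became empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width, an internal validation index in [-1, 1]."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def ucb1_select(arms, points, budget, seed=0):
    """Spend `budget` clustering runs across arms using the UCB1 rule."""
    rng = random.Random(seed)
    pulls = {name: 0 for name in arms}
    mean = {name: 0.0 for name in arms}  # running mean of rewards in [0, 1]
    best_score, best_name = -2.0, None
    for t in range(1, budget + 1):
        untried = [n for n in arms if pulls[n] == 0]
        if untried:
            name = untried[0]  # try every arm once before using the bound
        else:
            name = max(arms, key=lambda n: mean[n]
                       + math.sqrt(2 * math.log(t) / pulls[n]))
        score = silhouette(points, arms[name](points, rng))
        reward = (score + 1) / 2  # rescale [-1, 1] to [0, 1] for UCB1
        pulls[name] += 1
        mean[name] += (reward - mean[name]) / pulls[name]
        if score > best_score:
            best_score, best_name = score, name
    return best_score, best_name, pulls


# Hypothetical arms: the same algorithm with different numbers of clusters.
ARMS = {
    "kmeans_k2": lambda pts, rng: kmeans(pts, 2, rng),
    "kmeans_k3": lambda pts, rng: kmeans(pts, 3, rng),
    "kmeans_k4": lambda pts, rng: kmeans(pts, 4, rng),
}

points = make_blobs(random.Random(1))
best_score, best_name, pulls = ucb1_select(ARMS, points, budget=12)
print(best_name, round(best_score, 3), pulls)
```

Because k-means starts from random centres, repeated pulls of the same arm can return different scores, which is why a bandit rule that revisits promising arms is a natural fit here. A multimodal version would additionally need a way to embed or combine the vector, text, and image parts of each object before clustering, which this sketch leaves out.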
Funding agency: ITMO University
Grant number: 623097
This work was carried out as part of ITMO University project No. 623097 “Development of libraries of promising machine learning methods”.
Received: 15.11.2024
Document Type: Article
Language: English
Citation: S. Muravyov, V. Kazakovtsev, I. Usov, P. Shpineva, O. Muravyova, A. Shalyto, “An opensource library for AutoML multimodal clustering on Apache Spark”, Investigations on applied mathematics and informatics. Part IV, Zap. Nauchn. Sem. POMI, 540, POMI, St. Petersburg, 2024, 178–193
Citation in format AMSBIB
\Bibitem{MurKazUso24}
\by S.~Muravyov, V.~Kazakovtsev, I.~Usov, P.~Shpineva, O.~Muravyova, A.~Shalyto
\paper An opensource library for AutoML multimodal clustering on Apache Spark
\inbook Investigations on applied mathematics and informatics. Part~IV
\serial Zap. Nauchn. Sem. POMI
\yr 2024
\vol 540
\pages 178--193
\publ POMI
\publaddr St.~Petersburg
\mathnet{http://mi.mathnet.ru/znsl7550}
Linking options:
  • https://www.mathnet.ru/eng/znsl7550
  • https://www.mathnet.ru/eng/znsl/v540/p178