Computer Research and Modeling
Computer Research and Modeling, 2024, Volume 16, Issue 7, Pages 1779–1792
DOI: https://doi.org/10.20537/2076-7633-2024-16-7-1779-1792
(Mi crm1248)
 

SPECIAL ISSUE

Fast and accurate x86 disassembly using a graph convolutional network model

N. A. Strygin, N. D. Kudasov

Innopolis University, 1 Universitetskaya st., Innopolis, 420500, Russia
Abstract: Disassembly of stripped x86 binaries is an important yet non-trivial task. Disassembly is difficult to perform correctly without debug information, especially on the x86 architecture, which has variable-sized instructions interleaved with data. Moreover, the presence of indirect jumps in binary code adds another layer of complexity: indirect jumps prevent recursive traversal, a common disassembly technique, from identifying all instructions in the code. Many tools, including commercial ones such as IDA Pro, struggle with accurate x86 disassembly. As a result, there has been some interest in developing a better solution using machine learning (ML) techniques. ML can potentially capture the underlying compiler-independent patterns inherent in compiler-generated assembly. Researchers in this area have shown that ML approaches can outperform the classical tools. ML approaches can also be less time-consuming to develop than manual heuristics, shifting most of the burden onto collecting a large, representative dataset of executables with debug information. Following this line of work, we propose an improvement of an existing RGCN-based architecture, which builds a control flow graph on superset disassembly. The enhancement comes from augmenting the graph with data flow information. In particular, in the embedding we add Jump Control Flow and Register Dependency edges, inspired by Probabilistic Disassembly. We also create an open-source x86 instruction identification dataset, based on a combination of the ByteWeight dataset and a selection of open-source Debian packages. Compared to IDA Pro, a state-of-the-art commercial tool, our approach yields better accuracy, while maintaining great performance on our benchmarks. It also fares well against existing machine learning approaches such as DeepDi.
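To illustrate the idea of building a graph on superset disassembly, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical toy instruction set (the opcodes, the `Insn` record, and the function names are all invented for illustration and are not x86), decodes a candidate instruction at every byte offset, and connects candidates with fall-through, jump, and register dependency edges. The register dependency rule used here (the next candidate reads a register the current one writes) is a deliberate simplification.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class Insn(NamedTuple):
    """A candidate instruction in the hypothetical toy encoding."""
    offset: int
    length: int
    mnemonic: str
    reads: frozenset
    writes: frozenset
    target: Optional[int]   # direct jump target, if any
    falls_through: bool     # whether execution can continue to the next offset


NO_REGS = frozenset()


def decode(code: bytes, off: int) -> Optional[Insn]:
    """Decode one toy instruction at `off`; return None if the bytes are invalid."""
    op = code[off]
    if op == 0x01:  # nop
        return Insn(off, 1, "nop", NO_REGS, NO_REGS, None, True)
    if op == 0x06:  # ret
        return Insn(off, 1, "ret", NO_REGS, NO_REGS, None, False)
    if op == 0x02 and off + 1 < len(code) and code[off + 1] < 8:  # inc r
        r = frozenset({code[off + 1]})
        return Insn(off, 2, "inc", r, r, None, True)
    if op == 0x03 and off + 2 < len(code) and max(code[off + 1:off + 3]) < 8:
        rd, rs = code[off + 1], code[off + 2]  # mov rd, rs
        return Insn(off, 3, "mov", frozenset({rs}), frozenset({rd}), None, True)
    if op in (0x04, 0x05) and off + 1 < len(code):  # jmp rel8 / jz rel8
        rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
        name = "jmp" if op == 0x04 else "jz"
        return Insn(off, 2, name, NO_REGS, NO_REGS, off + 2 + rel, op == 0x05)
    return None


def superset_disassemble(code: bytes) -> Dict[int, Insn]:
    """Try to decode at every byte offset, keeping all valid candidates."""
    nodes = {}
    for off in range(len(code)):
        insn = decode(code, off)
        if insn is not None:
            nodes[off] = insn
    return nodes


def build_graph(code: bytes) -> Tuple[Dict[int, Insn], Set[Tuple[int, int, str]]]:
    """Build typed edges between candidate instructions."""
    nodes = superset_disassemble(code)
    edges = set()
    for off, insn in nodes.items():
        nxt = off + insn.length
        if insn.falls_through and nxt in nodes:
            edges.add((off, nxt, "fallthrough"))
            # Simplified register dependency: successor reads what we write.
            if insn.writes & nodes[nxt].reads:
                edges.add((off, nxt, "reg_dep"))
        if insn.target is not None and insn.target in nodes:
            edges.add((off, insn.target, "jump"))
    return nodes, edges


# Intended stream: mov r1, r2; inc r1; jz +1; nop; ret (offsets 0, 3, 5, 7, 8).
code = bytes([0x03, 0x01, 0x02, 0x02, 0x01, 0x05, 0x01, 0x01, 0x06])
nodes, edges = build_graph(code)
for edge in sorted(edges):
    print(edge)
```

In this toy example every byte offset happens to decode to a valid candidate, so the graph mixes the intended instructions with overlapping spurious ones. Telling them apart is the instruction identification problem that a graph model would then be trained to solve, using typed edges like these as input.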
Keywords: disassembly, machine learning, graph neural network, x86
Received: 26.10.2024
Revised: 15.11.2024
Accepted: 25.11.2024
Document Type: Article
UDC: 004.93
Language: English
Citation: N. A. Strygin, N. D. Kudasov, “Fast and accurate x86 disassembly using a graph convolutional network model”, Computer Research and Modeling, 16:7 (2024), 1779–1792
Citation in format AMSBIB
\Bibitem{StrKud24}
\by N.~A.~Strygin, N.~D.~Kudasov
\paper Fast and accurate x86 disassembly using a graph convolutional network model
\jour Computer Research and Modeling
\yr 2024
\vol 16
\issue 7
\pages 1779--1792
\mathnet{http://mi.mathnet.ru/crm1248}
\crossref{https://doi.org/10.20537/2076-7633-2024-16-7-1779-1792}
Linking options:
  • https://www.mathnet.ru/eng/crm1248
  • https://www.mathnet.ru/eng/crm/v16/i7/p1779