I. A. Chistiakov, “On guaranteed estimate of deviations from the target set in a control problem under reinforcement learning”, Avtomat. i Telemekh., 2025, no. 1, 80–98; Autom. Remote Control, 86:1 (2025), 61–73

Avtomatika i Telemekhanika

Avtomatika i Telemekhanika, 2025, Issue 1, Pages 80–98
DOI: https://doi.org/10.31857/S0005231025010057 (Mi at16478)

Intellectual Control Systems, Data Analysis

On guaranteed estimate of deviations from the target set in a control problem under reinforcement learning

I. A. Chistiakov

Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics, Moscow, Russia

Abstract: We consider a target control problem of a special form in which the system of differential equations contains nonlinear terms that depend on the state variables. We show that reinforcement learning algorithms such as Proximal Policy Optimization (PPO) can be used to find an inexact feedback solution. The resulting strategy is then approximated by a piecewise affine control. Using the dynamic programming method, we compute an inner estimate of the solvability set, together with a corresponding a priori estimate of the distance between the final point of the trajectory and the target set. To this end, we study an auxiliary problem for a piecewise linear system with noise and compute a piecewise quadratic function as an approximate solution of the Hamilton–Jacobi–Bellman equation.
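
The step of replacing a learned feedback strategy with a piecewise affine control can be illustrated with a minimal one-dimensional sketch. Everything below is an assumption made for illustration only: `stand_in_policy` is a hypothetical smooth feedback law standing in for a trained policy (no PPO training is performed), the grid of nodes is arbitrary, and plain linear interpolation is used as the simplest piecewise affine approximation. It is not taken from the paper and does not reproduce its construction.

```python
import numpy as np


def stand_in_policy(x):
    """Hypothetical smooth feedback law used in place of a trained policy."""
    return -np.tanh(2.0 * x)


def piecewise_affine(policy, nodes):
    """Return a control that matches `policy` at `nodes` and is affine between them."""
    values = policy(nodes)

    def control(x):
        # np.interp performs linear interpolation between consecutive nodes.
        return np.interp(x, nodes, values)

    return control


# Assumed state interval [-2, 2] split into 8 cells by 9 equally spaced nodes.
nodes = np.linspace(-2.0, 2.0, 9)
pwa_control = piecewise_affine(stand_in_policy, nodes)

# Measure the worst-case gap between the two controls on a fine grid.
grid = np.linspace(-2.0, 2.0, 2001)
max_error = float(np.max(np.abs(pwa_control(grid) - stand_in_policy(grid))))
print(f"max approximation error on [-2, 2]: {max_error:.4f}")
```

Refining the node grid shrinks the gap between the two controls; for a twice-differentiable law, the standard interpolation bound is h²/8 times the maximum of the absolute second derivative, where h is the node spacing.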

Keywords: nonlinear dynamics, dynamic programming, comparison principle, linearization, piecewise quadratic value function, reinforcement learning, PPO algorithm, solvability set.

Funding agency	Grant number
Ministry of Science and Higher Education of the Russian Federation	075-15-2022-284
This work was carried out with financial support from the Ministry of Science and Higher Education of the Russian Federation within the framework of the program of the Moscow Center for Fundamental and Applied Mathematics under the agreement no. 075-15-2022-284.

Presented by a member of the Editorial Board: P. V. Pakshin

Received: 29.08.2023
Revised: 14.10.2024
Accepted: 29.10.2024

English version:
Automation and Remote Control, 2025, Volume 86, Issue 1, Pages 61–73
DOI: https://doi.org/10.31857/S0005117925010055

Document Type: Article

Language: Russian

Citation: I. A. Chistiakov, “On guaranteed estimate of deviations from the target set in a control problem under reinforcement learning”, Avtomat. i Telemekh., 2025, no. 1, 80–98; Autom. Remote Control, 86:1 (2025), 61–73

Citation in format AMSBIB

\Bibitem{Chi25}
\by I.~A.~Chistiakov
\paper On guaranteed estimate of deviations from the target set in a control problem under reinforcement learning
\jour Avtomat. i Telemekh.
\yr 2025
\issue 1
\pages 80--98
\mathnet{http://mi.mathnet.ru/at16478}
\edn{https://elibrary.ru/JQKKTQ}
\transl
\jour Autom. Remote Control
\yr 2025
\vol 86
\issue 1
\pages 61--73

Linking options:

https://www.mathnet.ru/eng/at16478

https://www.mathnet.ru/eng/at/y2025/i1/p80
