Avtomatika i Telemekhanika, 2025, Issue 1, Pages 80–98 DOI: https://doi.org/10.31857/S0005231025010057
(Mi at16478)
Intellectual Control Systems, Data Analysis
On guaranteed estimate of deviations from the target set in a control problem under reinforcement learning
I. A. Chistiakov Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics, Moscow, Russia
Abstract:
We consider a target control problem of a special form, in which a system of differential equations includes nonlinear terms depending on state variables. We show that reinforcement learning algorithms such as Proximal Policy Optimization (PPO) can be used to find an inexact feedback solution. The chosen strategy is further approximated with a piecewise affine control. Based on the dynamic programming method, an inner estimate of the solvability set is calculated, as well as a corresponding a priori estimate of the distance between a final trajectory point and the target set. To do this, we examine an auxiliary problem for a piecewise linear system with noise and calculate a piecewise quadratic function as an approximate solution of the Hamilton–Jacobi–Bellman equation.
Keywords:
nonlinear dynamics, dynamic programming, comparison principle, linearization, piecewise quadratic value function, reinforcement learning, PPO algorithm, solvability set.
English version:
Automation and Remote Control, 2025, Volume 86, Issue 1, Pages 61–73 DOI: https://doi.org/10.31857/S0005117925010055
Citation:
I. A. Chistiakov, “On guaranteed estimate of deviations from the target set in a control problem under reinforcement learning”, Avtomat. i Telemekh., 2025, no. 1, 80–98; Autom. Remote Control, 86:1 (2025), 61–73
Linking options:
https://www.mathnet.ru/eng/at16478
https://www.mathnet.ru/eng/at/y2025/i1/p80