Averaging and mixing for stochastic perturbations of linear conservative systems
G. Huang$^{a,b}$, S. B. Kuksin$^{c,b}$
a School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
b Peoples' Friendship University of Russia (RUDN University), Moscow, Russia
c Université Paris-Diderot (Paris 7), UFR de Mathématiques, Paris, France
Abstract:
We study stochastic perturbations of linear systems of the form
\begin{equation*}
dv(t)+Av(t)\,dt=\varepsilon P(v(t))\,dt+\sqrt{\varepsilon}\,\mathcal{B}(v(t))\,dW (t), \qquad v\in\mathbb{R}^D,
\tag{*}
\end{equation*}
where $A$ is a linear operator with non-zero purely imaginary spectrum. It is assumed that the vector field $P(v)$ and the matrix function $\mathcal{B}(v)$ are locally Lipschitz with at most polynomial growth at infinity, that the equation is well posed, and that the first few moments of the norms of solutions $v(t)$ are bounded uniformly in $\varepsilon$. We use Khasminski's approach to stochastic averaging to show that, as $\varepsilon\to0$, a solution $v(t)$, written in the interaction representation in terms of the operator $A$, converges in distribution on the time interval $0\leqslant t\leqslant\text{Const}\cdot\varepsilon^{-1}$ to a solution of an effective equation. The latter is obtained from $(*)$ by means of a certain averaging. Assuming that equation $(*)$ and/or the effective equation is mixing, we examine this convergence further.
Bibliography: 27 titles.
Keywords:
averaging, mixing, stationary measures, effective equations, uniform in time convergence.
Received: 25.06.2022
Dedicated to the memory of M. I. Vishik on the occasion of his 100th birthday
1. Introduction

1.1. The setting and problems

The goal of this paper is to present an averaging theory for perturbations of conservative linear differential equations by locally Lipschitz nonlinearities and stochastic terms. Namely, we examine the stochastic equations
$$
\begin{equation}
dv(t)+Av(t)\,dt =\varepsilon P(v(t))\,dt +\sqrt{\varepsilon}\,\mathcal{B}(v(t))\,dW (t), \qquad v\in\mathbb{R}^D,
\end{equation}
\tag{1.1}
$$
where $0<\varepsilon\leqslant1$, $A$ is a linear operator with non-zero purely imaginary eigenvalues $\{i\lambda_j\}$ (so that the dimension $D$ is even), $P$ is a locally Lipschitz vector field on $\mathbb{R}^D$, $W(t)$ is the standard Wiener process in $\mathbb{R}^{N}$ and $\mathcal{B}(v)$ is a $D\times N$ matrix. We wish to study for small $\varepsilon$ the behaviour of solutions of equation (1.1) on time intervals of order $\varepsilon^{-1}$ and, under some additional restrictions on the equation, to examine the limiting behaviour of solutions as $\varepsilon\to0$ uniformly in time.

1.2. Our results and their deterministic analogues

We have tried to make our work ‘reader-friendly’ and accessible to people with only a limited knowledge of stochastic calculus. To achieve this, in the main part of the paper we restrict ourselves to the case of equations with additive noise $\sqrt\varepsilon\,\mathcal{B}\,dW(t)$ and exploit a technical convenience: we introduce a complex structure in $\mathbb{R}^D$ by rewriting the phase space $\mathbb{R}^D$ as $\mathbb{C}^{D/2}$ (recall that $D$ is even) in such a way that the operator $A$ is diagonal in the corresponding complex coordinates: $A=\operatorname{diag}\{i\lambda_j\}$. General equations (1.1) are discussed in § 8, where they are treated in parallel with the equations with additive noise considered previously. As is customary in the classical deterministic Krylov–Bogolyubov averaging (for example, see [4], [1], and [13]), to study solutions $v(t)\in \mathbb{C}^{D/2}$ we write them in the interaction representation, which preserves the norms of the complex components $v_j(\tau)$ but amends their angles (see the substitution (2.8) below). The first principal result of the work is Theorem 4.7, where we assume bounds, uniform in $\varepsilon$ and in $t\leqslant C\varepsilon^{-1}$, on the first few moments of the norms of solutions.
The theorem states that, as $\varepsilon\to0$, for $t\leqslant C\varepsilon^{-1}$ solutions $v(t)$, written in terms of the interaction representation, converge in distribution to solutions of an auxiliary effective equation. The latter is obtained from equation (1.1) by means of a certain averaging of the vector field $P$ in terms of the spectrum $\{i\lambda_j\}$, and in many cases it can be written down explicitly. The proof of Theorem 4.7, given in § 4, is obtained by means of a synthesis of the Krylov–Bogolyubov method (as it is presented, for example, in [13]) and Khasminski’s approach to stochastic averaging [16]; it can serve as an introduction to the latter. The number of works on stochastic averaging is immense (see § 1.3 for some references). We were not able to find the result of Theorem 4.7 there, but we do not insist on its novelty (and related statements can certainly be found in the literature). In § 5 we suppose that the above bounds on the moments of the norms of solutions are uniform in time and that equation (1.1) is mixing. So, as time tends to infinity, its solutions converge in distribution to a unique stationary measure (which is a Borel measure on $\mathbb{R}^D=\mathbb{C}^{D/2}$). In Theorem 5.5, postulating that the effective equation is mixing too, we prove that, as $\varepsilon\to0$, the stationary measure for equation (1.1) converges to that for the effective equation. Note that this convergence holds without passing to the interaction representation. In a short section, § 6, we discuss non-resonant systems (1.1) (where the frequencies $\{\lambda_j\}$ are rationally independent). In particular, we show that then the actions $I_j(v(t))$ of solutions $v(t)$ (see (1.2) below) converge in distribution, as $\varepsilon\to0$, to solutions of a system of stochastic equations depending only on the actions. The convergence holds on time intervals $0\leqslant t\leqslant C\varepsilon^{-1}$.
In § 7 we keep the assumption on the norms of solutions from § 5. Assuming that the effective equation is mixing (but without assuming this for the original equation (1.1)), we prove Theorem 7.4 there. It states that the convergence in Theorem 4.7, our principal result, is uniform in $t\geqslant0$ (and not only for $t\leqslant C\varepsilon^{-1}$). In Proposition 9.4 we present a simple sufficient condition on equation (1.1), which is based on results in [17] and ensures that Theorems 4.7, 5.5, and 7.4 apply to it. In § 8 we turn to the general equations (1.1), where the dispersion matrix $\mathcal{B}$ depends on $v$. Assuming the same estimates on solutions as in § 4, we show that Theorem 4.7 remains valid if either the matrix $\mathcal{B}(v)$ is non-singular or it is a $C^2$-smooth function of $v$. Theorems 5.5 and 7.4 also remain true for the general systems (1.1), but we do not discuss this, hoping that the corresponding modifications of the proofs will be clear after reading § 8. A deterministic analogue of our results, which deals with equation (1.1) for $W=0$ and describes the behaviour of its solutions on time intervals of order $\varepsilon^{-1}$ in the interaction representation in comparison with solutions of the corresponding effective equation, is given by Krylov–Bogolyubov averaging; see [4], [1], and [13] (Theorem 4.7 also applies to such equations, but then its assertion becomes unnatural). Theorem 5.5 has no analogue for deterministic systems, but Theorem 7.4 has one. Namely, it is known in Krylov–Bogolyubov averaging that if the effective equation has a globally asymptotically stable equilibrium, then the convergence of solutions of equation (1.1)$_{W=0}$, written in the interaction representation, to solutions of the effective equation is uniform in time. This result is known in folklore as the second Krylov–Bogolyubov theorem and can be found in [6].
The Krylov–Bogolyubov method and Khasminski’s approach to averaging which we exploit are flexible tools. They are applicable to various stochastic systems in finite and infinite dimensions, including stochastic PDEs, and the particular realization of the two methods that we use here is inspired by our previous work on averaging for stochastic PDEs. See [12] and [19] for an analogue of Theorem 4.7 for stochastic PDEs, [12] for an analogue of Theorem 5.5, and [11] for an analogue of Theorem 7.4 (also see [7] for more results and references on averaging for stochastic PDEs).

1.3. Relation to classical stochastic averaging

Averaging in stochastic systems is a well-developed topic, usually dealing with fast-slow stochastic systems (see, for example, [16], [10], § 7, [23], § II.3, [21], [25], [18], and the references therein). To explain the relation of that theory to our work, let us write equation (1.1) in the complex form $v(t)\in\mathbb{C}^n$, $n=D/2$ (when the operator $A$ is diagonal) and then pass to the slow time $\tau=\varepsilon t$ and the action-angle coordinates $(I,\varphi)=(I_1,\dots,I_n; \varphi_1,\dots,\varphi_n) \in \mathbb{R}^n_+\times \mathbb{T}^n$, where $\mathbb{R}_+= \{x\in \mathbb{R}\colon x\geqslant0\}$, $\mathbb{T}^n=\mathbb{R}^n/ (2\pi \mathbb{Z}^n)$, and
$$
\begin{equation}
I_k(v) =\frac{1}{2}|v_k|^2 =\frac{1}{2}v_k\bar v_k, \quad \varphi_k(v) =\operatorname{Arg}v_k\in\mathbb{S}^1 =\mathbb{R}/(2\pi\mathbb{Z}), \qquad k=1,\dots,n
\end{equation}
\tag{1.2}
$$
(if $v_k=0$, then we set $\varphi_k(v)=0\in \mathbb{S}^1$). In these coordinates equation (1.1) takes the form
$$
\begin{equation}
\begin{cases} dI(\tau)=P^I(I,\varphi)\,d\tau+ \Psi^I(I,\varphi)\,d\beta(\tau), \\ d\varphi(\tau)+\varepsilon^{-1}\Lambda\,d\tau =P^\varphi(I,\varphi)\,d\tau+\Psi^\varphi(I,\varphi)\,d\beta (\tau). \end{cases}
\end{equation}
\tag{1.3}
$$
Here $\beta=(\beta_1,\dots,\beta_N)$, where $\{\beta_l\}$ are independent standard real Wiener processes, and the coefficients of the system are given by Itô’s formula. This is a fast-slow system with slow variable $I$ and fast variable $\varphi$. Stochastic averaging treats systems like (1.3), usually adding a non-degenerate stochastic term of order $\varepsilon^{-1/2}$ to the fast part of the $\varphi$-equation. The (first) goal in the analysis of such a system is usually to prove that on time intervals $0\leqslant\tau\leqslant T$ the distributions of the $I$-components of solutions converge as $\varepsilon\to0$ to the distributions of solutions of a suitably averaged $I$-equation. After that, other goals can be pursued.¹

¹ For example, one can study the deviation of the $I$-components of solutions from the averaged dynamics (see [10] and [18]) or, under stronger restrictions on the system, examine the behaviour of solutions on longer intervals of time (see the paper [15] and works descending from it).

Unfortunately, stochastic averaging does not apply directly to systems (1.3) coming from equations (1.1), since then the coefficients of the $\varphi$-equation have singularities when some $I_k$ vanish, and since the fast $\varphi$-equation is rather degenerate if the vector $\Lambda$ is resonant. Instead, we borrow Khasminski’s method [16] for stochastic averaging from that theory and apply it to equation (1.1) written in the interaction representation, thus arriving at the assertion of Theorem 4.7. Averaging theorems for stationary solutions of equation (1.3) and for the corresponding stationary measures are known in stochastic averaging, but (of course) they control only the limiting behaviour of the $I$-components of the stationary solutions and measures, while our Theorem 5.5 describes the limit of the whole stationary measure. It seems that no analogue of Theorem 7.4 is known in stochastic averaging.
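The change of variables (1.2), including the convention $\varphi_k(v)=0$ for $v_k=0$, can be written down in a few lines. The following Python sketch is only an illustration (the function names are hypothetical); it represents a point of $\mathbb{S}^1$ by its representative in $[0,2\pi)$.

```python
import cmath
import math


def to_action_angle(v):
    """Map v in C^n to (I, phi) as in (1.2): I_k = |v_k|^2 / 2, phi_k = Arg v_k."""
    I = [0.5 * abs(z) ** 2 for z in v]
    # cmath.phase returns a value in (-pi, pi]; reducing it mod 2*pi gives a
    # representative in [0, 2*pi).  For v_k = 0 it returns 0.0, which matches
    # the convention phi_k(v) = 0 for v_k = 0.
    phi = [cmath.phase(z) % (2 * math.pi) for z in v]
    return I, phi


def from_action_angle(I, phi):
    """Inverse map on {I_k > 0}: v_k = sqrt(2 I_k) exp(i phi_k)."""
    return [math.sqrt(2 * Ik) * cmath.exp(1j * p) for Ik, p in zip(I, phi)]
```

For instance, `to_action_angle([1 + 1j, 0j, -2 + 0j])` gives actions close to $(1,0,2)$ and angles close to $(\pi/4,0,\pi)$. Note that the angle $\varphi_k$ is discontinuous at $v_k=0$, which is one way to see the singularities of the $\varphi$-equation in (1.3) mentioned above.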
At the origin of this paper are lecture notes for an online course that SK taught at Shandong University (PRC) in the autumn term of 2020.

Notation

For a Banach space $E$ and $R>0$ we denote by $B_R(E)$ the open $R$-ball $\{e\in E\colon |e|_E < R\}$ and by $\overline{B}_R(E)$ its closure $\{| e|_E \leqslant R\}$; $C_b(E)$ denotes the space of bounded continuous functions on $E$, and $C([0,T];E)$ is the space of continuous curves $[0,T] \to E$ endowed with the sup-norm. For any $0<\alpha\leqslant1$ and $u\in C([0,T];E)$,
$$
\begin{equation}
\|u\|_\alpha =\sup_{0\leqslant\tau<\tau'\leqslant T} \frac{|u(\tau')-u(\tau)|_E}{|\tau'-\tau|^\alpha} +\sup_{\tau\in[0,T]}|u(\tau)|_E \leqslant \infty.
\end{equation}
\tag{1.4}
$$
This is a norm in the Hölder space $C^\alpha([0,T];E)$. The standard $C^m$-norm for $C^m$-smooth functions on $E$ is denoted by $|\cdot|_{C^m(E)}$. We use the notation $\mathcal{D}(\xi)$ for the law of a random variable $\xi$; the symbol $\rightharpoonup$ denotes weak convergence of measures, and $\mathcal{P}(M)$ is the space of Borel probability measures on a metric space $M$. For a measurable mapping $F\colon M_1\to M_2$ and $\mu\in \mathcal{P}(M_1)$ we denote by $F\circ\mu\in \mathcal{P}(M_2)$ the image of $\mu$ under $F$; that is, $F\circ\mu(Q)=\mu(F^{-1}(Q))$. If $m\geqslant0$ and $L$ is $\mathbb{R}^n$ or $\mathbb{C}^n$, then $\operatorname{Lip}_m(L, E)$ is the set of maps $F\colon L \to E$ such that
$$
\begin{equation}
\mathcal{C}^m(F) :=\sup_{R\geqslant1}\,(1+R)^{-m} \Bigl(\operatorname{Lip}\bigl(F|_{\overline{B}_R(L)}\bigr)+\sup_{v\in \overline{B}_R(L)}|F(v)|_E\Bigr) <\infty,
\end{equation}
\tag{1.5}
$$
where $\operatorname{Lip}(f)$ is the Lipschitz constant of the map $f$ (note that, in particular, $|F(v)|_E \leqslant \mathcal{C}^m(F) (1+ |v|_L)^m$ for any $v\in L$). For a complex matrix $A=(A_{ij})$, $A^*= (A^*_{ij})$ denotes its Hermitian conjugate: $A^*_{ij}=\bar A_{ji}$ (so that for a real matrix $B$, $B^*$ is the transposed matrix). For a set $Q$ we denote by $\mathbf{1}_Q$ its indicator function and by $Q^c$ its complement. Finally, $\mathbb{R}_+$ ($\mathbb{Z}_+$) is the set of non-negative real numbers (non-negative integers), and for real numbers $a$ and $b$, $a\vee b$ and $a\wedge b$ denote their maximum and minimum.
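For a curve known only at finitely many times, the norm (1.4) can be estimated from below by restricting both suprema to the sample points. A minimal Python sketch of this discrete analogue (the function name is hypothetical; the cost is quadratic in the number of samples):

```python
import math


def holder_norm(taus, values, alpha):
    """Discrete analogue of the norm ||u||_alpha from (1.4).

    values[i] is the sample u(taus[i]) in C^n (a tuple of complex or real numbers).
    Both suprema in (1.4) are taken over grid points only, so the result is a
    lower bound for the true norm of the curve.
    """
    def norm(x):
        return math.sqrt(sum(abs(c) ** 2 for c in x))

    seminorm = 0.0
    for i in range(len(taus)):
        for j in range(i + 1, len(taus)):
            diff = norm([b - c for b, c in zip(values[j], values[i])])
            seminorm = max(seminorm, diff / abs(taus[j] - taus[i]) ** alpha)
    return seminorm + max(norm(x) for x in values)
```

For $u(\tau)=\tau$ on $[0,1]$ the Hölder seminorm equals $1$ for every $\alpha\in(0,1]$ and the sup-norm equals $1$, so on the grid $\{k/10\}$ the sketch returns $2$ for both $\alpha=1$ and $\alpha=1/2$.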
2. Linear systems and their perturbations

In this section we present the setting of the problem and specify our assumptions on the operator $A$, the vector field $P$ and the noise $\sqrt\varepsilon\,\mathcal{B}(v)\,dW$ in equation (1.1). To simplify the presentation and better explain the ideas, in the main part of the text we assume that the noise is additive, that is, $\mathcal{B}$ is a constant (possibly singular) matrix. We discuss the general equations (1.1) in § 8.

2.1. Assumptions on $A$ and ${W}(t)$

We assume that the unperturbed linear system
$$
\begin{equation}
\frac d{dt}v +Av=0, \qquad v\in\mathbb{R}^D,
\end{equation}
\tag{2.1}
$$
is such that all of its trajectories are bounded as $t\to\pm\infty$. Then the eigenvalues of $A$ are purely imaginary, come in pairs $\pm i\lambda_j$, and $A$ has no Jordan cells. We also assume that $A$ is invertible. By these assumptions $D=2n$, and there exists a basis $\{\mathbf{e}_1^+,\mathbf{e}_1^-,\dots, \mathbf{e}_n^+, \mathbf{e}_n^-\}$ in $\mathbb{R}^{2n}$ in which the linear operator $A$ takes the block-diagonal form:
$$
\begin{equation*}
A= \begin{pmatrix} \begin{matrix}0&-\lambda_1\\ \lambda_1&0\end{matrix}&&0\\ &\ddots&\\ 0&&\begin{matrix}0&-\lambda_n\\\lambda_n&0\end{matrix} \end{pmatrix}.
\end{equation*}
\notag
$$
We denote by $(x_1,y_1,\dots, x_n,y_n)$ the coordinates corresponding to this basis, and for $j=1,\dots,n$ we set $z_j=x_j+iy_j$. Then $\mathbb{R}^{2n}$ becomes the space of complex vectors $(z_1,\dots,z_n)$, that is, $\mathbb{R}^{2n}\simeq\mathbb{C}^n$. In the complex coordinates the standard inner product in $\mathbb{R}^{2n}$ reads
$$
\begin{equation}
\langle z,z'\rangle =\operatorname{Re}\sum_{j=1}^nz_j\bar{z}_j',\qquad z,z'\in\mathbb{C}^n.
\end{equation}
\tag{2.2}
$$
Let us denote by
$$
\begin{equation*}
\Lambda=(\lambda_1,\dots,\lambda_n)\in(\mathbb{R}\setminus\{0\})^n
\end{equation*}
\notag
$$
the frequency vector of the linear system (2.1). Then in the complex coordinates $z$ the operator $A$ reads
$$
\begin{equation*}
Az=\operatorname{diag}\{i\Lambda\} z,
\end{equation*}
\notag
$$
where $\operatorname{diag}\{i\Lambda\}$ is the diagonal operator sending $(z_1,\dots,z_n)$ to $(i\lambda_1z_1,\dots,i\lambda_nz_n)$. Therefore, in $\mathbb{R}^{2n}$ written as the complex space $\mathbb{C}^n$ linear equation (2.1) takes the diagonal form
$$
\begin{equation*}
\frac d{dt}v_k+i\lambda_k v_k=0, \qquad 1\leqslant k\leqslant n.
\end{equation*}
\notag
$$
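The identification of the real block $\begin{pmatrix}0&-\lambda\\ \lambda&0\end{pmatrix}$ with multiplication by $i\lambda$ in the coordinate $z=x+iy$ can be checked numerically: integrating (2.1) for one block in real coordinates should reproduce $z(t)=e^{-i\lambda t}z(0)$. A minimal Python sketch, assuming a single block and hypothetical values of $\lambda$, $t$ and the initial point, using the classical Runge–Kutta scheme:

```python
import cmath


def real_flow_rk4(x0, y0, lam, t, steps=10_000):
    """Integrate dv/dt + A v = 0 for one block A = [[0, -lam], [lam, 0]] with RK4."""
    h = t / steps

    def f(x, y):
        # -A (x, y) = (lam * y, -lam * x)
        return lam * y, -lam * x

    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = f(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = f(x + h * k3[0], y + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y


x0, y0, lam, t = 0.3, -1.2, 2.5, 3.0          # hypothetical data
x, y = real_flow_rk4(x0, y0, lam, t)
z_exact = cmath.exp(-1j * lam * t) * complex(x0, y0)   # solution of dv/dt + i*lam*v = 0
```

Here `complex(x, y)` agrees with `z_exact` up to the integration error, in accordance with the diagonal form of (2.1) in the complex coordinates.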
Below we examine the perturbed equation (1.1) using these complex coordinates. Next we discuss the random process $W(t)$ expressed in the complex coordinates. The standard complex Wiener process has the form
$$
\begin{equation}
\beta^c(t) =\beta^+(t)+i\beta^-(t)\in\mathbb{C},
\end{equation}
\tag{2.3}
$$
where $\beta^+(t)$ and $\beta^-(t)$ are independent standard (real) Wiener processes, defined on some probability space $(\Omega,\mathcal{F},\mathsf{P})$. Then $\bar\beta^c(t)=\beta^+(t)-i\beta^-(t)$, and any Wiener process $W(t)\in\mathbb{C}^n$ can conveniently be written in the complex form as
$$
\begin{equation}
W_k =\sum_{l=1}^{n_1}\Psi_{kl}^1\beta^c_l +\sum_{l=1}^{n_1}\Psi_{kl}^2 \bar\beta^c_l, \qquad k=1,\dots,n,
\end{equation}
\tag{2.4}
$$
where $\Psi^1=(\Psi_{kl}^1)$ and $\Psi^2=(\Psi_{kl}^2)$ are complex $n\times n_1$ matrices and $\{\beta^c_l\}$ are independent standard complex Wiener processes. Again, in order to simplify the presentation, we suppose below that the noise in (1.1) is of the form
$$
\begin{equation*}
W_k(t) =\sum_{l=1}^{n_1}\Psi_{kl}\beta^c_l(t), \qquad k=1,\dots,n.
\end{equation*}
\notag
$$
We do not assume that the matrix $\Psi$ is non-singular (in particular, it can be zero). Then the perturbed equation (1.1) in the complex coordinates reads as
$$
\begin{equation}
d v_k+i\lambda_kv_k\,dt =\varepsilon P_k(v)\,dt +\sqrt{\varepsilon}\,\sum_{l=1}^{n_1}\Psi_{kl}\,d\beta^c_l(t), \qquad k=1,\dots,n,
\end{equation}
\tag{2.5}
$$
where $ v=(v_1,\dots,v_n)\in\mathbb{C}^n$ and $0<\varepsilon\leqslant1$. The results obtained below for equation (2.5) remain true for the general equations (1.1) at the price of heavier calculations. The corresponding argument is sketched in § 8.

2.2. Assumptions on $P$ and on the perturbed equation

Our first goal is to study equation (2.5) for $0<\varepsilon\ll 1$ on a time interval $0\leqslant t \leqslant \varepsilon^{-1}T$, where $T>0$ is a fixed constant. Introducing the slow time
$$
\begin{equation*}
\tau=\varepsilon t
\end{equation*}
\notag
$$
we write the equation as
$$
\begin{equation}
\begin{gathered} \, dv_k(\tau)+i\varepsilon^{-1}\lambda_kv_k\,d\tau =P_k(v)\,d\tau+\sum_{l=1}^{n_1}\Psi_{kl}\,d\tilde\beta^c_l(\tau), \\ \quad k=1,\dots,n, \quad 0\leqslant \tau\leqslant T. \notag \end{gathered}
\end{equation}
\tag{2.6}
$$
Here $\{\tilde\beta^c_l(\tau),\,l=1,\dots,n_1\}$, where $\tilde\beta^c_l(\tau)=\sqrt{\varepsilon}\,\beta^c_l(\varepsilon^{-1}\tau)$, is another set of independent standard complex Wiener processes, which we now re-denote back to $\{\beta^c_l(\tau),\,l=1,\dots,n_1\}$. We stress that the equation above is nothing but the original equation (1.1), whose linear part (2.1) is conservative and non-degenerate in the sense of § 2.1 (all trajectories of (2.1) are bounded and $A$ is invertible), written in the complex coordinates and the slow time. So all results below concerning equation (2.6) (and equation (2.11)) can be reformulated for equation (1.1) at the price of heavier notation. Let us formulate the assumptions concerning the well-posedness of equation (2.6), which hold throughout this paper. Assumption 2.1. (a) The drift $P(v)=(P_1(v),\dots,P_n(v))$ is a locally Lipschitz vector field belonging to $\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n)$ for some $m_0\geqslant0$ (see (1.5)). (b) For any $v_0\in\mathbb{C}^n$ equation (2.6) has a unique strong solution $v^\varepsilon(\tau;v_0)$, $\tau\in[0,T]$, which is equal to $v_0$ at $\tau=0$. Moreover, there exists $m_0'>(m_0\vee1)$ such that
$$
\begin{equation}
\mathsf{E}\sup_{0\leqslant\tau\leqslant T}|v^\varepsilon(\tau;v_0)|^{2 m'_0} \leqslant C_{m'_0}(|v_0|,T)<\infty \quad \forall\,0<\varepsilon\leqslant1,
\end{equation}
\tag{2.7}
$$
where $C_{m'_0}(\,{\cdot}\,)$ is a non-negative continuous function on $\mathbb{R}_+^2$, which is non-decreasing in both arguments. Our proofs generalize easily to the case when the vector field $P$ is locally Lipschitz and satisfies $|P(v)| \leqslant C (1+|v|)^{m_0}$ for all $v$ and some $C>0$ and $m_0\geqslant0$ (see [13] for averaging in deterministic perturbations of equation (2.1) by locally Lipschitz vector fields). In this case the argument remains essentially the same (but becomes a bit longer), and the constants in the estimates depend not only on $m_0$, but also on the local Lipschitz constant of $P$, which is the function $R\mapsto\operatorname{Lip}(P|_{\overline{B}_R(\mathbb{C}^n)})$. Below $T>0$ is fixed, and the dependence of constants on $T$ is usually not indicated. Solutions of (2.6) are assumed to be strong unless otherwise stated. As usual, strong solutions are understood in the sense of an integral equation. That is, $v^\varepsilon(\tau;v_0)= v(\tau)$, $0\leqslant\tau\leqslant T$, is a strong solution, equal to $v_0$ at $\tau=0$, if
$$
\begin{equation*}
v_k(\tau)+\int_0^\tau\bigl(i\varepsilon^{-1}\lambda_kv_k(s)-P_k(v(s))\bigr)\,ds =v_{0k}+\sum_{l=1}^{n_1}\Psi_{kl}\beta^c_l(\tau), \qquad k=1,\dots,n,
\end{equation*}
\notag
$$
almost surely for $0\leqslant\tau\leqslant T$.

2.3. The interaction representation

Now in (2.6) we pass to the interaction representation, which means that we make the substitution
$$
\begin{equation}
v_k(\tau) =e^{-i\tau\varepsilon^{-1}\lambda_k}a_k(\tau), \qquad k=1,\dots,n.
\end{equation}
\tag{2.8}
$$
Then $v_k(0)=a_k(0)$, and we obtain the following equations for variables $a_k(\tau)$:
$$
\begin{equation}
da_k(\tau) =e^{i\tau\varepsilon^{-1}\lambda_k}P_k(v)\,d\tau +e^{i\tau\varepsilon^{-1}\lambda_k}\sum_{l=1}^{n_1}\Psi_{kl}\,d\beta^c_l(\tau), \qquad k=1,\dots,n.
\end{equation}
\tag{2.9}
$$
The actions $I_k=|a_k|^2/2$ for solutions of (2.9) are the same as the actions for solutions of (2.6). In comparison with (2.6), in (2.9) we have removed the large term $\varepsilon^{-1}\operatorname{diag}(i\Lambda)v$ from the drift, at the price that the coefficients of the system are now fast oscillating functions of $\tau$. To rewrite the above equations conveniently we introduce the rotation operators $\Phi_w$: for each real vector $w=(w_1,\dots,w_n)\in\mathbb{R}^n$ we set
$$
\begin{equation}
\Phi_w\colon \mathbb{C}^n\to\mathbb{C}^n, \quad \Phi_w=\operatorname{diag}\{e^{iw_1},\dots,e^{iw_n}\}.
\end{equation}
\tag{2.10}
$$
Then
$$
\begin{equation*}
(\Phi_w)^{-1}=\Phi_{-w},\quad \Phi_{w_1}\circ\Phi_{w_2}=\Phi_{w_1+w_2},\quad \Phi_0=\operatorname{id},
\end{equation*}
\notag
$$
where each $\Phi_w$ is a unitary transformation, so that $\Phi_w^*=\Phi_w^{-1}$. Moreover,
$$
\begin{equation*}
|(\Phi_wz)_j|=|z_j|\quad \forall\,z,w,j.
\end{equation*}
\notag
$$
In terms of the operators $\Phi$ we write $v(\tau)$ as $\Phi_{-\tau\varepsilon^{-1}\Lambda} a(\tau)$, and we write system (2.9) as
$$
\begin{equation}
da(\tau) =\Phi_{\tau \varepsilon^{-1}\Lambda} P(\Phi_{-\tau \varepsilon^{-1}\Lambda}a(\tau))\,d\tau +\Phi_{\tau \varepsilon^{-1}\Lambda}\Psi\,d\beta^c(\tau), \qquad a(\tau)\in\mathbb{C}^n,
\end{equation}
\tag{2.11}
$$
where $\beta^c(\tau)=(\beta^c_1(\tau),\dots,\beta^c_{n_1}(\tau))$. This is the equation which we are going to study for small $\varepsilon$, for $0\leqslant\tau\leqslant T$, under the initial condition
$$
\begin{equation}
a(0)=v(0)=v_0.
\end{equation}
\tag{2.12}
$$
The solution $a^\varepsilon(\tau; v_0)=\Phi_{\tau\varepsilon^{-1}\Lambda} v^\varepsilon(\tau; v_0)$ of (2.11), (2.12) also satisfies estimate (2.7) for each $\varepsilon\in(0,1]$, since $|a^\varepsilon(\tau;v_0)|=|v^\varepsilon(\tau;v_0)|$. We recall that a $C^1$-diffeomorphism $G$ of $\mathbb{C}^{n}$ transforms a vector field $V$ into the field $G_*V$, where $(G_*V)(v)=dG(u)(V(u))$ for $ u=G^{-1}v$. In particular,
$$
\begin{equation*}
\bigl((\Phi_{\tau\varepsilon^{-1}\Lambda })_*P\bigr)(v) =\Phi_{\tau\varepsilon^{-1}\Lambda}\circ P(\Phi_{-\tau\varepsilon^{-1}\Lambda}v).
\end{equation*}
\notag
$$
So equation (2.11) can be written as
$$
\begin{equation*}
da(\tau) =\bigl((\Phi_{\tau\varepsilon^{-1}\Lambda})_*P\bigr)(a(\tau))\,d\tau +\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi\,d\beta^c(\tau).
\end{equation*}
\notag
$$
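Equation (2.11) and the substitution (2.8) can be illustrated by a crude Euler–Maruyama discretization of (2.11). The sketch below is only an illustration under stated assumptions: the function names, the drift passed as a Python function and all numerical parameters are hypothetical, and the paper itself works purely analytically. Since the coefficients of (2.11) oscillate with frequencies of order $\varepsilon^{-1}$, the step $h$ should be small compared with $\varepsilon/\max_k|\lambda_k|$ for the scheme to be meaningful.

```python
import cmath
import math
import random


def rotate(w, z):
    """Rotation operator (2.10): (Phi_w z)_k = exp(i w_k) z_k."""
    return [cmath.exp(1j * wk) * zk for wk, zk in zip(w, z)]


def euler_maruyama_interaction(P, Lam, Psi, v0, eps, T, steps, rng):
    """Euler-Maruyama scheme for (2.11) with a(0) = v0, see (2.12).

    P   -- drift, a function taking and returning lists of n complex numbers
    Lam -- frequency vector (lambda_1, ..., lambda_n)
    Psi -- complex n x n1 matrix given as a list of rows
    Returns the slow-time grid, the path a(tau) and v(tau) = Phi_{-tau Lam/eps} a(tau).
    """
    n, n1 = len(v0), len(Psi[0])
    h = T / steps
    sd = math.sqrt(h)  # each real component of d(beta^c) has variance h
    a = list(v0)
    taus, a_path = [0.0], [list(a)]
    for j in range(steps):
        w = [j * h * lk / eps for lk in Lam]          # tau * eps^{-1} * Lambda
        v = rotate([-wk for wk in w], a)             # substitution (2.8)
        drift = rotate(w, P(v))                      # Phi_w P(Phi_{-w} a)
        db = [complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(n1)]
        noise = rotate(w, [sum(Psi[k][l] * db[l] for l in range(n1)) for k in range(n)])
        a = [ak + dk * h + nk for ak, dk, nk in zip(a, drift, noise)]
        taus.append((j + 1) * h)
        a_path.append(list(a))
    v_path = [rotate([-t * lk / eps for lk in Lam], ak) for t, ak in zip(taus, a_path)]
    return taus, a_path, v_path
```

With the hypothetical damping drift $P(v)=-v$ and $\Psi=0$, the drift in (2.11) reduces to $-a$ (the operators $\Phi_w$ commute with scalar multiplication), so the computed $a(1)$ is close to $e^{-1}v_0$. For any drift and noise, the computed paths satisfy $|a_k(\tau)|=|v_k(\tau)|$ up to rounding, that is, the actions are the same in both representations.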
2.4. Compactness

For $0<\varepsilon\leqslant1$ we denote by $a^\varepsilon(\tau;v_0)$ the solution of equation (2.11) which is equal to $v_0$ at $\tau=0$. Then
$$
\begin{equation*}
a^\varepsilon(\tau;v_0) =\Phi_{\tau\varepsilon^{-1}\Lambda}v^\varepsilon(\tau;v_0).
\end{equation*}
\notag
$$
A unique solution $v^\varepsilon(\tau;v_0)$ of (2.6) exists by Assumption 2.1, so the solution $a^\varepsilon(\tau;v_0)$ also exists and is unique. Our goal is to examine its law
$$
\begin{equation*}
Q_\varepsilon :=\mathcal{D}(a^\varepsilon(\,\cdot\,;v_0)) \in\mathcal{P}(C([0,T];\mathbb{C}^n))
\end{equation*}
\notag
$$
as $\varepsilon\to0$. When $v_0$ is fixed, we will usually write $a^\varepsilon(\tau;v_0)$ as $a^\varepsilon(\tau)$. Lemma 2.2. Under Assumption 2.1 the set of probability measures $\{Q_\varepsilon, \,\varepsilon\in(0,1]\}$ is pre-compact in the weak topology in $\mathcal{P}(C([0,T];\mathbb{C}^{n}))$. Proof. We denote the random force in (2.11) by $d\zeta^\varepsilon(\tau)$:
$$
\begin{equation*}
d\zeta^\varepsilon(\tau) :=\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi\,d\beta^c(\tau),
\end{equation*}
\notag
$$
where $\zeta^\varepsilon(\tau)=(\zeta_k^\varepsilon(\tau),\,k=1,\dots,n)$. For any $k$ we have
$$
\begin{equation*}
\zeta^\varepsilon_k(\tau) =\int_0^\tau d\zeta_k^\varepsilon =\int_0^\tau e^{is\varepsilon^{-1}\lambda_k} \sum_{l=1}^{n_1}\Psi_{kl}\,d\beta^c_l(s).
\end{equation*}
\notag
$$
So $\zeta^\varepsilon(\tau)$ is a stochastic integral of a non-random vector function. Hence it is a Gaussian random process with zero mean value, and its increments over disjoint time intervals are independent. For each $k$
$$
\begin{equation*}
\mathsf{E}|\zeta_k^\varepsilon(\tau)|^2 =\int_0^\tau2 \sum_{l=1}^{n_1}|\Psi_{kl}|^2\,ds =:2C_k^\zeta\tau, \qquad C_k^\zeta=\sum_{l=1}^{n_1}|\Psi_{kl}|^2\geqslant0,
\end{equation*}
\notag
$$
and $\mathsf{E}\zeta_k^\varepsilon(\tau)\zeta_j^\varepsilon(\tau) =\mathsf{E}\bar\zeta_k^\varepsilon(\tau)\bar\zeta_j^\varepsilon(\tau)=0$. Therefore, if $C_k^\zeta>0$, then $\zeta_k^\varepsilon(\tau)=(C_k^\zeta)^{1/2} \hat\beta^c_k(\tau)$, where by Lévy’s theorem (see [14], p. 157) $\hat\beta^c_k(\tau)$ is a standard complex Wiener process (which may depend on $\varepsilon$); if $C_k^\zeta=0$, then $\zeta_k^\varepsilon\equiv0$. However, the processes $\zeta^\varepsilon_j$ and $\zeta^\varepsilon_k$ with $j\neq k$ are not necessarily independent.
By the basic properties of a Wiener process the curve
$$
\begin{equation*}
[0,T]\ni\tau\mapsto\zeta^\varepsilon(\omega,\tau)\in\mathbb{C}^n
\end{equation*}
\notag
$$
is almost surely Hölder-continuous with exponent $1/3$, and since $C_k^\zeta$ does not depend on $\varepsilon$, we have (abbreviating $C^{1/3}([0,T];\mathbb{C}^n)$ to $C^{1/3}$)
$$
\begin{equation*}
\mathsf{P}\bigl(\zeta^\varepsilon(\,{\cdot}\,)\in \overline{B}_R(C^{1/3})\bigr) \to1 \quad \text{as $R\to\infty$},
\end{equation*}
\notag
$$
uniformly in $\varepsilon$. Let us write equation (2.11) as
$$
\begin{equation*}
da^\varepsilon(\tau) =V^\varepsilon(\tau)\,d\tau+d\zeta^\varepsilon(\tau).
\end{equation*}
\notag
$$
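where $V^\varepsilon(\tau)=\Phi_{\tau\varepsilon^{-1}\Lambda}P(v^\varepsilon(\tau))$. Since the operators $\Phi_w$ are unitary, $|V^\varepsilon(\tau)|=|P(v^\varepsilon(\tau))|$, so by Assumption 2.1 we have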
$$
\begin{equation*}
\mathsf{E}\sup_{\tau\in[0,T]}|V^\varepsilon(\tau)| \leqslant\mathcal{C}^{m_0}(P)\, \mathsf{E}\Bigl(1+\sup_{\tau\in[0,T]}|v^\varepsilon(\tau)|\Bigr)^{m_0} \leqslant C(|v_0|)<\infty.
\end{equation*}
\notag
$$
Therefore, by Chebyshev’s inequality,
$$
\begin{equation*}
\mathsf{P}\Bigl(\sup_{\tau\in[0,T]}|V^\varepsilon(\tau)|>R\Bigr) \leqslant C(|v_0|) R^{-1},
\end{equation*}
\notag
$$
uniformly in $\varepsilon\in(0,1]$. Since
$$
\begin{equation*}
a^\varepsilon(\tau) =v_0+\int_0^\tau V^\varepsilon(s)\,ds+\zeta^\varepsilon(\tau),
\end{equation*}
\notag
$$
from the above we get that
$$
\begin{equation}
\mathsf{P}\bigl(\|a^\varepsilon(\,{\cdot}\,)\|_{1/3}>R\bigr) \to0 \quad\text{as $R\to\infty$},
\end{equation}
\tag{2.13}
$$
uniformly in $\varepsilon\in(0,1]$. By the Ascoli–Arzelà theorem the sets $\overline{B}_R(C^{1/3})$ are compact in $C([0,T];\mathbb{C}^n)$, and in view of (2.13), for any $\delta>0$ there exists $R_\delta$ such that
$$
\begin{equation*}
Q_\varepsilon \bigl(\overline{B}_{R_{\delta}}(C^{1/3})\bigr)\geqslant1-\delta \quad \forall\,\varepsilon\in(0,1].
\end{equation*}
\notag
$$
So by Prohorov’s theorem the set of measures $\{Q_\varepsilon,0<\varepsilon\leqslant1\}$ is pre-compact in $\mathcal{P}(C([0,T];\mathbb{C}^n))$. $\Box$ By this lemma, for any sequence $\varepsilon_l\to0$ there exist a subsequence $\varepsilon_l'\to0$ and a measure $Q_0\in\mathcal{P}(C([0,T];\mathbb{C}^n))$ such that
$$
\begin{equation}
Q_{\varepsilon_l'} \rightharpoonup Q_0 \quad\text{as $\varepsilon_l'\to0$}.
\end{equation}
\tag{2.14}
$$
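The moment computation for $\zeta^\varepsilon$ in the proof of Lemma 2.2 can be checked by a Monte Carlo experiment: the estimates of $\mathsf{E}|\zeta_k^\varepsilon(\tau)|^2$ should be close to $2C_k^\zeta\tau$ for every $\varepsilon$, and the estimates of $\mathsf{E}(\zeta_k^\varepsilon(\tau))^2$ should be close to $0$. A minimal Python sketch, assuming a left-point (Itô) discretization of the stochastic integral; the function names, the row $(\Psi_{k1},\Psi_{k2})$ and the numerical parameters are hypothetical.

```python
import cmath
import math
import random


def sample_zeta(psi_row, lam, eps, tau, steps, rng):
    """One sample of zeta_k^eps(tau) = int_0^tau exp(i s lam / eps) sum_l Psi_kl d beta_l^c(s),
    approximated by a left-point (Ito) sum on a uniform grid."""
    h = tau / steps
    sd = math.sqrt(h)
    z = 0j
    for j in range(steps):
        db = sum(p * complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for p in psi_row)
        z += cmath.exp(1j * j * h * lam / eps) * db
    return z


def zeta_moments(psi_row, lam, eps, tau, steps=100, samples=2000, seed=0):
    """Monte Carlo estimates of E|zeta_k|^2 and E zeta_k^2, and the value 2 C_k^zeta tau."""
    rng = random.Random(seed)
    zs = [sample_zeta(psi_row, lam, eps, tau, steps, rng) for _ in range(samples)]
    second = sum(abs(z) ** 2 for z in zs) / samples
    pseudo = sum(z * z for z in zs) / samples
    c_k = sum(abs(p) ** 2 for p in psi_row)
    return second, pseudo, 2.0 * c_k * tau
```

Calling `zeta_moments([1.0, 0.5j], 3.0, eps, 1.0)` for different values of `eps` gives second moments near $2C_k^\zeta\tau=2.5$ in each case, which illustrates why the bound on $\zeta^\varepsilon$ in the proof is uniform in $\varepsilon$.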
3. Averaging vector fields with respect to the frequency vector

For a vector field $\widetilde{P}\in\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n)$ we denote
$$
\begin{equation*}
Y_{\widetilde{P}}(a;t) =\bigl((\Phi_{t\Lambda})_*\widetilde{P}\bigr)(a) =\Phi_{t\Lambda}\circ\widetilde{P}(\Phi_{-t\Lambda }a), \qquad a\in\mathbb{C}^n, \quad t\in\mathbb{R},
\end{equation*}
\notag
$$
and for $T'>0$ we define the partial averaging $ \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}$ of the vector field $\widetilde{P}$ with respect to the frequency vector $\Lambda$ as follows:
$$
\begin{equation}
\langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a) =\frac{1}{T'} \int_0^{T'}Y_{\widetilde{P}}(a;t)\,dt =\frac{1}{T'} \int_0^{T'}\Phi_{t\Lambda}\circ\widetilde{P}(\Phi_{-t\Lambda}a)\,dt.
\end{equation}
\tag{3.1}
$$
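If the frequencies $\lambda_j$ are integers, then $t\mapsto Y_{\widetilde{P}}(a;t)$ is $2\pi$-periodic, and the partial average (3.1) over $T'=2\pi$ already coincides with the limit (3.2) defined below. For a monomial $v^\alpha\bar v^\beta$ in the $k$th component of $\widetilde{P}$ one has $Y=e^{it(\lambda_k-\Lambda\cdot(\alpha-\beta))}a^\alpha\bar a^\beta$, so it survives averaging exactly when $\lambda_k=\Lambda\cdot(\alpha-\beta)$. The Python sketch below approximates (3.1) by the midpoint rule; the choice $\Lambda=(1,2)$ and the field `P_example` are hypothetical and serve only as an illustration.

```python
import cmath
import math


def rotate(w, z):
    """Rotation operator Phi_w from (2.10)."""
    return [cmath.exp(1j * wk) * zk for wk, zk in zip(w, z)]


def partial_average(P, Lam, a, T_prime, nodes=2000):
    """Midpoint-rule approximation of the partial average <<P>>^{T'}(a) in (3.1)."""
    h = T_prime / nodes
    acc = [0j] * len(a)
    for j in range(nodes):
        w = [(j + 0.5) * h * lk for lk in Lam]          # t * Lambda at the midpoint t
        Y = rotate(w, P(rotate([-wk for wk in w], a)))  # Y_P(a; t)
        acc = [s + y for s, y in zip(acc, Y)]
    return [s / nodes for s in acc]                     # (1/T') * sum of Y * h


def P_example(v):
    """Hypothetical polynomial vector field on C^2 used for illustration."""
    v1, v2 = v
    return [v1.conjugate() * v2 + v1 ** 2, v1 ** 2 + v2]
```

For $\Lambda=(1,2)$ the non-resonant term $v_1^2$ in the first component averages out, and `partial_average(P_example, [1.0, 2.0], a, 2 * math.pi)` returns, up to rounding, $(\bar a_1a_2,\ a_1^2+a_2)$. The same computation at $\Phi_{-\theta\Lambda}a$, followed by $\Phi_{\theta\Lambda}$, returns the same vector, in agreement with Lemma 3.3.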
Lemma 3.1. For any $T'>0$
$$
\begin{equation*}
\langle\!\langle {\widetilde{P}} \rangle\!\rangle ^{T'}(a) \in\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n) \quad\text{and}\quad \mathcal{C}^{m_0} ( \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}) \leqslant\mathcal{C}^{m_0}(\widetilde{P})
\end{equation*}
\notag
$$
(see (1.5)). Proof. If $a\in \overline{B}_R(\mathbb{C}^n)$, then $\Phi_{-t\Lambda }a\in \overline{B}_R(\mathbb{C}^n)$ for each $t$. So
$$
\begin{equation*}
|Y_{\widetilde{P}}(a;t)| =|(\Phi_{t\Lambda})_* \widetilde{P}(a)| =|\widetilde{P}(\Phi_{-t\Lambda}a)|,
\end{equation*}
\notag
$$
and thus
$$
\begin{equation*}
| \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a)| \leqslant\sup_{0\leqslant t\leqslant {T'}}|Y_{\widetilde{P}}(a;t)| \leqslant\sup_{b\in\overline{B}_R(\mathbb{C}^n)}|\widetilde{P}(b)|.
\end{equation*}
\notag
$$
Similarly, for any $a_1,a_2\in\overline{B}_R(\mathbb{C}^n)$,
$$
\begin{equation*}
\begin{aligned} \, |Y_{\widetilde{P}}(a_1;t)-Y_{\widetilde{P}}(a_2;t)| & =|\widetilde{P}(\Phi_{-t\Lambda }a_1)-\widetilde{P}(\Phi_{-t\Lambda }a_2)|\\ & \leqslant\operatorname{Lip}\bigl(\widetilde{P}|_{\overline{B}_R(\mathbb{C}^n)}\bigr)\,|a_1-a_2| \quad \forall\,t\geqslant0, \end{aligned}
\end{equation*}
\notag
$$
so that
$$
\begin{equation*}
\bigl| \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a_1)- \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a_2)\bigr| \leqslant\operatorname{Lip}\bigl(\widetilde{P}|_{\overline{B}_R(\mathbb{C}^n)}\bigr)\,|a_1-a_2|.
\end{equation*}
\notag
$$
Adding the two estimates for $\langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}$, multiplying by $(1+R)^{-m_0}$ and taking the supremum over $R\geqslant1$, by (1.5) we obtain $\mathcal{C}^{m_0}(\langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'})\leqslant\mathcal{C}^{m_0}(\widetilde{P})$. $\square$
We define the averaging of the vector field $\widetilde{P}$ with respect to the frequency vector $\Lambda$ by
$$
\begin{equation}
\langle\!\langle \widetilde{P} \rangle\!\rangle (a) =\lim_{T'\to\infty} \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a) =\lim_{T'\to\infty}\frac{1}{T'}\int_0^{T'}(\Phi_{t\Lambda})_*\widetilde{P}(a)\,dt.
\end{equation}
\tag{3.2}
$$
Lemma 3.2. (1) The limit (3.2) exists for any $a$. Moreover, $ \langle\!\langle \widetilde{P} \rangle\!\rangle $ belongs to $\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n)$ and $\mathcal{C}^{m_0}( \langle\!\langle \widetilde{P} \rangle\!\rangle ) \leqslant \mathcal{C}^{m_0}(\widetilde{P})$. (2) If $a\in \overline{B}_R(\mathbb{C}^n)$, then the rate of convergence in (3.2) does not depend on $a$ but only on $R$. This is the main lemma of deterministic averaging for vector fields. See [13], Lemma 3.1, for its proof.²

² In fact, if $\widetilde{P}$ is an arbitrary locally Lipschitz vector field, then $ \langle\!\langle \widetilde{P} \rangle\!\rangle $ is well defined and locally Lipschitz, with local Lipschitz constant not exceeding that of $\widetilde{P}$ (see the discussion after Assumption 2.1): see [13].

The averaged vector field $ \langle\!\langle \widetilde{P} \rangle\!\rangle $ is invariant with respect to the transformations $\Phi_{\theta\Lambda}$. Lemma 3.3. For all $a\in \mathbb{C}^n$ and $\theta \in \mathbb{R}$,
$$
\begin{equation*}
\bigl(\Phi_{\theta\Lambda}\bigr)_* \langle\!\langle \widetilde{P} \rangle\!\rangle (a) \equiv \Phi_{\theta\Lambda}\circ \langle\!\langle \widetilde{P} \rangle\!\rangle \circ\Phi_{-\theta\Lambda}(a) = \langle\!\langle \widetilde{P} \rangle\!\rangle (a).
\end{equation*}
\notag
$$
Proof. For definiteness let $\theta>0$. For any ${T'}>0$ we have
$$
\begin{equation*}
\begin{aligned} \, \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(\Phi_{-\theta\Lambda}(a)) & =\frac{1}{T'} \int_0^{T'}\Phi_{t\Lambda }\circ \widetilde{P}(\Phi_{-t\Lambda} \circ\Phi_{-\theta\Lambda}(a))\,dt \\ & =\frac{1}{T'}\int_0^{T'}\Phi_{t\Lambda} \circ\widetilde{P}(\Phi_{-(t+\theta)\Lambda}a)\,dt. \end{aligned}
\end{equation*}
\notag
$$
Since $\Phi_{t\Lambda }=\Phi_{-\theta\Lambda}\circ \Phi_{(t+\theta)\Lambda}$, this equals
$$
\begin{equation*}
\frac{1}{T'}\, \Phi_{-\theta\Lambda}\biggl(\int_0^{T'}\Phi_{(t+\theta)\Lambda} \circ\widetilde{P}(\Phi_{-(t+\theta)\Lambda}a)\,dt\biggr) = \Phi_{-\theta\Lambda }\circ \langle\!\langle \widetilde{P} \rangle\!\rangle ^{T'}(a) +O\biggl(\frac{1}{T'}\biggr).
\end{equation*}
\notag
$$
Here the term $O(1/T')$ arises as follows: after the substitution $s=t+\theta$ the integral in the last display is taken over $[\theta,T'+\theta]$ rather than over $[0,T']$, and since the operators $\Phi_{s\Lambda}$ are unitary, the two integrals differ by at most $2\theta\sup_{|z|=|a|}|\widetilde{P}(z)|$. Passing to the limit as $T'\to\infty$ we obtain the assertion. $\Box$

The statement below asserts that the averaged vector field $ \langle\!\langle {P} \rangle\!\rangle $ is at least as smooth as $P$.

Proposition 3.4. If $P\in C^m(\mathbb{C}^n)$ for some $m\in\mathbb{N}$, then $ \langle\!\langle {P} \rangle\!\rangle \in C^m(\mathbb{C}^n)$ and $| \langle\!\langle {P} \rangle\!\rangle |_{C^m(\overline{B}_R)}\leqslant|P|_{C^m(\overline{B}_R)}$ for all $R>0$.

Proof. First fix some $R>0$. Then there exists a sequence of polynomial vector fields $\{P_{R,j},\,j\in\mathbb{N}\}$ (cf. § 3.1.3) such that $|P_{R,j}-P|_{C^m(\overline{B}_R)}\to0$ as $j\to\infty$. An easy calculation shows that
$$
\begin{equation}
| \langle\!\langle P_{R,j} \rangle\!\rangle ^T- \langle\!\langle P_{R,j} \rangle\!\rangle |_{C^m(\overline{B}_R)} \to0 \quad \text{as $T\to\infty$},
\end{equation}
\tag{3.3}
$$
for each $j$. Since the transformations $\Phi_{t\Lambda}$ are unitary, differentiating the integral in (3.1) with respect to $a$ we get that
$$
\begin{equation}
| \langle\!\langle \widetilde{P} \rangle\!\rangle ^T|_{C^m(\overline{B}_R)} \leqslant|\widetilde{P}|_{C^m(\overline{B}_R)} \quad \forall\,T>0,
\end{equation}
\tag{3.4}
$$
for any $C^m$-smooth vector field $\widetilde{P}$. Therefore,
$$
\begin{equation}
\begin{aligned} \, | \langle\!\langle P_{R,j} \rangle\!\rangle ^T- \langle\!\langle P \rangle\!\rangle ^T|_{C^m(\overline{B}_R)} & \leqslant|P_{R,j}-P|_{C^m(\overline{B}_R)} \notag\\ & =:\kappa_j\to 0 \quad \text{as $j\to\infty$}, \quad \forall\,T>0. \end{aligned}
\end{equation}
\tag{3.5}
$$
So
$$
\begin{equation*}
| \langle\!\langle P_{R,j} \rangle\!\rangle ^T- \langle\!\langle P_{R,k} \rangle\!\rangle ^T|_{C^m(\overline{B}_R)} \leqslant \kappa_j+\kappa_k \quad \forall\,T>0.
\end{equation*}
\notag
$$
From this estimate and (3.3) we find that
$$
\begin{equation*}
| \langle\!\langle P_{R,j} \rangle\!\rangle - \langle\!\langle P_{R,k} \rangle\!\rangle |_{C^m(\overline{B}_R)} \leqslant \kappa_j+\kappa_k.
\end{equation*}
\notag
$$
Thus $\{ \langle\!\langle P_{R,j} \rangle\!\rangle \}$ is a Cauchy sequence in $C^m(\overline{B}_R)$, so it converges in $C^m(\overline{B}_R)$ to a limiting field $ \langle\!\langle P_{R,\infty} \rangle\!\rangle $. As $P_{R,j}$ converges to $P$ in $C^m(\overline{B}_R)$, using (3.4) again we find that $| \langle\!\langle P_{R,\infty} \rangle\!\rangle |_{C^m(\overline{B}_R)}\leqslant|P|_{C^m(\overline{B}_R)}$. On the other hand, letting $T\to\infty$ in (3.5) and using Lemma 3.2, we obtain $| \langle\!\langle P_{R,j} \rangle\!\rangle - \langle\!\langle P \rangle\!\rangle |_{C(\overline{B}_R)}\leqslant\kappa_j$, so $ \langle\!\langle P_{R,\infty} \rangle\!\rangle $ must be equal to $ \langle\!\langle P \rangle\!\rangle $ on $\overline{B}_R$. Since $R>0$ is arbitrary, the assertion of the proposition follows. $\Box$

Finally, we note that if a vector field $P$ is Hamiltonian, then its averaging $ \langle\!\langle P \rangle\!\rangle $ is also Hamiltonian. Looking ahead, we state the corresponding result here, although the averaging of functions $\langle\,\cdot\,\rangle$ is only defined in § 3.2 below.

Proposition 3.5. If a locally Lipschitz vector field $P$ is Hamiltonian, that is,
$$
\begin{equation*}
P(z) =i\frac{\partial}{\partial \bar{z}}H(z)
\end{equation*}
\notag
$$
for some $C^1$-smooth function $H$, then $ \langle\!\langle P \rangle\!\rangle $ is also Hamiltonian and
$$
\begin{equation*}
\langle\!\langle P \rangle\!\rangle =i \frac{\partial}{\partial \bar{z}}\langle H\rangle.
\end{equation*}
\notag
$$
For a proof see [13], Theorem 5.2.

3.1. Calculating averagings

3.1.1. The frequency vector $\Lambda= (\lambda_1,\dots,\lambda_n)$ is called completely resonant if its components $\lambda_j$ are integer multiples of some $\lambda>0$, that is, if $\lambda_j/\lambda \in\mathbb{Z}$ for all $j$. In this case all trajectories of the original linear system (2.1) are periodic, the operator $\Phi_{t\Lambda}$ is $2\pi/\lambda$-periodic in $t$, and so
$$
\begin{equation}
\langle\!\langle \widetilde{P} \rangle\!\rangle (a) = \langle\!\langle \widetilde{P} \rangle\!\rangle ^{2\pi/\lambda}(a) =\frac{\lambda}{2\pi}\int_0^{2\pi/\lambda}(\Phi_{t\Lambda})_*\widetilde{P}(a)\,dt.
\end{equation}
\tag{3.6}
$$
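Formula (3.6) reduces the averaging to an integral over one period, which is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: it takes $\Phi_{t\Lambda}$ to multiply the $j$th coordinate by $e^{it\lambda_j}$ (as in (4.1)) and $(\Phi_{t\Lambda})_*\widetilde{P}(a)=\Phi_{t\Lambda}\widetilde{P}(\Phi_{-t\Lambda}a)$ (as in the proof of Lemma 3.3). The helper names `rotate` and `period_average` and the polynomial field `P` on $\mathbb{C}^2$ with $\Lambda=(1,2)$ are hypothetical. An equispaced rectangle rule over one period is exact, up to rounding, for trigonometric polynomials of low degree, so the result coincides with the resonant terms of `P`.

```python
import numpy as np

def rotate(a, lam, t):
    """Phi_{t Lambda} (hypothetical helper): multiply the j-th coordinate of a by exp(i t lambda_j)."""
    return np.exp(1j * t * lam) * a

def period_average(P, a, lam, base, num=64):
    """<<P>>(a) by formula (3.6), assuming lambda_j / base is an integer for every j.

    Equispaced rectangle rule on [0, 2*pi/base); it is exact (up to rounding) when the
    integrand is a trigonometric polynomial in t of degree below num.
    """
    ts = (2.0 * np.pi / base) * np.arange(num) / num
    # (Phi_{t Lambda})_* P (a) = Phi_{t Lambda} P(Phi_{-t Lambda} a)
    vals = [rotate(P(rotate(a, lam, -t)), lam, t) for t in ts]
    return np.mean(vals, axis=0)

# Hypothetical example on C^2 with Lambda = (1, 2), i.e. lambda = 1.
lam = np.array([1.0, 2.0])

def P(a):
    a1, a2 = a
    return np.array([np.conj(a1) * a2 + a1 * np.conj(a2),   # only conj(a1) a2 is resonant
                     a1**2 + abs(a1)**2])                   # only a1^2 is resonant

a = np.array([1.0 + 0.5j, -0.3 + 2.0j])
expected = np.array([np.conj(a[0]) * a[1], a[0]**2])
print(np.allclose(period_average(P, a, lam, base=1.0), expected))  # True
```

The non-resonant terms oscillate like $e^{2it}$ and average out exactly over a period, which is why no limit $T'\to\infty$ is needed in the completely resonant case.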
Completely resonant linear systems (2.1) and their perturbations (1.1) often occur in applications, in particular in non-equilibrium statistical physics. There the dimension $D=2n$ is large, all the $\lambda_j$ are equal, and the Wiener process $W(t)$ in (1.1) can be very degenerate (it can have just two non-zero components); see, for example, [9], where more references can be found.

3.1.2. Consider the case opposite to the above and assume that the frequency vector $\Lambda$ is non-resonant:
$$
\begin{equation}
\sum_{j=1}^nm_j\lambda_j\neq0 \quad \forall (m_1,\dots,m_n)\in\mathbb{Z}^n\setminus\{0\}
\end{equation}
\tag{3.7}
$$
(that is, the real numbers $\lambda_j$ are rationally independent). Then
$$
\begin{equation}
\langle\!\langle \widetilde{P} \rangle\!\rangle (a) =\frac{1}{(2\pi)^n}\int_{\mathbb{T}^n} (\Phi_w)_*\widetilde{P}(a)\,dw, \qquad \mathbb{T}^n=\mathbb{R}^n/(2\pi\mathbb{Z}^n).
\end{equation}
\tag{3.8}
$$
Indeed, if $\widetilde{P}$ is a polynomial vector field, then (3.8) follows easily from (3.2) by direct componentwise calculation. The general case is a consequence of this result, since any locally Lipschitz vector field can be approximated by polynomial fields uniformly on bounded sets, and both sides of (3.8) depend continuously on $\widetilde{P}$ with respect to such convergence (the operators $\Phi_w$ are unitary). Details are left to the reader (cf. Lemma 3.5 in [13], where $\widetilde{P}^{{\rm res}}$ equals the right-hand side of (3.8) if the vector $\Lambda$ is non-resonant). The right-hand side of (3.8) is obviously invariant with respect to all rotations $\Phi_{w'}$, so it does not depend on the vector $a$, but only depends on the corresponding torus
$$
\begin{equation}
\{z\in\mathbb{C}^n\colon I_j(z)=I_j(a)\ \forall j\}
\end{equation}
\tag{3.9}
$$
(see (1.2)) to which $a$ belongs, and
$$
\begin{equation}
(\Phi_{w})_* \langle\!\langle \widetilde{P} \rangle\!\rangle (a) \equiv \langle\!\langle \widetilde{P} \rangle\!\rangle (a) \quad \forall\,w\in\mathbb{T}^n \quad\text{if $\Lambda$ is non-resonant}.
\end{equation}
\tag{3.10}
$$
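For a non-resonant $\Lambda$ the averaging (3.8) is an integral over the torus $\mathbb{T}^n$, and $\Lambda$ itself no longer appears. The sketch below evaluates the right-hand side of (3.8) on an equispaced grid and checks the equivariance (3.10); it assumes, as before, that $\Phi_w$ multiplies the $j$th coordinate by $e^{iw_j}$. The function `torus_average` and the polynomial field `P` are hypothetical illustrations. For rationally independent frequencies the resonance condition $\Lambda\cdot(\alpha-\beta)=\lambda_j$ of § 3.1.3 reduces to $\alpha-\beta=e_j$, so only the terms $|a_2|^2a_1$ and $|a_1|^2a_2$ of `P` survive.

```python
import itertools
import numpy as np

def torus_average(P, a, K=16):
    """Right-hand side of (3.8): (2 pi)^{-n} times the integral over T^n of Phi_w P(Phi_{-w} a) dw.

    Uses a K^n equispaced grid, which is exact (up to rounding) when the integrand is a
    trigonometric polynomial in w with all frequencies of modulus below K.
    """
    nodes = 2.0 * np.pi * np.arange(K) / K
    total = np.zeros(len(a), dtype=complex)
    for w in itertools.product(nodes, repeat=len(a)):
        rot = np.exp(1j * np.array(w))        # Phi_w: a_j -> exp(i w_j) a_j
        total += rot * P(np.conj(rot) * a)    # conj(rot) realises Phi_{-w}
    return total / K ** len(a)

def P(a):   # hypothetical polynomial field on C^2
    a1, a2 = a
    return np.array([abs(a2)**2 * a1 + a2**2, abs(a1)**2 * a2 + a1])

a = np.array([1.0 + 0.5j, -0.3 + 2.0j])
avg = torus_average(P, a)
print(np.allclose(avg, [abs(a[1])**2 * a[0], abs(a[0])**2 * a[1]]))   # True

# Equivariance (3.10): averaging at Phi_{w'} a gives Phi_{w'} applied to <<P>>(a).
rot = np.exp(1j * np.array([0.7, -1.3]))
print(np.allclose(torus_average(P, rot * a), rot * avg))              # True
```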
See § 6 below for a discussion of equations (1.1) with non-resonant vectors $\Lambda$.

3.1.3. If the field $\widetilde{P}$ in (3.2) is polynomial, that is,
$$
\begin{equation}
\widetilde{P}_j(a) =\sum_{|\alpha|, |\beta| \leqslant N} C_j^{\alpha, \beta} a^\alpha \bar a^\beta, \qquad j=1,\dots,n,
\end{equation}
\tag{3.11}
$$
for some $N\in\mathbb{N}$, where $\alpha, \beta \in \mathbb{Z}_+^n$, $a^\alpha=\prod a_j^{\alpha_j}$ and $|\alpha|=\sum |\alpha_j|$, then $ \langle\!\langle \widetilde{P} \rangle\!\rangle =\widetilde{P}^{{\rm res}}$. Here $\widetilde{P}^{{\rm res}}$ is the polynomial vector field such that for each $j$ the component $\widetilde{P}^{{\rm res}}_j(a)$ is given by the right-hand side of (3.11), where the sum is taken only over those $|\alpha|,|\beta|\leqslant N$ that satisfy $\Lambda \cdot (\alpha-\beta)=\lambda_j$. Indeed, the $j$th component of $(\Phi_{t\Lambda})_*\widetilde{P}(a)$ equals $\sum C_j^{\alpha,\beta}e^{it(\lambda_j-\Lambda\cdot(\alpha-\beta))}a^\alpha\bar a^\beta$, and the time average of $e^{it\mu}$ over $[0,T']$ tends to zero as $T'\to\infty$ unless $\mu=0$ (see [13], Lemma 3.).

3.2. Averaging functions

Similarly to definition (3.2), for a locally Lipschitz function $f\in \operatorname{Lip}_m (\mathbb{C}^n, \mathbb{C})$, $m\geqslant0$, we define its averaging with respect to a frequency vector $\Lambda$ by
$$
\begin{equation}
\langle f\rangle(a) =\lim_{{T'}\to\infty}\frac{1}{T'}\int_0^{T'}f(\Phi_{-t\Lambda }a)\,dt, \qquad a\in\mathbb{C}^n.
\end{equation}
\tag{3.12}
$$
Then, using the same argument as above, we obtain the following lemma.

Lemma 3.6. Let $f\in \operatorname{Lip}_m (\mathbb{C}^n, \mathbb{C})$. Then the following assertions are true. (1) The limit (3.12) exists for every $a$, and for $a\in \overline{B}_R(\mathbb{C}^n)$ the rate of convergence in (3.12) does not depend on $a$, but only on $R$. (2) $\langle f\rangle \in \operatorname{Lip}_{m}(\mathbb{C}^n,\mathbb{C})$ and $\mathcal{C}^m(\langle f\rangle ) \leqslant \mathcal{C}^m(f)$. (3) If $f$ is $C^k$-smooth for some $k\in \mathbb{N}$, then so is $\langle f\rangle$, and the $C^k$-norm of the latter is bounded by the $C^k$-norm of the former. (4) The function $\langle f\rangle$ commutes with the operators $\Phi_{\theta\Lambda}$, $ \theta\in\mathbb{R}$, in the sense of the equalities $\langle f\circ\Phi_{\theta\Lambda}\rangle=\langle f\rangle\circ\Phi_{\theta\Lambda}=\langle f\rangle$.

If the vector $\Lambda$ is non-resonant, then similarly to (3.8) we have
$$
\begin{equation}
\langle {f}\rangle(a) =\frac{1}{(2\pi)^n}\int_{\mathbb{T}^n} {f}(\Phi_{-w}a)\,dw.
\end{equation}
\tag{3.13}
$$
The right-hand side of (3.13) is the average of the function $f$ over the angles $w\in\mathbb{T}^n$. It is constant on the tori (3.9).
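The explicit description in § 3.1.3 amounts to filtering monomials, and the same is true for polynomial functions: since $f(\Phi_{-t\Lambda}a)$ multiplies a monomial $a^\alpha\bar a^\beta$ by $e^{-it\Lambda\cdot(\alpha-\beta)}$, the average (3.12) keeps exactly the monomials with $\Lambda\cdot(\alpha-\beta)=0$. The sketch below encodes a polynomial as a dictionary $\{(\alpha,\beta)\colon C\}$; this encoding, the function `resonant_terms`, and the example polynomials are hypothetical. The tolerance `tol` is also an assumption: exact resonance tests in floating point are reliable only for commensurate (for example, integer) frequencies.

```python
import numpy as np

def resonant_terms(terms, lam, target, tol=1e-12):
    """Keep the monomials C a^alpha conj(a)^beta with Lambda . (alpha - beta) = target.

    terms : dict {(alpha, beta): C} describing one polynomial on C^n (hypothetical encoding).
    For the j-th component of a vector field (section 3.1.3) use target = lambda_j;
    for a polynomial function averaged as in (3.12) use target = 0.
    """
    lam = np.asarray(lam, dtype=float)
    return {(al, be): c for (al, be), c in terms.items()
            if abs(lam @ (np.array(al) - np.array(be)) - target) < tol}

lam = [1.0, 2.0]    # completely resonant example (hypothetical)
P1 = {((0, 1), (1, 0)): 1.0, ((1, 0), (0, 1)): 1.0}   # conj(a1) a2 + a1 conj(a2)
P2 = {((2, 0), (0, 0)): 1.0, ((1, 0), (1, 0)): 1.0}   # a1^2 + |a1|^2
res = [resonant_terms(P1, lam, lam[0]), resonant_terms(P2, lam, lam[1])]
print(res)   # [{((0, 1), (1, 0)): 1.0}, {((2, 0), (0, 0)): 1.0}]

f = {((2, 0), (0, 1)): 1.0, ((1, 0), (0, 0)): 3.0}     # a1^2 conj(a2) + 3 a1
print(resonant_terms(f, lam, 0.0))                    # {((2, 0), (0, 1)): 1.0}
```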
4. Effective equation and the averaging theorem

In this section we show that the limiting measure $Q_0$ in (2.14) is independent of the choice of the sequence $\varepsilon'_l\to0$, so that $\mathcal{D}(a^\varepsilon)\rightharpoonup Q_0$ as $\varepsilon\to0$, and we represent $Q_0$ as the law of a solution of an auxiliary effective equation. The drift in this equation is the averaged drift in (2.6). Now we construct its dispersion. The diffusion matrix for (2.11) is the complex $n\times n$ matrix
$$
\begin{equation*}
\mathcal{A}^\varepsilon(\tau) =(\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi) \cdot(\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi)^*.
\end{equation*}
\notag
$$
Setting
$$
\begin{equation}
\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi = \bigl(e^{i\tau\varepsilon^{-1}\lambda_l}\Psi_{lj}\bigr) =:(\psi^\varepsilon_{lj}(\tau))=\psi^\varepsilon(\tau),
\end{equation}
\tag{4.1}
$$
we have
$$
\begin{equation*}
\mathcal{A}^\varepsilon_{kj}(\tau) =\sum_{l=1}^{n_1}\psi^\varepsilon_{kl}(\tau)\overline{\psi^\varepsilon_{jl}}(\tau) =e^{i \tau\varepsilon^{-1}(\lambda_k-\lambda_j)} \sum_{l=1}^{n_1}\Psi_{kl}\overline{\Psi}_{jl}.
\end{equation*}
\notag
$$
So for any $\tau>0$,
$$
\begin{equation*}
\frac{1}{\tau}\int_0^{\tau}\mathcal{A}^\varepsilon_{kj}(s)\,ds =\biggl(\sum_{l=1}^{n_1}\Psi_{kl}\overline{\Psi}_{jl}\biggr) \,\frac{1}{\tau} \int_0^\tau e^{is\varepsilon^{-1}(\lambda_k-\lambda_j)}\,ds,
\end{equation*}
\notag
$$
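and since $\bigl|\frac{1}{\tau}\int_0^\tau e^{is\mu}\,ds\bigr|\leqslant\frac{2}{\tau|\mu|}$ for any real $\mu\neq0$, we immediately see that
$$
\begin{equation}
\frac{1}{\tau}\int_0^{\tau}\mathcal{A}^\varepsilon_{kj}(s)\,ds \to A_{kj} \quad \text{as $\varepsilon\to0$},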
\end{equation}
\tag{4.2}
$$
where
$$
\begin{equation}
A_{kj} =\begin{cases}\displaystyle \sum_{l=1}^{n_1}\Psi_{kl}\overline{\Psi}_{jl} & \text{if }\lambda_k=\lambda_j,\\ 0 & \text{otherwise}. \end{cases}
\end{equation}
\tag{4.3}
$$
Clearly, $A_{kj}=\bar A_{jk}$, so that $A$ is a Hermitian matrix. If $\lambda_k\neq\lambda_j$ for $k\neq j$, then
$$
\begin{equation}
A =\operatorname{diag}\{b_1,\dots,b_n\}, \qquad b_k=\sum_{l=1}^{n_1}|\Psi_{kl}|^2.
\end{equation}
\tag{4.4}
$$
For any vector $\xi\in\mathbb{C}^n$, from (4.2) we obtain $\langle A\xi,\xi\rangle\geqslant0$, since $\langle \mathcal{A}^\varepsilon(\tau)\xi,\xi\rangle=|(\psi^\varepsilon(\tau))^*\xi|^2\geqslant0$ for each $\varepsilon$ and $\tau$. Therefore, $A$ is a non-negative Hermitian matrix, and there exists another non-negative Hermitian matrix $B$ (called the principal square root of $A$) such that $BB^*=B^2=A$. The matrix $B$ is non-singular if $\Psi$ is.

Example 4.1. If $\Psi$ is a diagonal matrix $\operatorname{diag}\{\psi_1,\dots,\psi_n\}$, $\psi_j\in\mathbb{R}$, then $\mathcal{A}^\varepsilon(\tau)= |\Psi|^2$. In this case $A=|\Psi|^2$ and $B=|\Psi|=\operatorname{diag}\{|\psi_1|,\dots,|\psi_n|\}$.

In fact, it is not necessary that $B$ be a Hermitian square matrix: the argument below remains true if as $B$ we take any complex $n\times N$ matrix (for any $N\in\mathbb{N}$) satisfying the equation
$$
\begin{equation*}
BB^*=A.
\end{equation*}
\notag
$$
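The effective diffusion $A$ in (4.3) and its principal square root $B$ are explicit, and the convergence (4.2) can be checked using the closed form $\frac{1}{\tau}\int_0^\tau e^{is\mu}\,ds=(e^{i\tau\mu}-1)/(i\tau\mu)$ for $\mu\neq0$. The sketch below computes $B$ from an eigendecomposition of $A$; the function names, the random matrix $\Psi$, and the frequencies are hypothetical, and equal frequencies are detected by exact floating-point comparison, which is an assumption suitable only for exactly representable values.

```python
import numpy as np

def effective_diffusion(Psi, lam):
    """Matrix A from (4.3): (Psi Psi^*)_{kj} if lambda_k = lambda_j, and 0 otherwise."""
    lam = np.asarray(lam, dtype=float)
    return np.where(lam[:, None] == lam[None, :], Psi @ Psi.conj().T, 0.0)

def principal_sqrt(A):
    """Non-negative Hermitian B with B B^* = B^2 = A, via an eigendecomposition of A."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)              # discard tiny negative round-off eigenvalues
    return (V * np.sqrt(w)) @ V.conj().T   # V diag(sqrt(w)) V^*

def averaged_diffusion(Psi, lam, eps, tau=1.0):
    """(1/tau) int_0^tau A^eps(s) ds, using the closed form of the oscillatory integral."""
    mu = np.subtract.outer(np.asarray(lam, float), np.asarray(lam, float)) / eps
    with np.errstate(divide="ignore", invalid="ignore"):
        factor = np.where(mu == 0, 1.0, (np.exp(1j * tau * mu) - 1) / (1j * tau * mu))
    return factor * (Psi @ Psi.conj().T)

rng = np.random.default_rng(0)
Psi = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
lam = [1.0, 1.0, 2.0]               # hypothetical frequencies with one resonant pair
A = effective_diffusion(Psi, lam)
B = principal_sqrt(A)
print(np.allclose(B @ B.conj().T, A), np.allclose(B, B.conj().T))        # True True
print(np.allclose(averaged_diffusion(Psi, lam, eps=1e-4), A, atol=1e-2))  # True
```

For a diagonal $\Psi$ with distinct frequencies, `principal_sqrt(effective_diffusion(Psi, lam))` returns $|\Psi|$, in agreement with Example 4.1.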
Now we define the effective equation for (2.11) as follows:
$$
\begin{equation}
da_k- \langle\!\langle P \rangle\!\rangle _k(a)\,d\tau =\sum_{l=1}^nB_{kl}\,d\beta^c_l, \qquad k=1,\dots,n.
\end{equation}
\tag{4.5}
$$
Here the matrix $B$ is as above and $ \langle\!\langle P \rangle\!\rangle $ is the resonant averaging of the vector field $P$. We will usually consider this equation with the same initial condition as equations (2.6) and (2.11):
$$
\begin{equation}
a(0)=v_0.
\end{equation}
\tag{4.6}
$$
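Since the effective equation (4.5) has a constant dispersion, it is straightforward to simulate. The Euler–Maruyama sketch below is not taken from the paper; it assumes that $\beta^c_l=\beta_l+i\beta'_l$ with independent standard real Wiener processes, so that $\mathsf{E}|\beta^c_l(\tau)|^2=2\tau$ (the normalization consistent with the factor 2 in (4.16) below). The drift $-a$ is a hypothetical choice: a linear field that commutes with every $\Phi_{t\Lambda}$ coincides with its own averaging. For this drift, Itô's formula gives $\mathsf{E}|a_k(\tau)|^2=|B_{kk}|^2(1-e^{-2\tau})$ when $a(0)=0$ and $B$ is diagonal, which the simulation should reproduce up to Monte Carlo and discretization errors.

```python
import numpy as np

def euler_maruyama_effective(drift, B, a0, T, dt, n_paths, rng):
    """Simulate da = drift(a) dtau + B dbeta^c on [0, T] for n_paths independent paths.

    Assumption: beta^c_l = beta_l + i beta'_l with independent standard real Wiener
    processes, so that E|beta^c_l(tau)|^2 = 2 tau.
    """
    n = len(a0)
    a = np.tile(np.asarray(a0, dtype=complex), (n_paths, 1))   # one row per path
    for _ in range(int(round(T / dt))):
        dbeta = np.sqrt(dt) * (rng.standard_normal((n_paths, n))
                               + 1j * rng.standard_normal((n_paths, n)))
        a = a + drift(a) * dt + dbeta @ B.T                     # row-wise B dbeta^c
    return a

rng = np.random.default_rng(1)
B = np.diag([1.0, 0.5])
a_T = euler_maruyama_effective(lambda a: -a, B, [0.0, 0.0], T=3.0, dt=0.01,
                               n_paths=20000, rng=rng)
# Ito's formula gives E|a_k(T)|^2 = |B_kk|^2 (1 - exp(-2T)) for this linear drift.
print(np.mean(np.abs(a_T) ** 2, axis=0))    # close to [1.0, 0.25]
```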
Since the vector field $ \langle\!\langle P \rangle\!\rangle $ is locally Lipschitz and the dispersion matrix $B$ is constant, a strong solution of (4.5), (4.6), if it exists, is unique. Note that the effective dispersion $B$ in (4.5) is a square root of an explicit matrix, and by § 3.1.3, if the vector field $P(v)$ is polynomial, then the effective drift $ \langle\!\langle P \rangle\!\rangle (a)$ is also given by an explicit formula.

Proposition 4.2. The limiting probability measure $Q_0$ in (2.14) is a weak solution of effective equation (4.5), (4.6).

We recall that a measure $Q \in\mathcal{P}(C([0,T];\mathbb{C}^n))$ is a weak solution of equation (4.5), (4.6) if $Q=\mathcal{D}(\tilde{a})$, where the random process $\tilde{a}(\tau)$, $0\leqslant \tau\leqslant T$, is a weak solution of (4.5), (4.6). (Concerning weak solutions of stochastic differential equations see, for example, [14], § 5.3.) The proof of this result is preceded by a number of lemmas. Until the end of this section we assume that Assumption 2.1 holds. As in § 3, we set
$$
\begin{equation}
Y(a;\tau\varepsilon^{-1}) := (\Phi_{\tau\varepsilon^{-1}\Lambda})_*P(a).
\end{equation}
\tag{4.7}
$$
Then equation (2.11) for $a^\varepsilon$ reads
$$
\begin{equation}
da^\varepsilon(\tau)-Y(a^\varepsilon,\tau\varepsilon^{-1})\,d\tau =\Phi_{\tau\varepsilon^{-1}\Lambda}\Psi\,d\beta^c(\tau).
\end{equation}
\tag{4.8}
$$
Set
$$
\begin{equation}
\tilde y(a,\tau\varepsilon^{-1})=Y(a,\tau\varepsilon^{-1})- \langle\!\langle P \rangle\!\rangle (a) =(\Phi_{\tau\varepsilon^{-1}\Lambda})_*P(a)- \langle\!\langle P \rangle\!\rangle (a).
\end{equation}
\tag{4.9}
$$
The following key lemma shows that the integrals of $\tilde y(a^\varepsilon,\tau\varepsilon^{-1})$ with respect to $\tau$ tend to zero as $\varepsilon\to0$, uniformly in the upper limit of integration.

Lemma 4.3. For a solution $a^\varepsilon(\tau)$ of equation (2.11), (2.12) we have
$$
\begin{equation*}
\mathsf{E}\max_{0\leqslant\tau\leqslant T} \biggl| \int_0^{\tau}\tilde y(a^\varepsilon(s),s\varepsilon^{-1})\,ds \biggr| \to0 \quad\textit{as $\varepsilon\to0$}.
\end{equation*}
\notag
$$
This lemma is proved at the end of this section. Now let us introduce the natural filtered measurable space
$$
\begin{equation}
(\widetilde{\Omega},\mathcal{B},\{\mathcal{B}_\tau,0\leqslant\tau\leqslant T\})
\end{equation}
\tag{4.10}
$$
for the problem we consider, where $\widetilde{\Omega}$ is the Banach space $C([0,T];\mathbb{C}^n)=\{a:= a(\,{\cdot}\,)\}$, $\mathcal{B}$ is its Borel $\sigma$-algebra, and $\mathcal{B}_\tau$ is the $\sigma$-algebra generated by the random variables $\{a(s)\colon 0\leqslant s\leqslant\tau\}$. Consider the process on $\widetilde{\Omega}$ defined by the left-hand side of (4.5):
$$
\begin{equation}
N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a) =a(\tau)-\int_0^\tau { \langle\!\langle P \rangle\!\rangle }(a(s))\,ds, \qquad a\in\widetilde{\Omega}, \quad \tau\in[0,T].
\end{equation}
\tag{4.11}
$$
Note that for any $0\leqslant\tau\leqslant T$, $N^{ \langle\!\langle P \rangle\!\rangle }(\tau;\cdot)$ is a $\mathcal{B}_\tau$-measurable continuous functional on $C([0,T];\mathbb{C}^n)$. Lemma 4.4. The random process $N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)$ is a martingale on the space (4.10) with respect to the limiting measure $Q_0$ in (2.14). Proof. Fix some $\tau\in[0,T]$ and consider a $\mathcal{B}_\tau$-measurable function $f^\tau\in C_b(\widetilde{\Omega})$. We show that
$$
\begin{equation}
\mathsf{E}^{Q_0}\bigl(N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)\bigr) =\mathsf{E}^{Q_0}\bigl(N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a)\bigr) \quad \text{for any $\tau\leqslant t\leqslant T$},
\end{equation}
\tag{4.12}
$$
which implies the assertion. To establish this, first consider the process
$$
\begin{equation*}
N^{Y,\varepsilon}(\tau; a^\varepsilon) :=a^\varepsilon(\tau) -\int_0^\tau Y(a^\varepsilon(s),s\varepsilon^{-1})\,ds,
\end{equation*}
\notag
$$
which is a martingale in view of (4.8). As
$$
\begin{equation*}
N^{Y,\varepsilon}(\tau;a^\varepsilon)-N^{{ \langle\!\langle P \rangle\!\rangle }}(\tau;a^\varepsilon) =\int_0^\tau \bigl[ { \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(s))-Y(a^\varepsilon(s), s\varepsilon^{-1}) \bigr]\,ds,
\end{equation*}
\notag
$$
by Lemma 4.3 we have
$$
\begin{equation}
\max_{0\leqslant\tau\leqslant T} \mathsf{E} \bigl| N^{Y,\varepsilon}(\tau; a^\varepsilon) - N^{{ \langle\!\langle P \rangle\!\rangle }}(\tau; a^\varepsilon) \bigr| =o_\varepsilon(1).
\end{equation}
\tag{4.13}
$$
Here and throughout this proof $o_\varepsilon(1)$ is a quantity tending to zero with $\varepsilon$. Since $N^{Y,\varepsilon}$ is a martingale, relation (4.13) implies that
$$
\begin{equation*}
\begin{aligned} \, \mathsf{E}\bigl(N^{{ \langle\!\langle P \rangle\!\rangle }}(t;a^\varepsilon)f^\tau(a^\varepsilon)\bigr) +o_\varepsilon(1) & =\mathsf{E}\bigl(N^{Y,\varepsilon}(t;a^\varepsilon)f^\tau(a^\varepsilon)\bigr) \\ & =\mathsf{E}\bigl(N^{Y,\varepsilon}(\tau;a^\varepsilon)f^\tau(a^\varepsilon)\bigr) \\ & =\mathsf{E} \bigl( N^{{ \langle\!\langle P \rangle\!\rangle }}(\tau;a^\varepsilon)f^\tau(a^\varepsilon) \bigr) +o_\varepsilon(1). \end{aligned}
\end{equation*}
\notag
$$
So
$$
\begin{equation}
\mathsf{E}^{ Q_\varepsilon} \bigl[ N^{{ \langle\!\langle P \rangle\!\rangle }}(t;a)f^\tau(a)-N^{{ \langle\!\langle P \rangle\!\rangle }}(\tau;a)f^\tau(a) \bigr] =o_\varepsilon(1).
\end{equation}
\tag{4.14}
$$
To obtain (4.12), we pass to the limit in this relation as $\varepsilon=\varepsilon'_l\to0$. To do this, for $M>0$ consider the function
$$
\begin{equation*}
G_M(t) =\begin{cases} t & \text{if $|t|\leqslant M$},\\ M\operatorname{sgn}t & \text{otherwise}. \end{cases}
\end{equation*}
\notag
$$
Since by Assumption 2.1 and Lemma 3.2
$$
\begin{equation*}
\mathsf{E}^{Q_\varepsilon} \Bigl(\sup_{\tau\in[0,T]}|N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)|^2\Bigr) \leqslant \mathsf{E}^{Q_\varepsilon} \Bigl[C_P\Bigl(1+\sup_{\tau\in[0,T]} |a(\tau)|^{2(m_0\vee1)}\Bigr)\Bigr] \leqslant C_{P,m_0}(|v_0|),
\end{equation*}
\notag
$$
for any $\varepsilon$ we have
$$
\begin{equation}
\mathsf{E}^{Q_\varepsilon} \bigl|(1-G_M)\circ \bigl( N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)-N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a) \bigr) \bigr| \leqslant C M^{-1}.
\end{equation}
\tag{4.15}
$$
As $Q_{\varepsilon'_l}\rightharpoonup Q_0$, by Fatou’s lemma this estimate stays true for $\varepsilon=0$.
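The passage from the second-moment bound to (4.15) rests on an elementary truncation inequality, which we spell out. For a real number $x$ we have $x-G_M(x)=0$ if $|x|\leqslant M$, and $|x-G_M(x)|\leqslant|x|<|x|^2/M$ if $|x|>M$. Hence for any real random variable $X$ with a finite second moment
$$
\begin{equation*}
\mathsf{E}\,|X-G_M(X)| \leqslant \mathsf{E}\bigl(|X|\,\mathbf{1}_{\{|X|>M\}}\bigr) \leqslant M^{-1}\,\mathsf{E}|X|^2.
\end{equation*}
\notag
$$
For $X=N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)-N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a)$ we have $|X|^2\leqslant 4\sup|f^\tau|^2\sup_{s\in[0,T]}|N^{ \langle\!\langle P \rangle\!\rangle }(s;a)|^2$, so the second-moment estimate preceding (4.15) yields (4.15) with $C=4\sup|f^\tau|^2\,C_{P,m_0}(|v_0|)$. Here we assume that for vector-valued $X$ the truncation $G_M$ acts on the real and imaginary parts of each component separately; since each of them is bounded by $|X|$, the same bound holds for each of them.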
Relations (4.14) and (4.15) show that
$$
\begin{equation*}
\mathsf{E}^{ Q_\varepsilon} \bigl[ G_M\circ \bigl( N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)-N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a) \bigr) \bigr] =o_\varepsilon(1)+o_{M^{-1}}(1).
\end{equation*}
\notag
$$
From this and the convergence (2.14) we derive the relation
$$
\begin{equation*}
\mathsf{E}^{Q_0} \bigl[ G_M\circ \bigl( N^{ \langle\!\langle P \rangle\!\rangle }(t;a)f^\tau(a)-N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)f^\tau(a) \bigr) \bigr] =o_{M^{-1}}(1),
\end{equation*}
\notag
$$
which, in combination with (4.15)$_{\varepsilon=0}$, implies (4.12) when we let $M$ tend to $\infty$. The lemma is proved. $\Box$

Definition 4.5. A measure $Q$ on the space (4.10) is called a solution of the martingale problem for effective equation (4.5) with initial condition (4.6) if $a(0)=v_0$ $Q$-a.s. and 1) the process $\{N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)\in\mathbb{C}^n,\,\tau\in[0,T]\}$ (see (4.11)) is a vector martingale on the filtered space (4.10) with respect to the measure $Q$; 2) for any $k,j=1,\dots,n$ the process
$$
\begin{equation}
N^{ \langle\!\langle P \rangle\!\rangle }_k(\tau;a)\, \overline{N^{ \langle\!\langle P \rangle\!\rangle }_j}(\tau;a) -2\int_0^\tau(B B^*)_{kj}\,ds, \qquad\tau\in[0,T]
\end{equation}
\tag{4.16}
$$
(here $BB^*=A$) is a martingale on the space (4.10) with respect to the measure $Q$, and so is the process $N^{ \langle\!\langle P \rangle\!\rangle }_k(\tau;a)\,N^{ \langle\!\langle P \rangle\!\rangle }_j(\tau;a)$. This is the classical definition, expressed in complex coordinates; see [24] and [14], § 5.4. Here we have made use of [14], Remark 4.12, and of the result of [14], Problem 4.13, which apply since the vector field ${ \langle\!\langle P \rangle\!\rangle }$ in (4.5) is locally Lipschitz by Lemma 3.2. Note that condition 2) in Definition 4.5 implies that
$$
\begin{equation*}
\bigl\langle N^{ \langle\!\langle P \rangle\!\rangle }_k(\tau;a), \overline{N^{ \langle\!\langle P \rangle\!\rangle }_j}(\tau;a) \bigr\rangle (\tau) =2\int_0^\tau(BB^*)_{kj}\,ds
\end{equation*}
\notag
$$
and
$$
\begin{equation*}
\bigl\langle N^{ \langle\!\langle P \rangle\!\rangle }_k(\tau;a), {N^{ \langle\!\langle P \rangle\!\rangle }_j}(\tau;a) \bigr\rangle (\tau) =0
\end{equation*}
\notag
$$
(see Appendix 11). We have the following assertion.

Lemma 4.6. The limiting measure $Q_0$ in (2.14) is a solution of the martingale problem for effective equation (4.5), (4.6).

Proof. Since condition 1) in Definition 4.5 was verified in Lemma 4.4, it remains to check condition 2). For the second term in (4.16), as $\varepsilon\to0$, we have
$$
\begin{equation}
\int_0^{\tau}\bigl(\psi^\varepsilon(s)(\psi^\varepsilon(s))^*\bigr)_{kj}\,ds =\int_0^\tau e^{i\varepsilon^{-1}(\lambda_k-\lambda_j)s}(\Psi\Psi^*)_{kj}\,ds \to\tau A_{kj},
\end{equation}
\tag{4.17}
$$
where the matrix $(A_{kj})$ is given by (4.3). We turn to the first term. By (2.11) and (4.1) we have
$$
\begin{equation*}
N^{Y,\varepsilon}(\tau) =v_0+\int_0^\tau \psi^\varepsilon(s) \,d\beta^c(s), \qquad \psi^\varepsilon_{lj}(s) =e^{i s\varepsilon^{-1}\lambda_l}\Psi_{lj},
\end{equation*}
\notag
$$
and therefore, by the complex Itô formula (see Appendix 12) and Assumption 2.1, for any $k,j\in\{1,\dots,n\}$ the process
$$
\begin{equation}
N_k^{Y,\varepsilon}(\tau) \overline{ N_j^{Y,\varepsilon}}(\tau) -2\int_0^\tau\bigl(\psi^\varepsilon(s)(\psi^\varepsilon(s))^*\bigr)_{kj}\,ds,
\end{equation}
\tag{4.18}
$$
is a martingale. As in the verification of condition 1), we compare (4.16) with (4.18). To do this consider
$$
\begin{equation*}
\begin{aligned} \, & N_k^{ \langle\!\langle P \rangle\!\rangle }(\tau ;a^\varepsilon)\, \overline{N_j^{{ \langle\!\langle P \rangle\!\rangle }}}(\tau;a^\varepsilon) -N_k^{Y,\varepsilon}(\tau;a^\varepsilon)\, \overline{ N_j^{Y,\varepsilon}}(\tau;a^\varepsilon) \\ &\qquad =\biggl(a_k^\varepsilon(\tau)-\int_0^\tau{ \langle\!\langle P \rangle\!\rangle }_k(a^\varepsilon(s))\,ds\biggr) \biggl(\bar a_j^\varepsilon(\tau)-\int_0^\tau\overline{{ \langle\!\langle P \rangle\!\rangle }_j(a^\varepsilon(s))}\,ds\biggr) \\ &\qquad\qquad - \biggl(a_k^\varepsilon(\tau) -\int_0^\tau Y_k(a^\varepsilon(s),s\varepsilon^{-1})\,ds \biggr) \biggl(\bar a_j^\varepsilon(\tau) -\int_0^\tau\overline Y_j(a^\varepsilon(s),s\varepsilon^{-1})\,ds\biggr) \\ &\qquad =:M_{kj}(a^\varepsilon;\tau). \end{aligned}
\end{equation*}
\notag
$$
Repeating closely the proof of (4.13) we get that
$$
\begin{equation*}
\sup_{0\leqslant \tau\leqslant T} \mathsf{E}\big| M_{kj}(a^\varepsilon; \tau)\big| =o_\varepsilon(1) \quad \text{as $\varepsilon\to0$}.
\end{equation*}
\notag
$$
Since (4.18) is a martingale, this relation and (4.17) imply that (4.16) is a martingale with respect to $Q_0$, by the same arguments by which relation (4.13) and the fact that $N^{Y,\varepsilon}(\tau;a^\varepsilon)$ is a martingale imply that $N^{ \langle\!\langle P \rangle\!\rangle }(\tau;a)$ is one as well. To pass to the limit as $\varepsilon\to0$, the proof uses the fact that random variables such as $N_k^{Y,\varepsilon}(\tau;a^\varepsilon) \overline{N_j^{Y,\varepsilon}}(\tau;a^\varepsilon)$ are uniformly integrable with respect to $\varepsilon>0$; this follows from Assumption 2.1, since there $m'_0>m_0$.
Similarly, for any $k$ and $j$ the process $N_k^{ \langle\!\langle P \rangle\!\rangle }(\tau) {N_j^{ \langle\!\langle P \rangle\!\rangle }}(\tau)$ is also a martingale. $\Box$

Now we can prove Proposition 4.2.

Proof of Proposition 4.2. It is well known that a solution of the martingale problem for a stochastic differential equation is a weak solution of it. Instead of referring to a corresponding theorem (see [24] or [14], § 5.4), following [16] again, we give a short direct proof based on another powerful result from stochastic calculus. By Lemma 4.6 and the martingale representation theorem for complex processes (see Appendix 11) we know that there exists an extension $(\widehat{\Omega},\widehat{\mathcal{B}},\widehat{\mathsf{P}})$ of the probability space $(\widetilde{\Omega},\mathcal{B},Q_0)$, and on it there exist standard independent complex Wiener processes $\beta^c_1(\tau),\dots,\beta^c_n(\tau)$ such that
$$
\begin{equation*}
da_j(\tau)-{ \langle\!\langle P \rangle\!\rangle }_j(a)\,d\tau =\sum_{l=1}^nB_{jl}\,d\beta^c_l(\tau), \quad j=1,\dots,n,
\end{equation*}
\notag
$$
where the dispersion $B$ is a non-negative Hermitian matrix satisfying $BB^*=A$. Therefore, the measure $Q_0$ is a weak solution of effective equation (4.5). $\Box$

By Lemma 3.2 the drift term ${ \langle\!\langle P \rangle\!\rangle }$ in effective equation (4.5) is locally Lipschitz, so a strong solution of this equation, if it exists, is unique. By Proposition 4.2 the measure $Q_0$ is a weak solution of (4.5). Hence, by the Yamada–Watanabe theorem (see [14], § 5.3.D, or [24], Chap. 8), a strong solution of the effective equation exists, and its weak solution is unique in law. Therefore, the limit $Q_0=\lim_{\varepsilon_l\to0}Q_{\varepsilon_l}$ does not depend on the sequence $\varepsilon_l\to0$. So the convergence holds as $\varepsilon\to0$ (by pre-compactness, cf. Lemma 2.2, every sequence $\varepsilon_l\to0$ has a subsequence along which $Q_{\varepsilon_l}$ converges, and all these limits equal $Q_0$), and we have established the following theorem.

Theorem 4.7. For any $v_0\in\mathbb{C}^n$ the solution $a^\varepsilon(\tau;v_0)$ of problem (2.11), (2.12) satisfies
$$
\begin{equation}
\mathcal{D}(a^\varepsilon(\,{\cdot}\,;v_0))\rightharpoonup Q_0 \quad \textit{in $\mathcal{P}(C([0,T];\mathbb{C}^n))$} \quad \textit{as $\varepsilon\to0$},
\end{equation}
\tag{4.19}
$$
where the measure $Q_0$ is the law of the unique weak solution $a^0(\tau;v_0)$ of effective equation (4.5), (4.6).

Remark 4.8. (i) A straightforward analysis of the proof of the theorem shows that it goes through without changes if $a^\varepsilon(\tau)$ solves (2.11) with initial data $v_{\varepsilon0}$ converging to $v_0$ as $\varepsilon\to0$. So
$$
\begin{equation}
\begin{gathered} \, \mathcal{D}(a^\varepsilon(\,{\cdot}\,; v_{\varepsilon 0})) \rightharpoonup Q_0 \quad\text{in $\mathcal{P}(C([0,T];\mathbb{C}^n))$} \quad\text{as $\varepsilon\to0$}, \\ \text{if $v_{\varepsilon 0} \to v_0$ when $\varepsilon\to0$}. \notag \end{gathered}
\end{equation}
\tag{4.20}
$$
(ii) Setting $\mathcal{D} (a^\varepsilon(\,{\cdot}\,; v_0))=Q_\varepsilon \in \mathcal{P}({C([0,T];\mathbb{C}^n)})$ as before, we use Skorokhod’s representation theorem (see [2], § 6) to find a sequence $\varepsilon_j\to0$ and processes $\xi_j(\tau)$, $0\leqslant\tau\leqslant T$, $j=0,1,2,\dots$, such that $\mathcal{D}(\xi_0)=Q_0$, $\mathcal{D}(\xi_j) =Q_{\varepsilon_j}$, and $\xi_j\to\xi_0$ in $C([0,T];\mathbb{C}^n)$ almost surely. Then (2.7) and Fatou’s lemma imply that
$$
\begin{equation}
\mathsf{E}\| a^0\|^{2m_0'}_{C([0,T]; \mathbb{C}^n)} =\mathsf{E}^{Q_0} \| a\|^{2m_0'}_{C([0,T]; \mathbb{C}^n)} =\mathsf{E}\| \xi_0\|^{2m_0'}_{C([0,T]; \mathbb{C}^n)} \leqslant C_{m'_0}(|v_0|,T).
\end{equation}
\tag{4.21}
$$
The result of Theorem 4.7 admits an immediate generalization to the case when the initial data $v_0$ in (2.12) are random.

Amplification 4.9. Let $v_0$ be a random variable independent of the Wiener process $\beta^c(\tau)$. Then the convergence (4.19) still holds.

Proof. It suffices to establish (4.19) when $a^\varepsilon$ is a weak solution of the problem. Now let $(\Omega',\mathcal{F}',\mathsf{P}')$ be another probability space and let $\xi_0^{\omega'}$ be a random variable on $\Omega'$ distributed as $v_0$. Then $a^{\varepsilon\omega}(\tau; \xi_0^{\omega'})$ is a weak solution of (2.11), (2.12) defined on the probability space $\Omega'\times\Omega$. Let $f$ be a bounded continuous function on $C([0,T];\mathbb{C}^n)$. Then by the above theorem, for each $\omega'\in\Omega'$
$$
\begin{equation*}
\lim_{\varepsilon\to0} \mathsf{E}^\Omega f(a^{\varepsilon \omega} (\,{\cdot}\,;\xi_0^{\omega'})) =\mathsf{E}^\Omega f(a^{0 \omega}(\,{\cdot}\,; \xi_0^{\omega'})).
\end{equation*}
\notag
$$
Since $f$ is bounded, by Lebesgue’s dominated convergence theorem we have
$$
\begin{equation*}
\begin{aligned} \, \lim_{\varepsilon\to0} \mathsf{E} f(a^\varepsilon(\,{\cdot}\,; v_0)) & =\lim_{\varepsilon\to0}\mathsf{E}^{\Omega'}\mathsf{E}^\Omega f(a^{\varepsilon\omega}(\,{\cdot}\,;\xi_0^{\omega'})) \\ & =\mathsf{E}^{\Omega'}\mathsf{E}^\Omega f(a^{0\omega}(\,{\cdot}\,;\xi_0^{\omega'})) =\mathsf{E} f(a^{0}(\,{\cdot}\,; v_0)). \end{aligned}
\end{equation*}
\notag
$$
This implies the required convergence (4.19). $\Box$

The convergence stated in the last amplification holds uniformly in the class of random initial data $v_0$ bounded almost surely by a fixed constant. To state the result we have to introduce a distance in the space of measures.

Definition 4.10. Let $M$ be a Polish (that is, complete and separable) metric space. For any two measures $ \mu_1,\mu_2\in\mathcal{P}(M)$ we define the dual-Lipschitz distance between them by
$$
\begin{equation*}
\|\mu_1-\mu_2\|_{L,M}^* :=\sup_{f\in C_b (M),\, |f|_L\leqslant1} |\langle f ,\mu_1\rangle -\langle f ,\mu_2\rangle| \leqslant2,
\end{equation*}
\notag
$$
where $|f|_L=|f|_{L,M}=\operatorname{Lip}(f)+\|f\|_{C(M)}$. In the definition and below we set
$$
\begin{equation}
\langle f ,\mu\rangle :=\int_M f(m)\,\mu(dm).
\end{equation}
\tag{4.22}
$$
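For measures with finite support the supremum in Definition 4.10 can be computed as a linear program. A function on the support with Lipschitz constant at most $s$ and $\sup|f|\leqslant1-s$ extends to all of $M$ with the same bounds (McShane extension followed by truncation at $\pm(1-s)$), and then $|f|_L\leqslant1$; so only the values $f_i$ at the support points matter, and both constraints are linear in $(f_1,\dots,f_N,s)$. The sketch below is our illustration, not a construction from the paper; it uses SciPy's `linprog`, and the function name `dual_lipschitz_distance` is hypothetical. For two Dirac masses at distance $d$ a short computation gives the value $2d/(d+2)$, which tends to the bound $2$ as $d\to\infty$.

```python
import numpy as np
from scipy.optimize import linprog

def dual_lipschitz_distance(points, mu1, mu2):
    """||mu1 - mu2||*_L for two probability measures supported on finitely many points.

    points   : array of shape (N, d) (real or complex coordinates), pairwise distinct
    mu1, mu2 : arrays of N non-negative weights, each summing to 1
    Unknowns are f_1..f_N (values of f at the points) and s, with Lip(f) <= s and
    sup|f| <= 1 - s, so that |f|_L <= 1; both conditions are linear in (f, s).
    """
    pts = np.asarray(points)
    w = np.asarray(mu1, float) - np.asarray(mu2, float)
    N = len(w)
    dist = np.sqrt((np.abs(pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    rows, rhs = [], []
    for i in range(N):
        for j in range(N):
            if i != j:                               # f_i - f_j <= s * dist_ij
                r = np.zeros(N + 1)
                r[i], r[j], r[N] = 1.0, -1.0, -dist[i, j]
                rows.append(r)
                rhs.append(0.0)
        for sign in (1.0, -1.0):                     # +-f_i + s <= 1
            r = np.zeros(N + 1)
            r[i], r[N] = sign, 1.0
            rows.append(r)
            rhs.append(1.0)
    c = np.append(-w, 0.0)                           # maximise sum_i w_i f_i
    bounds = [(None, None)] * N + [(0.0, 1.0)]
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds, method="highs")
    return -res.fun

# Two Dirac masses at distance d = 2: the supremum is 2d/(d + 2) = 1 (up to solver tolerance).
print(dual_lipschitz_distance([[0.0], [2.0]], [1.0, 0.0], [0.0, 1.0]))
```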
Example 4.11. Consider the Polish spaces $C([0,T];\mathbb{C}^n)$ and $\mathbb{C}^n$ and the mappings
$$
\begin{equation*}
\Pi_t\colon C([0,T];\mathbb{C}^n) \to \mathbb{C}^n, \quad a(\,{\cdot}\,) \mapsto a(t), \qquad 0\leqslant t\leqslant T.
\end{equation*}
\notag
$$
Noting that $ | f\circ \Pi_t|_{L, C([0,T];\mathbb{C}^n)} \leqslant |f|_{L, \mathbb{C}^n}$ for each $t$ (since $|a(t)-b(t)|\leqslant\|a-b\|_{C([0,T];\mathbb{C}^n)}$), we get that
$$
\begin{equation}
\|\Pi_t\circ \mu_1- \Pi_t\circ \mu_2\|_{L, \mathbb{C}^n}^* \leqslant \| \mu_1- \mu_2\|_{L, C([0,T];\mathbb{C}^n)}^*
\end{equation}
\tag{4.23}
$$
for all $\mu_1, \mu_2 \in \mathcal{P}(C([0,T];\mathbb{C}^n))$ and all $0\leqslant t\leqslant T$ (where $\Pi_t\circ \mu_j \in\mathcal{P}(\mathbb{C}^n)$ denotes the image of $\mu_j$ under $\Pi_t$).

The dual-Lipschitz distance makes $\mathcal{P}(M)$ a complete metric space and induces on it the topology of weak convergence of measures (see, for example, [8], § 11.3, and [5], § 1.7).

Proposition 4.12. Under the assumptions of Amplification 4.9 let the random variable $v_0$ be such that $|v_0| \leqslant R$ almost surely for some $R>0$. Then the rate of convergence in (4.19) with respect to the dual-Lipschitz distance depends only on $R$.

Proof. The proof of Amplification 4.9 shows that it suffices to verify that for non-random initial data $v_0\in\overline{B}_R(\mathbb{C}^n)$ the rate of convergence in (4.19) depends only on $R$. Assume the opposite. Then there exist $\delta>0$, a sequence $\varepsilon_j\to0$, and vectors $v_j\in\overline{B}_R(\mathbb{C}^n)$ such that
$$
\begin{equation}
\|\mathcal{D}(a^{\varepsilon_j}(\,{\cdot}\,; v_j)) -\mathcal{D}(a^0(\,{\cdot}\,; v_j))\|_{L,C([0,T];\mathbb{C}^n)}^* \geqslant \delta.
\end{equation}
\tag{4.24}
$$
By the same argument as in the proof of Lemma 2.2, the two sets of probability measures $\{\mathcal{D}(a^{\varepsilon_j}(\,{\cdot}\,;v_j))\}$ and $\{\mathcal{D}(a^0(\,{\cdot}\,;v_j))\}$ are pre-compact in $\mathcal{P}(C([0,T];\mathbb{C}^n))$. Therefore, there exists a sequence $k_j\to\infty$ such that $\varepsilon_{k_j}\to0$, $v_{k_j}\to v_0$ for some $v_0\in\overline{B}_R(\mathbb{C}^n)$, and
$$
\begin{equation*}
\mathcal{D}(a^{\varepsilon_{k_j}}(\,{\cdot}\,;v_{k_j}))\rightharpoonup\widetilde{Q}_0, \quad \mathcal{D}(a^{0}(\,{\cdot}\,; v_{k_j}))\rightharpoonup Q_0 \quad \text{in $\mathcal{P}(C([0,T];\mathbb{C}^n))$}.
\end{equation*}
\notag
$$
Then
$$
\begin{equation}
\|\widetilde{Q}_0-Q_0\|_{L,C([0,T];\mathbb{C}^n)}^* \geqslant\delta.
\end{equation}
\tag{4.25}
$$
Since the drift and dispersion in the well-posed equation (4.5) are locally Lipschitz, the law $\mathcal{D}(a^0(\,{\cdot}\,;v'))$ depends continuously on the initial condition $v'$ (this is well known and can easily be proved using the estimate in Remark 4.8, (ii)). Therefore, $Q_0$ is the law of the unique weak solution of the effective equation (4.5) with initial condition $a^0(0)=v_0$. By (4.20) the measure $\widetilde{Q}_0$ is also the law of a weak solution of problem (4.5), (4.6). Hence $Q_0=\widetilde{Q}_0$, which contradicts (4.25) and proves the assertion. $\Box$
We proceed to an obvious application of Theorem 4.7 to solutions $v^\varepsilon(\tau;v_0)$ of the original equation (2.6). Consider the action mapping
$$
\begin{equation*}
(z_1,\dots,z_n)\mapsto (I_1,\dots,I_n)=:I
\end{equation*}
\notag
$$
(see (1.2)). Since the interaction representation (2.8) does not change actions, from the theorem we obtain the following assertion. Corollary 4.13. For any $v_0$,
$$
\begin{equation}
\mathcal{D}(I(v^\varepsilon(\,\cdot\,;v_0))) \rightharpoonup I\circ\mathcal{D}(a(\,{\cdot}\,;v_0)) \quad \textit{in $\mathcal{P}(C([0,T];\mathbb{R}_+^n))$}
\end{equation}
\tag{4.26}
$$
as $\varepsilon\to0$, where $a(\,{\cdot}\,;v_0)$ is the unique weak solution of effective equation (4.5), (4.6).
Example 4.14. If the drift $P$ in (2.6) is globally Lipschitz, that is, $\operatorname{Lip}(P)\leqslant M$ for some $M>0$, then it is not difficult to see that Assumption 2.1 holds, so Theorem 4.7 and Corollary 4.13 apply. A more interesting example is discussed in § 9 below.
Proof of Lemma 4.3. In this proof we denote by $\mathcal{H}_k(r;c_1,\dots)$, $k=1,2,\dots$, non-negative functions of $r>0$ which tend to zero as $r\to0$ and depend on parameters $c_1,\dots$ (the dependence of the $\mathcal{H}_k$ on $T$ and $P$ is not indicated). Also, for an event $Q$ we set $\mathsf{E}_{Q}f(\xi)=\mathsf{E}\mathbf{1}_{Q}f(\xi)$.
For any $M_1\geqslant1$ we set
$$
\begin{equation*}
\mathcal{E}^1 =\mathcal{E}^{1\varepsilon}_{M_1} =\Bigl\{\omega\in\Omega\colon \sup_{0\leqslant\tau\leqslant T} |a^\varepsilon(\tau)| \leqslant M_1 \Bigr\}.
\end{equation*}
\notag
$$
By Assumption 2.1 and Chebyshev’s inequality,
$$
\begin{equation*}
\mathsf{P}(\Omega\setminus \mathcal{E}^1 ) \leqslant\mathcal{H}_1(M_1^{-1}; |v_0|).
\end{equation*}
\notag
$$
Recalling that $\tilde y$ was defined in (4.9), by Lemma 3.2 we have
$$
\begin{equation*}
|\tilde y(a^\varepsilon(s),s\varepsilon^{-1})| \leqslant |Y(a^\varepsilon(s),s\varepsilon^{-1})|+|{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(s))| \leqslant 2 \mathcal{C}^{m_0}(P) | a^\varepsilon(s)|^{m_0}.
\end{equation*}
\notag
$$
So, abbreviating $\tilde y(a^\varepsilon(s),s\varepsilon^{-1}) $ to $\tilde y(s)$, in view of (2.7) we have:
$$
\begin{equation*}
\begin{aligned} \, \mathsf{E}_{\Omega\setminus {\mathcal{E}^1}} \max_{\tau\in[0,T]} \biggl|\int_0^{ \tau}\tilde y(s)\,ds\biggr| & \leqslant\int_0^{T}\mathsf{E} \big(\mathbf{1}_{\Omega\setminus{\mathcal{E}^1}}|\tilde y(s)|\big)\,ds \\ & \leqslant 2\mathcal{C}^{m_0}(P) (\mathsf{P}(\Omega\setminus{\mathcal{E}^1}))^{1/2} \biggl(\int_0^{T}\mathsf{E}|a^\varepsilon(s)|^{2m_0}\,ds\biggr)^{1/2} \\ & \leqslant 2\mathcal{C}^{m_0}(P) (\mathcal{H}_1(M_1^{-1}; |v_0|))^{1/2} =:\mathcal{H}_2(M_1^{-1}; |v_0|). \end{aligned}
\end{equation*}
\notag
$$
Now we must estimate $\displaystyle\mathsf{E}_{\mathcal{E}^1}\max_{\tau\in[0,T]}\biggl|\int_0^\tau\tilde y(s)\,ds\biggr|$. For any $ M_2\geqslant1$ consider the event
$$
\begin{equation*}
\mathcal{E}^2 =\mathcal{E}^{2 \varepsilon}_{M_2} =\{\omega\in\Omega\colon\|a^\varepsilon\|_{1/3}\leqslant M_2\}
\end{equation*}
\notag
$$
(see (1.4)). Then by (2.13)
$$
\begin{equation*}
\mathsf{P}(\Omega\setminus {\mathcal{E}^2}) \leqslant\mathcal{H}_3(M_2^{-1};|v_0|).
\end{equation*}
\notag
$$
Therefore,
$$
\begin{equation*}
\begin{aligned} \, \mathsf{E}_{\Omega\setminus\mathcal{E}^2} \max_{\tau\in[0,T]} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| & \leqslant(\mathsf{P}(\Omega\setminus{\mathcal{E}^2}))^{1/2} \biggl(C_P\int_0^{T}\mathsf{E}|a^\varepsilon(s)|^{2m_0}\,ds\biggr)^{1/2} \\ & \leqslant\mathcal{H}_4(M_2^{-1};|v_0|)\,. \end{aligned}
\end{equation*}
\notag
$$
It remains to bound $\displaystyle\mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2}\max_{\tau\in[0,T]}\biggl|\int_0^{\tau} \tilde y(s)\,ds\biggr|$.
We set
$$
\begin{equation*}
N=\biggl[\frac{T}{\sqrt{\varepsilon}}\biggr]+1,\qquad L=\frac TN.
\end{equation*}
\notag
$$
Then $C^{-1}\sqrt{\varepsilon}\leqslant L\leqslant C\sqrt{\varepsilon}$ and $c^{-1}/\sqrt{\varepsilon}\leqslant N\leqslant c/\sqrt{\varepsilon}$ for some constants $C$ and $c$. We consider a partition of the interval $[0,T]$ by the points $\tau_l=lL$, $l=0,\dots,N$, and set
$$
\begin{equation*}
\eta_l=\int_{\tau_l}^{\tau_{l+1}}\tilde y(s)\,ds, \qquad l=0,\dots,N-1.
\end{equation*}
\notag
$$
For any $\tau\in[0,T]$ we find $l=l(\tau) $ such that $\tau\in[\tau_{l},\tau_{l+1}]$. Then
$$
\begin{equation*}
\biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| \leqslant |\eta_0|+\dots+|\eta_{l-1}| +\biggl|\int_{\tau_l}^{\tau}\tilde y(s)\,ds\biggr|.
\end{equation*}
\notag
$$
If $\omega\in\mathcal{E}^1$, then $\displaystyle\biggl|\int_{\tau_l}^{\tau}\tilde y(s)\,ds\biggr|\leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}L$. Therefore,
$$
\begin{equation}
\mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2} \max_{0\leqslant\tau\leqslant T} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| \leqslant 2 \mathcal{C}^{m_0}(P)M_1^{m_0}L +\sum_{l=0}^{N-1}\mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2}|\eta_l|,
\end{equation}
\tag{4.27}
$$
and it remains to estimate the expectations $\mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2}|\eta_l|$ for $l=0,\dots,N-1$. Observe that
$$
\begin{equation*}
\begin{aligned} \, |\eta_l| & \leqslant\biggl|\int_{\tau_l}^{\tau_{l+1}} \bigl[\tilde y(a^\varepsilon(s),s\varepsilon^{-1})-\tilde y(a^\varepsilon(\tau_l),s\varepsilon^{-1})\bigr]\,ds\biggr| \\ &\qquad +\biggl|\int_{\tau_l}^{\tau_{l+1}}\tilde y(a^\varepsilon(\tau_l),s\varepsilon^{-1})\,ds\biggr| =: |U_l^1|+|U_l^2|. \end{aligned}
\end{equation*}
\notag
$$
Since $\tilde y(a^\varepsilon,\tau\varepsilon^{-1})=(\Phi_{\tau\varepsilon^{-1}\Lambda})_* P(a^\varepsilon)-{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon)$ and $P,{ \langle\!\langle P \rangle\!\rangle }\in\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n)$, for $\omega\in{\mathcal{E}^1}\cap\mathcal{E}^2$ the integrand in $U_l^1$ is bounded by
$$
\begin{equation*}
2\mathcal{C}^{m_0}(P) M_1^{m_0}\sup_{\tau_l\leqslant s\leqslant\tau_{l+1}}|a^\varepsilon(s)-a^\varepsilon(\tau_l)| \leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}M_2L^{1/3}.
\end{equation*}
\notag
$$
So
$$
\begin{equation*}
|U_l^1| \leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}M_2L^{4/3}.
\end{equation*}
\notag
$$
Now consider the integral $U_l^2$. By the definition of $\tilde y(a^\varepsilon,\tau\varepsilon^{-1})$ we have
$$
\begin{equation*}
U_l^2 =\int_{\tau_l}^{\tau_{l+1}}Y(a^\varepsilon(\tau_l),s\varepsilon^{-1})\,ds -L{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(\tau_l)) =:Z^1+Z^2.
\end{equation*}
\notag
$$
For the integral $Z^1$, making the change of variable $s=\tau_l+\varepsilon x$ for $x\in[0,L/\varepsilon]$ we have
$$
\begin{equation*}
Z^1=\varepsilon\int_0^{L/\varepsilon}Y(a^\varepsilon(\tau_l),\tau_l\varepsilon^{-1}+x)\,dx.
\end{equation*}
\notag
$$
Since
$$
\begin{equation*}
Y(a^\varepsilon(\tau_l),\tau_l\varepsilon^{-1}+x) =\Phi_{\tau_l\varepsilon^{-1}\Lambda}\circ\Phi_{x\Lambda} P(\Phi_{-x\Lambda}(\Phi_{-\tau_l\varepsilon^{-1}\Lambda}a^\varepsilon(\tau_l))),
\end{equation*}
\notag
$$
we have
$$
\begin{equation*}
\begin{aligned} \, Z^1 & =L\Phi_{\tau_l\varepsilon^{-1}\Lambda} \biggl( \frac{\varepsilon}{L} \int_0^{L/\varepsilon}\Phi_{x\Lambda} P(\Phi_{-x\Lambda}(\Phi_{-\tau_l\varepsilon^{-1}\Lambda}a^\varepsilon(\tau_l)))\,dx \biggr) \\ & =L\Phi_{\tau_l\varepsilon^{-1}\Lambda} \langle\!\langle P \rangle\!\rangle ^{L/\varepsilon} (\Phi_{-\tau_l\varepsilon^{-1}\Lambda}a^\varepsilon(\tau_l)) \end{aligned}
\end{equation*}
\notag
$$
(see definition (3.1)). As $L/\varepsilon\sim\varepsilon^{-1/2}\gg1$ and $|\Phi_{-\tau_l\varepsilon^{-1}\Lambda}a^\varepsilon(\tau_l)|=|a^\varepsilon(\tau_l)|\leqslant M_1$ for $\omega\in{\mathcal{E}^1}\cap\mathcal{E}^2$, by Lemma 3.2 the partial averaging $ \langle\!\langle P \rangle\!\rangle ^{L/\varepsilon}$ is close to the complete averaging $ \langle\!\langle P \rangle\!\rangle $. Thus,
$$
\begin{equation*}
\begin{aligned} \, & \big| Z^1 -L\Phi_{\tau_l\varepsilon^{-1}\Lambda}{ \langle\!\langle P \rangle\!\rangle } \big(\Phi_{-\tau_l\varepsilon^{-1}\Lambda}(a^\varepsilon(\tau_l))\big) \big| =\big|Z^1 -L(\Phi_{\tau_l\varepsilon^{-1}\Lambda})_* { \langle\!\langle P \rangle\!\rangle } (a^\varepsilon(\tau_l)) \big| \\ &\qquad =\big|Z^1 -L { \langle\!\langle P \rangle\!\rangle } (a^\varepsilon(\tau_l)) \big| \leqslant L\mathcal{H}_5(\sqrt{\varepsilon}; M_1, |v_0|), \end{aligned}
\end{equation*}
\notag
$$
where we have used Lemma 3.3 to obtain the second equality. In view of the equality $Z^2=-L{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(\tau_l))$ we have
$$
\begin{equation*}
|U_l^2| =| Z^1+Z^2| \leqslant L\mathcal{H}_5(\sqrt{\varepsilon};M_1,|v_0|).
\end{equation*}
\notag
$$
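The step "the partial averaging $\langle\!\langle P \rangle\!\rangle^{L/\varepsilon}$ is close to the complete averaging $\langle\!\langle P \rangle\!\rangle$" can be illustrated numerically. The sketch below assumes the convention $(\Phi_{\theta\Lambda}z)_j=e^{i\theta\lambda_j}z_j$ (the definitions of § 3 are not reproduced in this excerpt) and uses the hypothetical data $n=2$, $\Lambda=(1,\sqrt2)$, $P(z)=(-z_1+z_2,\ i|z_1|^2z_2)$. Then $\Phi_{x\Lambda}P(\Phi_{-x\Lambda}b)=(-b_1+e^{ix(\lambda_1-\lambda_2)}b_2,\ i|b_1|^2b_2)$, so the partial average over $[0,T]$ differs from $\langle\!\langle P \rangle\!\rangle(b)=(-b_1,\ i|b_1|^2b_2)$ by at most $2|b_2|/(T|\lambda_1-\lambda_2|)$; with $T=L/\varepsilon\sim\varepsilon^{-1/2}$ this is $O(\sqrt{\varepsilon})$. This explicit rate is specific to the example; the proof above only uses the qualitative closeness encoded in $\mathcal{H}_5$.

```python
import numpy as np

LAM = np.array([1.0, np.sqrt(2.0)])  # hypothetical non-resonant frequencies


def P(z: np.ndarray) -> np.ndarray:
    """Hypothetical drift on C^2, vectorised over leading axes: z[..., j] = z_j."""
    z1, z2 = z[..., 0], z[..., 1]
    return np.stack([-z1 + z2, 1j * np.abs(z1) ** 2 * z2], axis=-1)


def partial_average(b: np.ndarray, T: float, m: int = 200_000) -> np.ndarray:
    """Midpoint rule for (1/T) * int_0^T Phi_{x Lambda} P(Phi_{-x Lambda} b) dx,
    with the assumed convention (Phi_{x Lambda} z)_j = exp(i x lambda_j) z_j."""
    xs = (np.arange(m) + 0.5) * (T / m)       # midpoints of [0, T]
    phase = np.exp(1j * np.outer(xs, LAM))    # shape (m, 2): e^{i x lambda_j}
    return (phase * P(b[None, :] / phase)).mean(axis=0)


def complete_average(b: np.ndarray) -> np.ndarray:
    """Limit T -> infinity, computed by hand: the term z_2 in P_1 averages out."""
    return np.array([-b[0], 1j * abs(b[0]) ** 2 * b[1]])


b = np.array([1.0 + 0.5j, -0.3 + 2.0j])
omega = abs(LAM[0] - LAM[1])
for T in (10.0, 100.0, 1000.0):
    err = np.linalg.norm(partial_average(b, T) - complete_average(b))
    bound = 2 * abs(b[1]) / (T * omega)
    print(f"T={T:g}: error={err:.2e}, bound={bound:.2e}")
```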
Thus, we have obtained
$$
\begin{equation*}
\mathsf{E}_{\mathcal{E}^1\cap{\mathcal{E}^2}}|\eta_l| \leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}M_2L^{4/3} +L\mathcal{H}_5(\sqrt{\varepsilon};M_1,|v_0|).
\end{equation*}
\notag
$$
In combination with (4.27), this gives us the inequality
$$
\begin{equation*}
\begin{aligned} \, \mathsf{E}_{\mathcal{E}^1\cap\mathcal{E}^2} \max_{0\leqslant\tau\leqslant T} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| & \leqslant 2\mathcal{C}^{m_0}(P) M_1^{m_0}L \\ &\qquad +2\mathcal{C}^{m_0}(P) M_1^{m_0}M_2L^{1/3} +\mathcal{H}_5(\sqrt{\varepsilon}; M_1, |v_0|). \end{aligned}
\end{equation*}
\notag
$$
Therefore,
$$
\begin{equation*}
\begin{aligned} \, \mathsf{E}\max_{0\leqslant\tau\leqslant T} \biggl|\int_0^{\tau}\tilde y(s)\,ds\biggr| & \leqslant \mathcal{H}_2(M_1^{-1}; |v_0|)+\mathcal{H}_4(M_2^{-1}; |v_0|) \\ &\qquad +2\mathcal{C}^{m_0}(P) M_1^{m_0}(M_2+1)\varepsilon^{1/6} +\mathcal{H}_5(\sqrt{\varepsilon}; M_1,|v_0|). \end{aligned}
\end{equation*}
\notag
$$
Now, for any $\delta>0$, we perform the following procedure:
1) choose $M_1$ sufficiently large so that $\mathcal{H}_2(M_1^{-1}; |v_0|)\leqslant\delta$;
2) choose $M_2$ sufficiently large so that $\mathcal{H}_4(M_2^{-1}; |v_0|)\leqslant \delta$;
3) finally, choose $\varepsilon_\delta>0$ sufficiently small so that
$$
\begin{equation*}
2\mathcal{C}^{m_0}(P) M_1^{m_0}(M_2+1)\varepsilon^{1/6} +\mathcal{H}_5(\sqrt{\varepsilon}; M_1, |v_0|) \leqslant\delta \quad \text{if $0<\varepsilon\leqslant\varepsilon_\delta$}.
\end{equation*}
\notag
$$
We have seen that for any $\delta>0$,
$$
\begin{equation*}
\mathsf{E}\max_{0\leqslant\tau\leqslant T} \biggl| \int_0^{\tau}\tilde y(a^\varepsilon(s),s\varepsilon^{-1})\,ds \biggr| \leqslant 3\delta \quad \text{if $0<\varepsilon\leqslant\varepsilon_\delta$}.
\end{equation*}
\notag
$$
So
$$
\begin{equation*}
\mathsf{E}\max_{0\leqslant\tau\leqslant T} \biggl|\int_0^{\tau}\bigl[Y(a^\varepsilon(s),s\varepsilon^{-1}) -{ \langle\!\langle P \rangle\!\rangle }(a^\varepsilon(s))\bigr]\,ds\biggr| \to0 \quad \text{as $\varepsilon\to0$}. \quad\square
\end{equation*}
\notag
$$
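The block length used in the proof of Lemma 4.3 balances two requirements. In fast time a block has length $L/\varepsilon\approx\varepsilon^{-1/2}\gg1$, so the partial averages over a block are close to the complete average; in slow time $L\approx\varepsilon^{1/2}$ is short, so on $\mathcal{E}^2$ the solution moves by at most $M_2L^{1/3}$ within a block. Summing the $N$ per-block bounds $L^{4/3}$ gives $N L^{4/3}=T L^{1/3}\leqslant T\varepsilon^{1/6}$, which is where the factor $\varepsilon^{1/6}$ in the final estimate comes from. The sketch below checks this bookkeeping with all constants set to $1$ (a simplification); the function names are illustrative.

```python
import math


def partition(T, eps):
    """N = [T / sqrt(eps)] + 1 blocks of length L = T / N, as in the proof of Lemma 4.3."""
    N = math.floor(T / math.sqrt(eps)) + 1
    return N, T / N


def error_terms(T, eps):
    """Orders of the eps-dependent terms, all constants set to 1:
    the boundary term L from (4.27) and the sum over N blocks of the |U_l^1| bound L^{4/3}."""
    N, L = partition(T, eps)
    return L, N * L ** (4.0 / 3.0)


T = 2.0
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    N, L = partition(T, eps)
    boundary, hoelder = error_terms(T, eps)
    print(f"eps={eps:.0e}: N={N}, L/sqrt(eps)={L / math.sqrt(eps):.3f}, "
          f"fast-time block length L/eps={L / eps:.1e}, "
          f"hoelder/eps^(1/6)={hoelder / eps ** (1 / 6):.3f}")
```

Since $N>T/\sqrt{\varepsilon}$ we always have $L<\sqrt{\varepsilon}$, and for $\varepsilon\leqslant1$ also $L\geqslant T\sqrt{\varepsilon}/(T+1)$, which are the two-sided bounds stated after the definition of $N$ and $L$.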
5. Stationary solutions and mixing
In this section we study the relationship between stationary solutions of equation (2.6) and those of the effective equation. We recall that a solution $a(\tau)$, $\tau\geqslant0$, of (2.6) (or of effective equation (4.5)) is stationary if $\mathcal{D}(a(\tau))\equiv\mu$ for all $\tau\geqslant0$ and some measure $\mu\in\mathcal{P}(\mathbb{C}^n)$, which is called a stationary measure for the corresponding equation. Throughout this section we assume that equation (2.6) satisfies the following stronger version of Assumption 2.1.
Assumption 5.1. (a) The drift $P(v)$ is a locally Lipschitz vector field, which belongs to $\operatorname{Lip}_{m_0}(\mathbb{C}^n,\mathbb{C}^n)$ for some $m_0\in\mathbb{N}$. (b) For any $v_0\in\mathbb{C}^n$ equation (2.6) has a unique strong solution $v^\varepsilon(\tau;v_0)$, $\tau\geqslant0$, which is equal to $v_0$ at $\tau=0$. There exists $m_0'>(m_0\vee1)$ such that
$$
\begin{equation}
\mathsf{E} \sup_{T'\leqslant\tau\leqslant {T'}+1} |v^\varepsilon(\tau;v_0)|^{2m'_0} \leqslant C_{m'_0}(|v_0|)
\end{equation}
\tag{5.1}
$$
for any $T'\geqslant0$ and $\varepsilon\in(0,1]$, where $C_{m_0'}(\,{\cdot}\,)$ is a continuous non-decreasing function. (c) Equation (2.6) is mixing, that is, it has a stationary solution $v^\varepsilon_{\mathrm{st}}(\tau)$, $\mathcal{D}(v^\varepsilon_{\mathrm{st}}(\tau)) \equiv \mu^\varepsilon\in \mathcal{P}(\mathbb{C}^n)$, and
$$
\begin{equation}
\mathcal{D}(v^\varepsilon(\tau;v_0))\rightharpoonup\mu^\varepsilon \quad \text{in $\mathcal{P}(\mathbb{C}^n)$} \quad \text{as $\tau\to+\infty$},
\end{equation}
\tag{5.2}
$$
for every $v_0$. Under Assumption 5.1 equation (2.6) defines a mixing Markov process in $\mathbb{C}^n$ with transition probability $\Sigma_\tau(v)\in\mathcal{P}(\mathbb{C}^n)$, $\tau \geqslant0$, $v\in \mathbb{C}^n$, where $\Sigma_\tau(v)=\mathcal{D} v^\varepsilon(\tau; v)$; for example, see [24], § 5.4.C. We denote by $X$ the complete separable metric space $X=C([0,\infty);\mathbb{C}^n)$ with the distance
$$
\begin{equation}
\operatorname{dist}(a_1, a_2) =\sum_{N=1}^\infty 2^{-N} \frac{\|a_1-a_2\|_{C([0,N];\mathbb{C}^n)}}{1+\|a_1-a_2\|_{C([0,N];\mathbb{C}^n)}}, \qquad a_1,a_2\in X,
\end{equation}
\tag{5.3}
$$
and consider the continuous function $g(a)=\sup_{0\leqslant t\leqslant 1} |a(t)|^{2m'_0}$ on $X$. Setting $\mu^\tau_\varepsilon=\mathcal{D}(v^\varepsilon(\tau; 0))$, by the Markov property we have
$$
\begin{equation}
\mathsf{E}\sup_{T'\leqslant\tau\leqslant T'+1}|v^\varepsilon(\tau;0)|^{2m'_0} =\int_{\mathbb{C}^n}\mathsf{E}g(v^\varepsilon(\,{\cdot}\,;v_0)) \,\mu^{T'}_\varepsilon(dv_0),
\end{equation}
\tag{5.4}
$$
and
$$
\begin{equation}
\mathsf{E} \sup_{0\leqslant\tau\leqslant 1} |v^\varepsilon_{\mathrm{st}}(\tau)|^{2m'_0} =\int_{\mathbb{C}^n}\mathsf{E}g(v^\varepsilon(\,{\cdot}\,; v_0))\,\mu^\varepsilon(dv_0).
\end{equation}
\tag{5.5}
$$
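The metric (5.3) on $X$ can be evaluated approximately for concrete paths. In the following sketch (our illustration; the grid resolution `pts_per_unit` and the truncation level `n_max` are arbitrary choices) the norm $\|a_1-a_2\|_{C([0,N];\mathbb{C}^n)}$ is replaced by a maximum over grid points and the series is cut at `n_max`, whose tail is at most $2^{-n_{\max}}$. Each term is at most $2^{-N}$, so $\operatorname{dist}\leqslant1$, and convergence in this metric is uniform convergence on every interval $[0,N]$.

```python
import numpy as np


def dist_X(a1, a2, n_max: int = 40, pts_per_unit: int = 200) -> float:
    """Truncated version of metric (5.3) on X = C([0, inf); C^n).

    a1, a2 are callables t -> complex array of shape (n,); the sup norm on
    [0, N] is approximated on a uniform grid, and the series is cut at n_max
    (the neglected tail is at most 2**-n_max).
    """
    ts = np.linspace(0.0, n_max, n_max * pts_per_unit + 1)
    diff = np.array([np.linalg.norm(a1(t) - a2(t)) for t in ts])
    total = 0.0
    for N in range(1, n_max + 1):
        sup_N = diff[ts <= N].max()  # grid version of ||a1 - a2||_{C([0,N])}
        total += 2.0 ** (-N) * sup_N / (1.0 + sup_N)
    return total


def a_path(t):
    return np.array([np.exp(1j * t), 0.0j])


def b_path(t):
    return a_path(t) + np.array([0.5, 0.0])


# For a constant shift c = 0.5 every term equals 2^{-N} c / (1 + c).
print(dist_X(a_path, b_path), (0.5 / 1.5) * (1.0 - 2.0 ** -40))
```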
The left-hand side of (5.4) was estimated in (5.1). To estimate (5.5) we pass to the limit as $T'\to\infty$ on the right-hand side of (5.4) using (5.2). To do this we start with the following lemma.
Lemma 5.2. Let $n_1,n_2\in \mathbb{N}$, let $\mathcal{B}\subset\mathbb{R}^{n_1}$ be a closed convex set which contains more than one point, and let $F\colon \mathcal{B}\to\mathbb{R}^{n_2}$ be a Lipschitz mapping. Then $F$ can be extended to a map $\widetilde{F}\colon \mathbb{R}^{n_1} \to \mathbb{R}^{n_2}$ in such a way that (a) $\operatorname{Lip}(\widetilde{F})=\operatorname{Lip}(F)$; (b) $\widetilde{F}(\mathbb{R}^{n_1})=F(\mathcal{B})$.
Proof. Let $\Pi\colon \mathbb{R}^{n_1}\to\mathcal{B}$ be the projection taking each point of $\mathbb{R}^{n_1}$ to the nearest point of $\mathcal{B}$ (it is unique because $\mathcal{B}$ is closed and convex). Then $\operatorname{Lip}(\Pi)=1$ (see Appendix 13), and $\widetilde{F}=F \circ \Pi$ is obviously the required extension of $F$. $\Box$
Since $\mathcal{C}^{m_0}(P) =: C_*<\infty$, for any $M\in\mathbb{N}$ both the norm of the restriction of $P$ to $\overline{B}_M(\mathbb{C}^n)$ and its Lipschitz constant are bounded by $(1+ M)^{m_0} C_*$. In view of the above lemma (applied with $\mathbb{C}^n\simeq\mathbb{R}^{2n}$) we can extend $P|_{\overline{B}_M(\mathbb{C}^n)}$ to a Lipschitz mapping $P^M\!\colon \mathbb{C}^n\to \mathbb{C}^n$ such that
$$
\begin{equation*}
\operatorname{Lip}(P^M) \leqslant (1+M)^{m_0} C_*\quad\text{and}\quad \sup| P^M(v)| \leqslant (1+M)^{m_0} C_*.
\end{equation*}
\notag
$$
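Lemma 5.2 and the construction of $P^M$ can be tried out on a concrete field. In this sketch (illustrative choices only) $\mathbb{C}\simeq\mathbb{R}^2$, the convex set is the ball $\overline{B}_M$, and $P(x)=|x|^2x$, whose Lipschitz constant on $\overline{B}_M$ equals $3M^2$ because $DP(x)=|x|^2I+2xx^{\mathsf T}$. The nearest-point projection onto a ball is $x\mapsto x$ for $|x|\leqslant M$ and $x\mapsto Mx/|x|$ otherwise. The code checks on random pairs that $P\circ\Pi$ has difference quotients at most $3M^2$, agrees with $P$ on the ball, and takes values in $P(\overline{B}_M)$, so that $|P\circ\Pi|\leqslant M^3$.

```python
import numpy as np


def project_ball(x, M):
    """Nearest-point projection of each row of x onto the closed ball of radius M."""
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    return np.where(r <= M, x, x * (M / np.maximum(r, 1e-300)))


def P(x):
    """Hypothetical cubic drift x |x|^2 on R^2 = C; its Lipschitz constant on the
    ball of radius M is 3 M^2, since DP(x) = |x|^2 I + 2 x x^T."""
    return np.sum(x * x, axis=-1, keepdims=True) * x


def P_M(x, M):
    """The extension P o Pi of Lemma 5.2: equals P on the ball, globally Lipschitz."""
    return P(project_ball(x, M))


rng = np.random.default_rng(0)
M = 2.0
x = rng.uniform(-10.0, 10.0, size=(20_000, 2))
y = rng.uniform(-10.0, 10.0, size=(20_000, 2))
ratios = np.linalg.norm(P_M(x, M) - P_M(y, M), axis=1) / np.linalg.norm(x - y, axis=1)
print(f"largest difference quotient {ratios.max():.3f} <= 3 M^2 = {3 * M ** 2:.1f}")
```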
Given a solution $v(\tau)$ of equation (2.6), consider the stopping time
$$
\begin{equation*}
\tau_M=\inf\{t\geqslant0\colon | v(t)| \geqslant M\}
\end{equation*}
\notag
$$
and denote by $v^{\varepsilon M} $ the stopped solution:
$$
\begin{equation*}
v^{\varepsilon M} (\tau; v_0)=v^\varepsilon(\tau\wedge \tau_M; v_0).
\end{equation*}
\notag
$$
Then we note that the process $v^{\varepsilon M}$ does not change if we replace $P(v)$ by $P^M(v)$ in (2.6). So $v^{\varepsilon M} (\tau;v_0)$ is a stopped solution of a stochastic equation with Lipschitz coefficients, and thus the curve $v^{\varepsilon M \omega}(\,{\cdot}\,; v_0)\in X$ depends continuously on $v_0$ for each $M\in \mathbb{N}$. Since
$$
\begin{equation*}
g(v^{\varepsilon M}) \leqslant g(v^\varepsilon) \quad \text{a. s.},
\end{equation*}
\notag
$$
in view of (5.2) and (5.1), for all $M$ and $N>0$ we have
$$
\begin{equation*}
\int \mathsf{E}(N\wedge g)(v^{\varepsilon M} (\,\cdot\,; v)) \, \mu^\varepsilon(dv) =\lim_{{T'}\to\infty} \int \mathsf{E}(N\wedge g)(v^{\varepsilon M} (\,{\cdot}\,; v)) \,\mu^{T'}_\varepsilon(dv) \leqslant C_{m'_0}(0)
\end{equation*}
\notag
$$
(to obtain the last inequality from (5.1) we have used the Markov property). Passing to the limit as $N\to\infty$ on the left-hand side and using the monotone convergence theorem we see that
$$
\begin{equation}
\int\mathsf{E}g(v^{\varepsilon M} (\,{\cdot}\,; v)) \, \mu^\varepsilon(dv) \leqslant C_{m'_0}(0).
\end{equation}
\tag{5.6}
$$
Since for every $v$ we have $g(v^{\varepsilon M} (\,{\cdot}\,; v)) \nearrow g(v^\varepsilon (\,{\cdot}\,; v)) \leqslant\infty$ almost surely as $M\to\infty$, we can use the monotone convergence theorem again to derive from (5.6) that
$$
\begin{equation*}
\int\mathsf{E}g(v^\varepsilon(\,{\cdot}\,;v))\,\mu^\varepsilon(dv) \leqslant C_{m'_0}(0).
\end{equation*}
\notag
$$
Recalling (5.5) we obtain the following assertion. Lemma 5.3. The stationary solution $v^\varepsilon_{\mathrm{st}}(\tau)$ satisfies (5.1) for $C_{m_0'}(|v_0|)$ replaced by $C_{m_0'}(0)$. Consider the interaction representation for $v^\varepsilon_{\mathrm{st}}$, $v^\varepsilon_{\mathrm{st}}(\tau)=\Phi_{-\tau\varepsilon^{-1}\Lambda}a^\varepsilon(\tau)$ (note that $a^\varepsilon$ is not a stationary process!). Then $a^\varepsilon(\tau)$ satisfies equation (2.11), so for any $N\in\mathbb{N}$ the system of measures $\{\mathcal{D}(a^\varepsilon|_{[0,N]}),\,0<\varepsilon\leqslant1\}$ is tight in view of (5.1) (for the same reason as in § 2.4). We choose a sequence $\varepsilon_l\to0$ (depending on $N$) such that
$$
\begin{equation*}
\mathcal{D}(a^{\varepsilon_l}|_{[0,N]})\rightharpoonup Q_0 \quad \text{in $\mathcal{P}(C([0,N];\mathbb{C}^n))$}.
\end{equation*}
\notag
$$
Applying the diagonal process and replacing $\{\varepsilon_l\}$ by a subsequence, which we still denote by $\{\varepsilon_l\}$, we achieve that $\mathcal{D} a^{\varepsilon_l} \rightharpoonup Q_0$ in $\mathcal{P}(X)$ (see (5.3)). Since $a^\varepsilon(0)=v_{\mathrm{st}}^\varepsilon(0)$, we have
$$
\begin{equation*}
\mu^{\varepsilon_l}\rightharpoonup \mu^0 :=Q_0\big|_{\tau=0}.
\end{equation*}
\notag
$$
Let $a^0(\tau)$ be a process in $\mathbb{C}^n$ such that $\mathcal{D}(a^0)=Q_0$. Then
$$
\begin{equation}
\mathcal{D} (a^{\varepsilon_l}(\tau))\rightharpoonup \mathcal{D}(a^{0}(\tau)) \quad \forall\,0\leqslant \tau <\infty.
\end{equation}
\tag{5.7}
$$
In particular, $\mathcal{D}(a^0(0) )=\mu^0$. Proposition 5.4. (1) The limiting process $a^0$ is a stationary weak solution of effective equation (4.5), and $\mathcal{D}(a^0(\tau))\equiv\mu^0$, $\tau\in[0,\infty)$. In particular, limit points as $\varepsilon\to0$ of the system of stationary measures $\{\mu^\varepsilon,\,\varepsilon\in(0,1]\}$ are stationary measures of the effective equation. (2) Any limiting measure $\mu^0$ is invariant under operators $\Phi_{\theta\Lambda}$, $\theta\in\mathbb{R}$. So
$$
\begin{equation*}
\mathcal{D}(\Phi_{\theta\Lambda}a^0(\tau))=\mu^0
\end{equation*}
\notag
$$
for all $\theta\in\mathbb{R}$ and $\tau\in[0,\infty)$. Proof. (1) Using Lemma 5.3 and repeating the argument in the proof of Proposition 4.2 we obtain that $a^0$ is a weak solution of the effective equation. It remains to prove that it is stationary.
Take any bounded Lipschitz function $f$ on $\mathbb{C}^n$ and consider
$$
\begin{equation*}
\mathsf{E}\int_0^{1}f(v_{\mathrm{st}}^{\varepsilon_l}(\tau))\,d\tau =\mathsf{E}\int_0^1 f(\Phi_{-\tau\varepsilon_l^{-1}\Lambda}a^{\varepsilon_l}(\tau)) \,d\tau.
\end{equation*}
\notag
$$
Using the same argument as in the proof of Lemma 4.3 (but applying it to the averaging of functions rather than vector fields) we obtain
$$
\begin{equation}
\begin{aligned} \, & \mathsf{E}\int_0^{1} f(v_{\mathrm{st}}^{\varepsilon_l}(\tau))\,d\tau -\mathsf{E}\int_0^{1}\langle f\rangle(a^{\varepsilon_l}(\tau))\,d\tau \notag \\ &\qquad =\mathsf{E}\int_0^{1} \bigl[f(\Phi_{-\tau\varepsilon_l^{-1}\Lambda}a^{\varepsilon_l}(\tau)) -\langle f\rangle(a^{\varepsilon_l}(\tau))\bigr]\,d\tau \to0 \quad \text{as $\varepsilon_l\to0$}. \end{aligned}
\end{equation}
\tag{5.8}
$$
By Lemma 3.6
$$
\begin{equation*}
\langle f\rangle(a^{\varepsilon_l}(\tau)) =\langle f\rangle (\Phi_{\tau\varepsilon_l^{-1}\Lambda}v_{\mathrm{st}}^{\varepsilon_l}(\tau)) =\langle f\rangle (v_{\mathrm{st}}^{\varepsilon_l}(\tau))
\end{equation*}
\notag
$$
for every $\tau$. Since the process $v_{\mathrm{st}}^{\varepsilon_l}(\tau)$ is stationary, it follows that
$$
\begin{equation*}
\mathsf{E}f (v_{\mathrm{st}}^{\varepsilon_l}(\tau)) =\text{Const} \quad\text{and}\quad \mathsf{E}\langle f\rangle (a^{\varepsilon_l}(\tau)) =\mathsf{E}\langle f\rangle (v_{\mathrm{st}}^{\varepsilon_l}(\tau)) =\text{Const}'.
\end{equation*}
\notag
$$
So from (5.8) we obtain
$$
\begin{equation}
\mathsf{E}f(v_{\mathrm{st}}^{\varepsilon_l}(\tau)) -\mathsf{E}\langle f\rangle (a^{\varepsilon_l}(\tau)) \to0 \quad \text{as $\varepsilon_l\to0$}
\end{equation}
\tag{5.9}
$$
for all $\tau$.
For any $\tau$ consider $\tilde f_\tau\! =f\mathrel{\circ}\Phi_{\tau \varepsilon_l^{-1}\Lambda}$. Then $f(a^{\varepsilon_l}(\tau))=\tilde f_\tau (v^{\varepsilon_l}_{\mathrm{st}}(\tau))$. Since $\langle f \rangle= \langle \tilde f_\tau \rangle$ by assertion (4) of Lemma 3.6, applying (5.9) to $\tilde f_\tau$ we obtain
$$
\begin{equation*}
\lim_{\varepsilon_l\to0} \mathsf{E}f(a^{\varepsilon_l}(\tau)) =\lim_{\varepsilon_l\to0} \mathsf{E}\tilde f_\tau(v_{\mathrm{st}}^{\varepsilon_l} (\tau)) =\lim_{\varepsilon_l\to0} \mathsf{E}\langle\tilde f_\tau \rangle(a^{\varepsilon_l} (\tau)) =\lim_{\varepsilon_l\to0} \mathsf{E}\langle f\rangle(a^{\varepsilon_l} (\tau)).
\end{equation*}
\notag
$$
From this relation, (5.9), and (5.7) we find that $\displaystyle\mathsf{E}f(a^0(\tau))=\int f(v) \,\mu^0(dv)$ for each $\tau$ and every $f$ as above. This implies the first assertion of the proposition.
(2) Passing to the limit in (5.9) with the use of (5.7) we have
$$
\begin{equation*}
\int f (v)\,\mu^0(dv) =\mathsf{E}\langle f\rangle(a^0(\tau)) \quad \forall\tau.
\end{equation*}
\notag
$$
Applying this relation first to $f\circ \Phi_{\theta\Lambda}$ and then to $f$, and using assertion (4) of Lemma 3.6, we get that
$$
\begin{equation*}
\int f\circ \Phi_{\theta\Lambda} (v)\, \mu^0(dv)=\mathsf{E}\langle f \circ\Phi_{\theta\Lambda} \rangle (a^0(\tau))=\mathsf{E}\langle f \rangle (a^0(\tau))= \int f (v)\,\mu^0(dv),
\end{equation*}
\notag
$$
for any $\theta\in \mathbb{R}$ and any $\tau$, for every bounded Lipschitz function $f$. This implies the second assertion. $\Box$ If effective equation (4.5) is mixing, then it has a unique stationary measure. In this case the measure $\mu^0$ in Proposition 5.4 does not depend on the choice of the sequence $\varepsilon_l\to0$, and so $\mu^\varepsilon \rightharpoonup\mu^0$ as $\varepsilon\to0$. Therefore, we have the following result. Theorem 5.5. If, in addition to Assumption 5.1, the effective equation is mixing and $\mu^0$ is its unique stationary measure, then
$$
\begin{equation*}
\mu^\varepsilon\rightharpoonup\mu^0 \quad\textit{in $\mathcal{P}(\mathbb{C}^n)$} \quad\textit{as $\varepsilon\to0$}.
\end{equation*}
\notag
$$
Moreover, the measure $\mu^0$ is invariant under all operators $\Phi_{\theta\Lambda}$, and the law of the stationary solution of equation (2.6), written in the interaction representation, converges to the law of the stationary solution of effective equation (4.5).
We recall that Theorem 4.7 and Corollary 4.13 only ensure that on finite time intervals $\tau\in [0,T]$ the actions of solutions of (2.6) converge in law, as $\varepsilon\to0$, to the actions of solutions of the effective equation with the same initial data. By contrast, the whole stationary measure of equation (2.6), and not only its image under the action map, converges to the stationary measure of the effective equation as $\varepsilon\to0$. This important fact was originally observed in [9] for a special class of equations (2.6).
Corollary 5.6. Under the assumptions of Theorem 5.5, for any $v_0\in\mathbb{C}^n$ we have
$$
\begin{equation*}
\lim_{\varepsilon\to0}\lim_{\tau\to\infty}\mathcal{D}(v^\varepsilon(\tau;v_0))=\mu^0.
\end{equation*}
\notag
$$
Proof. Since $\lim_{\tau\to\infty}\mathcal{D}(v^\varepsilon(\tau;v_0))=\mu^\varepsilon$ by (5.2), the result follows from Theorem 5.5. $\Box$
Remark 5.7. We decomplexify $\mathbb{C}^n$ to obtain $\mathbb{R}^{2n}$ and write the effective equation in the real coordinates $\{x=(x_1,\dots,x_{2n})\}$:
$$
\begin{equation*}
d x_j(\tau)-{ \langle\!\langle P \rangle\!\rangle }_j(x)\,d\tau =\sum_{l=1}^{2n}\mathcal{B}_{jl}\,dW_l(\tau), \qquad j=1,\dots,2n,
\end{equation*}
\notag
$$
where the $W_l$ are independent standard real Wiener processes. Then the stationary measure $\mu^0\in\mathcal{P}(\mathbb{R}^{2n})$ satisfies the stationary Fokker–Planck equation
$$
\begin{equation*}
\frac{1}{2}\sum_{l=1}^{2n}\sum_{j=1}^{2n} \frac{\partial^2}{\partial x_l\,\partial x_j}(\mathcal{B}_{lj}\mu^0) =\sum_{l=1}^{2n}\frac{\partial}{\partial x_l}( \langle\!\langle P \rangle\!\rangle _l(x)\mu^0)
\end{equation*}
\notag
$$
in the sense of distributions. If the dispersion matrix $\Psi$ is non-singular, then so is the diffusion $\mathcal{B}$, and since the drift $ \langle\!\langle P \rangle\!\rangle (x)$ is locally Lipschitz, the standard theory of the Fokker–Planck equation gives $\mu^0=\varphi(x)\,dx$ with $\varphi\in C^1(\mathbb{R}^{2n})$. (For example, Theorem 1.6.8 in [3] first implies that $\mu^0=\varphi(x)\,dx$ with a Hölder continuous $\varphi$, and then the usual elliptic regularity gives $\varphi\in C^1$.)
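A linear example makes Assumption 5.1 (c), Corollary 5.6 and Remark 5.7 concrete. As a hypothetical instance of the slow-time equation (2.6) with $n=1$ we take $dv+i\varepsilon^{-1}\lambda v\,d\tau=-v\,d\tau+b\,d\beta^c(\tau)$, and we assume $\beta^c=\beta_1+i\beta_2$ with independent standard real Wiener processes $\beta_1,\beta_2$ (the normalisation under which the Itô term in (6.2) equals $b_j^2\,d\tau$). Then $v(\tau;v_0)$ is Gaussian with mean $e^{-(1+i\lambda/\varepsilon)\tau}v_0$ and independent real and imaginary parts of variance $b^2(1-e^{-2\tau})/2$, so for every $\varepsilon$ and $v_0$ its law converges to $\varphi(x)\,dx$ with $\varphi(x)=(\pi b^2)^{-1}e^{-|x|^2/b^2}$. The effective drift is $\langle\!\langle P \rangle\!\rangle(x)=-x$, and the fast rotation is tangent to circles and divergence-free, so it does not affect this radial density; hence here $\mu^\varepsilon=\mu^0$ for all $\varepsilon$. The sketch samples $v(10;5+5i)$ exactly and checks by central differences that $\varphi$ solves the stationary Fokker–Planck equation of Remark 5.7 with $\mathcal{B}=b^2I$.

```python
import numpy as np

B_NOISE = 1.5   # hypothetical noise amplitude b
LAMBDA = 1.0    # hypothetical frequency lambda


def sample_v(tau, v0, eps, size, rng):
    """Exact samples of v(tau; v0) for dv + i(lambda/eps) v dtau = -v dtau + b dbeta^c.

    Assumes beta^c = beta_1 + i beta_2 with independent standard real Wiener
    processes, so each real coordinate has variance b^2 (1 - e^{-2 tau}) / 2.
    """
    mean = np.exp(-(1.0 + 1j * LAMBDA / eps) * tau) * v0
    sd = B_NOISE * np.sqrt((1.0 - np.exp(-2.0 * tau)) / 2.0)
    noise = rng.standard_normal(size) + 1j * rng.standard_normal(size)
    return mean + sd * noise


def phi(x1, x2):
    """Candidate stationary density (pi b^2)^{-1} exp(-|x|^2 / b^2) on R^2."""
    return np.exp(-(x1 ** 2 + x2 ** 2) / B_NOISE ** 2) / (np.pi * B_NOISE ** 2)


def fp_residual(x1, x2, h=1e-3):
    """(b^2/2) Laplacian(phi) - div(<<P>> phi) with <<P>>(x) = -x, by central differences."""
    lap = (phi(x1 + h, x2) + phi(x1 - h, x2) + phi(x1, x2 + h) + phi(x1, x2 - h)
           - 4.0 * phi(x1, x2)) / h ** 2
    div = (-(x1 + h) * phi(x1 + h, x2) + (x1 - h) * phi(x1 - h, x2)
           - (x2 + h) * phi(x1, x2 + h) + (x2 - h) * phi(x1, x2 - h)) / (2.0 * h)
    return 0.5 * B_NOISE ** 2 * lap - div


rng = np.random.default_rng(1)
for eps in (1.0, 0.1, 0.01):
    v = sample_v(10.0, 5.0 + 5.0j, eps, 200_000, rng)
    print(eps, np.mean(np.abs(v) ** 2))  # close to b^2 = 2.25 for every eps
print(max(abs(fp_residual(x1, x2)) for x1, x2 in [(0.3, -0.7), (1.2, 0.4), (-2.0, 1.1)]))
```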
6. The non-resonant case
Assume that the frequency vector $\Lambda=(\lambda_1,\dots,\lambda_n)$ is non-resonant (see (3.7)). In § 3.1.2 we saw that in this case the vector field $ \langle\!\langle P \rangle\!\rangle $ can be calculated via the averaging (3.8) and commutes with all rotations $\Phi_w$, $w\in \mathbb{R}^n$. For any $j\in\{1,\dots,n\}$ set $w^{j,t}:=(0,\dots,0,t,0,\dots,0)$ (only the $j$th entry is non-zero). Write $ \langle\!\langle P \rangle\!\rangle _1(z)$ in the form $ \langle\!\langle P \rangle\!\rangle _1(z)=z_1R_1(z_1,\dots,z_n)$ for some complex function $R_1$. Since for $w=w^{1,t}$ we have $\Phi_w(z)=(e^{it}z_1,z_2,\dots,z_n)$, the first component of relation (3.10) now reads
$$
\begin{equation*}
e^{-it}z_1R_1(e^{-it}z_1,z_2,\dots,z_n) =e^{-it}z_1R_1(z_1,\dots,z_n),
\end{equation*}
\notag
$$
for every $t$. So
$$
\begin{equation*}
R_1(e^{-it}z_1,z_2,\dots,z_n) \equiv R_1(z_1,\dots,z_n)
\end{equation*}
\notag
$$
and $R_1(z_1,\dots,z_n)$ depends on $z_1$ only through $|z_1|$. In a similar way we verify that it depends on $z_2,\dots,z_n$ only through $|z_2|,\dots,|z_n|$. Therefore,
$$
\begin{equation*}
\langle\!\langle P \rangle\!\rangle _1(z)=z_1R_1(|z_1|,\dots,|z_n|).
\end{equation*}
\notag
$$
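The formula $\langle\!\langle P \rangle\!\rangle_1(z)=z_1R_1(|z_1|,\dots,|z_n|)$ just obtained can be checked on an example. We assume, as in the formula for $F_j$ below, that $(\Phi_wz)_j=e^{iw_j}z_j$ and that (3.8) is the torus average $(2\pi)^{-n}\int_{\mathbb{T}^n}\Phi_wP(\Phi_{-w}a)\,dw$ (our reading; (3.8) is not reproduced in this excerpt). For the hypothetical field $P(z)=(|z_2|^2z_1+z_1^2\bar z_2,\ z_1+|z_1|^2z_2)$ the terms that are invariant under all rotations survive and the others average out, giving $\langle\!\langle P \rangle\!\rangle(a)=(a_1|a_2|^2,\ a_2|a_1|^2)$, that is, $R_1=|a_2|^2$ and $R_2=|a_1|^2$. A uniform $k\times k$ grid integrates trigonometric polynomials of degree less than $k$ in each variable exactly, so the check is exact up to rounding.

```python
import numpy as np


def P(z):
    """Hypothetical cubic drift on C^2; z has shape (..., 2)."""
    z1, z2 = z[..., 0], z[..., 1]
    return np.stack([np.abs(z2) ** 2 * z1 + z1 ** 2 * np.conj(z2),
                     z1 + np.abs(z1) ** 2 * z2], axis=-1)


def torus_average(P, a, k=16):
    """(2 pi)^{-2} int_{T^2} Phi_w P(Phi_{-w} a) dw on a uniform k x k grid.

    Assumed convention: (Phi_w z)_j = exp(i w_j) z_j.  The rectangle rule is
    exact for trigonometric polynomials of degree < k in each w_j.
    """
    w = 2.0 * np.pi * np.arange(k) / k
    W = np.stack(np.meshgrid(w, w, indexing="ij"), axis=-1).reshape(-1, 2)  # (k*k, 2)
    rot = np.exp(1j * W)                    # rows (e^{i w_1}, e^{i w_2})
    vals = rot * P(a[None, :] / rot)        # Phi_w P(Phi_{-w} a)
    return vals.mean(axis=0)


a = np.array([1.0 + 0.5j, -0.3 + 2.0j])
avg = torus_average(P, a)
print(avg / a)                             # R_1 = |a_2|^2, R_2 = |a_1|^2
print(abs(a[1]) ** 2, abs(a[0]) ** 2)
```

Rotating $a$ by any $\Phi_v$ leaves $\langle\!\langle P \rangle\!\rangle_j(a)/a_j$ unchanged, which is the commutation with rotations used above.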
The same is true for any $ \langle\!\langle P \rangle\!\rangle _j(z)$, and we obtain the following statement.
Proposition 6.1. If Assumption 2.1 holds and the frequency vector $\Lambda$ is non-resonant, then $ \langle\!\langle P \rangle\!\rangle $ satisfies (3.8), and (1) ${ \langle\!\langle P \rangle\!\rangle }_j(a)=a_j R_j(|a_1|,\dots,|a_n|)$, $j=1,\dots,n$; (2) the effective equation reads
$$
\begin{equation}
da_j(\tau)-a_jR_j(|a_1|,\dots,|a_n|)\,d\tau =b_j\,d\beta^c_j(\tau), \qquad j=1,\dots, n,
\end{equation}
\tag{6.1}
$$
where $b_j=\bigl(\sum_{l=1}^n|\Psi_{jl}|^2\bigr)^{1/2}$ (and $a\mapsto(a_1R_1,\dots,a_nR_n)$ is a locally Lipschitz vector field); (3) if $a(\tau)$ is a solution of (6.1), then the vector of its actions
$$
\begin{equation*}
I(\tau)=(I_1,\dots,I_n)(\tau)\in \mathbb{R}_+^n
\end{equation*}
\notag
$$
is a weak solution of the equation
$$
\begin{equation}
\begin{gathered} \, dI_j(\tau) -2I_j (\operatorname{Re}R_j) \big(\sqrt{2 I_1},\dots,\sqrt{2 I_n}\big)\,d\tau-b_j^2 \,d\tau = b_j \sqrt{2 I_j} \,d W_j(\tau), \\ I_j(0)=\frac12 |v_{0j}|^2, \qquad j=1,\dots,n, \notag \end{gathered}
\end{equation}
\tag{6.2}
$$
where $\{W_j\}$ are independent standard real Wiener processes; (4) if, in addition, the assumptions of Theorem 5.5 are met and the matrix $\Psi$ is non-singular, then the stationary measure $\mu^0$ has the form $d\mu^0=p(I)\,dI\,d\varphi$, where $p$ is a continuous function on $\mathbb{R}^n_+$ which is $C^1$-smooth away from the boundary $\partial \mathbb{R}^n_+$. Proof. Assertion (1) was proved above, and (2) follows from it and (4.4).
(3) Writing the diffusion in effective equation (4.5) as in (6.1) and applying Itô’s formula (see Appendix 12) to $I_j=|a_j|^2/2$ we get that
$$
\begin{equation}
dI_j(\tau) -\frac12\,\bigl(\bar a_ j \langle\!\langle P \rangle\!\rangle _j+a_j \overline{ \langle\!\langle P \rangle\!\rangle }_j\bigr)\,d\tau - b_j^2 \,d\tau = b_j \langle a_j(\tau), d\beta_j(\tau)\rangle =: b_j |a_j|\,d\xi_j(\tau),
\end{equation}
\tag{6.3}
$$
where $d\xi_j(\tau)= \langle a_j/| a_j|, d\beta_j(\tau)\rangle$ (see (2.2)), and for $a_j=0$ we set $a_j/| a_j|:=1$. Using (1) we see that the left-hand side of (6.3) is the same as in (6.2). Since $\bigl|a_j/| a_j| \bigr|(\tau)\equiv1$ for each $j$ and the processes $\beta_j$ are independent, by Lévy's theorem (for example, see [14], p. 157) $\xi(\tau)=(\xi_1,\dots,\xi_n)(\tau)$ is a standard $n$-dimensional Wiener process, and (3) follows.
(4) By Theorem 5.5 the stationary measure $\mu^0$ is invariant under the action of all operators $\Phi_{\theta\Lambda}$, $\theta\in\mathbb{R}$. Since $\Lambda$ is non-resonant, the curve $\theta\mapsto\theta\Lambda\in\mathbb{T}^n$ is dense in $\mathbb{T}^n$, so the measure $\mu^0$ is invariant under all operators $\Phi_w$, $w\in\mathbb{T}^n$. As the matrix $\Psi$ is non-singular, by Remark 5.7 we have $d\mu^0=\widetilde{p}(z)\,dz$, where $\widetilde{p}$ is a $C^1$-smooth function ($dz$ is the volume element in $\mathbb{C}^n\simeq \mathbb{R}^{2n}$). Let us write $z_j=\sqrt{ 2I_j}\,e^{i\varphi_j}$ and $\widetilde{p}(z)=p(I,\varphi)$. In the coordinates $(I,\varphi)$ the operators $\Phi_w$ have the form $(I,\varphi)\mapsto(I,\varphi+w)$. Since $\mu^0$ is invariant under all of them, $p$ does not depend on $\varphi$. So $d\mu^0=p(I)\,dz=p(I)\,dI\,d\varphi$ (because $dx_j\,dy_j=r_j\,dr_j\,d\varphi_j=dI_j\,d\varphi_j$), and (4) holds. $\Box$ By assertion (3) of this proposition, in the non-resonant case equation (6.2) describes the asymptotic behaviour as $\varepsilon\to0$ of the actions $I_j$ of solutions of (2.6). But how regular is this equation? Let
$$
\begin{equation*}
r_j=|a_j|=\sqrt{2I_j}, \qquad 1\leqslant j\leqslant n,
\end{equation*}
\notag
$$
denote the moduli of components of the vector $a\in\mathbb{C}^n$, consider the smooth polar coordinate mapping
$$
\begin{equation*}
\mathbb{R}_+^n\times \mathbb{T}^n \to \mathbb{C}^n, \qquad (r,\varphi)\mapsto (r_1e^{i\varphi_1},\dots,r_ne^{i\varphi_n}),
\end{equation*}
\notag
$$
and extend it to a mapping
$$
\begin{equation*}
\Phi\colon \mathbb{R}^n \times \mathbb{T}^n \to \mathbb{C}^n
\end{equation*}
\notag
$$
defined by the same formula. The $j$th component of the drift in equation (6.2), written in the form (6.3) without the Itô term $b_j^2\,d\tau$, is $\operatorname{Re}\bigl(a_j \overline{ \langle\!\langle P \rangle\!\rangle _j(a)}\bigr)$. By (3.8), in the polar coordinates we can express it as
$$
\begin{equation*}
\begin{aligned} \, & \frac1{(2\pi)^n} \int_{\mathbb{T}^n} \operatorname{Re}\bigl(r_j e^{i\varphi_j} \overline{ e^{iw_j} P_j(r,\varphi-w)}\bigr)\,dw \\ &\qquad =\frac{1}{(2\pi)^n} \int_{\mathbb{T}^n} \operatorname{Re}\bigl(r_j e^{i\theta_j} \overline P_j(r, \theta) \bigr)\,d\theta =: F_j(r), \qquad r\in\mathbb{R}^n, \end{aligned}
\end{equation*}
\notag
$$
where $F_j(r)$ is a continuous function, which vanishes with $r_j$. Since the integrand in the second integral does not change if for some $l=1,\dots,n$ we replace $r_l$ by $-r_l$ and $\theta_l$ by $\theta_l+\pi$, $F_j(r)$ is even in each variable $r_l$. So it can be expressed as follows:
$$
\begin{equation*}
F_j (r_1,\dots,r_n) =f_j (r_1^2,\dots,r_n^2), \qquad f_j \in C(\mathbb{R}^n_+),
\end{equation*}
\notag
$$
where $f_j(x_1,\dots,x_n)$ vanishes with $x_j$. Now assume that the vector field $P$ is $C^2$-smooth. In this case the integrand in the integral for $F_j$ is $C^2$-smooth with respect to $(r,\theta) \in \mathbb{R}^n\times \mathbb{T}^n$, and $F_j$ is a $C^2$-smooth function of $r$. Then by a result of Whitney (see Theorem 1 in [27] for $s=1$ and the remark concluding that paper) $f_j$ extends to $\mathbb{R}^n$ in such a way that $f_j(x)$ is $C^1$-smooth in each variable $x_l$ and $(\partial/\partial x_l) f_j(x)$ is continuous on $\mathbb{R}^n$. So $f_j$ is $C^1$-smooth. Since $r_j^2=2I_j$, we have established the following result. Proposition 6.2. If the frequency vector $\Lambda$ is non-resonant and $P$ is $C^2$-smooth, then equation (6.2) can be written as
$$
\begin{equation}
dI_j(\tau)- G_j(I_1,\dots,I_n) \,d\tau-b_j^2 \,d\tau = b_j \sqrt{2 I_j} \,d W_j(\tau), \quad 1\leqslant j\leqslant n,
\end{equation}
\tag{6.4}
$$
where $G$ is a $C^1$-smooth vector field, namely $G_j(I)=f_j(2I_1,\dots,2I_n)$, such that $G_j(I)$ vanishes with $I_j$ for each $j$. We stress that, although the square-root singularity in the dispersion means that the averaged $I$-equation (6.4) may have non-unique solutions, the limiting law $\mathcal{D}(I(\,{\cdot}\,))$ for the actions of solutions of (2.6) is still uniquely determined by Corollary 4.13.
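The torus average defining $F_j(r)$, and its evenness in each variable $r_l$, can be checked numerically. The sketch below (Python with NumPy) approximates the integral over $\mathbb{T}^n$ by the mean over a uniform grid, which is exact up to rounding for trigonometric polynomials of low degree. The vector field `P_example` on $\mathbb{C}^2$ and the function name `averaged_drift` are hypothetical choices made only for illustration. For this $P$ a direct computation gives $F_1(r)=-r_1^2$ and $F_2(r)=-2r_2^2+r_1^2r_2^2$, so, using $r_j^2=2I_j$, we get $G_1(I)=-2I_1$ and $G_2(I)=-4I_2+4I_1I_2$.

```python
import numpy as np

def P_example(z):
    """Hypothetical polynomial vector field on C^2 (illustration only)."""
    z1, z2 = z
    p1 = -z1 + 1j * np.abs(z2) ** 2 * z1 + z2 ** 2
    p2 = -2.0 * z2 + np.abs(z1) ** 2 * z2
    return np.array([p1, p2])

def averaged_drift(P, r, num_nodes=32):
    """Approximate F_j(r) = (2pi)^{-n} int_{T^n} Re(r_j e^{i theta_j} conj(P_j(r, theta))) d theta.

    The mean over a uniform grid on T^n is the rectangle rule, exact for
    trigonometric polynomials whose frequencies are below num_nodes.
    """
    r = np.asarray(r, dtype=float)
    n = r.size
    nodes = 2.0 * np.pi * np.arange(num_nodes) / num_nodes
    thetas = np.meshgrid(*([nodes] * n), indexing="ij")    # grid on the torus
    z = [r[l] * np.exp(1j * thetas[l]) for l in range(n)]  # polar map (r, theta) -> z
    p = P(z)
    return np.array([np.mean(np.real(z[j] * np.conj(p[j]))) for j in range(n)])

F = averaged_drift(P_example, [0.7, 0.3])          # approximately [-0.49, -0.1359]
F_flip = averaged_drift(P_example, [-0.7, 0.3])    # equals F: F_j is even in r_1
```

Negative values of $r_l$ are allowed on purpose, since the evenness argument uses the extension of the polar map to $\mathbb{R}^n\times\mathbb{T}^n$.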
7. Convergence uniform in time
In this section we investigate the convergence in distribution, uniformly in time, of solutions of (2.11) to solutions of effective equation (4.5) with respect to the dual-Lipschitz metric (see Definition 4.10). These results are finite-dimensional versions of those in [11] for stochastic PDEs. Throughout this section the following assumption holds.
Assumption 7.1. The first two parts of Assumption 5.1 hold, and the following condition is fulfilled instead of (c).
${\rm (c')}$ Effective equation (4.5) is mixing with stationary measure $\mu^0$. For any solution $a(\tau)$, $\tau\geqslant0$, of this equation such that $\mathcal{D}(a(0))=:\mu$ and $\langle|z|^{2m_0'},\mu(dz)\rangle=\mathsf{E}|a(0)|^{2m_0'}\leqslant M'$ for some $M'>0$ (recall the notation (4.22)) we have
$$
\begin{equation}
\|\mathcal{D}(a(\tau))-\mu^0\|_{L,\mathbb{C}^n}^*\leqslant g_{M'}(\tau, d) \quad \forall \tau\geqslant0 \quad \text{if } \|\mu-\mu^0\|_{L,\mathbb{C}^n}^*\leqslant d\leqslant2.
\end{equation}
\tag{7.1}
$$
Here the function $g\colon \mathbb{R}_+^3\to\mathbb{R}_+$, $(\tau, d, M) \mapsto g_M(\tau,d)$, is continuous, vanishes with $d$, converges to zero as $\tau\to\infty$, and for each fixed $M\geqslant0$ the function $(\tau, d)\mapsto g_M(\tau, d)$ is uniformly continuous in $d$ for $(\tau,d)\in [0,\infty)\times[0,2]$ (so that $g_M$ extends to a continuous function on $[0,\infty]\times [0,2]$ that vanishes for $\tau=\infty$ and for $d=0$). We emphasize that now we assume mixing for the effective equation, but not for the original equation (2.6). Since Assumption 7.1 implies Assumption 2.1, the assertions in § 4 hold for solutions of equations (2.11), which we analyze in this section, for any $T>0$. Proposition 7.2. Assume that the first two parts of Assumption 5.1 hold, equation (4.5) is mixing with stationary measure $\mu^0$, and for each $M>0$ and any $v^1,v^2\in \overline{B}_M(\mathbb{C}^n)$
$$
\begin{equation}
\|\mathcal{D} a(\tau;v^1)-\mathcal{D} a(\tau;v^2) \|_{L,\mathbb{C}^n}^* \leqslant \mathfrak{g}_M(\tau),
\end{equation}
\tag{7.2}
$$
where $\mathfrak{g}$ is a continuous function of $(M,\tau)$ that tends to zero as $\tau \to\infty$ and is a non-decreasing function of $M$. Then condition ${\rm (c')}$ holds for some function $g$. The proposition is proved below, at the end of this section. Note that (7.2) holds (for $\mathfrak{g}$ replaced by $2\mathfrak{g}$) if
$$
\begin{equation}
\|\mathcal{D} a(\tau;v^1)-\mu^0 \|_{L,\mathbb{C}^n}^* \leqslant \mathfrak{g}_M(\tau) \quad \forall\,v^1\in\overline{B}_M(\mathbb{C}^n).
\end{equation}
\tag{7.3}
$$
Usually, a proof of mixing for (4.5) actually establishes (7.3). So condition ${\rm (c')}$ is a rather mild restriction. Example 7.3. If the assumptions of Proposition 9.4 below are fulfilled, then (7.2) is satisfied, since in this case (7.3) holds for $\mathfrak{g}_M(\tau)=\overline{V}(M)e^{-c\tau}$. Here $c>0$ is a constant and $\overline{V}(M)=\max\{V(x)\colon x\in \overline{B}_M(\mathbb{C}^n)\}$, where $V(x)$ is the Lyapunov function as in Proposition 9.3; see, for example, Theorem 2.5 in [22] and § 3.3 of [20]. Theorem 7.4. Under Assumption 7.1, for any $v_0\in\mathbb{C}^n$
$$
\begin{equation*}
\lim_{\varepsilon\to0}\, \sup_{\tau\geqslant0} \|\mathcal{D}(a^\varepsilon(\tau;v_0))-\mathcal{D}(a^{0}(\tau;v_0))\|_{L,\mathbb{C}^n}^* =0,
\end{equation*}
\notag
$$
where $a^\varepsilon(\tau;v_0)$ and $a^{0}(\tau;v_0)$ solve (2.11) and (4.5), respectively, for the same initial condition $a^\varepsilon(0;v_0)=a^{0}(0;v_0)=v_0$. Proof. Since $v_0$ is fixed, we abbreviate $a^\varepsilon(\tau; v_0)$ to $a^\varepsilon(\tau)$. By (5.1)
$$
\begin{equation}
\mathsf{E}| a^\varepsilon(\tau)|^{2m'_0}\leqslant C_{m'_0}(|v_0|) =:M^* \quad \forall\, \tau\geqslant0.
\end{equation}
\tag{7.4}
$$
By (7.4) and (4.19) we have the following estimate. (Indeed, for any $N>0$ the estimate with $|a|^{2m'_0}$ replaced by the bounded continuous function $|a|^{2m'_0}\wedge N$ follows from the convergence $\mathcal{D} a^\varepsilon(\tau;v_0) \rightharpoonup \mathcal{D} a^0(\tau;v_0)$; the required estimate then follows from Fatou’s lemma as $N\to\infty$.)
$$
\begin{equation}
\mathsf{E}| a^{0}(\tau;v_0)|^{2m'_0} =\langle |a|^{2m'_0}, \mathcal{D} a^0(\tau;v_0)\rangle \leqslant M^* \quad \forall\,\tau\geqslant0.
\end{equation}
\tag{7.5}
$$
Since $\mathcal{D} a^0(\tau;0) \rightharpoonup \mu^0$ as ${\tau}\to\infty$, from the above estimate for $v_0=0$ we get that
$$
\begin{equation}
\langle |a|^{2m'_0}, \mu^0 \rangle \leqslant C_{m'_0} (0) =: C_{m'_0}.
\end{equation}
\tag{7.6}
$$
For later use we note that, since we have only used parts (a) and (b) of Assumption 5.1 and the fact that equation (4.5) is mixing to derive estimates (7.5) and (7.6), these two estimates hold under the assumptions of Proposition 7.2.
The constants in the estimates below depend on $M^*$, but this dependence is usually not indicated. For any $T\geqslant0$ we denote by $a_T^0(\tau)$ a weak solution of effective equation (4.5) such that
$$
\begin{equation*}
\mathcal{D} a^0_T(0) =\mathcal{D} a^\varepsilon(T).
\end{equation*}
\notag
$$
Note that $a^0_T(\tau)$ depends on $\varepsilon$ and that $a^0_0(\tau)=a^0(\tau; v_0)$.
Lemma 7.5. Fix any $\delta>0$. Then the following assertions are true. (1) For any $T>0$ there exists $\varepsilon_1 =\varepsilon_1(\delta,T)>0$ such that if $\varepsilon\leqslant \varepsilon_1$, then
$$
\begin{equation}
\sup_{\tau\in[0,T]} \|\mathcal{D}(a^\varepsilon(T'+\tau)) -\mathcal{D}(a_{T'}^{0}(\tau))\|_{L,\mathbb{C}^n}^* \leqslant\frac{\delta}{2} \quad \forall\, T'\geqslant0.
\end{equation}
\tag{7.7}
$$
(2) Choose $T^*=T^*(\delta)>0$ such that $g_{M^*}(T,2)\leqslant\delta/4$ for each $T\geqslant T^*$. Then there exists $\varepsilon_2=\varepsilon_2(\delta)>0$ such that if $\varepsilon\leqslant\varepsilon_2$ and $\|\mathcal{D}(a^\varepsilon(T'))-\mu^0\|_{L,\mathbb{C}^n}^*\leqslant\delta$ for some $T'\geqslant0$, then also
$$
\begin{equation}
\|\mathcal{D}(a^\varepsilon(T'+T^*))-\mu^0\|_{L,\mathbb{C}^n}^* \leqslant\delta,
\end{equation}
\tag{7.8}
$$
and
$$
\begin{equation}
\sup_{\tau\in[T',T'+T^*]} \| \mathcal{D}(a^\varepsilon(\tau))-\mu^0\|_{L,\mathbb{C}^n}^* \leqslant\frac{\delta}{2} +\sup_{\tau\geqslant0}g_{M^*}(\tau,\delta).
\end{equation}
\tag{7.9}
$$
Below we abbreviate $\|\cdot\|^*_{L,\mathbb{C}^n}$ to $\|\cdot\|^*_L$. Given a measure $\nu\in\mathcal{P}(\mathbb{C}^n)$, denote by $a^\varepsilon(\tau;\nu)$ a weak solution of equation (2.11) such that $\mathcal{D} (a^\varepsilon(0)) =\nu$, and define $a^0(\tau;\nu)$ similarly. Since equation (2.11) defines a Markov process in $\mathbb{C}^n$ (for example, see [14], § 5.4.C, and [17], § 3.3), we have
$$
\begin{equation*}
\mathcal{D} a^\varepsilon(\tau;\nu) =\int_{\mathbb{C}^n}\mathcal{D}a^\varepsilon(\tau;v)\,\nu(dv),
\end{equation*}
\notag
$$
and a similar relation holds for $\mathcal{D}a^0(\tau;\nu)$. Proof of Lemma 7.5. Set $\nu^\varepsilon=\mathcal{D} (a^\varepsilon(T'))$. Then
$$
\begin{equation}
\mathcal{D} (a^\varepsilon(T'+\tau)) =\mathcal{D} (a^\varepsilon(\tau; \nu^\varepsilon))\quad\text{and} \quad \mathcal{D} (a^0_{T'}(\tau)) = \mathcal{D} (a^0(\tau; \nu^\varepsilon)).
\end{equation}
\tag{7.10}
$$
By (7.4), for any $\delta>0$ there exists $K_\delta>0$ such that for each $\varepsilon$ we have $\nu^\varepsilon(\mathbb{C}^n \setminus \overline{B}_{K_\delta} )\leqslant \delta/8$, where $\overline{B}_{K_\delta}:=\overline{B}_{K_\delta}(\mathbb{C}^n)$. So
$$
\begin{equation*}
\nu^\varepsilon =A^\varepsilon \nu^\varepsilon_\delta +\bar A^\varepsilon \bar\nu^\varepsilon_\delta, \quad\text{where } A^\varepsilon=\nu^\varepsilon(\overline{B}_{K_\delta}), \quad \bar A^\varepsilon =\nu^\varepsilon(\mathbb{C}^n\setminus\overline{B}_{K_\delta}),
\end{equation*}
\notag
$$
and $\nu^\varepsilon_\delta$ and $\bar\nu^\varepsilon_\delta$ are the conditional probabilities $\nu^\varepsilon(\,\cdot \mid \overline{B}_{K_\delta})$ and $\nu^\varepsilon(\,\cdot \mid\mathbb{C}^n\setminus\overline{B}_{K_\delta})$. Accordingly,
$$
\begin{equation}
\mathcal{D} (a^\kappa (\tau; \nu^\varepsilon) ) = A^\varepsilon \mathcal{D} (a^\kappa (\tau; \nu^\varepsilon_\delta) ) +\bar A^\varepsilon \mathcal{D} (a^\kappa (\tau; \bar\nu^\varepsilon_\delta)),
\end{equation}
\tag{7.11}
$$
where $\kappa=\varepsilon$ or $\kappa=0$. Therefore,
$$
\begin{equation*}
\begin{aligned} \, \|\mathcal{D}(a^\varepsilon (\tau; \nu^\varepsilon)) -\mathcal{D}(a^0 (\tau; \nu^\varepsilon))\|_L^* & \leqslant A^\varepsilon\|\mathcal{D}(a^\varepsilon(\tau;\nu^\varepsilon_\delta)) -\mathcal{D} (a^0 (\tau; \nu^\varepsilon_\delta))\|_L^* \\ &\qquad +\bar A^\varepsilon\|\mathcal{D}(a^\varepsilon(\tau;\bar\nu^\varepsilon_\delta)) -\mathcal{D}(a^0(\tau;\bar\nu^\varepsilon_\delta))\|_L^*. \end{aligned}
\end{equation*}
\notag
$$
The second term on the right-hand side is obviously bounded by $2\bar A^\varepsilon\leqslant\delta/4$. On the other hand, by Proposition 4.12 and (4.23) there exists $\varepsilon_1>0$, depending only on $K_\delta$ and $T$, such that for $0\leqslant \tau\leqslant T$ and $\varepsilon\in(0,\varepsilon_1]$ the first term on the right-hand side is $\leqslant\delta/4$. In view of (7.10) this proves the first assertion.
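The radius $K_\delta$ used in this argument can be given explicitly. Since the $2m'_0$-th moment of $\nu^\varepsilon=\mathcal{D}(a^\varepsilon(T'))$ is bounded by $M^*$ by (7.4), Chebyshev’s inequality shows that one admissible choice (any larger radius also works) is
$$
\begin{equation*}
\nu^\varepsilon(\mathbb{C}^n\setminus\overline{B}_{K})\leqslant\frac{\mathsf{E}|a^\varepsilon(T')|^{2m'_0}}{K^{2m'_0}}\leqslant\frac{M^*}{K^{2m'_0}}\leqslant\frac{\delta}{8}\quad\text{for } K\geqslant K_\delta:=\Bigl(\frac{8M^*}{\delta}\Bigr)^{1/(2m'_0)}.
\end{equation*}
\notag
$$
This radius does not depend on $\varepsilon$ or $T'$, which is what allows $\varepsilon_1$ to depend only on $\delta$ and $T$.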
To prove the second assertion we choose $\varepsilon_2= \varepsilon_1(\delta/2, T^*(\delta))$. Then from (7.7), (7.4), (7.1), and the definition of $T^*$, for $\varepsilon\leqslant\varepsilon_2$ we obtain
$$
\begin{equation*}
\begin{aligned} \, \|\mathcal{D}(a^\varepsilon(T'+T^*))-\mu^0\|_L^* & \leqslant\|\mathcal{D}(a^\varepsilon(T'+T^*)) -\mathcal{D}(a^{0}_{T'}(T^*))\|_L^* \\ &\qquad +\|\mathcal{D}(a^{0}_{T'}(T^*))-\mu^0\|_L^*\leqslant \delta. \end{aligned}
\end{equation*}
\notag
$$
This proves (7.8). Next, in view of (7.7) and (7.1), (7.5),
$$
\begin{equation*}
\begin{aligned} \, \sup_{\theta\in[0,T^*]}\|\mathcal{D}(a^\varepsilon(T'+\theta))-\mu^0\|_L^* & \leqslant\sup_{\theta\in[0,T^*]} \|\mathcal{D}(a^\varepsilon(T'+\theta)) -\mathcal{D}(a_{T'}^{0}(\theta))\|_L^* \\ &\qquad +\sup_{\theta\in[0,T^*]}\| \mathcal{D}(a_{T'}^{0}(\theta))-\mu^0\|_L^* \\ & \leqslant\frac{\delta}{2}+\max_{\theta \in[0,T^*] }g_{M^*} (\theta, \delta). \end{aligned}
\end{equation*}
\notag
$$
This implies (7.9). $\Box$ Now we continue the proof of the theorem. Fix an arbitrary $\delta>0$ and take some $\delta_1$, $0<\delta_1\leqslant \delta/4$. In the proof below the functions $\varepsilon_1$, $\varepsilon_2$, and $T^*$ are as in Lemma 7.5. (i) By the definition of $T^*$, (7.1), and (7.4),
$$
\begin{equation}
\|\mathcal{D}(a^0_{T'}(\tau))-\mu^0\|_L^* \leqslant \delta_1 \quad \forall\, \tau\geqslant T^*(\delta_1),
\end{equation}
\tag{7.12}
$$
for any $T'\geqslant0$. We abbreviate $T^*(\delta_1)$ to $T^*$. (ii) By (7.7), if $\varepsilon\leqslant \varepsilon_1=\varepsilon_1(\delta_1, T^*)>0$, then
$$
\begin{equation}
\sup_{0 \leqslant \tau \leqslant T^*} \bigl\|\mathcal{D}(a^\varepsilon(\tau)) -\mathcal{D}(a^0(\tau;v_0))\bigr\|_L^* \leqslant \frac{\delta_1}2.
\end{equation}
\tag{7.13}
$$
In particular, in view of (7.12) for $T'=0$,
$$
\begin{equation}
\|\mathcal{D}(a^\varepsilon(T^*))-\mu^0 \|_L^* < 2\delta_1.
\end{equation}
\tag{7.14}
$$
(iii) By (7.14) and (7.8) for $\delta:=2\delta_1$ and $T'=nT^*$, $n=1,2,\dots$, we obtain recursively the inequalities
$$
\begin{equation}
\|\mathcal{D}(a^\varepsilon(nT^*))-\mu^0\|_L^* \leqslant 2\delta_1 \quad \forall\, n\in\mathbb{N},
\end{equation}
\tag{7.15}
$$
for $\varepsilon\leqslant\varepsilon_2=\varepsilon_2(2\delta_1)$. (iv) Now by (7.15) and (7.9) for $\delta:=2\delta_1$, for any $n\in\mathbb{N}$ and $0\leqslant \theta\leqslant T^*$ we have
$$
\begin{equation}
\|\mathcal{D}(a^\varepsilon (nT^* +\theta))-\mu^0 \|_L^* \leqslant \delta_1+\sup_{s\geqslant0} g_{M^*} (s, 2\delta_1)
\end{equation}
\tag{7.16}
$$
if $\varepsilon\leqslant\varepsilon_2(2\delta_1)$. (v) Finally, let $\varepsilon\leqslant\varepsilon_\# (\delta_1) =\min\{\varepsilon_1(\delta_1, T^*),\varepsilon_2(2\delta_1)\}$; then by (7.13) (if $\tau\leqslant T^*$) and by (7.12) and (7.16) (if $\tau\geqslant T^*$) we have
$$
\begin{equation*}
\|\mathcal{D}(a^\varepsilon(\tau))-\mathcal{D}(a^0 (\tau;v_0))\|_L^* \leqslant 2\delta_1+\sup_{\theta\geqslant0} g_{M^*}(\theta, 2\delta_1) \quad \forall\, \tau\geqslant0.
\end{equation*}
\notag
$$
By the assumption imposed on the function $g_{M}$ in ${\rm (c')}$, $g_{M}(\tau, d)$ is uniformly continuous in $d$ and vanishes at $d=0$. So there exists $\delta^*=\delta^*(\delta)$, which we may assume to be $\leqslant \delta/4$, such that if $\delta_1 =\delta^*$, then $g_{M^*}(\theta, 2\delta_1) \leqslant \delta/2$ for every $\theta\geqslant0$. Then by the above estimate
$$
\begin{equation*}
\|\mathcal{D}(a^\varepsilon(\tau))-\mathcal{D}(a^0 (\tau;v_0))\|_L^* \leqslant \delta \quad \text{if } \varepsilon \leqslant\varepsilon_*(\delta) :=\varepsilon_\# (\delta^*(\delta))>0,
\end{equation*}
\notag
$$
for every positive $\delta$. $\Box$ Since the interaction representation does not change actions, for the action variables of solutions of the original equations (2.6) we have the following assertion. Corollary 7.6. Under the assumptions of Theorem 7.4 the actions of a solution $v^\varepsilon(\tau; v_0)$ of (2.6) that equals $v_0$ at $\tau=0$ satisfy
$$
\begin{equation*}
\lim_{\varepsilon\to0} \sup_{\tau\geqslant0}\|\mathcal{D}(I(v^\varepsilon(\tau;v_0))) -\mathcal{D}(I(a^{0}(\tau;v_0)))\|_{L,\mathbb{C}^n}^* =0.
\end{equation*}
\notag
$$
In [9], Theorem 2.9, the assertion of this corollary was proved for a class of systems (2.6). The proof in [9] is based on the observation that the mixing rate in the corresponding equation (2.6) is uniform in $\varepsilon$ for $0<\varepsilon\leqslant1$. This is a delicate property, which is more difficult to establish than $(\mathrm{c}')$ in Assumption 7.1. We also note that Theorem 7.4 immediately implies that if equations (2.11) are mixing with stationary measures $\mu^\varepsilon$, then $\mu^\varepsilon \rightharpoonup \mu^0$; cf. Theorem 5.5.
Proof of Proposition 7.2. In this proof we write solutions $a^0(\tau; v)$ of effective equation (4.5) as $a(\tau;v)$. We prove the assertion of Proposition 7.2 in four steps.
(i) At this step, for any non-random $v^1, v^2 \in \overline{B}_M(\mathbb{C}^n)$ we use the notation $a_j(\tau) := a(\tau; v^j)$, $j=1,2$, and examine the distance $\|\mathcal{D}(a_1(\tau))-\mathcal{D}(a_2(\tau))\|_L^*$ as a function of $\tau$ and $|v^1-v^2|$. Set $w(\tau)=a_1(\tau) -a_2(\tau)$ and assume that $|v^1-v^2|\leqslant \bar d $ for some $\bar d\geqslant0$. Since the noise in (4.5) is additive, we may assume that $a_1$ and $a_2$ are driven by the same Wiener process; then the stochastic terms cancel and
$$
\begin{equation*}
\dot w ={ \langle\!\langle P \rangle\!\rangle }(a_1)-{ \langle\!\langle P \rangle\!\rangle }(a_2).
\end{equation*}
\notag
$$
Since by Lemma 3.2 and Assumption 2.1, (a),
$$
\begin{equation*}
|{ \langle\!\langle P \rangle\!\rangle }(a_1(\tau))-{ \langle\!\langle P \rangle\!\rangle }(a_2(\tau))| \leqslant C |w(\tau)|\, X(\tau),
\end{equation*}
\notag
$$
where
$$
\begin{equation*}
X(\tau)=1+|a_1(\tau)|^{ m_0}\vee |a_2(\tau)|^{m_0},
\end{equation*}
\notag
$$
we have $(d/d\tau)|w|^2\leqslant 2CX(\tau)|w|^2$, where $|w(0)|\leqslant \bar d$. So
$$
\begin{equation}
|w(\tau)| \leqslant\bar d \exp\biggl(C\int_0^\tau X(l)\,dl\biggr).
\end{equation}
\tag{7.17}
$$
Set $ Y(T)=\sup_{0\leqslant \tau \leqslant T } |X(\tau)|$. By (5.1) estimate (2.7) holds for $C_{m_0'}(|v_0|, T)=C_{m_0'}(M) (T+1)$. Hence we have
$$
\begin{equation*}
\mathsf{E}Y(T) \leqslant (C_{m'_0}(M)+1) (T+1)
\end{equation*}
\notag
$$
by Remark 4.8, (ii) (since $m_0'>(m_0\vee1)$).
For $K>0$ denote the event $\{Y(T)\geqslant K\}$ by $\Omega_K(T)$. Then, by Chebyshev’s inequality,
$$
\begin{equation*}
\mathsf{P}(\Omega_K(T))\leqslant (C_{m'_0}(M)+1)(T+1)K^{-1},
\end{equation*}
\notag
$$
and
$$
\begin{equation*}
\int_0^\tau X(l)\,dl \leqslant \tau K
\end{equation*}
\notag
$$
for $\omega\notin \Omega_K(T)$. From this and (7.17) we see that if $f$ is such that $|f|\leqslant1$ and $\operatorname{Lip}(f)\leqslant1$, then
$$
\begin{equation}
\begin{aligned} \, & \mathsf{E}\bigl(f(a_1(\tau))-f(a_2(\tau))\bigr) \leqslant 2\mathsf{P}(\Omega_K(\tau))+\bar d e^{C\tau K} \\ &\qquad \leqslant 2(C_{m'_0}(M)+1) (\tau +1) K^{-1}+\bar d e^{C\tau K} \quad \forall\,K>0. \end{aligned}
\end{equation}
\tag{7.18}
$$
Let us denote by $g^1_M(\tau,\bar d)$ the function on the right-hand side for $K=\ln\ln(\bar d^{-1}\vee 3)$. This is a continuous function of $(\tau,\bar d,M)\in\mathbb{R}_+^3$, which vanishes for $\bar d=0$. By (7.2) and (7.18),
$$
\begin{equation}
\begin{aligned} \, & \|\mathcal{D}(a(\tau;v^1))-\mathcal{D}(a(\tau;v^2)) \|_L^* = \| \mathcal{D}(a_1(\tau))-\mathcal{D}(a_2(\tau)) \|_L^* \\ &\qquad\qquad \leqslant\mathfrak{g}_M(\tau) \wedge g_M^1(\tau,\bar d) \wedge 2 =: g_M^2(\tau,\bar d) \quad \text{if $|v^1-v^2|\leqslant \bar d$}. \end{aligned}
\end{equation}
\tag{7.19}
$$
The function $g_M^2$ is continuous in the variables $(\tau,\bar d, M)$, vanishes with $\bar d$, and tends to zero as $\tau \to\infty$ since $\mathfrak g_M(\tau)$ does.
(ii) At this step we consider a solution $a^0(\tau;\mu)=:a(\tau;\mu)$ of effective equation (4.5) for $\mathcal{D}(a(0))=\mu$ as in Assumption 7.1, $(\mathrm{c}')$ and examine the left-hand side of (7.1) as a function of $\tau$. For any $M>0$ consider the conditional probabilities
$$
\begin{equation*}
\mu_M=\mu(\,\cdot\mid \overline{B}_M(\mathbb{C}^n)) \quad\text{and}\quad \overline{\mu}_M =\mu(\,\cdot\mid \mathbb{C}^n\setminus \overline{B}_M(\mathbb{C}^n)).
\end{equation*}
\notag
$$
Then
$$
\begin{equation}
\mathcal{D}(a(\tau;\mu)) =A_M\mathcal{D}(a(\tau;\mu_M) ) +\bar A_M\mathcal{D}(a(\tau;\overline{\mu}_M)),
\end{equation}
\tag{7.20}
$$
where
$$
\begin{equation*}
A_M=\mu(\overline{B}_M(\mathbb{C}^n))\quad \text{and}\quad \bar A_M= \mu(\mathbb{C}^n\setminus \overline{B}_M(\mathbb{C}^n))
\end{equation*}
\notag
$$
(cf. (7.11)). Since $\mathsf{E}|a(0)|^{2m_0'} \leqslant M'$, Chebyshev’s inequality gives $\bar A_M=\mathsf{P}\{|a(0)|>M\}\leqslant M'/M^{2m_0'}$. As equation (4.5) is assumed to be mixing, we have
$$
\begin{equation*}
\|\mathcal{D} (a(\tau; 0)) -\mu^0\|_L^* \leqslant \overline{g}(\tau),
\end{equation*}
\notag
$$
where $\overline{g}\geqslant0$ is a continuous function which tends to $0$ as $\tau\to\infty$. So in view of (7.2),
$$
\begin{equation*}
\|\mathcal{D} (a(\tau; v)) -\mu^0\|_L^* \leqslant \mathfrak{g}_M(\tau)+\overline{g}(\tau) =:\widetilde{g}_M(\tau) \quad \forall\,v\in \overline{B}_M(\mathbb{C}^n).
\end{equation*}
\notag
$$
Thus,
$$
\begin{equation*}
\begin{aligned} \, \|\mathcal{D} (a(\tau;\mu_M)) -\mu^0\|_L^* & =\biggl\| \int[\mathcal{D} (a(\tau; v))]\,\mu_M(dv) -\mu^0 \biggr\|_L^* \\ & \leqslant\int\|\mathcal{D} (a(\tau; v)) -\mu^0\|_L^*\,\mu_M(dv) \leqslant \widetilde{g}_M(\tau). \end{aligned}
\end{equation*}
\notag
$$
Therefore, by (7.20),
$$
\begin{equation*}
\begin{aligned} \, \|\mathcal{D}(a(\tau;\mu))-\mu^0\|_L^* & \leqslant A_M \|\mathcal{D}(a(\tau;\mu_M))-\mu^0\|_L^* +\bar A_M \|\mathcal{D}(a(\tau;\overline{\mu}_M))-\mu^0\|_L^* \\ & \leqslant \|\mathcal{D}(a(\tau;\mu_M))-\mu^0\|_L^*+2\bar A_M \leqslant \widetilde{g}_M(\tau) +2\,\frac{M'}{M^{2m'_0}} \end{aligned}
\end{equation*}
\notag
$$
for any $M>0$ and $\tau\geqslant0$. Let $M_1(\tau)>0$ be a continuous non-decreasing function, growing to infinity with $\tau$, and such that $\widetilde{g}_{M_1(\tau)}(\tau)\to0$ as $\tau\to\infty$ (it exists since $\widetilde{g}_M(\tau)$ is a continuous function of $(M,\tau)$, tending to zero as $\tau\to\infty$ for each fixed $M$). Then
$$
\begin{equation}
\|\mathcal{D}(a(\tau;\mu))-\mu^0\|_L^* \leqslant 2\,\frac{M'}{M_1(\tau)^{2m_0'}} +\widetilde{g}_{M_1(\tau)}(\tau) =:\widehat{g}_{M'}(\tau).
\end{equation}
\tag{7.21}
$$
Clearly, $\widehat{g}_{M'}(\tau)\geqslant0$ is a continuous function on $\mathbb{R}_+^2$, which converges to $0$ as $\tau\to\infty$.
(iii) Now we examine the left-hand side of (7.1) as a function of $\tau$ and $d$. Recall that the Kantorovich distance between two measures $\nu_1$ and $\nu_2$ on $\mathbb{C}^n$ is
$$
\begin{equation*}
\|\nu_1-\nu_2\|_{\mathrm{K}} =\sup_{\operatorname{Lip}(f)\leqslant1} (\langle f, \nu_1\rangle-\langle f,\nu_2\rangle) \leqslant\infty.
\end{equation*}
\notag
$$
Obviously $\|\nu_1-\nu_2\|_L^* \leqslant\|\nu_1 -\nu_2\|_{\mathrm{K}}$. By (7.6) and the assumption on $\mu$, the $2m_0'$-th moments of $\mu$ and $\mu^0$ are bounded by $M'\vee C_{m_0'}$, so that
$$
\begin{equation}
\|\mu-\mu^0\|_{\mathrm{K}}\leqslant\widetilde{C} (M'\vee C_{m_0'}) ^{\gamma_1} d^{\gamma_2} =:D, \quad\text{where } \gamma_1=\frac{1}{2m'_0}, \quad \gamma_2=\frac{2m'_0-1}{2m_0'}
\end{equation}
\tag{7.22}
$$
(see [5], § 11.4, and [26], Chap. 7). Hence by the Kantorovich–Rubinstein theorem (see [26] and [5]) there exist random variables $\xi$ and $\xi_0$, defined on a new probability space $(\Omega',\mathcal{F}',\mathsf{P}')$, such that $\mathcal{D}(\xi)=\mu$, $\mathcal{D}(\xi_0)=\mu^0$, and
$$
\begin{equation}
\mathsf{E}\,|\xi -\xi_0| =\|\mu- \mu^0\|_{\mathrm{K}}.
\end{equation}
\tag{7.23}
$$
Then using (7.19) and denoting by $a_{\mathrm{st}}(\tau)$ a stationary solution of equation (4.5) such that $\mathcal{D}(a_{\mathrm{st}}(\tau))\equiv\mu^0$, we have
$$
\begin{equation*}
\begin{aligned} \, \|\mathcal{D} (a(\tau))-\mu^0\|_L^* & =\|\mathcal{D} (a(\tau; a(0)) )- \mathcal{D} (a_{\mathrm{st}}(\tau))\|_L^* \\ & \leqslant \mathsf{E}^{\omega'} \| \mathcal{D}(a(\tau; \xi^{\omega'}) ) - \mathcal{D} (a(\tau; \xi_0^{\omega'} )) \|_L^* \\ & \leqslant \mathsf{E}^{\omega'} g_{\overline{M}}^2(\tau, |\xi^{\omega'} -\xi_0^{\omega'}|), \qquad \overline{M} =\overline{M}^{\omega'} =|\xi^{\omega'}|\vee |\xi_0^{\omega'}|. \end{aligned}
\end{equation*}
\notag
$$
As $\mathsf{E}^{\omega'}\, {\overline{M}}^{2m'_0} \leqslant 2 (M'\vee C_{m_0'})$ by (7.6) and the assumption on $\mu$, setting $Q'_K=\{\overline{M}\geqslant K\}\subset\Omega'$, for any $K>0$ we have
$$
\begin{equation*}
\mathsf{P}^{\omega'} (Q'_K) \leqslant 2K^{-2m_0'} (M'\vee C_{m_0'}).
\end{equation*}
\notag
$$
Since $g_M^2\leqslant 2$ and for $\omega'\notin Q'_K$ we have $|\xi^{\omega'}|,|\xi_0^{\omega'}|\leqslant K$, it follows that
$$
\begin{equation*}
\| \mathcal{D} (a(\tau) )- \mu^0\|_L^* \leqslant 4K^{-2m_0'} (M'\vee C_{m_0'}) +\mathsf{E} ^{\omega'} g_K^2(\tau, |\xi^{\omega'} -\xi_0^{\omega'}|).
\end{equation*}
\notag
$$
Now let $\Omega'_r =\{ |\xi^{\omega'} -\xi_0^{\omega'}| \geqslant r\}$. Then $\mathsf{P}^{\omega'}\Omega'_r\leqslant D r^{-1}$ by (7.23) and (7.22). Hence
$$
\begin{equation}
\|\mathcal{D}(a(\tau))-\mu^0\|_L^* \leqslant 4K^{-2m'_0} (M'\vee C_{m'_0})+2D r^{-1}+g_K^2(\tau,r) \quad \forall\,\tau\geqslant0, \forall\, K,r>0.
\end{equation}
\tag{7.24}
$$
(iv) The end of the proof. Let $g_0(s)$ be a positive continuous function on $\mathbb{R}_+$ such that $g_0(s)\to\infty$ as $s\to+\infty$ and $|C_{m_0'}(g_0(s))(\ln\ln s)^{-1/2}|\leqslant 2C_{m_0'}(0)$ for $s\geqslant3$. Taking $r=D^{1/2}$ and choosing $K=g_0(r^{-1})$ on the right-hand side of (7.24), we denote this right-hand side by $g_{M'}^3(\tau,r)$ (so that we have substituted $D=r^2$ and $ K=g_0(r^{-1})$ into (7.24)). By (7.24) and the definition of $g_M^2$ (see (7.18) and (7.19)) we have
$$
\begin{equation*}
\begin{aligned} \, g_{M'}^3(\tau,r) & \leqslant 4(g_0(r^{-1}))^{-2m'_0}(M'\vee C_{m'_0})+2r \\ &\qquad +2\bigl(C_{m'_0}(g_0(r^{-1}))+1\bigr)(\tau+1)(\ln\ln(r^{-1}\vee 3))^{-1} +r\exp(C\tau\ln\ln (r^{-1}\vee3)). \end{aligned}
\end{equation*}
\notag
$$
By the choice of $g_0$, as $r\to0$ the first, second, and fourth terms converge to zero (for the fourth term note that $r\exp(C\tau\ln\ln r^{-1})=r(\ln r^{-1})^{C\tau}$ for $r\leqslant1/3$). For $r\leqslant e^{-e}$ we have $\ln\ln r^{-1}\geqslant1$, so the third term is $\leqslant4(C_{m'_0}(0)+1)(\tau+1)(\ln\ln r^{-1})^{-1/2}$, and it also tends to zero with $r$. Hence $g_{M'}^3(\tau,r)$ defines a continuous function on $\mathbb{R}_+^3$ which vanishes with $r$. Using the expression for $D$ in (7.22) we can write $r=D^{1/2}$ as $r=R_{M'}(d)$, where $R$ is a continuous function $\mathbb{R}_+^2 \to \mathbb{R}_+$ which is non-decreasing in $d$ and vanishes with $d$. Setting $g^4_{M'} (\tau, d)=g^3_{M'} (\tau, R_{M'}(d \wedge 2))$, from the above we obtain
$$
\begin{equation*}
\| \mathcal{D}(a(\tau))-\mu^0\|_L^* \leqslant g^4_{M'} (\tau, d ) \quad\text{if $\| \mu -\mu^0\|_L^* \leqslant d\leqslant2$}.
\end{equation*}
\notag
$$
Finally, recalling (7.21) we arrive at (7.1) for $g=g^5$, where
$$
\begin{equation*}
g^5_{M'} (\tau, d) =g^4_{M'} (\tau, d) \wedge \widehat{g}_{M'}(\tau) \wedge 2.
\end{equation*}
\notag
$$
The function $g^5$ is continuous, vanishes with $d$, and converges to zero as $\tau\to\infty$. For any fixed $M'>0$ this convergence is uniform in $d$ due to the term $\widehat{g}_{M'}(\tau)$. So for fixed $M'>0$ the function $(\tau,d)\mapsto g_{M'}^5(\tau,d)$ extends to a continuous function on the compact set $[0,\infty]\times[0,2]$, where it vanishes for $\tau=\infty$. Thus, $g_{M'}^5$ is uniformly continuous in $d$. $\Box$
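The coupling $(\xi,\xi_0)$ in (7.23) is only asserted to exist in $\mathbb{C}^n$, but on the real line an optimal coupling for the Kantorovich distance is explicit: one pairs the quantiles of the two measures. The sketch below (Python with NumPy and SciPy) illustrates this for two empirical measures with the same number of atoms. The samples, their parameters and the helper `quantile_coupling` are hypothetical, and the one-dimensional setting is only an analogy for the $\mathbb{C}^n$ case used in the proof. `scipy.stats.wasserstein_distance` computes the Kantorovich (Wasserstein-1) distance between one-dimensional empirical distributions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def quantile_coupling(xs, ys):
    """Pair the order statistics of two equally sized samples.

    For empirical measures on R with the same number of atoms, sorting both
    samples gives an optimal coupling for the cost |x - y|.
    """
    xs, ys = np.sort(np.asarray(xs)), np.sort(np.asarray(ys))
    if xs.shape != ys.shape:
        raise ValueError("samples must have the same size")
    return xs, ys

rng = np.random.default_rng(0)
mu_sample = rng.normal(loc=0.3, scale=1.0, size=1000)   # hypothetical stand-in for mu
mu0_sample = rng.normal(loc=0.0, scale=1.2, size=1000)  # hypothetical stand-in for mu^0

xi, xi0 = quantile_coupling(mu_sample, mu0_sample)
coupling_cost = np.mean(np.abs(xi - xi0))                # E|xi - xi_0| for this coupling
kantorovich = wasserstein_distance(mu_sample, mu0_sample)
print(coupling_cost, kantorovich)                        # the two values agree up to rounding
```

Pairing the samples in their original (random) order is also a coupling, so its mean cost can only be larger; this mirrors the fact that the Kantorovich distance is the minimum of $\mathsf{E}|\xi-\xi_0|$ over all couplings.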
8. Averaging for systems with general noises
In this section we sketch a proof of Theorem 4.7 for equations (1.1) with the general stochastic term $\sqrt\varepsilon\,\mathcal{B}(v)\,dW$. The proof follows the argument in § 4, but an extra difficulty appears in the case of equations with non-additive degenerate noises. Consider the $v$-equation (2.6) with a general (possibly non-additive) noise and decomplexify it by writing the components $v_k(\tau)$ as $(\widetilde{v}_{2k-1}(\tau), \widetilde{v}_{2k}(\tau))\in\mathbb{R}^2$, $k=1,\dots,n$. Now a solution $v(\tau)$ is a vector in $\mathbb{R}^{2n}$, and the equation reads
$$
\begin{equation}
dv(\tau) +\varepsilon^{-1} Av(\tau) \,d\tau =P(v(\tau))\,d\tau+\mathcal{B}(v(\tau)) \,d\beta(\tau), \qquad v(0)=v_0\in\mathbb{R}^{2n}.
\end{equation}
\tag{8.1}
$$
Here $A$ is a block-diagonal matrix as in § 2, $\mathcal{B}(v)$ is a real $2n\times n_2$ matrix, and $\beta(\tau)=(\beta_1(\tau),\dots,\beta_{n_2}(\tau))$, where $\{\beta_j(\tau)\}$ are independent standard real Wiener processes. Note that in the real coordinates in $\mathbb{R}^{2n} \simeq\mathbb{C}^n$, for $w\in\mathbb{R}^n$ the operator $\Phi_w$ in (2.10) is given by the block-diagonal matrix whose $j$th diagonal block, $j=1,\dots, n$, is the $2\times2$ matrix of rotation through the angle $w_j$. In this section we make the following assumption. Assumption 8.1. The drift $P$ belongs to $\operatorname{Lip}_{m_0}(\mathbb{R}^{2n},\mathbb{R}^{2n})$, the matrix function $\mathcal{B}(v)$ belongs to $\operatorname{Lip}_{m_0} \bigl(\mathbb{R}^{2n},\operatorname{Mat}(2n\times n_2)\bigr)$, equation (8.1) is well posed, and its solutions satisfy (2.7). Going over to the interaction representation $a(\tau)=\Phi_{\tau \varepsilon^{-1} \Lambda} v(\tau)$, we rewrite the equation as
$$
\begin{equation}
da(\tau) =\Phi_{\tau \varepsilon^{-1} \Lambda} P(v(\tau))\,d\tau +\Phi_{\tau \varepsilon^{-1} \Lambda} \mathcal{B}(v(\tau))\,d\beta(\tau), \qquad a(0)=v_0.
\end{equation}
\tag{8.2}
$$
As in § 4, we will see that, as $\varepsilon\to0$, the asymptotic behaviour of the distributions of solutions of this equation is described by an effective equation. As before, the effective drift is $ \langle\!\langle P \rangle\!\rangle (a)$. To calculate the effective dispersion, as in the proof of Lemma 4.6, we consider the martingale
$$
\begin{equation*}
N^{Y,\varepsilon}(\tau) :=a^\varepsilon(\tau)-\int_0^\tau Y(a^\varepsilon(s),s\varepsilon^{-1})\,ds =v_0+\int_0^\tau\mathcal{B}^{\Lambda}(a^\varepsilon(s);s\varepsilon^{-1})\,d\beta(s),
\end{equation*}
\notag
$$
where $Y$ is defined in (4.7) and $\mathcal{B}^{\Lambda}(a;t)=\Phi_{t\Lambda}\mathcal{B}(\Phi_{-t\Lambda}a)$. By Itô’s formula, for $i,j=1,\dots,2n$ the process
$$
\begin{equation*}
N_i^{Y,\varepsilon}(\tau) N_j^{Y,\varepsilon}(\tau) -\int_0^\tau \mathcal{A}_{ij}^\Lambda(a^\varepsilon(s);s\varepsilon^{-1})\,ds, \qquad (\mathcal{A}_{ij}^\Lambda(a;t)) =\mathcal{B}^{\Lambda}(a;t)\mathcal{B}^{\Lambda*}(a;t),
\end{equation*}
\notag
$$
where $\mathcal{B}^{\Lambda*}$ is the transpose of $\mathcal{B}^\Lambda$, is also a martingale. By straightforward analogy with Lemma 3.2, the limit
$$
\begin{equation*}
\mathcal{A}^0(a) :=\lim_{T\to\infty}\frac{1}{T}\int_0^T\mathcal{A}^{\Lambda}(a;t)\,dt
\end{equation*}
\notag
$$
exists and belongs to $\operatorname{Lip}_{2m_0}(\mathbb{R}^{2n},\operatorname{Mat}(2n\times2n))$. Then, by analogy with Lemma 4.3,
$$
\begin{equation*}
\mathsf{E} \biggl| \int_0^{\tau}\mathcal{A}^{\Lambda}(a^\varepsilon(s);s\varepsilon^{-1})\,ds -\int_0^\tau\mathcal{A}^0(a^\varepsilon(s))\,ds \biggr| \to0 \qquad\text{as $\varepsilon\to0$},
\end{equation*}
\notag
$$
for any $\tau\geqslant0$. From this we conclude, as in § 4, that the effective diffusion matrix should now be taken to be $\mathcal{A}^0(a)$, which is a non-negative symmetric matrix. Denoting its principal square root by $\mathcal{B}^0(a)=\mathcal{A}^0(a)^{1/2}$, as in § 4 we verify that any limit measure $Q_0$ as in (2.14) is a solution of the martingale problem for the effective equation
$$
\begin{equation}
da(\tau)-{ \langle\!\langle P \rangle\!\rangle }(a(\tau))\,d\tau =\mathcal{B}^0(a(\tau))\,d\beta(\tau), \qquad a(0)=v_0,
\end{equation}
\tag{8.3}
$$
and so it is a weak solution of this equation. If the noise in (8.1) is additive, then $\mathcal{B}^0$ is a constant matrix, (8.3) has a unique solution, and, with (8.3) regarded as the (modified) effective equation, Theorem 4.7 remains true for solutions of equation (8.2). In particular, the theorem applies to equation (2.5) with general additive random forces (2.4) (but then the effective dispersion matrix is given by a more complicated formula than in § 4). Similarly, if the diffusion in (8.1) is non-degenerate, namely,
$$
\begin{equation}
| \mathcal{B}(v) \mathcal{B}^*(v) \xi|\geqslant \alpha|\xi| \quad \forall v, \ \ \forall\,\xi\in\mathbb{R}^{2n},
\end{equation}
\tag{8.4}
$$
for some $\alpha>0$, then the matrix $\mathcal{B}^{\Lambda}(a;t)$ also satisfies (8.4) for all $a$ and $t$: since $\Phi_{t\Lambda}$ is unitary (orthogonal in $\mathbb{R}^{2n}$), the matrix $\mathcal{A}^{\Lambda}(a;t)$ is unitarily conjugate to $\mathcal{B}\mathcal{B}^*(\Phi_{-t\Lambda}a)$, so that $\langle \mathcal{A}^{\Lambda}(a;t)\xi,\xi\rangle\geqslant \alpha |\xi|^2$. Thus $\mathcal{A}^0(a)\geqslant\alpha\mathbb{I}$, and so $\mathcal{B}^0(a)=\mathcal{A}^0(a)^{1/2}$ is a locally Lipschitz matrix function of $a$ (see, for example, [24], Theorem 5.2.2). Hence (8.3) again has a unique solution, and Theorem 4.7 remains true for equation (8.1) (with the effective equation of the form (8.3)). To treat equations (8.1) with degenerate non-additive noise, we write the matrix $\mathcal{A}^0(a)$ in the form
$$
\begin{equation*}
\mathcal{A}^0(a) =\lim_{T\to\infty}\frac{1}{T} \int_0^T \bigl(\Phi_{t\Lambda}\mathcal{B}(\Phi_{-t\Lambda}a)\bigr) \cdot \bigl(\Phi_{t\Lambda}\mathcal{B}(\Phi_{-t\Lambda}a)\bigr)^* \,dt.
\end{equation*}
\notag
$$
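For illustration only, the averaging in the last formula can be approximated numerically. The following Python sketch assumes the simplest case $n=1$ (so $\mathbb{R}^{2n}=\mathbb{R}^2$), takes $\Phi_{t\Lambda}$ to be the rotation by the angle $\lambda t$ for some $\lambda\ne0$, and takes $\mathcal{B}$ to be a constant $2\times2$ matrix (additive noise); the helper names `rot`, `matmul`, and `averaged_diffusion` are hypothetical. Since the integrand is then $2\pi/|\lambda|$-periodic, the limit $T\to\infty$ coincides with the average over one period.

```python
import math

def rot(theta):
    """2x2 rotation matrix: the real form of multiplication by exp(i*theta) on C = R^2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]

def averaged_diffusion(B, lam, steps=2000):
    """Midpoint-rule approximation of A^0 = average of R(lam t) B B^T R(lam t)^T
    over one period 2*pi/|lam| (assumption: constant B, one frequency lam)."""
    period = 2 * math.pi / abs(lam)
    S = matmul(B, transpose(B))
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        R = rot(lam * (k + 0.5) * period / steps)
        term = matmul(matmul(R, S), transpose(R))
        for i in range(2):
            for j in range(2):
                acc[i][j] += term[i][j] / steps
    return acc

B = [[1.0, 0.5], [0.0, 0.2]]
A0 = averaged_diffusion(B, lam=3.0)
```

In this one-frequency toy case the averaged diffusion is isotropic, $\mathcal{A}^0=\frac12\operatorname{Tr}(\mathcal{B}\mathcal{B}^*)\,\mathbb{I}$, because the traceless part of $\mathcal{B}\mathcal{B}^*$ rotates with the angle $2\lambda t$ and averages to zero.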
For the same reason as in Proposition 3.4,
$$
\begin{equation*}
|\mathcal{A}^0|_{C^2(B_R)} \leqslant C|\mathcal{B}|_{C^2(B_R)}^2 \quad \forall R>0.
\end{equation*}
\notag
$$
Now using Theorem 5.2.3 from [24] we get that
$$
\begin{equation}
\operatorname{Lip}\bigl(\mathcal{B}^0(a)|_{\overline{B}_R}\bigr) \leqslant C|\mathcal{A}^0|_{C^2(B_{R+1})}^{1/2} \leqslant C_1|\mathcal{B}|_{C^2(B_{R+1})} \quad \forall R>0.
\end{equation}
\tag{8.5}
$$
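The effective dispersion $\mathcal{B}^0(a)=\mathcal{A}^0(a)^{1/2}$ is a principal matrix square root. As a hedged illustration (not the construction used in [24]), for symmetric non-negative $2\times2$ matrices the principal root has a closed form obtained from the Cayley–Hamilton theorem; the function name below is hypothetical.

```python
import math

def sqrtm_2x2_psd(M):
    """Principal square root of a symmetric non-negative 2x2 matrix M.

    By Cayley-Hamilton, M^2 = tr(M) M - det(M) I. With s = sqrt(det M) and
    t = sqrt(tr M + 2 s) this gives (M + s I)^2 = t^2 M, so sqrt(M) = (M + s I) / t.
    """
    a, b, c = M[0][0], M[0][1], M[1][1]
    s = math.sqrt(max(a * c - b * b, 0.0))  # clip tiny negative rounding errors
    t = math.sqrt(a + c + 2.0 * s)
    if t == 0.0:  # M is the zero matrix
        return [[0.0, 0.0], [0.0, 0.0]]
    return [[(a + s) / t, b / t], [b / t, (c + s) / t]]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A0 = [[2.0, 1.0], [1.0, 2.0]]
B0 = sqrtm_2x2_psd(A0)
```

For the $C^2$ degenerate family $\operatorname{diag}(x^2,1)$ this returns $\operatorname{diag}(|x|,1)$, which is Lipschitz, in line with (8.5); for the merely Lipschitz family $\operatorname{diag}(|x|,1)$ the root $\operatorname{diag}(|x|^{1/2},1)$ is not Lipschitz at $x=0$, illustrating why some smoothness is needed in the degenerate case.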
So the matrix function $\mathcal{B}^0(a)$ is locally Lipschitz continuous, (8.3) has a unique solution, and the assertion of Theorem 4.7 remains true for equation (8.1). We have obtained the following result.

Theorem 8.2. Suppose that Assumption 8.1 holds and that the matrix function $\mathcal{B}(v)$ in (8.1) satisfies one of the following three conditions: (a) it is $v$-independent; (b) it satisfies the non-degeneracy condition (8.4); (c) it is a $C^2$-smooth matrix function of $v$. Then for any $v_0\in \mathbb{R}^{2n}$ the solution $a^\varepsilon(\tau;v_0)$ of equation (8.2) satisfies
$$
\begin{equation*}
\mathcal{D}(a^\varepsilon(\,{\cdot}\,;v_0))\rightharpoonup Q_0 \quad\textit{in $\mathcal{P}(C([0,T];\mathbb{C}^n))$} \quad\textit{as $\varepsilon\to0$},
\end{equation*}
\notag
$$
where $Q_0$ is the law of the unique weak solution of effective equation (8.3). An obvious analogue of Corollary 4.13 holds for solutions of (8.1).
9. A sufficient condition for Assumptions 2.1, 5.1, and 7.1

In this section we derive a condition which implies Assumptions 2.1, 5.1, and 7.1. Thus, when it is met, all theorems in §§ 4, 5, and 7 apply to equation (2.6). Consider a stochastic differential equation on $\mathbb{R}^l$:
$$
\begin{equation}
dx =b(x)\,d\tau+\sigma(x)\,d\beta(\tau), \qquad x\in\mathbb{R}^l, \quad\tau\geqslant0,
\end{equation}
\tag{9.1}
$$
where $\sigma(x)$ is an $l\times k$ matrix and $\beta(\tau)$ is a standard Wiener process in $\mathbb{R}^k$. We make the following assumption.

Assumption 9.1. The drift $b(x)$ and dispersion $\sigma(x)$ are locally Lipschitz in $x$, and $\mathcal{C}^m(b),\mathcal{C}^m(\sigma)\leqslant C<\infty$ for some $m\geqslant0$.

The diffusion $a(x)=\sigma(x)\sigma^\top(x)$ is a non-negative symmetric $l\times l$ matrix. Consider the differential operator
$$
\begin{equation*}
\mathscr{L}(v(x)) =\sum_{j=1}^lb_j(x)\frac{\partial v}{\partial x_j} +\frac{1}{2} \sum_{i=1}^l \sum_{j=1}^la_{ij}(x) \frac{\partial^2 v}{\partial x_i\,\partial x_j}.
\end{equation*}
\notag
$$
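As a quick sanity check of the operator $\mathscr{L}$, here is a hypothetical finite-difference sketch in Python (not part of the paper). It assumes $l=2$, $b(x)=-x$, $a\equiv\mathbb{I}$, and $V(x)=1+|x|^2$, for which $\mathscr{L}V=-2|x|^2+2\leqslant 2V$.

```python
def generator(b, a, v, x, h=1e-4):
    """Finite-difference approximation of
        (L v)(x) = sum_j b_j(x) dv/dx_j + 1/2 sum_{i,j} a_ij(x) d^2 v/dx_i dx_j
    with central differences of step h (b returns a list, a a nested list)."""
    l = len(x)

    def at(*shifts):
        # value of v at x shifted by the given (index, increment) pairs
        y = list(x)
        for idx, inc in shifts:
            y[idx] += inc
        return v(y)

    bx, ax = b(x), a(x)
    first = sum(bx[j] * (at((j, h)) - at((j, -h))) / (2 * h) for j in range(l))
    second = 0.0
    for i in range(l):
        for j in range(l):
            d2 = (at((i, h), (j, h)) - at((i, h), (j, -h))
                  - at((i, -h), (j, h)) + at((i, -h), (j, -h))) / (4 * h * h)
            second += ax[i][j] * d2
    return first + 0.5 * second

# Hypothetical example: b(x) = -x, a(x) = I, V(x) = 1 + |x|^2, so L V = -2|x|^2 + 2.
b = lambda x: [-t for t in x]
a = lambda x: [[1.0, 0.0], [0.0, 1.0]]
V = lambda x: 1.0 + sum(t * t for t in x)
LV = generator(b, a, V, [1.0, 2.0])
```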
We have the following result from [17], Theorem 3.5, concerning the well-posedness of equation (9.1).

Theorem 9.2. Let Assumption 9.1 hold, and suppose that there exists a non-negative function $V(x)\in C^2(\mathbb{R}^l)$ such that for some positive constant $c$
$$
\begin{equation*}
\mathscr{L}(V(x))\leqslant cV(x) \quad \forall\,\tau\geqslant0, \ \ \forall\,x\in\mathbb{R}^l,
\end{equation*}
\notag
$$
and
$$
\begin{equation*}
\inf_{|x|>R}V(x)\to\infty \quad\textit{as $R\to\infty$}.
\end{equation*}
\notag
$$
Then for any $x_0 \in\mathbb{R}^l$ equation (9.1) has a unique strong solution $X(\tau)$ with initial condition $X(0)=x_0$. Furthermore, the process $X(\tau)$ satisfies
$$
\begin{equation*}
\mathsf{E}V(X(\tau))\leqslant e^{c\tau} V(x_0) \quad \forall\,\tau\geqslant0.
\end{equation*}
\notag
$$
The function $V$ is called a Lyapunov function for equation (9.1). In terms of it, a sufficient condition for mixing in (9.1) is given by the following statement.

Proposition 9.3. Assume that, in addition to Assumption 9.1, (1) the drift $b$ satisfies
$$
\begin{equation}
\langle b(x), x\rangle\leqslant-{\alpha_1}|x|+{\alpha_2} \quad \forall\,x\in\mathbb{R}^l,
\end{equation}
\tag{9.2}
$$
for some constants ${\alpha_1}>0$ and ${\alpha_2}\geqslant0$, where $\langle\cdot,\cdot\rangle$ is the standard inner product in $\mathbb{R}^l$; (2) the diffusion matrix $a(x)=\sigma(x)\sigma^\top(x)$ is uniformly non-degenerate, that is,
$$
\begin{equation}
\gamma_2\mathbb{I} \leqslant a(x)\leqslant \gamma_1 \mathbb{I} \quad \forall\,x\in\mathbb{R}^l,
\end{equation}
\tag{9.3}
$$
for some $\gamma_1\geqslant\gamma_2>0$. Then for any $c'>0$ equation (9.1) has a smooth Lyapunov function $V(x)$ which is equal to $e^{c'|x|}$ for $|x|\geqslant1$, estimate (5.1) holds true for its solutions for every $m\in\mathbb{N}$, and the equation is mixing.

In Appendix A we show how this proposition can be derived from the abstract results in [17]. Moreover, it can be proved that under the assumptions of the proposition the equation is exponentially mixing and (7.3) holds (see Example 7.3).

Let us decomplexify $\mathbb{C}^n$ to obtain $\mathbb{R}^{2n}$ and identify equation (2.6) with a real equation (9.1), where $l=2n$ (and $x=v$). Then
$$
\begin{equation*}
b(v) \cong(b_j(v)=-i\varepsilon^{-1}\lambda_jv_j+P_j(v),\:j=1,\dots,n),
\end{equation*}
\notag
$$
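where $b_j\in\mathbb{C}\cong \mathbb{R}^2 \subset \mathbb{R}^{2n}$. Since in complex terms the real inner product has the form $\langle v,w\rangle=\operatorname{Re}\sum v_j\bar w_j$, and since the $\lambda_j$ are real, so that $\operatorname{Re}\bigl(-i\varepsilon^{-1}\lambda_jv_j\bar v_j\bigr)=\operatorname{Re}\bigl(-i\varepsilon^{-1}\lambda_j|v_j|^2\bigr)=0$ for each $j$, we have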
$$
\begin{equation*}
\langle b(v),v\rangle =\langle P(v),v\rangle.
\end{equation*}
\notag
$$
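The cancellation of the fast rotation in the identity above can be checked numerically. The Python sketch below assumes arbitrary real frequencies $\lambda_j$ and a small $\varepsilon$; the helper names `real_inner` and `rotation_part` are hypothetical.

```python
import random

def real_inner(v, w):
    """Real inner product on C^n identified with R^{2n}: <v, w> = Re sum_j v_j conj(w_j)."""
    return sum((vj * wj.conjugate()).real for vj, wj in zip(v, w))

def rotation_part(v, lambdas, eps):
    """Linear part (-i eps^{-1} lambda_j v_j)_j of the drift b(v), with real lambda_j."""
    return [-1j * lam / eps * vj for lam, vj in zip(lambdas, v)]

rng = random.Random(3)
v = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(4)]
lambdas = [1.0, 2.5, -0.7, 3.0]
eps = 0.01
inner = real_inner(rotation_part(v, lambdas, eps), v)
```

So the fast rotation does not contribute to $\langle b(v),v\rangle$ however small $\varepsilon$ is; for non-real $\lambda_j$ the real part $\operatorname{Im}(\lambda_j)\varepsilon^{-1}|v_j|^2$ would not vanish.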
So for equation (2.6) condition (9.2) is equivalent to
$$
\begin{equation}
\langle P(v),v\rangle \leqslant-\alpha_1|v|+\alpha_2 \quad \forall\,v\in\mathbb{C}^n
\end{equation}
\tag{9.4}
$$
for some positive constant $\alpha_1 $ and a non-negative constant $\alpha_2$. Now consider effective equation (4.5). Since in (2.11) the drift is
$$
\begin{equation*}
Y(a,\tau\varepsilon^{-1}) = (\Phi_{\tau\varepsilon^{-1}\Lambda})_*P(a),
\end{equation*}
\notag
$$
under the assumption (9.4) we have
$$
\begin{equation*}
\langle Y(a,\tau\varepsilon^{-1}),a\rangle = \langle P(\Phi_{-\tau\varepsilon^{-1}\Lambda}a), \Phi_{-\tau\varepsilon^{-1}\Lambda}a\rangle \leqslant -\alpha_1 |\Phi_{-\tau\varepsilon^{-1}\Lambda}a|+\alpha_2 =-\alpha_1|a|+\alpha_2
\end{equation*}
\notag
$$
for all $\varepsilon$. Therefore, $ \langle\!\langle P \rangle\!\rangle $ satisfies
$$
\begin{equation*}
\langle { \langle\!\langle P \rangle\!\rangle } (a),a\rangle =\lim_{T\to\infty}\frac{1}{T}\int_0^T \langle Y(a,\tau\varepsilon^{-1}),a\rangle\,d\tau \leqslant-\alpha_1|a|+\alpha_2.
\end{equation*}
\notag
$$
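A hedged numerical illustration of this averaging, assuming $n=1$, that $\Phi_{t\lambda}$ is multiplication by $e^{i\lambda t}$ so that $Y(a,t)=e^{i\lambda t}P(e^{-i\lambda t}a)$, and a hypothetical drift $P(v)=-v/\sqrt{1+|v|^2}+\frac12$. This $P$ satisfies (9.4) with $\alpha_1=\frac12$, $\alpha_2=1$, because $|v|-|v|^2/\sqrt{1+|v|^2}\leqslant1$. For this periodic integrand the limit $T\to\infty$ equals the average over one period, and the constant term averages out.

```python
import cmath
import math

def P(v):
    """Hypothetical drift on C (n = 1): P(v) = -v / sqrt(1 + |v|^2) + 1/2."""
    return -v / math.sqrt(1.0 + abs(v) ** 2) + 0.5

def averaged_drift(P, a, steps=4096):
    """Average of Y(a, t) = e^{i theta} P(e^{-i theta} a) over theta in [0, 2 pi),
    i.e. over one period of the rotation; midpoint rule in theta."""
    total = 0j
    for k in range(steps):
        rot = cmath.exp(2j * math.pi * (k + 0.5) / steps)
        total += rot * P(a / rot)
    return total / steps

a = 2.0 + 1.0j
avg = averaged_drift(P, a)
expected = -a / math.sqrt(1.0 + abs(a) ** 2)   # rotation-invariant part survives
```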
We see that assumption (9.4) implies that condition (9.2) also holds for the effective equation. As we have pointed out, if the dispersion matrix $\Psi$ is non-singular, then the dispersion $B$ in the effective equation is also non-degenerate. The corresponding diffusion matrix is then non-singular too, and condition (9.3) holds for it. Thus we have obtained the following statement.

Proposition 9.4. If the dispersion matrix $\Psi$ in (2.6) is non-singular, the drift satisfies $P\in\operatorname{Lip}_{m_0}$ for some $m_0\in\mathbb{N}$, and (9.4) holds for some constants ${\alpha_1}>0$ and ${\alpha_2}\geqslant0$, then the assumptions of Theorem 5.5 hold, as do the assumptions of Theorems 4.7 and 7.4.
Appendix A. Proof of Proposition 9.3

By condition (9.3) the diffusion $a$ is uniformly bounded. So there exist positive constants $k_1$ and $k_2$ such that
$$
\begin{equation}
\operatorname{Tr}(a(x))\leqslant k_1\quad\text{and} \quad \|a(x)\|\leqslant k_2 \quad \forall\,x\in\mathbb{R}^l.
\end{equation}
\tag{A.1}
$$
Set $V(x)=e^{c'f(x)}$, where $c'$ is a positive constant and $f(x)$ is a non-negative smooth function which is equal to $|x|$ for $|x|\geqslant1$ and such that its first and second derivatives are bounded by $3$. Then
$$
\begin{equation*}
\frac{\partial V(x)}{\partial x_j}={c'} V(x)\partial_{x_j}f(x)
\end{equation*}
\notag
$$
and
$$
\begin{equation*}
\frac{\partial^2V(x)}{\partial x_i\,\partial x_j} =c'V(x)\,\partial_{x_ix_j}f(x)+{c'}^2V(x)\,\partial_{x_i}f(x)\,\partial_{x_j}f(x).
\end{equation*}
\notag
$$
Therefore, we have
$$
\begin{equation*}
\mathscr{L}(V(x)) =c'V(x) \mathcal{K}(c',x),
\end{equation*}
\notag
$$
where
$$
\begin{equation*}
\mathcal{K}(c',x) =\sum_{j=1}^lb_j(x)\,\partial_{x_j}f(x) +\frac{1}{2}\sum_{i,j}\!a_{ij}(x)\,\partial_{x_ix_j}f(x) +\frac{1}{2}c'\sum_{i,j}\!a_{ij}(x)\,\partial_{x_i}f(x)\,\partial_{x_j}f(x).
\end{equation*}
\notag
$$
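From (9.2) and (A.1), since $\nabla f(x)=x/|x|$ for $|x|\geqslant1$ and the first and second derivatives of $f$ are bounded, it follows that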
$$
\begin{equation}
\begin{cases} |\mathcal{K}(c',x)|\leqslant (c'+1)C & \text{if $|x|<1$}, \\\displaystyle \mathcal{K}(c',x) \leqslant-\alpha_1+\frac{\alpha_2}{|x|}+\frac{C}{|x|}+c'C & \text{if $|x|\geqslant1$}, \end{cases}
\end{equation}
\tag{A.2}
$$
where $C>0$ is a constant depending on $k_1$, $k_2$, and $\sup_{|x|\leqslant1}|b(x)|$. Then we obtain the inequality
$$
\begin{equation*}
\mathscr{L}(V(x)) \leqslant cV(x) \quad \forall\,x\in\mathbb{R}^l,
\end{equation*}
\notag
$$
where $c=c'(\alpha_2+(c'+1)C)$. Clearly, $\inf_{|x|>R}V(x)\to\infty$ as $R\to\infty$. So $V(x)$ is a Lyapunov function for equation (9.1). Then by Theorem 9.2, for any $x_0\in \mathbb{R}^l$ this equation has a unique solution $X(\tau)=X(\tau;x_0)$ equal to $x_0$ at $\tau=0$, which satisfies
$$
\begin{equation*}
\mathsf{E}e^{c'f(X(\tau))} \leqslant e^{c\tau} e^{c'f(x_0)} \quad\forall\,\tau\geqslant0.
\end{equation*}
\notag
$$
Let us apply Itô’s formula to the process $F(X(\tau))=e^{\eta'f(X(\tau))}$, where $0<\eta'\leqslant c'/2$ is a constant to be determined below. Then
$$
\begin{equation*}
\begin{aligned} \, dF(X) & =\mathscr{L}(F(X))\,d\tau +\eta' F(X)\langle\nabla f(X),\sigma^\top(X)\,dW\rangle\\ & =\eta'F(X)\mathcal{K}(\eta',X)\,d\tau +\eta'F(X)\langle\nabla f(X),\sigma^\top(X)\,dW\rangle. \end{aligned}
\end{equation*}
\notag
$$
By (A.2), choosing $\eta'=\min\{\alpha_1/(4C),c'/2\}$ we have
$$
\begin{equation*}
F(X)\mathcal{K}(\eta',X) \leqslant -\frac{\alpha_1}{2}F(X)+C_0(\alpha_1,\eta',k_1,k_2)
\end{equation*}
\notag
$$
uniformly in $X$. Then
$$
\begin{equation}
dF(X) \leqslant\biggl(-\frac{\alpha_1}{2}\eta'F(X)+C_0\biggr)\,d\tau +\eta'F(X)\langle\nabla f(X),\sigma^\top(X)\,dW\rangle,
\end{equation}
\tag{A.3}
$$
where the positive constant $C_0$ depends on $k_1$, $k_2$, $\alpha_1$, $\eta'$, and $\alpha_2$. Taking expectation and applying Gronwall’s lemma we obtain
$$
\begin{equation}
\mathsf{E}e^{\eta'f(X(\tau))} \leqslant e^{-\alpha_1\eta'\tau/2} e^{\eta'f(x_0)}+C_1, \qquad \tau\geqslant0,
\end{equation}
\tag{A.4}
$$
where $C_1>0$ depends on the same parameters as $C_0$. Now fix some $T\geqslant0$ and for $\tau\in[T,T+1]$ consider relation (A.3), where $F(X)$ is replaced by $\widetilde{F}(X)=e^{\widetilde{\eta} f(X)}$ for $0<\widetilde{\eta} \leqslant\eta'/2$, and integrate it from $T$ to $\tau$:
$$
\begin{equation}
\begin{aligned} \, \widetilde{F}(X(\tau)) & \leqslant \widetilde{F}(X(T))+C_0 +\widetilde{\eta} \int_T^\tau \widetilde{F}(X) \langle \nabla f(X),\sigma^\top(X)\,dW\rangle \notag \\ & =:\widetilde{F}(X(T))+C_0+\mathcal{M}(\tau). \end{aligned}
\end{equation}
\tag{A.5}
$$
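One way to carry out the Gronwall step leading from (A.3) to (A.4) is sketched below. It uses the notation $\kappa=\alpha_1\eta'/2$ and assumes that the stochastic integral in (A.3) has zero mean (which can be ensured by localization). Since $e^{\kappa\tau}$ is deterministic and of bounded variation, the product rule gives
$$
\begin{equation*}
d\bigl(e^{\kappa\tau}F(X)\bigr) =e^{\kappa\tau}\bigl(\kappa F(X)\,d\tau+dF(X)\bigr) \leqslant C_0e^{\kappa\tau}\,d\tau+(\text{a stochastic integral}), \qquad \kappa=\frac{\alpha_1\eta'}{2}.
\end{equation*}
\notag
$$
Integrating from $0$ to $\tau$ and taking expectations, we get
$$
\begin{equation*}
e^{\kappa\tau}\mathsf{E}F(X(\tau)) \leqslant F(x_0)+\frac{C_0}{\kappa}\bigl(e^{\kappa\tau}-1\bigr), \quad\text{so that}\quad \mathsf{E}e^{\eta'f(X(\tau))} \leqslant e^{-\kappa\tau}e^{\eta'f(x_0)}+\frac{C_0}{\kappa},
\end{equation*}
\notag
$$
which is (A.4) with $C_1=C_0/\kappa=2C_0/(\alpha_1\eta')$.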
In view of (A.4), $\mathcal{M}(\tau)$ is a continuous square-integrable martingale. Therefore, by Doob’s inequality
$$
\begin{equation*}
\begin{aligned} \, \mathsf{E}\sup_{T\leqslant \tau\leqslant T+1} |\mathcal{M}(\tau)|^2 & \leqslant 4 \mathsf{E} |\mathcal{M}(T+1)|^2 \leqslant C \int_T^{T+1} \mathsf{E}\widetilde{F}^2 (X(s))\,ds\\ & \leqslant C \int_T^{T+1} \mathsf{E}F(X(s))\,ds \leqslant C', \end{aligned}
\end{equation*}
\notag
$$
where $C'$ depends on $k_1$, $k_2$, $\alpha_1$, $\eta'$, $\alpha_2$, and $|x_0|$. From this and inequalities (A.5) and (A.4) it follows that
$$
\begin{equation*}
\mathsf{E} \sup_{T\leqslant \tau\leqslant T+1} e^{\widetilde{\eta} f(X(\tau))} \leqslant C'',
\end{equation*}
\notag
$$
where $C''$ depends on the same parameters as $C'$. This bound implies that the solutions $X(\tau)$ satisfy estimate (5.1) in Assumption 5.1 for every $m\geqslant0$.

To prove the proposition it remains to show that, under the assumptions imposed, equation (9.1) is mixing. By [17], Theorem 4.3, it suffices to verify that there exists an absorbing ball $B_R=\{|x|\leqslant R\}$ such that for any compact set $K\subset\mathbb{R}^l\setminus B_{R}$,
$$
\begin{equation}
\sup_{x_0\in K}\mathsf{E}\tau(x_0)<\infty,
\end{equation}
\tag{A.6}
$$
where $\tau(x_0)$ is the hitting time of $B_R$ by the trajectory $X(\tau;x_0)$. To this end, let $x_0\in K\subset \mathbb{R}^l\setminus B_{R}$ for some $R>0$ to be determined later, and set $\tau_M:=\min\{\tau(x_0), M\}$, $M>0$. Applying Itô’s formula to the process $F(\tau,X(\tau))=e^{\eta'\alpha_1\tau/4}|X(\tau)|^2$ and using (A.4), we find that
$$
\begin{equation*}
dF(\tau, X(\tau)) =\biggl(\frac{\eta'\alpha_1}{4}F(\tau,X(\tau)) +\mathscr{L}(F(\tau,X(\tau)))\biggr)\,d\tau +d\mathcal{M}(\tau),
\end{equation*}
\notag
$$
where $\mathcal{M}(\tau)$ is the corresponding stochastic integral. By (A.1), (A.4), and (9.2) we have
$$
\begin{equation*}
\begin{aligned} \, \mathsf{E}e^{\eta'\alpha_1\tau_M/4}|X(\tau_M)|^2 +\mathsf{E}\int_0^{\tau_M}e^{\eta'\alpha_1 s/4}&(2\alpha_1|X(s)|-C_3)\,ds \\ &\leqslant |x_0|^2+2e^{\eta'f(x_0)} =:\gamma(x_0), \end{aligned}
\end{equation*}
\notag
$$
where $C_3>0$ depends on $\alpha_1$, $\alpha_2$, $k_1$, and $k_2$. Since $|X(s)|\geqslant R$ for $0\leqslant s\leqslant \tau_M$, we get that
$$
\begin{equation*}
\mathsf{E}\biggl(C_3 \int_0^{\tau_M}e^{\eta'\alpha_1 s/4}\,ds\biggr) \leqslant \gamma(x_0)
\end{equation*}
\notag
$$
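for $R\geqslant {C_3/\alpha_1}$. Since $e^{\eta'\alpha_1 s/4}\geqslant1$, this gives $\mathsf{E}\tau_M \leqslant\gamma(x_0)/C_3$. Letting $M\to\infty$ and using monotone convergence, we obtain $\mathsf{E}\tau(x_0)\leqslant\gamma(x_0)/C_3$; since $\gamma$ is continuous and hence bounded on the compact set $K$, this verifies (A.6) for $R\geqslant C_3/\alpha_1$. $\Box$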
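The role of the absorbing ball in (A.6) can be illustrated by a small Monte Carlo experiment. The Python sketch below is a toy under stated assumptions, not part of the proof: it takes $l=1$, $\sigma\equiv1$ (so (9.3) holds with $\gamma_1=\gamma_2=1$), and the hypothetical drift $b(x)=-x/\sqrt{1+x^2}$, for which $\langle b(x),x\rangle=-x^2/\sqrt{1+x^2}\leqslant-|x|+1$, i.e. (9.2) with $\alpha_1=\alpha_2=1$. It simulates the equation by the Euler–Maruyama scheme and estimates the mean hitting time of $B_R$ with $R=2$ from $x_0=6$; this radius is illustrative and is not the value $C_3/\alpha_1$ from the proof.

```python
import math
import random

def drift(x):
    """Hypothetical drift b(x) = -x / sqrt(1 + x^2); here x b(x) <= -|x| + 1."""
    return -x / math.sqrt(1.0 + x * x)

def hitting_time(x0, R, rng, sigma=1.0, dt=1e-2, t_max=100.0):
    """Euler-Maruyama path of dx = b(x) dt + sigma dbeta started at x0, stopped at the
    first grid time with |x| <= R; returns that time (or a time >= t_max if B_R is not hit)."""
    x, t = x0, 0.0
    sqrt_dt = math.sqrt(dt)
    while abs(x) > R and t < t_max:
        x += drift(x) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
    return t

rng = random.Random(0)
times = [hitting_time(6.0, 2.0, rng) for _ in range(500)]
mean_time = sum(times) / len(times)
```

Since the drift is close to $-1$ outside $B_2$ and the distance to the ball is $4$, one expects a mean of roughly 4 to 5; discrete monitoring of $|x|\leqslant R$ slightly overestimates the hitting time.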
Appendix B. Representation of martingales

Let $\{M_k(t),\,t\in[0,T]\}$, $k=1,\dots,d$, be continuous square-integrable martingales on a filtered probability space $(\Omega,\mathcal{F},\mathsf{P},\{\mathcal{F}_t\})$. We recall that their brackets (cross-variation processes) $\langle M_k,M_j\rangle(t)$, $1\leqslant k,j\leqslant d$, form an $\{\mathcal{F}_t\}$-adapted continuous matrix-valued process of bounded variation, vanishing at $t=0$ almost surely and such that for all $k$, $j$ the process $M_k(t) M_j(t)-\langle M_k,M_j\rangle(t)$ is an $\{\mathcal{F}_t\}$-martingale; see [14], Definition 1.5.5 and Theorem 1.5.13.

Theorem B.1 ([14], Theorem 3.4.2). Let $(M_k(t), 1\leqslant k\leqslant d)$ be a vector of martingales as above. Then there exists an extension $(\widetilde{\Omega},\widetilde{\mathcal{F}},\widetilde{\mathsf{P}},\{\widetilde{\mathcal{F}}_t\})$ of the probability space on which independent standard Wiener processes $W_1(t),\dots,W_d(t)$ are defined, and there exists a measurable adapted matrix $X=(X_{k j}(t))_{k,j=1,\dots,d}$, $t\in[0,T]$, such that $\displaystyle\mathsf{E}\int_0^T\|X(s)\|^2\,ds<\infty$ and the following representations hold $\widetilde{\mathsf{P}}$-almost surely:
$$
\begin{equation*}
M_k(t)-M_k(0) =\sum_{j=1}^d\int_0^tX_{kj}(s)\,dW_j(s), \qquad 1\leqslant k\leqslant d, \quad t\in[0,T],
\end{equation*}
\notag
$$
and
$$
\begin{equation*}
\langle M_k,M_j\rangle(t) =\sum_{l=1}^d\int_0^tX_{kl}(s)X_{jl}(s)\,ds, \qquad 1\leqslant k,j\leqslant d, \quad t\in[0,T].
\end{equation*}
\notag
$$
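In the simplest situation of a constant matrix $X$, the bracket formula of Theorem B.1 can be checked by simulation. The Python sketch below is only an illustration under that assumption (and with $d=2$, $t=1$); the function name is hypothetical. It uses the fact that $\mathsf{E}[M_k(t)M_j(t)]=\mathsf{E}\langle M_k,M_j\rangle(t)$ when $M(0)=0$.

```python
import random

def bracket_monte_carlo(X, t=1.0, samples=20000, seed=1):
    """For a constant matrix X, M_k(t) = sum_j X_kj W_j(t) with independent W_j(t) ~ N(0, t).
    Returns a Monte Carlo estimate of E[M_k(t) M_j(t)], which equals E<M_k, M_j>(t)
    because M_k M_j - <M_k, M_j> is a martingale vanishing at t = 0."""
    rng = random.Random(seed)
    d = len(X)
    acc = [[0.0] * d for _ in range(d)]
    for _ in range(samples):
        w = [rng.gauss(0.0, t ** 0.5) for _ in range(d)]
        m = [sum(X[k][j] * w[j] for j in range(d)) for k in range(d)]
        for k in range(d):
            for j in range(d):
                acc[k][j] += m[k] * m[j]
    return [[acc[k][j] / samples for j in range(d)] for k in range(d)]

X = [[1.0, 0.0], [0.5, 0.8]]
estimate = bracket_monte_carlo(X)
# Bracket predicted by Theorem B.1 for constant X: sum_l X_kl X_jl t, with t = 1.
exact = [[sum(X[k][l] * X[j][l] for l in range(2)) for j in range(2)] for k in range(2)]
```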
Now let $(N_1(t),\dots, N_d(t))\in\mathbb{C}^d$ be a vector of complex continuous square-integrable martingales. Then
$$
\begin{equation*}
N_j(t)=N^+_j(t)+i N^-_j(t),
\end{equation*}
\notag
$$
where $\bigl(N^+_1(t),N^-_1(t),\dots, N^+_d(t),N^-_d(t) \bigr)\in\mathbb{R}^{2d}$ is a vector of real continuous martingales. The brackets $\langle N_i, N_j\rangle$ and $\langle N_i, \overline{N}_j\rangle$ are defined by linearity. For example,
$$
\begin{equation*}
\langle N_i, N_j\rangle = \langle N_i^+, N_j^+\rangle - \langle N_i^-, N_j^-\rangle +i \langle N_i^+, N_j^-\rangle +i \langle N_i^-, N_j^+\rangle.
\end{equation*}
\notag
$$
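The bilinear definition of the complex brackets can be written as a small helper. The Python sketch below is hypothetical; as an example it assumes the normalization $\beta^c=\beta^++i\beta^-$ with independent standard real Wiener processes $\beta^\pm$, which is consistent with the brackets (C.4) in Appendix C.

```python
def complex_brackets(pp, mm, pm, mp):
    """Brackets of N_i = N_i^+ + i N_i^- and N_j = N_j^+ + i N_j^- from the real brackets
    pp = <N_i^+, N_j^+>, mm = <N_i^-, N_j^->, pm = <N_i^+, N_j^->, mp = <N_i^-, N_j^+>.

    Returns (<N_i, N_j>, <N_i, conj N_j>), both obtained by bilinearity, using
    conj N_j = N_j^+ - i N_j^-.
    """
    nn = pp - mm + 1j * (pm + mp)
    n_nbar = pp + mm + 1j * (mp - pm)
    return nn, n_nbar

# Complex Wiener process beta^c = beta^+ + i beta^- at time t (assumed normalization).
t = 1.5
beta_nn, beta_nbar = complex_brackets(t, t, 0.0, 0.0)
```

For this process the result is $\langle\beta^c,\beta^c\rangle=0$ and $\langle\beta^c,\bar\beta^c\rangle=2t$.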
(There is no need to define the brackets $\langle \overline{N}_i, \overline{N}_j\rangle$ and $\langle \overline{N}_i, N_j\rangle$ since these are just the processes complex conjugate to $\langle N_i, N_j\rangle$ and $\langle N_i, \overline{N}_j\rangle$, respectively.) Equivalently, $\langle N_i, N_j\rangle$ can be defined as the unique adapted continuous complex process of bounded variation that vanishes at zero and is such that $N_i N_j-\langle N_i, N_j\rangle$ is a martingale. The brackets $\langle N_i, \overline{N}_j\rangle$ can be defined similarly. The above result implies a representation theorem for complex continuous martingales. Below we present a special case of it which is relevant for our work.

Corollary B.2. Suppose that all brackets $\langle N_i, N_j\rangle(t)$ and $\langle \overline{N}_i, \overline{N}_j\rangle(t)$ vanish, while the brackets $\langle N_i, \overline{N}_j\rangle(t)$, $1\leqslant i,j\leqslant d$, are almost surely absolutely continuous complex processes. Then there exist an adapted process $\Psi(t)$ taking values in complex $d\times d$ matrices and satisfying $\displaystyle\mathsf{E} \int_0^T\|\Psi(t)\|^2 \,dt<\infty$ and independent standard complex Wiener processes $\beta^c_1(t),\dots,\beta^c_d(t)$, all of which are defined on an extension of the original probability space, such that
$$
\begin{equation*}
N_j(t)-N_j(0) =\sum_{k=1}^d\int_0^t\Psi_{jk}(s)\,d\beta^c_k(s) \quad \forall\,0\leqslant t \leqslant T, \quad j=1,\dots,d,
\end{equation*}
\notag
$$
almost surely. Moreover, $\langle N_i, N_j\rangle(t) \equiv 0$ and
$$
\begin{equation*}
\langle N_i, \overline{N}_j\rangle(t)=2 \int_0^t (\Psi \Psi^*)_{ij} (s) \,ds, \quad 1\leqslant i,j\leqslant d.
\end{equation*}
\notag
$$
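For a constant matrix $\Psi$ the last two formulas can be cross-checked by passing to real Wiener processes. The Python sketch below assumes $\beta^c_k=\beta^+_k+i\beta^-_k$ with independent standard real $\beta^\pm_k$ (the normalization consistent with (C.4)) and computes the bracket rates by bilinearity; the function name is hypothetical.

```python
def complex_bracket_rates(Psi):
    """For constant Psi and N = Psi beta^c with beta^c_k = beta^+_k + i beta^-_k,
    return the rates d<N_i,N_j>/dt and d<N_i,conj N_j>/dt as nested lists.

    N_i = sum_k U_ik beta^+_k + W_ik beta^-_k with U = Psi, W = i Psi, and the bracket
    of two such combinations is bilinear: rate = sum_k (U_ik U'_jk + W_ik W'_jk).
    """
    d, m = len(Psi), len(Psi[0])
    U = [[complex(z) for z in row] for row in Psi]       # coefficients of beta^+
    W = [[1j * z for z in row] for row in U]             # coefficients of beta^-
    Uc = [[z.conjugate() for z in row] for row in U]     # coefficients in conj N
    Wc = [[z.conjugate() for z in row] for row in W]

    def rate(A1, B1, A2, B2, i, j):
        return sum(A1[i][k] * A2[j][k] + B1[i][k] * B2[j][k] for k in range(m))

    nn = [[rate(U, W, U, W, i, j) for j in range(d)] for i in range(d)]
    n_nbar = [[rate(U, W, Uc, Wc, i, j) for j in range(d)] for i in range(d)]
    return nn, n_nbar

Psi = [[1 + 2j, 0.5], [-1j, 2.0]]
nn, n_nbar = complex_bracket_rates(Psi)
```

The rates come out as $0$ and $2(\Psi\Psi^*)_{ij}$, matching the statement of the corollary.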
Appendix C. Itô’s formula for complex processes

Consider a complex Itô process $v(t)\in\mathbb{C}^n$ defined on a filtered probability space:
$$
\begin{equation}
dv(t) =g(t)\,dt+M^1(t)\,dB(t) +M^2(t) \,d\overline{B}(t).
\end{equation}
\tag{C.1}
$$
Here $v(t)$ and $g(t)$ are adapted processes in $\mathbb{C}^n$, $M^1$ and $M^2$ are adapted processes in the space of complex $n\times N$ matrices, $B(t)=(\beta^c_1(t),\dots,\beta^c_N(t))$, and $\overline{B}(t)=(\bar\beta^c_1(t),\dots,\bar\beta^c_N(t))$, where $\{\beta^c_j\}$ are independent standard complex Wiener processes. We recall that, given a $C^1$-smooth function $f$ on $\mathbb{C}^n$, we have
$$
\begin{equation*}
\frac{\partial f}{\partial z_j} =\frac12 \biggl(\frac{\partial f}{\partial x_j}-i\,\frac{\partial f}{\partial y_j}\biggr) \quad\text{and}\quad \frac{\partial f}{\partial \bar{z}_j} =\frac12 \biggl(\frac{\partial f}{\partial x_j}+i\,\frac{\partial f}{\partial y_j}\biggr).
\end{equation*}
\notag
$$
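The Wirtinger derivatives, and the rule of treating $z_j$ and $\bar z_j$ as independent variables for polynomials, can be checked numerically. The Python sketch below assumes $n=1$ and the hypothetical test function $f(z)=z^2\bar z$, for which the rule gives $\partial f/\partial z=2z\bar z$ and $\partial f/\partial\bar z=z^2$.

```python
def wirtinger(f, z, h=1e-6):
    """Numerical Wirtinger derivatives of f: C -> C at the point z.

    With x = Re z and y = Im z,
        df/dz    = (df/dx - i df/dy) / 2,
        df/dzbar = (df/dx + i df/dy) / 2,
    where df/dx and df/dy are approximated by central differences with step h.
    """
    fx = (f(z + h) - f(z - h)) / (2 * h)
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return (fx - 1j * fy) / 2, (fx + 1j * fy) / 2

# Hypothetical test function, a polynomial in z and zbar: f = z^2 zbar.
f = lambda z: z * z * z.conjugate()
z0 = 1.0 + 2.0j
d_dz, d_dzbar = wirtinger(f, z0)
```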
If $f$ is a polynomial in $z_j$ and $\bar{z}_j$, then $\partial f/\partial z_j$ and $\partial f/\partial \bar{z}_j$ can be calculated as if $z_j$ and $\bar{z}_j$ were independent variables. The processes $g$, $M^1$, $M^2$ and the function $f(t,v)$ in the theorem below are assumed to satisfy the usual conditions for the applicability of Itô’s formula (for example, see [14]), which we do not repeat here.

Theorem C.1. Let $f(t,v)$ be a $C^2$-smooth complex function. Then
$$
\begin{equation}
\begin{aligned} \, & df(t,v(t)) =\biggl\{\frac{\partial f}{\partial t} +d_vf(t,v) g+d_{\bar v} f(t,v) \overline{g} \notag\\ &\qquad +\operatorname{Tr}\biggl[ \bigl(M^1(M^2)^\top+M^2(M^1)^\top\bigr) \frac{\partial^2 f}{\partial v\,\partial v} +\bigl(\overline{M}^1(\overline{M}^2)^\top+\overline{M}^2(\overline{M}^1)^\top\bigr) \frac{\partial^2 f}{\partial \bar v\,\partial \bar v} \notag\\ &\qquad\qquad +2\bigl(M^1(\overline{M}^1)^\top+M^2(\overline{M}^2)^\top\bigr) \frac{\partial^2 f}{\partial v\,\partial \bar v} \biggr] \biggr\} \,dt \notag\\ &\qquad + d_vf(M^1\,dB+M^2\,d\overline{B})+ d_{\bar v}f(\overline{M}^1\,d\overline{B}+\overline{M}^2\,dB). \end{aligned}
\end{equation}
\tag{C.2}
$$
Here $\displaystyle d_vf(t,v) g=\sum\frac{\partial f}{\partial v_j}g_j$, $\displaystyle d_{\bar v}f(t,v) \overline{g} =\sum\frac{\partial f}{\partial \bar v_j}\overline{g}_j$, $\displaystyle \frac{\partial^2 f}{\partial v\,\partial v}$ is the matrix with entries $\displaystyle \frac{\partial^2 f}{\partial v_j\,\partial v_k}$, and so on. If the function $f$ is real valued, then $d_{\bar v}f(v)=\overline{d_vf(v)}$, and the Itô term, given by the second and third lines of (C.2), reads
$$
\begin{equation*}
2\operatorname{Re} \operatorname{Tr} \biggl\{ \bigl(M^1(M^2)^\top+M^2(M^1)^\top\bigr) \frac{\partial^2 f}{\partial v\,\partial v} +\bigl(M^1(\overline{M}^1)^\top+M^2(\overline{M}^2)^\top\bigr) \frac{\partial^2 f}{\partial v\,\partial \bar v} \biggr\}.
\end{equation*}
\notag
$$
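In the simplest case $n=N=1$ the real-valued form above can be cross-checked against the usual real Itô formula. The Python sketch below assumes $\beta^c=\beta^++i\beta^-$ with independent standard real $\beta^\pm$ and takes $f(v)=|v|^2$, for which $\partial^2 f/\partial v\,\partial v=0$ and $\partial^2 f/\partial v\,\partial\bar v=1$; the function names are hypothetical.

```python
def ito_term_complex(M1, M2):
    """Itô term from the real-valued form of (C.2) for f(v) = |v|^2, n = N = 1:
    2 Re{M1 conj(M1) + M2 conj(M2)} = 2 (|M1|^2 + |M2|^2)."""
    return 2 * (abs(M1) ** 2 + abs(M2) ** 2)

def ito_term_real(M1, M2):
    """The same quantity from the real Itô formula. Since v = (M1 + M2) beta^+ + i (M1 - M2) beta^-,
    the vector (Re v, Im v) has dispersion sigma = [[Re p, Re q], [Im p, Im q]] with
    p = M1 + M2, q = i (M1 - M2). For f = x^2 + y^2 the Hessian is 2 I, so the Itô
    term 1/2 Tr(sigma sigma^T * 2 I) equals Tr(sigma sigma^T)."""
    p, q = M1 + M2, 1j * (M1 - M2)
    sigma = [[p.real, q.real], [p.imag, q.imag]]
    return sum(sigma[i][j] ** 2 for i in range(2) for j in range(2))

M1, M2 = 0.3 + 1.1j, -0.7 + 0.2j
complex_value, real_value = ito_term_complex(M1, M2), ito_term_real(M1, M2)
```

Both routes give $|M^1+M^2|^2+|M^1-M^2|^2=2(|M^1|^2+|M^2|^2)$.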
To prove this result, one can express $v(t)$ as an Itô process in $\mathbb{R}^{2n}$ in terms of the real Wiener processes $\operatorname{Re}\beta^c_j(t)$ and $\operatorname{Im}\beta^c_j(t)$, apply the usual Itô formula to $f(t,v(t))$, and then rewrite the result in terms of the complex Wiener processes. This straightforward calculation is rather heavy, and it is not easy to carry it out without mistakes. Below we suggest a better way to derive the formula.

Proof. The linear part of formula (C.2), given by its first and fourth lines, follows from the real case by linearity. It remains to prove that the Itô term has the form of the expression in the second and third lines. From the real formula we see that the Itô term is linear in $\partial^2 f/\partial v \partial v$, $\partial^2 f/\partial \bar v \partial \bar v$, and $\partial^2 f/\partial v \partial \bar v$, with coefficients quadratic in the matrices $M^1$ and $M^2$. So it can be written as
$$
\begin{equation}
\biggl\{ \operatorname{Tr}\biggl(Q^1 \frac{\partial^2 f}{\partial v\,\partial v}\biggr) +\operatorname{Tr}\biggl(Q^2 \frac{\partial^2 f}{\partial \bar v\,\partial \bar v}\biggr) +\operatorname{Tr}\biggl(Q^3 \frac{\partial^2 f}{\partial v\,\partial \bar v}\biggr) \biggr\} \,dt,
\end{equation}
\tag{C.3}
$$
where the $Q^j$ are complex $n\times n$ matrices quadratic in $M^1$ and $M^2$. We must show that they have the form specified in (C.2). To do this we note that, since the processes $\beta^c_j$ are independent and have the form (2.3), for all $j$ and $l$ the brackets of these processes have the following form:
$$
\begin{equation}
\langle \beta^c_j, \beta^c_l \rangle=\langle \bar\beta^c_j, \bar\beta^c_l \rangle=0, \qquad \langle \beta^c_j, \bar\beta^c_l \rangle=\langle \bar\beta^c_j, \beta^c_l \rangle=2 \delta_{j,l} t.
\end{equation}
\tag{C.4}
$$
Now let $g=0$ and $v(0)=0$ in (C.1) and let $M^1$ and $M^2$ be constant matrices. Then
$$
\begin{equation*}
v(t)=M^1 B(t)+M^2 \overline{B}(t).
\end{equation*}
\notag
$$
Taking $f(v)=v_{i_1} v_{i_2}$ and using (C.4) we see that
$$
\begin{equation*}
\begin{aligned} \, f(v(t)) & =\biggl(\sum_j M^1_{i_1 j} B_j(t)+\sum_j M^2_{i_1 j} \overline{B}_j(t)\biggr) \cdot \biggl(\sum_j M^1_{i_2 j} B_j(t)+\sum_j M^2_{i_2 j} \overline{B}_j(t)\biggr) \\ & =\bigl[ \big(M^1(M^2)^\top\big)_{i_1 i_2} +\big(M^2 (M^1)^\top\big)_{i_1 i_2} \bigr] \,2t+\text{(a martingale)}. \end{aligned}
\end{equation*}
\notag
$$
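The coefficient of $t$ just obtained can be cross-checked through the real decomposition of the noise. The Python sketch below assumes $\beta^c_j=\beta^+_j+i\beta^-_j$ with independent standard real $\beta^\pm_j$, so that $v=P\beta^++R\beta^-$ with $P=M^1+M^2$ and $R=i(M^1-M^2)$; the matrices and function names are hypothetical examples.

```python
def drift_via_real(M1, M2, i1, i2):
    """Coefficient of t in E[v_{i1}(t) v_{i2}(t)] for v = M1 B + M2 conj(B), computed from
    v = P beta^+ + R beta^- with P = M1 + M2, R = i (M1 - M2), using
    E[beta^+_j beta^+_k] = E[beta^-_j beta^-_k] = delta_jk t and E[beta^+_j beta^-_k] = 0."""
    cols = range(len(M1[0]))
    P = [[M1[r][j] + M2[r][j] for j in cols] for r in range(len(M1))]
    R = [[1j * (M1[r][j] - M2[r][j]) for j in cols] for r in range(len(M1))]
    return sum(P[i1][j] * P[i2][j] + R[i1][j] * R[i2][j] for j in cols)

def drift_from_proof(M1, M2, i1, i2):
    """2 [M1 M2^T + M2 M1^T]_{i1 i2}, the coefficient of t found above."""
    cols = range(len(M1[0]))
    return 2 * sum(M1[i1][j] * M2[i2][j] + M2[i1][j] * M1[i2][j] for j in cols)

M1 = [[1 + 1j, 0.5], [2j, -1.0]]
M2 = [[0.3, -0.4j], [1.0 - 2j, 0.7]]
```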
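Since the Hessian $\partial^2 f/\partial v\,\partial v$ is symmetric, the matrix $Q^1$ in (C.3) may be assumed symmetric, and by (C.3) the coefficient of $t$ must be equal to $(Q^1_{i_1i_2}+Q^1_{i_2 i_1})t=2Q^1_{i_1i_2}t$. Hence we have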
$$
\begin{equation*}
Q^1 =M^1 (M^2)^\top+M^2 (M^1)^\top.
\end{equation*}
\notag
$$
In a similar way, considering $f(v)=\bar v_{i_1} \bar v_{i_2}$ we find that
$$
\begin{equation*}
Q^2 =\overline{M}^1 (\overline{M}^2)^\top+\overline{M}^2 (\overline{M}^1)^\top,
\end{equation*}
\notag
$$
while setting $f(v)=v_{i_1} \bar v_{i_2}$ leads to the equality
$$
\begin{equation*}
2\bigl[\bigl(M^1 (\overline{M}^1)^\top\bigr)_{i_1 i_2}+\bigl(M^2(\overline{M}^2)^\top\bigr)_{i_1 i_2}\bigr]=Q^3_{i_1 i_2},
\end{equation*}
\notag
$$
so that
$$
\begin{equation*}
Q^3 =2\bigl[M^1(\overline{M}^1)^\top+M^2(\overline{M}^2)^\top\bigr].
\end{equation*}
\notag
$$
This completes the proof of (C.2). The second assertion of the theorem follows by straightforward calculation. $\Box$
Appendix D. Projections onto convex sets

Lemma D.1. Let $\mathcal{B}$ be a closed convex subset of a Hilbert space $X$ of finite or infinite dimension. Assume that $\mathcal{B}$ contains at least two points, and let $\Pi\colon X\to \mathcal{B}$ be the projection sending any point of $X$ to its nearest point in $\mathcal{B}$. Then $\operatorname{Lip}(\Pi)=1$.

Proof. Let $A,B\in X$, and let $a=\Pi A$ and $b=\Pi B$ belong to $\mathcal{B}$. If $A,B\in\mathcal{B}$, then $a=A$ and $b=B$. Taking two distinct points $A\ne B$ of $\mathcal{B}$, we see that $\operatorname{Lip}(\Pi)\geqslant 1$, and it remains to show that
$$
\begin{equation*}
\|a-b\|\leqslant\|A-B\| \quad \forall\, A\ne B.
\end{equation*}
\notag
$$
If $a=b$, then the assertion is trivial. Otherwise consider the vectors $\xi=b-a$, $ l^a=A-a$, and $l^b= B-b$, and introduce an orthonormal basis $(e_1,e_2,\dots)$ in $X$ such that $e_1 =\xi/\|\xi\|$. Then $\xi=(\xi_1,\xi_2,\dots)$, where $\xi_1 =\|\xi\|$ and $\xi_j=0$ for $j\geqslant2$. Since $\mathcal{B}$ is convex, the segment $[a,b]$ lies in $\mathcal{B}$, so $a$ is also the point of $[a,b]$ closest to $A$; hence $l^a_1=l^a\cdot e_1 \leqslant0$. Similarly, $l^b_1 \geqslant0$. Thus,
$$
\begin{equation*}
\|B-A\| =\|\xi+l^b-l^a\| \geqslant|\xi_1+l_1^b-l_1^a| \geqslant \xi_1 =\| b-a\|,
\end{equation*}
\notag
$$
and the assertion is proved. Note that an analogue of the statement of this lemma for a Banach space $X$ fails in general.
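A quick numerical illustration of Lemma D.1 (no substitute for the proof) is sketched below in Python. It assumes $X=\mathbb{R}^3$ with the Euclidean norm and takes $\mathcal{B}$ to be the closed unit ball, for which the nearest-point projection is explicit; it samples random pairs and records the largest observed ratio $\|\Pi A-\Pi B\|/\|A-B\|$.

```python
import math
import random

def project_ball(x, radius=1.0):
    """Nearest-point projection in the Euclidean norm onto the closed ball of the given radius."""
    n = math.sqrt(sum(t * t for t in x))
    return list(x) if n <= radius else [radius * t / n for t in x]

def dist(x, y):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(x, y)))

rng = random.Random(2)
worst = 0.0
for _ in range(1000):
    A = [rng.uniform(-3, 3) for _ in range(3)]
    B = [rng.uniform(-3, 3) for _ in range(3)]
    worst = max(worst, dist(project_ball(A), project_ball(B)) / dist(A, B))
```

The ratio equals $1$ for pairs inside the ball, where $\Pi$ is the identity, which is why $\operatorname{Lip}(\Pi)$ is exactly $1$ rather than smaller.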
Bibliography
1. V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt, Mathematical aspects of classical and celestial mechanics, Encyclopaedia Math. Sci., 3, Dynamical systems. III, 3rd rev. ed., Springer-Verlag, Berlin, 2006, xiv+518 pp.
2. P. Billingsley, Convergence of probability measures, Wiley Ser. Probab. Statist., 2nd ed., John Wiley & Sons, Inc., New York, 1999, x+277 pp.
3. V. I. Bogachev, N. V. Krylov, M. Röckner, and S. V. Shaposhnikov, Fokker–Planck–Kolmogorov equations, Math. Surveys Monogr., 207, Amer. Math. Soc., Providence, RI, 2015, xii+479 pp.
4. N. N. Bogoliubov and Yu. A. Mitropolsky, Asymptotic methods in the theory of non-linear oscillations, Int. Monogr. Adv. Math. Phys., Hindustan Publishing Corp., Delhi; Gordon and Breach Science Publishers, Inc., New York, 1961, v+537 pp.
5. A. Boritchev and S. Kuksin, One-dimensional turbulence and the stochastic Burgers equation, Math. Surveys Monogr., 255, Amer. Math. Soc., Providence, RI, 2021, vii+192 pp.
6. V. Sh. Burd, Method of averaging on an infinite interval and some problems in oscillation theory, Yaroslavl' State University, Yaroslavl', 2013, 416 pp. (Russian)
7. Jinqiao Duan and Wei Wang, Effective dynamics of stochastic partial differential equations, Elsevier, Amsterdam, 2014, xii+270 pp.
8. R. M. Dudley, Real analysis and probability, Cambridge Stud. Adv. Math., 74, 2nd ed., Cambridge Univ. Press, Cambridge, 2002, x+555 pp.
9. A. Dymov, “Nonequilibrium statistical mechanics of weakly stochastically perturbed system of oscillators”, Ann. Henri Poincaré, 17:7 (2016), 1825–1882.
10. M. I. Freidlin and A. D. Wentzell, Random perturbations of dynamical systems, Grundlehren Math. Wiss., 260, 2nd ed., Springer-Verlag, New York, 1998, xii+430 pp.
11. G. Huang and S. Kuksin, “On averaging and mixing for stochastic PDEs”, J. Dynam. Differential Equations, 2022, Publ. online.
12. Guan Huang, S. Kuksin, and A. Maiocchi, “Time-averaging for weakly nonlinear CGL equations with arbitrary potentials”, Hamiltonian partial differential equations and applications, Fields Inst. Commun., 75, Fields Inst. Res. Math. Sci., Toronto, ON, 2015, 323–349.
13. Wenwen Jian, S. B. Kuksin, and Yuan Wu, “Krylov–Bogolyubov averaging”, Russian Math. Surveys, 75:3 (2020), 427–444.
14. I. Karatzas and S. E. Shreve, Brownian motion and stochastic calculus, Grad. Texts in Math., 113, 2nd ed., Springer-Verlag, New York, 2005, xxiii+470 pp.
15. R. Z. Has'minski (Khasminski), “On stochastic processes defined by differential equations with a small parameter”, Theory Probab. Appl., 11:2 (1966), 211–228.
16. R. Z. Khasminski, “The averaging principle for stochastic Ito differential equations”, Kybernetika (Prague), 4:3 (1968), 260–279 (Russian).
17. R. Khasminskii, Stochastic stability of differential equations, Stoch. Model. Appl. Probab., 66, 2nd ed., Springer, Heidelberg, 2012, xviii+339 pp.
18. Yu. Kifer, Large deviations and adiabatic transitions for dynamical systems and Markov processes in fully coupled averaging, Mem. Amer. Math. Soc., 201, no. 944, Amer. Math. Soc., Providence, RI, 2009, viii+129 pp.
19. S. Kuksin and A. Maiocchi, “Resonant averaging for small-amplitude solutions of stochastic nonlinear Schrödinger equations”, Proc. Roy. Soc. Edinburgh Sect. A, 148:2 (2018), 357–394.
20. A. Kulik, Ergodic behavior of Markov processes. With applications to limit theorems, De Gruyter Stud. Math., 67, De Gruyter, Berlin, 2018, x+256 pp.
21. Shu-Jun Liu and M. Krstic, Stochastic averaging and stochastic extremum seeking, Comm. Control Engrg. Ser., Springer, London, 2012, xii+224 pp.
22. J. C. Mattingly, A. M. Stuart, and D. J. Higham, “Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise”, Stochastic Process. Appl., 101:2 (2002), 185–232.
23. A. V. Skorokhod, Asymptotic methods in the theory of stochastic differential equations, Transl. Math. Monogr., 78, Amer. Math. Soc., Providence, RI, 1989, xvi+339 pp.
24. D. W. Stroock and S. R. S. Varadhan, Multidimensional diffusion processes, Grundlehren Math. Wiss., 233, Springer-Verlag, Berlin–New York, 1979, xii+338 pp.
25. A. Yu. Veretennikov, “On the averaging principle for systems of stochastic differential equations”, Math. USSR-Sb., 69:1 (1991), 271–284.
26. C. Villani, Optimal transport. Old and new, Grundlehren Math. Wiss., 338, Springer-Verlag, Berlin, 2009, xxii+973 pp.
27. H. Whitney, “Differentiable even functions”, Duke Math. J., 10 (1943), 159–160; Collected papers, v. 1, Contemp. Mathematicians, Birkhäuser Boston, Inc., Boston, MA, 1992, 309–310.
Citation:
G. Huang, S. B. Kuksin, “Averaging and mixing for stochastic perturbations of linear conservative systems”, Uspekhi Mat. Nauk, 78:4(472) (2023), 3–52; Russian Math. Surveys, 78:4 (2023), 585–633
https://www.mathnet.ru/eng/rm10081
https://doi.org/10.4213/rm10081e
https://www.mathnet.ru/eng/rm/v78/i4/p3