Transfer function identification by minimizing the adaptive vs. optimal filter state estimates mismatch

Abstract

The article is concerned with a further development of the Active Principle of parametric system identification in the class of linear, time-invariant, completely observable models. As the identification target model, the optimal Kalman filter (OKF) is designated that is present, no more than conceptually, in the system’s discretely observed response to a training excitation of the white noise type. By modifying the physically given structure into the standard observable model in both the observed response and the Adaptive Kalman Filter (AKF), a so-called Generalized Residual (GR) is constructed equaling the mismatch between the adaptive and the optimal filter state estimates plus an AKF-independent noise component. By virtue of this modification, the GR mean square becomes a new model proximity criterion for these filters. Minimizing this criterion via conventional practical optimization methods produces exactly the same result (AKF = OKF) as would be obtained by minimizing the theoretical criterion being, unfortunately, inaccessible to any AKF numerical optimization methods. The article presents a detailed step-by-step procedure explaining the above solution in terms of a parameterized transfer function. For the sake of clarity and for stimulating real world applications of the approach, the article employs the transfer function model of a twisted-pair line in a typical xDSL system. The implementation challenges of theoretical provisions of the method are discussed. The issue of extending the proposed approach to the problems of identifying linear models for nonlinear systems is outlined in the directions for further research.


1. Introduction

The theory and practice of system identification (SI) in their more than half a century of history have received a powerful development reflected in hundreds of thousands of scientific publications around the world. As Gianluigi Pillonetto and Lennart Ljung note in their recent paper [1], `Despite its long history, such research area is still extremely active.' Indeed, even in a nonlinear setting, research is being done on how to deal with the presence of nonlinear distortions in systems by using linear SI techniques [2].

The abundance of publications in this field signaled the need for some serious cleanup work in order to single out the truly independent concepts. According to Ljung, there are two independent and universal key concepts in SI: the choice of a Parametric Model Structure, PMS, and the choice of a Model Proximity Criterion, MPC, the latter being the criterion of fit that indicates how erroneous a model is with respect to a target [3]. Looking generally at what takes us in the identification process from observed data to a validated model, there are four main components: `(1) The data itself, (2) The set of candidate models, (3) The suitability criterion, and (4) The validation procedure' [4].

Indeed, at the heart of SI—or, in modern AI terminology, of machine learning of a system's mathematical model—is the principle of fitting the response data of an adaptive predictive model to the data of the real system response, which actually exists as a `black box,' under conditions of the same excitatory (learning or training) input for both, according to some predefined cost function. Nevertheless, the question of interest remains: given the PMS, HOW should the available data be used to predefine the MPC?

In the SI community, the impressive Prediction Error Framework, PEF, [5] reflects a generally accepted understanding of this issue. Such a view has been expressed [6] on more than one occasion: `All existing parameter identification methods can be seen as special cases of this prediction error framework.' That said, existing PEF methods fit the adaptive model in the system response space, not in the state space. This is because the useful concept of `state space' is intended for purely theoretical work, namely to formulate and minimize the system model optimality criterion, which we call the direct performance index, DPI. This limitation stems from the conviction that an obvious barrier cannot be overcome: the inaccessibility of state-space elements in explicit form and, hence, of the DPI. In fact, the DPI cannot be accepted as the MPC for identification algorithms.

Putting the overcoming of this barrier on the agenda, this article proposes an alternative answer to the HOW question posed above. That is, in formulating the research question, the intention here is to form an Indirect Performance Index, IPI, and then organize the minimization of the IPI so that it is equivalent to minimizing the discrepancy between the internal states of the adaptive model, which is available, and the internal states of the optimal model. The latter is known theoretically as the Optimal Kalman Filter, OKF, but is not accessible because of parameter uncertainty and is, moreover, sought as the target result of the parametric optimization of the Adaptive Kalman Filter, AKF.

Thus, the alternative approach considered in this paper should, as is conceivable, minimize the discrepancy between the AKF and OKF state estimates. This is reasonable, since the notion of `state' is intended to exhaustively characterize the behavior of an object. Moreover, such minimization, if implemented in practice, corresponds one-to-one to designing a theoretically optimal filter. This feature prevents the SI algorithm from deviating from the theoretical results of the OKF design and, therefore, from the bias errors inherent in some other SI methods.

To make such a solution feasible, the real system's observed output is represented as if it were generated by the desired optimal filter, hidden from us, rather than by the given physically structured system. To realize such a conceptual vision, the system is supposed to be completely observable, which gives us the ability to formally change the basis of the system's internal states without changing its input-output description, called the transfer function, TF, in the class of linear time-invariant, LTI, systems the article addresses.

Building an LTI model for a dynamic system is usually carried out either in the frequency domain (by a TF) or in the time domain (by differential or difference equations) [7] to answer the challenge of reducing model uncertainty. LTI SI theory and practice have reached a high degree of maturity and are frequently used in many disciplines where an object of interest exists, for example, in mechanical [8], electrical [9], electronic [10], chemical [11], civil [12], and even biomedical [13, 14] applications. Besides, the point is that the object of interest whose model must be parameterized in the form of a TF can be not real but fictitious. The most striking example of this is the construction of a dummy filter forming a model stationary random process from a white-noise process, which should approximate, by its correlation function, the experimental correlation function of a real process in a real system. An example is the identification of a parameterized instrumental-errors model of a multi-component inertial navigation system [15].

The novelty of this article is that it encourages the application of the approach in the real world where it has not previously been considered. As an original example and for clarity, it uses the twisted-pair model in a typical Digital Subscriber Line, xDSL, system [16]. As is known, there exists a computational cost reduction challenge in solving the crosstalk precoding problem, and this problem cannot be solved without knowing the direct and cross channel TFs, DCTFs and CCTFs [17]. There are several solutions to this engineering task in the literature of recent years, see, e.g., [18–23]. Most of them are similar in that they propose to estimate channel TFs in the frequency domain, which is quite understandable since a TF itself is a function of signal frequency and must be known for each tone to eliminate the crosstalk phenomenon. Such methods of TF estimation narrow the field of their possible application, reducing it to xDSL technology, where they are recognized to be effective. In contrast, this work proceeds from the fact that the problem of TF estimation can be solved in a more general formulation, considering it as a problem of parametric LTI model identification for a dynamic system in the state-space time domain.

This article addresses the following research issues.

  1. First, we intend to overcome the obvious problem that the state vector of a dynamical system is unattainable explicitly or, to put this differently, is beyond reach in a perfectly measured form, just as a signal disturbed by noise in filtering problems is, by definition, immeasurable in a pure form. The same applies to optimal state estimators, since the optimal filter remains machine-unrealizable or, put tentatively, covert in the measurement data until the necessary parameters are identified.
  2. The solution to overcoming this unattainability barrier in [24] considered individual cases of a priori uncertainty level constraints under which the solution works. We aim to show that this solution is in fact feasible with no limitations on the size of the a priori parametric uncertainty of the system model. We must make sure that this method of solution is universally applicable under the single, unrestrictive condition that the LTI model under study is completely observable and can be considered adequate to reality. We want to show that this quality of solution is achievable by converting the model into a standard observable form, SOF, to obtain a computer-implementable solution.
  3. As for common xDSL applications, we have to check whether it is possible to use time-domain formulas instead of traditional frequency-domain formulations to estimate the DCTFs or CCTFs, and show how to do so for any frequency (or tone) of interest in the channel operating frequency range.
  4. Further, to organize the computational process with its numerical robustness and also to translate all decisions into a software design, reasonable suggestions are needed.
  5. Finally, a determination has to be made about the novelty of this work in terms of its results, advantages, and limitations, and concerning objectives of further research in the proposed direction.

Consideration of these issues constitutes the main content of this article. Section 2 is devoted to an illustrative example for which the IPI-based LTI system identification method may be of practical interest. Section 3 presents a formal statement of the problem with two generalizations. Section 4  explains a detailed procedure of how to identify a parametric OKF estimator in terms of a parametrized TF. Section 5 discusses three practical challenges associated with implementing the solution: (1) organizing the computation time; (2) ordering the computation in terms of its numerical robustness; and (3) scheduling the work for a software project. The final Section 6  summarizes the work, describes the limitations, and outlines possible research on the approach.

2. An illustrative example

Only within this example, the symbol \(f\) designates the signal frequency in the electronic \(R_\mathrm{s}L_\mathrm{s}G_\mathrm{s}C_\mathrm{s}\)-circuit of Fig. 1 that mimics a very short—of length \(\Delta l\)—section of a twisted-pair line in the typical xDSL system [17, Chapter II]. The circuit can help the DCTF evaluation for a transmission line of full length \(l\). The primary transmission line parameters \(R=R(f)\), \(L=L(f)\), \(G=G(f)\), and \(C=C(f)\), being functions of frequency \(f\), can be expressed through the secondary cable parameters for standard twisted pairs, which depend on cable diameter, material, and design. Taking these values from [17, p. 19] yields the parameters of a short section: the section resistance \(R_\mathrm{s}=R{\cdot}\Delta l\), the section inductance \(L_\mathrm{s}=L{\cdot}\Delta l\), the section conductivity \(G_\mathrm{s}=G{\cdot}\Delta l\), and the section capacitance \(C_\mathrm{s}=C{\cdot}\Delta l\).

Figure 1. Line section of length \(\Delta l\) for a twisted pair transmission line of full length \(l\)

Remark 1. The subscript \(_\mathrm{s}\), written in roman typestyle for the lowercase index, should not be confused with the Laplace variable \(s\) below. It serves as a reminder that the quantity refers to a twisted-pair section as shown in Fig. 1. The square brackets below denote the dimensionality of physical quantities in SI units.

The formula called Generalized Ohm's Law (GOL) in the complex domain defines the impedance of an electronic two-terminal element as across variable (voltage) divided by through variable (current), both in terms of the Laplace transform. When it is coupled with Kirchhoff's Current and Voltage Laws (commonly shortened to KCL and KVL), one has a sufficient set of tools for analyzing circuits. By writing KCL and KVL for the circuit in Fig. 1 (a), the equivalent linear Two-Port Network (TPN) shown in Fig. 1 (b) is obtained, yielding the following equations (1) in terms of the Laplace transform variables:
\[ \begin{equation} \left.\begin{aligned} \left[ \begin{array}{c} V_{1} \\ I_{1} \end{array} \right] & = \left[\begin{array}{c@{\quad}c} \boldsymbol{A}(s) & \boldsymbol{B}(s)\\ \boldsymbol{C}(s) & \boldsymbol{D}(s) \end{array} \right] \left[ \begin{array}{c} V_{2} \\ I_{2} \end{array} \right] ,\\ \boldsymbol{A}(s) &\triangleq 1 + F_\mathrm{s} (s); \,\, [\boldsymbol{A}(s)] = 1 ,\\ F_\mathrm{s} (s) &\triangleq (R_\mathrm{s}+sL_\mathrm{s})(G_\mathrm{s}+sC_\mathrm{s}); \,\, [F_\mathrm{s} (s)] = 1 ,\\ \boldsymbol{B}(s) &\triangleq (R_\mathrm{s}+sL_\mathrm{s}); \,\, [\boldsymbol{B}(s)] = \Omega ,\\ \boldsymbol{C}(s) &\triangleq (G_\mathrm{s}+sC_\mathrm{s}); \,\, [\boldsymbol{C}(s)] = {\mathrm S} \equiv {\Omega}^{-1} ,\\ \boldsymbol{D}(s) &\triangleq 1 .\end{aligned} \, \right\} \end{equation} \tag{1} \]

Remark 2. In this notation, referring to variables and their transforms interchangeably, Laplace transforms are distinguishable by the use of an uppercase letter or (in more detail) the complex-valued argument \((s\triangleq \sigma + \mathrm{j}\omega)\) with \(\omega\), \([\omega] =\) rad/s, equaling the angular frequency \(2\pi f\), and \(f\), \([f]=\) s\(^{-1}\), meaning the frequency variable, \(\mathrm{j}\triangleq \sqrt{-1} \). 

Define the dimensionless TF of a \(\Delta l\)-length line section as \(T_\mathrm{s}(s)\triangleq V_2/V_0\). Applying KVL yields \(V_0\), and KCL yields \(I_1\), in (2)
\[ \begin{equation}\begin{aligned} \left[ \begin{array}{c} V_{0} \\ I_{1} \end{array} \right] & = \left[ \begin{array}{c} V_{2} + [Z_0 + (R_\mathrm{s} + sL_\mathrm{s})] I_1\\ \underbrace{V_2/Z_2}_{I_2} + \underbrace{(G_\mathrm{s} + sC_\mathrm{s})V_2}_{I_3} \end{array} \right]. \end{aligned} \end{equation} \tag{2} \] 
It follows that \(T_\mathrm{s}(s)\) is the quantity inverse to
\[ \begin{equation*} 1 + \left[ Z_0 + (R_\mathrm{s} + sL_\mathrm{s}) \right] \left[ Z_2^{-1} + (G_\mathrm{s} + sC_\mathrm{s}) \right] . \end{equation*} \]
Together with (1), it leads to (3)
\[ \begin{equation}T_\mathrm{s}(s) = \left\{ \boldsymbol{A}(s) + Z_0 \left[ Z_2^{-1} + \boldsymbol{C}(s) \right] + Z_2^{-1} \boldsymbol{B}(s) \right\}^{-1}. \end{equation} \tag{3} \]

From now on, consider a mathematically idealized experiment with, condition (i), a voltage source \(V_0\) connected to the section input, thus assuming \(Z_0 \to 0\); and with, condition (ii), a voltmeter having a very large inner impedance \(Z_2 \to \infty\) connected to the section output to measure \(V_2\). In this scenario, \(T_\mathrm{s}(s) \to \mathring{T}_\mathrm{s}(s) \triangleq \boldsymbol{A}^{-1}(s)\). \(\mathring{T}_\mathrm{s}(s)\) is the section intrinsic transfer function, SITF, which we are interested in as a step toward the reality of xDSL multi-user transmission, xDSL–MUT.
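The limit passage from (3) to the SITF under conditions (i) and (ii) can be checked symbolically. The sketch below is a hedged illustration using the `sympy` computer algebra package; the variable names are chosen here for convenience and are not part of the original derivation:

```python
import sympy as sp

s, Z0, Z2, Rs, Ls, Gs, Cs = sp.symbols('s Z_0 Z_2 R_s L_s G_s C_s', positive=True)

# Two-port quantities of (1)
A = 1 + (Rs + s * Ls) * (Gs + s * Cs)    # A(s) = 1 + F_s(s)
B = Rs + s * Ls                          # B(s)
C = Gs + s * Cs                          # C(s)

# Loaded section transfer function, Eq. (3)
T_s = 1 / (A + Z0 * (1 / Z2 + C) + B / Z2)

# Condition (i): ideal voltage source, Z0 -> 0;
# condition (ii): ideal voltmeter, Z2 -> oo
T_intrinsic = sp.limit(sp.limit(T_s, Z0, 0), Z2, sp.oo)

assert sp.simplify(T_intrinsic - 1 / A) == 0   # SITF = A(s)^{-1}
```

The assertion confirms that the loaded TF (3) degenerates into \(\boldsymbol{A}^{-1}(s)\) exactly, not merely approximately, in the idealized experiment.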

As known from [17, Chapters II and III], the DCTF denoted by \(H(f, l)\) is frequency-dependent and changes with the cable length \(l\). When the transmission line is connected to a source \(V_S\) with source impedance \(Z_S\) and terminated with load impedance \(Z_L\), this \(H(f, l)\) is expressed by (4)
\[ \begin{equation} \left.\begin{aligned} H(f, l) & = \frac{Z_{S}+Z_{L}}{(Z_{S}+Z_{L}){\cosh}(\gamma l) + Z_{\star}^{\star}{\sinh}(\gamma l)} \\ Z_{\star}^{\star} & \triangleq Z_{\star} + \frac{Z_{S}\cdot Z_{L}}{Z_{\star}} \end{aligned} \, \right\} \end{equation} \tag{4} \]
through the characteristic line impedance \(Z_{\star}\) defined as
\[ \begin{equation}Z_{\star} \triangleq \sqrt{\frac{R + \mathrm{j}2\pi f L}{G + \mathrm{j}2\pi f C}} \end{equation} \tag{5} \]
and the propagation constant \(\gamma\triangleq \gamma (f)\) calculated by
\[ \begin{equation}\gamma (f) = \sqrt{{(R + \mathrm{j}2\pi f L)}{(G + \mathrm{j}2\pi f C})}. \end{equation} \tag{6} \]
If the line terminates ideally at \(Z_{\star}\) (5), so that \(Z_L\) = \(Z_{\star}\) = \(Z_S\), the channel transfer function simplifies [17, Chapters II and III] to
\[ \begin{equation} H(f, l) = e^{-\gamma(f) \cdot l}. \end{equation} \tag{7} \]

Now, noticing a similarity between (6) and \(F_\mathrm{s}(s)\) in (1), we obtain \(F_\mathrm{s}(s)=(\Delta l)^2 \gamma^2(f)\) after the substitution \({s=\mathrm{j} 2\pi f}\). Hence
\[ \begin{equation} \gamma(f) = (\Delta l)^{-1} \sqrt{ [\mathring{T}_\mathrm{s}(s)]^{-1}\bigr|_{s=\mathrm{j} 2\pi f} - 1}. \end{equation} \tag{8} \]
If one manages to evaluate the expression under the square root sign in (8) as a complex-valued quantity depending on frequency \(f\), then the problem is solved for any desired value of \(f\). Thus, the solution is to parametrically identify the SITF, that is, \(\mathring{T}_\mathrm{s}(s)\), to use it in (8) and then substitute into (7).
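The chain (4)–(7) is straightforward to evaluate numerically. A hedged sketch follows: the function names are assumed here, and the primary-parameter values are purely illustrative, not taken from [17, Table 2.1]. It also checks that (4) reduces to (7) under matched terminations:

```python
import numpy as np

def gamma_f(f, R, L, G, C):
    """Propagation constant gamma(f), Eq. (6)."""
    jw = 1j * 2.0 * np.pi * f
    return np.sqrt((R + jw * L) * (G + jw * C))

def z_star(f, R, L, G, C):
    """Characteristic line impedance Z*, Eq. (5)."""
    jw = 1j * 2.0 * np.pi * f
    return np.sqrt((R + jw * L) / (G + jw * C))

def dctf(f, l, R, L, G, C, Zs, Zl):
    """General DCTF H(f, l), Eq. (4), for source/load impedances Zs, Zl."""
    g, zst = gamma_f(f, R, L, G, C), z_star(f, R, L, G, C)
    zss = zst + Zs * Zl / zst                       # Z** of Eq. (4)
    return (Zs + Zl) / ((Zs + Zl) * np.cosh(g * l) + zss * np.sinh(g * l))

def dctf_matched(f, l, R, L, G, C):
    """Matched-termination DCTF, Eq. (7): Z_L = Z* = Z_S."""
    return np.exp(-gamma_f(f, R, L, G, C) * l)

# Assumed (illustrative, not measured) primary parameters per km
R, L, G, C = 280.0, 0.6e-3, 1e-9, 50e-9     # Ohm/km, H/km, S/km, F/km
f, l = 1e5, 3.0                              # a 100 kHz tone, a 3 km line
zst = z_star(f, R, L, G, C)
# With matched terminations, (4) must reduce to (7)
assert np.isclose(dctf(f, l, R, L, G, C, Zs=zst, Zl=zst),
                  dctf_matched(f, l, R, L, G, C))
```

For a lossy line, \(|H(f,l)|<1\), and the attenuation grows with both \(f\) and \(l\), as expected from (7).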

Remark 3. Everywhere, the fact that a value \({\{\cdot\}}\) is unknown and so is to be estimated is indicated by the notation \(\mathring{\{\cdot\}}\) with the overscript \(\mathring{\hphantom{T}}\). When moving later to the solution, we change the marking of the estimated parameters from the true \(\mathring{\{\cdot\}}\) to the commonly used \(\hat{\{\cdot\}}\).

Directly from equations (1) and/or Fig. 1 (a), the following expression
\[ \begin{equation} \mathring{T}_\mathrm{s}(s) = \frac{\mathring{c}_0}{s^2 + \mathring{a}_1 s + \mathring{a}_0} \end{equation} \tag{9} \]
is obtained with the following three parameters
\[ \begin{equation} \left. \begin{aligned} \mathring{c}_0 &\triangleq 1/(L_\mathrm{s}C_\mathrm{s}), & [\mathring{c}_0]&={\mathrm{s^{-2}}},\\ \mathring{a}_0 &\triangleq (R_\mathrm{s}G_\mathrm{s} + 1)/(L_\mathrm{s}C_\mathrm{s}) = \omega^2_{\mathrm n}, & [\mathring{a}_0]&={\mathrm{s^{-2}}},\\ \mathring{a}_1 &\triangleq \frac{R_\mathrm{s}}{L_\mathrm{s}} + \frac{G_\mathrm{s}}{C_\mathrm{s}} = 2\zeta\omega_{\mathrm n}, & [\mathring{a}_1]&={\mathrm{s^{-1}}}. \end{aligned} \, \right\} \end{equation} \tag{10} \] 
Here are some intermediate values 
\[ \begin{equation} \left. \begin{aligned} \omega_{\mathrm n} &= \sqrt{\mathring{a}_0}, & [\omega_{\mathrm n}]&={\mathrm{s^{-1}}},\\ \zeta &= \frac{\mathring{a}_1} {2\sqrt{\rule{0ex}{1.7ex}\mathring{a}_0}}, & [\zeta]&=1,\\ D &= \mathring{a}_{0} - {\mathring{a}_1^2}/{4} =\omega_{\mathrm n}^2(1-\zeta^2), & [D]&={\mathrm{s^{-2}}},\\ \chi &= \frac{\mathring{a}_1}{2\sqrt{D}} = \frac{\zeta}{\sqrt{1- \zeta^2}}, & [\chi]&=1 \end{aligned} \, \right\} \end{equation} \tag{11} \]
introduced through the basic parameters (10) for further convenience, after checking that \(\zeta^2 < 1\) for the secondary cable parameters obtainable from the literature [17, Table 2.1].
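The mappings (10) and (11) can be sketched in a few lines. The section values fed in at the bottom are assumed for illustration only and do not come from [17]; the function name is likewise hypothetical:

```python
import math

def section_parameters(Rs, Ls, Gs, Cs):
    """Basic TF parameters (10) and intermediate values (11) of one section."""
    c0 = 1.0 / (Ls * Cs)
    a0 = (Rs * Gs + 1.0) / (Ls * Cs)           # = omega_n ** 2
    a1 = Rs / Ls + Gs / Cs                     # = 2 * zeta * omega_n
    omega_n = math.sqrt(a0)
    zeta = a1 / (2.0 * math.sqrt(a0))
    assert zeta ** 2 < 1.0, "underdamped section expected (zeta^2 < 1)"
    D = a0 - a1 ** 2 / 4.0                     # = omega_n**2 * (1 - zeta**2)
    chi = a1 / (2.0 * math.sqrt(D))            # = zeta / sqrt(1 - zeta**2)
    return c0, a0, a1, omega_n, zeta, D, chi

# Assumed (illustrative) section values, not taken from [17, Table 2.1]
c0, a0, a1, omega_n, zeta, D, chi = section_parameters(
    Rs=0.28, Ls=0.6e-6, Gs=1e-12, Cs=50e-12)
```

For realistic section values the damping ratio \(\zeta\) comes out far below one, so the check \(\zeta^2<1\) holds with a wide margin.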

3. Problem statements

Given the specific case A with TF (9), good-quality estimates are required for parameters (10) of the numerator and denominator of this TF. In the most general case B, given an LTI `black box' as an \(n\)th order ordinary differential equation (\(n\)th order ODE), good-quality estimates are required for the numerator and denominator parameters of the corresponding TF. The solution is sought for cases A and B below.

A. Illustrative example (9).

The DSL environment is a multi-user transmission environment enabling a Central Office and the Customer Premises Equipment (CPE) to communicate in the downstream (from the CO to the different users) or in upstream (opposite) directions. The CO and the locally distributed CPEs are connected via twisted pair lines, each line belonging to one user. The twisted pairs are physically close to each other because they are bundled in a cable binder. Electromagnetic coupling between lines results in mutual interferences at all modems operating within the same cable [25]. These interferences known as crosstalk channels must be mitigated or, better, canceled.

Of two different kinds of crosstalk, namely near-end crosstalk (NEXT) and far-end crosstalk (FEXT), the latter represents the largest performance limiter in the xDSL system. A variety of suggestions have been made to reduce the impact of FEXT.

Most DSL and discrete multi-tone transmission (DMT) scenarios use decomposition-based zero-forcing precoding (DBZF) to deal with FEXT. In DBZF, the transmit vector signal is pre-perturbed by the \([N \times N]\) precoder matrix \(\boldsymbol P\) defined for each tone, where \(N\) is the number of users and the number of tones may be in the thousands. For each tone, matrix \(\boldsymbol P\) is the inverse of the normalized (i.e. unit-diagonal) channel matrix: \(\boldsymbol P = \boldsymbol H^{-1}_{\mathrm{norm}}\). Formally, \(\boldsymbol H_{\mathrm{norm}}\) is the channel matrix \(\boldsymbol H\) pre-multiplied by matrix \(\boldsymbol H^{-1}_{\mathrm{diag}}\), the latter being a diagonal matrix composed of the inverse transfer coefficients of the direct channels [16; 17, pp. 34–35]. Hence, for downstream transmission with efficient precoding, i.e. full crosstalk cancellation, it is necessary to know the channel \([N \times N]\) matrix \(\boldsymbol H\), which consists of the DCTFs (on the diagonal) and the crosstalk channel transfer functions, CCTFs (off the diagonal).

For a very short section (see Fig. 1), the generally accepted model DCTF is given by formula (9). A large number of such DCTFs cascade to form the DCTF of the entire line. This is shown in Fig. 2 with the approximations \(\mathrm{d}{R}\triangleq {R}{\cdot}\mathrm{d}{l}\approx R_\mathrm{s}\), \(\mathrm{d}{L}\triangleq {L}{\cdot}\mathrm{d}{l}\approx L_\mathrm{s}\), \(\mathrm{d}{G}\triangleq {G}{\cdot}\mathrm{d}{l}\approx G_\mathrm{s}\), and \(\mathrm{d}{C}\triangleq {C}{\cdot}\mathrm{d}{l}\approx C_\mathrm{s}\), \(i_{out} = i_{in} + \mathrm{d}{i_{in}}\) and \(v_{out} = v_{in} + \mathrm{d}{v_{in}}\), given \(\Delta l\approx \mathrm{d}{l}\). Therefore, the resulting DCTF will be of a higher order, while remaining a proper fractional-rational function of \(s\). As for the CCTF, solutions for its modeling include various approaches that have a solid physics basis but high computational complexity [17, p. 28]. Nevertheless, the adopted CCTF model does not escape the proper fraction form we move to now.

Figure 2. Equivalent lumped RLCG-circuit of a 2-wire transmission line

B. General case

In the most general form, the transfer function of a channel to the \(j\)th output from the \(i\)th input is defined as follows:
\[ \begin{equation} \mathring{T}_{ji}(s) =\displaystyle{\frac{\mathring{c}_m s^m+\mathring{c}_{m-1}s^{m-1}+\dots+\mathring{c}_1 s+\mathring{c}_0}{s^n+\mathring{a}_{n-1}s^{n-1}+\dots+\mathring{a}_1 s+\mathring{a}_0}} \end{equation} \tag{12} \]
where \(m+n+1 < 2n + 1\) parameters may be unknown. With (12), a DSL system is thought of as a MIMO—specifically, \([N\times N]\)—system, for which the crosstalk is modeled as an input rather than noise and the acronym MIMO stands for Multiple-Input Multiple-Output (Fig. 3).

Figure 3. The distributed MIMO channel estimation structure. Legend: CE – Channel Estimation; SCO – System Central Office; SI – System Information; CI – Channel Information; CPE – Customer Premises Equipment; \(N\) – the number of customers, \(j=1,2,\ldots,N\)

Thus, the \(i\)th input \(U_i (s)\) causes a direct response \(z_{i}\) on the \((j=i)\)th output and creates crosstalk contributions \(z_{j}\) on all other, \((j\ne i)\)th outputs, plus an external noise \(V_j(s)\) in every \(j\)th channel:
\[ \begin{equation} Y_j(s) = \sum_{i=1}^{N} \mathring{T}_{ji}(s) U_i (s) + V_j(s), \: j=1,2,\ldots, N. \end{equation} \tag{13} \]

The fact that we are seeking to solve the inverse problem of recovering (12) from (13) dictates the only possible identification scenario (cf. Fig. 3): feed only one input, namely the \(i\)th input \(U_i (s)\), per single (\(i\)th) identification session into the MIMO system: 
\[ \begin{equation}Y_j(s) = \mathring{T}_{ji}(s) U_i (s) + V_j(s), \: j=1,2,\ldots, N. \end{equation} \tag{14} \]
(As for the uppercase variable notations in (13) and (14), cf. Remark 2.) The time of each \(i\)th session (14) is spent determining the \(i\)th column \({\mathring{T}}(s)_{(\cdot,i)}= [ \mathring{T}_{ji}(s) ] \), \(j=1,2,\ldots, N\) of matrix \({\mathring{T}}(s) \triangleq [ \mathring{T}_{ji}(s) ]\), and the whole scenario will require repeating \(N\) sessions: \(i=1,2, \ldots, N\) as in (14).

4. Problem solution framework

Focusing the research on the class of linear constant-coefficient ODEs to describe the wide range of LTI dynamical systems, we first state that the choice of the excitation signal \(u_i(t)\) (cf. Fig. 3) is extremely important for parametric system identification. Gaussian white-noise random excitations \(w(t)\) are very popular among practitioners because they seem simple to design. We also stick to this choice, assuming \(u_i(t)\equiv w_i(t)\). However, using random-phase multisines for \(u_i(t)\) is also possible, provided the amplitude spectrum of the multisine is designed such that the equivalence between the random-phase multisine and the Gaussian random noise concerning the system behavior is guaranteed. Such signals are known as Riemann-Equivalent Excitation Signals, REESs [2, p. 44]. Using random input excitations makes the system output under study a stochastic process.

4.1. Cauchy form ODE system

Since there is no uniformity in the structure of matrices for the general Cauchy form, and little else can be said about this form without additional knowledge of the particular dynamical system, we assume that the output, i.e. measurement, data are generated by a completely observable physical system whose observability index is designated \(p\). Hence, we focus attention on the SOF among the three known standard system forms [26, pp. 28–32]. The SOF provides a sort of unified approach to TFs of the general form (12), not just (9). Besides, using the SOF is beneficial to the solution below.

Given (9), using the notation mentioned in Remark 3 yields the following system of equations
\[ \begin{equation} \begin{aligned} \frac{d}{dt} \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array} \right] & = \underbrace{\left[\begin{array}{c@{\quad}c} 0 & 1\\ -\mathring{a}_{0} & -\mathring{a}_{1} \end{array} \right]}_{\mathring{F}} \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array} \right] + \underbrace{\left[ \begin{array}{c} 0 \\ \mathring{c}_{0} \end{array} \right]}_{\mathring{\Gamma}} w(t) \end{aligned} \end{equation} \tag{15} \] 
with \(w(t)\) as a stationary input voltage \(v_1(t)\) (cf. Fig. 1). Let \(w(t)\) be REES, that is Riemann-equivalent to the Gaussian white-noise excitation with the correlation function \(R_{ww}(\tau) = \mathring{Q}\delta(\tau)\) in terms of Dirac's delta function \(\delta(\tau)\) with some \(\mathring{Q} > 0\), \([\mathring{Q}] = \) V\(^2\cdot\)s where \(\mathring{Q}\) is possibly given. Next, assume that the output voltage \(v_2(t)\equiv x_1(t)\) is measured with random error \(v(t)\) of Gaussian type with correlation function \(R_{vv}(\tau) = \mathring{R}\delta(\tau)\), \(\mathring{R}>0\), \([\mathring{R}] = \) V\(^2\cdot\)s, to obtain the measurement data as
\[ \begin{equation} \begin{aligned} y(t) & = \underbrace{\left[\begin{array}{c@{\quad}c} 1 & 0 \end{array} \right]}_{\mathring{H}} \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array} \right] + v(t). \end{aligned} \end{equation} \tag{16} \]
SOF model (15)\(+\)(16) corresponds to conditions (i) and (ii) of the experiment described above. Its characteristic polynomial \(\mathring{q}(s) \triangleq {s^2 + \mathring{a}_1 s + \mathring{a}_0} \) has the discriminant \(-D<0\). Besides, \(\mathring{T}_\mathrm{s}(s) = \mathring{H}\mathring{\Phi}_\mathrm{s}(s)\mathring{\Gamma}\) with \(\mathring{\Phi}_\mathrm{s}(s) = (Is - \mathring{F})^{-1}\), in matrix notation. The inverse Laplace transform of \(\mathring{\Phi}_\mathrm{s}(s)\) yields the continuous-time state transition matrix
\[ \begin{equation} \begin{aligned} \mathring{\phi}(t) & = {\left[\begin{array}{c@{\quad}c} \phi_{11}(t) & \phi_{12}(t)\\ \phi_{21}(t) & \phi_{22}(t) \end{array} \right]} \end{aligned} \end{equation} \tag{17} \]
with its entries
\[ \begin{equation} \left. \begin{aligned} \phi_{11}(t) &= e^{-\zeta\omega_{\mathrm n}t}\left[ {\cos}(t\sqrt{D}) + \chi{\sin}(t\sqrt{D}) \right], \\ \phi_{12}(t) &= e^{-\zeta\omega_{\mathrm n}t}\frac{1}{\sqrt{D}}{\sin}(t\sqrt{D}), \\ \phi_{21}(t) &= - \omega_{\mathrm n}^2\phi_{12}(t), \\ \phi_{22}(t) &= e^{-\zeta\omega_{\mathrm n}t}\left[ {\cos}(t\sqrt{D}) - \chi{\sin}(t\sqrt{D}) \, \right]. \end{aligned}\right\} \end{equation} \tag{18} \]
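The entries (18) represent the matrix exponential \(e^{\mathring{F}t}\) of the companion matrix in (15) and can be verified numerically. The sketch below is an illustration with arbitrary underdamped test values and an assumed function name; it checks \(\mathring{\phi}(0)=I\), the semigroup property, and \(\dot{\mathring{\phi}}(0)=\mathring{F}\), which together characterize \(e^{\mathring{F}t}\):

```python
import numpy as np

def phi_closed_form(t, a0, a1):
    """State transition matrix (17)-(18) of SOF model (15), underdamped case."""
    omega_n = np.sqrt(a0)
    zeta = a1 / (2.0 * omega_n)
    D = a0 - a1 ** 2 / 4.0                       # > 0 since zeta^2 < 1
    sD = np.sqrt(D)
    chi = a1 / (2.0 * sD)
    e = np.exp(-zeta * omega_n * t)
    c, s = np.cos(t * sD), np.sin(t * sD)
    return e * np.array([[c + chi * s,            s / sD],
                         [-omega_n ** 2 * s / sD, c - chi * s]])

a0, a1 = 4.0, 1.0                                # arbitrary underdamped values
F = np.array([[0.0, 1.0], [-a0, -a1]])
# phi(0) = I, the semigroup property, and d(phi)/dt|_0 = F identify e^{Ft}
assert np.allclose(phi_closed_form(0.0, a0, a1), np.eye(2))
assert np.allclose(phi_closed_form(0.5, a0, a1),
                   phi_closed_form(0.2, a0, a1) @ phi_closed_form(0.3, a0, a1))
h = 1e-6
dphi0 = (phi_closed_form(h, a0, a1) - phi_closed_form(-h, a0, a1)) / (2.0 * h)
assert np.allclose(dphi0, F, atol=1e-6)
```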

Given (12), it leads to the general SOF
 \[ \begin{equation} \left. \begin{aligned} \left[ \begin{array}{c} \dot x_1\\ \dot x_2\\ \vdots\\ \dot x_{n-1}\\ \dot x_n \end{array} \right] = & \underbrace{\left[ \begin{array}{cccc} 0 & 1 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -\mathring{a}_0 & -\mathring{a}_1 & \cdots & -\mathring{a}_{n-1}\\ \end{array} \right]}_{\mathring{F}_{ji}} \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_{n-1}\\ x_n \end{array} \right]+ \underbrace{\left[ \begin{array}{c} \mathring{b}_1\\ \mathring{b}_2\\ \vdots\\ \mathring{b}_{n-1}\\ \mathring{b}_n \end{array}\right]}_{\mathring{\Gamma}_{ji}} w(t),\\ y(t)= & \underbrace{\left[ \begin{array}{ccccc} 1 & 0 & \cdots & 0 & 0 \end{array} \right]}_{\mathring{H}}x(t) + v(t) \end{aligned}\right\} \end{equation} \]
instead of (15)\(+\)(16), where \(\mathring{b}_1,\mathring{b}_2, \ldots, \mathring{b}_n\) satisfy the following equation
\[ \begin{equation} \left[ \begin{array}{c} 0\\ \vdots\\ 0\\ \mathring{c}_m\\ \vdots\\ \mathring{c}_0 \end{array} \right]= \left[ \begin{array}{cccccc} 1 & 0 & 0 & \cdots & 0 & 0\\ \mathring{a}_{n-1} & 1 & 0 & \cdots & 0 & 0\\ \mathring{a}_{n-2} & \mathring{a}_{n-1} & 1 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ \mathring{a}_2 & \mathring{a}_3 & \cdots & \mathring{a}_{n-1} & 1 & 0 \\ \mathring{a}_1 & \mathring{a}_2 & \cdots & \mathring{a}_{n-2} & \mathring{a}_{n-1} & 1\\ \end{array} \right] \left[ \begin{array}{c} \mathring{b}_1\\ \mathring{b}_2\\ \mathring{b}_3\\ \vdots\\ \mathring{b}_{n-1}\\ \mathring{b}_n \end{array} \right]\end{equation} \]
and \({\mathring{F}_{ji}}\) is the Frobenius companion matrix for the characteristic polynomial \(\mathring{q}(s) \triangleq {s^n+\mathring{a}_{n-1}s^{n-1}+\dots+\mathring{a}_1s+\mathring{a}_0}\) of (12). Acting as before (17) yields the check relation \(\mathring{T}_{ji}(s) = \mathring{H}\mathring{\Phi}_{ji}(s)\mathring{\Gamma}_{ji}\), in which \(\mathring{\Phi}_{ji}(s) = (Is - \mathring{F}_{ji} )^{-1}\) serves to find \(\mathring{\phi}_{ji}(t)\) as the inverse Laplace transform of \(\mathring{\Phi}_{ji}(s)\), quite similar to (17). Check: the \(\mathring{T}_{ji}(s)\) thus found must coincide with (12).
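Since the matrix of the linear system above is unit-lower-triangular and Toeplitz, \(\mathring{b}_1,\ldots,\mathring{b}_n\) follow by forward substitution. A minimal sketch (function name assumed for illustration) is given below; the second-order check reproduces \(\mathring{\Gamma}=[0,\ \mathring{c}_0]^{\mathsf T}\) of (15):

```python
import numpy as np

def sof_input_vector(a, c):
    """Solve the unit-lower-triangular Toeplitz system for b_1, ..., b_n.

    a -- denominator coefficients [a_0, ..., a_{n-1}] of (12),
    c -- numerator coefficients  [c_0, ..., c_m] of (12), with m < n.
    """
    n, m = len(a), len(c) - 1
    # Right-hand side: n - m - 1 leading zeros, then c_m, ..., c_0
    rhs = np.concatenate([np.zeros(n - m - 1), np.asarray(c, float)[::-1]])
    # Unit-lower-triangular Toeplitz matrix: a_{n-1} on the first subdiagonal,
    # a_{n-2} on the second, and so on (a_0 does not enter the matrix)
    Lmat = np.eye(n)
    for k in range(1, n):
        for i in range(k, n):
            Lmat[i, i - k] = a[n - k]
    return np.linalg.solve(Lmat, rhs)

# Second-order check against (15): n = 2, m = 0 gives b = [0, c_0]
b = sof_input_vector(a=[2.0, 3.0], c=[5.0])
assert np.allclose(b, [0.0, 5.0])
```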

Remark 4. What will be done in the next Subsec. 4.2 and Subsec. 4.3 based on the preceding Subsec. 4.1 for the illustrative example given in (9) can be repeated similarly for the general case given by (12), furnishing the results with the subscript \(_{ji}\). We omit these details and the \(_{ji}\) subscripts due to the obviousness of the technique. We also omit the subscript \(_{\mathrm{s}}\), as is done at the transition to (17).

4.2. Discrete-time model (DTM)

Quantities \(\{\cdot\}\) belonging to the discrete-time model are indicated below by the subscript \(_{\mathrm{d}}\), as in \(\{\cdot\}_{\mathrm{d}}\). Before making the change, it is necessary to choose the sampling interval \(T\) reasonably. Obviously, the sampling rate \(1/T\) must be much higher than the natural frequency \(f_{\mathrm n} \triangleq 1/ T_{\mathrm n} = \omega_{\mathrm n}/(2\pi)\) of the system in order to track the system behavior; this requirement means \(T f_{\mathrm n} \ll 1\). On the other hand, \(\zeta^2\), cf. (11), must remain less than one for the present consideration to hold. As a result, the requirement \(T \ll T_{\mathrm n}\) in the transition to the discrete-time model means that the parameter
\[ \begin{equation} \fbox{\(d\triangleq e^{-\zeta\omega_{\mathrm n}T}\)} \end{equation} \tag{19} \]
appearing in (18) at \(t=T\) must lie within the sufficiently wide boundaries of the inequality \(e^{-2\pi} \ll d < 1\), that is, it must be less than one, though possibly only insignificantly so.
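As a tiny numeric illustration of the bound on \(d\) in (19), with invented values of \(f_{\mathrm n}\) and \(\zeta\):

```python
import math

# Hypothetical numbers: natural frequency f_n = 1 kHz, damping zeta = 0.4.
f_n = 1.0e3
omega_n = 2 * math.pi * f_n
zeta = 0.4

# Choose T well below the natural period T_n = 1/f_n, e.g. T = T_n / 20.
T = 1.0 / f_n / 20
d = math.exp(-zeta * omega_n * T)      # parameter (19)

# d must satisfy e^{-2*pi} << d < 1: close to, but noticeably below, one.
assert math.exp(-2 * math.pi) < d < 1.0
```

With these numbers \(d \approx 0.88\): less than one, but not insignificantly so, exactly as the text prescribes.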

Given (9) and its continuous-time model (15)\(+\)(16), the DTM
\[ \begin{equation} \left.\begin{aligned} x(t_{i+1}) & = \mathring{\Phi}_{\mathrm{d}} x(t_{i}) + w_{\mathrm{d}}(t_{i}), \;\; \mathring{\Phi}_{\mathrm{d}} \triangleq \mathring{\phi} (T), \\ y(t_{i}) & = \underbrace{\left[\begin{array}{c@{\quad}c} 1 & 0 \end{array} \right]}_{\mathring{H}} x(t_{i}) + v_{\mathrm{d}}(t_{i}) \end{aligned} \, \right\} \end{equation} \tag{20} \]
is obtained by the standard method [26]. Here it is checked that (20) is observable with the observability index \(p=n=2\) and has physical dimensionalities \([x_1(t_{i})] =\) V, \([x_2(t_{i})] =\) V\(\cdot\)s\(^{-1}\). The discrete white noise \(w_{\mathrm{d}}(t_{i})\) in (20) is a zero-mean process
\[ \begin{equation*} w_{\mathrm{d}}(t_{i}) \triangleq \int\nolimits_{t_i}^{t_{i+1}} \mathring{\phi}(t_{i+1} - \tau) \mathring{\Gamma} {\mathrm{d}}\beta(\tau) \end{equation*} \]
defined via the Brownian motion \(\beta(t)\), related formally to \(w(t)\) through its differential \(\mathrm{d}\beta(\tau) \triangleq w(\tau)\,\mathrm{d}\tau\). The covariance \([2\times 2]\)-matrix of \(w_{\mathrm{d}}(t_{i})\) is [26]
\[ \begin{equation} \mathring{Q}_{\mathrm{d}} \triangleq \int\nolimits_{t_i}^{t_{i+1}} \mathring{\phi}(t_{i+1} - \tau) \mathring{\Gamma} \mathring{Q} \mathring{\Gamma}^{ T} \mathring{\phi}^{ T}(t_{i+1} - \tau) {\mathrm{d}}\tau. \end{equation} \tag{21} \]
For the illustrative example, the four entries of (21) are calculated directly using (10) and (11); the result is
\[ \begin{equation} \left.\begin{aligned} q_{11} & = \frac{\mathring{c}_0^2\mathring{Q}}{4\zeta\omega_{\mathrm n}^3} \bigl[ 1 - \bigl( d^2 + 2\chi \sqrt{D} \phi_{11}(T) \phi_{12}(T) \bigr) \bigr] ,\\ q_{12} & = \frac{\mathring{c}_0^2\mathring{Q}}{2} \phi_{12}^2(T) = q_{21},\\ q_{22} & = \frac{\mathring{c}_0^2\mathring{Q}}{4\zeta\omega_{\mathrm n}} \bigl[ 1 - \bigl( d^2 - 2\chi \sqrt{D} \phi_{12}(T) \phi_{22}(T) \bigr) \bigr] . \end{aligned} \, \right\} \end{equation} \tag{22} \]

Remark 5. Calculations like (22) are technically trivial, so they are omitted here, although they may seem complicated if done manually. Manual work can be avoided by using Maple to obtain the result quickly, easily, and accurately. Additionally, although dimensional analysis does not guarantee the correctness of a result, it can serve as an auxiliary check, as in this case: \([q_{11}]=\) V\(^2\), \([q_{12}]=\) V\(^2\cdot\)s\(^{-1}\), \([q_{22}]=\) V\(^2\cdot\)s\(^{-2}\).
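As an alternative to symbolic work in Maple, the covariance (21) can be evaluated numerically. The sketch below uses Van Loan's matrix-exponential method, which is this illustration's own choice rather than part of the paper's derivation; the second-order system and all numbers are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical second-order example: F companion, Gamma = [0, c0]^T as in
# (15)+(16) with b1 = 0; all numeric values are invented for illustration.
zeta, omega_n, c0, Q, T = 0.4, 2.0 * np.pi, 1.5, 2.0, 0.05
F = np.array([[0.0, 1.0],
              [-omega_n**2, -2.0 * zeta * omega_n]])
G = np.array([[0.0], [c0]])

# Van Loan's method: exponentiate the block matrix [[-F, G Q G^T],[0, F^T]]*T;
# Phi_d is the transposed lower-right block and Q_d = Phi_d @ (upper-right).
n = F.shape[0]
M = np.zeros((2 * n, 2 * n))
M[:n, :n] = -F
M[:n, n:] = G @ (Q * G.T)
M[n:, n:] = F.T
E = expm(M * T)
Phi_d = E[n:, n:].T
Q_d = Phi_d @ E[:n, n:]

# Cross-check against a brute-force trapezoidal quadrature of (21).
taus = np.linspace(0.0, T, 2001)
vals = [expm(F * (T - t)) @ G @ (Q * G.T) @ expm(F * (T - t)).T for t in taus]
dt = taus[1] - taus[0]
Q_num = dt * (sum(vals) - 0.5 * vals[0] - 0.5 * vals[-1])
assert np.allclose(Q_d, Q_num, atol=1e-6)
```

Such a numerical route also provides an independent check of closed-form entries like (22).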

To finalize formulating DTM, it is worth going from \(w_{\mathrm{d}}(t_{i})\) to a dimensionless vector quantity \(\xi_{\mathrm{d}}(t_{i})\) for which \(w_{\mathrm{d}}(t_{i}) \triangleq \mathring{L}_{\mathrm{d}} \xi_{\mathrm{d}}(t_{i})\) with a matrix \(\mathring{L}_{\mathrm{d}}\) such that \(\mathring{Q}_{\mathrm{d}} = \mathring{L}_{\mathrm{d}} {{\mathring{L}}_{\mathrm{d}}}^{\rm T}\) by the lower triangular Cholesky decomposition [27, p. 40]. From  (22),
\[ \begin{equation} \begin{aligned} l_{11} & = \sqrt{q_{11}} ,\quad[l_{11}]= \mathrm{V}, \\ l_{21} & = q_{12} / l_{11},\quad[l_{21}]= \mathrm{V}\cdot\mathrm{s}^{-1}, \\ l_{22} & = \sqrt{q_{22} - {l_{21}}^2} ,\quad [l_{22}]= \mathrm{V}\cdot\mathrm{s}^{-1} \end{aligned} \end{equation} \tag{23} \]
are the three non-zero real-valued entries of the \([2\times 2]\)-matrix \(\mathring{L}_{\mathrm{d}}\). The model (20) now takes the final form
 \[ \begin{equation} \left.\begin{aligned} x(t_{i+1}) & = \mathring{\Phi}_{\mathrm{d}} x(t_{i}) + \mathring{L}_{\mathrm{d}}\xi_{\mathrm{d}}(t_{i}), \;\; \mathring{\Phi}_{\mathrm{d}} \triangleq \mathring{\phi} (T), \\ y(t_{i}) & = \underbrace{\left[\begin{array}{c@{\quad}c} 1 & 0 \end{array} \right]}_{\mathring{H}} x(t_{i}) + v_{\mathrm{d}}(t_{i}). \end{aligned} \, \right\} \end{equation} \tag{24} \]
As a result, the discrete white sequence \(\xi_{\mathrm{d}}(t_{i})\) in (24) has the unit covariance matrix, and the measurement discrete white sequence \(v_{\mathrm{d}}(t_{i})\) may have some unknown covariance \(\mathring{R}_{\mathrm{d}}>0\). Vector \(\mathring{\theta} \triangleq \bigl[ \mathring{\theta}_1 \triangleq\mathring{c}_0 \bigm| \mathring{\theta}_2 \triangleq\mathring{a}_0 \bigm| \mathring{\theta}_3 \triangleq\mathring{a}_1 \bigm| \mathring{\theta}_4 \triangleq\mathring{Q} \bigm| \mathring{\theta}_5 \triangleq\mathring{R}_{\mathrm{d}} \bigr]^{\mathrm T}\) collects the true parameters, and vector \(\hat{\theta} \triangleq \bigl[ \hat{\theta}_1 \triangleq\hat{c}_0 \bigm| \hat{\theta}_2 \triangleq\hat{a}_0 \bigm| \hat{\theta}_3 \triangleq\hat{a}_1 \bigm| \hat{\theta}_4 \triangleq\hat{Q} \bigm| \hat{\theta}_5 \triangleq \hat{R}_{\mathrm{d}} \bigr]^{\mathrm T}\) will be its estimator. The explicit dependence of the matrices in (24) on \(\mathring{\theta}\) can easily be traced from the above formulas.
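The entries (23) are simply the \(2\times 2\) lower-triangular Cholesky factor written out; a quick check against the library factorization, with invented numbers:

```python
import numpy as np

# Hypothetical SPD [2x2] covariance Q_d; its entries per (23), lower Cholesky.
q11, q12, q22 = 4.0, 1.0, 2.0
Qd = np.array([[q11, q12],
               [q12, q22]])

l11 = np.sqrt(q11)                 # first entry of (23)
l21 = q12 / l11
l22 = np.sqrt(q22 - l21**2)
Ld = np.array([[l11, 0.0],
               [l21, l22]])

# Must agree with the library factorization and reproduce Q_d = Ld Ld^T.
assert np.allclose(Ld, np.linalg.cholesky(Qd))
assert np.allclose(Ld @ Ld.T, Qd)
```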

4.3. Standard Observable Discrete-time Model (SODM)

Turning back to the general solution of the problem, case B, let us introduce
\[ \mathring{M} \triangleq \bigl[ \mathring{H}^{\mathrm T} \bigm| (\mathring{H}\mathring{\Phi}_{\mathrm{d}})^{\mathrm T} \bigm| \cdots \bigm| (\mathring{H}\mathring{\Phi}_{\mathrm{d}}^{n-1})^{\mathrm T} \bigr]^{\mathrm T}, \]
the observability matrix for a linear, completely observable, \(n\)-dimensional DTM. It is invertible, as the observability index \(p\) is supposed to equal \(n\). Performing a nonsingular basis transform in the state space by the relation \(x^{\star} \triangleq \mathring{M}x\), we obtain the Standard Observable Discrete-time Model, SODM, with \(\mathring{\Phi}_{\star} \triangleq \mathring{M}\mathring{\Phi}_{\mathrm{d}}\mathring{M}^{-1}\), \(\mathring{H}_{\star} \triangleq \mathring{H}\mathring{M}^{-1}\), and \(\mathring{L}_{\star} \triangleq \mathring{M}\mathring{L}_{\mathrm{d}}\). Let us note that we use \(\{{\star}\}\) as a superscript or subscript for any quantity \(\{\cdot\}\) belonging to the SODM, keeping in mind that the transfer function does not change under a nonsingular change of basis. For this general case, notice Remark 4. For the specific case of (20), (24), we obtain the following SODM:
\[ \begin{equation} \left. \begin{aligned} x^{\star}(t_{i+1}) & = \mathring{\Phi}_{\star} x^{\star}(t_{i}) + \mathring{L}_{\star}\xi_{\mathrm{d}}(t_{i}),\\ y(t_{i}) & = \mathring{H}_{\star}x^{\star}(t_{i}) + v_{\mathrm{d}}(t_{i}),\\ \mathring{\Phi}_{\star} & \triangleq \mathring{M}\mathring{\Phi}_{\mathrm{d}}\mathring{M}^{-1} = {\left[\begin{array}{c@{\qquad}c} 0 & 1 \\ -d^2 & 2d\cos(T\sqrt{D}) \end{array} \right]},\\ \mathring{H}_{\star} & \triangleq \mathring{H}\mathring{M}^{-1} = {\left[\begin{array}{c@{\qquad}c} 1 & 0 \end{array} \right]},\\ \mathring{L}_{\star} & \triangleq \mathring{M}\mathring{L}_{\mathrm{d}} = {\left[\begin{array}{c@{\qquad}c} l_{11} & 0 \\ l_{11}\phi_{11} + l_{21}\phi_{12} & l_{22}\phi_{12} \end{array} \right]},\\ \phi_{11} & \triangleq \phi_{11}(T),\; \phi_{12} \triangleq \phi_{12}(T),\; \phi_{22} \triangleq \phi_{22}(T). \end{aligned} \right\} \end{equation} \tag{25} \]
Every SODM thus obtained has matrix \(\mathring{\Phi}_{\star}\) in the form of the Frobenius companion matrix, and matrix \(\mathring{H}_{\star}\) with its first element equal to 1 and the rest to zeros.
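The basis change of this subsection can be sketched numerically for a hypothetical observable pair (all matrix entries below are invented):

```python
import numpy as np

# Sketch of the basis change to the SODM for a hypothetical observable pair.
Phi_d = np.array([[0.9, 0.1],
                  [-0.2, 0.8]])
H = np.array([[1.0, 0.0]])
L_d = np.array([[0.5, 0.0],
                [0.3, 0.4]])

# Observability matrix M = [H; H Phi_d] (p = n = 2), must be invertible.
M = np.vstack([H, H @ Phi_d])
assert np.linalg.matrix_rank(M) == 2

Minv = np.linalg.inv(M)
Phi_star = M @ Phi_d @ Minv
H_star = H @ Minv
L_star = M @ L_d

# Phi_star is a Frobenius companion matrix (first row [0 1]), H_star = [1 0];
# the eigenvalues, hence the transfer function, are unchanged.
assert np.allclose(H_star, [[1.0, 0.0]])
assert np.allclose(Phi_star[0], [0.0, 1.0], atol=1e-12)
assert np.allclose(np.sort_complex(np.linalg.eigvals(Phi_star)),
                   np.sort_complex(np.linalg.eigvals(Phi_d)))
```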

4.4. OKF as the Target Model (OKF-TM)

Using the above technique culminated in (25) and taking into account  Remark 4, one obtains the following unique Optimal Kalman Filter--Target Model, OKF-TM, in the SODM basis for the general case arising from (12):
\[ \begin{equation} \left.\begin{aligned} \hat x^{\star}(t_{i+1}|t_i) &= \mathring\Phi_{\star} \hat x^{\star}(t_{i}|t_i)\,, \\ \hat x^{\star}(t_{i}|t_i) &= \hat x^{\star}(t_{i}|t_{i-1}) + \mathring K_{\star} \nu(t_{i}|t_{i-1})\,, \\ \end{aligned}\, \right\} \end{equation} \tag{26} \]
together with
 \[ \begin{equation} \left.\begin{aligned} y(t_{i}) &= \mathring{H}_{\star} \hat x^{\star}(t_{i}|t_{i-1}) + \nu(t_{i}|t_{i-1}), \\ \nu(t_{i}|t_{i-1}) &\triangleq y(t_{i}) - \mathring{H}_{\star} \hat x^{\star}(t_{i}|t_{i-1}) ~ \text{is defined as} \\ & \quad\,\, \textit{Innovation Sequence}, \text{IS}, \\ \mathring{K}_{\star} &= \mathring{P}_{\star}^{-} \mathring{H}_{\star}^{\rm T} \bigl( \mathring{H}_{\star} \mathring{P}_{\star}^{-} \mathring{H}_{\star}^{\rm T} + \mathring {R}_{\mathrm{d}} \bigr)^{-1},\\ \mathring{P}_{\star}^{-}&= \mathring{\Phi}_{\star} \bigl[ \mathring{P}_{\star}^{-} - \mathring{K}_{\star} \! \mathring{H}_{\star} \mathring{P}_{\star}^{-} \bigr] \mathring{\Phi}_{\star}^{\rm T} + \mathring{L}_{\star}\mathring{L}_{\star}^{\rm T}. \end{aligned}\, \right\} \end{equation} \tag{27} \]
We aim for a parametric identification of the steady-state OKF-TM (26)\(+\)(27). In this filter, \(\nu(t_{i}|t_{i-1})\) is a white Gaussian sequence (WGS), and the last two equations in (27) form a Discrete-time Algebraic Riccati Equation, DARE. Note that the IS behaves like a WGS because \(\mathring{\theta}\) in (26)\(+\)(27) is assumed to be the true, albeit unknown, real parameter vector of some dimension \(q\): \({\mathring{\theta}}\in {\mathbb{R}}^q\).

Thus, algorithm (26)\(+\)(27) is the set of steady-state Kalman filter equations optimal for the true parameter \(\mathring{\theta}\). It is written under the practically unrealizable assumption that \(\mathring{\theta}\) is known and that steady-state operation of this algorithm has been achieved by a theoretically assumed numerically stable DARE solution.
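In practice, a steady-state DARE solution of the kind assumed here can be obtained with a library routine. SciPy's `solve_discrete_are` is stated in control (dual) form, so the filter-form DARE of (27) requires passing the transposed pair; all numbers in this sketch are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical stable SODM of the (25)-(27) kind; numbers are invented.
d = 0.88
Phi = np.array([[0.0, 1.0],
                [-d**2, 2.0 * d * np.cos(0.3)]])   # companion, |eigs| = d < 1
H = np.array([[1.0, 0.0]])
L = np.array([[0.2, 0.0],
              [0.1, 0.15]])
R = np.array([[0.01]])

# SciPy solves the control-form DARE, so pass the transposed pair (Phi^T, H^T)
# to obtain the prediction covariance P^- of the filter-form DARE in (27).
P = solve_discrete_are(Phi.T, H.T, L @ L.T, R)
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # steady-state gain

# P must reproduce the last line of (27) and be symmetric positive definite.
P_next = Phi @ (P - K @ H @ P) @ Phi.T + L @ L.T
assert np.allclose(P, P_next, atol=1e-7)
assert np.all(np.linalg.eigvalsh(P) > 0)
```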

Remark 6. The preceding contains the correct characterization of \(\nu(t_{i}|t_{i-1})\) provided that the mathematical model on which the filter (26)\(+\)(27) is based accurately represents the real behavior of the system.

Thus, we can imagine—and consider this representation fairly reasonable and therefore acceptable—that the observed output \(y(t_{i})\), provided in fact by the real system, is as if it were generated, quite conventionally, by the target model, that is, by the optimal Kalman filter, as presented in the first line of (27).

4.5. Concurrent Candidate Models (CCM)

Since \(\mathring{\theta}\) is unknown, one can only use its estimated value \(\hat{\theta}\), which lies in some space \(\Theta\). Where the object of interest exists with no specific constraints, the real-valued space \(\Theta \equiv \mathbb{R}^q\) is formed by all possible values \(\hat\theta[j]\) of the estimator vector \(\hat \theta\). Here and below, \(j\) denotes the order number of the value \(\hat{\theta}\) in some scanning trajectory over the space \(\Theta\): \(\hat\theta[j] \in \Theta\), \(j=0,1,2,\dots, J_{\mathrm{total}}\), i.e. over an imaginary set of Concurrent Candidate Models, CCM. The CCM set plays the role of a set of Machine Learning models, if one prefers Machine Learning terminology. When implementing a numerical iterative filter optimization method capable of sequentially converging to the OKF-TM (26)\(+\)(27), \(j\) has the meaning of the method's step number, since it is common to test suboptimal models sequentially up to their total number \(J_{\mathrm{total}}\), even if we admit, quite theoretically, the possibility of testing them in parallel (i.e. synchronously). It is important that in both variants, sequential or parallel, of identifying the target model (26)\(+\)(27), it is possible and even expedient to base the work on the same observational data \(y(t_{i})\), supplied (conditionally, as said in Remark 6) by the target model (26)\(+\)(27), processing and analyzing the responses to these data of the suboptimal models under test as candidates for the role of the target, that is, optimal, model.

Assuming that \(\hat{\theta}\) has taken a particular \(\hat\theta[j]\) value in \(\Theta\), imagine that instead of optimal Kalman filter (26)\(+\)(27), we have managed to implement a suboptimal steady-state Kalman filter we refer to as \(j\)th  Standard Observable Kalman Filter, the \(j\)th SOKF, or \(\operatorname{SOKF}(\hat\theta[j])\), for short. The latter is the \(j\)th candidate model
\[ \begin{equation} \left.\begin{aligned} g_j^{\star}(t_{i+1}|t_i) &= \hat \Phi_{\star_j} g_j^{\star}(t_{i}|t_i)\,, \\ g_j^{\star}(t_{i}|t_i) &= g_j^{\star}(t_{i}|t_{i-1}) + \hat K_{\star_j} \eta_j(t_{i}|t_{i-1})\,, \\ y(t_{i}) &= \mathring{H}_{\star} g_j^{\star}(t_{i}|t_{i-1}) + \eta_j(t_{i}|t_{i-1}), \\ \eta_j(t_{i}|t_{i-1}) &\triangleq y(t_{i}) - \mathring{H}_{\star} g_j^{\star}(t_{i}|t_{i-1}) ~ \text{is defined as} \\ & \quad\,\, \text{the $j$th}~\textit{Residual Sequence},~\text{RS$_j$}, \\ \hat {K}_{\star_j} &= \hat {P}_{\star_j}^{-} \mathring{H}_{\star}^{\rm T} \bigl( \mathring{H}_{\star} \hat {P}_{\star_j}^{-} \mathring{H}_{\star}^{\rm T} + \hat{R}_{\mathrm{d}{j}} \bigr)^{-1}\!\!\!\!\!\!,\\ \hat {P}_{\star_j}^{-}&= \hat {\Phi}_{\star_j} \bigl[ \hat {P}_{\star_j}^{-} - \hat {K}_{\star_j} \! \mathring{H}_{\star} \hat {P}_{\star_j}^{-} \bigr] \hat {\Phi}_{\star_j}^{\rm T} + \hat {L}_{\star_j}\hat {L}_{\star_j}^{\rm T}\, \end{aligned}\, \right\} \end{equation} \tag{28} \]
with \(\hat \Phi_{\star_j}\triangleq \hat \Phi_{\star}(\hat\theta[j])\), \(\hat{R}_{\mathrm{d}{j}}\triangleq \hat {R}_{\mathrm{d}} (\hat\theta[j]) \), and \(\hat L_{\star_j}\triangleq \hat L_{\star}(\hat\theta[j])\). The model is intended to participate in testing in order to come as close as possible to the optimal filter (26)\(+\)(27), provided that the target model is also in the CCM set.

However, what does it mean that `we have managed to implement (28)'? In a real-case scenario, this means that when testing candidate models sequentially, i.e. \(\operatorname{SOKF}(\hat\theta[j])\) after \(\operatorname{SOKF}(\hat\theta[j-1])\), \(\operatorname{SOKF}(\hat\theta[j+1])\) after \(\operatorname{SOKF}(\hat\theta[j])\), and so on, we must solve the \(\operatorname{DARE}\), i.e. the last two equations in (28), at each such step. Doing this job iteratively for each \(\operatorname{SOKF}(\hat\theta[j])\), we introduce the local notation \((i)\) for the iteration number, \((i) = (0),(1),\dots, (I_{\mathrm{DARE}})\), where \(I_{\mathrm{DARE}}\) denotes the final iteration number, and compute as follows, labeling the computed quantities with \(j\):
\[ \begin{equation} \left.\begin{aligned} \hat {P}_{\star_j {(0)}}^{-} = \hat {P}_{\star_{j-1} {(I_{\mathrm{DARE}}+1)}}^{-} \end{aligned}\, \right. \end{equation} \tag{29} \]
and then
\[ \begin{equation} \left.\begin{aligned} \hat {K}_{\star_j {(i)}} &= \hat {P}_{\star_j {(i)}}^{-} \mathring{H}_{\star}^{\rm T} \bigl( \mathring{H}_{\star} \hat {P}_{\star_j {(i)}}^{-} \mathring{H}_{\star}^{\rm T} + \hat{R}_{\mathrm{d}{j}} \bigr)^{-1}\!\!\!,\\ \hat {P}_{\star_j {(i+1)}}^{-} &= \hat \Phi_{\star_j} \bigl[ \hat {P}_{\star_j {(i)}}^{-} - \hat {K}_{\star_j {(i)}} \! \mathring{H}_{\star} \hat {P}_{\star_j {(i)}}^{-} \bigr] \hat \Phi_{\star_j}^{\rm T} +{}\\ &{}\quad\; \hat {L}_{\star_j}\hat {L}_{\star_j}^{\rm T}\,,\quad (i) = (0),(1),\dots, (I_{\mathrm{DARE}}). \end{aligned}\, \right. \end{equation} \tag{30} \]
The final value \(\hat {K}_{\star_j {(I_{\mathrm{DARE}})}}\) should be used as \(\hat {K}_{\star_j}\) in the second equation of (28), that is, \(\hat {K}_{\star_j}:= \hat {K}_{\star_j {(I_{\mathrm{DARE}})}}\), and the final value \(\hat {P}_{\star_j {(I_{\mathrm{DARE}}+1)}}^{-}\) as the starting point \(\hat {P}_{\star_{j+1} {(0)}}^{-} = \hat {P}_{\star_{j} {(I_{\mathrm{DARE}}+1)}}^{-}\), in the manner of (29) but now for \(\operatorname{SOKF}(\hat\theta[j+1])\) at the \((j+1)\)th optimization step, if any, over the CCM set. `Real-case scenario' means that these Riccati iterations should be stopped at \(I_{\mathrm{DARE}}\) when a reasonable convergence criterion is satisfied. It also means that by the final iteration \((i) = (I_{\mathrm{DARE}})\), the so-iterated filter (28) is assumed to have reached the desired steady-state operation defined by equations (28).

Iterations (30) may and should be performed on an accelerated time scale using known numerically robust algorithms, e.g. [28], for each value of \(j\), in other words, at each \(j\)th step of the numerical approximation to the optimum, that is, to the target algorithm (26)\(+\)(27). This numerical optimization should be performed by a single AKF scanning sequentially the elements of the theoretically unbounded CCM set.
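A sketch of the warm-started iterations of the (29)\(+\)(30) kind with a simple Frobenius-norm stopping rule playing the role of the `reasonable convergence criterion'; all numbers are illustrative.

```python
import numpy as np

# Hypothetical stable candidate filter model (invented numbers).
Phi = np.array([[0.0, 1.0],
                [-0.77, 1.68]])
H = np.array([[1.0, 0.0]])
L = np.array([[0.2, 0.0],
              [0.1, 0.15]])
R = np.array([[0.01]])

P = np.eye(2)                          # warm start, cf. (29)
for _ in range(10000):                 # Riccati iterations, cf. (30)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    P_next = Phi @ (P - K @ H @ P) @ Phi.T + L @ L.T
    done = np.linalg.norm(P_next - P, 'fro') < 1e-12   # stopping rule
    P = P_next
    if done:
        break

# At convergence, P solves the DARE in (28); K is the steady-state gain.
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
resid = Phi @ (P - K @ H @ P) @ Phi.T + L @ L.T - P
assert np.linalg.norm(resid, 'fro') < 1e-9
```

The converged `P` would then seed the next candidate model's iterations, per the warm-start rule described above.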

4.6. Predictors to form the AKF

We supplement the \(j\)th candidate model (28) with the predictors and make them operate as follows:
\[ \begin{equation} \left.\begin{aligned} g_j^{\star}(t_{i+h}|t_i) &\triangleq \hat \Phi_{\star_j} g_j^{\star}(t_{i+h-1}|t_i)\, \\ \hat{y}_{j}(t_{i+h}|t_i) &\triangleq \mathring{H}_{\star} g_j^{\star}(t_{i+h}|t_i)\, \end{aligned} \right\} \; h= {1,2,\dots,p} \end{equation} \tag{31} \]
where \(p\) is the total observability index of the system.

Remark 7. In an \(n\)-dimensional system with \(m\) outputs, each \(i\)th output of the \(m\) outputs can be assigned a partial observability index \(p_i\). The sum of the partial observability indices equals the dimensionality \(n\) of the system, provided the system is completely observable. The total observability index \(p\) of the system is defined as the greatest of the partial indices. The case \(p < n\) is possible only if \(m > 1\). In the problem under consideration, \(m=1\); therefore \(p=n\) everywhere in what follows. Nevertheless, we distinguish between the notations \(p\) and \(n\), intending in further work to extend the solution to the case where the number \(m\) of system outputs exceeds one. Only then may \(p\) be less than \(n\).

The fact that \(\hat{y}_{j}(t_{i+h}|t_i)\) in (31) and beyond depends on \({\hat\theta[j]}\) can also be denoted by the subscript \(_{\hat\theta[j]}\), bearing in mind the equivalence of the two possible notations: \(\hat{y}_{j}(t_{i+h}|t_i)\equiv \hat{y}_{\hat\theta[j]}(t_{i+h}|t_i), \; h= {1,2,\dots,p}\). In (32), which follows for the case of (20)\(+\)(25) when \(p = 2\), the first line comes from (31), while the second comes from the first three lines of the target expressions (26)\(+\)(27):
\[ \begin{equation} \left.\begin{aligned} \begin{bmatrix} \hat y_{j}{(t_{i+1}|t_i)} \\ \hat y_{j}{(t_{i+2}|t_i)} \end{bmatrix} &= \begin{bmatrix} \mathring{H}_{\star} \\ \mathring{H}_{\star}\hat \Phi_{\star_j} \end{bmatrix} g_j^{\star}(t_{i+1}|t_i), \\ \begin{bmatrix} y{(t_{i+1})} \\ y{(t_{i+2})} \end{bmatrix} &= \begin{bmatrix} \mathring{H}_{\star} \\ \mathring{H}_{\star}\mathring\Phi_{\star} \end{bmatrix} \hat x^{\star}(t_{i+1}|t_i) {\,} + \begin{bmatrix} 1&&0\\ \mathring{H}_{\star}\mathring\Phi_{\star}\mathring K_{\star}&&1 \end{bmatrix} \begin{bmatrix} \nu{(t_{i+1}|t_{i})}\\ \nu{(t_{i+2}|t_{i+1})} \end{bmatrix}. \end{aligned}\!\! \right. \end{equation} \tag{32} \]
The composite (stackable) vectors opening expressions (32) in the specific case \(p=2\) are to be redefined when turning to the general case. Their definitions follow using notation \(p\) for the total observability index:
\[ \begin{equation} \left.\begin{aligned} \hat y_{\hat\theta[j]}{\left( t_{i+1}^{i+p}|t_i \right)} &\triangleq \begin{bmatrix} \hat y_{j}(t_{i+1}|t_i) \bigm| \cdots \bigm| \hat y_{j}(t_{i+p}|t_i) \end{bmatrix}^{\rm T}\! , \\ y{\left( t_{i+1}^{i+p} \right)} &\triangleq \begin{bmatrix} y(t_{i+1}) \bigm| \cdots \bigm| y(t_{i+p}) \end{bmatrix}^{\rm T}\! , \\ t_{i+1}^{i+p} &\triangleq \left( t_{i+1}, t_{i+2}, \ldots, t_{i+p} \right) . \end{aligned} \right. \end{equation} \tag{33} \]
For the case of a single-output, completely observable, linear \(n\)-dimensional DTM, we have \(p=n\). Thus, we obtain the advantages of changing to the standard observable form, viz.,
\[ \begin{equation} \left.\begin{aligned} \bigl[ \begin{matrix} \mathring{H}_{\star}&( \mathring{H}_{\star}\hat \Phi_{\star_j} )& \cdots &( \mathring{H}_{\star}\hat \Phi_{\star_j}^{n-1} ) \end{matrix}\bigr]^{\rm T} &= I , \\ \bigl[ \begin{matrix} \mathring{H}_{\star}&( \mathring{H}_{\star}\mathring\Phi_{\star} )& \cdots & ( \mathring{H}_{\star}\mathring\Phi_{\star}^{n-1} ) \end{matrix}\bigr]^{\rm T} &= I. \end{aligned} \right. \end{equation} \tag{34} \]
Equations (34) are true regardless of \(j\) and the non-trivial entries in the Frobenius matrices \(\mathring\Phi_{\star}\) as defined in (25)  for (26) and \(\hat \Phi_{\star_j}\) as commented for (28).
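A quick numeric sanity check of (34) for a hypothetical companion pair (entries invented):

```python
import numpy as np

# In the SODM basis the stacked observability matrix is the identity,
# regardless of the companion matrix's non-trivial entries.
Phi_star = np.array([[0.0, 1.0],
                     [-0.6, 1.4]])
H_star = np.array([[1.0, 0.0]])

Obs = np.vstack([H_star, H_star @ Phi_star])   # n = 2 stacked rows
assert np.allclose(Obs, np.eye(2))
```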

 4.7. The generalized residual (GR)

What follows is the general case, using \(p=n\) in the key relations (34), resulting from computing these composite (stackable) vectors:
\[ \begin{equation} \left.\begin{aligned} & \hat y_{\hat\theta[j]}{\bigl( t_{i+1}^{i+p}|t_i \bigr)} = I \cdot g_j^{\star}(t_{i+1}|t_i),\\ & { \, \,\,} y{\bigl( t_{i+1}^{i+p} \bigr)} = I \cdot {\hat x}^{\star}{(t_{i+1}|t_i)} { } +\\ & {} \underbrace{\begin{bmatrix} 1&0&\cdots&0\\ \mathring{H}_{\star}\mathring\Phi_{\star}\mathring K_{\star}&1&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ \mathring{H}_{\star}\mathring\Phi^{p-1}_{\star}\mathring K_{\star} &\mathring{H}_{\star}\mathring\Phi^{p-2}_{\star}\mathring K_{\star} &\cdots &1 \end{bmatrix} \begin{bmatrix} \nu{(t_{i+1}|t_{i})}\\ \nu{(t_{i+2}|t_{i+1})}\\ \vdots\\ \nu{(t_{i+p}|t_{i+p-1})} \end{bmatrix}.}_{\displaystyle \triangleq \delta \left[ \nu_{(t_{i+1}|t_{i})}^{(t_{i+p}|t_{i+p-1})} \right] (\mathring{\theta}) } \end{aligned}\; \right\}\end{equation} \tag{35} \] 

Expressions (35), obtained at intervals equal to the system's observability index \(p\), show that the discrepancy between the system outputs \(y{\bigl( t_{i+1}^{i+p} \bigr)}\) and the corresponding predicted data \(\hat y_{\hat\theta[j]}{\bigl( t_{i+1}^{i+p}|t_i \bigr)}\), both expressed in SODM terms, contains a valuable but explicitly unavailable mismatch between the state \({\hat x}^{\star}{(t_{i+1}|t_i)}\) of the optimal filter (26)\(+\)(27), which is latently present in the discretely observed system output in response to the learning excitation \(w(t)\), and the state \(g_j^{\star}(t_{i+1}|t_i)\) computed at the \(j\)th iteration of \(\operatorname{SOKF} (\hat\theta [j])\) (28)\(+\)(31).

Remark 8. The mismatch of the optimal filter (26)\(+\)(27) compared to the suboptimal filter (28)\(+\)(31) is the difference between the object state estimates given by the optimal and suboptimal filters.

Naming the difference between the second and first lines in (33), or equally in (35), the Generalized Residual, GR, calculated as the \((n\times 1)\) vector process
\[ \begin{equation} \operatorname{ GR:} \quad \varepsilon_{\hat\theta[j]}^{\star}{\bigl( t_{i+1}^{i+p}|t_i \bigr)} \triangleq y{\bigl( t_{i+1}^{i+p} \bigr)} - \hat y_{\hat\theta[j]}{\bigl( t_{i+1}^{i+p}|t_i \bigr)} \in \mathbb{R}^n \end{equation} \tag{36} \]
and introducing the notion of adaptive filter state estimation error, or better to say, the concept of Adaptive vs. Optimal Filter State Estimation Mismatch,
\[ \begin{equation} \operatorname{ AOFSEM:} \quad e_{\hat\theta[j]}^{\star}{(t_{i+1}|t_i)} \triangleq \hat x^{\star}{(t_{i+1}|t_i)} - g_j^{\star}(t_{i+1}|t_i) \in \mathbb{R}^n \end{equation} \tag{37} \]
yields the key result:

Theorem 1. Let the GR be calculated as in (36) and the AOFSEM, which does not have a computer-manipulable representation, be defined as in (37). Based on the fact that the Direct Performance Index
\[ \begin{equation} DPI \triangleq {\cal J}^{DPI}_{t_{i+1}}(\hat\theta) \triangleq \mathop{{ \bf E}\left\{ { \bigl\| e_{\hat\theta[j]}^{\star}{(t_{i+1}|t_i)} \bigr\| }^{2} \right\}}\nolimits \in \mathbb{R}^1, \end{equation} \tag{38} \]
or in other words, Expected Direct Cost Function, EDCF, is not explicitly available to optimize the suboptimal filter (28)\(+\)(31), we introduce the Indirect Performance Index 
\[ \begin{equation} IPI \triangleq {\cal J}^{IPI}_{t_{i+p}}(\hat\theta) \triangleq \mathop{{ \bf E}\left\{ { \bigl\| \varepsilon_{\hat\theta[j]}^{\star}{\bigl( t_{i+1}^{i+p}|t_i \bigr)} \bigr\| }^{2} \right\}}\nolimits \in \mathbb{R}^1, \end{equation} \tag{39} \]
or in formal words, the Expected Indirect Cost Function, EICF. Then minimizing the IPI (39) by any numerical optimization method in \(\hat\theta \equiv{\hat\theta[j]} \in\Theta\) at each discrete time \({t_{i+p}}\) is equivalent to minimizing the DPI (38) in \(\hat\theta \equiv{\hat\theta[j]}\in\Theta\) at time \({t_{i+1}}\):
\[ \begin{equation} \fbox{\(\bigl\{ \min_{\hat{\theta}} {\cal J}^{IPI}_{t_{i+p}}(\hat{\theta}) \bigr\} \iff \bigl\{ \min_{\hat{\theta}} {\cal J}^{ DPI}_{t_{i+1}}(\hat{\theta}) = 0 \bigr\}. \)} \end{equation} \tag{40} \]

Proof. Given definitions (36) and (37), relations (35) show that
\[ \varepsilon_{\hat\theta[j]}^{\star}{\bigl( t_{i+1}^{i+p}|t_i \bigr)} = e_{\hat\theta[j]}^{\star}{(t_{i+1}|t_i)} + \delta \left[ \nu_{(t_{i+1}|t_{i})}^{(t_{i+p}|t_{i+p-1})} \right] (\mathring{\theta}) \]
with \(\delta \left[ \nu_{(t_{i+1}|t_{i})}^{(t_{i+p}|t_{i+p-1})} \right] (\mathring{\theta})\) defined in (35) being, first, independent of the estimated value \(\hat\theta \equiv{\hat\theta[j]} \in\Theta\) and, second, uncorrelated with error \(e_{\hat\theta[j]}^{\star}{(t_{i+1}|t_i)}\) (37) since the stackable vector \(\left[ \nu_{(t_{i+1}|t_{i})}^{(t_{i+p}|t_{i+p-1})} \right]\) formed by the white-noise IS in (26)\(+\)(27) is separated by one sample time interval \(T\) from all preceding IS values that determine error \(e_{\hat\theta[j]}^{\star}{(t_{i+1}|t_i)}\) (37). It is this circumstance, together with the theoretical fact that IS has the properties of a white-noise sequence, that entails statement \( \fbox{ \( IPI =DPI + Const_{\hat\theta}  \)  }  \), where \( Const_{\hat\theta} \) equals \( \mathop{{ \bf E}\left\{ \left\| \delta \left[ \nu_{(t_{i+1}|t_{i})}^{(t_{i+p}|t_{i+p-1})} \right] (\mathring{\theta}) \right\| ^2 \right\}}\nolimits \), a value independent of \(\hat\theta \equiv{\hat\theta[j]} \in\Theta\), which definitively proves statement (40). It is also easy to verify that
\[ \begin{equation} \hat y_{\hat\theta[j]}{\bigl( t_{i+1}^{i+p}|t_i \bigr)} = g_{j}^{\star}{\left( t_{i+1} | t_{i} \right)} \end{equation} \tag{41} \]
by virtue of the predictors (31) and first line in (34). \(\square\)
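The decomposition \(\varepsilon = e + \delta[\nu]\) underlying the proof holds not only in the mean square but pathwise, sample by sample. A simulation sketch for \(p=n=2\) (all matrices and noise levels are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable SODM "truth" and a detuned candidate (invented numbers).
Phi   = np.array([[0.0, 1.0], [-0.77, 1.68]])   # true Phi_star
Phi_j = np.array([[0.0, 1.0], [-0.70, 1.60]])   # candidate Phi_star(theta[j])
H = np.array([[1.0, 0.0]])
L = np.array([[0.2, 0.0], [0.1, 0.15]])
R = 0.01

def steady_gain(Phi):
    # steady-state prediction covariance and gain by Riccati fixed point
    P = np.eye(2)
    for _ in range(2000):
        K = P @ H.T / (H @ P @ H.T + R).item()
        P = Phi @ (P - K @ H @ P) @ Phi.T + L @ L.T
    return K

K, K_j = steady_gain(Phi), steady_gain(Phi_j)

# Simulate the system, the OKF (26)+(27), and the SOKF (28) in lockstep.
x = np.zeros((2, 1)); xh = np.zeros((2, 1)); g = np.zeros((2, 1))
ys, xhs, nus, gs = [], [], [], []
for _ in range(120):
    y = (H @ x).item() + np.sqrt(R) * rng.standard_normal()
    nu = y - (H @ xh).item()          # OKF innovation
    eta = y - (H @ g).item()          # SOKF residual
    ys.append(y); xhs.append(xh); nus.append(nu); gs.append(g)
    xh = Phi @ (xh + K * nu)
    g = Phi_j @ (g + K_j * eta)
    x = Phi @ x + L @ rng.standard_normal((2, 1))

# Pathwise check of (35)-(37) at an interior index: GR = AOFSEM + delta[nu].
i = 60
eps = np.array([[ys[i]], [ys[i + 1]]]) - gs[i]        # GR (36): y-stack - g
e = xhs[i] - gs[i]                                    # AOFSEM (37)
hpk = (H @ Phi @ K).item()
delta = np.array([[nus[i]], [hpk * nus[i] + nus[i + 1]]])
assert np.allclose(eps, e + delta, atol=1e-9)
```

Because the stacked observability matrices (34) are identities, the stacked prediction equals \(g_j^{\star}(t_{i+1}|t_i)\) exactly, which is what makes the pathwise check so direct.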

Remark 9. The data \(y{\bigl( t_{i+1}^{i+p} \bigr)}\) defined in (33) and used in (36) do not depend on \(\hat{\theta}[j]\); hence these measurement data can be collected and stored in computer memory before running the method's algorithm, which analyzes any \(\operatorname{SOKF} (\hat\theta[j])\) in terms of the closeness of its state (behavior) to that of the target optimal filter and the ability to further diminish the mean square discrepancy between these states.

5. Method implementation challenges

An attempt to implement the given solution with its advantages poses several challenges concerning the organization of computation time, calculation sequence, and numerical stability. Let us briefly discuss these challenges.

5.1. Computation time organization

As noted above, it is possible and even expedient to calculate parameter estimates in the accelerated off-line mode, that is, after the accumulation of measurement data in a database. We come to the contents of the database having defined the function to be minimized as follows.

By shifting IPI (39) back by \(p\) time points, we determine the Expected Indirect Objective Function, EIOF,
\[ \begin{equation} f(\hat\theta)\triangleq \mathop{{ \bf E}\left\{ { \bigl\| \varepsilon_{\hat\theta[j]}^{\star}{ ( t_{i-p+1}^{i}|t_{i-p} )} \bigr\| }^{2} \right\}}\nolimits \end{equation} \tag{42} \]
to be minimized in \(\hat\theta \in \mathbb{R}^q\). For practical work, we have to turn to the Averaged Indirect Objective Function, AIOF (43)
\[ \begin{equation} \overline{IPI} \triangleq {\cal J}^{\overline{IPI }}_{t_{i}\equiv t_{i}^{c}} (\hat{\theta}[j] ) \triangleq \frac{1}{M+1} \sum_{k=0}^{M} \bigl\| \varepsilon_{\hat\theta[j]}^{\star(k)}{ ( t_{i-p+1}^{i}|t_{i-p} )} \bigr\| ^{2} \triangleq f_{M}(\hat{\theta} = \hat{\theta}[j]) \end{equation} \tag{43} \]
considered as a real-valued function \(f_{M}(\hat{\theta} = \hat{\theta}[j])\) to be minimized in parameter \(\hat{\theta}\in \mathbb{R}^q\) instead of (42), the latter being the shifted back EICF (39).
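The criterion (43) is simply a sample mean of squared GR norms over \(M+1\) stored sample paths; a minimal sketch with placeholder residual data standing in for the GR vectors:

```python
import numpy as np

# Placeholder GR sample paths eps^{(k)}, k = 0..M, each of length p;
# in the method these would come from the MDB, not from a random generator.
rng = np.random.default_rng(1)
M, p = 9, 2
eps_paths = rng.standard_normal((M + 1, p))

f_M = np.mean(np.sum(eps_paths**2, axis=1))   # the average (43)
assert f_M >= 0.0
assert np.isclose(f_M, sum(np.linalg.norm(e)**2 for e in eps_paths) / (M + 1))
```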

Remark 10. The use of the upper index \(^{(k)}\), \(k=\overline{0,M}\triangleq 0,1,\ldots, M\), from (43) onward as the sample number is especially justified when averaging over \(M+1\) sample paths of the process, if necessary. Otherwise, it is sufficient to let \(M=0\) and so dispense with \(^{(k)}\). \(M\) may be as large a positive integer as desired, for better averaging. We relate \(t_{i}^{c}\), the time at which the computer starts to process the data, to the real time \(t_{i}\) by which the data are ready.

As seen from (43), each \((k)\)th path \(\varepsilon_{\hat\theta[j]}^{\star(k)}{\bigl( t_{i-p+1}^{i}|t_{i-p} \bigr)}\) must result from a one-to-one time mapping—a purely formal, computer-time representation used in the algorithmic computations—of the interval \({ t_{i-p+1}^{i}|t_{i-p} }\) to the real-time \((k)\)th segment
\[ \begin{equation} \left.\begin{aligned} t_{i-p+1}^{i(k)} & \triangleq \{ t_{i-(k+1)p+1}, t_{i-(k+1)p+2}, \ldots, t_{i-(k+1)p+p} \} \triangleq \left. t^{i-kp}_{i-(k+1)p+1} \right.,\\ k & = 0,1,\ldots,M \end{aligned}\quad\right\} \end{equation} \tag{44} \]
of a set of points. We imagine the entire record of length \((M+1)p\) of all the data
\[ y ( t_{i-(M+1)p + 1}^{i} ) \triangleq \bigl\{ y(t_{i-(M+1)p+1}), y(t_{i-(M+1)p+2}), \cdots, y(t_{i-1}), y(t_{i}) \bigr\} \]
in the Measurement Data Base, MDB, as composed of \((M+1)\) portions (45)
\[ \begin{equation} y^{(k)}{\left( t_{i-p+1}^{i} \right)} \triangleq y{\bigl( t_{i-(k+1)p+1}^{i-kp} \bigr)}\,, \quad k = 0, 1, \ldots, M \end{equation} \tag{45} \]
obtained in real time but referenced to the computer-time stackable \(p\)-vectors
\[ \begin{equation} \left.\begin{aligned} y^{(k)}{\left( t_{i-p+1}^{i} \right)} & \triangleq \bigl[ y^{(k)}\left( t_{i-p+1} \right) \bigm| y^{(k)}\left( t_{i-p+2} \right) \bigm| \cdots \bigm| y^{(k)} ( t_{i} ) \bigr]^{\mathrm{T}} \,, \quad \\ k & = 0,1,\ldots,M\,. \end{aligned}\quad\right\}  \end{equation} \tag{46} \] 

Remark 11. The idea of notation (46), explained with regard to the notation for the \((k)\)th sample path \(\varepsilon_{\hat\theta[j]}^{\star(k)}{ ( t_{i-p+1}^{i}|t_{i-p} )}\) by relating the interval \({ t_{i-p+1}^{i}|t_{i-p} }\)—used for the algorithmic computations in computer time—to the real-time segment (44), should become clear below when applied to other quantities as well. One can always see how real-time data, such as (45), relate to the same data, such as (46), when the latter are stored in the MDB for further computer processing. Additionally, the upper index \(^{(k)}\) indicates that the \({(k)}\)th sample data are considered as being in the MDB.

Given (35), (36), and (37), let us write down all \((k)\)-sampled time-shifted values
\[ \begin{equation} \begin{aligned} \varepsilon_{\hat\theta[j]}^{\star(k)}{\left( t_{i-p+1}^{i}|t_{i-p} \right)} \!&\triangleq y^{(k)}{\left( t_{i-p+1}^{i} \right)} \! - \hat y_{\hat\theta[j]}^{(k)}{\left( t_{i-p+1}^{i}|t_{i-p} \right)} \! \\ k \!&= 0,1,\ldots,M \end{aligned} \end{equation} \tag{47} \]
of the GR, (36), stored in the MDB to compute \(f_{M}(\hat{\theta})\) and, respectively, all
\[ \begin{equation} \begin{aligned} e_{\hat\theta[j]}^{\star(k)}{\left( t_{i-p+1} | t_{i-p} \right)} & \triangleq \hat x^{\star (k)}{\left( t_{i-p+1} | t_{i-p} \right)} - \, g_{j}^{\star(k)}{\left( t_{i-p+1} | t_{i-p} \right)} \\ k &= 0,1,\ldots,M \end{aligned} \end{equation} \tag{48} \]
AOFSEM, (37), to go from (43) to optimization algorithms.

According to Theorem 1, the \((k)\)th sample value (47) of the random vector (36), considered in the Mean Square, MS, sense, differs only by \( Const_{\hat\theta} \)—a quantity remaining constant during the numerical scanning, whether sequential or parallel, of the parameter space \(\Theta\)—from the \((k)\)th sample value (48) of the \(p\)-points delayed random discrepancy (37) between (A) the state \(\hat x^{\star (k)}{\left( t_{i-p+1} | t_{i-p} \right)}\) of the target optimal filter (26)\(+\)(27) and (B) the state \(g_{j}^{\star(k)}{\left( t_{i-p+1} | t_{i-p} \right)}\) of the suboptimal filter (28), that is, from the error committed by (B) when used as the \(j\)th estimator of (A) based on the \((k)\)th sample data (45)\(\equiv\)(46). The theorem also proves equation (41), which we use hereafter with the \(p\)-point delay for any \((k)\)th sample path, thus obtaining (49)
\[ \begin{equation} \hat y_{\hat\theta[j]}{\left( t_{i-p+1}^{i}|t_{i-p} \right)} = g_{j}^{\star}{\left( t_{i-p+1} | t_{i-p} \right)} . \end{equation} \tag{49} \]

Remark 12. The delay by \((k+1)p\) points of discrete time from the current moment \(t_{i}\equiv t^{c}_{i}\) in (45), and in (47) and (48) as well if we relate (47) and (48) to real time, is irrelevant because of the stationarity assumption for the system under study and the MS sense used.
Note in passing that, by virtue of Remark 7, the vectors in the \((k)\)th sample defined by expressions (45), (47), and (48) have dimension \(n\), owing to the equality \(p=n\) in the case of a scalar system output, \(m=1\).

Assuming that the parameter \(\hat{\theta}\) has no explicit constraints, unconstrained optimization is applicable. The most straightforward way of doing this gives rise to what is known as Newton's method (50) [29, pp. 44–79]:
\[ \begin{equation} \left.\begin{aligned} &\text{(a) } \text{solve } G(\hat{\theta}[j])\,\Delta = - \nabla_{\hat\theta} f_{M} \bigl(\hat{\theta} = \hat{\theta}[j]\bigr) \text{ for vector } \Delta\,; \\ &\text{(b) } \text{set } \hat\theta[j+1] = \hat\theta[j] + \Delta\bigl(\hat{\theta} = \hat{\theta}[j]\bigr) \text{ with } \Delta\bigl(\hat{\theta} = \hat{\theta}[j]\bigr) \triangleq \Delta\,. \end{aligned}\quad\right\} \end{equation} \tag{50} \]
Hereafter, gradient operator is used as
\[ \begin{equation} \nabla_{\hat \theta}(\diamond) \triangleq \left[ {\frac{\partial}{\partial \hat{\theta}_1}}(\diamond) \Bigm| \cdots \Bigm| {\frac{\partial}{\partial \hat{\theta}_r}}(\diamond) \Bigm| \cdots \Bigm| {\frac{\partial}{\partial \hat{\theta}_q}}(\diamond) \right]^{ \rm T} \equiv \left[ \cdots {\frac{\partial}{\partial \hat{\theta}_r}}(\diamond) \cdots \right]^{\rm T}_{r=\overline{1,q}} \end{equation} \tag{51} \]
to be applied to each scalar element of a vector or matrix \((\diamond)\); if \((\diamond) = f_{M}\bigl(\hat{\theta} = \hat{\theta}[j]\bigr)\), we have (50) with the matrix of second partial derivatives \(G(\hat{\theta}[j])\), known as the Hessian matrix and defined by \(\nabla_{\hat \theta}^2 f_{M}^{\rm T} \bigl(\hat{\theta} = \hat{\theta}[j]\bigr) \triangleq \nabla_{\hat \theta}\bigl(\nabla_{\hat \theta} f_{M} \bigl(\hat{\theta} = \hat{\theta}[j]\bigr)^{ \rm T}\bigr)\). When \(M=0\), (50) is treated as a pure stochastic approximation of Newton's method.
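A minimal numerical sketch of one iteration of (50) may be helpful; the quadratic test objective and its closed-form gradient and Hessian are illustrative assumptions, not part of the identification problem itself:

```python
import numpy as np

def newton_step(theta, grad, hess):
    """One iteration of (50): (a) solve G(theta) @ delta = -grad f(theta)
    for delta; (b) theta_next = theta + delta.  grad and hess are
    user-supplied callables (here: closed-form for a quadratic test)."""
    delta = np.linalg.solve(hess(theta), -grad(theta))
    return theta + delta

# hypothetical quadratic objective f(th) = 0.5*(th - t*)^T A (th - t*),
# minimized at t*; Newton's method reaches t* in a single step
A = np.array([[4.0, 1.0], [1.0, 3.0]])
t_star = np.array([2.0, -1.0])
grad = lambda th: A @ (th - t_star)
hess = lambda th: A
theta = newton_step(np.zeros(2), grad, hess)
```

For a non-quadratic \(f_{M}\), the same step is repeated until a termination test is met.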

The following gradient descent optimization
\[ \begin{equation} \hat\theta[j+1] = \hat\theta[j] - \gamma_{j} \nabla_{\hat\theta} f_{M}\bigl(\hat{\theta} = \hat{\theta}[j]\bigr) \end{equation} \tag{52} \]
can be a reasonable alternative to (50) with a lower computational burden. It uses a small enough step size \(\gamma_{j} \in \mathbb{R} _{+}\) to guarantee \(f_{M}\bigl(\hat{\theta} = \hat{\theta}[j+1]\bigr) < f_{M}\bigl(\hat{\theta} = \hat{\theta}[j]\bigr)\) and thereby performs the transition to the next \(\operatorname{SOKF} (\hat\theta[j+1])\), as shown in Fig. 4 and mentioned in Subsec. 4.5, to test candidate models sequentially. In this figure, the arcs directed by arrows from points \(t_{i-p+1}\), \(t_{i-p+2}\), and \(t_{i}\) to \(t_{i-p}\) on the horizontal line tell us, cf. (31), that all predictors \(g_j^{\star(k)}(t_{i-p+h}|t_{i-p})\), \(h= 1,2,\dots,p\), involved in obtaining the estimates \(\hat y_{\hat\theta[j]}^{(k)}{\left( t_{i-p+h}|t_{i-p} \right)} = \mathring{H}_{\star} g_j^{\star(k)}(t_{i-p+h}|t_{i-p})\) are conditioned, in the probabilistic sense, on the entire measurement history \( y \bigl( t_{i-p-l}^{i-p} \bigr) \triangleq \bigl\{ y(t_{i-p-l}), y(t_{i-p-(l-1)}), \dots, y(t_{i-p-1}), y(t_{i-p}) \bigr\} \) (theoretically, \(l \to \infty\)), which is incorporated in the predictors and precedes their calculation immediately after time \(t_{i-p}\).
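The requirement that \(\gamma_j\) be small enough to guarantee descent can be met in practice by backtracking; the sketch below (objective and names hypothetical) halves \(\gamma_j\) until the objective decreases:

```python
import numpy as np

def gd_step(theta, f, grad, gamma0=1.0, shrink=0.5, max_halvings=30):
    """One iteration of (52) with a backtracking choice of gamma_j:
    the step is halved until f(theta_next) < f(theta)."""
    g = grad(theta)
    gamma = gamma0
    for _ in range(max_halvings):
        cand = theta - gamma * g
        if f(cand) < f(theta):
            return cand
        gamma *= shrink
    return theta  # no decrease found; caller may stop

# hypothetical smooth objective with minimizer (1, 3)
f = lambda th: (th[0] - 1.0) ** 2 + 10.0 * (th[1] - 3.0) ** 2
grad = lambda th: np.array([2.0 * (th[0] - 1.0), 20.0 * (th[1] - 3.0)])
theta = np.array([0.0, 0.0])
for _ in range(200):
    theta = gd_step(theta, f, grad)
```

Each accepted step corresponds to one transition \(\operatorname{SOKF}(\hat\theta[j]) \to \operatorname{SOKF}(\hat\theta[j+1])\) in Fig. 4.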

Figure 4. A timing diagram for minimizing the average (43) of the indirect performance index (39) by a gradient sequential method (52) using the AIOF as defined in (43)

Thus, upon a closer look at what the theory requires, we realize that it demands spending the time intervals (44), depicted in Fig. 4 by boxes as \(t_{i-p+1}^{i(k)}(\hat\theta[j])\), then \(t_{i-p+1}^{i(k)}(\hat\theta[j+1])\), and so on, as related to \(\operatorname{SOKF} (\hat\theta[j])\), then \(\operatorname{SOKF}(\hat\theta[j+1])\), and so on, during the AKF step-by-step accelerated optimization.

5.2. Computation scheme and numerical robustness challenge

Let us perform the necessary calculations for the above algorithm (52).

Using (49), we obtain
\[ \begin{equation} \nabla_{\hat\theta} \left( \bigl[ \varepsilon_{\hat\theta[j]}^{\star(k)}{ ( t_{i-p+1}^{i}|t_{i-p} )} \bigr]^{\mathrm{T}} \right) = - \nabla_{\hat\theta} \left( \bigl[ g_{j}^{\star(k)}{ ( t_{i-p+1} | t_{i-p} )} \bigr]^{\mathrm{T}} \right) \end{equation} \tag{53} \]
and then, taking (43) as \((\diamond)\) in (51), we arrive at (54), the Objective Function Gradient, OFG:
\[ \begin{equation} \left.\begin{aligned} \nabla_{\hat\theta} f_{M} & \bigl( \hat{\theta}=\hat{\theta} [j] \bigr)\\ & = \frac{2}{M+1} \sum_{k=0}^{M} \nabla_{\hat\theta} \left( \bigl[ \varepsilon_{\hat\theta[j]}^{\star(k)}{ ( t_{i-p+1}^{i}|t_{i-p} )} \bigr]^{\mathrm{T}} \right) \times \left[ \varepsilon_{\hat\theta[j]}^{\star(k)}{ ( t_{i-p+1}^{i}|t_{i-p} )} \right]\\ & = \frac{-2}{M+1} \sum_{k=0}^{M} \nabla_{\hat\theta} \left( \bigl[ g_{j}^{\star(k)}{ ( t_{i-p+1} | t_{i-p} )} \bigr]^{\mathrm{T}} \right) \times \left[ \varepsilon_{\hat\theta[j]}^{\star(k)}{ ( t_{i-p+1}^{i}|t_{i-p} )} \right]\\ \end{aligned}\quad \right\} \end{equation} \tag{54} \]
Checking whether the OFG norm has reached a small neighborhood of zero, that is, comparing it against a small threshold \(\delta\) for `greater than or equal to' (\(\geqslant\)) versus `less than' (\(<\)), is a convenient criterion for continuing or terminating the procedure:
\[ \begin{equation} \bigl\| \nabla_{\hat\theta} f_{M} \bigl( \hat{\theta}=\hat{\theta} [j] \bigr) \bigr\| \; \begin{cases} \geqslant \delta\,, & \text{continue}\,; \\ < \delta\,, & \text{stop}\,. \end{cases} \end{equation} \tag{55} \]
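To make the iteration loop concrete, the following sketch assembles one possible implementation of (52) with the termination test (55); the residual model is a hypothetical linear one, and a central finite difference stands in for the analytical sensitivity computations of (53)–(54):

```python
import numpy as np

def f_M(theta, residual, M):
    """Empirical objective, cf. (43): average of squared residual
    norms over the M+1 stored sample portions."""
    return sum(np.dot(r, r) for r in (residual(theta, k) for k in range(M + 1))) / (M + 1)

def ofg_fd(theta, residual, M, h=1e-6):
    """Central finite-difference stand-in for the analytical OFG (54)."""
    g = np.zeros_like(theta)
    for r in range(theta.size):
        e = np.zeros_like(theta); e[r] = h
        g[r] = (f_M(theta + e, residual, M) - f_M(theta - e, residual, M)) / (2 * h)
    return g

def identify(theta0, residual, M, gamma=0.1, delta=1e-6, max_iter=10_000):
    """Gradient descent (52) with the stopping rule (55): ||OFG|| < delta."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = ofg_fd(theta, residual, M)
        if np.linalg.norm(g) < delta:   # (55): stop
            break
        theta = theta - gamma * g       # (52): continue
    return theta

# hypothetical linear residual per portion: eps = y_k - X_k @ theta
rng = np.random.default_rng(0)
X = [rng.normal(size=(3, 2)) for _ in range(4)]   # M + 1 = 4 portions, p = 3
theta_true = np.array([0.7, -1.2])
Y = [Xk @ theta_true for Xk in X]                 # noise-free data for the sketch
residual = lambda th, k: Y[k] - X[k] @ th
theta_hat = identify(np.zeros(2), residual, M=3)
```

In a real project the finite-difference gradient would be replaced by the numerically robust sensitivity algorithms discussed below.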
The identification procedure will continue repeating operations numbered (Fig. 5) by blocks ②, ③, ④, ⑤, ⑥, and ⑦ with the same data stored in block ① after updating the parameter estimate in block ⑧:
\[ \begin{equation}\hat\theta[j] := \hat\theta[j+1]\, \end{equation} \tag{56} \]
or will stop with capturing the result in block ⑨:
\[ \begin{equation}\hat\theta_{\mathrm{FINAL}} := \hat\theta[j]\,. \end{equation} \tag{57} \]

Figure 5. Generalized Parametric Identification Scheme by the Active Principle. Legend: ① = (45)≡(46); ② = (31)→(29)→(30)→(31)→(49); ③ = (47); ④ = (53); ⑤ = (54); ⑥ = (55); ⑦ = (52); ⑧ = (56); ⑨ = (57)

The most important thing about these repeatable operations (cf. Fig. 5) is that the calculations in block ② (the AKF) and in block ④ (the AKF Sensitivity Model, an algorithm computing the values of partial derivatives of the state vector estimates and covariance matrix elements with respect to the \(q\)-vector parameter estimate \(\hat\theta[j]\)) must be numerically stable and robust with respect to ill-conditioned models. In this regard, it should be noted that practical projects using Kalman filtering, KF, since the very first one [30], have opened a wide field for research and development on endowing KF algorithms, including Riccati equations, with the required numerical stability and robustness properties. The fundamental ideas of Bierman [27] on matrix factorization served as a powerful impetus.

For the method of Active System Identification developed in this article and earlier works by the author, the greatest contribution from Russian scientists was made by Julia Tsyganova and Maria Kulikova in their dissertations [31, 32] and numerous publications, partly co-authored [33, 34] with the author of this synthesis paper. More references on the pioneering titles can be found in a recent survey [35]. There, one can find discussions and current developments on the efficient and robust computation of derivatives with respect to the parameters of discrete filter equations, including a set of vector-valued filter sensitivity equations and a set of matrix-valued Riccati-type sensitivity equations arising in implementing the (steepest) gradient descent method (52); the necessary experimental evidence is also available there. For prospective software projects, it is strongly recommended to base modern implementations of blocks ② and ④ (in Fig. 5) on the orthogonalized array methods for parametric identification of discrete linear stochastic systems [31].

5.3. Sequence of work for a software project

If there is an object of interest in the application domain, an appropriate mathematical model, considered an adequate description based on the laws of physics, must first be written for parametric identification of the TF. An example of such preprocessing operations is given in Sect. 2. It is highly expedient to perform the further steps of the method not manually but in automated mode using the Maple software, according to the guidelines in Remark 5. The calculation procedure recommended for identifying the TF of an object by the method outlined above is shown graphically in Fig. 6. This diagram can be useful when creating a specialized software tool for similar problems, should such a project arise. Application-specific (AS) calculations for the problem taken in this article as an illustrative example are dictated by the following intermediate quantities:
① parameters \(\omega_{\mathrm n}\), \(\zeta\), \(D\), and \(\chi\) in (11);
② matrix \(\mathring{\phi}(t)\) in (17) with its entries in (18);
③ parameter \(d\) in (19);
④ matrix \(\mathring{Q}_{\mathrm{d}}\) in (21) with its entries in (22);
⑤ matrix \(\mathring{L}_{\mathrm{d}}\) with its three non-zero entries in (23);
⑥ matrix \(\mathring{\Phi}_{\mathrm{d}}\) in (24); and
⑦ matrices \(\mathring{\Phi}_{\star}\) and \(\mathring{L}_{\star}\) in (25).

Figure 6. Flow-diagram of works for a software project to implement the Active Principle of parametric system identification in the class of linear, time-invariant, completely observable models. AS = application-specific and SM = standard matrix calculations. The paper section numbers and equation numbers (within parentheses) are shown

These quantities are all continuous and differentiable functions of the elements \(\mathring{\theta}_{i}\) of vector \(\mathring{\theta}\) introduced after (24) for this application problem. Into these functions we substitute not the true parameter value \(\mathring{\theta}\) but its estimate \(\hat{\theta}\), so as to obtain functions that are continuous and differentiable with respect to \(\hat{\theta}\). These properties make it possible to compute the gradient \(\nabla_{\hat\theta} {\cal J}^{\overline{IPI}}_{t_{i}\equiv t_{i}^{c}}(\hat{\theta} = \hat{\theta}[j])\) in algorithm (52) tuning the parameter \(\hat{\theta}\).
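When symbolic differentiation in the Maple style is unavailable, the required parameter sensitivities of such matrix-valued functions can also be approximated numerically; below is a sketch (matrix and names hypothetical) applying the operator (51) element-wise by central differences:

```python
import numpy as np

def matrix_sensitivity(Phi, theta, r, h=1e-6):
    """Central-difference estimate of the element-wise partial
    derivative dPhi/dtheta_r, i.e. (51) applied to a matrix-valued
    function Phi(theta)."""
    e = np.zeros_like(theta); e[r] = h
    return (Phi(theta + e) - Phi(theta - e)) / (2 * h)

# hypothetical 2x2 companion-form matrix depending on theta = (a, b)
Phi = lambda th: np.array([[0.0, 1.0], [-th[0], -th[1]]])
dPhi_da = matrix_sensitivity(Phi, np.array([2.0, 3.0]), r=0)
# analytically, dPhi/da = [[0, 0], [-1, 0]]
```

For the intermediate quantities ①–⑦ above, an analytical (symbolic) derivative is preferable whenever obtainable; the numerical stand-in is useful for validating such derivations.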

6. Conclusions

Returning to the problem issues posed at the beginning, we believe the goals have been achieved.

The incompatibility between the theoretical (direct) criterion of model optimality and practical optimization methods has proven to be an obstacle that is most difficult, or even impossible, to overcome directly under conditions of uncertainty. In this article, the incongruity between the theoretical notion of system optimality and practical on-computer optimization is overcome through the construction of an indirect criterion of proximity of the adaptive and optimal system models to each other, which can become a practical tool for parametric system identification. This is done here for a class of linear time-invariant models characterized by a transfer function, relying on Kalman filter theory. As proven here, it is necessary and sufficient to modify a physically specified structure into the discrete-time standard observable model, in both the observed data and the adaptive Kalman filter, to implement this idea.

The advantage of this modification is twofold. First, the restrictions on the permissible size of a priori parametric uncertainty are removed because the measurement channel parameters are formally transferred into the modified state equation. Second, and most importantly, we replace the direct objective function, unattainable for identification, with the indirect objective function, which is equivalent to the original direct one yet, to our satisfaction, attainable and suitable for conventional optimization methods. This preliminary modification is difficult to perform manually. Fortunately, it is greatly facilitated and accurately performed using well-known symbolic computation tools (Maple).

We prefer to denote the above concept of `equivalence' of the two objective functions by the new term `equimodality,' which simply means the coincidence of their minimizers in the argument space. This coincidence is important because it ensures that minimization of the constructed practical indirect cost function does lead to unbiased state estimates along with unbiased parameter estimates, as it should be in the optimal filter.

The inclusion of a new illustrative example from modern digital communication technology increases the visibility of the approach and thereby encourages its extension to broad real-world applications.

Theoretical questions should not overshadow the practical side of the case. Therefore, this paper also includes practical critical issues: the organization of calculations in the computer time-scale, the structural algorithmic construction of the identification procedure, and the planning of the corresponding design work.

As a limitation of the work, it could be pointed out that its results rest on the assumptions that the system to be modeled is linear and time-invariant. There is nothing to argue against this, except for the famous aphorism by George E.P. Box [36, p. 2]: "Models, of course, are never true, but fortunately it is only necessary that they be useful."1 Accordingly, if the measured system output shows the presence of nonlinear distortions, the suitability of the proposed method for identifying a nonlinear model with a dominant linear kernel, defined by the Best Linear Approximation (BLA) concept following the propositions of [2], should be considered and recommended for extended study.

Competing interests. The author declares no competing interests.
Authorship contribution and responsibility. The author has approved the final version of the manuscript.
Funding. The research has not had any funding.
Acknowledgments. In the preparation of this paper, the author has benefited from the discussions with Dr. Alexandru Murgu (University of Cape Town) who provided extended comments and gave valuable suggestions on the manuscript.


About the authors

Innokentiy V. Semushin

Ulyanovsk State University

Author for correspondence.
Email: innovsem@gmail.com
ORCID iD: 0000-0002-3687-1110
https://www.ulsu.ru/ru/employees/2561/

Dr. Techn. Sci., Professor, IEEE Member, Dept. of Information Technology

Russian Federation, 432017, Ulyanovsk, L. Tolstoy st., 42

References

  1. Pillonetto G., Ljung L. Full Bayesian identification of linear dynamic systems using stable kernels, Proc. Natl. Acad. Sci. USA, 2023, vol. 120, no. 18, e2218197120. DOI: https://doi.org/10.1073/pnas.2218197120.
  2. Schoukens J., Vaes M., Pintelon R. Linear system identification in a nonlinear setting: Nonparametric analysis of the nonlinear distortions and their impact on the best linear approximation, IEEE Contr. Systems Magaz., 2016, vol. 36, no. 3, pp. 38–69. DOI: https://doi.org/10.1109/MCS.2016.2535918.
  3. Ljung L. Convergence analysis of parametric identification methods, IEEE Trans. Auto. Contr., 1978, vol. 23, no. 5, pp. 770–783. DOI: https://doi.org/10.1109/TAC.1978.1101840.
  4. Ljung L. System identification, In: Signal Analysis and Prediction, Applied and Numerical Harmonic Analysis; eds. A. Procházka, J. Uhlíř, P. W. J. Rayner, N. G. Kingsbury. Boston, MA, Birkhäuser, 1998, pp. 163–173. DOI: https://doi.org/10.1007/978-1-4612-1768-8_11.
  5. Ljung L. System Identification: Theory for the User. Upper Saddle River, N.J., Prentice Hall, 1999, xxii+609 pp.
  6. Gevers M. System identification without Lennart Ljung: what would have been different?, In: Forever Ljung in System Identification; eds. T. Glad, G. Hendeby. Lund, Sweden, Studentlitteratur AB, 2006, pp. 61–85.
  7. Schoukens J., Pintelon R., Rolain Y. Time domain identification, frequency domain identification. Equivalencies! Differences?, In: Proc. 2004 American Control Conf., vol. 1 (30 June 2004 – 02 July 2004). Boston, MA, USA, 2004, pp. 661–666. DOI: https://doi.org/10.23919/ACC.2004.1383679.
  8. Oomen T., van Herpen R., Quist S., et al. Connecting system identification and robust control for next-generation motion control and a wafer stage, IEEE Trans. Contr. Syst. Technol., 2014, vol. 22, no. 1, pp. 102–118. DOI: https://doi.org/10.1109/TCST.2013.2245668.
  9. Dedene N., Pintelon R., Lataire P. Estimation of a global synchronous machine model using a multiple-input multiple-output estimator, IEEE Trans. Energy Conver., 2003, vol. 18, no. 1, pp. 11–16. DOI: https://doi.org/10.1109/TEC.2002.805198.
  10. Wei Y., Mantooth A. LLC resonant converter—frequency domain analysis or time domain analysis, In: 2020 IEEE 9th Int. Power Electronics and Motion Control Conf. (IPEMC2020-ECCE Asia; 29 November–02 December, 2020). Nanjing, China, 2020, pp. 552–557. DOI: https://doi.org/10.1109/IPEMC-ECCEAsia48364.2020.9367734.
  11. Rivera D. E., Lee H., Mittelmann H. D., Braun M. W. Constrained multisine input signals for plant-friendly identification of chemical process systems, J. Process Contr., 2009, vol. 19, no. 4, pp. 623–635. DOI: https://doi.org/10.1016/j.jprocont.2008.08.006.
  12. Peeters B., Ventura C. E. Comparative study of modal analysis techniques for bridge dynamic characteristics, Mech. Syst. Signal Process., 2003, vol. 17, no. 5, pp. 965–988. DOI: https://doi.org/10.1006/mssp.2002.1568.
  13. Westwick D. T., Kearney R. E. Identification of Nonlinear Physiological Systems. Piscataway, N.J., Wiley-IEEE Press, 2003, xii+261 pp. DOI: https://doi.org/10.1002/0471722960.
  14. Semushin I., Tsyganova J., Kulikova M., et al. Identification of human body daily temperature dynamics via minimum state prediction error method, In: Proc. 2016 Europ. Control Conf. (ECC2016; June 29–July 1, 2016). Aalborg, Denmark, 2016, pp. 2429–2434. EDN: YVHZMP. DOI: https://doi.org/10.1109/ECC.2016.7810654.
  15. Semoushin I. V. Identifying parameters of linear stochastic differential equations from incomplete noisy measurements, In: Recent Development in Theories and Numerics (International Conference on Inverse Problems; January 9–12, 2002); eds. H. Yiu-Chung, Y. Masahiro, C. Jin, L. June-Yub. River Edge, NJ, World Scientific, 2003, pp. 281–290. DOI: https://doi.org/10.1142/9789812704924_0026.
  16. Semushin I. V., Tsyganova J. V. Reducing computational complexity for DBZF precoding in xDSL downlinks, J. Phys.: Conf. Ser., 2018, vol. 1096, 012159. EDN: OHQTID. DOI: https://doi.org/10.1088/1742-6596/1096/1/012159.
  17. Düngen M. Crosstalk Mitigation Techniques for Digital Subscriber Line Systems, Dissertation (Dr.-Ing.). Hamburg, Technische Universität Hamburg, 2016, 160 pp. DOI: https://doi.org/10.15480/882.1293.
  18. Begović A., Škaljo N., Behlilović N. A simple model of copper local loop for use in current DSL application, In: Proc. 2015 23rd Telecommunications Forum Telfor (TELFOR; November 24–26, 2015). Belgrade, Serbia, 2015, pp. 114–117. DOI: https://doi.org/10.1109/TELFOR.2015.7377427.
  19. Begović A., Škaljo N., Behlilović N. An example of modeling local loop transfer function in DSL environment, In: Proc. ELMAR-2014 (September 10–12, 2014). Zadar, Croatia, 2014, pp. 1–6. DOI: https://doi.org/10.1109/ELMAR.2014.6923362.
  20. Rodrigues R. M., Sales C., Klautau A., et al. Transfer function estimation of telephone lines from input impedance measurements, IEEE Trans. Instrum. Meas., 2012, vol. 61, no. 1, pp. 43–54. DOI: https://doi.org/10.1109/TIM.2011.2157431.
  21. Foubert W., Neus C., Boets P., van Biesen L. Modeling the series impedance of a quad cable for common mode DSL applications, In: Proc. 2008 IEEE Instrumentation and Measurement Technology Conf. (IMTC 2008; May 12–15, 2008). Victoria, BC, Canada, 2008, pp. 250–253. DOI: https://doi.org/10.1109/IMTC.2008.4547040.
  22. Neus C., Boets P., van Biesen L. Transfer function estimation of digital subscriber lines with single ended line testing, In: Proc. 2007 IEEE Instrumentation and Measurement Technology Conf. (IMTC 2007; May 01–03, 2007). Warsaw, Poland, 2007, pp. 1–5. DOI: https://doi.org/10.1109/IMTC.2007.378980.
  23. Bostoen T., Boets P., Zekri M., et al. Estimation of the transfer function of a subscriber loop by means of a one-port scattering parameter measurement at the central office, IEEE J. Sel. Areas Commun., 2002, vol. 20, no. 5, pp. 936–948. DOI: https://doi.org/10.1109/JSAC.2002.1007376.
  24. Semushin I. V. Adaptation in stochastic dynamic systems—Survey and new results II, Int. J. Commun. Netw. Syst. Sci., 2011, vol. 4, no. 4, pp. 266–285. DOI: https://doi.org/10.4236/ijcns.2011.44032.
  25. Stolle R. Electromagnetic coupling of twisted pair cables, IEEE J. Sel. Areas Commun., 2002, vol. 20, no. 5, pp. 883–892. DOI: https://doi.org/10.1109/JSAC.2002.1007371.
  26. Maybeck P. S. Stochastic Models, Estimation, and Control. Vol. 1, Mathematics in Science and Engineering, vol. 141. New York, Academic Press, 1979, xix+423 pp. DOI: https://doi.org/10.1016/s0076-5392(08)62169-4.
  27. Bierman G. J. Factorization Methods for Discrete Sequential Estimation, Mathematics in Science and Engineering, vol. 128. New York, Academic Press, 1977, xvi+241 pp. DOI: https://doi.org/10.1016/s0076-5392(08)x6052-7.
  28. Sima V. Algorithms for Linear-Quadratic Optimization, Pure and Applied Mathematics, Marcel Dekker. New York, Marcel Dekker, 1996, vii+366 pp. DOI: https://doi.org/10.1201/9781003067450.
  29. Fletcher R. Practical Methods of Optimization. Chichester, John Wiley & Sons, 2000, xvii+436 pp. DOI: https://doi.org/10.1002/9781118723203.
  30. Potter J. E., Stern R. G. Statistical filtering of space navigation measurements, In: Proc. 1963 AIAA Guidance and Control Conf. (AIAA 1963; August 12–14, 1963). Cambridge, MA, 1963, pp. 1–5. DOI: https://doi.org/10.2514/6.1963-333.
  31. Tsyganova J. V. Orthogonalized Array Methods for Parametric Identification of Discrete Linear Stochastic Systems, Diss. Dr. Sci. (Phys. & Math.). Ulyanovsk, Ulyanovsk State Univ., 2017, 400 pp. (In Russian). EDN: IMPPAX.
  32. Kulikova M. V. Methods of Calculation of the Logarithmic Likelihood Function and its Gradient in Kalman Filtering Algorithms, Diss. Cand. Sci. (Phys. & Math.). Ulyanovsk, Ulyanovsk State Univ., 2005, 131 pp. (In Russian). EDN: NNNPKP.
  33. Semushin I. V., Tsyganova J. V. Numerically implementing the API based solution to learn a plant dynamics for stochastic control concurrently, In: Proc. 2020 European Control Conference (ECC 2020; May 12–15, 2020). St. Petersburg, 2020, pp. 1105–1110. EDN: FWJUQS. DOI: https://doi.org/10.23919/ECC51009.2020.9143870.
  34. Kulikova M. V., Semoushin I. V. On the evaluation of log likelihood gradient for Gaussian signals, Int. J. Appl. Math. Stat., 2005, vol. 3, no. 5, pp. 1–14.
  35. Tsyganova J. V., Kulikova M. V. On modern array algorithms for optimal discrete filtering, Vestnik YuUrGU. Ser. Mat. Model. Progr., 2018, vol. 11, no. 4, pp. 5–30 (In Russian). EDN: YOTRJJ. DOI: https://doi.org/10.14529/mmp180401.
  36. Box G. E. P. Some problems of statistics and everyday life, J. Am. Stat. Assoc., 1979, vol. 74, no. 365, pp. 1–4. DOI: https://doi.org/10.1080/01621459.1979.10481600.

Supplementary files

Figure 1. Line section of length \(\Delta l\) for a twisted pair transmission line of full length \(l\)

Figure 2. Equivalent lumped \(RLCG\)-circuit of a 2-wire transmission line

Figure 3. The distributed MIMO channel estimation structure. Legend: CE – Channel Estimation; SCO – System Central Office; SI – System Information; CI – Channel Information; CPE – Customer Premises Equipment; \(N\) – the number of customers, \(j=1,2,\ldots,N\)

Copyright (c) 2023 Authors; Samara State Technical University (Compilation, Design, and Layout)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
