Nonparametric method for testing the hypothesis of independence of random variables and its application in the analysis of remote sensing data

Anna V. Sharueva; Шаруева Анна Владлимировна

doi:10.31772/2712-8970-2025-26-1-48-59

Authors: Sharueva A.V.¹
Affiliations:
1. Reshetnev Siberian State University of Science and Technology
Issue: Vol 26, No 1 (2025)
Pages: 48-59
Section: Section 1. Computer Science, Computer Engineering and Management
Published: 16.04.2025
URL: https://journals.eco-vector.com/2712-8970/article/view/678597
DOI: https://doi.org/10.31772/2712-8970-2025-26-1-48-59
ID: 678597

Abstract

Testing the hypothesis that random variables are independent is one of the main stages in the system analysis of statistical data; effective decision-making algorithms are synthesised from its results. The traditional method relies on the Pearson criterion, which includes a stage that is hard to formalise: dividing the range of values of the random variables into multidimensional intervals. This paper proposes a method for testing the independence hypothesis that uses a nonparametric pattern recognition algorithm corresponding to the maximum likelihood criterion. It circumvents the problem of decomposing the range of values of the random variables into intervals. The idea of the approach is to form a training sample from the initial statistical data and to solve a two-alternative pattern recognition problem. Each class is defined under the assumption of independence or dependence of the random variables, which manifests itself in different distribution laws in the classes. Under these conditions the initial hypothesis can be replaced by a test of whether the difference between the probabilities of pattern recognition errors in the classes is significant. Using graph theory, the method is extended to the formation of sets of independent random variables. The results are generalised to large volumes of statistical data by compressing the original information, which increases the computational efficiency of the problem being solved. The article thus substantiates a method for testing the hypothesis of independence of random variables, based on a nonparametric pattern recognition algorithm, under conditions of large volumes of statistical data.
The proposed technique is compared with the generally accepted Pearson goodness-of-fit criterion in a study of ambiguous dependences of varying complexity between random variables. The effectiveness of the method is confirmed by its application to remote sensing data on anthropogenic territories in the vicinity of the city of Krasnoyarsk.

Keywords

testing the hypothesis of independence of random variables, kernel probability density estimation, regression probability density estimation, pattern recognition, Pearson criterion, remote sensing

Full Text

Introduction

The universal and generally accepted criterion for testing hypotheses about the distributions of random variables, including their independence, is the Pearson criterion [1]. Using it requires partitioning the range of values of the random variables into multivariate intervals and establishing the distribution law of the criterion, which determines the dependences between the probabilistic characteristics of the random variables. In [2–4] a new approach is proposed that simplifies the test of the independence hypothesis by using a nonparametric kernel-type pattern recognition algorithm corresponding to the maximum likelihood criterion. The idea of the approach is to solve a two-alternative pattern recognition problem. The classes under consideration are defined by the assumptions of dependence and independence of the random variables. On this basis, a training sample is formed from the initial observations of the random variables and the pattern recognition problem is solved. The relation between the estimates of the recognition error probabilities for the two classes confirms or refutes the hypothesis being tested.

The purpose of this paper is to generalise and develop the nonparametric method of testing the hypothesis of independence of random variables for large volumes of statistical data and to apply it to the analysis of remote sensing data on anthropogenic territories.

Methodology for testing the hypothesis of independence of random variables

Let $V = (x^{i}, i = \overline{1, n})$ be a sample of size $n$ composed of independent observations of a two-dimensional random variable $x = (x_{1}, x_{2})$ . Suppose that the sample $V$ is drawn from a general population characterised by the probability density $p (x_{1}) p (x_{2})$ or $p (x_{1}, x_{2})$ . On the basis of the statistical data $V$ it is necessary to test the hypothesis

$H_{0} : p (x_{1}, x_{2}) \equiv p (x_{1}) p (x_{2})$

of independence of random variables $x_{1}$ , $x_{2}$ .

To test the hypothesis $H_{0}$ let us solve a two-alternative pattern recognition problem. The classes $Ω_{1}$ , $Ω_{2}$ are understood as the domains of definition of the probability densities $p (x_{1}) p (x_{2})$ and $p (x_{1}, x_{2})$ , respectively. Under these conditions, the Bayesian decision rule corresponding to the maximum likelihood criterion has the form

$m (x) : \begin{cases} x \in Ω_{1}, & \text{if } p (x_{1}, x_{2}) < p (x_{1}) p (x_{2}), \\ x \in Ω_{2}, & \text{if } p (x_{1}, x_{2}) > p (x_{1}) p (x_{2}) . \end{cases}$

In contrast to the traditional formulation of the pattern recognition problem, when the decision rule $m (x)$ is synthesised there is no a priori training sample indicating which class each element of the sample $V$ belongs to. This information must be obtained in the course of the $H_{0}$ hypothesis testing methodology, which consists of the following actions.

From the sample $V$ , estimate the probability densities $p (x_{1}, x_{2})$ , $p (x_{1}) p (x_{2})$ using nonparametric estimates of the Rosenblatt–Parzen type [5; 6],

$\bar{p} (x_{1}, x_{2}) = \frac{1}{n c_{1} c_{2}} \sum_{i = 1}^{n} F (\frac{x_{1} - x_{1}^{i}}{c_{1}}) F (\frac{x_{2} - x_{2}^{i}}{c_{2}})$ ,

$\bar{p} (x_{1}) \bar{p} (x_{2}) = \frac{1}{n^{2} c_{1} c_{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} F (\frac{x_{1} - x_{1}^{i}}{c_{1}}) F (\frac{x_{2} - x_{2}^{j}}{c_{2}})$ .

In the statistics $\bar{p} (x_{1}, x_{2})$ , $\bar{p} (x_{1}) \bar{p} (x_{2})$ the kernel functions $F (u_{v})$ satisfy the conditions of positivity, symmetry and normalization.
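The two estimates above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the Gaussian kernel is an assumption (the text only requires positivity, symmetry and normalization), and the function names `gauss_kernel`, `joint_density` and `product_density` are hypothetical.

```python
import numpy as np

def gauss_kernel(u):
    # Assumed kernel F(u): standard normal density (positive, symmetric, integrates to 1).
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def joint_density(x, data, c):
    # Kernel estimate of p(x1, x2): one sum over the n observations.
    data = np.asarray(data, dtype=float)
    k1 = gauss_kernel((x[0] - data[:, 0]) / c[0])
    k2 = gauss_kernel((x[1] - data[:, 1]) / c[1])
    return np.sum(k1 * k2) / (len(data) * c[0] * c[1])

def product_density(x, data, c):
    # Kernel estimate of p(x1) p(x2): the double sum over i and j
    # factorises into the product of two single sums.
    data = np.asarray(data, dtype=float)
    k1 = gauss_kernel((x[0] - data[:, 0]) / c[0])
    k2 = gauss_kernel((x[1] - data[:, 1]) / c[1])
    return np.sum(k1) * np.sum(k2) / (len(data) ** 2 * c[0] * c[1])
```

The factorisation in `product_density` avoids the explicit $n^{2}$ double loop while giving the same value as the double sum.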

The values of the blur coefficients (bandwidths) $c_{v}$ , $v = 1, 2$ of the kernel functions decrease as the size $n$ of the sample $V$ increases. Then the nonparametric decision rule for classifying the random variable $x = (x_{1}, x_{2})$ is written as follows

$\bar{m} (x) : \begin{cases} x \in Ω_{1}, & \text{if } \bar{p} (x_{1}, x_{2}) < \bar{p} (x_{1}) \bar{p} (x_{2}), \\ x \in Ω_{2}, & \text{if } \bar{p} (x_{1}, x_{2}) > \bar{p} (x_{1}) \bar{p} (x_{2}) . \end{cases}$

The optimal blur coefficients of the kernel functions in the decision rule $\bar{m} (x)$ are chosen by analysing the approximation properties of the nonparametric density estimates $\bar{p} (x_{1}, x_{2})$ , $\bar{p} (x_{1})$ , $\bar{p} (x_{2})$ : they minimise estimates of the mean square deviations of these statistics from $p (x_{1}, x_{2})$ , $p (x_{1})$ , $p (x_{2})$ . For example, for $\bar{p} (x_{1})$ such a criterion is [7–11]

$\int_{- \infty}^{\infty} {\bar{p}}^{2} (x_{1}) d x_{1} - \frac{2}{n} \sum_{j = 1}^{n} \bar{p} (x_{1}^{j})$ .
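A criterion of this kind can be evaluated in closed form for a Gaussian kernel. The sketch below makes two assumptions that are not stated in the text: the kernel is Gaussian, and the second term uses the leave-one-out form of $\bar{p} (x_{1}^{j})$ that is customary in least-squares cross-validation (without it the score decreases without bound as the blur coefficient tends to zero). The names `lscv_score` and `select_bandwidth` are hypothetical.

```python
import numpy as np

def lscv_score(sample, c):
    # Least-squares cross-validation score for a 1-D Gaussian kernel estimate.
    x = np.asarray(sample, dtype=float)
    n = len(x)
    d = (x[:, None] - x[None, :]) / c
    # Integral of the squared estimate: the convolution of two N(0, 1)
    # kernels is the N(0, 2) density exp(-u^2 / 4) / (2 sqrt(pi)).
    integral_term = np.sum(np.exp(-0.25 * d**2)) / (2.0 * np.sqrt(np.pi) * n**2 * c)
    # Leave-one-out estimates at the sample points (diagonal removed).
    k = np.exp(-0.5 * d**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(k, 0.0)
    loo = k.sum(axis=1) / ((n - 1) * c)
    return integral_term - 2.0 * loo.mean()

def select_bandwidth(sample, grid):
    # Pick the blur coefficient from a candidate grid that minimises the score.
    scores = [lscv_score(sample, c) for c in grid]
    return float(grid[int(np.argmin(scores))])
```

A simple grid search is used here for clarity; any one-dimensional minimiser could replace it.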

Let us define the estimates ${\bar{ρ}}_{1} (\bar{c} (1))$ , ${\bar{ρ}}_{2} (\bar{c} (2))$ of the probabilities of pattern recognition errors for the decision rule $\bar{m} (x)$ on the basis of the raw statistical data $V$ at the optimal blur coefficients $\bar{c} (1) = ({\bar{c}}_{1} (1), {\bar{c}}_{2} (1))$ , $\bar{c} (2) = ({\bar{c}}_{1} (2), {\bar{c}}_{2} (2))$ of the kernel functions of the statistics $\bar{p} (x_{1}) \bar{p} (x_{2})$ , $\bar{p} (x_{1}, x_{2})$ , respectively.

The values ${\bar{ρ}}_{t} (\bar{c} (1), \bar{c} (2))$ are calculated on the sample $V$ in leave-one-out (‘rolling examination’) mode, assuming that all its elements belong to the class $Ω_{t}$ :

${\bar{ρ}}_{t} (\bar{c} (1), \bar{c} (2)) = \frac{1}{n} \sum_{j = 1}^{n} 1 (δ (j), \bar{δ} (j)), t = 1, 2$ ,

where $δ (j) = t$ is the assumed class label of the observation $x^{j} = (x_{1}^{j}, x_{2}^{j}) \in Ω_{t}$ , and

$\bar{δ} (j) = \begin{cases} t, & \text{if } x^{j} \in Ω_{t}, \\ 0, & \text{if } x^{j} \notin Ω_{t} \end{cases}$

is the decision of the algorithm $\bar{m} (x)$ on whether the observation $x^{j}$ belongs to the class $Ω_{t}$ , $t = 1, 2$ .

When ${\bar{ρ}}_{t} (\bar{c} (1), \bar{c} (2))$ is calculated by the leave-one-out (‘rolling examination’) methodology, the observation $x^{j} = (x_{1}^{j}, x_{2}^{j})$ from the sample $V$ that is submitted to the algorithm $\bar{m} (x)$ for checking is excluded from the construction of the statistics $\bar{p} (x_{1}, x_{2})$ , $\bar{p} (x_{1}) \bar{p} (x_{2})$ .

The indicator function is defined by the expression

$1 (δ (j), \bar{δ} (j)) = \begin{cases} 0, & \text{if } δ (j) = \bar{δ} (j), \\ 1, & \text{if } δ (j) \neq \bar{δ} (j) . \end{cases}$

Let us denote by ${\bar{\bar{ρ}}}_{t}$ the estimate of the probability of pattern recognition error obtained under the assumption that the elements of the sample $V$ belong to the class $Ω_{t}$ , $t = 1, 2$ , and compare the values ${\bar{\bar{ρ}}}_{1}$ , ${\bar{\bar{ρ}}}_{2}$ .

The hypothesis $H_{0}$ is accepted if ${\bar{\bar{ρ}}}_{1}$ < ${\bar{\bar{ρ}}}_{2}$ . Otherwise, at ${\bar{\bar{ρ}}}_{2}$ < ${\bar{\bar{ρ}}}_{1}$ , the random variables $x_{1}$ and $x_{2}$ are dependent.
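The procedure described above (leave-one-out density estimates, classification by the nonparametric decision rule, comparison of the two error estimates) can be sketched as follows. This is an illustrative sketch under stated assumptions: a Gaussian kernel, blur coefficients passed in by the caller rather than optimised, and hypothetical function names `loo_error_rates` and `independence_accepted`.

```python
import numpy as np

def _loo_kernel_matrix(v, c):
    # k[j, i] = F((v_j - v_i) / c) with a Gaussian F; the zeroed diagonal
    # excludes the checked observation from its own estimate.
    d = (v[:, None] - v[None, :]) / c
    k = np.exp(-0.5 * d**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(k, 0.0)
    return k

def loo_error_rates(data, c_prod, c_joint):
    # c_prod: blur coefficients for p(x1) p(x2); c_joint: for p(x1, x2).
    data = np.asarray(data, dtype=float)
    n = len(data)
    x1, x2 = data[:, 0], data[:, 1]
    joint = (_loo_kernel_matrix(x1, c_joint[0]) * _loo_kernel_matrix(x2, c_joint[1])).sum(axis=1)
    joint /= (n - 1) * c_joint[0] * c_joint[1]
    m1 = _loo_kernel_matrix(x1, c_prod[0]).sum(axis=1) / ((n - 1) * c_prod[0])
    m2 = _loo_kernel_matrix(x2, c_prod[1]).sum(axis=1) / ((n - 1) * c_prod[1])
    product = m1 * m2
    # rho1: share of points misclassified when all are assumed to be in Omega_1
    # (the rule assigns them to Omega_2); rho2: the converse.
    rho1 = float(np.mean(joint > product))
    rho2 = float(np.mean(joint < product))
    return rho1, rho2

def independence_accepted(rho1, rho2):
    # H0 (independence) is retained when rho1 < rho2.
    return rho1 < rho2
```

For example, with `x2 = x1 + noise` and small noise, the joint estimate exceeds the product estimate at most sample points, so `rho1` is close to 1 and the independence hypothesis is not retained.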

When the size $n$ of the sample $V$ is limited, the problem arises of constructing confidence estimates for the probabilities of pattern recognition errors. It is solved using the traditional methodology of confidence estimation of probabilities or the Kolmogorov–Smirnov criterion.

For example, when the Kolmogorov–Smirnov criterion is used, the deviation ${\bar{D}}_{12} = |{\bar{\bar{ρ}}}_{1} - {\bar{\bar{ρ}}}_{2}|$ is compared with the threshold value [12]

$D_{β} = \sqrt{- \ln (\frac{β}{2}) / n}$ .

Here $β$ is the probability (risk) of rejecting the hypothesis ${\bar{H}}_{0}$ : $ρ_{1} = ρ_{2}$ . If ${\bar{D}}_{12}$ < $D_{β}$ , the hypothesis ${\bar{H}}_{0}$ is accepted and the risk of rejecting it does not exceed $β$ . If ${\bar{D}}_{12}$ > $D_{β}$ , the hypothesis ${\bar{H}}_{0}$ is rejected.
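The threshold comparison is a one-line computation. The sketch below simply evaluates the formula for $D_{β}$ given above; the function names are hypothetical.

```python
import math

def ks_threshold(beta, n):
    # D_beta = sqrt(-ln(beta / 2) / n) for risk beta and sample size n.
    return math.sqrt(-math.log(beta / 2.0) / n)

def error_rates_differ(rho1, rho2, beta, n):
    # True when |rho1 - rho2| exceeds the threshold, i.e. the hypothesis
    # of equal error probabilities is rejected.
    return abs(rho1 - rho2) > ks_threshold(beta, n)
```

Because the threshold shrinks as $1 / \sqrt{n}$, smaller differences between the error estimates become significant as the sample grows.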

Formation of sets of independent random variables

Let $V = (x_{v}^{i}, v = \overline{1, k}, i = \overline{1, n})$ be a sample of size $n$ composed of statistically independent observations of the components of the multivariate random variable $x = (x_{v}, v = \overline{1, k})$ . The form of the probability density $p (x)$ is unknown a priori. Using the data $V$ and the criterion proposed above, it is necessary to test the hypotheses [13–16]

$H_{v j} : p (x_{v}, x_{j}) \equiv p (x_{v}) p (x_{j})$

for the components $x_{v}$ , $v = \overline{1, k}$ , $x_{j}$ , $j = \overline{1, k}$ , $v > j$ , and to form the sets of independent random variables $x (t) = (x_{v}, v \in I_{t})$ , $t = \overline{1, m}$ . The number $m$ of such sets is unknown, and $I_{t}$ is the set of indices of the components that make up the set $x (t)$ .

The proposed methodology consists of the following steps:

1. In accordance with the above recommendations, test the hypotheses $H_{v j}$ for each pair of components $(x_{v}, x_{j})$ of the multivariate random variable $x = (x_{v}, v = \overline{1, k})$ . The number of such pairs is $k (k - 1) / 2$ .
2. Based on the results of step 1, construct an information graph $G (X, A)$ , where $X$ is the set of its vertices, corresponding to the components of the random variable $x$ , and $A$ is the set of edges. There is an edge between two vertices $x_{v}$ , $x_{j}$ if the hypothesis $H_{v j}$ is accepted, i.e. the components $x_{v}$ , $x_{j}$ are independent.
3. Analyse the information graph $G (X, A)$ and determine its complete subgraphs $G (X_{t}, A_{t})$ , $t = \overline{1, m}$ . In a complete subgraph every pair of vertices is joined by an edge, i.e. the corresponding components of the random variable $x$ are independent. Complete subgraphs are detected using algorithms for cutting the original graph, which are based on analysing its adjacency matrix. The components $x_{v}$ , $v \in I_{t}$ corresponding to the vertices of the complete subgraph $G (X_{t}, A_{t})$ form a set of independent random variables.
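The graph step can be illustrated with a short sketch. Note the substitution: the text refers to graph-cutting algorithms based on the adjacency matrix, whereas the code below uses the classical Bron–Kerbosch enumeration of maximal complete subgraphs (cliques), chosen only because it is compact. The function name `maximal_cliques` and the input convention (a symmetric 0/1 adjacency matrix whose entry is 1 when the pairwise independence hypothesis was accepted) are assumptions.

```python
def maximal_cliques(adj):
    # adj[i][j] is truthy when components i and j were found independent.
    n = len(adj)
    neighbours = [{j for j in range(n) if j != i and adj[i][j]} for i in range(n)]
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates that extend it; x: already processed.
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & neighbours[v], x & neighbours[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return sorted(cliques)
```

Each returned list of vertex indices plays the role of an index set $I_{t}$ ; maximal cliques may overlap, so a component can appear in more than one set.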

Modification of the method of testing the hypothesis of independence of random variables in conditions of large volumes of statistical data

For large volumes $n$ of statistical data $V = (x_{1}^{i}, x_{2}^{i}, i = \overline{1, n})$ , the proposed methodology uses regression estimates of the probability densities $\bar{p} (x_{1}, x_{2})$ , $\bar{p} (x_{1})$ , $\bar{p} (x_{2})$ . These estimates are based on compressing the original information; for example, $V_{1} = (x_{1}^{i}, i = \overline{1, n})$ is compressed into the data array ${\bar{V}}_{1} = ({\bar{p}}_{1}^{j}, z^{j}, j = \overline{1, N})$ by dividing the range of values of $x_{1}$ into $N$ intervals. Here $z^{j}$ are the centres of the sampling intervals of $x_{1}$ ; ${\bar{p}}_{1}^{j} = {\bar{P}}_{1}^{j} / Δ$ is the probability density estimate in the $j$-th interval; $Δ$ is the length of a sampling interval; and ${\bar{P}}_{1}^{j}$ is the frequency with which values $x_{1}^{i}$ from the sample $V_{1}$ fall into the interval numbered $j$ . Then the regression estimate of the probability density $p (x_{1})$ based on ${\bar{V}}_{1}$ has the form [17; 18]

$\bar{p} (x_{1}) = \frac{1}{c_{1}} \sum_{j = 1}^{N} {\bar{P}}_{1}^{j} Φ (\frac{x_{1} - z^{j}}{c_{1}})$ .

The proposed approach reduces by orders of magnitude the volume of initial statistical information used when estimating probability densities (from $n$ observations to $N$ intervals). The structure of the statistic $\bar{p} (x_{1})$ considerably simplifies the choice of the blur coefficient $c_{1}$ of its kernel functions, which is found by minimising the criterion

$\frac{1}{N} \sum_{i = 1}^{N} {({\bar{p}}_{1}^{i} - \bar{p} (z^{i}))}^{2}$ .

The probability densities $p (x_{2})$ , $p (x_{1}, x_{2})$ are estimated by analogy. The regression estimates of the probability densities are then used to test the hypothesis of independence of random variables according to the proposed methodology.
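A one-dimensional regression estimate of this kind can be sketched as below. The sketch assumes a Gaussian kernel for $Φ$ and equal-width intervals, and it assumes that the selection criterion is the mean squared deviation between the interval densities and the estimate evaluated at the interval centres; the function names `compress`, `regression_density` and `fit_criterion` are hypothetical.

```python
import numpy as np

def compress(sample, n_bins):
    # Replace n observations by N interval centres z^j and frequencies P^j.
    counts, edges = np.histogram(sample, bins=n_bins)
    z = 0.5 * (edges[:-1] + edges[1:])
    freq = counts / counts.sum()
    delta = edges[1] - edges[0]
    return z, freq, delta

def regression_density(x, z, freq, c):
    # p(x) = (1 / c) * sum_j P^j * Phi((x - z^j) / c), Gaussian Phi assumed.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - z[None, :]) / c
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return (kernel * freq).sum(axis=1) / c

def fit_criterion(z, freq, delta, c):
    # Assumed criterion: mean squared gap between the interval densities
    # P^j / delta and the regression estimate at the interval centres.
    return float(np.mean((freq / delta - regression_density(z, z, freq, c)) ** 2))
```

Evaluating the estimate costs a sum over $N$ intervals instead of $n$ observations, which is where the computational saving comes from.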

Analysing the results of the computational experiment

The effectiveness of the proposed method of testing the hypothesis of independence of two-dimensional random variables has been compared with that of Pearson's criterion under ambiguous dependences and at different volumes of statistical data [19–21]. The random variables $x_{1}$ , $x_{2}$ were generated as follows: $x_{1}$ was drawn from a uniform distribution and $x_{2}$ was computed as a nonlinear transformation of $x_{1}$ . Noise with a normal distribution, zero mathematical expectation and standard deviation $σ$ was then added to the values of $x_{2}$ . An example of the values of the random variables $x_{1}$ and $x_{2}$ is shown in Fig. 1.

Fig. 1. Values x₁, x₂ of random variables from a sample of initial statistical data V at n = 500 and σ = 0.5 (dark dots), and at σ = 2 (grey dots) when using dependencies of varying complexity
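Test data of this kind can be generated along the following lines. The specific choices here are hypothetical and are not taken from the article: the uniform range [0, 4] and the two-valued transformation $x_{2} = \pm \sqrt{x_{1}}$ merely stand in for one ambiguous dependence, and `generate_sample` is an illustrative name.

```python
import numpy as np

def generate_sample(n, sigma, seed=0):
    rng = np.random.default_rng(seed)
    # x1 follows a uniform law (the range [0, 4] is an assumption).
    x1 = rng.uniform(0.0, 4.0, n)
    # Hypothetical ambiguous (two-valued) dependence: a random branch of +/- sqrt(x1).
    branch = rng.choice([-1.0, 1.0], n)
    # Additive normal noise with zero mean and standard deviation sigma.
    x2 = branch * np.sqrt(x1) + rng.normal(0.0, sigma, n)
    return np.column_stack([x1, x2])
```

Calling `generate_sample(500, 0.5)` and `generate_sample(500, 2.0)` gives two samples that differ only in the noise level, mirroring the two settings named in the figure caption.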

When the independence hypothesis for a two-dimensional random variable is tested with the Pearson criterion, the number of sampling intervals is chosen using the optimal-selection result [22–24]

$N^{*} = {(\frac{3}{4} Δ_{1} Δ_{2} {\| p (x_{1}, x_{2}) \|}^{2} n)}^{1 / 2}$ .

Here ${\| p (x_{1}, x_{2}) \|}^{2} = \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} p^{2} (x_{1}, x_{2}) d x_{1} d x_{2}$ , and $Δ_{v}$ is the length of the range of values of the random variable $x_{v}$ , $v = 1, 2$ . The traditional formulas for discretising the range of values of random variables are given in [25–27].
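As a worked illustration of the formula for $N^{*}$ , the sketch below evaluates it directly. The function name and argument names are hypothetical; the squared norm of the density must be supplied by the caller (in practice it would have to be estimated).

```python
import math

def optimal_interval_count(delta1, delta2, sq_norm, n):
    # N* = sqrt(3/4 * delta1 * delta2 * ||p||^2 * n)
    return math.sqrt(0.75 * delta1 * delta2 * sq_norm * n)
```

For a density that is uniform on a rectangle with sides $Δ_{1}$ , $Δ_{2}$ the squared norm equals $1 / (Δ_{1} Δ_{2})$ , so the formula reduces to $N^{*} = \sqrt{3 n / 4}$ ; at $n = 500$ this is about 19.4.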

According to the computational experiment, the proposed methodology and Pearson's criterion are comparable when analysing ambiguous dependences between random variables at relatively small volumes of statistical data and noise standard deviations σ: both correctly detect the dependence of the random variables. The exception is the dependence shown in Fig. 1, a, for which the Pearson criterion does not detect dependence at n = 100 and σ $\in$ [0.5; 2]. As σ increases, the efficiency of both criteria decreases. This is explained by the nature of ambiguous dependences at large σ, when the scatter of the values of the random variables hides the dependence being sought. As the volume n of initial data increases, the efficiency of both criteria increases. This is expected, since the asymptotic properties of nonparametric probability density estimates, and of the frequencies with which random variables fall into two-dimensional intervals, improve as n grows. The proposed methodology has the advantage at small values of σ, for both limited and large n. At large n and σ, Pearson's criterion often has the advantage, provided that the procedure of optimal discretisation of the range of values of the two-dimensional random variable is used [22].

Application of the proposed methodology in analysing remote sensing data

The developed methodology was tested on remote sensing data [2; 28]. The objects of study are anthropogenic territories (a quarry and a suburban development) in the vicinity of the city of Krasnoyarsk. The initial information was formed from fragments of Sentinel-2 satellite imagery taken on 26.08.2021 (Fig. 2). The spectral channels $x_{j}$ , $j = \overline{1, 9}$ were used. These channels are characterised by the following wavelength ranges (nanometres): $x_{1}$ : 458–523; $x_{2}$ : 543–578; $x_{3}$ : 650–680; $x_{4}$ : 698–713; $x_{5}$ : 733–748; $x_{6}$ : 773–793; $x_{7}$ : 785–899; $x_{8}$ : 1565–1655; $x_{9}$ : 2100–2280.

Fig. 2. Fragments of Sentinel-2 satellite imagery. Anthropogenic territories: a, quarry; b, suburban development

The proposed methodology makes it possible to identify pairs of independent and dependent random variables by changing the ratio between their parameters. Applying it revealed 31 and 29 pairs of spectral features with strong linear dependence for the objects ‘quarry’ and ‘suburban development’, respectively. The results are presented in Fig. 3.

Fig. 3. Illustration of a strong linear relationship between pairs of spectral features (x_i, x_j) characterized by correlation coefficient estimates greater than 0.9: a, quarry; b, suburban development
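Selecting the feature pairs whose correlation coefficient estimate exceeds 0.9 is straightforward with NumPy. This is a generic sketch, not the processing chain used in the article; `strongly_correlated_pairs` is a hypothetical name, and the input is assumed to be an array with one row per pixel and one column per spectral feature.

```python
import numpy as np

def strongly_correlated_pairs(features, threshold=0.9):
    # Sample correlation matrix between columns (rowvar=False: columns are variables).
    r = np.corrcoef(np.asarray(features, dtype=float), rowvar=False)
    k = r.shape[0]
    # Keep each unordered pair once; the comparison follows the caption
    # literally (estimate greater than the threshold, not its absolute value).
    return [(v, j) for v in range(k) for j in range(v + 1, k) if r[v, j] > threshold]
```

The returned indices are zero-based column positions, so a pair `(0, 1)` corresponds to the features $(x_{1}, x_{2})$ .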

Additionally, non-linear dependences between spectral features were found for the object ‘quarry’

$(x_{1}, x_{9})$ , $(x_{1}, x_{8})$ , $(x_{1}, x_{7})$ , $(x_{1}, x_{5})$ , $(x_{1}, x_{4})$

and the object ‘suburban development’

$(x_{7}, x_{9}), (x_{4}, x_{9}), (x_{3}, x_{9}), (x_{2}, x_{9}), (x_{1}, x_{9}), (x_{1}, x_{8}), (x_{1}, x_{7})$ .

The obtained results are reliable for all pairs of spectral features, since the condition $|{\bar{ρ}}_{1} - {\bar{ρ}}_{2}|$ > $D_{β}$ is met at $D_{β}$ = 0.029, with risk $β$ = 0.025 of rejecting the hypothesis ${\bar{H}}_{0}$ that the values $ρ_{1}$ , $ρ_{2}$ are equal.

The problem of detecting anthropogenic areas from spectral data is considered next. The estimated error of recognising them in the space of spectral features $x = (x_{j}, j = \overline{1, 9})$ from the training sample $V = (x^{i}, σ (i), i = \overline{1, n})$ is 0.012, where $n = n_{1} + n_{2}$ , $n_{1}$ = 3377 (‘quarry’, $σ (i)$ = 1) and $n_{2}$ = 5049 (‘suburban development’, $σ (i)$ = 2). When, for example, the spectral features $(x_{4}, x_{5})$ , $(x_{5}, x_{6})$ or $(x_{4}, x_{5}, x_{6})$ are excluded from the training sample, the estimates of the pattern recognition error are 0.011, 0.01 and 0.008, respectively. This reduction is not statistically significant compared with the error estimate in the full feature space $x_{j}$ , $j = \overline{1, 9}$ . Nevertheless, the result justifies the possibility of reducing the number of spectral features when synthesising decision-making algorithms, which simplifies their optimisation.

Conclusion

The methodology for testing the hypothesis of independence of pairs of random variables, based on a nonparametric pattern recognition algorithm, bypasses the problem of discretising the range of values of random variables into multidimensional intervals. This problem is inherent in the generally accepted Pearson criterion. The conditions under which the proposed method and Pearson's criterion are applicable in the analysis of unambiguous and ambiguous dependences between random variables have been determined. Using graph theory, the proposed method is extended to the formation of sets of independent random variables. The results are generalised to testing the independence hypothesis for large volumes of statistical data by compressing the initial information, which increases the computational efficiency of the problems being solved by orders of magnitude. The effectiveness of the proposed methodology is confirmed by the analysis of remote sensing data on anthropogenic territories and the assessment of their states. When a set of spectral features contains pairs with strong linear dependence, the number of spectral features used to recognise anthropogenic territories can be reduced, with a decrease in the estimate of the probability of recognition error.

About the authors

Anna V. Sharueva

Reshetnev Siberian State University of Science and Technology

Author for correspondence.
Email: anna-denisyuk@yandex.ru
ORCID iD: 0009-0003-4255-4554

head of the remote sensing laboratory, assistant

Russian Federation, 31, Krasnoyarskii rabochii prospekt, Krasnoyarsk, 660037

References

1. Sinitsyna I. N. Akademik Pugachev Vladimir Semenovich: k stoletiyu so dnya rozhdeniya [Academician Vladimir Semenovich Pugachev: on the centenary of his birth]. Moscow, Torus Press Publ., 2011, 376 p.
2. Sharueva A. V., Lapko A. V., Lapko V. A. Neparametricheskiye metody proverki gipotez o raspredeleniyakh sluchaynykh velichin pri analize dannykh distantsionnogo zondirovaniya [Nonparametric methods for testing hypotheses about distributions of random variables in the analysis of remote sensing data]. Novosibirsk, SO RAN Publ., 2024, 189 p.
3. Lapko A. V., Lapko V. A. Testing the Hypothesis of the Independence of Two-Dimensional Random Variables Using a Nonparametric Algorithm for Pattern Recognition. Optoelectronics, Instrumentation and Data Processing. 2021, Vol. 57, No. 2, P. 149–155.
4. Lapko A. V., Lapko V. A., Bakhtina A. V. Study of the Method for Verification of the Hypothesis on Independence of Two-Dimensional Random Quantities Using a Nonparametric Classifier. Optoelectronics, Instrumentation and Data Processing. 2022, Vol. 57, No. 6, P. 639–648.
5. Parzen E. On estimation of a probability density function and mode. Annals of Mathematical Statistics. 1962, Vol. 33, No. 3, P. 1065–1076.
6. Epanechnikov V. A. [Non-parametric estimation of a multivariate probability density]. Theory of Probability & Its Applications. 1969, Vol. 14, No. 1, P. 156–161 (In Russ.).
7. Lapko A. V., Lapko V. A. Analysis of optimization methods for nonparametric estimation of the probability density with respect to the blur factor of kernel functions. Measurement Techniques. 2017, Vol. 60, No. 6, P. 515–522.
8. Lapko A. V., Lapko V. A. Yadernye otsenki plotnosti veroyatnosti i ikh primenenie [Kernel probability density estimates and their applications]. Krasnoyarsk, SibGU im. M.F. Reshetnev Publ., 2021, 208 p.
9. Rudemo M. Empirical choice of histogram and kernel density estimators. Scandinavian Journal of Statistics. 1982, No. 9, P. 65–78.
10. Bowman A. W. A comparative study of some kernel-based non-parametric density estimators. Journal of Statistical Computation and Simulation. 1982, Vol. 21, P. 313–327.
11. Hall P. Large-sample optimality of least squares cross-validation in density estimation. Annals of Statistics. 1983, Vol. 11, P. 1156–1174.
12. Sharakshane A. S., Zheleznov I. G., Ivnitskii V. A. Slozhnye sistemy [Complex systems]. Moscow, Vysshaya Shkola Publ., 1977, 248 p.
13. Lapko A. V., Lapko V. A., Bakhtina A. V. Formation of Sets of Independent Components of a Multidimensional Random Variable Based on a Nonparametric Pattern Recognition Algorithm. Measurement Techniques. 2021, Vol. 64, No. 9, P. 689–696.
14. Zenkov I. V., Lapko A. V., Lapko V. A., Kiryushina E. V., Vokin V. N. Nonparametric pattern recognition algorithm for testing a hypothesis of the independence of random variables. Computer Optics. 2021, Vol. 45, No. 5, P. 767–772.
15. Zenkov I. V., Lapko A. V., Lapko V. A., Kiryushina E. V., Vokin V. N., Bakhtina A. V. A method of sequentially generating a set of components of a multidimensional random variable using a nonparametric pattern recognition algorithm. Computer Optics. 2021, Vol. 45, No. 6, P. 926–933.
16. Lapko A. V., Lapko V. A., Sharueva A. V. Neparametricheskii algoritm raspoznavaniya obrazov v zadache formirovaniya naborov nezavisimykh sluchainykh velichin [A nonparametric pattern recognition algorithm in the problem of forming sets of independent random variables]. Informatika i sistemy upravleniya. 2024, Vol. 79, No. 1, P. 81–90 (In Russ.).
17. Lapko A. V., Lapko V. A. Regressionnaya otsenka plotnosti veroyatnosti i ee svoistva [Regression estimation of probability density and its properties]. Sistemy upravleniya i informatsionnye tekhnologii. 2012, No. 3-1 (49), P. 152–156 (In Russ.).
18. Lapko A. V., Lapko V. A. Regression estimate of the multidimensional probability density and its properties. Optoelectronics, Instrumentation and Data Processing. 2014, Vol. 50, No. 2, P. 148–153.
19. Lapko A. V., Lapko V. A., Bakhtina A. V. Application of a nonparametric pattern recognition algorithm to the problem of testing the hypothesis of the independence of variables of multi-valued functions. Measurement Techniques. 2022, Vol. 65, No. 1, P. 17–23.
20. Lapko A. V., Lapko V. A., Bakhtina A. V. Comparison of the Methodology for Hypothesis Testing of the Independence of Two-Dimensional Random Variables Based on a Nonparametric Classifier. Scientific and Technical Information Processing. 2023, Vol. 50, No. 6, P. 572–581.
21. Lapko A. V., Lapko V. A., Bakhtina A. V. Comparison of Methods for Testing the Hypothesis of Independence of Random Variables Based on a Nonparametric Classifier and Pearson's Chi-Squared Test. Optoelectronics, Instrumentation and Data Processing. 2023, Vol. 59, No. 5, P. 551–560.
22. Lapko A. V., Lapko V. A. Selection of the Optimal Number of Intervals Sampling the Region of Values of a Two-Dimensional Random Variable. Measurement Techniques. 2016, Vol. 59, No. 2, P. 122–126.
23. Lapko A. V., Lapko V. A. Discretization method for the range of values of a multi-dimensional random variable. Measurement Techniques. 2019, Vol. 62, No. 1, P. 16–22.
24. Lapko A. V., Lapko V. A. Estimation of parameters of the formula for optimal discretization of the range of values of a two-dimensional random variable. Measurement Techniques. 2018, Vol. 61, No. 5, P. 427–433.
25. Sturges H. A. The choice of a class interval. Journal of the American Statistical Association. 1926, Vol. 21, P. 65–66.
26. Heinhold I., Gaede K. W. Ingenieur-Statistik. München, Springler Verlag, 1964, 352 p.
27. Scott D. W. Multivariate Density Estimation: Theory, Practice, and Visualization. New Jersey, John Wiley & Sons, 2015, 384 p.
28. Lapko A. V., Lapko V. A., Bakhtina A. V. Application of a nonparametric procedure for testing the hypothesis about the independence of random variables given a large amount of statistical data. Measurement Techniques. 2024, Vol. 66, P. 744–754.
