Application of cluster analysis methods for dynamic search space adjustment in genetic algorithms

Ivan Pavlovich Malashin; Малашин Иван Павлович; Vadim Sergeevich Tynchenko; Тынченко Вадим Сергеевич

doi:10.31772/2712-8970-2025-26-3-318-333

Application of cluster analysis methods for dynamic search space adjustment in genetic algorithms

Autores: Malashin I.P.¹, Tynchenko V.S.²
Afiliações:
1. Bauman Moscow State Technical University (National Research University)
2. Reshetnev Siberian State University of Science and Technology
Edição: Volume 26, Nº 3 (2025)
Páginas: 318-333
Seção: Section 1. Computer Science, Computer Engineering and Management
##submission.datePublished##: 30.09.2025
URL: https://journals.eco-vector.com/2712-8970/article/view/692485
DOI: https://doi.org/10.31772/2712-8970-2025-26-3-318-333
ID: 692485

Citar

Texto integral

Resumo
Texto integral
Sobre autores
Bibliografia
Arquivos suplementares
Estatísticas

Resumo

This study investigates the application of cluster analysis techniques to improve the efficiency of genetic algorithms (GAs) in solving multidimensional global optimization problems, particularly those relevant to the aerospace industry. The research focuses on dynamic search space adjustment in GAs through statistical filtering of individual clusters. The proposed methodology involves: (1) developing a dynamic correction approach for variable domains by partitioning the population into clusters using both fixed-number clustering algorithms (k-means, k-medians, agglomerative, and spectral clustering) and density-based methods (DBSCAN); (2) evaluating cluster quality metrics including population size and average fitness; and (3) eliminating clusters that contribute insignificantly to the evolutionary process. The primary objective is to enhance algorithm convergence speed by 25–30 % (as demonstrated in benchmark testing) while maintaining solution quality in mixed optimization problems through effective search space adaptation at each iteration. The three-stage method comprises: (1) current population clustering, (2) elimination of clusters with below-average population size and fitness, and (3) dynamic boundary adjustment for remaining individuals' domains. Experimental results demonstrate the method's potential for integration into aerospace design systems, significantly reducing computation time while improving parameter optimization accuracy. Furthermore, the approach shows promise for hyperparameter optimization in various machine learning models, particularly in neural network architecture synthesis - including deep neural networks and specialized topologies.

Palavras-chave

evolutionary computation, global optimization, genetic algorithms, search space adaptation, cluster analysis

Texto integral

Introduction

Genetic algorithms (GAs) have proven to be a versatile tool for solving complex global optimisation problems in various fields of science and technology. In the rocket and space industry they are used for designing control systems [1], optimising the parameters of propulsion systems, aerodynamic shapes of wings and fairings, as well as flight trajectories and life support systems for spacecraft [2] and in many other tasks. The classical implementation of GAs supposes fixed search areas for each variable, which in the early stages can lead to excessive exploration of unpromising areas of n-dimensional space and slow the convergence process.

One of the priorities of the latest modifications has become reducing the amount of computation by allocating resources more economically and focusing on truly significant areas of the solution space. In order to achieve this, various approaches are used: the adaptation of parameters during evolution, training the selection of operator variants based on data from previous generations and dynamic search localisation that takes into account the current distribution of solutions. All of those things can allow accelerating convergence and increasing search efficiency without a significant increase in computational load.

This paper proposes a method for dynamic correction of the search area of a genetic algorithm, which differs from known methods in its implementation of an iterative population clustering procedure followed by the removal of unpromising clusters, which allows adapting effectively the search area and increasing convergence speed to the optimal solution. The essence of the method is that after each selection step, the population is divided into groups using one of the clustering algorithms. The article studies the application of five approaches: with a predetermined number of clusters – k-means, k-median, agglomerative and spectral clustering, as well as the DBScan approach with a dynamically determined number of clusters. For each cluster, the number of individuals and their average suitability are calculated. The clusters that simultaneously did not reach threshold values for the search space correction and stopping criteria are completely excluded from further search; and the variable ranges are adjusted taking into account the distribution of the remaining individuals.

The main aim of the study is increaing the convergence rate without the decline in the quality of the found solution, which is important for complex engineering calculations in the field of rocket and space system design. The article presents a mathematical formulation of the problem, description of the clustering and filtering procedure, as well as the results of computational experiments on standard test functions that have 25–30 % saving in computational resources compared to a classical GA.

Review of existing methods

In recent years, there have been many modifications for GAs aimed at increasing their flexibility and eliminating premature convergence to local extremes. One of the key trends is implementing an automatic parameter adaptation. Instead of hard-coded mutation and crossing probabilities, modern algorithms evaluate population “diversity” (e.g., through genetic entropy or dispersion) and dynamically change these probabilities. In a number of implementations, the mutation step size itself evolves along with the solution (self-adaptive approach), which eliminates the need for lengthy setup during the preparation phase and helps to avoid premature compression of the population.

Therewith, multi-population or ‘island’ GA models have become increasingly widespread. In a traditional GA, there is a single population, but in modern implementations researchers have begun to create several small subpopulations (‘islands’) at once, each of which evolves relatively autonomously [4]. Individuals periodically migrate between islands: the best individuals flow into other populations. This allows combining the advantages of the parallel exploration of different regions of the solution space and the exchange of high-quality genes [5].

The integration with local optimisation methods has a significant role in the development of GAs, having generated memetic algorithms. The idea is the following: a genetic operator of global search is supplemented by powerful procedural local ‘further research’, for example, gradient descents (where possible) or direct local search algorithms in the absence of information about derivatives. As a result, a GA performs a broad ‘scanning’ of the space; and local algorithms improve the suitability value of prospective individuals [6].

Aside from that, multi-criteria optimisation has become a significant trend for the development of GAs, since in real-world engineering and economic problems it is necessary to take into consideration several competing aims simultaneously. Although the classical methods NSGA-II [7], MOEA/D [8] and SPEA2 [9] have become well established, in recent years specialists have proposed new strategies for better distribution of Pareto front solutions. For example, the adaptive distribution of reference points makes it possible to represent different front areas more evenly; and a dynamic crowding disnance takes into account the local density of solutions: if the solutions are clustered too closely together, the algorithm automatically increases the separation distance, thereby encouraging exploration the areas that are less studied [10].

Surrogate modelling is actively developing, where objective function evaluations are replaced by predictive models (Gaussian Process, random forest, neural networks). In hybrid Surrogate-Assisted GA (SA-GA) schemes, data on the objective function is collected gradually: first, a simple surrogate is constructed, and then, as data accumulates, the algorithm retrains the model [11].

Finally, it is necessary to mention the integration of GAs with neuroevolutionary methods, where an algorithm not only optimises the weights of the neural network, but also its topology. The originally proposed NEAT (NeuroEvolution of Augmenting Topologies [12]) has acqired a new lease of life in combination with modern architectures (convolutional networks, transformers) [13]. In Deep Neuroevolution GA, a mutation operator can add or remove layers of the neural network, and crossover ‘crosses’ architectures, seeking to achieve a compromise between accuracy and compactness. Such approaches are being actively used in AutoML pipelines, where automatic neural network design reduces the time and effort required by engineers for manual setting.

Nevertheless, sufficient attention is not paid to the issues of search area correction [14], which confirms the relevance of the development of modern schemes for increasing the convergence speed of GAs through concentrating computational resources in promising regions.

Mathematical statement of the problem

The problem of the global optimisation of the multidimensional function of real variables on a limited domain is being considered. Let $\vec{X}$ be a vector of variables characterising the search space (1):

$\vec{X} = {x_{i}, i = 1, …, n},$ (1)

where n is the dimensionality of the problem (the number of variables in the objective function), x_i is the ^th variable. For each variable an initial domain is specified (2):

$x_{i} = D_{x_{i}} = [x_{i}^{l}, x_{i}^{h}],$ (2)

where $x_{i}^{l}$ and $x_{i}^{h}$ are lower and upper bounds of the range of the acceptable values for the x_i variable. These bounds define the initial hyperparallelepiped search area in the n-dimensional space.

In the classical versions of genetic algorithms [15; 16], it is assumed that the bounds of remain constant throughout the entire evolution process (3):

$D_{x_{i}} = c o n s t, \forall i = 1, \dots, n .$ (3)

Nevertheless, this assumption may be ineffective, since it does not allow taking into account the changing structure of decision distribution in the search process. This paper proposes and studies the method for dynamically correcting the search area of a genetic algorithm at each generation.

To formalise the proposed approach, let us introduce the following designation:

G is the total number of generations (iterations) of the genetic algorithm.

g is a current generation number, $g = {1,2, \dots, G}$ .

M is the number of individuals (decisions) in the population in each generation.

m is the index of a specific individual in the population, $m = {1,2, \dots, M}$ .

K is the number of clusters obtained after clustering the population.

k is a cluster index, $k = {1,2, \dots, K}$ .

$I n d_{m}^{g}$ is the vector of parameters for the m ^th individual in the g generation.

$F_{I n d_{m}^{g}}$ is the value of the fitness function for the $I n d_{m}^{g}$ individual..

$I n d S e t_{k}^{g}$ is the set of all individuals belonging to the k cluster in the g generation.

$M_{k}^{g} = |I n d S e t_{k}^{g}|$ is the number of individuals in the k cluster in the g generation.

Each k cluster in the g generation is matched its own domain for each x_i variable (Fig. 1), which will be denoted as (4):

$D_{x_{i}}^{g, k} = [x_{i}^{g, k; l}, x_{i}^{g, k; h}],$ (4)

Рис. 1. Схема кластеризации

Fig. 1. Clustering diagram

where $x_{i}^{g, k; l}$ and $x_{i}^{g, k; h}$ are current lower and upper bounds of the x_i variable in the k cluster in the g generation.

The complete search area for the $x_{i}$ variable on thegeneration is defined as the union of all clusters (5):

$D_{x_{i}}^{g} = \cup_{k = 1}^{K} D_{x_{i}}^{g, k} .$ (5)

If a cluster is considered to be ineffective (according to the criteria described below), the corresponding $D_{x_{i}}^{g, k}$ area is excluded from consideration, i.e. it can be assumed that (6):

$D_{x_{i}}^{g, k} = \emptyset .$ (6)

Clustering solutions

In order to icrease the efficiency of searching in the solution space, cluster analysis is used. It allows identifying the promising areas of high-quality solutions for concentrating computational resources. This work uses both methods with a predefined number of clusters (K-Means, K-Medoids, agglomerative and spectral clustering) and a density algorithm that automatically determines the number of groups (DBScan).

Methods with a fixed number of clusters are based on minimising intra-cluster dispersion
(K-Means), selecting medoids for robustness to outliers (K-Medoids), constructing a hierarchical dendrogram (agglomerative clustering), or using the spectral properties of a similarity matrix (spectral clustering). Their main limitation is the necessity to specify the number of groups in advance, which in practical tasks is selected using the ‘elbow method’ [17] or cross-validation based on the quality of solutions.

Density algorithms do not require the number of clusters to be specified in advance. DBScan identifies clusters as areas with a minimum number of neighbours within a distance that is not higher than ε (a parameter specified by a user), automatically discarding sparse points as noise.

Before clustering, it is necessary to normalise the data (bring each feature to a single scale, for example [0.1], and select a metric (Euclidean or Manhattan), since the difference in variable ranges and the method of calculating distances significantly influences the structure of clusters.

In the optimisation algorithm (Fig. 2), the results of cluster partitioning are used for local correction of search area boundaries. In each cluster, corrective offsets are calculated based on the distribution statistics of individuals, which allows narrowing or expanding the search area around promising solutions. Furthermore, the probability of the recombination of the nearest representatives increases within clusters, which helps preserve the local features of the found optima; and computational resources are dynamically redistributed in favour of the clusters with higher solution quality or convergence speed. This approach provides a balanced combination of exploring and using solution space and contributes to faster and more reliable convergence of the algorithm.

Рис. 2. Блок-схема: а – ГА с динамической коррекцией области поиска; b – процедуры динамической коррекции

Fig. 2. Block diagram: а – GA with dynamic correction of search area; b – block diagram of the dynamic correction procedure

Correcting cluster boundaries

After the clusters are formed, the boundaries of the search area in each cluster are corrected based on the analysis of the distribution of individuals within a cluster. Formally, the boundaries of the $D_{x_{i}}^{g, k}$ area are updated as follows (7):

$D_{x_{i}}^{g, k} = [x_{i}^{g, k; l} + Δ_{x_{i}^{g, k; l}}, x_{i}^{g, k; h} + Δ_{x_{i}^{g, k; h}}],$ (7)

where $Δ_{x_{i}^{g, k; l}} \in R$ and $Δ_{x_{i}^{g, k; h}} \in R$ are corrective values, based on which the initial boundaries can be either expanded or narrowed. They can be obtained based on the standard deviation of the values of the x_i variable within the cluster.

Thus, at each generation, the local adaptation of the search area boundaries is performed in order to concentrate computational resources on promising regions of the solution space.

Implementation and discussion

The method was tested on a set of four standard test functions of various dimensions [18]. The effectiveness of the approach is further illustrated using the example of four test functions: Ackley, Beale, Booth and Bukin № 6, as they form a representative sample for the comprehensive evaluation of the effectiveness of optimisation algorithms. The key advantage of this set of functions is that they have the most representative set of properties that pose the greatest complexity for optimisation algorithms.

The Ackley function represents a complex multimodal surface with a large number of local minima, which allows testing the algorithm's ability to avoid premature convergence to suboptimal solutions (8):

$f (x) = - 20 \exp (- 0.2 \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x_{i}^{2}}) - \exp (\frac{1}{n} \sum_{i = 1}^{n} \cos (2 π x_{i})) + 20 + e .$ (8)

The Beale function is highly sensitive to the choice of a starting point (9):

$F (x) = \sum_{i = 1}^{d} [{(1,5 - x_{2 i - 1} + x_{2 i - 1} x_{2 i})}^{2} + {(2,25 - x_{2 i - 1} + x_{2 i - 1} x_{2 i}^{2})}^{2} + {(2,625 - x_{2 i - 1} + x_{2 i - 1} x_{2 i}^{3})}^{2}],$

$x = (x_{1}, \dots, x_{2 d}) \in ℝ^{2 d} .$ (9)

The Booth function, having a relatively simple structure with a single clear-cut global minimum, serves as a test for the basic performance of optimisation methods (10):

$f (x, y) = {(x + 2 y - 7)}^{2} + {(2 x + y - 5)}^{2} .$ (10)

Thanks to the presence of gaps and characteristic ‘ravines,’ the Bukin № 6 function allows evaluating the stability of algorithms to extreme changes in gradient (11):

$f (x, y) = 100 \sqrt{|y - 0,01 x^{2}|} + 0,01 |x + 10| .$ (11)

Fig. 3 shows the contour plots of the above test functions for the case of two variables.

Рис. 3. Изолинии тестовых функций

Fig. 3. Contour plots of test functions

Fig. 4 shows the visualisation of the effectiveness evaluation of the proposed approach using the examples of the above functions for the Ackley, Beale, Booth and Bukin № 6 functions, respectively. The choice is determined by the different nature of these test problems: the Ackley function is traditionally used as a benchmark for testing the scalability of optimisation methods in high-dimensional space; the multidimensional modification of the Beale function allows evaluating the behaviour of the algorithm at medium dimensions with the clear-cut dependence on initial conditions; The Booth and Bukin No. 6 functions are two-dimensional by definition and are used for basic validation of the correctness and stability of the method under conditions of simple (for Booth) and rapidly changing (for Bukin No. 6) destination surface relief. The figure shows the results of applying five clustering methods: k-means, k-medians, DBSCAN, agglomerative clustering and moving average from top to bottom, respectively.

Рис. 4. Диаграммы межквартильного размаха результатов экспериментов. Представлены результаты применения пяти методов кластеризации: k-средних (k-means), k-медианных (k-medians), DBSCAN, агломеративной кластеризации и скользящего среднего (от верхнего к нижнему ряду, соответственно)

Fig. 4. Interquartile (IQR) range diagrams of the experimental results. The figure shows the outcomes of five clustering methods: k-means, k-medians, DBSCAN, agglomerative clustering, and moving average (from top to bottom row, respectively)

It can be seen from Fig. 4 that for the complex multidimensional functions (Ackley, Beale), the best results are achieved when using 3–6 clusters, where the median values of the number of iterations are lower and the spread is smaller. For the two-dimensional functions (Booth, Bukin №. 6), the influence of the parameter is minimal, thus 1–2 clusters are sufficient without any noticeable loss of quality.

Based on the algorithm runs on the two-dimensional test functions described above, it was discovered that the optimal range of clusters for most tasks was between 3 and 6. The k-means method was used to analyse the multidimensional test functions, ensuring stable point separation and accelerating algorithm convergence.

Table 1 shows the average number of evluations of objective functions with standard deviation for different dimensions and numbers of clusters. The functions are selected from a standard set of test functions that can be generalised to higher-dimensional spaces.

Table 1. Average number of evaluations of an objective function (± standard deviation) for the multidimensional test functions with different dimensions and numbers of clusters

Function	Dimension	3 clusters	4 clusters	5 clusters	6 clusters
Sphere	10	1187 ± 132	2295 ± 214	2478 ± 289	2221 ± 367
	20	2644 ± 305	5548 ± 412	4822 ± 497	4133 ± 721
	50	4758 ± 623	10187 ± 807	11643 ± 1002	11387 ± 1154
	100	9442 ± 1189	8921 ± 1030	8737 ± 1655	9845 ± 2033
Rosenbrock	10	3368 ± 247	3124 ± 301	4192 ± 382	4087 ± 472
	20	5278 ± 493	4842 ± 615	5217 ± 786	6462 ± 912
	50	8956 ± 1044	8682 ± 1218	8447 ± 1659	8973 ± 1187
	100	10671 ± 2047	11348 ± 1965	10582 ± 2954	12791 ± 3092
Rastrigin	10	4142 ± 198	4084 ± 153	4226 ± 433	4211 ± 498
	20	6637 ± 527	6438 ± 642	6582 ± 804	6276 ± 986
	50	9672 ± 1810	9452 ± 1551	9911 ± 2792	9374 ± 2025
	100	16941 ± 2647	15482 ± 3082	12653 ± 3467	13278 ± 3065
Ackley	10	3885 ± 107	3851 ± 286	3682 ± 154	3718 ± 449
	20	9154 ± 554	9247 ± 718	10368 ± 601	9012 ± 1802
	50	15934 ± 1087	17614 ± 2762	15761 ± 2634	16286 ± 2089
	100	22184 ± 2794	20378 ± 3358	19756 ± 3841	20187 ± 2712
Stybinsi-Tang	10	4082 ± 103	3738 ± 214	4187 ± 478	3724 ± 251
	20	6326 ± 389	6843 ± 518	7624 ± 967	6557 ± 842
	50	10762 ± 895	10574 ± 1128	11198 ± 1376	10867 ± 1582
	100	20468 ± 1836	21328 ± 1215	21642 ± 2048	18124 ± 3096

It can be seen from Table 1 that in most cases, the smallest number of evaluations of an objective function is achieved with 3–4 clusters, especially for low dimensions. For high dimensions, the effect of the number of clusters is less clear-cut, but the range of 3–6 clusters remains effective.

The material from the used test functions shows that the correct selection of the k number of clusters can dramatically change the number of generations required to converge to the global optimal solution with the proposed modification. In all the cases, the key factors are:

complexity of the objective function landscape (presence of local minima and their density, narrow or wide attraction zones);
adequacy of the choice of k (small k does not allow identifting all ‘hot’ areas, while too large k leads to the overuse of resources);
The efficiency of the procedure for updating search area boundaries in each cluster ensures that the algorithm quickly shifts its focus to promising regions.

This balance between exploring the solution space and using the results of the study makes the proposed modification converge faster than the standard approach.

In addition, for the comparison of the effectiveness of the GA modification proposed in this article, a series of computational experiments were conducted on a set of CEC 2017 test functions [19]. In the experiments, four combinations of algorithms were considered:

standard GA without modifications;
standard GA with the proposed modification;
algorithm SelfCSHAGA (Self-Configuring Success History-based Adaptation Genetic Algorithm) [20];
algorithm SelfCSHAGA with the proposed modification.

In the experimental protocol, test functions from the CEC'2017 set were used. For each function, the parameters were selected in order to ensure the maximum possible dimension of the task while maintaining adequate computational complexity and stability of the algorithm. In particular, for most functions, a dimension of 50 variables and a population size of 200 individuals were used. For more computationally complex functions (for example, f 4, f 5, f 7, f 8), the dimension was reduced to 20 or 10. It should be noted that the f 9 function proved to be sensitive to dimension: when the dimension was increased to 10, the SelfCSHAGA algorithm worked similarly to the standard GA. Nevertheless, with a dimension of 2, the algorithm found the optimum rather quickly.

The number of iterations was also adjusted depending on the complexity of the function and the dimension of the search area: from 100 iterations (for the f 9) to 2000 iterations (for the f 2). Table 2 presents the parameters of the experiments of the standard GA and SelfCSHAGA algorithm on the CEC'2017 test functions. For each function, the task dimension, population size, number of iterations, search boundaries, and discretisation step are indicated. It should be noted that on the functions f 1, f 2 and f 10, the standard GA failed to find the optimum, while the SelfCSHAGA demonstrated successful performance.

Table 2. The parameters of the experiments for the GA and SelfCSHAGA (sSHAGA) on the CEC’2017 functions

Func.	Dimension D (GA)	Dimension D (sSHAGA)	Generations (GA)	Generations (sSHAGA)	Population (GA)	Population (sSHAGA)	Search boundaries	Optimal value
f₁	–	50	–	1100	–	200	[–100, 100]	100
f₂	–	50	–	2000	–	200	[–100, 100]	200
f₃	20	50	4500	1000	100	200	[–100, 100]	300
f₄	10	20	3500	1000	100	200	[–100, 100]	400
f₅	10	20	3500	1300	200	200	[–100, 100]	500
f₆	20	50	2500	300	100	200	[–100, 100]	600
f₇	10	20	2500	500	100	400	[–50, 50]	700
f₈	10	10	3500	1100	300	100	[–50, 50]	800
f₉	2	2	100	100	100	50	[–10, 10]	900
f₁₀	–	10	–	2500	–	200	[–100, 100]	1000

Using the same settings, the algorithms with the modifications proposed in this article were tested. A K-means clusterizer with 3 to 6 clusters was used. The analysis of the distribution of the number of objective function evaluations was performed based on the interquartile range. For the standard GA, there is variability in computational efforts between functions (Figs. 5 and 6).

Рис. 5. Диаграммы межквартильного размаха для функций f 3–f 9. По оси ординат показано количество вычислений целевой функции. Результаты получены с использованием стандартного генетического алгоритма

Fig. 5. Boxplot diagrams for functions f 3–f 9. The y-axis shows the number of objective function evaluations. The results were obtained using the standard genetic algorithm

Рис. 6. Диаграммы межквартильного размаха для функций f 3–f 9. Каждый столбик соответствует числу кластеров (3, 4, 5, 6), сгруппированных по функциям. По оси ординат показано количество вычислений целевой функции. Результаты получены с использованием стандартного генетического алгоритма с модификацией, предложенной в настоящей диссертации

Fig. 6. Boxplot diagrams for functions f 3–f 9. Each bar corresponds to the number of clusters (3, 4, 5, 6), grouped by functions. The y-axis shows the number of objective function evaluations. The results were obtained using the standard genetic algorithm with the modification proposed in this dissertation

For example, the functions f 3–f 8 demonstrate high median values and interquartile range widths, which indicates scattered consumption of computing resources. The algorithm variant with the modification proposed in the dissertation, including K-means clustering with 3 to 6 clusters, made it possible to reduce the dispersion of the number of evaluations and shift the median towards smaller values. Thus, the generation of individuals within clusters ensured a directed distribution of the search across the solution space, which reduced the number of evaluations required to achieve an approximate optimum.

The analysis of distributing the number of the obective function evaluations for the SelfCSHAGA algorithm (Figs. 7 and 8) shows high stability and low variability compared to the standard GA.

Рис. 7. Диаграммы межквартильного размаха для функций f 1–f 10. По оси ординат показано количество вычислений целевой функции. Результаты получены с использованием алгоритма SelfCSHAGA

Fig. 7. Boxplot diagrams for functions f 1–f 10. The y-axis shows the number of objective function evaluations. The results were obtained using the SelfCSHAGA algorithm

Рис. 8. Диаграммы межквартильного размаха для функций f 1–f 10. Каждый столбик соответствует числу кластеров (3, 4, 5, 6), сгруппированных по функциям. По оси ординат показано количество вычислений целевой функции. Результаты получены с использованием алгоритма SelfCSHAGA с модификацией, предложенной в настоящей статье

Fig. 8. Boxplot diagrams for functions f 1–f 10. Each bar corresponds to the number of clusters (3, 4, 5, 6), grouped by functions. The y-axis shows the number of objective function evaluations. The results were obtained using the SelfCSHAGA algorithm with the modification proposed in this paper

The use of K-means clustering (number of clusters from 3 to 6) further reduces the spread of calculations. By distributing the population across clusters, the algorithm explores solution space more effectively, especially for the functions with high computational complexity (f 3–f 8), reducing the number of iterations required to reach an approximate optimum and narrowing the interquartile range. For example, for the f 5 and f 7, the modified algorithm consumes significantly fewer computations than the standard SHAGA, while the IQR is reduced reflecting the more predictable and economical operation of the algorithm.

Conclusion

The study of the proposed approach showed that integrating the method of dynamic correction of the search area based on clustering into GA allows reducing the number of the objective function evaluations significantly due to more accurate localisation of areas with promising solutions. The correction of search area boundaries based on statistics of the distribution of individuals in clusters demonstrates high efficiency in combination with the dynamic redistribution of computational resources.

The numerical experiments on a representative set of test functions confirmed that the effective number of clusters k is determined by the topology of the objective function and can vary over a wide range. Small values of k simplify landscape modelling but increase the risk of missing individual extrema, while an excessively large k leads to an increase in the number of calculations. Choosing an effective value of k (usually 3–6) provides the best balance between exploring the solution space and using the results of the exploration, reducing the number of generations to convergence by sometimes tens of times compared to the standard approach.

These results confirm the relevance and utility of the proposed approach and justify its further development. In particular, it would be advisable to study the possibility of automatic adaptation of k during the optimisation process, as well as to expand the list of clustering methods used by adding the HDBSCAN algorithms [21] and Kohonen self-organising maps [22]. Furhtermore, a promising direction for development is the combination of density and graph approaches for the more flexible control of search boundaries in complex multidimensional problems.

Sobre autores

Ivan Malashin

Bauman Moscow State Technical University (National Research University)

Autor responsável pela correspondência
Email: ivan.p.malashin@gmail.ru
ORCID ID: 0009-0008-8986-402X

Software engineer

Rússia, 5, building 1, 2-ya Baumanskaya St., Moscow, 105005

Vadim Tynchenko

Reshetnev Siberian State University of Science and Technology

Email: vadimond@mail.ru
ORCID ID: 0000-0002-3959-2969

Dr. Sc., Associate Professor

Rússia, 31, Krasnoyarskii rabochii prospekt, Krasnoyarsk, 66003

Bibliografia

Semenkin E., Semenkina M. Spacecrafts' control systems effective variants choice with self-configuring genetic algorithm // ICINCO 2012 – Proceedings of the 9th International Conference on Informatics in Control, Automation and Robotics. Rome, 28–31 July 2012. 2012, Vol. 1, P. 84–93.
Gamot J. et al. Hidden-variables genetic algorithm for variable-size design space optimal layout problems with application to aerospace vehicles. Engineering Applications of Artificial Intelligence. 2023, Vol. 121, P. 105941.
Srinivas M., Patnaik L. Adaptive probabilities of crossover and mutation in genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics. 1994, Vol. 24, No. 4, P. 656–667.
Liu C., Liu K. An asynchronous island-model genetic algorithm for large-scale global optimization. Applied Soft Computing, 2019, Vol. 76, P. 637–649.
Izzo D., Rucinski M., Ampatzis C. Parallel global optimisation meta-heuristics using an asynchronous island-model. Proc. 2009 IEEE Congress on Evolutionary Computation, IEEE, 2009, P. 2301–2308.
Gendreau M., D’Amours G., Rousseau S. Memetic algorithms: Hybrid genetic algorithms and local search methods for combinatorial optimization. Computers & Operations Research, 2020, Vol. 112, P. 104778.
Verma S., Pant M., Snasel V. A comprehensive review on NSGA-II for multi-objective combinatorial optimization problems. IEEE access. 2021, Vol. 9, P. 57757–57791.
Wang Z. et al. Adaptive replacement strategies for MOEA/D. IEEE transactions on cybernetics. 2015, Vol. 46, No. 2, P. 474–486.
Bleuler S. et al. Multiobjective genetic programming: Reducing bloat using SPEA2. Proceedings of the 2001 Congress on Evolutionary Computation (IEEE Cat. No. 01TH8546). IEEE, 2001, Vol. 1, P. 536–543.
Jain L., Sinha A. Adaptive reference point selection in NSGA-III for many-objective optimization. Applied Soft Computing. 2021, Vol. 105, P. 107237.
Shi Y., Song Y., Sun W. Surrogate-assisted multi-objective genetic algorithm for computationally expensive problems. Applied Soft Computing. 2021, Vol. 111, P. 107637.
Galván E., Mooney P. Neuroevolution in deep neural networks: Current trends and future challenges. IEEE Transactions on Artificial Intelligence. 2021, Vol. 2, No. 6, P. 476–493.
Papavasileiou E., Cornelis J., Jansen B. A systematic literature review of the successors of “neuroevolution of augmenting topologies”. Evolutionary Computation. 2021, Vol. 29, No. 1, P. 1–73.
Katoch S., Chauhan S. S., Kumar V. A review on genetic algorithm: past, present, and future. Multimedia Tools and Applications. 2021, Vol. 80, P. 8091–8126.
Alhijawi B., Awajan A. Genetic algorithms: Theory, genetic operators, solutions, and applications. Evolutionary Intelligence. 2024, Vol. 17, No. 3, P. 1245–1256.
Sohail A. Genetic algorithms in the fields of artificial intelligence and data sciences. Annals of Data Science. 2023, Vol. 10, No. 4, P. 1007–1018.
Schubert E. Stop using the elbow criterion for k-means and how to choose the number of clusters instead. ACM SIGKDD Explorations Newsletter. 2023, Vol. 25, No. 1, P. 36–42.
Jamil M., Yang X. S., Zepernick H. J. Test functions for global optimization: a comprehensive survey. Swarm intelligence and Bio-inspired Computation. 2013, P. 193–222.
Wu G., Mallipeddi R., Suganthan P. N. Problem definitions and evaluation criteria for the CEC 2017 competition on constrained real-parameter optimization. National University of Defense Technology, Changsha, Hunan, PR China and Kyungpook National University, Daegu, South Korea and Nanyang Technological University, Singapore, Technical Report. 2017, Vol. 9, P. 2017.
Sherstnev P. A., Semenkin E. S. [SelfCSHAGA: Self-configuring genetic optimization algorithm with adaptation based on success history]. Vestnik MGTU im. N. E. Baumana. Ser. Priborostroenie. 2025, No. 2 (151), P. 122–139 (In Russ.).
Stewart G., Al-Khassaweneh M. An implementation of the HDBSCAN* clustering algorithm. Applied Sciences. 2022, Vol. 12, No. 5, P. 2405.
Gorokhovatskyi V. et al. Application a committee of Kohonen neural networks to training of image classifier based on description of descriptors set. IEEE Access, 2024.

Arquivos suplementares

Ação

1. JATS XML

Baixar

2. Fig. 1. Clustering diagram

Baixar (896KB)

Metadados

3. Fig. 2. Block diagram: а – GA with dynamic correction of search area; б – block diagram of the dynamic correction procedure

Baixar (99KB)

Metadados

4. Fig. 3. Contour plots of test functions

Baixar (564KB)

Metadados

5. Fig. 4. Interquartile (IQR) range diagrams of the experimental results. The figure shows the outcomes of five clustering methods: k-means, k-medians, DBSCAN, agglomerative clustering, and moving average (from top to bottom row, respectively)

Baixar (316KB)

Metadados

6. Fig. 5. Boxplot diagrams for functions f 3–f 9. The y-axis shows the number of objective function evaluations. The results were obtained using the standard genetic algorithm

Baixar (113KB)

Metadados

7. Fig. 6. Boxplot diagrams for functions f 3–f 9. Each bar corresponds to the number of clusters (3, 4, 5, 6), grouped by functions. The y-axis shows the number of objective function evaluations. The results were obtained using the standard genetic algorithm with the modification proposed in this paper

Baixar (121KB)

Metadados

8. Fig. 7. Boxplot diagrams for functions f 1–f 10. The y-axis shows the number of objective function evaluations. The results were obtained using the SelfCSHAGA algorithm

Baixar (96KB)

Metadados

9. Fig. 8. Boxplot diagrams for functions f 1–f 10. Each bar corresponds to the number of clusters (3, 4, 5, 6), grouped by functions. The y-axis shows the number of objective function evaluations. The results were obtained using the SelfCSHAGA algorithm with the modification proposed in this paper

Baixar (159KB)

Metadados

10. Fig. 1. Clustering diagram

Baixar (125KB)

Metadados

11. Fig. 2. Block diagram: а – GA with dynamic correction of search area; b – block diagram of the dynamic correction procedure

Baixar (84KB)

Metadados

12. Fig. 3. Contour plots of test functions

Baixar (206KB)

Metadados

13. Fig. 4. Interquartile (IQR) range diagrams of the experimental results. The figure shows the outcomes of five clustering methods: k-means, k-medians, DBSCAN, agglomerative clustering, and moving average (from top to bottom row, respectively)

Baixar (489KB)

Metadados

14. Fig. 5. Boxplot diagrams for functions f 3–f 9. The y-axis shows the number of objective function evaluations. The results were obtained using the standard genetic algorithm

Baixar (99KB)

Metadados

15. Fig. 6. Boxplot diagrams for functions f 3–f 9. Each bar corresponds to the number of clusters (3, 4, 5, 6), grouped by functions. The y-axis shows the number of objective function evaluations. The results were obtained using the standard genetic algorithm with the modification proposed in this dissertation

Baixar (125KB)

Metadados

16. Fig. 7. Boxplot diagrams for functions f 1–f 10. The y-axis shows the number of objective function evaluations. The results were obtained using the SelfCSHAGA algorithm

Baixar (91KB)

Metadados

17. Fig. 8. Boxplot diagrams for functions f 1–f 10. Each bar corresponds to the number of clusters (3, 4, 5, 6), grouped by functions. The y-axis shows the number of objective function evaluations. The results were obtained using the SelfCSHAGA algorithm with the modification proposed in this paper

Baixar (157KB)

Metadados

Nome de usuário
Senha
Lembrar usuário

Esqueceu a senha?	Cadastro

Nome de usuário
Senha
Lembrar usuário

Esqueceu a senha?	Cadastro