GREEDY HEURISTIC METHOD FOR LOCATION PROBLEMS


Cite item

Full Text

Abstract

Authors consider multi-source location problems, k-means, k-median and k-medoid. Such problems are popular models in logistics (optimal location of factories, warehouses, transportation hubs etc.) and cluster analysis, approximation theory, estimation theory, image recognition. Various distance metrics and gauges allow using these models for clustering various kinds of data: continuous and discrete numeric data, Boolean vectors, documents. Wide area of application of such problems leads to growing interest of researchers in Russia and worldwide. In this paper, the authors propose a new heuristic method for solving such problems which can be used as a standalone local search method (local search multi-start) or as the main part of a new algorithm based on ideas of the probability changing method. For the parameters self-tuning of such algorithm, the authors propose new meta-heuristic which allows using new algorithm without learning specific features of each solved problem. Algorithms were tested on various data sets of size up to 160000 data vectors from the UCI repository and real data of semiconductor devices examination. For testing purposes, various distance metrics were used. Computational experiments showed the high efficiency of new algorithms in comparison with local search methods used traditionally for the considered problems. In addition, results were compared with the evolutionary methods and a deterministic algorithm based on the Information Bottleneck Clustering method. Such comparison illustrated the ability of new algorithms to reach higher preciseness of the results in reasonable time.

Full Text

Introduction. The k-means problem is one of the classical problems of the continuous location theory [1-3]. The aim of this problem is searching for k points (cluster centers, centroids, medoids) in a d-dimensional space such that the sum of distances from each of given points called demand points, data vectors etc. reaches its minimum: (1) Squared Euclidean distances and distance metrics based on the Euclidean norm is most commonly used as the distance gauge [1]. However, usage of the rectilinear (rectangle, Manhattan) metric [4] allows to obtain results (coordinates of each center) of the same accuracy (quantity of decimal digits) as given data vectors. In this case, value of each coordinate coincides with one of corresponding coordinates of data vectors [3; 5]. Moreover, using rectilinear metric reduces the influence of the outliers represent some non-standard data vectors or improperly measured values. Another approach leading to the results of the same accuracy as the initial data is solving k-medoids problem [6; 7]. In this case, searching for cluster centers (points minimizing the total distance) is performed on the set of data vectors only. In (1), we have a problem with “classical” squared Euclidean distance. However, other metrics or gauges can be used. Let us denote the distance between X and Y as L(X, Y). The k-means problem and the k-medoids problem are special cases of the p-median (k-median) problem: (2) Here, wi > 0 are weight coefficients. In the case of k-means problem, all wi = 1. The k-medoids problem can be stated as follows: Known methods. The most commonly used algorithms for solving k-means problem include the ALA procedure (Alternating Location-Allocation) which repeats two simple steps: Algorithm 1. ALA procedure. Required: data vectors A1…AN, k initial cluster centers X1…Xk. 1. For each center Xi, define its cluster Ci as the subset of data vectors having closest center Xi. 2. For each cluster Ci, recalculate its center Xi: 3. Repeat from Step 1 is steps 1, 2 made any changes. This procedure can be used for solving continuous k-median, k-means and k-medoids problems. In the general case of k-medoids problem, for calculating new cluster center, enumeration of all cluster members is needed. Faster similar local search procedures exist [8-10], however, they do not guarantee an exact solution. The “classical” k-means method with squared Euclidean distances (l22) has an important advantage. In this case, new cluster center recalculation is a simplest task solved in a single step. An average (weighted average) value of each coordinate in the cluster is the corresponding coordinate value of the center [1]. If ith cluster center Xi = (xi,1, …, xi,d) is a d-dimensional vector and data vectors Aj = (ai,1, …, aj,d), are also d-dimensional vectors then new center of ith cluster for the k-median problem is [1]: . If the ALA procedure runs with the rectilinear metric (l1) then the value of each cluster center coordinate is calculated separately as median (weighted median) value of the corresponding coordinate of data vectors which are cluster members. Such procedure can be described as follows. Algorithm 2. Calculating ith cluster center (median) with the l1 metric. 1. For each : 1.1. Arrange data vectors in ascending order. Denote such sequence . Here is the power of a set. 1.2. Calculate Store . Here, square brackets mean integer part. 1.3. Repeat 1. 2. Return Except several special cases, k-means, k-medoids and k-median problems are NP hard problems requiring global search [11]. The result of the ALA procedure depends on selection of the initial centers. Usually, initial centers are selected among data vectors. The k-means++ procedure [12] is preferred in comparison with the chaotic selection of the initial centers. The k-means++ procedure guarantees accuracy O(log(p)). However, such accuracy cannot be acceptable for most practically important problems. In this case, various initial centers recombination techniques are used. Authors propose many techniques optimizing the ALA procedure: sampling [13] (solving a simplified problem on a randomly selected subset of data vectors and using the result as the initial solution for solving the initial problem), various streaming algorithms for running on big data [14] etc. Dependence of the ALA and other local search procedures on the given initial data embarrasses the reproducibility of the results: the same data vectors can belong to different clusters or the same cluster depending on chosen initial centers. Thus, problem of developing an algorithm resulting in precise and stable result is needed. Perfect results of clustering and classification problems can be obtained by using the Information Bottleneck Clustering method (IBC) [15]. Running this algorithm starts with considering each data vector as a separate cluster. Then superfluous clusters are eliminated one by one unless we have k clusters. Each time, this algorithm eliminates the cluster center which gives minimal total distance increase after its elimination. Such algorithms are extremely slow [15]. Genetic algorithms with greedy heuristic [16] use the same principles. However, such algorithms developed initially for solving the k-median problems on a network are compromise methods which allow solving bigger problems. Versions of such algorithms described in article [17] can be used for solving continuous problems. The approach can be described as follows [2]. Algorithm 3. Genetic algorithm with greedy heuristic for k-median problems. Required: Population size Np. 1. Form (randomly with uniform distribution or using the k-means++ procedure) Np different initial solutions . Each of such solutions is a subset of power k used as an initial solution of the ALA procedure. Here and after, for each of the initial solutions, the fitness function is evaluated by Algorithm 4. Resulting values are stored in variables 2. If the stop condition is reached the STOP. The result is the initial solution having the minimal corresponding value fi. For finding the final solution, Algorithm 1 runs again. 3. Select randomly two indexes . 4. Form an interim solution . 5. If then go to step 7. 6. Calculate . Eliminate j* from : Go to step 5. 7. If then go to step 2. 8. Select index as follows. Select randomly two indexes . If then else . 9. Replace and corresponding fitness function value: , . Go to step 2. Steps 5, 6 of this algorithm realize so called greedy heuristic eliminating sequentially the centers from the interim solution. Analogous greedy heuristic was proposed in 1963 by Kuehn and Hamburger [18]. The Information Bottleneck Clustering method is based on the same principle of sequential cluster elimination [15]. Both method of Kuehn and Hamburger and the IBC method form the initial unfeasible solution as the set of all data vectors. The fitness function can be evaluated as follows: Algorithm 4. Calculating fitness function . Required: initial solution . 1. Run Algorithm 1 from initial centers , form set of centers. 2. Return Using this algorithm requires much computational resources. An alternative approach [17] is using the total distance (2) immediately as the fitness function in Algorithm 3. In this case, step 1 of Algorithm 4 is omitted. Actually, this approach solves the k-medoid problem and uses its solution as the initial centers set of the ALA procedure at the final iteration of the greedy heuristic. Such approach reduces the computational complexity, however, it reduces the accuracy. An indisputable advantage of the Information Bottleneck Clustering method is its determinancy: this method does not use any random values, thus, each run results in the same result. In general case, if an algorithm uses random values, the precise reproduction of the results is impossible. Using the IBC method slows down the computation utterly. Thus, development of an algorithm giving sufficiently precise results is important for solving many practical problems. In [3], authors propose a modification of the greedy heuristic used at step 6 of Algorithm 3. In this case, sets of points representing cluster centers in d-dimensional space are used as an alphabet of the genetic algorithm instead of the sets of data vector indexes. Authors call such modification Genetic Algorithm with Floating Point Alphabet. Results of the genetic algorithm, with this new heuristic are significantly more precise than the results of the modification proposed in [17] (k-problem solution, in fact). At the same time, iterations of Algorithm 3 with this heuristic can be performed faster in comparison with using Algorithm 4. Moreover, abrupt decrease of the computational complexity of step 6 of Algorithm 3 allows using this algorithm for large-scale problems: in [3], authors present results for problems up to 560000 data vectors. Such heuristic can be described as follows [3]. Algorithm 5. Greedy heuristic for the genetic algorithm with floating point alphabet used instead of steps 4-7 of Algorithm 3). Required: set of data vectors , quantity k of clusters, two “parent” sets of centers and , parameter α. 1. Set . Run ALA procedure from initial set of centers . Store the result to . 2. If then start the ALA procesure from the initial solution then STOP and return the result . 3. Calculate distance from each data vector to the nearest center in : Form clusters around centers : Calculate distances from each data vector to the second closest center in set : 4. For ezch center , calculate 5. Calculate . Sort values in ascending order and select subset of data vectors with minimal values . 6. For each : if : then remove Xj from set . Here, . 7. Store . 8. Rearrange the data vectors to the closest centers: 9. For each : if : Ci = X and then recalculate center X* of cluster . Store . 10. Go to step 2. Here, an importan parameter α determines the part of superfluous centers eliminated in a single iteration. Autrors [3] propose value 0.2. Higher values accelerate the algorithm but reduce the preciseness. Small values lead to eliminating the centers one-by-one in accordance with Algorithm 4. In this paper, we used value 0.25. Iterations of this heuristic combine the fast greedy heuristic [16; 17; 19] eliminating up to 20-25 % superfluous centers with an iteration of the modified ALA procedure. Algorithm 4 requires p(k0-k) runs of the ALA procedure (here, k0 is quantity of initial centers), Algorithm 5 reduces number of iterations down to O(log(k0-k)). Moreover, each iteration does not require running the whole ALA procedure. Instead, only its separate optimized steps (location - allocation) are performed. Obviously, if k0 = N then Algorithm 5 is similar with the Information Bottleneck Clustering: number of centers (and clusters) at the first iteration coincides with the number of data vectors. Moreoved, if k0 = N is not random. Thus, a simple deterministic algorithm can be constructed [20]. Algorithm 6. Deterministic algorithm with greedy heuristic. Required: set of data vectors , number of clusters k, parameter α. 1. Set . 2. Start Algorithm 5 from step 2. Table shows somparison of results of various algorithms. This algorithm is deterministic but not very precise. New algorithms. Deterministic Algorithm 6 is a special case of Algorithm 6 with initial value . The crossover procedure of the genetic algorithm with floating point (Algorithm 5) starts from a joint interim solution . If k << N then in most cases (at least at the initial iterations of the GA), elements of the “parent” sets and do not coincide and Algorithm 5 has at its initial iteration. In [21; 22], authors propose various efficient modifications of the greedy heuristic for genetic algorithm with partial joining of the “parent” sets. In this case, . The GA with recombination of fixed length subsets [10] operates sets of power k only. All listed algorithms are most efficient for different problem classes. Thus, depending on problem class, various power of the initial solution can be optimal. Let us denote quantity of centers in the initial solution as kinit = and a ratio of the superfluous centers to kinit as . Greedy heuristic in its original form (steps 4-7 of Algorithm 3) and modified form with floating point alphabet (Algorithm 5) can be used as a standalone method for solving the problem. An algorithm can be described as follows. Algorithm 7. Greedy heuristic method with multistart. Required: sed of data vectors , number k of clusters, parameters α and β. 1. Calculate kinit . Select randomly subset of power kinit. 2. Start Algorithm 5 from step 2. 3. Check the stop conditions and repeat from step 1. Such approach requires indicating the value of parameter β. Results given in table (see algorithm “GH”) show the importance of this parameter. Let us consider a known method of solving pseudo-Boolean optimization problems, Probability Changing Method [23-25]. The main idea of this method is selecting elements from some given set randomly in accordance with the probability vector depending on previously aclieved results. An algorithm based on this method was proposed for solving the p-median problem on a network [26; 27]. However, such algorithm has very slow convergence and can be used as an aiding method for the genetic algorithm [26]. Similar algorithm can be used for approximate solution of the k-median problem with an arbitrary distance function [28]. Inclusion of the greedy heuristic into the Probability Changing Method leads to new efficient algorithm: Algorithm 8. Adaptive greedy heuristic method for continuous location problems. 1. Set equal values of probabilities pi = 1 /N for all . Set β = 0.5. 2. For each do: 2.1. Copy the probabilities qi = pi for all . Set . Generate randomly with uniform distribution. Calculate . 2.2. While do: 2.2.1. Generate random with uniform distribution. Set S = 0; l = 1. 2.2.2. While S + qj < Q do: S = S + qj; l = l + 1; repeat 2.2.2. 2.2.3. Set set ql = 0. 2.2.4. Repeat 2.2. 2.3. Copy the initial set . 2.4. Start Algorithm 5 from step 2. Store the result ; set; . 2.5. Next iteration 2. 3. Calculate here, nj is the number of value fj in an ascending sequence os these values. If then set . 4. Chose index b having minimal corresponding value fb and index w having maximal fw. 5. For each do: 5.1. If and then set . Else if and then . 5.2. Next iteration 5. 6. Check the stop conditions and repeat step 2. This algorithm requires indicating values of parameters Npop (population size) and γ (probability change coefficient). Small populations are enough for this algorithm. We used Npop = 9 and γ = 1,1. Parameter β is adjusted with special meta-heuristic (step 3). New algorithms and the deterministic algorithm with greedy heuristic [20] have an important feature. Many clustering problems (see for example [19]) are decomposed into series of k-means problems with where kmin = 1 (the whole data set is supposed to be a single cluster) and kmax is equal to some reasonable value. Algorithm 6 can be used for a problem with k = kmax and then Algorithm 5 from step 2 can be started again for obtaining other results down to k = kmin. Thus, all results for can be obtained in a single loop. Computational results. For testing new methods, we used data sets from the UCI repository [29], real data of examination of the EEE components [19] and randomly generated points on a plane with uniform distribution. Some results are given in table. For comparison purposes, we solved the problems with Information Bottleneck Clustering method [15; 20], deterministic algorithm with greedy heuristic [20], the genetic algorithm for p-median and p-medoids problems based on recombination of fixed length subsets [10], the genetic algorithm with classical crossover procedure and the genetic algorithm with greedy heuristic and floating point alphabet for continuous p-median problem [3]. Computational tests show high efficiency of new algorithms. In many cases, new adaptive algorithm shows the best results. In many cases, a single start of the greedy heuristic cannot be completed within the given time limit. Running time of the algorithm depends on parameter β. In such cases, table contains dashes. Problems with data sets are not solved with deterministic algorithms either due to huge amount of time needed. The optimal value of parameter α remains an open question. Note that if α = 0.001 then centers are eliminated from the interim solution one-by-one in accordance with the original greedy heuristic. For the deterministic algorithm with greedy heuristic [20], value α = 0.25 guarantees faster result (not the most precise) in accordance with results obtained with α = 0.001. In the case of stochastic algorithm, it is not easy to foresee which value leads to better result in reasonable time. As we can see in table, data vectors quantity (compare data sets MissAmerica1, BIRCH3 and examination of diodes) and metric do not determine the optimal value of this parameter. Developing a meta-heuristic algorithms adjusting this parameter seems to be promising. Adjusting parameter α with fixed value of β can be easily performed. Simultaneous adjustment of both parameters requires factor analysis for evaluating the influence of each parameter on the results. Thus, developing a co-evolutionary algorithm with two concurrent populations having different fixed values of parameter α seems to be simpler. An example of analogous algorithm with concurrent populations evolving with various bionic algorithms was proposed in [30]. Nevertheless, it is obvious that new adaptive algorithm allows to obtain better results than multiple start of the ALA procedure and multiple start of the greedy heuristic with fixed β. Used algorithms: “ALA” is ALA procedure multistart; “GA-MIX” is genetic algorithm with fixed length subsets recombination [10]; “GA-GHFP” is genetic algorithm with greedy heuristic and floating point alphabet [3]; “GA-classic” is GA with classical recombination, “IBC” is Information Bottleneck Clustering [15; 20]; “Determ. GH” is deterministic algorithm with greedy heuristic [20]; “GH” is new Algorithm 7; “GL” is multistart of the original greedy heuristic with local search (steps 4-7 of Algorithm 3 at α = 0.001 or steps 4-7 of Algorithm 3 with fast elimination of centers from the interim solution similar with steps 5-7 of algorithm 5 at α = 0.25); “GH” adapt is adaptive greedy heuristic (Algorithm 8). If no result was achieved within specified time limit, the table contains a dash. Results of various methods Data set, number of data vectors, dimension, data type Number of clusters k, metric, problem type Algorithm Time, sec. Average result of 30 runs: total distance (2) F(X1, …, Xk) Std. deviation of the result (30 runs) Parameter α = 0.25 Parameter α = 0.001 Parameter α = 0.25 Parameter α = 0.001 Examination tests of diode, N = 701, d = 18, real 30, l22 nor-med. ALA GA-MIX GA-GHFP GA-classic IBC 15 15 15 15 989.3 4896.913 4867.103 4810.655 4882.609 4887.930 9.41069 2.13825 1.37712 8.51498 Determ. k-means Determ. GH GH, β = 0.5 GH, β = 1 GH, β = 3 GL β = 0.5 GL, β = 1 GL, β = 3 GH adapt. 0.02/0.9 15 15 15 15 15 15 15 4890.675 4855.235 4835.241 4834.817 4854.183 4839.104 4836.167 4827.753 4888.21 4855.534 4825.410 4821.546 4853.432 4829.781 4820.190 4832.583 Determ. 12.9294 4.0288 7.6421 10.848 2.8491 4.1217 1.3607 Determ. 7.6843 6.4184 5.2087 8.8929 3.1062 3.1191 1.6819 BreastCancer (UCI), N = 699, d = 10, cathegories 20, Jaccard, ALA GA-MIX GA-GHFP GA-classic IBC 5 5 5 5 370.16 184.1 172.81 172.62 182.98 Determ. 0.8725 0.3650 0.0787 1.0415 Determ. k-medoids Determ. GH GH, β = 0.5 GH, β = 1 GH, β = 3 GL β = 0.5 GL, β = 1 GL, β = 3 GH adapt. 0.4/0.7 5 5 5 5 5 5 5 175.2 181.95 177.10 175.00 183.93 181.43 - 175.39 175.7 181.62 177.77 175.63 184.43 181.60 - 175.67 Determ. 0.816 1.108 0.155 0.859 3.163 - 1.741 Determ. 0.453 0.513 0.755 2.312 1.012 - 0.468 Ionosphere (UCI), N = 351, d = 10, real 20, l1, ALA GA-MIX GA-GHFP GA-classic IBC 4 4 4 4 370.16 2530.40 2526.77 2527.00 2526.93 Determ. 2.117 0.0257 0.0082 0.1302 Determ. k-means Determ. GH GH, β = 0.5 GH, β = 1 GH, β = 3 GL β = 0.5 GL, β = 1 GL, β = 3 GH adapt. 0.03/0.2 4 4 4 4 4 4 4 2531.03 2526.81 2526.85 2526.86 2528.02 2528.33 2528.99 2526.79 2533.31 2526.79 2526.81 2526.92 2527.83 2527.92 2529.99 2526.79 Determ. 0.0286 0.0816 0.0821 0.4523 0.9843 1.0165 0.0231 Determ. 0.0222 0.0222 0.1100 0.5249 0.7072 0.8595 0.0205 BIRCH3 (UCI), N = 100000, d = 2, real 100, l22, ALA GA-MIX GA-GHFP GA-classic IBC 60 60 60 60 86400 3.825·1013 3.886·1013 3.751·1013 - - 3.060·1011 5.155·1011 0.878·1011 - - k-means Determ. GH GH, β = 0.5 GH, β = 1 GH, β = 3 GL β = 0.5 GL, β = 1 GL, β = 3 GH adapt. 86400 60 60 60 60 60 60 60 - 3.832·1013 3.827·1013 - - - - 3.772·1013 - 3.735·1013 - - - - - 3.722·1013 - 5.359·1011 4.017·1011 - - - - 3.802·1011 - 0.759·1011 - - - - - 0.216·1011 MissAmerica1 (UCI), N = 6400, d = 16, real 100, l22, ALA GA-MIX GA-GHFP GA-classic IBC 60 60 60 60 86400 717488.7 698055.6 698054.5 714946.3 - 1107.24 460.62 406.42 685.23 - k-means Determ. GH GH, β = 0.5 GH, β = 1 GH, β = 3 GL β = 0.5 GL, β = 1 GL, β = 3 GH adapt. 93/9977 60 60 60 60 60 60 60 703786.4 714287.9 705838.9 709468.4 - - - 702860.8 707529.5 713244.8 - - - - - 708239.0 Determ. 499.037 334.656 195.671 - - - 576.207 Determ. 2263.9 - - - - - 781.084 Completion of table Data set, number of data vectors, dimension, data type Number of clusters k, metric, problem type Algorithm Time, sec. Average result of 30 runs: total distance (2) F(X1, …, Xk) Std. deviation of the result (30 runs) Parameter α = 0.25 Parameter α = 0.001 Parameter α = 0.25 Parameter α = 0.001 Generated problem, N = 500, d = 2, real 20, metric based on angular, distances, k-median ALA GA-MIX GA-GHFP GA-classic IBC 4 4 4 4 3141 4461.24 4509.49 4326.72 4392.95 4381.34 25.9199 14.4368 4.7587 79.6913 Determ. Determ. GH GH, β = 0.5 GH, β = 1 GH, β = 3 GL β = 0.5 GL, β = 1 GL, β = 3 GH adapt. 4.9/23.5 4 4 4 4 4 4 4 4377.12 4350.36 4333.02 4389.32 4428.97 4379.42 4429.37 4313.67 4376.43 4378.30 4371.50 - 4453.65 4359.41 - 4309.43 Determ. 15.1717 5.8614 22.5053 59.4167 46.1732 78.2398 3.7077 Determ. 15.4234 38.5620 - 34.3130 22.5052 - 1.4427 Conclusion. The greedy heuristic method can be used as a standalone random stochastic or deterministic [20] method for solving continuous k-median, k-mean and k-medoids problems and as a recombination method in genetic algorithms [3; 19; 21]. More precise results can be achieved in appropriate time by using the modified greedy heuristic in a self-adjusting algorithm based on ideas of the probability changing method.
×

About the authors

L. A. Kazakovtsev

Reshetnev Siberian State Aerospace University

Email: levk@bk.ru
31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037, Russian Federation

A. N. Antamoshkin

Reshetnev Siberian State Aerospace University

31, Krasnoyarsky Rabochy Av., Krasnoyarsk, 660037, Russian Federation

References

  1. Facility Location Concepts, Models, Algorithms and Case Studies / R. Z. Farahani, М. Hekmatfar (editors). Berlin Heidelberg : Springer-Verlag, 2009. 550 p.
  2. Kazakovtsev L. A., Stupina A. A. Fast Genetic Algorithm with Greedy Heuristic for p-Median and k-Means Problems // IEEE 2014 6th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT) (October 6-8, 2014. St.-Petersburg). 2014. Р. 702-706.
  3. Kazakovtsev L. A., Antamoshkin A. N. Genetic Algorithm wish Fast Greedy Heuristic for Clustering and Location Problems // Informatica. 2014. Vol. 38, iss. 3. Рp. 229-240.
  4. Deza M. M. and Deza E. Encyclopedya of Distances. Berlin-Heilderberg : Springer Verlag, 2009. 590 p. doi: 10.1007/978-3-642-00234-2_1.
  5. Трубин В. А. Эффективный алгоритм для задачи Вебера с прямоугольной метрикой // Кибернетика. 1978. № 6. С. 67-70. doi: 10.1007/BF01070282/.
  6. Kaufman L. and Rousseeuw P. J. Finding Groups in Data: an Introduction to Cluster Analysis. New York : John Wiley & Sons, 1990. 342 p. doi: 10.1002/9780470316801.ch1.
  7. Park H. S., Jun C.-H. A simple and fast algorithm for K-medoids clustering // Expert Systems with Applications. 2009. Vol. 36. Р. 3336-3341.
  8. Lucasius C. B., Dane A. D., Kateman G. On K-Medoid Clustering of Large Data Sets with the Aid of a Genetic Algorithm: Background, Feasibility and Comparison // Analytical Chimica Acta. 1993. Vol. 282. Р. 647-669. doi: 10.1007/s10732-006-7284-z.
  9. Zhang Q., Couloigner I. A New Efficient K-Medoid Algorithm for Spatial Clustering // ICCSA 2005, LNCS. 2005. Vol. 3482. Р. 181-189. DOI: 10.1007/ 11424857_20.
  10. Sheng W., Liu X. A Genetic K-Medoids Clustering Algorithm // Journal of Heuristics. 2004. Vol. 12, iss. 6. Р. 447-466. doi: 10.1007/s10732-006-7284-z.
  11. Cooper L. Location-allocation problem // Oper. Res. 1963. Vol. 11. Р. 331-343, doi: 10.1287/opre.11.3.331.
  12. Arthur D., Vassilvitskii S. k-Means++: the Advantages of Careful Seeding // Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 2007. Р. 1027-1035. DOI: 10.1.1.360.7427.
  13. Mishra N., Oblinger D., Pitt L. Sublinear time approximate clustering // 12th SODA. 2001. Р. 439-447.
  14. StreamKM: A Clustering Algorithm for Data Streams / M. R. Ackermann [et al.] // J. Exp. Algorithmics. 2012. Vol. 17. Article 2.4. doi: 10.1145/2133803.2184450/.
  15. A parallel clustering method combined information bottleneck theory and centroid-based clustering / Zh. Sun [et al.] // The Journal of Supercomputing. 2014. Vol. 69, iss. 1. Р. 452-467. doi: 10.1007/s11227-014-1174-1.
  16. Alp O., Erkut E., Drezner Z. An Efficient Genetic Algorithm for the p-Median Problem // Annals of Operations Research. 2003. Vol. 122, iss. 1-4. Р. 21-42. doi: 10.1023/A:1026130003508.
  17. Neema M. N., Maniruzzaman K. M., Ohgai A. New Genetic Algorithms Based Approaches to Continuous p-Median Problem // Netw. Spat. Econ. 2011. Vol. 11. Р. 83-99. doi: 10.1007/s11067-008-9084-5.
  18. Kuehn A. A., Hamburger M. J. A heuristic program for locating warehouses // Management Science. 1963. Vol. 9, iss. 4. Р. 643-666. DOI:10.1287/ mnsc.9.4.643.
  19. Задача классификации электронной компонентной базы / Л. А. Казаковцев [и др.]. // Вестник СибГАУ. 2014. № 4(56). С. 55-61.
  20. Казаковцев Л. А. Детерминированный алгоритм для задачи k-средних и k-медоид // Системы управления и информационные технологии. 2015. № 59. С. 95-99.
  21. Казаковцев Л. А., Ступина А. А., Орлов В. И. Модификация генетического алгоритма с жадной эвристикой для непрерывных задач размещения и классификации // Системы управления и информационные технологии. 2014. № 2(56). С. 35-39.
  22. Modified Genetic Algorithm with Greedy Heuristic for Continuous and Discrete p-Median Problems / L. A. Kazakovtsev [et al.] // Facta Universitatis Series Mathematics and Informatics. 2015. Vol. 30, iss. 1. Р. 89-106.
  23. Antamoshkin A. N. Brainware for Searchal Pseudoboolean Optimization // Transactions of the Tenth Prague Conference Czechoslovak Academy of Sciences Volume. 1988. Р. 203-206. doi: 10.1007/978-94-009-3859-5_16.
  24. Казаковцев Л. А. Параллельный алгоритм случайного поиска с адаптацией для систем с распределенной памятью // Системы управления и информационные технологии. 2012. № 3(49). С. 11-15.
  25. Kazakovtsev L. A. Random Constrained Pseudo-Boolean Optimization Algorithm for Multiprocessor Systems and Clusters // ICUMT 2012, International Congress on Ultra-Modern Telecommunications. S-Petersburg : IEEE Press, 2012. Р. 650-656. doi: 10.1109/ICUMT.2012.6459711.
  26. Antamoshkin A. N., Kazakovtsev L. A. Random Search Algorithm for the p-Median Problem // Informatica, 2013. Vol. 37, iss. 3. Р. 267-278.
  27. Антамошкин А. Н., Казаковцев Л. А. Применение метода изменяющихся вероятностей для задач размещения на сети // Вестник СибГАУ. 2014. № 5(57). С. 10-19.
  28. Антамошкин А. Н., Казаковцев Л. А. Алгоритм случайного поиска для обобщенной задачи Вебера в дискретных координатах // Информатика и системы управления. 2013. № 1. С. 87-98.
  29. Patrick M. Murphy P. M., Aha D. W. UCI Repository of Machine Learning Databases [Электронный ресурс]. 1994. URL: http://www.cs.uci.edu/mlearn/ mlrepository.html (дата обращения: 02.01.2015).
  30. Akhmedova Sh. A., Semenkin E. S. New optimization metaheuristic based on co-operation of biology related algorithms // Вестник СибГАУ. 2013. № 4(50). С. 92-99.
  31. Farahani R. Z., Hekmatfar M. (editors). Facility Location Concepts, Models, Algorithms and Case Studies. Springer-Verlag Berlin Heidelberg, 2009, 550 p.
  32. Kazakovtsev L. A., Stupina A. A. Fast Genetic Algorithm with Greedy Heuristic for p-Median and k-Means Problems. IEEE 2014 6th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), St.-Petersburg, October 6-8, 2014. 2014, P. 702-706.
  33. Kazakovtsev L. A., Antamoshkin A. N. Genetic Algorithm wish Fast Greedy Heuristic for Clustering and Location Problems. Informatica. 2014, Vol. 38, Iss. 3, P. 229-240.
  34. Deza M. M., Deza E. Encyclopedya of Distances. Springer Verlag. Berlin, Heilderberg, 2009, 590 p., doi: 10.1007/978-3-642-00234-2_1.
  35. Trubin V. A. [An efficient algorithm for the Weber problem with rectilinear metric]. Kibernetika. 1978, Iss. 6, P. 67-70, doi: 10.1007/BF01070282/ (In Russ.).
  36. Kaufman L., Rousseeuw P. J. Finding Groups in Data: an Introduction to Cluster Analysis. New York: John Wiley & Sons, 1990, 342 p., DOI: 10.1002/ 9780470316801.ch1.
  37. Park H. S., Jun C.-H. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications. 2009, Vol. 36, P. 3336-3341.
  38. Lucasius C. B., Dane A. D., Kateman G. On K-Medoid Clustering of Large Data Sets with the Aid of a Genetic Algorithm: Background, Feasibility and Comparison. Analytical Chimica Acta. 1993, Vol. 282, P. 647-669, doi: 10.1007/s10732-006-7284-z.
  39. Zhang Q., Couloigner I. A New Efficient K-Medoid Algorithm for Spatial Clustering. ICCSA 2005, LNCS. 2005, Vol. 3482, P. 181-189, doi: 10.1007/11424857_20.
  40. Sheng W., Liu X. A Genetic K-Medoids Clustering Algorithm. Journal of Heuristics. 2004, Vol. 12, Iss. 6, P. 447-466, doi: 10.1007/s10732-006-7284-z.
  41. Cooper L. Location-allocation problem. Oper. Res. 1963, Vol. 11, P. 331-343, DOI: 10.1287/ opre.11.3.331.
  42. Arthur D., Vassilvitskii S. k-Means++: the Advantages of Careful Seeding. Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 2007, P. 1027-1035, DOI: 10.1.1.360.7427.
  43. Mishra N., Oblinger D., Pitt L. Sublinear time approximate clustering. 12th SODA. 2001, P. 439-447.
  44. Ackermann M. R. et al. StreamKM: A Clustering Algorithm for Data Streams. J. Exp. Algorithmics. 2012, Vol. 17, Article 2.4, doi: 10.1145/2133803.2184450/.
  45. Sun Zh., Fox G., Gu W., Li Zh. A parallel clustering method combined information bottleneck theory and centroid-based clustering. The Journal of Supercomputing. 2014, Vol. 69, Iss. 1, P. 452-467. doi: 10.1007/s11227-014-1174-1.
  46. Alp O., Erkut E., Drezner Z. An Efficient Genetic Algorithm for the p-Median Problem. Annals of Operations Research. 2003, Vol. 122, Iss. 1-4, Р. 21-42. doi: 10.1023/A:1026130003508.
  47. Neema M. N., Maniruzzaman K. M., Ohgai A. New Genetic Algorithms Based Approaches to Continuous p-Median Problem. Netw. Spat. Econ. 2011, Vol. 11, P. 83-99. doi: 10.1007/s11067-008-9084-5.
  48. Kuehn A. A., Hamburger M. J. A heuristic program for locating warehouses. Management Science. 1963, Vol. 9, Iss. 4, P. 643-666, doi: 10.1287/mnsc.9.4.643.
  49. Kazakovtsev L. A., Orlov V. I., Stupina A. A., Masich I. S. [Problem of electronic components classifying]. Vestnnik SibGAU. 2014, No. 4(56), P. 55-61 (In Russ.).
  50. Kazakovtsev L. A. [Deterministic algorithm for the k-means and k-medoids problems]. Sistemy upravleniya I informatsionnye tekhnokogii. 2015, No. 1(59), P. 95-99 (In Russ.).
  51. Kazakovtsev L. A., Stupina A. A., Orlov V. I. [Modification of the genetic algorithm with greedy heuristic ffor continuous location and classification problems]. Sistemy upravleniya i informatsionnye tekhnokogii. 2014, No. 2(56), P. 35-39 (In Russ.).
  52. Kazakovtsev L. A., Orlov V. I., Stupina A. A., Kazakovtsev V. L. Modified Genetic Algorithm with Greedy Heuristic for Continuous and Discrete p-Median Problems. Facta Universitatis Series Mathematics and Informatics. 2015, Vol. 30, Iss. 1, P. 89-106.
  53. Antamoshkin A. N. Brainware for Searchal Pseudoboolean Optimization. Transactions of the Tenth Prague Conference Czechoslovak Academy of Sciences Volume. 1988, P. 203-206, doi: 10.1007/978-94-009-3859-5_16.
  54. Kazakovtsev L. A. [Parallel random search algorithm with adaptation for shared memory systems]. Sistemy upravleniya I informatsionnye tekhnokogii. 2012, No. 3 (49), P. 11-15 (In Russ.).
  55. Kazakovtsev L. A. Random Constrained Pseudo-Boolean Optimization Algorithm for Multiprocessor Systems and Clusters. ICUMT 2012, International Congress on Ultra-Modern Telecommunications. IEEE Press. S-Petersburg. 2012, P. 650-656, DOI: 10.1109/ ICUMT.2012.6459711.
  56. Antamoshkin A. N., Kazakovtsev L. A. Random Search Algorithm for the p-Median Problem. Informatica, 2013, Vol. 37, Iss. 3, P. 267-278.
  57. Antamoshkin A. N., Kazakovtsev L. A. [Application of the probability changing method for optimal location problems on a network]. Vestnik SibGAU. 2014, No. 5(57), P. 10-19 (In Russ.).
  58. Antamoshkin A. N., Kazakovtsev L. A. [Random search algorithm for the generalized Weeber problem in a discrete coordinate system]. Informatika i systemy upravleniya. 2013, No. 1, P. 87-98 (In Russ.).
  59. Patrick M. Murphy P. M., Aha D. W. UCI Repository of Machine Learning Databases. 1994. Available at: http://www.cs.uci.edu/mlearn/mlrepository. html (accessed 02.01.2015).
  60. Akhmedova Sh. A., Semenkin E. S. New optimization metaheuristic based on cooperation of biology related algorithms. Vestnik SibGAU. 2013, No. 4(50), P. 92-99 (In Russ.).

Supplementary files

Supplementary Files
Action
1. JATS XML

Copyright (c) 2015 Kazakovtsev L.A., Antamoshkin A.N.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

This website uses cookies

You consent to our cookies if you continue to use our website.

About Cookies