Performance Analysis of Subtractive Clustering Algorithm in Determining the Number and Position of Cluster Centers

ABSTRACT
The basic concept of the subtractive clustering algorithm is to choose the data point with the highest density (potential) in a space (variable) as the center of a cluster. The number and position of the cluster centers formed are influenced by the value of the radius (r) parameter. If the radius is very small, potential data points around the cluster center are neglected. If the radius is too large, it increases the contribution of all potential data points, thereby canceling the effect of cluster density. The number of cluster centers is determined by an iterative process that finds the data points with the highest number of neighbors. This study uses the clustering partition as the parameter value that decides whether a data point (candidate cluster center) is selected, in order to determine the effect of the radius (r) parameter value on the clustering produced by the subtractive clustering algorithm. In experiments on 4 datasets, the highest average fuzzy silhouette value for dataset 1 is 0.9088, obtained with a radius (r) of 0.35 and 2 clusters. For dataset 2, the highest value is 0.6742, obtained with a radius of 0.40 and 3 clusters. For dataset 3, the highest value is 0.7434, obtained with a radius of 0.50 and 3 clusters. For dataset 4, the highest value is 0.6630, obtained with a radius of 0.50 and 2 clusters. The subtractive clustering algorithm is widely applied in fields related to the needs of human life, including transportation, GIS, big data, control of electric voltages, electrical energy demand, mapping of population density, and health applications such as breast cancer diagnosis.

INTRODUCTION
The Subtractive Clustering Algorithm (Chiu, 1994) is a clustering method modified from Mountain Clustering (Yager & Filev, 1992). In principle, the algorithm is based on the density (potential) of data points in a space (variable). The data point with the highest potential is selected as the center of a cluster. The potential of the data points within the specified radius around that cluster center is then reduced. The algorithm next chooses the data point with the next highest potential as the center of another cluster. This process is repeated until the predetermined criteria are met (Sarin et al. 2019). The subtractive clustering algorithm is a simple and fast clustering method that determines the number of clusters automatically. It has been widely implemented in various fields: breast cancer diagnosis (Liang et al. 2017), transport (Wu & Luo 2017), control of electric current and voltage (Radionov et al. 2015), electrical energy demand (Laksono & Hafis 2013), identification of areas of population density (Azizah et al. 2019), Geographical Information Systems (GIS) (Polat & Durduran 2011), Smart Grid technology for industrial power grids (Pereira et al. 2016), and big data (Mubeen et al. 2017).
The subtractive clustering algorithm is usually used as a preprocessing step for other algorithms, to find the number and position of the cluster centers (Rezaeian et al. 2017). Kokkinos & Margaritis (2018) used it for automatic selection of exemplar points in the affinity propagation algorithm. Yang et al. (2010) used it for automatic selection of the number and position of cluster centers in fuzzy c-means. Rezaeian et al. (2017) applied it to both the K-means algorithm and the fuzzy c-means algorithm.
However, in its implementation the subtractive clustering algorithm requires 4 (four) parameters, namely: radius (r), squash factor (q), accept ratio ($\bar{\varepsilon}$) and reject ratio ($\underline{\varepsilon}$) (Chiu, 1994). The four parameters are usually left at their default values (Liang et al. 2017). According to Sarin et al. (2019), the radius (r) parameter plays an important role in optimizing the subtractive clustering algorithm. So far, the value of the radius parameter has been determined by trial and error: to produce good clustering, the grouping process must be carried out several times with different radius values.
Several studies have been conducted to estimate the value of the radius parameter and to assess the validity of the clustering results of the subtractive clustering algorithm. Shieh et al. (2013) used a genetic algorithm. Sarin et al. (2019) used linear regression. Shieh & Kuo (2011) proposed a new validity index that combines compactness and separation to measure the clustering results of the subtractive clustering algorithm. Shieh (2014) combined compactness, separation and a partition index for the same purpose.
The silhouette index (Rousseeuw, 1987) is a technique for measuring clustering quality in crisp clustering that combines compactness and separation. Campello & Hruschka (2006) proposed the fuzzy silhouette index to analyze fuzzy clustering. Subbalakshmi et al. (2015) used a fuzzy silhouette index to determine the optimal number of clusters in the fuzzy c-means algorithm on dynamic data.
Chiu (1994) proposed the subtractive clustering algorithm as a modification of the mountain clustering algorithm (Yager & Filev, 1992). The algorithm takes the data point with the highest density relative to its surrounding data points as a candidate cluster center. The data point with the most neighbors is selected as the center of the cluster. The density of the data points around the selected cluster center is then reduced. Next, the algorithm looks for another data point with the most neighbors to be the center of the next cluster. This process is repeated until all data points have been tested.

LITERATURE REVIEW

Subtractive Clustering Algorithm
In practice, the subtractive clustering algorithm requires 4 (four) parameters (Chiu, 1994), namely: radius (r), squash factor (q), accept ratio ($\bar{\varepsilon}$) and reject ratio ($\underline{\varepsilon}$). The radius parameter (r) is a vector that determines how much influence a cluster center has on each data point that is a candidate cluster center. The squash factor (q) is used to prevent cluster centers from forming close to one another. The accept ratio ($\bar{\varepsilon}$) and reject ratio ($\underline{\varepsilon}$) are comparison parameters that determine whether or not a data point (candidate cluster center) is selected as a cluster center.
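As a hedged illustration of how the four parameters interact, the block below writes out the formulation commonly attributed to Chiu (1994). The symbols $P_i$, $P_k^{*}$, $P_1^{*}$, $r_b$ and $d_{\min}$ are notation assumed here for illustration and are not taken from this paper; $P_1^{*}$ denotes the potential of the first cluster center and $d_{\min}$ the shortest distance from the candidate $x_k^{*}$ to the centers already selected.

```latex
% Potential of each data point, governed by the radius r
P_i = \sum_{j=1}^{n} \exp\!\left(-\frac{4\,\lVert x_i - x_j \rVert^{2}}{r^{2}}\right)

% Potential revision after selecting center x_k^*, governed by the squash factor q
P_i \leftarrow P_i - P_k^{*}\,
      \exp\!\left(-\frac{4\,\lVert x_i - x_k^{*} \rVert^{2}}{r_b^{2}}\right),
\qquad r_b = q\,r

% Decision for the candidate x_k^*, governed by the accept and reject ratios
\text{decision}(x_k^{*}) =
\begin{cases}
\text{accept}                       & P_k^{*} > \bar{\varepsilon}\,P_1^{*} \\
\text{stop}                         & P_k^{*} < \underline{\varepsilon}\,P_1^{*} \\
\text{accept}                       & \text{otherwise, if } \dfrac{d_{\min}}{r} + \dfrac{P_k^{*}}{P_1^{*}} \ge 1 \\
\text{reject, set } P_k^{*} = 0     & \text{otherwise}
\end{cases}
```

For example, with the illustrative values r = 0.5 and q = 1.25, the subtraction radius is $r_b = 0.625$, so potential is removed over a slightly wider neighborhood than the one used to compute it.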

Randwick International of Social Science Journal
According to Wu & Luo (2017), the subtractive clustering algorithm includes three main steps. Suppose there are n data points $\{x_1, x_2, \dots, x_n\}$ in an M-dimensional space, and assume the data have been normalized.
Step 1: Calculate the density (potential) of each data point.
Step 2: Revise the potential of each data point.
Step 3: After the potential of each data point has been revised, select the data point with the highest remaining potential as the center of the second cluster. This process is repeated until a predetermined potential threshold is reached.
The results of the subtractive clustering algorithm are the cluster center matrix and the sigma ($\sigma$) values, which are used to determine the parameter values of the fuzzy membership function. In this study, the Gaussian membership function was used (Shieh, 2014).
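The three steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this study: it follows the standard accept/reject formulation attributed to Chiu (1994) rather than the modified selection rule proposed here, the function name `subtractive_clustering` is hypothetical, and the default values q = 1.25, accept = 0.5 and reject = 0.15 are commonly used defaults assumed for illustration.

```python
import math

def subtractive_clustering(points, r=0.5, q=1.25, accept=0.5, reject=0.15):
    """Return the indices of the points chosen as cluster centers.

    Assumes 0 < reject < accept < 1 (illustrative defaults, not from the paper).
    """
    n, dims = len(points), len(points[0])
    # Min-max normalise each feature so r is a fraction of the data range.
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    z = [[(p[d] - lo[d]) / ((hi[d] - lo[d]) or 1.0) for d in range(dims)]
         for p in points]

    def dist2(i, j):
        return sum((a - b) ** 2 for a, b in zip(z[i], z[j]))

    # Step 1: potential of each point from its distance to every point.
    potential = [sum(math.exp(-4.0 * dist2(i, j) / r ** 2) for j in range(n))
                 for i in range(n)]

    first_peak = max(potential)
    rb = q * r  # squashed radius used when subtracting potential
    centers = []
    while True:
        # Step 3: the candidate is the point with the highest potential.
        k = max(range(n), key=potential.__getitem__)
        ratio = potential[k] / first_peak
        if ratio > accept:
            chosen = True
        elif ratio < reject:
            break  # remaining potential is too low: stop
        else:
            # Grey zone: accept only if far enough from existing centers.
            d_min = min(math.sqrt(dist2(k, c)) for c in centers)
            chosen = d_min / r + ratio >= 1.0
        if chosen:
            centers.append(k)
            # Step 2: subtract the new center's influence from every point.
            peak = potential[k]
            potential = [potential[i]
                         - peak * math.exp(-4.0 * dist2(i, k) / rb ** 2)
                         for i in range(n)]
        else:
            potential[k] = 0.0  # discard this candidate and try the next
    return centers
```

Calling the function on two well-separated groups of points returns one center index per group. The number of centers is not passed in; it follows from r, q and the two ratios, which is why the choice of radius matters.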

Fuzzy Silhouette Index
The silhouette index introduced by Rousseeuw (1987) is used to measure the quality of crisp clusters by combining the values of compactness and separation.
The silhouette index ranges from -1 to +1. A value close to 1 indicates that the data point is well placed in its cluster, while a value of 0 or close to 0 indicates that the data point lies on the border between two clusters. Campello & Hruschka (2006) proposed a silhouette index for fuzzy partitions that includes the fuzzy membership values in the evaluation of clusters. The fuzzy partition is validated using the silhouette index after a defuzzification step, in which the fuzzy membership matrix is converted into a crisp matrix. In the fuzzy silhouette index, the average silhouette value is calculated as a weighted average. Each data point is assigned a weight based on the difference between its two largest cluster membership values. Suppose $x_j$ is a data point whose first and second highest membership values are denoted $\mu_{pj}$ and $\mu_{qj}$; then the weight of $x_j$ is calculated from the difference between these two values.
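For reference, the silhouette value of a single data point and its fuzzy weight can be written as below. This follows the standard definitions in Rousseeuw (1987) and Campello & Hruschka (2006); the symbols $a_j$, $b_j$, $w_j$ and the exponent $\alpha$ are notation assumed here, since the equations of this paper are not available in the text.

```latex
% a_j: mean distance from x_j to the other points of its own cluster
% b_j: smallest mean distance from x_j to the points of any other cluster
s_j = \frac{b_j - a_j}{\max\{a_j,\, b_j\}}, \qquad -1 \le s_j \le 1

% Weight of x_j from its two largest membership values (alpha >= 0)
w_j = \left(\mu_{pj} - \mu_{qj}\right)^{\alpha}
```

A point that belongs almost entirely to one cluster has $\mu_{pj} - \mu_{qj}$ close to 1 and a large weight, while a point shared evenly between two clusters has a weight close to 0.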

The fuzzy silhouette index is then calculated as the weighted average of the silhouette values of all data points.
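A minimal sketch of this weighted average is given below, assuming the definition of Campello & Hruschka (2006): FS is the sum of $w_j s_j$ divided by the sum of $w_j$, with $w_j = (\mu_{pj} - \mu_{qj})^{\alpha}$. The function name `fuzzy_silhouette`, the Euclidean distance and the default `alpha=1.0` are assumptions made for illustration. The sketch also assumes at least two clusters, at least two non-empty clusters after defuzzification, and at least one point with a non-zero weight.

```python
import math

def fuzzy_silhouette(points, memberships, alpha=1.0):
    """memberships[j][i] is the membership of point j in cluster i."""
    n = len(points)
    k = len(memberships[0])
    # Defuzzify: assign each point to its highest-membership cluster.
    labels = [max(range(k), key=row.__getitem__) for row in memberships]

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    num = den = 0.0
    for j in range(n):
        # Total distance from point j to the members of each cluster.
        sums = [0.0] * k
        counts = [0] * k
        for m in range(n):
            if m != j:
                sums[labels[m]] += dist(points[j], points[m])
                counts[labels[m]] += 1
        own = labels[j]
        if counts[own] == 0:
            s = 0.0  # singleton cluster: silhouette taken as 0
        else:
            a = sums[own] / counts[own]            # compactness
            b = min(sums[c] / counts[c]            # separation
                    for c in range(k) if c != own and counts[c] > 0)
            s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        # Weight: gap between the two largest membership values.
        top, second = sorted(memberships[j], reverse=True)[:2]
        w = (top - second) ** alpha
        num += w * s
        den += w
    return num / den
```

Because points near a cluster boundary receive small weights, the index emphasises the points that lie in dense regions close to a cluster center.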

RESEARCH METHODS
In this study, the use of the accept ratio ($\bar{\varepsilon}$) and reject ratio ($\underline{\varepsilon}$) parameters in the subtractive clustering algorithm was modified. In the standard subtractive clustering algorithm, the accept ratio and reject ratio are comparison parameters that determine whether or not a data point (candidate cluster center) is selected as a cluster center. This study instead uses the clustering partition method as the parameter value that determines whether or not a candidate is selected as a cluster center, so that the influence of the radius (r) parameter value on the clustering generated by the subtractive clustering algorithm can be determined.

Research data
This study uses datasets obtained from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.php), including the Iris, Wholesale customers, Abalone and Banknote datasets.

Discussion
From the experiments carried out on 4 datasets, the following results were obtained. For dataset 1, the highest average fuzzy silhouette value with the standard subtractive clustering algorithm is 0.7297, at a radius (r) of 0.50 with 4 clusters, while the proposed method reaches 0.9088 at a radius of 0.35 with 2 clusters. For dataset 2, the standard algorithm gives a highest value of 0.6196 at a radius of 0.40 with 2 clusters, while the proposed method gives 0.6742 at a radius of 0.40 with 3 clusters. For dataset 3, both the standard algorithm and the proposed method give the same highest value, 0.7434, at a radius of 0.50 with 3 clusters. For dataset 4, the standard algorithm gives a highest value of 0.5989 at a radius of 0.30 with 14 clusters, while the proposed method gives 0.6630 at a radius of 0.50 with 2 clusters.

CONCLUSION
From all the experiments carried out, changing the value of the radius (r) parameter alone does not fully guarantee an increase in the fuzzy silhouette value. This is because the selection of cluster centers in the subtractive clustering algorithm is influenced by four parameter values: the radius (r), the squash factor (q), the accept ratio ($\bar{\varepsilon}$) and the reject ratio ($\underline{\varepsilon}$). Further work should test the effect of all four parameter values on the formation of clusters in the subtractive clustering algorithm. It should also compare or apply other clustering evaluation methods, on datasets with a larger amount of data, to assess the clustering results of the subtractive clustering algorithm.