An Automatic Merge Technique to Improve the Clustering Quality Performed by LAMDA
By:
Morales, Luis; Aguilar, Jose
Published:
1 Jan 2020
Abstract:
Clustering is a research challenge focused on discovering knowledge from
data samples; its goal is to build good-quality partitions. This paper
proposes an approach based on LAMDA (Learning Algorithm for
Multivariable Data Analysis), whose most important features are: a) it
is a non-iterative fuzzy algorithm that can work with online data
streams, b) it does not require the number of clusters to be specified
in advance, and c) it can generate new partitions from objects that are
not sufficiently similar to the preexisting clusters (incremental
learning). However, in some applications, the number of created
partitions does not match the number of desired clusters and can be
excessive or impractical for the expert. Therefore, our contribution is
the formalization of an
automatic merge technique to update the cluster partition performed by
LAMDA to improve the quality of the clusters, and a new methodology to
compute the Marginal Adequacy Degree that enhances the
individual-cluster assignment. The proposal, called LAMDA-RD, is applied
to several benchmarks, comparing the results against the original LAMDA
and other clustering algorithms, to evaluate the performance based on
different metrics. LAMDA-RD is then validated in a real case study
related to the identification of production states in a gas-lift well,
using streaming data. The results show that LAMDA-RD achieves
competitive performance with respect to other well-known algorithms,
especially on unbalanced benchmarks and benchmarks with an overlap of
around 9%. In these cases, our algorithm is the best, reaching a
Rand Index (RI) above 98%. Besides, it is consistently among the best for
all metrics considered (Silhouette coefficient, modification of the
Silhouette coefficient, WB-index, Performance Coefficient, among others)
in all case studies analyzed in this paper. Finally, in the real case
study, it performs best on all the metrics.
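
The abstract reports results in terms of the Rand Index (RI). As a reading aid, the following is a minimal sketch of the standard pairwise definition of RI: the fraction of object pairs on which two partitions agree (both place the pair together, or both place it apart). It is not taken from the paper; the function name `rand_index` and the toy labels are assumptions chosen for illustration only.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two partitions agree.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same objects")

    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        # With fewer than two objects there is nothing to disagree on.
        return 1.0

    agreements = 0
    for i, j in pairs:
        same_in_true = labels_true[i] == labels_true[j]
        same_in_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both partitions group it
        # together, or both partitions keep it apart.
        if same_in_true == same_in_pred:
            agreements += 1
    return agreements / len(pairs)


# Toy example (made-up labels): the second labeling splits one reference
# cluster into two, the kind of over-partitioning that a merge step is
# meant to reduce.
reference = [0, 0, 0, 1, 1, 1]
over_split = [0, 0, 1, 2, 2, 2]
score = rand_index(reference, over_split)
```

Because RI compares pairs rather than label values, relabeling the clusters does not change the score; only how the objects are grouped matters.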
Affiliations:
Morales, Luis:
Escuela Politec Nacl, Dept Automatizac & Control Ind, Quito 170525, Ecuador
Aguilar, Jose:
Univ Los Andes, CEMISID, Escuela Ingn Sistemas, Merida 5101, Venezuela
Univ EAFIT, GIDITIC, Medellin 050021, Colombia