Date of Award
12-24-2025
Date Published
January 2026
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Mathematics
Advisor(s)
Pinyuen Chen
Second Advisor
Jack Graver
Keywords
Automatic clustering;Cluster analysis;Clustering;Density-based clustering;Graph-based clustering;Outlier detection
Abstract
While numerous clustering algorithms are available, a substantial portion has limitations in effectively handling clusters with diverse shapes and densities or requires hyperparameter tuning prior to algorithm utilization. A main limitation of many existing algorithms is their inability to automatically determine the number of clusters since, unlike classification, clustering does not naturally provide this information. Viewing points as vertices and distances between points as edges, it is possible to construct the Euclidean graph and to build multiple Euclidean minimum spanning trees (MST), where each tree spans data points from a particular cluster. Algorithm starts with building MST for the Euclidean graph on the entire set of points using Prim’s algorithm and stops when no more appropriate edges can be added to the current tree. A stopping rule for the appropriate edges is based on a sample version of Cantelli’s inequality. The described algorithm executes clustering in a single stage and does not require to predefine the number of clusters or to tune any hyperparameters. As a result, no prior knowledge of the data or additional steps are necessary to perform clustering. Moreover, using a graph-based approach for clustering produces additional insights for cluster interpretation, which is another common challenge in cluster analysis.
Access
Open Access
Recommended Citation
Shevchenko, Elena, "Automatic Interpretable Prim’s Density-based Clustering (AIPDC)" (2025). Dissertations - ALL. 2247.
https://surface.syr.edu/etd/2247
