Date of Award

June 2020

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical Engineering and Computer Science

Advisor(s)

Biao Chen

Subject Categories

Engineering

Abstract

This dissertation studies two machine learning problems: 1) clustering of independent and identically generated random sequences, and 2) dimensionality reduction for classification problems.

For sequence clustering, the focus is on the large-sample performance of classical clustering algorithms, including the k-medoids algorithm and hierarchical agglomerative clustering (HAC) algorithms. Data sequences are generated from unknown continuous distributions that are assumed to form clusters according to some well-defined distance metric. The goal is to group data sequences according to their underlying distributions with little or no prior knowledge of either the underlying distributions or the number of clusters. Upper bounds on the clustering error probability are derived for the k-medoids algorithm and a class of HAC algorithms under mild assumptions on the distribution clusters and distance metrics. In both cases, the error probability is shown to decay exponentially fast as the number of samples in each data sequence goes to infinity. The resulting error exponent bound has a simple form when either the Kolmogorov-Smirnov distance or the maximum mean discrepancy is used as the distance metric. A tighter upper bound on the error probability of the single-linkage HAC algorithm is derived by taking advantage of its simplified metric updating scheme. Numerical results are provided to validate the analysis.
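As a rough illustration of the setting described above, the following sketch clusters data sequences by the Kolmogorov-Smirnov distance between their empirical distributions and merges them with single-linkage HAC. It is not taken from the dissertation: the function names `ks_distance` and `single_linkage`, the Gaussian test data, and the assumption that the number of clusters is known in advance are all choices made for this example only.

```python
import bisect
import random


def ks_distance(x, y):
    """Two-sample Kolmogorov-Smirnov distance: sup_t |F_x(t) - F_y(t)|.

    The difference of two empirical CDFs only changes at sample points,
    so it is enough to evaluate it there.
    """
    xs, ys = sorted(x), sorted(y)
    best = 0.0
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        best = max(best, abs(fx - fy))
    return best


def single_linkage(sequences, num_clusters):
    """Single-linkage HAC on sequences (assumes num_clusters is known)."""
    n = len(sequences)
    dist = [[ks_distance(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one sequence per cluster
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical data: three sequences from N(0, 1) and three from N(3, 1).
random.seed(0)
sequences = [[random.gauss(0.0, 1.0) for _ in range(200)] for _ in range(3)]
sequences += [[random.gauss(3.0, 1.0) for _ in range(200)] for _ in range(3)]
clusters = single_linkage(sequences, num_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

With longer sequences the empirical distributions concentrate around the true ones, which is the intuition behind an error probability that decays in the sequence length.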

For dimensionality reduction, the focus is on classification problems where label information in the training data can be leveraged for improved learning performance. A supervised dimensionality reduction method is proposed that maximizes the difference in average projection energy between samples with different labels. Both synthetic data and WiFi sensing data are used to validate the effectiveness of the proposed method. The numerical results show that the proposed method outperforms existing supervised dimensionality reduction approaches based on Fisher discriminant analysis (FDA) and the Hilbert-Schmidt independence criterion (HSIC). When the kernel trick is applied to all three approaches, the performance of the proposed method is comparable to that of FDA and HSIC and is superior to that of unsupervised principal component analysis.
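The abstract does not spell out the optimization, so the sketch below is only one plausible reading of "maximizing the difference of average projection energy" for two classes. It assumes a linear projection W with orthonormal columns and takes the objective to be the mean of ||W^T x||^2 over one class minus the same mean over the other class. Under that assumption the maximizer is given by the leading eigenvectors of the difference of the two second-moment matrices. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def energy_difference_projection(X_pos, X_neg, dim):
    """Return a (features x dim) orthonormal W that maximizes
    mean ||W^T x||^2 over X_pos minus mean ||W^T x||^2 over X_neg.

    Assumed formulation for illustration; not the dissertation's exact method.
    """
    R_pos = X_pos.T @ X_pos / len(X_pos)  # second-moment matrix, class 1
    R_neg = X_neg.T @ X_neg / len(X_neg)  # second-moment matrix, class 2
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(R_pos - R_neg)
    return eigvecs[:, ::-1][:, :dim]


def mean_projection_energy(X, W):
    """Average squared norm of the projected samples."""
    return float(np.mean(np.sum((X @ W) ** 2, axis=1)))


# Hypothetical data: class 1 has high energy along axis 0, class 2 along axis 1.
rng = np.random.default_rng(0)
X_pos = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 1.0])
X_neg = rng.normal(size=(500, 3)) * np.array([1.0, 3.0, 1.0])

W = energy_difference_projection(X_pos, X_neg, dim=1)
energy_pos = mean_projection_energy(X_pos, W)
energy_neg = mean_projection_energy(X_neg, W)
print(W.ravel(), energy_pos, energy_neg)
```

In this toy case the learned direction lines up with the axis along which the first class carries most of its energy, so a simple threshold on the projected energy already separates the two classes reasonably well.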

Access

Open Access

Recommended Citation

Wang, Tiexing, "On Sequence Clustering and Supervised Dimensionality Reduction" (2020). Dissertations - ALL. 1285.
https://surface.syr.edu/etd/1285

Included in

Engineering Commons
