Date of Award

5-2013

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical Engineering and Computer Science

Advisor(s)

Kishan G. Mehrotra

Second Advisor

Raja P. Velu

Keywords

abnormal subsequence detection, anomalous time series detection, anomaly detection, data mining, outlier detection, time series detection

Subject Categories

Computer Engineering

Abstract

Anomaly or outlier detection problems are of considerable importance, arising frequently in diverse real-world applications such as finance and cyber-security. Several algorithms have been formulated for such problems, usually based on formulating a problem-dependent heuristic or distance metric. This dissertation proposes anomaly detection algorithms that exploit the notion of ``rank," expressing relative outlierness of different points in the relevant space, and exploiting asymmetry in nearest neighbor relations between points: a data point is ``more anomalous" if it is not the nearest neighbor of its nearest neighbors. Although rank is computed using distance, it is a more robust and higher level abstraction that is particularly helpful in problems characterized by significant variations of data point density, when distance alone is inadequate.

We begin by proposing a rank-based outlier detection algorithm, and then discuss how this may be extended by also considering clustering-based approaches. We show that the use of rank significantly improves anomaly detection performance in a broad range of problems.

We then consider the problem of identifying the most anomalous among a set of time series, e.g., the stock price of a company that exhibits significantly different behavior than its peer group of other companies. In such problems, different characteristics of time series are captured by different metrics, and we show that the best performance is obtained by combining several such metrics, along with the use of rank-based algorithms for anomaly detection.

In practical scenarios, it is of interest to identify when a time series begins to diverge from the behavior of its peer group. We address this problem as well, using an online version of the anomaly detection algorithm developed earlier.

Finally, we address the task of detecting the occurrence of anomalous sub-sequences within a single time series. This is accomplished by refining the multiple-distance combination approach, which succeeds when other algorithms (based on a single distance measure) fail.

The algorithms developed in this dissertation can be applied in a large variety of application areas, and can assist in solving many practical problems.

Access

Open Access

Share

COinS