Date of Award

Spring 5-22-2021

Degree Type


Degree Name

Doctor of Philosophy (PhD)


Electrical Engineering and Computer Science


Senem Velipasalar

Second Advisor

Leanne Hirshfield


Deep Learning, fNIRS, Multi-Class, Multi-Labeling, Spatiotemporal

Subject Categories

Electrical and Computer Engineering | Engineering


Human senses allow for the detection of simultaneous changes in our environments. An unobstructed field of view lets us notice concurrent variations in different parts of what we are looking at. For example, a video game player often needs to be aware of what is happening across the entire scene. Likewise, our hearing makes us aware of multiple sounds occurring around us at once. Human perception, however, is limited by the cognitive ability of the brain and the acuity of the senses. Machines do not share this limitation: given a signal, instructions for analyzing it and extracting useful information, and enough processing power, a system can complete the task repeatedly.

Automated and simultaneous detection of activity in machine learning requires the use of multi-labels. To detect concurrent occurrences spatially, the labels should represent the regions of interest for a particular application. In this thesis, the regions of interest are the four quadrants of a parking lot as captured on surveillance videos, four auscultation sites on patients' lungs, or the two sides (left and right) of the brain's motor cortex. Since each label within a multi-label represents not only a spatial location but also a level or type of occurrence, a multi-class/multi-level schema is necessary. In the first study, each label is assigned one of three levels of activity within the specific quadrant. In the second study, each label is assigned one of four different types of respiratory sounds. In the third study, each label is assigned one of three different finger tapping frequencies.
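The labeling schema described above can be illustrated with a minimal sketch, assuming the first study's layout of four quadrants and three activity levels. The region names, level names, and helper functions here are hypothetical and chosen only for illustration; the thesis may encode its labels differently.

```python
import numpy as np

# Hypothetical names: four regions of interest, three activity levels.
REGIONS = ["Q1", "Q2", "Q3", "Q4"]
LEVELS = ["low", "medium", "high"]

def encode_sample(levels_by_region):
    """Return one class index per region, in REGIONS order."""
    return np.array([LEVELS.index(levels_by_region[r]) for r in REGIONS])

def one_hot(label_vector, num_classes):
    """Expand a (regions,) label vector into a (regions, num_classes) matrix."""
    return np.eye(num_classes, dtype=int)[label_vector]

# One video segment: each quadrant carries its own activity level.
sample = {"Q1": "low", "Q2": "high", "Q3": "medium", "Q4": "low"}
y = encode_sample(sample)      # one multi-class label per region
Y = one_hot(y, len(LEVELS))    # one one-hot row per region
```

Each sample therefore carries as many labels as there are regions, and each label takes exactly one of the available classes. The same layout would apply to the other two studies by changing the region and class lists.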

This novel multi-labeling/multi-class schema is one part of being able to detect useful information in the data. The other part lies in the machine learning algorithm, that is, the network model. To capture the spatiotemporal characteristics of the data, Convolutional Neural Network and Long Short-Term Memory network-based algorithms are a fitting basis for the model.

The following classifications are described in this thesis:

1. In the first study, one of three different motion densities is identified simultaneously in each of four quadrants of two sets of surveillance videos. The spatiotemporal data are publicly available video recordings.

2. In the second study, one of four types of breathing sounds is classified simultaneously at each of four auscultation sites. The spatiotemporal data are publicly available respiratory sound recordings.

3. In the third study, one of three finger tapping rates is detected simultaneously in each of two regions of interest, the right and left sides of the brain's motor cortex. The spatiotemporal data are fNIRS channel readings gathered during an index finger tapping experiment.

Classification results are based on testing data that is not part of model training and validation. The results are evaluated using Hamming Loss and Subset Accuracy as well as Accuracy, F-Score, Sensitivity, and Specificity metrics. In the last study, model explanation is performed by computing Shapley Additive Explanation (SHAP) values and plotting them on an image-like background, a representation of the fNIRS channel layout used as data input. Overall, promising findings support the use of this approach in classifying spatiotemporal data with the aim of detecting different levels or types of occurrences simultaneously in several regions of interest.
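As a rough illustration of the two multi-label measures named above, the following sketch computes Hamming Loss and Subset Accuracy on integer label matrices with one column per region of interest. It assumes the common definitions (fraction of individual labels that are wrong, and fraction of samples whose labels are all correct); the thesis may define or compute these measures differently, and the example arrays are made up for illustration only.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Fraction of individual (sample, region) labels predicted incorrectly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def subset_accuracy(y_true, y_pred):
    """Fraction of samples for which every region's label is correct."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

# Made-up example: 3 samples, 4 regions, class indices 0..2.
y_true = np.array([[0, 2, 1, 0],
                   [1, 1, 0, 2],
                   [2, 0, 0, 1]])
y_pred = np.array([[0, 2, 1, 0],   # all four regions correct
                   [1, 0, 0, 2],   # one region wrong
                   [2, 0, 1, 0]])  # two regions wrong

loss = hamming_loss(y_true, y_pred)        # 3 wrong labels out of 12
exact = subset_accuracy(y_true, y_pred)    # 1 fully correct sample out of 3
```

Subset Accuracy is the stricter of the two: a sample counts as correct only when all of its regions are labeled correctly, whereas Hamming Loss gives partial credit for each region that is right.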


Open Access

Available for download on Tuesday, June 21, 2022