Date of Award

December 2020

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical Engineering and Computer Science

Advisor(s)

Qinru Qiu

Subject Categories

Engineering

Abstract

Over the past few decades, most collective information has been digitized, forming an immense database distributed across the Internet. This is often referred to as Big data: a collection of data that is vast in volume and still growing over time. Big data is now everywhere. We may not even realize how much it affects our daily lives, as it is applied in many areas, ranging from online shopping, music streaming, TV streaming, travel and transportation, and energy to fighting crime and health care. Many organizations and companies have been collecting and analyzing large volumes of data to solve domain-specific problems or make business decisions. One powerful tool for extracting value from Big data is Deep learning, a type of machine learning based on artificial neural networks, which are inspired by the structure and function of the human brain and learn from large amounts of data. Deep learning has been widely applied in many research fields, such as natural language processing, IoT applications, and computer vision. In this thesis, we introduce three Deep Neural Networks that learn semantic information from different types of data, and a design guideline for accelerating a neural network layer on a general-purpose computing platform.

First, we focus on text data. We propose a new feature extraction technique to preprocess the dataset and optimize the original Restricted Boltzmann Machine (RBM) model to generate more meaningful topics that better represent the given document. Our proposed method improves the accuracy of the generated topics by up to 12.99% on the Open Movie, Reuters, and 20NewsGroup datasets.
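
To illustrate the model being optimized, the following is a minimal sketch of a textbook binary RBM trained with one-step contrastive divergence (CD-1) on binary bag-of-words vectors. It is not the optimized RBM or the feature extraction technique proposed in the thesis; the class name `TinyRBM`, its methods, the learning rate, and the toy vocabulary are hypothetical choices for illustration. After training, each hidden unit can be read as a "topic" by listing the words with the largest weights into it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRBM:
    """Hypothetical textbook binary RBM trained with CD-1 (not the thesis model)."""

    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible (word) biases
        self.b_h = np.zeros(n_hidden)   # hidden (topic) biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden activations driven by the documents.
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step produces a reconstruction.
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))  # reconstruction error

    def top_words(self, vocab, k=3):
        # For each hidden unit, the k words with the largest weights.
        return [[vocab[i] for i in np.argsort(-self.W[:, j])[:k]]
                for j in range(self.W.shape[1])]

# Toy corpus: two sports documents and two finance documents.
vocab = ["goal", "match", "team", "stock", "market", "price"]
docs = np.array([[1, 1, 1, 0, 0, 0],
                 [1, 0, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 0, 1]], dtype=float)
rbm = TinyRBM(n_visible=6, n_hidden=2, lr=0.5)
errors = [rbm.cd1_step(docs) for _ in range(500)]
print(errors[0], errors[-1])
print(rbm.top_words(vocab))
```

The reconstruction error should fall as training proceeds; the exact topic lists depend on the random seed.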

Moving from text to image data, with additional user click locations, we propose a human-in-the-loop automatic image labeling framework focusing on aerial images, which offer fewer features for detection. The proposed model consists of two main parts: a prediction model and an adjustment model. The user first provides click locations to the prediction model, which generates a bounding box for a specific object. The bounding box is then fine-tuned by the adjustment model for a more accurate size and location. A feedback and retraining mechanism allows users to manually adjust the generated bounding box and provide feedback to incrementally train the adjustment network at runtime. This online learning feature enables the user to generalize the existing model to target classes not initially present in the training set, and gradually improves the model's specificity to those new targets during online learning.
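
The click, predict, adjust, and feedback loop can be sketched as follows. This is a toy under stated assumptions, not the thesis's networks: `predict_box` is a hypothetical stand-in that places a fixed-size box at the click, and `OnlineAdjuster` is a hypothetical linear model mapping an image-patch feature vector to box offsets, updated by one stochastic gradient step each time the user corrects a box.

```python
import numpy as np

def predict_box(click, default_size=32.0):
    """Hypothetical prediction model: a fixed-size box centred on the click."""
    x, y = click
    return np.array([x, y, default_size, default_size])  # (cx, cy, w, h)

class OnlineAdjuster:
    """Hypothetical adjustment model: linear box offsets learned from user feedback."""

    def __init__(self, n_features, lr=0.1):
        self.W = np.zeros((4, n_features))  # maps features -> (dcx, dcy, dw, dh)
        self.lr = lr

    def adjust(self, box, feats):
        return box + self.W @ feats

    def feedback(self, box, feats, corrected):
        # One SGD step on 0.5 * ||adjusted - corrected||^2 with respect to W.
        err = self.adjust(box, feats) - corrected
        self.W -= self.lr * np.outer(err, feats)
        return float(np.abs(err).mean())  # mean absolute error before this update

adjuster = OnlineAdjuster(n_features=2, lr=0.1)
feats = np.array([1.0, 0.5])                      # hypothetical patch descriptor
box = predict_box((100.0, 80.0))
corrected = np.array([104.0, 78.0, 40.0, 24.0])   # box after the user's manual fix
errs = [adjuster.feedback(box, feats, corrected) for _ in range(20)]
print(errs[0], errs[-1])
```

Each correction shrinks the remaining error by a constant factor here (1 - lr * ||feats||^2 = 0.875), so repeated feedback on similar targets pulls the adjusted box toward what the user expects, which mirrors the incremental specialization described above.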

Combining text and image data, we propose the Multi-region Attention-assisted Grounding network (MAGNet) framework, which uses spatial attention networks for image-level visual-textual fusion, preserving local (word) and global (phrase) information, to refine region proposals with an in-network Region Proposal Network (RPN) and to detect single or multiple regions for a phrase query. Our framework does not depend on external proposal generation systems and, without additional information, develops an understanding of the query phrase in relation to the image, achieving respectable results on Flickr30k Entities and a 12% improvement over the state of the art on ReferIt Game. Additionally, our model can ground multiple regions for a single query phrase, which is more suitable for real-life applications.
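
The idea of spatial attention that keeps both word-level and phrase-level information can be sketched with scaled dot-product attention over an image feature map. This is an illustrative assumption, not MAGNet's architecture: the function names, the use of the mean word embedding as the phrase vector, and the equal 0.5 weighting of local and global maps are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feature_map, query):
    """Scaled dot-product attention of one text vector over all spatial positions.

    feature_map: (H, W, d) visual features; query: (d,) text embedding.
    Returns an (H, W) attention map that sums to 1 and the attended (d,) feature.
    """
    H, W, d = feature_map.shape
    flat = feature_map.reshape(H * W, d)
    weights = softmax(flat @ query / np.sqrt(d))
    return weights.reshape(H, W), weights @ flat

def fuse_word_and_phrase(feature_map, word_embs):
    """Hypothetical fusion: average of per-word (local) maps and one phrase (global) map."""
    word_maps = [spatial_attention(feature_map, w)[0] for w in word_embs]
    phrase_map, _ = spatial_attention(feature_map, word_embs.mean(axis=0))
    return 0.5 * np.mean(word_maps, axis=0) + 0.5 * phrase_map

rng = np.random.default_rng(0)
fmap = rng.standard_normal((7, 7, 16))
words = rng.standard_normal((3, 16))      # stand-ins for three word embeddings
fmap[2, 4] = 3.0 * words.mean(axis=0)     # plant a region that matches the phrase
attn = fuse_word_and_phrase(fmap, words)
print(np.unravel_index(attn.argmax(), attn.shape))
```

In a full grounding model, a map like `attn` would be used to score or refine region proposals; here it only shows how word and phrase queries each produce a spatial distribution over the image.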

Although deep neural networks (DNNs) have become a powerful tool, they are highly expensive in both computation time and storage. To improve network performance while maintaining accuracy, the block-circulant matrix-based (BCM) algorithm has been introduced. It has proven highly effective when implemented on customized hardware such as FPGAs. However, its performance suffers on general-purpose computing platforms; in certain cases, using BCM does not reduce the total computation time of the network at all. To address this problem, we propose a parallel implementation of the BCM layer and provide guidelines that generally lead to better implementation practice. The guidelines cover popular implementation languages and packages, including Python, numpy, intel-numpy, tensorflow, and nGraph.
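
The core of a BCM layer can be sketched with numpy, assuming the common FFT formulation: the weight matrix is split into p x q square blocks of size b, each block is circulant and stored as one length-b vector, and a circulant block-vector product is a circular convolution computed as IFFT(FFT(w) * FFT(x)). Storage drops from p*q*b^2 to p*q*b values. The function names are hypothetical, and this vectorized version is only an illustration, not the parallel implementation or guidelines from the thesis; it is checked against the equivalent dense matrix.

```python
import numpy as np

def bcm_dense(w_blocks):
    """Expand block-circulant weights (p, q, b) into the dense (p*b, q*b) matrix."""
    p, q, b = w_blocks.shape
    W = np.zeros((p * b, q * b))
    for i in range(p):
        for j in range(q):
            c = w_blocks[i, j]
            # Column k of a circulant block is its defining vector rolled by k.
            block = np.stack([np.roll(c, k) for k in range(b)], axis=1)
            W[i * b:(i + 1) * b, j * b:(j + 1) * b] = block
    return W

def bcm_forward(w_blocks, x):
    """FFT-based block-circulant layer: x (n, q*b) -> y (n, p*b).

    Computes y_i = sum_j IFFT(FFT(w_ij) * FFT(x_j)) for all output blocks i
    and batch rows at once with one einsum, instead of Python loops over blocks.
    """
    p, q, b = w_blocks.shape
    n = x.shape[0]
    W_f = np.fft.rfft(w_blocks, axis=-1)             # (p, q, b//2+1)
    X_f = np.fft.rfft(x.reshape(n, q, b), axis=-1)   # (n, q, b//2+1)
    Y_f = np.einsum("pqk,nqk->npk", W_f, X_f)        # sum over input blocks j
    y = np.fft.irfft(Y_f, n=b, axis=-1)              # (n, p, b)
    return y.reshape(n, p * b)

rng = np.random.default_rng(0)
p, q, b, n = 3, 4, 8, 5
w_blocks = rng.standard_normal((p, q, b))
x = rng.standard_normal((n, q * b))
y_fft = bcm_forward(w_blocks, x)
y_dense = x @ bcm_dense(w_blocks).T
print(np.allclose(y_fft, y_dense))  # True
```

Whether the FFT path is actually faster than a dense matrix multiply on a CPU depends on b, the batch size, and the backend library, which is the kind of platform-dependent behaviour the abstract describes.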

Access

Open Access

Recommended Citation

Pugdeethosapol, Krittaphat, "Learning Semantic Information from Multimodal Data using Deep Neural Networks" (2020). Dissertations - ALL. 1259.
https://surface.syr.edu/etd/1259

Included in

Engineering Commons

Learning Semantic Information from Multimodal Data using Deep Neural Networks
