Date of Award

12-24-2025

Date Published

January 2026

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical Engineering and Computer Science

Advisor(s)

Senem Velipasalar

Subject Categories

Computer Engineering | Engineering

Abstract

Deep learning for 3D point cloud analysis has made significant progress, yet several critical challenges remain underexplored: (1) traditional max-pooling operations discard a substantial portion of learned features, resulting in information loss and inefficient use of computational resources; (2) existing few-shot point cloud classification models lack robustness when faced with occlusion, missing points, and limited training data; (3) semantic segmentation methods often fail to fully exploit background–foreground interactions, leading to reduced accuracy. Moreover, in domains such as gait recognition and visual program synthesis, research has been largely dominated by 2D-based approaches, leaving the potential of point cloud analysis insufficiently leveraged. To address these issues, this thesis introduces a series of novel methods. First, a Recycling Max-Pooling (RMP) module is proposed to recycle features discarded during the max-pooling operation, thereby improving classification and segmentation accuracy across different benchmarks. Second, a projection-based backbone, referred to as ViewNet, is presented. With a novel view pooling mechanism, ViewNet enhances few-shot point cloud classification by aggregating multi-view features to combat occlusion. Third, a Dynamic Point Feature Aggregation Network (DPFA-Net) is proposed. With background–foreground exploitation strategies, DPFA-Net significantly improves semantic segmentation efficiency and accuracy. Fourth, a new gait recognition approach, termed GaitPoint, is presented. In the domain of gait recognition, most methods rely on 2D silhouettes or skeletons. Departing from these traditional approaches, GaitPoint treats the set of skeleton keypoints as a 3D point cloud and integrates it with silhouettes. This approach demonstrates that point cloud analysis can robustly improve gait recognition under appearance and viewpoint variations.
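The recycling idea behind RMP can be illustrated with a minimal sketch. This is not the thesis implementation; it only shows, under assumed shapes and a simple fusion rule, how activations that a standard max-pool would discard (the second- and lower-ranked values per channel) can be retained and folded into the final descriptor:

```python
import numpy as np

def recycling_max_pool(features, k=2):
    """Illustrative sketch of recycling discarded max-pool features.
    features: (N, C) array of per-point features; k: how many top ranks to keep.
    Standard max-pooling keeps only rank 1 per channel; here ranks 2..k are
    'recycled' into a secondary descriptor (fusion rule is an assumption)."""
    # Sort each channel's activations over the N points, descending.
    sorted_feats = np.sort(features, axis=0)[::-1]
    primary = sorted_feats[0]        # standard max-pooled descriptor, shape (C,)
    recycled = sorted_feats[1:k]     # next k-1 ranked activations, shape (k-1, C)
    # One simple fusion choice: average the recycled ranks and concatenate.
    return np.concatenate([primary, recycled.mean(axis=0)])

pts = np.random.rand(1024, 64)       # 1024 points, 64 feature channels
desc = recycling_max_pool(pts, k=3)  # 128-dim: 64 max-pooled + 64 recycled
```

The first half of the output is exactly the conventional max-pooled vector, so the recycled half adds information at no cost to the standard pathway.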
Finally, in the area of visual question answering (VQA), we present CobraVPS (Code Template Optimization for Better Question Reasoning Accuracy with Visual Program Synthesis). While most visual program synthesis approaches focus solely on logical correctness in 2D vision tasks, our work demonstrates that optimizing pseudo-code templates for each query type improves accuracy. Furthermore, the flexibility of visual program synthesis enables its application to 3D point clouds. Collectively, these contributions advance the state of the art in point cloud learning, cross-modality recognition, and visual program synthesis. They offer both theoretical insights and practical frameworks that address long-standing limitations in efficiency, robustness, and generalization, while also opening new directions for applying point cloud analysis to domains historically dominated by 2D-focused methods.

Access

Open Access
