Date of Award
6-27-2025
Date Published
August 2025
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical Engineering and Computer Science
Advisor(s)
Senem Velipasalar
Subject Categories
Computer Sciences | Physical Sciences and Mathematics
Abstract
3D data, whether represented as point clouds, volumetric data, or meshes, plays a critical role in domains such as autonomous driving, robotics, and medical imaging. However, the complexity of 3D data acquisition and the high cost of annotation often make it impractical to curate large, fully labeled 3D datasets. Furthermore, the unstructured nature of point clouds and the high dimensionality of volumetric data pose additional challenges for designing effective deep learning models. While existing architectures, such as PointNet and DGCNN, have made significant progress in learning directly from raw 3D inputs, they typically rely on data-rich scenarios. This dissertation addresses these challenges and presents data-efficient 3D deep learning frameworks that reduce reliance on extensive supervision. Specifically, it explores methods for learning from limited labeled data in both point cloud and volumetric settings through few-shot learning, zero-shot learning, and sparse-view reconstruction.

We begin with the task of few-shot point cloud classification, where we propose a Cross-Modality Feature Fusion Network that combines 3D point features with depth image features. A support-query mutual attention module is introduced to enhance feature alignment, yielding robust performance under occlusion and missing data. We further introduce SimpliMix, a lightweight manifold mixup strategy that improves generalization by regularizing intermediate features, particularly in cross-domain few-shot settings.

Next, we focus on zero-shot point cloud semantic segmentation, which aims to assign labels to points belonging to unseen object categories without requiring annotated samples. We present 3D-PointZshotS, a framework that learns latent geometric prototypes and aligns them with semantic knowledge through a similarity-based re-representation mechanism. This allows the model to generalize effectively to previously unseen classes.

To extend data-efficient learning to the volumetric domain, we address the challenge of sparse-view cone-beam computed tomography (CBCT) reconstruction, which aims to recover a high-quality 3D volume from a limited number of 2D X-ray projections. We introduce Trans-CBCT, a model that adopts TransUNet as its image encoder, aggregates the encoder's multi-scale feature maps into a rich, multi-level feature representation, queries view-specific features for each 3D point, and incorporates a lightweight attenuation-prediction head. Building on Trans-CBCT, we propose Trans2-CBCT, a dual-transformer architecture that further integrates a point transformer module to enhance local smoothness and spatial consistency. Experimental results demonstrate that both Trans-CBCT and Trans2-CBCT achieve superior reconstruction quality under extreme data sparsity (as few as six X-ray images), with Trans2-CBCT delivering sharper structural details and even higher PSNR and SSIM scores.

In summary, this dissertation presents a unified investigation into learning robust and generalizable 3D representations from limited data across both point cloud and volumetric modalities. These contributions pave the way for deploying 3D deep learning in real-world applications where labeled data is scarce or costly to obtain.
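To make the manifold-mixup idea behind SimpliMix concrete, the sketch below applies mixup to intermediate point-cloud features rather than to raw inputs. It assumes PyTorch; the encoder layout, the Beta(2, 2) mixing prior, and the names `mixup_hidden` and `MixupPointClassifier` are illustrative assumptions, not the dissertation's actual implementation.

```python
# Minimal sketch of a manifold-mixup-style regularizer on intermediate
# point-cloud features (in the spirit of SimpliMix, but not its exact code).
import torch
import torch.nn as nn


def mixup_hidden(features: torch.Tensor, labels: torch.Tensor, alpha: float = 2.0):
    """Mix a batch of hidden features with a shuffled copy of itself.

    features: (B, C, N) per-point feature maps from an intermediate layer.
    labels:   (B,) integer class labels.
    Returns mixed features, both label sets, and the mixing weight lam,
    so the training loss can be interpolated accordingly.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(features.size(0), device=features.device)
    mixed = lam * features + (1.0 - lam) * features[perm]
    return mixed, labels, labels[perm], lam


class MixupPointClassifier(nn.Module):
    """Toy encoder/classifier split so mixing can happen at a hidden layer."""

    def __init__(self, num_classes: int = 5, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(          # (B, 3, N) -> (B, feat_dim, N)
            nn.Conv1d(3, 32, 1), nn.ReLU(),
            nn.Conv1d(32, feat_dim, 1), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, points, labels=None, mixup: bool = False):
        h = self.encoder(points)               # intermediate per-point features
        if mixup and labels is not None:
            h, y_a, y_b, lam = mixup_hidden(h, labels)
        else:
            y_a = y_b = labels
            lam = 1.0
        logits = self.head(h.max(dim=2).values)  # global max pool + classify
        return logits, y_a, y_b, lam


if __name__ == "__main__":
    model = MixupPointClassifier()
    pts = torch.randn(8, 3, 1024)              # 8 point clouds of 1024 points
    y = torch.randint(0, 5, (8,))
    logits, y_a, y_b, lam = model(pts, y, mixup=True)
    criterion = nn.CrossEntropyLoss()
    loss = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)
    print(loss.item())
```

Mixing at a hidden layer instead of on raw point coordinates is what makes this a manifold-mixup variant: the interpolation regularizes the learned feature space, which is the property the abstract credits for improved cross-domain few-shot generalization.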
Access
Open Access
Recommended Citation
Yang, Minmin, "Data-Efficient 3D Deep Learning" (2025). Dissertations - ALL. 2167.
https://surface.syr.edu/etd/2167
