Keywords

Millimeter-Wave (mmWave) Beam Prediction; Multimodal Learning; Contrastive Learning; CLIP; Multimodal Dataset; Channel Estimation; Sensor Fusion; Beamforming; mmWave Communication; DeepSense Dataset

Abstract

This thesis presents a novel approach to beam prediction in wireless communication systems using a multimodal masked CLIP (Contrastive Language-Image Pre-training) model. We introduce a two-phase training methodology that first aligns representations across multiple sensor modalities (GPS, radar, LiDAR, and RGB images) through masked contrastive learning and then applies task-specific fine-tuning for channel power reconstruction. Our approach adapts CLIP's pre-training strategy to the domain of wireless signal modeling, enabling the model to learn rich, transferable features that capture the spatial and contextual dependencies of the beam distribution. Notably, the pre-training stage substantially improves overall performance, in particular the model's ability to infer full beam pattern characteristics. Experimental results demonstrate that our method outperforms traditional approaches, particularly in Non-Line-of-Sight (NLoS) conditions, where the learned multimodal embeddings enhance the model's ability to reason about occluded or indirect signal paths. These findings suggest that multimodal masked CLIP not only improves beam prediction accuracy but also provides a robust foundation for real-world mmWave communication scenarios where environmental variability is prevalent.

Completion Date

2025

Semester

Summer

Committee Chair

Rahnavard, Nazanin

Degree

Master of Science in Electrical Engineering (M.S.E.E.)

College

College of Engineering and Computer Science

Department

Electrical Engineering

Format

PDF

Identifier

DP0029533

Language

English

Document Type

Thesis

Campus Location

Orlando (Main) Campus
