ORCID

0000-0002-3357-9720

Keywords

Diffusion Models, Video Editing, 3D Scene Editing

Abstract

This doctoral dissertation explores efficient applications of diffusion models in video and 3D scene editing, addressing the high computational demands traditionally associated with these tasks. The research introduces several innovative methods to make diffusion models more practical in real-world video and 3D scenarios while preserving high output quality.

The dissertation first presents parametric efficiency techniques for video understanding, followed by SAVE, a novel approach that leverages Text-to-Image (T2I) diffusion models for video editing while minimizing resource requirements. Building on this, the study introduces LatentEditor, a method that performs fine-grained, localized 3D scene editing in a latent space, thereby enhancing the adaptability of Neural Radiance Fields (NeRF) for 3D editing.

Further, 3DEgo is introduced for photorealistic 3D scene synthesis from monocular videos. This approach simplifies the 3D editing pipeline, merging pre-editing with 3D Gaussian Splatting in a single-stage workflow, significantly improving multi-view consistency without extra training.

Lastly, the dissertation introduces EVLM, a Vision-Language Model (VLM) developed to address current limitations in visual editing tasks. EVLM incorporates Chain-of-Thought (CoT) reasoning and alignment techniques to interpret ambiguous text instructions, utilize reference visual cues, and produce optimized, context-aware editing instructions.

Through these developments, this dissertation contributes novel, computationally efficient techniques that advance the practical usability of diffusion models in video and 3D/4D scene editing, making a substantial impact on their potential for real-world applications.

Completion Date

2024

Semester

Fall

Committee Chair

Chen, Chen

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Department of Electrical and Computer Engineering

Degree Program

Computer Engineering

Format

PDF

Identifier

DP0029000

Language

English

Release Date

12-15-2024

Access Status

Dissertation

Campus Location

Orlando (Main) Campus

Accessibility Status

PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker
