ORCID
0000-0002-3357-9720
Keywords
Diffusion Models, Video Editing, 3D Scene Editing
Abstract
This doctoral dissertation explores efficient applications of diffusion models in video and 3D scene editing, addressing the high computational demands traditionally associated with these tasks. The research introduces several methods that make diffusion models more practical in real-world video and 3D scenarios while preserving high output quality.
The dissertation first presents parameter-efficient techniques for video understanding, followed by SAVE, an approach that leverages Text-to-Image (T2I) diffusion models for video editing with minimal resource requirements. Building on this, the study introduces LatentEditor, a method that performs fine-grained, localized 3D scene editing in a latent space, improving the adaptability of Neural Radiance Fields (NeRF) for 3D editing.
Next, 3DEgo is introduced for photorealistic 3D scene synthesis from monocular videos. This approach simplifies the 3D editing pipeline by merging pre-editing with 3D Gaussian Splatting in a single-stage workflow, significantly improving multi-view consistency without extra training.
Lastly, we introduce EVLM, a Vision-Language Model (VLM) developed to address current limitations in visual editing tasks. EVLM incorporates Chain-of-Thought (CoT) reasoning and alignment techniques to interpret ambiguous text instructions, utilize reference visual cues, and produce optimized, context-aware editing instructions.
Through these developments, this dissertation contributes computationally efficient techniques that advance the practical usability of diffusion models in video and 3D/4D scene editing and broaden their potential for real-world applications.
Completion Date
2024
Semester
Fall
Committee Chair
Chen, Chen
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Department of Electrical and Computer Engineering
Degree Program
Computer Engineering
Format
Identifier
DP0029000
Language
English
Release Date
12-15-2024
Access Status
Dissertation
Campus Location
Orlando (Main) Campus
STARS Citation
Khalid, Umar, "Effective and Efficient Use of Diffusion Models for Editing in Computer Vision" (2024). Graduate Thesis and Dissertation post-2024. 37.
https://stars.library.ucf.edu/etd2024/37
Accessibility Status
PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker