Keywords
Object counting, class-agnostic, vision transformer, vision-language, infrared target detection
Abstract
Object counting is the process of determining the number of instances of specific objects in an image. Accurate object counting is key to many image-understanding applications, including traffic monitoring, crowd management, wildlife migration monitoring, cell counting in medical images, and plant and insect counting in agriculture. In real-world settings, occlusions, complex backgrounds, changes in scale, and variations in object appearance make object counting challenging. This dissertation explores a progression of techniques for robust object localization and counting across diverse image modalities.
The exploration begins by addressing the challenges of vehicular target localization in cluttered environments using infrared (IR) imagery. We propose a network, called TCRNet-2, that processes target and clutter information in two parallel channels and then combines them to optimize the target-to-clutter ratio (TCR) metric.
Next, we explore class-agnostic object counting in RGB images using vision transformers. The primary motivation for this work is that most current methods excel at counting known object types but struggle with unseen categories. To address this drawback, we propose a class-agnostic object counting method. We introduce a dual-branch architecture with interconnected cross-attention that generates feature pyramids for robust object representations, together with a dedicated feature aggregator module that further improves performance.
Finally, we propose a novel framework that leverages vision-language models (VLMs) for zero-shot object counting. Our earlier class-agnostic counting method is highly effective in generalized counting tasks, but it relies on user-defined exemplars of the target objects, which is a limitation. Additionally, the previous zero-shot counting method was reference-less, which limits the ability to select the target object of interest in multi-class scenarios. To address these shortcomings, we propose using vision-language models for zero-shot counting, where the object categories of interest can be specified by text prompts.
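The abstract does not define the TCR metric that TCRNet-2 optimizes. As an illustration only, the sketch below assumes one standard way to formalize a target-to-clutter ratio for a linear filter h:

TCR(h) = (h^T R_t h) / (h^T R_c h)

Here R_t and R_c are the sample autocorrelation matrices of vectorized target and clutter patches. Under that assumption, TCR is a Rayleigh quotient, and the filter that maximizes it is the leading generalized eigenvector of the pair (R_t, R_c).

Keep the following caveats in mind:
* The function names (`autocorrelation`, `tcr`, `best_filter`), the small ridge term, and the synthetic data are all hypothetical.
* This is a closed-form single-filter sketch, not the TCRNet-2 network or its two-channel design.

```python
import numpy as np


def autocorrelation(patches: np.ndarray) -> np.ndarray:
    """Sample autocorrelation matrix of N patches, each flattened to length d."""
    x = patches.reshape(len(patches), -1)
    return x.T @ x / len(x)


def tcr(h: np.ndarray, r_target: np.ndarray, r_clutter: np.ndarray) -> float:
    """Assumed TCR: filter output energy on targets divided by energy on clutter."""
    return float(h @ r_target @ h) / float(h @ r_clutter @ h)


def best_filter(r_target: np.ndarray, r_clutter: np.ndarray,
                ridge: float = 1e-6) -> tuple[np.ndarray, float]:
    """Return the unit-norm filter that maximizes the assumed TCR, and its TCR value.

    Solves the generalized eigenproblem R_t h = lambda R_c h by whitening with the
    Cholesky factor of R_c. The hypothetical ridge keeps R_c positive definite.
    """
    d = r_clutter.shape[0]
    chol = np.linalg.cholesky(r_clutter + ridge * np.eye(d))  # R_c = L L^T
    chol_inv = np.linalg.inv(chol)
    whitened = chol_inv @ r_target @ chol_inv.T                # symmetric matrix
    vals, vecs = np.linalg.eigh(whitened)                      # ascending eigenvalues
    h = chol_inv.T @ vecs[:, -1]                               # map back: h = L^{-T} v
    return h / np.linalg.norm(h), float(vals[-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signature = rng.normal(size=(4, 4))          # hypothetical 4x4 target pattern
    clutter = rng.normal(size=(500, 4, 4))       # synthetic clutter patches
    targets = rng.normal(size=(500, 4, 4)) + 3.0 * signature
    r_t, r_c = autocorrelation(targets), autocorrelation(clutter)
    h, lam = best_filter(r_t, r_c)
    print(f"optimal filter TCR: {tcr(h, r_t, r_c):.2f} (eigenvalue {lam:.2f})")
    print(f"random filter TCR:  {tcr(rng.normal(size=16), r_t, r_c):.2f}")
```

The whitening route uses only symmetric NumPy routines (`cholesky`, `eigh`), so the eigenvalues come back real and sorted. A learned network would instead reach its filters by gradient descent on a TCR-style loss.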
Completion Date
2024
Semester
Summer
Committee Chair
Da Vitoria Lobo, Niels
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Department of Computer Science
Degree Program
Computer Science
Format
application/pdf
Release Date
8-15-2025
Length of Campus-only Access
1 year
Access Status
Doctoral Dissertation (Campus-only Access)
Campus Location
Orlando (Main) Campus
STARS Citation
Jiban, Md Jibanul Haque, "Toward Robust Class-Agnostic Object Counting" (2024). Graduate Thesis and Dissertation 2023-2024. 286.
https://stars.library.ucf.edu/etd2023/286
Accessibility Status
Meets minimum standards for ETDs/HUTs
Restricted to the UCF community until 8-15-2025; it will then be open access.