In this dissertation, we have investigated the underlying theory of probabilistic models for application in large-scale machine learning tasks. First, we introduce the Maximum Probability (MP) Theorem and its consequences, and present a theoretical framework for probabilistic learning derived from it. In this framework, a model is defined as an event in the probability space, and every model (or its associated event), whether the true underlying model or a parameterized model, has a quantified probability measure. This quantification is derived from the MP Theorem, in which we have shown that the probability measure of an event has an upper bound given its conditional distribution on an arbitrary random variable. Within this alternative framework, the notion of model parameters is encompassed in the definition of the model or its associated event. The framework therefore deviates from the conventional approach of assuming a prior on the model parameters. Instead, the regularizing effect of a prior over parameters is imposed by maximizing the probabilities of models or, in information-theoretic terms, by minimizing a model's information content. The probability of a model in the MP framework is invariant to reparameterization and depends solely on the model's likelihood function. Moreover, rather than maximizing the posterior as in a conventional Bayesian setting, the objective function in our framework is defined as the probability of set operations (e.g., intersection) on the event of the true underlying model and the event of the model at hand. The MP framework adds clarity to probabilistic learning by solidifying the definition of probabilistic models, quantifying their probabilities, and providing a visual understanding of objective functions.
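An upper bound of this kind can be illustrated with an elementary inequality: for any event M and any outcome x of a discrete random variable X, P(M) · P(X = x | M) = P(M, X = x) ≤ P(X = x), so P(M) ≤ min over x of P(X = x) / P(X = x | M). The sketch below assumes a finite X with a known marginal P(X); it is only an illustration of the idea and is not necessarily the exact statement of the MP Theorem in this dissertation. The helper names `mp_upper_bound` and `min_information_bits` and the joint table are hypothetical.

```python
import numpy as np


def mp_upper_bound(p_x, p_x_given_m):
    """Upper bound on P(M) from P(M) * P(X=x | M) = P(M, X=x) <= P(X=x), for every x."""
    p_x = np.asarray(p_x, dtype=float)
    p_x_given_m = np.asarray(p_x_given_m, dtype=float)
    support = p_x_given_m > 0  # outcomes with zero likelihood place no constraint on P(M)
    ratios = p_x[support] / p_x_given_m[support]
    return float(min(1.0, ratios.min()))


def min_information_bits(p_x, p_x_given_m):
    """Lower bound on the information content -log2 P(M) implied by the upper bound."""
    return float(np.log2(1.0 / mp_upper_bound(p_x, p_x_given_m)))


# Hypothetical joint distribution of (event M, X), with X uniform over 4 outcomes.
joint = np.array([
    [0.10, 0.05, 0.05, 0.00],  # row 0: M occurs
    [0.15, 0.20, 0.20, 0.25],  # row 1: M does not occur
])
p_m = joint[0].sum()          # true P(M), about 0.2
p_x = joint.sum(axis=0)       # marginal P(X), uniform
p_x_given_m = joint[0] / p_m  # likelihood P(X | M), about [0.5, 0.25, 0.25, 0]

bound = mp_upper_bound(p_x, p_x_given_m)
print(p_m, bound)

# A flat likelihood allows P(M) up to 1; a sharply peaked one caps it at 1/4 (2 bits).
uniform_x = np.full(4, 0.25)
print(min_information_bits(uniform_x, np.full(4, 0.25)))
print(min_information_bits(uniform_x, [1.0, 0.0, 0.0, 0.0]))
```

With X uniform over n outcomes, the bound reduces to 1 / (n · max over x of P(X = x | M)), so the more sharply peaked a model's likelihood, the smaller the largest probability it can have, which is one way to see how maximizing model probability acts as a regularizer.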
Furthermore, we discuss Finite "K"onvolutional Neural Networks (FKNNs) as a step toward constructing a discrete counterpart to Convolutional Neural Networks (CNNs). In FKNNs, the linear and non-linear components of the network are naturally derived and justified in terms of Bayes' theorem. The building blocks of our network are classifiers operating on the domain of categorical distributions, a property that allows Bayesian classifiers to be composed into more expressive models. The resulting composite model consists of linear and non-linear components that are remarkably similar to those of modern CNNs and their variants, yet the roles of parameters, variables, and layers are less ambiguous from a statistical perspective. Parameters and variables in FKNNs represent categorical distributions, which opens the door to statistical and information-theoretic methods. We further introduce two methods of parameter initialization, inspired by the natural parameterization of the categorical distribution and by the Jeffreys prior. Finally, we transform several well-known CNN architectures for image classification into their FKNN counterparts and compare their performance. Experimental results show that FKNNs and their corresponding CNN architectures exhibit comparable performance. The functional similarity between CNNs and FKNNs, the empirical results, and the explicit connection between FKNNs and Bayes' rule encourage further investigation of finite-state probabilistic models.
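The abstract does not specify the FKNN layer itself, so the following is only a hypothetical sketch of the general idea: a naive Bayes classifier whose inputs and output are categorical distributions, with a linear step (log prior plus the expected log-likelihood under the soft inputs) followed by a normalizing non-linearity (softmax, i.e., the denominator of Bayes' rule). Likelihood tables are initialized by sampling from Dirichlet(1/2, ..., 1/2), the Jeffreys prior for a categorical distribution, and stored as log-probabilities. The conditional-independence assumption, the expected-log-likelihood rule for soft inputs, and the names `bayes_layer` and `init_jeffreys` are assumptions for illustration, not the dissertation's construction.

```python
import numpy as np


def log_softmax(s, axis=-1):
    """Numerically stable log-softmax."""
    s = s - s.max(axis=axis, keepdims=True)
    return s - np.log(np.exp(s).sum(axis=axis, keepdims=True))


def init_jeffreys(rng, n_classes, n_inputs, n_states):
    """Hypothetical init: each likelihood p(x_i = . | c) ~ Dirichlet(1/2, ..., 1/2), uniform class prior."""
    probs = rng.dirichlet(np.full(n_states, 0.5), size=(n_classes, n_inputs))
    log_lik = np.log(np.clip(probs, 1e-12, None))  # clip guards against log(0)
    log_prior = np.full(n_classes, -np.log(n_classes))
    return log_prior, log_lik


def bayes_layer(q, log_prior, log_lik):
    """q: (n_inputs, n_states), each row a categorical distribution over input states.
    log_lik: (n_classes, n_inputs, n_states) holding log p(x_i = k | c).
    Returns a categorical distribution over the n_classes outputs."""
    # Linear part: log prior plus expected log-likelihood, assuming inputs independent given c.
    scores = log_prior + np.einsum("ik,cik->c", q, log_lik)
    # Non-linear part: normalization over classes, as in Bayes' rule.
    return np.exp(log_softmax(scores))


rng = np.random.default_rng(0)
idx = [0, 3, 1, 4, 2, 2]
x = np.eye(5)[idx]  # 6 inputs, each a one-hot distribution over 5 states

# Layer 1: three parallel classifiers, each producing a distribution over 4 classes.
layer1 = [init_jeffreys(rng, 4, 6, 5) for _ in range(3)]
h = np.stack([bayes_layer(x, lp, ll) for lp, ll in layer1])  # shape (3, 4)

# Layer 2: one classifier that reads the three hidden distributions as its inputs.
lp2, ll2 = init_jeffreys(rng, 2, 3, 4)
y = bayes_layer(h, lp2, ll2)
print(h.sum(axis=1), y)
```

With one-hot inputs the layer reduces to the exact naive Bayes posterior; because every output is again a categorical distribution, a bank of such classifiers produces valid inputs for the next layer, which is the sense in which the classifiers compose.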



Advisor: Foroosh, Hassan
Degree: Doctor of Philosophy (Ph.D.)
College: College of Engineering and Computer Science
Department: Computer Science
Degree Program: Computer Science
Identifier: CFE0009105; DP0026438
Release Date: February 2025
Length of Campus-only Access: 3 years
Access Status: Doctoral Dissertation (Campus-only Access)

Restricted to the UCF community until February 2025; it will then be open access.