Networks with community structure arise in many fields, such as social science, biological science, and computer science. Stochastic block models are popular tools for describing such networks. This dissertation, which consists of two parts, explores several stochastic block models and the relationships between them. In the first part, we study the Popularity Adjusted Block Model (PABM) and introduce its sparse version, the Sparse Popularity Adjusted Block Model (SPABM). The SPABM is the only existing block model that allows some connection probabilities to be set to zero. For both the PABM and the SPABM, we construct estimators of the probability matrix when the number of communities is arbitrary, may grow with the number of nodes in the network, and is not assumed to be known. One of our main contributions is the application of Sparse Subspace Clustering (SSC) to partitioning the network into communities. This approach is well known in computer vision but, to the best of our knowledge, has not previously been used for clustering network data.
A variety of block models exists, including the Stochastic Block Model (SBM), the Degree Corrected Block Model (DCBM), and the PABM. While this variety offers a range of choices, the block models do not have a nested structure; in addition, fitting the DCBM requires identifiability assumptions. There is also a substantial jump in the number of parameters from the DCBM to the PABM. Therefore, in the second part of the dissertation, we explore the relationship between the existing block models. We suggest a set of conditions on the DCBM that leads to a nested structure in block models, with the Erdős-Rényi model being the simplest and the PABM the most complex. Moreover, we introduce the Heterogeneous Block Model (HBM), which is more complicated than the DCBM but has fewer unknown parameters than the PABM, thus bridging the gap between the DCBM and the PABM.
The HBM is based on partitioning the network into mega-communities that are, in turn, subdivided into communities: the communities are distinguished by the average connection probabilities between them, while the mega-communities are determined by the heterogeneity of the connection probabilities. This results in a hierarchy of block models that does not rely on arbitrary identifiability conditions, treats the SBM, the DCBM, and the PABM as particular cases with specific parameter values, and also allows a multitude of versions that are more complicated than the DCBM but have fewer unknown parameters than the PABM. The latter enables one to carry out clustering and estimation without preliminary testing to determine which of the block models is actually true. The theory in this dissertation is supplemented by simulation studies and real data examples.
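The idea that simpler block models sit inside more complex ones can be illustrated with a small numerical sketch. The code below assumes the standard definitions from the block-model literature, which the abstract itself does not state: SBM with P[i, j] = B[c[i], c[j]], DCBM with P[i, j] = theta[i] * theta[j] * B[c[i], c[j]], and PABM with P[i, j] = Lam[i, c[j]] * Lam[j, c[i]]. The function names, the toy parameters, and the particular choice of `Lam` are hypothetical. The sketch does not reproduce the dissertation's nesting conditions or the HBM.

```python
import numpy as np

def sbm_probs(B, c):
    """SBM (assumed standard form): P[i, j] = B[c[i], c[j]]."""
    return B[np.ix_(c, c)]

def dcbm_probs(B, c, theta):
    """DCBM (assumed standard form): P[i, j] = theta[i] * theta[j] * B[c[i], c[j]]."""
    return np.outer(theta, theta) * sbm_probs(B, c)

def pabm_probs(Lam, c):
    """PABM (assumed standard form): P[i, j] = Lam[i, c[j]] * Lam[j, c[i]]."""
    A = Lam[:, c]      # A[i, j] = Lam[i, c[j]]
    return A * A.T     # A.T[i, j] = Lam[j, c[i]]

# Toy example: 8 nodes, 2 communities (all values are illustrative).
rng = np.random.default_rng(0)
n = 8
c = np.array([0, 0, 0, 0, 1, 1, 1, 1])        # community labels
B = np.array([[0.8, 0.2],
              [0.2, 0.6]])                     # symmetric block probabilities
theta = rng.uniform(0.5, 1.0, size=n)          # degree parameters

# A DCBM written as a PABM: Lam[i, k] = theta[i] * sqrt(B[c[i], k]).
Lam_dcbm = theta[:, None] * np.sqrt(B[c, :])
P_dcbm = dcbm_probs(B, c, theta)
P_pabm = pabm_probs(Lam_dcbm, c)

# An SBM is a DCBM with all degree parameters equal to one.
P_sbm = sbm_probs(B, c)
P_dcbm_flat = dcbm_probs(B, c, np.ones(n))

# An Erdős-Rényi model is an SBM with a single community.
P_er = sbm_probs(np.array([[0.3]]), np.zeros(n, dtype=int))

print(np.allclose(P_pabm, P_dcbm), np.allclose(P_dcbm_flat, P_sbm))
```

Under these assumed definitions, each simpler model is recovered by restricting the parameters of the more complex one, which is the sense of "particular cases with specific parameter values" used in the abstract.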
Doctor of Philosophy (Ph.D.)
College of Sciences
Doctoral Dissertation (Open Access)
Noroozi, Majid, "Estimation and Clustering in Block Models" (2020). Electronic Theses and Dissertations, 2020-. 261.