The first part of the dissertation studies a density deconvolution problem with small Berkson errors. In this setting, the data are not available directly but only in the form of a convolution, and one needs to estimate the convolution of the unknown density with the Berkson errors. While Berkson errors are known to improve the precision of the reconstruction, this does not necessarily happen when they are small. Furthermore, the choice of bandwidth for this estimation problem has so far remained open. In this dissertation, we study in depth how to choose the bandwidth so as to attain the optimal error rates.

The second part of the dissertation studies a generative network model, the so-called Popularity Adjusted Block Model (PABM) introduced by Sengupta and Chen (2018). The PABM generalizes popular graph generative models such as the Stochastic Block Model (SBM) and the Degree Corrected Block Model (DCBM). The advantages of the PABM are that, unlike mixed membership models or the DCBM, it does not rely on any identifiability conditions and it leads to more flexible spectral properties. We extend the theory of the PABM to an arbitrary number of communities, which may grow with the number of nodes in the network and is not assumed to be known. We construct estimators of the probability matrix and the community structure and provide non-asymptotic upper bounds for the estimation and clustering errors. The majority of real-life networks are sparse, in the sense that they have a few high-degree nodes while the remaining nodes have low degrees. Since the SBM and the DCBM do not allow any connection probabilities to be set to zero, they model sparsity by requiring the maximum connection probability to be bounded above by a small quantity, which precludes the existence of high-degree nodes.
By contrast, the PABM allows some of the connection probabilities between nodes to be modeled as identical zeros while keeping the rest of the probabilities non-negligible. This leads to the Sparse Popularity Adjusted Block Model (SPABM). The SPABM reduces the size of the parameter set and leads to improved precision of estimation and clustering. We construct estimators of the probability matrix and the community structure in the SPABM. Finally, we provide non-asymptotic upper bounds for the estimation and clustering errors.
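As a rough illustration of the network models discussed above, the following NumPy sketch builds a PABM probability matrix. It assumes the parametrization of Sengupta and Chen (2018), in which node i has a popularity λ_{ik} toward each community k and P_ij = λ_{i z_j} λ_{j z_i}, where z_i is the community of node i. The helper names (`pabm_prob_matrix`, `sample_adjacency`), the toy values, and the way sparsity is imposed here (zeroing a single popularity) are hypothetical choices for illustration. They are not the dissertation's estimators or its exact SPABM construction.

```python
import numpy as np


def pabm_prob_matrix(lam, z):
    """Hypothetical helper: PABM probabilities P[i, j] = lam[i, z[j]] * lam[j, z[i]].

    lam : (n, K) array; lam[i, k] is the popularity of node i toward community k
    z   : (n,) integer array of community labels in {0, ..., K-1}
    """
    lam = np.asarray(lam, dtype=float)
    z = np.asarray(z)
    M = lam[:, z]   # M[i, j] = lam[i, z[j]]
    return M * M.T  # M.T[i, j] = lam[j, z[i]], so the product is symmetric


def sample_adjacency(P, rng):
    """Hypothetical helper: symmetric A with A[i, j] ~ Bernoulli(P[i, j]) for i < j, no self-loops."""
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(int)
    return upper + upper.T


# Two communities of three nodes each (illustrative values only).
z = np.array([0, 0, 0, 1, 1, 1])
B = np.array([[0.6, 0.1],
              [0.1, 0.5]])

# SBM as a special case: lam[i, k] = sqrt(B[z_i, k]) gives P[i, j] = B[z_i, z_j].
lam = np.sqrt(B[z, :])
P_sbm = pabm_prob_matrix(lam, z)

# Sparsity in the spirit of the SPABM (an assumption for illustration): node 0 has
# zero popularity toward community 1, so P[0, j] = 0 for every j in community 1,
# while all other connection probabilities stay exactly as before.
lam_sparse = lam.copy()
lam_sparse[0, 1] = 0.0
P_sparse = pabm_prob_matrix(lam_sparse, z)

A = sample_adjacency(P_sparse, np.random.default_rng(0))
```

The SBM check shows in miniature why the PABM generalizes it; choosing lam[i, k] = theta[i] * sqrt(B[z_i, k]) would likewise give the DCBM form theta[i] * theta[j] * B[z_i, z_j]. In the sparse variant, node 0 keeps its within-community probabilities unchanged, which is the kind of exact-zero structure that a single small upper bound on all probabilities cannot express.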
Doctor of Philosophy (Ph.D.)
College of Sciences
Doctoral Dissertation (Open Access)
Rimal, Ramchandra, "Estimation and Clustering in Network and Indirect Data" (2020). Electronic Theses and Dissertations, 2020-. 276.