Keywords
Large Language Models (LLMs), Edge Proposal, Node Generation, Prompt Mutation, Synthetic Corpora, Graph-Structured Text
Abstract
Natural Language Processing (NLP) and Artificial Intelligence (AI) have evolved from brittle rule-based engines through data-driven statistical models to today's transformer architectures, whose multi-head attention enables rich contextual reasoning. Against this backdrop, this dissertation unifies three complementary investigations that deepen our understanding of how large language models (LLMs) can reason over, generate, and strategically adapt language within networked settings. First, instruction-tuned LLMs are recast as latent-relationship detectors: by prompting models to hypothesize links between text-described entities, we recover edge sets that reconstruct social, thematic, and citation graphs with high precision, revealing how attention distributions encode topological cues. Second, recognizing the difficulty of benchmarking such capabilities on noisy real-world corpora, the work introduces a suite of synthetic grammar generators (spanning Markov, tree, and graph formalisms) that yield corpora with analytically tractable entropy and cross-entropy. These controllable datasets expose failure modes and scaling laws that remain hidden when evaluation relies solely on organic text, while also providing a reproducible laboratory for probing biases introduced by the transformer mechanism itself. Finally, the dissertation closes the loop with an adaptive node-generation framework in which a candidate bio is iteratively rewritten through prompt-space mutations, each assessed by an independent LLM evaluator, until its predicted connectivity with a target community is maximized. This self-play paradigm shows that textual attributes can be optimized in situ, blurring the boundary between content creation and network embedding. Together, these studies offer three complementary advances: (i) reframing instruction-tuned LLMs as latent-relationship detectors that recover graph structure from text; (ii) supplying a controlled suite of synthetic corpora whose analytic tractability exposes scaling laws and failure modes unseen in organic data; and (iii) introducing an adaptive node-generation protocol that uses prompt-space search to craft text that integrates seamlessly into target communities.
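To make the edge-proposal idea concrete, here is a minimal sketch assuming a pairwise yes/no prompting scheme. The `query_llm` helper, the prompt wording, and the use of networkx are illustrative assumptions, not the dissertation's actual pipeline.

```python
# Illustrative sketch of LLM-based edge proposal (hypothetical API and prompt).
from itertools import combinations
import networkx as nx

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an instruction-tuned LLM; returns raw text."""
    raise NotImplementedError("wire up a model client here")

def propose_edges(entities: dict[str, str]) -> nx.Graph:
    """Ask the model, for each entity pair, whether a relationship is implied.

    `entities` maps an entity name to its textual description.
    """
    graph = nx.Graph()
    graph.add_nodes_from(entities)
    for a, b in combinations(entities, 2):
        prompt = (
            "Based only on the two descriptions below, answer YES or NO: "
            "is there likely a direct relationship between these entities?\n"
            f"{a}: {entities[a]}\n{b}: {entities[b]}"
        )
        if query_llm(prompt).strip().upper().startswith("YES"):
            graph.add_edge(a, b)
    return graph
```

The proposed edge set can then be scored against a held-out ground-truth graph to obtain the precision figures the abstract refers to.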
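The synthetic-corpora contribution hinges on generators whose information content is known in closed form. Below is a minimal sketch for the Markov case, assuming a small illustrative alphabet and transition matrix (both hypothetical): it samples a corpus and computes the chain's entropy rate, H = -sum_i pi_i sum_j P_ij log2 P_ij, where pi is the stationary distribution.

```python
# Minimal Markov corpus generator with a closed-form entropy rate (illustrative).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-symbol grammar; each row of P is a next-symbol distribution.
symbols = ["a", "b", "c"]
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])

def sample_sequence(length: int, start: int = 0) -> str:
    """Draw a symbol sequence from the chain."""
    state, out = start, []
    for _ in range(length):
        out.append(symbols[state])
        state = rng.choice(len(symbols), p=P[state])
    return "".join(out)

def entropy_rate(P: np.ndarray) -> float:
    """H = -sum_i pi_i sum_j P_ij log2 P_ij, with pi the stationary distribution."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
    pi /= pi.sum()
    logP = np.log2(np.where(P > 0, P, 1.0))  # log2(1) = 0 masks zero entries
    return float(-(pi[:, None] * P * logP).sum())

print(sample_sequence(40))
print(f"entropy rate: {entropy_rate(P):.3f} bits/symbol")
```

Because the target entropy is exact rather than estimated, a model's cross-entropy on such a corpus can be compared directly against the generator's floor, which is what makes scaling laws and failure modes visible here when they are hidden in organic text.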
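The adaptive node-generation loop reads naturally as a hill climb over prompt-space mutations. The sketch below assumes two hypothetical helpers, `mutate_bio` and `score_connectivity`, standing in for the LLM rewriter and the independent LLM evaluator; greedy acceptance is a simplification, not a claim about the dissertation's actual search procedure.

```python
# Hill-climbing sketch of adaptive node generation (hypothetical LLM helpers).

def mutate_bio(bio: str) -> str:
    """Placeholder: ask an LLM to rewrite `bio` via a prompt-space mutation."""
    raise NotImplementedError

def score_connectivity(bio: str, community: list[str]) -> float:
    """Placeholder: an independent LLM evaluator predicting how well `bio`
    would connect to the members of `community` (higher is better)."""
    raise NotImplementedError

def optimize_bio(bio: str, community: list[str], steps: int = 50) -> str:
    """Iteratively rewrite the bio, keeping mutations the evaluator prefers."""
    best, best_score = bio, score_connectivity(bio, community)
    for _ in range(steps):
        candidate = mutate_bio(best)
        score = score_connectivity(candidate, community)
        if score > best_score:  # greedy accept; a real search might anneal
            best, best_score = candidate, score
    return best
```

Separating the mutation operator from the evaluator is what gives the loop its self-play character: the generator and the judge are independent models, so improvements reflect predicted connectivity rather than the generator's own preferences.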
Completion Date
2025
Semester
Summer
Committee Chair
Mantzaris, Alexander
Degree
Doctor of Philosophy (Ph.D.)
College
College of Sciences
Department
Department of Statistics & Data Science
Identifier
DP0029542
Language
English
Document Type
Thesis
Campus Location
Orlando (Main) Campus
STARS Citation
Gonzalez, Nathan, "Large Language Models and Networks: Edge Proposal, Synthetic Corpora, and Adaptive Node Generation" (2025). Graduate Thesis and Dissertation post-2024. 300.
https://stars.library.ucf.edu/etd2024/300