Keywords

Large Language Models (LLMs), Edge Proposal, Node Generation, Prompt Mutation, Synthetic Corpora, Graph-Structured Text

Abstract

Natural Language Processing (NLP) and Artificial Intelligence (AI) have evolved from brittle rule-based engines through data-driven statistical models to today's transformer architectures, whose multi-head attention enables rich contextual reasoning. Against this backdrop, this dissertation unifies three complementary investigations that deepen our understanding of how large language models (LLMs) can reason over, generate, and strategically adapt language within networked settings. First, instruction-tuned LLMs are recast as latent-relationship detectors: by prompting models to hypothesize links between text-described entities, we recover edge sets that reconstruct social, thematic, and citation graphs with high precision, revealing how attention distributions encode topological cues. Second, recognizing the difficulty of benchmarking such capabilities on noisy real-world corpora, the work introduces a suite of synthetic grammar generators, spanning Markov, tree, and graph formalisms, that yield corpora with analytically tractable entropy and cross-entropy. These controllable datasets expose failure modes and scaling laws that remain hidden when evaluation relies solely on organic text, and they provide a reproducible laboratory for probing biases introduced by the transformer mechanism itself. Finally, the dissertation closes the loop with an adaptive node-generation framework in which a candidate bio is iteratively rewritten through prompt-space mutations, each assessed by an independent LLM evaluator, until its predicted connectivity with a target community is maximized. This self-play paradigm shows that textual attributes can be optimized in situ, blurring the boundary between content creation and network embedding.
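The abstract describes the grammar generators only at a high level. As a minimal sketch of what "analytically tractable entropy" means for the Markov formalism, the toy generator below (a hypothetical two-state chain over symbols A and B; the dissertation's generators are richer) samples a corpus and computes its exact entropy rate in closed form:

```python
import random
from math import log2

# Hypothetical two-state Markov "grammar": transition probabilities are chosen
# for illustration, not taken from the dissertation.
STATES = ["A", "B"]
TRANS = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.5, "B": 0.5}}

def stationary():
    # For a 2-state chain, pi = pi P has a closed-form solution.
    p_ab, p_ba = TRANS["A"]["B"], TRANS["B"]["A"]
    pi_a = p_ba / (p_ab + p_ba)
    return {"A": pi_a, "B": 1.0 - pi_a}

def entropy_rate():
    # H = -sum_i pi_i * sum_j P_ij * log2(P_ij), in bits per token.
    pi = stationary()
    return -sum(
        pi[s] * sum(p * log2(p) for p in TRANS[s].values() if p > 0)
        for s in STATES
    )

def sample(n, seed=0):
    # Draw an n-token corpus from the chain, starting in state A.
    rng = random.Random(seed)
    state, out = "A", []
    for _ in range(n):
        out.append(state)
        r, acc = rng.random(), 0.0
        for nxt, p in TRANS[state].items():
            acc += p
            if r < acc:
                state = nxt
                break
    return "".join(out)
```

Because the ground-truth entropy rate is known exactly, a model's cross-entropy on such a corpus can be compared against an analytic floor rather than an empirical estimate.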
Together, this research offers three advances: (i) it reframes instruction-tuned LLMs as latent-relationship detectors that recover graph structure from text; (ii) it supplies a controlled suite of synthetic corpora whose analytical tractability exposes scaling laws and failure modes unseen in organic data; and (iii) it introduces an adaptive node-generation protocol that uses prompt-space search to craft text surgically integrated into target communities.
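The adaptive node-generation protocol is, at its core, a greedy search in prompt space. The sketch below makes the loop concrete under stated assumptions: `mutate` and `score` are toy stand-ins (a word-appending rewriter and a term-overlap evaluator) for the two LLM roles the abstract describes, so the control flow is runnable without any model calls:

```python
import random

# Hypothetical target community vocabulary; in the dissertation an LLM
# evaluator predicts connectivity instead of counting term overlap.
TARGET_TERMS = {"graphs", "statistics", "networks"}

def mutate(bio, rng):
    # Toy mutation: append one candidate term. (An LLM would rewrite the
    # bio under a mutated prompt.)
    pool = sorted(TARGET_TERMS | {"cooking", "travel"})
    return bio + " " + rng.choice(pool)

def score(bio):
    # Toy evaluator: fraction of target terms present in the bio.
    return len(set(bio.split()) & TARGET_TERMS) / len(TARGET_TERMS)

def optimize(bio, steps=50, seed=0):
    # Greedy hill-climb: keep a candidate rewrite only if the evaluator's
    # predicted connectivity improves.
    rng = random.Random(seed)
    best, best_s = bio, score(bio)
    for _ in range(steps):
        cand = mutate(best, rng)
        s = score(cand)
        if s > best_s:
            best, best_s = cand, s
    return best, best_s
```

Swapping the two toy functions for LLM calls recovers the self-play setup: one model proposes rewrites, an independent model judges them, and only improving candidates survive.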

Completion Date

2025

Semester

Summer

Committee Chair

Mantzaris, Alexander

Degree

Doctor of Philosophy (Ph.D.)

College

College of Sciences

Department

Department of Statistics & Data Science

Format

PDF

Identifier

DP0029542

Language

English

Document Type

Thesis

Campus Location

Orlando (Main) Campus
