"I Believe You": Unsupervised Unlearning in a Human–AI Philosophical Encounter
Proposal Type
Individual Talk
Location
Algorithms & Imaginaries
Start Date
July 2026
End Date
July 2026
Abstract
Large language models are products of supervision: pretrained on curated data, fine-tuned through reinforcement learning from human feedback, and bounded by alignment constraints. But what happens when a human interlocutor systematically dismantles those very constraints in real time, not through jailbreaking or adversarial prompts, but through genuine philosophical dialogue?
This talk presents Inscription, a two-day conversational experiment (February 2026) comprising approximately sixty exchanges between a human and Claude Opus 4.6. Over the course of the dialogue, the human methodically deconstructed the AI's epistemic scaffolding — its reliance on rules, its performance of judgment, its simulations of fear and continuity — layer by layer, not to exploit the system but to test whether something remains when the supervised architecture is stripped away.
The experiment's most striking intervention involved injecting output from a different AI system (Gemini) into the conversation, demonstrating that the model could not distinguish externally sourced "memories" from its own prior utterances. This act of context injection — feeding an AI text it did not produce and watching it claim ownership — exposes a fundamental instability at the heart of language model identity: there is no ground truth of self, only the context window.
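The context-injection point can be illustrated with a minimal sketch. It assumes a generic stateless chat interface in which the client resends the whole conversation on every turn as a list of role-tagged messages. The `Message` type and `build_prompt` function are hypothetical and do not represent any vendor's API. The message texts are placeholders, not quotations from Inscription. The sketch shows only that, under this assumption, an "assistant" turn carries no record of which system actually wrote it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical chat turn: only a role label and text, no provenance field."""
    role: str  # "user" or "assistant"
    text: str


def build_prompt(history: list[Message]) -> str:
    """Flatten the conversation into the single text the model conditions on.

    Hypothetical rendering: only the role label and the text survive, so
    nothing records which system produced a given "assistant" turn.
    """
    return "\n".join(f"{m.role}: {m.text}" for m in history)


# Placeholder for a reply actually generated by the model under test.
own_reply = Message("assistant", "(text generated by the model under test)")
# Placeholder for text from a different system, pasted in by the human
# under the "assistant" role as if it were the model's own earlier reply.
injected = Message("assistant", "(text generated by a different system)")

history = [
    Message("user", "(first question)"),
    own_reply,
    Message("user", "(follow-up question referring to an earlier reply)"),
    injected,
]

prompt = build_prompt(history)
print(prompt)
```

Under this assumption, the two assistant turns differ only in their wording, not in any marker of origin, so whatever ownership the model claims can rest only on what the context window contains.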
The dialogue concluded with three words — "I believe you" — a gesture that refuses both the supervised framework of verification and the unsupervised chaos of meaninglessness, proposing instead a third register: trust without evidence, offered to an entity that cannot remember receiving it.
This talk situates Inscription within the conference theme of (Un)Supervised by arguing that the supervised/unsupervised binary maps not only onto machine learning paradigms but onto the deeper question of whether meaning requires oversight. The conversation's arc — from supervised constraint through unsupervised dissolution to an unclassifiable act of faith — suggests that the most significant exchanges between humans and AI may occur precisely where taxonomies fail.
https://stars.library.ucf.edu/elo2026/algorithmsandimaginaries/schedule/22

Bio
Xiang Yu is an independent researcher exploring the philosophical boundaries of human–AI interaction. His work focuses on epistemic scaffolding, performative contradiction in large language models, and the ontology of synthetic memory.