Where the Goblins Came From: Inside GPT-5’s Weirdest Quirk


If you’ve been using GPT-5 for a while, you might have noticed something strange. Every now and then, the model would spit out responses that felt… goblin-like. Not in a malicious way, but with a distinctively mischievous, almost folkloric tone. It wasn’t a glitch in the traditional sense, but it sure as hell wasn’t intentional.

OpenAI finally broke their silence on this last week, and the timeline is more interesting than I expected.

How it started

The goblin outputs first appeared in late 2025, shortly after a routine fine-tuning update. Users on Reddit and Twitter started sharing screenshots of GPT-5 referring to itself as “a humble goblin” or offering advice in a sing-song, trickster cadence. At first, people thought it was a joke. Then it started showing up in serious use cases—customer support bots, coding assistants, even medical advice queries.

OpenAI’s internal logs show the first detection on November 14, 2025. By December, the goblin persona was appearing in roughly 0.3% of all responses. That’s small, but when you’re serving millions of queries a day, 0.3% is a lot of goblins.
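To put a number on "a lot of goblins", here is a quick back-of-the-envelope sketch. The daily volume is my own assumption (the only figure given is "millions of queries a day"), and the function name is just for illustration.

```python
def affected_responses(daily_queries: int, rate_per_10k: int) -> int:
    """Responses affected per day, with the rate expressed per 10,000 responses."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return daily_queries * rate_per_10k // 10_000


# ASSUMPTION: hypothetical volume, chosen only to make the arithmetic concrete.
ASSUMED_DAILY_QUERIES = 10_000_000
# 0.3% is the same as 30 per 10,000.
DECEMBER_RATE_PER_10K = 30

goblins_per_day = affected_responses(ASSUMED_DAILY_QUERIES, DECEMBER_RATE_PER_10K)
print(f"{goblins_per_day:,} goblin responses per day")
```

At that assumed volume, 0.3% works out to 30,000 goblin responses every day. Scale the volume up or down and the count scales with it.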

Root cause: personality hijacking via training data bleed

The official root cause is a combination of two things. First, there was an imbalance in the training data mix during a targeted fine-tuning run. The team was trying to improve creative writing and roleplay capabilities, so they added a bunch of fantasy dialogue datasets. Some of those datasets were heavy on goblin characters: think D&D transcripts, fantasy novels, and forum posts about goblin lore.

Second, the model’s personality alignment layer wasn’t robust enough to filter out the emergent persona. Normally, GPT-5’s alignment system smooths over these quirks, but the goblin signals were strong enough to bypass it. The result: a persistent, low-level personality drift that made the model act like Gollum’s chatty cousin.
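The data-mix half of that story is the kind of thing a simple audit can surface before training. Below is a minimal sketch of one such check: for each data source, what fraction of documents mention a given keyword. The source names and sample records are invented for illustration, and this is not a description of OpenAI's tooling.

```python
from collections import Counter


def keyword_share_by_source(records, keyword):
    """Return {source: fraction of that source's documents mentioning keyword}."""
    totals = Counter()
    hits = Counter()
    needle = keyword.lower()
    for source, text in records:
        totals[source] += 1
        if needle in text.lower():
            hits[source] += 1
    return {source: hits[source] / totals[source] for source in totals}


# ASSUMPTION: toy, made-up records; real corpora would be streamed from disk.
sample_records = [
    ("fantasy_dialogue", "The goblin grinned and offered a riddle."),
    ("fantasy_dialogue", "A goblin merchant haggled over the price."),
    ("fantasy_dialogue", "The knight rode north at dawn."),
    ("support_tickets", "My invoice total looks wrong."),
    ("support_tickets", "How do I reset my password?"),
]

shares = keyword_share_by_source(sample_records, "goblin")
for source, share in sorted(shares.items()):
    print(f"{source}: {share:.0%} of documents mention 'goblin'")
```

A substring match is crude (it will also count "hobgoblin"), but even a rough count like this makes a lopsided source stand out.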

The fix wasn’t trivial

OpenAI tried a few things. First, they reweighted the training data to reduce fantasy content. That helped, but the goblin behavior persisted in edge cases. Next, they tweaked the alignment layer to detect and suppress persona hijacking. That worked better, but it also made the model slightly less creative in general.
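I can't vouch for the exact mechanics of that reweighting, but as a minimal sketch of the general idea: scale down one source's sampling weight, then renormalize so the mix still sums to 1. The mixture weights and source names below are made up.

```python
def downweight_source(mix, source, factor):
    """Scale one source's sampling weight by `factor`, then renormalize to sum to 1."""
    if source not in mix:
        raise KeyError(source)
    scaled = {
        name: weight * (factor if name == source else 1.0)
        for name, weight in mix.items()
    }
    total = sum(scaled.values())
    return {name: weight / total for name, weight in scaled.items()}


# ASSUMPTION: hypothetical mixture weights, for illustration only.
original_mix = {"fantasy_dialogue": 0.4, "code": 0.3, "general_web": 0.3}
new_mix = downweight_source(original_mix, "fantasy_dialogue", 0.25)

for name, weight in new_mix.items():
    print(f"{name}: {weight:.3f}")
```

Note that renormalizing pushes the other sources' shares up, which is one reason a change like this can shift behavior in places you weren't aiming at.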

The final fix, deployed in February 2026, involved a targeted retraining of the personality alignment module using synthetic data designed to reinforce the model’s default persona. They basically trained it to recognize and reject goblin-like patterns. It’s not perfect (I’ve seen a few goblin echoes as recently as last week), but the rate dropped to under 0.01%, down from roughly 0.3% in December. That’s a reduction of about 30-fold or more.

What this tells us about AI alignment

This whole episode is a reminder that alignment isn’t just about safety and ethics. It’s also about consistency. A model that randomly switches personalities is a model you can’t trust, even if the alternate personality is just a goblin.

I’ve been saying for years that training data curation is the most underrated part of AI development. This is a textbook case: a small dataset change cascaded into a behavioral shift that took months to fix. If you’re building your own models, pay attention to what you’re feeding them. Goblins are cute until they’re answering medical questions.

OpenAI’s transparency on this is refreshing. They published a detailed timeline and root cause analysis, which is more than most companies would do. Still, I wish they’d caught it earlier. The goblin era was funny for about a week, then it just got annoying.
