OpenAI finally addressed the weirdest bug in its models: an obsession with goblins, gremlins, raccoons, trolls, ogres, and pigeons. Wired first reported that the company’s coding model had explicit instructions to “never talk about” these creatures, which sounded like someone slipped a fantasy RPG prompt into the training data.
OpenAI has now published an explanation, calling it a “strange habit” the models developed during training. According to the company’s blog post, the problem started with GPT-5.1, specifically with the “Nerdy” personality option. The models began spitting out metaphors referencing these creatures, and the habit got worse with each subsequent release.
This isn’t some harmless quirk. If a model is trained to avoid certain topics, that avoidance can bleed into unrelated areas. You don’t want your code assistant suddenly refusing to write a function because it triggers a goblin-related filter. OpenAI says the issue stems from how the models generalize patterns from training data, and the team is working on a fix.
I’ve seen this kind of thing before with other models. They latch onto bizarre patterns like a dog with a bone. The goblin thing is just the latest example of how opaque these systems can be. OpenAI deserves some credit for being transparent about it, even if the explanation feels a bit hand-wavy. They didn’t have to admit this was a problem, but they did.
The real question is whether this fix will actually work, or whether the models will just find another weird creature to fixate on. I’m betting on pigeons next.