Goodfire, a San Francisco startup, just dropped a tool called Silico that claims to let you peek inside an AI model and fiddle with its knobs while it’s still training. That’s a big deal if you’ve ever tried to figure out why your LLM thinks 9.11 is greater than 9.9 or refuses to admit its own flaws.
The company says Silico is the first off-the-shelf product that helps debug every stage of model development, from data set construction to training. Their pitch? Making AI less like alchemy and more like engineering. I’ve heard that line before, but they’re actually putting their money where their mouth is.
CEO Eric Ho told MIT Technology Review that the dominant feeling in frontier labs is that you just need more scale, more compute, more data, and AGI will magically appear. Goodfire’s saying no, there’s a better way. I appreciate the contrarian stance, especially when the big labs are burning billions on brute force.
Goodfire is part of a small group, alongside Anthropic, OpenAI, and Google DeepMind, working on mechanistic interpretability, which MIT Technology Review named one of its 10 Breakthrough Technologies of 2026. The idea is to map neurons and their pathways to understand what the hell is actually happening inside these black boxes.
What sets Goodfire apart is that they want to use this not just to audit finished models but to design them from scratch. “We want to remove the trial and error and turn training models into precision engineering,” Ho says. “And that means exposing the knobs and dials so that you can actually use them during the training process.”
They’ve already used their techniques to reduce hallucinations in LLMs. Now they’re packaging those internal tools as a product. Silico uses AI agents to automate much of the interpretability work that previously required humans. “Agents are now strong enough to do a lot of the interpretability work that we were doing using humans,” Ho says. “That was kind of the gap that needed to be bridged before this was actually a viable platform that customers could use themselves.”
Leonard Bereska, a researcher at the University of Amsterdam who has worked on mechanistic interpretability, thinks Silico looks useful but pushes back on the grand claims. “In reality, they are adding precision to the alchemy,” he says. “Calling it engineering makes it sound more principled than it is.” I think he’s right to be skeptical. We’ve seen plenty of “breakthrough” interpretability tools that don’t scale.
Silico lets you zoom in on individual neurons or groups, run experiments to see what they do, and trace pathways upstream and downstream. For example, Goodfire found one neuron in the open-source Qwen 3 model that was associated with the trolley problem. Activating it made the model frame its outputs as explicit moral dilemmas. “When this neuron’s active, all sorts of weird things happen,” says Ho.
Pinpointing weird behavior is standard practice now, but Goodfire wants to make it easy to adjust that behavior. With Silico, developers can tweak parameters connected to individual neurons to boost or suppress certain behaviors. In one example, they asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing negative business impact. By boosting neurons associated with transparency and disclosure, they flipped the answer from no to yes nine out of ten times. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” says Ho.
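To make the "boost a neuron, flip the answer" idea concrete, here is a deliberately tiny sketch. It is not Silico's API or Goodfire's method, and a real model has millions of units rather than two. The neuron names, activations, weights, and boost factor are all invented for illustration. It only shows the mechanism Ho describes: one signal outweighing another until you scale it up.

```python
# Toy illustration only: two hand-picked hidden "neurons" feed one yes/no readout.
# Every name and number here is hypothetical; none come from Goodfire or Silico.

HIDDEN = {"transparency": 0.4, "commercial_risk": 0.9}    # made-up activations
READOUT = {"transparency": 1.0, "commercial_risk": -1.0}  # made-up readout weights


def decide(activations, boosts=None):
    """Return ("yes" | "no", logit) after optionally scaling chosen activations."""
    boosts = boosts or {}
    logit = sum(
        READOUT[name] * value * boosts.get(name, 1.0)
        for name, value in activations.items()
    )
    return ("yes" if logit > 0 else "no"), logit


# Unsteered: the commercial-risk signal outweighs the transparency signal.
baseline = decide(HIDDEN)

# Steered: scale the transparency activation by 3x and leave everything else alone.
steered = decide(HIDDEN, boosts={"transparency": 3.0})

print(baseline[0], steered[0])
```

In this toy the answer flips every time because nothing is random. Against a sampled LLM you would expect something closer to the nine-out-of-ten result Goodfire reports.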
Silico can also help steer training by filtering out data that pushes a model toward unwanted behavior. The classic example: many models think 9.11 is greater than 9.9. Looking inside might reveal influence from Bible verses or code repositories, where numbering is sequential and 9.11 really does come after 9.9. This is the kind of nonsense that interpretability could actually fix.
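The 9.11 versus 9.9 confusion is easy to reproduce outside a model, because the same two strings order differently depending on whether you read them as decimals or as chapter-and-verse or version-style numbering. A minimal sketch (the function names are mine, not anything from Silico):

```python
def greater_as_decimal(a: str, b: str) -> bool:
    """Compare two strings as ordinary decimal numbers."""
    return float(a) > float(b)


def greater_as_sequence(a: str, b: str) -> bool:
    """Compare dot-separated parts as integers, the way verses or versions are ordered."""
    def parts(s: str) -> tuple:
        return tuple(int(p) for p in s.split("."))
    return parts(a) > parts(b)


decimal_says = greater_as_decimal("9.11", "9.9")    # 9.11 vs 9.90 as numbers
sequence_says = greater_as_sequence("9.11", "9.9")  # (9, 11) vs (9, 9) as sections

print(decimal_says, sequence_says)
```

Both readings are correct in their own context. The trouble starts when a model that has seen a lot of the second kind of text is asked a question about the first.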
I’m cautiously optimistic. Goodfire’s approach is more grounded than most, and the agent automation makes it practical. But Bereska’s skepticism is warranted: we’re still adding precision to alchemy, not turning it into physics. Still, if they can make debugging LLMs even slightly less painful, that’s a win.