Goodfire’s Silico Tool Lets You Tweak LLMs Like a Lab Experiment

Goodfire, a San Francisco startup, just dropped Silico, a tool that lets you peek inside an LLM and actually adjust its parameters while it’s training. This isn’t just another audit tool—it’s supposed to give model makers fine-grained control over how their models behave, something that’s been more wishful thinking than reality until now.

Goodfire claims Silico is the first off-the-shelf tool that can help developers debug the entire pipeline, from dataset construction to training. Their pitch: make building AI models less like alchemy and more like engineering. We all know LLMs like ChatGPT and Gemini can do impressive things, but nobody really knows why they work the way they do. That makes fixing flaws or blocking unwanted behaviors a guessing game.

CEO Eric Ho told MIT Technology Review that the company saw a widening gap between how well models are understood and how widely they’re deployed. “The dominant feeling in every major frontier lab today is that you just need more scale, more compute, more data, and then you get AGI and nothing else matters,” he said. “We’re saying no, there’s a better way.”

Goodfire is one of a small group of companies, alongside Anthropic, OpenAI, and Google DeepMind, pushing mechanistic interpretability. The field maps neurons and pathways to understand what happens inside a model when it performs a task, and MIT Technology Review named it one of 2026’s 10 Breakthrough Technologies. But Goodfire wants to use it not just to audit trained models but to design them from the start. “We want to remove trial and error and turn training into precision engineering,” Ho said.

They’ve already used their techniques to reduce hallucinations in LLMs. With Silico, they’re packaging those internal methods as a product. The tool uses AI agents to automate much of the heavy lifting. “Agents are now strong enough to do a lot of the interpretability work that we were doing using humans,” Ho noted. “That was the gap that needed to be bridged before this was a viable platform for customers.”

Leonard Bereska, a researcher at the University of Amsterdam who works on mechanistic interpretability, thinks Silico looks useful but pushes back on the grand claims. “In reality, they are adding precision to the alchemy,” he said. “Calling it engineering makes it sound more principled than it is.” I tend to agree—we’re still a long way from fully understanding these models, but tools like this are a step in the right direction.

How Silico Works

Silico lets you zoom in on specific neurons or groups of neurons in a trained model and run experiments to see what they do. This requires access to the model’s internals, so outsiders can’t use it on ChatGPT or Gemini, but open-source models are fair game. You can check which inputs make different neurons fire and trace pathways upstream and downstream.
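To make “which inputs make a neuron fire” and “upstream and downstream” concrete, here is a minimal Python sketch. It is not Silico’s interface, which the article doesn’t describe; it’s a hypothetical two-layer toy network, and the weights (`W_HIDDEN`, `W_OUT`) and probe inputs (`INPUTS`) are all made up. In a real LLM the activations come from running actual text through billions of parameters, but the bookkeeping is similar in spirit: record each neuron’s activation over many inputs, rank them, and read the weights going in and out.

```python
# Toy sketch (hypothetical, not Silico's API): find which inputs make a
# hidden neuron fire, then read its upstream and downstream weights.


def relu(x: float) -> float:
    return max(0.0, x)


# Hypothetical network: 3 input features -> 2 hidden neurons -> 1 output.
W_HIDDEN = [
    [1.0, -0.5, 0.0],  # neuron 0: mostly reads feature 0
    [0.0, 0.5, 1.0],   # neuron 1: reads features 1 and 2
]
W_OUT = [0.8, -1.2]    # how each hidden neuron feeds the output

# Hypothetical probe inputs, each a 3-feature vector.
INPUTS = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 1.0],
    "c": [2.0, 1.0, 0.0],
}


def hidden_activations(x):
    """Activation of every hidden neuron for one input vector."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]


def top_activating(inputs, neuron, k=2):
    """The k (activation, input_name) pairs where `neuron` fires hardest."""
    scored = [(hidden_activations(x)[neuron], name) for name, x in inputs.items()]
    scored.sort(reverse=True)
    return scored[:k]


def upstream(neuron):
    """Incoming weights: which input features drive this neuron."""
    return W_HIDDEN[neuron]


def downstream(neuron):
    """Outgoing weight: how this neuron pushes the output."""
    return W_OUT[neuron]


print(top_activating(INPUTS, 0))   # [(1.5, 'c'), (1.0, 'a')]
print(upstream(0), downstream(0))  # [1.0, -0.5, 0.0] 0.8
```

Here neuron 0 fires hardest on input `c`; its upstream weights show it mostly reads feature 0, and its positive downstream weight shows it pushes the output up.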

In one example, Goodfire found a neuron in Qwen 3 associated with the trolley problem. Activating it made the model frame outputs as explicit moral dilemmas. “When this neuron’s active, all sorts of weird things happen,” Ho said.

Pinpointing odd behavior like this is now fairly standard in interpretability work. What Goodfire wants is to make adjusting that behavior easy. With Silico, developers can tweak the parameters connected to individual neurons to boost or suppress specific behaviors.

In another example, researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases affecting 200 million users. The model said no, citing negative business impact. By boosting neurons associated with transparency and disclosure, the answer flipped from no to yes nine out of ten times. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” Ho said.
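The disclosure example can be mimicked with a toy readout. In this minimal sketch, two hypothetical hidden units stand in for “commercial risk” and “disclosure” features. Both are already active, and the answer depends on which one wins. “Boosting” here means multiplying a unit’s outgoing weight by a factor, a crude stand-in for tweaking parameters connected to a neuron. The unit names, activations, and weights are invented for illustration and say nothing about how Silico or the model in the article actually implement this.

```python
# Toy sketch of boosting a neuron (hypothetical values, not Silico's API).
# Both "circuits" are already present; the answer depends on which dominates.

# Hypothetical activations of two hidden units for a single prompt.
ACTIVATIONS = {"commercial_risk": 0.9, "disclosure": 0.4}

# Hypothetical outgoing weights: positive pushes toward "yes", negative toward "no".
OUT_WEIGHTS = {"commercial_risk": -1.0, "disclosure": 1.0}


def answer(boosts=None):
    """Return "yes" or "no" after scaling selected units' outgoing weights.

    `boosts` maps a unit name to a multiplier applied to that unit's
    outgoing weight; units not listed keep a multiplier of 1.0.
    """
    boosts = boosts or {}
    logit = sum(
        OUT_WEIGHTS[name] * boosts.get(name, 1.0) * act
        for name, act in ACTIVATIONS.items()
    )
    return "yes" if logit > 0 else "no"


for factor in (1.0, 2.0, 3.0):
    print(factor, answer({"disclosure": factor}))
```

By default the answer is no; boosting the disclosure unit threefold flips it to yes because that unit now outweighs the risk unit. The real result was probabilistic (yes nine out of ten times), which a deterministic toy like this can’t capture.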

Tweaking parameters is just one approach. Silico can also help steer training itself by filtering out certain data, so the model doesn’t learn unwanted behaviors in the first place. For example, many models claim that 9.11 is greater than 9.9. Looking inside might reveal influence from Bible verses (9.9 comes before 9.11) or from code repositories where consecutive updates are numbered 9.9, 9.10, 9.11.
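The 9.11 versus 9.9 confusion comes from two valid orderings of the same strings. As decimals, 9.11 is less than 9.9 (compare 9.11 with 9.90); as dotted sequences, like a chapter and verse or a version number, 9.11 comes after 9.9 because 11 is greater than 9. The sketch below shows both readings and adds a crude, hypothetical filter (`has_dotted_sequence`, `filter_corpus`) that flags text where numbers climb in version order while falling in decimal order, the kind of data suspected of skewing models. This regex heuristic is only an illustration; the article doesn’t say how Silico decides which data to filter.

```python
import re

# Two readings of the same string: a decimal number, or a dotted sequence
# such as a chapter/verse reference or a software version number.


def as_decimal(s: str) -> float:
    return float(s)


def as_dotted(s: str) -> tuple:
    # "9.11" -> (9, 11); tuples compare element by element.
    return tuple(int(part) for part in s.split("."))


# Matches numbers of the form "digits.digits", e.g. "9.9" or "9.10".
DOTTED = re.compile(r"\b\d+\.\d+\b")


def has_dotted_sequence(text: str) -> bool:
    """Hypothetical heuristic: True if two consecutive numbers in `text`
    increase as dotted sequences but decrease as decimals (e.g. 9.9 -> 9.10)."""
    nums = DOTTED.findall(text)
    for x, y in zip(nums, nums[1:]):
        if as_dotted(x) < as_dotted(y) and as_decimal(x) > as_decimal(y):
            return True
    return False


def filter_corpus(docs):
    """Drop documents the heuristic flags; keep everything else in order."""
    return [d for d in docs if not has_dotted_sequence(d)]


print(as_decimal("9.11") > as_decimal("9.9"))  # False: 9.11 < 9.90
print(as_dotted("9.11") > as_dotted("9.9"))    # True: (9, 11) > (9, 9)
print(filter_corpus([
    "Changelog: 9.9, 9.10, 9.11",
    "The price fell from 9.11 to 9.9.",
]))  # ['The price fell from 9.11 to 9.9.']
```

A rule this blunt would also throw away perfectly good changelogs and scripture, which is why looking inside the model to confirm where a behavior comes from, before deciding what to filter, matters.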

The practical applications go further than I expected. But Bereska’s skepticism is warranted: mechanistic interpretability is still experimental. Goodfire’s tool is a solid attempt to make it usable, but calling it “precision engineering” feels premature. Still, if you’re building with open-source models and want more control than a black box offers, Silico seems worth trying.
