Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification
📅 May 23, 2026 | 🌐 Source: https://www.marktechpost.com/2026/05/23/nous-research-releases-contrastive-neuron-attribution-cna-sparse-mlp-circuit-steering-without-sae-training-or-weight-modification/ | 🏷️ Technology

Instruction-tuned language models refuse harmful requests. But which part of the model is actually responsible, and how does that mechanism get installed during training?
What You Need To Know
New research from the Nous Research team takes a neuron-level look at this question. The team developed contrastive neuron attribution (CNA), a method that identifies the specific MLP neurons whose activations most distinguish harmful from benign prompts.
More Details
By ablating just 0.1% of MLP activations, the team reduced refusal rates by more than 50% in most instruct models tested, across Llama and Qwen architectures from 1B to 72B parameters, while keeping output quality above 0.97 at all steering strengths. A key finding is that the late-layer structure that discriminates harmful from benign prompts already exists in base models before any fine-tuning. Alignment fine-tuning does not create new structure. Instead, it transforms the function of neurons within that existing structure into a sparse, targetable refusal gate.
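The general idea of ranking MLP neurons by how differently they respond to harmful and benign prompts, then suppressing a small top fraction at inference time, can be illustrated on toy data. This is only a rough sketch under stated assumptions, not the Nous Research implementation: the article does not give the scoring rule, so the absolute difference of mean activations used here is an assumption, and the function names (`contrast_scores`, `top_fraction`, `steer`), the `strength` parameter, and the toy numbers are all hypothetical.

```python
from statistics import fmean

# Toy activations: one row per prompt, one column per MLP neuron.
# These numbers are made up purely for illustration.
harmful = [[0.1, 2.0, 0.3, 0.0], [0.2, 1.8, 0.2, 0.1]]
benign = [[0.1, 0.2, 0.3, 0.0], [0.2, 0.0, 0.2, 0.1]]


def contrast_scores(harmful_acts, benign_acts):
    """Assumed score: |mean activation on harmful - mean on benign| per neuron."""
    n_neurons = len(harmful_acts[0])
    return [
        abs(
            fmean(row[j] for row in harmful_acts)
            - fmean(row[j] for row in benign_acts)
        )
        for j in range(n_neurons)
    ]


def top_fraction(scores, fraction):
    """Indices of the highest-scoring neurons (at least one is always kept)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def steer(activations, neuron_ids, strength=1.0):
    """Scale the selected neurons by (1 - strength); strength=1.0 zeroes them."""
    ids = set(neuron_ids)
    return [
        a * (1.0 - strength) if j in ids else a
        for j, a in enumerate(activations)
    ]


# Select the top 25% of neurons in the toy layer and fully ablate them.
selected = top_fraction(contrast_scores(harmful, benign), fraction=0.25)
ablated = steer(harmful[0], selected, strength=1.0)
print(selected, ablated)
```

In a real model, the activations would be captured from the MLP layers during forward passes over the two prompt sets, and the scaling would be applied at inference time without touching the weights. The 25% fraction here is only a consequence of the tiny toy layer; the article reports ablating 0.1% of MLP activations.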
⚡ This article was auto-curated from https://www.marktechpost.com/2026/05/23/nous-research-releases-contrastive-neuron-attribution-cna-sparse-mlp-circuit-steering-without-sae-training-or-weight-modification/. All rights and credits belong to the original publisher. This blog aggregates tech news for informational purposes only.