Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification

📅 May 23, 2026  |  🌐 Source: https://www.marktechpost.com/2026/05/23/nous-research-releases-contrastive-neuron-attribution-cna-sparse-mlp-circuit-steering-without-sae-training-or-weight-modification/  |  🏷️ Technology

Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification

Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification — via https://www.marktechpost.com/2026/05/23/nous-research-releases-contrastive-neuron-attribution-cna-sparse-mlp-circuit-steering-without-sae-training-or-weight-modification/

Instruction-tuned language models refuse harmful requests. But which part of the model is actually responsible — and how does that mechanism get installed during training?

What You Need To Know

A new research from Nous Research team takes a neuron-level look at this question. The Nous research team developed contrastive neuron attribution (CNA), a method that identifies the specific MLP neurons whose activations most distinguish harmful from benign prompts. By ablating just 0.

🔑 Key Highlights

  • But which part of the model is actually responsible — and how does that mechanism get installed during training?
  • A new research from Nous Research team takes a neuron-level look at this question.
  • The Nous research team developed contrastive neuron attribution (CNA), a method that identifies the specific MLP neurons whose activations most distinguish harmful from benign prompts.
  • By ablating just 0.

More Details

1% of MLP activations, they reduced refusal rates by more than 50% in most instruct models tested — across Llama and Qwen architectures from 1B to 72B parameters — while keeping output quality above 0. 97 at all steering strengths. What’s interesting is a key finding: the late-layer structure that discriminates harmful from benign prompts exists in base models before any fine-tuning. Alignment fine-tuning does not create new structure. It transforms the function of neurons within that existing structure into a sparse, targetable refusal gate.

This is a developing story. Follow the source for live updates and more in-depth coverage as details continue to emerge.


⚡ This article was auto-curated from https://www.marktechpost.com/2026/05/23/nous-research-releases-contrastive-neuron-attribution-cna-sparse-mlp-circuit-steering-without-sae-training-or-weight-modification/. All rights and credits belong to the original publisher. This blog aggregates tech news for informational purposes only.

Comments