Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification

📅 May 23, 2026 | 🌐 Source: https://www.marktechpost.com/2026/05/23/nous-research-releases-contrastive-neuron-attribution-cna-sparse-mlp-circuit-steering-without-sae-training-or-weight-modification/ | 🏷️ Technology

Instruction-tuned language models refuse harmful requests. But which part of the model is actually responsible — and how does that mechanism get installed during training?

What You Need To Know

New research from Nous Research takes a neuron-level look at this question. The team developed contrastive neuron attribution (CNA), a method that identifies the specific MLP neurons whose activations most distinguish harmful from benign prompts.
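To make the idea of "neurons whose activations most distinguish harmful from benign prompts" concrete, here is a minimal sketch. The article does not give the actual CNA scoring rule, so this one is an assumption: it scores each neuron by the absolute difference between its mean activation on the two prompt sets, then keeps the top 0.1%, matching the fraction reported in the article. The function names, the synthetic activations, and the planted neuron index are all hypothetical.

```python
import random

def contrastive_scores(harmful_acts, benign_acts):
    """Score each neuron by |mean(harmful) - mean(benign)|.

    Each argument is a list of per-prompt activation vectors.
    This scoring rule is an assumption, not the published CNA formula.
    """
    n_neurons = len(harmful_acts[0])

    def mean(rows, j):
        return sum(row[j] for row in rows) / len(rows)

    return [abs(mean(harmful_acts, j) - mean(benign_acts, j))
            for j in range(n_neurons)]

def top_fraction(scores, fraction):
    """Return the sorted indices of the highest-scoring `fraction` of neurons."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Synthetic data: 1000 neurons, 50 prompts per class, Gaussian noise.
# Neuron 7 (a hypothetical index) fires more strongly on "harmful" prompts.
rng = random.Random(0)
N_NEURONS, N_PROMPTS, PLANTED = 1000, 50, 7

def fake_prompt(shift):
    acts = [rng.gauss(0.0, 1.0) for _ in range(N_NEURONS)]
    acts[PLANTED] += shift
    return acts

harmful = [fake_prompt(5.0) for _ in range(N_PROMPTS)]
benign = [fake_prompt(0.0) for _ in range(N_PROMPTS)]

scores = contrastive_scores(harmful, benign)
selected = top_fraction(scores, 0.001)  # 0.1% of 1000 neurons = 1 neuron
print(selected)
```

In a real model the activation vectors would come from the MLP layers during forward passes over the two prompt sets. The random data here only illustrates the selection step.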

More Details

By ablating just 0.1% of MLP activations, the researchers reduced refusal rates by more than 50% in most instruct models tested (across Llama and Qwen architectures from 1B to 72B parameters) while keeping output quality above 0.97 at all steering strengths. A key finding: the late-layer structure that discriminates harmful from benign prompts already exists in base models before any fine-tuning. Alignment fine-tuning does not create new structure. It transforms the function of neurons within that existing structure into a sparse, targetable refusal gate.
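As a rough illustration of how small that budget is, and of what steering at different strengths could look like, the sketch below makes two assumptions the article does not confirm. It uses illustrative model dimensions (32 layers with 14,336 MLP neurons each) purely for the arithmetic, and it models steering as scaling the selected activations by `1 - strength`, so that `strength = 1.0` is full ablation. The `steer` helper is hypothetical.

```python
def steer(activations, neuron_ids, strength):
    """Return a copy of `activations` with the chosen neurons scaled by (1 - strength).

    strength = 1.0 zeroes the neurons (full ablation); strength = 0.0 changes nothing.
    This scaling rule is an assumption about what "steering strength" means.
    """
    out = list(activations)
    for j in neuron_ids:
        out[j] = out[j] * (1.0 - strength)
    return out

# Illustrative dimensions only, not taken from the article.
layers, neurons_per_layer = 32, 14336
total_neurons = layers * neurons_per_layer   # 458,752 MLP neurons
budget = round(total_neurons * 0.001)        # 0.1% of them
print(total_neurons, budget)

# Toy activation vector; neuron 2 stands in for a selected "refusal" neuron.
acts = [0.5, -1.0, 2.0, 0.25]
ablated = steer(acts, [2], 1.0)   # full ablation of neuron 2
half = steer(acts, [2], 0.5)      # half-strength steering
print(ablated, half)
```

The helper only rescales activations on a copy and leaves its input untouched, in keeping with the headline's claim that the method works without weight modification.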

⚡ This article was auto-curated from https://www.marktechpost.com/2026/05/23/nous-research-releases-contrastive-neuron-attribution-cna-sparse-mlp-circuit-steering-without-sae-training-or-weight-modification/. All rights and credits belong to the original publisher. This blog aggregates tech news for informational purposes only.
