Post-Transformer Hackathon · Pathway × IIT Ropar
Live demo counters: Tokens processed · Avg BDH activation · Synapses probed

Same token.
Completely different neural response.

Type any word and watch BDH fire ~5% of neurons while a Transformer activates nearly all of them — on the exact same input.
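The contrast the demo animates can be sketched in a few lines. This is a toy comparison, not the real BDH model: `n_neurons`, the top-5% thresholding trick, and the dense tanh baseline are all illustrative stand-ins for sparse ReLU vs. dense activations.

```python
# Toy sketch (not the real BDH model): compare activation density of a
# sparse, thresholded ReLU layer vs. a dense nonlinearity on the same input.
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 128
x = rng.normal(size=64)                 # stand-in embedding for one token

W = rng.normal(size=(n_neurons, 64))
pre = W @ x                             # pre-activations, shared by both "models"

# "BDH-style" layer: ReLU with a high firing threshold -> only ~5% fire.
bdh_act = np.maximum(pre - np.quantile(pre, 0.95), 0.0)

# "Transformer-style" layer: smooth nonlinearity -> nearly all neurons nonzero.
dense_act = np.tanh(pre)

bdh_density = np.mean(bdh_act > 0)                  # roughly 0.05
dense_density = np.mean(np.abs(dense_act) > 1e-6)   # roughly 1.0
```

Same `pre` vector in both branches, so the difference in density comes entirely from the nonlinearity, which is the point the demo makes.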

Path A: Visualization · Section 6.4 of the BDH Paper · Sparse ReLU Activations · Interpretability by Design
Input token
Dragon Hatchling (BDH)
Post-transformer architecture · Pathway
~5%
neurons fired
activation rate
Transformer (GPT-style)
Dense matrix attention · Current standard
~95%
neurons fired
activation rate
Top Active Neurons
BDH sparse activations
Active neurons: — / — (—%)
Silent neurons: — / — (—%)
Transformer would activate: ~95% (—) neurons
Syntax/structure — neurons 0–10
Semantic meaning — neurons 11–25
Currency / numbers — neurons 26–40
Geography / places — neurons 41–55
Medical / biology — neurons 56–70
Language pattern — neurons 71–90
Abstract reasoning — neurons 91+
Legend
Active — BDH
Active — Transformer
Silent neuron
Activation density across layers
% of neurons firing at each depth — same input token
BDH
Transformer
Hebbian Memory — σ Matrix
Synaptic state for the last matched token
Each cell shows how strongly two neurons wired together while processing this token. Brighter = stronger synaptic bond. This is Hebbian learning made visible.
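The update behind this panel can be sketched as a classic Hebbian outer-product rule. This is a minimal illustration of "fire together, wire together"; `sigma`, `eta`, and the activation pattern are illustrative, not the paper's exact formulation.

```python
# Minimal Hebbian sketch: co-active neurons strengthen their shared synapse.
# sigma, eta, and the firing pattern are illustrative, not from the BDH paper.
import numpy as np

n = 8
sigma = np.zeros((n, n))   # synaptic state (the sigma matrix in the panel)
eta = 0.5                  # learning rate

# Sparse activation for one token: only neurons 0 and 3 fire.
y = np.zeros(n)
y[0] = y[3] = 1.0

# "Fire together, wire together": the outer product brightens exactly the
# cells linking co-active neurons, i.e. (0,3) and (3,0).
sigma += eta * np.outer(y, y)
```

Cells between a firing and a silent neuron stay at zero, which is why the heatmap stays mostly dark under sparse activations.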
Memory History

Insight cards

Why sparsity matters for efficiency
Fewer active neurons = less computation per token. BDH's ~5% activation rate means roughly 19× fewer neuron computations compared to a dense transformer on the same input.
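The arithmetic behind the 19× figure is just the ratio of the two activation densities:

```python
# Back-of-envelope behind the "19x fewer computations" claim: per-token
# compute scales with the fraction of neurons that actually fire.
bdh_density = 0.05
transformer_density = 0.95
speedup = transformer_density / bdh_density  # 0.95 / 0.05 = 19.0
```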
🔬
Why sparsity enables interpretability
When only 5% of neurons fire, you can actually inspect them. The BDH paper demonstrates "currency synapses" and "country synapses" — single synapses that encode one concept, consistently, across languages.
🧠
This isn't pruning or distillation
Transformer sparsity requires post-hoc tricks (L1 regularization, pruning, distillation). BDH's sparsity emerges naturally from its architecture — sparse ReLU activations are built into the design, not forced.

Architecture comparison at a glance

Property | Transformer | BDH (Dragon Hatchling)
Activation density | ~95–100% of neurons fire | ~5% of neurons fire
Memory mechanism | KV-cache (grows with context) | Hebbian synapses (constant size)
Attention complexity | O(T²) — quadratic | O(T) — linear
Interpretability | Black box, polysemantic neurons | Graph structure, monosemantic synapses
Learning after training | Frozen weights, no adaptation | Inference-time Hebbian updates
Structure | Dense matrix layers | Scale-free graph of neurons
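The memory row of the table above can be sketched numerically: a KV-cache stores two vectors per token, so it grows linearly with context length T, while a fixed synapse matrix does not grow at all. `d_model` and `n_neurons` here are made-up sizes, not BDH's real dimensions.

```python
# Toy float counts for the two memory mechanisms in the comparison table.
# d_model and n_neurons are illustrative sizes, not BDH's real dimensions.
d_model, n_neurons = 64, 128

def kv_cache_floats(T: int) -> int:
    # Transformer: one key + one value vector cached per token seen.
    return T * 2 * d_model

def hebbian_state_floats(T: int) -> int:
    # BDH-style: one n x n synapse matrix, however many tokens were seen.
    return n_neurons * n_neurons
```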

Synapse Inspector

In BDH, individual synapses reliably encode specific concepts — this is called monosemanticity. Transformer neurons, by contrast, are typically polysemantic: one neuron entangles many unrelated concepts, so no comparable single-synapse readout exists.
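A toy version of how monosemanticity can arise: if sparse activations are concept-aligned, Hebbian updates concentrate a concept's evidence onto a handful of synapses. The word lists and neuron indices below are invented for illustration and are not taken from the BDH paper.

```python
# Toy monosemanticity sketch: with sparse, concept-aligned activations,
# one neuron's synapses end up tracking one concept. All indices and word
# lists here are invented for illustration, not from the BDH paper.
import numpy as np

n = 64
sigma = np.zeros((n, n))

# Each word activates a shared "concept" neuron plus a word-specific one.
activations = {
    "dollar": (26, 50), "euro": (26, 51), "yen": (26, 52),  # currency -> 26
    "paris": (41, 53), "tokyo": (41, 54),                   # places   -> 41
}

for i, j in activations.values():
    y = np.zeros(n)
    y[i] = y[j] = 1.0
    sigma += np.outer(y, y)

# Neuron 26's synapses accumulated strength only from currency words,
# and currency and place neurons never wired together.
```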

Probe a word

Continuous Memory

BDH learns at inference time via Hebbian updates. The BDH panel strengthens synapses and stores facts instantly, while the Transformer panel remains fixed and cannot learn without retraining.
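The "teach a fact" flow can be sketched as a Hebbian associative memory: writing a key→value pair strengthens synapses in place, with no gradient retraining. The sparse codes, the `teach`/`recall` helpers, and the fact itself are invented for illustration.

```python
# Sketch of the "teach BDH a fact" panel: a Hebbian associative memory that
# stores a key -> value pair at inference time, with no gradient retraining.
# The sparse codes, helper names, and the fact are invented for illustration.
import numpy as np

n = 32

def code(idxs):
    """Sparse binary code: a few active neurons represent one concept."""
    v = np.zeros(n)
    v[list(idxs)] = 1.0
    return v

codes = {
    "capital of Atlantis": code([2, 7, 19]),
    "Coralton":            code([4, 11, 23]),
}

sigma = np.zeros((n, n))

def teach(key, value):
    # Hebbian write: wire the key's active neurons to the value's neurons.
    sigma[:] += np.outer(codes[key], codes[value])

def recall(key):
    # Readout: project the key through the synapses, return the best match.
    out = sigma.T @ codes[key]
    return max(codes, key=lambda w: float(codes[w] @ out))

teach("capital of Atlantis", "Coralton")
```

A frozen-weights model has no analogous write path, which is why the Transformer panel can only answer from what it saw during training.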

Teach BDH a fact

BDH

Synapse strength: idle — teach a fact to watch synapses strengthen

Transformer

Cannot learn at inference time
Requires full retraining to incorporate new facts.
Sample response: "I don't know."