Same token.
Completely different neural response.
Type any word and watch BDH fire ~5% of neurons while a Transformer activates nearly all of them — on the exact same input.
Path A: Visualization
Section 6.4 of BDH Paper
Sparse ReLU Activations
Interpretability by Design
Input token
Try:
Dragon Hatchling (BDH)
Post-transformer architecture · Pathway
—
neurons fired
—
activation rate
Transformer (GPT-style)
Dense matrix attention · Current standard
—
neurons fired
—
activation rate
Top Active Neurons
BDH sparse activations
Active neurons: — / — (—%)
Silent neurons: — / — (—%)
Transformer would activate: ~95% (—) neurons
Syntax/structure — neurons 0–10
Semantic meaning — neurons 11–25
Currency / numbers — neurons 26–40
Geography / places — neurons 41–55
Medical / biology — neurons 56–70
Language pattern — neurons 71–90
Abstract reasoning — neurons 91+
Legend
Active — BDH
Active — Transformer
Silent neuron
Activation density across layers
% of neurons firing at each depth — same input token
BDH
Transformer
Hebbian Memory — σ Matrix
Synaptic state for the last matched token
Each cell shows how strongly two neurons wired together while processing this token. Brighter = stronger synaptic bond. This is Hebbian learning made visible.
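The update behind that picture can be sketched as a plain Hebbian outer product — a minimal illustration, not the BDH implementation; the names `sigma`, `pre`, `post`, and `eta` are ours:

```python
import numpy as np

def hebbian_update(sigma, pre, post, eta=0.1):
    """Strengthen sigma[i, j] when pre-neuron j and post-neuron i fire together.

    sigma: (N, N) synaptic-state matrix; pre, post: (N,) activation vectors.
    """
    return sigma + eta * np.outer(post, pre)

N = 4
sigma = np.zeros((N, N))
pre = np.array([1.0, 0.0, 1.0, 0.0])   # sparse: only 2 of 4 neurons fire
post = np.array([0.0, 1.0, 0.0, 0.0])
sigma = hebbian_update(sigma, pre, post)
# Only cells where BOTH neurons fired are nonzero: sigma[1, 0] and sigma[1, 2]
```

Because the activations are sparse, most cells of `sigma` stay at zero — which is why the matrix above is mostly dark.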
Memory History
Insight cards
Why sparsity matters for efficiency
Fewer active neurons mean less computation per token. BDH's ~5% activation rate works out to roughly 19× fewer active-neuron computations than a ~95%-dense transformer on the same input (95% ÷ 5% = 19).
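The 19× figure is just the ratio of the two activation densities. A back-of-envelope sketch with illustrative numbers:

```python
n_neurons = 10_000
transformer_density = 0.95   # ~95% of neurons fire per token
bdh_density = 0.05           # ~5% of neurons fire per token

transformer_ops = int(n_neurons * transformer_density)  # 9500 active neurons
bdh_ops = int(n_neurons * bdh_density)                  # 500 active neurons
speedup = transformer_ops / bdh_ops
print(speedup)  # 19.0
```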
Why sparsity enables interpretability
When only 5% of neurons fire, you can actually inspect them. The BDH paper demonstrates "currency synapses" and "country synapses" — single synapses that encode one concept, consistently, across languages.
This isn't pruning or distillation
Transformer sparsity requires post-hoc tricks (L1 regularization, pruning, distillation). BDH's sparsity emerges naturally from its architecture — sparse ReLU activations are built into the design, not forced.
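A toy illustration of how a ReLU alone produces sparsity when pre-activations sit mostly below zero — the distribution here is made up for the demo, not BDH's actual statistics:

```python
import random

random.seed(0)
# Toy pre-activations: most of the mass is below zero,
# so the ReLU silences most neurons with no pruning step.
pre_acts = [random.gauss(-1.5, 1.0) for _ in range(10_000)]
post_acts = [max(0.0, x) for x in pre_acts]  # sparse ReLU

active = sum(1 for a in post_acts if a > 0)
print(f"{active / len(post_acts):.1%} of neurons active")
```

No regularizer, no pruning pass — the sparsity falls out of the nonlinearity plus the activation statistics.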
Architecture comparison at a glance
| Property | Transformer | BDH (Dragon Hatchling) |
|---|---|---|
| Activation density | ~95–100% of neurons fire | ~5% of neurons fire |
| Memory mechanism | KV-cache (grows with context) | Hebbian synapses (constant size) |
| Attention complexity | O(T²) — quadratic | O(T) — linear |
| Interpretability | Black box, polysemantic neurons | Graph structure, monosemantic synapses |
| Learning after training | Frozen weights, no adaptation | Inference-time Hebbian updates |
| Structure | Dense matrix layers | Scale-free graph of neurons |
Synapse Inspector
In BDH, individual synapses reliably encode specific concepts — a property called monosemanticity. Transformer neurons are typically polysemantic, mixing many unrelated concepts in one unit.
Probe a word
Continuous Memory
BDH learns at inference time via Hebbian updates. The BDH panel strengthens synapses and stores facts instantly, while the Transformer panel remains fixed and cannot learn without retraining.
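The contrast can be sketched as two toy memories: one whose synapse strengths update at inference time, one whose weights are frozen. The class names and the Atlantis fact are illustrative, not from the BDH paper:

```python
class HebbianMemory:
    """Toy associative memory: synapse strengths update on every taught fact."""
    def __init__(self):
        self.synapses = {}  # (cue, response) -> strength

    def teach(self, cue, response):
        key = (cue, response)
        self.synapses[key] = self.synapses.get(key, 0.0) + 1.0  # strengthen

    def recall(self, cue):
        matches = [(s, r) for (c, r), s in self.synapses.items() if c == cue]
        return max(matches)[1] if matches else "I don't know."

class FrozenTransformer:
    """Weights fixed after training: new facts are simply not stored."""
    def recall(self, cue):
        return "I don't know."

bdh = HebbianMemory()
bdh.teach("capital of Atlantis", "Poseidonia")       # made-up fact for the demo
print(bdh.recall("capital of Atlantis"))             # learned at inference time
print(FrozenTransformer().recall("capital of Atlantis"))
```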
Teach BDH a fact
BDH
Synapses idle — no new fact taught yet
Transformer
Cannot learn at inference time
Requires full retraining to incorporate new facts.
I don't know.