
BIM

BIM (Biologically Inspired Model) asks one question: can a system learn language from a blank slate, using only mechanisms found in biological brains? No pre-training, no backprop, no GPU.

Three versions in. BIM 2 is complete.

What transformers can't do

Transformers are frozen after training. When GPT-4 answers you, it is not thinking. It is matching your input against statistical patterns from text it saw during training. Its weights cannot update. Correct it and the fix lasts only as long as the conversation. Restart it and it is back to zero.

BIM 2 learns from every token. Corrections stick immediately via a max-strength Hebbian update. Checkpoints persist across restarts. It runs on a CPU in about 200MB of memory. The thing that answers and the thing that learns are the same process.

BIM 0: First proof

Hebbian learning (Δwᵢⱼ = η · xᵢ · yⱼ: learning rate times presynaptic activity times postsynaptic activity). Interaction loop. Concept formation from repeated co-activations. Proved the idea was real. Failed at scale: with one graph node per word, connections exploded combinatorially as the vocabulary grew.
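A minimal sketch of that update rule in plain Python, for binary or real-valued activities. The function name `hebbian_update` and the learning rate are illustrative, not BIM 0's code.

```python
def hebbian_update(w: list[list[float]], x: list[float], y: list[float],
                   eta: float = 0.1) -> list[list[float]]:
    """Apply Δw[i][j] = eta * x[i] * y[j] to w in place and return it.

    w is a len(x) x len(y) weight matrix; x holds presynaptic activities,
    y holds postsynaptic activities.
    """
    for i, xi in enumerate(x):
        if xi == 0:
            continue  # silent presynaptic unit: its row is unchanged
        row = w[i]
        for j, yj in enumerate(y):
            row[j] += eta * xi * yj  # co-activity strengthens the connection
    return w


w = [[0.0] * 3 for _ in range(2)]
hebbian_update(w, x=[1, 0], y=[0, 1, 1], eta=0.5)
print(w)  # [[0.0, 0.5, 0.5], [0.0, 0.0, 0.0]]
```

Only pairs where both sides are active change, which is the whole rule: no error signal, no gradient.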

BIM 1: The memory engine

Sparse Distributed Representations: 16,384 columns, 64 active per token (0.39% of columns active; brain estimates are around 2%). 131,072 total cells (8 per column). Each cell holds up to 256 distal (lateral) synapses.
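To make the numbers concrete, here is a toy encoder that maps each token to 64 of 16,384 columns and measures the overlap between two codes. It is an assumption for illustration only: BIM's Spatial Pooler learns its encoding Hebbianly, while the hypothetical `toy_sdr` below just seeds a pseudo-random generator from a hash of the token, so its codes are stable but carry no meaning.

```python
import hashlib
import random

N_COLUMNS = 16_384  # columns in the layer (from the text)
N_ACTIVE = 64       # active columns per token (from the text)


def toy_sdr(token: str) -> frozenset[int]:
    """Deterministically pick N_ACTIVE of N_COLUMNS columns for a token.

    Hypothetical stand-in for a learned Spatial Pooler: seeds a PRNG
    from a SHA-256 hash of the token.
    """
    seed = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return frozenset(rng.sample(range(N_COLUMNS), N_ACTIVE))


def overlap(a: frozenset[int], b: frozenset[int]) -> int:
    """Number of active columns two SDRs share."""
    return len(a & b)


river = toy_sdr("river")
print(len(river), f"{N_ACTIVE / N_COLUMNS:.2%}")  # 64 0.39%
print(overlap(river, toy_sdr("river")))           # 64 (same token, same code)
print(overlap(river, toy_sdr("money")))           # unrelated tokens share few columns
```

At this sparsity two random codes share far less than one column on average, which is why overlap works as a similarity signal.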

Verified results (March 2026, consumer CPU):

1-shot memorization: 100% recall on novel 20-token sequences after one exposure
CPU throughput: ~160 tokens per second (TPS; Numba JIT, warm cache)
TinyShakespeare streaming: ~63 TPS
Memory: ~200MB for 131k cells. A dense equivalent would be over 30GB.
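The dense-versus-sparse gap can be checked with back-of-envelope arithmetic. Both storage layouts below are our assumptions, not BIM 1's documented format: the dense case is one float16 weight for every cell pair, and the sparse case is an int32 target index plus a float16 permanence for each of the 256 synapses a cell may hold.

```python
CELLS = 131_072           # 16,384 columns x 8 cells (from the text)
SYNAPSES_PER_CELL = 256   # cap on distal synapses per cell (from the text)

# Assumption: "dense equivalent" = a full cell-to-cell matrix of float16 (2-byte) weights.
dense_bytes = CELLS * CELLS * 2

# Assumption: sparse layout = int32 index (4 bytes) + float16 permanence (2 bytes) per synapse.
sparse_synapses = CELLS * SYNAPSES_PER_CELL
sparse_bytes = sparse_synapses * (4 + 2)

print(f"dense:  {dense_bytes / 1e9:.1f} GB")  # dense:  34.4 GB
print(f"sparse: {sparse_synapses:,} synapses, {sparse_bytes / 1e6:.0f} MB at full capacity")
# sparse: 33,554,432 synapses, 201 MB at full capacity
```

At float32 the dense matrix doubles to about 68.7GB. The sparse figure is a ceiling; actual usage depends on how many synapses have formed.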

Two-layer hierarchy: L1 learns word-to-word transitions. L2 receives a decaying trace of L1's activations (decay 0.8/step) and learns phrase-level patterns.
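A sketch of what L2 might receive, assuming the trace multiplies every entry by 0.8 each step and then adds 1.0 for each L1 cell active at the current step. The additive rule and the `floor` cutoff are assumptions; the text only gives the decay factor.

```python
DECAY = 0.8  # per-step decay factor (from the text)


def update_trace(trace: dict[int, float], active: set[int],
                 decay: float = DECAY, floor: float = 1e-3) -> dict[int, float]:
    """Decay every entry, then add 1.0 for each currently active L1 cell.

    Entries that decay below `floor` (a hypothetical cutoff) are dropped
    so the trace stays sparse.
    """
    new = {cell: v * decay for cell, v in trace.items() if v * decay >= floor}
    for cell in active:
        new[cell] = new.get(cell, 0.0) + 1.0
    return new


trace: dict[int, float] = {}
for step_active in ({1, 2}, {2, 3}, {4}):  # L1 cells active at three successive steps
    trace = update_trace(trace, step_active)
print({k: round(v, 2) for k, v in sorted(trace.items())})
# {1: 0.64, 2: 1.44, 3: 0.8, 4: 1.0}
```

Cell 2 fired twice, so it dominates; cell 1 fired two steps ago and has faded to 0.64. That blend of recent history is what lets L2 see phrases rather than single transitions.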

What it lost was BIM 0's other half: BIM 1 could memorize but could not correct itself. No interaction loop. No meaning.

BIM 2: BIM 1's body, BIM 0's soul

Architecture:

Input
→ Spatial Pooler (Hebbian SDR encoding)
    16,384 columns · 64 active per token
→ Sensory Cortex L1
    131,072 cells · word-to-word sequences
→ Abstract Cortex L2
    phrase-level context · decaying trace
→ Concept Graph
    SDR-based · crystallizes after 5 co-activations

Per-token learning signal:

Jaccard surprise (the Jaccard distance between the predicted SDR p̂ and the actual SDR p): J(p̂, p) = 1 − |p̂ ∩ p| / |p̂ ∪ p|

0.0 = perfect prediction → skip weight update
1.0 = total surprise → maximum Hebbian potentiation (+0.55)
LTD (long-term depression) on incorrect predictions: −0.05 per step

Apart from explicit corrections, this is the only learning signal.
No external teacher. No loss function.
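A minimal sketch of the per-token signal. The endpoints (skip at 0.0, +0.55 at 1.0) and the −0.05 LTD come from the text; scaling potentiation linearly with surprise in between is our assumption, as are the function names.

```python
MAX_LTP = 0.55  # potentiation at surprise 1.0 (from the text)
LTD = 0.05      # depression per incorrect prediction per step (from the text)


def jaccard_surprise(predicted: set[int], actual: set[int]) -> float:
    """1 - |p̂ ∩ p| / |p̂ ∪ p|: 0.0 = perfect prediction, 1.0 = no overlap."""
    union = predicted | actual
    if not union:
        return 0.0  # nothing predicted and nothing observed: no surprise
    return 1.0 - len(predicted & actual) / len(union)


def weight_deltas(predicted: set[int], actual: set[int]) -> dict[int, float]:
    """Per-unit weight change for one token.

    Assumption: potentiation scales linearly with surprise between the
    endpoints the text specifies.
    """
    s = jaccard_surprise(predicted, actual)
    if s == 0.0:
        return {}  # perfect prediction: skip the weight update
    deltas = {u: MAX_LTP * s for u in actual}             # potentiate what actually fired
    deltas.update({u: -LTD for u in predicted - actual})  # depress wrong predictions
    return deltas


p_hat, p = {1, 2, 3, 4}, {3, 4, 5, 6}
print(jaccard_surprise(p_hat, p))  # 0.6666666666666667
d = weight_deltas(p_hat, p)
print({u: round(v, 3) for u, v in sorted(d.items())})
# {1: -0.05, 2: -0.05, 3: 0.367, 4: 0.367, 5: 0.367, 6: 0.367}
```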

Winner cell disambiguation: each column has 8 cells, so "bank (river)" and "bank (money)" can land on different cells of the same column. Each cell is scored by the overlap between its distal synapses and the cells active at the previous step. Highest overlap wins.
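A sketch of winner selection under a hypothetical layout: cell ids are `column * 8 + k`, and each cell's distal synapses are stored as the set of presynaptic cells it listens to. Overlap is a plain count here; BIM may weight it by permanence.

```python
CELLS_PER_COLUMN = 8  # from the text


def pick_winner(column: int, distal: dict[int, set[int]], prev_active: set[int]) -> int:
    """Return the cell in `column` whose distal synapses best match the previous step.

    Hypothetical layout: distal[cell] is the set of presynaptic cells that
    cell has synapses from. Ties (including zero overlap) go to the lowest id.
    """
    cells = range(column * CELLS_PER_COLUMN, (column + 1) * CELLS_PER_COLUMN)
    return max(cells, key=lambda c: len(distal.get(c, set()) & prev_active))


BANK = 5                                 # column for "bank" (arbitrary)
RIVER_CTX, MONEY_CTX = {100, 101}, {200, 201}
distal = {41: RIVER_CTX, 46: MONEY_CTX}  # two learned senses of "bank"
print(pick_winner(BANK, distal, RIVER_CTX))  # 41
print(pick_winner(BANK, distal, MONEY_CTX))  # 46
print(pick_winner(BANK, distal, set()))      # 40 (no context: first cell)
```

The column is the same in both cases; only the cell differs, so downstream layers see two distinct "bank" states.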

Synaptic homeostasis: Every 500 interactions, permanences normalize to a target sum (10.0). Strong recent synapses are preserved. Weak old ones are pruned. BIM 2 can accumulate knowledge across thousands of interactions without overwriting what it already learned.
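A minimal sketch of one homeostasis pass over a single cell, assuming the target sum of 10.0 applies per cell. The pruning threshold is hypothetical, and this sketch ignores synapse age, which the text says also matters.

```python
TARGET_SUM = 10.0   # from the text
PRUNE_BELOW = 0.05  # hypothetical pruning threshold; not given in the text


def homeostasis(perms: dict[int, float], target: float = TARGET_SUM,
                prune_below: float = PRUNE_BELOW) -> dict[int, float]:
    """Rescale one cell's permanences to sum to `target`, then prune weak synapses.

    Pruning happens after scaling, so the surviving sum can end up slightly
    below `target`.
    """
    total = sum(perms.values())
    if total == 0:
        return {}
    scale = target / total
    return {pre: p * scale for pre, p in perms.items() if p * scale >= prune_below}


cell = {7: 30.0, 8: 9.875, 9: 0.125}  # presynaptic id -> permanence (sum 40.0)
print(homeostasis(cell))  # {7: 7.5, 8: 2.46875}  (synapse 9 fell to 0.03125)
```

Scaling preserves the relative ordering of synapses, so strong ones stay strong while total drive per cell stays bounded.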

Concept graph: L2 SDR patterns that repeat 5+ times crystallize into named concepts. Edges form between concepts that appear in the same phrase, weighted by distance and surprise score. Depth-2 BFS traversal lets BIM chain two learned facts to answer something it was never told directly.
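A toy version of the graph, with hypothetical names throughout: patterns are counted until the fifth repeat, edges are undirected, and `reachable` does a depth-limited breadth-first search. How BIM derives an edge weight from distance and surprise is not specified, so the caller passes the weight in directly.

```python
from collections import Counter, defaultdict, deque
from typing import Optional

CRYSTALLIZE_AT = 5  # repeats before an L2 pattern becomes a concept (from the text)


class ConceptGraph:
    """Toy concept graph; the structure and names here are hypothetical."""

    def __init__(self) -> None:
        self.seen: Counter = Counter()               # L2 pattern -> times observed
        self.names: dict = {}                        # crystallized pattern -> concept name
        self.edges: defaultdict = defaultdict(dict)  # concept -> {neighbor: weight}

    def observe(self, pattern: frozenset, name: str) -> Optional[str]:
        """Count one occurrence of a pattern; return its concept name once crystallized."""
        self.seen[pattern] += 1
        if pattern not in self.names and self.seen[pattern] >= CRYSTALLIZE_AT:
            self.names[pattern] = name
        return self.names.get(pattern)

    def link(self, a: str, b: str, weight: float) -> None:
        """Add or strengthen an undirected edge, keeping the largest weight seen."""
        w = max(weight, self.edges[a].get(b, 0.0))
        self.edges[a][b] = w
        self.edges[b][a] = w

    def reachable(self, start: str, max_depth: int = 2) -> dict:
        """Breadth-first search up to `max_depth` hops; returns concept -> hop count."""
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_depth:
                continue  # do not expand past the depth limit
            for nxt in self.edges.get(node, {}):
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        return depth


g = ConceptGraph()
pattern = frozenset({3, 17, 42})
print([g.observe(pattern, "Paris") for _ in range(5)])
# [None, None, None, None, 'Paris']
g.link("Paris", "France", 0.9)
g.link("France", "Europe", 0.7)
g.link("Europe", "continent", 0.4)
print(g.reachable("Paris"))  # {'Paris': 0, 'France': 1, 'Europe': 2}
```

"Europe" is reachable from "Paris" only through "France": two separately learned edges chained into one answer. "continent" sits at depth 3 and is cut off.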

Correction syntax:

WRONG: X RIGHT: Y

Triggers max-strength learning on the correct text
and a direct reward signal to the current cortex state.
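A sketch of recognizing the correction syntax, assuming one correction per input line. The regex and `parse_correction` are hypothetical; BIM 2's own parser may differ, and what happens to the parsed text afterward is not shown.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for "WRONG: X RIGHT: Y"; surrounding whitespace is ignored.
CORRECTION = re.compile(r"\s*WRONG:\s*(?P<wrong>.*?)\s*RIGHT:\s*(?P<right>.*?)\s*", re.DOTALL)


def parse_correction(line: str) -> Optional[Tuple[str, str]]:
    """Return (wrong, right) if the whole line is a correction, else None."""
    m = CORRECTION.fullmatch(line)
    if m is None or not m.group("right"):
        return None  # not a correction, or an empty RIGHT: part
    return m.group("wrong"), m.group("right")


print(parse_correction("WRONG: the capital is Lyon RIGHT: the capital is Paris"))
# ('the capital is Lyon', 'the capital is Paris')
print(parse_correction("what is the capital?"))  # None
```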

Running it

python main.py interact    # conversation REPL
python main.py stats       # synapse count, concepts formed, avg surprise
python main.py checkpoint  # save to .npy + concept_graph.json

A small benchmark we ran

We built a benchmark of 100 multiple-choice questions about 2026 events, all of which postdate the training cutoff of every major LLM. We trained BIM 2 on 248 facts across 30 passes, then tested it cold.

Space:       23/25 (92%)
AI/Tech:     20/25 (80%)
Geopolitics: 22/25 (88%)
Science:     22/25 (88%)
─────────────────────────
Total:       87/100

Frozen LLMs on the same test, no tools: around 25%. They have no 2026 training data. BIM 2 had learned the answers.

What's next

Inference, not recall: facts BIM was never told directly but can chain from what it learned. Cross-session. CPU-native. Results from that experiment have not been published yet.
