
Key insights:
Backpropagation powers virtually all of modern machine learning. It works brilliantly in silicon. But the brain almost certainly uses a different approach. The reasons come down to two fundamental biological constraints that backpropagation simply cannot satisfy.
When you have a system with millions of adjustable parameters, like connection weights between neurons, you need to figure out which ones to change and by how much. This is the credit assignment problem.
Artificial neural networks solve this elegantly through calculus. They apply the chain rule to calculate precisely how each parameter should be nudged to improve performance. This process is called automatic differentiation, and its reverse mode forms the backbone of backpropagation.
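To make the chain rule concrete, here is a minimal sketch on a one-dimensional toy function (the function and names are illustrative, not from the text): the derivative of a composition is the product of the derivatives of its parts, and we can sanity-check the analytic result against a numerical estimate.

```python
import math

# Composite function f(x) = sin(x^2): an outer sin applied to an inner square.
def f(x):
    return math.sin(x ** 2)

# Chain rule: df/dx = cos(x^2) * d(x^2)/dx = cos(x^2) * 2x.
def f_grad(x):
    return math.cos(x ** 2) * 2 * x

x = 1.3
h = 1e-6
# Central finite difference as an independent check on the analytic gradient.
numeric = (f(x + h) - f(x - h)) / (2 * h)
print(abs(f_grad(x) - numeric))  # tiny: the two estimates agree
```

Backpropagation applies this same product-of-derivatives logic, systematically, through every layer of a network.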
The brain faces the same challenge. Every time you learn something new, your synapses need to adjust. But the brain doesn't have access to the same mathematical machinery that computers use. It needs a different solution.
Backpropagation requires strictly separated phases. First, information flows forward through the network. Then an error is calculated at the output. Then that error travels backward, layer by layer, to update weights.
For this to work, neurons must freeze their activity values while error signals propagate backward. The brain doesn't do this. Communication in biological tissue is slow compared to silicon processors. If the brain followed backpropagation's approach, it would need to stop processing information for hundreds of milliseconds during each learning step.
Imagine experiencing brief blackouts every time you learned something new. That doesn't happen. Biological brains process information and learn simultaneously in a continuous stream. There is no evidence for separate forward and backward phases.
Backpropagation requires a central controller to switch the entire network between forward and backward modes. Errors must propagate in a precise temporal sequence. You cannot compute errors for a given neuron before its downstream partners have finished their own calculations.
Everything we know about brain physiology suggests this kind of global coordination is extremely unlikely. While the brain has some coordinating mechanisms like theta and gamma oscillations and neuromodulators like dopamine, these operate at much coarser scales than backpropagation requires.
Individual neurons and synapses mostly function as autonomous agents. They modify their states based solely on information physically available at their specific locations. The brain is a massively parallel, locally autonomous system. As explored in detail by Lillicrap et al. (2020), these constraints make direct implementation of backpropagation in neural tissue virtually impossible.
Predictive coding offers an alternative that respects the brain's biological constraints. It originated from mid-20th century research proposing that the brain's fundamental objective is to predict incoming sensory information. Let's build it up step by step.
From an evolutionary perspective, prediction helps survival. An organism that can anticipate threats and interpret noisy observations has a clear advantage. There's also an efficiency argument. Neural activity demands considerable metabolic energy. A brain that predicts incoming signals only needs to process unexpected information.
In this view, the brain's primary task isn't simply processing stimuli. It's constructing an internal model that explains sensory inputs. When predictions are accurate, minimal processing is required. When predictions fail, the resulting prediction errors signal that the internal model needs updating.
Predictive coding formalizes this as a hierarchical system. Each neural layer attempts to predict the activity of the layer below it. The lowest level corresponds to raw sensory input. Higher levels encode increasingly abstract features. Top-down connections carry predictions. Bottom-up connections carry prediction errors. This framework was originally proposed by Rao and Ballard (1999) in their landmark paper on predictive coding in the visual cortex.
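The hierarchy can be sketched in a few lines (layer sizes and weight scales here are arbitrary choices for illustration): top-down weights map each level's activity to a prediction of the level below, and the residuals are the prediction errors carried upward.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three levels: level 0 is raw sensory input, levels 1-2 are more abstract.
sizes = [4, 3, 2]
# W[l] maps level l+1 down to a prediction of level l (top-down connections).
W = [rng.normal(size=(sizes[l], sizes[l + 1])) * 0.5 for l in range(2)]

x = [rng.normal(size=s) for s in sizes]    # current activity at each level
x[0] = np.array([1.0, 0.0, -1.0, 0.5])     # level 0 clamped to sensory input

# Top-down predictions and the bottom-up prediction errors they induce.
predictions = [W[l] @ x[l + 1] for l in range(2)]
errors = [x[l] - predictions[l] for l in range(2)]  # carried upward
```

Every quantity here is local: a level's error depends only on its own activity and the prediction arriving from directly above.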
Predictive coding networks are energy-based models. Each possible network state is assigned a single number, an abstract "energy". The system then evolves to reduce this energy, just like a ball rolling downhill.
The energy relates to the total magnitude of prediction errors across the network. Think of each neuron as a node on a post. Its height represents its activity level. On the same post sits a platform representing its predicted activity, determined by neurons from the layer above. A spring connects the node and the platform.
When a neuron's activity deviates from its predicted value, the spring stretches and energy increases. The total energy sums the squared errors across all neurons in every layer. The network's objective is to minimize this total prediction error by finding the optimal configuration of neural activities and connection weights.
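Under the linear-prediction assumption used in the sketch below (an illustrative simplification), the total energy is just the sum of squared spring stretches, and it is exactly zero when every layer matches its top-down prediction:

```python
import numpy as np

def total_energy(x, W):
    # E = sum over layers of || x[l] - W[l] @ x[l+1] ||^2
    # i.e. the summed squared "spring" stretches across the network.
    return sum(float(np.sum((x[l] - W[l] @ x[l + 1]) ** 2))
               for l in range(len(W)))

rng = np.random.default_rng(2)
W = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
x = [None, None, rng.normal(size=2)]
x[1] = W[1] @ x[2]   # layer 1 exactly matches its top-down prediction
x[0] = W[0] @ x[1]   # layer 0 likewise
print(total_energy(x, W))  # 0.0: no spring is stretched
```

Any deviation of an activity from its prediction stretches a spring and raises the energy above zero.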
Each neuron adjusts its activity by moving in the direction that most steeply reduces total energy. This is gradient descent, but applied locally. When you work out the math, each neuron's activity update depends on just two things.
First, its own prediction error drives it to align with its top-down prediction. Second, the prediction errors from the layer below encourage it to better predict downstream activity. These two forces compete until the neuron finds a compromise, an optimal activity level that minimizes prediction errors both at its own layer and the layer it helps predict.
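The two competing forces can be sketched as a relaxation loop (again with illustrative sizes, a linear prediction, and a hand-picked step size): the middle layer's update uses only its own error and the error from the layer below, yet repeatedly applying it drives the total energy down.

```python
import numpy as np

rng = np.random.default_rng(3)
W = [rng.normal(size=(4, 3)) * 0.5, rng.normal(size=(3, 2)) * 0.5]
x = [rng.normal(size=4), rng.normal(size=3), rng.normal(size=2)]
x[0] = np.array([1.0, -0.5, 0.3, 0.2])   # clamped sensory input

def energy(x, W):
    return sum(float(np.sum((x[l] - W[l] @ x[l + 1]) ** 2))
               for l in range(len(W)))

e_before = energy(x, W)
lr = 0.05
for _ in range(100):
    eps1 = x[1] - W[1] @ x[2]   # force 1: this layer's own prediction error
    eps0 = x[0] - W[0] @ x[1]   # force 2: the error at the layer below
    # Local gradient descent on the total energy: move toward the top-down
    # prediction (-eps1) while better predicting the layer below (+W.T @ eps0).
    x[1] += lr * (W[0].T @ eps0 - eps1)
e_after = energy(x, W)
print(e_before > e_after)  # relaxation reduced the total prediction error
```

Nothing in the update references distant layers or a global schedule; each step uses only locally available signals, which is the point of contrast with backpropagation.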
The key insight is that this requires a separate population of error neurons that explicitly encode prediction errors. This is the origin of the term