In today’s post, I am looking at the Free Energy Principle (FEP) proposed by the British neuroscientist Karl Friston. The FEP basically states that in order to resist the natural tendency to disorder, adaptive agents must minimize surprise. A good example is a fish: successful fish typically find themselves surrounded by water, and very atypically find themselves out of water, since being out of water for an extended time will lead to a breakdown of homoeostatic (autopoietic) relations.
Here the free energy refers to an information-theoretic construct:
Because the distribution of ‘surprising’ events is in general unknown and unknowable, organisms must instead minimize a tractable proxy, which according to the FEP turns out to be ‘free energy’. Free energy in this context is an information-theoretic construct that (i) provides an upper bound on the extent to which sensory data is atypical (‘surprising’) and (ii) can be evaluated by an organism, because it depends eventually only on sensory input and an internal model of the environmental causes of sensory input.
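The bound in point (i) can be written out explicitly. This is a sketch in notation of my own choosing, not the notation of the passage quoted above: let s be the sensory data, x the hidden environmental causes, p(s, x) the organism's internal (generative) model, and q(x) its current best guess about those causes.

```latex
\begin{aligned}
F(s, q) &= \mathbb{E}_{q(x)}\big[\ln q(x) - \ln p(s, x)\big] \\
        &= \underbrace{-\ln p(s)}_{\text{surprise}}
           + \underbrace{D_{\mathrm{KL}}\big[q(x)\,\|\,p(x \mid s)\big]}_{\ge 0} \\
        &\ge -\ln p(s)
\end{aligned}
```

Because the KL term can never be negative, F can never fall below the surprise. And since F is computed from s, q and p alone, it is something the organism can evaluate, which is point (ii).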
In FEP, our brains are viewed as predictive engines, or Bayesian inference engines. This idea builds on predictive coding/processing, which goes back to the German physician and physicist Hermann von Helmholtz in the 1800s. The main idea is that our brain has a hierarchical structure that tries to predict what is going to happen based on the sensory data it has already received. As philosopher Andy Clark explains, our brain is not a cognitive couch potato waiting for sensory input to make sense of what is going on. It is actively predicting what is going to happen next. This is why minimizing surprise is important.
For example, when we lift a closed container, we predict that it is going to have a certain weight based on our previous experiences and the visual signal of the container. We are surprised if the container turns out to be light and can be lifted easily. We have a similar experience when we miss a step on a staircase. From a mathematical standpoint, we can say that when our internal model matches the sensory input, we are not surprised. The mismatch can be quantified with the Kullback-Leibler (KL) divergence from information theory. The lower the divergence, the better the fit between the model and the sensory input, and the lower the surprise.
The hierarchical model is top down. Predictions flow top down, while sensory data flows bottom up. If the prediction matches the sensory data, then nothing goes up the chain. However, when there is a significant difference between the top-down prediction and the bottom-up incoming sensory data, the difference is passed up the chain. One of my favorite examples to explain this further is to imagine that you are in the shower with your radio playing. You can faintly hear the radio in the shower. When your favorite song plays on the radio, you feel like you can hear it better than when an unfamiliar song is played. This is because your brain is better able to predict what is going to happen, and the prediction helps smooth out the incoming auditory signals. British neuroscientist Anil Seth has a great quote regarding the predictive processing idea: “perception is controlled hallucination.”
Andy Clark explains this further:
Perception itself is a kind of controlled hallucination… [T]he sensory information here acts as feedback on your expectations. It allows you to often correct them and to refine them.
(T)o perceive the world is to successfully predict our own sensory states. The brain uses stored knowledge about the structure of the world and the probabilities of one state or event following another to generate a prediction of what the current state is likely to be, given the previous one and this body of knowledge. Mismatches between the prediction and the received signal generate error signals that nuance the prediction or (in more extreme cases) drive learning and plasticity.
Predictive coding models suggest that what emerges first is the general gist (including the general affective feel) of the scene, with the details becoming progressively filled in as the brain uses that larger context — time and task allowing — to generate finer and finer predictions of detail. There is a very real sense in which we properly perceive the forest before the trees.
What we perceive (or think we perceive) is heavily determined by what we know, and what we know (or think we know) is constantly conditioned on what we perceive (or think we perceive).
(T)he task of the perceiving brain is to account for (to accommodate or ‘explain away’) the incoming or ‘driving’ sensory signal by means of a matching top-down prediction. The better the match, the less prediction error then propagates up the hierarchy. The higher level guesses are thus acting as priors for the lower level processing, in the fashion (as remarked earlier) of so-called ‘empirical Bayes’.
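The interplay between a top-down prediction and a bottom-up error signal can be illustrated with a very small sketch. This is a toy of my own construction, not Clark's or Friston's model: a single belief `mu` starts at the prior expectation and is nudged by two precision-weighted errors until they balance. The function name, the variances and the learning rate are all illustrative assumptions.

```python
def settle_belief(sensory_input, prior_mean, sensory_var=1.0, prior_var=1.0,
                  learning_rate=0.1, steps=200):
    """Relax one belief by gradient descent on squared, precision-weighted errors."""
    mu = prior_mean          # start from the top-down expectation
    errors = []              # raw prediction error seen at each step
    for _ in range(steps):
        errors.append(abs(sensory_input - mu))
        sensory_error = (sensory_input - mu) / sensory_var  # bottom-up error
        prior_error = (mu - prior_mean) / prior_var         # pull back toward the prior
        mu += learning_rate * (sensory_error - prior_error)
    return mu, errors


# The container example: we expect it to feel heavy (5.0), but it feels light (1.0).
expected_weight, felt_weight = 5.0, 1.0
belief, errors = settle_belief(felt_weight, expected_weight)
print(f"settled belief: {belief:.3f}")
print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}")
```

With equal variances the belief settles halfway between what was expected and what was felt. Making `sensory_var` smaller (more trust in the senses) pulls the belief toward the input, while making `prior_var` smaller (more trust in the prior) keeps it closer to the expectation.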
The question of what happens when the prediction does not match is best explained by Friston:
“The free-energy considered here represents a bound on the surprise inherent in any exchange with the environment, under expectations encoded by its state or configuration. A system can minimize free energy by changing its configuration to change the way it samples the environment, or to change its expectations. These changes correspond to action and perception, respectively, and lead to an adaptive exchange with the environment that is characteristic of biological systems. This treatment implies that the system’s state and structure encode an implicit and probabilistic model of the environment.”
Our brains are continuously sampling the data coming in and making predictions. When there is a mismatch between the prediction and the data, we have three options.
1. Update our model to match the incoming data.
2. Attempt to change the environment so that the model matches the environment. Try resampling the data coming in.
3. Ignore and do nothing.
Option 3 is not always something that will yield positive results. Option 1 is a learning process where we are updating our internal models based on the new evidence. Option 2 shows our strong confidence in our internal model and in our ability to change the environment. Or perhaps there is something wrong with the incoming data and we have to get more data to proceed.
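The three options can be compared numerically in a toy setting. What follows is a minimal sketch under my own assumptions: the fish model, its probabilities and all function names are invented for illustration and do not come from Friston's papers. Free energy is computed as the expected value, under a belief q, of ln q(x) − ln p(s, x), and "action" is simplified to choosing among the sensations the agent can reach.

```python
import math

# Hypothetical generative model of a fish: hidden state x, sensation s.
PRIOR = {"in_water": 0.95, "out_of_water": 0.05}              # p(x)
LIKELIHOOD = {                                                # p(s | x)
    "in_water":     {"wet": 0.9, "dry": 0.1},
    "out_of_water": {"wet": 0.2, "dry": 0.8},
}


def joint(s, x):
    """p(s, x) under the internal model."""
    return PRIOR[x] * LIKELIHOOD[x][s]


def surprise(s):
    """-ln p(s): how atypical the sensation is under the model."""
    return -math.log(sum(joint(s, x) for x in PRIOR))


def free_energy(s, q):
    """E_q[ln q(x) - ln p(s, x)], an upper bound on surprise(s)."""
    return sum(q[x] * (math.log(q[x]) - math.log(joint(s, x)))
               for x in PRIOR if q[x] > 0)


def perceive(s):
    """Option 1: revise the belief to the exact posterior p(x | s)."""
    evidence = sum(joint(s, x) for x in PRIOR)
    return {x: joint(s, x) / evidence for x in PRIOR}


def act(reachable_sensations):
    """Option 2: move toward the sensation the model finds least surprising."""
    return min(reachable_sensations, key=surprise)


s = "dry"                                   # an unexpected sensation
stale_belief = dict(PRIOR)                  # Option 3: keep the old belief
f_ignored = free_energy(s, stale_belief)
f_perceived = free_energy(s, perceive(s))   # Option 1
s_after_action = act(["wet", "dry"])        # Option 2
f_acted = free_energy(s_after_action, perceive(s_after_action))

print(f"ignore:   F = {f_ignored:.3f}")
print(f"perceive: F = {f_perceived:.3f} (surprise = {surprise(s):.3f})")
print(f"act:      F = {f_acted:.3f} after moving to '{s_after_action}'")
```

In this toy, updating the belief tightens free energy down to the surprise itself but cannot go lower, while acting lowers the surprise by getting the fish back to the sensation it expects.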
The ideas from FEP can also further our understanding of our ability to balance between maintaining the status quo (exploit) and going outside our comfort zones (explore). To paraphrase the English polymath George Spencer-Brown, the first act of cognition is to differentiate (the act of distinction). We start with differentiating: me/everything else. We experience and “bring forth” the world around us by constructing it inside our mind. This construction has to be a simpler version due to the very high complexity of the world around us. We only care about correlations that matter to us in our local environment. This matters the most for our survival and sustenance. This leads to a tension. We want to look for things that confirm our hypotheses and maintain the status quo. This is the short-term vision. However, this doesn’t help in the long run with our sustenance. We also need to explore to look for things that we don’t know about. This is the long-term vision. It helps us prepare to adapt to the ever-changing environment. There is a balance between the two.
The idea of FEP can go from “I model the world” to “we model the world” to “we model ourselves modelling the world.” As part of a larger human system, we can co-create a shared model of our environment and collaborate to minimize free energy, leading to our sustenance as a society.
FEP is a fascinating field and I encourage readers to check out the works of Karl Friston, Andy Clark and others. I will finish with the insight from Friston that the idea of minimizing free energy is also a way to recognize one’s existence.
Avoiding surprises means that one has to model and anticipate a changing and itinerant world. This implies that the models used to quantify surprise must themselves embody itinerant wandering through sensory states (because they have been selected by exposure to an inconstant world): Under the free-energy principle, the agent will become an optimal (if approximate) model of its environment. This is because, mathematically, surprise is also the negative log-evidence for the model entailed by the agent. This means minimizing surprise maximizes the evidence for the agent (model). Put simply, the agent becomes a model of the environment in which it is immersed. This is exactly consistent with the Good Regulator theorem of Conant and Ashby (1970). This theorem, which is central to cybernetics, states that “every Good Regulator of a system must be a model of that system.” .. Like adaptive fitness, the free-energy formulation is not a mechanism or magic recipe for life; it is just a characterization of biological systems that exist. In fact, adaptive fitness and (negative) free energy are considered by some to be the same thing.
Always keep on learning…
In case you missed it, my last post was “The Whole is ________ than the sum of its parts”.
 The free energy principle for action and perception: A mathematical review. Christopher L. Buckley, Chang Sub Kim, Simon McGregor, Anil K. Seth (2017)