The “Mind Projection Fallacy” in Systems Thinking:

In today’s post, I am writing about the wonderful Bayesian E. T. Jaynes’ idea of the “Mind Projection Fallacy” (MPF) with respect to Systems Thinking. He described MPF as treating one’s own private thoughts and sensations as realities existing externally in Nature. As Jaynes put it – One asserts that the creations of his own imagination are real properties of Nature, and thus in effect projects his own thoughts out onto Nature.

Jaynes used the English language to delve into this further. In logic, identity is symmetric: if A is B, then B is A. However, when we apply this to ordinary language, we run into trouble. He used the old adage “knowledge is power” as an example. If we then say “power is knowledge”, we have said something fantastically absurd. The trouble here is with the verb “is”. As Jaynes pointed out:

These examples remind us that the verb ‘is’ has, like any other verb, a subject and a predicate; but it is seldom noted that this verb has two entirely different meanings. A person whose native language is English may require some effort to see the different meanings in the statements: ‘The room is noisy’ and ‘There is noise in the room’. But in Turkish these meanings are rendered by different words, which makes the distinction so clear that a visitor who uses the wrong word will not be understood. The latter statement is ontological, asserting the physical existence of something, while the former is epistemological, expressing only the speaker’s personal perception…

Common language – or, at least, the English language – has an almost universal tendency to disguise epistemological statements by putting them into a grammatical form which suggests to the unwary an ontological statement. A major source of error in current probability theory arises from an unthinking failure to perceive this. To interpret the first kind of statement in the ontological sense is to assert that one’s own private thoughts and sensations are realities existing externally in Nature. We call this the ‘mind projection fallacy’.

Once one has grasped the idea, one sees the Mind Projection Fallacy everywhere; what we have been taught as deep wisdom, is stripped of its pretensions and seen to be instead a foolish non sequitur.

Jaynes noted that there are two complementary forms to MPF:

The error occurs in two complementary forms, which we might indicate thus:

(A) (My own imagination) –> (Real property of Nature)

(B) (My own ignorance) –> (Nature is indeterminate)

I am more interested in the first of the two forms here, in relation to Systems Thinking. The “Thinking” in Systems Thinking implies that there is a thinker, and that the thinker is thinking about “systems”. As MPF suggests, we are prone to assuming that our epistemological stances are in fact statements about ontological reality. When we talk about a “system”, it is in terms of “as-if” statements. We talk about the health care system as if there actually is a corresponding physical entity in the real world. We talk about fixing “systems” as if there is a mechanical entity that needs some switching out of parts or upgrading.

When we come across a phenomenon and try to understand it, we do so by creating a narrative around it. For example, if we see an object fall to the ground, we create a narrative around how something caused the object to fall. Or if we face some adverse events, we create a narrative around having a bad day. In these narratives, there is always a “wholeness” aspect, in the sense that things make sense or happen for a reason. This wholeness is what makes the narrative flow. The big rub in all this is that the narrative is told from a perspective, usually the perspective of the one doing the narration, the observer. We create “systems” in order to make sense of things around us in our world. We are situated in the world, in this place and time, and how we create these grand narratives is shaped by this situatedness.

As I noted above, systems thinking requires thinkers, and no two thinkers are alike. Their versions of “systems” are unique to them. If we treat our “systems” as being real, we also assume that others are seeing the same “system” and can understand what we mean by “system”. This is the Mind Projection Fallacy in action.

Another aspect of MPF is that we tend to believe that there is uncertainty everywhere. As Jaynes pointed out, this “is” is a very troublesome verb. Uncertainty does not exist out there in the world, but in here, in our understanding. This is the whole premise of Bayesian epistemology. Probability is not a property of a phenomenon in the real world, but a property of our knowledge or belief about the phenomenon. Jaynes wrote:

that term (random) is basically meaningless as an attribute of the real world; it has no clear definition applicable in the real world. The belief that ‘randomness’ is some kind of real property existing in Nature is a form of the mind projection fallacy which says, in effect, ‘I don’t know the detailed causes – therefore – Nature does not know them.’

What does all this mean to a Systems Thinker? How does this help improve our thinking? Jaynes continues:

It seems to us that the belief that probabilities are realities existing in Nature is pure mind projection fallacy. True ‘scientific objectivity’ demands that we escape from this delusion and recognize that in conducting inference our equations are not describing reality; they are describing and processing our information about reality.

This is a second-order view – we are thinking about our thinking. We are able to think better only when we realize that there are problems with our thinking. When we recognize that “systems” are not real or objective entities but mere devices to further our understanding, we become more curious. We become curious about how others view the world. If we thought that others could objectively view the “system”, there would be no need for us to seek their perspectives. When we are curious about how others view the “system”, then we can really start talking about “systems”. Tweaking what the great cybernetician Heinz von Foerster once said – You cannot hold a system responsible for anything – you cannot shake its hand, ask it to justify its actions – and you cannot enter into a dialogue with it; whereas I can speak with another self, a you!

I will finish with the wise words of the grand master of Systems Thinking, West Churchman:

The systems approach begins when first you see the world through the eyes of another.

Stay safe and always keep on learning…

In case you missed it, my last post was The Authentic Cybernetician:


The Maximum Entropy Principle:

In today’s post, I am looking at the Maximum Entropy principle, a brainchild of the eminent physicist E. T. Jaynes. The idea builds on Claude Shannon’s information theory. The Maximum Entropy principle (an extension of the Principle of Insufficient Reason) describes an ideal epistemic stance. Loosely put, we should model only what is known, and we should assign maximum uncertainty to what is unknown. To explain this further, let’s look at the example of a coin toss.

If we don’t know anything about the coin, our prior assumption should be that heads and tails are equally likely. This is a stance of maximum entropy. If we assumed that the coin was loaded, we would be trying to “load” our assumption model and claiming unwarranted certainty.

Entropy is a measure proposed by Claude Shannon as part of his information theory. Low entropy messages have low information content, or low surprise content. High entropy messages, on the other hand, have high information content, or high surprise content. The information content of an event increases as its probability decreases: low probability events carry high information content. For example, an unlikely defeat of a reigning sports team generates more surprise than a likely win. Entropy is the average level of information across all possible outcomes, weighted by their probabilities. In the case of the coin toss, the entropy is the average level of information when we consider the probabilities of heads and tails.

For discrete events, the entropy is maximum when the events are equally likely, in other words for the uniform distribution. Thus, when we say that the probability of heads or tails is 0.5, we are assuming a maximum entropy model. For the uniform distribution, the maximum entropy model coincides with Laplace’s principle of insufficient reason. If the coin always landed on heads, we would have a zero entropy case, because no new information is available. If it is a loaded coin that makes one side more likely, the entropy is lower than for a fair coin. If we plot the probability of heads on the X-axis against the information entropy on the Y-axis, we see that a probability of 0 (no heads) and a probability of 1 (100% heads) both have zero entropy, and the highest entropy occurs when the probability of heads is 0.5, or 50%. For those who are interested, John von Neumann had a great idea to make a loaded coin fair. You can check out that here.
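To make this concrete, here is a minimal sketch in Python (my own illustration, assuming numpy is available; the function name binary_entropy is mine, not Jaynes’). It computes the Shannon entropy H(p) = -p*log2(p) - (1-p)*log2(1-p) of a coin whose probability of heads is p, and shows that the value peaks at 1 bit when p = 0.5 and falls to zero at the certain outcomes p = 0 and p = 1:

import numpy as np

def binary_entropy(p):
    """Shannon entropy (in bits) of a coin with P(heads) = p."""
    # Treat 0 * log2(0) as 0: a certain outcome contributes nothing.
    terms = [q * np.log2(q) for q in (p, 1 - p) if q > 0]
    return -sum(terms)

for p in [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]:
    print(f"P(heads) = {p:.2f} -> entropy = {binary_entropy(p):.3f} bits")

# The fair coin (p = 0.5) gives the maximum: 1 bit per toss.
# A coin that always lands heads (p = 1.0) gives 0 bits: no surprise, no information.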

From this standpoint, if we take a game where one team is heavily favored to win, we could say that the most informative part of the game is sometimes the coin toss.
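Incidentally, the von Neumann trick mentioned above is simple enough to sketch (a hypothetical Python illustration; the bias of 0.7 and the function names are mine): toss the loaded coin twice, throw the pair away if the two tosses match, and otherwise report the first toss. Since heads-then-tails and tails-then-heads have the same probability regardless of the bias, the reported result is fair:

import random

def biased_toss(p_heads=0.7):
    """A loaded coin: returns 'H' with probability p_heads."""
    return 'H' if random.random() < p_heads else 'T'

def fair_toss(p_heads=0.7):
    """Von Neumann's trick: extract a fair toss from a loaded coin."""
    while True:
        first, second = biased_toss(p_heads), biased_toss(p_heads)
        if first != second:      # HT and TH each occur with probability p*(1-p)
            return first         # so the first toss of a mixed pair is fair

tosses = [fair_toss() for _ in range(10000)]
print("Fraction of heads:", tosses.count('H') / len(tosses))   # roughly 0.5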

Let’s consider the case of a die. There are six possible outcomes (1 through 6) when we roll a die. The maximum entropy model is to assume a uniform distribution, i.e., to assign a probability of 1/6 to each of the values 1 through 6. Now suppose we somehow knew that 6 is more likely to happen; for example, the manufacturer of the loaded die tells us that the number 6 occurs 3/6 of the time. Per the maximum entropy model, we should divide the remaining 3/6 equally among the other five numbers. With each additional piece of information, we should change our model so that the entropy is again at its maximum. What I have discussed here is only the basics of maximum entropy. Each new piece of “valid” information that we incorporate into our model is called a constraint, and the maximum entropy approach uses Lagrange multipliers to find the solutions. For discrete events with no additional information, the maximum entropy model is the uniform distribution. In a similar vein, if you are looking at a continuous distribution and you know its mean and variance, the maximum entropy model is the normal distribution.
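Here is a small numerical sketch of the loaded die example (my own illustration, assuming numpy and scipy are available). It maximizes the Shannon entropy subject to the probabilities summing to one and P(6) = 3/6, handing the Lagrange-multiplier machinery to a numerical optimizer, and it recovers the answer described above: 1/10 for each of the other five faces:

import numpy as np
from scipy.optimize import minimize

def neg_entropy(p):
    """Negative Shannon entropy; minimizing this maximizes entropy."""
    p = np.clip(p, 1e-12, 1.0)   # guard against log(0)
    return np.sum(p * np.log2(p))

constraints = [
    {'type': 'eq', 'fun': lambda p: np.sum(p) - 1.0},   # probabilities sum to one
    {'type': 'eq', 'fun': lambda p: p[5] - 0.5},        # manufacturer says P(6) = 3/6
]
bounds = [(0.0, 1.0)] * 6
x0 = np.full(6, 1 / 6)   # start from the uniform (unconstrained max-entropy) model

result = minimize(neg_entropy, x0, method='SLSQP', bounds=bounds, constraints=constraints)
print(np.round(result.x, 3))   # -> [0.1 0.1 0.1 0.1 0.1 0.5]

With the P(6) constraint removed, the same code returns the uniform distribution, the unconstrained maximum entropy model.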

The Role of The Observer:

Jaynes asked a great question about the information content of a message. He noted:

In a communication process, the message m(i) is assigned probability p(i), and the entropy H, is a measure of information. But WHOSE information?… The probabilities assigned to individual messages are not measurable frequencies; they are only a means of describing a state of knowledge.

In the frequentist version of statistics, probability is treated as a fixed property of the phenomenon. In the Bayesian version, probability is not a fixed entity; it represents a state of knowledge. Jaynes continues:

Entropy, H, measures not the information of the sender, but the ignorance of the receiver that is removed by the receipt of the message.

To me, this brings up the importance of the observer and circularity. As the great cybernetician Heinz von Foerster said:

“The essential contribution of cybernetics to epistemology is the ability to change an open system into a closed system, especially as regards the closing of a linear, open, infinite causal nexus into closed, finite, circular causality.”

Let’s go back to the example of a coin. If I were an alien who knew nothing about coins, should my maximum entropy model only include the two possibilities of heads and tails? Why should it not include the coin landing on its edge? Or, if a magician is tossing the coin, should I account for the coin vanishing into thin air? The assumption of just two possibilities (heads or tails) is itself prior information that we are encoding when we say that the probability of heads or tails is 0.5. As we gain more knowledge about the coin toss, we can update the model to reflect it, and at the same time bring the model to a new state of maximum entropy. This iterative, closed loop process is the backbone of scientific enquiry and skepticism. The use of the maximum entropy model is a stance that we take to state our knowledge. Perhaps a better way to explain the coin toss is that, given our lack of knowledge about the coin, we are saying that heads is not more likely to happen than tails until we find more evidence. Let’s look at another interesting example where I think the maximum entropy model comes up.
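Before moving on, here is a rough numerical illustration of the alien’s predicament (the “lands on edge” outcome and its 0.01 probability are hypothetical numbers of my own): the maximum entropy of a uniform model grows as log2 of the number of possibilities we admit, and each genuine piece of knowledge we fold in as a constraint brings the entropy back down:

import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Two admitted outcomes: the usual maximum-entropy coin model.
print(entropy([0.5, 0.5]))              # 1.0 bit

# The alien also admits 'lands on edge': uniform over three outcomes.
print(entropy([1/3, 1/3, 1/3]))         # about 1.585 bits (= log2(3))

# New knowledge as a constraint: suppose we learn P(edge) = 0.01;
# the max-entropy model splits the rest equally, and the entropy drops.
print(entropy([0.495, 0.495, 0.01]))    # about 1.07 bits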

The Veil of Ignorance:

The veil of ignorance is an idea about ethics proposed by the great American Political philosopher, John Rawls. Loosely put, in this thought experiment, Rawls is asking us what kind of society should we aim for? Rawls asks us to imagine that we are behind a veil of ignorance, where we are completely ignorant of our natural abilities, societal standing, family etc. We are then randomly assigned a role in society. The big question then is – what should society be like where this random assignment promotes fairness and equality? The random assignment is a maximum entropy model since any societal role is equally likely.

Final Words:

The maximum entropy principle is a way of saying not to put all of your eggs in one basket. It is a way to be aware of your biases, and it is an ideal position for learning. It is similar to Epicurus’ principle of Multiple Explanations, which says – “Keep all the different hypotheses that are consistent with the facts.”

It is important to understand that “I don’t know,” is a valid and acceptable answer. It marks the boundary for learning.

Jaynes explained maximum entropy as follows:

The maximum entropy distribution may be asserted for the positive reason that it is uniquely determined as the one which is maximally noncommittal with regard to missing information, instead of the negative one that there was no reason to think otherwise… Mathematically, the maximum entropy distribution has the important property that no possibility is ignored; it assigns positive weight to every situation that is not absolutely excluded by the given information.

We learned that probability and entropy are dependent on the observer. I will finish off with the wise words from James Dyke and Axel Kleidon.

Probability can now be seen as assigning a value to our ignorance about a particular system or hypothesis. Rather than the entropy of a system being a particular property of a system, it is instead a measure of how much we know about a system.

Please maintain social distance and wear masks. Stay safe and always keep on learning…

In case you missed it, my last post was Destruction of Information/The Performance Paradox: