The Extended Form of the Law of Requisite Variety:

This is a follow-up to my last week’s post – Notes on Regulation. In today’s post, I am looking at Arvid Aulin-Ahmavaara’s extended form of the law of requisite variety (using Francis Heylighen’s version). As I have noted previously, Ross Ashby, the great mind and pioneer of Cybernetics, came up with the law of requisite variety (LRV). The law can be stated as: only variety can absorb variety. Here, variety is the number of possible states available to a system. This is equivalent to statistical entropy. For example, a coin can be shown to have a variety of two – Heads and Tails. Thus, if a user wants a way to randomly choose one of two outcomes, the coin can be used: the user tosses the coin to randomly choose one of the two options. However, if the user has 6 choices, they cannot use the coin to randomly choose one of six outcomes efficiently. In this case, a six-sided die can be used; a six-sided die has a variety of six. This is a simple illustration of variety absorbing variety.

The controller can find ways to amplify variety to still meet the external variety thrown upon the system. Let’s take the example of the coin and six choices again. It is possible for the user to toss the coin three times (or use three coins) and use the three coin-toss results to make a choice – the variety of three coin tosses is 2³ = 8. This is a means of amplifying variety in order to acquire requisite variety. From a cybernetics standpoint, the goal of regulation is to ensure that the external disturbances do not reach the essential variables. The essential variables are those that matter for a system’s viability. If we take the example of an animal, some of the essential variables are blood pressure, body temperature etc. The essential variables must be kept within a specific range to ensure that the animal continues to survive. The external disturbances are denoted by D, the essential variables by E and the actions available to the regulator by A. As noted, variety is expressed as the statistical entropy of the variable. As Aulin-Ahmavaara notes – If A is a variable of any kind, the entropy H(A) is a measure of its variety.
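To make the amplification concrete, here is a minimal Python sketch of the three-toss scheme. The bit-encoding and the re-toss rule are my own illustrative choices, not part of Ashby’s formulation:

```python
import random

def choose_one_of_six():
    """Pick one of six options uniformly using only a fair coin.

    Three tosses give 2**3 = 8 equally likely outcomes -- more than the
    requisite variety of 6 -- so we map outcomes 0-5 to the options and
    simply toss again whenever we land on 6 or 7.
    """
    while True:
        # Encode three coin tosses as a 3-bit number in the range 0..7.
        outcome = sum(random.randint(0, 1) << i for i in range(3))
        if outcome < 6:
            return outcome + 1  # report it like a die face, 1..6

print([choose_one_of_six() for _ in range(10)])
```

The coin’s variety of two has been amplified to eight, which is enough to absorb the six-way choice.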

With this background, we can note the extended form of the Law of Requisite Variety as:

H(E) ≥ H(D) – H(A) + H(A|D) – B

Each H term represents the statistical entropy of the corresponding variable. For example, H(E) is the statistical entropy of the essential variables. The larger the value of H, the more the uncertainty around the variable. The goal for the controller is to keep H(E) as low as possible, since a larger entropy for the essential variables indicates a larger range of values for them. If the essential variables are not kept to a small range of values, the viability of the organism is compromised. We can now look at the other terms of the inequality and see how the value of H(E) can be kept low.
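As a quick illustration of how any of these H terms could be computed, here is a small Python sketch; the probability distributions are made up purely for illustration:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), measured in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # a fair coin: 1.0 bit of variety
print(entropy([1/6] * 6))           # a fair die: ~2.585 bits
print(entropy([0.97, 0.02, 0.01]))  # a tightly regulated E: ~0.222 bits
```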

Heylighen notes:

This means that H(E) should preferably be kept as small as possible. In other words, any deviations from the ideal values must be efficiently suppressed by the control mechanism. The inequality expresses a lower bound for H(E): it cannot be smaller than the sum on the right-hand side. That means that if we want to make H(E) smaller, we must try to make the right-hand side of the inequality smaller. This side consists of four terms, expressing respectively the variety of disturbances H(D), the variety of compensatory actions H(A), the lack of requisite knowledge H(A|D) and the buffering capability B.

As noted, D represents the external disturbances, and H(D) is the variety of disturbances coming in. If H(D) is large, it generally pushes up H(E) as well. Thus, an organism in a complex environment is more likely to face adversities that might drive the essential variables outside the safe range. For example, you are less likely to die while sitting in your armchair than while trekking through the Amazonian rain forest or wandering through the concrete jungle of a megacity. A good rule of thumb for survivability is to avoid environments that have a large variety of disturbances.

The term H(A) represents the variety of actions available to counter the disturbances. The more variety you have in your actions, the more likely it is that at least one of them will be able to solve the problem, escape the danger, or restore you to a safe, healthy state. Thus, the Amazonian jungle may not be so dangerous for an explorer who has a gun to shoot dangerous animals, medicines to treat disease or snakebite, filters to purify water, and the physical condition to run fast or climb trees if threatened. The term H(A) enters the inequality with a minus (–) sign, because a wider range of actions allows you to maintain a smaller range of deviations in the essential variables H(E).

The term H(A|D) is a conditional entropy: the uncertainty that remains about which action A to perform when the disturbance D is known. It is also called the lack of requisite knowledge, and it carries a plus sign precisely because it indicates a “lack”. It is not enough to have a wide range of actions; you have to know which action will be effective. If you have minimal knowledge, then your best strategy is to try out each action at random, and this is highly inefficient and ineffective when time is not on your side. For example, there is little use in having a variety of antidotes for different types of snakebites if you do not know which snake bit you. H(A|D) expresses your uncertainty about performing an action A (e.g., taking a particular antidote) for a given disturbance D (e.g., being bitten by a particular snake). The larger your uncertainty, the larger the probability that you will choose a wrong action, and thus fail to reduce the deviation H(E). Therefore, this term has a “+” sign in the inequality: more uncertainty (= less knowledge) produces more potentially lethal variation in your essential variables.
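A small sketch of this requisite knowledge term, using a made-up joint distribution over two disturbances and two actions (the numbers are purely illustrative):

```python
from math import log2

def conditional_entropy(joint):
    """H(A|D) = H(A,D) - H(D), for a joint probability table joint[d][a]."""
    h_joint = -sum(p * log2(p) for row in joint for p in row if p > 0)
    p_d = [sum(row) for row in joint]
    h_d = -sum(p * log2(p) for p in p_d if p > 0)
    return h_joint - h_d

# Requisite knowledge: each disturbance is met with exactly one action.
knowing = [[0.5, 0.0],
           [0.0, 0.5]]
# No knowledge: an action is picked at random whatever the disturbance.
guessing = [[0.25, 0.25],
            [0.25, 0.25]]

print(conditional_entropy(knowing))   # 0.0 bits -- no lack of knowledge
print(conditional_entropy(guessing))  # 1.0 bit  -- pure trial and error
```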

The final term B stands for buffering (passive regulation). It expresses your amount of protective reserve or buffering capacity. Better even than applying the right antidote after a snakebite is to wear protective clothing thick enough to stop any snake venom from entering your bloodstream. The term is negative because a higher buffering capacity means less deviation in the essential variables.

The law of requisite variety expresses in an abstract form what is needed for an organism to prevent or repair the damage caused by disturbances. If this regulation is insufficient, damage will accumulate, including damage to the regulation mechanisms themselves. This produces an acceleration in the accumulation of damage, because more damage implies less prevention or repair of further damage, and therefore a higher rate of additional damage.

The optimal form of the Law of Requisite Variety occurs when the minimum value of H(E) is achieved and there is no lack of requisite knowledge, i.e., H(A|D) = 0. The essence of regulation is that disturbances happen all the time, but their effects are neutralized before they have irreparably damaged the organism. This optimal result of regulation is represented as:

H(E)min = H(D) – H(A) – B

I encourage the reader to check out my previous posts on the LRV.

Getting Out of the Dark Room – Staying Curious:

Notes on Regulation:

Storytelling at the Gemba:

Exploring The Ashby Space:

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Notes on Regulation:

References:

[1] Cybernetic Principles of Aging and Rejuvenation: the buffering-challenging strategy for life extension – Francis Heylighen

[2] The Law of Requisite Hierarchy – A. Y. Aulin-Ahmavaara

Getting Out of the Dark Room – Staying Curious:

In today’s post I am looking at the importance of staying curious in the light of Karl Friston’s “Free Energy Principle” (FEP) and Ross Ashby’s ideas on indirect regulation. I have discussed the Free Energy Principle here. The FEP basically states that in order to resist the natural tendency toward disorder, adaptive agents must minimize surprise.

Karl Friston, the brilliant mind behind FEP noted:

the whole point of the free-energy principle is to unify all adaptive autopoietic and self-organizing behavior under one simple imperative; avoid surprises and you will last longer.

Avoiding surprises means that one has to model and anticipate a changing and itinerant world. This implies that the models used to quantify surprise must themselves embody itinerant wandering through sensory states (because they have been selected by exposure to an inconstant world): Under the free-energy principle, the agent will become an optimal (if approximate) model of its environment. This is because, mathematically, surprise is also the negative log-evidence for the model entailed by the agent. This means minimizing surprise maximizes the evidence for the agent (model). Put simply, the agent becomes a model of the environment in which it is immersed. This is exactly consistent with the Good Regulator theorem of Conant and Ashby (1970). This theorem, which is central to cybernetics, states that “every Good Regulator of a system must be a model of that system.” .. Like adaptive fitness, the free-energy formulation is not a mechanism or magic recipe for life; it is just a characterization of biological systems that exist. In fact, adaptive fitness and (negative) free energy are considered by some to be the same thing.

This idea of the agent having a model of its environment is quite important in Cybernetics. In fact, the idea of FEP can be traced back to Ashby’s ideas on Cybernetics. For an organism to survive, it needs to keep certain internal variables such as blood pressure, internal temperature etc. within a certain range. Ashby called these essential variables, depicted by “E”. Ashby noted that the goal of regulation is to keep these essential variables in range, in the light of disturbances coming from the environment. In other words, the goal of regulation is to minimize the effect of incoming disturbances. Perfect regulation results in no disturbances reaching the essential variables; in that case, the organism is completely ignorant of what is going on outside. When the regulation succeeds, we say that the regulator has requisite variety – it is able to counter the variety coming in from the environment. Ashby called this “the law of Requisite Variety”, and explained it succinctly as “only variety can absorb variety.” Ashby explained direct and indirect regulation as follows:

Direct and indirect regulation occur as follows. Suppose an essential variable X has to be kept between limits x’ and x”. Whatever acts directly on X to keep it within the limits is regulating directly. It may happen, however, that there is a mechanism M available that affects X, and that will act as a regulator to keep X within the limits x’ and x” provided that a certain parameter P (parameter to M) is kept within the limits p’ and p”. If, now, any selective agent acts on P so as to keep it between p’ and p”, the end result, after M has acted, will be that X is kept between x’ and x”.

Now, in general, the quantities of regulation required to keep P in p’ and p” and to keep X in x’ to x” are independent. The law of requisite variety does not link them. Thus, it may happen that a small amount of regulation supplied to P may result in a much larger amount of regulation being shown by X.

When the regulation is direct, the amount of regulation that can be shown by X is absolutely limited to what can be supplied to it (by the law of requisite variety); when it is indirect, however, more regulation may be shown by X than is supplied to P. Indirect regulation thus permits the possibility of amplifying the amount of regulation; hence its importance.

Ashby explained the direct and indirect regulation with the following example:

Living organisms came across this possibility eons ago, for the gene-pattern is a channel of communication from parent to offspring: ‘Grow a pair of eyes,’ it says, ‘ they’ll probably come in useful; and better put hemoglobin into your veins — carbon monoxide is rare and oxygen common.’ As a channel of communication, it has a definite, finite capacity, Q say. If this capacity is used directly, then, by the law of requisite variety, the amount of regulation that the organism can use as defense against the environment cannot exceed Q. To this limit, the non-learning organisms must conform. If, however, the regulation is done indirectly, then the quantity Q, used appropriately, may enable the organism to achieve, against its environment, an amount of regulation much greater than Q. Thus, the learning organisms are no longer restricted by the limit.

A lower cognitive capacity organism may be able to survive by relying on its gene-pattern alone, while a higher cognitive capacity organism has to supplement the basic gene-pattern with learning behavior. In order to do this, it has to learn from its environment. Ashby continued:

In the same way the gene-pattern, when it determines the growth of a learning animal, expends part of its resources in forming a brain that is adapted not only by details in the gene-pattern but also by details in the environment… dictionary. While the hunting wasp, as it attacks its prey, is guided in detail by its genetic inheritance, the kitten is taught how to catch mice by the mice themselves. Thus, in the learning organism the information that comes to it by the gene-pattern is much supplemented by information supplied by the environment; so, the total adaptation possible, after learning, can exceed the quantity transmitted directly through the gene-pattern.

It is important to note that the environment does not input information into the organism. Instead, the organism perceives the environment through its action on the environment. The environment also acts on the organism, just like the organism acts on the environment. Perception is possible only through this circular causal cycle. As Ashby noted, the gene pattern for learning allows for the organism to model its environment, and this allows for the indirect regulation. Ashby explains this point further:

This is the learning mechanism. Its peculiarity is that the gene-pattern delegates part of its control over the organism to the environment. Thus, it does not specify in detail how a kitten shall catch a mouse, but provides a learning mechanism and a tendency to play, so that it is the mouse which teaches the kitten the finer points of how to catch mice. This is regulation, or adaptation, by the indirect method. The gene-pattern does not, as it were, dictate, but puts the kitten into the way of being able to form its own adaptation, guided in detail by the environment.

The Dark Room:

At this point, we can look at the idea of the dark room, a thought experiment in FEP. We can also explain it using Ashby’s ideas. If the goal of the regulator is to minimize the impact of disturbances on the essential variables, one strategy is then to go to an environment with minimum disturbances. In FEP, this thought experiment is posed similarly: if the goal of the agent is to minimize surprise, why wouldn’t the agent find a dark room and stay in it indefinitely?

A recurrent puzzle raised by critics of these models (FEP) is that biological systems do not seem to avoid surprises. We do not simply seek a dark, unchanging chamber, and stay there. This is the “Dark-Room Problem.” 

Karl Friston offers an answer to this question:

Technically, the resolution of the Dark-Room Problem rests on the fact that average surprise or entropy H(s|m) is a function of sensations and the agent (model) predicting them. Conversely, the entropy H(s) minimized in dark rooms is only a function of sensory information. The distinction is crucial and reflects the fact that surprise only exists in relation to model-based expectations. The free-energy principle says that we harvest sensory signals that we can predict (cf., emulation theory; Grush, 2004); ensuring we keep to well-trodden paths in the space of all the physical and physiological variables that underwrite our existence. In this sense, every organism (from viruses to vegans) can be regarded as a model of its econiche, which has been optimized to predict and sample from that econiche. Interestingly, free energy is used explicitly for model optimization in statistics (e.g., Yedidia et al., 2005) using exactly the same principles.

This means that a dark room will afford low levels of surprise if, and only if, the agent has been optimized by evolution (or neurodevelopment) to predict and inhabit it. Agents that predict rich stimulating environments will find the “dark room” surprising and will leave at the earliest opportunity. This would be a bit like arriving at the football match and finding the ground empty. Although the ambient sensory signals will have low entropy in the absence of any expectations (model), you will be surprised until you find a rational explanation or a new model (like turning up a day early). Notice that average surprise depends on, and only on, sensations and the model used to explain them. This means an agent can compare the surprise under different models and select the best model; thereby eluding any “circular explanation” for the sensations at hand.
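Here is a toy illustration of that distinction – surprise is a property of sensations together with the model predicting them, not of the sensations alone. The two agents and the probabilities they assign below are invented purely for this sketch:

```python
from math import log2

def average_surprise(sensations, model):
    """Average surprisal -log2 p(s|m), in bits, over a sensory stream."""
    return sum(-log2(model[s]) for s in sensations) / len(sensations)

dark_room = ["dark"] * 20  # the only signal the dark room ever offers

# Hypothetical agents: the probability each assigns to possible sensations.
cave_dweller = {"dark": 0.99, "crowd": 0.01}   # optimized for dark rooms
football_fan = {"dark": 0.01, "crowd": 0.99}   # expects a rich stadium

print(average_surprise(dark_room, cave_dweller))  # ~0.014 bits: stays put
print(average_surprise(dark_room, football_fan))  # ~6.644 bits: leaves
```

The same sensory stream is unsurprising to the agent that predicts it and highly surprising to the agent that does not, which is exactly why only some agents find the dark room tolerable.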

We are born with a gene-pattern that allows for learning. The basic pattern is to learn, and our survival mainly comes from this. This is what lets us get out of the dark room. We are born curious, and this allows us to keep on learning. We have an inner ability to keep looking for answers and not be satisfied with the status quo.

I am sure there is an important lesson for us all here in the idea of the dark room and indirect regulation. I could simply say – stay curious and keep on learning. Or I can have you come to that conclusion on your own. As the famous Spanish philosopher José Ortega y Gasset noted – He who wants to teach a truth should place us in the position to discover it ourselves.

I will finish with a great lesson from Ashby to explain the idea of the indirect regulation:

If a child wanted to discover the meanings of English words, and his father had only ten minutes available for instruction, the father would have two possible modes of action. One is to use the ten minutes in telling the child the meanings of as many words as can be described in that time. Clearly there is a limit to the number of words that can be so explained. This is the direct method. The indirect method is for the father to spend the ten minutes showing the child how to use a dictionary. At the end of the ten minutes the child is, in one sense, no better off; for not a single word has been added to his vocabulary. Nevertheless, the second method has a fundamental advantage; for in the future the number of words that the child can understand is no longer bounded by the limit imposed by the ten minutes. The reason is that if the information about meanings has to come through the father directly, it is limited to ten-minutes’ worth; in the indirect method the information comes partly through the father and partly through another channel (the dictionary) that the father’s ten-minute act has made available.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was The Cybernetics of Ohno’s Production System:

The Cybernetics of a Society:

In today’s post, I will be following the thoughts from my previous post, Consistency over Completeness. We were looking at each one of us being informationally closed, and computing a stable reality. The stability comes from the recursive computations of what is being observed. I hope to expand the idea of stability from an individual to a society in today’s post.

Humberto Maturana, the cybernetician biologist (or biologist cybernetician) said – anything said is said by an observer. Heinz von Foerster, one of my heroes in cybernetics, expanded this and said – everything said is said to an observer. Von Foerster’s thinking was that language is not monologic but always dialogic. He noted:

The observer as a strange singularity in the universe does not attract me… I am fascinated by images of duality, by binary metaphors like dance and dialogue where only a duality creates a unity. Therefore, the statement.. – “Anything said is said by an observer” – is floating freely, in a sense. It exists in a vacuum as long as it is not embedded in a social structure because speaking is meaningless, and dialogue is impossible, if no one is listening. So, I have added a corollary to that theorem, which I named with all due modesty Heinz von Foerster’s Corollary Nr. 1: “Everything said is said to an observer.” Language is not monologic but always dialogic. Whenever I say or describe something, I am after all not doing it for myself but to make someone else know and understand what I am thinking of or intending to do.

Heinz von Foerster’s great insight was perhaps inspired by the works of his distant relative, the brilliant philosopher Ludwig Wittgenstein. Wittgenstein proposed that language is a very public matter, and that a private language is not possible. The meaning of a word such as “apple” does not inherently come from the word itself; it comes from how the word is used, from its repeated usage in a public setting. Thus, even though the experience of an apple may be private to the individual, we can describe it only by using a public language. Von Foerster continues:

When other observers are involved… we get a triad consisting of the observers, the languages, and the relations constituting a social unit. The addition produces the nucleus and the core structure of society, which consists of two people using language. Due to the recursive nature of their interactions, stabilities arise, they generate observers and their worlds, who recursively create other stable worlds through interacting in language. Therefore, we can call a funny experience apple because other people also call it apple. Nobody knows, however, whether the green color of the apple you perceive, is the same experience as the one I am referring to with the word green. In other words, observers, languages, and societies are constituted through recursive linguistic interaction, although it is impossible to say which of these components came first and which were last – remember the comparable case of hen, egg and cock – we need all three in order to have all three.

Klaus Krippendorff defined closure as follows – A system is closed if it provides its own explanation and no references to an input are required. With closure, recursion is a good and perhaps the only way to interact. As organizationally closed entities, we are able to stay viable only as part of a social realm. When we are part of a social realm, we have to construct reality with an external point of reference. Understanding is still generated internally, but with that external reference, and this adds to the reality of the social realm as a collective. If the society is to have an identity that is sustained over time, its viability must come from its members. Like a set of nested dolls, society’s structure comes from participating individuals who are themselves embedded recursively in the societal realm. The structure of the societal or social realm is not designed, but emerges from the interactions, desires, goals etc. of the individuals. The society is able to live on while the individuals come and go.

I am part of someone else’s environment, and I add to the variety of their environment with my decisions and actions (sometimes inactions). This is an important reminder for us to hold onto in light of recent world events including a devastating pandemic. I will finish with some wise words from Heinz von Foerster:

A human being is a human being together with another human being; this is what a human being is. I exist through another “I”, I see myself through the eyes of the Other, and I shall not tolerate that this relationship is destroyed by the idea of the objective knowledge of an independent reality, which tears us apart and makes the Other as object which is distinct from me. This world of ideas has nothing to do with proof, it is a world one must experience, see, or simply be. When one suddenly experiences this sort of communality, one begins to dance together, one senses the next common step and one’s movements fuse with those of the other into one and the same person, into a being that can see with four eyes. Reality becomes communality and community. When the partners are in harmony, twoness flows like oneness, and the distinction between leading and being led has become meaningless.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Consistency over Completeness:

Source – The Certainty of Uncertainty: Dialogues Introducing Constructivism By Bernhard Poerksen

Consistency over Completeness:

Today’s post is almost a follow-up to my earlier post – The Truth about True Models. In that post, I talked about Dr. Donald Hoffman’s idea of Fitness-Beats-Truth, or the FBT Theorem. Loosely put, the idea behind the FBT Theorem is that we have evolved not to have “true” perceptions of reality. We survived because we had “fitness”-based models, not “true” models. In today’s post, I am continuing this idea using the ideas of Heinz von Foerster, one of my Cybernetics heroes.

Heinz von Foerster came up with “the postulate of epistemic homeostasis”. This postulate states:

The nervous system as a whole is organized in such a way (organizes itself in such a way) that it computes a stable reality.

It is important to note here that we are speaking about computing “a” reality and not “the” reality. Our nervous system is informationally closed (to follow up from the previous post). This means that we do not have direct access to the reality outside. All we have is what we can perceive through our perceptual framework. The famous philosopher Immanuel Kant referred to this as the noumena (the reality that we don’t have direct access to) and the phenomena (the perceived representation of the external reality). All we can do is compute a reality based on our interpretive framework. This is just a version of the reality, and each one of us computes a reality that is unique to us.

The other concept to make note of is the “stable” part of the stable reality. In Godelian* speak, our nervous system cares more about consistency than completeness. When we encounter a phenomenon, our nervous system looks at stable correlations from the past and present, and computes a sensation that confirms the perceived representation of the phenomenon. Von Foerster gives the example of a table. We can see the table, and we can touch it, and maybe bang on it. With each of these confirmations and correlations between the different sensory inputs, the table becomes more and more a “table” to us.

*Kurt Godel, one of the famous logicians of the last century, came up with the idea that any formal system able to do elementary arithmetic cannot be both complete and consistent; it is either incomplete or inconsistent.

From the cybernetics standpoint, we are talking about an observer and the observed. The interaction between the observer and the observed is an act of computing a reality. The first step in computing a reality is making distinctions. If there are no distinctions, everything about the observed will be uniform, and no information can be processed by the observer. The distinctions refer to the variety of the observed: the more distinctions there are, the more variety the observed has. From a second order cybernetics standpoint, the variety of the observed depends upon the variety of the observer. This goes back to the earlier point about each of us computing a unique stable reality. Each one of us is unique in how we perceive things; this is our variety as the observer. The observed, that which is external to us, always has more potential variety than we do. We cut down or attenuate this high variety by choosing certain attributes that interest us. Once the distinctions are made, we find relations between them to make sense of it all. This corresponds to the confirmations and correlations that we noted above in the example of a table.

We are able to survive in our environment because we are able to continuously compute a stable reality. The stability comes from the recursive computations of what is being observed. For example, let’s go back to the example of the table. Our eyes receive the sensory input of the image of the table. This is a first computation. This sensory image then goes up the “neurochain”, where it is computed again. This happens again and again as the input gets “decoded” at each level, until it is satisfactorily decoded by our nervous system. The final result is a computation of a computation of a computation, and so on. The stability is achieved through this recursion.

The idea of consistency over completeness is quite fascinating. It stems mainly from the inability of our nervous system to have a true representation of reality. We live with uncertainty, but our nervous system strives to provide us a stable version of reality, one that is devoid of uncertainties. We are able to think about this only from a second order standpoint. We are able to ponder our cognitive blind spots because we are able to do second order cybernetics. We are able to think about thinking. We are able to put ourselves into the observed. Second order cybernetics is the study of observing systems, where the observers themselves are part of the observed system.

I will leave the reader with a final thought – the act of observing oneself is also a computation of “a” stable reality.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Wittgenstein and Autopoiesis:

When is a Model Not a Model?

Ross Ashby, one of the pioneers of Cybernetics, started an essay with the following question:

I would like to start not at: How can we make a model?, but at the even more primitive question: Why make a model at all?

He came up with the following answer:

I would like then to start from the basic fact that every model of a real system is in one sense second-rate. Nothing can exceed, or even equal, the truth and accuracy of the real system itself. Every model is inferior, a distortion, a lie. Why then do we bother with models? Ultimately, I propose, we make models for their convenience.

To go further with this idea: we make models to come up with a way to describe how things work. This is done so that we can also answer the question – what happens when…? If there is no predictive or explanatory power, there is no use for the model. From a cybernetics standpoint, we are not interested in “What is this thing?”, but in “What does this thing do?” We never try to completely understand a “system”. We understand it in chunks, the chunks that we are interested in. We construct a model in our heads that we call a “system” to make sense of how we think things work out in the world. We only care about certain specific interactions and their outcomes.

One of the main ideas that Ashby proposed was the idea of variety. Loosely put, variety is the number of available states a system has. For example, a switch has a variety of two – ON or OFF. A stop light (generally) has a variety of three – Red, Yellow or Green. As complexity increases, the variety also increases. Variety is dependent on the ability of the observer to discern states: a keen-eyed observer can discern a higher number of states for a phenomenon than another observer. Take the example of the great fictional characters Sherlock Holmes and John Watson. When they come upon a stranger, Holmes is able to discern more variety than Watson – he can tell the most amazing details about the stranger that Watson cannot. When we construct a model, the model lacks the original variety of the phenomenon we are modeling. This is important to keep in mind. The external variety is always much larger than the internal variety of the observer, and the observer simply lacks the ability to tackle such an extremely high amount of variety. To address this, the observer removes or attenuates the unwanted variety of the phenomenon and constructs a simpler model. For example, when we talk about a healthcare system, the model in our mind is pretty simple – one hospital, some doctors and patients etc. It does not include the millions of patients, the computer system, the cafeteria, the janitorial service etc. We only look at the variables that we are interested in.

Ashby explained this very well:

Another common aim that will have to be given up is that of attempting to “understand” the complex system; for if “understanding” a system means having available a model that is isomorphic with it, perhaps in one’s head, then when the complexity of the system exceeds the finite capacity of the scientist, the scientist can no longer understand the system—not in the sense in which he understands, say, the plumbing of his house, or some of the simple models that used to be described in elementary economics.

A crude depiction of model-making is shown below. The observer has chosen certain variables that are of interest, and created a similar “looking” version as the model.

Ashby elaborated on this idea as:

We transfer from system to model to lose information. When the quantity of information is small, we usually try to conserve it; but when faced with the excessively large quantities so readily offered by complex systems, we have to learn how to be skillful in shedding it. Here, of course, model-makers are only following in the footsteps of the statisticians, who developed their techniques precisely to make comprehensible the vast quantities of information that might be provided by, say, a national census. “The object of statistical methods,” said R. A. Fisher, “is the reduction of data.”

There is an important saying from Alfred Korzybski – the map is not the territory. His point was that we should not take the map to be the real thing. An important corollary to this, for a model-maker, is:

If the model is the same as the phenomenon it models, it fails to serve its purpose. 

The usefulness of a model lies in its being an abstraction. This is mainly due to the observer not being able to handle the excess variety thrown at them. This also answers one part of the question posed in the title of this post – a model ceases to be a model when it is the same as the phenomenon it models. The second part of the answer is that the model has to have some similarities to the phenomenon, and this is entirely dependent on the observer and what they want.

This brings me to the next important point – we can only manage models. We don’t manage the actual phenomenon; we only manage the models of the phenomenon in our heads. The reason, again, is that we lack the ability to manage the variety thrown at us.

The eminent management cybernetician, Stafford Beer, has the following words of wisdom for us:

Instead of trying to specify it in full detail, you specify it only somewhat. You then ride on the dynamics of the system in the direction you want to go.

To paraphrase Ashby, we need not collect more information than is necessary for the job. We do not need to attempt to trace the whole chain of causes and effects in all its richness, but attempt only to relate controllable causes with ultimate effects.

The final aspect of model-making is to take into consideration the temporary nature of the model. Again, paraphrasing Ashby – We should not assume the system to be absolutely unchanging. We should accept frankly that our models are valid merely until such time as they become obsolete.

Final Words:

We need a model of the phenomenon to manage the phenomenon. And how we model the phenomenon depends upon our ability as the observer to manage variety. We need only choose the specific variables that we want. Perhaps I can explain this further with the deep philosophical question – if a tree falls in a forest and no one is around to hear it, does it make a sound? The answer to a cybernetician should be obvious at this point: whether there is a sound or not depends on the model you have, and on whether the tree falling making a sound has any value to you.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was The Maximum Entropy Principle:

The Maximum Entropy Principle:

In today’s post, I am looking at the Maximum Entropy principle, a brainchild of the eminent physicist E. T. Jaynes. The idea is based on Claude Shannon’s Information Theory. The Maximum Entropy principle (an extension of the Principle of Insufficient Reason) is the ideal epistemic stance: loosely put, we should model only what is known, and we should assign maximum uncertainty to what is unknown. To explain this further, let’s look at the example of a coin toss.

If we don’t know anything about the coin, our prior assumption should be that heads and tails are equally likely. This is a stance of maximum entropy. If we assumed that the coin was loaded, we would be “loading” our assumption model and claiming unfair certainty. Entropy is a measure proposed by Claude Shannon as part of his information theory. Low entropy messages have low information content or low surprise content. High entropy messages, on the other hand, have high information content or high surprise content. Information content is also inversely related to the probability of an event: low probability events carry high information. For example, an unlikely defeat of a reigning sports team generates more surprise than a likely win. Entropy is the average level of information when we consider all of the probabilities. In the case of the coin toss, the entropy is the average level of information when we consider the probability of heads or tails.

For discrete events, entropy is maximum when the events are equally likely, in other words for the uniform distribution. Thus, when we say that the probability of heads or tails is 0.5, we are assuming a maximum entropy model. In the case of the uniform distribution, the maximum entropy model coincides with Laplace’s principle of insufficient reason. If the coin always landed on heads, we would have a zero-entropy case because no new information is available. If it is a loaded coin that makes one side more likely to occur, then the entropy is lower than for a fair coin. This is shown below, where the X-axis is the probability of Heads, and the Y-axis is the information entropy. We can see that Pr(0) (no Heads) and Pr(1) (100% Heads) have zero entropy. The highest value of entropy occurs when the probability of heads is 0.5 or 50%. For those who are interested, John von Neumann had a great idea to make a loaded coin fair. You can check that out here.
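For readers who want to reproduce that curve, here is a minimal Python sketch of the binary entropy function, printed as a table rather than a plot:

```python
from math import log2

def binary_entropy(p):
    """Entropy, in bits, of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    print(f"Pr(Heads) = {p:.1f}  ->  H = {binary_entropy(p):.3f} bits")
# H peaks at 1 bit when p = 0.5 and falls to 0 at p = 0 and p = 1.
```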

From this standpoint, if we take a game, where one team is more favored to win, we could say that the most informative part of a game is sometimes the coin toss.

Let’s consider the case of a die. There are six possible events (1 through 6) when we roll a die. The maximum entropy model is to assume a uniform distribution, i.e., to assign 1/6 as the probability for each value 1 through 6. Now suppose we somehow knew that 6 is more likely to happen – for example, the manufacturer of the loaded die says that the number 6 occurs 3/6 of the time. Per the maximum entropy model, we should divide the remaining 3/6 equally among the remaining five numbers. With each additional piece of information, we should change our model so that the entropy is at its maximum. What I have discussed here is the basic information regarding maximum entropy. Each new piece of “valid” information that we need to incorporate into our model is called a constraint. The maximum entropy approach utilizes Lagrangian multipliers to find the solutions. For discrete events with no additional information, the maximum entropy model is the uniform distribution. In a similar vein, if you are looking at a continuous distribution and you know the mean and variance of the distribution, the maximum entropy model is the normal distribution.
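Here is that die example worked out in a few lines of Python, taking the manufacturer’s constraint at face value (P(6) = 3/6, with the remainder spread uniformly):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

uniform = [1/6] * 6                  # no information: maximum entropy
constrained = [0.5 / 5] * 5 + [0.5]  # P(6) fixed at 3/6; rest spread evenly

print(entropy(uniform))      # ~2.585 bits
print(entropy(constrained))  # ~2.161 bits: the constraint lowers the maximum
```

Each valid constraint reduces the achievable maximum entropy, which is exactly what it means to incorporate new knowledge into the model.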

The Role of The Observer:

Jaynes asked a great question about the information content of a message. He noted:

In a communication process, the message m(i) is assigned probability p(i), and the entropy H, is a measure of information. But WHOSE information?… The probabilities assigned to individual messages are not measurable frequencies; they are only a means of describing a state of knowledge.

The general idea of probability in the frequentist version of statistics is that it is fixed. In the Bayesian version, however, the probability is not a fixed entity; it represents a state of knowledge. Jaynes continues:

Entropy, H, measures not the information of the sender, but the ignorance of the receiver that is removed by the receipt of the message.

To me, this brings up the importance of the observer and circularity. As the great cybernetician Heinz von Foerster said:

“The essential contribution of cybernetics to epistemology is the ability to change an open system into a closed system, especially as regards the closing of a linear, open, infinite causal nexus into closed, finite, circular causality.”

Let’s go back to the example of a coin. If I am an alien and I know nothing about coins, should my maximum entropy model include only the two possibilities of heads and tails? Why should it not include the coin landing on its edge? Or, if a magician is tossing the coin, should I account for the coin vanishing into thin air? The assumption of just two possibilities (heads or tails) is the prior information we are accounting for when we say that the probability of heads or tails is 0.5. As we gain more knowledge about the coin toss, we can update the model to reflect it, and at the same time change the model to a new state of maximum entropy. This iterative, closed-loop process is the backbone of scientific enquiry and skepticism. The use of the maximum entropy model is a stance that we take to state our knowledge. Perhaps a better way to explain the coin toss is this – given our lack of knowledge about the coin, we are saying that heads is not more likely to happen than tails until we find more evidence. Let’s look at another interesting example where I think the maximum entropy model comes up.

The Veil of Ignorance:

The veil of ignorance is an idea about ethics proposed by the great American political philosopher John Rawls. Loosely put, in this thought experiment, Rawls asks what kind of society we should aim for. Rawls asks us to imagine that we are behind a veil of ignorance, where we are completely ignorant of our natural abilities, societal standing, family etc. We are then randomly assigned a role in society. The big question then is – what should society be like so that this random assignment promotes fairness and equality? The random assignment is a maximum entropy model, since any societal role is equally likely.

Final Words:

The maximum entropy principle is a way of saying not to put all of your eggs in one basket. It is a way to be aware of your biases, and it is an ideal position for learning. It is similar to Epicurus’ principle of Multiple Explanations, which says – “Keep all the different hypotheses that are consistent with the facts.”

It is important to understand that “I don’t know” is a valid and acceptable answer. It marks the boundary for learning.

Jaynes explained maximum entropy as follows:

The maximum entropy distribution may be asserted for the positive reason that it is uniquely determined as the one which is maximally noncommittal with regard to missing information, instead of the negative one that there was no reason to think otherwise… Mathematically, the maximum entropy distribution has the important property that no possibility is ignored; it assigns positive weight to every situation that is not absolutely excluded by the given information.

We learned that probability and entropy are dependent on the observer. I will finish off with the wise words from James Dyke and Axel Kleidon.

Probability can now be seen as assigning a value to our ignorance about a particular system or hypothesis. Rather than the entropy of a system being a particular property of a system, it is instead a measure of how much we know about a system.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Destruction of Information/The Performance Paradox:

Destruction of Information/The Performance Paradox:

Ross Ashby was one of the pioneers of Cybernetics. His 1956 book, An Introduction to Cybernetics, is still one of the best introductions to Cybernetics. As I was researching his journals, I came across an interesting phrase – “destruction of information.” Ashby noted:

I am not sure whether I have stated before my thesis – that the business of living things is the destruction of information.

Ashby gave several examples to explain what he meant by this. Here is one:

Consider a thermostat controlling a room’s temperature. If it is working well, we can get no idea, from the temperature of the room, whether it is hot or cold outside. The thermostat’s job is to stop this information from reaching the occupant.

He also gave the example of an antiaircraft gun and its predictor. Suppose we observe only the error made by each shell in succession. If the predictor is perfect, we shall get the sequence 0, 0, 0, 0, etc. By examining this sequence, we can get no information about how the aircraft maneuvered. Contrast this with the record of a poor predictor: 2, 1, 2, 3… -3, 0, 3 etc. By examining this, we can get quite a good idea of how the pilot maneuvered. In general, the better the predictor, the less the maneuvers show in the errors. The predictor’s job is to destroy this information.
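One way to make Ashby’s example concrete is to compute the empirical entropy of each error sequence – the poor predictor’s errors carry bits of information that the perfect predictor destroys. A sketch:

```python
from collections import Counter
from math import log2

def empirical_entropy(sequence):
    """Shannon entropy, in bits, of the observed error values."""
    n = len(sequence)
    return -sum(c / n * log2(c / n) for c in Counter(sequence).values())

perfect_predictor = [0, 0, 0, 0, 0, 0, 0]  # reveals nothing about the pilot
poor_predictor = [2, 1, 2, 3, -3, 0, 3]    # leaks the maneuvers

print(empirical_entropy(perfect_predictor))  # 0.0 bits -- information destroyed
print(empirical_entropy(poor_predictor))     # ~2.24 bits -- maneuvers readable
```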

As observers, we learn about a living system or a phenomenon from the variety it displays. Here, variety can be loosely expressed as the number of distinct states a system has. Interestingly, the variety depends both on the system demonstrating it and on the observer’s ability to distinguish the different states. If the observer is not able to make the needed number of distinctions, less information is generated. On the other hand, if the system of interest is able to hide its different states, it minimizes the amount of information available to the observer. In this post, we are interested in the latter category. Ashby gives an interesting example to further this idea:

An insect whose coloration makes it invisible will not show, by its survival or disappearance, whether a predator has or has not seen it. An imperfectly colored one will reveal this fact by whether it has survived or not.

Another example, Ashby gives is that of an expert boxer:

An expert boxer, when he comes home, will show no signs of whether he had a fight in the street or not. An imperfect boxer will carry the information.

Ashby’s idea can also be looked at from an adaptation standpoint. When you adapt very well to your ever-changing surroundings, you are destroying information – you are not giving any information away. Ashby also noted that adaptation means “destroying information.” In this manner, you know that you are adapting well when you don’t break a sweat. A master swordsman moves effortlessly while defeating an opponent. A good runner is not out of breath after a quick sprint.

The Performance Paradox:

My take on this idea from Ashby is to express it as a form of performance paradox – when something works really well, you will not notice it, or worse, you will think that it’s wasteful. The most effective and highly efficient components stay the quietest. The best spy is the one you have never heard of. When you try to monitor a highly performing component, you may rarely get evidence of its performance; it is almost as if it is wasteful. Another way to view this is – the imperfect components lend themselves to being monitored, while the perfect components do not. The danger in not understanding regulation from a cybernetics standpoint is to completely misread the interactions and assume that the perfect component has no value.

I encourage the reader to read more about these ideas here:

Edit (12/1/2020): Adding more clarity on “destruction of information”.

The phrase “destruction of information” was used by Ashby in a Shannon entropy sense. He is indicating that the agent purposefully reduces the information entropy that would otherwise have been available. Another example is that of a good poker player, who is difficult to read.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Locard’s Exchange Principle at the Gemba:

The Truth About True Models:

I recently came across Dr. Donald Hoffman’s idea of Fitness-Beats-Truth or FBT Theorem. This is the idea that evolution stamps out true perceptions. In other words, an organism is more likely to survive if it does not have a true and accurate perception. As Hoffman explains it:

Suppose there is an objective reality of some kind. Then the FBT Theorem says that natural selection does not shape us to perceive the structure of that reality. It shapes us to perceive fitness points, and how to get them… The FBT Theorem has been tested and confirmed in many simulations. They reveal that Truth often goes extinct even if Fitness is far less complex.

Hoffman suggests that natural selection did not shape us to perceive the structure of an objective reality. Evolution gave us a less complex but efficient perceptual network that takes shortcuts to perceive “fitness points.” Evolution by natural selection does not favor true perceptions—it routinely drives them to extinction. Instead, natural selection favors perceptions that hide the truth and guide useful action.

An easy way to digest this idea is to consider our ancient ancestors. If they heard a rustling sound in the grass, it benefitted them not to analyze and capture the entire surroundings to get an accurate and true model of reality. Instead, they would survive only if they got a “quick and dirty” or good-enough model of the surroundings. They did not gain anything by having an elaborate and accurate perception. Quick and dirty heuristics such as “if you hear a rustling in the grass, then flee” allowed them to survive and pass on their genes. In other words, their fitter perception was not a true and accurate perception of the world around them. They gained (they survived) based on fitness rather than truth. As Hoffman noted, having true perception would have been detrimental because it avoided shortcuts and heuristics that saved time. As complexity increases, heuristics work much better.

The idea of FBT aligns pretty well with the ideas of second order cybernetics (SOC) and radical constructivism. From an SOC standpoint, the emphasis for the representation of the world is not that of a model of causality, but of a model of constraints. As Ernst von Glasersfeld explains this:

In the biological theory of evolution, we speak of variability and selection, of environmental constraints and of survival. If an organism survives individually or as a species it means that, so far at least, it has been viable in the environment in which it happens to live. To survive, however, does not mean that the organism must in any sense reflect the character or the qualities of his environment. Gregory Bateson (1967) was the first who noticed that this theory of evolution, Darwin’s theory, is really a cybernetic theory because it is based on the concept of constraint rather than on the concept of causation.

In order to remain among the survivors, an organism has to ‘‘get by” the constraints which the environment poses. It has to squeeze between the bars of the constraints, to coin a metaphor. The environment does not determine how that might be achieved. It does not cause certain organisms to have certain characteristics or capabilities or to be a certain way. The environment merely eliminates those organisms that knock against its constraints. Anyone who by any means manages to get by the constraints, survives… All the environment contributes is constraints that knock out some of the changed organisms while others are left to survive. Thus, we can say that the only indication we may get of the ‘‘real” structure of the environment is through the organisms and the species that have been extinguished; the viable ones that survive merely constitute a selection of solutions among an infinity of potential solutions that might be equally viable.

Nature prefers efficient solutions that do the work most of the time over effective solutions that work all of the time – solutions with the least energy expenditure, the least number of parts etc. This approach also resonates with Occam’s razor: it is always advisable to have the least number of assumptions in your model. Another way to look at this is – the design with the least number of moving parts is always preferred.

The idea that true perceptions are not always advantageous may be counterintuitive. As complexity increases, we lack the perceptual network to truly comprehend that complexity. How we perceive the world around us depends a lot on our perceptual network, which is unique to our species. Our perceived reality omits most of the attributes of the world around us. As Hoffman explains – reality becomes simply a species-specific representation of fitness points on offer, and of how we can act to get those points. Evolution has shaped us with perceptions that allow us to survive, but part of that involves hiding from us the stuff we don’t need to know.

Complexity also favors this approach of viable solutions/fitter perceptions. Hoffman notes:

We find that increasing the complexity of objective reality, or perceptual systems, or the temporal dynamics of fitness functions, increases the selection pressures against veridical perceptions.

I will add more thoughts on the FBT theorem at a later time. I encourage the readers to check out Hoffman’s book, The Case Against Reality.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Talking about Constraints in Cybernetics:

Talking about Constraints in Cybernetics:

In today’s post, I am looking at constraints with respect to Cybernetics. I am looking mainly at the ideas of Ross Ashby, one of the pioneers of Cybernetics, who wrote one of the best introductions to the field, aptly titled An Introduction to Cybernetics. Ashby described constraints in terms of variety. Variety is the number of distinct elements that an observer is capable of distinguishing. For example, consider the following set of elements:

{a, b, b, B, c, C}

Someone could say that the variety of this set is 3, since there are three letters. Another person could say that the variety is actually 5, if lower and upper cases are distinguished. A very common example to explain variety is a traffic stop light. Generally, a stop light in the US has 3 states (Red, Yellow and Green). Sometimes additional states are possible, such as blinking Red (indicating a STOP sign) or no light. Thus, the variety of a stop light can vary from 3 to 4 to 5.
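The two counts in the set example can be checked in a couple of lines of Python, with the choice of observer expressed as the choice of what counts as distinct:

```python
elements = ["a", "b", "b", "B", "c", "C"]

# An observer who ignores case makes three distinctions...
print(len({e.lower() for e in elements}))  # 3

# ...while a case-sensitive observer makes five.
print(len(set(elements)))                  # 5
```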

Ashby explained constraints as follows – when there are two related sets and one set has less variety than the other, we can determine that a constraint is present in the set with less variety. Let’s consider the stop light again. If all the lights were independent, we would have 8 possible states. This is shown below, where “X” means OFF and “O” means ON.

Figure 1 – The Eight States of a Stop Light

Per our discussion above, we utilize mainly 3 of these states to control traffic (ignoring the blinking states). These are identified in the blue shaded cells {2, 6, 7}. Thus, we can say that a constraint is applied on the stop light, since the actual variety the stop light possesses is 3 instead of 8. Ashby distinguishes between slight and severe constraints. The example that Ashby gives is applying a constraint on a squad of soldiers in a single rank. The soldiers can be made to stand in numerous ways. For example, if the constraint to be applied is that no soldier is to stand next to another soldier who shares the same birthday, the variety achieved is still high, since it is unlikely that two soldiers in a small group share the same birthday. This is an example of a slight constraint. However, if the constraint is that the soldiers should arrange themselves in order of their height, the variety is highly reduced. This is an example of a severe constraint.
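A quick sketch of the stop light constraint, assuming the three states used for traffic are the ones with exactly one light on (matching the Red/Yellow/Green rows shaded in Figure 1):

```python
from itertools import product

# All 2**3 = 8 conceivable states of three independent lights.
all_states = list(product("XO", repeat=3))  # "X" means OFF, "O" means ON
print(len(all_states))  # 8

# The constraint: traffic control uses only states with exactly one light on.
used_states = [s for s in all_states if s.count("O") == 1]
print(len(used_states))  # 3 -- the constraint reduces the variety from 8 to 3
```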

Another example that Ashby gives is that of a chair. A chair taken as a whole has six degrees of freedom for movement. However, when the chair is disassembled into its parts, the freedom for movement increases. Ashby said:

A chair is a thing because it has coherence, because we can put it on this side of a table or that, because we can carry it around or sit on it. The chair is also a collection of parts. Now any free object in our three-dimensional world has six degrees of freedom for movement. Were the parts of the chair unconnected each would have its own six degrees of freedom; and this is in fact the amount of mobility available to the parts in the workshop before they are assembled. Thus, the four legs, when separate, have 24 degrees of freedom. After they are joined, however, they have only the six degrees of freedom of the single object. That there is a constraint is obvious when one realizes that if the positions of three legs of an assembled chair are known, then that of the fourth follows necessarily—it has no freedom.

Thus, the change from four separate and free legs to one chair corresponds precisely to the change from the set’s having 24 degrees of freedom to its having only 6. Thus, the essence of the chair’s being a “thing”, a unity, rather than a collection of independent parts corresponds to the presence of the constraint.
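To make the counting in the quoted passage concrete (the numbers are Ashby’s):

```python
# A minimal sketch of Ashby's chair example: degrees of freedom before and after assembly.
FREE_BODY_DOF = 6   # a free rigid body in 3-D space: 3 translations + 3 rotations
LEGS = 4

unassembled = LEGS * FREE_BODY_DOF  # 24: each separate leg moves independently
assembled = FREE_BODY_DOF           # 6: the assembled chair moves as a single object

# The constraint corresponds to the 18 degrees of freedom that assembly removes
print(unassembled, assembled, unassembled - assembled)  # 24 6 18
```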

Ashby continued:

Seen from this point of view, the world around us is extremely rich in constraints. We are so familiar with them that we take most of them for granted, and are often not even aware that they exist. To see what the world would be like without its usual constraints we have to turn to fairy tales or to a “crazy” film, and even these remove only a fraction of all the constraints.

There are several takeaways we can have from Ashby’s explanation of constraints.

  1. The effect of the observer: The observer is king when it comes to cybernetics. The variety of an observed system depends on the observer. This means that the observation is subject to the constraints that the observer applies, knowingly or unknowingly, in the form of biases, beliefs, etc. The observer brings and applies internal constraints on the external world. Taking this a step further, our experiential reality is the result of our limited perceptual network. For example, we can see only a small section of the light spectrum, and we can hear only a small section of the sound spectrum. We have cognitive blind-spots that we are not aware of. And yet we claim access to an objective reality, and we are surprised when people don’t understand our point of view. We should not impose our own views to the point of creating false dichotomies. This is sadly all too prevalent in today’s politics, where almost every matter has been turned into a political viewpoint.
  2. Constraints are not a bad thing: Ashby’s great insight was that when a constraint exists, we can take advantage of it. We can make reasonably good predictions when constraints exist. Constraints help us understand how things work. Ashby said that every law of nature is a constraint. We are able to estimate the variety that would exist if total independence occurred, and we are able to minimize this variety by understanding the existing variety and adding further constraints where possible to produce the results we want. Adding constraints is about reducing unwanted variety; Design Engineering makes full use of this. On a similar note, Ashby also pointed out that learning is possible only to the extent that a sequence shows constraint. If we are to learn a language, we learn it by learning the constraints that exist in the language in the form of syntax, meanings of words, grammar, etc.
  3. Law of Requisite Variety: Ross Ashby came up with the Law of Requisite Variety. This law can simply be explained as variety destroys (compensates for) variety. For example, a good swordsman is able to fend off an opponent if they are able to block and counter-attack every move of the opponent. The swordsman has to match the variety of the opponent (the set of attacks and blocks). To take our previous example, the stop light has to have requisite variety to control traffic. If the 3 states identified in Figure 1 are not enough, the “system” will absorb the variety in the form of a traffic jam. When we think in terms of constraints, the requisite variety should be aligned with the identified constraints. We should minimize bringing in our internal constraints and watch for the external constraints that exist. The variety that we need to match must be aligned to the constraints already existing. (A minimal sketch of this law in action follows this list.)
  4. Constraints do not need to be Objects: Similar to point 1, the narratives and stories we tell ourselves are also constraints. We are Homo Narrans – storytellers. We make sense of the world through the stories we share and tell ourselves and others. We control ourselves and others with the stories we tell. We limit ourselves with what we believe. If we can understand the stories we tell ourselves, and the stories others are telling us, we can better ourselves.
  5. Adaptation or Fit: Ashby realized that an organism can adapt just so far as the real world is constrained, and no further. Evolution is about fit. It is about supporting those factors that allow the organism to match the constraints in order to survive. The organism evolves to match the changing constraints present in the changing environment. This often happens through finding use for what already exists. The Cybernetician and Radical Constructivist Ernst von Glasersfeld gives a great example – the way a key fits a lock that it is able to open:

The fit describes a capacity of the key, not a property of the lock. When we face a novel problem, we are in much the same position as the burglar who wishes to enter a house. The “key” with which he successfully opens the door might be a paper clip, a bobby pin, a credit card, or a skillfully crafted skeleton key. All that matters is that it fits within the constraints of the particular lock and allows the burglar to get in.
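As promised in point 3 above, here is a minimal sketch of the Law of Requisite Variety at work. The disturbance/action outcome table is hypothetical and purely illustrative – it is not from Ashby’s text:

```python
# A minimal sketch of a regulation game with a hypothetical outcome table.
# Rows are disturbances D, columns are regulator actions A; an entry is the
# outcome at the essential variables E. The regulator has requisite variety
# when, for every disturbance, some action produces the goal outcome.

outcomes = {
    ("d1", "a1"): "ok",  ("d1", "a2"): "bad", ("d1", "a3"): "bad",
    ("d2", "a1"): "bad", ("d2", "a2"): "ok",  ("d2", "a3"): "bad",
    ("d3", "a1"): "bad", ("d3", "a2"): "bad", ("d3", "a3"): "ok",
}
disturbances = {"d1", "d2", "d3"}
actions = {"a1", "a2", "a3"}

def has_requisite_variety(goal="ok"):
    # Every disturbance must be compensable by at least one available action.
    return all(any(outcomes[(d, a)] == goal for a in actions) for d in disturbances)

print(has_requisite_variety())  # True: three actions absorb three disturbances

# Remove an action and the regulator can no longer absorb every disturbance.
actions.discard("a3")
print(has_requisite_variety())  # False: d3 can no longer be compensated
```

The swordsman analogy maps directly onto this sketch: each disturbance is an attack, each action is a block, and viability requires a block for every attack.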

I will finish with Ernst von Glasersfeld’s description of Cybernetics in terms of constraints:

Cybernetics is not interested in causality but constraints. Cybernetics is the art of maintaining equilibrium in a world of constraints and possibilities.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was Deconstructing Systems – There is Nothing Outside the Text:

Deconstructing Systems – There is Nothing Outside the Text:

In today’s post, I am looking at the ideas of the famous Algerian-French philosopher Jacques Derrida. Derrida is often described as a post-structuralist philosopher. His most famous idea is deconstruction. Deconstruction is often associated with analyzing literary works. The basic notion of deconstruction can be loosely explained as follows: when a text is produced, the author dies and the reader is born. A text is presented as a coherent whole with a basic idea at the center. The language in the text is all about the idea at the center. The assumption is that the central idea has a fixed meaning. The point of deconstruction is to disturb this coherent whole and challenge its hierarchy. The intent of deconstruction is discovery – the discovery of what is hidden behind the elaborate plot to stage the central idea. It is an attempt to subvert the dominant theme.

Deconstruction is taking the text apart to understand the structure of the text as it is written, and to determine its meaning in several different ways by challenging the hierarchy put in focus by the author. Derrida believed that in language we always prefer hierarchies. We prefer good over bad, or day over night, etc. Most often, this focus on hierarchies results in believing them to be the ultimate truth. We tend to think in terms of false dichotomies. It has to be “this” or “that”. If I don’t do “this”, I am “bad”. Deconstruction pushes us to look at the text from another side or perspective. Deconstruction challenges the notion that language is a closed system – that the meaning is fixed. Derrida viewed language as an open system, where meaning is not fixed and can depend on the context, the culture and the social realm in which it was constructed. Every perspective is an attempt to focus on certain ideas, but in the act of doing this, we are forced to ignore certain other ideas. The act of deconstruction is an attempt to look at the ideas that lie concealed in the text.

Another important idea that Derrida put forward was differance. Derrida came up with this as a play on words, putting two different ideas together into one word. The two ideas are difference (how one word gets its meaning by being different from another) and deferral (how the meaning of a word is provided in terms of yet more words). The idea of differance is that the complete meaning is always deferred (postponed) and is also differential. The dictionary is a great example of differance. The meaning of a word is given in terms of other words. The meaning of those words is given in terms of yet another set of words, and so on.

Derrida’s most famous quotation is – Il n’y a pas de hors-texte. This is often translated as “There is nothing outside the text.” The idea is misrepresented as claiming that all ideas are contained in language and that you cannot go outside language. Derrida was not saying this. A better translation is – There is no outside-text. Here, the outside-text refers to an inset in a book, something that is provided as a supplement to give clarity. We can see this as an outside authority trying to shed light on the book. Derrida is saying that there is no such thing. The meaning is not fixed, and what is presented as a closed system is actually an open system. We have to understand the historicity and context of the text to gain better understanding. Derrida is inviting us to feel the texture of the text. As Alex Callinicos explained it:

Derrida wasn’t, like some ultra-idealist, reducing everything to language (in the French original he actually wrote ‘Il n’y a pas de hors-texte’ – ‘There is no outside-text’). Rather he was saying that once you see language as a constant movement of differences in which there is no stable resting point, you can no longer appeal to reality as a refuge independent of language. Everything acquires the instability and ambiguity that Derrida claimed to be inherent in language.

Derrida says that every text deconstructs itself. Every text has contradictions, and the author has written the text in a forceful manner to steer it away from its internal contradictions. Derrida is inviting us to challenge the coherence of the text by pulling on the central idea and supplementing it to distort the balance. Paul Ricoeur wonderfully explained deconstruction as an act that uncovers the questions behind the answers already provided in the text. The answers are already there; our job is then to find the questions. We cannot assume that we have understood the entire meaning of the text. We have to undo what we have learned and try to feel the texture of the relations of the words to each other in the text.

Derrida was influenced by the ideas of Ferdinand de Saussure, who was a pioneer of the movement called Structuralism. Structuralism presents language as a self-enclosed system in which the important relationships are not those between words and the real objects to which they refer, but rather those internal to language, consisting in the interrelations of signifiers. Saussure stated that in language there are only differences. Derrida went a step further than this. He challenged the idea that the continuous movement of differences and the postponement of meaning could ever come to rest, as structuralism implied. Callinicos explained this beautifully:

There is no stable halting point in language, but only what Derrida called ‘infinite play’, the endless slippages through which meaning is sought but never found. The only way to stop this play of difference would be if there were what Derrida called a ‘transcendental signified’ – a meaning that exists outside language and that therefore isn’t liable to this constant process of subversion inherent in signification. But the transcendental signified is nothing but an illusion, sustained by the ‘metaphysics of presence’, the belief at the heart of the western philosophical tradition that we can gain direct access to the world independently of the different ways in which we talk about and act on it…

He (Derrida) believed that it was impossible to escape the metaphysics of presence. Meaning in the shape of the ‘transcendental signified’ may be an illusion, but it is a necessary illusion. Derrida summed this tension up by inventing the word ‘differance’, which combines the meanings of ‘differ’ and ‘defer’. Language is a play of differences in which meaning is endlessly deferred, but constantly posed. The idea of differance informed Derrida’s particular practice of philosophy, which he called deconstruction. The idea was to scrutinize texts – particularly philosophical classics – to expose both how they participated in the metaphysics of presence and also the flaws and tensions through which the limitations of this way of thinking were revealed. As a result, these texts would end up very different from how they had seemed when Derrida started on them: they would have been dismantled – deconstructed.

Deconstructing Systems:

At this point, I will look at deconstructing Systems. The idea of a System is very much aligned with the ideas of Structuralism. A system is viewed as a whole with interconnected parts working together. The focus is on the benefit of the whole. The whole is the central idea of Systems Thinking. The whole is said to be more than the sum of its parts. The parts must be subservient to the whole.

When we approach systems with the ideas of deconstruction, we realize that every system is contingent on who is observing it. There is no system without an observer. This makes all systems human systems. We have to consider the role of the observer and the impossibility of an objective world. As the famous Cybernetician Klaus Krippendorff said – whatever is outside our nervous system is accessible only through our nervous system, and cannot be observed directly and separated from how that nervous system operates. We may refer to and talk about the same “system.” However, what constitutes the system, its complexity and what we desire its purpose to be all depend upon the observer. All systems are constructed in a social realm. After all, meaning is assigned in the social realm, where we bring forth the world together through “languaging.” What the whole is, and whether a part should be subservient to the whole, depends upon who constructs the system as a mental construct to make sense of the world. If you consider the healthcare system, what it means and what it should do depend on who you talk to. If you talk to the healthcare provider, the insurance company or the patient, you will get different answers as to what the healthcare system means and what it should be doing. There is no one objective healthcare system. We can all identify the parts, but what the “system” means cannot be objectively identified. We must look at it from different perspectives to challenge the metanarratives. We should welcome multiple perspectives. Every perspective reveals certain attributes that were hidden before – a process that, knowingly or unknowingly, requires hiding certain other attributes. From this discussion, we might say that the center does not hold in systems.

There are many similarities between the hard systems approach of Systems Thinking and Structuralism. We talk of systems as if they are real and everyone can objectively view and understand them. Gavin P. Hendricks sheds some light on this:

Structuralism argues that the structure of language itself produces ‘reality’. That homo sapiens (humans) can think only through language and, therefore, our perceptions of reality are determined by the structure of language. The source of meaning is not an individual’s experiences or being but signs and grammar that govern language. Rather than seeing the individual as the center of meaning, structuralism places the structure at the center. It is the structure that originates or produces meaning, not the individual self. Meaning does not come from individuals but from the socially constructed system that governs what any individual can do.

Derrida’s ideas obviously reject the notions put forth by Structuralism. Derrida’s ideas support pluralism. There is no outside-text does not mean that there is no text for us to process. It means that the text can be interpreted in multiple meaningful ways. Of course, this does not mean that all of them are valid – that would be relativism. As Derrida said, meaning is made possible by relations of words to other words within the network of structures that language is. The different meanings generated through deconstruction (pluralism) are meaningful to those who generated them. This idea is something that we need to bring back to the front of Systems Thinking. Derrida invites us to dissolve the hierarchy of the whole in the system that we have created, and to look at the part that we have marginalized in our system. When we view the part from another perspective, we suddenly realize that the center of our system does not align with the center of the new, different view.

I will finish with wise words from Richard Rorty:

There is nothing deep down inside us except what we have put there ourselves.

The corollary, of course, is – there is nothing out there giving us meaning or purpose, except that which we have constructed ourselves.

Please maintain social distance and wear masks. Stay safe and Always keep on learning…

In case you missed it, my last post was When a Machine Breaks…: