What Actually Is ‘Information’ — And How Does The Visual System Encode It?

Gabriel A. Silva
Sep 13, 2021
Where is the ‘information’ in what you see? Credit: Getty Images

We take in information from the world around us through our five senses. All the information the brain has access to, and bases decisions on, is the result of each sensory system taking in some form of energy from the environment and converting it — transducing it — into a set of neural signals the brain can interpret and understand. But what exactly is ‘information’? How can it be defined in a quantitative way? And what does the brain do with it? Even more fundamentally, how do the sensory systems and the brain actually ‘take in’ information in the first place? Let’s explore these questions in the context of the visual system.

From light to sight

Lining the back of each eye is the neural sensory retina, a thin, tissue paper-like piece of the brain made up of functionally distinct layers of highly specialized neurons and other types of neural cells. The retina is literally an extension of the brain that reaches out to each eye, connected to the rest of the brain via the optic nerves.

The photons that make up the light you see enter each eye through the pupil and are focused onto the retina by the optics of the eye, the cornea and the lens. If your eyes can’t do that on their own, and the focal point falls off the surface of the retina, your vision is blurry. Glasses and contact lenses add extra optical layers that correct the bending of light so that the focus on each retina is as sharp as possible.

At its simplest, a photon is a tiny, discrete packet of energy. When incoming photons are ‘absorbed’ by the retina, what that really means is that they interact in a very specific way with the photoreceptor neurons that make up one of the layers of the retina. Photoreceptors contain a specialized molecule called a photopigment. The photopigments are responsible for converting photons into the electrical and chemical signals that represent the internal language of the brain. This conversion process is called phototransduction.

When a photon of light is absorbed by a photopigment molecule in a photoreceptor, the energy in the photon is used to break a specific chemical bond in the photopigment. When that chemical bond breaks, the photopigment ‘relaxes’ and changes shape. This is what’s referred to as a conformational change in the structure of the molecule.

Imagine a photopigment molecule as a set of tinker toys made up of different connector pieces. If you remove just the right connector piece, you can move and bend and change the shape in new ways. In other words, it has additional degrees of freedom. Something similar happens when the energy from a photon breaks a specific type of chemical bond in a photopigment molecule.

The consequence of a photopigment changing shape is that it sets off a series of biochemical reactions in the photoreceptor that signals the arrival of the photon. That message then gets passed on to other neurons in the retina, and eventually it travels to other parts of the brain that are responsible for putting all that visual information together in order to create a mental model of the outside physical world.

Everything you see and experience, the entire richness of the visual world around you, always begins with the breaking of that one specific chemical bond in photoreceptors, in exactly the same way. Billions and billions of times over.

But notice how the actual phototransduction event itself is very stereotyped. The same thing happens in a completely predictable way. It seems rather repetitive, even boring. So where exactly, then, is all the ‘information’ your visual system takes in? How is it encoded by these events? After all, despite the complexity of the brain, its ability to process sensory information into meaningful mental models of the outside world, and into decisions, ultimately depends on the richness of the raw information it has access to in the first place.

What exactly is ‘information’ and how can it be measured?

In 1948 an engineer at Bell Labs by the name of Claude Shannon published a paper that built on previous work by Harry Nyquist and Ralph Hartley dating back to the 1920s. The ideas that emerged are the basis of an entire field of study today called information theory.

Information theory is so important that the entire modern digital world stands on its shoulders. Subsequent seminal work by Alan Turing, Norbert Wiener, and John von Neumann, among others, resulted in modern-day computers and eventually the internet.

The genius behind Shannon’s ideas was the separation of the meaning or semantics of a message from the physical processes that carried it. The result is a mathematically objective measure of the amount of information in a message — independent of what the message actually means.

The actual meaning of a message is a subjective interpretation on the part of the receiver of the message. It is subjective because the amount of value one receiver versus another is able to extract from a message will vary. For example, a physics student listening to a lecture on quantum mechanics will (hopefully) derive significant value from the lecture. But a small child presented with the same ‘information’ would get a lot less out of it.

‘Bits’ of information are child’s play

The key idea behind how to measure information, without caring about the meaning of a message, is best understood by using a game as an example. The rules are similar to those of ‘20 questions’: imagine you’re given a set of possible messages (the message space) to choose from, where one of the messages is the ‘chosen’ or ‘correct’ message determined by another player. How many ‘yes’ or ‘no’ questions do you need to ask in order to arrive at the ‘right’ answer, i.e. the chosen message out of the set?

For example, assume the message space is the set of numbers from one to ten. The other person picks one number and your job is to ask the minimum number of ‘yes’ or ‘no’ questions needed to figure out what number the other player decided to pick. Say it was the number ‘seven’. You could start by asking “is it one?”, to which the other player would answer “no”. You could then ask “is it two?”, and so on until you hit on the right number.

But this is a very inefficient way of asking ‘yes-no’ questions, one that can force you to ask more questions than are actually needed. You could get lucky and hit on the right answer with one question, say if the number chosen was ‘one’ and you started at the bottom. Or four questions if, as in the original example, the answer was ‘seven’ and you started at the top with ‘ten’ and worked your way down. But it would take seven questions if you started at the bottom with ‘is it one?’. And as many as ten questions if the chosen number was ‘ten’.

A more efficient way to ask ‘yes-no’ questions is to first ask “is the number greater than five?”. If yes, you could then ask, for example, “is it greater than seven?”. If the chosen number was seven the answer would be ‘no’, and you could then ask “is it six?”. Again the answer would be ‘no’, and that would be it: you would have your answer and know that the correct number was ‘seven’. As above, it’s a process of elimination via ‘yes-no’ questions, but a more efficient one, because each question progressively halves the message space until there is no more uncertainty about the possible choices of messages.

In this example it took three ‘yes-no’ questions. If you play the game many times with this halving strategy, it takes about 3.4 questions on average to reliably arrive at the right answer, close to the theoretical minimum of log₂ 10 ≈ 3.3. Each ‘yes-no’ answer gives you one bit of information. In fact, a ‘bit’ is precisely the fundamental unit of information Shannon used.
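
If you want to check that arithmetic yourself, here is a small sketch in Python (my own illustration, not something from the original article) that plays the halving strategy against every possible answer from one to ten and tallies the questions needed:

```python
import math

def count_questions(target, low=1, high=10):
    """Count the yes/no questions needed to pin down `target` by repeatedly
    asking "is the number greater than the midpoint of what's left?"."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if target > mid:   # answer 'yes': keep the upper half of the candidates
            low = mid + 1
        else:              # answer 'no': keep the lower half
            high = mid
    return questions

counts = [count_questions(n) for n in range(1, 11)]
print(counts)                     # [4, 4, 3, 3, 3, 4, 4, 3, 3, 3]
print(sum(counts) / len(counts))  # 3.4 questions on average
print(math.log2(10))              # ≈ 3.32, the theoretical minimum number of bits
```

The exact split points here differ slightly from the questions in the example above, but any strategy that halves the remaining candidates ends up in the same place: some answers take three questions, the rest take four.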

In this example one gains three bits of information — the answers to the three ‘yes-no’ questions. But note how it doesn’t matter what the message space is, or what the actual chosen message is; the game can be played the same way independently of the physical carrier of the information. For example, choosing a number between one and ten, coin flips, written words and sentences in a book, or the electrical on-off switching in the billions of transistors that make up a computer are all the same. Information — regardless of the meaning of the message — can be measured in units of bits.

Information and statistics

Shannon also realized that not all possible messages within a message space necessarily need to be equally likely. In other words, they don’t all have to occur with the same probability.

In the ‘pick-a-number between one and ten’ game there was actually an implicit assumption about the probability of each message occurring — about the probability distribution of the message space. Namely, that the probability distribution for all possible answers was uniform. This means that each number has an equal likelihood of occurring, a 1/10 chance of being chosen as the message.

But what if the other player told you ahead of time that the probability of them choosing the number ‘two’ as the answer was 73%? That would clearly affect how you play the game and how you ask your ‘yes-no’ questions. In this case it makes the most sense to begin by asking ‘is it two?’. Almost three quarters of the time you would be done, having needed only one bit of information to arrive at the answer. In the event that it isn’t ‘two’, and in the absence of any further information ahead of time, it would then make sense to go back to the previous strategy of progressively halving the message space.

Shannon formalized all of these ideas mathematically, meaning he was able to write down equations that capture these concepts in a way that allows one to calculate the number of bits, or amount of information, produced by a given scenario, a quantity he called the information entropy. From this, many other important ideas about information followed.
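
Written in modern notation, that quantity is the entropy H = −Σ pᵢ log₂ pᵢ: the average number of bits per message, given the probabilities of the possible messages. Here is a short sketch, again my own and only meant to make the numbers concrete, comparing the plain pick-a-number game with the skewed version where ‘two’ is chosen 73% of the time:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over messages with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform game: each of the ten numbers is equally likely to be chosen.
uniform = [1 / 10] * 10
print(entropy(uniform))   # ≈ 3.32 bits

# Skewed game: 'two' is chosen 73% of the time, the other nine numbers share the rest.
skewed = [0.73] + [0.27 / 9] * 9
print(entropy(skewed))    # ≈ 1.70 bits
```

Less uncertainty in the message space means fewer bits of information per message on average, which is exactly why the ‘is it two?’ strategy pays off.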

Surprise!

The key takeaway is that in all cases ‘information’ is tied to the amount that one learns about the message space. Shannon was able to mathematically define information based on the amount of surprise and uncertainty one has about the message and the message space. The more uncertainty about the message space, the greater the number of choices and the more you ‘learn’ as you narrow down what the actual message is. Always by using ‘yes-no’ questions, independent of the semantics of the messages themselves. As Shannon put it: ‘Information is the resolution of uncertainty’.

There is no information to be gained in a predictable event. Literally. There is no uncertainty.

Visual information

With this basic notion of what information is, and how it can be measured and quantified, it makes sense — albeit a bit counterintuitively — that there is actually no information in a single phototransduction event induced by a single photon interacting with the retina. Every photon does the exact same thing; it breaks the same chemical bond at the same place in the same molecules. There is no surprise or resolution of uncertainty.

But spread across the entire retina, across all photoreceptors, there certainly is information in the dynamically changing patterns of stimulated photoreceptors as a function of the photons being reflected or emitted by whatever it is you are looking at. The information contained in the visual message is the result of a photoreceptor population phenomenon across both space and time (where photons hit the retina and when). It is a dynamic spatiotemporal encoding of the physical world. In this case there is a lot of uncertainty in the messages and a lot of messages to choose from. You can learn a lot by observing the spatiotemporal patterns, even though the mechanisms producing that information do not individually carry more than a single bit.
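
To put the same point in the yes-no terms used earlier, here is a toy sketch. The numbers and the on/off simplification are mine, not a physiological model: a single, completely predictable event resolves no uncertainty, while the pattern across even a small population of photoreceptors spans a huge message space.

```python
import math

def entropy(probs):
    """Shannon entropy in bits for a list of message probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A single, completely predictable phototransduction event: one outcome, probability 1.
print(entropy([1.0]))           # 0.0 bits (Python prints -0.0): no surprise, no information

# A toy 'retina' of N independent on/off photoreceptors at a single instant:
# each receptor contributes at most one bit, so the pattern carries up to N bits.
N = 20
print(N * entropy([0.5, 0.5]))  # 20.0 bits, i.e. 2**20 (over a million) distinguishable patterns
```

A real retina has on the order of a hundred million photoreceptors per eye, and the patterns change over time as well, so the usable message space is astronomically larger than in this toy example.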

Physiologically, the visual system is engineered this way because it only needs to be able to do one single thing over and over again, and therefore only needs to be designed one way. Yet at the same time, it’s able to encode a rich continuum of information represented by a message space so large that it would be essentially impossible to explicitly encode it as a collection of distinct events. Engineering at its very best.

This article was originally published on Forbes.com. You can check out this and other pieces written by the author on Forbes here.


Gabriel A. Silva

Professor of Bioengineering and Neurosciences, University of California San Diego