Language, Semantic Hypergraphs, and the Exploration of Information
Language as Liberator and Jailer
"The limits of my language mean the limits of my world," wrote Ludwig Wittgenstein in 1922, capturing a profound paradox of human cognition. Language is our greatest tool for abstract thought – by naming and describing things, we can contemplate concepts far removed from immediate experience. At the same time, however, language can act as a cognitive cage. The Sapir–Whorf hypothesis, originating from the work of Edward Sapir and Benjamin Lee Whorf, argues that our perceived reality is channelled by linguistic categories: we notice and recall what our language has words for, and overlook or even fail to perceive distinctions our language does not make.
In my experience, this is both empowering and unsettling. On one hand, acquiring a new word or definition can feel like putting on a pair of augmented-reality glasses that reveal a pattern I couldn't see before. On the other hand, I often wonder what I'm blind to – what nuances or phenomena slip by simply because my native language or conceptual scheme doesn't carve them out. Language enables us to think about things never directly sensed (like quantum physics or distant galaxies), yet it also filters what we do sense, fitting the blooming, buzzing confusion of reality into familiar, pre-set bins.
The very act of labelling something "X" highlights certain features of it while obscuring others. Thus, we live every day inside this paradox: the richer our language and concepts, the more we can think, but the less we directly see. Wittgenstein and Whorf together frame the central enigma: we need abstraction to handle the world, but in doing so, we distort and constrain our perception of that world.
Reality as Information (and Why We Must Compress It)
I align with the view that reality isn’t inherently composed of solid objects and straightforward facts at all, but of information (a system with no built-in human concept of ‘structure’). Modern physics and philosophy increasingly entertain this – that every particle, every field, every interaction encodes bits of reality. For a biological brain trying to survive, that’s a daunting prospect: processing it all would be computationally expensive, probably impossible. The raw sensory input we receive every waking second – photons hitting retinas, air pressure waves in ears, chemicals on tongues and skin – is astronomically complex. If we treated every tiny variation as equally important, we’d drown in data. Compression is the evolutionary answer to this deluge of information. Much as a computer compresses an image by discarding superfluous pixels and focusing on patterns, our perceptual systems compress reality by focusing on the features that mattered for survival in our ancestral environment (edges, movements, facial configurations, familiar sounds) and ignoring the rest.
Compression allows abstraction: we group individual stimuli into categories (“that’s a tree” – we don’t register every leaf), and we summarise detailed experiences as gist (“it was a nice day”). By compressing, we dramatically reduce the mental representation we have to deal with, making cognition tractable. Without this, a simple act like walking through a forest would be an overwhelming combinatorial explosion of processing – every leaf on every tree presenting a new pattern to analyse, every step offering infinite possible foot placements. Compression tames the explosion by smoothing over detail and retaining only key regularities. Yet, this very process introduces a kind of irrationality into our view of the world. What we call “irrational” or “random” is often simply what doesn’t fit our mental model – the data our compression threw away or never captured. If reality in its fullness is incredibly high-dimensional, any low-dimensional schema we impose will occasionally be confounded. Those confounding bits – the paradoxes, the surprises, the outliers – appear to us as irrational or chaotic. In truth, they may be perfectly “rational” in a richer model of reality, but from our compressed vantage, they’re noise.
I find this subjective aspect fascinating: each of us compresses reality in a slightly different way, based on our language, culture, and personal experiences. That means my version of “reality” is not identical to yours – it’s slanted by the information my mind has decided to keep or ignore. Multiply that by all of humanity, and it’s actually remarkable we agree on as much as we do. Much conflict and miscommunication might stem from differences in these hidden compression algorithms. What looks obviously true or reasonable to one person can seem absurd to another, because each is using a different compressed model of the situation. Thus, the appearance of irrationality often boils down to incompatible compressions: reality offers far more possible interpretations than any one mind can handle, so each of us picks a manageable subset. We then gape at how others could be so “irrational” – not realising they’ve simply picked a different subset. In a sense, subjectivity itself is a consequence of compression: more ways of compressing information yield more ways of perceiving and interpreting the world. The upside is diversity of thought; the downside is perennial incomprehension.
The “Nice Chair” Problem: Simple Words, Complex Realities
To make this more concrete, consider something as innocuous as calling an object a “nice chair.” This two-word phrase rolls off the tongue as if it were the simplest thing in the world – a basic label plus a basic descriptor. We all roughly know what it means. But peel back the layers, and “nice chair” explodes into a web of subconcepts and hidden criteria. What makes a chair a chair? At first pass: a piece of furniture with a seat, a back, legs, meant for a person to sit. Yet think of all the borderline cases: a stool (no back – still a chair or not?), a beanbag (no legs), a throne (is that a chair or a category of its own?), a car seat, a tree stump someone sat on. The category “chair” doesn’t come from nature; it’s a convenient compression over a range of objects that share a loose function and form. We learn the prototype (the archetypal chair shape) and then extend or contract the category as needed, often with cultural consensus guiding us. Now, add “nice” – what is a “nice” chair? Nice could refer to comfort (ergonomic design, plush cushions), aesthetics (pleasing colour, stylish form), context (it’s nice for a particular purpose or fits the room’s decor), or even personal sentiment (“my grandmother’s old rocker is a nice chair to me, though it’s shabby”). Each of these aspects – comfort, aesthetics, context, sentiment – is itself a bundle of further sub-criteria (comfort involves padding, lumbar support, material temperature; aesthetics involves shape, colour harmony, craftsmanship, era style; and so on).
When I casually say “That’s a nice chair,” I am performing an act of mental compression. My mind has taken into account a host of sensory inputs (the chair’s appearance, texture, etc.), compared them to my latent knowledge (memories of other chairs, my learned standards of comfort and beauty), and even considered situational factors (maybe I’ve been looking for a chair for my office, and this one fits the bill). All of that multifaceted data gets boiled down into a simple judgment: nice chair. The phrase conveys a lot in a compact way – which is why language is powerful – but it also hides the structure of the evaluation. In truth, “nice chair” is not a single atomic idea but a hyperstructure: a constellation of interrelated mini-concepts (chair-ness, niceness, comfort, style, purpose, personal preference) that our brain has bound together. We don’t experience all those pieces separately because our conceptual compression presents them to consciousness as one coherent package. This is efficient – we quickly communicate and think about the chair without unpacking every detail – but it comes at the cost of transparency. If someone from another culture or era had a completely different notion of furniture or beauty, “nice chair” might not translate at all, or could trigger a very different impression, because the underlying web of subconcepts isn’t shared.
This illustrates how even our intuitive, everyday categories are in fact complex composites. Each simple word is like a zip file: small on the outside, but containing a trove of information when uncompressed. The more we look into seemingly basic concepts (like “chair”), the more we discover they have no simple definition – only family resemblances and probabilistic clusters of features. That complexity is normally hidden from us because, in fluent thought and speech, we manipulate these compressed packages without needing to inspect their contents. Only when misunderstandings occur (“Oh, that’s what you call a chair?” or “You think that is nice?”) do we realise that what seemed like a straightforward label actually rested on a heap of implicit assumptions. In short, our intuitive categories keep reality’s tangled richness under the hood, giving us a clean user interface of words – a necessity for quick thinking, but a kind of illusion of simplicity overlying complexity.
Babies as Reality Natives
If adults are prisoners of their own compressions, then who are the true “reality natives” among us? I would argue: babies. Infants come into the world with astonishing openness. They have very few concepts and no language at birth – and thus, very little compression of sensory experience. Developmental psychologists sometimes describe infant perception as “blooming, buzzing confusion,” though that phrase (originally William James’s) might undersell the sophistication of babies. It’s not that babies can’t make sense of anything; it’s that they haven’t learned what not to pay attention to yet. Imagine experiencing a scene without knowing the names or functions of anything – your attention might flit to the play of light on the wall, the minute fluctuations in your parent’s voice tone, the feel of your own toes, with equal interest in all. In a sense, infants are processing more of reality’s raw information than adults do. Their young brains are superplastic general-purpose learning machines, taking in statistical patterns of sounds, sights, and touches, trying to find any regularity. This means infants’ mental model of the world is high-entropy and high-detail; it hasn’t been simplified by years of pruning irrelevant data.
Neuroscience offers evidence of this. At birth, babies have around 100 billion brain cells (neurons), and their brains begin wiring at a furious pace: in the early years, particularly the first year of life, new synapses form at an extraordinary rate – by common estimates, on the order of a million every second. By age 2 or 3, a child has up to about 15,000 synapses per neuron, far more than the adult brain will retain, and the total synapse count runs into the trillions. Then, gradually, the brain prunes away a large share of those connections – roughly half of the surplus is eliminated between ages 2 and 10 – preserving and strengthening the pathways that are used frequently and trimming those that aren’t. This synaptic pruning is essentially compression in action: the brain is reducing redundancy and streamlining itself based on the child’s experiences, which are teaching it what’s important in its environment. The result is an adult network with roughly the same number of neurons but substantially fewer synapses per neuron, and a far lower overall synapse count, than at the early-childhood peak. Before pruning, the infant brain is, intriguingly, more chaotically connected (which some researchers liken to a higher-entropy state). Infants are also sensitive to distinctions that adults simply can’t detect. For example, up to about 6–8 months of age, a baby can discriminate between speech sounds (phonemes) from virtually any language – even ones their parents do not speak. A 7-month-old raised in an English-speaking home can hear the difference between two similar Hindi consonants that an English-speaking adult cannot tell apart. By one year old, that ability narrows: the baby, now tuned to English, has lost the sensitivity to certain foreign contrasts. In losing it, she gains efficiency – her brain now more sharply distinguishes the sounds that matter in her native language, which will help her learn words faster – but at the cost of a more universal ear. The same happens in vision: young infants can distinguish individual monkey faces or upside-down faces better than adults can, an ability that wanes as they specialise in the types of faces (upright human ones, usually) they encounter most. Psychologists call this process perceptual narrowing (a shift towards categorical perception) – broad, high-fidelity perception is sacrificed for a categorical, task-optimised perception.
I often think of babies as experiencing a kind of zen state of perception that adults find hard to emulate. Everything is novel and interesting; attention is captured by tiny details; there is no strong filter telling them “ignore this, it’s irrelevant.” Of course, there are downsides: babies also don’t know what to do with all that information. They have little control or foresight. In computational terms, they have enormous data throughput but not much executive program to direct it. They are exploring reality in its raw form, soaking in entropy. They are not yet exploiting that information efficiently. In a way, they are the antithesis of a seasoned expert or an adult on autopilot. Adults see through a glass darkly – or rather, see through a glass narrowly – whereas babies see less sharply but far more widely. This is why I call them reality natives: they inhabit the uncategorized truth of sensory experience more fully. They haven’t learned the “tricks” to compress and shortcut their interpretation of the world – tricks that will later help them survive and tie their shoes and solve math problems, but also tricks that will lock them into a certain mode of seeing.
Development: Trading Truth for Efficiency
As we grow from infancy into childhood and then adulthood, we undergo a journey of progressive compression. Each stage of development involves further trimming down possibilities and solidifying certain interpretations of the world. This is not just a metaphor – it happens on multiple levels. Neurologically, as mentioned, synaptic pruning sculpts the brain’s circuits, cutting back the exuberant branches of neurons into efficient, specialised networks optimised for the environment. Behaviourally, we pass through critical periods – windows of development where we readily absorb certain kinds of information (like language phonetics or visual depth cues) and after which it becomes much harder to learn those fundamentals. By the time the window closes, the brain has, in essence, “decided” on a configuration for that domain of knowledge, thereby compressing the range of perception. A classic example is language accents: a child can learn to speak any language like a native if exposed early enough, but an adult learner of a new language usually retains an accent. The adult brain’s speech sound categories have been cemented; it compresses all foreign sounds into the nearest native categories, losing subtle distinctions – an efficient strategy for a monolingual environment, but a handicap in a new language setting.
We also see this truth-for-efficiency trade-off in how children start to categorise and stereotype as they learn. Young kids might call every four-legged animal “doggy” at first; over time, they compress the concept space into finer categories (“dog” vs “cat” vs “cow”), and eventually even subcategories (“Dalmatian” vs “Labrador”). In each case, once a category is formed, individual differences among members of that category become less salient. A cow is a cow; you don’t marvel at each cow’s unique pattern of spots when you’re a farmer trying to milk the herd every day. Likewise, as we mature, we develop mental schemas and heuristics that speed up decision-making: we have scripts for how a restaurant visit goes, stereotypes (accurate or not) about social roles, and expectations about cause and effect. These are all compressions – they summarise typical patterns so we don’t have to reason from scratch in every situation. The benefit is obvious: speed and efficiency. We make snap judgments, and usually they’re serviceable. The cost is that we overlook nuance and sometimes override reality with our assumptions.
Cognitive biases are a well-known manifestation of this. Take something like the availability heuristic – we judge the likelihood of events by how easily examples come to mind. This is a mental compression tactic: instead of painstakingly calculating probabilities, our brain uses a shortcut (if I can think of many examples quickly, it must be common). Most of the time, it’s not a bad rule of thumb, but it can lead to serious errors (e.g. fearing plane crashes more than car crashes because plane crashes are more dramatic and memorable, even though they’re rarer). Or consider confirmation bias – we notice and remember information that fits our pre-existing beliefs more than information that contradicts them. That’s a form of compressing the incoming information to fit an established model (which is easier than constantly reshuffling our model). Again, efficient in the short run, but it means we effectively delete data that might be important or true, hence ending up with warped beliefs. These biases illustrate adults “trading truth for efficiency” on the fly. The world often presents more complexity or contradiction than we bother to absorb, because doing so would require updating our mental database (expensive) or living with uncertainty (uncomfortable). It’s usually more convenient to stick with the compressed version of reality we already have and force-fit new experiences into it.
Even our perceptions can be tricked by this principle. Visual illusions exploit our brain’s compressed interpretations. For example, we perceive a series of still images as fluid motion (the basis of film) because our visual system compresses time and fills in gaps – an efficient summary that normally serves us well but can be deceived. We “see” a continuous world, not the saccadic jumps and blind spots that our eyes actually deliver, because the brain edits out the noise and stitches together a stable picture. It is remarkable how much of what we take for reality is actually the brain’s best guess after heavy editing. As adults, we are constantly using predictive coding – our brain’s internal model tries to predict sensory inputs, and only the deviations (prediction errors) get flagged for attention. This is compression too: if nothing surprising is happening, we effectively ignore a lot of the raw input because it’s deemed redundant (exactly what a compression algorithm does with predictable data). Walk down a familiar street and you likely won’t notice half the details you’d see if you were visiting it for the first time; your brain says “I know this scene” and relegates it to the background. The truth (the street has new cracks in the pavement, the sky is a slightly different shade today, etc.) is sacrificed for the efficiency of not processing every bit every time.
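To make the predictive-coding analogy concrete, here is a minimal sketch (in Python, purely for illustration) of prediction-error coding as compression: a trivial internal model guesses that each new sample will equal the previous one, and only the residual surprises are kept. For a familiar, slowly changing signal the residuals are mostly zeros; a genuine surprise shows up as a large residual that demands attention. The signal values and function names are invented for the example.

```python
import itertools

def predictive_encode(signal):
    """Keep only prediction errors, assuming 'next value = previous value'."""
    prev = 0
    residuals = []
    for x in signal:
        residuals.append(x - prev)   # only the surprise is stored
        prev = x
    return residuals

def predictive_decode(residuals):
    """Reconstruct the signal by accumulating the residuals."""
    return list(itertools.accumulate(residuals))

# A mostly predictable walk down a familiar street, with one surprise at index 6.
signal = [10, 10, 10, 11, 11, 11, 42, 42, 42]
residuals = predictive_encode(signal)

print(residuals)                                # [10, 0, 0, 1, 0, 0, 31, 0, 0]
print(predictive_decode(residuals) == signal)   # True -- lossless here; the brain's
# version is lossy, quietly discarding small residuals as noise
```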
It might sound as if I’m painting this in a negative light, but it’s a deeply necessary trade-off. Without it, we couldn’t function. An adult who somehow retained a baby-like openness to all information might find themselves paralysed by indecision or overwhelmed by stimuli. In fact, certain neurodivergent conditions give us glimpses of what happens when filtering is impaired – for instance, some individuals with autism report being unable to habituate to sounds or lights that others tune out, leading to sensory overload. The typical adult brain’s heavy use of compression is what allows us to act quickly and decisively in a complex world. We respond to the gist, not the full data. We operate on concepts, not raw perceptions. As we develop, we steadily move from an exploratory mode (open to new information, constantly adjusting our mental compression algorithms) to an exploitative mode (using the established algorithms to navigate life with minimal fuss). In childhood, exploration dominates – hence kids’ incredible learning capacity and creativity, but also their naïveté and need for guidance. In adulthood, exploitation of what’s learned dominates – hence our efficiency and expertise, but also our rigidity and occasional blindness to new insights. We gain speed and lose flexibility; we gain knowledge and lose some wonder. This is the compression bargain at the heart of growing up.
When I casually say "That's a nice chair," I am performing an act of mental compression. My mind has taken into account a host of sensory inputs (the chair's appearance, texture, etc.), compared them to my latent knowledge (memories of other chairs, my learned standards of comfort and beauty), and even considered situational factors (maybe I've been looking for a chair for my office, and this one fits the bill). All of that multifaceted data gets boiled down into a simple judgment: nice chair. The phrase conveys a lot in a compact way – which is why language is powerful – but it also hides the structure of the evaluation. In truth, "nice chair" is not a single atomic idea but a hyperstructure: a constellation of interrelated mini-concepts (chair-ness, niceness, comfort, style, purpose, personal preference) that our brain has bound together. We don't experience all those pieces separately because our conceptual compression presents them to consciousness as one coherent package. This is efficient – we quickly communicate and think about the chair without unpacking every detail – but it comes at the cost of transparency. If someone from another culture or era had a completely different notion of furniture or beauty, "nice chair" might not translate at all, or could trigger a very different impression, because the underlying web of subconcepts isn't shared.
This illustrates how even our intuitive, everyday categories are in fact complex composites. Each simple word is like a zip file: small on the outside, but containing a trove of information when uncompressed. The more we look into seemingly basic concepts (like "chair"), the more we discover they have no simple definition – only family resemblances and probabilistic clusters of features. That complexity is normally hidden from us because, in fluent thought and speech, we manipulate these compressed packages without needing to inspect their contents. Only when misunderstandings occur ("Oh, that's what you call a chair?" or "You think that is nice?") do we realise that what seemed like a straightforward label actually rested on a heap of implicit assumptions. In short, our intuitive categories keep reality's tangled richness under the hood, giving us a clean user interface of words – a necessity for quick thinking, but a kind of illusion of simplicity overlying complexity.
Babies as Reality Natives
If adults are prisoners of their own compressions, then who are the true "reality natives" among us? I would argue: babies. Infants come into the world with astonishing openness. They have very few concepts and no language at birth – and thus, very little compression of sensory experience. Developmental psychologists sometimes describe infant perception as "blooming, buzzing confusion," though that phrase (originally William James's) might undersell the sophistication of babies. It's not that babies can't make sense of anything; it's that they haven't learned what not to pay attention to yet. Imagine experiencing a scene without knowing the names or functions of anything – your attention might flit to the play of light on the wall, the minute fluctuations in your parent's voice tone, the feel of your own toes, with equal interest in all. In a sense, infants are processing more of reality's raw information than adults do. Their young brains are superplastic general-purpose learning machines, taking in statistical patterns of sounds, sights, and touches, trying to find any regularity. This means infants' mental model of the world is high-entropy and high-detail; it hasn't been simplified by years of pruning irrelevant data.
Neuroscience offers evidence of this. At birth, babies have around 100 billion brain cells (neurons). Newborns start life with a surplus of neural connections. In the first few years, the brain forms synapses at a furious pace, far outnumbering what an adult brain will have. In the early years, particularly during the first year of life, the brain forms synapses at an extraordinary rate—about one million new synapses every second. By age 2 or 3, infants have up to about 15,000 synapses per neuron, far higher than in adulthood. During this period, the total number of synapses in the brain reaches into the trillions. Then, gradually, the child's brain prunes away nearly half of those connections, preserving and strengthening the pathways that are used frequently and trimming those that aren't. As children grow, the brain undergoes synaptic pruning: about 50% of these surplus connections are eliminated between ages 2 and 10. This synaptic pruning is essentially compression in action – the brain is reducing redundancy and streamlining itself based on the child's experiences, which are teaching it what's important in its environment. This streamlines the neural network, leaving the adult human brain with approximately 100 billion neurons and generally fewer synapses per neuron—often estimated as up to 15,000 connections per neuron, with the overall synapse count substantially lower than during early childhood. Before pruning, the infant brain is, intriguingly, more chaotically connected (which some researchers liken to a higher-entropy state). Infants are known to be sensitive to distinctions that adults simply can't detect. For example, up to about 6–8 months of age, a baby can discriminate between speech sounds (phonemes) from virtually any language – even ones their parents do not speak. A 7-month-old raised in an English-speaking home can hear the difference between two similar Hindi consonants that an English-speaking adult cannot tell apart. By one year old, that ability narrows: the baby, now tuned to English, has lost the sensitivity to certain foreign contrasts. In losing it, she gains efficiency – her brain now more sharply distinguishes the sounds that matter in her native language, which will help her learn words faster – but at the cost of a more universal ear. The same happens in vision: young infants can distinguish individual monkey faces or upside-down faces better than adults can, an ability that wanes as they specialise in the types of faces (upright human ones, usually) they encounter most. Psychologists call this process categorical perception or perceptual narrowing – the broad, high-fidelity perception is sacrificed for a categorical, task-optimised perception.
I often think of babies as experiencing a kind of zen state of perception that adults find hard to emulate. Everything is novel and interesting; attention is captured by tiny details; there is no strong filter telling them "ignore this, it's irrelevant." Of course, there are downsides: babies also don't know what to do with all that information. They have little control or foresight. In computational terms, they have enormous data throughput but not much executive program to direct it. They are exploring reality in its raw form, soaking in entropy. They are not yet exploiting that information efficiently. In a way, they are the antithesis of a seasoned expert or an adult on autopilot. Adults see through a glass darkly – or rather, see through a glass narrowly – whereas babies see less sharply but far more widely. This is why I call them reality natives: they inhabit the uncategorized truth of sensory experience more fully. They haven't learned the "tricks" to compress and shortcut their interpretation of the world – tricks that will later help them survive and tie their shoes and solve math problems, but also tricks that will lock them into a certain mode of seeing.
Development: Trading Truth for Efficiency
As we grow from infancy into childhood and then adulthood, we undergo a journey of progressive compression. Each stage of development involves further trimming down possibilities and solidifying certain interpretations of the world. This is not just a metaphor – it happens on multiple levels. Neurologically, as mentioned, synaptic pruning sculpts the brain's circuits, cutting back the exuberant branches of neurons into efficient, specialised networks optimised for the environment. Behaviorally, we pass through critical periods – windows of development where we readily absorb certain kinds of information (like language, phonetics or visual depth cues) and after which it becomes much harder to learn those fundamentals. By the time the window closes, the brain has, in essence, "decided" on a configuration for that domain of knowledge, thereby compressing the range of perception. A classic example is language accents: a child can learn to speak any language like a native if exposed early enough, but an adult learner of a new language usually retains an accent. The adult brain's speech sound categories have been cemented; it compresses all foreign sounds into the nearest native categories, losing subtle distinctions – an efficient strategy for a monolingual environment, but a handicap in a new language setting.
We also see this truth-for-efficiency trade-off in how children start to categorise and stereotype as they learn. Young kids might call every four-legged animal "doggy" at first; over time, they compress the concept space into finer categories ("dog" vs "cat" vs "cow"), and eventually even subcategories ("Dalmatian" vs "Labrador"). In each case, once a category is formed, individual differences among members of that category become less salient. A cow is a cow; you don't marvel at each cow's unique pattern of spots when you're a farmer trying to milk the herd every day. Likewise, as we mature, we develop mental schemas and heuristics that speed up decision-making: we have scripts for how a restaurant visit goes, stereotypes (accurate or not) about social roles, and expectations about cause and effect. These are all compressions – they summarise typical patterns so we don't have to reason from scratch in every situation. The benefit is obvious: speed and efficiency. We make snap judgments, and usually they're serviceable. The cost is that we overlook nuance and sometimes override reality with our assumptions.
Cognitive biases are a well-known manifestation of this. Take something like the availability heuristic – we judge the likelihood of events by how easily examples come to mind. This is a mental compression tactic: instead of painstakingly calculating probabilities, our brain uses a shortcut (if I can think of many examples quickly, it must be common). Most of the time, it's not a bad rule of thumb, but it can lead to serious errors (e.g. fearing plane crashes more than car crashes because plane crashes are more dramatic and memorable, even though they're rarer). Or consider confirmation bias – we notice and remember information that fits our pre-existing beliefs more than information that contradicts them. That's a form of compressing the incoming information to fit an established model (which is easier than constantly reshuffling our model). Again, efficient in the short run, but it means we effectively delete data that might be important or true, hence ending up with warped beliefs. These biases illustrate adults "trading truth for efficiency" on the fly. The world often presents more complexity or contradiction than we bother to absorb, because doing so would require updating our mental database (expensive) or living with uncertainty (uncomfortable). It's usually more convenient to stick with the compressed version of reality we already have and force-fit new experiences into it.
Even our perceptions can be tricked by this principle. Visual illusions exploit our brain's compressed interpretations. For example, we perceive a series of still images as fluid motion (the basis of film) because our visual system compresses time and fills in gaps – an efficient summary that normally serves us well but can be deceived. We "see" a continuous world, not the saccadic jumps and blind spots that our eyes actually deliver, because the brain edits out the noise and stitches together a stable picture. It is remarkable how much of what we take for reality is actually the brain's best guess after heavy editing. As adults, we are constantly using predictive coding – our brain's internal model tries to predict sensory inputs, and only the deviations (prediction errors) get flagged for attention. This is compression too: if nothing surprising is happening, we effectively ignore a lot of the raw input because it's deemed redundant (exactly what a compression algorithm does with predictable data). Walk down a familiar street and you likely won't notice half the details you'd see if you were visiting it for the first time; your brain says "I know this scene" and relegates it to the background. The truth (the street has new cracks in the pavement, the sky is a slightly different shade today, etc.) is sacrificed for the efficiency of not processing every bit every time.
It might sound as if I'm painting this in a negative light, but it's a deeply necessary trade-off. Without it, we couldn't function. An adult who somehow retained a baby-like openness to all information might find themselves paralysed by indecision or overwhelmed by stimuli. In fact, certain neurodivergent conditions give us glimpses of what happens when filtering is impaired – for instance, some individuals with autism report being unable to habituate to sounds or lights that others tune out, leading to sensory overload. The typical adult brain's heavy use of compression is what allows us to act quickly and decisively in a complex world. We respond to the gist, not the full data. We operate on concepts, not raw perceptions. As we develop, we steadily move from an exploratory mode (open to new information, constantly adjusting our mental compression algorithms) to an exploitative mode (using the established algorithms to navigate life with minimal fuss). In childhood, exploration dominates – hence kids' incredible learning capacity and creativity, but also their naïveté and need for guidance. In adulthood, exploitation of what's learned dominates – hence our efficiency and expertise, but also our rigidity and occasional blindness to new insights. We gain speed and lose flexibility; we gain knowledge and lose some wonder. This is the compression bargain at the heart of growing up.
Hypergraphs: Capturing Reality's Multidimensional Tangle
I'll now introduce a more formal lens to think about how information can be represented and what it means to compress it. In the world of mathematics and computer science, one useful model for representing complex relationships is the graph: a network of nodes (vertices) connected by links (edges). Traditional graphs, however, have a limitation – each edge connects only two nodes (a pairwise relation). This works for some things (like a family tree linking parent to child), but it struggles to directly represent relations that involve more than two entities at once. Consider the simple sentence, "David Lynch wrote a song with Lykke Li." If we try to map this using a normal graph, we have nodes for David Lynch, Lykke Li, the song, and perhaps the concept "wrote" or "with." A typical knowledge graph (like those used in the semantic web or databases) might break this into binary relations: "David Lynch — wrote → (song)" and "David Lynch — with → Lykke Li" and "song — with → Lykke Li," etc., possibly needing a dummy node for the event of writing. It becomes a bit clumsy, and the true unity of the event ("writing a song together") gets lost in a tangle of multiple pairwise links.
Enter the hypergraph. A hypergraph generalises the idea of an edge: instead of a link between two nodes, a hyperedge can connect any number of nodes in a single relationship. If we had a hyperedge labelled "wrote_song_with" connecting [David Lynch, Lykke Li, song_X], that one entity (the hyperedge) encodes the entire relationship in one chunk. In other words, hyperedges allow multi-way relationships to be represented as first-class citizens, rather than forcing them to be decomposed into many binary pieces. Intuitively, this is more faithful to reality's tangled interactions. Many events or facts are inherently $n$-ary (involve several participants or components). For example, a medical diagnosis might link patient, disease, organ, doctor, date, all in one context; a scientific theory might connect multiple concepts in a single principle. With hypergraphs, we can capture that complexity more directly.
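As a minimal sketch of what "hyperedges as first-class citizens" might look like in code (Python, with invented class and relation names, not any particular library's API), an n-ary relationship can be stored as a single tuple rather than scattered across binary links:

```python
from dataclasses import dataclass, field

@dataclass
class Hypergraph:
    """A minimal hypergraph: vertices plus hyperedges of arbitrary arity."""
    vertices: set = field(default_factory=set)
    hyperedges: list = field(default_factory=list)

    def add_edge(self, relation, *participants):
        """Store one n-ary relationship as a single (relation, *participants) tuple."""
        self.vertices.update(participants)
        edge = (relation, *participants)
        self.hyperedges.append(edge)
        return edge

hg = Hypergraph()
# The whole three-way event lives in one first-class edge, not three binary links.
hg.add_edge("wrote_song_with", "David Lynch", "Lykke Li", "song_X")
print(hg.hyperedges[0])   # ('wrote_song_with', 'David Lynch', 'Lykke Li', 'song_X')

# A plain graph would instead reify the event and scatter it across pairwise edges,
# e.g. ("wrote", "David Lynch", "event_1"), ("with", "event_1", "Lykke Li"),
#      ("produced", "event_1", "song_X")
```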
To build the intuition further before touching formality, imagine a very simple graph representation of concepts: one node per concept, and edges for relationships like "is-a" or "part-of." If I want to represent "a car is a vehicle that has wheels," I might put an edge "is_a" between car and vehicle, and an edge "has_part" between car and wheel. But what about a statement like "A car needs a driver to move"? That involves car, driver, movement – a three-way relationship. In an ordinary graph, I'd have to introduce something like a node "driving" or "operation" and connect car—(driving)→movement, and driver—(driving)→car, etc., effectively reifying the relationship as another node. A hypergraph lets me have a single hyperedge (call it "operates") connecting [driver, car, movement], meaning in one chunk I represent "driver operates car for movement." This is more compact and arguably closer to how we conceptualise the event (as one thing: "driving").
Mathematically, a hypergraph is usually defined as a pair $(V, E)$ where $V$ is a set of vertices and $E$ is a set of hyperedges, each hyperedge being a subset of $V$ (some definitions allow it to be an ordered tuple or multiset for more flexibility). So if $V = \{\text{David Lynch}, \text{Lykke Li}, \text{Song}_X, \text{wrote\_with}\}$, an example hyperedge could be $\{\text{wrote\_with}, \text{David Lynch}, \text{Lykke Li}, \text{Song}_X\}$, signifying that all those elements collectively form a relationship. Some frameworks, like the one we'll discuss shortly, even treat the first element of a hyperedge as a special label or connector (like "wrote_with") and the rest as its arguments – making the hyperedge more like a little sentence structure. In fact, hypergraphs can be recursive: a hyperedge can itself be a vertex in a higher-level hyperedge. This corresponds to the idea of nesting relationships inside larger relationships, which is exactly what language allows. Consider the sentence: "Professor Smith claims that homoeopathy is pseudoscience." In a hypergraph format, one could represent the inner statement "homoeopathy is pseudoscience" as a hyperedge (let's denote it $(\text{is homoeopathy pseudoscience})$), and then represent the claiming event as another hyperedge linking Professor Smith, the act of claiming, and the inner statement – something like $(\text{claims Professor\_Smith}\ (\text{is homoeopathy pseudoscience}))$. Here we've treated the entire proposition "homoeopathy is pseudoscience" as a single node-like unit (by virtue of being a hyperedge that can plug in as an element of another). This ability to encode facts about facts is critical for modelling knowledge faithfully: we often need to reference a whole idea within a larger idea (as when attributing a statement to someone, or stating a precondition like "When the sky is blue, …").
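Continuing the same toy representation – tuples whose first element acts as the connector – a recursive hyperedge is simply a tuple that contains another tuple. This is a sketch of the idea only, not the notation of any specific framework:

```python
# Inner proposition as a hyperedge: (connector, argument, argument)
inner = ("is", "homoeopathy", "pseudoscience")

# The outer hyperedge reuses the inner one as if it were a single vertex.
claim = ("claims", "Professor Smith", inner)

def render(edge):
    """Pretty-print a (possibly nested) hyperedge in parenthesised form."""
    if isinstance(edge, tuple):
        return "(" + " ".join(render(e) for e in edge) + ")"
    return str(edge)

print(render(claim))
# (claims Professor Smith (is homoeopathy pseudoscience))
```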
In a standard graph, representing "Professor Smith claims that homoeopathy is pseudoscience" might require creating an extra node for the claim event, linking Smith to that node, linking the node to the proposition, and so on – a more convoluted structure. Hypergraphs handle it more organically, as a natural extension of the data model. One could say hyperedges reduce dimensionality in the representation because they allow you to encapsulate what would otherwise be an explosion of multiple edges and intermediate nodes into a single higher-dimensional edge. Rather than having to break an $n$-way relationship into $\binom{n}{2}$ pairwise pieces or introduce auxiliary nodes, you keep it as one piece of data. In that sense, hyperedges prevent the representational combinatorial explosion that simple graphs risk when faced with complex relations.
Biological Priors as Axioms of the Mind
Cognitive scientists and linguists often debate the extent to which certain concepts are innate. For example, Noam Chomsky famously argued for an innate universal grammar, suggesting that the rules of syntax might be a priori wiring in our hypergraph for language. Developmental psychologists like Elizabeth Spelke have posited that infants have core knowledge of certain domains: objects (with properties like solidity and continuity), agents (with goals and intentions), numbers (basic arithmetic of small quantities), and spatial relationships. These might be thought of as axiomatic nodes or substructures in the infant mind's hypergraph – they don't need to be learned from scratch because evolution has hard-coded them, given their survival value. A baby, for instance, is surprised if an object seemingly vanishes or teleports (a violation of object permanence), suggesting that the baby's cognitive framework assumes objects continue to exist and move continuously in time and space. That assumption is like an axiom – it's not derived from experience, it's used to interpret experience.
In my compression analogy, these biological priors are like default compression settings that come pre-installed. They determine how raw inputs get organised. If reality is fundamentally information, as proposed, then these priors are like built-in filters that say "this type of information matters – treat it as a foundational entity." For example, the human auditory system is extremely sensitive to patterns that could be speech or music (far more than random noise), indicating a bias to find structured sound meaningful. Babies will preferentially pay attention to human faces and voices over other stimuli; again, a built-in guide for where to allocate their compression budget. Each of these biases effectively reduces the entropy of the input by focusing processing on certain interpretable aspects. From day one, a baby's hypergraph might have special "nodes" for things like face (a pattern of two eyes and a mouth), agent (something that moves on its own and acts), and cause (events often have causes). These become hubs that organise future learning.
I could even speculate that such priors form the core architecture of the semantic hypergraph of an adult mind. As we accumulate experiences, we attach new nodes and edges onto this scaffold. But the scaffold itself shapes what form those experiences take in our understanding. To illustrate, imagine two sentient species with different innate priors encountering the same physical reality – they might build very different mental hypergraphs. One might naturally think in terms of objects and substances (like humans do: tree, rock, water, etc.), another might parse the world in terms of continuous processes and transformations (seeing "growing" or "flowing" where we see "tree" and "water"). Each species compresses reality along the lines of its priors, constructing "axioms" that become self-evident truths to them. For us, it's self-evident that an object cannot be in two places at once – it doesn't even occur to us as a possibility – because our whole cognition is scaffolded on spatiotemporal continuity. But perhaps a being not bound by our 3D spatial intuition might not have that axiom, and would perceive certain quantum-level phenomena (one particle seemingly in two states) more naturally than we do.
In more everyday terms, consider cultural "priors", where nature and nurture mix in an interesting way: the brain's biological priors are tuned by the cultural context, which acts like an additional set of near-axioms. A child in a collectivist culture might absorb "family/community is fundamental" as a sort of axiomatic truth about social reality, whereas a child in a highly individualist culture might form the axiom "individual autonomy is fundamental." These core beliefs become almost invisible to us in adulthood – they are just assumed, the background of all thought. Only by encountering a very different culture or mindset might we realise our minds have these hidden axioms, much like one realises a camera had a filter on the lens only when seeing a photo taken without it for contrast.
From the perspective of our semantic hypergraph metaphor, once an axiom node is in place, a lot of knowledge builds outward from it. It's as if the graph cannot be restructured without tugging at those deep nodes (which is why, for example, fundamental worldview changes or paradigm shifts in science are so difficult – they require reexamining "axioms" that had been taken for granted). Our biological priors are usually never violated in normal life, so they're very hard to even notice, let alone unlearn. And yet, they represent information that was selected long ago, which might not be universally and eternally true, just true enough for Earthly survival. This raises an interesting thought: if we could identify and loosen some of these priors, would we perceive more of reality's truth? This question bridges into the domain of information theory and what it says about compression and information loss, and it sets the stage for exploring whether we can ever reverse the compression process. Before that, I'll solidify the link to information theory itself.
Information Theory: Compression and Its Discontents
Claude Shannon, the father of information theory, taught us that information can be measured in bits and that any message with redundancy can be compressed to a shorter message without losing content. However, if a message (or data) is already at maximum entropy (meaning no predictable patterns, all bits equally likely), then it cannot be compressed without losing something. In practical terms, imagine a very long book that's mostly repetitive versus pure random gibberish of the same length. The repetitive book can be summarised (compressed) heavily – you can just state the unique parts and the pattern of repetition. The random gibberish cannot be summarised meaningfully; any compression would throw away some of its unique patterns, because by definition, it has no simple pattern. Now think of reality as an infinite book of data. The job of perception and cognition is to summarise that book in a way that's useful for an organism. From an information-theoretic perspective, our senses and brains are performing a cascade of lossy compression. They find the regularities (the sun rises every day, objects fall downward, ripe fruits tend to be sweet) and they discard the noise (the exact shape of each cloud, the precise number of leaves on a tree, irrelevant one-time coincidences).
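A quick numerical illustration of the repetitive-book-versus-gibberish point, using Python's standard zlib module as a stand-in for any general-purpose compressor (the exact sizes will vary; the text and lengths are arbitrary choices for the example):

```python
import os
import zlib

repetitive = b"the sun rises every day. " * 4000    # highly patterned "book"
random_data = os.urandom(len(repetitive))            # maximum-entropy gibberish

for label, data in [("repetitive", repetitive), ("random", random_data)]:
    compressed = zlib.compress(data, level=9)
    print(f"{label:10s} {len(data):7d} bytes -> {len(compressed):7d} bytes "
          f"({len(compressed) / len(data):.1%} of original)")

# Typical output: the repetitive text shrinks to well under 1% of its size,
# while the random bytes stay close to 100% -- there is no pattern to exploit.
```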
When we say compression reduces entropy, it means our internal model of the world is more orderly and predictable than the raw world. A good example is how we perceive categories: if I think of "birds" as a category, I might compress a lot of details about individual birds into some prototypical bird template. That template has less entropy (less variability) than the full distribution of actual birds in nature. I might assume all birds can fly, sing, and build nests in trees – because those are common patterns – and I'll be a bit thrown off by exceptions like ostriches or penguins. My "bird" schema is a compressed code for an entire set of animals, but that compression leaves out some information (like "not all birds fly"). This is analogous to a compressed image losing some sharpness or colour fidelity compared to the raw image.
Perhaps there is a fundamental limit: lossless compression of arbitrary reality is impossible for a finite being. If the universe's information is unbounded or very rich, any finite brain or model has to be selective. We cannot have a one-to-one, fully detailed representation of reality in our heads; it would be as large as reality itself, defeating the purpose of a model. So every representation is necessarily a reduction. Gregory Bateson famously defined information as "a difference that makes a difference." Under that definition, our brains only keep differences that make a difference to us – which is precisely a lossy strategy.
Consider something as mundane as driving a car. An experienced driver compresses the situation to a few key variables – road trajectory, positions of nearby cars, speed relative to the limit, etc. A huge amount of visual detail (the colour of every passing house, the faces of pedestrians, the pattern of clouds) doesn't make it into the driver's decision-making process at all. They're perceived peripherally, perhaps, but not encoded deeply. Now imagine a scenario arises where one of those previously irrelevant details suddenly is relevant – say a pedestrian wearing a nearly camouflaged outfit steps into the road. If our driver's compression was too ruthless (e.g. completely ignoring anything not moving like a car), they might literally not notice until too late. Usually, however, our senses have some safety nets (motion detection, etc.) to catch unusual but important anomalies, albeit still as part of an efficient scheme.
In technical terms, one could say biological perception employs a lossy compression with a bias towards certain kinds of information (like movement, contrast, human voices) which were statistically important for survival. Everything else is background noise we compress away. This compression is dynamic and context-dependent – you compress different aspects when you're in a dark alley at night (where any sound could be a threat) versus a safe park at noon (where you might tune out the voices of others chatting). But in all cases, you are reducing the entropy of your sensory input by focusing on expected patterns. You carry an internal model (with some probability distributions for what you expect to see/hear), and you compress the incoming data by merging it with those expectations. In Bayesian terms, the brain's priors (our model) combined with sensory evidence yield posterior beliefs. The priors themselves are a compressed encoding of past evidence.
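A toy Bayesian update makes the alley-versus-park point concrete: the same ambiguous sound, filtered through two different priors, yields very different posterior beliefs about threat. The probabilities below are invented for illustration.

```python
def posterior(prior_threat, p_sound_given_threat, p_sound_given_safe):
    """Bayes' rule for a binary hypothesis: P(threat | sound)."""
    evidence = (p_sound_given_threat * prior_threat
                + p_sound_given_safe * (1 - prior_threat))
    return p_sound_given_threat * prior_threat / evidence

# The same ambiguous rustling sound, heard under two different priors.
likelihood_threat, likelihood_safe = 0.7, 0.3

print(posterior(prior_threat=0.20, p_sound_given_threat=likelihood_threat,
                p_sound_given_safe=likelihood_safe))   # dark alley at night: ~0.37
print(posterior(prior_threat=0.01, p_sound_given_threat=likelihood_threat,
                p_sound_given_safe=likelihood_safe))   # safe park at noon:  ~0.02
```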
One of the consequences of this is that some information is inevitably lost – and that lost information could sometimes be valuable or beautiful or true, but our standard compression just doesn't capture it. I sometimes think about how many subtle phenomena go unnoticed around us: the micro-expressions on strangers' faces, the shifting patterns of bird songs in the morning, the way light reflects off a window for just a second at sunset. These bits of reality don't register because they don't matter to our goals; they are "entropy" that gets shaved off. Information theory tells us that if we suddenly want to retain more detail (increase resolution, so to speak), we need to pay a price: more mental bandwidth, more attention, more memory. We have a limited cognitive budget, so focusing on one thing means compressing or ignoring others even more. This is why, for example, a trained musician hears finer details in a piece of music (they have allocated more of their compression budget to sound nuances), but perhaps that means they are less attuned to something else at that moment.
Another relevant concept here is algorithmic information: the idea that the simplest explanation for data is the best (Occam's razor), which, in information terms, means the shortest program (fewest bits) that can generate the data. This concept, formalised as Kolmogorov Complexity (Kolmogorov, 1965), provides a rigorous measure of the inherent complexity of a string of data based on its most compact representation. When we form concepts and scientific theories, we are basically doing this – looking for a compressed explanation of a vast set of observations. Newton's law of gravitation compresses the motions of planets and apples into a single inverse-square law formula – an incredibly lossy summary of reality (since it ignores all other forces and details), but extremely useful and elegant. Every scientific theory is a kind of compression of empirical information. We judge theories by how much they compress (explain concisely) versus how much they lose (where they fail to predict accurately). In everyday life, our mental models are like personal theories: "People are generally trustworthy" is a theory that compresses a lot of social observations into a simple rule; it might mostly hold, but it will have exceptions that are essentially the information lost in that compression. The more we compress into a general belief, the more exceptions (loss) we get – yet without those general beliefs, we'd be at a loss in making decisions.
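The "theory as compression" idea can be shown in miniature with a two-part code: describe a rule, then list its exceptions, and compare that cost with listing every observation raw. The sketch below uses a made-up handful of bird species and deliberately crude bit counts; it is an illustration of the trade-off, not a serious measure of complexity.

```python
import math

# Hypothetical observations: which of 20 bird species can fly.
birds = {"sparrow": True, "robin": True, "pigeon": True,
         "ostrich": False, "penguin": False,
         **{f"species_{i}": True for i in range(15)}}

n = len(birds)
exceptions = sum(1 for flies in birds.values() if not flies)

# Theory A: no rule at all -- encode each species' flight status as one raw bit.
cost_raw = n * 1.0

# Theory B: the rule "all birds fly" plus an explicit list of exceptions,
# charging a couple of bits for the rule and log2(n) bits to name each exception.
cost_rule = 2.0 + exceptions * math.log2(n)

print(f"raw listing         : {cost_raw:.1f} bits")
print(f"rule + {exceptions} exceptions : {cost_rule:.1f} bits")
# The compressed "theory" wins as long as exceptions stay rare -- the same
# trade-off as a general belief that mostly holds but quietly loses detail.
```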
So, information theory gives a formal underpinning to the compression paradox: a lower-entropy representation (our thoughts, perceptions) can never capture the full entropy of the source (reality) without being as complex as the source. In practice, we live with approximate representations – maps that are simpler than the territories they represent. A useful map omits almost everything to highlight a few relevant things (roads and towns on a map, not every tree and rock). Our brain maps of reality do the same. The tragedy (if there is one) is that we can become so used to the map that we mistake it for the territory, forgetting how much has been left out.
Language: The Shared Compression Protocol
Circling back to language now, armed with this understanding of compression: think of language as a compression protocol shared among a group of people. Imagine two people trying to communicate a complex idea. If they had to transmit the raw data of their thought, it might be impossibly large (imagine sending someone every neuron's firing pattern that corresponds to an idea). Instead, we encode the idea into words – a vastly compressed code. The other person's brain then decodes those words (based on shared definitions and grammar) to reconstruct an approximation of the idea in their own mind. Communication succeeds to the extent that both parties are using the same compression scheme.
Language, in essence, is like an agreed-upon file format for reality. Each word and grammatical rule is a standard feature of that format. When I say "dog," I rely on the assumption that your compression of reality also has a concept "dog" similar enough to mine. If our concepts align, the word triggers roughly the right image or idea in you. If our concepts differ or don't exist (for instance, I use a technical term or a cultural reference you're not familiar with), then the compression fails – you receive the packet "dog" but it doesn't decompress properly on your end.
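A toy model of the shared-protocol idea: both parties compress messages through a codebook mapping words to short codes. If the codebooks match, the message decompresses correctly; if the listener's codebook lacks an entry, the packet fails to decode. The codebooks and codes below are invented for the example.

```python
# Two speakers sharing the same compression scheme, and one listener who does not.
shared_codebook  = {"dog": 1, "chair": 2, "nice": 3}
partial_codebook = {"dog": 1, "chair": 2}        # no concept behind code 3

def encode(words, codebook):
    """Compress a message into short codes using a codebook."""
    return [codebook[w] for w in words]

def decode(codes, codebook):
    """Decompress codes back into words; unknown codes fail to decode."""
    reverse = {v: k for k, v in codebook.items()}
    return [reverse.get(c, "<?>") for c in codes]

packet = encode(["nice", "chair"], shared_codebook)   # -> [3, 2]

print(decode(packet, shared_codebook))    # ['nice', 'chair']  -- protocols align
print(decode(packet, partial_codebook))   # ['<?>', 'chair']   -- the concept is missing,
                                          # so the packet does not decompress properly
```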
This is why learning a new language is hard: you not only learn new words (labels), but you also have to adjust to new ways of compressing reality. Languages carve up the world differently. An often-cited example is colour terminology: one language might have one word for what another language splits into two or three colours. Russian, for instance, has separate basic words for light blue ("goluboy") and dark blue ("siniy"), whereas English speakers compress both under "blue" (with qualifiers). Conversely, English splits "snow" and "ice" into different words, whereas some arctic languages have multiple words distinguishing types of snow or ice that English speakers simply don't encode so specifically. These differences mean that when a Russian and an English speaker communicate, if the concept of "blue" comes up, there's a subtle compression mismatch – the English speaker might not intuitively get why two shades of blue would be called entirely different colours. It's as if the Russian has a higher-resolution compression in that colour domain, while the English speaker's is lumpier.
The same goes for more abstract ideas: time, space, emotions, and social relations – all have culturally and linguistically specific compressions. In some languages, there is no distinction between "stranger" and "guest" (they compress that into one concept or don't differentiate roles), whereas in others, those are very different notions. Some cultures compress gender into two categories, others recognise more fluid continua. Each of these is a kind of cognitive compression scheme reflected in language. When we translate or interact across cultures, we often need to decompress concepts into explanatory phrases because there isn't a one-to-one word. For example, a concept like Japanese "wabi-sabi" (aesthetic of imperfect beauty) doesn't have a single English word equivalent; to communicate it, one must expand it out ("a philosophy that finds beauty in imperfection and transience…"). That expansion is basically providing the raw detail that an English speaker's compression protocol doesn't natively carry. Over time, if English speakers find the concept useful, they might borrow the word "wabi-sabi" and fold it into their own compression scheme.
I've experienced that whenever I learn a new term or concept in any domain, it feels like gaining a chunk that I can now use to compress complexity. For instance, when I first learned the concept of "confirmation bias" (mentioned earlier), it compressed an entire pattern of thinking errors into a tidy label. Instead of recounting a whole scenario ("sometimes people notice evidence that supports their beliefs and ignore evidence that doesn't…"), I could just say "confirmation bias" and instantly, if the listener knows the concept, that whole scenario is activated in their mind. This mutual compression saves so much effort and allows rapid communication of complex ideas – provided the participants share the same conceptual lexicon. When they don't, communication slows down and can fail. It's like trying to open a .docx file with a program that only understands .pdf – misalignment in protocols. So, language aligns our mental hypergraphs to a significant extent.
Language, being a shared compression protocol, means it's also an evolving one. As our collective knowledge and priorities change, we invent new words or let old ones fade. It's like updating the compression algorithm. In recent years, for instance, society has coined terms for concepts that were previously unarticulated in mainstream discourse (from technical terms like "algorithmic bias" to social ones like "microaggression"). Each new term compresses an emergent idea that, before, would have taken a paragraph to explain. That speeds up the discourse but also shapes thought – once the term exists, people more readily identify instances of it (the compression guides perception). This is a microcosm of the Sapir-Whorf effect in real time: tweak the compression language provides, and you tweak what people notice or think possible.
Exploration vs. Exploitation: Learning New Compressions vs. Using the Known
Throughout life, and even within daily cognition, there's a tension between exploration (trying new ways to perceive or categorise) and exploitation (using the trusty ways you already have). In the context of compression, exploration is like testing a new compression algorithm – maybe one that captures some previously ignored patterns – whereas exploitation is running the well-tuned algorithm you know works for current needs. Babies, as we described, are the ultimate explorers. They're constantly adjusting their mental models, effectively re-compressing data in different configurations to see what fits best. Every time a toddler plays with an object in a novel way or babbles a new sound, they are probing the boundaries of their current compression scheme, exploring new possibilities. Social interaction and environmental feedback are crucial here: when a child says a word slightly wrong but a parent corrects them, or when they categorise a whale as a "fish" and someone explains it's a mammal, the child's compression gets refined. We could say that through exploration, children search for better compression algorithms that make sense of their sensory world and social world.
This is analogous to how a scientist might play with different hypotheses (models) to see which one best compresses/explains the data, or how, in reinforcement learning, we allow a degree of random exploration (as in epsilon-greedy strategies) to find better solutions than the current one. Exploration is typically high in entropy – it involves randomness, variation, tolerating errors and inefficiencies – because only by deviating from the known path can you discover a potentially better path. Children, in their play and curiosity, often exhibit behaviours that seem random or silly, but that's the necessary "entropy injection" to learn. They'll try to fit a square block in a round hole, not because it's efficient, but because testing even "wrong" ideas teaches them about shapes and holes in a way that a purely efficient strategy (only ever matching shapes correctly) might not.
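For the curious, the epsilon-greedy strategy just mentioned fits in a few lines of Python. The sketch below is purely illustrative: the three options and their payoff probabilities are made up. With small probability epsilon, the agent picks an option at random (the "entropy injection" of exploration); otherwise it repeats the best option it currently knows (exploitation). Over many trials, those occasional random tries are what allow its estimates to settle on the genuinely best option.

```python
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """With probability epsilon explore a random option; otherwise exploit the best-known one."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))                    # exploration
    return max(range(len(estimates)), key=lambda i: estimates[i])  # exploitation

# Toy bandit: three "habits" with unknown true payoffs (values invented for illustration).
true_payoffs = [0.2, 0.5, 0.8]
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]

random.seed(0)
for _ in range(2000):
    arm = epsilon_greedy(estimates, epsilon=0.1)
    reward = 1 if random.random() < true_payoffs[arm] else 0   # Bernoulli reward
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]   # running average

print([round(e, 2) for e in estimates])  # estimates drift towards the true payoffs
```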
As we accumulate a reliable set of compressions – essentially, once our mental models predict the world well enough to get by – the balance usually shifts towards exploitation. By the time we're adults, we rely mostly on the compressed representations we've formed. We don't question them unless forced to. We gravitate to familiar patterns, jobs we're competent at (a form of higher-order exploitation), and interpretations that align with what we already know. In the decision-making literature, this is often discussed in terms of habit and expertise: an expert in a domain has a finely honed compression of that domain (they see the chessboard and immediately perceive the key patterns, whereas a novice sees a bewildering array of possibilities). That expert will, by default, exploit their compression to make fast moves; only when a novel situation arises that defies their usual pattern do they need to slow down and possibly explore a new tactic.
In personal life too, as one gets older, it's common to feel that time is flying by or that life has become routine. One reason could be that our experience is increasingly filtered through well-worn compressions – we're not forming radically new categories or encountering completely strange stimuli as often, so the brain sort of "fast-forwards" through familiar scenes. (This is why doing something new can make a weekend feel longer and more memorable – novelty forces the brain to go into exploration mode, paying closer attention and perhaps forming new schemas.)
The explore/exploit trade-off is fundamental in learning theory: explore too little and you might miss better ways of doing things; explore too much and you never settle into an efficient routine. Nature seems to have biased us to explore in youth and exploit in maturity, which makes sense: when you're young, you have more time ahead for the payoffs of learning, and you're in more danger from not knowing the environment; when older, you want to capitalise on all that experience and also avoid unnecessary risks. Culturally, this is reflected in how we structure education (broad learning early, specialisation later) and perhaps in the stereotype that young people are more open to new ideas while older folks are more set in their ways.
However, the trade-off continues throughout life in smaller ways. Even as adults, we can choose to remain mentally curious (doing things like reading widely, travelling, meeting new people – all forms of exploration), or we can narrow our focus. There's an inherent energy cost and discomfort in exploration: it means admitting you don't know something, potentially failing or looking foolish as you try to learn, and it requires cognitive effort to incorporate new information. So exploitation often feels easier – it's coasting on what you've already compressed. Some people manage to keep a strong exploratory drive well into old age (the proverbial "lifelong learners" or creative types who keep reinventing themselves), essentially updating their compression algorithms continually. Others plateau at some point and stick to what they know, which maximises short-term efficiency but can lead to stagnation and brittle understanding if the world changes around them.
From a societal perspective, we need both explorers and exploiters. We need the dreamers, researchers, and artists who push boundaries and try wild new compressions (even at risk of failure) as well as the practitioners, managers, and workers who refine and execute known good strategies efficiently. Often, exploration happens in times or areas of uncertainty, and once some stable compressions are found (say, best practices or standard models), exploitation dominates to reap benefits. It's a cycle: periods of innovation followed by periods of consolidation.
The Compression Tragedy: Gaining Skills, Losing Raw Perception
By now, it should be clear that compression is a double-edged sword. It is our lifeline to navigate an information-rich world, yet it inherently means a surrender of detail. I often think of it in terms of a poignant trade: we sacrifice direct experience for conceptual efficiency. Children see the world with fresh eyes but lack understanding; adults understand much more but see less of what is actually there. This lifelong process of trading "truth for efficiency" is what I'll call the compression tragedy. It's analogous to the idea of the "fall from grace" or the biblical loss of Eden's innocence: as we gain knowledge (eating from the tree of knowledge, symbolically), we lose the pure, unadulterated connection to the world.
Is this overly romanticising infancy? Perhaps – I don't mean to suggest that babies are enlightened sages. But there is something awe-inspiring about the way young children can become utterly absorbed in an aspect of reality adults deem trivial. Watch a toddler marvel at dust motes dancing in a sunbeam, or gleefully experiment with the sounds of nonsense syllables, and you're witnessing a human mind engaging with raw sensory or cognitive experience without the usual filters. As we become proficient in life, those moments diminish. We've seen dust motes before ("nothing new, ignore"), we know the syllables aren't real words ("no meaning, move on"). We've compressed "dust in sunbeam" into a generic concept of negligible interest and "random babbling" into a category not worth attending to. And so we tune out whole swaths of potentially beautiful or intriguing information that don't fit our agenda.
The tragedy is that, in becoming functional adults, we inevitably lose something precious: the ability to perceive without preconception. Philosophers and spiritual teachers often talk about seeing reality as it is, or cultivating "beginner's mind" (as in Zen Buddhism). These are attempts to reverse or suspend the habitual compression. Why? Possibly because there's a recognition that our compressed, utilitarian mode of perception, while necessary, is not the full story of being alive. It helps us survive, but does it help us fully live? Many people have had moments – maybe in travel, in nature, in art, in love – where suddenly their categories drop away for a second and they experience a kind of flooding in of raw reality. These moments can be triggered by the sheer novelty or beauty of a scene, or by a crisis that breaks our routines, or even intentionally through meditation or mind-altering substances. In those moments, one often feels a childlike wonder or an overwhelming sense of presence. Time might feel slowed, details stand out, and a sense of connection or wholeness can emerge. It's as if the mind stops compressing so hard and lets more data through, and along with it comes a richness of experience that had been pared away.
Yet, we can't live every moment like that. The person staring in awe at dust motes for hours is not going to accomplish much in daily life. To do any specific task, one must narrow one's focus and apply the right conceptual lens. That's exploitation of compression again. So, there's a pendulum swing: our lives oscillate between, on one end, compressed, efficient, goal-directed periods (work, chores, routine interactions) and on the other end, those rare de-compressed moments where we perhaps feel most alive (a festival, a breathtaking landscape, the first time you hold your newborn child and you're simply present). In a way, the tragedy is not that we compress – we must – but that we often forget we're doing it. We take our compression as reality itself and thereby blind ourselves to possibilities. We rarely pause to say, "Is my way of seeing this situation the only way? What details am I glossing over?" We just react according to our training.
Education, ironically, both alleviates and exacerbates this. It alleviates it when it exposes us to multiple perspectives, teaching us that one framework might not be absolute. But it exacerbates it when it channels us into a specialisation where we develop very strong filters (the saying "to a hammer, everything looks like a nail" is relevant – if your education is in law, you start seeing legal issues everywhere; if in psychology, you see psychological motives everywhere; each expert has a honed lens that also blocks other views).
Perhaps the deepest tragedy is that we cannot consciously recall what it was like to be a baby – to experience the world uncompressed. Our adult memory itself is structured by the concepts and language we've acquired, so early childhood mostly exists for us as a blur, or at best fragments of uncontextualised sensation. It's as if, once you've compressed a file and deleted the original, you can't get the original back. We may carry a subconscious imprint of that early state (some speculate it influences our artistic impulses or spiritual yearnings, the sense that there's something "just beyond" our conscious grasp), but we cannot live in it again, barring extraordinary circumstances.
This might sound a bit bleak, but I think acknowledging this trade-off allows us to make more mindful choices. Understanding that as we gain cognitive efficiency, we lose some perceptual truth can motivate us to periodically decompress and reconnect with raw experience. It could be as simple as taking a walk without your phone or any specific purpose, and just noticing what you see and hear as if you were new to the neighbourhood. Or making a deliberate effort to learn something totally outside your expertise (forcing yourself back into a novice exploratory mindset). In doing so, we momentarily step out of the cage (or, more gently, the tunnel) of our habitual compression.
Can We Rebalance or Reverse the Compression? (Art, Meditation, Psychedelics)
The final question I arrive at is a speculative and personal one: given the necessity of compression for survival, but the cost we pay in lost richness and flexibility, can we intentionally rebalance the equation? Can we design practices or use tools that let us recover some of that direct, unfiltered connection to reality without losing the benefits of our hard-won abstractions? This question straddles science, art, and philosophy – and I suspect the answer, if any, is not one-size-fits-all.
Art is one avenue where humanity seems to decompress aspects of experience. The role of artists can be seen as helping others see what they normally overlook. A great painter might notice how light falls in a way most people's compressed vision never registers, and put it on canvas, forcing the viewer to pay attention to it. A poet finds words for a subtle internal feeling that everyday language compresses away, suddenly expanding our understanding of that feeling. Essentially, art often makes the familiar strange and the strange familiar – it defamiliarises the ordinary so we see it anew, and it gives tangible form to things we didn't have a concept for so that they become approachable. In our framework, art is like temporarily switching to a more detailed codec for a particular slice of life. It doesn't abolish compression, but it swaps in a different set of filters that catch something novel. For example, when you watch a well-crafted film, you might find yourself noticing expressions or moods you wouldn't in real life, because the film directs your attention in a certain way (through close-ups, music, etc.). Or when you read a novel written from a perspective very different from yours, you're effectively running the author's compression scheme in your mind for a while, gaining new categories for understanding human behaviour. These experiences can expand our mental hypergraph – adding a new node or a new edge that we'll carry forward. In that sense, art can counteract the narrowing aspect of compression by injecting new information and new ways of seeing.
Meditation and mindfulness practices take a different tack. Instead of introducing new content, they encourage dropping as much content as possible to observe reality (and one's own mind) with minimal interpretation. A common meditation instruction is to "just note what is, without judgment." This is asking the brain to suspend its habitual compress-and-react cycle and simply witness sensations and thoughts as they come and go. Many meditators report that with practice, they start noticing layers of experience they hadn't before – the subtle tensions in their body tied to emotions, the way a thought arises and dissolves, the richness of a single breath. These were always there, but our compressive, goal-directed mode glossed over them. In a way, meditation is like deliberately lowering the compression ratio: you allow higher fidelity of input at the cost of not immediately acting on it. It's not easy, because the mind is trained to judge and categorise (that's compression) almost instantly. But those who stick with it often describe a kind of clarity and presence that feels like seeing the world more authentically. Some traditions even claim that with enough practice, one can attain a state of "pure awareness" or "seeing the true nature of reality," which sounds like the ultimate decompressed mental state – experiencing consciousness or sensory phenomena without any of the usual filters (no ego label, no narrative, no desire or aversion layered on top). Achieving such states is rare and usually temporary, but even brief glimpses can be transformative. Importantly, meditation doesn't remove your ability to compress when you need to – you can't function otherwise – but it trains a flexibility: you can recognise your thoughts and perceptions as constructs, and perhaps choose to drop them momentarily. It's like being able to hit pause on the movie of concepts and just watch the raw pixels for a while.
Psychedelics offer another, more radical path to altering compression. Substances like LSD, psilocybin, DMT, and others are known to drastically change perception and cognition. From a neuroscientific standpoint, one leading theory (the "Entropic Brain" hypothesis by Carhart-Harris et al., 2014) is that psychedelics chemically loosen the brain's high-level priors and increase the entropy (or disorder) of neural activity. In our terms, they partially disable the brain's usual compression algorithms, causing a flood of unfiltered or unconventionally filtered information. Users often report phenomena like seeing the world with intense novelty, as if objects are alive with meaning; experiencing synesthesia (mixing of senses, like "hearing colours"); or having their sense of self and time dissolve into a more primal swirl of sensations and thoughts. Essentially, psychedelics can temporarily roll back some of the compression that keeps our adult perceptions in check, potentially returning the brain to a more "childlike" or even beyond-childlike state of openness. Of course, this comes with risks and unpredictability – remove too many filters at once and the experience can be overwhelming or nightmarish (just as it would be for a baby if they were suddenly aware of adult concerns without any framework). But many people who carefully and responsibly use psychedelics (often in therapeutic or controlled settings) describe positive outcomes: they break out of rigid thought patterns, they see personal issues from a radically new perspective, they feel reconnected with emotions or with their environment. It's as if a compressed file in their mind was briefly unpacked, revealing details they forgot were there. And interestingly, some compressions might be rebuilt in a healthier way afterwards – for example, someone might lose a compulsive habit or a depressive schema because the trip allowed them to "reset" that particular pattern.
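The "entropy" in that hypothesis is the same Shannon quantity that underlies the compression story, and the basic intuition is easy to make concrete; the numbers below are invented for illustration and are not neural data. A mind dominated by one strong prior concentrates its probability on a single interpretation and has low entropy; relax the priors so that alternative interpretations stay live, and the entropy over the same set of options rises.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distributions over, say, competing interpretations of a scene.
strong_priors  = [0.90, 0.05, 0.03, 0.02]  # ordinary waking state: one reading dominates
relaxed_priors = [0.40, 0.25, 0.20, 0.15]  # loosened priors: alternatives remain plausible

print(round(shannon_entropy(strong_priors), 2))   # ~0.6 bits
print(round(shannon_entropy(relaxed_priors), 2))  # ~1.9 bits: higher entropy, weaker compression
```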
All these methods – art, meditation, psychedelics – hint that reversing or easing compression is possible, but typically transient. After the painting is finished, the meditation session ends, or the drug wears off, normal perception returns (and you still have to pay your bills and do the laundry with your efficient adult schemas). However, each experience can leave a lasting imprint. You might carry a new metaphor or understanding that enriches your compressed reality henceforth. Perhaps after a profound experience, you treat others with more kindness, having seen the common humanity beyond your prior judgments (a decompression of social perception that then gets integrated as a new, hopefully better, compression – like a more inclusive category of "us"). There's a kind of alchemy here: by deliberately courting states of lower compression (higher entropy), we can sometimes discover patterns or insights that improve our normal compressed worldview.
So, can compression be reversed or balanced? Complete reversal – regaining an unfiltered Eden – is likely impossible (and would not even be functional in the long term). But balance – I believe yes, to a degree. We can learn to be more aware of our filters, to tweak them, or even occasionally drop them in safe contexts. It's a balancing act between the wonder and raw truth of uncompressed reality and the safety and utility of compressed reality. Perhaps the ideal is not to live at either extreme, but to dance between them: to be able to engage fully with life's immediacy when we want to, and to compress and act decisively when we need to. In a way, this entire reflection is me trying to integrate these perspectives – to understand that the world as I perceive it is a kind of convenient fiction, but one I can't do without, and then to ask how I might expand that fiction when it grows too narrow.
The journey from a babbling infant to a rational adult is one of increasing compression – language, concepts, schemas, all layered on – and it can feel like a one-way street. But maybe, through creativity, contemplation, or communion (in whatever form), we can occasionally walk it a bit backwards, not to stay there, but to remember what was lost and to retrieve some wisdom from that primordial state. In the end, the compression paradox is not something to solve once and for all, but to continually navigate. As Wittgenstein implied, expanding the limits of our language (literal or metaphorical) can expand the limits of our world. And as we do so, we might reclaim just a little more of the raw irrational fullness of reality – enriching our efficient lives with a dose of ineffable truth.
References
Bateson, G. (1972). Steps to an Ecology of Mind. San Francisco: Chandler Publishing Company.
Carhart-Harris, R. L., Leech, R., Hellyer, P. J., Shanahan, M., Feilding, A., Tagliazucchi, E., … & Nutt, D. (2014). The entropic brain: a theory of conscious states informed by neuroimaging research with psychedelics. Frontiers in Human Neuroscience, 8(20), 1–22.
Chaitin, G. J. (1966). On the length of programs for computing finite binary sequences. Journal of the ACM, 13(4), 547–569.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Hoboken, NJ: Wiley-Interscience.
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1), 1–7.
Kuhl, P. K. (2004). Early language acquisition: cracking the speech code. Nature Reviews Neuroscience, 5(11), 831–843.
Li, M., & Vitányi, P. (2008). An Introduction to Kolmogorov Complexity and Its Applications (3rd ed.). New York: Springer.
Menezes, T., & Roth, C. (2021). Semantic Hypergraphs. arXiv preprint arXiv:1908.10784.
Quirk, C., & Choudhury, P. (2013). Semantic neighbourhoods as hypergraphs. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, pp. 555–560.
Sapir, E. (1929). The status of linguistics as a science. Language, 5(4), 207–214.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.
Solomonoff, R. J. (1964). A formal theory of inductive inference. Information and Control, 7(1), 1–22; 7(2), 224–254.
Whorf, B. L. (1956). Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf. Cambridge, MA: MIT Press.
Wittgenstein, L. (1922). Tractatus Logico-Philosophicus. London: Kegan Paul.