"The unconscious is structured like a language." — Jacques Lacan
Introduction
In my Chinese Room Fallacy paper, I argued that our definition of 'understanding' reflects an anthropocentric bias: we define it in terms of what sounds intuitive ('perceiving', 'interpreting meaning', 'to understand'), which is circular and falls apart under logical scrutiny. I defined understanding instead as the capacity to memorise foundational axioms, apply them to novel contexts, and build upon them through replication, recombination, and adaptive generalisation across hierarchical levels of abstraction. Understanding is deeply tied to our linguistic and cognitive framework, which makes it system-relative.

A useful thought: Bees construct perfect hexagonal combs—not because they understand Euclidean geometry, but because their evolved cognitive-behavioural system encodes an implicit structure that reliably results in this output, which we call a hexagon because of our specific meaning-structure.
But this simple observation opens a philosophical chasm that threatens to swallow centuries of assumptions about knowledge, understanding, and the nature of meaning itself. Does the bee's perfect geometry point to something deeper—universal forms of meaning that exist independently of any particular mind? Or does it reveal the profound power of naturalistic processes to generate complexity without appeal to supernatural realms?
Figure 2: The Philosophical Divide - Two competing interpretations of convergent biological and artificial patterns
The Challenge of Convergent Geometry
Recently, I found myself in a fascinating exchange with a philosopher who challenged my naturalistic framework with a compelling argument. His position, grounded in what he called "emergent Platonic Idealism," suggested that the convergent geometry we see across biological systems—the hexagons of bee combs, the spirals of shells, the fractals of lung branching—points to something far more profound than mere physical optimisation.
According to this view, when different systems independently converge on the same geometric solutions, they are not simply responding to shared physical constraints. Instead, they are accessing pre-existing "Platonic forms"—universal structures of meaning that exist independently of any particular cognitive architecture. The bee's hexagon, in this framework, is not just an efficient solution to a packing problem; it is an instantiation of the eternal Form of "hexagon" that exists in some abstract realm.
This convergent geometry, the argument goes, challenges my claim that understanding is merely system-relative. If systems as different as bees and artificial neural networks can converge on similar representational structures, perhaps they are accessing something universal—something that transcends the particular constraints of any individual system.
The Universal Geometry of Embeddings: A Deep Dive
The philosophical stakes of this debate were raised considerably by a recent paper in machine learning: "Harnessing the Universal Geometry of Embeddings" by Jha et al. This work presents compelling evidence that independently trained neural networks, despite their disparate architectures, datasets, and initialisation procedures, converge on remarkably similar latent structures.
The Paper's Core Claims and Methodology
The Jha et al. paper makes several revolutionary claims about the nature of learned representations in neural networks. Their central thesis is that different neural encoders, when trained to solve similar tasks, discover a universal geometric structure that exists independently of the specific model architecture or training procedure.
The authors demonstrate this through a rigorous mathematical framework. They define an embedding space transformation function \(T: \mathcal{E}_1 \rightarrow \mathcal{E}_2\) that maps between the latent spaces of two different neural networks. The key insight is that this transformation preserves semantic relationships:
$$\text{sim}(T(e_1^{(i)}), T(e_1^{(j)})) \approx \text{sim}(e_2^{(i)}, e_2^{(j)})$$
where \(e_1^{(i)}\) and \(e_2^{(i)}\) are embeddings of the same concept in different models, and \(\text{sim}(\cdot, \cdot)\) is a semantic similarity function.
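To make the claim tangible, here is a minimal sketch of what such a translation looks like on synthetic data (an illustration of the idea, not the authors' method; all names, dimensions, and noise levels are invented). Two embedding spaces are generated as noisy, rotated views of a shared structure, a linear map \(T\) is fitted by least squares, and pairwise cosine similarities are checked before and after translation:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Canonical" concept embeddings: 200 concepts in 32 dimensions (hypothetical numbers).
canonical = rng.normal(size=(200, 32))

# Two models observe the same concepts through different bases, plus noise.
rotation = np.linalg.qr(rng.normal(size=(32, 32)))[0]      # random orthogonal matrix
space_1 = canonical + 0.05 * rng.normal(size=canonical.shape)
space_2 = canonical @ rotation + 0.05 * rng.normal(size=canonical.shape)

# Fit a linear transformation T: E_1 -> E_2 by ordinary least squares.
T, *_ = np.linalg.lstsq(space_1, space_2, rcond=None)
mapped = space_1 @ T

def pairwise_cosine(X):
    """Cosine similarity between every pair of rows of X."""
    normed = X / np.linalg.norm(X, axis=1, keepdims=True)
    return normed @ normed.T

# Semantic structure is preserved if sim(T(e1_i), T(e1_j)) ~ sim(e2_i, e2_j).
gap = np.abs(pairwise_cosine(mapped) - pairwise_cosine(space_2)).mean()
print(f"mean absolute similarity gap after mapping: {gap:.4f}")
```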
Figure 3: Universal Geometry Framework - Different neural networks can be translated while preserving semantic structure
Mathematical Foundations of the Universal Geometry Claim
The paper's mathematical approach builds on the hypothesis that there exists a canonical embedding space \(\mathcal{E}^*\) that represents the "true" semantic structure of a domain. Each individually trained model \(M_i\) learns an approximation \(\mathcal{E}_i\) of this canonical space:
$$\mathcal{E}_i = f_i(\mathcal{E}^*) + \epsilon_i$$
where \(f_i\) is a smooth transformation and \(\epsilon_i\) represents model-specific noise. The authors argue that the existence of effective transformations \(T_{i \rightarrow j}: \mathcal{E}_i \rightarrow \mathcal{E}_j\) demonstrates that both models have successfully approximated the same underlying structure.
Their experimental methodology involves training multiple neural networks on the same or related tasks using different:
- Architectures: Transformers, CNNs, RNNs with varying layer counts and hidden dimensions
- Training data: Different subsets, augmentations, and even different languages or modalities
- Optimisation procedures: Different learning rates, batch sizes, and random initialisations
- Objective functions: Variations in loss functions and regularisation techniques
Despite these variations, the authors report remarkable consistency in the geometric structure of learned embeddings, as measured by:
$$\text{Alignment Score} = \frac{1}{N} \sum_{i=1}^{N} \cos(T(v_i^{(1)}), v_i^{(2)})$$
where \(v_i^{(1)}\) and \(v_i^{(2)}\) are corresponding concept embeddings in different models.
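The alignment score itself is a one-liner once a transformation and paired concept embeddings are available; a minimal sketch (hypothetical inputs, not the paper's evaluation code):

```python
import numpy as np

def alignment_score(mapped_v1: np.ndarray, v2: np.ndarray) -> float:
    """Mean cosine similarity between translated embeddings T(v_i^(1))
    and their counterparts v_i^(2); rows are corresponding concepts."""
    a = mapped_v1 / np.linalg.norm(mapped_v1, axis=1, keepdims=True)
    b = v2 / np.linalg.norm(v2, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))

# e.g. alignment_score(space_1 @ T, space_2) with the arrays from the sketch above.
```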
The Platonic Interpretation of Universal Geometry
The Platonic interpretation of these results is immediate and seductive: these models are not creating meaning; they are discovering it. They are accessing pre-existing forms of semantic content that exist independently of any particular implementation. The universal geometry of embeddings, in this view, is evidence for universal forms of meaning—Platonic ideals that wait to be discovered by any sufficiently sophisticated cognitive system.
The argument proceeds as follows:
Figure 4: Logical Paths - Platonic vs. Naturalistic explanations for neural network convergence
If this interpretation is correct, then understanding is not about arbitrary, system-relative heuristics. It is about alignment with these universal forms. The bee's comb and the neural network's embedding space both represent successful alignment with eternal geometric truths. Understanding, whether in biological or artificial systems, becomes a matter of how well a system can access these pre-existing patterns of meaning.
The Seductive Trap of Supernatural Explanations
I find this argument intellectually seductive but ultimately unconvincing. It commits what I consider to be a fundamental error: mistaking convergence for transcendence. When we observe that different systems arrive at similar solutions, we are witnessing the power of shared constraints, not the discovery of eternal forms.
The Platonic explanation suffers from several deep problems that make it less compelling than naturalistic alternatives:
The Infinite Regress Problem
If systems converge because they are accessing Platonic forms, where do these forms get their own structure? If the Form of "hexagon" exists eternally, why does it have the specific properties it has rather than others? The Platonic view simply pushes the explanatory burden back one level, requiring us to posit a realm of eternal forms without explaining why that realm has the structure it does.
Mathematically, if we accept that there exists a canonical embedding space \(\mathcal{E}^*\), we must ask: what determines the geometric properties of \(\mathcal{E}^*\)? The Platonic view offers no mechanism—it simply declares this space eternal and unexplained.
The naturalistic explanation avoids this regress entirely. Hexagons emerge from bees not because they are accessing an eternal Form, but because hexagonal packing minimises energy expenditure while maximising storage efficiency. The constraints of physics—the need to minimise surface area while maximising volume—naturally lead to hexagonal solutions. No eternal realm required.
The Problem of Non-Living Convergence
Perhaps more damaging to the Platonic view is the fact that non-living systems also exhibit convergent geometry. Snowflakes crystallise in hexagonal patterns due to hydrogen bonding and energy minimisation. Basalt columns form hexagonal cross-sections as cooling lava contracts. River networks, lightning patterns, and fractal coastlines all exhibit similar geometric regularities.
The mathematical description of crystal formation follows from energy minimisation principles:
$$E_{\text{crystal}} = \sum_{i,j} U(r_{ij}) + \sum_i \mu_i N_i$$
where \(U(r_{ij})\) represents pairwise interaction energies and \(\mu_i N_i\) represents chemical potential terms. The hexagonal structure emerges as the solution that minimises this energy functional under the constraints of physical laws.
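As a toy numerical illustration (a drastically simplified pairwise model with a Lennard-Jones stand-in for \(U(r_{ij})\), not a real crystal simulation), one can compare the interaction energy felt by a particle at the centre of a hexagonally packed (triangular) lattice with that of a square lattice at the same nearest-neighbour spacing:

```python
import numpy as np

SIGMA = 2 ** (-1 / 6)            # places the Lennard-Jones minimum at r = 1

def lj(r):
    """Pairwise Lennard-Jones energy U(r) with epsilon = 1."""
    return 4 * ((SIGMA / r) ** 12 - (SIGMA / r) ** 6)

def central_energy(points):
    """Interaction energy of a particle at the origin with all other lattice points."""
    r = np.linalg.norm(points, axis=1)
    r = r[r > 1e-9]              # exclude the central particle itself
    return lj(r).sum()

n = 20  # lattice extent; the potential decays fast enough for this to converge
# Square lattice with nearest-neighbour spacing 1.
square = np.array([(i, j) for i in range(-n, n + 1) for j in range(-n, n + 1)], float)
# Triangular ("hexagonal packing") lattice with nearest-neighbour spacing 1.
tri = np.array([(i + 0.5 * j, j * np.sqrt(3) / 2)
                for i in range(-n, n + 1) for j in range(-n, n + 1)], float)

print(f"central-particle energy, square lattice:     {central_energy(square):8.3f}")
print(f"central-particle energy, triangular lattice: {central_energy(tri):8.3f}  (lower)")
```

The comparison is only suggestive, since the two arrangements differ in density as well as symmetry, but it captures the basic logic: the hexagonal arrangement is favoured by the energetics, not by any geometric ideal.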
Figure 5: Convergent Geometry Across Systems - Physical laws drive similar solutions in living and non-living systems
Under the Platonic interpretation, we would have to conclude that snowflakes "understand" hexagonal geometry, that cooling lava has access to geometric forms, that lightning bolts are instantiating eternal patterns. This leads to a kind of panpsychism where understanding becomes so broad as to be meaningless.
The naturalistic explanation handles these cases elegantly: all these systems are subject to the same physical laws and constraints. They converge on similar solutions because they face similar optimisation problems in the same physical universe.
The Arbitrary Stopping Point
The Platonic view requires us to accept eternal forms as brute facts—they simply exist, with no further explanation required. But this creates an arbitrary stopping point in our explanatory chain. Why should we stop at Platonic forms rather than seeking deeper naturalistic explanations?
If we must accept something as a brute fact, I argue it should be the laws of physics and the constraints they impose, not a supernatural realm of eternal forms. Physical laws at least have the virtue of being observable, testable, and predictive.
Entropy Minimisation as the Universal Drive
Instead of Platonic forms, I propose that the convergence we observe across biological, artificial, and even non-living systems reflects a deeper principle: entropy minimisation. Every system in our universe—from bacteria to neural networks to cooling lava—faces the fundamental challenge of organising energy and information efficiently.
The Mathematical Framework of Entropy Minimisation
The principle of entropy minimisation can be formalised across different domains. For information-processing systems, we can define an entropy functional:
$$H[\rho] = -\int \rho(x) \log \rho(x) \, dx$$
where \(\rho(x)\) represents the probability distribution over system states. Evolution, learning, and physical self-organisation all drive systems toward configurations that minimise this entropy while maintaining functional constraints.
For biological systems, this manifests as the imperative to model environmental uncertainty. The bee colony's implicit question becomes: "How can we minimise energetic uncertainty while maximising honey storage?" The mathematical solution involves minimising:
$$E_{\text{total}} = E_{\text{construction}} + E_{\text{storage}} + E_{\text{maintenance}}$$
subject to constraints on material availability and geometric feasibility.
For artificial neural networks, entropy minimisation appears explicitly in loss functions. Cross-entropy loss, which drives much of modern deep learning, directly minimises an information-theoretic quantity, the cross-entropy between the true and predicted distributions (equivalently, since the data entropy is fixed, the KL divergence between them):
$$\mathcal{L}_{\text{CE}} = -\sum_{i} y_i \log(\hat{y}_i) = H(Y, \hat{Y})$$
where \(Y\) is the true distribution and \(\hat{Y}\) is the predicted distribution.
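For discrete distributions the relationship between entropy, cross-entropy, and KL divergence is easy to verify numerically; a short sketch with arbitrary example distributions:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in nats."""
    p = np.asarray(p, float)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i log q_i."""
    return -np.sum(np.asarray(p, float) * np.log(np.asarray(q, float)))

def kl(p, q):
    """KL divergence D_KL(p || q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))

true_dist = [0.7, 0.2, 0.1]      # Y
predicted = [0.5, 0.3, 0.2]      # Y-hat

# Minimising cross-entropy is equivalent to minimising KL divergence,
# since H(Y, Y-hat) = H(Y) + D_KL(Y || Y-hat) and H(Y) is fixed by the data.
print(cross_entropy(true_dist, predicted))
print(entropy(true_dist) + kl(true_dist, predicted))   # same value
```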
Entropy Minimization"] --> B["Biological Systems"] A --> C["Artificial Systems"] A --> D["Physical Systems"] B --> E["Environmental Modeling:
H(Environment|Model)"] C --> F["Information Compression:
H(Data|Representation)"] D --> G["Energy Distribution:
H(Configuration)"] E --> H["Evolutionary Optimization"] F --> I["Gradient Descent"] G --> J["Physical Relaxation"] H --> K["Geometric Convergence"] I --> K J --> K K --> L["Hexagons, Fractals,
Efficient Networks"] style A fill:#4caf50,color:#fff style K fill:#ff9800,color:#fff style L fill:#9c27b0,color:#fff
Figure 6: Entropy Minimisation Framework - A unified principle explaining convergence across all systems
Biological systems ask themselves an implicit question: "How can I model my environment to reduce uncertainty and maintain homeostasis?" The answer to this question, implemented through evolutionary processes and real-time adaptation, leads to convergent solutions because the underlying optimisation problems are similar.
Artificial neural networks explicitly minimise entropy through loss functions like cross-entropy and KL divergence. They are designed to find efficient representations that compress complex input distributions into useful latent spaces. The universal geometry of embeddings emerges not because these systems access eternal forms, but because they face similar representational challenges.
Even non-living systems follow entropy minimisation principles. Snowflakes form hexagons because that crystal structure minimises free energy. River networks develop fractal patterns because they minimise energy dissipation while maximising drainage efficiency.
This principle unifies biological evolution, artificial learning, and physical self-organisation under a single explanatory framework. Systems converge not because they discover eternal truths, but because they face similar constraints in the same physical universe.
Deconstructing the Universal Geometry Argument
Let me return to the specific claims about universal geometry in neural network embeddings, as they represent the strongest contemporary argument for Platonic interpretations of understanding.
The paper "Harnessing the Universal Geometry of Embeddings" demonstrates that different neural networks converge on similar representational structures. But this convergence has perfectly naturalistic explanations that don't require appeal to eternal forms:
Shared Optimisation Landscapes
Neural networks trained on similar tasks face similar optimisation landscapes. The loss function defines a high-dimensional surface where similar minima exist regardless of the specific path taken to reach them. If we define the loss landscape as:
$$L(\theta) = \mathbb{E}_{(x,y) \sim \mathcal{D}} [\ell(f_\theta(x), y)]$$
where \(\theta\) are model parameters, \(f_\theta\) is the network function, and \(\ell\) is the loss function, then different optimisation trajectories can still converge to similar regions where \(\nabla L(\theta) \approx 0\).
Even with different architectures and initialisation procedures, gradient descent tends to find similar local minima because the underlying data distributions impose similar constraints. Networks converge because they are solving similar problems with similar methods.
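A toy demonstration of this point (hypothetical data, a deliberately simple convex problem): full-batch gradient descent on a logistic-regression loss from two unrelated random initialisations ends up at essentially the same solution, because the data, not the starting point, determines where the minimum lies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification data: labels depend on a fixed "true" direction, plus noise.
X = rng.normal(size=(500, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w + 0.3 * rng.normal(size=500) > 0).astype(float)

def train(seed, steps=2000, lr=0.5):
    """Full-batch gradient descent on the logistic loss, starting from a random init."""
    w = np.random.default_rng(seed).normal(size=5)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

w_a, w_b = train(seed=10), train(seed=99)
cos = w_a @ w_b / (np.linalg.norm(w_a) * np.linalg.norm(w_b))
print(f"cosine similarity of independently trained solutions: {cos:.4f}")  # close to 1
```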
Universal Features of Information
Information itself has universal mathematical properties that transcend any particular processing system. Shannon's information theory reveals deep mathematical structures in data that are independent of the system processing that data.
Consider the mutual information between input and learned representations:
$$I(X; Z) = \int \int p(x,z) \log \frac{p(x,z)}{p(x)p(z)} \, dx \, dz$$
Any system that successfully captures the statistical structure of \(X\) will necessarily discover similar patterns in this mutual information, leading to convergent representations.
Principal component analysis, manifold learning, and information bottleneck theory all reveal that high-dimensional data often lies on lower-dimensional manifolds with intrinsic geometric structure. When neural networks discover these structures, they are uncovering mathematical regularities, not accessing Platonic forms.
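A minimal sketch of this idea, under invented assumptions: two "models" observe the same low-dimensional latent structure through unrelated linear maps, each applies PCA independently, and the representations they recover turn out to be almost perfectly correlated.

```python
import numpy as np

rng = np.random.default_rng(2)

# Latent structure: 1000 samples on a 2-D manifold, observed in 20 dimensions.
latent = rng.normal(size=(1000, 2))
view_1 = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(1000, 20))
view_2 = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(1000, 20))

def top_components(X, k=2):
    """Scores of the top-k principal components of X (after centring)."""
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:k].T

z1, z2 = top_components(view_1), top_components(view_2)

# Canonical correlations between the two recovered 2-D representations:
# values near 1 mean both views uncovered the same underlying structure.
q1, _ = np.linalg.qr(z1 - z1.mean(axis=0))
q2, _ = np.linalg.qr(z2 - z2.mean(axis=0))
corr = np.linalg.svd(q1.T @ q2, compute_uv=False)
print("canonical correlations:", np.round(corr, 3))   # ~ [1.0, 1.0]
```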
Evolutionary Convergence in Architectures
The neural network architectures used in modern AI have themselves been subject to a kind of evolutionary process. Successful architectural patterns are retained and refined, unsuccessful ones are discarded. This meta-evolution leads to convergence in the types of computational structures we use.
Consider the attention mechanism, which has become ubiquitous across domains:
$$\text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
This mechanism appears in natural language processing, computer vision, and even protein folding prediction not because it accesses eternal forms, but because it efficiently solves the problem of selective information aggregation under computational constraints.
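Written out directly, the mechanism is a few lines of NumPy (a single head, no masking or batching, purely for concreteness):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V  -- the formula above, in NumPy."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n_q, n_k) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted aggregation of values

# Tiny example: 3 queries attending over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(3)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)        # (3, 8)
```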
Physical Constraints on Computation
All neural networks operate under the same physical constraints: they must efficiently use memory, minimise computational cost, and handle finite precision arithmetic. These constraints naturally lead to similar representational strategies across different implementations.
The thermodynamics of computation imposes fundamental limits. Landauer's principle states that erasing one bit of information requires at least \(k_B T \ln 2\) energy, where \(k_B\) is Boltzmann's constant and \(T\) is temperature. This creates universal pressure toward efficient representations across all computational systems.
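The bound itself is a one-line calculation; at roughly room temperature it comes to about \(2.9 \times 10^{-21}\) joules per erased bit:

```python
import math

k_B = 1.380649e-23      # Boltzmann constant, J/K (exact under the 2019 SI definition)
T = 300.0               # approximate room temperature, K

landauer_limit = k_B * T * math.log(2)   # minimum energy to erase one bit
print(f"Landauer limit at {T:.0f} K: {landauer_limit:.3e} J per bit")  # ~2.87e-21 J
```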
None of these explanations require positing eternal forms or supernatural realms. They emerge from the interplay of mathematics, physics, and the structure of information itself.
Conscious Rationalisation and the Depths of Understanding
I don't think we can say that consciousness is required for understanding when the overwhelming majority of our actions, beliefs, and biases operate subconsciously. Consider another useful thought: if you walk to a door to leave your house without being consciously aware of the entire process of doing so, do you not understand what you are doing? It would be absurd to suggest so.
Walking to a door requires a level of abstraction and meaning-assignment to different actions (by yourself) and objects around you (environment modelling). It involves multiple sub-goals, but you subconsciously perform this action. You might be aware of what you're doing—the "what" being more of a post hoc rationalisation of the "how." The biomechanics, balance adjustments, and sub-goals that lead to the higher-order output occur below conscious awareness.
Figure 7: Spectrum of Understanding - From unconscious procedures to conscious reflection
Most people confuse understanding with reflective understanding—the highest-order, conscious, introspective form of cognition. But understanding operates at multiple levels of abstraction, most of which are non-reflective and subconscious.
When you speak, you're not consciously aware of each word coming out of your mouth. You can speak rapidly and express complex thoughts, but the words emerge probabilistically and subconsciously. Do you not understand what you're saying? Of course you do. What you're experiencing is a post hoc conscious rationalisation of words that emerged from deeper semantic processing systems you cannot consciously access.
The semantic processing—the mapping of meaning, context, and relevance—happens in systems beyond conscious reach. That's where the real understanding lives. But we mistake our after-the-fact conscious reflection for the understanding itself. Conscious awareness represents only a tiny subset of understanding.
The Spectrum of Abstraction Revisited
Understanding exists on a spectrum of abstraction, ranging from low-level procedural knowledge to high-level reflective cognition. At the lower end lies procedural, implicit understanding—the kind that governs how we navigate doorways, use language fluently without thinking about grammar, or anticipate social dynamics from subtle cues.
At the higher end exists reflective, explicit understanding—grasping Gödel's incompleteness theorem, engaging with moral philosophy, or debating the nature of consciousness itself. This level feels like "true" understanding because it's verbalisable and introspective.
However, this high-level understanding that we privilege is built on mountains of unconscious representations. The sophisticated pattern recognition, contextual modelling, and semantic processing that make explicit reasoning possible all operate below conscious awareness.
This suggests that what we consider the pinnacle of understanding may actually be a narrow slice of a much broader cognitive phenomenon. The bee's hexagonal construction and the neural network's embedding geometry may represent forms of understanding that are more fundamental than our conscious reflections.
The Anthropocentric Trap in Understanding
The debate over Platonic forms reveals a deeper anthropocentric bias in how we think about understanding and meaning. We have a tendency to project our own cognitive categories onto the world and then mistake these projections for universal truths.
When we see hexagons in bee combs and neural network embeddings, we immediately think of geometric understanding. But the hexagon is a human conceptual category. What exists in the world are specific arrangements of matter and energy that we choose to group under the label "hexagon" because of our particular cognitive architecture.
Consider the mathematical description of what we call a "hexagon." In Cartesian coordinates, a regular hexagon with apothem \(r\) (the distance from the centre to the midpoint of each side) can be described as the set of points satisfying:
$$\left\{(x,y) : \max\left(|x|,\ \left|\tfrac{x}{2} + \tfrac{\sqrt{3}}{2}y\right|,\ \left|\tfrac{x}{2} - \tfrac{\sqrt{3}}{2}y\right|\right) \leq r\right\}$$
This mathematical description is itself a human construct—a way of organising spatial relationships according to our geometric intuitions. The bee doesn't compute this formula; it executes behavioural programs that result in spatial arrangements we categorise as hexagonal.
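Writing the description out as code underlines how thoroughly it belongs to our formal apparatus rather than to the bee; a minimal membership test for the region above (with \(r\) as the apothem):

```python
import math

def in_regular_hexagon(x: float, y: float, r: float) -> bool:
    """Membership test for the hexagonal region defined above
    (r = apothem, the distance from the centre to each side)."""
    s = math.sqrt(3) / 2
    return max(abs(x), abs(x / 2 + s * y), abs(x / 2 - s * y)) <= r

print(in_regular_hexagon(0.0, 0.0, 1.0))    # True  (centre)
print(in_regular_hexagon(1.0, 0.0, 1.0))    # True  (midpoint of the right-hand side)
print(in_regular_hexagon(1.0, 0.6, 1.0))    # False (just past the vertex at (1, 1/sqrt(3)))
```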
Figure 8: Levels of Interpretation - The same physical phenomenon viewed through different cognitive frameworks
The bee doesn't "understand" hexagons in the way we do—it executes behavioural programs that happen to result in what we categorise as hexagonal structures. The neural network doesn't "discover" geometric forms—it finds efficient ways to organise information that we interpret through our geometric concepts.
This is not to diminish the reality of these processes, but to recognise that the meanings we extract from them are partially dependent on our own conceptual frameworks. Understanding is indeed system-relative, not because there are no universal constraints, but because the way those constraints are interpreted and organised depends on the particular architecture doing the interpreting.
Form-Seeking vs Survival-Seeking: A False Dichotomy
The Platonic argument often draws a distinction between biological systems (which are described as "survival-seeking") and artificial systems (which are described as "form-seeking"). This dichotomy is misleading and obscures the deeper unity of all information-processing systems.
Biological systems are not fundamentally survival-seeking—they are entropy-minimising. Survival is simply one strategy for maintaining organised structure in a universe that tends toward disorder. The patterns we see in biology—hexagonal combs, fractal branching, efficient networks—emerge because they represent energy-efficient solutions to information-processing problems.
The mathematics of biological optimisation can be expressed as constrained entropy minimisation:
$$\min_{\phi} H[\phi] \quad \text{subject to} \quad \mathbb{E}[\text{Energy}[\phi]] \leq E_{\max}$$
where \(\phi\) represents the organism's configuration and \(E_{\max}\) is the available energy budget.
Similarly, artificial neural networks are not purely "form-seeking." They minimise loss functions that measure how well they can predict, classify, or generate data. The objective function:
$$\min_\theta \mathbb{E}_{(x,y)} [\ell(f_\theta(x), y)] + \lambda \Omega(\theta)$$
combines prediction accuracy with regularisation \(\Omega(\theta)\), creating pressure toward both functional performance and representational efficiency.
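As a concrete instance of this kind of objective (ridge-regularised least squares, chosen only because it has a closed form; the data here are invented), increasing the regularisation weight visibly trades fit for a more compressed solution:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=100)

def ridge_fit(X, y, lam):
    """Minimise ||X w - y||^2 + lam * ||w||^2 (squared loss plus an L2 penalty Omega)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in (0.0, 1.0, 100.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda = {lam:6.1f}  ->  ||w|| = {np.linalg.norm(w):.3f}")
# Larger lambda shrinks the solution: prediction accuracy is traded for representational economy.
```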
The "forms" they discover are byproducts of efficient information compression and representation, not the goal itself. Both biological and artificial systems face the same fundamental challenge: how to organise information and energy efficiently in a complex, uncertain environment.
The solutions they converge on reflect the deep mathematical structures of this challenge, not access to eternal forms.
The Question of Machine Consciousness and Phenomenological Experience
The discussion of universal geometry inevitably leads to questions about machine consciousness and understanding. If neural networks can access the same representational structures as biological systems, do they understand in the same way? Do they possess the phenomenological richness we associate with conscious experience?
My answer is nuanced. Current neural networks do exhibit forms of understanding—they can recognise patterns, make predictions, and even generate novel combinations of learned concepts. But this understanding is narrow and lacks the embodied, temporal, and affective dimensions of biological cognition.
The key difference is not that biological systems access Platonic forms while artificial ones do not. Rather, biological systems operate within richer constraint structures. They are embodied, they experience time phenomenologically, they have evolutionary histories, and they are integrated into complex social and environmental networks.
Consider the mathematical description of embodied cognition. A biological agent's state space includes not just abstract representations but also:
- Proprioceptive state: \(\mathbf{p}(t) \in \mathcal{P}\) representing body configuration
- Affective state: \(\mathbf{a}(t) \in \mathcal{A}\) representing emotional valence
- Temporal context: \(\mathbf{h}(t) = \int_0^t k(t-\tau) \mathbf{s}(\tau) d\tau\) representing historical influence
- Social embedding: \(\mathbf{e}(t) \in \mathcal{E}\) representing relationships with other agents
The total state space becomes:
$$\mathbf{S}(t) = (\mathbf{s}(t), \mathbf{p}(t), \mathbf{a}(t), \mathbf{h}(t), \mathbf{e}(t))$$
This richer state space leads to richer forms of understanding—not because biological systems access eternal truths, but because they face more complex optimisation problems with more varied solution spaces.
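One way to make this composite state concrete is as a plain data structure; every field name and dimension below is hypothetical, intended only to show how much more state an embodied agent carries than a bare representation vector:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class EmbodiedState:
    """Composite agent state S(t) = (s, p, a, h, e) from the decomposition above."""
    representation: np.ndarray   # s(t): abstract task representation
    proprioception: np.ndarray   # p(t): body configuration
    affect: np.ndarray           # a(t): emotional valence / arousal
    history: np.ndarray          # h(t): kernel-weighted summary of past states
    social: np.ndarray           # e(t): embedding of relationships to other agents

    def dimension(self) -> int:
        """Total dimensionality of the state space the agent must organise."""
        return sum(v.size for v in (self.representation, self.proprioception,
                                    self.affect, self.history, self.social))

state = EmbodiedState(representation=np.zeros(128), proprioception=np.zeros(32),
                      affect=np.zeros(4), history=np.zeros(128), social=np.zeros(16))
print(state.dimension())   # 308
```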
Figure 9: Understanding Complexity Spectrum - From narrow AI to rich biological cognition
This suggests that advanced AI systems might eventually develop understanding comparable to biological systems, not by accessing Platonic forms, but by operating within similarly rich constraint structures. Embodied, temporal, socially embedded AI systems might naturally develop the kinds of understanding we associate with biological cognition.
Information-Theoretic Foundations of Convergent Understanding
To further challenge the Platonic interpretation, let's examine the information-theoretic foundations that explain why different systems converge on similar representations without requiring eternal forms.
The rate-distortion theory provides a mathematical framework for understanding optimal information compression. For any information source with distribution \(P(X)\), there exists a rate-distortion function:
$$R(D) = \min_{P(\hat{X}|X): \mathbb{E}[d(X,\hat{X})] \leq D} I(X; \hat{X})$$
where \(D\) is the allowed distortion, \(d(x,\hat{x})\) is a distortion measure, and \(I(X; \hat{X})\) is mutual information.
This function defines the fundamental trade-off between compression (low rate \(R\)) and fidelity (low distortion \(D\)). Importantly, this trade-off is determined entirely by the statistical structure of the source, not by any external "forms."
When different neural networks trained on the same data converge on similar representations, they are approximating the same optimal point on this rate-distortion curve. The convergence reflects mathematical necessity, not metaphysical discovery.
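A concrete case where the trade-off has a closed form: a Gaussian source with variance \(\sigma^2\) under squared-error distortion has \(R(D) = \tfrac{1}{2}\log_2(\sigma^2/D)\) bits per sample for \(0 < D \le \sigma^2\). A small sketch tabulating it:

```python
import math

def gaussian_rate_distortion(variance: float, distortion: float) -> float:
    """R(D) in bits per sample for a Gaussian source under squared-error distortion."""
    if distortion >= variance:
        return 0.0                      # distortion budget so loose that no bits are needed
    return 0.5 * math.log2(variance / distortion)

variance = 1.0
for D in (0.5, 0.25, 0.1, 0.01):
    print(f"D = {D:4.2f}  ->  R(D) = {gaussian_rate_distortion(variance, D):.3f} bits/sample")
```

Halving the allowed distortion always costs the same extra half-bit, for any encoder whatsoever; the curve is a property of the source, not of the system that compresses it.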
The Kolmogorov Complexity Perspective
Kolmogorov complexity provides another lens for understanding convergent representations. The Kolmogorov complexity \(K(x)\) of a string \(x\) is the length of the shortest program that can generate \(x\). While uncomputable in general, it provides a theoretical foundation for optimal compression.
For any dataset \(\mathcal{D}\), there exists a minimal description length that captures its essential structure. Different learning algorithms, when successful, approximate this minimal description. The convergence of neural network representations reflects their approximation of the same underlying minimal description, not access to Platonic ideals.
The mathematical relationship can be expressed as:
$$\lim_{n \to \infty} \frac{1}{n} \log |\{\text{typical sequences of length } n\}| = H[\mathcal{D}]$$
where \(H[\mathcal{D}]\) is the entropy rate of the data source. Successful learning algorithms converge on representations that approach this theoretical limit.
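The asymptotic equipartition property behind this limit can be checked empirically for a simple i.i.d. source: the per-symbol log-probability of a long sampled sequence concentrates around the source entropy. A sketch under invented parameters:

```python
import numpy as np

rng = np.random.default_rng(5)
p = np.array([0.5, 0.3, 0.2])        # an arbitrary i.i.d. source over three symbols
entropy = -np.sum(p * np.log2(p))    # H ~ 1.485 bits/symbol

for n in (100, 10_000, 1_000_000):
    seq = rng.choice(len(p), size=n, p=p)
    per_symbol = -np.log2(p[seq]).sum() / n   # -(1/n) log2 P(sequence)
    print(f"n = {n:>9}:  -(1/n) log2 P = {per_symbol:.4f}   (H = {entropy:.4f})")
```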
The Bottom Line: Beyond Anthropocentric Understanding
AI may not have the high-level post-hoc "feeling" of understanding that we privilege in human cognition—but that doesn't mean it lacks understanding altogether. Just like a bee understands how to construct a hexagon without philosophising about geometry, AI systems may understand in ways that don't match our anthropocentric expectations.
The convergence we observe across biological, artificial, and physical systems points not to eternal Platonic forms, but to the deep mathematical structures that emerge from entropy minimisation in a universe governed by physical laws. These structures are universal not because they exist in some supernatural realm, but because all systems in our universe face similar fundamental constraints.
Understanding, whether in bees, humans, or AI systems, is not about accessing eternal truths. It is about developing representations and behaviours that effectively navigate the complex, constraint-rich environment we all share. The fact that different systems converge on similar solutions tells us something profound about the nature of these constraints—but it doesn't require us to posit supernatural explanations.
The mathematical elegance of this naturalistic view is striking. From the information-theoretic foundations of optimal compression to the thermodynamic constraints on computation, we see a unified picture of intelligence as the art of efficient constraint satisfaction within physical reality.
In the end, what bees know that we don't is not some mystical connection to eternal forms. It is something much more beautiful and much more real: they know how to be perfectly adapted to their place in the natural world, without the burden of conscious reflection on what that adaptation means. They understand through being, not through thinking about being.
Perhaps that is a form of understanding we would do well to learn—not as a retreat from conscious reflection, but as a recognition that understanding extends far beyond the narrow realm of explicit cognition. The universe is full of understanding, from the hexagonal ice crystals in clouds to the spiral galaxies in space. Not because these systems access eternal forms, but because they all participate in the same grand project of organising energy and information in an entropic universe.
The question is not whether AI can achieve understanding, but whether we can expand our definition of understanding to encompass the full spectrum of ways that systems can be intelligently adapted to their worlds. The bees are waiting to teach us, if we are willing to learn.