Explore how large language models use high-dimensional geometry to produce intelligent behavior. We peer into the mathematical wilderness inside transformers, revealing how intuition fails and meaning emerges.
Chapter 1
Arshavir Blackwell, PhD
Welcome to Inside the Black Box. I'm Arshavir Blackwell, and today we continue our exploration of that persistent question: how do large language models actually think? Last time, we examined debugging and circuit-mapping. This episode goes one step further, into the strange geometry that underlies these models.
Arshavir Blackwell, PhD
Inside a transformer, in systems like ChatGPT or Claude, information doesn't live in the familiar three-dimensional world of our senses. It unfolds in vector spaces with thousands, even tens of thousands, of dimensions. Ordinary intuition fails here. It's an alien landscape, one where human reasoning loses its footing, and yet where these models find theirs.
Arshavir Blackwell, PhD
In such spaces, two random directions are almost perfectly orthogonal: nearly at right angles. Each random vector has its own direction, with almost no overlap. Imagine a crowded stadium in which every person somehow has complete personal space.
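That near-orthogonality is easy to check numerically. Here is a minimal NumPy sketch, with random Gaussian vectors standing in for activations, showing how the average cosine similarity between random pairs shrinks as the dimension grows:

```python
import numpy as np

# Minimal sketch: average |cosine similarity| between random vector pairs.
# Values near 0 mean "nearly orthogonal."
rng = np.random.default_rng(0)

def mean_abs_cosine(dim, n_pairs=1000):
    a = rng.standard_normal((n_pairs, dim))
    b = rng.standard_normal((n_pairs, dim))
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(np.mean(np.abs(cos)))

for dim in [3, 100, 5000]:
    print(dim, round(mean_abs_cosine(dim), 4))
# The average overlap shrinks roughly like 1/sqrt(dim): in thousands of
# dimensions, two random directions barely see each other.
```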
Arshavir Blackwell, PhD
But the vectors in a trained neural network aren't random. They're shaped by learning, allowing the model to represent similarity and meaning. Related concepts overlap just enough for the model to capture relationships. That overlap is how the model works.
Arshavir Blackwell, PhD
Almost all of the volume in these high-dimensional spaces lies near the surface, not the center. In five thousand dimensions, more than 99.999% of the volume of a hypersphere is concentrated in a thin shell at the edge. The computations in language models happen out there, where mathematical intuition breaks down.
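The shell claim follows from how volume scales with radius: a ball's volume grows like r to the power d, so the fraction outside radius (1 - eps) is 1 - (1 - eps)^d. A short sketch of that arithmetic:

```python
# Minimal sketch: fraction of a d-dimensional ball's volume lying in a
# thin outer shell. Volume scales like r**d, so the fraction outside
# radius (1 - eps) is 1 - (1 - eps)**d.
def shell_fraction(d, eps=0.01):
    return 1.0 - (1.0 - eps) ** d

for d in [3, 100, 5000]:
    print(d, shell_fraction(d))
# At d = 5000 even a shell only 1% thick holds essentially all the volume:
# the fraction missing is about 1.5e-22.
```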
Arshavir Blackwell, PhD
Modern large language models have residual streams, the superhighways information travels on. These are thousands of dimensions wide; in the largest models, roughly ten to twelve thousand. You might think that means one feature per dimension, neatly separated. But that's far from true.
Arshavir Blackwell, PhD
Through superposition, models pack in far more features than they have dimensions. Features overlap, share space, and sometimes interfere, like an overbooked hotel where multiple guests share rooms. The model constantly disentangles these signals as it thinks.
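Here is a toy sketch of that idea, with random unit directions standing in for learned features (a real model arranges its features far more deliberately); the point is simply that many more features than dimensions can coexist with only small interference:

```python
import numpy as np

# Toy sketch of superposition: assign each "feature" a random unit
# direction in a space with far fewer dimensions than features.
# The directions cannot all be orthogonal, so they interfere slightly.
rng = np.random.default_rng(0)
dim, n_features = 256, 2048                    # 8x more features than dimensions

features = rng.standard_normal((n_features, dim))
features /= np.linalg.norm(features, axis=1, keepdims=True)

overlaps = features @ features.T               # pairwise cosine similarities
np.fill_diagonal(overlaps, 0.0)
print("typical interference:", float(np.abs(overlaps).mean()))
print("worst interference:  ", float(np.abs(overlaps).max()))
# Reading out any one feature picks up faint echoes of the others; that
# residue is the interference the model has to tolerate or clean up.
```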
Arshavir Blackwell, PhD
And this is where geometry gets tricky. Both direction and magnitude matter. Cosine similarity tells us whether two vectors point the same way, often a clue to semantic similarity, while the length of a vector can encode confidence or importance. In high dimensions, distances themselves start to blur: random points all lie roughly the same distance apart. The model has to reason through this concentrated geometry.
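That distance blur can be measured directly. A minimal sketch, again using random points as stand-ins for real activations:

```python
import numpy as np

# Minimal sketch: in high dimensions, pairwise distances between random
# points concentrate (the spread shrinks relative to the mean distance).
rng = np.random.default_rng(0)

def relative_distance_spread(dim, n_points=200):
    x = rng.standard_normal((n_points, dim))
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)   # squared distances
    d = np.sqrt(np.clip(d2, 0.0, None))
    d = d[np.triu_indices(n_points, k=1)]              # unique pairs only
    return float(d.std() / d.mean())

for dim in [3, 100, 5000]:
    print(dim, round(relative_distance_spread(dim), 4))
# The ratio falls toward zero as dim grows: nearly every pair of random
# points sits at roughly the same distance.
```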
Arshavir Blackwell, PhD
So the model is juggling overlapping features, interference, direction, magnitude, and the odd behavior of distance in high-dimensional space. Understanding this geometry is key to reverse-engineering what's happening inside.
Arshavir Blackwell, PhD
This isn't just academic. It matters for AI safety and control. If we can't interpret what happens inside these networks, we can't reliably steer them. Adversarial attacks prove the point: tiny changes, imperceptible to humans, can send the model in completely different directions. Those vulnerabilities live in dimensions we can't see.
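One way to see why high dimensionality helps the attacker is the classic linear argument: a perturbation that is tiny in every coordinate, but aligned with a model direction, adds up across thousands of coordinates. A toy sketch (a bare linear score, not an attack on a real model):

```python
import numpy as np

# Toy sketch of the "many tiny nudges" intuition behind adversarial
# examples: for a linear score w @ x, a perturbation of size eps per
# coordinate, aligned with sign(w), shifts the score by eps * sum(|w|),
# which grows with dimension even though each coordinate barely moves.
rng = np.random.default_rng(0)
dim, eps = 10_000, 0.01

w = rng.standard_normal(dim)        # stand-in for a learned direction
x = rng.standard_normal(dim)        # stand-in for an input representation
x_adv = x + eps * np.sign(w)        # a tiny change in every coordinate

print("original score: ", float(w @ x))
print("perturbed score:", float(w @ x_adv))
print("shift:          ", float(eps * np.abs(w).sum()))   # grows linearly with dim
```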
Arshavir Blackwell, PhD
Alignment has geometric roots too. Recent research on steering vectors shows that we can influence model behavior by nudging activation space itself, making models more truthful or consistent. But to steer well, we first have to understand the terrain.
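In its simplest form, steering is just vector addition in activation space. A minimal sketch with made-up shapes and random stand-ins (a real setup would derive the direction from contrasting prompts and inject it through a model hook):

```python
import numpy as np

# Minimal sketch of activation steering, with hypothetical sizes and
# random stand-ins: nudge a hidden activation along a chosen direction.
rng = np.random.default_rng(0)
d_model = 4096                                      # hypothetical residual width

activation = rng.standard_normal(d_model)           # stand-in for a layer's output
steering_vector = rng.standard_normal(d_model)      # in practice: derived from
steering_vector /= np.linalg.norm(steering_vector)  # contrasting prompt pairs

alpha = 4.0                                         # steering strength, tuned by hand
steered = activation + alpha * steering_vector      # same shape, nudged direction

# A real setup would feed `steered` back into the forward pass with a hook;
# here we only show the geometry: a shift along one direction.
```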
Arshavir Blackwell, PhD
Let's make this concrete. Imagine the model processing "The cat sat on the…" When cat enters, it's already a vector with thousands of coordinates. Its meaning isn't stored in discrete slots like "animal = dimension 47." The features (animal, pet, furry, domesticated) are spread across many dimensions in overlapping patterns. As that vector moves through the network, each layer transforms it.
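A toy sketch of that distributed encoding, with hypothetical feature names and random directions: each feature is a direction, the word vector is a sum of the features it carries, and a dot product reads each one back out:

```python
import numpy as np

# Toy sketch: a feature as a *direction*, not a dedicated coordinate.
# The "cat" vector is a sum of feature directions, and each feature is
# read back with a dot product. (Random toy directions; a real model's
# features are learned, and far more are packed into the space.)
rng = np.random.default_rng(0)
dim = 1024

feature_dirs = {name: rng.standard_normal(dim) / np.sqrt(dim)
                for name in ["animal", "pet", "furry", "domesticated", "vehicle"]}

cat = sum(feature_dirs[n] for n in ["animal", "pet", "furry", "domesticated"])

for name, direction in feature_dirs.items():
    print(f"{name:13s} readout: {cat @ direction: .3f}")
# Features present in "cat" read out near full strength; "vehicle" reads
# out near zero, up to a little interference noise.
```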
Arshavir Blackwell, PhD
Attention mechanisms look at context, the other words nearby, and mix information accordingly. Feed-forward layers reshape the vector itself, adjusting the weight of each feature. Every transformation rotates, stretches, or repositions patterns in this high-dimensional space. The model learned these transformations from training, but at inference time, it's just geometry in motion.
Arshavir Blackwell, PhD
Now take the word sat. The model needs to know what earlier tokens matter. The representation of sat forms a query vector, asking a question. Each previous token carries a key vector, a possible answer. The model compares them by measuring how much they align. The key from cat scores high, not because of grammar rules, but because the model learned that subjects and verbs share this geometric relationship. It's pattern matching in vector space.
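Here is a stripped-down sketch of that comparison (toy numbers, a single head, with the query deliberately built to align with the key for cat so the effect is visible):

```python
import numpy as np

# Minimal sketch of the query/key comparison (toy numbers, one head).
# Each earlier token carries a key vector; the current token ("sat") emits
# a query vector; their alignment becomes an attention weight.
rng = np.random.default_rng(0)
d_head = 64
tokens = ["The", "cat", "sat"]

keys = rng.standard_normal((len(tokens), d_head))
# Contrived so the query points roughly where the key for "cat" points;
# in a trained model this alignment is learned, not hand-built.
query_for_sat = keys[1] + 0.3 * rng.standard_normal(d_head)

scores = keys @ query_for_sat / np.sqrt(d_head)    # scaled dot products
weights = np.exp(scores) / np.exp(scores).sum()    # softmax over tokens

for tok, w in zip(tokens, weights):
    print(f"{tok:4s} attention weight: {w:.2f}")
```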
Arshavir Blackwell, PhD
Different layers specialize in different things. Early layers tend to capture syntax: who's the subject, what's the verb. Middle layers encode relationships: who's doing what to whom. Later layers handle meaning and prediction. By the time we reach the final the, the representation has entered an attractor region, a zone where likely completions like "mat," "couch," or "rug" cluster together. The geometry does the work.
Arshavir Blackwell, PhD
This is where mechanistic interpretability, or MI, comes in. MI is about reverse-engineering these circuits. Researchers trace which components activate for which features, mapping specific computations to specific mechanisms. Take induction heads, discovered by Anthropic's interpretability team. These are small circuits that detect repeated patterns. If the model sees "A B … A," the induction head learns to predict "B." It's a clear, mechanical behavior.
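The behavior itself is simple enough to write out in plain code. This sketch mimics what an induction head computes, not how the attention circuit implements it:

```python
# Minimal sketch of the *behavior* an induction head implements:
# after seeing "A B ... A", predict "B" by looking up what followed
# the previous occurrence of "A".
def induction_prediction(tokens):
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):     # scan earlier positions
        if tokens[i] == last:                    # found an earlier "A"
            return tokens[i + 1]                 # predict the token after it
    return None                                  # no repeat found

print(induction_prediction(["Mr", "Dursley", "said", "that", "Mr"]))  # -> "Dursley"
print(induction_prediction(["A", "B", "C", "A"]))                     # -> "B"
```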
Arshavir Blackwell, PhD
More recently, sparse autoencoders have helped us unpack these entangled representations. Instead of each neuron doing ten things at once, SAEs reveal directions that correspond to individual concepts. It's like putting on glasses that let us see structure that was always there, just hidden.
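In outline, a sparse autoencoder is a wide encoder-decoder with a sparsity penalty. A minimal sketch of the forward pass and loss, with untrained random weights and made-up sizes (real SAEs are trained on huge numbers of activations so that individual latents come to track individual concepts):

```python
import numpy as np

# Minimal sketch of a sparse autoencoder's forward pass and loss,
# with untrained random weights and hypothetical sizes.
rng = np.random.default_rng(0)
d_model, d_latent = 512, 4096        # overcomplete: far more latents than dims

W_enc = 0.02 * rng.standard_normal((d_model, d_latent))
W_dec = 0.02 * rng.standard_normal((d_latent, d_model))
b_enc = np.zeros(d_latent)
b_dec = np.zeros(d_model)

def sae(x, l1_coeff=1e-3):
    z = np.maximum(0.0, x @ W_enc + b_enc)       # ReLU latent code
    x_hat = z @ W_dec + b_dec                    # reconstruct the activation
    loss = np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(z).sum()
    return x_hat, z, loss

x = rng.standard_normal(d_model)                 # stand-in for a model activation
x_hat, z, loss = sae(x)
print("active latents:", int((z > 0).sum()), "of", d_latent)
# With random weights roughly half the latents fire; training with the L1
# penalty drives that count way down, which is what makes the surviving
# directions interpretable one concept at a time.
```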
Arshavir Blackwell, PhD
And this connects to what we call LLM-ology: studying models empirically, almost like cognitive psychology. MI shows us the wiring; LLM-ology shows the behavior. Together, they reveal how meaning moves through the network.
Arshavir Blackwell, PhD
When a model processes a sentence, its internal representation literally travels through high-dimensional space. Sentences with numbers, pronouns, or causal words often trace similar paths. The model has computational tendencies, habits of motion we're only beginning to chart.
Arshavir Blackwell, PhD
We can locate some structure with sparse autoencoders, but much remains hidden: subtle signals like irony, humor, or moral tone may live in remote corners of this space. This is where cognitive science meets geometry, and where interpretability meets mystery.
Arshavir Blackwell, PhD
Taken together, MI reveals the static wiring, while LLM-ology follows the dynamics: the motion of meaning through space. Transformers don't store rules; they sculpt and navigate geometry.
Arshavir Blackwell, PhD
And that geometry, as powerful as it is, brings fragility. Most points in these vast spaces are meaningless noise. The meaningful regions, the semantic manifolds, occupy only a thin sliver of the hypersphere. That's the curse of dimensionality: the expressiveness that makes these models so capable also makes them precarious.
Arshavir Blackwell, PhD
Even visualization struggles here. When we project thousands of dimensions down to two or three, using PCA or t-SNE, the true relationships blur. The result is an aesthetic map, not a faithful one. Even with modern tools, our view of these models is still partial: a sketch of a landscape we can never fully see.
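You can quantify how much a 2-D picture throws away. A minimal PCA-via-SVD sketch on random data (real activations have more structure, but the squeeze is still drastic):

```python
import numpy as np

# Minimal sketch: how much variance a 2-D projection keeps.
# PCA via SVD on random data; projecting thousands of dimensions
# down to two discards almost everything.
rng = np.random.default_rng(0)
n_points, dim = 500, 2048

x = rng.standard_normal((n_points, dim))
x -= x.mean(axis=0)                              # center the data
_, s, _ = np.linalg.svd(x, full_matrices=False)  # singular values
var_ratio = s**2 / np.sum(s**2)                  # variance per component

print("variance kept by the top 2 components:", round(float(var_ratio[:2].sum()), 4))
# Well under 1% here: the 2-D picture is a sketch, not the territory.
```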
Arshavir Blackwell, PhD
And yet, we keep mapping. Every new method, from mechanistic interpretability to sparse autoencoders to the emerging science of LLM-ology, takes us a step closer to understanding how these systems think. I'm Arshavir Blackwell, and this has been Inside the Black Box.
About the podcast
How do Large Language Models like ChatGPT work, anyway?