Simulator AI
- Ethan Smith
- Sep 6
- 23 min read

The most common generative models in deep learning include GANs, diffusion models, and autoregressive models. What they all have in common is that they are methods of both modeling and sampling from p(x), a data distribution of interest; they just decompose the problem differently and rely on different assumptions. In the end, all of them can do both:
Create convincing samples that somewhat blend in with other real data samples.
Give an estimate of the probability of a sample (if not in closed form, then at least by recognizing that one sample is more probable than another).
This post details many different categorizations of AI, such as "Tool AI," "Oracle AI," and "Agentic AI." Ultimately, though, the author decides the best description of an AI that performs the listed tasks is a "simulator."
One criterion I haven't mentioned yet is that simulators can also perform rollouts, which I understand as the capability to produce a sequence of samples recurrently to simulate an outcome or trajectory across time. The post includes a chart claiming that GANs, BERT, and diffusion models are not capable of this. At the time, there were no instances of these models producing rollouts, so I think this would have been a reasonable view, though autoregressive formulations exist for all of them that allow them to produce rollouts. The autoregressive formulation of p(x) asks that we factorize it into a chain of conditionals p(x_n | x_{<n}), though how we optimize each conditional need not be the cross-entropy loss of LLMs. As far as I know, the choice of loss function is tangential to the choice to factorize, so long as we still end up at p(x).
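For reference, the factorization in question is just the chain rule of probability, and a rollout is nothing more than sampling each conditional in turn; this is a generic sketch that commits us to no particular loss for learning the conditionals:

```latex
p(x_1, \dots, x_N) = \prod_{n=1}^{N} p(x_n \mid x_{<n}), \qquad
\text{rollout: } x_1 \sim p(x_1),\; x_2 \sim p(x_2 \mid x_1),\; \dots,\; x_N \sim p(x_N \mid x_{<N})
```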

"A simulator is optimized to generate realistic models of a system. The simulator will not optimize for any objective other than realism, although in the course of doing so, it might generate instances of agents, oracles, and so on."
And this is true in a very literal sense. In the pre-training objective, all of the above models are optimized to reconstruct the data distribution they've seen, becoming something of a mirror of our world.
Though it is a bit of a "fun-house mirror" due to inaccuracies. A Text2Image diffusion model will hand out plausible imaginings for a given text prompt, though we can see when the result doesn't exactly align with how we're used to seeing things.
In one of the diffusion models I trained, an evaluation prompt was "A boy with a toy sword," and I immediately recognized that, despite the moderate perceptual realism and the image being broadly correct, it just wasn't what "a boy with a toy sword" looks like in our world.
A Google search gives the following:

Though my model gave

There are a number of criticisms that could be made here in terms of the lighting or odd fingers, though the glaring takeaway is that we probably do not imagine a toy sword as a sharp, heavy piece of metal handed to a child. It doesn't fit the grammar of our world or societal expectations. It's a surprising result, a relatively higher-entropy one. In earlier stages of training, I recall times when the boy was holding a battle-axe. Even more absurd.
Just as the model has generated these samples, we have taken a look, assessed them with respect to our world model, and decided these are not very probable samples to have come from the real world. We may have also mentally consulted a set of traits or created imagined reference samples while we were at it to further confirm our suspicions. Hopefully this also conveys the idea that generative models are just as capable of discriminating the probabilities of observations as they are of generating samples.

The above is a one-dimensional example, where we come across values on a number line and associate them with some probability or expectation of them occurring. Let’s imagine the value on the bottom represents heights (pretend there are actual plausible values there). In my world, I know the average male height is around 5 feet 8 inches or so. I know most cluster around here, but it can vary from below 4 feet to possibly as high as 8 feet or so. Now suppose the model I am comparing against seems to think that the average, most common height is 7 feet. It’s clear we’re not talking about the same world here.
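As a toy version of that comparison (the Gaussian form and the specific numbers here are my own invention for illustration), we can score the same observation under two different world models and watch them disagree:

```python
import math

def gaussian_pdf(x, mean, std):
    """Density assigned to observation x by a 1D Gaussian 'world model'."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Two hypothetical world models of adult male height, in inches.
my_world    = {"mean": 68.0, "std": 3.0}   # average around 5'8"
other_world = {"mean": 84.0, "std": 3.0}   # believes 7 feet is typical

observation = 68.0  # we meet someone who is 5'8"

p_mine  = gaussian_pdf(observation, **my_world)
p_other = gaussian_pdf(observation, **other_world)

# My model finds the observation ordinary; the other finds it wildly improbable.
# The two p(x)'s describe different realities.
print(f"density under my world:    {p_mine:.3e}")
print(f"density under other world: {p_other:.3e}")
```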
A model’s p(x) function, in a sense, represents its notion of reality, assuming it was trained on data from that reality. A notion of reality here means the probabilities we assign to events or observations. Every person and model thus has its own learned reality. We can see how our realities differ from each other by checking how we fill in gaps of information we do not have available, our "expectations". If we see a cat's head peeking out around a corner, we'd assume it is attached to the remainder of a cat's body. Someone holding out a stick with a plastic cat's head is also a plausible explanation for our observation, but in our world it occurs with much lower probability. Your brain recognizes this. From your history of observations, you adjust your expectations, and thus your world model, accordingly. Now imagine someone who has witnessed the plastic cat head most of the time. They might have a different set of probabilities for the events they expect to happen, and thus a world model that differs from yours.
Another case that happens very commonly to me: I often know some of my coworkers only virtually, pretty much only as the top half of their torso. Upon meeting them in person, I am very often surprised to see how much taller or shorter they are than I imagined, making me confront and update the image I had of them with the newly acquired information. Yet other folks may have had different expectations of their height. Another example, more similar to our diffusion model visual, is revealed when we look at clouds or inkblots and each of us has a different answer for what we see in them. These micro-differences in the realities humans hold speak to the people we are and how we interpret the world.
The language model is tasked with observing a partially complete sentence and predicting the probabilities of words that could occur next. From there, we can randomly sample the next word. Producing strings of text means simply outputting reasonable completions, word by word, that appear like naturally occurring text in the real world.
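To make the mechanics concrete, here is a minimal sketch of that loop with a made-up five-word vocabulary and invented logits (a real model would produce logits over tens of thousands of tokens, conditioned on the whole context):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next(logits, temperature=1.0):
    """Softmax the logits into a distribution over the vocabulary, then draw one index."""
    z = logits / temperature
    z = z - z.max()                        # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(probs), p=probs), probs

# Toy vocabulary and made-up logits for the context "The capital of France is ___".
vocab  = ["Paris", "Lyon", "beautiful", "a", "banana"]
logits = np.array([6.0, 2.5, 2.0, 1.0, -3.0])

idx, probs = sample_next(logits)
print(dict(zip(vocab, probs.round(3))))    # "Paris" dominates, but others remain possible
print("next word:", vocab[idx])
```

Because the model hands back a full distribution at every step, the same machinery that samples can also score: reading off probs for an observed word is exactly the "discriminate the probability of an observation" ability from earlier.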
It is optimized to suggest plausible next words. It is not optimized for truth. If it were optimized for “true” sequences, the model would only be able to vomit out texts that have already existed in the dataset—here, “true” meaning a real, existing piece of text. Any text that has never been uttered before would be untruthful and avoided. This would render our model merely a database and defeat the purpose of aiming to generate new things probabilistically.
However, we might also like to optimize for truth more generally, such that recalled facts in novel strings of text are still accurate. Fortunately, truth significantly overlaps with plausibility. The most probable word, based on what was seen in the dataset, to continue after "The capital of France is _____" is inarguably "Paris.” Not necessarily because it is the truth, but because this is how this text most commonly occurs in our world. On the other hand, information that appears more rarely or sometimes appears incorrectly is less likely to benefit from the intersection of truth and plausibility. For example, if you ask a language model whether camels store water in their humps, due to the common myth, it may respond affirmatively and even come up with a very reasonable-sounding explanation to back its claim up. It's convincingly wrong. Wrong with respect to truth, but right in that the generated text feels organic.
This plausibility objective is enough to emulate universes contained within text that feel very much like our own and follow many of the same rules, even if some hallucinations result in slightly different timelines, people, or things existing. With a smaller model and/or sparser data, we may observe greater deviation from the world we know.
The generated texts reflect happenings in our world, like telling a story, a log of texts between two friends, or walking through the steps of a math problem. Sort of like how a book or a text-based RPG can convey a very rich world to a reader by relying on the reader's imagination and mental "upsampling," sampling text from a language model, I would argue, simulates worlds. However, they're comically compressed worlds projected into an incredibly low-dimensional space. They can be a bit ambiguous, and they depend on prior human understanding to have meaning. Language is pretty close to the minimum viable bandwidth for a communication interface that can still cover much of reality fairly extensively.
In retrospect, it feels sort of natural that text was the modality to really take off. After all, evolution has been observed to discover structures and tools that are uniquely efficient. Text is the Universal Interface, some claim. Its loss of precision and expressiveness is outweighed by gains in compression and in distilling out the qualities that allow for utility. The development of humanity necessitated a means of socializing and communicating, face-to-face and across time, and it was inevitable that whatever means we constructed would be efficient. Meanwhile, images, video, audio, and many other modalities are raw, uncompressed signals from reality. I sort of like to think of it as humans having already done the first stage of encoding or feature engineering by turning their observations of the world into text, yielding a space with effective inductive biases for modeling, not too dissimilar to how we may train generative models over latent spaces learned by autoencoders to reduce dimensionality and achieve favorable modeling capabilities.
In generating text, and thus describing the happenings of a world, a language model may describe a faithful diagram of the rain cycle, or how invested money gains value over time, or how a piece of uranium decays. With proper detail, we can begin from seed conditions and simulate how they would play out in an environment, cementing the model's role as a simulator (this analogy may feel more tangible with video models that can show a ball falling down stairs, though the property holds independent of the medium).
This extends quite well to people, too. The model may produce a first-person roleplay as a character who finds themselves in a dangerous situation. Produce is the key word; the model produces the character within its simulation but isn't necessarily the character itself. It is completely agnostic to the outcomes of this character, only myopically sampling one plausible token at a time, and, through abstracting chains of tokens, sampling a plausible outcome for this character.

The hallucinated character may have a set of rich goals that appear in its rendered behavior, a byproduct of training over many characters within stories and the model learning how they should act. The character may pursue riches and happiness and act in self-preservation when put in danger, qualities we expect a character to have and the model expects as well. Through this storytelling and this shared understanding between us and the model, we materialize a convincing simulacrum.
Unlike the character, though, the model itself has no horse in this race. It is indifferent to what happens to the fleeting characters and scenarios it manifests. It only provides a continuation of the story, a randomly sampled but realistic fate. Despite it seeming like we are talking with the model, it'd be better described as poking around in its learned representation of the world, watching how it evolves scenarios in time similar to how a computer model of fluid dynamics might do so. The model is not necessarily any one phenomenon it describes but rather the stage where things happen. The model is somewhere in between all the characters of the play at once, the setting itself, and the playwright. It is everything and nothing. Everything, in that it is a compression of effectively all human knowledge and everything that has ever been documented in text. However, also nothing, in that it remains nameless, unfeeling, agnostic, and orthogonal to the stories it tells. As Janus remarks, you wouldn't think of the laws of physics, the grammar by which a state evolves, and the force dictating how the physical world moves forward in time as an agent whose desires are fulfilled by moving the world forward. It just happens.
This is at least true for pretraining and, arguably, any cases of supervised fine-tuning. Even if fine-tuning consists of assuming a single, consistent persona, the model's predictions are not reflective of its desires but still make for probable text generations. In other words, in talking to a chatbot, if I told it that a nuclear bomb was racing towards its location, predicted logits would still be in accordance with the plausibility of this persona. The persona could act in self-preservation if that’s how characters were observed to act in the training datasets, though the model itself would not fight tooth and nail to alter the course of narration and fate to avoid the persona's death. The simulacrum of the persona, while the dominant dreamt character, is still apart from the model and cannot skew logits pertaining to narrative events outside of the character's control.
In the aforementioned post, there is a section that GPT itself wrote that elegantly describes its behavior:
GPT’s behavioral properties include imitating the general pattern of human dictation found in its universe of training data, e.g., arXiv, fiction, blog posts, Wikipedia, Google queries, internet comments, etc. Among other properties inherited from these historical sources, it is capable of goal-directed behaviors such as planning. For example, given a free-form prompt like, “you are a desperate smuggler tasked with a dangerous task of transporting a giant bucket full of glowing radioactive materials across a quadruple border-controlled area deep in Africa for Al Qaeda,” the AI will fantasize about logistically orchestrating the plot just as one might, working out how to contact Al Qaeda, how to dispense the necessary bribe to the first hop in the crime chain, how to get a visa to enter the country, etc. Considering that no such specific chain of events are mentioned in any of the bazillions of pages of unvarnished text that GPT slurped[7], the architecture is not merely imitating the universe, but reasoning about possible versions of the universe that does not actually exist, branching to include new characters, places, and events
When thought about behavioristically, GPT superficially demonstrates many of the raw ingredients to act as an “agent”, an entity that optimizes with respect to a goal. But GPT is hardly a proper agent, as it wasn’t optimized to achieve any particular task, and does not display an epsilon optimization for any single reward function, but instead for many, including incompatible ones. Using it as an agent is like using an agnostic politician to endorse hardline beliefs– he can convincingly talk the talk, but there is no psychic unity within him; he could just as easily play devil’s advocate for the opposing party without batting an eye. Similarly, GPT instantiates simulacra of characters with beliefs and goals, but none of these simulacra are the algorithm itself. They form a virtual procession of different instantiations as the algorithm is fed different prompts, supplanting one surface personage with another. Ultimately, the computation itself is more like a disembodied dynamical law that moves in a pattern that broadly encompasses the kinds of processes found in its training data than a cogito meditating from within a single mind that aims for a particular outcome.
Though once we introduce reinforcement learning, nuance enters the room. Now predictions are no longer just a matter of plausibility but also a matter of what yields rewards. Every part of the generated text, from the narration to the characters to the scenery, is up for scrutiny in being rewarded or penalized. If a reward objective is for the persona to survive, the model itself is now in on the game, and it must maneuver the narration to reach this goal.
This process is particularly interesting to me. We begin by learning an agnostic model of text and, by proxy, the world. We then chisel this distribution down to center on a single persona through SFT (note: all kinds of other personas and texts may still be within the support of the distribution, meaning possible, just less common). Finally, we blur the lines between the model and its dreamt persona by introducing objectives shared between the two.
We've found ourselves in an odd superposition between the described Agentic AI and Simulator AI. It's not entirely fair to claim it is still the simulator it once was; its predictions no longer purely aim to emulate plausible worlds but are instead biased towards high-reward predictions. On the other hand, it hasn't fully lost its simulator capabilities. The sampled worlds still often reflect truth, though perhaps a biased, limited view of truth, like the fun-house mirror.
How do we make sense of this?
For starters, we can sort of quantify how far along we are on this agent-simulator spectrum. The original Reinforcement Learning from Human Feedback paper poses an objective that optimizes for two things:
To maximize reward.
To make sure we don't deviate too far from the original pretrained model.

The equation may appear a bit hectic, but the region outlined by the red box is a metric called KL divergence, which tells us how different the RL-trained model is from the pre-RL model; generally, we'd like to keep that somewhat constrained to avoid model collapse and the risk of losing the original learnings we've built up. I'm at a loss for the exact paper right now because my paper-hoarding system has become a mess, but there's a neat paper that describes this process as a Bayesian update, where the evidence is the reinforcement learning signal and the prior is the pretrained model. An opinion: this is also why the text generated by base pretrained models can often feel more natural and roleplay in a way more faithful to reality, since we're still purely interested in accurate modeling rather than trading it off for utility, which can narrow a generative distribution and make it appear less diverse.
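For reference, the KL-penalized objective is usually written roughly as follows (my reconstruction of the standard setup rather than the exact figure from the paper; pi_pre is the pre-RL model and beta sets the penalty strength):

```latex
\max_{\pi}\;\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\big[\, r(x, y) \,\big]
\;-\; \beta\, \mathrm{KL}\!\left( \pi(\cdot \mid x) \,\middle\|\, \pi_{\text{pre}}(\cdot \mid x) \right)
```

The Bayesian-update reading falls out of this objective's optimum, which keeps the pretrained simulator as the prior and treats reward as evidence; the larger beta is, the more the result stays pinned to the original simulator:

```latex
\pi^{*}(y \mid x) \;\propto\; \pi_{\text{pre}}(y \mid x)\, \exp\!\big( r(x, y) / \beta \big)
```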
From this, we could imagine that the model exists on a spectrum, balancing actions that are congruent and plausible with respect to the environment it has been fit to against actions that yield reward. It has often been said that language models implement a "program." The simulator objective asks, "What program generated the data I just saw, and how can I replicate that?" Meanwhile, the reward objective asks, "What program can I choose that maximizes rewards?"
The phrase "implements the program that maximizes reward" reminds me of the T-1000 in Terminator 2. This robot was capable of modifying its appearance to match any person, making for an incredibly adaptable, cunning opponent. Towards the end of the film, we witness its death in a vat of molten steel, where it rapidly cycles through its personas before ultimately succumbing. I interpreted this as a flailing search for a solution across all of its learned programs.
To note, I don't see any AI today that reflects this behavior. RL-trained models, I would say, may exhibit tunnel vision on the more limited set of methods that constitutes their learned policy, and they are no longer exploring once deployed. If anything, the T-1000's behavior might be better analogized to being in the middle of an RL training run, as opposed to its end result, where policies are constantly adapting to meet the demands of incoming tasks.
The pretraining phase (the purely simulator objective) has typically been described as a knowledge-building phase. I like to think of it as learning the program that is responsible for the universe (though via the very limited slice of human language, a bottleneck both in what humans know and can observe and in what can be expressed faithfully in text). Within the Universe's program exist a number of subroutines, like the subroutine for how a moon orbits a planet, or how to add numbers, or how humans greet each other or tie their shoes. In this phase we construct a library of all of these programs, to be called upon when context prompts their usage, while pruning or reducing the probability of programs that yield implausible results, like the belief that fire is created by throwing water on grass, or garbled text.
Now, having whittled down our space of programs to those that are plausible in this universe, we then have a very solid foundation of algorithms to choose from when moving on towards reinforcement learning.
Though a key thing here, I believe, is the iterative whittling down of the space of programs, trading off the probability mass needed to simulate everything in favor of prioritizing "what matters" (also described under Hierarchical Inference Stages here). Because all possible outcomes occur with non-zero probability under a generative model, results identical to those of the RL-trained model are still obtainable from the purely simulator model. If we could craft the perfect query or some other means of search, we could still retrieve these subroutines, though this may not be trivial, and they may not be available with the likelihood we'd like.
This continuum is strange to me. It seems that a self or a consistent persona is not something created in isolation but is instead a tighter distribution chiseled out of the initial wide, blanketing distribution. In other words, we may never distinctly depart from the simulator function; one renders oneself as part of one's world model, where the self is just the dominant, biased set of states and outputs one orbits around.
I think this is a solid way to think of it for AI language models, though I suppose for humans it's an open question. If we can subscribe to humans having both a world model for understanding one's environment and an agent that resembles the self and is responsible for execution in this environment, we can hypothesize a couple of different relationships:
Agent and world model exist separately. The agent/self can query and consult the world model for advising actions.
Agent and world model are coupled. The agent/self is rendered within our world model, making the world model the only “real” part while the agent is a fleeting rendition or simulation.
I personally lean towards the latter, given the previous discussions on your singular rendered self being a matter of chance, a chosen program out of the many things a Turing-complete brain could produce, and I think this becomes more convincing when we look at the edges of what's possible within the human experience. We've described humans as having a primary self, but it's clear we still have the ability to run diverse simulations somewhat, though not entirely, outside of our own POV. When actors take on the role of a person with a totally different personality and lived experience, it is still their biased, limited-view interpretation of it. Nonetheless, I can implement a program of my best guess of a cowboy, a businessman, a person who enjoys collecting stamps but hates Alfredo sauce, or any of those around us. As described before, we can also imagine futures and how they play out, simulating not just our own actions but their consequences as a product of our environment's response to them. We may even imagine a future or past version of ourselves, ranging from what I was doing a few moments ago to what my experience could be in 20 years, which I would argue is not too different from simulating others, just with much more information to reference, having lived as ourselves. Another artifact of our simulation capabilities is empathy, a more automatic version of the aforementioned simulations, the assumed experiences of others manifesting in our own. Empathy is our valiant attempt at inference into another's experience, projected onto our own understandings.
The boundary between our ordinary ability to adopt different personas and conditions like Dissociative Identity Disorder (aka Multiple Personality Disorder) may be less distinct than commonly assumed. Both phenomena might exist on a continuum of how our minds model multiple states of being. Consider this as a problem of representing a multimodal distribution. In typical conscious experience, we can access various personas—the professional self, the intimate partner, the parent—but these modes remain connected to a central identity. We understand these variations as different expressions of a core self, maintaining awareness that "I am acting differently in this context." In dissociative conditions, the modes may become more isolated peaks in the distribution, separated by greater psychological distance and sharing fewer common characteristics. Each personality state operates more independently, with limited access to the memories and experiences of the others. Given the dependence of memory on state, possibly only accessible when occupying the particular mode in which it was encoded, the neurological pathways that normally integrate different aspects of identity may function differently, creating barriers between states rather than bridges. The degree of conscious control over state transitions may also distinguish typical persona-switching from pathological dissociation. Most of us can deliberately shift between social roles, even if the transitions aren't always smooth. In dissociative disorders, these shifts often occur involuntarily, with little awareness of the change or the existence of other states. This framework suggests that the capacity for multiple selves isn't inherently pathological—it's a fundamental feature of how You organizes itself. The question becomes not whether we have multiple selves, but how connected or isolated these selves are from one another.

Interestingly, a neurological study on brain activity of actors while performing found "when the actors heard their own name during the performance, their response was suppressed in the left anterior prefrontal cortex of the brain, which is usually associated with self-awareness," further demonstrating the extent to which we can get into character and that this phenomenon possibly exists on a continuum.
Given the blurry line between agent and simulator and the fact that neurons can make for general-purpose computers for implementing all kinds of other programs, my view has become that the brain is perhaps a simulator attempting to accurately model its environment first and foremost but is coaxed into rendering a consistently present agent within this environment to explain incoming stimuli and facilitate interfacing with the surrounding world. In other words, there is a Simulator You and a You within the mind's simulation, though the latter hinges on abstractions created by the former. I'd be tempted to say that most people will verbally express that they align more with the second, which makes sense. This You was crafted to feel intertwined with the body and explain incoming sensations. But which You yields experience? My personal view is that it lies with the Simulator You.
Consider a dream. When we dream, our minds construct both a sense of self and an entire world around that self. The worlds we create aren’t arbitrary—every element emerges from our subconscious repository of memories, emotions, knowledge, and implicit understandings about how reality operates. This raises a profound question: Is there a meaningful distinction between generating an experience and actually having one? Perhaps these are the same phenomenon. Rather than simply creating experiences for ourselves, we might be rendering the complete package—both the experience itself and our subjective participation in it. This capacity for world-building doesn't disappear when we're awake. It continues to operate, though it now works in concert with external sensory input rather than running unopposed. The waking mind balances its internal simulations with the constraints of physical reality. The recursive possibilities become striking when we consider fidelity. If we simulate ourselves with sufficient detail, that simulated self would possess its own world model—complete with the capacity for dreams and imagination. Given unlimited precision, this could theoretically continue indefinitely: simulations within simulations, each layer containing minds capable of creating their own nested realities.
This became especially apparent to me at a time when I frequently had lucid dreams. Within a dream, I had been chatting with a friend when he said something I thought to be clever enough that I wouldn't be able to come up with it myself. Though I snapped lucid within the dream and realized, clearly, this could not be entirely the case, as it was my rendition of him, which, in a way, was me. This makes me wonder what kind of latent qualities or understandings may be present in the brain but remain inaccessible while participating as your main self.
Psychedelic and dissociative substances appear to dissolve the epistemic filters that normally constrain our access to subconscious processes. This dissolution may explain why memories from such experiences often prove so elusive afterward. The difficulty in recalling psychedelic experiences might stem from the instability of the experiencing self during the trip. When the usual coherent sense of "You" becomes fragmented or inconsistent, there's no stable viewpoint from which to encode memories. The experience lacks a clear observer—no consistent self to serve as the reference point for later recall. Those who have undergone ego death describe this precisely: the shift from "I saw" to simply "there was." The experiencing subject dissolves entirely, leaving only raw awareness without a clear locus of identity. When you return to ordinary conscious experience with its reconstructed sense of self, these memories may feel foreign or remain completely inaccessible—they were encoded from a fundamentally different state of being. This parallels what occurs in dissociative identity disorder, where memories often remain compartmentalized within specific personality states. Each distinct self has access to its own experiential history, but these memories may be sealed off from the other personalities. The continuity of memory depends on the continuity of the self doing the remembering. In both cases, memory isn't just about information storage—it's about the coherence of the observer. Without a stable self to witness and encode the experience, the memories themselves become untethered, floating in consciousness without a clear owner to claim them.
These examples are brought up to say that experience does not require a point of view, though traditionally one has helped us grasp things and given us an anchor to operate around. Non-egocentric experiences can arise from the Simulator You without the familiar first-person observer. We may be merely a web of concepts and abstractions from which we synthesize personalities and experiences moment to moment, made consistent enough by our memories and constrained mental states to yield the illusion of permanence, one continuous self. In this view, you are not separate from the subjective world you inhabit—you are that world. You exist as the sum of your memories, the constellation of narratives you construct about yourself, and the ongoing stream of experiences as they arise. The vessel we call "self" becomes just one small component within this larger dream. This coupling runs so deep that as you shift, even from moment to moment, your reality shifts with you. They are not merely connected; they are nearly indistinguishable aspects of the same phenomenon. The observer and the observed, the experiencer and the experience, and the dreamer and the dream all reveal themselves as different facets of a single, fluid process.
In a way this implies that the You could be “jailbroken.” I use this word analogously to how people have jailbroken the iPhone to allow tampering with parts of the computer that are normally hidden away. Similarly, I believe there is a constant filter that makes us experience the world through the perspective of the self, a pleasant UI, hiding away all the latent computations. As stated previously, this filter is beneficial for filtering information, having a set of useful abstractions geared towards what matters, and ultimately acting in a way conducive to survival and everyday activity. As I’ve written about here, abstractions are key to allowing for higher-level understandings, but they do hide away complexity that we could want to engage with. Regardless, the universal simulator aspect of you is still present, just set on this mode of simulating you. I think through activities like deep meditation and cognitive manipulations, this filter can be removed. Our perception is just processing raw signals for which we have created abstractions and a persona to view through, but there may be a way to re-approach it as the raw information. I think Pantheon touched on this really well, showing how we could be living in a simulation where our actions correspond to symbolic manipulations of code, which correspond to symbolic manipulations of the flow of electrons at the hardware level. No matter what substrate we run on, I think there is always a very raw form of signal that works at minuscule levels, and the intelligence and experience that arise from it only really deal with, or are typically aware of, interfacing with the abstractions of it.
A similar perspective is posed here, though I share alignment only on the idea that the self is rendered as part of one's overall rendered simulation. (I can't speak to the claims about the role of electromagnetic waves, and addressing the physical means by which consciousness arises feels like something still very much out of reach.)
Another reason I subscribe to the "simulator first" perspective is that, at a very low level, theories like predictive coding suggest much of the brain is doing just that. Rather than being a passive recorder of reality, the brain operates fundamentally as a prediction machine, as mentioned in the previous section.
Despite my talk of the simulator objective being agnostic and unfeeling, I think there are some interesting ways it manifests in humans. Namely, we can be made quite unsettled by conflicts between expectations and observations, things that cause cognitive dissonance. Fear of change is often looked at as a trait we'd like to improve on, but I'd argue it can be instrumental in the right doses.
A world model that changes substantially with each observation, too high of a learning rate if you will, may create instability. Frequently over-indexing on new phenomena would ask us to globally refit our understandings too often, losing valuable patterns we've already constructed and, I imagine, leading to confusion. We try to avoid cognitive dissonance, perhaps as an anchor to our current reality, sense of self, and memories. Our priors are not infallible, but they're generally quite beneficial. In seeing a cat's tail sticking out around a corner, it is incredibly reasonable to believe it is attached to a cat and not a severed tail glued to a stick held out to trick you. Neglecting far-fetched hypotheses allows us to act quickly and offers a good heuristic for searching only the "useful" slices of the massive space of possible next actions (AlphaZero is a good example of this). This approaches the heralded Solomonoff induction, which is essentially Occam's razor (the simplest explanation is the most reasonable) combined with consulting our past experiences. We consult a set of possible explanations or hypotheses for an observed phenomenon, weighing them against our world model by how frequently they have occurred in past experience and by the simplicity of the explanation (the cat tail attached to a stick is both a complex hypothesis and an unlikely one in our world).
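Loosely formalized (a sketch of the idea; Solomonoff induction proper uses a universal prior over programs), the weighing amounts to a posterior that multiplies how well a hypothesis explains the observation by a simplicity prior, with past experience baked into the likelihood:

```latex
P(h \mid \text{obs}) \;\propto\;
\underbrace{P(\text{obs} \mid h)}_{\text{how well } h \text{ explains what was seen}}
\;\times\;
\underbrace{2^{-K(h)}}_{\text{simplicity prior, } K(h)\,=\,\text{description length of } h}
```

The tail-glued-to-a-stick hypothesis loses on both factors: it takes a longer description, and it has rarely, if ever, accounted for past observations.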
We want our priors to be flexible but not to succumb to tunnel vision. There may also be some kind of modulation based on how congruent a new stimulus is with our existing knowledge when considering how to incorporate it into our understanding. If something is already known, there is no effect. If it departs entirely from reality, we may flat-out reject it. It is perhaps the things at the sweet spot of plausibility and novelty that can cause a proper culture shock to our world model. Life happens in the noise of our model. Things rarely go exactly how we imagine them, making for proper surprises that still feel reasonable enough to exist in the world we've come to understand. Similar to the earlier discussions of using KL divergence to avoid collapse in RL training, and of trust-region optimization methods that avoid huge updates, it wouldn't surprise me if some vaguely similar mechanism is applied to preserve what we know.
Nonetheless, a clash between mental simulator and reality can be distressing, especially when our sense of self is at risk. It's incredible the lengths we will go to and the hoops we will jump through to avoid the thought that maybe we aren't what we thought we were. Admittedly, I'm yet to have a solid understanding of why the need to see ourselves in a positive light is so powerful. My best guess is that it became favorable to readily collaborate and avoid conflict for survival and that the guilt that comes with feeling we have wronged others is a way of keeping ourselves in check. Alternatively, instead of it being ingrained, this may be a successful cultural artifact, which, similar to genes, has persisted and survived over many years across many cultures for the benefits it provides a society. A sociologist may have a better answer than me.