Social Learning and Biases
- Ethan Smith
- Jun 4
- 5 min read

In our pursuit to perfect neural networks, we often look to how humans learn for reference, which has had varying degrees of success. Whether or not these inspirations lead to performant models, the research itself, with both its positive and negative results, is interesting, and I like to think it only adds to our knowledge compendium.
One space of human-inspired learning I am particularly interested in is how humans learn in social settings. Humans rarely, if ever, learn alone. Unlike our typical training methods, we don't chug through data in isolation. Imagine, for a second, if that were the case. We'd go through life seeing, reading, feeling, and hearing our environments. We'll also allow watching others behave, since neural networks take in data like this as well.
However, we'd lose many of the avenues for learning we have: debating with our peers as we try to impose our world models on each other, dealing with cognitive dissonance when we experience mismatches between reality and our notions of the world, assembling the perspectives of those in proximity to us, balancing others' opinions against our own to help gauge where we stand, and so forth. These behaviors, I would argue, do not have well-defined analogues in neural network learning.
One could argue that teacher-student learning constitutes an example of social learning, though it feels more like distillation, a single-channel transfer of knowledge. Rather, I am interested in cases where a neural network learns from its own data stream but is also forced to weigh its developing world model against others with distinctly different ideas. Methods like Bayesian learning do resemble the idea of balancing our priors against new incoming information, though we largely predefine the weights of how we balance this dynamic.
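As a concrete (and textbook, not from this post) illustration of those predefined weights: in the conjugate Gaussian setting, the posterior mean is just a precision-weighted average of the prior mean and the new observation, with the weights fixed by the assumed variances rather than by anything the learner decides on the fly:

```latex
\mu_{\text{post}} = \frac{\frac{1}{\tau_0^2}\,\mu_0 + \frac{1}{\sigma^2}\,x}{\frac{1}{\tau_0^2} + \frac{1}{\sigma^2}}
```

Here mu_0 and tau_0^2 are the prior's mean and variance and sigma^2 is the observation noise; choosing those quantities up front is exactly what predetermines how much a new data point x can move the belief.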
How might a neural network develop its own agency to decide what information is not worth adding to its knowledge base, writing it off as noise or invalid, and what is worth learning? I believe we'd call this bias.
You'll hear many people refer to neural networks as biased. This couldn't be further from the truth. Our models have slight biases towards solutions that generalize well and are "smooth," owing to the dynamics of gradient descent and some other magic we still haven't fully understood. Overall, though, they are very close to unbiased estimators of the distribution of the data they observe. They approximate mirrors of the reality they were shown. When we accuse neural networks of acting in biased ways, what's really happening is that the training set was sampled in a biased fashion, and the model absorbed that bias without adding any of its own.
For instance, when neural networks generate images of professions with a noticeable demographic skew, you'll typically find a very similar skew when performing the same search on Google. It is the bias of the humans who decide how we build our digital world that produces a collection of images on the internet unrepresentative of reality, which the neural network later digests. The distribution of images on the internet is in no way representative of how often things occur in reality, which not only hurts the chance of neural networks learning accurate depictions but also further biases us, as we are shown a distorted picture of reality.

So I hope this example drives home the point that neural networks effectively learn an unbiased reflection of their dataset (as indicated by low cross-entropy or FID), while humans are biased creatures. Unbiasedness seems like a desirable trait for fairness and accuracy in our neural networks. I believe, though, that human biases exist for reasons beyond discrimination and other maladaptive behaviors. Our biases help us select information we consider valuable or coherent with our existing notions and reject noise or dissonance. This kind of intuition feels nontrivial to construct in current neural networks.
Somehow, networks would need to develop their own sense of how to weigh incoming information by utility, preference, and consonance, not just by surprisal, predefined loss clipping, or a KL divergence from another model. Perhaps reinforcement learning could account for some of this, but I'd be interested in how we could better understand this phenomenon and design training methodologies that replicate the ways we acquire knowledge through socialization and use biases to filter what influences our world models.
I have no idea what this should look like, though I have a few mental models of things that could be fun to explore as examples of social learning, despite not having a great outlook for profound results.
Socially Cooperative Learning
A method of training where a fleet of models each sees a different shard of a dataset, but each model also depends on the others to fill in the gaps of what is missing from its own shard.
A possible scenario (a rough code sketch follows the list):
- Initialize N models of the same type (using the same initialization?)
- Split the dataset into N evenly sized shards
- At every training step:
    - each model trains on a batch from its own shard and updates its weights
    - each model takes a step towards all of the other models by linear interpolation
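To make the first scenario concrete, here is a minimal sketch, assuming PyTorch, a classification setup, and hypothetical stand-ins `make_model()` and `shard_loaders` (one dataloader per shard); `social_step` is an illustrative hyperparameter, not a recommendation.

```python
import copy
import torch
import torch.nn.functional as F

N = 4                # number of models in the fleet
social_step = 0.01   # how far each model moves toward its peers each step (a guess)

base = make_model()                                   # hypothetical model constructor
models = [copy.deepcopy(base) for _ in range(N)]      # same initialization for all
opts = [torch.optim.AdamW(m.parameters(), lr=1e-4) for m in models]

for batches in zip(*shard_loaders):                   # one batch per shard per step
    # 1) each model trains on a batch from its own shard
    for model, opt, (x, y) in zip(models, opts, batches):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()

    # 2) each model takes a small interpolation step toward the mean of the others
    with torch.no_grad():
        flat = [torch.nn.utils.parameters_to_vector(m.parameters()) for m in models]
        for i, m in enumerate(models):
            others = torch.stack([flat[j] for j in range(N) if j != i]).mean(dim=0)
            mixed = (1.0 - social_step) * flat[i] + social_step * others
            torch.nn.utils.vector_to_parameters(mixed, m.parameters())
```

A distance-weighted step size, as mentioned in the cons below, would slot in where `others` is averaged.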
Another possible scenario (again sketched in code below):
- Initialize N models of the same type (using the same initialization?)
- Split the dataset into N+1 evenly sized shards; one shard will be communal
- At every training step:
    - each model trains on a batch from its own shard and updates its weights
    - all models make a prediction on a batch from the communal shard
    - the predictions are averaged, and this average becomes the target each model trains on, rather than the ground-truth labels of the communal set
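The communal-shard variant could look something like the sketch below, reusing the setup from the previous snippet; `communal_loader` is a hypothetical dataloader over the shared shard, and the averaged softmax predictions act as soft targets in place of the ground-truth labels.

```python
for batches, (x_comm, _) in zip(zip(*shard_loaders), communal_loader):
    # 1) each model trains on a batch from its own shard, as before
    for model, opt, (x, y) in zip(models, opts, batches):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()

    # 2) all models predict on the communal batch; the average becomes the target
    with torch.no_grad():
        consensus = torch.stack([m(x_comm).softmax(dim=-1) for m in models]).mean(dim=0)

    # 3) each model trains toward the consensus rather than the ground-truth labels
    for model, opt in zip(models, opts):
        opt.zero_grad()
        log_probs = model(x_comm).log_softmax(dim=-1)
        loss = -(consensus * log_probs).sum(dim=-1).mean()  # cross-entropy with soft targets
        loss.backward()
        opt.step()
```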
Cons:
- This is quite expensive in computation, loss of parallelism, and memory for storing multiple models. I would probably only try this with SSL or classifier models.
- It's uncertain how much of an advantage there is here compared to EMA training or post-training model soups/merging.
- There are more hyperparameters for deciding how large of a step models should take towards each other, although there may be a heuristic that fits here, like weighting the step size by distance.
The hope:
- Improved generalization, as indicated by performance on the validation set and by reaching flatter minima.
- It could be interesting for decentralized/async training setups, where model weights exhibit a controlled divergence across devices instead of being identical copies, and where the synchronization step with other models happens less frequently.
In the end, the models would be used as an ensemble, weighing all of their predictions together, or averaged into a single model.

Socially Adversarial Learning
A method of training where 2 generative models face off against each other, attempting to "persuade" the opponent model towards becoming more like itself. In short, a model aims to generate the optimal dataset such that when trained on, the opponent model "agrees" more with it.
A possible scenario (a rough sketch of the reward follows the list):
- Obtain two fine-tuned LLMs, each fine-tuned from the same base model on a different dataset. It's likely important that the models have meaningful distance from each other, i.e., that they are not equivalent solutions that are permutations of each other.
- Obtain a dataset of answerable questions or prompts for the LLMs to respond to.
- Each LLM writes an answer to the question; the output is fed to the opponent to train on.
- The reward each model gets is the cosine similarity between the opponent's update direction and the difference between the two LLMs' weights, scaled by the update's norm.
- To regularize, each LLM also takes a small step towards its original weights (or a KL-style regularization) to prevent collapse and stay somewhat anchored to its original self.
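Here is a minimal sketch of just the reward signal (not the full RL loop), assuming both models' weights have been flattened into vectors, e.g. with `torch.nn.utils.parameters_to_vector`; the function name and exact scaling are my own reading of the idea.

```python
import torch
import torch.nn.functional as F

def persuasion_reward(opp_before: torch.Tensor,
                      opp_after: torch.Tensor,
                      persuader: torch.Tensor) -> torch.Tensor:
    """Reward for the persuading model after the opponent trains on its output.

    High when the opponent's weight update points toward the persuader's
    weights, scaled by how large that update was.
    """
    update = opp_after - opp_before     # opponent's update direction
    toward = persuader - opp_before     # difference between the two models' weights
    cos = F.cosine_similarity(update, toward, dim=0)
    return cos * update.norm()
```

This scalar would then feed into whatever policy-gradient method updates the persuading model, alongside the small anchoring step back toward its original weights.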
Cons:
- This kind of RL would likely have high variance and be difficult to train.
- Outputs from the models could end up appearing like adversarial examples rather than being meaningful. In other words, generating text that is persuasive with respect to affecting another model's weights might not necessarily make a model persuasive.
The hope:
- Not as sure here! There is some hope that models can use adversarial games for continual self-improvement, but I'm not quite sure what to expect.
Another example may live more purely in the data space rather than fussing about with model weights. For instance, two LLMs debate each other, a winner is somehow decided, and this outcome becomes the reward or penalty used to improve each model's debating ability.