blog post | Ethansmith2000

Giving Models Their Own Sense of Taste

A friend of mine asked me, "How can we give models their own sense of taste?" Furthermore, he remarked that typically we use humans as...

Ethan Smith

Oct 12, 2025 · 6 min read

Social Learning and Biases

In our pursuit to perfect neural networks, we often look to how humans learn for reference, which has had varying degrees of success....

Ethan Smith

Jun 4, 2025 · 5 min read

Recurrent Parameterless Attention is a Consensus Algorithm

In another post, I wrote about parameterless (boneless) attention as a means of mixing information across datapoints weighted by their...

Ethan Smith

May 24, 2025 · 2 min read

The mean preference is a bad estimate of preferences.

I felt compelled to make this post after seeing yet another reinforcement learning paper for diffusion models that fits the reward function spectacularly well, but whose actual results look terrible, collapse to a single style, and are otherwise kitsch. This is not at all a criticism of the author's work. They proposed a new method of reinforcement learning, and it successfully fit the objective quite well. But the images all have the same tired, f...

Ethan Smith

May 18, 2025 · 6 min read

How do we tackle noisy recognition?

Something I've been thinking about a lot lately is how humans handle noisy recognition. Maybe you recognize the image above; if not, you...

Ethan Smith

Apr 9, 2025 · 13 min read

Boneless Attention and Low Rank Attention Layers

I’ve seen a lot of convoluted tutorials on attention, but nothing made it click for me more than understanding it as mixing a projected...

Ethan Smith

Mar 23, 2025 · 8 min read

The Need for Relative Optimizers | Hypothesis on Muon

Presently, most optimizers used in deep learning do not explicitly accommodate their updates with respect to the expected range of...

Ethan Smith

Mar 18, 2025 · 11 min read

Softmax Attention is a Fluke

Attention is the magic ingredient of modern neural networks. It is the core of what has...

Ethan Smith

Mar 13, 2025 · 10 min read

Discrete Diffusion Sudoku and Diffusion Lore

A short attempt at a small portion of the diffusion family tree: https://www.canva.com/design/DAGgnVB3x2s/b52Y3Kg-frWdRlPzI3_5pA/edit?utm_...

Ethan Smith

Mar 3, 2025 · 5 min read

How I like to think about diffusion

It's a bit hard to see in the diagram but in addition to being convolved with a gaussian, these points are also drifting towards zero....

Ethan Smith

Jan 26, 2025 · 4 min read

Classifier free guidance and reinforcement learning

https://sweet-hall-e72.notion.site/Classifier-Free-Guidance-to-Approximate-RL-9f78c02801c6434da61f37c8d843c5bf

Ethan Smith

Jan 26, 2025 · 1 min read

Why are Modern Neural Nets the way they are? And Hidden Hypernetworks.

https://sweet-hall-e72.notion.site/Why-are-Modern-Neural-Nets-the-way-they-are-And-Hidden-Hypernetworks-6c7195709e7b4abbada921875a951c54

Ethan Smith

Oct 6, 2024 · 1 min read

Do Diffusion Transformers Deserve The Hype?

https://sweet-hall-e72.notion.site/Do-Diffusion-Transformers-Deserve-The-Hype-9b9ca7bead374b47aac96558714c203b

Ethan Smith

Jul 28, 2024 · 1 min read

Automated LoRA Discovery and Teaching Neural Networks to make Neural Networks

https://sweet-hall-e72.notion.site/Automated-LoRA-Discovery-and-Teaching-Neural-Networks-to-make-Neural-Networks-22aa3b5ad66e4bc985ff2c93...

Ethan Smith

May 26, 2024 · 1 min read

Diffusion and Autoregressive Models for Learning to Solve Mazes

https://sweet-hall-e72.notion.site/Diffusion-and-Autoregressive-Models-for-Learning-to-Solve-Mazes-c3bc4bcdfa304ecd9531ee5445a4da66

Ethan Smith

May 21, 2024 · 1 min read

Traversing through CLIP Space, PCA and Latent Directions

https://sweet-hall-e72.notion.site/Traversing-through-CLIP-Space-PCA-and-Latent-Directions-b898932e13684d58957405b4a2747a79

Ethan Smith

May 6, 2024 · 1 min read

Learning Space Filling Curves with Autoencoders

https://sweet-hall-e72.notion.site/Learning-Space-Filling-Curves-with-Autoencoders-e39e41ce75894c3a8fecfee0f3bbfb23?pvs=4

Ethan Smith

Apr 14, 2024 · 1 min read

Mimicking Diffusion Models by Sequencing Frequency Coefficients

https://sweet-hall-e72.notion.site/Mimicking-Diffusion-Models-by-Sequencing-Frequency-Coefficients-8e5a60e876d640c390369627d55330b1

Ethan Smith

Mar 13, 2024 · 1 min read

ContrastiveDPO for Diffusion, Generalizing DPO to multiple items

https://sweet-hall-e72.notion.site/ContrastiveDPO-for-Diffusion-Generalizing-DPO-to-multiple-items-PART1-226b3746aa4d4ff9995d1e26b38a9674

Ethan Smith

Mar 8, 2024 · 1 min read

Dipole Attention: Opposites May Be Deep Connections

Image from: https://twitter.com/toshi2fly/status/911306344376012800 Post: https://sweet-hall-e72.notion.site/Dipole-Attention-Opposites-M...

Ethan Smith

Mar 5, 2024 · 1 min read