Transformers are clever. They're what powers Large Language Models, and despite how powerful they may appear, they have some limitations.
A key limitation is that they're feedforward, not recurrent. Yet some computations are naturally written as recurrences: Bayesian updating and simulating automata, for instance. So how do transformers manage both? For Bayesian updating, they use a spectral decomposition to turn the recurrent computation into something that's essentially feedforward. For simulating automata, they use a Krohn-Rhodes decomposition. You may not know what these tricks are, but trust me: they're clever mathematical tricks.

So why wouldn't the transformer use another clever mathematical trick when looking at video? If you want to understand objects that rotate and translate, a powerful way to separate what an object is from its orientation and position is to compute the bispectrum, a higher-order relative of the autocorrelation that comes out of group theory. Why wouldn't transformer activations correlate with the bispectrum, too?
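To make the first trick concrete, here's a minimal sketch of how a spectral decomposition collapses a recurrence into a one-shot computation. The toy 3-state Markov chain below is something I made up for illustration, and rolling a belief forward with no observations is the simplest possible case; it shows the flavor of the trick, not the specific construction any transformer actually learns.

```python
import numpy as np

# Toy 3-state Markov chain (made up for illustration); rows of T sum to 1.
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
p0 = np.array([1.0, 0.0, 0.0])   # initial belief over states
n = 12                           # how many steps to roll the belief forward

# Recurrent version: apply the update one step at a time, n times.
p_rec = p0.copy()
for _ in range(n):
    p_rec = p_rec @ T

# Spectral version: diagonalize T once, then jump straight to step n.
# T = V diag(lam) V^{-1}  =>  T^n = V diag(lam**n) V^{-1}
lam, V = np.linalg.eig(T)
Tn = ((V * lam**n) @ np.linalg.inv(V)).real
p_spec = p0 @ Tn

print(np.allclose(p_rec, p_spec))  # True: same answer, no step-by-step recurrence
```

The loop has to run its n steps in order; the spectral version computes step n directly from the eigenvalues, which is the sense in which the recurrence becomes essentially feedforward.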
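And here's the bispectrum's invariance in its simplest setting: a 1D signal under cyclic shifts. The random signal and shift amount below are arbitrary choices of mine; handling rotations of real objects takes the group bispectrum over the rotation group, but the phase cancellation that makes it work is the same.

```python
import numpy as np

def bispectrum(x):
    """Bispectrum of a 1D signal: B[f1, f2] = X[f1] * X[f2] * conj(X[f1 + f2]),
    where X is the DFT of x and frequencies wrap around mod len(x)."""
    X = np.fft.fft(x)
    n = len(X)
    f1, f2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return X[f1] * X[f2] * np.conj(X[(f1 + f2) % n])

rng = np.random.default_rng(0)
x = rng.normal(size=64)       # an arbitrary "object"
x_shifted = np.roll(x, 17)    # the same object, sitting somewhere else

# The shift changes every Fourier phase, but the bispectrum doesn't budge.
print(np.allclose(np.fft.fft(x), np.fft.fft(x_shifted)))  # False
print(np.allclose(bispectrum(x), bispectrum(x_shifted)))  # True
```

A shift multiplies X[f] by a phase; in the bispectrum the phases from X[f1] and X[f2] are exactly undone by the conjugated X[f1+f2], so what's left describes the signal itself, not where it sits. That's the separation of what the object is from where it is, and the rotation story works the same way one level up.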