orbital-decay an hour ago

>Each individual prediction has its own separated state - it's not carried over from the previous one.

>they're completely stateless in their architecture

Wait, what? The basic fact that it uses the most recently generated tokens to predict the next one (or rather a distribution over the next one, not a single point) seems to contradict that.
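
To make the objection concrete, here's a toy sketch of an autoregressive loop. Everything in it is made up for illustration: `next_token_distribution` is a hypothetical stand-in for a forward pass, and its weighting rule is arbitrary, not how any real model works. The function itself keeps no state between calls, but each call still depends on the earlier outputs because they are appended to the context it receives.

```python
import random

VOCAB = ["a", "b", "c"]  # hypothetical three-token vocabulary


def next_token_distribution(context):
    """Toy stand-in for a forward pass: a pure function of the context."""
    # Arbitrary illustrative rule: tokens seen in the last two positions
    # get less weight. No hidden state is kept between calls.
    recent = context[-2:]
    weights = [1.0 / (1 + recent.count(tok)) for tok in VOCAB]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(VOCAB, weights)}


def generate(prompt, steps, seed=0):
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(steps):
        dist = next_token_distribution(context)  # a distribution, not a point
        token = rng.choices(list(dist), weights=list(dist.values()))[0]
        context.append(token)  # the sampled token feeds the next prediction
    return context
```

So whether you call this "stateless" seems to depend on where you draw the line: the function carries nothing over, but the loop does, through the tokens.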