>Each individual prediction has its own separated state - it's not carried over from the previous one.
>they're completely stateless in their architecture
Wait, what? The basic fact that it uses the most recently generated tokens to predict the next one (or rather the distribution over the next one, not a single point) seems to contradict that.
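
To make the distinction concrete, here is a minimal sketch of an autoregressive loop. Everything in it is made up for illustration: `BIGRAMS`, `next_token_distribution` and `generate` are hypothetical names, and the toy "model" only looks at the last token, whereas a real LLM conditions on the whole context window. The point is only that the prediction function can keep no internal state between calls while the generated tokens are still carried forward through the context.

```python
from typing import Dict, List

# Hypothetical toy "model": a lookup table invented for this example.
BIGRAMS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}


def next_token_distribution(context: List[str]) -> Dict[str, float]:
    """Pure function of the context: nothing is remembered between calls."""
    return BIGRAMS[context[-1]]


def generate(prompt: List[str], max_tokens: int = 10) -> List[str]:
    """Greedy decoding: the only thing carried over is the growing context."""
    context = list(prompt)
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        token = max(dist, key=dist.get)  # pick the most likely token
        context.append(token)            # the previous output becomes input
        if token == "</s>":
            break
    return context


if __name__ == "__main__":
    print(generate(["<s>"]))
```

So both statements can hold at once: the function itself has no memory, but each call sees the tokens produced by the earlier calls because they are fed back in as input.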