github: LLMs-from-scratch/ch02/01_main-chapter-code Data sampling with a sliding window
We train LLMs to generate one word at a time, so we prepare the training data so that the next word in a sequence is the target to predi…
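As a rough illustration of the idea, the sketch below builds input-target pairs by sliding a fixed-size window over a list of token IDs, with each target being the input shifted one position to the right. The function name `sliding_window_pairs`, its `context_size` and `stride` parameters, and the whitespace-split toy "tokenizer" are assumptions made for this example and are not taken from the referenced code.

```python
def sliding_window_pairs(token_ids, context_size, stride=1):
    """Return (inputs, targets) chunks; targets are inputs shifted right by one.

    Hypothetical helper for illustration only.
    """
    pairs = []
    # Stop early enough that the shifted target chunk still fits in the sequence.
    for start in range(0, len(token_ids) - context_size, stride):
        inputs = token_ids[start : start + context_size]
        targets = token_ids[start + 1 : start + context_size + 1]
        pairs.append((inputs, targets))
    return pairs


# Toy stand-in for a real tokenizer (assumption): split on whitespace and
# assign integer IDs in order of first appearance.
words = "LLMs learn to predict one word at a time".split()
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
token_ids = [vocab[w] for w in words]

pairs = sliding_window_pairs(token_ids, context_size=4, stride=1)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

With `stride=1` consecutive windows overlap heavily. Setting `stride` equal to `context_size` would give non-overlapping input windows, at the cost of fewer training examples.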