Why does SFT on LLaMA3-base produce repeated output during inference until the max token limit is reached?
I tried to build a custom instruct model by running Supervised Fine-Tuning (SFT) on the LLaMA3-base model. I masked the labels so that the loss was computed only on the target utterances. After three epochs, the training loss had dropped to a low level (0.5 to 0.6), while the evaluation loss was roughly twice the training loss. I selected the best checkpoint from before the evaluation loss began to rise.
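For readers unfamiliar with this setup, here is a minimal sketch of what target-only label masking usually looks like. It is illustrative and not the exact training code: `build_labels` and the token IDs are hypothetical, and it assumes the common convention that a label of -100 is ignored by the loss (the default `ignore_index` of PyTorch's `CrossEntropyLoss`). It also assumes an end-of-sequence token is appended to each target and left unmasked, which is one thing worth checking when a model does not stop generating.

```python
IGNORE_INDEX = -100  # assumed ignore value; matches PyTorch CrossEntropyLoss default


def build_labels(prompt_ids, target_ids, eos_id):
    """Return (input_ids, labels) where only the target tokens and EOS carry loss.

    Hypothetical helper for illustration; names and IDs are not from a real pipeline.
    """
    input_ids = list(prompt_ids) + list(target_ids) + [eos_id]
    # Prompt positions are masked out; target and EOS positions keep their IDs.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(target_ids) + [eos_id]
    return input_ids, labels


# Made-up token IDs, for illustration only.
input_ids, labels = build_labels([11, 12, 13], [21, 22], eos_id=2)
print(input_ids)
print(labels)
```

If the EOS position were masked or missing from the labels, the loss would never reward the model for ending a turn, so comparing a few tokenized training samples against this pattern may be a useful check.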
When performing inference on the test dataset, the model generated a reasonable first response but then repeated a particular sentence continuously until the max token limit was reached.
Troubleshooting Attempts:
- Increased the repetition penalty to 1.2, but the repetition persisted.
- Sanity-checked the training by running inference on my own training dataset; there, generation stopped naturally without repetition.
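To make the first attempt concrete, here is a rough sketch of how a repetition penalty is commonly applied to logits, assuming the CTRL-style rule (divide positive logits of already-generated tokens by the penalty, multiply negative ones). `apply_repetition_penalty` and the logit values are hypothetical and chosen only for illustration; a real decoding library may differ in detail.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight tokens that were already generated (assumed CTRL-style rule)."""
    out = list(logits)
    for tok in set(generated_ids):
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


# Made-up logits over a 4-token vocabulary; tokens 0 and 1 were already generated.
logits = [6.0, -1.0, 0.6, 3.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1, 0])
top_token = max(range(len(penalized)), key=lambda i: penalized[i])
print(penalized, top_token)
```

In this made-up case the repeated token still has the highest logit after the penalty, which shows how a penalty of 1.2 can leave the output unchanged when the model is very confident in the repeated token.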
I would appreciate any insights into the cause of this behavior and suggestions for potential solutions.