What is SFT?
SFT is a supervised training stage that fine-tunes a pretrained language model on curated prompt-response examples.
Quick Facts
| Full Name | Supervised Fine-Tuning |
|---|
How It Works
SFT is often the first alignment step after pretraining. It teaches a model how to follow instructions, answer in specific formats, use domain language, or behave consistently for target tasks. The quality of SFT depends more on data quality, coverage, and formatting discipline than on raw dataset size. In production, SFT is useful when prompting alone cannot reliably enforce behavior, but it can also overfit, reduce general capability, or amplify errors if examples are noisy.
Key Characteristics
- Uses labeled input-output examples rather than preference comparisons
- Adapts a pretrained model toward target tasks, styles, and response formats
- Highly sensitive to dataset quality, deduplication, and instruction clarity
- Often precedes preference optimization methods such as RLHF or DPO
- Can be implemented as full fine-tuning or parameter-efficient fine-tuning
Common Use Cases
- Teaching a model to follow internal support-answer formats
- Adapting an LLM to domain terminology and workflows
- Creating a task-specific assistant from curated examples
- Preparing a base model before preference optimization
- Improving structured outputs when prompting is not stable enough
Example
Loading code...Frequently Asked Questions
How is SFT different from pretraining?
Pretraining learns broad language patterns from large corpora. SFT uses curated examples to teach task behavior and instruction following.
Is more SFT data always better?
No. Low-quality or inconsistent examples can harm behavior. A smaller high-quality dataset often beats a larger noisy one.
Does SFT replace prompting?
No. SFT changes model behavior, while prompting still controls task context, constraints, and runtime instructions.
When should SFT be used?
Use it when repeated prompting cannot reliably produce the required style, schema, domain behavior, or task performance.