What is Preference Data?
Preference data is training data that records how alternative model responses to the same prompt or task compare: which response is preferred or rejected, how the responses rank, or how each one is rated.
How It Works
Preference data tells an alignment method what better behavior looks like when multiple answers are possible. It may come from human annotators, expert reviewers, user feedback, AI-assisted labeling, or synthetic comparisons. Unlike supervised fine-tuning (SFT) data, which provides a single target answer, preference data compares alternatives and can capture qualities such as helpfulness, factuality, safety, tone, completeness, and refusal behavior. Its reliability depends on clear labeling guidelines, representative prompts, annotator agreement, and bias control.
Key Characteristics
- Compares alternative responses rather than only providing a single target answer
- Can be represented as chosen-rejected pairs, rankings, ratings, or critiques
- Used by RLHF, reward modeling, DPO, ORPO, KTO, and related methods
- Sensitive to annotator bias, prompt distribution, and guideline ambiguity
- Requires quality control because noisy preferences can train the wrong behavior
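The representations listed above are often interconvertible. As a minimal sketch, assuming a ranking is given as a best-to-worst list, it can be expanded into chosen-rejected pairs. The function name `ranking_to_pairs` and the `prompt`/`chosen`/`rejected` keys are illustrative choices, not a fixed standard.

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_responses):
    """Expand a best-to-worst ranking into chosen-rejected pairs.

    Assumes ranked_responses is ordered from most to least preferred,
    so every earlier item is preferred over every later item.
    """
    pairs = []
    # combinations() keeps input order, so `better` always outranks `worse`.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs

# Hypothetical ranking of three candidate answers (placeholders A, B, C).
pairs = ranking_to_pairs("Explain what a hash map is.", ["A", "B", "C"])
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

A ranking of n responses yields n(n-1)/2 pairs, so long rankings can dominate a dataset unless pairs are sampled or weighted.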
Common Use Cases
- Training a reward model for RLHF
- Creating chosen-rejected pairs for DPO
- Capturing expert preferences for domain assistants
- Filtering or weighting model responses by human feedback
- Evaluating whether a model's style matches product expectations
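To illustrate the reward-model use case, one common formulation is a Bradley-Terry style pairwise loss, which is small when the chosen response scores higher than the rejected one. The sketch below works on plain scalar scores; in practice the scores would come from a trained model, and the function name `pairwise_loss` is hypothetical.

```python
import math

def pairwise_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Written in a numerically stable form so large margins do not overflow.
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, factor out -margin to keep exp() bounded.
    return -margin + math.log1p(math.exp(margin))

# Hypothetical scalar scores for illustration only.
print(pairwise_loss(2.0, 0.0))  # chosen scored higher: small loss
print(pairwise_loss(0.0, 2.0))  # chosen scored lower: larger loss
```

Minimizing this loss over many pairs pushes the scoring function to rank chosen responses above rejected ones, which is why mislabeled pairs directly teach the wrong ordering.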
Example
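A minimal sketch of what a single chosen-rejected record might look like, stored as one JSON line. The `prompt`, `chosen`, and `rejected` field names follow a commonly seen convention; `guideline` and `annotator_id` are hypothetical metadata fields, and the texts are invented for illustration.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PreferencePair:
    prompt: str
    chosen: str        # response the annotator preferred
    rejected: str      # response the annotator judged worse
    guideline: str     # hypothetical: which rubric the judgment followed
    annotator_id: str  # hypothetical: enables agreement checks later

# Illustrative record; the content is made up for this sketch.
example = PreferencePair(
    prompt="How do I reset my home router?",
    chosen="Hold the reset button for about 10 seconds, then wait for the lights to stabilize.",
    rejected="Routers can be reset in various ways.",
    guideline="helpfulness",
    annotator_id="annotator-01",
)

# Serialize as one JSON line, a common storage format for such datasets.
line = json.dumps(asdict(example))
print(line)

# Round-trip to confirm the record survives serialization unchanged.
restored = PreferencePair(**json.loads(line))
```

Keeping the guideline and annotator alongside each pair makes it possible to audit disagreements and trace questionable labels to their source.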
Frequently Asked Questions
How is preference data different from SFT data?
SFT data provides a target response. Preference data compares responses and indicates which one is better under a guideline.
Can preference data be synthetic?
Yes, but synthetic preferences should be validated carefully because they may reflect the judging model's biases and blind spots.
What makes preference data high quality?
Clear rubrics, representative prompts, expert review, annotator agreement checks, and filtering of noisy or contradictory labels all matter.
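One simple form of agreement check is a majority-vote filter. This is a rough sketch that assumes each comparison was labeled by several annotators with "A" or "B"; the 0.75 threshold and the item IDs are arbitrary illustrative values.

```python
from collections import Counter

def majority_label(labels, min_agreement=0.75):
    """Return the majority label if enough annotators agree, else None."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count / len(labels) >= min_agreement else None

# Hypothetical votes: which of two responses each annotator preferred.
votes = {
    "item-1": ["A", "A", "A", "B"],  # 75% agreement: kept
    "item-2": ["A", "B", "B", "A"],  # split vote: dropped
}

kept = {}
for item_id, labels in votes.items():
    label = majority_label(labels)
    if label is not None:
        kept[item_id] = label

print(kept)
```

Dropping low-agreement items trades dataset size for label reliability. Chance-corrected statistics such as Cohen's kappa give a stricter view of agreement than a raw vote share.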
Why does preference data matter for alignment?
It encodes tradeoffs that are hard to express as one correct answer, such as helpfulness, safety, tone, and factual support.