RAFT: Reward rAnked FineTuning - A New Approach to Generative Model Alignment
Summary
This post is an explanation of the paper RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment.
Generative foundation models, such as Large Language Models (LLMs) and diffusion models, have revolutionized AI by achieving human-like content generation. However, they often suffer from:
- Biases – Models can learn and reinforce societal biases present in the training data (e.g., gender, racial, or cultural stereotypes).
- Ethical Concerns – AI-generated content can be misused for misinformation, deepfakes, or spreading harmful narratives.
- Alignment Issues – The model’s behavior may not match human intent, leading to unintended or harmful outputs even when the training objective is well-intentioned.
Traditionally, Reinforcement Learning from Human Feedback (RLHF) has been used to align these models, but RLHF comes with stability and efficiency challenges. To address these limitations, RAFT (Reward rAnked FineTuning) was introduced as a more stable and scalable alternative. Instead of optimizing a policy with reinforcement learning, RAFT samples multiple candidate outputs, ranks them with a reward model, and fine-tunes the generative model only on the highest-reward samples, allowing it to improve without complex reinforcement learning setups.
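To make the data-collection step concrete, here is a minimal sketch of one RAFT-style iteration: sample k candidates per prompt, score them with a reward function, and keep the best candidate as supervised fine-tuning data. The `generate` and `reward_fn` callables (and the toy stand-ins) are placeholders I introduce for illustration, not APIs from the paper; the actual fine-tuning step is omitted.

```python
import random
from typing import Callable, List, Tuple

def raft_iteration(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],   # hypothetical: returns k samples for a prompt
    reward_fn: Callable[[str, str], float],      # hypothetical: scores a (prompt, response) pair
    k: int = 8,
) -> List[Tuple[str, str]]:
    """One reward-ranked data-collection pass.

    For each prompt, sample k candidate responses, score them with the reward
    function, and keep only the top-ranked response. The returned pairs form
    the supervised fine-tuning set for the next round.
    """
    sft_data = []
    for prompt in prompts:
        candidates = generate(prompt, k)
        best = max(candidates, key=lambda response: reward_fn(prompt, response))
        sft_data.append((prompt, best))
    return sft_data

# Toy stand-ins so the sketch runs end to end; real usage would plug in an
# LLM's sampling method and a trained reward model.
def toy_generate(prompt: str, k: int) -> List[str]:
    return [f"{prompt} -> candidate {i}" for i in range(k)]

def toy_reward(prompt: str, response: str) -> float:
    return random.random()

if __name__ == "__main__":
    data = raft_iteration(["Explain RAFT in one sentence."], toy_generate, toy_reward, k=4)
    print(data)
```

Because the model is only ever trained with a standard supervised objective on filtered samples, this loop avoids the policy-gradient machinery (and much of the instability) that RLHF methods like PPO require.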