TLDR: Proximal Supervised Fine-Tuning (PSFT) is a novel method for fine-tuning large language models (LLMs) that addresses the common issues of poor generalization and reduced exploration capacity associated with standard Supervised Fine-Tuning (SFT). Inspired by reinforcement learning techniques, PSFT introduces a ‘trust-region’ mechanism to constrain policy updates, preventing overfitting and entropy collapse. Experiments demonstrate that PSFT maintains competitive in-domain performance while significantly improving out-of-domain generalization and providing a more robust foundation for subsequent optimization stages like Reinforcement Learning and Direct Preference Optimization.
Supervised Fine-Tuning (SFT) is a common technique used to adapt large foundation models for specific tasks or domains. While efficient and straightforward, SFT often faces challenges such as poor generalization, where models lose their broader capabilities after being fine-tuned on new data. This limitation can also lead to a reduced ability for the model to explore new solutions, a phenomenon sometimes referred to as ‘entropy collapse’.
Inspired by advanced reinforcement learning (RL) algorithms like Trust-Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), researchers have introduced a new approach called Proximal Supervised Fine-Tuning (PSFT). This method aims to overcome the shortcomings of traditional SFT by incorporating a ‘trust-region’ mechanism, similar to those used in RL, to carefully control how much the model’s ‘policy’ (its decision-making process) changes during fine-tuning. By doing so, PSFT seeks to stabilize the optimization process, improve generalization, and maintain the model’s capacity for exploration.
How PSFT Works
At its core, PSFT reinterprets SFT as a specific type of policy gradient method in which the model learns from a fixed dataset of ‘correct’ actions. Building on this view, PSFT introduces a ‘clipped surrogate objective’, a mathematical formula that limits how drastically the model’s predictions can change from one training step to the next. The clipping acts like a soft boundary that keeps each update within an acceptable range, preventing the model from making overly confident or destructive changes to its internal workings and thereby preserving its existing knowledge and general capabilities.
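To make the idea concrete, here is a minimal PyTorch sketch of what such a clipped objective over SFT target tokens could look like. The function name psft_loss, the clip_eps value of 0.2, and the per-token log-probability inputs are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a PSFT-style clipped objective (illustrative, not the
# paper's reference implementation).
import torch

def psft_loss(logp_new: torch.Tensor,
              logp_old: torch.Tensor,
              clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate loss over the target tokens of the SFT dataset.

    logp_new: log-probabilities of the target tokens under the current policy.
    logp_old: log-probabilities of the same tokens under the policy before
              the update (treated as constants, i.e. detached).
    """
    # Probability ratio between the current and the old policy, as in PPO.
    ratio = torch.exp(logp_new - logp_old.detach())

    # In SFT every demonstrated token is treated as a 'correct' action,
    # which corresponds to a fixed positive advantage of 1.
    unclipped = ratio
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)

    # Maximizing the pessimistic (minimum) surrogate keeps each update
    # inside the trust region; its negation is returned as a loss.
    return -torch.min(unclipped, clipped).mean()
```

While a token’s ratio stays inside the clip range, the gradient closely matches an ordinary likelihood (SFT) update on that token; once the ratio exceeds the upper bound, the clipped term takes over and that token’s gradient is cut off, which is what keeps individual updates bounded.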
The paper also suggests that an initial ‘warm-up’ phase using standard SFT can further enhance PSFT’s performance, helping the model to better align with the training data before the trust-region constraints are fully applied.
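The exact schedule is not spelled out here, but one plausible way to wire the warm-up and the clipped objective together is sketched below. The names model.token_logprobs, warmup_steps, and inner_epochs are hypothetical helpers for illustration; the clipped loss mirrors the psft_loss sketch above.

```python
# Illustrative training-loop skeleton: plain SFT warm-up, then PSFT-style
# clipped updates. Helper names and hyperparameter values are assumptions.
import torch

def train(model, batches, optimizer,
          warmup_steps: int = 500, inner_epochs: int = 2, clip_eps: float = 0.2):
    for step, batch in enumerate(batches):
        if step < warmup_steps:
            # Warm-up: ordinary SFT, i.e. maximize likelihood of the demonstrations.
            loss = -model.token_logprobs(batch).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            continue

        # Snapshot the 'old' policy's log-probs on this batch before updating.
        with torch.no_grad():
            logp_old = model.token_logprobs(batch)

        # A few passes over the same batch, PPO-style; the clip keeps each
        # update close to the snapshot taken above.
        for _ in range(inner_epochs):
            logp_new = model.token_logprobs(batch)
            ratio = torch.exp(logp_new - logp_old)
            clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
            loss = -torch.min(ratio, clipped).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```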
Key Advantages and Findings
Experiments conducted across diverse domains, including mathematical reasoning and human-value alignment, highlight several significant benefits of PSFT:
- Improved Generalization: PSFT consistently outperforms standard SFT in out-of-domain generalization, meaning models fine-tuned with PSFT are better at handling tasks they haven’t been specifically trained on.
- Stable Training: Unlike SFT, which can show sharp declines in entropy (indicating overfitting), PSFT maintains a smoother entropy curve throughout training. This stability prevents ‘entropy collapse’ and allows for more prolonged, effective fine-tuning (a simple way of monitoring this entropy curve is sketched after this list).
- Better Foundation for Post-Training: PSFT-tuned models serve as a superior starting point for subsequent optimization stages, such as Reinforcement Learning (RL) or Direct Preference Optimization (DPO). Models initialized with PSFT show greater potential for exploration and achieve better ultimate performance in these later stages.
- Reduced Alignment Tax: In human alignment tasks, PSFT effectively reduces the ‘alignment tax’ – the trade-off where models lose general capabilities when aligned to specific human values. PSFT helps models maintain their broad abilities while still achieving alignment.
- Robustness Across Models: PSFT demonstrates consistent improvements across different base models, showcasing its broad applicability.
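For the entropy curves mentioned above, the quantity typically tracked is the average entropy of the model’s next-token distribution. A diagnostic sketch of how one might log it during training follows; the helper name and tensor shapes are assumptions for illustration.

```python
# Diagnostic sketch for tracking policy entropy during fine-tuning; a curve
# that drops sharply is the 'entropy collapse' symptom described above.
import torch
import torch.nn.functional as F

def mean_token_entropy(logits: torch.Tensor) -> float:
    """Average per-token entropy of the next-token distribution.

    logits: tensor of shape (batch, seq_len, vocab_size) from a forward pass.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)   # shape: (batch, seq_len)
    return entropy.mean().item()
```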
The research also provides insights into the types of tokens (words or sub-words) that are most affected by PSFT’s clipping mechanism. These often include uncertain words or phrases that represent ‘long thinking patterns,’ which are crucial for complex reasoning and are gradually and smoothly learned by PSFT without disrupting the model’s general knowledge.
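To see which tokens are actually being constrained, a purely diagnostic sketch along the following lines could flag the positions whose probability ratio leaves the trust region; the function name and the clip_eps value are again illustrative assumptions.

```python
# Diagnostic sketch: mark target tokens whose probability ratio left the
# trust region, using the same ratio definition as the loss sketches above.
import torch

def clipped_token_mask(logp_new: torch.Tensor,
                       logp_old: torch.Tensor,
                       clip_eps: float = 0.2) -> torch.Tensor:
    """Boolean mask of tokens whose ratio falls outside [1 - eps, 1 + eps]."""
    ratio = torch.exp(logp_new - logp_old)
    return (ratio < 1.0 - clip_eps) | (ratio > 1.0 + clip_eps)
```

Decoding the masked positions back to text is one way to inspect whether the clipped tokens correspond to the uncertain, ‘long thinking’ phrases described above.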
In conclusion, Proximal Supervised Fine-Tuning offers a promising alternative to traditional SFT, providing a more stable, generalizable, and robust method for adapting foundation models. By drawing inspiration from reinforcement learning, PSFT ensures that models not only perform well on target tasks but also retain and enhance their broader reasoning and exploration capabilities. You can read the full research paper here: Proximal Supervised Fine-Tuning.


