Technique
RLHF (Reinforcement Learning from Human Feedback)
A technique for training language models to produce more helpful and safer responses by using human preference ratings of the model's answers.
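A minimal sketch of the reward-modelling step that RLHF typically relies on, assuming PyTorch: human raters pick a preferred ("chosen") response over a "rejected" one, and a reward model is trained with a pairwise Bradley-Terry loss so the preferred response scores higher. All class, function, and variable names here are illustrative, not from any specific library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy stand-in for a language-model backbone plus a scalar reward head."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, embed_dim)   # placeholder encoder
        self.reward_head = nn.Linear(embed_dim, 1)       # maps features to a scalar reward

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.reward_head(torch.tanh(self.encoder(features))).squeeze(-1)

def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(model(chosen) - model(rejected)).mean()

if __name__ == "__main__":
    model = RewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Random vectors standing in for encoded (prompt, response) pairs.
    chosen = torch.randn(8, 64)
    rejected = torch.randn(8, 64)
    loss = preference_loss(model, chosen, rejected)
    loss.backward()
    optimizer.step()
    print(f"pairwise preference loss: {loss.item():.4f}")
```

In a full RLHF pipeline this reward model would then be used as the reward signal for a reinforcement-learning step (commonly PPO) that fine-tunes the language model itself.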