Technique
RLHF (Reinforcement Learning from Human Feedback)
A technique for training language models to produce more helpful and safer responses by using human preference ratings of the model's answers.
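A minimal sketch of the core idea: human preference ratings are typically turned into a training signal via a Bradley-Terry style pairwise loss on a reward model, which rewards the human-preferred answer over the rejected one. The function name and scalar-reward setup below are illustrative assumptions, not a specific library's API.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used when training a reward model
    from human preference data: -log(sigmoid(r_chosen - r_rejected)).
    The loss shrinks as the reward model scores the human-preferred
    answer higher than the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Scoring the preferred answer higher yields a smaller loss than the reverse.
print(preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0))  # True
```

The trained reward model then scores new responses, and the language model is fine-tuned with reinforcement learning to maximize that score.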