Metric
HumanEval
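OpenAI's coding benchmark of 164 hand-written Python problems. A model is scored on functional correctness: a generated solution counts only if it passes the problem's unit tests, which are not shown in the prompt. Results are reported as pass@k, the probability that at least one of k sampled solutions for a problem passes.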
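A minimal sketch of how pass@k is commonly computed, assuming the unbiased estimator described in the paper that introduced HumanEval: draw n samples per problem, count the c samples that pass the tests, and estimate pass@k as 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `mean_pass_at_k` are illustrative and not taken from any particular library.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains
    # no passing sample; comb() returns 0 when k > n - c.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs, one pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k = 1 the estimate reduces to c / n, the fraction of samples that pass. The benchmark-level score is the mean of the per-problem estimates across all problems.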