Reference
Model Name | Win Rate | Length |
---|---|---|
GPT-4 Turbo | 50.00% | 2049 |
Contextual AI (KTO-Mistral-PairRM) | 33.23% | 2521 |
Yi 34B Chat | 29.66% | 2123 |
Claude 3 Opus (02/29) | 29.04% | 1388 |
Claude 3 Sonnet (02/29) | 25.56% | 1420 |
GPT-4 | 23.58% | 1365 |
GPT-4 0314 | 22.07% | 1371 |
Mistral Medium | 21.86% | 1500 |
Mistral Large (24/02) | 21.44% | 1362 |
Mixtral 8x7B v0.1 | 18.26% | 1465 |
Claude 2 | 17.19% | 1069 |
Claude | 16.99% | 1082 |
Tulu 2+DPO 70B | 15.98% | 1418 |
GPT-4 0613 | 15.76% | 1140 |
Claude 2.1 | 15.73% | 1096 |
Mistral 7B v0.2 | 14.72% | 1676 |
GPT 3.5 Turbo 0613 | 14.13% | 1328 |
LLaMA2 Chat 70B | 13.87% | 1790 |
Cohere Command | 12.90% | 1983 |
Vicuna 33B v1.3 | 12.71% | 1479 |
OpenHermes-2.5-Mistral (7B) | 10.34% | 1107 |
GPT 3.5 Turbo 0301 | 9.62% | 827 |
GPT 3.5 Turbo 1106 | 9.18% | 796 |
Phi-2 DPO | 7.76% | 1687 |
LLaMA2 Chat 13B | 7.70% | 1513 |
Vicuna 13B v1.3 | 7.14% | 1132 |
Gemma Instruct (7B) | 6.94% | 1115 |
Guanaco 65B | 6.86% | 1249 |
LLaMA 33B OASST RLHF | 6.30% | 1079 |
WizardLM 13B | 5.88% | 985 |
Vicuna 13B | 5.83% | 1037 |
Nous Hermes 13B | 5.41% | 844 |
Guanaco 33B | 5.00% | 1311 |
LLaMA2 Chat 7B | 4.96% | 1479 |
LLaMA 33B OASST SFT | 4.77% | 748 |
Vicuna 7B v1.3 | 4.64% | 1110 |
Vicuna 7B | 4.16% | 1044 |
Alpaca Farm PPO Human 7B | 4.10% | 803 |
Phi-2 SFT | 3.98% | 1068 |
Guanaco 13B | 3.47% | 1774 |
Alpaca Farm PPO Sim (GPT-4) 7B | 3.45% | 511 |
Gemma Instruct (2B) | 3.40% | 1041 |
Falcon 40B Instruct | 3.34% | 662 |
Guanaco 7B | 2.88% | 1364 |
Davinci001 | 2.76% | 296 |
Alpaca 7B | 2.59% | 396 |
Pythia 12B SFT | 2.58% | 913 |
Falcon 7B Instruct | 2.15% | 478 |
Pythia 12B OASST SFT | 1.79% | 726 |