InternLM (书生·浦语) LLM Practical Camp: A Hands-On Guide to Evaluating Llama 3 (OpenCompass Edition)


Environment setup


conda create -n llama3 python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate llama3

conda install git
git-lfs install
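
Before cloning the model, it can help to confirm that both tools are actually on PATH. The following is a minimal sketch, assuming a POSIX shell; `have_cmd` is a hypothetical helper name and not part of git or conda.

```shell
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of the tools needed for the clone step.
for tool in git git-lfs; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If `git-lfs` is reported as missing, the large weight files will most likely not be fetched by the clone.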

✨ Download the Llama3 model

Download the Llama-3-8B-Instruct model from OpenXLab:

mkdir -p ~/model
cd ~/model
git clone https://code.openxlab.org.cn/MrCat/Llama-3-8B-Instruct.git Meta-Llama-3-8B-Instruct
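
If Git LFS was not active during the clone, the weight files may be small text stubs instead of real weights. A rough sketch for spotting that case follows; it relies on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` and the `*.safetensors` file pattern are assumptions made for illustration, not something the repository documents here.

```shell
# is_lfs_pointer is a hypothetical helper: it succeeds if the file is still a
# Git LFS pointer stub (its first line is the LFS spec version line).
is_lfs_pointer() {
    head -c 64 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# List weight files under the model directory that were not fetched.
# The *.safetensors pattern is an assumption about how the weights are named.
model_dir="$HOME/model/Meta-Llama-3-8B-Instruct"
if [ -d "$model_dir" ]; then
    for f in "$model_dir"/*.safetensors; do
        [ -f "$f" ] || continue
        if is_lfs_pointer "$f"; then
            echo "still a pointer: $f"
        fi
    done
fi
```

No output means no pointer stubs were found among the files that matched the pattern.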

Alternatively, symlink the copy of the model that is already available in InternStudio:

mkdir -p ~/model
ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct ~/model/Meta-Llama-3-8B-Instruct


🛠️ Install OpenCompass

cd ~
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .


📂 Data preparation

wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip

Output:

(llama3) root@intern-studio-061925:~/opencompass# wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
--2024-05-07 13:40:46--  https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
Resolving proxy.intern-ai.org.cn (proxy.intern-ai.org.cn)... 172.18.128.194
Connecting to proxy.intern-ai.org.cn (proxy.intern-ai.org.cn)|172.18.128.194|:50000... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/654124617/b6ea57a4-4c8c-4be6-afa3-c63a5e511564?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240507%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240507T054046Z&X-Amz-Expires=300&X-Amz-Signature=4749bc808fdbd810c85662ceb9e23ad83b63084cf74d66a9248506d94cce24f5&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=654124617&response-content-disposition=attachment%3B%20filename%3DOpenCompassData-core-20240207.zip&response-content-type=application%2Foctet-stream [following]
--2024-05-07 13:40:46--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/654124617/b6ea57a4-4c8c-4be6-afa3-c63a5e511564?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240507%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240507T054046Z&X-Amz-Expires=300&X-Amz-Signature=4749bc808fdbd810c85662ceb9e23ad83b63084cf74d66a9248506d94cce24f5&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=654124617&response-content-disposition=attachment%3B%20filename%3DOpenCompassData-core-20240207.zip&response-content-type=application%2Foctet-stream
Connecting to proxy.intern-ai.org.cn (proxy.intern-ai.org.cn)|172.18.128.194|:50000... connected.
Proxy request sent, awaiting response... 200 OK
Length: 156098144 (149M) [application/octet-stream]
Saving to: 'OpenCompassData-core-20240207.zip'

OpenCompassData-core-20240207.zi 100%[==========================================================>] 148.87M   396KB/s    in 5m 41s

2024-05-07 13:46:28 (447 KB/s) - 'OpenCompassData-core-20240207.zip' saved [156098144/156098144]

(llama3) root@intern-studio-061925:~/opencompass# unzip OpenCompassData-core-20240207.zip
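
The download in the log above took several minutes through a proxy, so a quick size check before unzipping can catch a truncated file. This is a minimal sketch, assuming the `Length:` value reported by wget (156098144 bytes) is the correct size; `size_matches` is a hypothetical helper name.

```shell
# size_matches is a hypothetical helper: it succeeds if FILE is exactly BYTES long.
size_matches() {
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" = "$2" ]
}

archive="OpenCompassData-core-20240207.zip"
expected=156098144   # "Length:" value taken from the wget log
if [ -f "$archive" ]; then
    if size_matches "$archive" "$expected"; then
        echo "size OK"
    else
        echo "size mismatch; consider re-downloading"
    fi
fi
```

A size match does not prove the archive is intact, but a mismatch is a cheap early signal that the download was cut short.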

The dataset contains 85 directories and 1,062 files in total.
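
These totals can also be checked without `tree`. The sketch below uses `find` and assumes the archive was unpacked into `~/opencompass/data`, and that the directory total excludes the starting directory itself; `count_dirs` and `count_files` are hypothetical helper names.

```shell
# count_dirs / count_files are hypothetical helpers built on find(1).
# count_dirs subtracts 1 so the starting directory itself is not counted.
count_dirs() {
    echo $(( $(find "$1" -type d | wc -l) - 1 ))
}
count_files() {
    echo $(( $(find "$1" -type f | wc -l) ))
}

data_dir="$HOME/opencompass/data"
if [ -d "$data_dir" ]; then
    echo "$(count_dirs "$data_dir") directories, $(count_files "$data_dir") files"
fi
```

If the numbers differ from the totals stated here, the unzip step may have been interrupted or a different release of the data archive may be in use.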

(llama3) root@intern-studio-061925:~/opencompass/data# tree
.
|-- AGIEval
|   `-- data
|       |-- few_shot_prompts.csv
|       `-- v1
|           |-- LICENSE
|           |-- aqua-rat.jsonl
|           |-- gaokao-biology.jsonl
|           |-- gaokao-chemistry.jsonl
|           |-- gaokao-chinese.jsonl
|           |-- gaokao-english.jsonl
|           |-- gaokao-geography.jsonl
|           |-- gaokao-history.jsonl
|           |-- gaokao-mathcloze.jsonl
|           |-- gaokao-mathqa.jsonl
|           |-- gaokao-physics.jsonl
|           |-- jec-qa-ca.jsonl
|           |-- jec-qa-kd.jsonl
|           |-- logiqa-en.jsonl
|           |-- logiqa-zh.jsonl
|           |-- lsat-ar.jsonl
|           |-- lsat-lr.jsonl
|           |-- lsat-rc.jsonl
|           |-- math.jsonl
|           |-- sat-en-without-passage.jsonl
|           |-- sat-en.jsonl
|           `-- sat-math.jsonl
|-- ARC
|   |-- ARC-c
|   |   |-- ARC-Challenge-Dev.jsonl
|   |   |-- ARC-Challenge-Test.jsonl
|   |   `-- ARC_c_test_contamination_annotations.json
|   `-- ARC-e
|       |-- ARC-Easy-Dev.jsonl
|       `-- ARC-Easy-Test.jsonl
|-- BBH
|   |-- data
|   |   |-- README.md
|   |   |-- boolean_expressions.json
|   |   |-- causal_judgement.json
|   |   |-- date_understanding.json
|   |   |-- disambiguation_qa.json
|   |   |-- dyck_languages.json
|   |   |-- formal_fallacies.json
|   |   |-- geometric_shapes.json
|   |   |-- hyperbaton.json
|   |   |-- logical_deduction_five_objects.json
|   |   |-- logical_deduction_seven_objects.json
|   |   |-- logical_deduction_three_objects.json
|   |   |-- movie_recommendation.json
|   |   |-- multistep_arithmetic_two.json
|   |   |-- navigate.json
|   |   |-- object_counting.json
|   |   |-- penguins_in_a_table.json
|   |   |-- reasoning_about_colored_objects.json
|   |   |-- ruin_names.json
|   |   |-- salient_translation_error_detection.json
|   |   |-- snarks.json
|   |   |-- sports_understanding.json
|   |   |-- temporal_sequences.json
|   |   |-- tracking_shuffled_objects_five_objects.json
|   |   |-- tracking_shuffled_objects_seven_objects.json
|   |   |-- tracking_shuffled_objects_three_objects.json
|   |   |-- web_of_lies.json
|   |   `-- word_sorting.json
|   `-- lib_prompt
|       |-- boolean_expressions.txt
|       |-- causal_judgement.txt
|       |-- date_understanding.txt
|       |-- disambiguation_qa.txt
|       |-- dyck_languages.txt
|       |-- formal_fallacies.txt
|       |-- geometric_shapes.txt
|       |-- hyperbaton.txt
|       |-- logical_deduction_five_objects.txt
|       |-- logical_deduction_seven_objects.txt
|       |-- logical_deduction_three_objects.txt
|       |-- movie_recommendation.txt
|       |-- multistep_arithmetic_two.txt
|       |-- navigate.txt
|       |-- object_counting.txt
|       |-- penguins_in_a_table.txt
|       |-- reasoning_about_colored_objects.txt
|       |-- ruin_names.txt
|       |-- salient_translation_error_detection.txt
|       |-- snarks.txt
|       |-- sports_understanding.txt
|       |-- temporal_sequences.txt
|       |-- tracking_shuffled_objects_five_objects.txt
|       |-- tracking_shuffled_objects_seven_objects.txt
|       |-- tracking_shuffled_objects_three_objects.txt
|       |-- web_of_lies.txt
|       `-- word_sorting.txt
|-- CLUE
|   |-- AFQMC
|   |   |-- dev.json
|   |   `-- test_public.json
|   |-- C3
|   |   |-- d-dev.json
|   |   |-- dev_0.json
|   |   `-- m-dev.json
|   |-- CMRC
|   |   |-- dev.json
|   |   `-- test_public.json
|   |-- DRCD
|   |   |-- dev.json
|   |   `-- test_public.json
|   |-- OCNLI
|   |   `-- dev.json
|   `-- cmnli
|       `-- cmnli_public
|           `-- dev.json
|-- FewCLUE
|   |-- bustm
|   |   |-- dev_0.json
|   |   |-- dev_1.json
|   |   |-- dev_2.json
|   |   |-- dev_3.json
|   |   |-- dev_4.json
|   |   |-- dev_few_all.json
|   |   |-- test.json
|   |   |-- test_public.json
|   |   |-- train_0.json
|   |   |-- train_1.json
|   |   |-- train_2.json
|   |   |-- train_3.json
|   |   |-- train_4.json
|   |   |-- train_few_all.json
|   |   `-- unlabeled.json
|   |-- chid
|   |   |-- dev_0.json
|   |   |-- dev_1.json
|   |   |-- dev_2.json
|   |   |-- dev_3.json
|   |   |-- dev_4.json
|   |   |-- dev_few_all.json
|   |   |-- test.json
|   |   |-- test_public.json
|   |   |-- train_0.json
|   |   |-- train_1.json
|   |   |-- train_2.json
|   |   |-- train_3.json
|   |   |-- train_4.json
|   |   |-- train_few_all.json
|   |   `-- unlabeled.json
|   |-- cluewsc
|   |   |-- dev_0.json
|   |   |-- dev_1.json
|   |   |-- dev_2.json
|   |   |-- dev_3.json
|   |   |-- dev_4.json
|   |   |-- dev_few_all.json
|   |   |-- label_distribution.json
|   |   |-- test.json
|   |   |-- test_public.json
|   |   |-- train_0.json
|   |   |-- train_1.json
|   |   |-- train_2.json
|   |   |-- train_3.json
|   |   |-- train_4.json
|   |   |-- train_few_all.json
|   |   `-- unlabeled.json
|   |-- csl
|   |   |-- dev_0.json
|   |   |-- dev_1.json
|   |   |-- dev_2.json
|   |   |-- dev_3.json
|   |   |-- dev_4.json
|   |   |-- dev_few_all.json
|   |   |-- test.json
|   |   |-- test_public.json
|   |   |-- train_0.json
|   |   |-- train_1.json
|   |   |-- train_2.json
|   |   |-- train_3.json
|   |   |-- train_4.json
|   |   |-- train_few_all.json
|   |   `-- unlabeled.json
|   |-- csldcp
|   |   |-- dev_0.json
|   |   |-- dev_1.json
|   |   |-- dev_2.json
|   |   |-- dev_3.json
|   |   |-- dev_4.json
|   |   |-- dev_few_all.json
|   |   |-- labelDesc2label.py
|   |   |-- labels_all.txt
|   |   |-- test.json
|   |   |-- test_public.json
|   |   |-- train_0.json
|   |   |-- train_1.json
|   |   |-- train_2.json
|   |   |-- train_3.json
|   |   |-- train_4.json
|   |   |-- train_few_all.json
|   |   `-- unlabeled.json
|   |-- eprstmt
|   |   |-- dev_0.json
|   |   |-- dev_1.json
|   |   |-- dev_2.json
|   |   |-- dev_3.json
|   |   |-- dev_4.json
|   |   |-- dev_few_all.json
|   |   |-- dev_public.json
|   |   |-- test.json
|   |   |-- test_public.json
|   |   |-- train_0.json
|   |   |-- train_1.json
|   |   |-- train_2.json
|   |   |-- train_3.json
|   |   |-- train_4.json
|   |   |-- train_few_all.json
|   |   `-- unlabeled.json
|   |-- iflytek
|   |   |-- dev_0.json
|   |   |-- dev_1.json
|   |   |-- dev_2.json
|   |   |-- dev_3.json
|   |   |-- dev_4.json
|   |   |-- dev_few_all.json
|   |   |-- label_id2des_desc2short.py
|   |   |-- test.json
|   |   |-- test_public.json
|   |   |-- train_0.json
|   |   |-- train_1.json
|   |   |-- train_2.json
|   |   |-- train_3.json
|   |   |-- train_4.json
|   |   |-- train_few_all.json
|   |   `-- unlabeled.json
|   |-- ocnli
|   |   |-- dev_0.json
|   |   |-- dev_1.json
|   |   |-- dev_2.json
|   |   |-- dev_3.json
|   |   |-- dev_4.json
|   |   |-- dev_few_all.json
|   |   |-- test.json
|   |   |-- test_public.json
|   |   |-- train_0.json
|   |   |-- train_1.json
|   |   |-- train_2.json
|   |   |-- train_3.json
|   |   |-- train_4.json
|   |   |-- train_few_all.json
|   |   `-- unlabeled.json
|   |-- readme.md
|   `-- tnews
|       |-- dev_0.json
|       |-- dev_1.json
|       |-- dev_2.json
|       |-- dev_3.json
|       |-- dev_4.json
|       |-- dev_few_all.json
|       |-- label_index2en2zh.json
|       |-- test.json
|       |-- test_public.json
|       |-- train_0.json
|       |-- train_1.json
|       |-- train_2.json
|       |-- train_3.json
|       |-- train_4.json
|       |-- train_few_all.json
|       `-- unlabeled.json
|-- GAOKAO-BENCH
|   `-- data
|       |-- Fill-in-the-blank_Questions
|       |   |-- 2010-2022_Chinese_Language_Famous_Passages_and_Sentences_Dictation.json
|       |   |-- 2010-2022_Math_II_Fill-in-the-Blank.json
|       |   |-- 2010-2022_Math_I_Fill-in-the-Blank.json
|       |   `-- 2014-2022_English_Language_Cloze_Passage.json
|       |-- Multiple-choice_Questions
|       |   |-- 2010-2013_English_MCQs.json
|       |   |-- 2010-2022_Biology_MCQs.json
|       |   |-- 2010-2022_Chemistry_MCQs.json
|       |   |-- 2010-2022_Chinese_Lang_and_Usage_MCQs.json
|       |   |-- 2010-2022_Chinese_Modern_Lit.json
|       |   |-- 2010-2022_English_Fill_in_Blanks.json
|       |   |-- 2010-2022_English_Reading_Comp.json
|       |   |-- 2010-2022_Geography_MCQs.json
|       |   |-- 2010-2022_History_MCQs.json
|       |   |-- 2010-2022_Math_II_MCQs.json
|       |   |-- 2010-2022_Math_I_MCQs.json
|       |   |-- 2010-2022_Physics_MCQs.json
|       |   |-- 2010-2022_Political_Science_MCQs.json
|       |   `-- 2012-2022_English_Cloze_Test.json
|       `-- Open-ended_Questions
|           |-- 2010-2022_Biology_Open-ended_Questions.json
|           |-- 2010-2022_Chemistry_Open-ended_Questions.json
|           |-- 2010-2022_Chinese_Language_Ancient_Poetry_Reading.json
|           |-- 2010-2022_Chinese_Language_Classical_Chinese_Reading.json
|           |-- 2010-2022_Chinese_Language_Language_and_Writing_Skills_Open-ended_Questions.json
|           |-- 2010-2022_Chinese_Language_Literary_Text_Reading.json
|           |-- 2010-2022_Chinese_Language_Practical_Text_Reading.json
|           |-- 2010-2022_Geography_Open-ended_Questions.json
|           |-- 2010-2022_History_Open-ended_Questions.json
|           |-- 2010-2022_Math_II_Open-ended_Questions.json
|           |-- 2010-2022_Math_I_Open-ended_Questions.json
|           |-- 2010-2022_Physics_Open-ended_Questions.json
|           |-- 2010-2022_Political_Science_Open-ended_Questions.json
|           `-- 2012-2022_English_Language_Error_Correction.json
|-- LCSTS
|   |-- test.src.txt
|   `-- test.tgt.txt
|-- SuperGLUE
|   |-- AX-b
|   |   `-- AX-b.jsonl
|   |-- AX-g
|   |   `-- AX-g.jsonl
|   |-- BoolQ
|   |   |-- test.jsonl
|   |   `-- val.jsonl
|   |-- CB
|   |   |-- test.jsonl
|   |   `-- val.jsonl
|   |-- COPA
|   |   |-- test.jsonl
|   |   `-- val.jsonl
|   |-- MultiRC
|   |   |-- test.jsonl
|   |   `-- val.jsonl
|   |-- RTE
|   |   |-- test.jsonl
|   |   `-- val.jsonl
|   |-- ReCoRD
|   |   |-- test.jsonl
|   |   `-- val.jsonl
|   |-- WSC
|   |   |-- test.jsonl
|   |   `-- val.jsonl
|   `-- WiC
|       |-- test.jsonl
|       `-- val.jsonl
|-- TheoremQA
|   `-- test.csv
|-- Xsum
|   |-- dev.csv
|   |-- dev.json
|   `-- dev.jsonl
|-- ceval
|   `-- formal_ceval
|       |-- dev
|       |   |-- accountant_dev.csv
|       |   |-- advanced_mathematics_dev.csv
|       |   |-- art_studies_dev.csv
|       |   |-- basic_medicine_dev.csv
|       |   |-- business_administration_dev.csv
|       |   |-- chinese_language_and_literature_dev.csv
|       |   |-- civil_servant_dev.csv
|       |   |-- clinical_medicine_dev.csv
|       |   |-- college_chemistry_dev.csv
|       |   |-- college_economics_dev.csv
|       |   |-- college_physics_dev.csv
|       |   |-- college_programming_dev.csv
|       |   |-- computer_architecture_dev.csv
|       |   |-- computer_network_dev.csv
|       |   |-- discrete_mathematics_dev.csv
|       |   |-- education_science_dev.csv
|       |   |-- electrical_engineer_dev.csv
|       |   |-- environmental_impact_assessment_engineer_dev.csv
|       |   |-- fire_engineer_dev.csv
|       |   |-- high_school_biology_dev.csv
|       |   |-- high_school_chemistry_dev.csv
|       |   |-- high_school_chinese_dev.csv
|       |   |-- high_school_geography_dev.csv
|       |   |-- high_school_history_dev.csv
|       |   |-- high_school_mathematics_dev.csv
|       |   |-- high_school_physics_dev.csv
|       |   |-- high_school_politics_dev.csv
|       |   |-- ideological_and_moral_cultivation_dev.csv
|       |   |-- law_dev.csv
|       |   |-- legal_professional_dev.csv
|       |   |-- logic_dev.csv
|       |   |-- mao_zedong_thought_dev.csv
|       |   |-- marxism_dev.csv
|       |   |-- metrology_engineer_dev.csv
|       |   |-- middle_school_biology_dev.csv
|       |   |-- middle_school_chemistry_dev.csv
|       |   |-- middle_school_geography_dev.csv
|       |   |-- middle_school_history_dev.csv
|       |   |-- middle_school_mathematics_dev.csv
|       |   |-- middle_school_physics_dev.csv
|       |   |-- middle_school_politics_dev.csv
|       |   |-- modern_chinese_history_dev.csv
|       |   |-- operating_system_dev.csv
|       |   |-- physician_dev.csv
|       |   |-- plant_protection_dev.csv
|       |   |-- probability_and_statistics_dev.csv
|       |   |-- professional_tour_guide_dev.csv
|       |   |-- sports_science_dev.csv
|       |   |-- tax_accountant_dev.csv
|       |   |-- teacher_qualification_dev.csv
|       |   |-- urban_and_rural_planner_dev.csv
|       |   `-- veterinary_medicine_dev.csv
|       |-- test
|       |   |-- accountant_test.csv
|       |   |-- advanced_mathematics_test.csv
|       |   |-- art_studies_test.csv
|       |   |-- basic_medicine_test.csv
|       |   |-- business_administration_test.csv
|       |   |-- chinese_language_and_literature_test.csv
|       |   |-- civil_servant_test.csv
|       |   |-- clinical_medicine_test.csv
|       |   |-- college_chemistry_test.csv
|       |   |-- college_economics_test.csv
|       |   |-- college_physics_test.csv
|       |   |-- college_programming_test.csv
|       |   |-- computer_architecture_test.csv
|       |   |-- computer_network_test.csv
|       |   |-- discrete_mathematics_test.csv
|       |   |-- education_science_test.csv
|       |   |-- electrical_engineer_test.csv
|       |   |-- environmental_impact_assessment_engineer_test.csv
|       |   |-- fire_engineer_test.csv
|       |   |-- high_school_biology_test.csv
|       |   |-- high_school_chemistry_test.csv
|       |   |-- high_school_chinese_test.csv
|       |   |-- high_school_geography_test.csv
|       |   |-- high_school_history_test.csv
|       |   |-- high_school_mathematics_test.csv
|       |   |-- high_school_physics_test.csv
|       |   |-- high_school_politics_test.csv
|       |   |-- ideological_and_moral_cultivation_test.csv
|       |   |-- law_test.csv
|       |   |-- legal_professional_test.csv
|       |   |-- logic_test.csv
|       |   |-- mao_zedong_thought_test.csv
|       |   |-- marxism_test.csv
|       |   |-- metrology_engineer_test.csv
|       |   |-- middle_school_biology_test.csv
|       |   |-- middle_school_chemistry_test.csv
|       |   |-- middle_school_geography_test.csv
|       |   |-- middle_school_history_test.csv
|       |   |-- middle_school_mathematics_test.csv
|       |   |-- middle_school_physics_test.csv
|       |   |-- middle_school_politics_test.csv
|       |   |-- modern_chinese_history_test.csv
|       |   |-- operating_system_test.csv
|       |   |-- physician_test.csv
|       |   |-- plant_protection_test.csv
|       |   |-- probability_and_statistics_test.csv
|       |   |-- professional_tour_guide_test.csv
|       |   |-- sports_science_test.csv
|       |   |-- tax_accountant_test.csv
|       |   |-- teacher_qualification_test.csv
|       |   |-- urban_and_rural_planner_test.csv
|       |   `-- veterinary_medicine_test.csv
|       `-- val
|           |-- accountant_val.csv
|           |-- advanced_mathematics_val.csv
|           |-- art_studies_val.csv
|           |-- basic_medicine_val.csv
|           |-- business_administration_val.csv
|           |-- ceval_contamination_annotations.json
|           |-- chinese_language_and_literature_val.csv
|           |-- civil_servant_val.csv
|           |-- clinical_medicine_val.csv
|           |-- college_chemistry_val.csv
|           |-- college_economics_val.csv
|           |-- college_physics_val.csv
|           |-- college_programming_val.csv
|           |-- computer_architecture_val.csv
|           |-- computer_network_val.csv
|           |-- discrete_mathematics_val.csv
|           |-- education_science_val.csv
|           |-- electrical_engineer_val.csv
|           |-- environmental_impact_assessment_engineer_val.csv
|           |-- fire_engineer_val.csv
|           |-- high_school_biology_val.csv
|           |-- high_school_chemistry_val.csv
|           |-- high_school_chinese_val.csv
|           |-- high_school_geography_val.csv
|           |-- high_school_history_val.csv
|           |-- high_school_mathematics_val.csv
|           |-- high_school_physics_val.csv
|           |-- high_school_politics_val.csv
|           |-- ideological_and_moral_cultivation_val.csv
|           |-- law_val.csv
|           |-- legal_professional_val.csv
|           |-- logic_val.csv
|           |-- mao_zedong_thought_val.csv
|           |-- marxism_val.csv
|           |-- metrology_engineer_val.csv
|           |-- middle_school_biology_val.csv
|           |-- middle_school_chemistry_val.csv
|           |-- middle_school_geography_val.csv
|           |-- middle_school_history_val.csv
|           |-- middle_school_mathematics_val.csv
|           |-- middle_school_physics_val.csv
|           |-- middle_school_politics_val.csv
|           |-- modern_chinese_history_val.csv
|           |-- operating_system_val.csv
|           |-- physician_val.csv
|           |-- plant_protection_val.csv
|           |-- probability_and_statistics_val.csv
|           |-- professional_tour_guide_val.csv
|           |-- sports_science_val.csv
|           |-- tax_accountant_val.csv
|           |-- teacher_qualification_val.csv
|           |-- urban_and_rural_planner_val.csv
|           `-- veterinary_medicine_val.csv
|-- cmmlu
|   |-- dev
|   |   |-- agronomy.csv
|   |   |-- anatomy.csv
|   |   |-- ancient_chinese.csv
|   |   |-- arts.csv
|   |   |-- astronomy.csv
|   |   |-- business_ethics.csv
|   |   |-- chinese_civil_service_exam.csv
|   |   |-- chinese_driving_rule.csv
|   |   |-- chinese_food_culture.csv
|   |   |-- chinese_foreign_policy.csv
|   |   |-- chinese_history.csv
|   |   |-- chinese_literature.csv
|   |   |-- chinese_teacher_qualification.csv
|   |   |-- clinical_knowledge.csv
|   |   |-- college_actuarial_science.csv
|   |   |-- college_education.csv
|   |   |-- college_engineering_hydrology.csv
|   |   |-- college_law.csv
|   |   |-- college_mathematics.csv
|   |   |-- college_medical_statistics.csv
|   |   |-- college_medicine.csv
|   |   |-- computer_science.csv
|   |   |-- computer_security.csv
|   |   |-- conceptual_physics.csv
|   |   |-- construction_project_management.csv
|   |   |-- economics.csv
|   |   |-- education.csv
|   |   |-- electrical_engineering.csv
|   |   |-- elementary_chinese.csv
|   |   |-- elementary_commonsense.csv
|   |   |-- elementary_information_and_technology.csv
|   |   |-- elementary_mathematics.csv
|   |   |-- ethnology.csv
|   |   |-- food_science.csv
|   |   |-- genetics.csv
|   |   |-- global_facts.csv
|   |   |-- high_school_biology.csv
|   |   |-- high_school_chemistry.csv
|   |   |-- high_school_geography.csv
|   |   |-- high_school_mathematics.csv
|   |   |-- high_school_physics.csv
|   |   |-- high_school_politics.csv
|   |   |-- human_sexuality.csv
|   |   |-- international_law.csv
|   |   |-- journalism.csv
|   |   |-- jurisprudence.csv
|   |   |-- legal_and_moral_basis.csv
|   |   |-- logical.csv
|   |   |-- machine_learning.csv
|   |   |-- management.csv
|   |   |-- marketing.csv
|   |   |-- marxist_theory.csv
|   |   |-- modern_chinese.csv
|   |   |-- nutrition.csv
|   |   |-- philosophy.csv
|   |   |-- professional_accounting.csv
|   |   |-- professional_law.csv
|   |   |-- professional_medicine.csv
|   |   |-- professional_psychology.csv
|   |   |-- public_relations.csv
|   |   |-- security_study.csv
|   |   |-- sociology.csv
|   |   |-- sports_science.csv
|   |   |-- traditional_chinese_medicine.csv
|   |   |-- virology.csv
|   |   |-- world_history.csv
|   |   `-- world_religions.csv
|   `-- test
|       |-- agronomy.csv
|       |-- anatomy.csv
|       |-- ancient_chinese.csv
|       |-- arts.csv
|       |-- astronomy.csv
|       |-- business_ethics.csv
|       |-- chinese_civil_service_exam.csv
|       |-- chinese_driving_rule.csv
|       |-- chinese_food_culture.csv
|       |-- chinese_foreign_policy.csv
|       |-- chinese_history.csv
|       |-- chinese_literature.csv
|       |-- chinese_teacher_qualification.csv
|       |-- clinical_knowledge.csv
|       |-- college_actuarial_science.csv
|       |-- college_education.csv
|       |-- college_engineering_hydrology.csv
|       |-- college_law.csv
|       |-- college_mathematics.csv
|       |-- college_medical_statistics.csv
|       |-- college_medicine.csv
|       |-- computer_science.csv
|       |-- computer_security.csv
|       |-- conceptual_physics.csv
|       |-- construction_project_management.csv
|       |-- economics.csv
|       |-- education.csv
|       |-- electrical_engineering.csv
|       |-- elementary_chinese.csv
|       |-- elementary_commonsense.csv
|       |-- elementary_information_and_technology.csv
|       |-- elementary_mathematics.csv
|       |-- ethnology.csv
|       |-- food_science.csv
|       |-- genetics.csv
|       |-- global_facts.csv
|       |-- high_school_biology.csv
|       |-- high_school_chemistry.csv
|       |-- high_school_geography.csv
|       |-- high_school_mathematics.csv
|       |-- high_school_physics.csv
|       |-- high_school_politics.csv
|       |-- human_sexuality.csv
|       |-- international_law.csv
|       |-- journalism.csv
|       |-- jurisprudence.csv
|       |-- legal_and_moral_basis.csv
|       |-- logical.csv
|       |-- machine_learning.csv
|       |-- management.csv
|       |-- marketing.csv
|       |-- marxist_theory.csv
|       |-- modern_chinese.csv
|       |-- nutrition.csv
|       |-- philosophy.csv
|       |-- professional_accounting.csv
|       |-- professional_law.csv
|       |-- professional_medicine.csv
|       |-- professional_psychology.csv
|       |-- public_relations.csv
|       |-- security_study.csv
|       |-- sociology.csv
|       |-- sports_science.csv
|       |-- traditional_chinese_medicine.csv
|       |-- virology.csv
|       |-- world_history.csv
|       `-- world_religions.csv
|-- commonsenseqa
|   |-- dev_rand_split.jsonl
|   |-- test_rand_split_no_answers.jsonl
|   `-- train_rand_split.jsonl
|-- drop
|   |-- drop_dataset_dev.json
|   |-- drop_dataset_train.json
|   `-- license.txt
|-- flores_first100
|   |-- dev
|   |   |-- afr_Latn.dev
|   |   |-- amh_Ethi.dev
|   |   |-- arb_Arab.dev
|   |   |-- asm_Beng.dev
|   |   |-- ast_Latn.dev
|   |   |-- azj_Latn.dev
|   |   |-- bel_Cyrl.dev
|   |   |-- ben_Beng.dev
|   |   |-- bos_Latn.dev
|   |   |-- bul_Cyrl.dev
|   |   |-- cat_Latn.dev
|   |   |-- ceb_Latn.dev
|   |   |-- ces_Latn.dev
|   |   |-- ckb_Arab.dev
|   |   |-- cym_Latn.dev
|   |   |-- dan_Latn.dev
|   |   |-- deu_Latn.dev
|   |   |-- ell_Grek.dev
|   |   |-- eng_Latn.dev
|   |   |-- est_Latn.dev
|   |   |-- fin_Latn.dev
|   |   |-- fra_Latn.dev
|   |   |-- fuv_Latn.dev
|   |   |-- gaz_Latn.dev
|   |   |-- gle_Latn.dev
|   |   |-- glg_Latn.dev
|   |   |-- guj_Gujr.dev
|   |   |-- hau_Latn.dev
|   |   |-- heb_Hebr.dev
|   |   |-- hin_Deva.dev
|   |   |-- hrv_Latn.dev
|   |   |-- hun_Latn.dev
|   |   |-- hye_Armn.dev
|   |   |-- ibo_Latn.dev
|   |   |-- ind_Latn.dev
|   |   |-- isl_Latn.dev
|   |   |-- ita_Latn.dev
|   |   |-- jav_Latn.dev
|   |   |-- jpn_Jpan.dev
|   |   |-- kam_Latn.dev
|   |   |-- kan_Knda.dev
|   |   |-- kat_Geor.dev
|   |   |-- kaz_Cyrl.dev
|   |   |-- kea_Latn.dev
|   |   |-- khk_Cyrl.dev
|   |   |-- khm_Khmr.dev
|   |   |-- kir_Cyrl.dev
|   |   |-- kor_Hang.dev
|   |   |-- lao_Laoo.dev
|   |   |-- lin_Latn.dev
|   |   |-- lit_Latn.dev
|   |   |-- ltz_Latn.dev
|   |   |-- lug_Latn.dev
|   |   |-- luo_Latn.dev
|   |   |-- lvs_Latn.dev
|   |   |-- mal_Mlym.dev
|   |   |-- mar_Deva.dev
|   |   |-- mkd_Cyrl.dev
|   |   |-- mlt_Latn.dev
|   |   |-- mri_Latn.dev
|   |   |-- mya_Mymr.dev
|   |   |-- nld_Latn.dev
|   |   |-- nob_Latn.dev
|   |   |-- npi_Deva.dev
|   |   |-- nso_Latn.dev
|   |   |-- nya_Latn.dev
|   |   |-- oci_Latn.dev
|   |   |-- ory_Orya.dev
|   |   |-- pan_Guru.dev
|   |   |-- pbt_Arab.dev
|   |   |-- pes_Arab.dev
|   |   |-- pol_Latn.dev
|   |   |-- por_Latn.dev
|   |   |-- ron_Latn.dev
|   |   |-- rus_Cyrl.dev
|   |   |-- slk_Latn.dev
|   |   |-- slv_Latn.dev
|   |   |-- sna_Latn.dev
|   |   |-- snd_Arab.dev
|   |   |-- som_Latn.dev
|   |   |-- spa_Latn.dev
|   |   |-- srp_Cyrl.dev
|   |   |-- swe_Latn.dev
|   |   |-- swh_Latn.dev
|   |   |-- tam_Taml.dev
|   |   |-- tel_Telu.dev
|   |   |-- tgk_Cyrl.dev
|   |   |-- tgl_Latn.dev
|   |   |-- tha_Thai.dev
|   |   |-- tur_Latn.dev
|   |   |-- ukr_Cyrl.dev
|   |   |-- umb_Latn.dev
|   |   |-- urd_Arab.dev
|   |   |-- uzn_Latn.dev
|   |   |-- vie_Latn.dev
|   |   |-- wol_Latn.dev
|   |   |-- xho_Latn.dev
|   |   |-- yor_Latn.dev
|   |   |-- zho_Hans.dev
|   |   |-- zho_Hant.dev
|   |   |-- zsm_Latn.dev
|   |   `-- zul_Latn.dev
|   `-- devtest
|       |-- afr_Latn.devtest
|       |-- amh_Ethi.devtest
|       |-- arb_Arab.devtest
|       |-- asm_Beng.devtest
|       |-- ast_Latn.devtest
|       |-- azj_Latn.devtest
|       |-- bel_Cyrl.devtest
|       |-- ben_Beng.devtest
|       |-- bos_Latn.devtest
|       |-- bul_Cyrl.devtest
|       |-- cat_Latn.devtest
|       |-- ceb_Latn.devtest
|       |-- ces_Latn.devtest
|       |-- ckb_Arab.devtest
|       |-- cym_Latn.devtest
|       |-- dan_Latn.devtest
|       |-- deu_Latn.devtest
|       |-- ell_Grek.devtest
|       |-- eng_Latn.devtest
|       |-- est_Latn.devtest
|       |-- fin_Latn.devtest
|       |-- fra_Latn.devtest
|       |-- fuv_Latn.devtest
|       |-- gaz_Latn.devtest
|       |-- gle_Latn.devtest
|       |-- glg_Latn.devtest
|       |-- guj_Gujr.devtest
|       |-- hau_Latn.devtest
|       |-- heb_Hebr.devtest
|       |-- hin_Deva.devtest
|       |-- hrv_Latn.devtest
|       |-- hun_Latn.devtest
|       |-- hye_Armn.devtest
|       |-- ibo_Latn.devtest
|       |-- ind_Latn.devtest
|       |-- isl_Latn.devtest
|       |-- ita_Latn.devtest
|       |-- jav_Latn.devtest
|       |-- jpn_Jpan.devtest
|       |-- kam_Latn.devtest
|       |-- kan_Knda.devtest
|       |-- kat_Geor.devtest
|       |-- kaz_Cyrl.devtest
|       |-- kea_Latn.devtest
|       |-- khk_Cyrl.devtest
|       |-- khm_Khmr.devtest
|       |-- kir_Cyrl.devtest
|       |-- kor_Hang.devtest
|       |-- lao_Laoo.devtest
|       |-- lin_Latn.devtest
|       |-- lit_Latn.devtest
|       |-- ltz_Latn.devtest
|       |-- lug_Latn.devtest
|       |-- luo_Latn.devtest
|       |-- lvs_Latn.devtest
|       |-- mal_Mlym.devtest
|       |-- mar_Deva.devtest
|       |-- mkd_Cyrl.devtest
|       |-- mlt_Latn.devtest
|       |-- mri_Latn.devtest
|       |-- mya_Mymr.devtest
|       |-- nld_Latn.devtest
|       |-- nob_Latn.devtest
|       |-- npi_Deva.devtest
|       |-- nso_Latn.devtest
|       |-- nya_Latn.devtest
|       |-- oci_Latn.devtest
|       |-- ory_Orya.devtest
|       |-- pan_Guru.devtest
|       |-- pbt_Arab.devtest
|       |-- pes_Arab.devtest
|       |-- pol_Latn.devtest
|       |-- por_Latn.devtest
|       |-- ron_Latn.devtest
|       |-- rus_Cyrl.devtest
|       |-- slk_Latn.devtest
|       |-- slv_Latn.devtest
|       |-- sna_Latn.devtest
|       |-- snd_Arab.devtest
|       |-- som_Latn.devtest
|       |-- spa_Latn.devtest
|       |-- srp_Cyrl.devtest
|       |-- swe_Latn.devtest
|       |-- swh_Latn.devtest
|       |-- tam_Taml.devtest
|       |-- tel_Telu.devtest
|       |-- tgk_Cyrl.devtest
|       |-- tgl_Latn.devtest
|       |-- tha_Thai.devtest
|       |-- tur_Latn.devtest
|       |-- ukr_Cyrl.devtest
|       |-- umb_Latn.devtest
|       |-- urd_Arab.devtest
|       |-- uzn_Latn.devtest
|       |-- vie_Latn.devtest
|       |-- wol_Latn.devtest
|       |-- xho_Latn.devtest
|       |-- yor_Latn.devtest
|       |-- zho_Hans.devtest
|       |-- zho_Hant.devtest
|       |-- zsm_Latn.devtest
|       `-- zul_Latn.devtest
|-- gsm8k
|   |-- test.jsonl
|   |-- test_socratic.jsonl
|   |-- train.jsonl
|   `-- train_socratic.jsonl
|-- hellaswag
|   |-- hellaswag.jsonl
|   |-- hellaswag_train.jsonl
|   |-- hellaswag_train_sampled25.jsonl
|   `-- hellaswag_val_contamination_annotations.json
|-- humaneval
|   `-- human-eval-v2-20210705.jsonl
|-- lambada
|   `-- test.jsonl
|-- math
|   `-- math.json
|-- mbpp
|   |-- mbpp.jsonl
|   `-- sanitized-mbpp.jsonl
|-- mmlu
|   |-- README.txt
|   |-- dev
|   |   |-- abstract_algebra_dev.csv
|   |   |-- anatomy_dev.csv
|   |   |-- astronomy_dev.csv
|   |   |-- business_ethics_dev.csv
|   |   |-- clinical_knowledge_dev.csv
|   |   |-- college_biology_dev.csv
|   |   |-- college_chemistry_dev.csv
|   |   |-- college_computer_science_dev.csv
|   |   |-- college_mathematics_dev.csv
|   |   |-- college_medicine_dev.csv
|   |   |-- college_physics_dev.csv
|   |   |-- computer_security_dev.csv
|   |   |-- conceptual_physics_dev.csv
|   |   |-- econometrics_dev.csv
|   |   |-- electrical_engineering_dev.csv
|   |   |-- elementary_mathematics_dev.csv
|   |   |-- formal_logic_dev.csv
|   |   |-- global_facts_dev.csv
|   |   |-- high_school_biology_dev.csv
|   |   |-- high_school_chemistry_dev.csv
|   |   |-- high_school_computer_science_dev.csv
|   |   |-- high_school_european_history_dev.csv
|   |   |-- high_school_geography_dev.csv
|   |   |-- high_school_government_and_politics_dev.csv
|   |   |-- high_school_macroeconomics_dev.csv
|   |   |-- high_school_mathematics_dev.csv
|   |   |-- high_school_microeconomics_dev.csv
|   |   |-- high_school_physics_dev.csv
|   |   |-- high_school_psychology_dev.csv
|   |   |-- high_school_statistics_dev.csv
|   |   |-- high_school_us_history_dev.csv
|   |   |-- high_school_world_history_dev.csv
|   |   |-- human_aging_dev.csv
|   |   |-- human_sexuality_dev.csv
|   |   |-- international_law_dev.csv
|   |   |-- jurisprudence_dev.csv
|   |   |-- logical_fallacies_dev.csv
|   |   |-- machine_learning_dev.csv
|   |   |-- management_dev.csv
|   |   |-- marketing_dev.csv
|   |   |-- medical_genetics_dev.csv
|   |   |-- miscellaneous_dev.csv
|   |   |-- moral_disputes_dev.csv
|   |   |-- moral_scenarios_dev.csv
|   |   |-- nutrition_dev.csv
|   |   |-- philosophy_dev.csv
|   |   |-- prehistory_dev.csv
|   |   |-- professional_accounting_dev.csv
|   |   |-- professional_law_dev.csv
|   |   |-- professional_medicine_dev.csv
|   |   |-- professional_psychology_dev.csv
|   |   |-- public_relations_dev.csv
|   |   |-- security_studies_dev.csv
|   |   |-- sociology_dev.csv
|   |   |-- us_foreign_policy_dev.csv
|   |   |-- virology_dev.csv
|   |   `-- world_religions_dev.csv
|   |-- possibly_contaminated_urls.txt
|   |-- test
|   |   |-- MMLU_test_contamination_annotations.json
|   |   |-- abstract_algebra_test.csv
|   |   |-- anatomy_test.csv
|   |   |-- astronomy_test.csv
|   |   |-- business_ethics_test.csv
|   |   |-- clinical_knowledge_test.csv
|   |   |-- college_biology_test.csv
|   |   |-- college_chemistry_test.csv
|   |   |-- college_computer_science_test.csv
|   |   |-- college_mathematics_test.csv
|   |   |-- college_medicine_test.csv
|   |   |-- college_physics_test.csv
|   |   |-- computer_security_test.csv
|   |   |-- conceptual_physics_test.csv
|   |   |-- econometrics_test.csv
|   |   |-- electrical_engineering_test.csv
|   |   |-- elementary_mathematics_test.csv
|   |   |-- formal_logic_test.csv
|   |   |-- global_facts_test.csv
|   |   |-- high_school_biology_test.csv
|   |   |-- high_school_chemistry_test.csv
|   |   |-- high_school_computer_science_test.csv
|   |   |-- high_school_european_history_test.csv
|   |   |-- high_school_geography_test.csv
|   |   |-- high_school_government_and_politics_test.csv
|   |   |-- high_school_macroeconomics_test.csv
|   |   |-- high_school_mathematics_test.csv
|   |   |-- high_school_microeconomics_test.csv
|   |   |-- high_school_physics_test.csv
|   |   |-- high_school_psychology_test.csv
|   |   |-- high_school_statistics_test.csv
|   |   |-- high_school_us_history_test.csv
|   |   |-- high_school_world_history_test.csv
|   |   |-- human_aging_test.csv
|   |   |-- human_sexuality_test.csv
|   |   |-- international_law_test.csv
|   |   |-- jurisprudence_test.csv
|   |   |-- logical_fallacies_test.csv
|   |   |-- machine_learning_test.csv
|   |   |-- management_test.csv
|   |   |-- marketing_test.csv
|   |   |-- medical_genetics_test.csv
|   |   |-- miscellaneous_test.csv
|   |   |-- moral_disputes_test.csv
|   |   |-- moral_scenarios_test.csv
|   |   |-- nutrition_test.csv
|   |   |-- philosophy_test.csv
|   |   |-- prehistory_test.csv
|   |   |-- professional_accounting_test.csv
|   |   |-- professional_law_test.csv
|   |   |-- professional_medicine_test.csv
|   |   |-- professional_psychology_test.csv
|   |   |-- public_relations_test.csv
|   |   |-- security_studies_test.csv
|   |   |-- sociology_test.csv
|   |   |-- us_foreign_policy_test.csv
|   |   |-- virology_test.csv
|   |   `-- world_religions_test.csv
|   `-- val
|       |-- abstract_algebra_val.csv
|       |-- anatomy_val.csv
|       |-- astronomy_val.csv
|       |-- business_ethics_val.csv
|       |-- clinical_knowledge_val.csv
|       |-- college_biology_val.csv
|       |-- college_chemistry_val.csv
|       |-- college_computer_science_val.csv
|       |-- college_mathematics_val.csv
|       |-- college_medicine_val.csv
|       |-- college_physics_val.csv
|       |-- computer_security_val.csv
|       |-- conceptual_physics_val.csv
|       |-- econometrics_val.csv
|       |-- electrical_engineering_val.csv
|       |-- elementary_mathematics_val.csv
|       |-- formal_logic_val.csv
|       |-- global_facts_val.csv
|       |-- high_school_biology_val.csv
|       |-- high_school_chemistry_val.csv
|       |-- high_school_computer_science_val.csv
|       |-- high_school_european_history_val.csv
|       |-- high_school_geography_val.csv
|       |-- high_school_government_and_politics_val.csv
|       |-- high_school_macroeconomics_val.csv
|       |-- high_school_mathematics_val.csv
|       |-- high_school_microeconomics_val.csv
|       |-- high_school_physics_val.csv
|       |-- high_school_psychology_val.csv
|       |-- high_school_statistics_val.csv
|       |-- high_school_us_history_val.csv
|       |-- high_school_world_history_val.csv
|       |-- human_aging_val.csv
|       |-- human_sexuality_val.csv
|       |-- international_law_val.csv
|       |-- jurisprudence_val.csv
|       |-- logical_fallacies_val.csv
|       |-- machine_learning_val.csv
|       |-- management_val.csv
|       |-- marketing_val.csv
|       |-- medical_genetics_val.csv
|       |-- miscellaneous_val.csv
|       |-- moral_disputes_val.csv
|       |-- moral_scenarios_val.csv
|       |-- nutrition_val.csv
|       |-- philosophy_val.csv
|       |-- prehistory_val.csv
|       |-- professional_accounting_val.csv
|       |-- professional_law_val.csv
|       |-- professional_medicine_val.csv
|       |-- professional_psychology_val.csv
|       |-- public_relations_val.csv
|       |-- security_studies_val.csv
|       |-- sociology_val.csv
|       |-- us_foreign_policy_val.csv
|       |-- virology_val.csv
|       `-- world_religions_val.csv
|-- nq
|   |-- nq-dev.qa.csv
|   `-- nq-test.qa.csv
|-- openbookqa
|   |-- Additional
|   |   |-- crowdsourced-facts.txt
|   |   |-- dev_complete.jsonl
|   |   |-- test_complete.jsonl
|   |   `-- train_complete.jsonl
|   `-- Main
|       |-- dev.jsonl
|       |-- dev.tsv
|       |-- openbook.txt
|       |-- test.jsonl
|       |-- test.tsv
|       |-- train.jsonl
|       `-- train.tsv
|-- piqa
|   |-- dev-labels.lst
|   |-- dev.jsonl
|   |-- train-labels.lst
|   `-- train.jsonl
|-- race
|   |-- test
|   |   |-- high.jsonl
|   |   `-- middle.jsonl
|   `-- validation
|       |-- high.jsonl
|       `-- middle.jsonl
|-- siqa
|   |-- dev-labels.lst
|   |-- dev.jsonl
|   |-- train-labels.lst
|   `-- train.jsonl
|-- strategyqa
|   `-- strategyQA_train.json
|-- summedits
|   `-- summedits.jsonl
|-- triviaqa
|   |-- trivia-dev.qa.csv
|   |-- trivia-test.qa.csv
|   |-- triviaqa-train.jsonl
|   `-- triviaqa-validation.jsonl
|-- tydiqa
|   `-- dev
|       |-- arabic-dev.jsonl
|       |-- bengali-dev.jsonl
|       |-- english-dev.jsonl
|       |-- finnish-dev.jsonl
|       |-- indonesian-dev.jsonl
|       |-- japanese-dev.jsonl
|       |-- korean-dev.jsonl
|       |-- russian-dev.jsonl
|       |-- swahili-dev.jsonl
|       |-- telugu-dev.jsonl
|       `-- thai-dev.jsonl
|-- winogrande
|   |-- README.md
|   |-- dev-labels.lst
|   |-- dev.jsonl
|   |-- eval.py
|   |-- sample-submission-labels.lst
|   |-- test.jsonl
|   |-- train_debiased-labels.lst
|   |-- train_debiased.jsonl
|   |-- train_l-labels.lst
|   |-- train_l.jsonl
|   |-- train_m-labels.lst
|   |-- train_m.jsonl
|   |-- train_s-labels.lst
|   |-- train_s.jsonl
|   |-- train_xl-labels.lst
|   |-- train_xl.jsonl
|   |-- train_xs-labels.lst
|   `-- train_xs.jsonl
`-- xstory_cloze
    |-- ar_eval.jsonl
    |-- ar_train.jsonl
    |-- en_eval.jsonl
    |-- en_train.jsonl
    |-- es_eval.jsonl
    |-- es_train.jsonl
    |-- eu_eval.jsonl
    |-- eu_train.jsonl
    |-- hi_eval.jsonl
    |-- hi_train.jsonl
    |-- id_eval.jsonl
    |-- id_train.jsonl
    |-- my_eval.jsonl
    |-- my_train.jsonl
    |-- ru_eval.jsonl
    |-- ru_train.jsonl
    |-- sw_eval.jsonl
    |-- sw_train.jsonl
    |-- te_eval.jsonl
    |-- te_train.jsonl
    |-- zh_eval.jsonl
    `-- zh_train.jsonl

85 directories, 1062 files
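The summary line above can be cross-checked without `tree`. The following is a minimal sketch, assuming the archive unpacks into a `data` folder inside `~/opencompass` (the folder name is an assumption here, and `count_tree` is a hypothetical helper). Its counts may differ slightly from `tree`'s if the folder contains symlinks or hidden files.

```python
import os
from pathlib import Path


def count_tree(root):
    """Return (directories, files) found below root, not counting root itself."""
    n_dirs = n_files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)    # sub-directories directly under this folder
        n_files += len(filenames)  # regular entries directly under this folder
    return n_dirs, n_files


if __name__ == "__main__":
    data_root = Path("data")  # assumed location of the unzipped datasets
    if data_root.is_dir():
        dirs, files = count_tree(data_root)
        print(f"{dirs} directories, {files} files")
    else:
        print(f"{data_root} not found; unzip the dataset archive first")
```

If the printed counts are far below the ones shown above, the download or unzip step was probably interrupted.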

Look up the config file paths for Llama (and for the C-Eval datasets):

(llama3) root@intern-studio-061925:~/opencompass# python tools/list_configs.py llama ceval
+----------------------------+-------------------------------------------------------+
| Model                      | Config Path                                           |
|----------------------------+-------------------------------------------------------|
| accessory_llama2_7b        | configs/models/accessory/accessory_llama2_7b.py       |
| hf_codellama_13b           | configs/models/codellama/hf_codellama_13b.py          |
| hf_codellama_13b_instruct  | configs/models/codellama/hf_codellama_13b_instruct.py |
| hf_codellama_13b_python    | configs/models/codellama/hf_codellama_13b_python.py   |
| hf_codellama_34b           | configs/models/codellama/hf_codellama_34b.py          |
| hf_codellama_34b_instruct  | configs/models/codellama/hf_codellama_34b_instruct.py |
| hf_codellama_34b_python    | configs/models/codellama/hf_codellama_34b_python.py   |
| hf_codellama_7b            | configs/models/codellama/hf_codellama_7b.py           |
| hf_codellama_7b_instruct   | configs/models/codellama/hf_codellama_7b_instruct.py  |
| hf_codellama_7b_python     | configs/models/codellama/hf_codellama_7b_python.py    |
| hf_gsm8k_rft_llama7b2_u13b | configs/models/others/hf_gsm8k_rft_llama7b2_u13b.py   |
| hf_llama2_13b              | configs/models/hf_llama/hf_llama2_13b.py              |
| hf_llama2_13b_chat         | configs/models/hf_llama/hf_llama2_13b_chat.py         |
| hf_llama2_70b              | configs/models/hf_llama/hf_llama2_70b.py              |
| hf_llama2_70b_chat         | configs/models/hf_llama/hf_llama2_70b_chat.py         |
| hf_llama2_7b               | configs/models/hf_llama/hf_llama2_7b.py               |
| hf_llama2_7b_chat          | configs/models/hf_llama/hf_llama2_7b_chat.py          |
| hf_llama3_70b              | configs/models/hf_llama/hf_llama3_70b.py              |
| hf_llama3_70b_instruct     | configs/models/hf_llama/hf_llama3_70b_instruct.py     |
| hf_llama3_8b               | configs/models/hf_llama/hf_llama3_8b.py               |
| hf_llama3_8b_instruct      | configs/models/hf_llama/hf_llama3_8b_instruct.py      |
| hf_llama_13b               | configs/models/hf_llama/hf_llama_13b.py               |
| hf_llama_30b               | configs/models/hf_llama/hf_llama_30b.py               |
| hf_llama_65b               | configs/models/hf_llama/hf_llama_65b.py               |
| hf_llama_7b                | configs/models/hf_llama/hf_llama_7b.py                |
| llama2_13b                 | configs/models/llama/llama2_13b.py                    |
| llama2_13b_chat            | configs/models/llama/llama2_13b_chat.py               |
| llama2_70b                 | configs/models/llama/llama2_70b.py                    |
| llama2_70b_chat            | configs/models/llama/llama2_70b_chat.py               |
| llama2_7b                  | configs/models/llama/llama2_7b.py                     |
| llama2_7b_chat             | configs/models/llama/llama2_7b_chat.py                |
| llama_13b                  | configs/models/llama/llama_13b.py                     |
| llama_30b                  | configs/models/llama/llama_30b.py                     |
| llama_65b                  | configs/models/llama/llama_65b.py                     |
| llama_7b                   | configs/models/llama/llama_7b.py                      |
+----------------------------+-------------------------------------------------------+
+--------------------------------+------------------------------------------------------------------+
| Dataset                        | Config Path                                                      |
|--------------------------------+------------------------------------------------------------------|
| base_medium_llama              | configs/datasets/collections/base_medium_llama.py                |
| ceval_clean_ppl                | configs/datasets/ceval/ceval_clean_ppl.py                        |
| ceval_contamination_ppl_810ec6 | configs/datasets/contamination/ceval_contamination_ppl_810ec6.py |
| ceval_gen                      | configs/datasets/ceval/ceval_gen.py                              |
| ceval_gen_2daf24               | configs/datasets/ceval/ceval_gen_2daf24.py                       |
| ceval_gen_5f30c7               | configs/datasets/ceval/ceval_gen_5f30c7.py                       |
| ceval_internal_ppl_1cd8bf      | configs/datasets/ceval/ceval_internal_ppl_1cd8bf.py              |
| ceval_ppl                      | configs/datasets/ceval/ceval_ppl.py                              |
| ceval_ppl_1cd8bf               | configs/datasets/ceval/ceval_ppl_1cd8bf.py                       |
| ceval_ppl_578f8d               | configs/datasets/ceval/ceval_ppl_578f8d.py                       |
| ceval_ppl_93e5ce               | configs/datasets/ceval/ceval_ppl_93e5ce.py                       |
| ceval_zero_shot_gen_bd40ef     | configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py             |
+--------------------------------+------------------------------------------------------------------+
(llama3) root@intern-studio-061925:~/opencompass#
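The two tables are consistent with a simple rule: a config is listed when its file name contains any of the given keywords. The sketch below imitates that rule. It is not the code of `tools/list_configs.py`, whose actual matching logic may differ, and `find_configs` is a hypothetical helper.

```python
from pathlib import Path


def find_configs(config_root, keywords):
    """Map config name -> path for every .py file whose name contains a keyword."""
    matches = {}
    for path in sorted(Path(config_root).rglob("*.py")):
        if any(key in path.stem for key in keywords):
            matches[path.stem] = path.as_posix()
    return matches


if __name__ == "__main__":
    # Assumed layout: run from the opencompass checkout, as in the session above.
    for section in ("configs/models", "configs/datasets"):
        if Path(section).is_dir():
            for name, path in find_configs(section, ["llama", "ceval"]).items():
                print(f"{name:32s} {path}")
```

The name in the left column (for example `ceval_gen`) is what gets passed to `--datasets` in the next step.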

🏗️ Quick evaluation from the command line
Using C-Eval_gen as an example:


(llama3) root@intern-studio-061925:~/opencompass# python run.py --datasets ceval_gen --hf-path /root/model/Meta-Llama-3-8B-Instruct/ --tokenizer-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug
 

Command breakdown:

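# The flags are collected in a bash array so that each one can carry a comment
# (a comment placed after a trailing backslash would break the line continuation).
args=(
  --datasets ceval_gen
  --hf-path /root/model/Meta-Llama-3-8B-Instruct/          # HuggingFace model path
  --tokenizer-path /root/model/Meta-Llama-3-8B-Instruct/   # HuggingFace tokenizer path (can be omitted if it is the same as the model path)
  --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True   # arguments for building the tokenizer
  --model-kwargs device_map='auto' trust_remote_code=True  # arguments for building the model
  --max-seq-len 2048   # maximum sequence length the model accepts
  --max-out-len 16     # maximum number of tokens to generate
  --batch-size 4       # batch size
  --num-gpus 1         # number of GPUs needed to run the model
  --debug
)
python run.py "${args[@]}"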
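When the evaluation is launched from a script rather than typed by hand, the same flags can be assembled as a list. This is a minimal sketch: `build_eval_command` is a hypothetical helper, and the values are copied from the command above. Note that the shell strips the quotes in `padding_side='left'`, so the program receives `padding_side=left`, which is what the list form passes directly.

```python
import shlex


def build_eval_command(model_dir, dataset="ceval_gen", batch_size=4, max_out_len=16):
    """Assemble the run.py invocation shown above as an argument list."""
    return [
        "python", "run.py",
        "--datasets", dataset,
        "--hf-path", model_dir,
        "--tokenizer-path", model_dir,
        "--tokenizer-kwargs", "padding_side=left", "truncation=left",
        "trust_remote_code=True",
        "--model-kwargs", "device_map=auto", "trust_remote_code=True",
        "--max-seq-len", "2048",
        "--max-out-len", str(max_out_len),
        "--batch-size", str(batch_size),
        "--num-gpus", "1",
        "--debug",
    ]


if __name__ == "__main__":
    cmd = build_eval_command("/root/model/Meta-Llama-3-8B-Instruct")
    # Print a copy-pasteable shell line; nothing is executed here.
    print(shlex.join(cmd))
```

To actually start the run, the list could be handed to `subprocess.run(cmd, cwd=..., check=True)` with `cwd` pointing at the opencompass checkout.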


Check the GPU status, then pin the run to GPU 0 with export CUDA_VISIBLE_DEVICES=0
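On a machine with several GPUs, one possible way to decide which index to export is to pick the card with the least memory in use. The sketch below only parses the CSV that `nvidia-smi` prints for the query shown in `QUERY`; the helper names are hypothetical, and the parser assumes every field is a plain integer (some devices report `[N/A]`, which this sketch does not handle).

```python
import subprocess

# nvidia-smi query that prints one "index, used MiB, total MiB" row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Parse 'index, used_MiB, total_MiB' rows into a list of int tuples."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def least_used_gpu(rows):
    """Return the index of the GPU with the least memory in use."""
    return min(rows, key=lambda row: row[1])[0]


def query_gpus():
    """Run nvidia-smi and return the parsed rows (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)
```

For example, `least_used_gpu(query_gpus())` returns the index to place in `CUDA_VISIBLE_DEVICES`. On a single-GPU machine this is simply 0, as in the setting above.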

Then rerun:

python run.py --datasets ceval_gen --hf-path /root/model/Meta-Llama-3-8B-Instruct/ --tokenizer-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug

This run hits an error. The fix:
pip install protobuf


(llama3) root@intern-studio-061925:~/opencompass# pip install protobuf
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting protobuf
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2c/2a/d2741cad35fa5f06d9c59dda3274e5727ca11075dfd7de3f69c100efdcad/protobuf-5.26.1-cp37-abi3-manylinux2014_x86_64.whl (302 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 302.8/302.8 kB 4.1 MB/s eta 0:00:00
Installing collected packages: protobuf
Successfully installed protobuf-5.26.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
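Before rerunning, it can help to confirm that the package landed in the active environment rather than in some other Python. A minimal sketch using only the standard library follows; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("protobuf")
    if version is None:
        print("protobuf is not installed in this environment")
    else:
        print(f"protobuf {version}")
```

The same check works for any other distribution name installed with pip in this walkthrough.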

Rerunning the command still fails with an error.

Install LMDeploy:

pip install "lmdeploy[all]==0.3.0"


Set one of the following environment variables:

export MKL_SERVICE_FORCE_INTEL=1
# or
export MKL_THREADING_LAYER=GNU
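If the evaluation is started from Python instead of an interactive shell, the same variable can be passed through the child process environment. This is a minimal sketch; `mkl_safe_env` is a hypothetical helper, and the `GNU` value is the one from the export line above.

```python
import os


def mkl_safe_env(base=None, threading_layer="GNU"):
    """Return a copy of the environment with MKL_THREADING_LAYER set."""
    env = dict(os.environ if base is None else base)
    env["MKL_THREADING_LAYER"] = threading_layer
    return env


if __name__ == "__main__":
    env = mkl_safe_env()
    print("MKL_THREADING_LAYER =", env["MKL_THREADING_LAYER"])
```

Passing `env=mkl_safe_env()` to `subprocess.run` leaves the parent shell untouched, while the launched process and any processes it spawns inherit the setting.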

Run result:

(llama3) root@intern-studio-061925:~/opencompass# python run.py --datasets ceval_gen --hf-path /root/model/Meta-Llama-3-8B-Instruct/ --tokenizer-path /root/model/Meta-Llama-3-8B-Instruct --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug
05/07 18:45:31 - OpenCompass - INFO - Loading ceval_gen: configs/datasets/ceval/ceval_gen.py
05/07 18:45:31 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
/root/.conda/envs/llama3/lib/python3.10/site-packages/mmengine/utils/path.py
/root/opencompass/outputs/default/20240507_184531/configs/20240507_184531.py
05/07 18:45:31 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
05/07 18:45:31 - OpenCompass - DEBUG - Modules of opencompass's partitioner registry have been automatically imported from opencompass.partitioners
05/07 18:45:31 - OpenCompass - DEBUG - Get class `SizePartitioner` from "partitioner" registry in "opencompass"
05/07 18:45:31 - OpenCompass - DEBUG - An `SizePartitioner` instance is built from registry, and its implementation can be found in opencompass.partitioners.size
05/07 18:45:31 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
05/07 18:45:31 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
05/07 18:45:31 - OpenCompass - DEBUG - Key eval.given_pred not found in config, ignored.
05/07 18:45:31 - OpenCompass - DEBUG - Additional config: {}
05/07 18:45:31 - OpenCompass - INFO - Partitioned into 1 tasks.
05/07 18:45:31 - OpenCompass - DEBUG - Task 0: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_economics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-accountant,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-tax_accountant,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-physician,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-civil_servant,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-urban_and_rural_planner,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-teacher_qualification,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_programming,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-electrical_engineer,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-business_administration,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-art_studies,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-fire_engineer,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-environmental_impact_assessment_engineer,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-education_science,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-professional_tour_guide,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_chemistry,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-metrology_engineer,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-mao_zedong_thought,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-law,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-veter
inary_medicine,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-modern_chinese_history,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-chinese_language_and_literature,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-legal_professional,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-logic,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_history,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-plant_protection,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-clinical_medicine,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_architecture,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_biology,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_politics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_chemistry,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_history,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_network,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-operating_system,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_physics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-advanced_mathematics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_physics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_chemistry,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_biology,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-midd
le_school_mathematics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_physics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-marxism,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_politics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_geography,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-ideological_and_moral_cultivation,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_chinese,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-sports_science,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-basic_medicine,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-probability_and_statistics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_mathematics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-discrete_mathematics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_geography]
05/07 18:45:31 - OpenCompass - DEBUG - Modules of opencompass's runner registry have been automatically imported from opencompass.runners
05/07 18:45:31 - OpenCompass - DEBUG - Get class `LocalRunner` from "runner" registry in "opencompass"
05/07 18:45:31 - OpenCompass - DEBUG - An `LocalRunner` instance is built from registry, and its implementation can be found in opencompass.runners.local
05/07 18:45:31 - OpenCompass - DEBUG - Modules of opencompass's task registry have been automatically imported from opencompass.tasks
05/07 18:45:31 - OpenCompass - DEBUG - Get class `OpenICLInferTask` from "task" registry in "opencompass"
05/07 18:45:31 - OpenCompass - DEBUG - An `OpenICLInferTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_infer
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
/root/.conda/envs/llama3/lib/python3.10/site-packages/mmengine/utils/path.py
/root/opencompass/tmp/1100_params.py
05/07 18:46:00 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_economics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-accountant,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-tax_accountant,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-physician,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-civil_servant,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-urban_and_rural_planner,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-teacher_qualification,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_programming,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-electrical_engineer,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-business_administration,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-art_studies,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-fire_engineer,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-environmental_impact_assessment_engineer,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-education_science,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-professional_tour_guide,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_chemistry,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-metrology_engineer,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-mao_zedong_thought,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-law,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-veterinar
y_medicine,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-modern_chinese_history,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-chinese_language_and_literature,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-legal_professional,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-logic,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_history,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-plant_protection,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-clinical_medicine,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_architecture,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_biology,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_politics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_chemistry,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_history,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_network,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-operating_system,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_physics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-advanced_mathematics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_physics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_chemistry,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_biology,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_s
chool_mathematics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_physics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-marxism,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_politics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_geography,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-ideological_and_moral_cultivation,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_chinese,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-sports_science,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-basic_medicine,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-probability_and_statistics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_mathematics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-discrete_mathematics,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_geography]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
05/07 18:46:08 - OpenCompass - WARNING - pad_token_id is not set for the tokenizer.
05/07 18:46:08 - OpenCompass - WARNING - Using eos_token_id <|end_of_text|> as pad_token_id.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████| 4/4 [02:25<00:00, 36.38s/it]
05/07 18:49:07 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_economics]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 55/55 [00:00<00:00, 1469342.17it/s]
[2024-05-07 18:49:07,887] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/14 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  7%|██████▊                                                                                         | 1/14 [00:08<01:45,  8.08s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 14%|█████████████▋                                                                                  | 2/14 [00:11<01:02,  5.22s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 21%|████████████████████▌                                                                           | 3/14 [00:15<00:52,  4.73s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 29%|███████████████████████████▍                                                                    | 4/14 [00:18<00:39,  3.98s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 36%|██████████████████████████████████▎                                                             | 5/14 [00:21<00:32,  3.59s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 43%|█████████████████████████████████████████▏                                                      | 6/14 [00:24<00:27,  3.43s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████                                                | 7/14 [00:27<00:23,  3.41s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 57%|██████████████████████████████████████████████████████▊                                         | 8/14 [00:31<00:20,  3.46s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 64%|█████████████████████████████████████████████████████████████▋                                  | 9/14 [00:33<00:15,  3.19s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 71%|███████████████████████████████████████████████████████████████████▊                           | 10/14 [00:37<00:13,  3.33s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 79%|██████████████████████████████████████████████████████████████████████████▋                    | 11/14 [00:40<00:10,  3.34s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 86%|█████████████████████████████████████████████████████████████████████████████████▍             | 12/14 [00:44<00:06,  3.41s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 93%|████████████████████████████████████████████████████████████████████████████████████████▏      | 13/14 [00:47<00:03,  3.34s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:50<00:00,  3.58s/it]
05/07 18:49:58 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-accountant]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 1447330.25it/s]
[2024-05-07 18:49:58,230] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/13 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  8%|███████▍                                                                                        | 1/13 [00:03<00:45,  3.78s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 15%|██████████████▊                                                                                 | 2/13 [00:08<00:47,  4.30s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 23%|██████████████████████▏                                                                         | 3/13 [00:13<00:48,  4.81s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 31%|█████████████████████████████▌                                                                  | 4/13 [00:18<00:42,  4.70s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 38%|████████████████████████████████████▉                                                           | 5/13 [00:22<00:35,  4.41s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 46%|████████████████████████████████████████████▎                                                   | 6/13 [00:26<00:31,  4.45s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 54%|███████████████████████████████████████████████████▋                                            | 7/13 [00:36<00:36,  6.02s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 62%|███████████████████████████████████████████████████████████                                     | 8/13 [00:40<00:26,  5.39s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 69%|██████████████████████████████████████████████████████████████████▍                             | 9/13 [00:43<00:19,  4.80s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 77%|█████████████████████████████████████████████████████████████████████████                      | 10/13 [00:46<00:13,  4.35s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 85%|████████████████████████████████████████████████████████████████████████████████▍              | 11/13 [00:52<00:09,  4.59s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 92%|███████████████████████████████████████████████████████████████████████████████████████▋       | 12/13 [00:56<00:04,  4.56s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:57<00:00,  4.42s/it]
05/07 18:50:55 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-tax_accountant]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 1053953.31it/s]
[2024-05-07 18:50:55,905] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/13 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  8%|███████▍                                                                                        | 1/13 [00:04<00:49,  4.16s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 15%|██████████████▊                                                                                 | 2/13 [00:08<00:43,  3.99s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 23%|██████████████████████▏                                                                         | 3/13 [00:12<00:39,  4.00s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 31%|█████████████████████████████▌                                                                  | 4/13 [00:15<00:34,  3.82s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 38%|████████████████████████████████████▉                                                           | 5/13 [00:19<00:31,  3.95s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 46%|████████████████████████████████████████████▎                                                   | 6/13 [00:23<00:26,  3.74s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 54%|███████████████████████████████████████████████████▋                                            | 7/13 [00:28<00:25,  4.31s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 62%|███████████████████████████████████████████████████████████                                     | 8/13 [00:32<00:20,  4.14s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 69%|██████████████████████████████████████████████████████████████████▍                             | 9/13 [00:36<00:16,  4.11s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 77%|█████████████████████████████████████████████████████████████████████████                      | 10/13 [00:39<00:11,  3.92s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 85%|████████████████████████████████████████████████████████████████████████████████▍              | 11/13 [00:43<00:07,  3.93s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 92%|███████████████████████████████████████████████████████████████████████████████████████▋       | 12/13 [00:47<00:03,  3.75s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:48<00:00,  3.70s/it]
05/07 18:51:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-physician]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 1468006.40it/s]
[2024-05-07 18:51:44,149] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/13 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  8%|███████▍                                                                                        | 1/13 [00:03<00:45,  3.82s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 15%|██████████████▊                                                                                 | 2/13 [00:06<00:35,  3.20s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 23%|██████████████████████▏                                                                         | 3/13 [00:09<00:30,  3.04s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 31%|█████████████████████████████▌                                                                  | 4/13 [00:12<00:28,  3.13s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 38%|████████████████████████████████████▉                                                           | 5/13 [00:15<00:24,  3.05s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 46%|████████████████████████████████████████████▎                                                   | 6/13 [00:18<00:21,  3.02s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 54%|███████████████████████████████████████████████████▋                                            | 7/13 [00:21<00:17,  2.97s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 62%|███████████████████████████████████████████████████████████                                     | 8/13 [00:24<00:14,  2.95s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 69%|██████████████████████████████████████████████████████████████████▍                             | 9/13 [00:27<00:11,  2.95s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 77%|█████████████████████████████████████████████████████████████████████████                      | 10/13 [00:31<00:10,  3.40s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 85%|████████████████████████████████████████████████████████████████████████████████▍              | 11/13 [00:34<00:06,  3.33s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 92%|███████████████████████████████████████████████████████████████████████████████████████▋       | 12/13 [00:37<00:03,  3.18s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:38<00:00,  2.97s/it]
05/07 18:52:22 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-civil_servant]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 47/47 [00:00<00:00, 1516402.22it/s]
[2024-05-07 18:52:22,900] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/12 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  8%|████████                                                                                        | 1/12 [00:05<01:05,  5.94s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████                                                                                | 2/12 [00:11<00:55,  5.53s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 25%|████████████████████████                                                                        | 3/12 [00:16<00:47,  5.32s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████                                                                | 4/12 [00:21<00:41,  5.21s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 42%|████████████████████████████████████████                                                        | 5/12 [00:27<00:38,  5.44s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████                                                | 6/12 [00:33<00:34,  5.67s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 58%|████████████████████████████████████████████████████████                                        | 7/12 [00:38<00:27,  5.55s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████                                | 8/12 [00:44<00:22,  5.55s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 75%|████████████████████████████████████████████████████████████████████████                        | 9/12 [00:48<00:15,  5.28s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|███████████████████████████████████████████████████████████████████████████████▏               | 10/12 [00:53<00:10,  5.14s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 92%|███████████████████████████████████████████████████████████████████████████████████████        | 11/12 [00:58<00:04,  4.98s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [01:02<00:00,  5.20s/it]
05/07 18:53:25 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-urban_and_rural_planner]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 46/46 [00:00<00:00, 1495643.29it/s]
[2024-05-07 18:53:25,477] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/12 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  8%|████████                                                                                        | 1/12 [00:03<00:36,  3.35s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████                                                                                | 2/12 [00:07<00:39,  3.96s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 25%|████████████████████████                                                                        | 3/12 [00:11<00:33,  3.68s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████                                                                | 4/12 [00:14<00:29,  3.68s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 42%|████████████████████████████████████████                                                        | 5/12 [00:18<00:24,  3.54s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████                                                | 6/12 [00:21<00:21,  3.62s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 58%|████████████████████████████████████████████████████████                                        | 7/12 [00:25<00:18,  3.79s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████                                | 8/12 [00:30<00:15,  3.96s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 75%|████████████████████████████████████████████████████████████████████████                        | 9/12 [00:33<00:11,  3.75s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|███████████████████████████████████████████████████████████████████████████████▏               | 10/12 [00:36<00:07,  3.59s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 92%|███████████████████████████████████████████████████████████████████████████████████████        | 11/12 [00:40<00:03,  3.54s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:42<00:00,  3.53s/it]
05/07 18:54:07 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-teacher_qualification]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 44/44 [00:00<00:00, 1356980.71it/s]
[2024-05-07 18:54:07,972] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/11 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  9%|████████▋                                                                                       | 1/11 [00:02<00:29,  2.99s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 18%|█████████████████▍                                                                              | 2/11 [00:06<00:29,  3.32s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 27%|██████████████████████████▏                                                                     | 3/11 [00:08<00:22,  2.86s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 36%|██████████████████████████████████▉                                                             | 4/11 [00:11<00:18,  2.68s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 45%|███████████████████████████████████████████▋                                                    | 5/11 [00:14<00:16,  2.76s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 55%|████████████████████████████████████████████████████▎                                           | 6/11 [00:16<00:13,  2.78s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 64%|█████████████████████████████████████████████████████████████                                   | 7/11 [00:19<00:10,  2.67s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 73%|█████████████████████████████████████████████████████████████████████▊                          | 8/11 [00:21<00:07,  2.60s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 82%|██████████████████████████████████████████████████████████████████████████████▌                 | 9/11 [00:24<00:05,  2.57s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 91%|██████████████████████████████████████████████████████████████████████████████████████▎        | 10/11 [00:27<00:02,  2.62s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:29<00:00,  2.70s/it]
05/07 18:54:37 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_programming]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 1048576.00it/s]
[2024-05-07 18:54:37,775] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/10 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 10%|█████████▌                                                                                      | 1/10 [00:03<00:28,  3.12s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 20%|███████████████████▏                                                                            | 2/10 [00:06<00:24,  3.04s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 30%|████████████████████████████▊                                                                   | 3/10 [00:08<00:19,  2.77s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 40%|██████████████████████████████████████▍                                                         | 4/10 [00:11<00:16,  2.71s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████                                                | 5/10 [00:13<00:13,  2.74s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 60%|█████████████████████████████████████████████████████████▌                                      | 6/10 [00:16<00:10,  2.63s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 70%|███████████████████████████████████████████████████████████████████▏                            | 7/10 [00:18<00:07,  2.61s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 80%|████████████████████████████████████████████████████████████████████████████▊                   | 8/10 [00:21<00:05,  2.55s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 90%|██████████████████████████████████████████████████████████████████████████████████████▍         | 9/10 [00:24<00:02,  2.62s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:25<00:00,  2.50s/it]
05/07 18:55:02 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-electrical_engineer]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 749706.51it/s]
[2024-05-07 18:55:02,946] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                        | 0/10 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 10%|█████████▌                                                                                      | 1/10 [00:02<00:23,  2.60s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 20%|███████████████████▏                                                                            | 2/10 [00:05<00:20,  2.55s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 30%|████████████████████████████▊                                                                   | 3/10 [00:07<00:18,  2.63s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 40%|██████████████████████████████████████▍                                                         | 4/10 [00:10<00:16,  2.69s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████                                                | 5/10 [00:13<00:13,  2.69s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 60%|█████████████████████████████████████████████████████████▌                                      | 6/10 [00:15<00:10,  2.66s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 70%|███████████████████████████████████████████████████████████████████▏                            | 7/10 [00:18<00:08,  2.80s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 80%|████████████████████████████████████████████████████████████████████████████▊                   | 8/10 [00:21<00:05,  2.70s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 90%|██████████████████████████████████████████████████████████████████████████████████████▍         | 9/10 [00:24<00:02,  2.67s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:24<00:00,  2.47s/it]
05/07 18:55:27 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-business_administration]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 1116226.06it/s]
[2024-05-07 18:55:27,818] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/9 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 11%|██████████▊                                                                                      | 1/9 [00:02<00:20,  2.57s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 22%|█████████████████████▌                                                                           | 2/9 [00:05<00:17,  2.50s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 3/9 [00:07<00:15,  2.65s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 44%|███████████████████████████████████████████                                                      | 4/9 [00:10<00:12,  2.52s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 56%|█████████████████████████████████████████████████████▉                                           | 5/9 [00:13<00:11,  2.80s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 6/9 [00:16<00:08,  2.91s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 78%|███████████████████████████████████████████████████████████████████████████▍                     | 7/9 [00:19<00:05,  2.78s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 89%|██████████████████████████████████████████████████████████████████████████████████████▏          | 8/9 [00:21<00:02,  2.74s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:22<00:00,  2.48s/it]
05/07 18:55:50 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-art_studies]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 1116226.06it/s]
[2024-05-07 18:55:50,273] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/9 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 11%|██████████▊                                                                                      | 1/9 [00:01<00:15,  1.95s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 22%|█████████████████████▌                                                                           | 2/9 [00:03<00:14,  2.00s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 3/9 [00:06<00:12,  2.15s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 44%|███████████████████████████████████████████                                                      | 4/9 [00:08<00:11,  2.26s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 56%|█████████████████████████████████████████████████████▉                                           | 5/9 [00:10<00:08,  2.22s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 6/9 [00:13<00:07,  2.41s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 78%|███████████████████████████████████████████████████████████████████████████▍                     | 7/9 [00:15<00:04,  2.36s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 89%|██████████████████████████████████████████████████████████████████████████████████████▏          | 8/9 [00:18<00:02,  2.28s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:19<00:00,  2.17s/it]
05/07 18:56:09 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-fire_engineer]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 1160923.43it/s]
[2024-05-07 18:56:09,920] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/8 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 12%|████████████▏                                                                                    | 1/8 [00:02<00:20,  2.92s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 25%|████████████████████████▎                                                                        | 2/8 [00:06<00:19,  3.21s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 38%|████████████████████████████████████▍                                                            | 3/8 [00:09<00:15,  3.01s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 4/8 [00:12<00:12,  3.01s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 62%|████████████████████████████████████████████████████████████▋                                    | 5/8 [00:15<00:09,  3.17s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 75%|████████████████████████████████████████████████████████████████████████▊                        | 6/8 [00:18<00:06,  3.21s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 88%|████████████████████████████████████████████████████████████████████████████████████▉            | 7/8 [00:21<00:03,  3.04s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:23<00:00,  2.92s/it]
05/07 18:56:33 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-environmental_impact_assessment_engineer]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 1111311.32it/s]
[2024-05-07 18:56:33,444] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/8 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 12%|████████████▏                                                                                    | 1/8 [00:02<00:19,  2.74s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 25%|████████████████████████▎                                                                        | 2/8 [00:05<00:16,  2.80s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 38%|████████████████████████████████████▍                                                            | 3/8 [00:08<00:13,  2.75s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 4/8 [00:10<00:10,  2.69s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 62%|████████████████████████████████████████████████████████████▋                                    | 5/8 [00:13<00:08,  2.82s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 75%|████████████████████████████████████████████████████████████████████████▊                        | 6/8 [00:16<00:05,  2.79s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 88%|████████████████████████████████████████████████████████████████████████████████████▉            | 7/8 [00:19<00:02,  2.76s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:22<00:00,  2.80s/it]
05/07 18:56:55 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-education_science]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:00<00:00, 1022141.31it/s]
[2024-05-07 18:56:55,970] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/8 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 12%|████████████▏                                                                                    | 1/8 [00:02<00:15,  2.18s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 25%|████████████████████████▎                                                                        | 2/8 [00:04<00:12,  2.16s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 38%|████████████████████████████████████▍                                                            | 3/8 [00:06<00:11,  2.28s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 4/8 [00:09<00:09,  2.41s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 62%|████████████████████████████████████████████████████████████▋                                    | 5/8 [00:11<00:07,  2.40s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 75%|████████████████████████████████████████████████████████████████████████▊                        | 6/8 [00:14<00:04,  2.38s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 88%|████████████████████████████████████████████████████████████████████████████████████▉            | 7/8 [00:16<00:02,  2.30s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:16<00:00,  2.11s/it]
05/07 18:57:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-professional_tour_guide]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:00<00:00, 1057694.05it/s]
[2024-05-07 18:57:12,914] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/8 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 12%|████████████▏                                                                                    | 1/8 [00:02<00:14,  2.05s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 25%|████████████████████████▎                                                                        | 2/8 [00:04<00:13,  2.19s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 38%|████████████████████████████████████▍                                                            | 3/8 [00:06<00:11,  2.21s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 4/8 [00:08<00:08,  2.23s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 62%|████████████████████████████████████████████████████████████▋                                    | 5/8 [00:11<00:06,  2.26s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 75%|████████████████████████████████████████████████████████████████████████▊                        | 6/8 [00:13<00:04,  2.23s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 88%|████████████████████████████████████████████████████████████████████████████████████▉            | 7/8 [00:15<00:02,  2.35s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:16<00:00,  2.06s/it]
05/07 18:57:29 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_chemistry]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 940778.47it/s]
[2024-05-07 18:57:29,509] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:12,  2.53s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:04<00:09,  2.43s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:07<00:07,  2.52s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:10<00:05,  2.52s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:12<00:02,  2.46s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:14<00:00,  2.46s/it]
05/07 18:57:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-metrology_engineer]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 906876.54it/s]
[2024-05-07 18:57:44,380] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:13,  2.70s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:05<00:11,  2.76s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:08<00:08,  2.73s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:11<00:05,  2.90s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:14<00:02,  2.81s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:16<00:00,  2.82s/it]
05/07 18:58:01 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-mao_zedong_thought]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 906876.54it/s]
[2024-05-07 18:58:01,403] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:03<00:17,  3.47s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:06<00:11,  2.98s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:09<00:08,  2.95s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:11<00:05,  2.75s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:14<00:02,  2.72s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:18<00:00,  3.06s/it]
05/07 18:58:19 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-law]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 774333.05it/s]
[2024-05-07 18:58:19,872] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:03<00:18,  3.70s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:07<00:16,  4.04s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:11<00:11,  3.98s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:14<00:07,  3.63s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:18<00:03,  3.63s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:21<00:00,  3.62s/it]
05/07 18:58:41 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-veterinary_medicine]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 846219.23it/s]
[2024-05-07 18:58:41,701] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:13,  2.61s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:04<00:09,  2.46s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:07<00:07,  2.49s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:10<00:05,  2.74s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:13<00:02,  2.93s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:15<00:00,  2.65s/it]
05/07 18:58:57 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-modern_chinese_history]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 790729.44it/s]
[2024-05-07 18:58:57,693] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:12,  2.52s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:05<00:11,  2.82s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:09<00:09,  3.24s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:11<00:06,  3.02s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:14<00:02,  2.98s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:16<00:00,  2.82s/it]
05/07 18:59:14 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-chinese_language_and_literature]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 665303.39it/s]
[2024-05-07 18:59:14,689] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:10,  2.11s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:04<00:09,  2.50s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:07<00:08,  2.71s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:10<00:05,  2.65s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:12<00:02,  2.47s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:14<00:00,  2.36s/it]
05/07 18:59:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-legal_professional]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 853707.89it/s]
[2024-05-07 18:59:28,987] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:04<00:20,  4.04s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:09<00:19,  4.85s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:14<00:14,  4.87s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:18<00:09,  4.68s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:22<00:04,  4.28s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:24<00:00,  4.13s/it]
05/07 18:59:53 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-logic]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 795471.45it/s]
[2024-05-07 18:59:53,922] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:05<00:25,  5.17s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:14<00:30,  7.70s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:20<00:20,  6.73s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:24<00:11,  5.87s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:29<00:05,  5.44s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:32<00:00,  5.40s/it]
05/07 19:00:26 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_history]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 623477.62it/s]
[2024-05-07 19:00:26,437] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:10,  2.05s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:04<00:08,  2.12s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:06<00:06,  2.12s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:08<00:04,  2.18s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:10<00:02,  2.19s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00,  1.98s/it]
05/07 19:00:38 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-plant_protection]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 846556.77it/s]
[2024-05-07 19:00:38,399] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:03<00:16,  3.26s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:05<00:11,  2.94s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:08<00:07,  2.64s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:10<00:05,  2.55s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:12<00:02,  2.37s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:13<00:00,  2.31s/it]
05/07 19:00:52 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-clinical_medicine]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 704386.93it/s]
[2024-05-07 19:00:52,363] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:12,  2.44s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:05<00:11,  2.80s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:08<00:08,  2.81s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:10<00:05,  2.72s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:13<00:02,  2.62s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:14<00:00,  2.48s/it]
05/07 19:01:07 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_architecture]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 599186.29it/s]
[2024-05-07 19:01:07,330] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:14,  2.97s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:06<00:12,  3.13s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:09<00:09,  3.05s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:11<00:05,  2.88s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:14<00:02,  2.87s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:15<00:00,  2.61s/it]
05/07 19:01:23 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_biology]
100%|███████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 699050.67it/s]
[2024-05-07 19:01:23,091] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|                                                                                                         | 0/6 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 17%|████████████████▏                                                                                | 1/6 [00:02<00:12,  2.57s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 33%|████████████████████████████████▎                                                                | 2/6 [00:05<00:10,  2.59s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 50%|████████████████████████████████████████████████▌                                                | 3/6 [00:07<00:07,  2.66s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 67%|████████████████████████████████████████████████████████████████▋                                | 4/6 [00:10<00:05,  2.58s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
 83%|████████████████████████████████████████████████████████████████████████████████▊                | 5/6 [00:12<00:02,  2.58s/it]
....
05/07 19:07:23 - OpenCompass - DEBUG - Additional config: {'eval': {'runner': {'task': {}}}}
05/07 19:07:23 - OpenCompass - INFO - Partitioned into 52 tasks.
05/07 19:07:23 - OpenCompass - DEBUG - Task 0: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_network]
05/07 19:07:23 - OpenCompass - DEBUG - Task 1: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-operating_system]
05/07 19:07:23 - OpenCompass - DEBUG - Task 2: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_architecture]
05/07 19:07:23 - OpenCompass - DEBUG - Task 3: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_programming]
05/07 19:07:23 - OpenCompass - DEBUG - Task 4: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_physics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 5: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_chemistry]
05/07 19:07:23 - OpenCompass - DEBUG - Task 6: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-advanced_mathematics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 7: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-probability_and_statistics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 8: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-discrete_mathematics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 9: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-electrical_engineer]
05/07 19:07:23 - OpenCompass - DEBUG - Task 10: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-metrology_engineer]
05/07 19:07:23 - OpenCompass - DEBUG - Task 11: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_mathematics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 12: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_physics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 13: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_chemistry]
05/07 19:07:23 - OpenCompass - DEBUG - Task 14: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_biology]
05/07 19:07:23 - OpenCompass - DEBUG - Task 15: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_mathematics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 16: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_biology]
05/07 19:07:23 - OpenCompass - DEBUG - Task 17: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_physics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 18: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_chemistry]
05/07 19:07:23 - OpenCompass - DEBUG - Task 19: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-veterinary_medicine]
05/07 19:07:23 - OpenCompass - DEBUG - Task 20: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_economics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 21: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-business_administration]
05/07 19:07:23 - OpenCompass - DEBUG - Task 22: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-marxism]
05/07 19:07:23 - OpenCompass - DEBUG - Task 23: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-mao_zedong_thought]
05/07 19:07:23 - OpenCompass - DEBUG - Task 24: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-education_science]
05/07 19:07:23 - OpenCompass - DEBUG - Task 25: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-teacher_qualification]
05/07 19:07:23 - OpenCompass - DEBUG - Task 26: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_politics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 27: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_geography]
05/07 19:07:23 - OpenCompass - DEBUG - Task 28: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_politics]
05/07 19:07:23 - OpenCompass - DEBUG - Task 29: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_geography]
05/07 19:07:23 - OpenCompass - DEBUG - Task 30: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-modern_chinese_history]
05/07 19:07:23 - OpenCompass - DEBUG - Task 31: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-ideological_and_moral_cultivation]
05/07 19:07:23 - OpenCompass - DEBUG - Task 32: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-logic]
05/07 19:07:23 - OpenCompass - DEBUG - Task 33: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-law]
05/07 19:07:23 - OpenCompass - DEBUG - Task 34: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-chinese_language_and_literature]
05/07 19:07:23 - OpenCompass - DEBUG - Task 35: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-art_studies]
05/07 19:07:23 - OpenCompass - DEBUG - Task 36: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-professional_tour_guide]
05/07 19:07:23 - OpenCompass - DEBUG - Task 37: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-legal_professional]
05/07 19:07:23 - OpenCompass - DEBUG - Task 38: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_chinese]
05/07 19:07:23 - OpenCompass - DEBUG - Task 39: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_history]
05/07 19:07:23 - OpenCompass - DEBUG - Task 40: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-middle_school_history]
05/07 19:07:23 - OpenCompass - DEBUG - Task 41: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-civil_servant]
05/07 19:07:23 - OpenCompass - DEBUG - Task 42: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-sports_science]
05/07 19:07:23 - OpenCompass - DEBUG - Task 43: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-plant_protection]
05/07 19:07:23 - OpenCompass - DEBUG - Task 44: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-basic_medicine]
05/07 19:07:23 - OpenCompass - DEBUG - Task 45: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-clinical_medicine]
05/07 19:07:23 - OpenCompass - DEBUG - Task 46: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-urban_and_rural_planner]
05/07 19:07:23 - OpenCompass - DEBUG - Task 47: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-accountant]
05/07 19:07:23 - OpenCompass - DEBUG - Task 48: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-fire_engineer]
05/07 19:07:23 - OpenCompass - DEBUG - Task 49: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-environmental_impact_assessment_engineer]
05/07 19:07:23 - OpenCompass - DEBUG - Task 50: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-tax_accountant]
05/07 19:07:23 - OpenCompass - DEBUG - Task 51: [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-physician]
05/07 19:07:23 - OpenCompass - DEBUG - Get class `LocalRunner` from "runner" registry in "opencompass"
05/07 19:07:23 - OpenCompass - DEBUG - An `LocalRunner` instance is built from registry, and its implementation can be found in opencompass.runners.local
05/07 19:07:23 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:23 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:25 - OpenCompass - DEBUG - Modules of opencompass's load_dataset registry have been automatically imported from opencompass.datasets
05/07 19:07:25 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:25 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:25 - OpenCompass - DEBUG - Modules of opencompass's text_postprocessors registry have been automatically imported from opencompass.utils.text_postprocessors
05/07 19:07:25 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:25 - OpenCompass - DEBUG - Modules of opencompass's icl_evaluators registry have been automatically imported from opencompass.openicl.icl_evaluator
05/07 19:07:25 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:25 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:25 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_network]: {'accuracy': 63.1578947368421}
05/07 19:07:25 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:25 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:27 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:27 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:27 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:27 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:27 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:27 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-operating_system]: {'accuracy': 63.1578947368421}
05/07 19:07:27 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:27 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:29 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:29 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:29 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:29 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:29 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:29 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-computer_architecture]: {'accuracy': 52.38095238095239}
05/07 19:07:29 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:29 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:31 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:31 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:31 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:31 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:31 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:31 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_programming]: {'accuracy': 62.16216216216216}
05/07 19:07:31 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:31 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:33 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:33 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:34 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:34 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:34 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:34 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_physics]: {'accuracy': 42.10526315789473}
05/07 19:07:34 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:34 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:35 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:36 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:36 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:36 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:36 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:36 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-college_chemistry]: {'accuracy': 29.166666666666668}
05/07 19:07:36 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:36 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:37 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:37 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:38 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:38 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:38 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:38 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-advanced_mathematics]: {'accuracy': 42.10526315789473}
05/07 19:07:38 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:38 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:39 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:40 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:40 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:40 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:40 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:40 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-probability_and_statistics]: {'accuracy': 27.77777777777778}
05/07 19:07:40 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:40 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:42 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:42 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:42 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:42 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:42 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:42 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-discrete_mathematics]: {'accuracy': 25.0}
05/07 19:07:42 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:42 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:44 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:44 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:44 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:44 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:44 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:44 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-electrical_engineer]: {'accuracy': 32.432432432432435}
05/07 19:07:44 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:44 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:46 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:46 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:46 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:46 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:46 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:46 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-metrology_engineer]: {'accuracy': 62.5}
05/07 19:07:46 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:46 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:48 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:48 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:48 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:48 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:48 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:48 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_mathematics]: {'accuracy': 5.555555555555555}
05/07 19:07:48 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task" registry in "opencompass"
05/07 19:07:48 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_eval
05/07 19:07:50 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_dataset" registry in "opencompass"
05/07 19:07:50 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:50 - OpenCompass - DEBUG - Get class `first_capital_postprocess` from "text_postprocessors" registry in "opencompass"
05/07 19:07:50 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evaluators" registry in "opencompass"
05/07 19:07:50 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:07:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_physics]: {'accuracy': 26.31578947368421}
05/07 19:07:50 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task"                                                      registry in "opencompass"
05/07 19:07:50 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built f                                                     rom registry, and its implementation can be found in opencompass.tasks.openicl_                                                     eval
05/07 19:07:52 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_data                                                     set" registry in "opencompass"
05/07 19:07:52 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from                                                      registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:52 - OpenCompass - DEBUG - Get class `first_capital_postprocess` fr                                                     om "text_postprocessors" registry in "opencompass"
05/07 19:07:52 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evalu                                                     ators" registry in "opencompass"
05/07 19:07:52 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from                                                      registry, and its implementation can be found in opencompass.openicl.icl_evalu                                                     ator.icl_hf_evaluator
05/07 19:07:52 - OpenCompass - INFO - Task [opencompass.models.huggingface.Hugg                                                     ingFace_model_Meta-Llama-3-8B-Instruct/ceval-high_school_chemistry]: {'accuracy                                                     ': 63.1578947368421}
05/07 19:07:52 - OpenCompass - DEBUG - Get class `OpenICLEvalTask` from "task"                                                      registry in "opencompass"
05/07 19:07:52 - OpenCompass - DEBUG - An `OpenICLEvalTask` instance is built f                                                     rom registry, and its implementation can be found in opencompass.tasks.openicl_                                                     eval
05/07 19:07:54 - OpenCompass - DEBUG - Get class `CEvalDataset` from "load_data                                                     set" registry in "opencompass"
05/07 19:07:54 - OpenCompass - DEBUG - An `CEvalDataset` instance is built from                                                      registry, and its implementation can be found in opencompass.datasets.ceval
05/07 19:07:54 - OpenCompass - DEBUG - Get class `first_capital_postprocess` fr                                                     om "text_postprocessors" registry in "opencompass"
05/07 19:07:54 - OpenCompass - DEBUG - Get class `AccEvaluator` from "icl_evalu                                                     ators" registry in "opencompass"
....
05/07 19:09:12 - OpenCompass - DEBUG - An `AccEvaluator` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_evaluator.icl_hf_evaluator
05/07 19:09:12 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct/ceval-physician]: {'accuracy': 57.14285714285714}
05/07 19:09:12 - OpenCompass - DEBUG - An `DefaultSummarizer` instance is built from registry, and its implementation can be found in opencompass.summarizers.default
dataset                                         version    metric         mode    opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct
----------------------------------------------  ---------  -------------  ------  ---------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen     63.16
ceval-operating_system                          1c2571     accuracy       gen     63.16
ceval-computer_architecture                     a74dad     accuracy       gen     52.38
ceval-college_programming                       4ca32a     accuracy       gen     62.16
ceval-college_physics                           963fa8     accuracy       gen     42.11
ceval-college_chemistry                         e78857     accuracy       gen     29.17
ceval-advanced_mathematics                      ce03e2     accuracy       gen     42.11
ceval-probability_and_statistics                65e812     accuracy       gen     27.78
ceval-discrete_mathematics                      e894ae     accuracy       gen     25
ceval-electrical_engineer                       ae42b9     accuracy       gen     32.43
ceval-metrology_engineer                        ee34ea     accuracy       gen     62.5
ceval-high_school_mathematics                   1dc5bf     accuracy       gen     5.56
ceval-high_school_physics                       adf25f     accuracy       gen     26.32
ceval-high_school_chemistry                     2ed27f     accuracy       gen     63.16
ceval-high_school_biology                       8e2b9a     accuracy       gen     36.84
ceval-middle_school_mathematics                 bee8d5     accuracy       gen     31.58
ceval-middle_school_biology                     86817c     accuracy       gen     71.43
ceval-middle_school_physics                     8accf6     accuracy       gen     57.89
ceval-middle_school_chemistry                   167a15     accuracy       gen     80
ceval-veterinary_medicine                       b4e08d     accuracy       gen     52.17
ceval-college_economics                         f3f4e6     accuracy       gen     45.45
ceval-business_administration                   c1614e     accuracy       gen     30.3
ceval-marxism                                   cf874c     accuracy       gen     47.37
ceval-mao_zedong_thought                        51c7a4     accuracy       gen     50
ceval-education_science                         591fee     accuracy       gen     51.72
ceval-teacher_qualification                     4e4ced     accuracy       gen     72.73
ceval-high_school_politics                      5c0de2     accuracy       gen     68.42
ceval-high_school_geography                     865461     accuracy       gen     42.11
ceval-middle_school_politics                    5be3e7     accuracy       gen     57.14
ceval-middle_school_geography                   8a63be     accuracy       gen     50
ceval-modern_chinese_history                    fc01af     accuracy       gen     52.17
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen     78.95
ceval-logic                                     f5b022     accuracy       gen     40.91
ceval-law                                       a110a1     accuracy       gen     33.33
ceval-chinese_language_and_literature           0f8b68     accuracy       gen     34.78
ceval-art_studies                               2a1300     accuracy       gen     54.55
ceval-professional_tour_guide                   4e673e     accuracy       gen     55.17
ceval-legal_professional                        ce8787     accuracy       gen     30.43
ceval-high_school_chinese                       315705     accuracy       gen     31.58
ceval-high_school_history                       7eb30a     accuracy       gen     65
ceval-middle_school_history                     48ab4a     accuracy       gen     59.09
ceval-civil_servant                             87d061     accuracy       gen     34.04
ceval-sports_science                            70f27b     accuracy       gen     63.16
ceval-plant_protection                          8941f9     accuracy       gen     68.18
ceval-basic_medicine                            c409d6     accuracy       gen     57.89
ceval-clinical_medicine                         49e82d     accuracy       gen     54.55
ceval-urban_and_rural_planner                   95b885     accuracy       gen     52.17
ceval-accountant                                002837     accuracy       gen     44.9
ceval-fire_engineer                             bc23f5     accuracy       gen     38.71
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen     45.16
ceval-tax_accountant                            3a5e3c     accuracy       gen     34.69
ceval-physician                                 6e277d     accuracy       gen     57.14
ceval-stem                                      -          naive_average  gen     46.34
ceval-social-science                            -          naive_average  gen     51.52
ceval-humanities                                -          naive_average  gen     48.72
ceval-other                                     -          naive_average  gen     50.05
ceval-hard                                      -          naive_average  gen     32.65
ceval                                           -          naive_average  gen     48.63
05/07 19:09:12 - OpenCompass - INFO - write summary to /root/opencompass/outputs/default/20240507_184531/summary/summary_20240507_184531.txt
05/07 19:09:12 - OpenCompass - INFO - write csv to /root/opencompass/outputs/default/20240507_184531/summary/summary_20240507_184531.csv
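
The `naive_average` rows in the table appear to be plain unweighted means of the per-subject accuracies. The sketch below is one way to check that for `ceval-hard` and to find the weakest subject. It rests on two assumptions that the log does not confirm: that the summary CSV uses the same columns as the printed table (dataset, version, metric, mode, then one score column), and that `ceval-hard` covers the eight subjects listed in `CEVAL_HARD`, which is taken from the C-Eval benchmark definition. The helper names `load_scores` and `naive_average` are hypothetical, and the sample rows are copied from the table rather than read from the real file.

```python
import csv
import io

# Assumed membership of the ceval-hard group (from the C-Eval benchmark
# definition, not from the log output).
CEVAL_HARD = [
    "advanced_mathematics", "discrete_mathematics",
    "probability_and_statistics", "college_chemistry", "college_physics",
    "high_school_mathematics", "high_school_chemistry", "high_school_physics",
]


def load_scores(csv_text):
    """Map dataset name -> score for per-subject accuracy rows.

    Assumes the columns are dataset, version, metric, mode, score.
    """
    scores = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 5 or row[2] != "accuracy":
            continue  # skips the header, blank lines, and naive_average rows
        scores[row[0]] = float(row[4])
    return scores


def naive_average(scores, subjects):
    """Unweighted mean over the listed subjects, rounded to two decimals."""
    values = [scores[f"ceval-{name}"] for name in subjects]
    return round(sum(values) / len(values), 2)


# Rows copied from the summary table (ceval-hard subjects only).
sample = """dataset,version,metric,mode,opencompass.models.huggingface.HuggingFace_model_Meta-Llama-3-8B-Instruct
ceval-college_physics,963fa8,accuracy,gen,42.11
ceval-college_chemistry,e78857,accuracy,gen,29.17
ceval-advanced_mathematics,ce03e2,accuracy,gen,42.11
ceval-probability_and_statistics,65e812,accuracy,gen,27.78
ceval-discrete_mathematics,e894ae,accuracy,gen,25
ceval-high_school_mathematics,1dc5bf,accuracy,gen,5.56
ceval-high_school_physics,adf25f,accuracy,gen,26.32
ceval-high_school_chemistry,2ed27f,accuracy,gen,63.16
ceval-hard,-,naive_average,gen,32.65
"""

scores = load_scores(sample)
print(naive_average(scores, CEVAL_HARD))  # 32.65, matching the ceval-hard row
print(min(scores, key=scores.get))        # ceval-high_school_mathematics
```

To run this against a real evaluation, read the `summary_*.csv` file written under `outputs/default/<timestamp>/summary/` and pass its text to `load_scores`; the exact file layout may differ between OpenCompass versions, so check the header row first.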



