Custom Evaluation Dataset
Introduction
CSGHub provides model evaluation tools and supports custom evaluation datasets: users can upload their own datasets and then use them to evaluate model performance. This document explains in detail how to create custom evaluation datasets.
EvalScope Custom Dataset Usage
Multiple Choice Questions (MCQ)
CSV Format
Directory structure:
mcq/
├── example_dev.csv # (Optional) File name format: `{subset_name}_dev.csv`, used for few-shot evaluation
└── example_val.csv # File name format: `{subset_name}_val.csv`, used for actual evaluation data
CSV files should follow this format:
id,question,A,B,C,D,answer
1,"Generally speaking, animal proteins are composed of ____ kinds of amino acids",4,22,20,19,C
2,"Of the following substances present in blood, the one that is not a metabolic end product is ____.",Urea,Uric acid,Pyruvate,Carbon dioxide,C
JSONL Format
Directory structure:
mcq/
├── example_dev.jsonl # (Optional) File name format: `{subset_name}_dev.jsonl`, used for few-shot evaluation
└── example_val.jsonl # File name format: `{subset_name}_val.jsonl`, used for actual evaluation data
JSONL files should follow this format:
{"id": "1", "question": "Generally speaking, animal proteins are composed of ____ kinds of amino acids", "A": "4", "B": "22", "C": "20", "D": "19", "answer": "C"}
{"id": "2", "question": "Of the following substances present in blood, the one that is not a metabolic end product is ____.", "A": "Urea", "B": "Uric acid", "C": "Pyruvate", "D": "Carbon dioxide", "answer": "C"}
Field Descriptions
- id: sequence number (optional field)
- question: the question
- A, B, C, D, etc.: options, supports up to 10 options
- answer: the correct option
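The MCQ layout described above can be generated with a short Python sketch. The rows, file names, and directory path below are illustrative; only the column schema (id, question, option letters, answer) comes from the format above.

```python
import csv
import json
import os

# Illustrative rows matching the MCQ schema (id, question, options, answer).
rows = [
    {"id": "1", "question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"},
    {"id": "2", "question": "368+959+918+653+978=", "A": "3876", "B": "3878", "C": "3880", "answer": "A"},
]

os.makedirs("mcq", exist_ok=True)

# CSV variant: {subset_name}_val.csv with a header row.
with open("mcq/example_val.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "question", "A", "B", "C", "answer"])
    writer.writeheader()
    writer.writerows(rows)

# JSONL variant: {subset_name}_val.jsonl, one JSON object per line.
with open("mcq/example_val.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Both files describe the same subset (`example`); keep the `_val` suffix so the file is picked up as evaluation data.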
Question-Answer Format (QA)
JSONL Format
Directory structure:
qa/
└── example.jsonl
The JSONL file should follow this format:
{"system": "You are a geographer", "query": "What is the capital of China?", "response": "The capital of China is Beijing"}
{"query": "What is the highest mountain in the world?", "response": "Mount Everest"}
{"query": "Why are there no penguins at the North Pole?", "response": "Because penguins mostly live in Antarctica"}
Field Descriptions
- system: system prompt (optional field)
- query: question (required)
- response: correct answer (required)
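Writing a QA file that follows this schema is a one-liner per record; a minimal sketch (the records and path are made up for illustration):

```python
import json
import os

# Illustrative QA records; "system" is optional and may be omitted per record.
records = [
    {"system": "You are a geographer", "query": "What is the capital of France?", "response": "The capital of France is Paris"},
    {"query": "What is the highest mountain in the world?", "response": "Mount Everest"},
]

os.makedirs("qa", exist_ok=True)
with open("qa/example.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```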
CSGHub will parse the {subset_name} from the dataset file names and perform the model evaluation accordingly.
Reference dataset: https://opencsg.com/datasets/xzgan/evalscope-custom-data
For more details, refer to: EvalScope Custom Dataset
OpenCompass Custom Dataset Usage
Multiple Choice Questions (mcq)
For multiple-choice (mcq) data, the default fields are:
- question: the question stem
- A, B, C, ...: options, represented by single uppercase letters, with no limit on their number; parsing starts from A and treats consecutive letters as options
- answer: the correct answer for the multiple-choice question; its value must be one of the options above, such as A, B, or C
For non-default fields, we will read them in but won't use them by default. If you need to use them, specify them in the .meta.json file.
.jsonl Format Example
{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}
{"question": "368+959+918+653+978=", "A": "3876", "B": "3878", "C": "3880", "answer": "A"}
{"question": "776+208+589+882+571+996+515+726=", "A": "5213", "B": "5263", "C": "5383", "answer": "B"}
{"question": "803+862+815+100+409+758+262+169=", "A": "4098", "B": "4128", "C": "4178", "answer": "C"}
.csv Format Example
question,A,B,C,answer
127+545+588+620+556+199=,2632,2635,2645,B
735+603+102+335+605=,2376,2380,2410,B
506+346+920+451+910+142+659+850=,4766,4774,4784,C
504+811+870+445=,2615,2630,2750,B
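The option-parsing rule described above (consecutive uppercase letters starting from A, with the answer drawn from those letters) can be checked locally before uploading. This validator is a sketch of that rule, not part of OpenCompass itself:

```python
import json
import string

def validate_mcq_line(line: str) -> dict:
    """Validate one mcq-style JSONL record against the rules above and return it."""
    record = json.loads(line)
    assert "question" in record, "missing question field"
    # Collect consecutive single-letter option fields starting from A.
    options = []
    for letter in string.ascii_uppercase:
        if letter in record:
            options.append(letter)
        else:
            break
    assert options, "no options found starting from A"
    assert record.get("answer") in options, f"answer must be one of {options}"
    return record

line = '{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}'
rec = validate_mcq_line(line)
```

Because parsing stops at the first missing letter, an option set like A, B, D would be read as only A and B.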
Reference dataset: https://opencsg.com/datasets/xzgan/opencompass-custom-mcq
Question-Answer (qa)
For question-answer (qa) data, the default fields are:
- question: the question stem
- answer: the correct answer for the question; may be missing, indicating that the dataset has no reference answer
For non-default fields, we will read them in but won't use them by default. If you need to use them, specify them in the .meta.json file.
.jsonl Format Example
{"question": "752+361+181+933+235+986=", "answer": "3448"}
{"question": "712+165+223+711=", "answer": "1811"}
{"question": "921+975+888+539=", "answer": "3323"}
{"question": "752+321+388+643+568+982+468+397=", "answer": "4519"}
.csv Format Example
question,answer
123+147+874+850+915+163+291+604=,3967
149+646+241+898+822+386=,3142
332+424+582+962+735+798+653+214=,4700
649+215+412+495+220+738+989+452=,4170
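Since the CSV and JSONL variants carry the same two fields, converting between them is mechanical. A small sketch of a CSV-to-JSONL converter for this qa layout:

```python
import csv
import io
import json

def qa_csv_to_jsonl(csv_text: str) -> str:
    """Convert qa-style CSV text (with a question,answer header) to JSONL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in reader)

csv_text = "question,answer\n123+147+874+850+915+163+291+604=,3967\n149+646+241+898+822+386=,3142"
print(qa_csv_to_jsonl(csv_text))
```

Questions containing commas would need to be quoted in the CSV; the JSONL form avoids that concern entirely.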
Reference dataset: https://opencsg.com/datasets/xzgan/opencompass-custom-qa
lm-evaluation-harness Custom Dataset Usage
Question-Answer (qa)
Steps to Create Custom Dataset
1. Define the data in the standard HF datasets format. For example, refer to: https://opencsg.com/datasets/AIWizards/gsm8k
2. Define a task YAML file in the dataset. For more details, refer to: lm-evaluation-harness New Task Guide
Important Notes
Note: The repository needs to contain a task YAML file to identify the task name and category.
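As a rough illustration, a minimal task YAML for a generative QA dataset might look like the sketch below. The task name, dataset path, and prompt templates are all hypothetical; consult the New Task Guide linked above for the authoritative field list.

```yaml
# Hypothetical task definition; adjust every field to match your dataset.
task: my_custom_qa                            # task name used on the command line
dataset_path: xzgan/harness-custom-dataset    # illustrative HF-style repo id
output_type: generate_until
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{answer}}"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
```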
Final dataset file repository reference: https://opencsg.com/datasets/xzgan/harness-custom-dataset
Summary
This document covers three main evaluation frameworks supported by CSGHub:
- EvalScope: Supports both MCQ and QA formats in CSV and JSONL
- OpenCompass: Supports MCQ and QA formats with flexible field definitions
- lm-evaluation-harness: Requires HuggingFace dataset format with task YAML configuration
Each framework has its own data format requirements and use cases. Choose the appropriate framework based on your specific evaluation needs and dataset characteristics.