Custom Evaluation Dataset
Introduction
CSGHub provides model evaluation tools and supports custom evaluation datasets: users can upload their own datasets and use them to evaluate model performance. This document explains how to prepare custom evaluation datasets.
EvalScope Custom Dataset Usage
Multiple Choice Questions (MCQ)
CSV Format
Directory structure:
mcq/
├── example_dev.csv # (Optional) File name format: `{subset_name}_dev.csv`, used for few-shot evaluation
└── example_val.csv # File name format: `{subset_name}_val.csv`, used for actual evaluation data
CSV files should follow this format:
id,question,A,B,C,D,answer
1,"通常来说,组成动物蛋白质的氨基酸有____",4种,22种,20种,19种,C
2,"血液内存在的下列物质中,不属于代谢终产物的是____。",尿素,尿酸,丙酮酸,二氧化碳,C
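A question may itself contain commas, so it is safer to generate the file with a CSV library than by joining strings by hand. The following is a minimal sketch using Python's standard `csv` module; the helper name `rows_to_csv` and the sample row are hypothetical and only illustrate the column layout shown above.

```python
import csv
import io

# Column order taken from the header line of the format above.
FIELDS = ["id", "question", "A", "B", "C", "D", "answer"]


def rows_to_csv(rows):
    """Serialize MCQ rows to CSV text, quoting fields only where needed."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative row (not from the reference dataset); the question contains a comma.
rows = [
    {
        "id": 1,
        "question": "In general, how many legs does a spider have?",
        "A": "4",
        "B": "6",
        "C": "8",
        "D": "10",
        "answer": "C",
    },
]

text = rows_to_csv(rows)

# Reading the text back shows that the quoted question survives as one field.
parsed = list(csv.DictReader(io.StringIO(text)))
```

To produce an actual file, the same writer can be pointed at `open("mcq/example_val.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer.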
JSONL Format
Directory structure:
mcq/
├── example_dev.jsonl # (Optional) File name format: `{subset_name}_dev.jsonl`, used for few-shot evaluation
└── example_val.jsonl # File name format: `{subset_name}_val.jsonl`, used for actual evaluation data
JSONL files should follow this format:
{"id": "1", "question": "通常来说,组成动物蛋白质的氨基酸有____", "A": "4种", "B": "22种", "C": "20种", "D": "19种", "answer": "C"}
{"id": "2", "question": "血液内存在的下列物质中,不属于代谢终产物的是____。", "A": "尿素", "B": "尿酸", "C": "丙酮酸", "D": "二氧化碳", "answer": "C"}
Field Descriptions
id: sequence number (optional field)
question: the question
A, B, C, D, etc.: options, supports up to 10 options
answer: correct option
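The field rules above can be checked before uploading a dataset. The sketch below is one possible validator, not part of CSGHub or EvalScope; the function name `validate_mcq` is hypothetical, and it assumes the ten supported options are labelled `A` through `J`.

```python
import string

MAX_OPTIONS = 10  # the format supports up to 10 options
# Assumption: the ten option columns are labelled A through J.
OPTION_LETTERS = string.ascii_uppercase[:MAX_OPTIONS]


def validate_mcq(record):
    """Return a list of problems found in one MCQ record (empty if none)."""
    problems = []
    if not record.get("question"):
        problems.append("missing question")
    # Collect the option columns that are actually present in the record.
    options = [letter for letter in OPTION_LETTERS if letter in record]
    if not options:
        problems.append("no options")
    if record.get("answer") not in options:
        problems.append("answer is not one of the options")
    return problems


good = {"id": "1", "question": "1+1=", "A": "1", "B": "2", "answer": "B"}
bad = {"question": "1+1=", "A": "1", "B": "2", "answer": "C"}
```

Here `validate_mcq(good)` returns an empty list, while `validate_mcq(bad)` reports that the answer does not match any option. The `id` field is not checked because it is optional.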
Question-Answer Format (QA)
JSONL Format
Directory structure:
qa/
└── example.jsonl
The JSONL file should follow this format:
{"system": "你是一位地理学家", "query": "中国的首都是哪里?", "response": "中国的首都是北京"}
{"query": "世界上最高的山是哪座山?", "response": "是珠穆朗玛峰"}
{"query": "为什么北极见不到企鹅?", "response": "因为企鹅大多生活在南极"}
Field Descriptions
system: system prompt (optional field)
query: question (required)
response: correct answer (required)
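A QA file can be checked against these field rules line by line. The following is a minimal sketch under the assumption that each non-empty line must be a JSON object with non-empty `query` and `response` values; the helper name `parse_qa_lines` is hypothetical and not an EvalScope API.

```python
import json

REQUIRED_FIELDS = ("query", "response")  # "system" is optional


def parse_qa_lines(lines):
    """Parse QA JSONL lines and return (records, errors).

    errors is a list of (line_number, message) tuples, 1-indexed.
    """
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(rec, dict):
            errors.append((lineno, "line is not a JSON object"))
            continue
        missing = [name for name in REQUIRED_FIELDS if not rec.get(name)]
        if missing:
            errors.append((lineno, "missing " + ", ".join(missing)))
            continue
        records.append(rec)
    return records, errors


sample = [
    '{"system": "You are a geographer", "query": "q1", "response": "r1"}',
    '{"query": "q2"}',
    "not json",
]
records, errors = parse_qa_lines(sample)
```

For a real file, pass an open file handle, for example `parse_qa_lines(open("qa/example.jsonl", encoding="utf-8"))`.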
CSGHub parses the {subset_name} from the dataset name and then performs the model evaluation.
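How CSGHub extracts {subset_name} internally is not described here. As an illustration of the MCQ file naming convention (`{subset_name}_dev.csv`, `{subset_name}_val.jsonl`, and so on), the sketch below splits a file name into its subset name and split; the pattern and the function name `subset_and_split` are assumptions made for this example only.

```python
import re

# Assumed pattern: "<subset_name>_<dev|val>.<csv|jsonl>", as in the MCQ layouts above.
FILENAME_PATTERN = re.compile(r"^(?P<subset>.+)_(?P<split>dev|val)\.(?:csv|jsonl)$")


def subset_and_split(filename):
    """Return (subset_name, split) for an MCQ file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("subset"), match.group("split")
```

With this pattern, `example_val.csv` yields the subset name `example` and the split `val`, while a QA file such as `example.jsonl` does not match because it carries no `_dev` or `_val` suffix.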
Reference dataset: https://opencsg.com/datasets/xzgan/evalscope-custom-data
For more details, refer to the EvalScope Custom Dataset documentation.
OpenCompass Custom Dataset Usage
Multiple Choice Questions (mcq)
For multiple choice (mcq) type data, the default fields are:
question: the question stem
A, B, C, ...: single uppercase letters that represent the options, with no limit on their number. By default, parsing starts from A and treats consecutive letters as options.
answer: the correct answer to the multiple choice question. Its value must be one of the options above, such as A, B, C, etc.
Non-default fields are read in but are not used by default. To use them, specify them in the .meta.json file.
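The default option rule (start from A and take consecutive letters) can be sketched as follows. This is an illustration of the rule as described above, not OpenCompass's own code; the function name `detect_options` is hypothetical.

```python
import string


def detect_options(record):
    """Return the option letters found by scanning A, B, C, ... until one is missing."""
    options = []
    for letter in string.ascii_uppercase:
        if letter not in record:
            break  # stop at the first gap; later letters are not treated as options
        options.append(letter)
    return options


row = {"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}
options = detect_options(row)

# The answer must be one of the detected options.
answer_is_valid = row["answer"] in options
```

Under this reading, a record with fields A, B, and D would yield only A and B as options, since D does not follow consecutively.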
.jsonl Format Example
{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}
{"question": "368+959+918+653+978=", "A": "3876", "B": "3878", "C": "3880", "answer": "A"}
{"question": "776+208+589+882+571+996+515+726=", "A": "5213", "B": "5263", "C": "5383", "answer": "B"}
{"question": "803+862+815+100+409+758+262+169=", "A": "4098", "B": "4128", "C": "4178", "answer": "C"}
.csv Format Example
question,A,B,C,answer
127+545+588+620+556+199=,2632,2635,2645,B
735+603+102+335+605=,2376,2380,2410,B
506+346+920+451+910+142+659+850=,4766,4774,4784,C
504+811+870+445=,2615,2630,2750,B
Reference dataset: https://opencsg.com/datasets/xzgan/opencompass-custom-mcq
Question-Answer (qa)
For question-answer (qa) type data, the default fields are:
question: the question stem
answer: the correct answer to the question. It can be omitted, which indicates that the dataset has no correct answer.
Non-default fields are read in but are not used by default. To use them, specify them in the .meta.json file.
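Because answer may be missing in qa data, code that reads such a file should not assume the key exists. The following is a minimal sketch of a loader that keeps only the two default fields and represents a missing answer as `None`; the function name `load_qa` and the sample lines are hypothetical.

```python
import json


def load_qa(lines):
    """Read qa JSONL lines, keeping the default fields; answer is None when absent."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        records.append({
            "question": rec["question"],   # required
            "answer": rec.get("answer"),   # optional
        })
    return records


# Illustrative lines; "source" stands in for a non-default field and is ignored here.
sample = [
    '{"question": "752+361+181+933+235+986=", "answer": "3448"}',
    '{"question": "Describe your favorite algorithm.", "source": "demo"}',
]
qa_records = load_qa(sample)
has_answer = [r["answer"] is not None for r in qa_records]
```

Records without an answer can then be routed to generation-only runs, while the others can be scored against their reference answers.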