Custom Evaluation Dataset

Introduction

CSGHub provides model evaluation tools and supports custom evaluation datasets: users can upload their own datasets and use them to evaluate model performance. This document explains in detail how to prepare such datasets.

EvalScope Custom Dataset Usage

Multiple Choice Questions (MCQ)

CSV Format

Directory structure:

mcq/
├── example_dev.csv # (Optional) File name format: `{subset_name}_dev.csv`, used for few-shot evaluation
└── example_val.csv # File name format: `{subset_name}_val.csv`, used for actual evaluation data

CSV files should follow this format:

id,question,A,B,C,D,answer
1,通常来说,组成动物蛋白质的氨基酸有____,4种,22种,20种,19种,C
2,血液内存在的下列物质中,不属于代谢终产物的是____。,尿素,尿酸,丙酮酸,二氧化碳,C
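The CSV layout above can be loaded with Python's standard `csv` module. The sketch below is illustrative only (the helper `load_mcq_csv` and the sample row are not part of CSGHub or EvalScope); it also checks that each `answer` letter actually matches one of the option columns.

```python
import csv
import io

# A sample row in the {subset_name}_val.csv layout described above.
SAMPLE_CSV = """id,question,A,B,C,D,answer
1,"What is 2+2?",3,4,5,6,B
"""

def load_mcq_csv(text):
    """Parse MCQ rows and verify each answer refers to an existing option column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        assert row["answer"] in row, f"answer {row['answer']!r} has no matching option column"
    return rows

rows = load_mcq_csv(SAMPLE_CSV)
```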

JSONL Format

Directory structure:

mcq/
├── example_dev.jsonl # (Optional) File name format: `{subset_name}_dev.jsonl`, used for few-shot evaluation
└── example_val.jsonl # File name format: `{subset_name}_val.jsonl`, used for actual evaluation data

JSONL files should follow this format:

{"id": "1", "question": "通常来说,组成动物蛋白质的氨基酸有____", "A": "4种", "B": "22种", "C": "20种", "D": "19种", "answer": "C"}
{"id": "2", "question": "血液内存在的下列物质中,不属于代谢终产物的是____。", "A": "尿素", "B": "尿酸", "C": "丙酮酸", "D": "二氧化碳", "answer": "C"}

Field Descriptions

  • id: sequence number (optional field)
  • question: the question
  • A, B, C, D, etc.: the options; up to 10 options are supported
  • answer: correct option
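The field rules above (required `question`, option letters A–J, `answer` naming one of the options) can be checked per JSONL line. This is a minimal sketch, not CSGHub's actual validator; `validate_mcq_line` is a hypothetical helper name.

```python
import json
import string

# Letters A-J: the format supports up to 10 option fields.
OPTION_LETTERS = set(string.ascii_uppercase[:10])

def validate_mcq_line(line):
    """Check one JSONL record against the field rules described above."""
    rec = json.loads(line)
    options = [k for k in rec if k in OPTION_LETTERS]
    assert "question" in rec, "question is required"
    assert 2 <= len(options) <= 10, "expected between 2 and 10 option fields"
    assert rec["answer"] in options, "answer must name one of the option fields"
    return rec

rec = validate_mcq_line('{"id": "1", "question": "1+1=", "A": "2", "B": "3", "answer": "A"}')
```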

Question-Answer Format (QA)

JSONL Format

Directory structure:

qa/
└── example.jsonl

The JSONL file should follow this format:

{"system": "你是一位地理学家", "query": "中国的首都是哪里?", "response": "中国的首都是北京"}
{"query": "世界上最高的山是哪座山?", "response": "是珠穆朗玛峰"}
{"query": "为什么北极见不到企鹅?", "response": "因为企鹅大多生活在南极"}

Field Descriptions

  • system: system prompt (optional field)
  • query: question (required)
  • response: correct answer (required)
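Since `system` is optional while `query` and `response` are required, generating this JSONL from your own data is straightforward. A minimal sketch (the `qa_record` helper is illustrative, not part of any framework API):

```python
import json

def qa_record(query, response, system=None):
    """Build one QA JSONL line; `system` is optional per the field table above."""
    rec = {"query": query, "response": response}
    if system is not None:
        rec["system"] = system  # optional system prompt
    return json.dumps(rec, ensure_ascii=False)

lines = [
    qa_record("What is the capital of France?", "Paris", system="You are a geographer"),
    qa_record("What is the highest mountain in the world?", "Mount Everest"),
]
jsonl_text = "\n".join(lines)
```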

CSGHub parses {subset_name} from the dataset file names and then performs the model evaluation.

Reference dataset: https://opencsg.com/datasets/xzgan/evalscope-custom-data

For more details, refer to: EvalScope Custom Dataset

OpenCompass Custom Dataset Usage

Multiple Choice Questions (mcq)

For multiple choice (mcq) type data, the default fields are:

  • question: represents the question stem
  • A, B, C, ...: single uppercase letters represent the options, with no limit on their number. By default, parsing starts from A and treats consecutive letters as options.
  • answer: the correct answer to the multiple choice question. Its value must be one of the options above, such as A, B, or C.

Non-default fields are read in but ignored by default; to use them, declare them in the `.meta.json` file.

.jsonl Format Example

{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}
{"question": "368+959+918+653+978=", "A": "3876", "B": "3878", "C": "3880", "answer": "A"}
{"question": "776+208+589+882+571+996+515+726=", "A": "5213", "B": "5263", "C": "5383", "answer": "B"}
{"question": "803+862+815+100+409+758+262+169=", "A": "4098", "B": "4128", "C": "4178", "answer": "C"}
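The "start from A and parse consecutive letters" rule can be sketched as follows. This is an illustration of the described behavior, not OpenCompass's actual implementation; `parse_options` is a hypothetical helper.

```python
import json
import string

def parse_options(record):
    """Collect options starting from A and stopping at the first missing
    consecutive letter, mirroring the default parsing rule described above."""
    options = {}
    for letter in string.ascii_uppercase:
        if letter not in record:
            break  # a gap ends the option sequence
        options[letter] = record[letter]
    return options

rec = json.loads('{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}')
opts = parse_options(rec)
```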

.csv Format Example

question,A,B,C,answer
127+545+588+620+556+199=,2632,2635,2645,B
735+603+102+335+605=,2376,2380,2410,B
506+346+920+451+910+142+659+850=,4766,4774,4784,C
504+811+870+445=,2615,2630,2750,B

Reference dataset: https://opencsg.com/datasets/xzgan/opencompass-custom-mcq

Question-Answer (qa)

For question-answer (qa) type data, the default fields are:

  • question: represents the question stem
  • answer: represents the correct answer to the question. This field may be omitted, indicating that the dataset has no reference answers.

Non-default fields are read in but ignored by default; to use them, declare them in the `.meta.json` file.

.jsonl Format Example

{"question": "752+361+181+933+235+986=", "answer": "3448"}
{"question": "712+165+223+711=", "answer": "1811"}
{"question": "921+975+888+539=", "answer": "3323"}
{"question": "752+321+388+643+568+982+468+397=", "answer": "4519"}
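Because `answer` may be absent, a loader should treat it as optional. A minimal sketch (the `load_qa_lines` helper is illustrative):

```python
import json

def load_qa_lines(lines):
    """Parse QA records; `answer` is optional per the field rules above."""
    records = []
    for line in lines:
        rec = json.loads(line)
        assert "question" in rec, "question is required"
        records.append((rec["question"], rec.get("answer")))  # None if no answer
    return records

pairs = load_qa_lines([
    '{"question": "712+165+223+711=", "answer": "1811"}',
    '{"question": "1+1="}',
])
```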

.csv Format Example

question,answer
123+147+874+850+915+163+291+604=,3967
149+646+241+898+822+386=,3142
332+424+582+962+735+798+653+214=,4700
649+215+412+495+220+738+989+452=,4170

Reference dataset: https://opencsg.com/datasets/xzgan/opencompass-custom-qa

lm-evaluation-harness Custom Dataset Usage

Question-Answer (qa)

Steps to Create Custom Dataset

  1. Define a standard HF datasets data format

    For example, refer to: https://opencsg.com/datasets/AIWizards/gsm8k

  2. Define a task YAML file in the dataset

    For more details, refer to: lm-evaluation-harness New Task Guide
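The steps above could produce a task file roughly like the sketch below. The key names follow the lm-evaluation-harness new task guide; the task name, dataset path, splits, and prompt template are all placeholders you would replace with your own.

```yaml
# Minimal task definition sketch (assumed layout; verify against the
# lm-evaluation-harness new task guide for your harness version).
task: my_custom_qa            # placeholder task name
dataset_path: xzgan/harness-custom-dataset   # HF-style dataset repo id
output_type: generate_until
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{answer}}"
metric_list:
  - metric: exact_match
```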

Important Notes

Note: The repository needs to contain a task YAML file to identify the task name and category.

Final dataset file repository reference: https://opencsg.com/datasets/xzgan/harness-custom-dataset

Summary

This document covers three main evaluation frameworks supported by CSGHub:

  • EvalScope: Supports MCQ in CSV or JSONL, and QA in JSONL
  • OpenCompass: Supports MCQ and QA formats with flexible field definitions
  • lm-evaluation-harness: Requires HuggingFace dataset format with task YAML configuration

Each framework has its own data format requirements and use cases. Choose the appropriate framework based on your specific evaluation needs and dataset characteristics.