Dataset Cards

What Is a Dataset Card?

A dataset card is a Markdown file that accompanies a dataset, with a YAML section at the top that contains metadata about the dataset. A dataset repository renders its README.md as the dataset card. The card presents key information about the dataset and helps users understand the dataset and use it correctly. We recommend that you create your dataset card according to the dataset card specification.

What Information Should Be Included in the Dataset Card?

The dataset card should describe:

  • Dataset Name
  • Dataset Overview: Include the dataset format and structure, the data source, and the method used to label the data.
  • Usage: Provide detailed examples and code that illustrate how to use the dataset wherever possible. Describe the environment and framework required to work with the dataset.
  • Scenarios: Describe the target scenarios, intended uses, and potential limitations of the dataset.
  • Supported models: Describe the models that the dataset can be used with.

Dataset Card Metadata

A dataset card is composed of YAML metadata and Markdown text content. You can add metadata by editing the YAML section at the top of the README.md file, which is enclosed between two lines of three hyphens ("---"). The Markdown text presents the dataset information and related descriptions.
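
To make the layout concrete, here is a minimal sketch of how a README.md could be split into its YAML section and its Markdown body. The function name `split_dataset_card` is hypothetical and is not part of any platform tooling. The sketch assumes the YAML section starts on the very first line of the file.

```python
def split_dataset_card(text):
    """Split README.md text into (metadata_lines, markdown_body).

    Assumption: the YAML section, if present, begins on the first line
    with '---' and ends at the next line that contains only '---'.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No YAML section: the whole file is Markdown content.
        return [], text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Lines between the two delimiters are the raw YAML metadata.
            return lines[1:i], "\n".join(lines[i + 1:])
    # Opening delimiter without a closing one: treat everything as Markdown.
    return [], text


card = """---
license: apache-2.0
tags:
- image-classification
---
# Dataset name
"""

metadata_lines, body = split_dataset_card(card)
print(metadata_lines)
print(body)
```

The metadata is returned as raw lines. If PyYAML is installed, joining them with newlines and passing the result to `yaml.safe_load` is one way to turn them into a dictionary.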

You can refer to the following template to create your dataset card.

---
# License
license: apache-2.0

# User-defined tags
tags:
- image-classification
- customize tags
---

<!--- The above is in YAML format, providing license and task descriptions--->

<!--- The following is the dataset description in markdown format--->

# Dataset name

Introduce general information about the dataset.

## Dataset details

### Dataset description

Describe the dataset, including the developer, the language of the dataset and the license.

### Usage

### How to use

Describe how to use the dataset.

### Dataset structure

Describe the structure of the dataset.

## Dataset creation

### Source data

#### Data Collection and Processing

Describe how the data was collected and processed.

#### Source data creator

Provide information about the creator of the source data.

### Risks and limitations

Describe the risks and limitations of the dataset.

### Recommendations

Provide recommendations for users.
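
As a rough illustration of how the template above could be used as a checklist, the following sketch reports which of the template's section headings are absent from a draft card. The helper `missing_headings` and the choice of headings to check are assumptions made for this example. The template is a recommendation, so the sketch does not represent a validation rule enforced by the platform. Pass only the Markdown body, because comment lines in the YAML section also begin with "#".

```python
# Section headings taken from the template; treating them as a checklist
# is an assumption of this sketch, not a documented requirement.
RECOMMENDED_HEADINGS = [
    "## Dataset details",
    "### Dataset description",
    "### Usage",
    "### Dataset structure",
    "## Dataset creation",
    "### Source data",
    "### Risks and limitations",
    "### Recommendations",
]


def missing_headings(markdown_body, required=RECOMMENDED_HEADINGS):
    """Return the required headings that do not appear in the Markdown body."""
    present = {
        line.strip().lower()
        for line in markdown_body.splitlines()
        if line.lstrip().startswith("#")
    }
    # Comparison is case-insensitive and ignores surrounding whitespace.
    return [h for h in required if h.lower() not in present]


draft = """# My dataset

## Dataset details

### Dataset description

### Usage
"""

missing = missing_headings(draft)
print(missing)
```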

Supported Dataset Tags

Task
text-classification
relation-extraction
zero-shot
translation
token-classification
conversational
text-generation
table-question-answering
sentence-similarity
fill-mask
summarization
question-answering
image-to-text
image-classification
object-detection
image-segmentation
image-editing
image-generation
auto-speech-recognition
text-to-speech
speech-signal-process
keyword-spotting
audio-classification
voice-activity-detection
object-tracking
autonomous-driving
video-generation
video-super-resolution
video-segmentation
image-captioning
visual-grounding
text-to-image
feature-extraction
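
Because the `tags` field accepts both the supported task tags listed above and user-defined tags, it can be useful to see which entries match the list exactly. The sketch below is one possible check. The function `split_tags` is hypothetical, and the sketch assumes tags are compared as exact, case-sensitive strings.

```python
# Supported task tags, copied from the list above.
SUPPORTED_TASK_TAGS = {
    "text-classification", "relation-extraction", "zero-shot", "translation",
    "token-classification", "conversational", "text-generation",
    "table-question-answering", "sentence-similarity", "fill-mask",
    "summarization", "question-answering", "image-to-text",
    "image-classification", "object-detection", "image-segmentation",
    "image-editing", "image-generation", "auto-speech-recognition",
    "text-to-speech", "speech-signal-process", "keyword-spotting",
    "audio-classification", "voice-activity-detection", "object-tracking",
    "autonomous-driving", "video-generation", "video-super-resolution",
    "video-segmentation", "image-captioning", "visual-grounding",
    "text-to-image", "feature-extraction",
}


def split_tags(tags):
    """Separate tags into (supported task tags, user-defined tags)."""
    supported = [t for t in tags if t in SUPPORTED_TASK_TAGS]
    custom = [t for t in tags if t not in SUPPORTED_TASK_TAGS]
    return supported, custom


supported, custom = split_tags(["image-classification", "my-custom-tag"])
print("supported:", supported)
print("user-defined:", custom)
```

A check like this can catch a misspelled task tag, such as `image-classfication`, which would otherwise be treated as a user-defined tag.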

Supported Industry Tags

Industry
Automotive
Manufacturing
Energy
Telecommunications and Electronic Information
Transportation and Logistics
Construction and Real Estate
Financial Services
Agriculture
Chemical Industry
Environmental Protection
Healthcare and Medical Services
Education and Training
Food and Beverage
Retail and Consumer Goods
Tourism and Hospitality
Information Technology (IT)
Culture and Entertainment