Dataset Cards
What Is a Dataset Card?
A dataset card is a Markdown file that accompanies a dataset. It begins with a YAML section that contains metadata about the dataset. A dataset repository renders its README.md as the dataset card. The card presents key information about the dataset and helps users understand and use the dataset correctly. We recommend that you create your dataset card according to the dataset card specification.
What Information Should Be Included in a Dataset Card?
The dataset card should describe:
- Dataset Name
- Dataset Overview: Include the dataset format and structure, the data source, and the method used to label the data.
- Usage: Provide examples and code that illustrate how to use the dataset, in as much detail as possible. Describe the environment and framework required to work with the dataset.
- Scenarios: Describe the target scenarios, intended uses, and potential limitations of the dataset.
- Supported Models: Describe the models that the dataset can be used with.
Dataset Card Metadata
A dataset card is composed of YAML metadata and Markdown text content. You can add metadata by editing the YAML section at the top of the README.md file, which is enclosed between two lines of three hyphens ("---"). The Markdown text that follows presents the dataset information and related descriptions.
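The split between the YAML section and the Markdown text can be sketched in Python as follows. This is a minimal illustration that uses only the standard library, not the platform's own parser: the helper names `split_card` and `parse_simple_metadata` are hypothetical, the sample card is shortened, and the metadata parser handles only plain `key: value` pairs and `- item` lists. For real cards, a full YAML library is the safer choice.

```python
from typing import Dict, List, Tuple, Union

MetadataValue = Union[str, List[str]]


def split_card(text: str) -> Tuple[str, str]:
    """Split README.md text into (yaml_text, markdown_body).

    Returns an empty yaml_text when the text does not start with a '---' line
    or when the closing '---' line is missing.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text


def parse_simple_metadata(yaml_text: str) -> Dict[str, MetadataValue]:
    """Parse 'key: value' pairs and '- item' lists only (not full YAML)."""
    metadata: Dict[str, MetadataValue] = {}
    current_key = None
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and YAML comments
        if line.startswith("- "):
            # A list item belongs to the most recent key that had no value.
            if current_key is not None and isinstance(metadata[current_key], list):
                metadata[current_key].append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line; ignored in this sketch
        current_key = key.strip()
        value = value.strip()
        metadata[current_key] = value if value else []
    return metadata


# Shortened, illustrative card text.
EXAMPLE_CARD = """---
# License
license: apache-2.0
# User-defined tags
tags:
- image-classification
- my-custom-tag
---
# Dataset name
Introduce general information about the dataset.
"""

yaml_text, body = split_card(EXAMPLE_CARD)
metadata = parse_simple_metadata(yaml_text)
print(metadata)  # {'license': 'apache-2.0', 'tags': ['image-classification', 'my-custom-tag']}
print(body.splitlines()[0])  # "# Dataset name"
```

Keeping the split separate from the parsing means the Markdown body can be processed on its own, even when the metadata needs a more capable YAML parser.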
You can refer to the following template to create your dataset card.
---
# License
license: apache-2.0
# User-defined tags
tags:
- image-classification
- customize tags
---
<!--- The above is in YAML format, providing license and task descriptions--->
<!--- The following is the dataset description in markdown format--->
# Dataset name
Introduce general information about the dataset.
## Dataset details
### Dataset description
Describe the dataset, including the developer, the language of the dataset and the license.
## Usage
### How to use
Describe how to use the dataset.
### Dataset structure
Describe the structure of the dataset.
## Dataset creation
### Source data
#### Data collection and processing
Introduce the process of data collection and processing.
#### Source data creator
Introduce information about the creator of the source data.
### Risks and limitations
Introduce the risks or limitations of the dataset.
### Recommendations
Provide recommendations for users.
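The section headings in the template above can double as a checklist for a draft card. The sketch below reports which of them are missing from the Markdown part of a card. It rests on assumptions: the function names and the choice of headings in `TEMPLATE_SECTIONS` are illustrative, and nothing here implies that the platform requires these headings. Pass only the Markdown text, because comment lines in the YAML section also start with `#`.

```python
import re
from typing import List, Sequence

# Section titles taken from the template; treating them as a checklist is an
# assumption of this sketch, not a documented requirement.
TEMPLATE_SECTIONS = (
    "Dataset description",
    "How to use",
    "Dataset structure",
    "Data collection and processing",
    "Source data creator",
    "Risks and limitations",
    "Recommendations",
)

# Matches Markdown headings such as "## Title" and captures the title.
HEADING_RE = re.compile(r"^#{1,6}\s+(.*?)\s*$")


def list_headings(markdown_body: str) -> List[str]:
    """Return the heading titles found in the Markdown part of a card."""
    titles = []
    for line in markdown_body.splitlines():
        match = HEADING_RE.match(line)
        if match:
            titles.append(match.group(1))
    return titles


def missing_sections(
    markdown_body: str, expected: Sequence[str] = TEMPLATE_SECTIONS
) -> List[str]:
    """Return the expected titles that do not appear (case-insensitive)."""
    present = {title.lower() for title in list_headings(markdown_body)}
    return [title for title in expected if title.lower() not in present]


draft = """# My dataset
## Dataset details
### Dataset description
A small demo dataset.
### How to use
Load the files directly.
"""

print(missing_sections(draft))
```

For this draft, the function returns the five template sections that come after "How to use".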
Supported Dataset Tags
Task |
---|
text-classification |
relation-extraction |
zero-shot |
translation |
token-classification |
conversational |
text-generation |
table-question-answering |
sentence-similarity |
fill-mask |
summarization |
question-answering |
image-to-text |
image-classification |
object-detection |
image-segmentation |
image-editing |
image-generation |
auto-speech-recognition |
text-to-speech |
speech-signal-process |
keyword-spotting |
audio-classification |
voice-activity-detection |
object-tracking |
autonomous-driving |
video-generation |
video-super-resolution |
video-segmentation |
image-captioning |
visual-grounding |
text-to-image |
feature-extraction |
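Because the `tags` field accepts both task tags from the table above and user-defined tags, it can help to see which entries match a supported task tag. The following is a minimal sketch: the task tags are copied from the table, while `classify_tags` and the "task"/"custom" grouping are illustrative names. How the platform itself treats unlisted tags is not specified here.

```python
from typing import Dict, Iterable, List

# Task tags copied from the table above.
SUPPORTED_TASK_TAGS = frozenset({
    "text-classification", "relation-extraction", "zero-shot", "translation",
    "token-classification", "conversational", "text-generation",
    "table-question-answering", "sentence-similarity", "fill-mask",
    "summarization", "question-answering", "image-to-text",
    "image-classification", "object-detection", "image-segmentation",
    "image-editing", "image-generation", "auto-speech-recognition",
    "text-to-speech", "speech-signal-process", "keyword-spotting",
    "audio-classification", "voice-activity-detection", "object-tracking",
    "autonomous-driving", "video-generation", "video-super-resolution",
    "video-segmentation", "image-captioning", "visual-grounding",
    "text-to-image", "feature-extraction",
})


def classify_tags(tags: Iterable[str]) -> Dict[str, List[str]]:
    """Group tags into supported task tags and other (user-defined) tags."""
    groups: Dict[str, List[str]] = {"task": [], "custom": []}
    for tag in tags:
        cleaned = tag.strip()
        key = "task" if cleaned in SUPPORTED_TASK_TAGS else "custom"
        groups[key].append(cleaned)
    return groups


print(classify_tags(["image-classification", "my-custom-tag"]))
# {'task': ['image-classification'], 'custom': ['my-custom-tag']}
```

The comparison is case-sensitive on purpose, since every tag in the table is lowercase. A tag such as `Image-Classification` would be reported as custom, which makes near-miss spellings easy to spot.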
Supported Industry Tags
Industry |
---|
Automotive |
Manufacturing |
Energy |
Telecommunications and Electronic Information |
Transportation and Logistics |
Construction and Real Estate |
Financial Services |
Agriculture |
Chemical Industry |
Environmental Protection |
Healthcare and Medical Services |
Education and Training |
Food and Beverage |
Retail and Consumer Goods |
Tourism and Hospitality |
Information Technology (IT) |
Culture and Entertainment |