Dataset Preview
Overview of Dataset Preview Feature
CSGHub offers a dataset preview feature that lets users view dataset content online without downloading it. The preview page displays the dataset contents in a paginated data table. Users can browse the data with the pagination buttons at the bottom, or locate specific data with the filter and search functions.
Supported Data Formats
CSGHub’s dataset preview feature supports multiple formats, including:
- Parquet: An efficient columnar storage format suitable for large-scale data analysis.
- CSV: The first row is a header that defines the field names. Each subsequent row is one record, listing its values in the same order as the header fields.
- Example:
key1,key2
data1,data2
data3,data4
- JSON: The data uses an array as the top-level structure, with each object in the array representing a data record.
- Example:
[
{
"key1": "data1",
"key2": "data2"
},
{
"key1": "data3",
"key2": "data4"
}
]
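To make the CSV and JSON layouts above concrete, here is a minimal Python sketch that reads both examples into the same list of records. It uses only the standard library and is not CSGHub's own parser. The function names `records_from_csv` and `records_from_json` are hypothetical and chosen for illustration.

```python
import csv
import io
import json

# Sample data copied from the CSV and JSON examples above.
CSV_TEXT = "key1,key2\ndata1,data2\ndata3,data4\n"
JSON_TEXT = '[{"key1": "data1", "key2": "data2"}, {"key1": "data3", "key2": "data4"}]'


def records_from_csv(text: str) -> list[dict[str, str]]:
    """Treat the first row as the header and every later row as one record."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_json(text: str) -> list[dict]:
    """Require a top-level array whose elements are all objects."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError("expected a top-level array of objects")
    return data


if __name__ == "__main__":
    print(records_from_csv(CSV_TEXT))
    print(records_from_json(JSON_TEXT))
```

Both calls yield the same two records, which is why either layout can be shown in the same preview table. Note that CSV values are always read as strings, whereas JSON values keep their own types.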
Core Features of Dataset Preview
The dataset preview feature in CSGHub is supported by backend APIs and optimized for different data formats. Its main functionalities include:
- Displaying Dataset Content: View the dataset's table data online, with paginated browsing.
- Column Information and Data Types: Column names and data types are parsed automatically to help users understand the data structure.
- Format Conversion: Supports converting datasets to Parquet format for more efficient use in data analytics or machine learning tasks.
- Search and Filter: Users can quickly find content within the dataset by searching for keywords.
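The pagination, column-type, and keyword-search behaviors listed above can be illustrated with a small in-memory Python sketch. This is an assumption-laden illustration, not CSGHub's backend implementation: the helper names `paginate`, `search`, and `column_types` are hypothetical, pages are assumed to be 1-based, and search is assumed to be a case-insensitive substring match.

```python
from typing import Any

Record = dict[str, Any]


def paginate(records: list[Record], page: int, page_size: int) -> list[Record]:
    """Return the records on a 1-based page (assumed convention)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return records[start:start + page_size]


def search(records: list[Record], keyword: str) -> list[Record]:
    """Keep records in which any value contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [r for r in records if any(needle in str(v).lower() for v in r.values())]


def column_types(records: list[Record]) -> dict[str, str]:
    """Map each column name to the Python type name of its first non-null value."""
    types: dict[str, str] = {}
    for record in records:
        for name, value in record.items():
            if name not in types and value is not None:
                types[name] = type(value).__name__
    return types
```

For example, with the two sample records from the format examples, `paginate(rows, page=2, page_size=1)` returns only the second record, and `search(rows, "DATA3")` matches that same record regardless of case.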