
Create a Personal Dataset

Before you process data with DataFlow, create a personal dataset on CSGHub. Create the dataset through the backend API so that your data is managed independently and later operations run smoothly.

Register and Log In

Visit CSGHub and click the Login/Register button in the top-right corner to log in or create an account.

Obtain Access Token

Click your avatar, go to Settings, and generate an Access Token for API usage.

Create a Personal Dataset via API

Use Postman or the command line to create a dataset through the API. The example below uses curl. Replace <Your-Access-Token> and <Your Account Name> in the command with your own values:

curl --location 'https://hub.opencsg.com/api/v1/datasets' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <Your-Access-Token>' \
--data '{
"default_branch": "main",
"description": "dataset examples",
"labels": "a",
"license": "MIT",
"name": "dataflow-dataset",
"namespace": "<Your Account Name>",
"nickname": "dataflow-dataset",
"private": false,
"readme": "dataflow datasets need to be refined"
}'
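
To avoid editing the JSON body by hand for each dataset, the request above can be wrapped in a small shell script. The following is a minimal sketch, not an official CSGHub client: the function names build_payload and create_dataset and the environment variable CSGHUB_TOKEN are hypothetical, while the endpoint and the JSON fields are taken from the curl command above. It assumes the namespace and dataset name contain no double quotes or backslashes, since no JSON escaping is performed.

```shell
#!/bin/sh
# Hypothetical helper names; the endpoint and JSON fields come from the
# curl example above.

# Print the JSON request body.
#   $1 = namespace (your account name), $2 = dataset name
# Assumption: neither value contains characters that need JSON escaping.
build_payload() {
    cat <<EOF
{
  "default_branch": "main",
  "description": "dataset examples",
  "labels": "a",
  "license": "MIT",
  "name": "$2",
  "namespace": "$1",
  "nickname": "$2",
  "private": false,
  "readme": "dataflow datasets need to be refined"
}
EOF
}

# Send the creation request.
# Reads the access token from CSGHUB_TOKEN (an assumed variable name) so
# the token does not have to be pasted into the command itself.
create_dataset() {
    : "${CSGHUB_TOKEN:?set CSGHUB_TOKEN to your access token}"
    build_payload "$1" "$2" | curl --fail --silent --show-error --location \
        'https://hub.opencsg.com/api/v1/datasets' \
        --header 'Content-Type: application/json' \
        --header "Authorization: Bearer $CSGHUB_TOKEN" \
        --data @-
}
```

After exporting CSGHUB_TOKEN, calling create_dataset with your account name and dataflow-dataset as arguments sends the same request as the command above. The --fail option makes curl exit with a non-zero status on an HTTP error, so a rejected request is easier to detect in a script.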

Upload Dataset Files

After the dataset is created, open your Profile page to view it.

On the dataset details page, click the Download Dataset button to clone the repository. Then copy your local dataset files into the repository folder, commit them, and push. For example:

cd dataflow-dataset
cp -rf /work/my_dataset_dir/* .
git add .
git commit -m "commit message"
git push origin main
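
The copy-and-commit steps above can also be scripted. The sketch below is one possible approach, with sync_dataset as a hypothetical helper name. It differs from the commands above in two ways: it copies with "src/." so that hidden files are included (the * glob skips them), and it commits only when the copy actually changed something. It assumes the source directory does not contain its own .git folder, and it leaves the push to you.

```shell
#!/bin/sh
# sync_dataset is a hypothetical helper, not part of CSGHub or DataFlow.
#   $1 = local dataset directory, e.g. /work/my_dataset_dir
#   $2 = path to the cloned dataset repository
#   $3 = commit message (optional)
sync_dataset() {
    src=$1
    repo=$2
    msg=${3:-"add dataset files"}

    # "src/." copies hidden files too, which the "*" glob would skip.
    cp -R "$src"/. "$repo"/ || return 1

    git -C "$repo" add . || return 1

    # "git diff --cached --quiet" exits 0 when nothing is staged,
    # so a commit is created only if the index actually changed.
    if git -C "$repo" diff --cached --quiet; then
        echo "nothing to commit"
    else
        git -C "$repo" commit --quiet -m "$msg"
    fi
}
```

For example, run sync_dataset /work/my_dataset_dir dataflow-dataset and then git -C dataflow-dataset push origin main to upload the result.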
