
Downloading Datasets

CSGHub supports four ways to download datasets: Git, the web interface, the command line, and the SDK. The steps for each method are described below.

Downloading Datasets Using Git

  • Downloading dataset repositories using HTTP:
git lfs install
git clone http://101.200.14.180/datasets/opencsg/dataset123.git
  • Downloading dataset repositories using SSH:
git lfs install
git clone ssh://git@localhost:2222/datasets/opencsg/dataset123.git

Note: To push changes or access private repositories, you need to add your SSH public key to your user settings. Click "Account Settings" in the top right corner and go to "SSH Keys" to add your public key.
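
The two clone URLs above share one pattern, so they can be generated from a host and a repository ID. The following is a minimal sketch, not part of CSGHub: the helper names dataset_clone_url and clone_dataset are hypothetical, and the URL layout is assumed to match the two examples above. It also assumes git and git-lfs are installed and on the PATH.

```python
import subprocess


def dataset_clone_url(host, repo_id, ssh=False):
    """Build a clone URL following the pattern of the examples above.

    host:    for example "101.200.14.180" (HTTP) or "localhost:2222" (SSH)
    repo_id: "<namespace>/<name>", for example "opencsg/dataset123"
    """
    if ssh:
        return f"ssh://git@{host}/datasets/{repo_id}.git"
    return f"http://{host}/datasets/{repo_id}.git"


def clone_dataset(host, repo_id, ssh=False):
    """Run the same two commands as above: enable Git LFS, then clone."""
    subprocess.run(["git", "lfs", "install"], check=True)
    url = dataset_clone_url(host, repo_id, ssh=ssh)
    subprocess.run(["git", "clone", url], check=True)
```

Calling clone_dataset("localhost:2222", "opencsg/dataset123", ssh=True) would run the SSH variant; check=True makes a failed git command raise an exception instead of passing silently.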

Downloading Files Using Web Interface

Click the download button under the Files tab to download the file directly.


Downloading Files Using Command Line

Use the command-line tool csghub-cli to download datasets. Install it as follows:

pip install csghub-sdk

Here is an example of how to download a dataset:

export CSG_TOKEN=your_access_token

# download dataset
csghub-cli download demo/test_dataset -t dataset
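
If the download needs to run from a script rather than an interactive shell, the same command can be launched from Python. This is a sketch under assumptions: build_cli_download and run_cli_download are hypothetical helper names, and the only CLI arguments used are the ones shown in the example above.

```python
import os
import subprocess


def build_cli_download(repo_id, token, base_env=None):
    """Return (argv, env) for the csghub-cli download command shown above."""
    # Copy the environment so the caller's environment is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["CSG_TOKEN"] = token  # same variable the shell example exports
    argv = ["csghub-cli", "download", repo_id, "-t", "dataset"]
    return argv, env


def run_cli_download(repo_id, token):
    """Run the download and raise CalledProcessError if the CLI fails."""
    argv, env = build_cli_download(repo_id, token)
    subprocess.run(argv, env=env, check=True)
```

Passing the token through the child process environment keeps it out of the argument list, which mirrors the export step in the shell example.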

Downloading Files Using SDK

The CSGHub SDK provides a Python library, so you can also download files from code.

Here is an example of how to download a dataset:

from pycsghub.snapshot_download import snapshot_download

token = "xxxx"  # your access token
endpoint = "https://hub.opencsg.com"
repo_id = "AIWizards/tmmluplus"
repo_type = "dataset"
cache_dir = "/Users/xiangzhen/Downloads/"  # local directory for the downloaded files

result = snapshot_download(repo_id, repo_type=repo_type, cache_dir=cache_dir, endpoint=endpoint, token=token)
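
To avoid hard-coding the access token in source files, the token can be read from an environment variable and passed to snapshot_download explicitly. The following is a minimal sketch: download_kwargs and download_dataset are hypothetical helpers, and reusing the CSG_TOKEN variable name from the command-line example is a choice made here, not documented SDK behavior. Only the snapshot_download parameters shown above are used.

```python
import os


def download_kwargs(cache_dir, env=None):
    """Collect the keyword arguments used in the example above.

    The token is read from CSG_TOKEN (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    token = env.get("CSG_TOKEN")
    if not token:
        raise RuntimeError("CSG_TOKEN is not set")
    return {
        "repo_type": "dataset",
        "cache_dir": cache_dir,
        "endpoint": "https://hub.opencsg.com",
        "token": token,
    }


def download_dataset(repo_id, cache_dir):
    """Download a dataset repository with the SDK call shown above."""
    # Imported lazily so download_kwargs works without the SDK installed.
    from pycsghub.snapshot_download import snapshot_download

    return snapshot_download(repo_id, **download_kwargs(cache_dir))
```

For example, download_dataset("AIWizards/tmmluplus", "./datasets") would fetch the same repository as above into a local datasets directory, provided CSG_TOKEN is set.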

Multi-source Synchronization of Datasets

In the open-source version of CSGHub, you can browse datasets from the remote OpenCSG community. By entering a project and clicking the sync button, you can quickly synchronize the dataset to your local server. For more details, refer to the Multi-source Synchronization of Models section.

See the video tutorial for more details.