Uploading Datasets
To upload datasets, you will need to create an account at CSGHub. Datasets are Git-based repositories, which give you versioning, branches, discoverability and sharing features. You can upload anything you want to the dataset repository.
Two upload methods are currently supported: Git and the web interface.
Upload Files to a Repository Using Git
First, clone the repository to your local machine, then copy your files into the cloned repository directory.
Assuming that your files are located in the local directory /work/my_dataset_dir and the repository was cloned into dataset123, you can upload them to the platform with the following commands:
cd dataset123
cp -rf /work/my_dataset_dir/* .
git add .
git commit -m "commit message"
git push
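Before committing, it can help to check what `git add .` actually staged. The following is a minimal sketch that runs in a throwaway repository created with `mktemp`, so it is safe to try anywhere; the file names `data/train.csv` and `README.md` are hypothetical stand-ins for your dataset files.

```shell
set -eu

# Scratch repository standing in for the cloned dataset repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical dataset files copied into the working tree.
mkdir data
echo "id,label" > data/train.csv
echo "# My dataset" > README.md

# Stage everything, as in the upload commands above.
git add .

# Short status: an "A" in the first column marks a newly staged file.
staged=$(git status --short)
echo "$staged"
```

In the real repository, run `git status --short` after `git add .` and before `git commit`; anything you did not intend to upload can be unstaged first.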
[Note]
Files with the following suffixes are automatically uploaded with git-lfs:
.7z,.arrow,.bin,.bz2,.ckpt,.ftz,.gz,.h5,.joblib,.lz4,.mlmodel,.model,.msgpack,.npy,.npz,.onnx,.ot,.parquet,.pb,.pickle,.pkl,.pt,.pth,.rar,.safetensors,.tar,.tflite,.tgz,.wasm,.xz,.zip,.zst,.tfevents,.pcm,.sam,.raw,.aac,.flac,.mp3,.ogg,.wav,.bmp,.gif,.png,.tiff,.jpg,.jpeg,.webp
If you have other types of large files, run the following command so that they are uploaded through Git LFS:
git lfs track <your_file_name>
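To confirm that a file will go through Git LFS before you commit it, you can ask Git which filter applies to it. The sketch below uses a scratch repository and a hypothetical `*.dat` pattern. It writes the tracking rule to `.gitattributes` by hand so that it runs even where git-lfs is not installed; in a real repository, `git lfs track "*.dat"` records the same rule for you.

```shell
set -eu

# Scratch repository; in practice, run the check inside your dataset repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Rule that `git lfs track "*.dat"` would append to .gitattributes
# ("*.dat" is a hypothetical pattern, not one of the platform defaults).
printf '%s\n' '*.dat filter=lfs diff=lfs merge=lfs -text' > .gitattributes

touch sample.dat notes.txt

# Ask Git which filter applies to each file.
lfs_attr=$(git check-attr filter -- sample.dat)
txt_attr=$(git check-attr filter -- notes.txt)
echo "$lfs_attr"   # sample.dat: filter: lfs
echo "$txt_attr"   # notes.txt: filter: unspecified
```

Remember to commit `.gitattributes` together with your files so that the tracking rule is stored in the repository.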
Upload Files to a Repository Using Web Interface
To add files to your repository with the web interface, select the Files tab, then click Add file. You will be given the option to create a new file or upload a file.
Creating a New File
Click Create new file, add the contents and click Create File to save your file.
Uploading a File
Click Upload file, then choose a local file to upload.
Viewing the Dataset Repository History
Each time you perform the add-commit-push sequence, the dataset repository records the changes you made to the files. You can browse the dataset files and commits, and view the differences (also known as the diff) introduced by each commit. To view the history, click on "commit history."
You can also click on an individual commit to see what changes were introduced in that specific commit.
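Because the dataset repository is a Git repository, the same history can also be inspected from a local clone with standard Git commands. The following is a minimal sketch in a scratch repository; the file name, commit messages, and user identity are hypothetical.

```shell
set -eu

# Scratch repository with two commits, standing in for a cloned dataset repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "id,label" > data.csv
git add data.csv
git commit -q -m "Add data.csv"

echo "1,cat" >> data.csv
git add data.csv
git commit -q -m "Add first row"

# One line per commit, newest first.
git log --oneline

# Files changed by the most recent commit; drop --stat to see the full diff.
git show --stat HEAD

commit_count=$(git rev-list --count HEAD)
latest_subject=$(git log -1 --format=%s)
echo "commits: $commit_count, latest: $latest_subject"
```

In a real clone, only the `git log` and `git show` lines are needed; replace `HEAD` with any commit ID from the log to inspect that commit.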