Fine-Tune Domain Models Quickly to Unlock Custom AI Productivity

📌 Scenario Overview

Enterprises and research institutions often hold structured or unstructured domain-specific data, such as medical records, legal regulations, or financial reports. These datasets carry strong industry characteristics, and general-purpose LLMs may underperform on them. With CSGHub's model and data hosting capabilities, users can quickly prepare training data and fine-tune open-source models into LLMs tailored to their domain.

Target Users: Startups / Industry research institutes
Goal: Train a business-aligned language model using industry corpora for tasks like Q&A systems, knowledge extraction, and document generation.

🧭 Step-by-Step Guide

1. Create a Training Dataset

Log in to Transense Community and create a new training dataset under your personal or team account.
For collaborative management, create an organization and invite members.

2. Upload Domain-Specific Data

Upload training corpora in .jsonl, .txt, or other common formats for supervised fine-tuning (SFT) or other tasks. Add descriptions and tags for easier management and reuse.
Multiple data upload methods are supported.
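
Before uploading, it can help to check that a .jsonl corpus is well formed. The following is a minimal sketch, not a format required by the platform: the field names `instruction`, `input`, and `output` are an assumed SFT convention, the sample records are made up for illustration, and `write_jsonl` and `validate_jsonl` are hypothetical helper names. Adjust the required fields to whatever schema your chosen fine-tuning framework expects.

```python
import json

# Illustrative records only; the field names are an assumed SFT convention.
SAMPLE_RECORDS = [
    {
        "instruction": "Summarize the clause in one sentence.",
        "input": "The tenant shall give 30 days written notice before vacating.",
        "output": "The tenant must give 30 days of written notice before leaving.",
    },
    {
        "instruction": "Define the term 'force majeure'.",
        "input": "",
        "output": "An unforeseeable event that prevents a party from fulfilling a contract.",
    },
]


def write_jsonl(records, path):
    """Write one JSON object per line, UTF-8, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def validate_jsonl(path, required=("instruction", "output")):
    """Return a list of (line_number, problem) tuples; empty means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((n, "blank line"))
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((n, f"invalid JSON: {e.msg}"))
                continue
            if not isinstance(rec, dict):
                problems.append((n, "not a JSON object"))
                continue
            # A required field must be present and be a non-blank string.
            missing = [
                k for k in required
                if not isinstance(rec.get(k), str) or not rec[k].strip()
            ]
            if missing:
                problems.append((n, "missing or empty: " + ", ".join(missing)))
    return problems
```

Calling `write_jsonl(SAMPLE_RECORDS, "train.jsonl")` followed by `validate_jsonl("train.jsonl")` should return an empty list; any reported line numbers point at records to fix before upload.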

3. Select a Base Model

Browse and select a suitable open-source base model (e.g., DeepSeek, Qwen, Baichuan, InternLM) from the model list. Check its details and license.
Use the Model Tree feature on the model details page to review its lineage and derivatives for informed selection.

4. Launch a Fine-Tuning Task

On the model details page, click 【Fine-Tune Instance】 to configure the task name, training dataset, and parameters (e.g., learning rate, batch size, epochs), then start training.
Multiple fine-tuning frameworks are available. Refer to the Fine-Tuning Framework Guide for scenario-specific choices.
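
When choosing batch size and epochs, a rough estimate of the number of optimizer steps helps in sizing learning-rate warmup and checkpoint intervals. The sketch below shows the usual arithmetic under stated assumptions: `grad_accum` and `num_devices` are illustrative parameters that may or may not be exposed in the task form, and the last partial batch of each epoch is assumed to be kept (some frameworks drop it, which gives a slightly lower count).

```python
import math


def training_steps(num_examples, batch_size, epochs, grad_accum=1, num_devices=1):
    """Estimate (effective_batch, steps_per_epoch, total_steps) for a training run.

    Assumes the final partial batch in each epoch still triggers an optimizer step.
    """
    effective_batch = batch_size * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


# Example: 10,000 samples, per-device batch size 8, 4 accumulation steps, 3 epochs.
effective, per_epoch, total = training_steps(10_000, 8, 3, grad_accum=4)
print(effective, per_epoch, total)  # 32 313 939
```

Doubling the batch size roughly halves the step count, so the learning rate and warmup length are usually revisited together with it.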

5. Publish the New Model

After fine-tuning, upload the model to your personal or organization repository via the CLI or Web UI. Release it as a new version with tags and documentation for traceability.
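
One way to support traceability is to ship a small release manifest alongside the model files. This is a sketch under assumptions, not a platform feature: the manifest layout, the function names `sha256_of` and `build_manifest`, and every field name are hypothetical. It records the version, base model, training dataset, tags, and a SHA-256 checksum per file so that a later rollback can verify it restored the same artifacts.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(model_dir, version, base_model, dataset, tags):
    """Return a dict describing a release; the field names are illustrative only."""
    model_dir = Path(model_dir)
    files = {
        p.relative_to(model_dir).as_posix(): sha256_of(p)
        for p in sorted(model_dir.rglob("*"))
        if p.is_file()
    }
    return {
        "version": version,
        "base_model": base_model,
        "dataset": dataset,
        "tags": list(tags),
        "files": files,
    }


def save_manifest(manifest, path):
    """Write the manifest as indented UTF-8 JSON."""
    Path(path).write_text(
        json.dumps(manifest, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Note that `rglob("*")` also picks up hidden files and directories such as `.git`; filter those out if the model directory is a working repository clone.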

🌟 Key Outcomes

Obtain a fine-tuned domain model with improved comprehension of industry terminology and context.
Deploy the model in private environments for applications such as Q&A bots and document generation.
Run a streamlined fine-tuning workflow that is accessible even to non-technical teams through the Web UI.
Maintain clear version control with rollback and upgrade support for continuous optimization.
