
Fine-Tune Domain Models Quickly to Unlock Custom AI Productivity

📌 Scenario Overview

Enterprises and research institutions often hold structured or unstructured domain-specific data, such as medical records, legal regulations, or financial reports. This data is dense with industry terminology and conventions, and general-purpose LLMs often underperform on it. With CSGHub's model and data hosting capabilities, users can quickly prepare training data and fine-tune open-source models into tailored domain-specific LLMs.

  • Target Users: Startups and industry research institutes
  • Goal: Train a business-aligned language model on industry corpora for applications such as Q&A systems, knowledge extraction, and document generation.

🧭 Step-by-Step Guide

1. Create a Training Dataset

  • Log in to Transense Community and create a new training dataset under your personal or team account.
  • For collaborative management, create an organization and invite members.

2. Upload Domain-Specific Data

  • Upload training corpora in .jsonl, .txt, or other common formats for supervised fine-tuning (SFT) or other tasks. Add descriptions and tags for easier management and reuse.
  • Multiple data upload methods are supported.
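For SFT, a `.jsonl` file holds one JSON object per line, and it is worth checking the file locally before uploading it. The sketch below writes and validates such a file. It assumes Alpaca-style `instruction` / `input` / `output` fields, a common convention but not necessarily the schema your chosen fine-tuning framework expects; the helper names `write_sft_jsonl` and `validate_sft_jsonl` are hypothetical.

```python
import json
import os
import tempfile

# Assumed Alpaca-style schema; adjust to whatever your fine-tuning framework expects.
REQUIRED_FIELDS = ("instruction", "output")


def write_sft_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII text (e.g. Chinese) readable."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def validate_sft_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((lineno, "empty line"))
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(rec, dict):
                problems.append((lineno, "not a JSON object"))
                continue
            for field in REQUIRED_FIELDS:
                # Each required field must be a non-blank string.
                if not isinstance(rec.get(field), str) or not rec[field].strip():
                    problems.append((lineno, f"missing or empty '{field}'"))
    return problems


if __name__ == "__main__":
    # Placeholder records; replace them with real domain examples.
    samples = [
        {"instruction": "Summarize the clause below.",
         "input": "<contract clause text>", "output": "<reference summary>"},
        {"instruction": "Define the term below.",
         "input": "<industry term>", "output": "<reference definition>"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.jsonl")
        write_sft_jsonl(samples, path)
        print(validate_sft_jsonl(path))  # an empty list means the file passed the checks
```

Reporting line numbers rather than stopping at the first error makes it easier to fix a large corpus in one pass.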

3. Select a Base Model

  • Browse the model list and select a suitable open-source base model (e.g., DeepSeek, Qwen, Baichuan, InternLM). Review its details and license before use.
  • Use the Model Tree feature on the model details page to review its lineage and derivatives for informed selection.
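Conceptually, a model tree is a graph in which each derived model points to its parent. The sketch below walks such a structure in both directions: up to the root base model (lineage) and down to everything derived from a model (derivatives). The repository names and relation labels (`finetune`, `quantized`, `adapter`) are made up for illustration, and this is not a description of how CSGHub stores lineage internally.

```python
from collections import defaultdict

# Hypothetical lineage records: child -> (parent, relation). Names are illustrative only.
PARENT = {
    "example-org/base-7b-instruct": ("example-org/base-7b", "finetune"),
    "my-team/legal-7b-sft": ("example-org/base-7b-instruct", "finetune"),
    "my-team/legal-7b-sft-int4": ("my-team/legal-7b-sft", "quantized"),
    "other-team/medical-7b-lora": ("example-org/base-7b-instruct", "adapter"),
}


def lineage(model):
    """Return the chain from the given model back to its root base model."""
    chain, seen = [model], {model}
    while model in PARENT:
        model = PARENT[model][0]
        if model in seen:
            raise ValueError(f"cycle detected at {model}")
        seen.add(model)
        chain.append(model)
    return chain


def derivatives(model):
    """Return every model derived from the given one, directly or transitively."""
    children = defaultdict(list)
    for child, (parent, _relation) in PARENT.items():
        children[parent].append(child)
    result, stack = [], [model]
    while stack:
        for child in children[stack.pop()]:
            result.append(child)
            stack.append(child)
    return result


if __name__ == "__main__":
    # Print each hop with its relation, from the leaf up to the root.
    for child in lineage("my-team/legal-7b-sft-int4")[:-1]:
        parent, relation = PARENT[child]
        print(f"{child} --{relation}--> {parent}")
    print(derivatives("example-org/base-7b-instruct"))
```

Looking at derivatives can show, for example, whether a base model already has fine-tunes or quantized variants in a related domain.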

4. Launch a Fine-Tuning Task

  • On the model details page, click "Fine-Tune Instance", configure the task name, training dataset, and parameters (e.g., learning rate, batch size, epochs), then start training.
  • Several fine-tuning frameworks are available; refer to the Fine-Tuning Framework Guide to choose one that suits your scenario.
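Before starting a task, it can help to sanity-check the parameters and estimate how many optimizer steps training will run. The sketch assumes a hypothetical `FineTuneConfig` whose fields mirror the parameters named above, plus gradient accumulation and device count; the platform's form may use different names. It counts steps as `ceil(num_examples / effective_batch_size) * epochs` with `effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices`; individual frameworks may round partial batches differently.

```python
import math
from dataclasses import dataclass


@dataclass
class FineTuneConfig:
    """Hypothetical container for the parameters set when launching a fine-tuning task."""
    task_name: str
    dataset: str
    learning_rate: float
    per_device_batch_size: int
    epochs: int
    gradient_accumulation_steps: int = 1
    num_devices: int = 1

    def __post_init__(self):
        # Reject values that cannot produce a meaningful training run.
        if not 0 < self.learning_rate < 1:
            raise ValueError("learning_rate should be a small positive number")
        for name in ("per_device_batch_size", "epochs",
                     "gradient_accumulation_steps", "num_devices"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")

    @property
    def effective_batch_size(self) -> int:
        # Examples consumed per optimizer update across all devices.
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

    def total_steps(self, num_examples: int) -> int:
        # A final partial batch is counted as one extra step per epoch.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.epochs


if __name__ == "__main__":
    # Illustrative values only, not tuning recommendations.
    cfg = FineTuneConfig(
        task_name="legal-qa-sft",
        dataset="my-team/legal-qa",
        learning_rate=2e-5,
        per_device_batch_size=4,
        epochs=3,
        gradient_accumulation_steps=4,
        num_devices=2,
    )
    print(cfg.effective_batch_size)  # 32
    print(cfg.total_steps(10_000))   # 939
```

With 10,000 examples and an effective batch of 32, each epoch needs 313 steps (the last one partial), so three epochs come to 939 steps. An estimate like this helps when choosing warmup length or checkpoint intervals.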

5. Publish New Model

  • After fine-tuning, upload the model to your personal or organization repository via the CLI or Web UI, then release it as a new version with tags and documentation for traceability.
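Release documentation usually lives in the repository's `README.md`. The sketch below renders a minimal model card recording the base model, training dataset, license, tags, and version. It assumes the hub reads Hugging Face-style YAML front matter with keys such as `license`, `base_model`, `datasets`, and `tags`; check which metadata keys CSGHub actually recognizes. `render_model_card` and all example values are hypothetical.

```python
import json


def render_model_card(name, version, base_model, datasets, license_id, tags, summary):
    """Return README.md text: YAML front matter followed by a short description."""
    # json.dumps yields double-quoted strings, which are valid YAML scalars,
    # so values containing ':' or '#' do not break the front matter.
    q = json.dumps
    lines = ["---", f"license: {q(license_id)}", f"base_model: {q(base_model)}", "datasets:"]
    lines += [f"  - {q(d)}" for d in datasets]
    lines.append("tags:")
    lines += [f"  - {q(t)}" for t in tags]
    lines += ["---", "", f"# {name}", "", f"Version: {version}", "", summary, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    card = render_model_card(
        name="legal-7b-sft",
        version="v1.0.0",
        base_model="example-org/base-7b-instruct",
        datasets=["my-team/legal-qa"],
        license_id="apache-2.0",
        tags=["legal", "sft"],
        summary="Fine-tuned on internal legal Q&A data for contract review.",
    )
    print(card)
```

Recording the base model and dataset alongside each version is what makes a later rollback or comparison traceable.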

🌟 Key Outcomes

  • Obtain a fine-tuned domain model that handles industry terminology and context better than the general-purpose base model.
  • Deploy the model in private environments for applications like Q&A bots and document generation.
  • Run a streamlined fine-tuning workflow through the Web UI, accessible even to non-technical teams.
  • Maintain clear version control, with rollback and upgrade support for continuous optimization.