Upgrade Guide

Upgrade Recommendations:

  • Verify version compatibility in a test environment before upgrading in production.
  • Retain at least 2 historical revisions for quick rollback.

1. Overview

This document describes how to safely upgrade CSGHub in a production environment. The upgrade process is designed to protect data, maintain service continuity, and minimize downtime.

💡 Scope: Applicable to CSGHub clusters deployed via Helm Chart.

2. Pre-upgrade Preparation

Before upgrading, you must back up your configurations and data to prevent data loss or configuration anomalies.

2.1 Back up Helm Configuration

Export the current Helm values of CSGHub to reuse them during the upgrade:

helm get values csghub -n csghub -o yaml > csghub-values-backup.yaml

2.2 Back up the Database

Perform a full backup of the associated database to ensure data traceability:

# Perform database dump
kubectl exec -it csghub-postgresql-0 -n csghub -- su - postgres -lc 'pg_dumpall -U csghub -f /tmp/all_dbs.sql'

# Copy the backup file to your local machine
kubectl cp csghub/csghub-postgresql-0:/tmp/all_dbs.sql all_dbs.sql
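
Before proceeding, it is worth confirming that both backup files actually contain data. The following is a minimal sketch rather than part of the CSGHub tooling: the function names `check_backup` and `looks_like_pg_dumpall` are hypothetical, and the header check assumes the dump begins with the comment banner that `pg_dumpall` normally writes.

```shell
#!/bin/sh
# check_backup FILE...: fail if any listed backup file is missing or empty.
check_backup() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "backup missing or empty: $f" >&2
      return 1
    fi
  done
  echo "backups present: $*"
}

# looks_like_pg_dumpall FILE: succeed if the first lines carry the
# "PostgreSQL database cluster dump" banner (assumed pg_dumpall header).
looks_like_pg_dumpall() {
  head -n 5 "$1" | grep -q 'PostgreSQL database cluster dump'
}

# Example usage with the file names from the steps above:
#   check_backup csghub-values-backup.yaml all_dbs.sql && looks_like_pg_dumpall all_dbs.sql
```

An empty `all_dbs.sql` usually means the dump command failed inside the pod, so it is safer to stop here than to discover the problem during a rollback.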

3. Pre-upgrade Operations

Note: If the target version is v1.17.0 or higher, run the scripts below in the order listed.

v1.16.0

Starting from v1.16.0, the system deprecates ingress-nginx by default in favor of envoyGateway.

Since Helm does not automatically create the CRDs (CustomResourceDefinitions) required by dependencies during an upgrade, you must manually install the relevant CRDs if your currently deployed CSGHub version is lower than v1.16.0.

Run the following script:

curl -sSL https://charts.opencsg.com/repository/scripts/crds_install.sh | bash

What this script does:

  • Installs envoyGateway related CRDs.
  • Ensures relevant components function correctly after upgrading to v1.16.0.

v1.17.0

Starting from v1.17.0, the resources for the following components will be managed collectively by the CSGHub Helm Chart:

  • Knative Serving
  • Argo Workflow
  • LeaderWorkSet

To allow Helm to take over these components, existing resources must be patched during the upgrade to add metadata related to Helm management.

Run the following script:

curl -sSL https://charts.opencsg.com/repository/scripts/crds_takeover.sh | bash

What this script does:

  • Adds Helm management metadata to existing Knative Serving, Argo Workflow, and LeaderWorkSet resources.
  • Enables these resources to be adopted and managed under the unified CSGHub Helm Chart.
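
The version conditions for the two scripts above can be sketched as a small shell helper. This is an illustration, not part of the CSGHub tooling: the function names `version_lt` and `required_scripts` are hypothetical, version ordering relies on `sort -V`, and the sketch assumes `crds_takeover.sh` is only needed when the installed version is below v1.17.0 (the text above states only the target-version condition).

```shell
#!/bin/sh
# version_lt A B: succeed when version A sorts strictly before version B.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# required_scripts CURRENT TARGET: print the pre-upgrade scripts to run, in order.
# A leading "v" on either version is ignored.
required_scripts() {
  local current target
  current="${1#v}"
  target="${2#v}"
  # CRDs must be installed when crossing the v1.16.0 boundary.
  if version_lt "$current" 1.16.0 && ! version_lt "$target" 1.16.0; then
    echo crds_install.sh
  fi
  # Assumption: takeover is needed only when crossing the v1.17.0 boundary.
  if version_lt "$current" 1.17.0 && ! version_lt "$target" 1.17.0; then
    echo crds_takeover.sh
  fi
}
```

For example, `required_scripts v1.15.2 v2.0.0` prints `crds_install.sh` followed by `crds_takeover.sh`, while `required_scripts v1.17.1 v2.0.0` prints nothing.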

v2.0.0

Starting from v2.0.0, the embedded Dataflow is switched to v2.0.0, and the task log storage database that Dataflow depends on changes from MongoDB to PostgreSQL. Before upgrading, perform the following migration operation to switch the database (if necessary).

4. Starting the Upgrade

Follow these steps to execute the upgrade in an orderly manner:

  1. Update Helm Repository

    helm repo update csghub
  2. Confirm Target Version

    helm search repo csghub/csghub --versions
  3. Execute Upgrade

    By default, this upgrades to the latest version and reuses the backed-up configuration file:

    helm upgrade csghub csghub/csghub \
    --namespace csghub \
    -f csghub-values-backup.yaml

    To upgrade to a specific version:

    helm upgrade csghub csghub/csghub \
    --namespace csghub \
    -f csghub-values-backup.yaml \
    --version 2.0.0
  4. Wait for Services to be Ready

    # Wait for core CSGHub services
    kubectl wait --for=condition=Ready pod -n csghub -l '!job-name'

    # Wait for KnativeServing services
    kubectl wait --for=condition=Ready pod -n knative-serving -l '!job-name'

    # Check all pod statuses
    kubectl get pods -n csghub

    Tip: You can also use helm status csghub -n csghub to check the deployment status.
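
`kubectl wait` gives up after its default timeout of 30 seconds, which may be too short while new images are still being pulled. The sketch below wraps the wait commands from step 4 with an explicit timeout and prints the pod list on failure. The function name `wait_for_pods` and the 600s default are illustrative choices, not values taken from the CSGHub documentation.

```shell
#!/bin/sh
# wait_for_pods NAMESPACE [TIMEOUT]: wait for all non-Job pods to become Ready.
# On timeout, print the pod list to stderr and return a non-zero status.
wait_for_pods() {
  local ns timeout
  ns="$1"
  timeout="${2:-600s}"   # assumed default; tune to your cluster
  if kubectl wait --for=condition=Ready pod -n "$ns" -l '!job-name' --timeout="$timeout"; then
    echo "all pods in $ns are Ready"
  else
    echo "pods in $ns were not Ready within $timeout" >&2
    kubectl get pods -n "$ns" >&2
    return 1
  fi
}

# Example usage (requires cluster access):
#   wait_for_pods csghub && wait_for_pods knative-serving
```

A non-zero return status makes the function usable in a scripted upgrade, where a failed wait can trigger the rollback procedure.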

5. Rollback

Note: If using an external database, please perform manual recovery for the database portion.

If service anomalies or functional failures occur during the upgrade, execute a rollback immediately to restore the stable version:

  1. View Version History

    Identify the target revision number for the rollback:

    helm history csghub -n csghub
  2. Execute Rollback

    Roll back to the specified revision (replace revision with the actual number):

    helm rollback csghub revision -n csghub
  3. Restore Database

    • Create a NetworkPolicy: Temporarily block external access to PostgreSQL to avoid data conflicts during recovery:

      cat <<EOF | kubectl apply -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: block-postgresql-access
        namespace: csghub
      spec:
        podSelector:
          matchLabels:
            app.kubernetes.io/instance: csghub
            app.kubernetes.io/name: csghub
            app.kubernetes.io/service: postgresql
        policyTypes:
        - Ingress
        ingress: []
      EOF
    • Terminate Active Connections:

      kubectl exec -it csghub-postgresql-0 -n csghub -- su - postgres -c "psql -U csghub -c \"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname NOT IN ('template0','template1') AND pid <> pg_backend_pid();\""
    • Execute Data Import:

      # Copy backup into the container
      kubectl cp all_dbs.sql csghub/csghub-postgresql-0:/tmp/all_dbs.sql
      # Run import
      kubectl exec -it csghub-postgresql-0 -n csghub -- su - postgres -c 'psql -U csghub -f /tmp/all_dbs.sql'
    • Delete NetworkPolicy:

      kubectl delete netpol block-postgresql-access -n csghub
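
After the rollback steps above, the release state can be confirmed from the text output of `helm status`. Note that a rollback creates a new revision instead of reusing the old number, so the reported revision is expected to be higher than before. The helper below is a minimal sketch; the function name `release_state` is hypothetical, and it assumes the default (non-JSON) output format with `STATUS:` and `REVISION:` lines.

```shell
#!/bin/sh
# release_state: read `helm status` text output on stdin and print
# "<status> <revision>", for example "deployed 4".
release_state() {
  awk -F': *' '
    /^STATUS:/   { status = $2 }
    /^REVISION:/ { revision = $2 }
    END          { print status " " revision }
  '
}

# Example usage (requires cluster access):
#   helm status csghub -n csghub | release_state
```

A status other than `deployed` suggests the rollback did not complete and the release history should be inspected again with `helm history csghub -n csghub`.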

6. Functional Verification

After the upgrade (or rollback) is complete, verify the following core functions:

  • The Web UI loads correctly without error messages.
  • User login and permission management are functioning normally.
  • Core business features (Model Inference, Data Management, etc.) are working.
  • Connections to external components (Database, Object Storage, etc.) are stable.