Upgrade Guide
Upgrade Recommendations:
- Verify version compatibility in a test environment before upgrading in production.
- Retain at least 2 historical revisions for quick rollback.
1. Overview
This document describes how to safely upgrade CSGHub in a production environment. The upgrade process is designed to protect data, maintain service continuity, and minimize downtime.
💡 Scope: Applicable to CSGHub clusters deployed via Helm Chart.
2. Pre-upgrade Preparation
Before upgrading, you must back up your configurations and data to prevent data loss or configuration anomalies.
2.1 Back up Helm Configuration
Export the current Helm values of CSGHub to reuse them during the upgrade:
helm get values csghub -n csghub -o yaml > csghub-values-backup.yaml
2.2 Back up the Database
Perform a full backup of the associated database to ensure data traceability:
# Perform database dump
kubectl exec -it csghub-postgresql-0 -n csghub -- su - postgres -lc 'pg_dumpall -U csghub -f /tmp/all_dbs.sql'
# Copy the backup file to your local machine
kubectl cp csghub/csghub-postgresql-0:/tmp/all_dbs.sql all_dbs.sql
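Before relying on the copied dump for a rollback, it can help to confirm that the file arrived intact. The following is a minimal sketch, not part of the official procedure. The function name verify_dump is hypothetical, and the check assumes that pg_dumpall ends its output with a "PostgreSQL database cluster dump complete" comment, which may vary across PostgreSQL versions.

```shell
#!/usr/bin/env bash
# Sanity-check a pg_dumpall file before relying on it for rollback.
# Assumption: pg_dumpall ends its output with a
# "PostgreSQL database cluster dump complete" comment line.

# verify_dump FILE (hypothetical helper): succeed only if the dump looks complete.
verify_dump() {
  local dump_file="$1"
  # The file must exist and be non-empty.
  if [ ! -s "$dump_file" ]; then
    echo "missing or empty: $dump_file" >&2
    return 1
  fi
  # A truncated dump (for example, an interrupted kubectl cp) lacks the trailer.
  if ! tail -n 5 "$dump_file" | grep -q 'PostgreSQL database cluster dump complete'; then
    echo "dump looks truncated: $dump_file" >&2
    return 1
  fi
  echo "ok: $dump_file"
}
```

For example, `verify_dump all_dbs.sql` prints `ok: all_dbs.sql` when both checks pass and returns a non-zero status otherwise.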
3. Pre-upgrade Operations
Note: If the target upgrade version is v1.17.0 or higher, the following scripts must be executed in sequential order.
v1.16.0
Starting from v1.16.0, CSGHub uses envoyGateway by default and deprecates ingress-nginx.
Since Helm does not automatically create CRDs (CustomResourceDefinitions) required by dependencies during an upgrade, you must manually install the relevant CRDs if your current cluster version is lower than v1.16.0.
Run the following script:
curl -sSL https://charts.opencsg.com/repository/scripts/crds_install.sh | bash
What this script does:
- Installs envoyGateway-related CRDs.
- Ensures relevant components function correctly after upgrading to v1.16.0.
v1.17.0
Starting from v1.17.0, the resources for the following components will be managed collectively by the CSGHub Helm Chart:
- Knative Serving
- Argo Workflow
- LeaderWorkSet
To allow Helm to take over these components, existing resources must be patched during the upgrade to add metadata related to Helm management.
Run the following script:
curl -sSL https://charts.opencsg.com/repository/scripts/crds_takeover.sh | bash
What this script does:
- Adds Helm management metadata to existing Knative Serving, Argo Workflow, and LeaderWorkSet resources.
- Enables these resources to be adopted and managed under the unified CSGHub Helm Chart.
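The v1.16.0 and v1.17.0 notes above can be read as a rule of thumb: a script is needed when the upgrade crosses its version threshold. The sketch below encodes that reading. It is an interpretation rather than an official tool: the function names version_lt and required_scripts are hypothetical, the rule "current version lower than the threshold and target version at or above it" is an assumption for crds_takeover.sh, and `sort -V` requires GNU coreutils.

```shell
#!/usr/bin/env bash
# Decide which pre-upgrade scripts apply to a given upgrade path.
# Assumption: a script is needed when current < threshold <= target.

# version_lt A B: succeed if version A is strictly lower than version B.
# Relies on `sort -V` (GNU coreutils); a leading "v" is stripped first.
version_lt() {
  local a="${1#v}" b="${2#v}"
  [ "$a" != "$b" ] && [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$a" ]
}

# required_scripts CURRENT TARGET: print the script names to run, in order.
required_scripts() {
  local current="$1" target="$2" entry version script
  for entry in 1.16.0:crds_install.sh 1.17.0:crds_takeover.sh; do
    version="${entry%%:*}"
    script="${entry#*:}"
    # Needed only when the upgrade crosses this threshold.
    if version_lt "$current" "$version" && ! version_lt "$target" "$version"; then
      echo "$script"
    fi
  done
}
```

For example, `required_scripts v1.15.2 v2.0.0` lists both scripts in order, while `required_scripts v1.17.0 v2.2.0` prints nothing.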
v2.0.0
Starting from v2.0.0, the embedded Dataflow is upgraded to v2.0.0, and the task log storage database that Dataflow depends on switches from MongoDB to PostgreSQL. Before upgrading, perform the following migration operation to switch the database (if necessary).
v2.2.0
Starting from v2.2.0, the following changes must be addressed before upgrading:
Dataflow Breaking Changes:
- StatefulSet → Deployment migration: The dataflow workload switches from StatefulSet to Deployment. Before upgrading, back up the csghub_dataflow database and delete the old PVC:
kubectl delete pvc data-<release-name>-dataflow-0 -n csghub
- Pre-upgrade migration Job: A pre-upgrade Helm hook automatically runs the /scripts/*_dataflow_*.sql scripts (idempotent via the _migrations table). The initial migration snapshots and truncates 6 tables (collection_tasks, data_format_tasks, datasources, deletion_status, job, workers). A csghub_dataflow database dump is mandatory before the upgrade.
- Redis and MongoDB removed: The Dataflow service no longer depends on Redis or MongoDB; only PostgreSQL is required. Remove any configuration that references dataflow.redis.* or dataflow.mongo.*.
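Two of the items above lend themselves to a quick check against the backed-up values file. The sketch below builds the old PVC name from a release name and lists redis or mongo keys nested under dataflow. Both function names are hypothetical, and the scan assumes that dataflow is a top-level key in block-style YAML. It is a line-based heuristic, not a YAML parser, so it also reports redis or mongo keys at deeper nesting levels under dataflow.

```shell
#!/usr/bin/env bash
# Helpers for the Dataflow breaking changes (hypothetical names).

# dataflow_pvc_name RELEASE: build the PVC name left behind by the old StatefulSet.
dataflow_pvc_name() {
  printf 'data-%s-dataflow-0\n' "$1"
}

# find_removed_dataflow_keys FILE: print "line: text" for every redis: or mongo:
# key found inside the top-level dataflow: block of a values file.
find_removed_dataflow_keys() {
  awk '
    # Any non-indented, non-comment line starts a new top-level key.
    /^[^ \t#]/ { in_dataflow = ($0 ~ /^dataflow:/) }
    # Report indented redis/mongo keys while inside the dataflow block.
    in_dataflow && /^[ \t]+(redis|mongo):/ { printf "%d: %s\n", NR, $0 }
  ' "$1"
}
```

For example, `kubectl delete pvc "$(dataflow_pvc_name csghub)" -n csghub` deletes the PVC for a release named csghub, and `find_removed_dataflow_keys csghub-values-backup.yaml` lists the lines to review before upgrading.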
Agentichub 0.6.x Config Schema Change:
Agentichub chart is upgraded to v0.6.1 along with csghub v2.2.0. The csgbot.toml configuration has been redesigned:
- [opencsg] block renamed to [csghub]; keys hub-base-url / base-url / mcp-gateway-url → endpoint / portal-endpoint / mcp-gateway-endpoint
- [aigateway] key base-url → endpoint; new temperature field
- New blocks: [environment], [speech-to-text], [sandbox] (with sub-blocks), [observability], [user-facing-errors]
- Removed blocks: [database], [external-docs], [web-search], [csghub-sandbox], [openclaw-sandbox]
- agenticflow-importer image pinned: latest → v1.0.4
Note: If you have custom csgbot.config.webSearch, externalDocs, or logging.disableGeniusChat values, migrate them under the new schema before upgrading.
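To find out whether the backed-up values file contains any of the custom values named in the note, a simple text search is usually enough. The sketch below is a heuristic under stated assumptions: the function name find_legacy_csgbot_keys is hypothetical, and it matches the key names webSearch, externalDocs, and disableGeniusChat anywhere in the file without checking where they are nested.

```shell
#!/usr/bin/env bash
# find_legacy_csgbot_keys FILE (hypothetical helper): print "line:text" for each
# key named in the note above. Returns non-zero when nothing matches.
find_legacy_csgbot_keys() {
  grep -nE '^[[:space:]]*(webSearch|externalDocs|disableGeniusChat):' "$1"
}
```

Running `find_legacy_csgbot_keys csghub-values-backup.yaml` prints the matching lines with their line numbers. No output and a non-zero exit status suggest that none of these keys are set.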
New Services (ee/saas editions):
- llmlog (ee/saas): Logging worker that writes AI Gateway LLM call logs to MinIO
- fedap (saas only): Federation adapter (port 8099)
- trustregistry (saas only): Federation registry (port 8098)
Federation services (fedap, trustregistry) are enabled automatically when global.edition=saas. When global.edition=ee, only llmlog is available, and it is off by default.
Other New Configurations:
- aigateway.moderation.llm.*: optional LLM-based content guard (default model: Qwen/Qwen3Guard-Gen-0.6B)
- server.allowCpuOnGpuNodes: controls CPU workload scheduling on GPU nodes (default: false)
4. Starting the Upgrade
Follow these steps to execute the upgrade in an orderly manner:
- Update Helm Repository:
helm repo update csghub
- Confirm Target Version:
helm search repo csghub/csghub --versions
- Execute Upgrade: By default, this upgrades to the latest version while reusing the backed-up configuration file:
helm upgrade csghub csghub/csghub \
  --namespace csghub \
  -f csghub-values-backup.yaml
To upgrade to a specific version:
helm upgrade csghub csghub/csghub \
  --namespace csghub \
  -f csghub-values-backup.yaml \
  --version 2.0.0
- Wait for Services to be Ready:
# Wait for core CSGHub services
kubectl wait --for=condition=Ready pod -n csghub -l '!job-name'
# Wait for KnativeServing services
kubectl wait --for=condition=Ready pod -n knative-serving -l '!job-name'
# Check all pod statuses
kubectl get pods -n csghub
Tip: You can also use helm status csghub -n csghub to check the deployment status.
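The two variants of the Execute Upgrade step differ only in the --version flag, so the command can be assembled by a small helper that also refuses to proceed without a usable values file. This is a sketch under stated assumptions: the function name upgrade_cmd is hypothetical, it only prints the command rather than running it, and it assumes the values file path contains no spaces.

```shell
#!/usr/bin/env bash
# upgrade_cmd VALUES_FILE [VERSION] (hypothetical helper): print the helm
# upgrade command for review instead of executing it.
upgrade_cmd() {
  local values_file="$1" version="${2:-}"
  # Refuse to build a command around a missing or empty values backup.
  if [ ! -s "$values_file" ]; then
    echo "values file missing or empty: $values_file" >&2
    return 1
  fi
  local cmd="helm upgrade csghub csghub/csghub --namespace csghub -f $values_file"
  # Pin the chart version only when one is given; otherwise Helm uses the latest.
  if [ -n "$version" ]; then
    cmd="$cmd --version $version"
  fi
  echo "$cmd"
}
```

For example, `upgrade_cmd csghub-values-backup.yaml 2.0.0` prints the pinned-version command. Appending Helm's `--dry-run` flag to the printed command previews the rendered upgrade without applying it.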
5. Rollback
Note: If using an external database, please perform manual recovery for the database portion.
If service anomalies or functional failures occur during the upgrade, execute a rollback immediately to restore the stable version:
- View Version History: Identify the target revision number for the rollback:
helm history csghub -n csghub
- Execute Rollback: Roll back to the specified revision (replace revision with the actual number):
helm rollback csghub revision -n csghub
- Restore Database
  - Create a NetworkPolicy: Temporarily block external access to PostgreSQL to avoid data conflicts during recovery:
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-postgresql-access
  namespace: csghub
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: csghub
      app.kubernetes.io/name: csghub
      app.kubernetes.io/service: postgresql
  policyTypes:
  - Ingress
  ingress: []
EOF
  - Terminate Active Connections:
kubectl exec -it csghub-postgresql-0 -n csghub -- su - postgres -c "psql -U csghub -c \"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname NOT IN ('template0','template1') AND pid <> pg_backend_pid();\""
  - Execute Data Import:
# Copy backup into the container
kubectl cp all_dbs.sql csghub/csghub-postgresql-0:/tmp/all_dbs.sql
# Run import
kubectl exec -it csghub-postgresql-0 -n csghub -- su - postgres -c 'psql -U csghub -f /tmp/all_dbs.sql'
  - Delete NetworkPolicy:
kubectl delete netpol block-postgresql-access -n csghub
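For the View Version History step, the revision to roll back to is often the most recent one that Helm marks as superseded, which is the release that was live before the failed upgrade. The sketch below extracts that number from the table output of helm history. The function name previous_revision is hypothetical, and it assumes that each data row starts with the revision number and contains the status word superseded. Review the history manually if earlier rollbacks or failed releases are present.

```shell
#!/usr/bin/env bash
# previous_revision (hypothetical helper): read `helm history` table output on
# stdin and print the highest revision number whose row contains "superseded".
# Prints nothing when no such row exists.
previous_revision() {
  awk '
    # Data rows start with a numeric revision; the header row does not.
    $1 ~ /^[0-9]+$/ && /superseded/ { if ($1 + 0 > max) max = $1 + 0 }
    END { if (max > 0) print max }
  '
}
```

A possible usage is `helm history csghub -n csghub | previous_revision`, with the printed number then passed to helm rollback after a manual check.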
6. Functional Verification
After the upgrade (or rollback) is complete, verify the following core functions:
- The Web UI loads correctly without error messages.
- User login and permission management are functioning normally.
- Core business features (Model Inference, Data Management, etc.) are working.
- Connections to external components (Database, Object Storage, etc.) are stable.