Migrating Dataflow Task Logs
1. Migration Background
Starting with Dataflow v1.17.1, MongoDB is no longer used as the default storage for task logs; it has been replaced by PostgreSQL in order to reduce the number of supported databases and simplify deployment.
- Action Required: If you need to retain historical task logs, follow the migration steps below.
- Optional: If the historical logs are no longer needed, they can be discarded directly.
2. Migration Steps
2.1 Expose MongoDB Service Locally
Expose the MongoDB service through a temporary NodePort so that the migration script can export the task-log data into a SQL file.
Create a temporary Service:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: csghub
    meta.helm.sh/release-namespace: csghub
  labels:
    app.kubernetes.io/instance: csghub
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: dataflow
    app.kubernetes.io/service: mongo
  name: csghub-mongo-export
  namespace: csghub
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: mongo
      port: 27017
      protocol: TCP
      targetPort: 27017
  selector:
    app.kubernetes.io/instance: csghub
    app.kubernetes.io/name: dataflow
    app.kubernetes.io/service: mongo
  sessionAffinity: None
  type: NodePort
EOF
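Kubernetes assigns a random NodePort to this Service. Note it down, since it is needed for the MONGO_URI connection string in the next step. One way to retrieve it (assuming kubectl access to the csghub namespace):

```shell
# Print the NodePort assigned to the temporary MongoDB Service
kubectl get svc csghub-mongo-export -n csghub \
  -o jsonpath='{.spec.ports[0].nodePort}'
```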
2.2 Configure the Migration Environment
The export script is written in Python. Complete the following preparations before running it:
Configure a local pip mirror (optional):
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip config set global.trusted-host pypi.tuna.tsinghua.edu.cn
Install the MongoDB Python driver:
python -m pip install pymongo
Retrieve database credentials:
kubectl get cm csghub-dataflow -o yaml -n csghub | grep MONG
Before running the export, set the MONGO_URI variable in the script to the correct connection string:
MONGO_URI = "mongodb://root:<password_from_configmap>@127.0.0.1:<tmp_service_nodeport>"
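If the password retrieved from the ConfigMap contains URI-reserved characters (such as @, :, or /), it must be percent-encoded before being embedded in MONGO_URI. A minimal sketch using only the Python standard library; the password and port below are placeholders:

```python
from urllib.parse import quote_plus

# Placeholder credentials -- substitute the real password and NodePort
password = "p@ss:word/123"
nodeport = 30017

# Percent-encode the password so reserved characters don't corrupt the URI
mongo_uri = f"mongodb://root:{quote_plus(password)}@127.0.0.1:{nodeport}"
print(mongo_uri)  # -> mongodb://root:p%40ss%3Aword%2F123@127.0.0.1:30017
```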
2.3 Execute Export Operation (MongoDB)
Download the script, update MONGO_URI as described above, and run it:
curl -sLO https://charts.opencsg.com/repository/scripts/python/migrate_mongo_to_sql.py
python migrate_mongo_to_sql.py
Upon completion, a file named mongo_logs_export.sql will be generated.
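For reference, the export conceptually flattens each MongoDB log document into a SQL INSERT statement. The sketch below is purely illustrative: the table name, columns, and escaping rules of the actual migrate_mongo_to_sql.py script may differ.

```python
import json

def sql_literal(value) -> str:
    """Render a Python value as a SQL literal (illustrative escaping only)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    # Nested documents are stored as JSON text
    return "'" + json.dumps(value).replace("'", "''") + "'"

def doc_to_insert(doc: dict, table: str = "task_logs") -> str:
    """Turn one Mongo log document into a single-row INSERT statement."""
    cols = ", ".join(doc)
    vals = ", ".join(sql_literal(v) for v in doc.values())
    return f"INSERT INTO {table} ({cols}) VALUES ({vals});"

print(doc_to_insert({"task_id": 42, "status": "done"}))
# -> INSERT INTO task_logs (task_id, status) VALUES (42, 'done');
```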
2.4 Execute Import Operation (PostgreSQL)
Copy the exported SQL file to the database container:
kubectl cp mongo_logs_export.sql csghub/csghub-postgresql-0:/tmp
Create the logs database:
kubectl exec -it csghub-postgresql-0 -n csghub -- su - postgres -c 'psql -U csghub -d csghub_server -c "create database csghub_dataflow_logs owner csghub_dataflow;"'
Import the exported data into the newly created database:
kubectl exec -it csghub-postgresql-0 -n csghub -- su - postgres -c 'psql -U csghub -d csghub_dataflow_logs -f /tmp/mongo_logs_export.sql'
2.5 Delete the Temporary Service
kubectl delete svc csghub-mongo-export -n csghub
3. Continue the Upgrade
Once the migration is complete, proceed with the upgrade to CSGHub v2.0.0 or Dataflow v1.17.1.