Enflame Chat

Creating a Container Instance

  • Log in to the OpenCSG Compute Platform
  • Create an Enflame instance

Transform Model

  • Download the chatglm3-6b model
cd /data/
git clone --depth 1 https://www.opencsg.com//models/THUDM/chatglm3-6b.git
  • Transform the model
cd chatglm3-6b
python3 /usr/local/gcu/sample/topstransformer/model_demo/chatGLM2/model_parser.py -i ./ -o ./
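If the transform step fails on unexpectedly small weight files, one possible cause is that the clone fetched Git LFS pointer files instead of the real weights. The sketch below checks for that. It assumes the repository stores weights through Git LFS and that they end in `.bin` or `.safetensors`; `is_lfs_pointer` and `check_weights` are hypothetical helper names, not part of any tool used above.

```shell
# Succeed if the file looks like a Git LFS pointer (a small text stub)
# rather than real content. LFS pointer files start with this version line.
is_lfs_pointer() {
  head -c 42 "$1" 2>/dev/null | LC_ALL=C grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Scan a model directory (default: current directory) and report pointer stubs.
# Assumption: weight files use the .bin or .safetensors extension.
check_weights() {
  dir=${1:-.}
  status=0
  for f in "$dir"/*.bin "$dir"/*.safetensors; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    if is_lfs_pointer "$f"; then
      echo "LFS pointer, not real weights: $f"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_weights /data/chatglm3-6b` returns non-zero and lists the affected files if any stubs are found. In that case, installing Git LFS and running `git lfs pull` inside the repository is the usual remedy.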

Start Inference Service

  • Start the service
nohup text-generation-launcher \
  --model-id=/data/chatglm3-6b \
  --num-shard=1 \
  --port=8080 \
  --max-concurrent-requests=1024 \
  --max-input-length=7168 \
  --max-total-tokens=8192 \
  --max-batch-prefill-tokens=8192 \
  --trust-remote-code \
  --max-waiting-tokens=6 \
  --disable-custom-kernels \
  --waiting-served-ratio=0.1 \
  > inference.logs 2>&1 &
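The launcher runs in the background and may need some time to load the model, so requests sent immediately can be refused. A minimal sketch of a readiness poll follows. `wait_until` and `service_ready` are hypothetical helper names, and the `/health` route is assumed from upstream text-generation-inference; this build may behave differently.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Assumption: the server answers GET /health once the model is loaded.
# The port defaults to 8080, matching --port above.
service_ready() {
  curl -sf "http://localhost:${1:-8080}/health"
}
```

For example, `wait_until 60 5 service_ready 8080 || tail -n 50 inference.logs` waits up to about five minutes and prints the end of the log if the service never becomes ready.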

Test

  • Send a streaming request (the prompt means "What is your name?")
curl -s -N --location 'http://localhost:8080/generate_stream' --header 'Content-Type: application/json' --data '{"inputs": "你叫什么名字","parameters": {"max_new_tokens": 256,"top_k": 1}}'
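For scripted checks, a single non-streaming response can be easier to handle than a token stream. The sketch below wraps the same request body in two hypothetical helpers, `build_payload` and `generate`. It assumes the service also exposes the `/generate` route of upstream text-generation-inference, which returns one JSON object with a `generated_text` field.

```shell
# Build the JSON request body from a prompt and a token limit.
# The prompt is inserted verbatim, so it must not contain
# double quotes or backslashes.
build_payload() {
  printf '{"inputs": "%s","parameters": {"max_new_tokens": %d,"top_k": 1}}' "$1" "$2"
}

# Send one non-streaming request; max_new_tokens defaults to 256.
# Assumption: /generate is available on the same port as /generate_stream.
generate() {
  curl -s --location 'http://localhost:8080/generate' \
    --header 'Content-Type: application/json' \
    --data "$(build_payload "$1" "${2:-256}")"
}
```

Calling `generate "Hello" 64` prints the raw JSON response. With the launcher settings above, the prompt and the generated tokens together cannot exceed the `--max-total-tokens` value of 8192.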