# Enflame Chat
## Create a Container Instance

- Log in to the OpenCSG compute platform
- Create an Enflame instance
## Transform the Model

- Download the chatglm3-6b model

```shell
cd /data/
git clone --depth 1 https://www.opencsg.com//models/THUDM/chatglm3-6b.git
```
- Transform the model

```shell
cd chatglm3-6b
python3 /usr/local/gcu/sample/topstransformer/model_demo/chatGLM2/model_parser.py -i ./ -o ./
```
## Start the Inference Service

- Start the service

```shell
nohup text-generation-launcher \
    --model-id=/data/chatglm3-6b \
    --num-shard=1 \
    --port=8080 \
    --max-concurrent-requests=1024 \
    --max-input-length=7168 \
    --max-total-tokens=8192 \
    --max-batch-prefill-tokens=8192 \
    --trust-remote-code \
    --max-waiting-tokens=6 \
    --disable-custom-kernels \
    --waiting-served-ratio=0.1 \
    > inference.logs 2>&1 &
```
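Once the launcher is up, the service can also be called programmatically. Below is a minimal sketch of a Python client for the non-streaming `/generate` endpoint of text-generation-inference, using only the standard library; the URL, timeout, and the `build_payload`/`generate` helper names are this example's own choices, not part of the launcher.

```python
import json
import urllib.request

# Assumed address of the service started above (see --port=8080).
TGI_URL = "http://localhost:8080/generate"

def build_payload(prompt, max_new_tokens=256, top_k=1):
    """Build the JSON body expected by the /generate endpoint."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "top_k": top_k},
    }

def generate(prompt, url=TGI_URL, timeout=60):
    """POST a prompt and return the generated text (requires the running service)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["generated_text"]

if __name__ == "__main__":
    # Only works while the inference service is listening on port 8080.
    print(generate("What is your name?"))
```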
## Test

```shell
curl -s -N --location 'http://localhost:8080/generate_stream' \
    --header 'Content-Type: application/json' \
    --data '{"inputs": "What is your name?", "parameters": {"max_new_tokens": 256, "top_k": 1}}'
```
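The `/generate_stream` endpoint returns server-sent events, one `data:` line per generated token. As a rough sketch of how a client might reassemble the streamed text, the helpers below parse `data:` lines and concatenate the `token.text` fields; the function names are illustrative, and the exact event schema should be checked against the text-generation-inference version in use.

```python
import json

def parse_sse_line(line):
    """Return the JSON payload of one server-sent-events line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

def collect_text(lines):
    """Concatenate token texts from an iterable of SSE lines."""
    pieces = []
    for line in lines:
        event = parse_sse_line(line)
        if event and "token" in event:
            pieces.append(event["token"]["text"])
    return "".join(pieces)
```

Feeding the curl output above through `collect_text` line by line yields the full response as a single string.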