
Configurations for model inference

  1. Obtain the API key from the HUB_SERVER_API_TOKEN variable in the installation script's .env file, and export it together with the hub server's address for use in the commands below, as sketched next.
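A minimal shell sketch for exporting these values; the .env path and the host/port values are assumptions, so adjust them to your installation:

# The .env path is an assumption; point it at your installation's file.
export API_KEY=$(grep '^HUB_SERVER_API_TOKEN=' /path/to/installation/.env | cut -d'=' -f2-)
# Example values; set these to your hub server's actual address and port.
export HUB_SERVER_IP=127.0.0.1
export HUB_SERVER_PORT=8080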
  2. Call the API with curl to add a runtime framework to the model. Replace meta/llama-3.1-8b-instruct with the actual repo ID.
curl -X POST \
  "http://${HUB_SERVER_IP}:${HUB_SERVER_PORT}/api/v1/models/meta/llama-3.1-8b-instruct/runtime_framework?current_user=meta" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "container_port": 8000,
    "enabled": 1,
    "frame_cpu_image": "",
    "frame_image": "nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.2",
    "frame_name": "nim-llama-3.1-8b-instruct",
    "frame_version": "1.1.2",
    "type": 1
  }'
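The JSON response to this call includes the ID of the newly created runtime framework, which step 3 needs. A sketch of capturing it, assuming the response nests the record under a data object with an id field (inspect the actual response before relying on this shape):

# Save the response body of the command above, e.g. by appending: -o response.json
FRAME_ID=$(jq -r '.data.id' response.json)  # .data.id is an assumed field path; verify it
echo "Runtime framework ID: ${FRAME_ID}"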
  3. Retrieve the ID returned in step 2 and replace {id} in the URL below with it (for example, the FRAME_ID captured above), then call the API with curl to associate the model with the runtime framework.
curl -X POST \
  "http://${HUB_SERVER_IP}:${HUB_SERVER_PORT}/api/v1/runtime_framework/{id}?deploy_type=1&current_user=meta" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "models": [
      "meta/llama-3.1-8b-instruct"
    ]
  }'