
Configurations for model inference

  1. Obtain the API key from the HUB_SERVER_API_TOKEN variable in the installation script's .env file, and export it together with the hub server's address for use in the commands below, as sketched next.
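A minimal shell sketch for exporting these values; the .env path and the host/port values are assumptions, so adjust them to your installation:

# The .env path is an assumption; point it at your installation's file.
export API_KEY=$(grep '^HUB_SERVER_API_TOKEN=' /path/to/installation/.env | cut -d'=' -f2-)
# Example values; set these to your hub server's actual address and port.
export HUB_SERVER_IP=127.0.0.1
export HUB_SERVER_PORT=8080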
  2. Call the API with curl to add a runtime framework to the model. Replace meta/llama-3.1-8b-instruct with the actual repo ID.
curl -X POST \
  "http://${HUB_SERVER_IP}:${HUB_SERVER_PORT}/api/v1/models/meta/llama-3.1-8b-instruct/runtime_framework?current_user=meta" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "container_port": 8000,
    "enabled": 1,
    "frame_cpu_image": "",
    "frame_image": "nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.2",
    "frame_name": "nim-llama-3.1-8b-instruct",
    "frame_version": "1.1.2",
    "type": 1
  }'
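The JSON response to this call includes the ID of the newly created runtime framework, which step 3 needs. A sketch of capturing it, assuming the response nests the record under a data object with an id field (inspect the actual response before relying on this shape):

# Save the response body of the command above, e.g. by appending: -o response.json
FRAME_ID=$(jq -r '.data.id' response.json)  # .data.id is an assumed field path; verify it
echo "Runtime framework ID: ${FRAME_ID}"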
  3. Retrieve the ID returned in step 2 and replace {id} in the URL below with it (for example, the FRAME_ID captured above), then call the API with curl to associate the model with the runtime framework.
curl -X POST \
  "http://${HUB_SERVER_IP}:${HUB_SERVER_PORT}/api/v1/runtime_framework/{id}?deploy_type=1&current_user=meta" \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{
    "models": [
      "meta/llama-3.1-8b-instruct"
    ]
  }'