# CLI Command Reference
CSGHub-Lite provides a rich set of command-line tools for managing and running Large Language Models (LLMs) directly in the terminal. Model names generally follow the format `namespace/model_name`, e.g., `Qwen/Qwen3-0.6B-GGUF`.
## Basic Commands
| Command | Description |
|---|---|
| `csghub-lite run <model>` | Pull the model, start the background service, and begin chatting (fully automatic) |
| `csghub-lite chat <model>` | Chat with a locally downloaded model |
| `csghub-lite ps` | List currently running models (loaded in memory) and their keep-alive duration |
| `csghub-lite stop <model>` | Stop and unload the currently running model |
| `csghub-lite serve` | Manually start the API server (handled automatically by the `run` command) |
| `csghub-lite pull <model>` | Download the specified model files from CSGHub |
| `csghub-lite list` / `ls` | List all locally downloaded models |
| `csghub-lite show <model>` | Show detailed information for a specified model (format, size, individual files) |
| `csghub-lite rm <model>` | Remove the specified model's local files |
| `csghub-lite login` | Set the access token for the CSGHub platform (required for private models) |
| `csghub-lite search <query>` | Search for models on the CSGHub platform |
| `csghub-lite uninstall` | Completely remove csghub-lite, llama-server, and related data |
| `csghub-lite --version` | Show the current version |
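Putting the basic commands together, a typical model-management session might look like the following sketch (the search query is illustrative, and the model name reuses the example above):

```shell
# Search the CSGHub platform for candidate models
csghub-lite search qwen

# Download a model without starting a chat session
csghub-lite pull Qwen/Qwen3-0.6B-GGUF

# Inspect what is available locally
csghub-lite list
csghub-lite show Qwen/Qwen3-0.6B-GGUF

# Remove the local files when the model is no longer needed
csghub-lite rm Qwen/Qwen3-0.6B-GGUF
```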
## Configuration Commands
| Command | Description |
|---|---|
| `csghub-lite config set <key> <value>` | Set a specific configuration item |
| `csghub-lite config get <key>` | Get the value of a specified configuration item |
| `csghub-lite config show` | View all current configuration values |
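A minimal configuration round-trip might look like this. Note that the available key names are not listed in this reference, so `some_key` below is only a placeholder, not a documented key:

```shell
# Inspect all current configuration values
csghub-lite config show

# Read and update a single item ("some_key" is a hypothetical placeholder)
csghub-lite config get some_key
csghub-lite config set some_key new_value
```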
## Difference Between `run` and `chat`
- `run` — the most automated option. If the model is not found locally, it is downloaded automatically; the background server is started and an interactive chat session begins.
- `chat` — for chatting with a model that is already downloaded. To use a custom system prompt, pass the `--system` parameter.
```shell
# Download, run, and chat (fully automatic)
csghub-lite run Qwen/Qwen3-0.6B-GGUF

# After exiting chat, the model stays in memory for an instant restart next time
csghub-lite run Qwen/Qwen3-0.6B-GGUF

# Start a chat with a custom system prompt for an already downloaded model
csghub-lite chat Qwen/Qwen3-0.6B-GGUF --system "You are a professional coding assistant."
```
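Because the model stays loaded in memory after a chat session ends, you can check and release it explicitly with the `ps` and `stop` commands from the table above:

```shell
# See which models are currently loaded and their keep-alive duration
csghub-lite ps

# Unload the model to free memory when you are done
csghub-lite stop Qwen/Qwen3-0.6B-GGUF
```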