CSGHub-Lite Introduction

CSGHub-Lite is a lightweight tool for running large language models locally, powered by models from the CSGHub platform.

Inspired by Ollama, csghub-lite provides model download, local inference, interactive chat, and an OpenAI-compatible REST API — all from a single binary.

Features

  • One command to start — csghub-lite run downloads, loads, and chats
  • Model keep-alive — models stay loaded after exit (default 5 min), instant reconnect
  • Auto-start server — background API server starts automatically, no manual setup
  • Model download from CSGHub platform (hub.opencsg.com or private deployments)
  • Local inference via llama.cpp (GGUF models, SafeTensors auto-converted)
  • Interactive chat with streaming output
  • REST API compatible with Ollama's API format
  • Cross-platform — macOS, Linux, Windows
  • Resume downloads — interrupted downloads resume where they left off
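Because the REST API follows Ollama's API format, a client can build requests the same way it would for Ollama. A minimal sketch of an /api/chat request body, assuming Ollama's documented chat schema; the base URL and model id below are placeholders, not values from this project:

```python
import json

# Placeholder base URL: Ollama's default port is 11434; check which port
# the csghub-lite background server actually listens on.
BASE_URL = "http://localhost:11434"

def chat_payload(model: str, prompt: str, stream: bool = False) -> str:
    """Serialize a chat request body in Ollama's /api/chat format."""
    body = {
        "model": model,  # placeholder model id, e.g. one downloaded from CSGHub
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return json.dumps(body)

# Example: POST this string to f"{BASE_URL}/api/chat" with any HTTP client.
payload = chat_payload("your-model-id", "Hello!")
```

Setting "stream": true instead returns incremental chunks, matching the streaming chat behavior described above.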

Model Formats

Format        Download   Inference
GGUF          Yes        Yes (via llama.cpp)
SafeTensors   Yes        Yes (auto-converted to GGUF)

SafeTensors checkpoints are converted once, using the bundled llama.cpp convert_hf_to_gguf.py script and the system Python (PyTorch is not shipped inside the release binary). Install the required packages once:

pip3 install torch safetensors gguf transformers

Use Python 3.10+ on PATH (Windows: python or python3). Some models need extra packages (for example sentencepiece); see the conversion instructions for the full list and for troubleshooting (gguf version mismatches, the optional CSGHUB_LITE_CONVERTER_URL).
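Before attempting a conversion, it can help to verify that the system Python can actually import the packages listed above. A small self-check sketch (the package list mirrors the pip command; extend it with model-specific extras such as sentencepiece as needed):

```python
import importlib.util

# Import names of the packages required for SafeTensors -> GGUF conversion.
REQUIRED = ["torch", "safetensors", "gguf", "transformers"]

def missing_packages(names=REQUIRED):
    """Return the subset of names that the current Python cannot import."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# If this prints an empty list, the conversion prerequisites are installed.
print(missing_packages())
```

Running this with the same interpreter that is first on PATH catches the common case where pip3 installed into a different Python than the one csghub-lite invokes.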