Installation Summary
Version History:
- Starting from v0.9.0, CSGHub no longer supports Gitea as a Git backend.
- Starting from v1.1.0, the Temporal component is added as an asynchronous/scheduled task executor.
- Starting from v1.3.0, Gitea is removed from the docker-compose and Helm chart installers.
- Starting from v1.6.0, Space Builder is removed; its function is taken over by Runner.
- Starting from v1.7.0, Starship is integrated into CSGHub.
- Starting from v1.8.0, the new Notification service is added.
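When planning an upgrade, it can help to list which of these milestones fall between the installed version and the target version. The following is a minimal sketch, not part of the CSGHub tooling: the `MILESTONES` table restates the list above, and the helper names `parse_version` and `changes_between` are hypothetical.

```python
# Milestones restated from the Version History list above.
MILESTONES = [
    ("v0.9.0", "Gitea is no longer supported as a Git backend."),
    ("v1.1.0", "Temporal is added as the asynchronous/scheduled task executor."),
    ("v1.3.0", "Gitea is removed from the docker-compose and Helm chart installers."),
    ("v1.6.0", "Space Builder is removed; Runner takes over its function."),
    ("v1.7.0", "Starship is integrated into CSGHub."),
    ("v1.8.0", "The Notification service is added."),
]


def parse_version(text):
    """Turn 'v1.6.0' into (1, 6, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def changes_between(current, target):
    """Return milestones introduced after `current`, up to and including `target`."""
    low, high = parse_version(current), parse_version(target)
    return [
        (version, note)
        for version, note in MILESTONES
        if low < parse_version(version) <= high
    ]


if __name__ == "__main__":
    # Example: what changes when moving from v1.0.0 to v1.7.0?
    for version, note in changes_between("v1.0.0", "v1.7.0"):
        print(version, note)
```

Comparing integer tuples rather than strings keeps a version such as `v1.10.0` ordered after `v1.8.0`.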
Introduction
CSGHub is an open source, trusted asset management platform for large models. It helps users govern the assets (datasets, model files, code, and so on) involved in the life cycle of LLMs and their applications. With CSGHub, users can upload, download, store, verify, and distribute these assets through a web interface, the Git command line, or a natural language chatbot. The platform also provides microservice submodules and standardized APIs so that users can integrate it with their own systems.
CSGHub is committed to providing an asset management platform that is designed natively for large models and can be deployed privately and run offline. It offers functionality similar to a private Hugging Face, managing LLM assets in much the same way that OpenStack Glance manages virtual machine images, Harbor manages container images, and Sonatype Nexus manages artifacts.
For an introduction to CSGHub, please refer to:
- Portal: https://github.com/OpenCSGs/csghub
- Server: https://github.com/OpenCSGs/csghub-server
- Installer: https://github.com/OpenCSGs/csghub-installer
Deployment methods
This project documents the installation methods for CSGHub.
The following installation methods are currently available:
- Docker
- Helm Chart
- VirtualBox (not recommended)
Service Introduction
The CSGHub project consists of multiple components, each of which has specific responsibilities, and together they form an efficient and scalable system architecture. The following is a brief introduction to each component:
- csghub_portal: Responsible for the management and display of the user interface, providing an intuitive interface for users to interact with the system.
- csghub_server: Provides the main service logic and APIs, and handles requests sent by clients.
- csghub_user: Manages user identity and authentication processes, ensures the security and privacy of user information, and supports user registration, login, and permission management.
- csghub_proxy: Responsible for forwarding requests related to deployment instances, such as forwarding space application operation requests to Knative Serving services.
- csghub_accounting: Billing system that records the costs incurred by resource usage.
- csghub_mirror: Provides repository synchronization services, syncing models and datasets from opencsg.com to the local instance.
- csghub_runner: Responsible for deploying and managing application instances in the Kubernetes cluster to ensure fast building and continuous delivery of applications.
- csghub_aigateway: AI Gateway is an intelligent middle layer that manages and optimizes access to AI services, unifying interfaces, routing requests, ensuring security, and controlling costs.
- csghub_dataflow [ee only]: Responsible for dataset processing and for providing data for model training.
- csghub_dataviewer: Lets users quickly preview datasets on the web page.
- csghub_moderation: Detects sensitive content.
- csghub_notification: Responsible for in-site and external message notifications.
- csghub_watcher [helm only]: Monitors all Secret and ConfigMap changes in CSGHub and updates the dependent resources.
- gitaly: Serves as the Git storage backend, providing high-performance Git operations for fast and efficient version control.
- gitlab-shell: Provides Git access over SSH, securing Git operations and the data in transit.
- nats: Implements messaging and event-driven architecture between microservices, provides efficient asynchronous communication capabilities, and enhances the decoupling and response speed of the system.
- minio: Provides high-performance local object storage services.
- postgresql: Stores metadata of each component and provides efficient data query and update capabilities.
- registry: Provides container image repository services to facilitate storage, management, and distribution of container images.
- redis: Provides high-performance cache and data storage services.
- casdoor: Responsible for user identity authentication and authorization, and cooperates with csghub_user to complete user management.
- coredns: Used to resolve CSGHub's internal DNS requests, such as the internal domain name resolution used in Knative Serving.
- temporal: Asynchronous task management service, used to execute time-consuming tasks, such as resource synchronization tasks.
- fluentd: A flexible log collection and processing framework that aggregates and forwards application logs for real-time monitoring, analysis, and troubleshooting.
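The `[ee only]` and `[helm only]` tags in the list above can be read as filters on which components to expect in a given installation. The sketch below illustrates this under two assumptions that the list itself does not state: that `[ee only]` refers to the enterprise edition, and that every untagged component is present in both Docker and Helm installs. The function name `components_for` is hypothetical, and the VirtualBox method is not modeled.

```python
# Component names copied from the list above.
NAMES = """
csghub_portal csghub_server csghub_user csghub_proxy csghub_accounting
csghub_mirror csghub_runner csghub_aigateway csghub_dataflow
csghub_dataviewer csghub_moderation csghub_notification csghub_watcher
gitaly gitlab-shell nats minio postgresql registry redis casdoor coredns
temporal fluentd
""".split()

EE_ONLY = {"csghub_dataflow"}    # tagged "[ee only]"
HELM_ONLY = {"csghub_watcher"}   # tagged "[helm only]"


def components_for(installer, enterprise=False):
    """Return the component names expected for 'docker' or 'helm'.

    Assumption: untagged components are part of every install.
    """
    if installer not in ("docker", "helm"):
        raise ValueError(f"unknown installer: {installer!r}")
    return [
        name
        for name in NAMES
        if (enterprise or name not in EE_ONLY)
        and (installer == "helm" or name not in HELM_ONLY)
    ]


if __name__ == "__main__":
    print(len(components_for("docker")), "components in a Docker install")
    print(len(components_for("helm", enterprise=True)), "components in a Helm EE install")
```

A list like this can serve as a checklist when comparing running containers or pods against what the installer is expected to deploy.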
Architecture Diagram
---
config:
  layout: elk
  look: neo
  theme: mc
---
flowchart LR
subgraph Clients["Clients"]
Browser(("Browser"))
Git(("Git"))
end
subgraph Kubernetes["Kubernetes"]
KnativeServing["KnativeServing"]
Argo["Argo Workflow"]
LeaderWorkSet["LeaderWorkerSet"]
end
subgraph Infrastructure["Infrastructure"]
PostgreSQL["PostgreSQL"]
Redis["Redis Cache"]
ObjectStorage["Minio"]
end
subgraph Deployment["Deployment Tasks"]
RProxy[["RProxy"]]
Runner[["Runner"]]
Registry["Registry"]
CoreDNS["CoreDNS"]
end
subgraph Asynchronous["Asynchronous Tasks"]
Nats["Nats"]
Temporal["Temporal"]
end
subgraph Mirror["Mirror"]
SyncServer[["SyncServer"]]
MirrorClient[["Mirror"]]
end
subgraph CSGHub-SRV["CSGHub SubServices"]
Portal["Portal"]
Server["Server"]
AIGateway[["AIGateway"]]
Dataviewer[["Dataviewer"]]
Moderation[["Moderation"]]
Accounting[["Accounting"]]
Notification[["Notification"]]
User[["User"]]
Mirror
Deployment
end
subgraph CSGHub["CSGHub Architecture"]
Nginx["Nginx"]
Casdoor["Casdoor"]
Infrastructure
CSGHub-SRV
Asynchronous
Gitaly("Gitaly")
Gitlab-Shell["Gitlab-Shell"]
Dataflow["Dataflow"]
end
Browser -- TCP 80,443 --> Nginx
Git -- TCP 80,443 --> Nginx
Git -- TCP 22 --> Gitlab-Shell
Nginx -- LoadBalancer / NodePort --> KnativeServing
Nginx -- TCP 8090 --> Portal
Nginx -- TCP 8080 --> Server & Temporal
Nginx -- TCP 8083 --> RProxy
Nginx -- TCP 8000 --> Casdoor
Nginx -- TCP 9000 --> ObjectStorage
Portal -- TCP 8080 --> Server
Portal -- TCP 5432 --> PostgreSQL
Portal -- TCP 9000 --> ObjectStorage
Server -- TCP 8086 --> Accounting
Server -- TCP 8095 --> Notification
Server -- TCP 8084 --> AIGateway
Server -- TCP 8093 --> Dataviewer
Server -- TCP 8089 --> Moderation
Server -- TCP 8083 --> Runner
Server -- TCP 7233 --> Temporal
Server -- TCP 5000 --> Registry
Server -- TCP 4222 --> Nats
Server -- TCP 8080 --> User & Dataflow
Server -- TCP 8075 --> Gitaly
Server -- TCP 5432 --> PostgreSQL
Server -- TCP 6379 --> Redis
Server -- TCP 9000 --> ObjectStorage
User -- TCP 7233 --> Temporal
User -- TCP 8000 --> Casdoor
User -- TCP 4222 --> Nats
User -- TCP 5432 --> PostgreSQL
MirrorClient -- TCP 7233 --> Temporal
MirrorClient -- TCP 6379 --> Redis
MirrorClient -- TCP 5432 --> PostgreSQL
Moderation -- TCP 7233 --> Temporal
RProxy -- UDP 53 --> CoreDNS
RProxy -- TCP 80 --> Nginx
RProxy -- TCP 6379 --> Redis
RProxy -- TCP 5432 --> PostgreSQL
Dataviewer -- TCP 7233 --> Temporal
Dataviewer -- TCP 5432 --> PostgreSQL
Dataviewer -- TCP 8075 --> Gitaly
Dataviewer -- TCP 9000 --> ObjectStorage
Accounting -- TCP 4222 --> Nats
Accounting -- TCP 5432 --> PostgreSQL
Notification -- TCP 4222 --> Nats
Notification -- TCP 5432 --> PostgreSQL
Registry -- TCP 9000 --> ObjectStorage
Runner -- TCP 6443 --> Kubernetes
Runner -- TCP 6379 --> Redis
Runner -- TCP 5432 --> PostgreSQL
SyncServer -- TCP 8080 --> Server
SyncServer -- TCP 8086 --> Accounting
SyncServer -- TCP 5432 --> PostgreSQL
Gitlab-Shell -- TCP 8075 --> Gitaly
Dataflow -- TCP 5432 --> PostgreSQL
AIGateway -- TCP 5432 --> PostgreSQL
PostgreSQL@{ shape: db}
Redis@{ shape: db}
ObjectStorage@{ shape: disk}
Nats@{ shape: h-cyl}
Dataflow@{ shape: h-cyl}
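The edge lines in the diagram above follow a regular `source -- label --> target` pattern, so they can be turned into a small connection table, for example to derive firewall rules or to see which services connect to a given dependency. The sketch below is an illustration only: it parses a handful of edge lines copied from the diagram, and the helper names `parse_edges` and `clients_of` are hypothetical. It handles only the edge forms that appear in this diagram, not Mermaid syntax in general.

```python
import re

# A few edge lines copied verbatim from the diagram above.
EDGE_LINES = """
Browser -- TCP 80,443 --> Nginx
Nginx -- LoadBalancer / NodePort --> KnativeServing
Nginx -- TCP 8080 --> Server & Temporal
Server -- TCP 8080 --> User & Dataflow
Server -- TCP 5432 --> PostgreSQL
User -- TCP 5432 --> PostgreSQL
RProxy -- UDP 53 --> CoreDNS
"""

EDGE_RE = re.compile(r"^(?P<src>\S+) -- (?P<label>.+?) --> (?P<dst>.+)$")
PORT_RE = re.compile(r"^(?P<proto>TCP|UDP) (?P<ports>\d+(?:,\d+)*)$")


def parse_edges(text):
    """Yield (source, protocol, port, target) for each edge line.

    Labels that are not '<TCP|UDP> <ports>' yield protocol and port as None.
    """
    for line in text.strip().splitlines():
        match = EDGE_RE.match(line.strip())
        if not match:
            continue  # not an edge line (subgraph, node, shape, ...)
        targets = [t.strip() for t in match["dst"].split("&")]
        port_match = PORT_RE.match(match["label"].strip())
        if port_match:
            pairs = [
                (port_match["proto"], int(port))
                for port in port_match["ports"].split(",")
            ]
        else:
            pairs = [(None, None)]
        for target in targets:
            for proto, port in pairs:
                yield match["src"], proto, port, target


def clients_of(edges, target):
    """Return the sorted names of the sources that connect to `target`."""
    return sorted({src for src, _, _, dst in edges if dst == target})


if __name__ == "__main__":
    edges = list(parse_edges(EDGE_LINES))
    for edge in edges:
        print(edge)
    print("PostgreSQL clients:", clients_of(edges, "PostgreSQL"))
```

Feeding the full set of edge lines into `parse_edges` instead of the sample gives the complete table; lines that are not edges (subgraphs, node definitions, shape annotations) are skipped.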