Core Capabilities
AI Gateway is built around the core governance needs of enterprise AI services, offering the following seven key capabilities:
1. Unified AI API Access
AI Gateway is fully compatible with the OpenAI API protocol, providing a single unified entry point for applications and abstracting away the differences between model services.
Regardless of whether the underlying infrastructure consists of:
- Enterprise self-hosted inference services (e.g., vLLM, SGLang, TGI)
- Third-party SaaS model providers
- Multiple model capabilities from different sources
Business systems only need to integrate with one API to make unified calls across all of them.
The currently supported AI capability types include:
| AI Capability | API Endpoint | Description |
|---|---|---|
| Text Generation | /v1/chat/completions | Streaming and non-streaming supported |
| Embedding | /v1/embeddings | Text vectorization |
| Text-to-Image | /v1/images/generations | Generate images from text |
| Speech-to-Text | /v1/audio/transcriptions | Audio transcription |
| Video Generation | /v1/videos | Text-to-video / image-to-video |
| MCP Proxy | /v1/mcp/* | MCP service forwarding |
| Agent Proxy | /v1/agent/:type/* | Agent service proxy |
| Sandbox Proxy | /v1/sandboxes/:name/* | Sandbox environment proxy |
2. Multi-Model and Multi-Provider Unified Scheduling
AI Gateway supports configuring multiple upstream providers for the same model and distributes traffic through unified routing logic. Supported routing strategies include:
- Session-level sticky routing: Ensures continuity across multi-turn conversations
- Weighted round-robin load balancing: Distributes traffic across instances by weight
- Health-based dynamic routing: Automatically removes unhealthy nodes from the routing pool
- Automatic fault isolation: Triggers circuit breaking and removes failed nodes from rotation
Even when underlying models come from different providers or deployment environments, AI Gateway ensures stable service delivery through a single entry point.
3. Authentication and Quota Management
AI Gateway provides unified authentication and access control. All requests can be authenticated using the standard OpenAI-compatible Authorization: Bearer header, with support for:
- Access token validation
- Call quota enforcement
- Token usage limits
- TPM (Tokens Per Minute) rate limiting
- Separate tracking of input and output tokens
This gives enterprises fine-grained control over AI resource consumption.
4. Content Safety Inspection
AI Gateway includes built-in content safety inspection that applies uniformly to both user input and model-generated output. Features include:
- Real-time streaming inspection: Safety checks run in parallel with streaming output
- Full non-streaming inspection: Complete review of request and response payloads
- Allowlist bypass: Trusted sources can be configured to skip inspection
This approach maintains safety coverage while minimizing additional latency.
5. High Availability and Automatic Failover
To ensure AI service reliability, AI Gateway provides a complete suite of health checking and circuit breaking mechanisms, including:
- Active upstream health checks
- Automatic circuit breaking for unhealthy nodes
- Automatic isolation of failed nodes
- Request-level automatic failover
When a model service becomes unavailable, the system automatically switches to another available provider, reducing the risk of business disruption.
6. Request Logging and Data Retention
AI Gateway captures the full request and response lifecycle for each model call, including:
- Input prompts
- Tool call details
- Streaming output content
These logs can be used for:
- Auditing and troubleshooting
- Model quality analysis
- Data collection for fine-tuning and retraining
- Message queue consumption and downstream analytics
This helps enterprises progressively build their own AI data assets over time.
7. Usage Statistics and Billing
AI Gateway includes unified usage tracking that automatically measures token consumption and call volume across different AI capability types:
| Category | Description |
|---|---|
| Chat | Input / output token counts for text generation |
| Embedding | Input token counts for vectorization requests |
| Audio | Duration and call count for speech transcription |
| Image | Call count and resolution tier for image generation |
| Video | Call count for video generation |
This enables unified metering and cost management, providing the data foundation for internal cost allocation and commercial operations.