Core Capabilities

AI Gateway is built around the core governance needs of enterprise AI services, offering the following seven key capabilities:

1. Unified AI API Access

AI Gateway is fully compatible with the OpenAI API protocol, providing a single unified entry point for applications and abstracting away the differences between model services.

Regardless of whether the underlying infrastructure consists of:

Enterprise self-hosted inference services (e.g., vLLM, SGLang, TGI)
Third-party SaaS model providers
Multiple model capabilities from different sources

Business systems only need to integrate with one API to make unified calls across all of them.

The currently supported AI capability types include:

AI Capability	API Endpoint	Description
Text Generation	`/v1/chat/completions`	Streaming and non-streaming supported
Embedding	`/v1/embeddings`	Text vectorization
Text-to-Image	`/v1/images/generations`	Generate images from text
Speech-to-Text	`/v1/audio/transcriptions`	Audio transcription
Video Generation	`/v1/videos`	Text-to-video / image-to-video
MCP Proxy	`/v1/mcp/*`	MCP service forwarding
Agent Proxy	`/v1/agent/:type/*`	Agent service proxy
Sandbox Proxy	`/v1/sandboxes/:name/*`	Sandbox environment proxy

2. Multi-Model and Multi-Provider Unified Scheduling

AI Gateway supports configuring multiple upstream providers for the same model and distributes traffic through unified routing logic. Supported routing strategies include:

Session-level sticky routing: Ensures continuity across multi-turn conversations
Weighted round-robin load balancing: Distributes traffic across instances by weight
Health-based dynamic routing: Automatically removes unhealthy nodes from the routing pool
Automatic fault isolation: Triggers circuit breaking and removes failed nodes from rotation

Even when underlying models come from different providers or deployment environments, AI Gateway ensures stable service delivery through a single entry point.

3. Authentication and Quota Management

AI Gateway provides unified authentication and access control. All requests can be authenticated using the standard OpenAI-compatible Authorization: Bearer header, with support for:

Access token validation
Call quota enforcement
Token usage limits
TPM (Tokens Per Minute) rate limiting
Separate tracking of input and output tokens

This gives enterprises fine-grained control over AI resource consumption.

4. Content Safety Inspection

AI Gateway includes built-in content safety inspection that applies uniformly to both user input and model-generated output. Features include:

Real-time streaming inspection: Safety checks run in parallel with streaming output
Full non-streaming inspection: Complete review of request and response payloads
Allowlist bypass: Trusted sources can be configured to skip inspection

This approach maintains safety coverage while minimizing additional latency.

5. High Availability and Automatic Failover

To ensure AI service reliability, AI Gateway provides a complete suite of health checking and circuit breaking mechanisms, including:

Active upstream health checks
Automatic circuit breaking for unhealthy nodes
Automatic isolation of failed nodes
Request-level automatic failover

When a model service becomes unavailable, the system automatically switches to another available provider, reducing the risk of business disruption.

6. Request Logging and Data Retention

AI Gateway captures the full request and response lifecycle for each model call, including:

Input prompts
Tool call details
Streaming output content

These logs can be used for:

Auditing and troubleshooting
Model quality analysis
Data collection for fine-tuning and retraining
Message queue consumption and downstream analytics

This helps enterprises progressively build their own AI data assets over time.

7. Usage Statistics and Billing

AI Gateway includes unified usage tracking that automatically measures token consumption and call volume across different AI capability types:

Category	Description
Chat	Input / output token counts for text generation
Embedding	Input token counts for vectorization requests
Audio	Duration and call count for speech transcription
Image	Call count and resolution tier for image generation
Video	Call count for video generation

This enables unified metering and cost management, providing the data foundation for internal cost allocation and commercial operations.

1. Unified AI API Access​

2. Multi-Model and Multi-Provider Unified Scheduling​

3. Authentication and Quota Management​

4. Content Safety Inspection​

5. High Availability and Automatic Failover​

6. Request Logging and Data Retention​

7. Usage Statistics and Billing​