Skip to main content

Core Capabilities

AI Gateway is built around the core governance needs of enterprise AI services, offering the following seven key capabilities:

1. Unified AI API Access

AI Gateway is fully compatible with the OpenAI API protocol, providing a single unified entry point for applications and abstracting away the differences between model services.

Regardless of whether the underlying infrastructure consists of:

  • Enterprise self-hosted inference services (e.g., vLLM, SGLang, TGI)
  • Third-party SaaS model providers
  • Multiple model capabilities from different sources

Business systems only need to integrate with one API to make unified calls across all of them.

The currently supported AI capability types include:

AI CapabilityAPI EndpointDescription
Text Generation/v1/chat/completionsStreaming and non-streaming supported
Embedding/v1/embeddingsText vectorization
Text-to-Image/v1/images/generationsGenerate images from text
Speech-to-Text/v1/audio/transcriptionsAudio transcription
Video Generation/v1/videosText-to-video / image-to-video
MCP Proxy/v1/mcp/*MCP service forwarding
Agent Proxy/v1/agent/:type/*Agent service proxy
Sandbox Proxy/v1/sandboxes/:name/*Sandbox environment proxy

2. Multi-Model and Multi-Provider Unified Scheduling

AI Gateway supports configuring multiple upstream providers for the same model and distributes traffic through unified routing logic. Supported routing strategies include:

  • Session-level sticky routing: Ensures continuity across multi-turn conversations
  • Weighted round-robin load balancing: Distributes traffic across instances by weight
  • Health-based dynamic routing: Automatically removes unhealthy nodes from the routing pool
  • Automatic fault isolation: Triggers circuit breaking and removes failed nodes from rotation

Even when underlying models come from different providers or deployment environments, AI Gateway ensures stable service delivery through a single entry point.

3. Authentication and Quota Management

AI Gateway provides unified authentication and access control. All requests can be authenticated using the standard OpenAI-compatible Authorization: Bearer header, with support for:

  • Access token validation
  • Call quota enforcement
  • Token usage limits
  • TPM (Tokens Per Minute) rate limiting
  • Separate tracking of input and output tokens

This gives enterprises fine-grained control over AI resource consumption.

4. Content Safety Inspection

AI Gateway includes built-in content safety inspection that applies uniformly to both user input and model-generated output. Features include:

  • Real-time streaming inspection: Safety checks run in parallel with streaming output
  • Full non-streaming inspection: Complete review of request and response payloads
  • Allowlist bypass: Trusted sources can be configured to skip inspection

This approach maintains safety coverage while minimizing additional latency.

5. High Availability and Automatic Failover

To ensure AI service reliability, AI Gateway provides a complete suite of health checking and circuit breaking mechanisms, including:

  • Active upstream health checks
  • Automatic circuit breaking for unhealthy nodes
  • Automatic isolation of failed nodes
  • Request-level automatic failover

When a model service becomes unavailable, the system automatically switches to another available provider, reducing the risk of business disruption.

6. Request Logging and Data Retention

AI Gateway captures the full request and response lifecycle for each model call, including:

  • Input prompts
  • Tool call details
  • Streaming output content

These logs can be used for:

  • Auditing and troubleshooting
  • Model quality analysis
  • Data collection for fine-tuning and retraining
  • Message queue consumption and downstream analytics

This helps enterprises progressively build their own AI data assets over time.

7. Usage Statistics and Billing

AI Gateway includes unified usage tracking that automatically measures token consumption and call volume across different AI capability types:

CategoryDescription
ChatInput / output token counts for text generation
EmbeddingInput token counts for vectorization requests
AudioDuration and call count for speech transcription
ImageCall count and resolution tier for image generation
VideoCall count for video generation

This enables unified metering and cost management, providing the data foundation for internal cost allocation and commercial operations.