One Platform
All Your AI Inference Needs

Run Powerful AI Models Faster, Smarter, at Any Scale, with Predictable Costs

SiliconFlow AI Cloud

Pay all your Attention

to Build, to Explore, to Create

Turning AI Ambition into Action

Coding

Code understanding, code generation, inline fixes, real-time autocomplete, structured edits, and syntax-safe suggestions

Agent

Multi-step reasoning, planning, tool use, and workflow execution, enabling agentic systems to handle complex tasks

RAG

Retrieving relevant information from knowledge bases, enabling accurate, real-time responses

Content Generation

Text, image, and video generation; social media content creation; analytical report generation

AI Assistants

Workflows, multi-agent systems, customer support bots, document review, data analysis

Search

Query understanding, long-context summarization, real-time answers, personalized recommendations, actionable insights delivery

AI Models

High-Speed Inference for

Text, Image, Video, and Beyond

One API for All Open and Commercial LLMs & Multimodal Models

| Provider | Type | Model | Released | Total Context | Max Output | Input ($/M tokens) | Output ($/M tokens) |
|----------|------|-------|----------|---------------|------------|--------------------|---------------------|
| DeepSeek | chat | DeepSeek-V4-Pro | Apr 24, 2026 | 1049K | 393K | 1.74 | 3.48 |
| DeepSeek | chat | DeepSeek-V4-Flash | Apr 24, 2026 | 1049K | 393K | 0.14 | 0.28 |
| Moonshot AI | chat | Kimi-K2.6 | Apr 21, 2026 | 262K | 262K | 0.9 | 4.0 |
| Tencent | chat | Hy3-preview | Apr 7, 2026 | 131K | 262K | 0.066 | 0.26 |
| Z.ai | chat | GLM-5.1 | Apr 3, 2026 | 205K | 131K | 1.4 | 4.4 |
| Qwen | chat | Qwen3.6-35B-A3B | Apr 17, 2026 | 262K | 262K | 0.2 | 1.6 |
| Qwen | chat | Qwen3.6-27B | Apr 23, 2026 | 262K | 262K | 0.3 | 3.2 |
| Z.ai | chat | GLM-5V-Turbo | Mar 30, 2026 | 205K | 131K | 1.2 | 4.0 |
| Qwen | chat | Qwen3.5-397B-A17B | Apr 24, 2026 | 262K | 262K | 0.39 | 2.34 |
| Qwen | chat | Qwen3.5-122B-A10B | Apr 24, 2026 | 262K | 262K | 0.26 | 2.08 |
| Qwen | chat | Qwen3.5-35B-A3B | Feb 25, 2026 | 262K | 262K | 0.24 | 1.8 |
| Qwen | chat | Qwen3.5-27B | Apr 24, 2026 | 262K | 262K | 0.25 | 2.0 |
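Billing is quoted per million tokens, with separate input and output rates. As a rough sketch of how a single request cost works out (rates taken from the catalog above; the token counts are hypothetical):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars, with rates quoted per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: DeepSeek-V4-Flash rates from the catalog above
# ($0.14/M input, $0.28/M output); token counts are made up.
cost = request_cost(input_tokens=12_000, output_tokens=1_500,
                    input_rate=0.14, output_rate=0.28)
print(f"${cost:.6f}")  # → $0.002100
```

The same arithmetic scales linearly: at these rates, a million such requests would cost about $2,100.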

products

Flexible Deployment Options,

Built for Every Use Case

Run models serverlessly, on dedicated endpoints, or bring your own setup.

Serverless

Run any model instantly: no setup, one API call, pay-per-use.

Fine-tuning

Customize powerful models for your use case, with one-click deployment.

Reserved GPUs

Guaranteed GPU capacity for stable performance and predictable billing.

Elastic GPUs

Flexible FaaS deployment with reliable and scalable inference.

AI Gateway

Unified access with smart routing, rate limits, and cost control.

Train & Fine-Tune

Data access & processing, model training, performance tuning ...

Inference & Deployment

Self-developed model inference engine, end-to-end optimization ...

High-performance GPUs

NVIDIA H100 / H200, AMD MI300, RTX 4090 …

advantage

Built for What Developers

Really Care About

Speed, accuracy, reliability, and fair pricing—no trade-offs.

Speed

Blazing-fast inference for both language and multimodal models.

Flexibility

Serverless, dedicated, or custom—run models your way.

Efficiency

Higher throughput, lower latency, and better price.

Privacy

No data stored, ever. Your models stay yours.

Control

Fine-tune, deploy, and scale your models your way—no infrastructure headaches, no lock-in.

Simplicity

One API for all models, fully OpenAI-compatible.
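Because the API follows the OpenAI request shape, migrating an existing client is typically just a base-URL and key change. A minimal sketch using only the Python standard library; the base URL here is an assumption for illustration (check the platform docs for the real endpoint), and the model name is taken from the catalog above:

```python
import json
import os
import urllib.request

BASE_URL = "https://api.siliconflow.com/v1"  # hypothetical; verify in the docs
MODEL = "DeepSeek-V4-Flash"                  # from the model catalog above

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def send(payload: dict) -> dict:
    """POST the payload to the chat-completions endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['SILICONFLOW_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a SILICONFLOW_API_KEY environment variable):
# reply = send(build_chat_request("Say hello."))
# print(reply["choices"][0]["message"]["content"])
```

Any OpenAI-compatible SDK should work the same way by pointing its `base_url` at the platform and swapping in the model name.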

FAQ

Frequently Asked Questions

What types of models can I deploy on your platform?

How does your pricing structure work?

Can I customize the models to fit my specific needs?

What kind of support do you offer for developers?

How do you ensure the performance and reliability of your APIs?

Is your platform compatible with OpenAI standards?

Ready to accelerate your AI development?
