LLM API 提供商

Nvidia

模型名	延迟	发布日期
z-ai/glm5	7s，常超时	2026-2-11
z-ai/glm4.7	7s	2025-12-22

moonshotai/kimi-k2.5	超时
moonshotai/kimi-k2-thinking	9.6s	2025-12-08
moonshotai/kimi-k2-instruct-0905	1s	2025-09-05
~~moonshotai/kimi-k2-instruct~~	1s

minimaxai/minimax-m2.5	1s	2026-02-12

qwen/qwen3.5-397b-a17b	24s，常超时	2026-02-16
qwen/qwen3.5-122b-a10b	15s	2026-02-24
~~qwen/qwen3-coder-480b-a35b-instruct~~	1.2s
~~qwen/qwen3-next-80b-a3b-instruct~~	0.7s
~~qwen/qwen3-next-80b-a3b-thinking~~	3.8s
~~qwen/qwq-32b~~	2.7s

deepseek-ai/deepseek-v3.2	1m10s，常超时	2025-12-01
deepseek-ai/deepseek-v3.1-terminus	2s	2025-08-21
~~deepseek-ai/deepseek-v3.1~~	1.5s

stepfun-ai/step-3.5-flash	1.4s	2026-02

nvidia/nemontron-3-super-120b-a12b	2.4s	2026-03-11

openai/gpt-oss-120b	1s	2025-08-05
~~openai/gpt-oss-20b~~	0.7s

注：延迟为发送 hello 消息收到完整响应的耗时，并不稳定。质量排名参考 https://artificialanalysis.ai 。

主页： https://longcat.chat/platform/usage
API 地址： https://api.longcat.chat/openai ， https://api.longcat.chat/anthropic
模型： LongCat-Flash-Chat, LongCat-Flash-Thinking, LongCat-Flash-Lite, LongCat-Flash-Omni-2603；
赠送：公测中，每天 5000w tokens 用于 LongCat-Flash-Lite，50w tokens 用于其它模型；