Qwen

2025-08-19

Qwen (Tongyi Qianwen) is a family of [[LLM]] models developed by Alibaba Cloud.

Models

Qwen 3.5 models

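| Name | Quantization | Total parameters | Model weights | Platform |
| --- | --- | --- | --- | --- |
| Qwen/Qwen3.5-397B-A17B-FP8 | FP8 | 397B | 406G | HuggingFace |
| Qwen/Qwen3.5-397B-A17B-GPTQ-Int4 | GPTQ-Int4 | 397B | 236G | HuggingFace |
| Qwen/Qwen3.5-122B-A10B | FP32 | 122B | 250G | HuggingFace |
| Qwen/Qwen3.5-122B-A10B-FP8 | FP8 | 122B | 127G | HuggingFace |
| Qwen/Qwen3.5-122B-A10B-GPTQ-Int4 | GPTQ-Int4 | 122B | 78.9G | HuggingFace |
| Qwen/Qwen3.5-35B-A3B | FP32 | 35B | 71.9G | HuggingFace |
| Qwen/Qwen3.5-35B-A3B-FP8 | FP8 | 35B | 37.5G | HuggingFace |
| Qwen/Qwen3.5-35B-A3B-GPTQ-Int4 | GPTQ-Int4 | 35B | 24.5G | HuggingFace |
| Qwen/Qwen3.5-27B | FP32 | 27B | 55.6G | HuggingFace |
| Qwen/Qwen3.5-27B-FP8 | FP8 | 27B | 30.9G | HuggingFace |
| Qwen/Qwen3.5-27B-GPTQ-Int4 | GPTQ-Int4 | 27B | 30.3G | HuggingFace |
| Qwen/Qwen3.5-9B | FP32 | 9B | 19.3G | HuggingFace |

In the mixture-of-experts model names, the `A17B` / `A10B` / `A3B` suffix is the activated parameter count; the parameter column lists the total.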

Estimated VRAM required: model size * 1.2 + KV cache (0.5+) + framework overhead (0.5 to 1G)
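The rule of thumb above can be written as a small helper. This is only a sketch: the function name `estimate_vram_gb` and its default values are illustrative assumptions (the KV cache term is taken at its 0.5G lower bound and the framework overhead at its 1G upper bound), and the real KV cache footprint grows with context length and batch size.

```python
def estimate_vram_gb(model_size_gb, kv_cache_gb=0.5, framework_overhead_gb=1.0):
    """Rough VRAM estimate in GB, following the note's rule of thumb.

    model_size_gb:          size of the weight files (the "Model weights" column)
    kv_cache_gb:            KV cache budget; 0.5 is the note's lower bound (assumed default)
    framework_overhead_gb:  serving framework overhead; the note gives 0.5 to 1G
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return model_size_gb * 1.2 + kv_cache_gb + framework_overhead_gb


if __name__ == "__main__":
    # Weight sizes copied from the table above.
    checkpoints = {
        "Qwen/Qwen3.5-35B-A3B-GPTQ-Int4": 24.5,
        "Qwen/Qwen3.5-27B-FP8": 30.9,
        "Qwen/Qwen3.5-9B": 19.3,
    }
    for name, size_gb in checkpoints.items():
        print(f"{name}: ~{estimate_vram_gb(size_gb):.1f}G")
```

With these defaults, the 24.5G GPTQ-Int4 checkpoint of Qwen3.5-35B-A3B comes out at about 30.9G (24.5 * 1.2 + 0.5 + 1.0), so it would not fit on a single 24G card by this estimate.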

Disabling thinking

Qwen 3.5 models do not support turning thinking off with /nothink in the prompt. It has to be disabled through the [[openapi]] request parameters instead:

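    from openai import OpenAI

    # base_url and api_key are placeholders for your own OpenAI-compatible endpoint
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="Qwen3.5-35B",
        messages=[{"role": "user", "content": "你好"}],
        # extra_body fields are sent as-is alongside the standard parameters
        extra_body={
            "top_k": 20,
            "chat_template_kwargs": {"enable_thinking": False},
        },
    )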
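For clients that do not use the OpenAI Python SDK, it may help to see what ends up in the request body. The SDK merges `extra_body` into the top level of the JSON payload, so `top_k` and `chat_template_kwargs` sit next to `model` and `messages`. The sketch below only builds and prints that payload; `build_chat_request` is a hypothetical helper, and it assumes the serving backend is an OpenAI-compatible server that reads `chat_template_kwargs` (as in the request above).

```python
import json


def build_chat_request(model, user_message, enable_thinking=False, top_k=20):
    """Build the JSON body for a chat completion request with thinking toggled.

    Hypothetical helper for illustration; field names mirror the request above.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        # Fields below are what extra_body contributes: they live at the top level.
        "top_k": top_k,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


if __name__ == "__main__":
    body = build_chat_request("Qwen3.5-35B", "Hello")
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be sent as the POST body to the server's chat completions endpoint with any HTTP client.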

References