Tongyi Qianwen (Qwen) is an [[LLM]] family developed by Alibaba Cloud.
Models
Qwen 3.5 models
| Name | Quantization | Total parameters | Weight size | Platform |
|---|---|---|---|---|
| Qwen/Qwen3.5-397B-A17B-FP8 | FP8 | 397B | 406G | HuggingFace |
| Qwen/Qwen3.5-397B-A17B-GPTQ-Int4 | GPTQ-Int4 | 397B | 236G | HuggingFace |
| Qwen/Qwen3.5-122B-A10B | BF16 | 122B | 250G | HuggingFace |
| Qwen/Qwen3.5-122B-A10B-FP8 | FP8 | 122B | 127G | HuggingFace |
| Qwen/Qwen3.5-122B-A10B-GPTQ-Int4 | GPTQ-Int4 | 122B | 78.9G | HuggingFace |
| Qwen/Qwen3.5-35B-A3B | BF16 | 35B | 71.9G | HuggingFace |
| Qwen/Qwen3.5-35B-A3B-FP8 | FP8 | 35B | 37.5G | HuggingFace |
| Qwen/Qwen3.5-35B-A3B-GPTQ-Int4 | GPTQ-Int4 | 35B | 24.5G | HuggingFace |
| Qwen/Qwen3.5-27B | BF16 | 27B | 55.6G | HuggingFace |
| Qwen/Qwen3.5-27B-FP8 | FP8 | 27B | 30.9G | HuggingFace |
| Qwen/Qwen3.5-27B-GPTQ-Int4 | GPTQ-Int4 | 27B | 30.3G | HuggingFace |
| Qwen/Qwen3.5-9B | BF16 | 9B | 19.3G | HuggingFace |
Estimated VRAM required: weight size × 1.2 + KV cache (0.5G+) + framework overhead (0.5 ~ 1G)
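The estimate above is simple arithmetic, so it can be wrapped in a small helper; this is just a sketch of that rule of thumb (the function name and default values for KV cache and framework overhead are illustrative assumptions, not an official formula):

```python
def estimate_vram_gb(weight_size_gb: float,
                     kv_cache_gb: float = 0.5,
                     framework_overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate: weights x 1.2 + KV cache + framework overhead.

    kv_cache_gb grows with context length and batch size; 0.5G is a floor.
    """
    return weight_size_gb * 1.2 + kv_cache_gb + framework_overhead_gb

# Example: Qwen/Qwen3.5-35B-A3B-GPTQ-Int4 weights are 24.5G in the table
print(estimate_vram_gb(24.5))  # roughly 30.9 GB
```

In practice the KV cache term dominates at long context lengths, so treat this as a lower bound when planning hardware.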
Disabling thinking
Qwen 3.5 models do not support disabling thinking via `/nothink`; it must be turned off through the [[openapi]] request parameters:
```python
from openai import OpenAI

# Assumes an OpenAI-compatible endpoint (e.g. a local inference server);
# adjust base_url and api_key for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen3.5-35B",
    messages=[{"role": "user", "content": "你好"}],
    extra_body={
        "top_k": 20,
        # Passed through to the chat template to disable the thinking block
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
```