---
license: apache-2.0
pipeline_tag: text-generation
---
# Qwen 3.5 Jinja Chat Template v1 (tuned by Barubary)

A Jinja chat template for all Qwen 3.5 models, usable with llama.cpp, Open WebUI, vLLM, Ollama, LM Studio, and any OpenAI-compatible endpoint. Created for some current projects built on this model family.

It bundles 21 fixes over the official Qwen 3.5 chat template, addressing bugs that are still open upstream as of March 2026.
## Active Bug Reports Fixed

This template directly addresses the following community-reported bugs:
| Bug Report | Platform | Fix |
|---|---|---|
| Tool calling chat template is broken | HuggingFace | Fix 6 |
| Parallel tool calls interleaving | GitHub | Fix 15 |
| KV-cache reuse breaks with enable_thinking=false | GitHub | Fix 12 |
| Cannot close thinking via enable_thinking: false | GitHub | Fix 1, 19 |
| Missing reasoning_content in Tool Calling | GitHub | Fix 13 |
| LM Studio parser breaks Qwen3.5 tool calling | Reddit | Fix 18, 19 |
| Qwen3.5 27B getting stuck in loops | | Fix 17 |
| Template problem | HuggingFace | Fix 6, 7 |
## All 21 Fixes

Each fix is labeled inline in the template source (e.g., `{#- FIX6 #}`).
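As an illustration of the labeling and the safe-defaults pattern (Fix 1), the template logic looks roughly like the sketch below. This is not the template's verbatim source; variable names follow the fixes table:

```jinja
{#- FIX1: safe defaults so a missing config var never crashes the render #}
{%- if enable_thinking is not defined %}{%- set enable_thinking = true %}{%- endif %}
{%- if add_vision_id is not defined %}{%- set add_vision_id = false %}{%- endif %}
```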
| # | Fix | What It Solves |
|---|---|---|
| 1 | `add_vision_id` / `enable_thinking` safe defaults | Crashes when config vars are not passed |
| 2 | Precomputed `_last_idx` for `namespace()` constructor | llama.cpp minja parser compatibility |
| 3 | Developer role handled | Claude Code / Codex / OpenCode support |
| 4 | System/developer split before main loop | Duplicate system messages |
| 5 | `item.type` checked before `in item` key test | Type-check ordering bug |
| 6 | `arguments.items()` replaces bare key iteration | Tool calling crash (HF discussion #4) |
| 7 | Jinja `safe` filter removed | llama.cpp compatibility |
| 8 | `tojson`/`string` as explicit if/else | No chained filters; prevents double-escaping |
| 9 | String arguments pass-through | OAI-compatible proxy support |
| 10 | `tc` alias avoids shadowing `tool_call` loop var | Variable scoping bug |
| 11 | `ns2` namespace replaces `loop.previtem`/`loop.nextitem` | llama.cpp minja doesn't support loop helpers |
| 12 | `enable_thinking` applied to in-context assistant turns | KV-cache reuse bug (GitHub #1826) |
| 13 | `reasoning_content is defined` + `not none` guard | Missing `reasoning_content` (GitHub #26) |
| 14 | `loop.index0 >` (not `>=`) for assistant thinking scope | Off-by-one in thinking block placement |
| 15 | Parallel tool calls: `\n\n` delimiter between blocks | Parallel tool call interleaving (GitHub #7117) |
| 16 | Long tool args/responses: configurable truncation guard | Context overflow from massive tool outputs |
| 17 | Deep agent loops: graceful fallback to index 0 | Agent loops crashing after 5+ hops |
| 18 | Streaming compat: clean newline boundaries on all XML tags | LM Studio parser breaks (Reddit) |
| 19 | Auto-disable thinking when tools active | `<tool_call>` leaks into `<think>` blocks |
| 20 | Unknown roles: graceful fallback mapped to `user` role | Planner/critic/custom roles crash |
| 21 | Flattened nesting depth; `_has_tools` precomputed | llama.cpp minja stability |
## Feature Comparison

| Feature | This Template | Official Qwen | Unsloth | bartowski |
|---|---|---|---|---|
| Parallel tool call separation | ✅ | ❌ | ❌ | ❌ |
| Auto-disable thinking with tools | ✅ | ❌ | ❌ | ❌ |
| Deep agent loop fallback | ✅ | ❌ | ❌ | ❌ |
| Unknown role graceful fallback | ✅ | ❌ | ❌ | ❌ |
| Configurable truncation guards | ✅ | ❌ | ❌ | ❌ |
| Streaming-safe XML boundaries | ✅ | ❌ | ❌ | partial |
| Developer role support | ✅ | ❌ | ❌ | ❌ |
| `arguments.items()` fix | ✅ | ❌ | ✅ | ❌ |
| `reasoning_content` guard | ✅ | ❌ | partial | ❌ |
## Usage

### llama-server (llama.cpp)

```shell
llama-server \
  -m Qwen3.5-35B-A3B-*.gguf \
  --jinja -fa \
  --chat-template-file chat_template.jinja \
  -c 32768 -ngl 99 \
  --temp 0.6 --top-k 20 --top-p 0.8 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --host 0.0.0.0 --port 8080
```
### Open WebUI

Mount the template via Docker:

```yaml
volumes:
  - ./chat_template.jinja:/templates/chat_template.jinja:ro
command: >
  --chat-template-file /templates/chat_template.jinja
```
### vLLM

```shell
vllm serve Qwen/Qwen3.5-35B-A3B \
  --chat-template ./chat_template.jinja
```
### Ollama

Copy `chat_template.jinja` into your Modelfile or use it with a compatible frontend.
## Configuration

Pass options via `--chat-template-kwargs`:

```json
{
  "enable_thinking": true,
  "auto_disable_thinking_with_tools": true,
  "add_vision_id": false,
  "max_tool_arg_chars": 0,
  "max_tool_response_chars": 8192
}
```
| Variable | Default | Description |
|---|---|---|
| `enable_thinking` | `true` | Controls `<think>` mode |
| `auto_disable_thinking_with_tools` | `true` | Auto-disables thinking when tools are provided, preventing `<tool_call>` bleed into `<think>` blocks |
| `add_vision_id` | `false` | Prefix images/videos with "Picture N:" / "Video N:" |
| `max_tool_arg_chars` | `0` (unlimited) | Truncate tool arguments beyond this length |
| `max_tool_response_chars` | `0` (unlimited) | Truncate tool responses beyond this length |
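The truncation semantics (Fix 16) can be sketched in plain Python. This models the `0 = unlimited` rule from the table above; the function name and the `...[truncated]` marker are illustrative assumptions, not the template's exact output:

```python
def truncate_guard(text: str, max_chars: int) -> str:
    """Model of the truncation guards (Fix 16): 0 means unlimited;
    otherwise clip and append a marker (marker text is an assumption)."""
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    return text[:max_chars] + "...[truncated]"

print(truncate_guard("x" * 10, 0))   # 0 = unlimited: string unchanged
print(truncate_guard("abcdef", 4))   # abcd...[truncated]
```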
## Before / After

### Tool Call Bleed Bug (Fix 19)

Before (official template):

```
<think>
The user wants me to search for...
<tool_call>          ← WRONG: tool call inside think block
<function=search>
```

After (this template):

```
<think>
</think>
<tool_call>          ← Correct: thinking auto-disabled when tools present
<function=search>
```
### Parallel Tool Calls (Fix 15)

Before (official template):

```
<tool_call><function=multiply>...</function></tool_call><tool_call><function=add>...</function></tool_call>
```

After (this template), with the `\n\n` delimiter between blocks:

```
<tool_call>
<function=multiply>
...
</function>
</tool_call>

<tool_call>
<function=add>
...
</function>
</tool_call>
```
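The delimiter behavior can be modeled in plain Python. This is a sketch of Fix 15's output shape, mirroring the example above, not the template's Jinja source:

```python
def render_tool_calls(calls: list[dict]) -> str:
    """Model of Fix 15: each tool call gets its own <tool_call> block,
    joined with a blank line so streaming parsers see clean boundaries."""
    blocks = []
    for call in calls:
        body = "<function=%s>\n%s\n</function>" % (call["name"], call["arguments"])
        blocks.append("<tool_call>\n%s\n</tool_call>" % body)
    return "\n\n".join(blocks)

out = render_tool_calls([
    {"name": "multiply", "arguments": '{"a": 6, "b": 7}'},
    {"name": "add", "arguments": '{"a": 1, "b": 2}'},
])
print(out)
```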
## Compatible Models

Tested and compatible with all Qwen 3.5 models:
- Qwen3.5-35B-A3B (all quants)
- Qwen3.5-27B-A3B
- Qwen3.5-14B-A3B
- Qwen3.5-9B
- Qwen3.5-4B
- Qwen3.5-Coder series
Also backward-compatible with Qwen3 32B.
## Tested Platforms
- ✅ llama.cpp (b4242+)
- ✅ Open WebUI (v0.4.8+)
- ✅ vLLM (v0.6.4+)
- ✅ Ollama (v0.5.0+)
- ✅ LM Studio (v0.3.5+)
- ✅ Text Generation WebUI (oobabooga)
## Credits

- Base template architecture: Qwen team (official Qwen3.5 chat template)
- All 21 fixes: Barubary (original implementations)
## License

Apache 2.0 (same as the official Qwen3.5 template)
## Contributing

Found a bug? Open an issue with:
- Minimal reproduction case
- Error logs
- Model and runtime versions
Pull requests welcome.