---
license: apache-2.0
pipeline_tag: text-generation
---
# Qwen 3.5 Jinja Chat Template v1 (tuned by Barubary)

A Jinja chat template for all Qwen 3.5 models, usable with llama.cpp, Open WebUI, vLLM, Ollama, LM Studio, and any OpenAI-compatible endpoint. Created for some current projects built on this model family.

It bundles 21 fixes over the official Qwen 3.5 chat template, addressing bugs that are still open upstream as of March 2026.
## Active Bug Reports Fixed

This template directly addresses the following community-reported bugs:
| Bug Report | Platform | Fix |
|---|---|---|
| Tool calling chat template is broken | HuggingFace | Fix 6 |
| Parallel tool calls interleaving | GitHub | Fix 15 |
| KV-cache reuse breaks with enable_thinking=false | GitHub | Fix 12 |
| Cannot close thinking via enable_thinking: false | GitHub | Fix 1, 19 |
| Missing reasoning_content in Tool Calling | GitHub | Fix 13 |
| LM Studio parser breaks Qwen3.5 tool calling | Reddit | Fix 18, 19 |
| Qwen3.5 27B getting stuck in loops | | Fix 17 |
| Template problem | HuggingFace | Fix 6, 7 |
## All 21 Fixes

Each fix is labeled inline in the template source (e.g., `{#- FIX6 #}`).
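As an illustration of the labeling and the safe-defaults pattern (Fix 1), the template logic looks roughly like the sketch below. This is not the template's verbatim source; variable names follow the fixes table:

```jinja
{#- FIX1: safe defaults so a missing config var never crashes the render #}
{%- if enable_thinking is not defined %}{%- set enable_thinking = true %}{%- endif %}
{%- if add_vision_id is not defined %}{%- set add_vision_id = false %}{%- endif %}
```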
| # | Fix | What It Solves |
|---|---|---|
| 1 | `add_vision_id` / `enable_thinking` safe defaults | Crashes when config vars are not passed |
| 2 | Precomputed `_last_idx` for `namespace()` constructor | llama.cpp minja parser compatibility |
| 3 | Developer role handled | Claude Code / Codex / OpenCode support |
| 4 | System/developer split before main loop | Duplicate system messages |
| 5 | `item.type` checked before `in item` key test | Type-check ordering bug |
| 6 | `arguments.items()` replaces bare key iteration | Tool calling crash (HF discussion #4) |
| 7 | Jinja `safe` filter removed | llama.cpp compatibility |
| 8 | `tojson`/`string` as explicit if/else | No chained filters; prevents double-escaping |
| 9 | String arguments pass-through | OAI-compatible proxy support |
| 10 | `tc` alias avoids shadowing `tool_call` loop var | Variable scoping bug |
| 11 | `ns2` namespace replaces `loop.previtem`/`loop.nextitem` | llama.cpp minja doesn't support loop helpers |
| 12 | `enable_thinking` applied to in-context assistant turns | KV-cache reuse bug (GitHub #1826) |
| 13 | `reasoning_content is defined` + `not none` guard | Missing `reasoning_content` (GitHub #26) |
| 14 | `loop.index0 >` (not `>=`) for assistant thinking scope | Off-by-one in thinking block placement |
| 15 | Parallel tool calls: `\n\n` delimiter between blocks | Parallel tool call interleaving (GitHub #7117) |
| 16 | Long tool args/responses: configurable truncation guard | Context overflow from massive tool outputs |
| 17 | Deep agent loops: graceful fallback to index 0 | Agent loops crashing after 5+ hops |
| 18 | Streaming compat: clean newline boundaries on all XML tags | LM Studio parser breaks (Reddit) |
| 19 | Auto-disable thinking when tools active | `<tool_call>` leaks into `<think>` blocks |
| 20 | Unknown roles: graceful fallback mapped to `user` role | Planner/critic/custom roles crash |
| 21 | Flattened nesting depth; `_has_tools` precomputed | llama.cpp minja stability |
## Feature Comparison

| Feature | This Template | Official Qwen | Unsloth | bartowski |
|---|---|---|---|---|
| Parallel tool call separation | ✅ | ❌ | ❌ | ❌ |
| Auto-disable thinking with tools | ✅ | ❌ | ❌ | ❌ |
| Deep agent loop fallback | ✅ | ❌ | ❌ | ❌ |
| Unknown role graceful fallback | ✅ | ❌ | ❌ | ❌ |
| Configurable truncation guards | ✅ | ❌ | ❌ | ❌ |
| Streaming-safe XML boundaries | ✅ | ❌ | ❌ | partial |
| Developer role support | ✅ | ❌ | ❌ | ❌ |
| `arguments.items()` fix | ✅ | ❌ | ✅ | ❌ |
| `reasoning_content` guard | ✅ | ❌ | partial | ❌ |
## Usage

### llama-server (llama.cpp)

```shell
llama-server \
  -m Qwen3.5-35B-A3B-*.gguf \
  --jinja -fa \
  --chat-template-file chat_template.jinja \
  -c 32768 -ngl 99 \
  --temp 0.6 --top-k 20 --top-p 0.8 \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  --host 0.0.0.0 --port 8080
```
### Open WebUI

Mount the template via Docker:

```yaml
volumes:
  - ./chat_template.jinja:/templates/chat_template.jinja:ro
command: >
  --chat-template-file /templates/chat_template.jinja
```
### vLLM

```shell
vllm serve Qwen/Qwen3.5-35B-A3B \
  --chat-template ./chat_template.jinja
```
### Ollama

Copy `chat_template.jinja` into your Modelfile or use it with a compatible frontend.
## Configuration

Pass options via `--chat-template-kwargs`:

```json
{
  "enable_thinking": true,
  "auto_disable_thinking_with_tools": true,
  "add_vision_id": false,
  "max_tool_arg_chars": 0,
  "max_tool_response_chars": 8192
}
```
| Variable | Default | Description |
|---|---|---|
| `enable_thinking` | `true` | Controls `<think>` mode |
| `auto_disable_thinking_with_tools` | `true` | Auto-disables thinking when tools are provided, preventing `<tool_call>` bleed into `<think>` blocks |
| `add_vision_id` | `false` | Prefix images/videos with "Picture N:" / "Video N:" |
| `max_tool_arg_chars` | `0` (unlimited) | Truncate tool arguments beyond this length |
| `max_tool_response_chars` | `0` (unlimited) | Truncate tool responses beyond this length |
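The truncation semantics (Fix 16) can be sketched in plain Python. This models the `0 = unlimited` rule from the table above; the function name and the `...[truncated]` marker are illustrative assumptions, not the template's exact output:

```python
def truncate_guard(text: str, max_chars: int) -> str:
    """Model of the truncation guards (Fix 16): 0 means unlimited;
    otherwise clip and append a marker (marker text is an assumption)."""
    if max_chars <= 0 or len(text) <= max_chars:
        return text
    return text[:max_chars] + "...[truncated]"

print(truncate_guard("x" * 10, 0))   # 0 = unlimited: string unchanged
print(truncate_guard("abcdef", 4))   # abcd...[truncated]
```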
## Before / After

### Tool Call Bleed Bug (Fix 19)

Before (official template):

```
<think>
The user wants me to search for...
<tool_call>          ← WRONG: tool call inside think block
<function=search>
```

After (this template):

```
<think>
</think>
<tool_call>          ← Correct: thinking auto-disabled when tools present
<function=search>
```
### Parallel Tool Calls (Fix 15)

Before (official template):

```
<tool_call><function=multiply>...</function></tool_call><tool_call><function=add>...</function></tool_call>
```

After (this template), with the `\n\n` delimiter between blocks:

```
<tool_call>
<function=multiply>
...
</function>
</tool_call>

<tool_call>
<function=add>
...
</function>
</tool_call>
```
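The delimiter behavior can be modeled in plain Python. This is a sketch of Fix 15's output shape, mirroring the example above, not the template's Jinja source:

```python
def render_tool_calls(calls: list[dict]) -> str:
    """Model of Fix 15: each tool call gets its own <tool_call> block,
    joined with a blank line so streaming parsers see clean boundaries."""
    blocks = []
    for call in calls:
        body = "<function=%s>\n%s\n</function>" % (call["name"], call["arguments"])
        blocks.append("<tool_call>\n%s\n</tool_call>" % body)
    return "\n\n".join(blocks)

out = render_tool_calls([
    {"name": "multiply", "arguments": '{"a": 6, "b": 7}'},
    {"name": "add", "arguments": '{"a": 1, "b": 2}'},
])
print(out)
```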
## Compatible Models

Tested and compatible with all Qwen 3.5 models:
- Qwen3.5-35B-A3B (all quants)
- Qwen3.5-27B-A3B
- Qwen3.5-14B-A3B
- Qwen3.5-9B
- Qwen3.5-4B
- Qwen3.5-Coder series
Also backward-compatible with Qwen3 32B.
## Tested Platforms
- ✅ llama.cpp (b4242+)
- ✅ Open WebUI (v0.4.8+)
- ✅ vLLM (v0.6.4+)
- ✅ Ollama (v0.5.0+)
- ✅ LM Studio (v0.3.5+)
- ✅ Text Generation WebUI (oobabooga)
## Credits

- Base template architecture: Qwen team (official Qwen3.5 chat template)
- All 21 fixes: Barubary (original implementations)
## License

Apache 2.0 (same as the official Qwen3.5 template)
## Contributing

Found a bug? Open an issue with:
- Minimal reproduction case
- Error logs
- Model and runtime versions
Pull requests welcome.