Vibe 400 Bad Request Fix: Mistral Alternating Roles and reasoning_effort
SGLang’s strict role alternation and reasoning_effort requirements break Vibe’s default behavior on Mistral Small 4 models.
Quick Take
- Three distinct 400 Bad Request patterns appear when running Mistral Small 4 via SGLang with Vibe
- Empty assistant messages, consecutive same-role messages, and unclosed tool calls all trigger failures
- Only "high" and "none" are accepted for reasoning_effort; "low" and "medium" are rejected
- The fixes require patching three files and enforcing temperature=1.0 when reasoning is active
Alternating Roles Violation in message_utils.py
What broke
After three consecutive user inputs without an assistant reply, Vibe sent this role sequence:
user
user
user
SGLang rejected it with:
400 Bad Request: Invalid role sequence: consecutive user messages
Why it breaks
SGLang enforces strict role alternation:
- system? → user → assistant → user → assistant → ...
- Tool calls require a complete sequence: assistant(tool_calls) → tool(result) → assistant
Vibe’s merge_consecutive_user_messages() function in message_utils.py fails in three scenarios:
- Long sessions with multiple user inputs
- ESC without an active tool call
- ESC during a tool call without returning a tool result
Each case produces an invalid role sequence that SGLang rejects.
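To see which rule a given history violates before sending it, a small checker helps. The sketch below encodes the alternation and tool-sequence rules as this post describes them; it is not SGLang's actual validation code, and the function name find_role_violation and its messages are assumptions made for illustration.

```python
from typing import Optional


def find_role_violation(messages: list[dict]) -> Optional[str]:
    """Return a description of the first rule violation, or None if the history is valid."""
    prev = None
    awaiting_result = False  # True right after an assistant message with tool_calls
    for i, msg in enumerate(messages):
        role = msg.get("role")
        # An open tool call must be answered by a tool message next.
        if awaiting_result and role != "tool":
            return f"message {i}: tool call without a tool result"
        if role == "system":
            if i != 0:
                return f"message {i}: system message is only allowed first"
        elif role == "user":
            if prev == "user":
                return f"message {i}: consecutive user messages"
            if prev == "tool":
                return f"message {i}: tool result must be followed by assistant"
        elif role == "assistant":
            if prev == "assistant":
                return f"message {i}: consecutive assistant messages"
            if not msg.get("content") and not msg.get("tool_calls"):
                return f"message {i}: empty assistant message"
            awaiting_result = bool(msg.get("tool_calls"))
        elif role == "tool":
            if not awaiting_result and prev != "tool":
                return f"message {i}: tool result without a preceding tool call"
            awaiting_result = False
        else:
            return f"message {i}: unknown role {role!r}"
        prev = role
    if awaiting_result:
        return "last message: tool call without a tool result"
    return None
```

Running it over the history Vibe is about to send (for example in a debug hook) shows whether a failure comes from repeated user turns, an empty assistant message, or an unclosed tool call.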
How to fix it
Use vibe-patch.py to patch message_utils.py at runtime. The patch replaces the body of merge_consecutive_user_messages() with a three-pass processor:
import re
from pathlib import Path

TARGET = Path.home() / ".local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages/vibe/core/llm/message_utils.py"
MARKER = "# VIBE-PATCH v4 APPLIED"


def apply_patch():
    content = TARGET.read_text()
    # Idempotent: skip if this install is already patched.
    if MARKER in content:
        return
    patched = re.sub(
        r"def merge_consecutive_user_messages\(.*?\):\n.*?(?=\n\ndef|\Z)",
        # A callable replacement is inserted literally. A plain string would have
        # its backslash escapes expanded by re.sub, breaking the "\n" literals below.
        lambda _match: _build_patched_function(),
        content,
        count=1,
        flags=re.DOTALL,
    )
    TARGET.write_text(patched + "\n" + MARKER + "\n")


def _build_patched_function():
    return """def merge_consecutive_user_messages(messages):
    # Pass 1: drop empty assistant messages, then merge consecutive same-role messages
    normalized = []
    for msg in messages:
        # Assistant messages that carry tool_calls often have empty content; keep those.
        if msg["role"] == "assistant" and not msg.get("content") and not msg.get("tool_calls"):
            continue
        normalized.append(msg)
    merged = []
    for msg in normalized:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] = (merged[-1].get("content") or "") + "\\n" + (msg.get("content") or "")
        else:
            merged.append(dict(msg))  # copy so the caller's messages are not mutated
    # Pass 2: drop incomplete tool sequences
    cleaned = []
    i = 0
    while i < len(merged):
        msg = merged[i]
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            # A tool call with no tool result after it is unclosed; skip it.
            if i + 1 >= len(merged) or merged[i + 1]["role"] != "tool":
                i += 1
                continue
        cleaned.append(msg)
        i += 1
    # Pass 3: re-merge consecutive same-role messages after tool drops
    final = []
    for msg in cleaned:
        if final and final[-1]["role"] == msg["role"]:
            final[-1]["content"] = (final[-1].get("content") or "") + "\\n" + (msg.get("content") or "")
        else:
            final.append(msg)
    return final
"""


if __name__ == "__main__":
    apply_patch()
Run it automatically by wrapping Vibe:
#!/bin/bash
# ~/bin/vibe
python3 ~/bin/vibe-patch.py
exec mistral-vibe "$@"
What to watch out for
If you spam ESC aggressively, the history can still accumulate invalid messages. Use /clear in Vibe instead of adding more patches. The patch is update-safe: the wrapper checks for MARKER on each start and reapplies the patch only when an upgrade has overwritten the file.
reasoning_effort Rejection in reasoning_adapter.py
What broke
In practice, Vibe sent:
{"model":"Mistral-Small-4","messages":[{"role":"user","content":"Why?"}],"temperature":0.7}
with no reasoning_effort field. SGLang responded:
400 Bad Request: reasoning_effort must be one of 'high', 'none'
Why it breaks
SGLang requires reasoning_effort to be either "high" or "none". Vibe only adds it when thinking != "off", and defaults thinking to "off". Even when set to "low", Vibe sends "low" directly, which SGLang rejects.
Moreover, when reasoning is active, temperature must be exactly 1.0; any other value triggers a 400.
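The two constraints can be expressed as one small normalization step on the outgoing payload. This is a sketch under stated assumptions, not the patch used in this post (which forces "high" unconditionally): the helper name normalize_reasoning is hypothetical, and mapping "low" and "medium" to "high" is one possible choice.

```python
ALLOWED_EFFORTS = {"high", "none"}  # the only values SGLang accepts, per the error above


def normalize_reasoning(payload: dict, thinking: str = "off") -> dict:
    """Return a copy of payload with a valid reasoning_effort and a matching temperature."""
    out = dict(payload)  # leave the caller's payload untouched
    # Assumption: any active thinking level ("low", "medium", "high") maps to "high".
    effort = "none" if thinking == "off" else "high"
    assert effort in ALLOWED_EFFORTS
    out["reasoning_effort"] = effort
    if effort != "none":
        # With reasoning active, any temperature other than 1.0 triggers a 400.
        out["temperature"] = 1.0
    return out
```

With thinking set to "low" and a temperature of 0.7, the result carries reasoning_effort "high" and temperature 1.0; with thinking "off", the temperature is left alone and reasoning_effort is sent explicitly as "none".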
How to fix it
Edit reasoning_adapter.py and force the payload:
# Before
payload = {"model": model, "messages": messages, "temperature": temperature}
if thinking != "off":
    payload["reasoning_effort"] = thinking

# After
payload = {"model": model, "messages": messages, "temperature": 1.0}
payload["reasoning_effort"] = "high"
File path:
~/.local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages/vibe/core/llm/backend/reasoning_adapter.py
What to watch out for
After uv tool upgrade mistral-vibe, you must reapply this change manually. The package manager overwrites the file.
thinking Mapping Failure in mistral.py
What broke
When using the MistralBackend directly (not via SGLang), Vibe mapped:
"low" → "none"
SGLang accepted "none" but the model produced no reasoning output. For analytical tasks, this defeats the purpose.
Why it breaks
The mapping in mistral.py was designed for cloud APIs that accept "none". SGLang accepts it but doesn’t produce reasoning. For local analysis, all thinking levels should map to "high".
How to fix it
Update the mapping dictionary:
_THINKING_TO_REASONING_EFFORT = {
    "off": "high",
    "low": "high",
    "medium": "high",
    "high": "high",
}
File path:
~/.local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages/vibe/core/llm/backend/mistral.py
What to watch out for
This fix only applies when using the MistralBackend directly. SGLang requires its own reasoning_effort handling via reasoning_adapter.py.
Post-Upgrade Checklist
To upgrade Vibe and restore the fixes:
uv tool upgrade mistral-vibe
# Reapply reasoning_adapter.py and mistral.py fixes manually
# Patch message_utils.py automatically via ~/bin/vibe on next start
Verify the fixes with:
# Check patch status
grep "VIBE-PATCH v4 APPLIED" ~/.local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages/vibe/core/llm/message_utils.py
# Inspect live request
curl -s -X POST http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"Mistral-Small-4","messages":[{"role":"user","content":"Show reasoning"}],"stream":false,"max_tokens":100,"reasoning_effort":"high"}' \
| python3 -c "import sys,json; d=json.load(sys.stdin); print('Has reasoning:', bool(d['choices'][0]['message'].get('reasoning_content')))"
What I Actually Use
- Mistral Small 4: My local model for coding assistance and reasoning tasks
- SGLang: The backend that enforces strict role alternation and reasoning constraints
- DGX Spark (GB10): The ARM64 server that runs the model with 128 GB unified memory
Status update (2026-05-04): a different Vibe-side bug, and the actual fix
A Vibe upgrade today triggered a different 400-style failure than the one this post documents. Worth recording because the diagnosis sequence I went through was wrong twice in a row before it was right.
What I saw: TUI launches, model status local-mistral, no MCP errors. First prompt to the model fails with Error: API error from sglang (model: Mistral-Small-4): 1 validation error for LLMMessage role: input_value=None. Programmatic mode (vibe -p ...) returns answers cleanly. So a TUI-only failure on the very first user turn.
What I assumed first: a Vibe 2.9.3 streaming-parser regression. Rolled back to 2.7.2. Same error. Hypothesis falsified.
What I assumed second: a Pydantic 2.13 enum-strictness change. Pinned Pydantic to 2.12. Same error. Hypothesis falsified.
What it actually is: a Vibe-source bug in vibe.core.llm.backend.generic.OpenAIAdapter._parse_message. When parsing a streaming response chunk with a delta field, Vibe calls LLMMessage.model_validate(delta) directly. Per the OpenAI streaming spec, only the first chunk carries delta.role; every chunk after it has delta.role: null. SGLang ships chunks exactly to this spec (verified with raw curl -N stream=true against :30000). Vibe’s LLMMessage.role is a required Role enum without a default, so Pydantic rejects every chunk after the first. Programmatic mode survives because it uses the non-streaming code path, which reads choice.message.role (always present). The TUI streams, hits the bug, dies on prompt 1.
The fix is three lines in OpenAIAdapter._parse_message at the two delta-handling branches: if msg_dict.get("role") is None, set it to "assistant". That is the only role a model returns in a chunk, so the default is safe. Patched both choice.delta and top-level data.delta paths. Saved the original file as generic.py.before-streaming-fix-backup so the change is reversible.
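The shape of that fix, pulled out into a standalone function, looks roughly like the sketch below. It is an illustration, not the actual Vibe source or the PR diff: the helper name with_default_role is hypothetical, and it works on plain dicts rather than Vibe's own types.

```python
def with_default_role(delta: dict, default: str = "assistant") -> dict:
    """Return a delta dict that always has a role, filling in the default when it is missing or null."""
    if delta.get("role") is None:
        # Later streaming chunks carry no role; the model only ever speaks as assistant.
        return {**delta, "role": default}
    return delta


# Chunks shaped like the ones described above: role on the first chunk only.
chunks = [
    {"role": "assistant", "content": ""},
    {"role": None, "content": "Hel"},
    {"content": "lo"},
]
normalized = [with_default_role(c) for c in chunks]
```

Applying the default before validation means every chunk reaches the message model with a role set, so a required role field no longer rejects the second and later chunks.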
After the patch the TUI takes prompts cleanly on the same Vibe 2.7.2, same SGLang container, same five MCP servers loaded. The original alternating-roles fix from this post stays load-bearing for the SGLang strict-alternation rule; the new patch is additive and lives on top.
Why the TUI worked before today and not after the upgrade: not fully reconstructed. The most plausible explanation is that the previous install came from a different point on the 2.7.x line that did not have this exact _parse_message shape, and a uv tool install --reinstall today pulled a published wheel that exposes the bug. Cannot prove this without an old wheel to diff against. Either way, the local patch makes the daily-driver TUI work and is the right level of fix for now.
What went upstream: filed issue #665 at mistralai/mistral-vibe with the streaming-spec citation and the SGLang chunk dump. Submitted PR #666 with the three-line minimal fix. Suggested in the PR that a cleaner upstream form would be making LLMMessage.role Optional with default Role.assistant, which would handle this and any similar future case without per-call branching, but kept the diff minimal in this PR to make the change surface obvious. Until the PR lands, the local patch survives a Vibe upgrade via /data/scripts/vibe-post-install-patch.sh which re-applies the same change idempotently.
The earlier alternating-roles workaround in this post is independent of the streaming-parser bug. Both are SGLang-strict-spec issues but at different layers: request building versus response parsing. Both stay relevant.
Lesson worth keeping for next time: smoke-testing a CLI tool’s programmatic mode does not validate the TUI path. The two paths can take different code branches that fail differently. Test both before claiming a version-bump or a patch is clean. And when a fix attempt does not resolve the symptom, falsify the hypothesis explicitly before trying the next one, rather than stacking guesses on top of guesses.