Vibe 400 Fix
Alternating Roles Breakage in message_utils.py
SGLang enforces strict role alternation: system? → user → assistant → user → ... and tool-calls must follow the pattern assistant(tool_calls) → tool(result) → assistant. Vibe violates this in three common scenarios, especially when users hit ESC mid-tool-call or hammer the chat without clearing history. The errors manifest as 400 Bad Request responses from SGLang, often with unhelpful messages like "Invalid message role sequence" or "Tool call without tool result".
Broken patterns and their 400 errors:
| Situation | Pattern | Result | Example Error |
|---|---|---|---|
| Long session with multiple user inputs | user → user | 400 | "Invalid message role sequence" |
| ESC without active tool-call | assistant(empty) → user | 400 | "Invalid message role sequence" |
| ESC during tool-call | assistant(tool_calls) → user (missing tool result) | 400 | "Tool call without tool result" |
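
The three patterns in the table can be detected before a request is sent. The following is a minimal sketch, assuming messages are plain OpenAI-style dicts with `role`, `content`, and optional `tool_calls` keys; `find_role_violations` is a hypothetical helper, not part of Vibe or SGLang.

```python
def find_role_violations(messages):
    """Return (index, reason) tuples for the three broken patterns in the table."""
    problems = []
    for i, msg in enumerate(messages):
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        role = msg.get("role")
        # Pattern 1: two user messages in a row.
        if role == "user" and nxt and nxt.get("role") == "user":
            problems.append((i, "user -> user"))
        if role == "assistant":
            if msg.get("tool_calls"):
                # Pattern 3: a tool call that is not followed by its tool result.
                if nxt is None or nxt.get("role") != "tool":
                    problems.append((i, "tool_calls without tool result"))
            elif not msg.get("content"):
                # Pattern 2: an empty assistant turn left behind by ESC.
                problems.append((i, "empty assistant message"))
    return problems


# Hypothetical history: ESC during a tool call, then two user inputs.
history = [
    {"role": "user", "content": "list files"},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "call_1"}]},
    {"role": "user", "content": "never mind"},
    {"role": "user", "content": "what time is it?"},
]
print(find_role_violations(history))
```

Running a check like this on the outgoing history shows which of the three cases a given 400 belongs to without having to read the SGLang logs.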
The fix isn’t a monkey patch; it’s a surgical replacement of merge_consecutive_user_messages() in message_utils.py. The original function merges consecutive user messages but does not handle tool-call sequences, leaving dangling tool_calls without corresponding tool results. This is particularly problematic with Mistral Small 4, where SGLang’s strict validation rejects any request with an incomplete tool-call sequence.
Here’s the patched version. It also covers the edge case where dropping an incomplete tool sequence leaves two user messages adjacent (user → assistant(tool_calls) → user becomes user → user), which would otherwise remain unmerged and trigger a 400:

    import pathlib
    import re

    TARGET_FILE = pathlib.Path.home() / ".local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages/vibe/core/llm/message_utils.py"
    MARKER = "# V4-PATCH-APPLIED"

    # Replacement function, written into message_utils.py verbatim.
    NEW_FUNC = r"""def merge_consecutive_user_messages(messages):
        def _merge_pass(msgs):
            out = []
            for msg in msgs:
                # Drop empty assistant turns (ESC without an active tool call),
                # but keep assistant turns that carry tool_calls.
                if msg["role"] == "assistant" and not msg.get("content") and not msg.get("tool_calls"):
                    continue
                # Fold a user message into the previous one if that was a user message too.
                if out and out[-1]["role"] == msg["role"] == "user":
                    merged = (out[-1].get("content") or "") + "\n\n" + (msg.get("content") or "")
                    out[-1] = {**out[-1], "content": merged}
                else:
                    out.append(msg)
            return out

        def _drop_incomplete_tools(msgs):
            out = []
            for i, msg in enumerate(msgs):
                nxt = msgs[i + 1] if i + 1 < len(msgs) else None
                # An assistant tool call must be followed directly by its tool result.
                if msg.get("tool_calls") and (nxt is None or nxt["role"] != "tool"):
                    continue
                out.append(msg)
            return out

        msgs = _merge_pass(messages)
        msgs = _drop_incomplete_tools(msgs)
        msgs = _merge_pass(msgs)
        return msgs
    """

    def apply_patch():
        src = TARGET_FILE.read_text(encoding="utf-8")
        if MARKER in src:
            return  # Already patched
        pattern = r"def merge_consecutive_user_messages\(messages\):.*?(?=\ndef |\Z)"
        # A callable replacement keeps re.subn from interpreting backslashes in NEW_FUNC.
        patched, count = re.subn(pattern, lambda _m: NEW_FUNC, src, count=1, flags=re.DOTALL)
        if count != 1:
            raise SystemExit("merge_consecutive_user_messages not found; file left unchanged")
        TARGET_FILE.write_text(patched.rstrip("\n") + "\n" + MARKER + "\n", encoding="utf-8")

    if __name__ == "__main__":
        apply_patch()

Watch out: The three-pass design is critical. Skipping the middle pass, _drop_incomplete_tools(), leaves tool calls without results in the history, and SGLang will still reject the request. Skipping the final merge leaves the user → user pair that the drop creates. Test with a session that includes tool calls to verify the patch works.
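
Before pointing the patch script at the real file, the regex-based function swap can be tried on a toy module. The sketch below uses the same pattern as the script (match from the `def` line up to the next top-level `def` or end of file); `TOY_SOURCE`, `NEW_FUNC`, and `replace_function` are made-up names for illustration only.

```python
import re

# Stand-in for message_utils.py (hypothetical content).
TOY_SOURCE = '''import os

def merge_consecutive_user_messages(messages):
    return messages  # old body

def other_helper():
    return 1
'''

NEW_FUNC = '''def merge_consecutive_user_messages(messages):
    return list(messages)  # new body
'''

PATTERN = re.compile(
    r"def merge_consecutive_user_messages\(messages\):.*?(?=\ndef |\Z)",
    flags=re.DOTALL,
)


def replace_function(src, new_func):
    # A callable replacement avoids backslash interpretation in new_func.
    patched, count = PATTERN.subn(lambda _m: new_func, src, count=1)
    if count != 1:
        raise ValueError("target function not found; refusing to write")
    return patched


patched = replace_function(TOY_SOURCE, NEW_FUNC)
compile(patched, "toy_module", "exec")  # raises SyntaxError if the swap broke the file
print(patched)
```

Note that the pattern only matches the exact signature `(messages)`. If an upstream release adds type annotations or renames the parameter, the swap finds nothing, which is why the sketch raises instead of writing an unchanged file.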
reasoning_effort Missing in reasoning_adapter.py
SGLang rejects requests that omit reasoning_effort or carry a value it does not accept. Vibe’s default config sends no reasoning_effort when thinking="off" and sends "low" when thinking="low". In this setup both are rejected, with errors like "reasoning_effort must be one of: high, medium, low" or "Invalid reasoning effort level".
Broken payload example (vibe v1.2.3):

    # Original code in reasoning_adapter.py (vibe v1.2.3)
    payload = {"model": "Mistral-Small-4", "messages": [...], "temperature": temperature}
    if thinking != "off":
        payload["reasoning_effort"] = thinking  # sends e.g. "low"; nothing is sent for "off"

Fix:

    # Patched code (vibe v1.2.3)
    payload = {"model": "Mistral-Small-4", "messages": [...], "temperature": 1.0}
    payload["reasoning_effort"] = "high"  # Only valid values accepted

Gotcha: Setting temperature to anything other than 1.0 while reasoning_effort="high" triggers SGLang errors like "temperature must be 1.0 when reasoning_effort is high". Lock it to 1.0 or the request fails.
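
The two constraints above (reasoning_effort forced to "high", temperature locked to 1.0) can be expressed as one small normalization step. This is only a sketch of the idea: `normalize_for_sglang` and `REQUIRED` are hypothetical names, and the real reasoning_adapter.py may build its payload differently.

```python
# Values the section above reports as accepted by this SGLang setup.
REQUIRED = {"reasoning_effort": "high", "temperature": 1.0}


def normalize_for_sglang(payload):
    """Return (fixed_payload, changes) without mutating the input dict."""
    fixed = dict(payload)
    changes = []
    for key, wanted in REQUIRED.items():
        if fixed.get(key) != wanted:
            changes.append(f"{key}: {fixed.get(key)!r} -> {wanted!r}")
            fixed[key] = wanted
    return fixed, changes


# Hypothetical payload as Vibe might send it with thinking="off".
original = {"model": "Mistral-Small-4", "messages": [], "temperature": 0.2}
fixed, changes = normalize_for_sglang(original)
print(fixed)
print(changes)
```

Returning the list of changes makes it easy to log what was overridden, which helps when a later Vibe release starts sending different defaults.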
Warning: Running uv tool upgrade mistral-vibe (for example to v1.3.0) overwrites this file, so you must re-apply the change manually afterwards. The file lives at:
/home/username/.local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages/vibe/core/llm/backend/reasoning_adapter.py
thinking-Mapping in mistral.py
Vibe’s MistralBackend maps "low" → "none" for SGLang compatibility, but SGLang actually rejects "none" in some cases with "reasoning_effort must be one of: high, medium, low". This mapping was introduced to handle older SGLang versions but causes failures with Mistral Small 4.
Broken mapping (vibe v1.2.3):

    # Original in mistral.py (vibe v1.2.3)
    _THINKING_TO_REASONING_EFFORT = {
        "off": "none",
        "low": "none",
        "medium": "high",
        "high": "high",
    }

Fix:

    # Patched mapping (vibe v1.2.3)
    _THINKING_TO_REASONING_EFFORT = {
        "off": "high",
        "low": "high",
        "medium": "high",
        "high": "high",
    }

Note: This file is at:
/home/username/.local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages/vibe/core/llm/backend/mistral.py
Watch out: If you later switch to the direct Mistral API (not SGLang), revert this mapping. It is specific to SGLang’s stricter rules. The direct Mistral API accepts "none" and "low", so forcing "high" needlessly spends reasoning tokens and latency on every request.
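
One way to avoid editing the mapping back and forth is to keep both tables and choose between them explicitly. The sketch below is hypothetical: Vibe has no `use_sglang` switch that the source mentions, and `reasoning_effort_for` is an illustrative name. Only the two mapping tables are taken from the text above.

```python
# Mapping as shipped in mistral.py (vibe v1.2.3), per the section above.
ORIGINAL_MAPPING = {"off": "none", "low": "none", "medium": "high", "high": "high"}

# Patched mapping for SGLang: every thinking level is sent as "high".
SGLANG_MAPPING = {level: "high" for level in ORIGINAL_MAPPING}


def reasoning_effort_for(thinking, use_sglang):
    """Pick the reasoning_effort value for a thinking level and backend."""
    mapping = SGLANG_MAPPING if use_sglang else ORIGINAL_MAPPING
    return mapping[thinking]


print(reasoning_effort_for("low", use_sglang=True))
print(reasoning_effort_for("low", use_sglang=False))
```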
Upgrading Vibe Without Breaking Fixes
An upgrade (for example to vibe v1.3.0) replaces the installed files, so all three patches are lost. The message_utils.py patch is re-applied automatically on the next vibe start via ~/bin/vibe-patch.py; the other two must be re-applied by hand.

    # Upgrade Vibe (to pin v1.3.0 instead: uv tool install mistral-vibe==1.3.0)
    uv tool upgrade mistral-vibe
    # Re-apply reasoning_adapter.py patch (temperature=1.0 + reasoning_effort="high")
    # Re-apply mistral.py mapping (_THINKING_TO_REASONING_EFFORT all → "high")
    # message_utils.py patch runs automatically on next vibe start via ~/bin/vibe-patch.py

Warning: If you skip the manual patches, SGLang will reject requests with 400 errors like "Invalid reasoning effort level" or "Tool call without tool result" until you restore them. Always verify patches after upgrades.
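
A quick post-upgrade check can confirm which of the three files still carry their patch. This is a rough sketch: the relative paths and the marker come from the sections above, while the search strings for the two manual patches are assumptions about how the patched lines look in your copy. `missing_patches` is a hypothetical helper.

```python
from pathlib import Path

# Relative path -> substring that should be present once the patch is applied.
# The last two substrings are assumptions; adjust them to your edited lines.
CHECKS = {
    "vibe/core/llm/message_utils.py": "# V4-PATCH-APPLIED",
    "vibe/core/llm/backend/reasoning_adapter.py": 'payload["reasoning_effort"] = "high"',
    "vibe/core/llm/backend/mistral.py": '"off": "high"',
}


def missing_patches(site_packages):
    """Return the relative paths whose patch marker is absent."""
    root = Path(site_packages)
    missing = []
    for rel_path, needle in CHECKS.items():
        path = root / rel_path
        if not path.is_file() or needle not in path.read_text(encoding="utf-8"):
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    default_root = Path.home() / ".local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages"
    for rel_path in missing_patches(default_root):
        print("NOT PATCHED:", rel_path)
```

An empty output means all three substrings were found. It does not prove the patched code is correct, only that the edits are still in place.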
Diagnosing Requests in Real Time
Check that your patches are active and that SGLang accepts a valid request. The first command verifies the patch marker; the second sends a test request straight to SGLang:

    # Verify message_utils.py patch
    grep "V4-PATCH-APPLIED" ~/.local/share/uv/tools/mistral-vibe/lib/python3.12/site-packages/vibe/core/llm/message_utils.py
    # Inspect live request for reasoning_effort
    curl -s -X POST http://localhost:30000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"Mistral-Small-4","messages":[{"role":"user","content":"Solve 42*87?"}],"stream":false,"max_tokens":100,"reasoning_effort":"high"}' \
      | python3 -c "import sys,json; d=json.load(sys.stdin); print('reasoning_content:', bool(d['choices'][0]['message'].get('reasoning_content')))"

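Note: The curl request goes straight to SGLang, so it checks the server side only. If it prints reasoning_content: False, the server returned no reasoning output. If the one-liner fails with a KeyError instead, SGLang most likely rejected the request; check the SGLang logs for "Invalid reasoning effort level" or "Tool call without tool result".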
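
The inline `python3 -c` check crashes when the server answers with an error body. A slightly more forgiving version might look like the sketch below. It assumes an OpenAI-style response in which a rejected request carries no `choices` list; `summarize_response` is a hypothetical helper.

```python
import json


def summarize_response(raw_body):
    """Summarize a chat-completions response body given as a JSON string."""
    data = json.loads(raw_body)
    choices = data.get("choices")
    if not choices:
        # Assumption: rejected requests have no "choices"; show what came back.
        return "rejected or malformed: " + json.dumps(data)[:200]
    message = choices[0].get("message", {})
    has_reasoning = bool(message.get("reasoning_content"))
    return f"ok, reasoning_content present: {has_reasoning}"


# Hypothetical bodies for illustration; real SGLang output may differ.
ok_body = '{"choices": [{"message": {"content": "3654", "reasoning_content": "42*87 = ..."}}]}'
error_body = '{"error": {"message": "Invalid reasoning effort level"}}'
print(summarize_response(ok_body))
print(summarize_response(error_body))
```

To use it with the curl command, save the function in a script that prints `summarize_response(sys.stdin.read())` and pipe the curl output into it.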
What I Actually Use
- Mistral Small 4: Runs locally on GB10 (Blackwell, ARM64) via SGLang v0.2.11 for privacy and performance.
- SGLang: Handles Mistral Small 4 with strict role alternation and reasoning_effort constraints. Version 0.2.11 enforces "reasoning_effort" validation.
- Vibe: Coding assistant with OpenAI-compatible API style for seamless integration. Version v1.2.3 with manual patches applied.