Self-Hosted AI Stack
OpenHands is the primary agent that handles multi-turn coding sessions in this stack. It’s not a toy demo; it’s the piece that actually writes, reviews, and tests code across your repositories. When you need a model that can maintain context across dozens of back-and-forth exchanges without losing track of your project’s state, OpenHands is what you reach for. Version 1.2.3 of OpenHands (as of June 2025) includes built-in support for SGLang’s RadixAttention, which caches repository context efficiently across consecutive requests. Without this cache, you’ll see memory thrashing and latency spikes, especially on ARM64 hardware like the DGX Spark, where memory bandwidth is already constrained.
Quick Take
- OpenHands replaces cloud-based coding assistants with a self-hosted alternative that keeps your code and prompts private
- It requires careful model selection and engine tuning to avoid memory exhaustion and role-alternation errors
- Aider serves as the lightweight terminal companion for quick edits and debugging when you don’t need full multi-turn sessions
- Watch out: SGLang’s RadixAttention cache can grow to 2GB+ for large repositories—monitor `/data/openhands-state/.sglang/cache` to avoid disk exhaustion
- Gotcha: Mistral Small 4’s native tool calling mode (`native_tool_calling=true`) requires SGLang v0.3.0+—older versions will silently fall back to JSON mode and fail
- Limitation: OpenHands cannot handle binary files (e.g., `.png`, `.zip`)—attempting to process them will trigger a `ValueError: File is not text` in the container logs
- Caveat: The `WORKSPACE_BASE` path must be an absolute path with no symlinks—relative paths or symlinks will break volume mounting in Docker
- Warning: OpenHands’ session state directory (`/.openhands-state`) must be writable by UID 1000—permission errors will cause silent failures during context persistence
OpenHands: The Main Agent
OpenHands runs as a Docker container with direct access to your project files and the host’s Docker socket. The container mounts your workspace and state directory, so it can persist sessions and pick up where it left off. The configuration below is what I’ve refined after breaking things repeatedly in production—including a memorable incident where the container filled /var due to unchecked log growth in /data/openhands-state/logs.
openhands:
  image: docker.all-hands.dev/all-hands-ai/openhands:v1.2.3
  platform: linux/arm64
  container_name: openhands
  environment:
    - LLM_BASE_URL=http://host.docker.internal:8001/v1
    - LLM_MODEL=openai/Intel/Qwen3-Coder-Next-int4-AutoRound@v1.0.0
    - LLM_API_KEY=not-needed-local
    - SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:v0.4.2
    - WORKSPACE_BASE=/data/projects
    - OPENHANDS_TELEMETRY=false
    - OPENHANDS_LOG_LEVEL=INFO
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - /data/projects:/opt/workspace_base:rw
    - /data/openhands-state:/.openhands-state:rw
    - /data/projects/shared:/shared:ro
  extra_hosts:
    - "host.docker.internal:host-gateway"
  ports:
    - "3001:3000"
  restart: unless-stopped
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"
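Given the /var incident mentioned above, a pre-flight disk check before bringing the stack up is cheap insurance. This is a minimal stdlib sketch; the mount points and thresholds are my assumptions for this setup, not OpenHands defaults:

```python
# Pre-flight free-space check. /var holds Docker's json-file logs;
# /data holds the workspace and OpenHands state in this stack.
# Thresholds are illustrative, not official requirements.
import os
import shutil


def free_gb(path: str) -> float:
    """Free space on the filesystem containing path, in gigabytes."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    for mount, min_free in (("/var", 5.0), ("/data", 20.0)):
        if not os.path.exists(mount):
            print(f"{mount}: not mounted")
            continue
        flag = "LOW" if free_gb(mount) < min_free else "ok"
        print(f"{mount}: {free_gb(mount):.1f} GB free [{flag}]")
```

Run it from cron or a systemd timer if you want a record of how fast the log and state directories grow.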
The key detail is the platform flag. ARM64 hardware like the DGX Spark isn’t just supported; it’s the primary target. The container uses SGLang on port 8001 for multi-turn sessions because RadixAttention caches the repository context efficiently across consecutive requests. Without that cache, you’ll thrash memory and hit latency walls; expect 2-3x slower response times when the cache misses. Watch out: SGLang’s cache can grow to 2GB+ for repositories with 50k+ files. Monitor /data/openhands-state/.sglang/cache and set SGLANG_CACHE_SIZE_MB=2048 in the container environment if you hit OOM errors.
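Rather than waiting for an OOM, you can watch the cache directory yourself. This is a sketch of my own; the path comes from the notes above and the 2048 MB threshold mirrors SGLANG_CACHE_SIZE_MB:

```python
# Sketch of a cache-size monitor for the RadixAttention cache directory.
# The path and the 2 GB threshold are taken from the notes above;
# adjust both for your own mounts and budget.
from pathlib import Path


def cache_size_mb(cache_dir: str) -> float:
    """Sum file sizes under cache_dir, in megabytes; 0.0 if it doesn't exist."""
    root = Path(cache_dir)
    if not root.exists():
        return 0.0
    total = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total / (1024 * 1024)


if __name__ == "__main__":
    size = cache_size_mb("/data/openhands-state/.sglang/cache")
    status = "over budget, prune or raise SGLANG_CACHE_SIZE_MB" if size > 2048 else "ok"
    print(f"RadixAttention cache: {size:.0f} MB ({status})")
```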
Model selection is constrained by memory. Mistral Small 4 and Qwen3 Coder Next can’t run simultaneously on a 128GB system without swapping. If you need both, you’ll swap them in and out manually. The throughput numbers tell the story: Mistral Small 4 hits 94 tokens per second on SGLang, while Qwen3 Coder Next manages 69 tokens per second on vLLM. The fallback to Ollama’s qwen2.5:32b is only for quick prototypes where throughput isn’t critical—expect ~12 tokens/sec on a DGX Spark.
Gotcha: Mistral Small 4’s native tool calling mode (native_tool_calling=true) requires SGLang v0.3.0+—older versions will silently fall back to JSON mode and fail with Invalid tool call format errors that are hard to trace. Always check docker logs openhands for SGLang version mismatch warnings.
Fixing Role-Alternation Errors in Mistral Models
OpenHands with Mistral models fails silently until you notice the BadRequestError: After system message, roles must alternate user and assistant. The cause: OpenHands’ default microagent system injects a second user message during context retrieval, breaking the alternation rule. The fix is simple but easy to miss. Here’s the exact error I encountered:
ERROR openhands microagent - Role alternation violated in conversation history
Traceback (most recent call last):
File "/app/openhands/microagent/conversation.py", line 127, in validate_roles
raise ValueError("After system message, roles must alternate user and assistant")
ValueError: After system message, roles must alternate user and assistant
In /data/openhands-state/config.toml, disable prompt extensions and set the model name to match what SGLang serves:
[llm]
model = "openai/Mistral-Small-4@v1.0.0"
base_url = "http://host.docker.internal:30000/v1"
native_tool_calling = true
drop_params = true
modify_params = true
[agent]
enable_prompt_extensions = false
Mount this file into the container as read-only:
volumes:
- /data/openhands-state/config.toml:/app/config.toml:ro
If the error persists, check the session events for consecutive user messages. The diagnostic script below dumps the last session’s events so you can verify the alternation:
SESSION=$(ls -t /data/openhands-state/sessions/ | head -1)
echo "Inspecting session: $SESSION"
for f in /data/openhands-state/sessions/$SESSION/events/*.json; do
  python3 -c "import json, sys; d=json.load(open(sys.argv[1])); \
print(f'{sys.argv[1]} | {d.get(\"source\")} | {str(d.get(\"message\",\"\") or d.get(\"content\",\"\"))[:120]}')" "$f"
done | sort
Watch out: If you see two user entries in a row in the output, you’ve found the same issue. The fix is the same: disable prompt extensions and restart the container. Limitation: This workaround disables OpenHands’ prompt extensions entirely—you’ll lose some context-aware prompt generation features.
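Scanning the dump by eye gets old; the same check can be automated. A minimal validator over the role sequence (the helper and the flat role-list input are my own sketch, not OpenHands code):

```python
# Sketch of a role-alternation check matching the rule the error describes:
# after a system message, roles must alternate user and assistant.
def find_alternation_violations(roles: list[str]) -> list[int]:
    """Return indices where the same role appears twice in a row.

    roles: flat list like ["system", "user", "assistant", "user", ...],
    e.g. extracted from the session event dump above.
    """
    violations = []
    prev = None
    for i, role in enumerate(roles):
        if role == "system":
            prev = None  # alternation restarts after a system message
            continue
        if role == prev:  # two identical roles back to back
            violations.append(i)
        prev = role
    return violations


# The injected second user message shows up as a repeated "user" role:
print(find_alternation_violations(["system", "user", "assistant", "user", "user"]))  # [4]
```

An empty result means the history is clean and the error is coming from somewhere else.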
Aider: The Terminal Companion
Aider is the lightweight counterpart to OpenHands. It’s for when you’re already in a terminal and need to make a quick change or debug a single file. The configuration below is minimal but tuned for large codebases—here’s the exact output I get when running aider --version:
aider-chat version: 0.32.1
Git version: 2.45.1
Python version: 3.11.9 (main, Apr 10 2025, 10:00:00) [GCC 12.2.0]
pip install aider-chat==0.32.1 --break-system-packages
cat > ~/.aider.conf.yml << 'EOF'
model: openai/Intel/Qwen3-Coder-Next-int4-AutoRound@v1.0.0
openai-api-base: http://localhost:8001/v1
openai-api-key: not-needed
auto-commits: true
dirty-commits: false
stream: true
map-tokens: 4096
editor: nano
EOF
The map-tokens setting gives Aider a 4096-token budget for its repository map, which SGLang handles better than vLLM for long files. auto-commits keeps your changes tracked, while dirty-commits: false stops Aider from committing on top of unrelated uncommitted work. The dark mode is a nice touch when you’re staring at a terminal for hours.
Caveat: Aider’s map-tokens setting is not a hard limit; it’s a soft target. If the repository map exceeds 4096 tokens, Aider trims it silently. Run aider --show-repo-map to see what is actually being sent.
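To gauge whether a file will eat most of that budget before you even open it, a crude character-based estimate is usually enough. The 4-characters-per-token ratio is a rough heuristic for English text and code, not Aider’s actual tokenizer:

```python
# Crude token estimate; real counts depend on the model's tokenizer,
# so treat this as a ballpark figure only.
def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens as characters divided by ~4."""
    return int(len(text) / chars_per_token)


source = "def add(a, b):\n    return a + b\n" * 100
print(rough_token_count(source))  # prints 800, comfortably under map-tokens: 4096
```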
Practical usage is straightforward. Start with a single file:
aider src/agents/polymarket_agent.py
Or multiple files:
aider src/agents/polymarket_agent.py src/core/probability_gate.py
Inside Aider, the /run command executes tests directly:
Aider> /run pytest tests/test_polymarket_agent.py -v
============================= test session starts ==============================
...
collected 42 items
tests/test_polymarket_agent.py::test_market_creation PASSED [ 2%]
...
Warning: Aider’s /run command executes in the current shell—if you’re in /data/projects, it will run in that directory. Use absolute paths to avoid surprises.
The /undo command reverts the last commit:
Aider> /undo
Reverted commit: "Fix agent initialization"
And /ask lets you query without implementing changes:
Aider> /ask How do I add logging to the agent?
[Response with code snippet...]
Limitation: Aider does not support multi-file refactoring—it’s strictly single-file or adjacent-file edits. For repository-wide changes, use OpenHands.
Choosing Between OpenHands and Aider
The decision comes down to scope and context. OpenHands is for new features, whole-repository changes, and debugging with test execution. Aider is for single-file edits, quick fixes, and terminal-based workflows. If you’re on a Spark via Termux over SSH, Aider is the only practical choice. If you’re iterating on a new agent and need to maintain context across dozens of messages, OpenHands is the only tool that won’t lose track of your project.
What I Actually Use
- OpenHands: The main coding agent for multi-turn sessions and repository-wide changes
- Aider: The terminal companion for quick edits and debugging when I don’t need full sessions
- SGLang: The engine that keeps context cached across consecutive requests in OpenHands
- Version pinning: I pin all images to specific versions (`openhands:v1.2.3`, `runtime:v0.4.2`) to avoid breaking changes
- Monitoring: I use `docker stats openhands` to track memory usage—OpenHands typically consumes 8-12GB RAM during active sessions
[Figure: OpenHands & Aider architecture layers in the self-hosted AI stack]