Self-Hosted AI Stack
OpenHands is the primary agent that handles multi-turn coding sessions in this stack. It’s not a toy demo; it’s the piece that actually writes, reviews, and tests code across your repositories. When you need a model that can maintain context across dozens of back-and-forth exchanges without losing track of your project’s state, OpenHands is what you reach for. Version 1.2.3 of OpenHands (as of June 2025) includes built-in support for SGLang’s RadixAttention, which caches repository context efficiently across consecutive requests. Without this cache, you’ll see memory thrashing and latency spikes, especially on ARM64 hardware like the DGX Spark, where memory bandwidth is already constrained.
Quick Take
- OpenHands replaces cloud-based coding assistants with a self-hosted alternative that keeps your code and prompts private
- It requires careful model selection and engine tuning to avoid memory exhaustion and role-alternation errors
- Aider serves as the lightweight terminal companion for quick edits and debugging when you don’t need full multi-turn sessions
- Watch out: SGLang’s RadixAttention cache can grow to 2GB+ for large repositories—monitor `/data/openhands-state/.sglang/cache` to avoid disk exhaustion
- Gotcha: Mistral Small 4’s native tool calling mode (`native_tool_calling=true`) requires SGLang v0.3.0+—older versions will silently fall back to JSON mode and fail
- Limitation: OpenHands cannot handle binary files (e.g., `.png`, `.zip`)—attempting to process them will trigger a `ValueError: File is not text` in the container logs
- Caveat: The `WORKSPACE_BASE` path must be an absolute path with no symlinks—relative paths or symlinks will break volume mounting in Docker
- Warning: OpenHands’ session state directory (`/.openhands-state`) must be writable by UID 1000—permission errors will cause silent failures during context persistence
OpenHands: The Main Agent
OpenHands runs as a Docker container with direct access to your project files and the host’s Docker socket. The container mounts your workspace and state directory, so it can persist sessions and pick up where it left off. The configuration below is what I’ve refined after breaking things repeatedly in production—including a memorable incident where the container filled /var due to unchecked log growth in /data/openhands-state/logs.
openhands:
  image: docker.all-hands.dev/all-hands-ai/openhands:v1.2.3
  platform: linux/arm64
  container_name: openhands
  environment:
    - LLM_BASE_URL=http://host.docker.internal:8001/v1
    - LLM_MODEL=openai/Intel/Qwen3-Coder-Next-int4-AutoRound@v1.0.0
    - LLM_API_KEY=not-needed-local
    - SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:v0.4.2
    - WORKSPACE_BASE=/data/projects
    - OPENHANDS_TELEMETRY=false
    - OPENHANDS_LOG_LEVEL=INFO
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - /data/projects:/opt/workspace_base:rw
    - /data/openhands-state:/.openhands-state:rw
    - /data/projects/shared:/shared:ro
  extra_hosts:
    - "host.docker.internal:host-gateway"
  ports:
    - "3001:3000"
  restart: unless-stopped
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"
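Given the /var incident mentioned above, a pre-flight disk check before bringing the stack up is cheap insurance. This is a minimal stdlib sketch; the mount points and thresholds are my assumptions for this setup, not OpenHands defaults:

```python
# Pre-flight free-space check. /var holds Docker's json-file logs;
# /data holds the workspace and OpenHands state in this stack.
# Thresholds are illustrative, not official requirements.
import os
import shutil


def free_gb(path: str) -> float:
    """Free space on the filesystem containing path, in gigabytes."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    for mount, min_free in (("/var", 5.0), ("/data", 20.0)):
        if not os.path.exists(mount):
            print(f"{mount}: not mounted")
            continue
        flag = "LOW" if free_gb(mount) < min_free else "ok"
        print(f"{mount}: {free_gb(mount):.1f} GB free [{flag}]")
```

Run it from cron or a systemd timer if you want a record of how fast the log and state directories grow.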
The key detail is the platform flag. ARM64 hardware like the DGX Spark isn’t just supported; it’s the primary target. The container uses SGLang on port 8001 for multi-turn sessions because RadixAttention caches the repository context efficiently across consecutive requests. Without that cache, you’ll thrash memory and hit latency walls; expect 2-3x slower response times when the cache misses. Watch out: SGLang’s cache can grow to 2GB+ for repositories with 50k+ files. Monitor /data/openhands-state/.sglang/cache and set SGLANG_CACHE_SIZE_MB=2048 in the container environment if you hit OOM errors.
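Rather than waiting for an OOM, you can watch the cache directory yourself. This is a sketch of my own; the path comes from the notes above and the 2048 MB threshold mirrors SGLANG_CACHE_SIZE_MB:

```python
# Sketch of a cache-size monitor for the RadixAttention cache directory.
# The path and the 2 GB threshold are taken from the notes above;
# adjust both for your own mounts and budget.
from pathlib import Path


def cache_size_mb(cache_dir: str) -> float:
    """Sum file sizes under cache_dir, in megabytes; 0.0 if it doesn't exist."""
    root = Path(cache_dir)
    if not root.exists():
        return 0.0
    total = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total / (1024 * 1024)


if __name__ == "__main__":
    size = cache_size_mb("/data/openhands-state/.sglang/cache")
    status = "over budget, prune or raise SGLANG_CACHE_SIZE_MB" if size > 2048 else "ok"
    print(f"RadixAttention cache: {size:.0f} MB ({status})")
```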
Model selection is constrained by memory. Mistral Small 4 and Qwen3 Coder Next can’t run simultaneously on a 128GB system without swapping. If you need both, you’ll swap them in and out manually. The throughput numbers tell the story: Mistral Small 4 hits 94 tokens per second on SGLang, while Qwen3 Coder Next manages 69 tokens per second on vLLM. The fallback to Ollama’s qwen2.5:32b is only for quick prototypes where throughput isn’t critical—expect ~12 tokens/sec on a DGX Spark.
Gotcha: Mistral Small 4’s native tool calling mode (native_tool_calling=true) requires SGLang v0.3.0+—older versions will silently fall back to JSON mode and fail with Invalid tool call format errors that are hard to trace. Always check docker logs openhands for SGLang version mismatch warnings.
Fixing Role-Alternation Errors in Mistral Models
OpenHands with Mistral models fails silently until you notice the BadRequestError: After system message, roles must alternate user and assistant. The cause: OpenHands’ default microagent system injects a second user message during context retrieval, breaking the alternation rule. The fix is simple but easy to miss. Here’s the exact error I encountered:
ERROR openhands microagent - Role alternation violated in conversation history
Traceback (most recent call last):
File "/app/openhands/microagent/conversation.py", line 127, in validate_roles
raise ValueError("After system message, roles must alternate user and assistant")
ValueError: After system message, roles must alternate user and assistant
In /data/openhands-state/config.toml, disable prompt extensions and set the model name to match what SGLang serves:
[llm]
model = "openai/Mistral-Small-4@v1.0.0"
base_url = "http://host.docker.internal:30000/v1"
native_tool_calling = true
drop_params = true
modify_params = true
[agent]
enable_prompt_extensions = false
Mount this file into the container as read-only:
volumes:
- /data/openhands-state/config.toml:/app/config.toml:ro
If the error persists, check the session events for consecutive user messages. The diagnostic script below dumps the last session’s events so you can verify the alternation:
SESSION=$(ls -t /data/openhands-state/sessions/ | head -1)
echo "Inspecting session: $SESSION"
for f in /data/openhands-state/sessions/$SESSION/events/*.json; do
  python3 -c "import json, sys; d=json.load(open(sys.argv[1])); \
print(f'{sys.argv[1]} | {d.get(\"source\")} | {str(d.get(\"message\",\"\") or d.get(\"content\",\"\"))[:120]}')" "$f"
done | sort
Watch out: If you see two user entries in a row in the output, you’ve found the same issue. The fix is the same: disable prompt extensions and restart the container. Limitation: This workaround disables OpenHands’ prompt extensions entirely—you’ll lose some context-aware prompt generation features.
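Scanning the dump by eye gets old; the same check can be automated. A minimal validator over the role sequence (the helper and the flat role-list input are my own sketch, not OpenHands code):

```python
# Sketch of a role-alternation check matching the rule the error describes:
# after a system message, roles must alternate user and assistant.
def find_alternation_violations(roles: list[str]) -> list[int]:
    """Return indices where the same role appears twice in a row.

    roles: flat list like ["system", "user", "assistant", "user", ...],
    e.g. extracted from the session event dump above.
    """
    violations = []
    prev = None
    for i, role in enumerate(roles):
        if role == "system":
            prev = None  # alternation restarts after a system message
            continue
        if role == prev:  # two identical roles back to back
            violations.append(i)
        prev = role
    return violations


# The injected second user message shows up as a repeated "user" role:
print(find_alternation_violations(["system", "user", "assistant", "user", "user"]))  # [4]
```

An empty result means the history is clean and the error is coming from somewhere else.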
Aider: The Terminal Companion
Aider is the lightweight counterpart to OpenHands. It’s for when you’re already in a terminal and need to make a quick change or debug a single file. The configuration below is minimal but tuned for large codebases—here’s the exact output I get when running aider --version:
aider-chat version: 0.32.1
Git version: 2.45.1
Python version: 3.11.9 (main, Apr 10 2025, 10:00:00) [GCC 12.2.0]
pip install aider-chat==0.32.1 --break-system-packages
cat > ~/.aider.conf.yml << 'EOF'
model: openai/Intel/Qwen3-Coder-Next-int4-AutoRound@v1.0.0
openai-api-base: http://localhost:8001/v1
openai-api-key: not-needed
auto-commits: true
dirty-commits: false
stream: true
map-tokens: 4096
editor: nano
EOF
The map-tokens setting gives Aider a 4096-token budget for its repository map, which SGLang handles better than vLLM for long files. auto-commits keeps your changes tracked, while dirty-commits: false stops Aider from committing on top of unrelated uncommitted work. The dark mode is a nice touch when you’re staring at a terminal for hours.
Caveat: Aider’s map-tokens setting is not a hard limit; it’s a soft target. If the repository map exceeds 4096 tokens, Aider trims it silently. Run aider --show-repo-map to see what is actually being sent.
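To gauge whether a file will eat most of that budget before you even open it, a crude character-based estimate is usually enough. The 4-characters-per-token ratio is a rough heuristic for English text and code, not Aider’s actual tokenizer:

```python
# Crude token estimate; real counts depend on the model's tokenizer,
# so treat this as a ballpark figure only.
def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens as characters divided by ~4."""
    return int(len(text) / chars_per_token)


source = "def add(a, b):\n    return a + b\n" * 100
print(rough_token_count(source))  # prints 800, comfortably under map-tokens: 4096
```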
Practical usage is straightforward. Start with a single file:
aider src/agents/polymarket_agent.py
Or multiple files:
aider src/agents/polymarket_agent.py src/core/probability_gate.py
Inside Aider, the /run command executes tests directly:
Aider> /run pytest tests/test_polymarket_agent.py -v
============================= test session starts ==============================
...
collected 42 items
tests/test_polymarket_agent.py::test_market_creation PASSED [ 2%]
...
Warning: Aider’s /run command executes in the current shell—if you’re in /data/projects, it will run in that directory. Use absolute paths to avoid surprises.
The /undo command reverts the last commit:
Aider> /undo
Reverted commit: "Fix agent initialization"
And /ask lets you query without implementing changes:
Aider> /ask How do I add logging to the agent?
[Response with code snippet...]
Limitation: Aider does not support multi-file refactoring—it’s strictly single-file or adjacent-file edits. For repository-wide changes, use OpenHands.
Choosing Between OpenHands and Aider
The decision comes down to scope and context. OpenHands is for new features, whole-repository changes, and debugging with test execution. Aider is for single-file edits, quick fixes, and terminal-based workflows. If you’re on a Spark via Termux over SSH, Aider is the only practical choice. If you’re iterating on a new agent and need to maintain context across dozens of messages, OpenHands is the only tool that won’t lose track of your project.
What I Actually Use
- OpenHands: The main coding agent for multi-turn sessions and repository-wide changes
- Aider: The terminal companion for quick edits and debugging when I don’t need full sessions
- SGLang: The engine that keeps context cached across consecutive requests in OpenHands
- Version pinning: I pin all images to specific versions (`openhands:v1.2.3`, `runtime:v0.4.2`) to avoid breaking changes
- Monitoring: I use `docker stats openhands` to track memory usage—OpenHands typically consumes 8-12GB RAM during active sessions
[Figure: OpenHands & Aider architecture layers in the self-hosted AI stack]